Connect Your Agent

Redpanda Agentic Data Plane is supported only on BYOC clusters running on AWS with Redpanda version 25.3 or later. It is currently in limited availability.

This guide shows you how to connect your AI agent or application to the Redpanda Agentic Data Plane, an approach also called "Bring Your Own Agent" (BYOA). You’ll configure your client SDK, make your first request, and validate the integration.

After completing this guide, you will be able to:

  • Configure your application to use AI Gateway with OpenAI-compatible SDKs

  • Make LLM requests through the gateway and handle responses appropriately

  • Validate your integration end-to-end

Prerequisites

  • You have discovered an available gateway and noted its Gateway ID and endpoint.

  • You have a service account with OIDC client credentials. See Authentication.

  • You have a development environment with your chosen programming language.

Integration overview

Connecting to AI Gateway requires two configuration changes, shown together in the sketch after this list:

  1. Change the base URL: Point to the gateway endpoint instead of the provider’s API. The gateway ID is embedded in the endpoint URL.

  2. Add authentication: Use an OIDC access token from your service account instead of provider API keys.
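Together, the two changes look like this minimal sketch for the OpenAI Python SDK. The endpoint, token, and gateway ID are placeholders; the sections below show how to obtain each:

from openai import OpenAI

# Before: calling the provider directly
# client = OpenAI(api_key="sk-...")

# After: route through AI Gateway
client = OpenAI(
    base_url="<your-gateway-endpoint>",   # 1. gateway endpoint instead of the provider API
    api_key="<oidc-access-token>",        # 2. OIDC access token instead of a provider key
    default_headers={"rp-aigw-id": "<your-gateway-id>"},  # gateway ID header
)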

Authenticate with OIDC

AI Gateway authenticates requests with OIDC. A service account's credentials are exchanged through the client_credentials grant for access and ID tokens.

Create a service account

  1. In the Redpanda Cloud UI, go to Organization IAM > Service account.

  2. Create a new service account and note the Client ID and Client Secret.

For details, see Authenticate to the Cloud API.

Configure your OIDC client

Use the following OIDC configuration:

Parameter        Value
Discovery URL    https://auth.prd.cloud.redpanda.com/.well-known/openid-configuration
Token endpoint   https://auth.prd.cloud.redpanda.com/oauth/token
Audience         cloudv2-production.redpanda.cloud
Grant type       client_credentials

The discovery URL returns OIDC metadata, including the token endpoint and other configuration details. Use an OIDC client library that supports metadata discovery (such as openid-client for Node.js) so that endpoints are resolved automatically. If your library does not support discovery, you can fetch the discovery URL directly and extract the required endpoints from the JSON response.

  • cURL

  • Python (authlib)

  • Node.js (openid-client)

AUTH_TOKEN=$(curl -s --request POST \
    --url 'https://auth.prd.cloud.redpanda.com/oauth/token' \
    --header 'content-type: application/x-www-form-urlencoded' \
    --data grant_type=client_credentials \
    --data client_id=<client-id> \
    --data client_secret=<client-secret> \
    --data audience=cloudv2-production.redpanda.cloud | jq -r .access_token)

Replace <client-id> and <client-secret> with your service account credentials.

import requests
from authlib.integrations.requests_client import OAuth2Session

client = OAuth2Session(
    client_id="<client-id>",
    client_secret="<client-secret>",
)

# Discover the token endpoint from OIDC metadata
metadata = requests.get(
    "https://auth.prd.cloud.redpanda.com/.well-known/openid-configuration"
).json()
token_endpoint = metadata["token_endpoint"]

token = client.fetch_token(
    token_endpoint,
    grant_type="client_credentials",
    audience="cloudv2-production.redpanda.cloud",
)

access_token = token["access_token"]

This example performs a one-time token fetch. For automatic token renewal on subsequent requests, pass token_endpoint to the OAuth2Session constructor. Note that for client_credentials grants, authlib obtains a new token rather than using a refresh token.

import { Issuer } from 'openid-client';

const issuer = await Issuer.discover(
  'https://auth.prd.cloud.redpanda.com'
);

const client = new issuer.Client({
  client_id: '<client-id>',
  client_secret: '<client-secret>',
});

const tokenSet = await client.grant({
  grant_type: 'client_credentials',
  audience: 'cloudv2-production.redpanda.cloud',
});

const accessToken = tokenSet.access_token;

Make authenticated requests

Requests require two headers:

  • Authorization: Bearer <token> - your OIDC access token

  • rp-aigw-id: <gateway-id> - your AI Gateway ID

Set these environment variables for consistent configuration:

export REDPANDA_GATEWAY_URL="<your-gateway-endpoint>"
export REDPANDA_GATEWAY_ID="<your-gateway-id>"

  • Python (OpenAI SDK)

  • Python (Anthropic SDK)

  • Node.js (OpenAI SDK)

  • cURL

import os
from openai import OpenAI

# Configure the client to use AI Gateway with the OIDC token
client = OpenAI(
    base_url=os.getenv("REDPANDA_GATEWAY_URL"),
    api_key=access_token,  # OIDC access token from the authentication step
    default_headers={"rp-aigw-id": os.getenv("REDPANDA_GATEWAY_ID")},  # gateway ID header
)

# Make a request
response = client.chat.completions.create(
    model="openai/gpt-5.2-mini",  # Note: vendor/model_id format
    messages=[{"role": "user", "content": "Hello, AI Gateway!"}],
    max_tokens=100
)

print(response.choices[0].message.content)

The Anthropic SDK can also route through AI Gateway: point its base URL at the gateway endpoint and pass the OIDC token as the API key:

import os
from anthropic import Anthropic

client = Anthropic(
    base_url=os.getenv("REDPANDA_GATEWAY_URL"),
    api_key=access_token,  # OIDC access token from the authentication step
    default_headers={"rp-aigw-id": os.getenv("REDPANDA_GATEWAY_ID")},  # gateway ID header
)

# Make a request
message = client.messages.create(
    model="anthropic/claude-sonnet-4.5",
    max_tokens=100,
    messages=[{"role": "user", "content": "Hello, AI Gateway!"}]
)

print(message.content[0].text)

import OpenAI from 'openai';

const openai = new OpenAI({
  baseURL: process.env.REDPANDA_GATEWAY_URL,
  apiKey: accessToken,  // OIDC access token from the authentication step
  defaultHeaders: { 'rp-aigw-id': process.env.REDPANDA_GATEWAY_ID },
});

// Make a request
const response = await openai.chat.completions.create({
  model: 'openai/gpt-5.2-mini',
  messages: [{ role: 'user', content: 'Hello, AI Gateway!' }],
  max_tokens: 100
});

console.log(response.choices[0].message.content);
curl ${REDPANDA_GATEWAY_URL}/chat/completions \
  -H "Authorization: Bearer ${AUTH_TOKEN}" \
  -H "Content-Type: application/json" \
  -H "rp-aigw-id: ${REDPANDA_GATEWAY_ID}" \
  -d '{
    "model": "openai/gpt-5.2-mini",
    "messages": [{"role": "user", "content": "Hello, AI Gateway!"}],
    "max_tokens": 100
  }'

Token lifecycle management

Your agent is responsible for refreshing tokens before they expire. OIDC access tokens have a limited time-to-live (TTL), determined by the identity provider, and are not automatically renewed by the AI Gateway. Check the expires_in field in the token response for the exact duration.

  • Proactively refresh tokens at approximately 80% of the token’s TTL to avoid failed requests.

  • authlib (Python) can handle token renewal automatically when you pass token_endpoint to the OAuth2Session constructor. For client_credentials grants, it obtains a new token rather than using a refresh token.

  • For other languages, cache the token and its expiry time, then request a new token before the current one expires, as in the sketch below.
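The following is a minimal cache along those lines. It is a sketch, not part of the gateway API: fetch_new_token is a hypothetical callable standing in for whichever token request your client uses (such as the client_credentials call shown earlier).

import time

class TokenCache:
    """Cache an OIDC access token and refresh it at ~80% of its TTL."""

    def __init__(self, fetch_new_token):
        self._fetch = fetch_new_token  # hypothetical: returns the token response dict
        self._token = None
        self._refresh_at = 0.0

    def get(self):
        # Fetch a fresh token on first use or once 80% of the TTL has elapsed
        if self._token is None or time.time() >= self._refresh_at:
            response = self._fetch()
            self._token = response["access_token"]
            self._refresh_at = time.time() + 0.8 * response["expires_in"]
        return self._token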

Model naming convention

When making requests through AI Gateway, use the vendor/model_id format for the model parameter:

  • openai/gpt-5.2

  • openai/gpt-5.2-mini

  • anthropic/claude-sonnet-4.5

  • anthropic/claude-opus-4.6

This format tells AI Gateway which provider to route the request to. For example:

# Route to OpenAI
response = client.chat.completions.create(
    model="openai/gpt-5.2",
    messages=[...]
)

# Route to Anthropic (same client, different model)
response = client.chat.completions.create(
    model="anthropic/claude-sonnet-4.5",
    messages=[...]
)

Handle responses

Responses from AI Gateway follow the OpenAI API format:

response = client.chat.completions.create(
    model="openai/gpt-5.2-mini",
    messages=[{"role": "user", "content": "Explain AI Gateway"}],
    max_tokens=200
)

# Access the response
message_content = response.choices[0].message.content
finish_reason = response.choices[0].finish_reason  # 'stop', 'length', etc.

# Token usage
prompt_tokens = response.usage.prompt_tokens
completion_tokens = response.usage.completion_tokens
total_tokens = response.usage.total_tokens

print(f"Response: {message_content}")
print(f"Tokens: {prompt_tokens} prompt + {completion_tokens} completion = {total_tokens} total")

Handle errors

AI Gateway returns standard HTTP status codes:

import os
from openai import OpenAI, APIStatusError, OpenAIError

client = OpenAI(
    base_url=os.getenv("REDPANDA_GATEWAY_URL"),
    api_key=access_token,  # OIDC access token
    default_headers={"rp-aigw-id": os.getenv("REDPANDA_GATEWAY_ID")},
)

try:
    response = client.chat.completions.create(
        model="openai/gpt-5.2-mini",
        messages=[{"role": "user", "content": "Hello"}]
    )
    print(response.choices[0].message.content)

except APIStatusError as e:
    if e.status_code == 400:
        print("Bad request - check model name and parameters")
    elif e.status_code == 401:
        print("Authentication failed - check OIDC token")
    elif e.status_code == 404:
        print("Model not found - check available models")
    elif e.status_code == 429:
        print("Rate limit exceeded - slow down requests")
    elif e.status_code >= 500:
        print("Gateway or provider error - retry with exponential backoff")
    else:
        print(f"Error: {e}")
except OpenAIError as e:
    # Connection errors and other failures that carry no HTTP status code
    print(f"Error: {e}")

Common error codes:

  • 400: Bad request (invalid parameters, malformed JSON)

  • 401: Authentication failed (invalid or expired OIDC token)

  • 403: Forbidden (no access to this gateway)

  • 404: Model not found (model not enabled in gateway)

  • 429: Rate limit exceeded (too many requests)

  • 500/502/503: Server error (gateway or provider issue)

Streaming responses

AI Gateway supports streaming for real-time token generation:

response = client.chat.completions.create(
    model="openai/gpt-5.2-mini",
    messages=[{"role": "user", "content": "Write a short poem"}],
    stream=True  # Enable streaming
)

# Process chunks as they arrive
for chunk in response:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end='', flush=True)

print()  # New line after streaming completes

Switch between providers

One of AI Gateway’s key benefits is easy provider switching: you change only the model name, not your client code or credentials:

# Try OpenAI
response = client.chat.completions.create(
    model="openai/gpt-5.2",
    messages=[{"role": "user", "content": "Explain quantum computing"}]
)

# Try Anthropic (same code, different model)
response = client.chat.completions.create(
    model="anthropic/claude-sonnet-4.5",
    messages=[{"role": "user", "content": "Explain quantum computing"}]
)

Compare responses, latency, and cost to determine the best model for your use case.
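A quick way to compare is to time the same prompt against each model. This sketch reuses the client configured earlier; the prompt and model list are illustrative:

import time

def compare_models(client, prompt, models):
    """Send the same prompt to each model and report latency and token usage."""
    for model in models:
        start = time.monotonic()
        response = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": prompt}],
            max_tokens=100,
        )
        elapsed = time.monotonic() - start
        print(f"{model}: {elapsed:.2f}s, {response.usage.total_tokens} tokens")

compare_models(client, "Explain quantum computing", [
    "openai/gpt-5.2",
    "anthropic/claude-sonnet-4.5",
])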

Validate your integration

Test connectivity

import os
from openai import OpenAI

def test_gateway_connection(access_token):
    """Test basic connectivity to AI Gateway"""
    client = OpenAI(
        base_url=os.getenv("REDPANDA_GATEWAY_URL"),
        api_key=access_token,  # OIDC access token
        default_headers={"rp-aigw-id": os.getenv("REDPANDA_GATEWAY_ID")},
    )

    try:
        # Simple test request
        response = client.chat.completions.create(
            model="openai/gpt-5.2-mini",
            messages=[{"role": "user", "content": "test"}],
            max_tokens=10
        )
        print("✓ Gateway connection successful")
        return True
    except Exception as e:
        print(f"✗ Gateway connection failed: {e}")
        return False

if __name__ == "__main__":
    token = get_oidc_token()  # Your OIDC token retrieval
    test_gateway_connection(token)

Test multiple models

def test_models(client):
    """Test multiple models through the gateway, reusing a configured client"""
    models = [
        "openai/gpt-5.2-mini",
        "anthropic/claude-sonnet-4.5"
    ]

    for model in models:
        try:
            response = client.chat.completions.create(
                model=model,
                messages=[{"role": "user", "content": "Say hello"}],
                max_tokens=10
            )
            print(f"✓ {model}: {response.choices[0].message.content}")
        except Exception as e:
            print(f"✗ {model}: {e}")

Integrate with AI development tools

  • Claude Code

  • VS Code Continue Extension

  • Cursor IDE

Configure Claude Code to use AI Gateway:

claude mcp add --transport http redpanda-aigateway ${REDPANDA_GATEWAY_URL}/mcp \
  --header "Authorization: Bearer ${AUTH_TOKEN}"

Or edit ~/.claude/config.json:

{
  "mcpServers": {
    "redpanda-ai-gateway": {
      "transport": "http",
      "url": "<your-gateway-endpoint>/mcp",
      "headers": {
        "Authorization": "Bearer <oidc-access-token>"
      }
    }
  }
}

Edit ~/.continue/config.json:

{
  "models": [
    {
      "title": "AI Gateway - GPT-5.2",
      "provider": "openai",
      "model": "openai/gpt-5.2",
      "apiBase": "<your-gateway-endpoint>",
      "apiKey": "<oidc-access-token>"
    }
  ]
}

  1. Open Cursor Settings (Cursor > Settings, or Cmd+,)

  2. Navigate to AI settings

  3. Add custom OpenAI-compatible provider:

    {
      "cursor.ai.providers.openai.apiBase": "<your-gateway-endpoint>"
    }

Best practices

Use environment variables

Store configuration in environment variables instead of hardcoding it:

# Good
base_url = os.getenv("REDPANDA_GATEWAY_URL")

# Bad
base_url = "https://gw.ai.panda.com"  # Don't hardcode URLs or credentials

Implement retry logic

Implement exponential backoff for transient errors:

import time
from openai import OpenAI, APIStatusError

def make_request_with_retry(client, max_retries=3):
    for attempt in range(max_retries):
        try:
            return client.chat.completions.create(
                model="openai/gpt-5.2-mini",
                messages=[{"role": "user", "content": "Hello"}]
            )
        except APIStatusError as e:
            if e.status_code >= 500 and attempt < max_retries - 1:
                wait_time = 2 ** attempt  # Exponential backoff
                print(f"Retrying in {wait_time}s...")
                time.sleep(wait_time)
            else:
                raise

Monitor your usage

Regularly check your usage to avoid unexpected costs:

# Track tokens in your application
total_tokens = 0
request_count = 0

for messages in pending_requests:  # pending_requests: your queue of message payloads
    response = client.chat.completions.create(...)
    total_tokens += response.usage.total_tokens
    request_count += 1

print(f"Total tokens: {total_tokens} across {request_count} requests")

Handle rate limits gracefully

Respect rate limits and implement backoff:

import time
from openai import APIStatusError

try:
    response = client.chat.completions.create(...)
except APIStatusError as e:
    if e.status_code == 429:
        # Rate limited - honor the Retry-After header if present
        retry_after = int(e.response.headers.get('Retry-After', 60))
        print(f"Rate limited. Waiting {retry_after}s...")
        time.sleep(retry_after)
        # Retry the request

Troubleshooting

"Authentication failed"

Problem: 401 Unauthorized

Solutions:

  • Check that your OIDC token has not expired and refresh it if necessary (the snippet below decodes a token so you can inspect its claims)

  • Verify the audience is set to cloudv2-production.redpanda.cloud

  • Check that the service account has access to the specified gateway

  • Ensure the Authorization header is formatted correctly: Bearer <token>
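To inspect a token's claims quickly, you can decode the JWT payload without verifying the signature. This is a debugging aid only, not a validation step:

import base64
import json
import time

def inspect_token(token):
    """Print the audience and remaining lifetime of a JWT access token (no signature check)."""
    payload = token.split(".")[1]
    payload += "=" * (-len(payload) % 4)  # restore stripped base64 padding
    claims = json.loads(base64.urlsafe_b64decode(payload))
    print("audience:", claims.get("aud"))
    print("expires in:", int(claims.get("exp", 0) - time.time()), "seconds")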

"Model not found"

Problem: 404 Model not found

Solutions:

  • Verify the model name uses vendor/model_id format

  • Confirm the model is enabled in your gateway (contact administrator)

"Rate limit exceeded"

Problem: 429 Too Many Requests

Solutions:

  • Reduce request rate

  • Implement exponential backoff

  • Contact administrator to review rate limits

  • Consider using a different gateway if available

"Connection timeout"

Problem: Request times out

Solutions:

  • Check network connectivity to the gateway endpoint

  • Verify the gateway endpoint URL is correct

  • Check if the gateway is operational (contact administrator)

  • Increase the client timeout if processing complex requests, as in the sketch below
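With the OpenAI Python SDK, the timeout is set on the client. The 60-second limit here is illustrative:

import os
from openai import OpenAI

client = OpenAI(
    base_url=os.getenv("REDPANDA_GATEWAY_URL"),
    api_key=access_token,  # OIDC access token
    default_headers={"rp-aigw-id": os.getenv("REDPANDA_GATEWAY_ID")},
    timeout=60.0,  # raise the per-request timeout (seconds) for complex requests
)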