Agentic Data Plane

Connect Your App to AI Gateway

This guide shows how to connect your AI agent or application to the AI Gateway. You construct the proxy URL for a provider you have already created, authenticate (with the rpk ai CLI for local development or with OIDC client credentials for CI and application code), and send your first request with the SDK of your choice.

The provider’s Connect tab in Agentic Data Plane generates this configuration for you: a gateway-token step, setup instructions for popular clients, and code examples with the provider’s proxy URL prefilled. Copy from the tab for a quick start, or follow this page for the full flow.

After completing this guide, you will be able to:

Construct the proxy URL for an LLM provider you have configured
Authenticate to AI Gateway with the rpk ai CLI for local development or with OIDC client credentials for CI and programmatic clients
Send requests through the proxy URL with the SDK of your choice

Prerequisites

A configured LLM provider. If you haven’t created one yet, see Configure an LLM provider.
For local development, nothing else. You’ll install rpk ai in the next section.
For CI or programmatic clients: A Redpanda service account with OIDC client credentials. Service accounts are managed in Organization IAM. See Authenticate to Redpanda Cloud.
A development environment with your chosen programming language.

Proxy URL anatomy

Every provider you create in AI Gateway gets its own proxy URL:

<gateway-base>/llm/v1/providers/<provider-name>/<upstream-path>

<gateway-base>: The AI Gateway base URL for your dataplane. It is a cluster-specific subdomain of clusters.rdpa.co (for example, https://aigw.<cluster-id>.clusters.rdpa.co). Copy the exact value from the Proxy URL field on any provider’s Connection card.
<provider-name>: The name you gave the provider when you created it, for example my-openai or prod-anthropic.
<upstream-path>: The upstream provider’s native API path (for example, v1/chat/completions for OpenAI, v1/messages for Anthropic).

AI Gateway forwards the request to the upstream provider, attaches the configured credentials, and records the request for observability. Your application never sees the upstream API key.
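
The URL layout above can be sketched as a small helper. This is an illustration only: build_proxy_url is a hypothetical name, and the base URL in the example is a placeholder, not a real cluster.

```python
def build_proxy_url(gateway_base: str, provider_name: str, upstream_path: str = "") -> str:
    """Join <gateway-base>, <provider-name>, and <upstream-path>, tolerating stray slashes."""
    base = gateway_base.rstrip("/")
    url = f"{base}/llm/v1/providers/{provider_name}"
    path = upstream_path.strip("/")
    return f"{url}/{path}" if path else url


# Placeholder base URL; copy the real one from the provider's Connection card.
print(build_proxy_url("https://aigw.example.clusters.rdpa.co/", "my-openai", "/v1/chat/completions"))
```

Leaving upstream_path empty yields the provider-level URL that the SDK examples later on this page use as their base URL.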

The provider detail page generates ready-to-run snippets pre-filled with the correct proxy URL and paths. When in doubt, copy from the Connect your app section there.

Use `rpk ai` for local development

The rpk ai command is the Redpanda AI CLI. Use it to manage AI Gateway resources (LLM providers, MCP servers, OAuth providers) and call MCP tools from the command line. rpk ai is self-contained: it has its own login and its own Agentic Data Plane environment selection, independent of any rpk cloud session.

Install rpk ai:
```
rpk ai install
```
Update later with rpk ai upgrade; remove with rpk ai uninstall.
Sign in. This runs an OAuth device-authorization flow in your browser, caches credentials in ~/.rpai/credentials (readable only by you), then lists the Agentic Data Plane environments in your organization so you can select one:
```
rpk ai auth login
```
Select the Agentic Data Plane environment whose AI Gateway you want to target. The rpk ai env use command accepts an environment name or ID and repoints the active profile in place:
```
rpk ai env list
rpk ai env use <environment>
```
Inspect the resolved environment and token state at any time with rpk ai env show and rpk ai auth status.
Verify the connection:
```
rpk ai llm-provider list
```

If the cached token has expired, rpk ai returns a 401; rerun rpk ai auth login to refresh it.

rpk ai help, rpk ai version, and unknown subcommands run without prompting for authentication, so you can browse the CLI surface offline before signing in. Authentication is only required for commands that hit AI Gateway.

To target a specific AI Gateway URL for a single invocation (for example, a local gateway, or a staging environment the environments list does not include), pass --rpai-endpoint:

rpk ai --rpai-endpoint http://localhost:8090 llm-provider list

This overrides the selected environment’s AI Gateway URL for that one command, and the flag is not bound to an environment variable. For a manual or local gateway you use repeatedly, define it once as an environment instead:

rpk ai env add local --ai-gateway-url http://localhost:8090 --auth-mode none
rpk ai env use local

Environment variables

The rpk ai command honors the following environment variables:

Variable	Purpose
`RPAI_TOKEN`	Static bearer token for the gateway. `rpk ai` normally manages its own token through `rpk ai auth login`; set this (or pass `--token`) to override, for example in a headless or CI shell.
`RPAI_CONFIG`, `RPAI_VERBOSE`, `RPAI_FORMAT`	Map to `--rpai-config`, `--rpai-verbose`, `--format` (short flags `-c`, `-v`, `-o`). Long flag names are renamed under `rpk ai` to avoid collision with rpk’s globals. There is no environment variable for environment selection (use `--rpai-environment` or `rpk ai env use`) or for the AI Gateway URL override (use the `--rpai-endpoint` flag).
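
For a headless or CI wrapper, the RPAI_TOKEN override can be sketched in Python as follows. This is a minimal illustration, not part of rpk: the helper names are hypothetical, and it assumes rpk is on PATH and that your CI system supplies a valid bearer token.

```python
import os
import subprocess


def rpk_ai_command(args, endpoint=None):
    """Build the argv for an rpk ai call, optionally adding --rpai-endpoint."""
    argv = ["rpk", "ai"]
    if endpoint:
        argv += ["--rpai-endpoint", endpoint]
    return argv + list(args)


def rpk_ai_env(token, base_env=None):
    """Copy the environment and set RPAI_TOKEN to override the managed token."""
    env = dict(os.environ if base_env is None else base_env)
    env["RPAI_TOKEN"] = token
    return env


def list_providers(token, endpoint=None):
    """Run `rpk ai llm-provider list` and return its stdout (raises on failure)."""
    result = subprocess.run(
        rpk_ai_command(["llm-provider", "list"], endpoint),
        env=rpk_ai_env(token),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

Because the gateway URL override has no environment variable, the sketch passes it as a flag rather than through env.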

Authenticate with OIDC client credentials (CI and programmatic)

For application code, CI runners, server-side processes, and headless agents, use the OIDC client_credentials grant directly. This is the canonical authentication path for SDK-style usage; rpk ai is for command-line workflows, not for embedding in application code. Values are surfaced on the provider’s Connection card; defaults at the time of writing are below.

Parameter	Value (today)
Discovery URL	`https://auth.prd.cloud.redpanda.com/.well-known/openid-configuration`. Also surfaced as the `Discovery` field on the provider’s Connection card.
Token endpoint	`https://auth.prd.cloud.redpanda.com/oauth/token`
Audience	`cloudv2-production.redpanda.cloud`
Grant type	`client_credentials`

The three examples below request a token with cURL, Python (authlib), and Node.js (openid-client), in that order.

AUTH_TOKEN=$(curl -s --request POST \
    --url 'https://auth.prd.cloud.redpanda.com/oauth/token' \
    --header 'content-type: application/x-www-form-urlencoded' \
    --data grant_type=client_credentials \
    --data client_id=<client-id> \
    --data client_secret=<client-secret> \
    --data audience=cloudv2-production.redpanda.cloud | jq -r .access_token)

Replace <client-id> and <client-secret> with your service account credentials.

from authlib.integrations.requests_client import OAuth2Session
import requests

# Discover token endpoint from OIDC metadata
metadata = requests.get(
    "https://auth.prd.cloud.redpanda.com/.well-known/openid-configuration"
).json()
token_endpoint = metadata["token_endpoint"]

client = OAuth2Session(
    client_id="<client-id>",
    client_secret="<client-secret>",
    token_endpoint=token_endpoint,
)

token = client.fetch_token(
    grant_type="client_credentials",
    audience="cloudv2-production.redpanda.cloud",
)

access_token = token["access_token"]

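Passing token_endpoint to the OAuth2Session constructor lets authlib renew the token for requests sent through that session object. For client_credentials grants, it fetches a new token rather than using a refresh token. The access_token string extracted above is a snapshot: if you hand it to another SDK, that copy is not renewed, so fetch a new token before it expires.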

import { Issuer } from 'openid-client';

const issuer = await Issuer.discover(
  'https://auth.prd.cloud.redpanda.com'
);

const client = new issuer.Client({
  client_id: '<client-id>',
  client_secret: '<client-secret>',
});

const tokenSet = await client.grant({
  grant_type: 'client_credentials',
  audience: 'cloudv2-production.redpanda.cloud',
});

const accessToken = tokenSet.access_token;

Token lifecycle management

Your client is responsible for refreshing tokens before they expire. OIDC access tokens have a limited TTL set by the identity provider and are not automatically renewed by AI Gateway. Check the expires_in field in the token response for the exact duration.

Proactively refresh at ~80% of the token’s TTL to avoid failed requests.
authlib (Python) handles renewal automatically when you pass token_endpoint to OAuth2Session.
For other languages, cache the token and its expiry, then request a new token before the current one expires.
For SDK code, refresh OIDC client-credentials tokens through your client library (see the authlib example above).
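
The cache-and-refresh guidance above can be sketched as follows. This is an illustrative helper, not a Redpanda or authlib API: TokenCache is a hypothetical name, and fetch stands for whatever function you use to call the token endpoint and return the parsed JSON response (with access_token and expires_in).

```python
import time
from typing import Callable, Optional


class TokenCache:
    """Cache an access token and refetch it once a fraction of its TTL has elapsed."""

    def __init__(
        self,
        fetch: Callable[[], dict],
        refresh_fraction: float = 0.8,
        clock: Callable[[], float] = time.monotonic,
    ):
        self._fetch = fetch
        self._refresh_fraction = refresh_fraction
        self._clock = clock
        self._token: Optional[str] = None
        self._refresh_at = 0.0

    def get(self) -> str:
        """Return a cached token, fetching a new one when the refresh point has passed."""
        now = self._clock()
        if self._token is None or now >= self._refresh_at:
            response = self._fetch()  # expected keys: access_token, expires_in
            self._token = response["access_token"]
            ttl = float(response["expires_in"])
            self._refresh_at = now + ttl * self._refresh_fraction
        return self._token
```

Call cache.get() each time you build request headers or an SDK client, so a long-running process picks up the renewed token instead of holding the first one.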

Send requests with your SDK

The examples in this section assume you’ve set:

export PROXY_URL="<your-gateway-base>/llm/v1/providers/<provider-name>"
export AUTH_TOKEN="<oidc-access-token>"   # from the client_credentials flow above

The examples below appear in this order: OpenAI SDK, Anthropic SDK, Google Gemini SDK, AWS Bedrock, and OpenAI-compatible providers.

import os
from openai import OpenAI

client = OpenAI(
    base_url=os.environ["PROXY_URL"],       # .../llm/v1/providers/my-openai
    api_key=os.environ["AUTH_TOKEN"],        # OIDC access token
)

response = client.chat.completions.create(
    model="gpt-4o",                          # native OpenAI model ID
    messages=[{"role": "user", "content": "Hello from AI Gateway"}],
)
print(response.choices[0].message.content)

The OpenAI SDK calls the proxy’s /v1/chat/completions path, which AI Gateway forwards to OpenAI unchanged. Use it with any OpenAI provider and, with a different base_url, with any OpenAI-compatible provider (vLLM, Ollama, LM Studio, Together, Groq, OpenRouter).

import os
from anthropic import Anthropic

client = Anthropic(
    base_url=os.environ["PROXY_URL"],       # .../llm/v1/providers/my-anthropic
    auth_token=os.environ["AUTH_TOKEN"],     # OIDC access token
)

message = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello from AI Gateway"}],
)
print(message.content[0].text)

The Anthropic SDK hits v1/messages on the proxy, which AI Gateway forwards to Anthropic. If the provider is configured with Auth passthrough, send your own Anthropic Authorization header instead of an auth_token. AI Gateway forwards it unchanged.

import os
from google import genai

client = genai.Client(
    api_key=os.environ["AUTH_TOKEN"],        # forwarded as x-goog-api-key
    http_options={"base_url": os.environ["PROXY_URL"]},  # .../llm/v1/providers/my-google
)

response = client.models.generate_content(
    model="gemini-2.0-flash",
    contents="Hello from AI Gateway",
)
print(response.text)

Gemini authenticates with the x-goog-api-key header, not Authorization: Bearer. Most Google SDKs set x-goog-api-key automatically from the api_key parameter. If you hand-roll the request, set the header yourself.
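
A hand-rolled Gemini request might be assembled as in the sketch below. It assumes the upstream path follows Gemini's native REST layout (v1beta/models/<model>:generateContent); confirm the exact path against the snippets on the provider detail page. gemini_request is a hypothetical helper name.

```python
import json


def gemini_request(proxy_url: str, token: str, model: str, prompt: str):
    """Return (url, headers, body) for a generateContent call through the proxy."""
    # Assumption: Gemini's native REST path is appended to the provider's proxy URL.
    url = f"{proxy_url.rstrip('/')}/v1beta/models/{model}:generateContent"
    headers = {
        "x-goog-api-key": token,  # Gemini expects this header, not Authorization: Bearer
        "Content-Type": "application/json",
    }
    body = json.dumps({"contents": [{"parts": [{"text": prompt}]}]})
    return url, headers, body
```

Send the result with any HTTP client, for example httpx.post(url, headers=headers, content=body).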

Bedrock is different: SigV4 signing is performed server-side by AI Gateway using the credentials on the provider. Your client only needs to call the proxy URL with an OIDC access token.

import os, httpx

# On Bedrock, Anthropic 4.6+ models require an inference profile prefix (us./eu./apac./global.).
# Replace with the inference profile your provider exposes.
response = httpx.post(
    f"{os.environ['PROXY_URL']}/model/us.anthropic.claude-sonnet-4-6/invoke",
    headers={"Authorization": f"Bearer {os.environ['AUTH_TOKEN']}"},
    json={
        "anthropic_version": "bedrock-2023-05-31",
        "messages": [{"role": "user", "content": "Hello"}],
        "max_tokens": 1024,
    },
)
print(response.json())

See the Bedrock provider reference for inference-profile selection guidance.

Bedrock’s Converse API works the same way: send to /model/{MODEL_ID}/converse with a Converse-shaped body. Or use the AWS SDK’s bedrockruntime client and set its BaseEndpoint to the proxy URL; the SDK signs the request, AI Gateway re-signs server-side with the provider’s credentials, and your client never sees AWS keys.
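
As a rough sketch of the Converse variant, the helper below builds the request pieces. The body follows the Converse API's message shape (content is a list of blocks, and token limits live under inferenceConfig); converse_request is a hypothetical name, and the model ID is whatever inference profile your provider exposes.

```python
def converse_request(proxy_url: str, token: str, model_id: str, prompt: str, max_tokens: int = 1024):
    """Return (url, headers, body) for a Bedrock Converse call through the proxy."""
    url = f"{proxy_url.rstrip('/')}/model/{model_id}/converse"
    headers = {"Authorization": f"Bearer {token}"}  # OIDC access token; no SigV4 on the client
    body = {
        "messages": [{"role": "user", "content": [{"text": prompt}]}],
        "inferenceConfig": {"maxTokens": max_tokens},
    }
    return url, headers, body
```

With httpx, as in the invoke example, the call would be httpx.post(url, headers=headers, json=body).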

Use the OpenAI SDK with the proxy URL of the OpenAI-compatible provider and whatever model identifier the upstream exposes:

import os
from openai import OpenAI

client = OpenAI(
    base_url=os.environ["PROXY_URL"],       # .../llm/v1/providers/my-vllm
    api_key=os.environ["AUTH_TOKEN"],
)

response = client.chat.completions.create(
    model="meta-llama/Llama-3.3-70B-Instruct",  # as exposed by your upstream
    messages=[{"role": "user", "content": "Hello"}],
)

The provider detail page also has client guides for Claude Code, Codex, and Gemini (the desktop client). Open Connect your app on the provider’s page to see the per-client setup instructions.

Group requests into transcripts

AI Gateway groups a session’s LLM calls and MCP tool calls into a single transcript using the X-Redpanda-Genai-Conversation request header. Stamp this header with your framework’s session or thread ID, using the same value on every request in the session, and the gateway maps it to the gen_ai.conversation.id attribute on each span so the spans group into one conversation.

This header is required for the agent’s Transcripts tab to populate. Without it, the gateway drops the spans and the tab stays empty. The header doesn’t affect authentication or whether requests succeed.

Set it through your SDK’s default-headers mechanism so it rides along with both the LLM call and each MCP tool call:

import os
from openai import OpenAI

client = OpenAI(
    base_url=os.environ["PROXY_URL"],
    api_key=os.environ["AUTH_TOKEN"],
    default_headers={"X-Redpanda-Genai-Conversation": session_id},
)

Replace session_id with the session or thread identifier your framework already tracks, and stamp the same value on the session’s MCP tool calls so the whole turn groups into one transcript. For how transcripts read this attribute, see View agent transcripts.
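
If your framework does not already track a session ID, one option is to mint one per session and reuse the same header dict everywhere. The sketch below assumes a UUID string is an acceptable conversation ID; conversation_headers is a hypothetical helper.

```python
import uuid


def conversation_headers(session_id=None):
    """Return the transcript-grouping header, minting a UUID when no ID is supplied."""
    conversation_id = session_id or str(uuid.uuid4())
    return {"X-Redpanda-Genai-Conversation": conversation_id}


# Build the headers once per session, then pass the same dict to the LLM client
# (default_headers) and to every MCP tool call in that session.
headers = conversation_headers("thread-1234")  # hypothetical thread ID
```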

Streaming responses

Streaming passes through unchanged. Use the SDK’s native streaming API; the proxy forwards the stream byte-for-byte.

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Write a short poem"}],
    stream=True,
)

for chunk in response:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)

Handle errors

AI Gateway returns standard HTTP status codes. The upstream provider’s error body passes through, so your existing SDK error handling works:

Status	Meaning
400	Bad request. Invalid parameters or malformed JSON.
401	Authentication failed. Token invalid, expired, or (for Gemini) sent in the wrong header.
403	Forbidden. The service account lacks the required role, or the provider is disabled.
404	Provider or model not found. Verify the provider name in the URL and the model identifier.
429	Rate limited by the upstream provider. AI Gateway does not enforce its own rate limits today. Respect `Retry-After` if present.
5xx	Upstream or gateway error. Retry with exponential backoff.
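
The retry guidance in the table can be sketched as below for clients that do not already retry on their own (the OpenAI and Anthropic Python SDKs expose a max_retries setting). The helper names and the backoff constants are illustrative assumptions; send is any callable returning (status, headers, body), and headers is treated as a plain dict keyed by the exact header name.

```python
import random
import time


def retry_delay(status, headers, attempt, base=0.5, cap=30.0):
    """Return seconds to wait before retrying, or None if the status is not retryable."""
    if status != 429 and not 500 <= status <= 599:
        return None
    retry_after = headers.get("Retry-After")
    if status == 429 and retry_after is not None:
        try:
            return max(0.0, float(retry_after))
        except ValueError:
            pass  # HTTP-date form of Retry-After; fall back to backoff
    return min(cap, base * (2 ** attempt))


def call_with_retry(send, max_attempts=5, sleep=time.sleep):
    """Call send() until it returns a non-retryable status or attempts run out."""
    for attempt in range(max_attempts):
        status, headers, body = send()
        delay = retry_delay(status, headers, attempt)
        if delay is None or attempt == max_attempts - 1:
            return status, headers, body
        sleep(delay + random.uniform(0, delay / 2))  # jitter avoids synchronized retries
```

Only retry requests that are safe to repeat; a 400, 401, 403, or 404 is returned to the caller immediately.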

Best practices

Use environment variables for the proxy URL and token. Never hard-code them.
Refresh OIDC tokens through your client library so refresh is invisible to your SDK code (authlib for Python, openid-client for Node.js, and so on).
Implement retry with exponential backoff for 5xx and timeout conditions.
Respect Retry-After on 429 responses.
Rotate service account credentials on a schedule your organization accepts.
Observe usage in Redpanda Agentic Data Plane on each provider’s detail page.

Troubleshooting

401 Unauthorized

If you’re using rpk ai: Rerun rpk ai auth login to refresh the credentials. Token expiry surfaces as a 401.
If you’re using OIDC client credentials: Check the token hasn’t expired and refresh it. Verify the audience is cloudv2-production.redpanda.cloud and the Authorization header is formatted Bearer <token>.
For Gemini: Ensure the token is sent as x-goog-api-key, not Authorization.
For Anthropic with passthrough: Ensure the client is sending a valid Anthropic Authorization header.
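
To check expiry and audience locally, a debugging sketch like the one below may help. It assumes the access token is a JWT, which this page does not state, and it decodes the payload without verifying the signature, so use it for diagnosis only. The function names are hypothetical.

```python
import base64
import json
import time


def jwt_claims(token):
    """Decode a JWT payload without verifying the signature (debugging only)."""
    payload = token.split(".")[1]
    payload += "=" * (-len(payload) % 4)  # restore stripped base64 padding
    return json.loads(base64.urlsafe_b64decode(payload))


def diagnose(token, expected_audience="cloudv2-production.redpanda.cloud", now=None):
    """Return a list of likely reasons a gateway would reject this token."""
    claims = jwt_claims(token)
    now = time.time() if now is None else now
    problems = []
    if claims.get("exp", 0) <= now:
        problems.append("token expired")
    aud = claims.get("aud")
    audiences = aud if isinstance(aud, list) else [aud]
    if expected_audience not in audiences:
        problems.append(f"unexpected audience: {aud}")
    return problems
```

An empty list means neither check found a problem; the header format and header name (for Gemini) still need to be checked separately.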

404 Not found

Re-check the provider name in the proxy URL. The segment after /providers/ must match the provider’s Name exactly.
For model-not-found: Confirm the model identifier is one your provider’s catalog actually serves. OpenAI-compatible endpoints accept whatever model IDs the upstream exposes.
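
A quick way to see which provider name a URL actually targets is to pull out the segment after /providers/, as in this small sketch (provider_name_from_url is a hypothetical helper):

```python
from urllib.parse import urlsplit


def provider_name_from_url(proxy_url):
    """Return the path segment that follows /providers/, or None if it is absent."""
    segments = [s for s in urlsplit(proxy_url).path.split("/") if s]
    if "providers" in segments:
        index = segments.index("providers")
        if index + 1 < len(segments):
            return segments[index + 1]
    return None
```

Compare the result character for character with the provider’s Name, since the match must be exact.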

403 Forbidden

The service account may lack the required roles. Ask an admin to grant dataplane_adp_llmprovider_get at minimum to read provider config, and dataplane_adp_llmprovider_invoke to proxy LLM requests through AI Gateway. See LLM provider permissions or assign the LLMProviderInvoker built-in role for runtime-only access.
The provider may be disabled. Check the Status field on its Connection card.

Connection timeout or reset

Verify the proxy URL is correct (copy directly from the provider’s Connection card).
Check that the provider isn’t pointing at a private base URL your client can’t reach (OpenAI-compatible providers only).
Confirm the upstream provider’s status page.

Next steps

Configure an LLM provider
