Set Up Budgets
The Agentic Data Plane caps LLM spend with budgets and records every LLM call as a spending event. Set a budget to enforce a hard spending cap per agent, then read what you actually spend on the Cost & Usage page under Governance, through individual transcripts, and through breakdown queries by provider, model, user, agent, or provider type.
After completing these steps, you will be able to:
- Set a per-agent budget that caps LLM spend and warns before the cap
- Identify what spending data the Agentic Data Plane records automatically
- View spend breakdowns by agent, model, and provider
Set a budget
A budget caps LLM spend over a recurring period. When an agent’s spend for the period reaches the budget’s hard limit, AI Gateway rejects that agent’s next LLM request with HTTP 429 until the period resets. A separate warning threshold fires before the cap, so you can react before AI Gateway cuts the agent off.
Redpanda ADP enforces budgets per agent. A budget identifies an agent by its resource name in the form agents/<slug>, the same identity that appears as agent_name in spend data.
How a budget works
| Setting | What it does |
|---|---|
| Limit | The hard cap on per-period spend. You set it in dollars in the UI; the API stores it in USD microcents (1 cent = 1,000,000 microcents). When a matching agent’s accrued spend for the period reaches this value, the agent’s next LLM request through the gateway gets |
| Warning threshold | A spend level, lower than the limit, at which ADP warns: gateway responses carry a |
| Period | How often the spend pool resets: daily, weekly, or monthly. Periods are calendar-aligned in UTC: daily at 00:00 UTC, weekly at 00:00 UTC Monday, monthly at 00:00 UTC on the first of the month. |
| Target agent | Which agent the budget applies to. Leave it unset to create the tenant default budget. Set it to an agent’s resource name ( |
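As a rough illustration of the Limit and Period settings, the sketch below converts a dollar cap to microcents and computes the calendar-aligned UTC period that contains a given moment. The helper names `dollars_to_microcents` and `period_bounds` are hypothetical and not part of the ADP API; ADP performs these calculations server-side.

```python
from datetime import datetime, timedelta, timezone
from decimal import Decimal

MICROCENTS_PER_DOLLAR = 100_000_000  # 1 cent = 1,000,000 microcents


def dollars_to_microcents(dollars: str) -> int:
    """Convert a dollar amount (passed as a string to avoid float error) to microcents."""
    return int(Decimal(dollars) * MICROCENTS_PER_DOLLAR)


def period_bounds(now: datetime, period: str) -> tuple[datetime, datetime]:
    """Return (start, next_reset) of the calendar-aligned UTC period containing `now`."""
    now = now.astimezone(timezone.utc)
    midnight = now.replace(hour=0, minute=0, second=0, microsecond=0)
    if period == "daily":
        return midnight, midnight + timedelta(days=1)
    if period == "weekly":
        # weekday() is 0 on Monday, so this steps back to 00:00 UTC Monday.
        start = midnight - timedelta(days=midnight.weekday())
        return start, start + timedelta(days=7)
    if period == "monthly":
        start = midnight.replace(day=1)
        if start.month == 12:  # roll over into January of the next year
            return start, start.replace(year=start.year + 1, month=1)
        return start, start.replace(month=start.month + 1)
    raise ValueError(f"unknown period: {period!r}")


# Example: a Wednesday afternoon in May 2026.
now = datetime(2026, 5, 20, 15, 30, tzinfo=timezone.utc)
week_start, week_reset = period_bounds(now, "weekly")
print(dollars_to_microcents("100"), week_start.isoformat(), week_reset.isoformat())
```

A client that receives HTTP 429 could use the second value of `period_bounds` to estimate when the agent's pool resets, assuming its budget uses that period.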
Default and per-agent override budgets
You can have one default budget per tenant and at most one override per agent:
- The default budget (no target agent) gives every agent its own independent pool of the limit per period. One agent reaching its cap doesn’t affect another.
- A per-agent override targets a single agent by resource name (agents/<slug>) and replaces the default for that agent. Use an override to give a specific agent a higher or lower cap than the fleet default.
The target agent is immutable. To move an override to a different agent, delete it and create a new one. An override matches on the agent’s resource name, so recreating an agent with the same name keeps the override pointed at it; the new instance counts as a continuation for budget purposes. When you read a budget, it also reports the current period’s spend, when the period started, when it resets, and (for the default budget) the agent currently closest to its cap.
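The default-versus-override rule can be pictured with a small local model. This is only a sketch: the `Budget` class, `effective_budget`, and `status` are hypothetical names, and the "unbudgeted" result for a tenant with no budgets is an assumption, since this page does not describe that case.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Budget:
    limit_microcents: int
    warning_microcents: int
    target_agent: Optional[str] = None  # None marks the tenant default


def effective_budget(agent: str, default: Optional[Budget],
                     overrides: dict[str, Budget]) -> Optional[Budget]:
    """An override keyed by the agent's resource name replaces the default."""
    return overrides.get(agent, default)


def status(spend_microcents: int, budget: Optional[Budget]) -> str:
    """Classify one agent's spend for the current period."""
    if budget is None:
        return "unbudgeted"  # assumption: no budget means no cap
    if spend_microcents >= budget.limit_microcents:
        return "blocked"     # the next LLM request would get HTTP 429
    if spend_microcents >= budget.warning_microcents:
        return "warning"
    return "ok"


# $100 default with a warning at $80; agents/research gets a $500 override.
default = Budget(10_000_000_000, 8_000_000_000)
overrides = {"agents/research": Budget(50_000_000_000, 40_000_000_000, "agents/research")}

# Each agent has its own pool, so the same $90 of spend classifies differently.
for agent in ("agents/support", "agents/research"):
    print(agent, status(9_000_000_000, effective_budget(agent, default, overrides)))
```

Because the lookup is keyed by resource name rather than by instance, a recreated agent with the same name resolves to the same override, which matches the continuation behavior described above.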
Set a budget in the UI
Open Budgets under Governance in the sidebar. The page shows the tenant default budget as a card: its cap and period, the warn threshold, and how many agents are doing fine, getting close, or over the limit. Per-agent overrides appear in a table below the card, with each override’s Agent override target, Period, current Usage against the cap, Warn at threshold, and when it was last Updated.
To create the tenant default budget:
- Click Create default budget.
- Under Budget, set Cap usage at to a dollar amount and choose a period (day, week, or month). Use the quick-set chips ($25, $100, $500, $1,000) for common values.
- Drag the Warn at slider to set the warning threshold as a percentage of the cap (80% by default).
- Review the Configuration preview panel, which summarizes the budget, period, and the warn and block thresholds in dollars, then click Create default budget.
Open the default budget from the Budgets page to see its detail view: the per-agent spending limit (each agent gets its own limit, not one shared pool), the reset schedule, the warn threshold, and how much each agent has spent so far toward its limit.
To give one agent a different cap, click Add override, pick the agent (each agent can have at most one override; agents that already have one are grayed out), then set the Budget and Warn at controls the same way. The Resource name is auto-derived from the picked agent and is immutable; the Display name is editable.
Manage budgets through the API
BudgetService exposes standard create, read, update, and delete operations:
| Method | Use it to |
|---|---|
| | Create the tenant default or a per-agent override. |
| | Read one budget, or list all budgets, each with current-period spend status. |
| | Change the limit, warning threshold, or period. Send a field mask naming the fields you change. |
| | Remove a budget. Deleting the default removes the per-agent pools; deleting an override falls that agent back to the default. |
A service account needs the matching dataplane_adp_budget_* permission for each operation (create, get, list, update, or delete). See Budget permissions.
What ADP records automatically
Every LLM call routed through AI Gateway becomes a spending event. Each event captures:
- Input tokens, output tokens, and cached tokens.
- Total cost (in microcents).
- Request count.
- The provider, model, user, and organization context the call ran under.
No setup required: the gateway captures spending the moment your first agent runs through it.
ADP tracks streaming and non-streaming requests the same way, and attributes cache-write tokens (Anthropic 4.x, OpenAI 4.x prompt caches) correctly on streaming responses, so cost rollups stay accurate when an agent reuses long system prompts.
ADP reports cost in microcents. 1 cent = 1,000,000 microcents, so $1 = 100,000,000 microcents. Divide total_cost_microcents by 100,000,000 to convert to dollars.
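A minimal sketch of that conversion, using `Decimal` so large totals do not pick up floating-point error. The helper name `microcents_to_dollars` is hypothetical.

```python
from decimal import Decimal

MICROCENTS_PER_DOLLAR = Decimal(100_000_000)


def microcents_to_dollars(microcents: int) -> Decimal:
    """Convert a microcent total to dollars, rounded to whole cents."""
    return (Decimal(microcents) / MICROCENTS_PER_DOLLAR).quantize(Decimal("0.01"))


print(microcents_to_dollars(250_000_000))    # 2.50
print(microcents_to_dollars(1_234_567_890))  # 12.35
```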
Per-request pricing variations
A few request- or response-time signals change the rate ADP applies to a single call. You don’t configure these; the spending pipeline picks them up from the upstream response or request and bills accordingly.
- Anthropic fast mode: Anthropic exposes a fast-mode option on some models (for example, Opus 4.6 fast) that carries a per-token premium over the default rate. ADP reads the speed field on each Anthropic response and bills fast-mode calls at the model’s fast-mode rate. Requests without a speed field fall back to the default rate.
- Context-tier pricing: A few models charge a different rate once a request crosses a context-length threshold. Gemini Pro, for example, prices requests above a 128K-token context at a higher tier than shorter requests. ADP uses the call’s context-token count so requests at or above the threshold bill at the tiered rate automatically.
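The two signals above amount to a rate lookup per call. The sketch below is an illustration only: the `ModelRates` class, the rate numbers, and the `"fast"` value for the speed field are all assumptions, as is the choice to let fast mode take precedence when both signals apply. ADP's real rates come from its pricing catalog.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ModelRates:
    """Hypothetical per-token input rates in microcents (placeholder numbers)."""
    default: int
    fast: Optional[int] = None            # premium rate for fast-mode responses
    tier_threshold: Optional[int] = None  # context-token count where the higher tier starts
    tiered: Optional[int] = None          # rate at or above the threshold


def input_rate(rates: ModelRates, context_tokens: int, speed: Optional[str] = None) -> int:
    """Pick the rate for one call from its speed field and context-token count."""
    if speed == "fast" and rates.fast is not None:
        return rates.fast
    if (rates.tier_threshold is not None and rates.tiered is not None
            and context_tokens >= rates.tier_threshold):
        return rates.tiered
    return rates.default  # no speed field and below the threshold


fast_capable = ModelRates(default=1_500, fast=9_000)
tiered_model = ModelRates(default=125, tier_threshold=128_000, tiered=250)

print(input_rate(fast_capable, 2_000, speed="fast"))  # fast-mode rate
print(input_rate(tiered_model, 200_000))              # tiered rate
```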
Where to view your spend
You don’t view spend on the Budgets page. The Cost & Usage page, transcripts, and breakdown queries are the read surfaces:
| Surface | Use it for |
|---|---|
| Cost & Usage page (Governance sidebar group) | Time-series spend, request, and token charts across providers and models. Use it to group by provider, model, or token type, then filter by provider, model, cost type, token type, user, or agent. See View cost and usage. |
| Transcripts | Per-call cost on individual executions. Useful when investigating a specific agent run or debugging a cost anomaly. See Read a transcript. |
| Breakdown queries | Aggregated spend by provider, model, user, agent, or provider type, available through |
Every breakdown and time-series query reads from the same SpendingFilter shape: a time range plus optional provider_name, model_id, user_email, agent_name, agent_uid, or organization_id filters. Combine filters to scope a query (for example, "all spend on Anthropic for user alice in April"). You can break results down by provider, model, user, agent, or provider type; organization_id is a filter only, not a breakdown dimension.
For more expressive queries, SpendingFilter also accepts an AIP-160 filter expression that lets you combine and negate dimensions in a single string (for example, provider_name="anthropic" AND model_id!="claude-sonnet-4-6"). The convenience fields and the filter expression compose; populate one or both.
user_email and organization_id are populated automatically from the request’s authenticated identity (the caller’s email and organization), so spend is attributed without any setup on your part.
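As a sketch of how the convenience fields and the filter expression compose, the helper below builds a SpendingFilter as a JSON-ready dictionary. The function names `spending_filter` and `rfc3339` are hypothetical; the field names are the ones listed above. The example email address is a placeholder, and treating `end_time` as the first instant after April is an assumption about how the window bounds are interpreted.

```python
from datetime import datetime, timezone
from typing import Optional

CONVENIENCE_FIELDS = {
    "provider_name", "model_id", "user_email",
    "agent_name", "agent_uid", "organization_id",
}


def rfc3339(ts: datetime) -> str:
    """Format a timezone-aware datetime as an RFC 3339 UTC timestamp."""
    return ts.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def spending_filter(start: datetime, end: datetime,
                    expression: Optional[str] = None, **fields: str) -> dict:
    """Build a filter from a time range, optional convenience fields, and an optional expression."""
    unknown = set(fields) - CONVENIENCE_FIELDS
    if unknown:
        raise ValueError(f"unsupported filter fields: {sorted(unknown)}")
    body = {"start_time": rfc3339(start), "end_time": rfc3339(end), **fields}
    if expression:
        body["filter"] = expression  # AIP-160 expression string
    return body


# "All spend on Anthropic for user alice in April"
april = spending_filter(
    datetime(2026, 4, 1, tzinfo=timezone.utc),
    datetime(2026, 5, 1, tzinfo=timezone.utc),
    provider_name="anthropic",
    user_email="alice@example.com",
)
print(april)
```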
Query spend programmatically
SpendingService.GetSpendingBreakdown is the canonical RPC for pulling spend out of ADP. Use it for chargeback reporting, scheduled emails, internal cost dashboards, or any workflow the built-in UI doesn’t cover.
Authenticate
SpendingService uses the same OIDC client-credentials grant as the rest of AI Gateway. Mint a service-account access token using the flow in Authenticate with OIDC client credentials, then pass the token in the Authorization: Bearer <token> header on every call. The service account needs dataplane_adp_spending_get on the resource you’re querying. See Spending permissions.
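For orientation, a standard OAuth 2.0 client-credentials token request can be sketched as follows. The token URL, client ID, and secret are placeholders, and sending the credentials in the form body is an assumption; the linked authentication guide is the authority on the actual endpoint and on any extra parameters (such as an audience) your identity provider requires.

```python
import urllib.parse
import urllib.request


def build_token_request(token_url: str, client_id: str,
                        client_secret: str) -> urllib.request.Request:
    """Build (but do not send) a client_credentials token request."""
    form = urllib.parse.urlencode({
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
    }).encode("ascii")
    return urllib.request.Request(
        token_url,
        data=form,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


def bearer_headers(access_token: str) -> dict:
    """Headers to attach to every SpendingService call."""
    return {
        "Authorization": f"Bearer {access_token}",
        "Content-Type": "application/json",
    }


# Placeholder values; replace with your identity provider's token endpoint.
req = build_token_request("https://idp.example.invalid/oauth/token", "my-client-id", "my-secret")
print(req.get_method(), req.full_url)
```

Sending the request with `urllib.request.urlopen(req)` and reading `access_token` from the JSON response would complete the flow under these assumptions.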
Request shape
GetSpendingBreakdown takes a SpendingFilter plus a dimension. The filter accepts:
| Field | Meaning |
|---|---|
| start_time, end_time | RFC 3339 timestamps bracketing the window. Required. |
| provider_name | Restrict to one LLM provider (matches the |
| model_id | Restrict to one model identifier ( |
| user_email | Restrict to one identified user, matched on the caller’s email. Anonymous traffic is excluded. |
| agent_name | Restrict to one agent by its resource name ( |
| agent_uid | Restrict to one agent instance, identified by an opaque UUID. Only valid when |
| organization_id | Restrict to one organization. Multi-tenant deployments only. |
| filter | AIP-160 expression that combines and negates dimensions in a single string (for example, |
The dimension value chooses the breakdown dimension. Valid values are the BreakdownDimension enum: BREAKDOWN_DIMENSION_PROVIDER, BREAKDOWN_DIMENSION_MODEL, BREAKDOWN_DIMENSION_USER, BREAKDOWN_DIMENSION_AGENT, and BREAKDOWN_DIMENSION_PROVIDER_TYPE. A breakdown on BREAKDOWN_DIMENSION_AGENT keys on the agent’s resource name (agents/<slug>) and excludes rows with no agent (direct user calls), the same way the other dimensions skip empty keys. Spend is summed across every instance that has used the name, so an agent that was deleted and recreated appears as a single entry.
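The agent-breakdown semantics can be mimicked locally. The sketch below is not how ADP computes breakdowns; it only shows the grouping rule on made-up events, where the tuple layout and the sample values are hypothetical.

```python
from collections import defaultdict

# Hypothetical spending events: (agent_name, agent_uid, cost_microcents).
events = [
    ("agents/support", "uid-1", 40_000_000),
    ("agents/support", "uid-2", 60_000_000),  # same name after delete and recreate
    ("agents/research", "uid-3", 25_000_000),
    ("", "", 10_000_000),                     # direct user call, no agent
]


def breakdown_by_agent(events):
    """Sum cost per agent resource name, skipping rows with no agent."""
    totals = defaultdict(int)
    for agent_name, _agent_uid, cost in events:
        if not agent_name:
            continue  # empty keys are excluded from the breakdown
        totals[agent_name] += cost
    return dict(totals)


print(breakdown_by_agent(events))
```

Both instances of agents/support collapse into one entry because the grouping key is the resource name, not the instance UUID.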
cURL example
Pull per-user spend for the last 7 days against an Anthropic provider:
```bash
ACCESS_TOKEN="<oidc-access-token>"  # from the client_credentials flow
DATAPLANE_BASE="https://aigw.<cluster-id>.clusters.rdpa.co"

curl -s --request POST \
  --url "${DATAPLANE_BASE}/redpanda.api.adp.v1alpha1.SpendingService/GetSpendingBreakdown" \
  --header "Authorization: Bearer ${ACCESS_TOKEN}" \
  --header 'Content-Type: application/json' \
  --data '{
    "filter": {
      "start_time": "2026-05-17T00:00:00Z",
      "end_time": "2026-05-24T00:00:00Z",
      "provider_name": "prod-anthropic"
    },
    "dimension": "BREAKDOWN_DIMENSION_USER"
  }' | jq
```
The response carries one entries row per user in the window. Each entry has a key (the user) and a stats object with total_cost_microcents, total_requests, total_tokens (server-derived), and per-bucket input, output, and cached usage. Divide total_cost_microcents by 100,000,000 to convert to dollars.
Python example
Generated client code lives in the proto bundle; if your project doesn’t already import it from cloudv2, drive SpendingService over plain HTTPS:
```python
import os
from datetime import datetime, timedelta, timezone

import requests

token = os.environ["ACCESS_TOKEN"]   # from the client_credentials flow
base = os.environ["DATAPLANE_BASE"]  # https://aigw.<cluster-id>.clusters.rdpa.co

# Query window: the last 7 days, as RFC 3339 UTC timestamps.
end = datetime.now(timezone.utc)
start = end - timedelta(days=7)

body = {
    "filter": {
        "start_time": start.isoformat().replace("+00:00", "Z"),
        "end_time": end.isoformat().replace("+00:00", "Z"),
        # AIP-160 expression, used here instead of the provider_name convenience field.
        "filter": 'provider_name="prod-anthropic"',
    },
    "dimension": "BREAKDOWN_DIMENSION_USER",
}

r = requests.post(
    f"{base}/redpanda.api.adp.v1alpha1.SpendingService/GetSpendingBreakdown",
    headers={"Authorization": f"Bearer {token}", "Content-Type": "application/json"},
    json=body,
    timeout=30,
)
r.raise_for_status()

# One entry per user; convert microcents to dollars for display.
for entry in r.json().get("entries", []):
    stats = entry["stats"]
    dollars = int(stats["total_cost_microcents"]) / 100_000_000
    print(f"{entry['key']}: ${dollars:,.2f} ({stats['total_requests']} requests)")
```
The proto-generated client (Connect-Go or grpc-python) is the long-term recommendation; the cURL and requests examples are for quick scripting.
Related methods
SpendingService exposes additional methods that follow the same SpendingFilter shape:
- GetSpendingSummary: Total spend, tokens, and requests for the range, with no breakdown. Also returns the previous comparable period so you can show a trend.
- GetSpendingTimeSeries: Spend bucketed over the time range (hourly or daily), for chart-style consumers.
- GetSpendingTimeSeriesByDimension: Time-series buckets split by a breakdown dimension (top-N keys by cost), for stacked charts. Reports truncated_key_count when more keys matched than were returned.
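One way to script against these methods is a small request builder shared by all of them. This is a sketch under two assumptions that this page does not confirm: that the other methods use the same URL pattern as GetSpendingBreakdown, and that they accept the filter under a `filter` key. The helper name `spending_request` and the example base URL are hypothetical, and any bucket-size or dimension parameters are left out because their field names are not given here.

```python
import json
import urllib.request

SERVICE = "redpanda.api.adp.v1alpha1.SpendingService"


def spending_request(base: str, method: str, body: dict,
                     token: str) -> urllib.request.Request:
    """Build (but do not send) a POST request for one SpendingService method."""
    return urllib.request.Request(
        f"{base.rstrip('/')}/{SERVICE}/{method}",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


summary_req = spending_request(
    "https://aigw.example.invalid",  # placeholder base URL
    "GetSpendingSummary",
    {"filter": {"start_time": "2026-05-17T00:00:00Z", "end_time": "2026-05-24T00:00:00Z"}},
    token="<oidc-access-token>",
)
print(summary_req.full_url)
```

Passing the result to `urllib.request.urlopen` would send it; the response fields for each method are best taken from the proto definitions rather than from this sketch.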
Guardrail cost
AWS bills guardrail evaluation directly to the AWS account whose credentials the guardrail’s backend uses. This cost does not appear in ADP cost reporting and is not counted against budgets. For current rates, see AWS Bedrock pricing.
For what each policy does, see How guardrails work and Guardrail policy reference.
Override per-model pricing
The Agentic Data Plane ships with default per-model pricing per provider, covering input, output, and cache-read prices for every model in the built-in catalog. Cost reporting uses these prices when it computes per-call spend, which is why every dollar value on the Cost & Usage page, in transcripts, and in SpendingService queries works without any setup.
If your organization negotiates non-standard pricing, or you want to track spend against an internal chargeback rate, override the rates as part of configuring an LLM provider. Overrides are scoped to a single provider, where you edit the rate per model. See Override per-model pricing.