Agentic Data Plane

Preview

How Guardrails Work

A guardrail is a set of safety and policy filters that inspect and control how agents and gateways use models. A guardrail can block prompt injection, deny off-topic content, redact or block personally identifiable information (PII), and check responses for factual grounding. You attach a guardrail to an AWS Bedrock LLM provider, and Agentic Data Plane applies it to every request and response that flows through that provider.

Agentic Data Plane guardrails run on AWS Bedrock Guardrails. Each guardrail syncs its policy configuration to the Bedrock control plane, so it needs an AWS region and Bedrock credentials. Guardrails attach to AWS Bedrock providers only; other provider types don’t expose a guardrail setting.

After reading this page, you will be able to:

Describe what a guardrail does and how you attach one to an AWS Bedrock LLM provider
Identify the available policy types and the situations each fits
Recognize how a blocked request surfaces and which page to read next

Policy types

A guardrail bundles a set of policies. Each policy is optional, but a guardrail must enforce at least one. Turn on only the ones you need.

Policy	What it does
Content filters	Classify prompts and responses against harmful-content categories: hate, insults, sexual, violence, misconduct, and prompt attacks. Prompt-attack detection applies to input only.
Word filters	Block or detect exact words and phrases, using your own lists and platform-managed lists such as profanity.
Denied topics	Block content by meaning rather than exact words, so it catches paraphrases and misspellings.
Sensitive information	Match PII using built-in entity types and your own regex patterns, then apply the configured action to each match: detect, block, or anonymize.
Contextual grounding	For RAG-style applications, check model output for factual grounding against a source and relevance to the query. Output only.
Automated reasoning	Mathematically verify model output against formal Bedrock Automated Reasoning policies. Detect only: it never blocks, and findings appear in the trace.

For each policy’s full configuration, see Guardrail policy reference.
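The rule that a guardrail must enforce at least one policy can be sketched as a small validation step. This is a minimal illustration only: the `GuardrailDraft` type, the `validate` function, and the policy key names are hypothetical labels for the rows of the table above, not the product's actual configuration fields.

```python
from dataclasses import dataclass, field

# Hypothetical policy keys, one per row of the policy table.
# These names are illustrative, not real configuration fields.
POLICY_TYPES = frozenset({
    "content_filters",
    "word_filters",
    "denied_topics",
    "sensitive_information",
    "contextual_grounding",
    "automated_reasoning",
})


@dataclass
class GuardrailDraft:
    """Hypothetical in-memory draft of a guardrail before it is saved."""
    name: str
    enabled_policies: set = field(default_factory=set)


def validate(draft: GuardrailDraft) -> list:
    """Return a list of problems; an empty list means the draft is acceptable."""
    problems = []
    unknown = draft.enabled_policies - POLICY_TYPES
    if unknown:
        problems.append(f"unknown policy types: {sorted(unknown)}")
    # Every policy is optional, but at least one known policy must be enabled.
    if not draft.enabled_policies & POLICY_TYPES:
        problems.append("a guardrail must enforce at least one policy")
    return problems
```

A draft with only `sensitive_information` enabled passes this check, while a draft with no policies enabled is rejected.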

Where a guardrail runs

A guardrail evaluates both sides of an LLM call:

Input: Agentic Data Plane evaluates the prompt before forwarding it upstream. Use input evaluation to stop sensitive or malicious content from reaching the model.
Output: Agentic Data Plane evaluates the model’s response before returning it to the caller. Use output evaluation to control what the model generates.

Some policies are direction-specific. Content-filter prompt-attack detection runs on input only, contextual grounding runs on output only, and automated reasoning runs on output and only reports findings.

For streaming responses, Agentic Data Plane skips output evaluation in this release. Input evaluation still applies.
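The direction rules and the streaming exception can be summarized in a short sketch. The policy labels and the `evaluated_policies` function are hypothetical. The sketch also assumes that word filters, denied topics, sensitive information, and the content-filter categories other than prompt attacks run in both directions, because this page does not mark them as direction-specific.

```python
INPUT, OUTPUT = "input", "output"

# Directions in which each policy can run. Labels are hypothetical.
POLICY_DIRECTIONS = {
    "content_filters": {INPUT, OUTPUT},        # assumption: both directions
    "prompt_attack": {INPUT},                  # content-filter prompt-attack detection
    "word_filters": {INPUT, OUTPUT},           # assumption: both directions
    "denied_topics": {INPUT, OUTPUT},          # assumption: both directions
    "sensitive_information": {INPUT, OUTPUT},
    "contextual_grounding": {OUTPUT},
    "automated_reasoning": {OUTPUT},           # reports findings, never blocks
}


def evaluated_policies(enabled, direction, streaming=False):
    """Return the enabled policies that would be evaluated for this direction."""
    if direction not in (INPUT, OUTPUT):
        raise ValueError(f"unknown direction: {direction!r}")
    # Output evaluation is skipped for streaming responses in this release.
    if direction == OUTPUT and streaming:
        return []
    return sorted(p for p in enabled if direction in POLICY_DIRECTIONS[p])
```

With `prompt_attack` and `contextual_grounding` both enabled, only the first is evaluated on input and only the second on non-streaming output.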

Where you attach a guardrail

You attach a guardrail by setting an AWS Bedrock LLM provider’s guardrail. The guardrail setting appears in the provider’s Bedrock settings, and only Bedrock providers can reference a guardrail. Each provider references at most one guardrail, and a single guardrail can be reused across many providers. The guardrail’s detail page lists the providers that use it. You create and configure the guardrail first, then attach it from the provider. See Create a guardrail.

Agentic Data Plane enforces this reference in both directions. A provider can only point at a guardrail that exists, and you cannot delete a guardrail while a provider still references it. To remove a guardrail that is in use, detach it from each provider first.
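The reference rules above amount to a referential-integrity check. The following in-memory model illustrates them; `GuardrailRegistry` and its methods are hypothetical and do not correspond to any Agentic Data Plane API.

```python
class GuardrailRegistry:
    """Hypothetical model of the provider-to-guardrail reference rules."""

    def __init__(self):
        self.guardrails = set()
        self.provider_guardrail = {}  # provider name -> guardrail name

    def create_guardrail(self, guardrail):
        self.guardrails.add(guardrail)

    def attach(self, provider, guardrail):
        # A provider can only point at a guardrail that exists.
        if guardrail not in self.guardrails:
            raise ValueError(f"guardrail {guardrail!r} does not exist")
        # Each provider references at most one guardrail, so attaching
        # replaces any earlier reference.
        self.provider_guardrail[provider] = guardrail

    def detach(self, provider):
        self.provider_guardrail.pop(provider, None)

    def providers_using(self, guardrail):
        return sorted(p for p, g in self.provider_guardrail.items() if g == guardrail)

    def delete_guardrail(self, guardrail):
        # A guardrail cannot be deleted while a provider still references it.
        users = self.providers_using(guardrail)
        if users:
            raise ValueError(f"guardrail {guardrail!r} is in use by {users}")
        self.guardrails.discard(guardrail)
```

In this model, deleting a guardrail shared by two providers fails until both providers are detached, which mirrors the detach-first workflow described above.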

What happens when a guardrail blocks

When a policy blocks an input, Agentic Data Plane stops the request and returns your configured blocked input message to the caller. When a policy blocks an output, Agentic Data Plane returns your configured blocked output message instead of the model’s response.

The sensitive-information policy can also anonymize matched PII in place instead of blocking, replacing each match with its entity type, such as {EMAIL}. Anonymization behaves differently in each direction. On output, Agentic Data Plane delivers the redacted response to the caller. On input, it does not forward the redacted prompt to the model; it short-circuits the request as if the policy had blocked it and returns your configured blocked input message.
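The block and anonymize outcomes can be traced with a small sketch. Everything here is illustrative: `caller_sees` and `anonymize_emails` are hypothetical helpers, and the email regex is a simplified stand-in for the built-in entity detection, which this page does not describe.

```python
import re

# Simplified email pattern for illustration only.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def anonymize_emails(text):
    """Replace each email match with its entity type placeholder."""
    return EMAIL_RE.sub("{EMAIL}", text)


def caller_sees(direction, action, text, blocked_input_message, blocked_output_message):
    """Return what the caller receives for a policy action.

    action is "none", "block", or "anonymize". Returns None when an input
    passes, meaning the prompt is forwarded to the model instead.
    """
    if action not in ("none", "block", "anonymize"):
        raise ValueError(f"unknown action: {action!r}")
    if direction == "input":
        if action == "none":
            return None
        # On input, anonymize short-circuits the request just like a block.
        return blocked_input_message
    if direction == "output":
        if action == "block":
            return blocked_output_message
        if action == "anonymize":
            return anonymize_emails(text)
        return text
    raise ValueError(f"unknown direction: {direction!r}")
```

Note the asymmetry: the same anonymize action yields a redacted response on output but the blocked input message on input.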

Agentic Data Plane records guardrail activity as attributes on the request’s OpenTelemetry trace, which you read in transcripts. This release does not include a separate violations dashboard. See Review blocked requests for how a blocked request surfaces.
