Agentic Data Plane

How Guardrails Work

A guardrail is a set of safety and policy filters that inspect and control how agents and gateways use models. A guardrail can block prompt injection, deny off-topic content, redact or block personally identifiable information (PII), and check responses for factual grounding. You attach a guardrail to an AWS Bedrock LLM provider, and ADP applies it to every request and response that flows through that provider.

ADP guardrails run on AWS Bedrock Guardrails. Each guardrail syncs its policy configuration to the Bedrock control plane, so it needs an AWS region and Bedrock credentials. Guardrails attach to AWS Bedrock providers only; other provider types don’t expose a guardrail setting.

After reading this page, you will be able to:

  • Describe what a guardrail does and how you attach one to an AWS Bedrock LLM provider

  • Identify the available policy types and the situations each fits

  • Recognize how a blocked request surfaces and which page to read next

Policy types

A guardrail bundles a set of policies. Each is optional, but a guardrail must enforce at least one. Turn on the ones you need.

  • Content filters: classify prompts and responses against harmful-content categories (hate, insults, sexual, violence, misconduct, and prompt attacks). Prompt-attack detection applies to input only.

  • Word filters: block or detect exact words and phrases, using your own lists and platform-managed lists such as profanity.

  • Denied topics: block content by meaning rather than by exact words, so the policy catches paraphrases and misspellings.

  • Sensitive information: identify PII using built-in entity types and your own regex patterns, then choose to detect, block, or anonymize it.

  • Contextual grounding: for RAG-style applications, check model output for factual grounding against a source and for relevance to the query. Output only.

  • Automated reasoning: mathematically verify model output against formal Bedrock Automated Reasoning policies. Detect only: it never blocks, and findings appear in the trace.

For each policy’s full configuration, see Guardrail policy reference.
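The rule that every policy is optional but at least one must be enabled can be pictured as a simple validation check. This is a minimal sketch for illustration only: `GuardrailPolicies`, its field names, and `validate` are hypothetical and are not ADP's or Bedrock's configuration schema.

```python
from dataclasses import dataclass, fields


@dataclass
class GuardrailPolicies:
    """Hypothetical model: one on/off flag per policy type, all optional."""
    content_filters: bool = False
    word_filters: bool = False
    denied_topics: bool = False
    sensitive_information: bool = False
    contextual_grounding: bool = False
    automated_reasoning: bool = False

    def enabled(self) -> list[str]:
        # Names of the policies that are turned on, in declaration order.
        return [f.name for f in fields(self) if getattr(self, f.name)]

    def validate(self) -> None:
        # A guardrail must enforce at least one policy.
        if not self.enabled():
            raise ValueError("a guardrail must enable at least one policy")


# Usage: a PII-only guardrail is valid; an empty one is rejected.
GuardrailPolicies(sensitive_information=True).validate()
try:
    GuardrailPolicies().validate()
except ValueError as err:
    print(err)  # a guardrail must enable at least one policy
```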

Where a guardrail runs

A guardrail evaluates both sides of an LLM call:

  • Input: ADP evaluates the prompt before forwarding it upstream. Use input evaluation to stop sensitive or malicious content from reaching the model.

  • Output: ADP evaluates the model’s response before returning it to the caller. Use output evaluation to control what the model generates.

Some policies are direction-specific. Content-filter prompt-attack detection runs on input only, contextual grounding runs on output only, and automated reasoning runs on output and only reports findings.

For streaming responses, ADP skips output evaluation in this release. Input evaluation still applies.
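The direction rules above can be summarized as a lookup from each check to the sides of the call it runs on. The sketch below is a hypothetical model, not ADP code: the check names, `CHECK_DIRECTIONS`, and `checks_for` are illustrative, and it assumes, as this page implies, that policies not called out as direction-specific run on both input and output.

```python
from enum import Enum


class Direction(Enum):
    INPUT = "input"
    OUTPUT = "output"


# Hypothetical check names mapped to the directions they run on, per this page.
CHECK_DIRECTIONS = {
    "content_filters": {Direction.INPUT, Direction.OUTPUT},  # harmful-content categories
    "prompt_attack": {Direction.INPUT},                      # content-filter category, input only
    "word_filters": {Direction.INPUT, Direction.OUTPUT},
    "denied_topics": {Direction.INPUT, Direction.OUTPUT},
    "sensitive_information": {Direction.INPUT, Direction.OUTPUT},
    "contextual_grounding": {Direction.OUTPUT},              # output only
    "automated_reasoning": {Direction.OUTPUT},               # output only, reports findings only
}


def checks_for(enabled: set[str], direction: Direction, streaming: bool) -> set[str]:
    """Return the enabled checks that apply to one side of an LLM call."""
    # In this release, output evaluation is skipped for streaming responses.
    if streaming and direction is Direction.OUTPUT:
        return set()
    return {c for c in enabled if direction in CHECK_DIRECTIONS[c]}


enabled = {"prompt_attack", "denied_topics", "contextual_grounding"}
print(sorted(checks_for(enabled, Direction.INPUT, streaming=False)))
# ['denied_topics', 'prompt_attack']
print(sorted(checks_for(enabled, Direction.OUTPUT, streaming=True)))
# []
```

Input evaluation is unaffected by the `streaming` flag, matching the note that input evaluation still applies to streaming requests.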

Where you attach a guardrail

To attach a guardrail, set the guardrail setting in an AWS Bedrock LLM provider's Bedrock settings; only Bedrock providers can reference a guardrail. Each provider references at most one guardrail, but you can reuse a single guardrail across many providers, and the guardrail's detail page lists the providers that use it. Create and configure the guardrail first, then attach it from the provider. See Create a guardrail.

ADP enforces this reference in both directions. A provider can only point at a guardrail that exists, and you cannot delete a guardrail while a provider still references it. To remove a guardrail that is in use, detach it from each provider first.
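The reference rules in both directions can be modeled with a small in-memory registry. This is a hypothetical sketch of the semantics described above, not ADP's API: `GuardrailRegistry` and its method names are illustrative, and the provider and guardrail names in the usage lines are made up.

```python
class GuardrailRegistry:
    """Hypothetical in-memory model of provider-to-guardrail references."""

    def __init__(self) -> None:
        self.guardrails: set[str] = set()
        # provider -> guardrail; a dict value enforces "at most one" per provider
        self.provider_guardrail: dict[str, str] = {}

    def create_guardrail(self, name: str) -> None:
        self.guardrails.add(name)

    def attach(self, provider: str, guardrail: str) -> None:
        # A provider can only point at a guardrail that exists.
        if guardrail not in self.guardrails:
            raise KeyError(f"guardrail {guardrail!r} does not exist")
        self.provider_guardrail[provider] = guardrail

    def detach(self, provider: str) -> None:
        self.provider_guardrail.pop(provider, None)

    def providers_using(self, guardrail: str) -> list[str]:
        # Comparable to the provider list on the guardrail's detail page.
        return sorted(p for p, g in self.provider_guardrail.items() if g == guardrail)

    def delete_guardrail(self, guardrail: str) -> None:
        # Deletion is refused while any provider still references the guardrail.
        users = self.providers_using(guardrail)
        if users:
            raise RuntimeError(f"guardrail {guardrail!r} is in use by {users}")
        self.guardrails.discard(guardrail)


reg = GuardrailRegistry()
reg.create_guardrail("pii-guard")
reg.attach("bedrock-prod", "pii-guard")
reg.attach("bedrock-staging", "pii-guard")   # one guardrail, many providers
print(reg.providers_using("pii-guard"))      # ['bedrock-prod', 'bedrock-staging']
for p in reg.providers_using("pii-guard"):   # detach from every provider first
    reg.detach(p)
reg.delete_guardrail("pii-guard")            # now succeeds
```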

What happens when a guardrail blocks

When a policy blocks an input, ADP stops the request and returns your configured blocked input message to the caller. When a policy blocks an output, ADP returns your configured blocked output message instead of the model’s response.

Instead of blocking, the sensitive-information policy can anonymize matched PII in place, replacing each match with its entity type, such as {EMAIL}. Anonymization behaves differently in each direction. On output, ADP delivers the redacted response to the caller. On input, ADP does not forward the redacted prompt to the model; it short-circuits the request as it would for a block and returns your configured blocked input message.
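The block and anonymize outcomes for each direction can be sketched as two small handlers. This is an illustrative model only: `handle_input`, `handle_output`, and the action strings `"none"`, `"block"`, and `"anonymize"` are hypothetical, and the email regex is a naive stand-in for Bedrock's PII detection, which ADP delegates to Bedrock rather than implementing itself.

```python
import re

# Naive stand-in for Bedrock PII detection: matches simple email addresses only.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def anonymize(text: str) -> str:
    # Replace each match with its entity type, e.g. {EMAIL}.
    return EMAIL.sub("{EMAIL}", text)


def handle_input(prompt: str, action: str, blocked_input_message: str) -> tuple[bool, str]:
    """Return (forward_to_model, text) for the prompt side."""
    if action in ("block", "anonymize"):
        # Input anonymization short-circuits like a block: the redacted prompt
        # is not forwarded, and the caller gets the blocked input message.
        return False, blocked_input_message
    return True, prompt


def handle_output(response: str, action: str, blocked_output_message: str) -> str:
    """Return the text delivered to the caller for the response side."""
    if action == "block":
        return blocked_output_message
    if action == "anonymize":
        return anonymize(response)  # the redacted response reaches the caller
    return response


print(handle_input("Email me at ana@example.com", "anonymize", "Request blocked."))
# (False, 'Request blocked.')
print(handle_output("Contact ana@example.com", "anonymize", "Response blocked."))
# Contact {EMAIL}
```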

ADP records guardrail activity as attributes on the request’s OpenTelemetry trace, which you read in transcripts. This release does not include a separate violations dashboard. See Review blocked requests for how a blocked request surfaces.