Agentic Data Plane

Guardrail Policy Reference

A guardrail bundles a set of policies, each backed by AWS Bedrock Guardrails. Each policy is optional, but a guardrail must enforce at least one. This page documents each policy type’s configuration fields, available actions, direction settings, and regional availability.

Common settings

Two settings recur across policies:

  • Action: what the policy does when it matches. None detects and records the match in the trace without intervening. Block stops the request and returns the configured blocked message. The sensitive-information policy adds Anonymize.

  • Direction: most policies evaluate input, output, or both, and you set the action per direction. Some policies are fixed to one direction (noted below).
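One way to picture these two settings together is as a small per-direction record. The sketch below is illustrative only: the names `Action`, `Direction`, `DirectionalAction`, and `intervenes` are hypothetical and are not part of any documented API.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    NONE = "none"            # detect and record the match in the trace only
    BLOCK = "block"          # stop the request and return the blocked message
    ANONYMIZE = "anonymize"  # offered by the sensitive-information policy only


class Direction(Enum):
    INPUT = "input"
    OUTPUT = "output"


@dataclass(frozen=True)
class DirectionalAction:
    """Hypothetical container holding one action per direction."""
    on_input: Action = Action.NONE
    on_output: Action = Action.NONE

    def for_direction(self, direction: Direction) -> Action:
        return self.on_input if direction is Direction.INPUT else self.on_output


def intervenes(action: Action) -> bool:
    """True when a match changes what the caller receives."""
    return action is not Action.NONE


# Block matching prompts, but only record matches in responses.
setting = DirectionalAction(on_input=Action.BLOCK, on_output=Action.NONE)
```

Keeping the action separate for each direction mirrors the text above: the same policy can block on input while only detecting on output.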

Feature availability varies by AWS region. Choose a region that supports the policies you need, and see the AWS Bedrock Guardrails documentation for exhaustive behavior and regional support.

Content filters

Classify prompts and responses against harmful-content categories and block or detect them per category.

  • Categories: Hate, Insults, Sexual, Violence, Misconduct, and Prompt attack. Configure each category independently. Prompt-attack detection evaluates input only.

  • Strength: per category and direction. Sets the confidence cutoff for a match: None scores the category in the trace without acting, Low matches only high-confidence content, Medium matches medium-confidence and above, and High matches any non-negligible content. Higher strength is stricter.

  • Action: per category and direction, None (detect) or Block.

  • Modality: Text or Image.
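The strength levels can be read as a cutoff on the classifier's confidence. The sketch below assumes a four-step confidence scale (NONE, LOW, MEDIUM, HIGH); that scale and the names `Confidence`, `CUTOFF`, and `is_match` are assumptions made for illustration, not documented identifiers.

```python
from enum import IntEnum


class Confidence(IntEnum):
    """Assumed confidence scale reported for a category."""
    NONE = 0
    LOW = 1
    MEDIUM = 2
    HIGH = 3


# Minimum confidence that each strength treats as a match.
# Strength "NONE" has no entry: the category is scored but never matched.
CUTOFF = {
    "LOW": Confidence.HIGH,       # only high-confidence content
    "MEDIUM": Confidence.MEDIUM,  # medium confidence and above
    "HIGH": Confidence.LOW,       # any non-negligible content
}


def is_match(strength: str, confidence: Confidence) -> bool:
    """Return True when the configured strength treats this confidence as a match."""
    key = strength.upper()
    if key == "NONE":
        return False
    return confidence >= CUTOFF[key]  # KeyError on an unknown strength name
```

Under these assumptions, content scored at MEDIUM confidence matches at Medium and High strength but not at Low, which is why a higher strength is stricter.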

Word filters

Block or detect exact words and phrases.

  • Custom words: your own list of words and phrases to match.

  • Managed lists: platform-managed lists. Profanity is available today.

  • Action: per direction (input and output), None (detect) or Block. Set independently for custom words and for each managed list.

Denied topics

Block content by meaning rather than exact words, so the policy catches paraphrases and misspellings. A policy holds up to 30 topics.

  • Name: topic name, 1 to 100 characters.

  • Definition: what the topic covers, up to 1000 characters. The definition drives the semantic match, so write it as a clear, self-contained statement. Keep example phrases and negations out of the definition; put concrete examples in the Examples field instead, which improves accuracy.

  • Examples: up to five example phrases, each up to 100 characters, that match the topic.

  • Action: per direction (input and output), None (detect) or Block.
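The limits above are easy to check before submitting a configuration. The following is a minimal pre-flight sketch; `Topic` and `validate_topics` are hypothetical names, and only the numeric limits come from this page.

```python
from dataclasses import dataclass, field

MAX_TOPICS = 30
MAX_NAME_LEN = 100
MAX_DEFINITION_LEN = 1000
MAX_EXAMPLES = 5
MAX_EXAMPLE_LEN = 100


@dataclass
class Topic:
    name: str
    definition: str
    examples: list[str] = field(default_factory=list)


def validate_topics(topics: list[Topic]) -> list[str]:
    """Return human-readable limit violations; an empty list means none were found."""
    errors = []
    if len(topics) > MAX_TOPICS:
        errors.append(f"{len(topics)} topics exceeds the limit of {MAX_TOPICS}")
    for topic in topics:
        if not 1 <= len(topic.name) <= MAX_NAME_LEN:
            errors.append(f"topic name {topic.name!r} must be 1 to {MAX_NAME_LEN} characters")
        if len(topic.definition) > MAX_DEFINITION_LEN:
            errors.append(f"definition of {topic.name!r} exceeds {MAX_DEFINITION_LEN} characters")
        if len(topic.examples) > MAX_EXAMPLES:
            errors.append(f"{topic.name!r} has more than {MAX_EXAMPLES} examples")
        for example in topic.examples:
            if len(example) > MAX_EXAMPLE_LEN:
                errors.append(f"an example for {topic.name!r} exceeds {MAX_EXAMPLE_LEN} characters")
    return errors


topic = Topic(
    name="Medical advice",
    definition="Requests for a diagnosis or a treatment recommendation.",
    examples=["What dose of this medication should I take?"],
)
print(validate_topics([topic]))
```

Note that the definition stays a plain statement while the concrete phrasing lives in `examples`, following the guidance for the Definition field.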

Sensitive information

Detect personally identifiable information (PII) by built-in entity type or by your own regular expressions, then detect, block, or anonymize it.

  • Entities: built-in entity types. Each entity has a per-direction action.

  • Regexes: custom patterns. Each rule has a name (1 to 100 characters), an RE2 pattern (1 to 500 characters; lookaround is not supported), an optional description, and a per-direction action.

  • Action: per direction (input and output), None (detect), Block, or Anonymize. Anonymize replaces each match in place with its entity type, such as {EMAIL}, and applies to text only. The two directions differ: on output, ADP delivers the redacted response to the caller. On input, this release does not forward the redacted prompt to the model. Instead, an anonymize match short-circuits the request like a block: the model is never called, and ADP returns your configured blocked input message rather than the redacted prompt. Block replaces the whole payload with the blocked message.

The built-in entity types are: ADDRESS, AGE, NAME, EMAIL, PHONE, USERNAME, PASSWORD, DRIVER_ID, LICENSE_PLATE, VEHICLE_IDENTIFICATION_NUMBER, CREDIT_DEBIT_CARD_CVV, CREDIT_DEBIT_CARD_EXPIRY, CREDIT_DEBIT_CARD_NUMBER, PIN, INTERNATIONAL_BANK_ACCOUNT_NUMBER, SWIFT_CODE, IP_ADDRESS, MAC_ADDRESS, URL, AWS_ACCESS_KEY, AWS_SECRET_KEY, US_BANK_ACCOUNT_NUMBER, US_BANK_ROUTING_NUMBER, US_INDIVIDUAL_TAX_IDENTIFICATION_NUMBER, US_PASSPORT_NUMBER, US_SOCIAL_SECURITY_NUMBER, CA_HEALTH_NUMBER, CA_SOCIAL_INSURANCE_NUMBER, UK_NATIONAL_HEALTH_SERVICE_NUMBER, UK_NATIONAL_INSURANCE_NUMBER, and UK_UNIQUE_TAXPAYER_REFERENCE_NUMBER.
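The difference between Anonymize on input and on output can be sketched as follows. This is a simulation under stated assumptions, not the product's implementation: the email pattern is a rough stand-in for the built-in EMAIL detector, Python's `re` module stands in for RE2, and `BLOCKED_INPUT_MESSAGE`, `RULES`, `anonymize`, and `apply_anonymize` are hypothetical names.

```python
import re

# Placeholder for the blocked input message configured on the guardrail.
BLOCKED_INPUT_MESSAGE = "Your request was blocked."

# Illustrative pattern only; how the built-in EMAIL entity is detected is not documented here.
RULES = {"EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")}


def anonymize(text: str) -> tuple[str, int]:
    """Replace each match in place with its entity type in braces; return text and match count."""
    hits = 0
    for entity, pattern in RULES.items():
        text, count = pattern.subn("{" + entity + "}", text)
        hits += count
    return text, hits


def apply_anonymize(text: str, direction: str) -> str:
    """Simulate the Anonymize action for direction 'input' or 'output'."""
    redacted, hits = anonymize(text)
    if hits == 0:
        return text
    if direction == "input":
        # In this release an input match short-circuits like a block:
        # the model is never called and the blocked message is returned.
        return BLOCKED_INPUT_MESSAGE
    return redacted  # on output, the redacted response reaches the caller


print(apply_anonymize("Contact ann@example.com today", "output"))
# prints: Contact {EMAIL} today
```

In practice this means that choosing Anonymize on input currently gives callers the same experience as Block, so it is worth deciding per entity whether output-only anonymization is enough.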

Contextual grounding

For retrieval-augmented generation (RAG) applications, check model output against a source and the user’s query. This policy evaluates output only and has two independent sub-filters.

  • Grounding: checks that the response is factually grounded in the provided source.

  • Relevance: checks that the response is relevant to the user’s query.

Each sub-filter has its own enable toggle, a threshold between 0.0 and 0.99, and an action. The action fires when the response scores below the threshold, so higher thresholds are stricter. The action is None (detect) or Block.
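The threshold rule can be expressed in a few lines. The sketch below assumes each sub-filter yields a score where higher means better grounded or more relevant; `SubFilter` and `evaluate` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SubFilter:
    enabled: bool
    threshold: float
    action: str = "NONE"  # "NONE" (detect) or "BLOCK"

    def __post_init__(self):
        if not 0.0 <= self.threshold <= 0.99:
            raise ValueError("threshold must be between 0.0 and 0.99")
        if self.action not in ("NONE", "BLOCK"):
            raise ValueError("action must be NONE or BLOCK")


def evaluate(sub_filter: SubFilter, score: float) -> Optional[str]:
    """Return the action that fires, or None when the filter is disabled or the score passes."""
    if not sub_filter.enabled or score >= sub_filter.threshold:
        return None
    return sub_filter.action  # the score fell below the threshold


grounding = SubFilter(enabled=True, threshold=0.8, action="BLOCK")
print(evaluate(grounding, 0.75))  # prints: BLOCK
print(evaluate(grounding, 0.90))  # prints: None
```

Raising the threshold from 0.8 to 0.95 would make the response scored 0.90 fail as well, which is the sense in which higher thresholds are stricter.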

Automated reasoning

Mathematically verify model output against formal Bedrock Automated Reasoning policies. This policy is detect-only: it never blocks, and its findings appear in the trace.

  • Policy ARNs: one or two Bedrock Automated Reasoning policy Amazon Resource Names (ARNs) to attach. Each ARN must point at a specific numeric version (for example, ending in :1 or :2); the DRAFT version is rejected. Create and publish the policies in the AWS Bedrock console first, then reference their versioned ARNs here.

  • Confidence threshold: a value between 0.0 and 1.0. Below this confidence, a finding is reported as non-definitive.
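A client-side check for these two fields might look like the sketch below. The function names and the ARN strings are placeholders for illustration, the check looks only at the final colon-separated segment rather than the full ARN syntax, and the label "definitive" for findings at or above the threshold is an assumption (this page only names the non-definitive case).

```python
def validate_policy_arns(arns: list[str]) -> None:
    """Raise ValueError unless there are one or two ARNs that each end in a numeric version."""
    if not 1 <= len(arns) <= 2:
        raise ValueError("attach one or two policy ARNs")
    for arn in arns:
        version = arn.rsplit(":", 1)[-1]
        if not (version.isascii() and version.isdigit()):
            raise ValueError(f"{arn!r} must end in a numeric version, not {version!r}")


def classify_finding(confidence: float, threshold: float) -> str:
    """Label a finding relative to the configured confidence threshold."""
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be between 0.0 and 1.0")
    # "definitive" is an assumed label for the at-or-above case.
    return "non-definitive" if confidence < threshold else "definitive"


# Placeholder ARNs, shown only to illustrate the version suffix.
published = "arn:aws:bedrock:us-east-1:111122223333:automated-reasoning-policy/example:1"
draft = "arn:aws:bedrock:us-east-1:111122223333:automated-reasoning-policy/example:DRAFT"

validate_policy_arns([published])  # passes silently
print(classify_finding(0.6, threshold=0.8))  # prints: non-definitive
```

Because this policy is detect-only, a non-definitive finding changes what appears in the trace, not whether the response is delivered.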