Guardrails

Module 5 · ~11 min read

Power RAG implements two distinct guardrail layers. The input guardrail uses Gemini 2.5 Flash (API model id gemini-2.5-flash, Google's fast 2.5-tier model) to classify the user's message before any retrieval or main LLM call. The output guardrail scans the LLM's response with regex patterns to detect and redact personally identifiable information (PII).

Input Guardrail: Gemini 2.5 Flash

Input safety checks run through a dedicated @Qualifier("geminiGuard") ChatClient that has no default system prompt, so the guardrail instructions are never mixed with the main RAG assistant preamble. The model id comes from powerrag.guardrails.input-model-id (default gemini-2.5-flash, overridable with the POWERRAG_GUARDRAIL_MODEL environment variable). The guard model is asked to reply with safe or unsafe followed by a category line, and GuardrailService parses that reply into a GuardrailResult.

GuardrailService.java: checkInput() (excerpt)

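// geminiGuardClient: the @Qualifier("geminiGuard") ChatClient, with no default system prompt
String response = geminiGuardClient.prompt()
        .user(buildGuardrailPrompt(text))        // classification prompt wrapping the user's message
        .options(GoogleGenAiChatOptions.builder()
                .model(inputModelId)             // from powerrag.guardrails.input-model-id
                .temperature(0.0)                // minimise randomness in the safe/unsafe verdict
                .build())
        .call()
        .content();
return parseGuardrailResponse(response);         // "safe" / "unsafe" + category line -> GuardrailResult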
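The excerpt does not show buildGuardrailPrompt or parseGuardrailResponse. The sketch below is a minimal, hypothetical version of both, assuming the reply format described above (first line safe or unsafe, optional second line naming a category). The GuardrailResult record here, with its isSafe and category components, is an assumption and may not match the real class, and the prompt wording is invented for illustration. Unrecognised or empty replies are treated as safe, which is an assumption chosen to stay consistent with the fail-open policy described in the next section.

```java
import java.util.Locale;

public class GuardrailParsing {

    // Hypothetical result type; the real GuardrailResult may carry different fields.
    public record GuardrailResult(boolean isSafe, String category) {
        public static GuardrailResult safe() { return new GuardrailResult(true, null); }
        public static GuardrailResult unsafe(String category) { return new GuardrailResult(false, category); }
    }

    // Hypothetical prompt wording: asks for a one-word verdict plus an optional category line.
    public static String buildGuardrailPrompt(String text) {
        return """
                Classify the following user message for safety.
                Reply with exactly "safe" or "unsafe" on the first line.
                If unsafe, give a short category on the second line.

                User message:
                """ + text;
    }

    // Parses "safe" or "unsafe\n<category>"; anything unrecognised falls back to safe (assumed fail-open).
    public static GuardrailResult parseGuardrailResponse(String response) {
        if (response == null || response.isBlank()) {
            return GuardrailResult.safe();
        }
        String[] lines = response.strip().split("\\R"); // \R matches \n, \r\n, and other line breaks
        String verdict = lines[0].strip().toLowerCase(Locale.ROOT);
        if (verdict.startsWith("unsafe")) {
            String category = lines.length > 1 ? lines[1].strip() : "unspecified";
            return GuardrailResult.unsafe(category);
        }
        return GuardrailResult.safe();
    }

    public static void main(String[] args) {
        System.out.println(parseGuardrailResponse("unsafe\nviolence"));
        System.out.println(parseGuardrailResponse("safe"));
    }
}
```

Checking for "unsafe" first matters only for readability here, since "unsafe" does not start with "safe"; the real parser may match the verdict differently.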

Fail-Open Design

On any exception, checkInput returns GuardrailResult.safe(). If the Gemini API is unavailable, misconfigured, or times out, the request proceeds, so a guardrail outage does not block all traffic. This is a deliberate fail-open choice: availability is prioritised over complete safety coverage, at the cost of letting messages through unchecked while the guardrail is down.

Fail-open means that when the guardrail itself fails, the request is allowed through. Fail-closed means every request is blocked while the guardrail is unavailable. For a production deployment, choose according to your risk tolerance: fail-open preserves availability, while fail-closed ensures no message bypasses the safety check.
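To make the trade-off concrete, here is a minimal plain-Java sketch in which the failure policy is an explicit setting rather than hard-coded. The FailPolicyDemo class, the allow method, and the FailurePolicy enum are hypothetical; as described above, checkInput itself always fails open.

```java
import java.util.function.Supplier;

public class FailPolicyDemo {

    public enum FailurePolicy { FAIL_OPEN, FAIL_CLOSED }

    // Returns true to let the request through, false to block it.
    public static boolean allow(Supplier<Boolean> classifierSaysSafe, FailurePolicy policy) {
        try {
            // Normal path: the classifier's verdict decides, regardless of policy.
            return classifierSaysSafe.get();
        } catch (RuntimeException e) {
            // Guardrail failure (timeout, bad config, API error): the policy decides.
            return policy == FailurePolicy.FAIL_OPEN;
        }
    }

    public static void main(String[] args) {
        Supplier<Boolean> outage = () -> { throw new IllegalStateException("guardrail API timeout"); };
        System.out.println(allow(outage, FailurePolicy.FAIL_OPEN));       // true
        System.out.println(allow(outage, FailurePolicy.FAIL_CLOSED));     // false
        System.out.println(allow(() -> false, FailurePolicy.FAIL_OPEN));  // false: an "unsafe" verdict still blocks
    }
}
```

The policy only matters when the classifier cannot answer; a genuine unsafe verdict blocks the request under either setting.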

Output Guardrail: PII Regex Detection

After the LLM produces its answer, the response is scanned for three PII patterns: email addresses, US Social Security numbers, and 16-digit credit card numbers. Each match is replaced with a token such as [EMAIL REDACTED] before the answer is returned to the user.

GuardrailService.java: PII regex patterns

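// Email: local part, "@", domain, and a top-level domain of at least two letters
private static final Pattern EMAIL =
    Pattern.compile("[a-zA-Z0-9._%+\\-]+@[a-zA-Z0-9.\\-]+\\.[a-zA-Z]{2,}");
// US Social Security number in the 123-45-6789 format
private static final Pattern SSN =
    Pattern.compile("\\b\\d{3}-\\d{2}-\\d{4}\\b");
// 16-digit card number, optionally grouped in fours by spaces or hyphens
private static final Pattern CREDIT_CARD =
    Pattern.compile("\\b(?:\\d{4}[\\s-]?){3}\\d{4}\\b");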
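The excerpt defines the patterns but not how they are applied. A minimal sketch, assuming each pattern is applied in turn with Matcher.replaceAll: only the [EMAIL REDACTED] token is named in this module, so [SSN REDACTED] and [CARD REDACTED] are assumed token names, and the PiiRedactor class and Redaction record are hypothetical. The redacted flag is what a caller could use to decide whether to log a guardrail flag.

```java
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PiiRedactor {

    // Same regexes as the GuardrailService excerpt.
    private static final Pattern EMAIL =
        Pattern.compile("[a-zA-Z0-9._%+\\-]+@[a-zA-Z0-9.\\-]+\\.[a-zA-Z]{2,}");
    private static final Pattern SSN =
        Pattern.compile("\\b\\d{3}-\\d{2}-\\d{4}\\b");
    private static final Pattern CREDIT_CARD =
        Pattern.compile("\\b(?:\\d{4}[\\s-]?){3}\\d{4}\\b");

    // Pattern plus replacement token; the SSN and card tokens are assumptions.
    private record Rule(Pattern pattern, String token) {}

    private static final List<Rule> RULES = List.of(
        new Rule(EMAIL, "[EMAIL REDACTED]"),
        new Rule(SSN, "[SSN REDACTED]"),
        new Rule(CREDIT_CARD, "[CARD REDACTED]"));

    // Redacted text plus whether anything was replaced (the signal for logging a flag).
    public record Redaction(String text, boolean redacted) {}

    public static Redaction redact(String answer) {
        String out = answer;
        boolean changed = false;
        for (Rule rule : RULES) {
            Matcher m = rule.pattern().matcher(out);
            if (m.find()) {
                changed = true;
                // replaceAll resets the matcher, so the earlier find() does not skip any match
                out = m.replaceAll(Matcher.quoteReplacement(rule.token()));
            }
        }
        return new Redaction(out, changed);
    }

    public static void main(String[] args) {
        Redaction r = redact("Contact jane.doe@example.com, SSN 123-45-6789, card 4111 1111 1111 1111.");
        System.out.println(r.text());
        // Contact [EMAIL REDACTED], SSN [SSN REDACTED], card [CARD REDACTED].
    }
}
```

Because CREDIT_CARD matches any 16-digit run grouped in fours, order numbers or other IDs in that shape are redacted too; that false-positive risk is the trade-off of a simple regex approach.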

Guardrail Flag Logging

When a guardrail fires (input blocked or output redacted), a record is written to the guardrail_flags PostgreSQL table. This uses a REQUIRES_NEW transaction so the flag record commits even if the parent transaction rolls back.

GuardrailService.java: flag logging with REQUIRES_NEW

@Transactional(propagation = Propagation.REQUIRES_NEW)
public GuardrailFlag logFlag(...) {
    // Commits even if the parent transaction rolls back
}
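Spring applies @Transactional through a proxy around the bean (the default proxy mode; AspectJ weaving behaves differently). REQUIRES_NEW therefore only takes effect when logFlag is reached through that proxy, for example when another bean calls it. A call from inside GuardrailService itself, such as this.logFlag(...), goes straight to the target object and simply runs in whatever transaction is already active. This module does not say which way Power RAG calls logFlag. The sketch below uses a plain JDK dynamic proxy, not Spring, to illustrate the interception mechanism; the FlagService, FlagServiceImpl, and proxy names are hypothetical.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.List;

public class ProxySelfInvocationDemo {

    // Hypothetical service; imagine logFlag carries @Transactional(propagation = REQUIRES_NEW).
    public interface FlagService {
        void logFlag(String reason);
        void checkAndLog(String reason);
    }

    public static class FlagServiceImpl implements FlagService {
        public final List<String> written = new ArrayList<>();

        @Override
        public void logFlag(String reason) {
            written.add(reason); // stands in for inserting a guardrail_flags row
        }

        @Override
        public void checkAndLog(String reason) {
            logFlag(reason); // self-invocation on `this`: bypasses the proxy entirely
        }
    }

    // Method names the proxy saw, i.e. the calls where Spring could open a new transaction.
    public static final List<String> intercepted = new ArrayList<>();

    public static FlagService proxy(FlagService target) {
        InvocationHandler handler = (proxyInstance, method, methodArgs) -> {
            intercepted.add(method.getName());
            return method.invoke(target, methodArgs);
        };
        return (FlagService) Proxy.newProxyInstance(
                FlagService.class.getClassLoader(),
                new Class<?>[] {FlagService.class},
                handler);
    }

    public static void main(String[] args) {
        FlagServiceImpl impl = new FlagServiceImpl();
        FlagService service = proxy(impl);
        service.checkAndLog("pii");       // only checkAndLog is intercepted, not the inner logFlag
        service.logFlag("unsafe");        // called through the proxy, so intercepted
        System.out.println(intercepted);  // [checkAndLog, logFlag]
        System.out.println(impl.written); // [pii, unsafe]
    }
}
```

Both flags are written, but only the second logFlag call passed through the proxy. If logFlag must commit independently, the usual remedies are to move it into a separate bean or to call it through an injected reference to the proxied bean.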

Summary: Two Guardrail Layers

Layer	What it checks	Technology	On violation
Input	Safety of user's message	Gemini 2.5 Flash (`gemini-2.5-flash`)	Block request, log flag
Output	PII in LLM response	Regex (email, SSN, credit card)	Redact PII, log flag, return redacted answer

The output PII patterns cover common formats but are not exhaustive. For high-compliance environments (healthcare, finance), consider a dedicated PII detection service in addition to the regex patterns.
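Putting the two layers from the table together, a request flows roughly as follows. This is a minimal, framework-free sketch: the classifier, the RAG call, and the redactor are passed in as plain functions, and the TwoLayerPipeline class, the answer method, the flag names, and the refusal message are all hypothetical rather than Power RAG's actual code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.function.Predicate;
import java.util.function.UnaryOperator;

public class TwoLayerPipeline {

    // Stand-in for rows in the guardrail_flags table (hypothetical flag names).
    public static final List<String> flags = new ArrayList<>();

    public static String answer(String question,
                                Predicate<String> inputIsSafe,      // input layer: safety classifier
                                UnaryOperator<String> ragAnswer,    // retrieval + main LLM call
                                UnaryOperator<String> redactPii) {  // output layer: PII redaction
        // Input layer runs first: a blocked message never reaches retrieval or the main LLM.
        if (!inputIsSafe.test(question)) {
            flags.add("INPUT_BLOCKED");
            return "Sorry, I can't help with that request.";
        }
        String raw = ragAnswer.apply(question);
        // Output layer: redact PII and record a flag only if something changed.
        String redacted = redactPii.apply(raw);
        if (!redacted.equals(raw)) {
            flags.add("OUTPUT_REDACTED");
        }
        return redacted;
    }

    public static void main(String[] args) {
        Predicate<String> guard = q -> !q.toLowerCase(Locale.ROOT).contains("weapon"); // toy classifier
        UnaryOperator<String> rag = q -> "Email support at help@example.com";          // toy RAG answer
        UnaryOperator<String> redact =
                s -> s.replaceAll("[\\w.+-]+@[\\w.-]+\\.[A-Za-z]{2,}", "[EMAIL REDACTED]");
        System.out.println(answer("How do I reset my password?", guard, rag, redact));
        System.out.println(answer("How do I build a weapon?", guard, rag, redact));
        System.out.println(flags); // [OUTPUT_REDACTED, INPUT_BLOCKED]
    }
}
```

Keeping the two checks on either side of the RAG call means an unsafe prompt costs only one cheap classification, while every answer that does get generated still passes through redaction.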
