Guardrails
The pipeline applies two guardrail layers: an input guardrail that uses an LLM (gemini-2.5-flash, Google’s fast 2.5-tier model) to classify the user’s message before any retrieval or main LLM call, and an output guardrail that scans the LLM’s response with regex patterns to detect and redact PII.
Input Guardrail: Gemini 2.5 Flash
Input safety runs through a dedicated @Qualifier("geminiGuard") ChatClient with no default system prompt, so the guardrail instructions are not mixed with the main RAG assistant preamble. The model id comes from powerrag.guardrails.input-model-id (default gemini-2.5-flash, overridable with POWERRAG_GUARDRAIL_MODEL). The LLM is asked to reply with safe or unsafe plus a category line; GuardrailService parses that response.
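// Classify the user's text with the dedicated guardrail client
String response = geminiGuardClient.prompt()
        .user(buildGuardrailPrompt(text))
        .options(GoogleGenAiChatOptions.builder()
                .model(inputModelId)
                .temperature(0.0) // keep the classification as repeatable as possible
                .build())
        .call()
        .content();
// Turn the safe/unsafe reply into a GuardrailResult
return parseGuardrailResponse(response);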
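The body of parseGuardrailResponse is not shown here, so the following is only a minimal sketch of how a "safe or unsafe plus a category line" reply could be parsed. The GuardVerdict record, the GuardrailResponseParser class, the optional "category:" prefix, and the choice to treat blank or unrecognised replies as safe are all assumptions made for illustration, not the project's actual code.

```java
import java.util.Locale;

// Hypothetical stand-in for GuardrailResult, whose real definition is not shown here.
record GuardVerdict(boolean allowed, String category) {
    static GuardVerdict safe() { return new GuardVerdict(true, null); }
    static GuardVerdict unsafe(String category) { return new GuardVerdict(false, category); }
}

final class GuardrailResponseParser {
    private GuardrailResponseParser() {}

    // Assumed reply format: the first line is "safe" or "unsafe",
    // and an optional second line names the category.
    static GuardVerdict parse(String response) {
        if (response == null || response.isBlank()) {
            // Assumption: an empty reply is treated as safe, in line with fail-open.
            return GuardVerdict.safe();
        }
        String[] lines = response.strip().split("\\R+");
        String verdict = lines[0].strip().toLowerCase(Locale.ROOT);
        if (!verdict.startsWith("unsafe")) {
            // Anything that is not explicitly "unsafe" is let through.
            return GuardVerdict.safe();
        }
        // Drop an optional "category:" prefix from the second line, if present.
        String category = lines.length > 1
                ? lines[1].strip().replaceFirst("(?i)^category\\s*:\\s*", "")
                : "";
        return GuardVerdict.unsafe(category.isEmpty() ? "unspecified" : category);
    }
}
```

Under these assumptions, GuardrailResponseParser.parse("unsafe\nprompt_injection") yields a blocked verdict carrying the category prompt_injection, while any reply that does not start with "unsafe" is allowed through.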
Fail-Open Design
On any exception, checkInput returns GuardrailResult.safe() — if the Gemini API is unavailable, misconfigured, or times out, the request proceeds rather than blocking all traffic. This is a deliberate fail-open choice: availability is prioritised over perfect safety coverage during an outage.
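The fail-open behaviour described above can be sketched as a try/catch around the classifier call. This is an illustration under stated assumptions rather than the real GuardrailService: the FailOpenGuard class, its nested Result record, and the Function-based classifier are hypothetical names introduced so the example runs without Spring or the Gemini client.

```java
import java.util.function.Function;

final class FailOpenGuard {

    // Hypothetical stand-in for GuardrailResult.
    record Result(boolean allowed, String category) {
        static Result safe() { return new Result(true, null); }
    }

    // Hypothetical seam for the LLM call; in the real service this would be
    // the geminiGuard ChatClient call followed by response parsing.
    private final Function<String, Result> classifier;

    FailOpenGuard(Function<String, Result> classifier) {
        this.classifier = classifier;
    }

    Result checkInput(String text) {
        try {
            return classifier.apply(text);
        } catch (Exception e) {
            // Fail open: an outage, misconfiguration, or timeout must not block traffic.
            System.err.println("Guardrail check failed, allowing request: " + e.getMessage());
            return Result.safe();
        }
    }
}
```

The trade-off is that unsafe input passes unchecked while the classifier is down, so a sketch like this would normally be paired with logging or alerting on the failure path to keep outages visible.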
Output Guardrail: PII Regex Detection
After the LLM produces its answer, the response is scanned for common PII patterns. If found, the PII is replaced with tokens such as [EMAIL REDACTED] before the answer is returned to the user.
private static final Pattern EMAIL =
        Pattern.compile("[a-zA-Z0-9._%+\\-]+@[a-zA-Z0-9.\\-]+\\.[a-zA-Z]{2,}");
private static final Pattern SSN =
        Pattern.compile("\\b\\d{3}-\\d{2}-\\d{4}\\b");
private static final Pattern CREDIT_CARD =
        Pattern.compile("\\b(?:\\d{4}[\\s-]?){3}\\d{4}\\b");
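To show how these patterns could drive the redaction step, here is a minimal sketch that applies each one in turn and reports whether anything changed. The PiiRedactor class, the Redaction record, and the [SSN REDACTED] and [CREDIT CARD REDACTED] tokens are assumptions for illustration; only the [EMAIL REDACTED] token and the three regexes come from the text above.

```java
import java.util.regex.Pattern;

final class PiiRedactor {
    private static final Pattern EMAIL =
            Pattern.compile("[a-zA-Z0-9._%+\\-]+@[a-zA-Z0-9.\\-]+\\.[a-zA-Z]{2,}");
    private static final Pattern SSN =
            Pattern.compile("\\b\\d{3}-\\d{2}-\\d{4}\\b");
    private static final Pattern CREDIT_CARD =
            Pattern.compile("\\b(?:\\d{4}[\\s-]?){3}\\d{4}\\b");

    // Hypothetical result type: the redacted text plus a flag the caller
    // could use to decide whether to log a guardrail flag.
    record Redaction(String text, boolean redacted) {}

    private PiiRedactor() {}

    static Redaction redact(String answer) {
        String out = EMAIL.matcher(answer).replaceAll("[EMAIL REDACTED]");
        out = SSN.matcher(out).replaceAll("[SSN REDACTED]");                 // assumed token
        out = CREDIT_CARD.matcher(out).replaceAll("[CREDIT CARD REDACTED]"); // assumed token
        return new Redaction(out, !out.equals(answer));
    }
}
```

For example, PiiRedactor.redact("Mail jane.doe@example.com, SSN 123-45-6789.") returns the text "Mail [EMAIL REDACTED], SSN [SSN REDACTED]." with redacted set to true. Note that regexes of this kind only catch well-formed values and can also match non-PII digit groups that happen to share a format.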
Guardrail Flag Logging
When a guardrail fires (input blocked or output redacted), a record is written to the guardrail_flags PostgreSQL table. This uses a REQUIRES_NEW transaction so the flag record commits even if the parent transaction rolls back.
@Transactional(propagation = Propagation.REQUIRES_NEW)
public GuardrailFlag logFlag(...) {
    // Commits even if the parent transaction rolls back
}
Summary: Two Guardrail Layers
| Layer | What it checks | Technology | On violation |
|---|---|---|---|
| Input | Safety of user's message | Gemini 2.5 Flash (gemini-2.5-flash) | Block request, log flag |
| Output | PII in LLM response | Regex (email, SSN, credit card) | Redact PII, log flag, return redacted answer |