Text-to-SQL
Text-to-SQL lets users query the database in plain English. The pipeline converts the natural language question to a validated SQL SELECT statement, executes it, and returns the results as structured data — without exposing any SQL to the end user.
The Full Pipeline
User question (plain English)
│
▼
SchemaIntrospector.describe()
(table names, column names/types, enum hint values)
│
▼
LLM prompt: schema + question → generate SQL
│
▼
SqlValidator.validate(sql)
- Reject non-SELECT statements
- Reject disallowed tables
- Reject subquery injection patterns
│
▼
JdbcTemplate.queryForList(sql)
│
▼
Return rows as List<Map<String, Object>>
Schema Introspection
The LLM cannot generate correct SQL without knowing the schema. SchemaIntrospector queries information_schema at runtime to build a schema description string:
SchemaIntrospector.java — describe()
View source ↗
// Queries information_schema for table/column names and types
// Adds enum hints: queries DISTINCT values for status/type columns
String schema = schemaIntrospector.describe();
The schema string includes enum hints — for columns named status or type, the introspector queries DISTINCT values and includes them in the schema description. This tells the LLM that status values are 'APPROVED', 'PENDING', etc., preventing case mismatches.
SQL Generation Prompt
Text-to-SQL prompt template
View source ↗
private String buildPrompt(String question, String schema, String language) {
return """
You are a PostgreSQL expert. Given the schema and a question,
generate a valid SQL SELECT query.
DATABASE SCHEMA:
%s
RULES:
- Return ONLY the SQL, no explanation
- Only SELECT queries (never INSERT/UPDATE/DELETE)
- Status values are UPPERCASE ('APPROVED', 'OPEN', etc.)
- Use ILIKE for case-insensitive text matching
QUESTION: %s
SQL:""".formatted(schema, question);
}
Prompt Engineering Notes
- "Return ONLY the SQL" — prevents the LLM from wrapping the SQL in prose or code fences that break parsing
- "Never INSERT/UPDATE/DELETE" — belt-and-suspenders instruction; the validator also enforces this
- "Status values are UPPERCASE" — prevents queries like
WHERE status = 'approved'that miss'APPROVED' - "Use ILIKE" — prevents case-sensitive
LIKEfor text searches
JSqlParser Validation
SqlValidator.java — validate()
View source ↗
// Rejects: non-SELECT, disallowed tables, subquery attacks
sqlValidator.validate(sql);
The validator uses JSqlParser to parse the generated SQL into an AST and checks:
- The statement is a
SELECT— notINSERT,UPDATE,DELETE,DROP, etc. - All referenced tables are in the allowed-list (e.g., users, documents are excluded)
- No stacked statements (semicolon injection:
SELECT 1; DROP TABLE foo;)
Never execute LLM-generated SQL without validation. Even with a well-engineered prompt, the LLM can produce harmful statements. JSqlParser validation is a mandatory safety layer — never skip it.