Text-to-SQL

Module 6 · ~12 min read
Text-to-SQL lets users query the database in plain English. The pipeline converts the natural language question to a validated SQL SELECT statement, executes it, and returns the results as structured data — without exposing any SQL to the end user.

The Full Pipeline

User question (plain English) │ ▼ SchemaIntrospector.describe() (table names, column names/types, enum hint values) │ ▼ LLM prompt: schema + question → generate SQL │ ▼ SqlValidator.validate(sql) - Reject non-SELECT statements - Reject disallowed tables - Reject subquery injection patterns │ ▼ JdbcTemplate.queryForList(sql) │ ▼ Return rows as List<Map<String, Object>>

Schema Introspection

The LLM cannot generate correct SQL without knowing the schema. SchemaIntrospector queries information_schema at runtime to build a schema description string:

SchemaIntrospector.java — describe() View source ↗
// Queries information_schema for table/column names and types
// Adds enum hints: queries DISTINCT values for status/type columns
String schema = schemaIntrospector.describe();

The schema string includes enum hints — for columns named status or type, the introspector queries DISTINCT values and includes them in the schema description. This tells the LLM that status values are 'APPROVED', 'PENDING', etc., preventing case mismatches.

SQL Generation Prompt

Text-to-SQL prompt template View source ↗
private String buildPrompt(String question, String schema, String language) {
    return """
        You are a PostgreSQL expert. Given the schema and a question,
        generate a valid SQL SELECT query.

        DATABASE SCHEMA:
        %s

        RULES:
        - Return ONLY the SQL, no explanation
        - Only SELECT queries (never INSERT/UPDATE/DELETE)
        - Status values are UPPERCASE ('APPROVED', 'OPEN', etc.)
        - Use ILIKE for case-insensitive text matching

        QUESTION: %s
        SQL:""".formatted(schema, question);
}

Prompt Engineering Notes

JSqlParser Validation

SqlValidator.java — validate() View source ↗
// Rejects: non-SELECT, disallowed tables, subquery attacks
sqlValidator.validate(sql);

The validator uses JSqlParser to parse the generated SQL into an AST and checks:

Never execute LLM-generated SQL without validation. Even with a well-engineered prompt, the LLM can produce harmful statements. JSqlParser validation is a mandatory safety layer — never skip it.