Image Generation
Intent Detection
Image generation requests are detected by looking for the co-occurrence of a generation verb and an image noun in the query:
private static final Set<String> GEN_VERBS = Set.of(
    "generate", "create", "draw", "paint", "make", "produce", "design");
private static final Set<String> IMAGE_NOUNS = Set.of(
    "image", "picture", "photo", "illustration", "artwork", "diagram");

public boolean isImageGenerationRequest(String question) {
    // Locale.ROOT keeps lowercasing locale-independent (needs java.util.Locale)
    String lower = question.toLowerCase(Locale.ROOT);
    // contains() is a substring test, so "withdraw" also satisfies the verb "draw"
    boolean hasVerb = GEN_VERBS.stream().anyMatch(lower::contains);
    boolean hasNoun = IMAGE_NOUNS.stream().anyMatch(lower::contains);
    return hasVerb && hasNoun;
}
Both conditions must be true. "Create a summary" has a verb but no image noun → not detected. "Show me an image" has a noun but no generation verb → not detected. "Draw a diagram of the architecture" → detected.
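Because the check uses substring matching, a query such as "Withdraw funds and attach a photo" would also be detected ("withdraw" contains "draw"). If that matters, one possible variant is to match whole words instead. The sketch below is an illustration only, not Power RAG's implementation; the class name `WordBoundaryImageIntent` is hypothetical, and the tokenizer assumes plain ASCII English queries.

```java
import java.util.Arrays;
import java.util.Locale;
import java.util.Set;

// Hypothetical variant of the detector: matches whole words, not substrings.
final class WordBoundaryImageIntent {
    private static final Set<String> GEN_VERBS = Set.of(
            "generate", "create", "draw", "paint", "make", "produce", "design");
    private static final Set<String> IMAGE_NOUNS = Set.of(
            "image", "picture", "photo", "illustration", "artwork", "diagram");

    static boolean isImageGenerationRequest(String question) {
        if (question == null) {
            return false;
        }
        // Split on every run of non-letters so punctuation never sticks to a word.
        String[] tokens = question.toLowerCase(Locale.ROOT).split("[^a-z]+");
        boolean hasVerb = Arrays.stream(tokens).anyMatch(GEN_VERBS::contains);
        boolean hasNoun = Arrays.stream(tokens).anyMatch(IMAGE_NOUNS::contains);
        return hasVerb && hasNoun;
    }
}
```

The trade-off is that whole-word matching no longer catches inflected forms such as "images" or "drawing", so the word lists would need those variants (or a stemming step) added.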
Imagen 3 via Google GenAI SDK
Power RAG uses the Google GenAI Java SDK directly (not Spring AI) for Imagen 3, as Spring AI 1.1.2 does not yet wrap Imagen natively.
com.google.genai.Client client = com.google.genai.Client.builder()
    .apiKey(apiKey).build();
GenerateImagesResponse resp = client.models.generateImages(
    "imagen-3.0-generate-002", prompt,
    GenerateImagesConfig.builder().numberOfImages(1).build());
// generatedImages() returns an Optional list in the GenAI Java SDK, so unwrap it before indexing
byte[] imageBytes = resp.generatedImages().get().get(0)
    .image().get().imageBytes().get().toByteArray();
return "data:image/png;base64," + Base64.getEncoder().encodeToString(imageBytes);
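The return value above is a data URI: a `data:` prefix, a MIME type, the `;base64,` marker, and the Base64-encoded bytes. As a minimal sketch of that encoding and its inverse, the helper below uses only `java.util.Base64`; the class `DataUris` and its method names are hypothetical and are not part of Power RAG or the GenAI SDK.

```java
import java.util.Base64;

// Hypothetical helper illustrating the data URI format used for generated images.
final class DataUris {
    // Wraps raw image bytes in a data URI for the given MIME type, e.g. "image/png".
    static String toDataUri(String mimeType, byte[] imageBytes) {
        return "data:" + mimeType + ";base64,"
                + Base64.getEncoder().encodeToString(imageBytes);
    }

    // Reverses toDataUri: decodes everything after the first comma.
    static byte[] fromDataUri(String dataUri) {
        int comma = dataUri.indexOf(',');
        if (!dataUri.startsWith("data:") || comma < 0) {
            throw new IllegalArgumentException("Not a data URI");
        }
        return Base64.getDecoder().decode(dataUri.substring(comma + 1));
    }
}
```

Taking the MIME type as a parameter, rather than hardcoding `image/png`, is a design choice of this sketch; it would let a caller pass through whatever type the model reports.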
Gemini Flash Fallback
If Imagen 3 is unavailable or the API key lacks Imagen permissions, the service falls back to Gemini's multimodal output capability:
GenerateContentConfig config = GenerateContentConfig.builder()
    .responseModalities(List.of("IMAGE", "TEXT")).build();
GenerateContentResponse resp = client.models.generateContent(
    "gemini-2.0-flash-preview-image-generation", prompt, config);
// Extract from resp.parts() → Part.inlineData → blob bytes
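The fallback order described in this section (Imagen 3 first, then Gemini) can be sketched as a loop over an ordered list of generators. This is an assumption about the control flow, not the service's actual code: `ImageFallback` and `generateWithFallback` are hypothetical names, and each generator is modelled as a plain `Function<String, String>` that maps a prompt to a data URI so that the example runs without the SDK.

```java
import java.util.List;
import java.util.function.Function;

// Hypothetical sketch: try each image generator in order until one succeeds.
final class ImageFallback {
    static String generateWithFallback(String prompt,
                                       List<Function<String, String>> generators) {
        RuntimeException lastFailure = null;
        for (Function<String, String> generator : generators) {
            try {
                String dataUri = generator.apply(prompt);
                if (dataUri != null) {
                    return dataUri; // first successful generator wins
                }
            } catch (RuntimeException e) {
                // e.g. the model is unavailable or the key lacks permission; try the next one
                lastFailure = e;
            }
        }
        throw new IllegalStateException("All image generators failed", lastFailure);
    }
}
```

In this arrangement the Imagen call would be the first element of the list and the Gemini call the second; the last failure is kept as the cause so that the original error is not lost when every generator fails.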
Pipeline Integration
Image generation detection happens at Stage 1.5, after the cache miss but before hybrid retrieval. This ordering is intentional:
- Cache lookup first — a previously generated image for the same prompt can be served from cache
- Before retrieval — no document retrieval is needed for image generation requests
- The generated image is returned in the generatedImageBase64 field of the response
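The stage ordering above can be sketched roughly as follows. Everything here is hypothetical and heavily simplified: the class `PipelineSketch`, its `Response` type, and the exact-match `Map` cache are stand-ins chosen for illustration (the real cache and retrieval stages are not shown in this section). Only the field name `generatedImageBase64` comes from the text.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
import java.util.function.Predicate;

// Hypothetical sketch of the ordering: cache lookup, then image detection, then retrieval.
final class PipelineSketch {
    static final class Response {
        final String answer;               // text answer, null for image requests
        final String generatedImageBase64; // data URI, null for text answers
        Response(String answer, String generatedImageBase64) {
            this.answer = answer;
            this.generatedImageBase64 = generatedImageBase64;
        }
    }

    private final Map<String, Response> cache = new ConcurrentHashMap<>();
    private final Predicate<String> isImageRequest;
    private final Function<String, String> imageGenerator;
    private final Function<String, String> retrievalAnswerer;

    PipelineSketch(Predicate<String> isImageRequest,
                   Function<String, String> imageGenerator,
                   Function<String, String> retrievalAnswerer) {
        this.isImageRequest = isImageRequest;
        this.imageGenerator = imageGenerator;
        this.retrievalAnswerer = retrievalAnswerer;
    }

    Response handle(String question) {
        Response cached = cache.get(question);   // Stage 1: cache lookup
        if (cached != null) {
            return cached;
        }
        Response fresh;
        if (isImageRequest.test(question)) {     // Stage 1.5: image intent check
            fresh = new Response(null, imageGenerator.apply(question));
        } else {                                 // later stages: retrieval and answer
            fresh = new Response(retrievalAnswerer.apply(question), null);
        }
        cache.put(question, fresh);
        return fresh;
    }
}
```

With this ordering, a repeated image prompt is served from the cache without calling the generator again, and an image request never reaches the retrieval step.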