AI Answer Mode
The AI Answer Mode feature is a Retrieval-Augmented Generation (RAG) system that combines real-time website data, indexed documents (PDFs), and Google's Gemini LLM to provide synthesized answers.
The goal is to implement a Retrieval-Augmented Generation (RAG) based AI search for the website using the provided Google Gemini API key. This will allow users to ask natural language questions and receive intelligent, synthesized answers based directly on your website's actual content.
Proposed Architecture
Vector Database (Typesense)
You already have Typesense running with an index named pages. We
will seamlessly extend this setup by updating the schema to include a 768-dimension
embedding vector field (optimized for Gemini), enabling deep semantic search.
Embedding Generation (Gemini)
When content is updated in Craft CMS, we leverage the
text-embedding-004 model to convert text into vector embeddings. These are stored
alongside regular data for instantaneous hybrid retrieval.
Search Workflow (RAG) Multi-stage pipeline
- 1. User Query A visitor interacts with the front-end search interface, submitting a natural language question, technical query, or specific prompt. This interaction initiates the RAG lifecycle by capturing the user's semantic intent in real-time.
-
2. Real-time Embedding
The system processes the raw query through our custom local endpoint, leveraging the
text-embedding-001model to generate a high-dimensional vector that represents the core semantic intent of the query. - 3. Hybrid Search We execute a sophisticated hybrid search against Typesense, combining vector similarity with traditional keyword matching to pinpoint the most relevant content entries from your indexed website data.
- 4. Context Injection Retrieved content is injected as privileged 'ground truth' context into the LLM prompt. Gemini 2.5 Flash then synthesizes an answer based strictly on this data to ensure accuracy and low latency.
- 5. Synthesized Answer The intelligent, AI-generated response is returned to the frontend along with standard search results, providing the user with a comprehensive and contextually accurate answer instantly.
High-Level Architecture
A multi-stage Retrieval Augmented Generation (RAG) pipeline powered by Gemini 2.0 Flash.
1. High-Dimensional Query Vectorization
When a visitor submits a query, the system instantly transforms the raw natural
language into a high-dimensional mathematical vector using the state-of-the-art
text-embedding-004 model.
- Deep Intent Recognition Moving beyond simple keywords to capture semantic intent—automatically mapping queries like "starting issues" to relevant "alternator and battery technical specifications."
- High-Resolution Precision Leveraging optimized 768-dimensional embeddings to ensure maximum accuracy in identifying technical automotive part relationships and contextual relevance.
- Real-Time Transformation The vectorization process is optimized for sub-100ms latency, providing an instantaneous foundation for the hybrid retrieval pipeline.
2. Semantic & High-Performance Hybrid Retrieval
We leverage Typesense as our high-performance vector search engine, executing a sophisticated hybrid retrieval strategy that combines semantic intent with keyword precision.
- Neural Vector Matching High-speed vector similarity search compares the query's embedding against stored 768-dimension vectors to identify contextually relevant technical data.
- Full-Text Keyword Ranking Simultaneously executes traditional text matching to isolate technical part numbers, SKUs, and exact terminology that might be lost in pure semantic space.
- Weighted Result Fusion Integrates vector scores with keyword weights to provide a unified, re-ranked set of the most authoritative context snippets for the reasoning engine.
collections/pages
collections/ai_documents_vuk
3. Multimodal Ingestion & Contextual Intelligence
For complex technical documentation like the ACDelco Battery Warranty, the architecture employs advanced multimodal processing to transcribe and structure unstructured data.
- Structural Ingestion Gemini processes PDF scans and manuals through multimodal vision, transcribing tables and diagrams into clean Markdown while strictly preserving the original structural hierarchy.
- Dynamic Knowledge Assembly The most relevant snippets are programmatically joined to form a unique, query-specific "Privileged Knowledge Base" that serves as the ground truth for answer generation.
- Granular Chunking Documents are decomposed into highly focused segments to ensure maximum search precision and avoid overwhelming the reasoning engine with irrelevant noise.
4. Strategic Answer Generation (Gemini)
In this final stage, the system passes the curated knowledge snippets and the original query to Gemini 2.5 Flash (latest version) for context-grounded reasoning and synthesis.
- Contextual Grounding Gemini acts as a reasoning engine, synthesizing multi-source search results into a cohesive, technical answer that is strictly anchored in the retrieved data.
-
Deterministic Accuracy
Temperature is maintained at exactly
0.2to ensure ultra-low creativity and high factual reliability, effectively eliminating the risk of AI hallucinations. - Flash Inference Leveraging the Flash model's low-latency architecture, the system provides sub-second synthesized answers directly to the end-user interface.