AI Answer | Premium Technical Breakdown

AI Answer Mode

The AI Answer Mode feature is a Retrieval-Augmented Generation (RAG) system that combines real-time website data, indexed documents (PDFs), and Google's Gemini LLM to provide synthesized answers.

Strategic Objective

The goal is to implement a Retrieval-Augmented Generation (RAG) based AI search for the website using the provided Google Gemini API key. This will allow users to ask natural language questions and receive intelligent, synthesized answers based directly on your website's actual content.

Proposed Architecture

Core Storage

Vector Database (Typesense)

You already have Typesense running with an index named pages. We will seamlessly extend this setup by updating the schema to include a 768-dimension embedding vector field (optimized for Gemini), enabling deep semantic search.

AI Processing

Embedding Generation (Gemini)

When content is updated in Craft CMS, we leverage the text-embedding-004 model to convert text into vector embeddings. These are stored alongside regular data for instantaneous hybrid retrieval.

Search Workflow (RAG) Multi-stage pipeline

  • 1. User Query A visitor interacts with the front-end search interface, submitting a natural language question, technical query, or specific prompt. This interaction initiates the RAG lifecycle by capturing the user's semantic intent in real-time.
  • 2. Real-time Embedding The system processes the raw query through our custom local endpoint, leveraging the text-embedding-001 model to generate a high-dimensional vector that represents the core semantic intent of the query.
  • 3. Hybrid Search We execute a sophisticated hybrid search against Typesense, combining vector similarity with traditional keyword matching to pinpoint the most relevant content entries from your indexed website data.
  • 4. Context Injection Retrieved content is injected as privileged 'ground truth' context into the LLM prompt. Gemini 2.5 Flash then synthesizes an answer based strictly on this data to ensure accuracy and low latency.
  • 5. Synthesized Answer The intelligent, AI-generated response is returned to the frontend along with standard search results, providing the user with a comprehensive and contextually accurate answer instantly.

High-Level Architecture

A multi-stage Retrieval Augmented Generation (RAG) pipeline powered by Gemini 2.0 Flash.

Submit Query Contextual Data Generated Response Content Retrieval Content Retrieval User Query Embedding API (Gemini) Gemini Hybrid Search (Typesense) Hybrid Search Gemini 2.0 Flash AI Gemini 2.0 Flash Model Inference Synthesized Answer Website Content Technical PDFs

1. High-Dimensional Query Vectorization

When a visitor submits a query, the system instantly transforms the raw natural language into a high-dimensional mathematical vector using the state-of-the-art text-embedding-004 model.

  • Deep Intent Recognition Moving beyond simple keywords to capture semantic intent—automatically mapping queries like "starting issues" to relevant "alternator and battery technical specifications."
  • High-Resolution Precision Leveraging optimized 768-dimensional embeddings to ensure maximum accuracy in identifying technical automotive part relationships and contextual relevance.
  • Real-Time Transformation The vectorization process is optimized for sub-100ms latency, providing an instantaneous foundation for the hybrid retrieval pipeline.

2. Semantic & High-Performance Hybrid Retrieval

We leverage Typesense as our high-performance vector search engine, executing a sophisticated hybrid retrieval strategy that combines semantic intent with keyword precision.

  • Neural Vector Matching High-speed vector similarity search compares the query's embedding against stored 768-dimension vectors to identify contextually relevant technical data.
  • Full-Text Keyword Ranking Simultaneously executes traditional text matching to isolate technical part numbers, SKUs, and exact terminology that might be lost in pure semantic space.
  • Weighted Result Fusion Integrates vector scores with keyword weights to provide a unified, re-ranked set of the most authoritative context snippets for the reasoning engine.
collections/pages collections/ai_documents_vuk

3. Multimodal Ingestion & Contextual Intelligence

For complex technical documentation like the ACDelco Battery Warranty, the architecture employs advanced multimodal processing to transcribe and structure unstructured data.

  • Structural Ingestion Gemini processes PDF scans and manuals through multimodal vision, transcribing tables and diagrams into clean Markdown while strictly preserving the original structural hierarchy.
  • Dynamic Knowledge Assembly The most relevant snippets are programmatically joined to form a unique, query-specific "Privileged Knowledge Base" that serves as the ground truth for answer generation.
  • Granular Chunking Documents are decomposed into highly focused segments to ensure maximum search precision and avoid overwhelming the reasoning engine with irrelevant noise.

4. Strategic Answer Generation (Gemini)

In this final stage, the system passes the curated knowledge snippets and the original query to Gemini 2.5 Flash (latest version) for context-grounded reasoning and synthesis.

  • Contextual Grounding Gemini acts as a reasoning engine, synthesizing multi-source search results into a cohesive, technical answer that is strictly anchored in the retrieved data.
  • Deterministic Accuracy Temperature is maintained at exactly 0.2 to ensure ultra-low creativity and high factual reliability, effectively eliminating the risk of AI hallucinations.
  • Flash Inference Leveraging the Flash model's low-latency architecture, the system provides sub-second synthesized answers directly to the end-user interface.

Caching & Performance

7 Days
Embedding Cache
24 Hours
Answer Cache
Auto
Folder Sync