Let’s have AI read The Godfather
In this post, we’ll explore a complete Retrieval-Augmented Generation (RAG) system designed to query books using AI. The system ingests an EPUB book, stores its text chunks with vector embeddings in SQLite, and answers questions via semantic search, with both embeddings and generation served by a local Ollama instance.
Architecture Overview
The system consists of several key components working together:
- Database Layer - SQLite with vector support for storing text chunks and embeddings
- Ingestion Pipeline - Parses EPUB, extracts text, chunks it, and generates embeddings
- Query System - Performs semantic search and generates answers using AI
- AI Integration - Connects to Ollama for embeddings and text generation
File-by-File Breakdown
Database Setup (src/db.ts)
// Dependencies: npm install @libsql/client
import { createClient } from "@libsql/client";
import sql from "./sql";
export const db = createClient({ url: "file:rag.db" });
export const initDB = async () => {
await db.execute(sql.createTable);
await db.execute(sql.createIndex);
};
The database layer uses @libsql/client to connect to a local SQLite database. The initDB function creates the chunks table and a vector index for efficient similarity search.
SQL Queries (src/sql/index.ts)
export default {
createTable: `
CREATE TABLE IF NOT EXISTS chunks (
id INTEGER PRIMARY KEY AUTOINCREMENT,
text TEXT NOT NULL,
embedding F32_BLOB(1024)
)
`,
createIndex: `
CREATE INDEX IF NOT EXISTS chunks_vec
ON chunks (libsql_vector_idx(embedding))
`,
insert: `INSERT INTO chunks (text, embedding) VALUES (?, vector(?))`,
query: `
SELECT chunks.text FROM vector_top_k('chunks_vec', vector(?), ?) AS top
JOIN chunks ON chunks.id = top.id
`,
};
This file contains all SQL queries used throughout the application. The table stores text chunks alongside their 1024-dimensional embeddings. The query uses LibSQL’s vector search extension to find the most similar chunks.
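LibSQL’s `vector()` function parses a JSON-style array literal, which is why the code elsewhere in this post serializes embeddings as `[${embedding.join(",")}]`. A minimal sketch of that serialization (pure TypeScript, no database required; `toVectorLiteral` is an illustrative helper name, not part of the project):

```typescript
// Serialize a numeric embedding into the text form that
// LibSQL's vector(?) function expects: "[0.1,-0.25,0.5]".
const toVectorLiteral = (embedding: number[]): string =>
  `[${embedding.join(",")}]`;

// Example: a tiny 4-dimensional embedding (the real ones are 1024-d).
const literal = toVectorLiteral([0.1, -0.25, 0.5, 1]);
console.log(literal); // "[0.1,-0.25,0.5,1]"
```

The same literal form is used both when inserting chunks and when passing the query vector to `vector_top_k`.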
Text Processing Utilities (src/utils/index.ts)
// (No external dependencies - pure TypeScript)
export const stripHTML = (text: string): string => {
return text.replace(/<[^>]+>/g, " ").replace(/\s+/g, " ").trim();
};
export const chunkText = (text: string, chunkSize = 1500, overlap = 100): string[] => {
const chunks: string[] = [];
for (let i = 0; i < text.length; i += chunkSize - overlap) {
chunks.push(text.slice(i, i + chunkSize));
}
return chunks;
};
Two utility functions process raw text:
- stripHTML - Removes HTML tags from EPUB chapter content
- chunkText - Splits text into overlapping chunks (1500 chars with 100 char overlap) for better context preservation
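The overlap is easier to see with a toy-sized chunk. Here is a self-contained sketch using the same `chunkText` logic with a small window, so each chunk repeats the tail of the previous one:

```typescript
// Same chunking logic as src/utils: fixed-size windows that
// step forward by (chunkSize - overlap) characters.
const chunkText = (text: string, chunkSize = 1500, overlap = 100): string[] => {
  const chunks: string[] = [];
  for (let i = 0; i < text.length; i += chunkSize - overlap) {
    chunks.push(text.slice(i, i + chunkSize));
  }
  return chunks;
};

// With a tiny chunk size the overlap is obvious: each chunk
// starts 2 characters before the previous one ended.
const chunks = chunkText("abcdefghij", 4, 2);
console.log(chunks); // ["abcd", "cdef", "efgh", "ghij", "ij"]
```

The overlap means a sentence split across a chunk boundary still appears whole in at least one chunk, which helps retrieval.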
AI Integration (src/ai.ts)
// Dependencies: npm install ai @ai-sdk/openai
import { createOpenAI } from "@ai-sdk/openai";
import { embed, generateText } from "ai";
const QUERY_PROMPT = "Answer ONLY using the provided context...";
const ollama = createOpenAI({
baseURL: "http://localhost:11434/v1",
apiKey: "ollama",
});
const embeddingModel = ollama.embeddingModel("mxbai-embed-large");
const queryModel = ollama.languageModel("qwen3:8b");
export const embedder = async (text: string): Promise<number[]> => {
const { embedding } = await embed({ model: embeddingModel, value: text });
return embedding;
};
export const generate = async (context: string, question: string): Promise<string> => {
const { text } = await generateText({
model: queryModel,
temperature: 0.1,
system: QUERY_PROMPT,
messages: [
{
role: "user",
content: `
Context: ${context}
Question: ${question}
`
}
]
});
return text;
};
This module connects to a local Ollama instance running on port 11434. It uses two models:
- mxbai-embed-large - For generating text embeddings
- qwen3:8b - For generating answers to questions
The embedder function generates vector embeddings from text, while generate creates answers using retrieved context.
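The actual nearest-neighbor search happens inside LibSQL’s vector index, but conceptually it ranks chunks by how closely their embedding vectors point in the same direction as the question’s embedding, commonly measured as cosine similarity. A pure-TypeScript sketch of that comparison (illustrative only; the project never computes this in application code):

```typescript
// Cosine similarity between two equal-length vectors:
// 1 = same direction, 0 = orthogonal, -1 = opposite.
const cosineSimilarity = (a: number[], b: number[]): number => {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
};

console.log(cosineSimilarity([1, 0], [1, 0])); // 1
console.log(cosineSimilarity([1, 0], [0, 1])); // 0
```

This is why semantically related text is retrieved even when it shares no keywords with the question: similarity is measured in embedding space, not by string matching.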
Ingestion Pipeline (src/ingest.ts)
// Dependencies: npm install epub2
import Epub from "epub2";
import { db } from "./db";
import { embedder } from "./ai";
import { stripHTML, chunkText } from "./utils";
import sql from "./sql";
const EPUB_PATH = "./src/source/The_Godfather_Mario_Puzo.epub";
export const ingest = async () => {
console.log("Ingesting book...");
const epub = await Epub.createAsync(EPUB_PATH);
const chapters = epub.flow;
for (const chapter of chapters) {
let html: string;
try {
html = await epub.getChapterRawAsync(chapter.id);
} catch (error) {
console.log("Error reading chapter", chapter.id, error);
continue;
}
const text = stripHTML(html);
const chunks = chunkText(text);
for (const chunk of chunks) {
const embedding = await embedder(chunk);
await db.execute(sql.insert, [chunk, `[${embedding.join(",")}]`]);
}
}
console.log("Done!");
};
The ingestion pipeline:
- Opens the EPUB file (The Godfather by Mario Puzo)
- Iterates through all chapters
- Extracts raw HTML from each chapter
- Strips HTML tags and chunks the text
- Generates embeddings for each chunk
- Inserts text and embeddings into the database
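The strip-then-chunk steps above can be sketched end to end on a toy “chapter” (pure TypeScript, no EPUB, Ollama, or database needed; the sample HTML is invented for illustration):

```typescript
// Same HTML-stripping logic as src/utils: replace tags with
// spaces, then collapse runs of whitespace.
const stripHTML = (text: string): string =>
  text.replace(/<[^>]+>/g, " ").replace(/\s+/g, " ").trim();

// Same chunking logic as src/utils.
const chunkText = (text: string, chunkSize = 1500, overlap = 100): string[] => {
  const chunks: string[] = [];
  for (let i = 0; i < text.length; i += chunkSize - overlap) {
    chunks.push(text.slice(i, i + chunkSize));
  }
  return chunks;
};

// A toy chapter standing in for real EPUB HTML.
const html = "<h1>Chapter 1</h1><p>Amerigo Bonasera sat in New York.</p>";
const text = stripHTML(html);
console.log(text); // "Chapter 1 Amerigo Bonasera sat in New York."
console.log(chunkText(text, 20, 5).length); // 3
```

In the real pipeline each resulting chunk is then embedded and inserted; only the I/O (EPUB parsing, embedding calls, DB writes) is missing here.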
Query System (src/query.ts)
// Dependencies: (uses db from db.ts, embedder/generate from ai.ts, sql from sql/index.ts)
import { db } from "./db";
import { embedder, generate } from "./ai";
import sql from "./sql";
const TOP_K = 20;
export const query = async (question: string): Promise<string> => {
const vector = await embedder(question);
const { rows } = await db.execute({
sql: sql.query,
args: [`[${vector.join(",")}]`, TOP_K],
});
const context = rows.map(r => r.text).join("\n\n");
return await generate(context, question);
};
When a user asks a question:
- The question is converted to a vector embedding
- The database performs similarity search to find the top 20 most relevant chunks
- Retrieved chunks are joined as context
- The AI model generates an answer using the retrieved context
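The context-building step is plain string assembly; a sketch with stand-in rows (the row shape matches what `db.execute` returns in src/query.ts, the sample text is invented):

```typescript
// Rows as returned by the vector search, most similar first.
const rows: { text: string }[] = [
  { text: "Don Corleone listened without moving." },
  { text: "The Don's voice was reasonable." },
];

// Join retrieved chunks with blank lines so the model can
// tell where one passage ends and the next begins.
const context = rows.map((r) => r.text).join("\n\n");
console.log(context);
```

That joined string, plus the question, is everything the model sees, which is why the system prompt can safely insist on answering only from the provided context.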
Main Entry Points
Setup (src/setup.ts) - Initializes the database and ingests the book:
// Dependencies: (imports from db.ts and ingest.ts)
import { initDB } from "./db";
import { ingest } from "./ingest";
await initDB();
await ingest();
Query (src/index.ts) - CLI interface for asking questions:
// Dependencies: npm install -D @types/node (for process.argv)
import { query } from "./query";
const question = process.argv[2] ?? process.exit(1);
const answer = await query(question);
console.log(answer);
How It All Fits Together
- First Run: Execute setup.ts to initialize the database and ingest The Godfather
- Query Time: Run index.ts with a question like “Who is Michael Corleone?”

The system retrieves relevant passages from the book and uses the AI model to generate a context-aware answer, demonstrating a complete, working RAG pipeline for book content.
Dependencies
Install all required packages:
npm install @libsql/client epub2 ai @ai-sdk/openai
npm install -D @types/node typescript
Technologies Used
- @libsql/client - SQLite with vector support
- epub2 - EPUB parsing
- ai - Vercel AI SDK for embeddings and text generation
- @ai-sdk/openai - OpenAI-compatible API for Ollama