PageIndex in the Enterprise RAG Arena
A vectorless, reasoning-based retrieval framework that challenges the orthodoxy of embeddings and chunking — and what it means for production RAG pipelines.
TL;DR
PageIndex replaces vector similarity search with a hierarchical JSON tree index that LLMs traverse via multi-step reasoning. It eliminates chunking artefacts, follows in-document cross-references, and achieved 98.7% on FinanceBench. Enterprise teams should evaluate it seriously for document-heavy, high-accuracy use cases — but should also be aware of its higher per-query LLM cost, latency profile, and reliance on well-structured source documents.
The Premise: Similarity Is Not Relevance
The central thesis of PageIndex is deceptively simple: semantic similarity and true relevance are not the same thing. Traditional vector-based RAG encodes document chunks into an embedding space and retrieves the top-k nearest neighbours at query time. This works well for surface-level information needs, but falls apart when the answer requires cross-referencing, domain reasoning, or navigating deeply nested report structures — exactly the kind of work enterprise knowledge workers do daily.
PageIndex proposes a two-phase alternative:
- Index construction — an LLM generates a hierarchical, ToC-style JSON tree from a PDF (or Markdown), preserving logical section boundaries.
- Reasoning-based retrieval — at query time, the LLM reads the tree index, reasons about which branch to explore, fetches raw content for that node, and iterates until it has enough context to answer.
No embedding model. No vector database. No fixed-size chunking. The index lives inside the LLM's context window, and retrieval is driven by inference rather than cosine distance.
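To make the two phases concrete, here is a minimal sketch of the data structure and the retrieval loop. Everything in it is illustrative: the tree schema, the `retrieve` helper, and the prompt wording are assumptions for exposition, not PageIndex's actual API.

```python
import json

# A hypothetical ToC-style tree index, the kind an LLM might generate
# from a filing. Field names are illustrative, not PageIndex's schema.
TREE = {
    "title": "Annual Report 2024",
    "node_id": "root",
    "children": [
        {"title": "Item 7. Management's Discussion", "node_id": "n7",
         "children": []},
        {"title": "Item 8. Financial Statements", "node_id": "n8",
         "children": [
             {"title": "Note 14. Commitments", "node_id": "n14",
              "children": []},
         ]},
    ],
}

# Raw section text keyed by node_id; in practice this lives alongside
# the index on disk or in object storage.
SECTIONS = {"n7": "...MD&A text...", "n8": "...statements...",
            "n14": "...Note 14 text..."}

def retrieve(query: str, tree: dict, sections: dict, llm,
             max_steps: int = 5) -> str:
    """Reasoning-based retrieval: the LLM reads the tree, names the node
    to open next, we append that node's raw text, and we iterate until
    it answers DONE. `llm` is any prompt -> str chat-completion callable."""
    gathered = []
    for _ in range(max_steps):
        choice = llm(
            f"Question: {query}\n"
            f"Tree index: {json.dumps(tree)}\n"
            f"Context so far: {' '.join(gathered)}\n"
            "Reply with the node_id to open next, or DONE."
        ).strip()
        if choice == "DONE":
            break
        gathered.append(sections.get(choice, ""))
    return llm(f"Using only this context: {' '.join(gathered)}\n"
               f"Answer: {query}")
```

The defining property is that the entire index fits in the prompt, so node selection is an inference step rather than a nearest-neighbour lookup, and every step leaves a readable trace.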
What PageIndex Gets Right
No Chunking Artefacts
Content is organized into semantically coherent sections (pages, chapters), not arbitrary 512-token slices. This preserves context across paragraph and table boundaries — critical for regulatory filings, contracts, and financial reports.
Cross-Reference Resolution
When a passage says "see Appendix G", the tree-search agent can follow the reference by navigating the index. Vector RAG would either need a pre-built knowledge graph to model such links, or it would ignore the reference entirely.
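A toy illustration of the navigational primitive involved, reusing the `TREE` from the sketch above. The `find_node` helper is an assumption for exposition; in PageIndex-style retrieval the LLM makes this hop itself by reasoning over the index.

```python
def find_node(tree: dict, needle: str):
    """Depth-first lookup of a section whose title mentions `needle`,
    e.g. resolving a "see Note 14" reference to the Note 14 node."""
    if needle.lower() in tree.get("title", "").lower():
        return tree
    for child in tree.get("children", []):
        hit = find_node(child, needle)
        if hit is not None:
            return hit
    return None

# find_node(TREE, "Note 14")  ->  the "Note 14. Commitments" node
```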
Explainable Retrieval
Every retrieval step is a reasoning trace: which section was selected, why, and what was extracted. This audit trail is a significant advantage for compliance-sensitive industries like finance, healthcare, and legal.
Multi-Turn Context
Because retrieval is driven by an LLM with conversation history, follow-up questions like "What about the liabilities?" naturally resolve against prior context, without re-embedding anything.
The 98.7% accuracy on FinanceBench (via Mafin 2.5) is the headline number, and it is impressive. FinanceBench is specifically designed to test retrieval over complex financial filings — exactly the kind of structured, reference-heavy document where vector search struggles most.
Where It Falls Short
Latency & Cost
Each retrieval step requires an LLM inference call. A single query may trigger 3–5 reasoning steps before convergence. At enterprise scale (>10k queries/day), this compounds into significantly higher cost and latency compared to a vector lookup that completes in milliseconds.
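A back-of-envelope model makes the trade-off concrete. Every number below is an assumption for illustration; substitute your own step counts, context sizes, and per-token prices.

```python
# Back-of-envelope daily cost comparison. All figures are assumptions.
QUERIES_PER_DAY = 10_000
STEPS_PER_QUERY = 4          # typical 3-5 reasoning steps per query
TOKENS_PER_STEP = 6_000      # tree index + accumulated context per call
PRICE_PER_MTOK  = 3.00       # USD per million input tokens (assumed)

reasoning_cost = (QUERIES_PER_DAY * STEPS_PER_QUERY * TOKENS_PER_STEP
                  / 1e6 * PRICE_PER_MTOK)
# Vector RAG still pays for one generation call after the lookup.
vector_cost = QUERIES_PER_DAY * 2_000 / 1e6 * PRICE_PER_MTOK

print(f"reasoning-based: ~${reasoning_cost:,.0f}/day")  # ~$720/day
print(f"vector RAG:      ~${vector_cost:,.0f}/day")     # ~$60/day
```

Under these assumptions the reasoning-based path runs roughly an order of magnitude more expensive per day, which is why the routing advice below matters.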
Index Construction Overhead
Building the tree index itself requires an LLM pass over the entire document. For a 200-page SEC filing this is manageable; for a continuously updated corpus of 100k+ documents, the index rebuild cost is non-trivial and the open-source PDF parser may not suffice.
Document Structure Dependency
The quality of the tree index hinges on the document having a discernible logical structure. Messy OCR scans, free-form emails, chat logs, and unstructured data streams are poor candidates. The Markdown mode explicitly requires well-formatted headings.
Scale Story Is Still Early
PageIndex File System extends the concept to multi-document corpora, but the agentic tree traversal across millions of documents is architecturally different from a vector index that scales horizontally. Production-grade benchmarks at that scale are still scarce.
How It Stacks Up
| Dimension | Vector RAG | PageIndex |
|---|---|---|
| Retrieval mechanism | Cosine similarity on embeddings | LLM reasoning over tree index |
| Query latency | ~50–200 ms (vector lookup) | ~2–10 s (multi-step LLM calls) |
| Per-query cost | Low (embedding + DB lookup) | Higher (multiple LLM inferences) |
| Chunk integrity | Fixed-size; often broken | Semantic sections; preserved |
| Cross-references | Ignored without knowledge graph | Followed via tree navigation |
| Explainability | Opaque similarity scores | Traceable reasoning chain |
| Horizontal scaling | Mature (Pinecone, Weaviate, etc.) | Emerging (PageIndex File System) |
| Best document fit | Short, uniform, high-volume text | Long, structured, reference-heavy |
New Use Cases This Opens Up
PageIndex is not just a better retriever — it enables categories of enterprise workflows that vector RAG handles poorly or not at all:
Regulatory Filing Analysis
Analysts can query SEC 10-K/10-Q filings, Basel III disclosures, or FDA submissions and get answers that follow internal cross-references ("per Note 14") without manual lookup. Compliance teams get a reasoning trace they can audit.
Multi-Turn Contract Review
Legal teams can conduct iterative, conversational review of 300-page master agreements. The system retains prior context ("Now what does Section 8.2 say about indemnification?") and navigates directly to the relevant clause.
Technical Manual Q&A
Engineering and field-service teams working with deeply nested equipment manuals, specification sheets, or standard operating procedures get precise, section-aware answers instead of fragmented chunk hits.
Agentic Document Workflows
Because retrieval is already agentic, PageIndex slots naturally into broader agent pipelines (e.g., OpenAI Agents SDK). An agent can autonomously pull data from a report, cross-check against a second document, and synthesize a summary — all without a vector DB in the loop.
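As a sketch of that integration, assuming the OpenAI Agents SDK's `Agent`/`Runner`/`function_tool` interface; the `pageindex_search` tool and the `chat` helper are hypothetical wrappers around the illustrative `retrieve` loop from earlier, not a published PageIndex integration.

```python
from agents import Agent, Runner, function_tool  # OpenAI Agents SDK
from openai import OpenAI

_client = OpenAI()

def chat(prompt: str) -> str:
    """Plain prompt -> str helper used by the earlier `retrieve` sketch."""
    resp = _client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

@function_tool
def pageindex_search(question: str) -> str:
    """Hypothetical tool: answer a question about the indexed filing via
    tree search (wraps the `retrieve`/`TREE`/`SECTIONS` sketch above)."""
    return retrieve(question, TREE, SECTIONS, chat)

analyst = Agent(
    name="filing-analyst",
    instructions="Answer questions about the filing; cite the sections you used.",
    tools=[pageindex_search],
)

result = Runner.run_sync(analyst, "How did lease commitments change in 2024?")
print(result.final_output)
```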
Practical Recommendations for Enterprise Teams
Don't treat this as an either/or. Treat it as a routing decision.
1. Use PageIndex for high-value, low-volume queries over structured, long-form documents where accuracy matters more than latency — financial analysis, legal due diligence, compliance reporting.
2. Keep vector RAG for high-throughput, latency-sensitive workloads like customer support, knowledge-base search, and code retrieval where sub-second response times are non-negotiable.
3. Invest in index quality. The tree index is only as good as the source parsing. Use the cloud OCR pipeline (or an equivalent) for complex layouts with tables, figures, and multi-column text. The open-source standard PDF parser is a starting point, not a production answer for messy documents.
4. Monitor LLM cost carefully. Each reasoning step burns tokens. Profile your average steps-to-answer and model the cost at your expected query volume before committing.
5. Evaluate hybrid architectures. A vector pre-filter that narrows candidates to a small document set, followed by PageIndex tree search for deep retrieval within the shortlisted documents, could give you the best of both worlds; a sketch of this pattern follows below.
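Here is a minimal sketch of that hybrid pattern, reusing the illustrative `retrieve` loop from earlier; the `embed` encoder and the corpus layout are assumptions.

```python
import numpy as np

def hybrid_answer(query: str, corpus: dict, embed, llm,
                  shortlist: int = 3) -> str:
    """Stage 1: cheap vector pre-filter over whole-document embeddings.
    Stage 2: reasoning-based tree search (the `retrieve` loop sketched
    earlier) inside each shortlisted document.
    `corpus` maps doc_id -> {"embedding": np.ndarray, "tree": dict,
    "sections": dict}; `embed` is any text -> np.ndarray encoder."""
    q = embed(query)

    def cosine(v: np.ndarray) -> float:
        return float(np.dot(q, v) / (np.linalg.norm(q) * np.linalg.norm(v)))

    ranked = sorted(corpus.values(), key=lambda d: cosine(d["embedding"]),
                    reverse=True)
    findings = [retrieve(query, d["tree"], d["sections"], llm)
                for d in ranked[:shortlist]]
    return llm("Question: " + query +
               "\nFindings per shortlisted document:\n" +
               "\n---\n".join(findings))
```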
The Verdict
PageIndex represents a genuine paradigm shift in how we think about retrieval. By moving from "find text that looks similar" to "reason about where the answer lives," it opens the door to a level of retrieval quality that vector databases structurally cannot achieve. The FinanceBench result is not a fluke; it reflects a fundamentally better retrieval strategy for documents that have logical structure and internal references.
That said, it is not a drop-in replacement for every RAG pipeline. The cost and latency profile is materially different, the scaling story is still maturing, and the approach assumes documents with reasonable structure. Enterprise teams should adopt it surgically: identify the highest-accuracy, document-heavy use cases in their portfolio and run a head-to-head pilot against their existing vector pipeline. The results in those niches are likely to be compelling.
The future of enterprise RAG is probably not one approach. It is an intelligent routing layer that dispatches queries to the retrieval strategy — vector, reasoning-based, or hybrid — best suited to the specific document and question at hand. PageIndex gives us a powerful new option in that toolkit.
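One hedged sketch of what that routing layer could look like; the strategy labels and prompt are assumptions, and a production system would more likely use a cheap classifier over document metadata than a full LLM call per query.

```python
def route(query: str, doc_profile: str, llm) -> str:
    """Hypothetical dispatcher: pick a retrieval strategy per query.
    `doc_profile` is a short description of the target corpus."""
    label = llm(
        "Document profile: " + doc_profile + "\nQuery: " + query + "\n"
        "Reply VECTOR for short, high-volume lookups; TREE for long, "
        "structured, reference-heavy documents; HYBRID when both apply."
    ).strip()
    # Fall back to the cheap path if the classification is unexpected.
    return {"VECTOR": "vector_rag", "TREE": "pageindex",
            "HYBRID": "hybrid"}.get(label, "vector_rag")
```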