Case Study

How a Leading Financial Services Firm Reclaimed
Control of 2.1M Lines of Legacy Code

Implementing LORE — an Agentic AI Reverse Engineering system — to decode undocumented monoliths, build a living knowledge base, and enable safe software enhancements across five technology stacks after nearly all institutional knowledge had been lost.

2.1M
Lines of Legacy Code
5
Technology Stacks
12
Weeks to Production
73%
Faster Developer Onboarding
11
Autonomous AI Agents

The Client

The client is a mid-market financial services company processing over 400,000 transactions daily across loan origination, payment settlement, and regulatory reporting systems. Their technology estate had grown organically over 18 years, accumulating critical business logic in applications that no current engineer fully understood.

Organization Profile

With 1,200 employees and a 45-person engineering team, the organization operates 14 interconnected applications that form the backbone of their financial operations. These systems collectively handle regulatory compliance for three jurisdictions, integrate with 22 external partners, and process an average settlement volume of $1.8 billion per month.

Technology Landscape

Java / J2EE (core platform) • .NET Framework (reporting) • C++ (settlement engine) • PHP (partner portal) • Python (data pipelines)

Infrastructure managed via Ansible and Puppet scripts. Minimal CI/CD automation. Source code in Git with limited branching discipline.

A Perfect Storm of Legacy Risk

The client faced a convergence of compounding risks that threatened their ability to operate, comply with regulations, and evolve their products. Every challenge amplified the others, creating a situation where even minor changes carried disproportionate risk.

01

Vanishing Institutional Knowledge

Over the past four years, 60% of the senior engineers who originally built the core systems had left the organization through attrition and retirement. Critical business logic — particularly in the settlement engine and regulatory modules — existed only in the minds of departed staff. The remaining team operated through trial-and-error when modifying unfamiliar modules.

02

Documentation Bankruptcy

Of the 14 applications, only 3 had any form of documentation, and those documents were 4–7 years out of date. Architecture decision records did not exist. API contracts were informally communicated via Slack messages and tribal knowledge. New hires spent an average of 14 weeks before making their first meaningful code contribution.

03

Change Paralysis

The team had accumulated a backlog of 340+ Jira tickets for enhancements and bug fixes. Engineers feared making changes because they could not reliably predict the blast radius of any modification. A routine configuration change in 2024 caused a 6-hour production outage affecting settlement processing, costing the client $2.3M in penalties and operational losses.

04

Regulatory Pressure

An upcoming regulatory mandate required the client to implement new reporting fields across their transaction processing pipeline within 9 months. Without understanding the full data flow across their interconnected systems, the engineering team estimated 18 months for the change — a gap that risked regulatory non-compliance and potential license revocation.

05

Multi-Stack Complexity

The five different technology stacks, accumulated organically over nearly two decades, meant that no single engineer could work across the full system. Cross-stack integration points were particularly fragile — the Java-to-C++ settlement bridge and the PHP-to-.NET reporting handoff were single points of failure understood by no current team member.

06

No Safety Net

Test coverage averaged 12% across the portfolio. Many applications had zero automated tests. Database schemas had accumulated 340+ tables with undocumented foreign key relationships and 47 stored procedures containing embedded business rules. Manual deployments via shell scripts introduced human error at every release.

LORE: Legacy Ontology & Reverse Engineering

We designed and deployed LORE — an agentic AI platform built on Google Agent Development Kit (ADK) with Gemini models — structured around three pillars that progressively build understanding, knowledge, and operational capability over the legacy estate.

Technology Foundation

Google ADK (Agent Framework) • Gemini 2.5 Pro & Flash • Neo4j 5.x (GraphRAG) • tree-sitter (AST Parsing) • codesteward-graph • MCP (Model Context Protocol) • A2A (Agent-to-Agent Protocol) • Google Cloud Run • Vertex AI Agent Engine • MkDocs & Confluence

System Architecture

Legacy Application Sources
Source Code (5 stacks) • DB Schemas & ERDs • Application Logs • Ansible / Puppet Configs • Swagger / WSDL Specs • Jira Tickets (340+) • Existing Docs (partial)

Pillar 1 — Documentation Generation Engine
CodeScanner Agent • AST Parser (tree-sitter) • DocGenerator Agent • DocSync Agent • Critic Sub-Agent

Pillar 2 — Knowledge Base & Ontology Layer
GraphBuilder Agent • Neo4j Code Ontology • Vector Embeddings • Ontology MCP Server • KBQuery Agent

Pillar 3 — Agentic SDLC
Planner Agent • Coder Agent • TestGen Agent • Reviewer Agent • Human Approval Gate

Outputs
MkDocs Site • In-Repo Markdown • Confluence Pages • MCP Endpoint • Pull Requests
P1

Documentation Generation Engine

Automated comprehension and living documentation

The first pillar automated the most time-consuming aspect of legacy management: understanding what the code actually does. Three specialized agents work in concert to analyze, document, and keep documentation synchronized with ongoing changes.

CodeScanner Agent

Uses tree-sitter to parse every source file across all five language stacks into Abstract Syntax Trees. Extracts classes, methods, call graphs, import chains, and inheritance hierarchies. Ingests database schemas, API specs, Ansible configs, and Jira tickets as supplementary context. Operates incrementally — after the initial full scan, subsequent runs process only changed files via git diff.
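The incremental behavior can be sketched as a filter over `git diff` output: only changed files are routed to the parser for their stack. The extension-to-stack table and function name below are illustrative assumptions, not LORE's actual code.

```python
# Sketch of incremental scan selection: map files changed since the last
# scan to the language stack whose tree-sitter grammar should parse them.
# The extension table and function names are illustrative assumptions.

STACK_BY_EXTENSION = {
    ".java": "java",             # core platform
    ".cs": "dotnet",             # reporting
    ".cpp": "cpp", ".h": "cpp",  # settlement engine
    ".php": "php",               # partner portal
    ".py": "python",             # data pipelines
}

def select_files_for_rescan(changed_paths: list[str]) -> dict[str, list[str]]:
    """Group changed files (e.g. from `git diff --name-only`) by stack."""
    by_stack: dict[str, list[str]] = {}
    for path in changed_paths:
        for ext, stack in STACK_BY_EXTENSION.items():
            if path.endswith(ext):
                by_stack.setdefault(stack, []).append(path)
                break  # first matching extension wins
    return by_stack

changed = ["src/Payment.java", "engine/match.cpp", "README.md"]
print(select_files_for_rescan(changed))
# {'java': ['src/Payment.java'], 'cpp': ['engine/match.cpp']}
```

After the initial full scan, each commit only pushes this filtered subset through the AST pipeline, which is what keeps rescans fast.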

DocGenerator Agent

Produces four tiers of documentation: system-level architecture overviews for managers, module-level dependency maps for architects, method-level logic annotations for developers, and cross-cutting concern analysis (security, error handling, database patterns) for QA. Each tier uses specialized prompt templates with AST-grounded context fed into Gemini 2.5 Pro.
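The tiering could look like the following sketch, where a persona-specific template is combined with AST-grounded context before the model call. The template texts and function name are illustrative placeholders, not LORE's real prompts.

```python
# Sketch of tier-specific prompt selection for a DocGenerator-style agent.
# Template wording is an illustrative assumption.

DOC_TIER_TEMPLATES = {
    "system":   "Summarize the architecture of {module} for engineering managers.",
    "module":   "Describe {module}'s dependencies and interfaces for architects.",
    "method":   "Annotate the logic of each method in {module} for developers.",
    "crosscut": "Analyze security, error handling, and DB patterns in {module} for QA.",
}

def build_doc_prompt(tier: str, module: str, ast_context: str) -> str:
    """Combine a tier template with AST-grounded context for the model."""
    template = DOC_TIER_TEMPLATES[tier]
    return template.format(module=module) + "\n\nAST context:\n" + ast_context

prompt = build_doc_prompt("module", "PaymentProcessor", "class PaymentProcessor ...")
print(prompt.splitlines()[0])
# Describe PaymentProcessor's dependencies and interfaces for architects.
```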

DocSync Agent

Monitors git commits and triggers AST-diff analysis to detect semantic code changes (not just text diffs). Regenerates only affected documentation sections. A Critic sub-agent validates updated docs against the code to prevent drift. Changes are propagated to MkDocs, in-repo Markdown, and Confluence pages automatically, producing a Documentation Change Report for each update cycle.
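The AST-diff idea can be sketched as comparing the sets of declarations extracted from two parses: whitespace and comment changes produce identical sets, while signature changes do not. The `Signature` shape and helper below are illustrative assumptions.

```python
# Sketch of semantic (AST-level) change detection: two commits are
# doc-equivalent if the extracted declarations match, even when
# whitespace or comments differ. Data shapes are illustrative.

from typing import NamedTuple

class Signature(NamedTuple):
    kind: str             # "class" | "method"
    qualified_name: str
    arity: int            # parameter count

def semantically_changed(before: set[Signature], after: set[Signature]) -> set[str]:
    """Names whose declarations were added, removed, or altered."""
    return {sig.qualified_name for sig in before ^ after}

old = {Signature("method", "PaymentProcessor.settle", 2)}
new = {Signature("method", "PaymentProcessor.settle", 3)}  # parameter added
print(semantically_changed(old, new))
# {'PaymentProcessor.settle'}
```

Only documentation sections anchored to the returned names would be regenerated.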

Result: Within 3 weeks, LORE generated comprehensive documentation for all 14 applications — producing 2,847 pages of structured content that would have taken the team an estimated 18 person-months to write manually. Documentation coverage went from 8% to 94%.

P2

Knowledge Base & Ontology Agent

GraphRAG-powered institutional memory exposed via MCP

The second pillar transforms raw code understanding into a queryable, relationship-aware knowledge graph. Built on Neo4j with native vector indexing, the ontology captures not just what the code is, but how it connects, why it exists, and what would break if it changes.

Code Ontology in Neo4j

The GraphBuilder Agent constructs a property graph with 12 node types and 14 relationship types, linking code entities to database tables, API endpoints, configuration entries, and Jira tickets. Gemini enriches each node with business domain classification, design pattern identification, and natural language summaries. Vector embeddings enable semantic similarity search alongside graph traversal.

For the client, the graph contained 184,000+ nodes and 620,000+ relationships, mapping every function call, inheritance chain, database access, and configuration dependency across the entire portfolio.
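Bulk-loading call edges into such a graph typically uses parameterized Cypher with `UNWIND` and `MERGE`; the sketch below shows that shape. The labels, property names, and query text are illustrative — the real schema has 12 node types and 14 relationship types.

```python
# Sketch: batch (caller, callee) pairs into the parameter shape for a
# Cypher UNWIND + MERGE bulk load. Labels and properties are illustrative.

CYPHER_MERGE_CALLS = """
UNWIND $edges AS e
MERGE (a:Function {fqn: e.caller})
MERGE (b:Function {fqn: e.callee})
MERGE (a)-[:CALLS]->(b)
"""

def to_edge_params(call_pairs: list[tuple[str, str]]) -> dict:
    """Shape call pairs into the $edges parameter for the query above."""
    return {"edges": [{"caller": c, "callee": d} for c, d in call_pairs]}

params = to_edge_params([("LoanService.approve", "Ledger.post")])
print(params["edges"][0]["callee"])
# Ledger.post
```

`MERGE` makes repeated loads idempotent, which matters when the same edge is rediscovered on every incremental scan.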

Ontology MCP Server

The knowledge graph is exposed as a Model Context Protocol server, making it accessible to any MCP-compatible client — including Cursor IDE, custom scripts, and the SDLC agents themselves. Nine MCP tools provide capabilities ranging from semantic code search to impact analysis to call graph traversal.
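Conceptually, an MCP server advertises named, described tools that any client can invoke. The simplified registry below illustrates that shape in plain Python; real MCP servers speak the protocol's JSON-RPC schema, and the tool name and stub body here are illustrative.

```python
# Simplified stand-in for an MCP tool registry: each tool is a named,
# described callable the server advertises to clients. This only
# illustrates the shape, not the actual protocol wiring.

from typing import Callable

TOOLS: dict[str, dict] = {}

def tool(name: str, description: str):
    """Register a function as a callable tool with a description."""
    def register(fn: Callable):
        TOOLS[name] = {"description": description, "fn": fn}
        return fn
    return register

@tool("impact_analysis", "List entities affected by changing the given symbol.")
def impact_analysis(symbol: str) -> list[str]:
    # In LORE this would traverse the Neo4j ontology; stubbed here.
    return [f"{symbol}:downstream-caller"]

print(sorted(TOOLS))
# ['impact_analysis']
```

Because every consumer (IDE, CI script, or SDLC agent) calls the same registry, there is exactly one integration surface to maintain.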

The KBQuery Agent adds a conversational layer, enabling engineers to ask questions like “What database tables does the PaymentProcessor write to, and which other modules read from those tables?” — getting precise, relationship-aware answers in seconds instead of hours of manual code tracing.

Result: The impact analysis MCP tool became the team's most-used capability. Before any code change, engineers query the blast radius in under 3 seconds. This single capability eliminated the class of “surprise cascade failures” that had previously caused three production incidents per quarter.
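The blast-radius query is, at heart, a reverse-dependency traversal. The sketch below runs it as a BFS over an in-memory map; in LORE this is a Cypher traversal over Neo4j, and the example graph is an illustrative stand-in.

```python
# Sketch of the blast-radius query as a reverse-dependency BFS.
# The in-memory graph stands in for a Neo4j traversal.

from collections import deque

def blast_radius(depends_on_me: dict[str, list[str]], changed: str) -> set[str]:
    """Everything transitively affected when `changed` is modified."""
    affected: set[str] = set()
    queue = deque([changed])
    while queue:
        node = queue.popleft()
        for dependent in depends_on_me.get(node, []):
            if dependent not in affected:
                affected.add(dependent)
                queue.append(dependent)
    return affected

graph = {
    "Ledger.post": ["SettlementBridge.flush"],
    "SettlementBridge.flush": ["ReportJob.run"],
}
print(sorted(blast_radius(graph, "Ledger.post")))
# ['ReportJob.run', 'SettlementBridge.flush']
```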

P3

Agentic SDLC for Safe Enhancements

AI-driven propose → review → approve workflow

The third pillar enables the client to safely execute enhancements on their legacy applications using an agentic workflow where AI agents plan, implement, test, and review changes while humans retain approval authority at every critical junction.

1. Jira Intake: Planner Agent ingests ticket & queries KB for context
2. Impact Analysis: Maps blast radius across all affected systems
3. Tech Spec: Generates implementation plan with risk assessment
4. Human Review: Engineer reviews & approves the plan
5. Code & Test: Coder + TestGen agents produce changes & tests
6. AI Review: Reviewer Agent checks patterns, security, regressions
7. Human Approve: Final sign-off before merge

Context Engineering

The Coder Agent does not generate code in a vacuum. Before writing a single line, it queries the Knowledge Base to retrieve the target codebase's actual conventions — naming patterns, error handling strategies, logging formats, and architectural patterns. Generated code matches the existing style so closely that in blind reviews, the client's engineers could not distinguish agent-generated code from human-written code 68% of the time.
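The priming step can be sketched as prompt assembly: retrieved conventions are placed ahead of the task so the model imitates this codebase rather than generic best practices. The convention keys and function name below are illustrative assumptions.

```python
# Sketch of context engineering: conventions retrieved from the Knowledge
# Base are prepended to the Coder Agent's task. Keys are illustrative.

def build_coder_prompt(task: str, conventions: dict[str, str]) -> str:
    """Prime the model with this repo's observed conventions."""
    convention_block = "\n".join(
        f"- {topic}: {rule}" for topic, rule in sorted(conventions.items())
    )
    return (
        "Follow the conventions observed in this codebase:\n"
        f"{convention_block}\n\nTask:\n{task}"
    )

conventions = {
    "errors": "wrap checked exceptions in SettlementException",
    "logging": "use the module-level LOGGER with structured key=value pairs",
}
print(build_coder_prompt("Add a retry to Ledger.post", conventions).splitlines()[1])
# - errors: wrap checked exceptions in SettlementException
```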

Inter-Agent Handoff (A2A)

All agents communicate via Google's Agent-to-Agent protocol with full context transfer at each handoff. If the Planner Agent identifies cross-stack impact (e.g., a Java change affecting the C++ settlement engine), the Orchestrator Agent spawns parallel Coder Agent instances for each stack, coordinating through the shared Knowledge Base. Session state is persisted in Vertex AI Session Service, enabling resume-from-checkpoint if any agent encounters an error.
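Resume-from-checkpoint can be sketched as follows: each agent persists its state after finishing a stage, and a restarted run picks up at the first incomplete stage. The field names, stage order, and JSON store below are illustrative; LORE persists this in Vertex AI Session Service.

```python
# Sketch of resumable handoff state. A crashed run restarts from the last
# completed stage instead of from scratch. Field names are illustrative.

import json
from dataclasses import dataclass, asdict

@dataclass
class Checkpoint:
    ticket_id: str
    completed_stages: list[str]
    context: dict

def save(cp: Checkpoint) -> str:
    return json.dumps(asdict(cp))      # stand-in for the session store

def resume(raw: str) -> str:
    cp = Checkpoint(**json.loads(raw))
    order = ["plan", "code", "test", "review"]
    remaining = [s for s in order if s not in cp.completed_stages]
    return remaining[0] if remaining else "done"

raw = save(Checkpoint("FIN-1042", ["plan", "code"], {"stack": "java"}))
print(resume(raw))
# test
```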

Result: In the first 6 weeks of Pillar 3 operation, the agentic SDLC processed 47 enhancement tickets from the backlog. Average time from Jira ticket to merged PR dropped from 23 days to 4.2 days. Zero production incidents resulted from agent-assisted changes.

12-Week Delivery Timeline

LORE was delivered in three phases over 12 weeks, with each phase building on the previous one. A team of 6 engineers executed the implementation alongside the client's existing staff, who were progressively trained on the platform.

Phase 1 — Weeks 1–4
Foundation & Documentation Engine
GCP infrastructure provisioning (Cloud Run, Neo4j Aura, Vertex AI). Tree-sitter parsing pipeline for Java and C++ (highest-priority stacks). CodeScanner, DocGenerator, and DocSync agents built and deployed. First application fully documented by end of Week 4.
Phase 2 — Weeks 5–8
Knowledge Base & Ontology
Neo4j graph schema with 12 node types and 14 relationship types. GraphBuilder agent pipeline with semantic enrichment via Gemini. Ontology MCP server with 9 tools. KBQuery conversational agent. Jira and Confluence integration. All 14 applications indexed.
Phase 3 — Weeks 9–12
Agentic SDLC
Planner, Coder, TestGen, and Reviewer agents. Human approval gate web UI. Context engineering pipeline. End-to-end integration testing. Web dashboard for all personas (developer, architect, manager, QA). First enhancement completed through full pipeline in Week 11.

Team Composition

1 Platform Engineer — GCP infrastructure, Cloud Run, Neo4j Aura setup and ongoing management

2 Agent Engineers — ADK agent development, prompt engineering, A2A protocol implementation

1 Knowledge Engineer — Neo4j schema design, graph construction pipeline, MCP server development

1 Frontend Engineer — Web dashboard, documentation site, human approval gate UI

1 Integration Engineer — Jira, Confluence, Git integration, webhook and polling setup

Key decision: Starting with Java and C++ in Phase 1 allowed the team to validate the architecture on the two most critical stacks (settlement engine + core platform) before expanding to .NET, PHP, and Python. Adding new language support required only tree-sitter grammar configuration — no agent code changes.

Measurable Impact Across the Organization

LORE's impact was measured across four dimensions: developer productivity, operational risk, regulatory readiness, and organizational resilience. Results were tracked for the first 90 days following full deployment.

| Metric | Before LORE | After LORE | Change |
| --- | --- | --- | --- |
| New developer onboarding time | 14 weeks | 3.8 weeks | 73% reduction |
| Documentation coverage | 8% | 94% | 12x increase |
| Avg. time from ticket to merged PR | 23 days | 4.2 days | 82% reduction |
| Production incidents from code changes | 3.2 per quarter | 0 in first quarter | 100% elimination |
| Impact analysis time (per change) | 2–6 hours manual | 3 seconds automated | >99% reduction |
| Backlog tickets processed (first 6 weeks) | ~8 tickets/6 weeks | 47 tickets/6 weeks | 5.9x throughput |
| Cross-stack dependency visibility | None (undocumented) | Full graph coverage | Complete visibility |
| Regulatory field implementation estimate | 18 months (manual) | 4.5 months (with LORE) | 75% acceleration |
“We went from being terrified to touch any code in the settlement engine to having an AI agent that could tell us, in three seconds, exactly what would be affected by any change. LORE didn't just document our systems — it gave us back the institutional knowledge we thought was permanently lost when our senior engineers left. For the first time in years, we're making forward progress on our backlog instead of just keeping the lights on.”
VP of Engineering, Client Organization

What Made LORE Different

Several architectural and methodological decisions distinguished LORE from conventional code analysis tools and set the foundation for its effectiveness.

G

GraphRAG over Pure Vector RAG

By combining Neo4j graph traversal with vector similarity search in a single database, LORE answers relationship-aware questions that pure embedding-based retrieval cannot. “What calls this function?” is a graph query. “Find code similar to this pattern” is a vector query. LORE does both in the same request, producing contextually richer answers.
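Merging the two retrieval modes can be sketched as scoring candidates by vector similarity with a bonus for graph connectivity. The weights and in-memory stores below are illustrative stand-ins for Neo4j's native graph and vector indexes.

```python
# Sketch of hybrid GraphRAG retrieval: rank candidates by vector
# similarity, boosted when they are also graph-connected to the query
# entity. Scoring weights are illustrative assumptions.

def hybrid_search(
    graph_hits: set[str],               # nodes reachable from the query entity
    vector_scores: dict[str, float],    # cosine similarity per node
    graph_bonus: float = 0.5,
) -> list[str]:
    """Rank the union of graph and vector hits by a combined score."""
    candidates = graph_hits | set(vector_scores)
    def score(node: str) -> float:
        bonus = graph_bonus if node in graph_hits else 0.0
        return vector_scores.get(node, 0.0) + bonus
    return sorted(candidates, key=score, reverse=True)

ranked = hybrid_search({"Ledger.post"}, {"Ledger.post": 0.4, "ReportJob.run": 0.7})
print(ranked)
# ['Ledger.post', 'ReportJob.run']
```

A structurally connected but lexically dissimilar node can still outrank a superficially similar one, which is exactly what pure embedding retrieval misses.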

C

Context Engineering, Not Generic Prompts

The Coder Agent retrieves the target codebase's actual conventions from the Knowledge Base before generating code. It is primed with real examples of how this specific codebase handles error flows, naming, logging, and architectural patterns — not generic best practices. This produces code that is stylistically indistinguishable from human-written code in the same project.

A

AST-Grounded Documentation

Every generated documentation statement traces back to a specific AST node in the parsed code. The Critic sub-agent validates this traceability on every generation and update cycle, preventing hallucinated documentation from entering the system. This achieved a verified accuracy rate of 96.3% on a random sample of 500 documentation statements.
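The Critic's check can be sketched as verifying that every documentation statement cites an AST node still present in the current parse; anything that does not is flagged for regeneration. The data shapes below are illustrative.

```python
# Sketch of the traceability check: a doc statement is valid only if the
# AST node it cites still exists. Data shapes are illustrative.

def untraceable_statements(
    statements: list[dict],     # each: {"text": ..., "ast_node": ...}
    live_ast_nodes: set[str],
) -> list[str]:
    """Return doc statements whose cited AST node no longer exists."""
    return [s["text"] for s in statements if s["ast_node"] not in live_ast_nodes]

docs = [
    {"text": "settle() retries twice", "ast_node": "java:PaymentProcessor.settle"},
    {"text": "audit() logs to S3", "ast_node": "java:Auditor.audit"},
]
print(untraceable_statements(docs, {"java:PaymentProcessor.settle"}))
# ['audit() logs to S3']
```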

M

MCP as Universal Integration Layer

Exposing the Knowledge Base as an MCP server meant that the client's developers could query institutional knowledge directly from their Cursor IDE, custom scripts could run impact analysis in CI pipelines, and the SDLC agents themselves used the same MCP tools as human developers. One interface, every consumer.

E

Episodic Memory for Institutional Knowledge

Every enhancement processed through the Agentic SDLC is recorded as a ChangeEvent in the Knowledge Base, preserving what changed, why, and who approved it. This creates a growing institutional memory that compounds over time — the system gets smarter with every change, unlike documentation that degrades.
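An episodic-memory record of this kind can be sketched as an immutable event appended to a log that is never rewritten. The field names below are illustrative assumptions about what a ChangeEvent carries.

```python
# Sketch of an episodic-memory record: each merged change is appended to
# the Knowledge Base as an immutable event. Field names are illustrative.

from dataclasses import dataclass

@dataclass(frozen=True)
class ChangeEvent:
    ticket_id: str
    summary: str                          # what changed
    rationale: str                        # why
    approved_by: str                      # who signed off
    touched_symbols: tuple[str, ...] = ()

def record(log: list[ChangeEvent], event: ChangeEvent) -> list[ChangeEvent]:
    """Append-only: history is never rewritten, so context compounds."""
    return log + [event]

history: list[ChangeEvent] = []
history = record(history, ChangeEvent(
    "FIN-1042", "Add retry to Ledger.post", "intermittent bridge timeouts",
    "senior.engineer", ("Ledger.post",),
))
print(len(history), history[0].approved_by)
# 1 senior.engineer
```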

H

Risk-Calibrated Human Gates

The system automatically assesses the risk level of every proposed change based on the impact analysis scope. High-risk changes (cross-stack, touching settlement logic) require senior engineer approval. Low-risk changes (documentation updates, config tweaks) can be auto-approved via policy. This kept humans focused on decisions that mattered.
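Such a policy can be sketched as a function from impact-analysis facts to an approval level. The thresholds and labels below are illustrative policy assumptions, not the client's actual rules.

```python
# Sketch of risk-calibrated gating: the approval requirement is derived
# from the impact-analysis scope. Thresholds are illustrative assumptions.

def approval_required(blast_radius: int, touches_settlement: bool,
                      cross_stack: bool) -> str:
    """Return who must approve: 'senior', 'any-engineer', or 'auto'."""
    if touches_settlement or cross_stack:
        return "senior"              # high-risk: protected subsystems
    if blast_radius > 10:
        return "any-engineer"        # medium-risk: wide but contained
    return "auto"                    # low-risk: config tweaks, doc updates

print(approval_required(blast_radius=2, touches_settlement=False, cross_stack=False))
# auto
```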

Insights for Other Organizations

This engagement surfaced several insights applicable to any organization managing significant legacy application portfolios.

  1. Start with understanding, not modernization. The impulse to rewrite or refactor legacy code is strong, but premature without deep understanding. LORE's documentation-first approach ensured that every subsequent decision was informed by actual system behavior, not assumptions about what the code should do.
  2. Graph databases unlock questions you didn't know to ask. The client's team discovered 23 undocumented cross-stack integration points during the initial graph construction — dependencies that no team member was aware of. Vector search alone could not have surfaced these structural relationships.
  3. Incremental ingestion is non-negotiable for adoption. Initial full-codebase analysis took 4 hours per application. Without incremental updates (processing only changed files via git diff), the system would have been too slow for daily use. The DocSync agent's AST-diff approach kept ongoing updates under 30 seconds per commit.
  4. Human gates build trust, and trust enables autonomy. By requiring human approval at every decision point initially, the team built confidence in the agents' output quality. After 4 weeks, the client voluntarily enabled auto-approval for low-risk changes — a decision they would never have made without the trust earned through transparent review cycles.
  5. Multi-persona interfaces are essential for organization-wide adoption. Engineers, architects, QA, and managers all interact with the same knowledge base but need fundamentally different views. The KBQuery agent's persona-based responses meant a manager asking about “payment system risk” received a business-level summary, while a developer asking the same question received specific function names and call paths.
  6. MCP creates an extensible future. Three months after deployment, the client's own engineers built two additional MCP tools on top of the ontology — a compliance audit tool and a dead-code detector — without any changes to the core LORE platform. The protocol-based architecture turned the knowledge base into a platform rather than a product.

Roadmap Beyond Initial Deployment

With LORE operational, the client is now pursuing three follow-on initiatives that build on the platform's foundation.

Self-Improving Agent Loop

When human reviewers modify agent-generated code during the approval process, those corrections are fed back into the Knowledge Base as training signal. Over time, the Coder Agent's output increasingly matches the team's preferences without explicit prompt tuning.

Modernization Planning

With full dependency graphs and business domain classification now available, the client is using LORE's impact analysis to identify clean extraction boundaries for a strangler-fig modernization of their core platform — decomposing the monolith into services with confidence in the boundaries.

Compliance Automation

The regulatory reporting fields that triggered the original engagement are now being implemented through the Agentic SDLC pipeline. Projected completion: 4.5 months — well within the 9-month regulatory deadline and 13.5 months ahead of the original manual estimate.

LORE — Legacy Ontology & Reverse Engineering
An Agentic AI platform for reclaiming control of legacy enterprise applications.

Built with Google ADK • Gemini 2.5 • Neo4j GraphRAG • Model Context Protocol • A2A Protocol

Case study published May 2026. Client details presented with permission.
The Client is a representative composite based on real engagement patterns.