Research

Building the context and governance infrastructure that makes agents deployable inside real organizations.

Enterprise AI does not fail because models lack intelligence. It fails because models arrive at an organization blind — no structured account of what was decided, what changed, who owes what to whom — and ungoverned: no principled answer to what they are permitted to do about any of it. Every serious deployment eventually hits both walls. N71's research program exists to remove them. We study how organizational reality gets structured into a substrate machines can read, and how machine action gets governed once they can.

Our Thesis: Context Is Infrastructure, Not Retrieval

The industry treats organizational context as a search problem: embed everything, retrieve what's similar, hope the model sorts it out. We believe context is an infrastructure problem — closer to a database than a search engine. Meaning must be resolved when information arrives, not guessed when a question is asked. Provenance must be preserved, not discarded at ingestion. Time must be a property of every fact, not a sort order. And permission must travel with the context itself, because an agent that can read everything and is allowed to do anything is not a product — it's an incident report waiting to be written.

These positions have architectural consequences. What follows is the system they produced.

Act I

Why Retrieval Fails as Memory

Similarity search returns what sounds like the question, not what is true in the organization. The failure modes are now well documented across the field: semantically close facts interfere with one another; stale facts sit beside current ones with equal weight; and the effective dimensionality of production embedding spaces is a fraction of their nominal size, so the collisions are worse than the spec sheet suggests. Recent impossibility results formalize what practitioners already knew — within a purely similarity-based memory, interference and false recall cannot be engineered away. Escaping requires explicit symbolic structure on top of the semantic layer.
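Here is a toy illustration of the equal-weight failure. It uses hand-picked three-dimensional vectors in place of a real embedding model. Two facts about the same subject embed identically, so similarity ranking alone cannot prefer the current one. A symbolic key that records which fact is newest can. The `Fact` type, its `subject` key, and the newest-wins rule are hypothetical stand-ins for real structure.

```python
import math
from dataclasses import dataclass

@dataclass
class Fact:
    subject: str          # hypothetical symbolic key, e.g. "acme.renewal_owner"
    text: str
    vector: list[float]   # toy 3-d "embedding", hand-picked for illustration
    observed_day: int     # when the fact was recorded

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def similarity_retrieve(query: list[float], facts: list[Fact]) -> list[Fact]:
    # Ranks purely by similarity: recency plays no role, so ties fall back to storage order
    return sorted(facts, key=lambda f: cosine(query, f.vector), reverse=True)

def current_only(facts: list[Fact]) -> list[Fact]:
    # Symbolic layer: for each subject, keep only the most recently observed fact
    latest: dict[str, Fact] = {}
    for f in facts:
        if f.subject not in latest or f.observed_day > latest[f.subject].observed_day:
            latest[f.subject] = f
    return list(latest.values())

facts = [
    Fact("acme.renewal_owner", "Acme renewal owner is Dana", [0.9, 0.1, 0.0], observed_day=10),
    Fact("acme.renewal_owner", "Acme renewal owner is Lee", [0.9, 0.1, 0.0], observed_day=40),
]
query = [1.0, 0.0, 0.0]  # "Who owns the Acme renewal?"

print(similarity_retrieve(query, facts)[0].text)                # → Acme renewal owner is Dana (stale)
print(similarity_retrieve(query, current_only(facts))[0].text)  # → Acme renewal owner is Lee
```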

We agree with the diagnosis. We disagree that the conclusion is novel: it is the reason N71 was built as a knowledge graph from the first commit. The question that actually matters is not whether to add structure — it's what the structure is made of, how it stays current as the organization changes, and whether you can trust what comes out of it. Acts II through IV are our answers.

Act II

The Evidence-Anchored Organizational Graph

N71's substrate is a five-layer stack in which every layer is derived from — and permanently linked back to — the layer below it. Nothing in the graph floats free of its source.

Layer 1 — Raw events.

Every meeting, email thread, message, document, calendar event, and code commit enters as an immutable raw event with source, actor, and timestamp. This layer is never edited. It is the organization's ground truth.
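A minimal sketch of an append-only raw-event log, assuming a content hash as the event identifier. `RawEvent` and `RawEventLog` are hypothetical names, not N71's schema. The point is that events are frozen when they are constructed, and the store offers no update or delete path.

```python
import hashlib
from dataclasses import dataclass, FrozenInstanceError
from datetime import datetime, timezone

@dataclass(frozen=True)
class RawEvent:
    source: str          # e.g. "email", "calendar", "git"
    actor: str           # who produced the event
    timestamp: datetime  # when it happened (timezone-aware)
    payload: str         # original content, stored verbatim

    @property
    def event_id(self) -> str:
        # Content-derived id: an identical event always hashes to the same id
        key = f"{self.source}|{self.actor}|{self.timestamp.isoformat()}|{self.payload}"
        return hashlib.sha256(key.encode("utf-8")).hexdigest()

class RawEventLog:
    """Append-only store: events can be added and read, never edited or removed."""

    def __init__(self) -> None:
        self._events: dict[str, RawEvent] = {}

    def append(self, event: RawEvent) -> str:
        eid = event.event_id
        self._events.setdefault(eid, event)  # re-ingesting the same event is a no-op
        return eid

    def get(self, eid: str) -> RawEvent:
        return self._events[eid]

    def __len__(self) -> int:
        return len(self._events)

log = RawEventLog()
ev = RawEvent("email", "dana@example.com",
              datetime(2026, 6, 1, 9, 30, tzinfo=timezone.utc),
              "Moving the Acme renewal to Q3.")
eid = log.append(ev)
log.append(ev)              # duplicate delivery does not create a second record
try:
    ev.payload = "edited"   # frozen dataclass: raises FrozenInstanceError
except FrozenInstanceError:
    pass
```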

Layer 2 — Evidence.

Raw events are parsed into evidence documents and paragraph-level evidence chunks — the addressable, citable unit of the entire system. Every claim N71 ever makes resolves to a chunk, and every chunk resolves to the exact passage in the exact source at the exact moment it was captured.
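One way to make a chunk resolve to an exact passage is to store character offsets into the raw payload instead of a copy of the text. This sketch assumes blank-line paragraph splitting and an `event#pN` id format. Both are hypothetical, and N71's actual chunker is not described here.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class EvidenceChunk:
    chunk_id: str
    event_id: str   # the raw event this chunk was cut from
    start: int      # character offset into the raw payload (inclusive)
    end: int        # character offset into the raw payload (exclusive)

def chunk_paragraphs(event_id: str, payload: str) -> list[EvidenceChunk]:
    """Split a payload on blank lines, recording exact offsets for each paragraph."""
    chunks = []
    pos = 0  # offset where the current paragraph begins in the payload
    for i, para in enumerate(payload.split("\n\n")):
        text = para.strip()
        if text:
            start = pos + (len(para) - len(para.lstrip()))  # skip leading whitespace
            chunks.append(EvidenceChunk(f"{event_id}#p{i}", event_id, start, start + len(text)))
        pos += len(para) + 2  # move past this paragraph and the "\n\n" separator
    return chunks

def resolve(chunk: EvidenceChunk, payloads: dict[str, str]) -> str:
    """Return the verbatim passage a chunk cites."""
    return payloads[chunk.event_id][chunk.start:chunk.end]

payload = "Renewal moves to Q3.\n\n  Dana owns the follow-up."
chunks = chunk_paragraphs("ev1", payload)
print([resolve(c, {"ev1": payload}) for c in chunks])
# → ['Renewal moves to Q3.', 'Dana owns the follow-up.']
```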

Layer 3 — The graph.

An LLM extraction pipeline (GraphAwareExtractor, running inside the DocumentIngestionService) lifts typed entities and explicit relationships out of the evidence: people, accounts, products, decisions, commitments, and the edges between them — works_on, part_of, authored_by, blocked_by. Extraction is graph-aware: new information is resolved against existing entities rather than duplicated, with confidence scores attached at write time. The ontology is constructed per organization; N71 learns your nouns rather than forcing you into ours.
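The page keeps resolution scoring undisclosed. This sketch therefore substitutes the simplest possible matcher, which is case- and whitespace-insensitive name equality within an entity type. Its purpose is to show the resolve-before-create shape of graph-aware extraction. `OrgGraph`, `resolve_or_create`, and the max-confidence merge rule are hypothetical.

```python
from dataclasses import dataclass, field

def normalize(name: str) -> str:
    # Stand-in for real identity resolution: case- and whitespace-insensitive match
    return " ".join(name.lower().split())

@dataclass
class Entity:
    entity_id: int
    kind: str
    name: str
    aliases: set[str] = field(default_factory=set)
    confidence: float = 0.0

class OrgGraph:
    def __init__(self) -> None:
        self.entities: dict[int, Entity] = {}
        self._index: dict[tuple[str, str], int] = {}        # (kind, normalized name) -> id
        self.edges: list[tuple[int, str, int, float]] = []  # (src, relation, dst, confidence)

    def resolve_or_create(self, kind: str, name: str, confidence: float) -> Entity:
        key = (kind, normalize(name))
        if key in self._index:
            # Resolve against the existing entity instead of creating a duplicate
            ent = self.entities[self._index[key]]
            ent.aliases.add(name)
            ent.confidence = max(ent.confidence, confidence)
            return ent
        ent = Entity(len(self.entities), kind, name, {name}, confidence)
        self.entities[ent.entity_id] = ent
        self._index[key] = ent.entity_id
        return ent

    def add_edge(self, src: Entity, relation: str, dst: Entity, confidence: float) -> None:
        self.edges.append((src.entity_id, relation, dst.entity_id, confidence))

g = OrgGraph()
dana = g.resolve_or_create("person", "Dana Ruiz", 0.8)    # from a meeting note
same = g.resolve_or_create("person", "dana  ruiz", 0.9)   # from an email, different casing
acme = g.resolve_or_create("account", "Acme Corp", 0.95)
g.add_edge(dana, "works_on", acme, 0.85)
```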

Layer 4 — Temporal state.

Every entity carries a temporal profile: when it was first and last observed, its lifecycle stage, and its momentum (whether it is accelerating, steady, or going quiet). Superseded facts are not deleted; they are succeeded, with history intact. An account that has gone silent for three weeks is a different object from an active one, and the graph knows the difference before anyone asks.
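A sketch of a temporal profile and of succession in place of deletion. The 21-day quiet threshold mirrors the three-week example in the paragraph above. The mention windows, the exact momentum rules, and all names are assumptions for illustration.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

@dataclass
class TemporalProfile:
    first_seen: date
    last_seen: date
    recent_mentions: int   # observations in the latest window (window size is an assumption)
    prior_mentions: int    # observations in the window before that

def momentum(p: TemporalProfile, today: date, quiet_after_days: int = 21) -> str:
    if (today - p.last_seen).days >= quiet_after_days:
        return "going quiet"
    if p.recent_mentions > p.prior_mentions:
        return "accelerating"
    return "steady"  # in this toy version a slowdown short of silence also reads as steady

@dataclass
class FactVersion:
    value: str
    valid_from: date
    valid_to: Optional[date] = None   # None means this version is still current

def succeed(history: list[FactVersion], new_value: str, on: date) -> None:
    """Close the current version and append its successor; nothing is deleted."""
    if history and history[-1].valid_to is None:
        history[-1].valid_to = on
    history.append(FactVersion(new_value, on))

def value_as_of(history: list[FactVersion], when: date) -> Optional[str]:
    for v in history:
        if v.valid_from <= when and (v.valid_to is None or when < v.valid_to):
            return v.value
    return None

acme = TemporalProfile(date(2026, 3, 1), date(2026, 5, 20), recent_mentions=0, prior_mentions=6)
print(momentum(acme, today=date(2026, 6, 10)))   # → going quiet (21 days of silence)

owner: list[FactVersion] = []
succeed(owner, "Dana", date(2026, 1, 5))
succeed(owner, "Lee", date(2026, 4, 1))
```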

Layer 5 — Synthesis.

A continuous process reads across the graph and produces Thoughts: proactive intelligence objects — risks, drifting commitments, surfaced opportunities, briefings — each carrying a synthesis chain that links every assertion back through the graph to the evidence chunks that justify it. The brain does not wait to be queried.

The structural consequence: N71 is not a search index with a chat interface. It is a layered system of record in which the answer to "why does the system believe this?" is always a finite, inspectable path ending in a verbatim source.
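The claim that every belief reduces to a finite path ending in a source can be pictured as a walk down the layers: Thought, graph edge, evidence chunk, raw event. The layer prefixes and the `explain` helper are hypothetical. The sketch assumes links only ever point to a lower layer, which keeps the walk finite.

```python
# Layer prefixes ("thought:", "edge:", "chunk:", "event:") are a hypothetical naming convention
DERIVED_FROM: dict[str, list[str]] = {
    "thought:acme-renewal-risk": ["edge:acme-blocked_by-legal"],
    "edge:acme-blocked_by-legal": ["chunk:ev7#p1", "chunk:ev9#p0"],
    "chunk:ev7#p1": ["event:ev7"],
    "chunk:ev9#p0": ["event:ev9"],
}

def explain(node: str, links: dict[str, list[str]]) -> list[list[str]]:
    """Enumerate every provenance path from a node down to the raw events behind it.

    Assumes links only point to lower layers, so the graph is acyclic and recursion terminates.
    """
    parents = links.get(node, [])
    if not parents:
        if not node.startswith("event:"):
            raise ValueError(f"{node} is not anchored to any raw event")
        return [[node]]
    return [[node] + tail for parent in parents for tail in explain(parent, links)]

for path in explain("thought:acme-renewal-risk", DERIVED_FROM):
    print(" -> ".join(path))
```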

Act III

Grounded Answers: Refusal Over Fabrication

A memory system's most dangerous output is a confident answer to a question its corpus cannot support. Most systems fail open — they generate something plausible. N71 fails closed.

Every answer passes through a three-stage protocol.

Retrieve: candidate evidence is gathered through parallel channels (graph traversal, semantic similarity, lexical match, and a freshness pass that surfaces material too recent to be fully indexed), each channel covering the others' blind spots.

Synthesize: an answer is composed strictly from retrieved evidence, with inline citations to specific chunks.

Validate: every citation is checked against its anchored source; claims that cannot be verified against the evidence are removed from the answer before it is returned, and the validation record ships with the response.
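A sketch of the retrieve and validate stages, using toy channel functions and treating verbatim quote containment as the citation check. Synthesis is an LLM call, so a hand-written list of draft claims replaces it here. `retrieve`, `validate`, and `Claim` are hypothetical names; N71's real validator is not specified on this page.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable

Channel = Callable[[str], list[tuple[str, str]]]   # query -> [(chunk_id, chunk_text)]

@dataclass
class Claim:
    text: str
    cited_chunk: str
    quote: str   # span of the cited chunk the claim relies on

def retrieve(query: str, channels: list[Channel]) -> dict[str, str]:
    """Run all channels in parallel and merge their hits, de-duplicated by chunk id."""
    merged: dict[str, str] = {}
    with ThreadPoolExecutor(max_workers=max(1, len(channels))) as pool:
        for hits in pool.map(lambda ch: ch(query), channels):
            for chunk_id, text in hits:
                merged.setdefault(chunk_id, text)
    return merged

def validate(claims: list[Claim], evidence: dict[str, str]) -> tuple[list[Claim], dict]:
    """Keep only claims whose quote appears verbatim in a retrieved chunk; report the rest."""
    kept, removed = [], []
    for c in claims:
        chunk = evidence.get(c.cited_chunk)
        (kept if chunk is not None and c.quote in chunk else removed).append(c)
    return kept, {"checked": len(claims), "removed": [c.text for c in removed]}

def graph_channel(q: str) -> list[tuple[str, str]]:
    return [("c1", "Acme renewal is blocked by legal review.")]

def lexical_channel(q: str) -> list[tuple[str, str]]:
    return [("c1", "Acme renewal is blocked by legal review."), ("c2", "Dana owns the follow-up.")]

evidence = retrieve("Why is the Acme renewal stuck?", [graph_channel, lexical_channel])

# Synthesis (an LLM call) is out of scope; these drafted claims stand in for its output
draft = [
    Claim("The renewal is blocked by legal.", "c1", "blocked by legal review"),
    Claim("Dana owns the follow-up.", "c2", "Dana owns the follow-up"),
    Claim("It will close Friday.", "c2", "close Friday"),
]
answer, record = validate(draft, evidence)
```

The unsupported third claim is dropped from the answer, and the validation record names it, so the caller can see what was removed.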

And when the corpus cannot support an answer at all, N71 refuses — explicitly, with a machine-readable refusal reason — rather than restating stale facts in a confident voice. In enterprise settings, knowing what the system doesn't know is worth as much as what it does. We treat the refusal path as a first-class product surface, not an error state.
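A refusal only works as a product surface if callers can branch on it. This sketch shows an answer type in which the refusal reason is a structured field, not prose. The three refusal codes and the 90-day staleness cutoff are hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional

class RefusalReason(str, Enum):
    NO_EVIDENCE = "no_evidence"
    EVIDENCE_STALE = "evidence_stale"
    CLAIMS_UNVERIFIED = "claims_unverified"

@dataclass
class AnswerResponse:
    answer: Optional[str]
    citations: list[str] = field(default_factory=list)
    refusal: Optional[RefusalReason] = None   # set only when the system declines to answer

def respond(verified: list[tuple[str, str]], evidence_count: int,
            newest_evidence_age_days: int, max_age_days: int = 90) -> AnswerResponse:
    """Fail closed: return an answer only when verified, sufficiently fresh evidence exists."""
    if evidence_count == 0:
        return AnswerResponse(None, refusal=RefusalReason.NO_EVIDENCE)
    if newest_evidence_age_days > max_age_days:
        return AnswerResponse(None, refusal=RefusalReason.EVIDENCE_STALE)
    if not verified:
        return AnswerResponse(None, refusal=RefusalReason.CLAIMS_UNVERIFIED)
    return AnswerResponse(" ".join(text for text, _ in verified),
                          [chunk for _, chunk in verified])
```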

Act IV

Governance: The Layer the Memory Industry Skips

Context infrastructure that can read an organization will, inevitably, be asked to act on it. This is where memory products end and where N71's second research thread begins.

Every write and every action available to an agent through N71 passes through a governance layer with exactly three outcomes: executed, approval required, or blocked. The classification is not a global setting — it is computed from who is asking, through which agent, in what scope, against which resources. Permission travels with the context. An agent reading the graph inherits not just the organization's knowledge but its authority structure.
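A sketch of a three-outcome classifier. The outcome is computed from who is asking, through which agent, in what scope, and against which resource. Default-deny and most-restrictive-rule-wins are assumptions chosen for the example, as are the rule format and all names. The page does not specify N71's policy language.

```python
from dataclasses import dataclass
from enum import Enum

class Outcome(Enum):
    EXECUTED = "executed"
    APPROVAL_REQUIRED = "approval_required"
    BLOCKED = "blocked"

SEVERITY = {Outcome.EXECUTED: 0, Outcome.APPROVAL_REQUIRED: 1, Outcome.BLOCKED: 2}

@dataclass(frozen=True)
class ActionRequest:
    principal: str   # the person on whose behalf the agent acts
    agent: str       # which agent is asking
    scope: str       # e.g. "read" or "write"
    resource: str    # e.g. "crm/acme"

@dataclass(frozen=True)
class Rule:
    principal: str        # "*" matches anyone
    agent: str            # "*" matches any agent
    scope: str            # "*" matches any scope
    resource_prefix: str
    outcome: Outcome

def _matches(pattern: str, value: str) -> bool:
    return pattern == "*" or pattern == value

def classify(req: ActionRequest, rules: list[Rule]) -> Outcome:
    hits = [r.outcome for r in rules
            if _matches(r.principal, req.principal)
            and _matches(r.agent, req.agent)
            and _matches(r.scope, req.scope)
            and req.resource.startswith(r.resource_prefix)]
    if not hits:
        return Outcome.BLOCKED                   # default deny
    return max(hits, key=SEVERITY.__getitem__)   # most restrictive matching rule wins

RULES = [
    Rule("*", "*", "read", "crm/", Outcome.EXECUTED),
    Rule("*", "*", "write", "crm/", Outcome.APPROVAL_REQUIRED),
    Rule("*", "coding-agent", "write", "crm/contracts/", Outcome.BLOCKED),
]
```

The same person gets different outcomes depending on the agent and resource. A write to `crm/acme` through an assistant needs approval, while the same write to `crm/contracts/` through a coding agent is blocked.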

This architecture was not designed in the abstract. It was forged under the constraints of running in production inside a regulated government environment on sovereign cloud infrastructure, including air-gapped deployment tiers where no data, no telemetry, and no model traffic leaves the perimeter. Most systems treat the hardest deployment conditions as a roadmap item. They were our starting point, and the governance layer is the reason the deployment exists.

Act V

From Memory to Work: The Two-Plane Architecture

Context infrastructure has to serve two kinds of consumers, and they need opposite things. So N71 runs two planes.

The pull plane — MCP. Any agent — Claude, Codex, open agents, your own — connects to N71 over the Model Context Protocol and reads the brain through a small set of composable primitives: navigate for the map, drill for depth, anchor for verbatim sources, answer for validated, citation-checked responses. What one agent learns, every agent can read — under the same governance, with the same evidence trail.

The push plane — the N71 daemon. Reading is half the loop. N71 also dispatches. A supervised daemon runs at the edge — crash-safe under launchd and systemd, lease-based work claims so no task is ever executed twice, heartbeats so the brain always knows the state of every agent it has working — and orchestrates fleets of coding agents (Claude Code, Codex, and compatible open agents) against work the brain itself has proposed. Multi-agent runs that would take a human operator a day to supervise execute under the daemon's supervision, report results back through the same evidence layer, and become part of the graph. Work doesn't just get done; it gets remembered, with receipts.
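A minimal sketch of lease-based claims with heartbeats. An injectable clock lets the expiry logic be exercised deterministically. `LeaseQueue` and its methods are hypothetical. As written, the sketch guarantees only two things: at most one worker holds a live lease on a task at any time, and a completion from a worker whose lease has lapsed is rejected. A crashed worker's task becomes claimable again once its lease expires.

```python
import threading
import time
from dataclasses import dataclass
from typing import Callable

@dataclass
class Lease:
    worker: str
    expires_at: float

class LeaseQueue:
    """Tasks are claimed under time-limited leases that must be renewed by heartbeat."""

    def __init__(self, lease_seconds: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.lease_seconds = lease_seconds
        self.clock = clock
        self._leases: dict[str, Lease] = {}
        self._done: set[str] = set()
        self._lock = threading.Lock()

    def claim(self, task_id: str, worker: str) -> bool:
        with self._lock:
            now = self.clock()
            if task_id in self._done:
                return False          # already completed
            lease = self._leases.get(task_id)
            if lease and lease.expires_at > now and lease.worker != worker:
                return False          # another worker holds a live lease
            self._leases[task_id] = Lease(worker, now + self.lease_seconds)
            return True

    def heartbeat(self, task_id: str, worker: str) -> bool:
        with self._lock:
            now = self.clock()
            lease = self._leases.get(task_id)
            if not lease or lease.worker != worker or lease.expires_at <= now:
                return False          # lease lost: the worker should stop
            lease.expires_at = now + self.lease_seconds
            return True

    def complete(self, task_id: str, worker: str) -> bool:
        with self._lock:
            lease = self._leases.get(task_id)
            if not lease or lease.worker != worker or lease.expires_at <= self.clock():
                return False          # stale holder: its result is rejected
            del self._leases[task_id]
            self._done.add(task_id)
            return True
```

With a list-backed fake clock passed as `clock`, a second worker's claim fails while the first keeps heartbeating. It succeeds once the first worker's lease lapses.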

Riding on the two planes is the loop — N71's current research frontier: an Observer that watches how the organization actually works — the patterns in its events, not the org chart's theory of them; a Proposer that drafts concrete work grounded in graph evidence; an Executor that runs first in sandbox, then live, strictly inside the governance layer; and an Improver that feeds outcomes back into the graph, so the system's model of the organization includes the results of its own actions.
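The loop can be sketched as a single pass over observed patterns, with each stage passed in as a function so the control flow is visible. The stage names come from the text. The string decisions, `RunRecord`, and the rule that a sandbox run precedes any live step (including approval-gated ones) are assumptions.

```python
from dataclasses import dataclass
from typing import Callable, Iterable

@dataclass
class Proposal:
    title: str
    evidence: list[str]   # ids of the chunks the proposal is grounded in

@dataclass
class RunRecord:
    proposal: Proposal
    stage: str            # "blocked", "sandbox_failed", "awaiting_approval", or "live"

def run_loop(events: Iterable[str],
             observe: Callable[[Iterable[str]], list[str]],
             propose: Callable[[str], Proposal],
             govern: Callable[[Proposal], str],
             sandbox: Callable[[Proposal], bool],
             go_live: Callable[[Proposal], None],
             graph: list[RunRecord]) -> list[RunRecord]:
    for pattern in observe(events):                 # Observer
        proposal = propose(pattern)                 # Proposer
        if not proposal.evidence:
            continue                                # ungrounded drafts are dropped
        decision = govern(proposal)                 # "executed", "approval_required", or "blocked"
        if decision == "blocked":
            stage = "blocked"
        elif not sandbox(proposal):                 # Executor, sandbox first
            stage = "sandbox_failed"
        elif decision == "approval_required":
            stage = "awaiting_approval"
        else:
            go_live(proposal)                       # Executor, live
            stage = "live"
        graph.append(RunRecord(proposal, stage))    # Improver: the outcome becomes part of the graph
    return graph
```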

The endpoint is not a chatbot over documents. It is an intelligence layer that watches how a company works and deploys agents to do it better — with every action evidenced, permissioned, and auditable after the fact.

Evaluation & Technical Reports

We measure before we claim. Current program:

MEME benchmark evaluation: first results, June 12–13, 2026. MEME (KAIST AI · Tübingen · NAVER, 2026) is the first benchmark to isolate the two memory tasks that matter most in evolving organizations. Cascade asks whether dependent facts update when an upstream fact changes. Absence asks whether the system knows when it doesn't know, instead of confidently restating stale facts. Field averages on these two tasks are 0.03 (Cascade) and 0.01 (Absence). In our June 12, 2026 run of the full 100-episode suite, using the benchmark's published episodes and judges through our production pipeline, N71 (iteration 1) scores 0.512 overall. That is the highest of any memory system in the study and within 0.04 of the frontier-model reference at ≈70× baseline cost, with Cascade at 0.55 and Absence at 0.35. We publish the losses too, and the trades. Iteration 2, run on June 13, shipped the fix for our weakest task and more than doubled Aggregation (0.12 → 0.29), but it regressed Cascade and Deletion through a field-drift fault now under repair. It is published in full, including the regression. Tracking trails the best file agent, and the propagation–refusal trade behind Absence is analyzed in TR-2026-01 §5. All 1,188 questions from each iteration can be downloaded for validation on the benchmarks page.

The Evidence-Anchored Organizational Graph — technical report, published. Full specification of the five-layer substrate: extraction, identity resolution, temporal lifecycle, and the citation-validation protocol.

Governed Agent Action — technical report, published. The three-state governance model and its deployment under sovereign and air-gapped constraints.

Ontology Induction: Learning the Organization's Nouns — technical report, published. The seed-and-extension model, the type-adoption contract, the noise-control mechanisms that make open-ended induction safe, and an evaluation methodology for ontology quality.

A note on disclosure: our reports specify contracts, protocols, and measured results — the things that make a claim falsifiable. Extraction internals, resolution scoring, and ranking parameters remain ours. We publish what we'd want to be held to, not what we'd hand a copycat.

N71 is context infrastructure for the agent era. The graph remembers. The evidence proves it. The governance layer decides what happens next.