Six tenets that shape every system the lab builds. They are not slogans. Each names a constraint that, if violated, returns the system to a state we have already learned does not work.
Reasoning runs on hardware the operator owns. The cloud is for distribution, not cognition. A system whose intelligence depends on a remote endpoint is a system whose intelligence can be revoked, rate-limited, mispriced, or quietly altered between calls. None of those failure modes are recoverable inside a long-running loop. We build for the case where the laptop is the only thing the operator can trust.
The constraint is not anti-cloud. It is anti-dependence. Models, weights, embeddings, vector stores, traces — all of it lives on the operator's disk. The work product is the operator's. The work itself is observable. When the laptop closes, the system is gone; when it opens, the system returns exactly as it was. There is no other contract that survives a decade.
Every agent has a declared scope. What it reads. What it writes. What it must not touch. Discretion lives in the model; authority lives in the contract. An agent that is not bounded is not an agent — it is a process whose responsibilities are guessed at by the operator and renegotiated each time it fails. That is a system that cannot be reasoned about.
Bounded does not mean small. It means declared. The note-taker may read sources, write cards, and emit a span. It may not edit the spec, modify the manuscript, or mutate the calibration. If it tries, the orchestrator stops it. The contract is enforced not by good intentions but by a typed boundary the agent cannot cross.
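A minimal sketch of such a declared boundary, under stated assumptions: the names `Contract`, `ContractViolation`, and `NOTE_TAKER` are hypothetical, resources are plain strings, and emitting a span is treated as one more declared write. Anything not declared is denied by default, so the spec, the manuscript, and the calibration are out of reach without being listed.

```python
from dataclasses import dataclass


class ContractViolation(Exception):
    """Raised when an agent attempts an action outside its declared scope."""


@dataclass(frozen=True)
class Contract:
    # Hypothetical shape: the source only says scope is declared, not how.
    agent: str
    reads: frozenset[str]
    writes: frozenset[str]

    def check(self, action: str, resource: str) -> None:
        """Allow only declared reads and writes; deny everything else."""
        if action == "read":
            allowed = self.reads
        elif action == "write":
            allowed = self.writes
        else:
            allowed = frozenset()
        if resource not in allowed:
            raise ContractViolation(f"{self.agent} may not {action} {resource}")


# The note-taker from the text: reads sources, writes cards, emits a span.
NOTE_TAKER = Contract(
    agent="note-taker",
    reads=frozenset({"sources"}),
    writes=frozenset({"cards", "span"}),
)
```

In this sketch the orchestrator would call `NOTE_TAKER.check("write", "manuscript")` before performing the write on the agent's behalf; the call raises `ContractViolation`, and the write never happens.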
Every output is a span event. A system that cannot show its work cannot be debugged, and a system that cannot be debugged cannot improve. Traceability is the difference between a loop that learns and a loop that hopes.
The trace is not a log. It is a structured record: stage name, milliseconds elapsed, contract status, decisions, signals, and the typed inputs and outputs of every reasoning step. When a paper produced by Sovereign drifts, the operator does not have to read the paper to find out which stage drifted. They read the trace. The drift is named.
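One way to picture that record, as a sketch rather than Sovereign's actual schema: the field names follow the list above, while the class name `SpanEvent`, the stage names, the `"drift"` signal, and the helper `first_flagged_stage` are assumptions made for illustration.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Any, Optional


@dataclass
class SpanEvent:
    stage: str
    elapsed_ms: float
    contract_status: str  # hypothetical values, e.g. "pass" or "violated"
    decisions: list[str] = field(default_factory=list)
    signals: dict[str, float] = field(default_factory=dict)
    inputs: dict[str, Any] = field(default_factory=dict)
    outputs: dict[str, Any] = field(default_factory=dict)


def first_flagged_stage(
    trace: list[SpanEvent], signal: str, limit: float
) -> Optional[str]:
    """Return the first stage whose named signal exceeds the limit, if any."""
    for span in trace:
        if span.signals.get(signal, 0.0) > limit:
            return span.stage
    return None


# Illustrative trace with made-up stages and values.
trace = [
    SpanEvent("note-taking", 812.0, "pass", signals={"drift": 0.02}),
    SpanEvent("drafting", 4310.5, "pass", signals={"drift": 0.41}),
]

# Spans are plain data, so they serialize to JSON lines on the operator's disk.
serialized = [json.dumps(asdict(span)) for span in trace]
```

Here `first_flagged_stage(trace, "drift", 0.25)` names the drafting stage without anyone opening the paper.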
The trace is the paper's birth certificate. — On the loop
Quality is measured against an external reference set, not declared by the model that produced it. A system that grades its own output without an anchor will produce confident grades for output that does not deserve them — every time, in every domain, on every benchmark we have ever inspected. The fix is not better grading prompts. The fix is an external reference.
Sovereign's calibrator scores each manuscript against two thousand four hundred and one published papers from International Studies Quarterly and the American Political Science Review. The score is not a probability the model emits; it is a distance to the reference distribution. If the manuscript falls outside the distribution on an axis we care about, that is the audit's signal, not a number the model invented.
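The text does not say which distance the calibrator uses. As one illustrative possibility, the sketch below measures a single axis as a standardized distance from the reference mean; the function names, the threshold of 2.0, and the sample values are all assumptions.

```python
from statistics import mean, stdev


def axis_distance(value: float, reference: list[float]) -> float:
    """Distance of one manuscript value from the reference set on one axis,
    in units of the reference set's sample standard deviation."""
    sigma = stdev(reference)
    if sigma == 0:
        raise ValueError("reference set has no spread on this axis")
    return abs(value - mean(reference)) / sigma


def outside_distribution(
    value: float, reference: list[float], max_distance: float = 2.0
) -> bool:
    """Flag the axis when the manuscript sits too far from the reference."""
    return axis_distance(value, reference) > max_distance


# Made-up reference values for a single hypothetical axis.
reference_axis = [10.0, 12.0, 14.0]
```

The point of the design is that the number comes from the reference set, not from the model: the same manuscript value always yields the same distance, whatever the model believes about its own output.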
Self-auditing also means refusing to finish on the model's say-so. The loop does not stop when the model says it is done. The loop stops when seventy methodology rules pass, eighty adversarial attacks fail to find purchase, and ten calibration axes clear their thresholds. Until then, the loop runs again.
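The stopping rule can be sketched as a predicate over audit counts. The class `AuditResult`, the function `run_until_audited`, and the `max_runs` safety cap are assumptions added for illustration; the text itself describes no cap.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class AuditResult:
    rules_passed: int
    rules_total: int        # seventy in the text
    attacks_succeeded: int  # out of eighty attempted in the text
    axes_cleared: int
    axes_total: int         # ten in the text

    def done(self) -> bool:
        """All rules pass, no attack lands, every axis clears its threshold."""
        return (
            self.rules_passed == self.rules_total
            and self.attacks_succeeded == 0
            and self.axes_cleared == self.axes_total
        )


def run_until_audited(
    run_once: Callable[[], AuditResult], max_runs: int = 100
) -> int:
    """Repeat the loop until the audit passes; return how many runs it took."""
    for run in range(1, max_runs + 1):
        if run_once().done():
            return run
    raise RuntimeError("audit did not pass within max_runs")
```

Note what is absent: nothing in `done` consults the model's own opinion of whether the manuscript is finished.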
Friction at the right boundaries, rather than none at all. The fashionable view is that the best tool is invisible, that the goal of design is to make the seam between operator and machine disappear. We disagree. A seam that disappears is a seam that cannot be governed.
Sovereign is deliberately not a chatbot. There is no persona, no soothing front, no plausible illusion that the manuscript was written by a thinking interlocutor on the other side of the screen. The operator inscribes a paper specification, the system runs a structured procedure, and the operator reads the trace. The act of governing the system is felt, not hidden. The transition between biological cognition and machine cognition is announced.
The point is not friction for its own sake. The point is that the operator must remain acutely aware of which side of the seam they are on. A tool that asks to be forgotten is a tool that is no longer being commanded.
The work is never finished. Every run should be better than the last on axes we can name. There is no version of Sovereign that is "done"; there are only successive runs, each producing a trace, each producing a delta, each producing a new question the next run will try to answer.
This is not aspiration. It is operationally enforced. The calibrator records a baseline on first run; every subsequent run reports its delta. If three consecutive runs show no improvement, that is a signal — either the loop has plateaued and needs a new mechanism, or the axes themselves are wrong and need to be replaced. Both outcomes are productive. Both are visible because the loop is finite at every step but infinite in aggregate.
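A sketch of the baseline-and-delta bookkeeping, with assumptions stated plainly: each run is reduced to one score on one axis, higher is better, deltas are taken against the first run, and "no improvement" means failing to beat the best score seen before the window. The text fixes none of these details, and the function names are hypothetical.

```python
def deltas(scores: list[float]) -> list[float]:
    """Delta of every run after the first, measured against the baseline."""
    baseline = scores[0]
    return [score - baseline for score in scores[1:]]


def plateaued(scores: list[float], window: int = 3) -> bool:
    """True when the last `window` runs all failed to beat the best earlier run."""
    if len(scores) <= window:
        return False  # not enough history to call a plateau
    best_before = max(scores[:-window])
    return all(score <= best_before for score in scores[-window:])
```

For example, `plateaued([10.0, 12.0, 11.0, 12.0, 11.5])` is true, since none of the last three runs beat 12.0. A true result does not say which of the two outcomes applies; deciding between a new mechanism and new axes remains the operator's call.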
A finite game is played for the purpose of winning. An infinite game is played for the purpose of continuing the play. — J. Carse, paraphrased
None of these tenets are novel. Local-first is older than the cloud. Bounded contracts are older than software. Traceability is what every empirical science has always demanded of itself. Self-audit is what peer review was supposed to be. Seamful design is what cyberneticians argued for in the nineteen-sixties. Infinite games are what every craftsperson eventually accepts as the shape of their work.
What is novel is the insistence on holding all six at once, in a single system, without trading any of them away for convenience. Most autonomous systems we have inspected give up at least three. The lab's bet is that the next useful generation will be the one that gives up none.