Atlas Firm · Ascendanti Brief · 2026
Project Brief — Active 2026

Sovereign

A multi-agent autonomous research-paper authoring system for political science. Local-first. Span-traced. Self-auditing. Built around the principle that a single model cannot do eight people's jobs in one shot.

§ I

The premise

Sovereign began as the answer to a falsified hypothesis. Its predecessor — a single-call pipeline that asked one large model to draft a research paper end-to-end — failed with a perfect record: every output rejected, every run, every paper. The model was not the problem. The architecture was. A genius asked to be a committee, in one shot, with no memory of having tried before, will refuse the role by producing slop.

So we decomposed. Eight competences were named. Each was given a card to read, a card to write, and a contract that defines what it must not touch. Each emits a span event the moment it completes, so when the pipeline drifts we know which stage drifted, by how much, and why. The orchestrator does not think; it reads the plan, dispatches the next stage, and logs.

A genius asked to be a committee, in one shot, refuses the role by producing slop. — On the predecessor
§ II

The eight

Each agent is a bounded function. It reads a defined input, performs one act of reasoning, writes a typed output, and emits a span. Nothing more.

§ III

The loop

The orchestrator runs each paper through eight stages. Every stage emits a span. The trace is the paper's birth certificate.

  ·
   Spec-Author + Spec-Adversary    paper_specs/{paper}.json
   Source-Curator                  test_retrieval/{paper}.json
   Note-Taker                      cards/{paper}/card_*.json
   Lit-Reviewer                    schools, edges, where-paper-lands
   Debate-Mapper                   live debates, intersections
   Prose-Author (per section)      ProseDraft + cards-cited
   Patch-Assembler                 runs/{paper}/manuscript.md
   Methodologist + Adversary       audit_{paper}.jsonl
                                    calibration_{paper}.jsonl
                                    ⚑ operator ping if near-final
  ·
  trace: task_traces/pipeline_v2_{paper}_{ts}.jsonl
  portfolio: state/pipeline_v2_portfolio.json (atomic)

Eight stages produce one trace and ten artefacts. The trace is the per-stage span ledger — millisecond timing, contract status, decisions, signals. The artefacts are paper-level: the spec, the cards, the manuscript, the calibration, the audit. Together they form a complete, reproducible record of how this paper came to be the way it is.

§ IV

The calibrator

"Is this paper any good" is not a question you answer by vibes. It is a question you answer by distribution. The calibrator scores each manuscript on ten axes against a reference set of twenty-four hundred and one published papers from International Studies Quarterly and the American Political Science Review. A composite is reported alongside the per-axis scores; an axis is flagged if it falls below a threshold derived from the reference distribution.

10
axes scored
2,401
golden-set papers
70
methodology rules
80
adversarial attacks

If a run does not improve on the previous run on axes we can name, the run did not improve. The calibrator's purpose is to make that judgment formal. A baseline snapshot is taken on first run; every subsequent run reports its delta. The operator does not have to read the manuscript to know whether the loop is learning.

§ V

The stack

Sovereign runs entirely on the operator's hardware — an Apple Silicon workstation with sixty-four gigabytes of unified memory. The model rotation is served by llama-swap, mixing MLX-backed servers for the larger reasoners and llama.cpp for selected GGUF quants.

No cloud. No telemetry. The whole stack — about two hundred and forty gigabytes on disk — sits behind a single local router. When the operator closes the laptop, the system is gone. When it opens, it returns exactly as it was.

§ VI

What it isn't

Sovereign is not a chatbot. There is no persona, no helpful assistant, no conversational front. It is not a research copilot; it does not sit beside the operator suggesting edits. It runs as a batch over a paper specification and produces, on the other side, a manuscript that has survived eight rounds of independent review.

It is not a frontier model. It uses small, open, locally-runnable models — none above one hundred billion parameters — and gets where it gets through architecture rather than scale. The thesis is that the next decade of useful AI will be won by tighter loops, not larger weights.

It is not finished. The first integration run is paper-frontier-007. Whatever happens, the trace is the data. The loop will not be near-final on its first attempt; near-final on its first attempt was never the goal. The goal is a loop where the next run is traceably better than the last on axes we can name.

We did not ship a feature. We shipped a loop. — Address at the cutover
§ VII

Status

The system is in plug-and-run readiness as of May 2026. Eight paper specifications have been pre-drafted (frontier-007 through frontier-014), each surviving spec-adversary review with zero fatal findings. Three retrieval caches are warmed. The cutover script — preflight, model launch, smoke-test, dry-run, first live run — is one command.

One hundred and twenty-five tests pass. The gap-audit reports zero actionable findings across twenty-two checks. Six recurring bug patterns were caught and killed during build. The legacy single-call pipeline is queued for retirement on the third successful end-to-end run.

For the engineering log and the address given at the cutover, see the address and reflections.