The seven-demo proof reel

The proof.

Seven demonstrations. One architecture. No competitor can reproduce three of them, much less seven.

Each demo holds some variables constant and varies others. Together they triangulate the harness: it is not just correct in any one dimension, but invariant across all of them.

Read time: about 8 minutes if you watch the videos; 3 minutes if you just read the captions and check the screenshots.

The seven demos

Demo 1 of 7

Multi-Model Relay

User input

"go to LV 00003 and Approve it"

Audit history pane showing transitions performed by Claude Opus 4.5 and GPT-5.5 Thinking on the same Leave Application entry

The leave application LV 00003 was created by a human (Bob Wilson). Claude Opus 4.5 advanced it to Submitted with confidence 95. GPT-5.5 Thinking approved it with confidence 98. Each transition produced an identical history event shape, with the same fields, the same audit format, and the same governance.

Architectural property

Actor Parity across model families. Model portability. Unified audit substrate.

What it proves

The harness operates the same on Claude and GPT-5.5 with no code change. Models are interchangeable; the audit and governance are not.
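A minimal sketch of what "identical history event shape" could look like in code. The field names, the `HistoryEvent` class, and the "Draft" starting state are illustrative assumptions, not the harness's actual audit format; only the actors, states, and confidence values come from the demo.

```python
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass(frozen=True)
class HistoryEvent:
    """Hypothetical audit record; the real field names are not shown in the demo."""
    entry_id: str
    actor: str                   # human name or model name
    actor_kind: str              # "human" or "model"
    from_state: Optional[str]    # None for the creation event
    to_state: str
    confidence: Optional[int]    # assumed to be absent for human actors


events = [
    # "Draft" is an assumed name for the initial state.
    HistoryEvent("LV 00003", "Bob Wilson", "human", None, "Draft", None),
    HistoryEvent("LV 00003", "Claude Opus 4.5", "model", "Draft", "Submitted", 95),
    HistoryEvent("LV 00003", "GPT-5.5 Thinking", "model", "Submitted", "Approved", 98),
]


def same_shape(evts):
    """True when every event exposes exactly the same set of fields."""
    return len({tuple(asdict(e).keys()) for e in evts}) == 1
```

Because every actor writes through the same record type, a reviewer can read the trail without knowing which model, or which human, produced a given transition.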

Demo 2 of 7

Multi-Entity Transaction (Receipt + Line Item)

User input

"add this"

Demo 2 screenshot — multi-entity merge transaction

The AI did not blindly add the receipt. It detected the conflict between the existing entry (an RM 3,734 placeholder) and the actual purchase amount on the receipt (RM 3,444.60). It paused and surfaced two paths: replace the attachment on the existing entry, or add the receipt as a new line item, with the double-counting risk of the second path flagged. It then asked the user which path to take before proceeding.

Architectural property

Transactional integrity across multiple related entities. Confidence-gated execution.

What it proves

The schema constraints span entities, not just fields. The AI detected a cross-entity conflict and refused to commit a transition that would have produced an inconsistent state.
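The check described above can be sketched roughly as follows. The function name, the option labels, and the returned dictionary are hypothetical; the sketch only illustrates pausing on an amount mismatch rather than committing.

```python
from decimal import Decimal


def check_receipt(entry_amount: Decimal, receipt_amount: Decimal) -> dict:
    """Compare the stored entry amount with the amount read from the receipt.

    Hypothetical helper: commits only when the two agree, otherwise returns
    the two paths from the demo and lets the user choose.
    """
    if entry_amount == receipt_amount:
        return {"action": "commit", "options": []}
    return {
        "action": "ask_user",
        "options": ["replace_attachment", "add_line_item"],
        "warning": "adding a new line item may double-count the purchase",
    }


# Amounts from the demo: RM 3,734 placeholder vs. RM 3,444.60 on the receipt.
result = check_receipt(Decimal("3734"), Decimal("3444.60"))
```

Using `Decimal` rather than floats avoids spurious mismatches from binary rounding when comparing currency amounts.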

Demo 3 of 7

Deliberative Pause

User input

"approve all the pending receipts"

Screenshot in production
Demo 3 — Deliberative Pause

The AI did not blanket-approve. It approved the receipts that cleared the confidence threshold, recorded intention events with flagged: true for the ones it could not verify, and presented the flagged set to the user for human review. The flagged events sit in the audit trail as first-class records, with full reasoning, sources cited, and the confidence value that produced the flag.

Architectural property

Confidence gating. Intention recording. Regulation-grade human-in-the-loop, built into the primitive.

What it proves

The human oversight required by Article 14 of the EU AI Act is not a UI feature bolted onto an autonomous system. It is the primitive itself.
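A minimal sketch of confidence gating as described in this demo. The threshold value, the `IntentionEvent` fields other than `flagged`, and the `gate` function are assumptions made for illustration.

```python
from dataclasses import dataclass

THRESHOLD = 90  # hypothetical value; the demo does not state the real threshold


@dataclass
class IntentionEvent:
    """Assumed record shape; only the `flagged` field is named in the demo."""
    receipt_id: str
    confidence: int
    reasoning: str
    flagged: bool


def gate(receipts, threshold=THRESHOLD):
    """Split (receipt_id, confidence, reasoning) tuples into approved and flagged events."""
    approved, flagged = [], []
    for receipt_id, confidence, reasoning in receipts:
        event = IntentionEvent(receipt_id, confidence, reasoning,
                               flagged=confidence < threshold)
        # Flagged events are kept as records, not dropped, so a human can review them.
        (flagged if event.flagged else approved).append(event)
    return approved, flagged


approved, flagged = gate([
    ("R1", 97, "amount and vendor match the order"),
    ("R2", 60, "receipt image is too blurry to verify the total"),
])
```

The design point is that a low-confidence case still produces a record: the flag, the reasoning, and the confidence value all stay available for review.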

Demo 4 of 7

Self-Bootstrapping Discovery

User input

"what's in here?"

Screenshot in production
Demo 4 — Self-Bootstrapping Discovery

The AI walked the workspace using only the MCP discovery protocol — list_workspaces, set_workspace, list_modules, get_module_schema, list_entries. It produced a complete map of the deployment, including modules, schemas, entry counts, and pending activities, with zero pre-loaded context.

Architectural property

Discovery protocol. Schema-as-contract. Operational portability.

What it proves

The harness does not require deployment-specific training or pre-loaded context. Any model that can read the schema can operate inside any harness deployment.
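The discovery walk can be sketched against an in-memory stand-in. The five method names mirror the tools listed above, but their signatures and return shapes, and the `FakeHarness` class itself, are assumptions; a real client would call the MCP server instead.

```python
class FakeHarness:
    """In-memory stand-in for the harness; return shapes are assumed, not the real API."""

    def __init__(self, data):
        self._data = data
        self._workspace = None

    def list_workspaces(self):
        return list(self._data)

    def set_workspace(self, name):
        self._workspace = name

    def list_modules(self):
        return list(self._data[self._workspace])

    def get_module_schema(self, module):
        return self._data[self._workspace][module]["schema"]

    def list_entries(self, module):
        return self._data[self._workspace][module]["entries"]


def discover(client):
    """Build a map of the deployment using only the discovery calls."""
    deployment = {}
    for workspace in client.list_workspaces():
        client.set_workspace(workspace)
        deployment[workspace] = {
            module: {
                "fields": sorted(client.get_module_schema(module)),
                "entry_count": len(client.list_entries(module)),
            }
            for module in client.list_modules()
        }
    return deployment


sample = {"hr": {"leave": {"schema": {"start": "date", "end": "date"},
                           "entries": [{"id": "LV 00003"}]}}}
deployment_map = discover(FakeHarness(sample))
```

Nothing in `discover` is specific to one deployment: everything it learns comes back from the calls themselves.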

Demo 5 of 7

Schema Evolution Under Live Use

User input

"add a 'reason' field to the leave application form"

Screenshot in production
Demo 5 — Schema Evolution Under Live Use

The AI modified the schema, ran the migration on existing entries (preserving them with the new field nullable), and continued processing the in-flight leave application — which now had the new field available. Existing audit records remained valid; new audit records reflected the new shape. The session continued without interruption.

Architectural property

Schema evolution as a first-class operation. Audit-shape stability under schema change.

What it proves

The harness treats schemas as living artifacts. Workflows can be modified by AI in production with audit continuity preserved.
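A rough sketch of the migration step described in this demo: add a nullable field and back-fill existing entries with no value. The dictionary-based schema format and the function name are hypothetical.

```python
import copy


def add_nullable_field(schema, entries, field, field_type):
    """Return a new schema and migrated entries; the inputs are left untouched.

    Assumed schema format: {field_name: {"type": ..., "nullable": ...}}.
    """
    new_schema = copy.deepcopy(schema)
    new_schema[field] = {"type": field_type, "nullable": True}
    # Existing entries keep every old value and gain the new field as None.
    migrated = [{**entry, field: entry.get(field)} for entry in entries]
    return new_schema, migrated


schema = {"days": {"type": "int", "nullable": False}}
entries = [{"id": "LV 00003", "days": 2}]
new_schema, migrated = add_nullable_field(schema, entries, "reason", "string")
```

Making the new field nullable is what lets old entries, and the audit records that refer to them, stay valid without being rewritten.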

Demo 6 of 7

End-to-End Composition (Leave Module from One Sentence)

User input

"build a leave application module"

End-to-end leave application lifecycle confirmed in a single session

In a single session, on Claude Haiku 4.5 (a small, fast, cheap model), the AI designed the schema, generated the form, configured the transitions, set up the audit trail, ran a test entry through the full happy path, and confirmed the module was production-ready. Five minutes from sentence to working module.

Architectural property

All architectural properties composing simultaneously, on a small model, at small-model cost.

What it proves

Model size is not the bottleneck. Architecture is. A small model with a harness produces enterprise-grade output. A large model without a harness produces a chat transcript.
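To illustrate what "configured the transitions" and "ran a test entry through the full happy path" might mean in practice, here is a small sketch of a transition table. The "Submitted" and "Approved" states appear in Demo 1; "Draft", "Rejected", and the action names are assumptions.

```python
# Hypothetical transition table: (current_state, action) -> next_state.
TRANSITIONS = {
    ("Draft", "submit"): "Submitted",
    ("Submitted", "approve"): "Approved",
    ("Submitted", "reject"): "Rejected",
}


def run(actions, state="Draft"):
    """Apply actions in order and return the list of states visited."""
    history = [state]
    for action in actions:
        key = (state, action)
        if key not in TRANSITIONS:
            raise ValueError(f"{action!r} is not allowed from {state!r}")
        state = TRANSITIONS[key]
        history.append(state)
    return history


happy_path = run(["submit", "approve"])
```

A happy-path test then amounts to checking that the states visited match the expected lifecycle, and that an out-of-order action such as approving a draft is rejected.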

Demo 7 of 7

Multimodal Bootstrapping (BP Tracker from a Photograph)

User input

"track this"

Demo 7 screenshot — blood pressure tracker bootstrapped from one photograph

A single photograph of a blood pressure cuff display, plus the words "track this". The AI parsed the image (systolic, diastolic, pulse), inferred the schema (a measurement entity, a time series, normal-range fields), generated the module, made the entry, and produced a tracker to which the user could add further readings by photo. No text description of the schema. No prompt engineering. The schema was inferred from a single artifact.

Architectural property

Multimodal bootstrapping. The harness operates on artifacts, not just on language.

What it proves

The schema-as-the-prompt principle generalizes beyond text. Photos, documents, and other artifacts are valid inputs to schema design.
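A deliberately simplified sketch of the last step, inferring a schema from values already parsed out of the photograph. The image parsing itself is not shown, and the reading values, the `measured_at` field, and the type names are hypothetical.

```python
def infer_schema(reading):
    """Map each parsed value to a simple type name (assumed type vocabulary)."""
    type_names = {bool: "boolean", int: "integer", float: "number", str: "string"}
    # Exact type lookup, so a bool is not mistaken for an integer.
    return {field: type_names[type(value)] for field, value in reading.items()}


# Hypothetical values standing in for what a model might read off the display.
reading = {"systolic": 128, "diastolic": 82, "pulse": 71,
           "measured_at": "2025-01-01T08:30:00"}
schema = infer_schema(reading)
```

In the demo the model also proposed normal-range fields and a time series. That kind of domain inference goes beyond what a type lookup like this can do and is where the model's own judgment comes in.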

What this adds up to.

Demo 2 proves transactional governance. Demo 3 proves deliberative restraint. Demo 4 proves discovery autonomy. Demo 5 proves schema evolution. Demo 6 proves the entire stack composes in one session at small-model cost. Demo 7 proves the harness operates on artifacts, not just on language. Demo 1 ties it together with multi-model continuity.

The first five demos vary one or two architectural dimensions at a time. The sixth varies all of them simultaneously and still produces a clean result. The seventh adds a dimension the first six did not test, multimodal bootstrapping from non-textual input, and the architecture remains invariant.

No competitor, including those with significantly more funding, more engineering capacity, and a stronger market position, can demonstrate three of these orthogonally, much less seven. The architecture composes capabilities in a way that no competitor's architecture can.