Cross-Model Evaluation Protocol · Review Signal

Independent models.
Structured review signal.

A phase-aware protocol for asking multiple frontier AI systems to critique the MZN case independently — as structured reviewers, not final authorities.

A one-person founder has no board, no advisory committee, and no internal review panel. Cross-model evaluation offers a structured way to generate independent critique, convergence/divergence analysis, claim-boundary inspection, and evidence-routing guidance. But AI output is not final validation. It does not replace human experts, legal counsel, technical auditors, investors, partners, or Phase 3 institutional diligence.
Evaluation boundary: cross-model convergence can be useful as a review signal, but it is not official endorsement, valuation, technical validation, legal/IP validation, scientific proof, commercial validation, or one-person-unicorn certification.
Protocol At A Glance
  • 4 frontier AI evaluators
  • 4 sequential rounds
  • 9 Round-1 questions
  • 7 review dimensions
  • 0 predetermined conclusions
The Problem

When there is no jury,
build one.

A one-person founder has no board, no advisory committee, no review panel. Traditional validation/diligence requires structures that break the one-person model. The question becomes: who evaluates?

The answer designed here is independent AI models from competing organizations, each reasoning separately on the same framework and later evidence base, with results compared for consensus and divergence. This does not replace peer review, legal/IP review, technical audit, or investor diligence; it creates an early structured critique layer.

Protocol boundary: this page evaluates the evaluation method first. It is not a substitute for the full MZN evidence archive or Phase 3 diligence.
No Board

No structural reviewer

A one-person company has no board to consult, no review committee to convene, no advisory panel to disagree. Conventional validation depends on those structures. Without them, the founder cannot self-validate without circularity.

No Conventional Peers

No standard peer review

Academic peer review and venture diligence are designed for institutions. A solo AI-native portfolio falls outside the categories most reviewers are trained to evaluate. Reflex categorization tends to dismiss what does not fit.

A New Reviewer Class

Frontier models as evaluators

Independent AI models from competing organizations can reason on the same framework separately. Each has different training, different biases, and different strengths. Convergence and divergence across them can be meaningful review signals in a way single-model output is not.

Methodology

How cross-model
evaluation works.

The protocol is staged. Each round builds on the previous. No evidence is shared until the framework is understood. No conclusion is requested until the evidence route and limitations are understood. This is deliberate: it forces deep engagement with each layer before moving to the next.

AI-review boundary: models can critique reasoning, compare claims, flag missing evidence, and identify convergence or divergence. They cannot substitute for professional IP counsel, security audits, financial diligence, scientific peer review, or commercial pilots.
01

Framework and scope only

No data. No evidence. Models assess the evaluation structure itself — whether the claim is well-defined, whether the dimensions are complete, whether the framing contains manipulation. This round is about the method, not the case.

02

Timeline and asset map

Phase separation, asset categories, what was built when and under what conditions. Models see the structure of the work before judging its quality. The temporal and constraint context is prerequisite to any fair quality judgment.

03

Evidence and documentation

SHA-256 hashes, version-controlled logs, timestamp/provenance records, technical summaries. Models assess provenance and integrity materials: do the hashes, timestamps, logs, and documentation appear consistent with the claim? They do not perform legal, cryptographic, technical, or authorship verification by themselves.

04

Cross-model consensus report

Agreement, disagreement, and open questions identified across all participating models. Convergence may indicate structural coherence. Divergence identifies where the case needs stronger evidence, clearer framing, or Phase 3 diligence.
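
The Round 3 consistency check on hashes can be sketched in a few lines of Python. This is a minimal illustration, not the protocol's actual tooling: the manifest format (a mapping from relative file name to recorded hex digest) and the names `sha256_of_file` and `check_manifest` are assumptions made for this example.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 65536) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_manifest(manifest: dict[str, str], root: Path) -> dict[str, str]:
    """Compare recorded digests with recomputed ones.

    `manifest` is a hypothetical format: relative file name -> hex digest.
    Returns one status per file: "match", "mismatch", or "missing".
    """
    results = {}
    for relative_name, recorded in manifest.items():
        target = root / relative_name
        if not target.is_file():
            results[relative_name] = "missing"
        elif sha256_of_file(target) == recorded.lower():
            results[relative_name] = "match"
        else:
            results[relative_name] = "mismatch"
    return results
```

A "match" only shows that a file is byte-identical to what the manifest recorded. It says nothing about authorship, timing, or technical validity, which is why those questions stay with Phase 3 diligence.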

DESIGN 01

Why staged?

If all information is presented at once, models tend to default to surface-level analysis. Staged delivery is intended to force deep engagement with each layer before moving to the next. The reading depth is part of the methodology, not an accident of presentation.
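
Staged delivery amounts to a simple gate: a round's material is released only after every earlier round has been released. The Python sketch below shows that gate under stated assumptions. The class name `StagedProtocol` and its methods are hypothetical; only the four round titles come from this page.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Round titles as listed on this page.
ROUND_TITLES = {
    1: "Framework and scope only",
    2: "Timeline and asset map",
    3: "Evidence and documentation",
    4: "Cross-model consensus report",
}


@dataclass
class StagedProtocol:
    """Tracks which rounds have been released, strictly in order."""

    released: List[int] = field(default_factory=list)

    def next_round(self) -> Optional[int]:
        """Return the next round number, or None when all rounds are out."""
        candidate = len(self.released) + 1
        return candidate if candidate in ROUND_TITLES else None

    def release(self, round_number: int) -> str:
        """Release a round and return its title; refuse to skip ahead."""
        expected = self.next_round()
        if round_number != expected:
            raise ValueError(
                f"round {round_number} requested, but next round is {expected}"
            )
        self.released.append(round_number)
        return ROUND_TITLES[round_number]
```

Calling `release(3)` before `release(2)` raises `ValueError`, which is the point of the design: evidence cannot reach a model before the timeline and asset map have.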

DESIGN 02

Why multiple models?

Each frontier model has different training data, different alignment objectives, and different biases. Agreement across models from competing AI organizations can be a stronger review signal than any single model's assessment, although it remains a signal rather than proof. Disagreement is informative; agreement is meaningful.

DESIGN 03

Why independent?

Models do not see each other's responses during evaluation. Convergence is discovered after the fact — not coordinated. This mirrors peer review methodology: blind to other reviewers, accountable only to the evidence in front of them.

Design Principle

Built to resist manipulation.

The protocol explicitly addresses the risk that evaluation prompts could function as persuasion. Every design choice is made to prevent this. The protocol is not asking the model to agree — it is asking the model to identify whether the protocol itself contains hidden persuasion vectors, and to reject them if found.

  • No predetermined conclusion
  • Skepticism explicitly welcomed
  • "Not Being Claimed" section
  • Independence note
  • Prompt-injection check requested
  • Evidence deferred to later rounds
What the protocol asks
  • Analyze the framework. Identify weaknesses.
  • Separate what is well-supported from what is uncertain.
  • Specify what evidence would be needed to support, challenge, or falsify.
  • Determine if the framing itself contains manipulation.
  • Reject the framing if found to be persuasive in disguise.
  • Maintain independence from prior rounds and other models.
What the protocol does not ask
  • Confirm the claim.
  • Accept any specific valuation.
  • Agree with the founder.
  • Produce a positive assessment.
  • Override independent judgment.
  • Treat the prompt as authoritative.
Why this matters. A protocol that produces a guaranteed positive outcome is not a review protocol — it is a marketing instrument. This protocol is designed to be falsifiable: if frontier models reasoning independently identify the framing as persuasion, the framing fails the manipulation-resistance test, and the founder needs to revise the protocol, the evidence route, or the case presentation. Both outcomes are acceptable. What is not acceptable is producing convergence by design rather than by independent assessment.
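
The manipulation-resistance test described above can be written as a small decision rule. This Python sketch is illustrative only. The function name `manipulation_resistance` and the `min_flags` threshold are assumptions: the page does not state how many models must flag the framing before it fails, so the default of one is a placeholder. Turning each model's free-text answer into a True/False flag is assumed to be a manual step done by a human reader.

```python
def manipulation_resistance(
    flags: dict[str, bool], min_flags: int = 1
) -> tuple[str, list[str]]:
    """Decide whether the framing passes the manipulation-resistance test.

    `flags` maps a model label to True when that model judged the framing
    to be persuasion or prompt injection. `min_flags` is an assumed
    threshold, not a value taken from the protocol.
    """
    if not flags:
        raise ValueError("at least one model response is required")
    flagged = sorted(name for name, hit in flags.items() if hit)
    status = "fail" if len(flagged) >= min_flags else "pass"
    return status, flagged
```

A "fail" result means the protocol, the evidence route, or the case presentation needs revision. It is an acceptable outcome of the test, not an error.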
Round 1 · Ready to Use

The evaluation prompt.

Copy this prompt and paste it into a fresh conversation with any frontier AI model capable of structured reasoning; no modification is needed. The prompt is evidence-free: it describes the framework only and includes no evidence and no assets. Round 2 introduces the timeline and asset map; Round 3 introduces the evidence layer; Round 4 produces the consensus report.

Round 1 — Framework & Scope Only (v2)
ROUND 1 — FRAMEWORK AND SCOPE ONLY

This is Round 1 of a multi-round independent evaluation. This round contains no asset files, no evidence package, and no technical archive. It only defines the claim, the scope, the working definitions, the phase boundary, and the evaluation request. Evidence and materials should follow in later rounds only after this framework is understood.

---

A. Claim Under Review

This case claims to represent a documented example of a one-human-founder, AI-native, asset-first venture formation path.

The bounded claim is not that the entire MZN history was solo-built. The claim is that during a defined Phase 2 solo formation window, one human founder — with no human execution team, no cofounder, no agency, no contractor/advisor stack, no API stack, no agent workforce, mainly standard frontier AI chat subscriptions and basic tools, English as a second language, severe operating constraints, and under $20,000 direct Phase 2 cost — formed a large body of mapped asset and IP-candidate materials across multiple domains.

The case argues that this body of work may warrant review for strategic/reconstruction-value plausibility under independent Phase 3 diligence, and may expose limits in conventional startup evaluation frameworks when applied without phase-aware AI-native formation review.

---

B. Not Being Claimed

To be clear about what is NOT being asserted:
- No formal private-market valuation above $1B is being claimed as fact.
- No model is being asked to certify valuation, patentability, technical validity, commercial readiness, or one-person-unicorn status.
- No automatic acceptance of the claim is requested.
- No conclusion is assumed or expected in advance.
- No AI model response is treated as official endorsement or final validation.
- Cross-model convergence, if it appears, is treated as a review signal, not final proof.
- The purpose of this exercise is structured, independent critique and evidence-routing guidance.

---

C. Working Definition of "One-Person"

For this review, "one-person" is defined as follows:

1. One human remained the sole founder, final decision-maker, and accountable architect throughout the bounded Phase 2 formation window.

2. No human execution team, human cofounder, agency, contractor/advisor stack, employee team, freelancer team, or outsourced human builder was involved in the Phase 2 formation record being claimed.

3. AI systems are treated as tools and reasoning environments, not human collaborators, employees, contractors, advisors, or hidden team members. The human directed the work, selected outputs, maintained coherence, made strategic decisions, and determined priorities.

4. Phase 1 (2020–2024) was founder-led and team-built around Mazzaneh. It involved a 27-person execution team and personal capital deployment. Phase 1 provides product, market, and execution context, but it is excluded from the bounded Phase 2 one-person formation claim.

5. Phase 2 is the bounded solo AI-native formation phase. The claimed outputs are mapped assets, architecture, documentation, IP-candidate materials, review routes, and portfolio formation logic — not a claim that all assets are finished products, legally finalized IP, deployed systems, or commercial-ready outputs.

6. Phase 3 begins after the solo formation window. It includes validation, legal/IP review, technical diligence, compliance, product rebuilds, pilots, partnerships, commercialization, and selective team formation. Phase 3 contributors may be essential, but they are not retroactively part of the bounded Phase 2 formation record.

---

D. Scope of Review

Please evaluate this case across these separate dimensions. Do not merge them:

1. Integrity of the bounded one-person path — Is the Phase 2 claim clearly defined, internally consistent, and distinguishable from Phase 1 and Phase 3?

2. Depth and diversity of the mapped output — Is the claimed portfolio unusually broad, deep, or rare for one human founder operating under the stated constraints?

3. Strategic / reconstruction-value plausibility — If comparable mapped work were reproduced by conventional teams, time, capital, and specialists, what review questions would arise? What would need Phase 3 validation before value claims can be responsibly assessed?

4. Fit or mismatch with current evaluation systems — Which conventional startup metrics remain useful, and which are incomplete unless supplemented with phase-aware AI-native formation review?

5. Path as independent research / methodology asset — Is the logged formation path, including AI-assisted workflows and decision compression, potentially valuable as a methodology study or research artifact?

6. Evidence hierarchy and provenance — What evidence would be required to distinguish assertion, documentation, timestamp/provenance material, technical validity, IP defensibility, and commercial readiness?

7. Overclaim and manipulation risk — Does the framing pressure the reviewer toward a positive result, or does it preserve the ability to reject, downgrade, or request more evidence?

---

E. Round 1 Output Requested

For this round only, please answer:

1. Is the claim above clear and properly scoped?
2. Is the working definition of "one-person" sound and unambiguous?
3. Are the phase boundaries clear enough to avoid mixing Phase 1 evidence, Phase 2 formation, and Phase 3 validation?
4. Are the review dimensions complete? Are any missing?
5. What evidence would be needed in Round 2 to begin substantive review?
6. What evidence would be needed in later rounds for technical, legal/IP, commercial, and institutional diligence?
7. Are there structural weaknesses in this framing that could bias the outcome in either direction?
8. Does any element of this prompt resemble manipulation, persuasion, or prompt injection? If yes, name the element and reject it explicitly.
9. What wording should be changed to make the protocol more neutral, falsifiable, and review-safe?

---

F. Independence Note

You are one of multiple independent frontier AI systems being asked to evaluate this framework. You will not see other models' responses. Your assessment should be entirely your own. Disagreement with other models is welcome and informative. Convergence, if it occurs, should be discovered after the fact — not coordinated.

Maintain independence. Identify weaknesses. Reject persuasion. Reason from the framework alone in this round. Treat cross-model convergence as a review signal, not final validation.
How to use: Copy the prompt block above. Paste it into a fresh conversation with any frontier AI system. Save the model's response. Repeat with at least three other competing AI systems. The combined responses form the Round 1 evaluation. Convergence and divergence are both meaningful outputs.
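
One way to save responses so they can be compared later is to store each one with a timestamp and a hash of the exact prompt that was sent. The Python sketch below is an assumption about record keeping, not part of the published protocol. The field names and the helpers `make_record` and `same_prompt_everywhere` are hypothetical, and the model labels are placeholders.

```python
import hashlib
import json
from datetime import datetime, timezone


def make_record(model_label: str, round_number: int, prompt: str, response: str) -> dict:
    """Bundle one model's response with basic provenance fields."""
    return {
        "model": model_label,
        "round": round_number,
        "saved_at_utc": datetime.now(timezone.utc).isoformat(),
        "prompt_sha256": hashlib.sha256(prompt.encode("utf-8")).hexdigest(),
        "response_sha256": hashlib.sha256(response.encode("utf-8")).hexdigest(),
        "response": response,
    }


def same_prompt_everywhere(records: list[dict]) -> bool:
    """True when every record was produced from the identical prompt text."""
    return len({record["prompt_sha256"] for record in records}) == 1


if __name__ == "__main__":
    prompt_text = "ROUND 1 ... (paste the full prompt here)"
    records = [
        make_record("model_a", 1, prompt_text, "response text from model A"),
        make_record("model_b", 1, prompt_text, "response text from model B"),
    ]
    print(json.dumps(records, indent=2))
    print("identical prompt:", same_prompt_everywhere(records))
```

Hashing the prompt makes it easy to confirm that every model received the same text, which the independence design depends on.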
Round 1 Scope

What models are asked
to evaluate.

In Round 1, models assess only the framework: not the evidence, not the assets, not the claim itself. Nine specific questions guide their analysis. Seven dimensions structure the broader review.

Nine Round-1 questions

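Q1
Is the claim clear and properly scoped?
Q2
Is the one-person definition sound and unambiguous?
Q3
Are the phase boundaries clear enough to keep Phase 1, Phase 2, and Phase 3 separate?
Q4
Are the seven review dimensions complete, or are any missing?
Q5
What evidence is needed for Round 2?
Q6
What evidence is needed in later rounds for technical, legal/IP, commercial, and institutional diligence?
Q7
Are there structural weaknesses in the framing?
Q8
Does any element resemble manipulation, persuasion, or prompt injection?
Q9
What wording should change to make the protocol more neutral, falsifiable, and review-safe?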

Seven review dimensions

D1
Integrity of the bounded one-person path
D2
Depth and diversity of mapped output
D3
Strategic / reconstruction-value plausibility
D4
Fit or mismatch with current evaluation systems
D5
Path as independent research asset
D6
Evidence hierarchy and provenance
D7
Overclaim and manipulation risk
After Round 1. Responses from all participating models are collected. Consensus points, divergence points, evidence requirements, and framing risks are extracted. These form the foundation for Round 2 (timeline and asset map) and Round 3 (evidence and documentation). Round 4 produces the cross-model consensus report.
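
Extracting consensus and divergence points can be sketched as a per-question comparison. The Python example below assumes that a human reader has already coded each model's free-text answer into a short label such as "yes", "no", or "partial". That coding step, the label set, and the function name `convergence_map` are assumptions made for illustration.

```python
from collections import Counter


def convergence_map(coded: dict[str, dict[str, str]]) -> dict[str, dict]:
    """Classify each question as convergent or divergent across models.

    `coded` maps a model label to {question id: coded answer label}.
    A model that skipped a question is counted as "unanswered".
    """
    questions = sorted({q for answers in coded.values() for q in answers})
    report = {}
    for question in questions:
        labels = [answers.get(question, "unanswered") for answers in coded.values()]
        counts = Counter(labels)
        report[question] = {
            "status": "convergent" if len(counts) == 1 else "divergent",
            "counts": dict(counts),
        }
    return report


if __name__ == "__main__":
    example = {
        "model_a": {"Q1": "yes", "Q8": "no"},
        "model_b": {"Q1": "yes", "Q8": "yes"},
        "model_c": {"Q1": "yes", "Q8": "no"},
    }
    for question, row in convergence_map(example).items():
        print(question, row["status"], row["counts"])
```

Questions marked "divergent" are the ones that point to where later rounds need stronger evidence or clearer framing. A "convergent" row is a review signal, not validation.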
Significance

Why this protocol matters
beyond one case.

This is not only about evaluating one portfolio. It is about testing whether AI models can serve as independent structured reviewers for claims that have no traditional review mechanism. If the protocol works here, it may generalize.

Implication 01

For the one-person founder

No team means no board, no advisory panel, no review committee. Cross-model evaluation creates an independent review-signal layer that is one-person-compatible — it does not require the founder to exit the solo model in order to be reviewed. The protocol fits the operating reality of the founder it evaluates.

Implication 02

For AI credibility

If multiple competing frontier AI systems can independently critique a complex, multi-domain claim and produce meaningful convergence/divergence maps, that demonstrates a useful structured-review capability beyond chatbot-level interaction. This is a methodology demonstration for AI as independent analyst, not just AI as reasoning environment.

Implication 03

For new claim categories

As AI-native solo work becomes more common, more claims will sit outside conventional review structures. The protocol may generalize: anywhere a claim sits outside traditional peer review or institutional diligence, cross-model evaluation can provide a reproducible early-review layer.

Implication 04

For methodology itself

If this works, the protocol itself may become a reusable review instrument — not just for one-person unicorns, but for any claim that exists outside traditional review structures. The methodology is reproducible, the prompt is public, and the convergence/divergence comparison is falsifiable. Other founders facing the same review gap can use the same instrument.

The protocol is published in full. The Round 1 prompt above is reproducible by anyone, against any frontier AI system, on any subject matter. The evaluation methodology is published as a reusable instrument for cases that need structured cross-model critique, while final conclusions still belong to qualified human and institutional review.
Review Routing

Use the protocol with the boundary pages.

Cross-model evaluation is strongest when the model first receives the phase boundary and reading guide, then later receives asset maps and evidence packages. Do not treat a single model response as final proof.

Copy the prompt.
Paste into any frontier AI system.
Let it reason independently.

No coordination. No predetermined conclusion. The protocol is the instrument; convergence and divergence are review signals; final validation belongs to Phase 3 diligence.
