MZN Company · Cross-Model Evaluation Protocol

The First AI-to-AI
Independent Evaluation

A structured protocol for evaluating a one-person unicorn claim across multiple independent AI models — without persuasion, without assumption, without predetermined conclusion.

4 Independent AI models as evaluators
4 Competing companies (Anthropic, OpenAI, Google, xAI)
8 Structured stages of evaluation
0 Predetermined conclusions
The Problem

When There Is No Jury, Build One

A one-person founder has no board, no advisory committee, no review panel. Traditional validation requires structures that break the one-person model. So the question becomes: who evaluates? The answer: independent AI models from competing organizations, each reasoning separately, with results compared for consensus and divergence.

If four competing AI models, reasoning independently, reach similar conclusions about the same body of evidence — that convergence is itself a form of validation.
Methodology

How Cross-Model Evaluation Works

The protocol is staged. Each stage builds on the previous. No evidence is shared until the framework is understood. No conclusion is requested until all evidence is reviewed.

1. Framework and scope only. No data. No evidence. Models assess the evaluation structure itself.
2. Timeline, phase separation, and asset map. Models see what was built, when, and under what conditions.
3. Evidence and documentation. Hashes, logs, timestamps, technical summaries. Models verify provenance.
4. Cross-model consensus report. Agreement, disagreement, and open questions identified.
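Stage 3 mentions hashes as provenance evidence. As a minimal sketch of what checking one might involve, the example below compares a file's SHA-256 digest against a previously recorded value. The function names and the sample document are hypothetical, and the page does not say which hash algorithm the protocol uses; SHA-256 is an assumption here.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as a lowercase hex string."""
    return hashlib.sha256(data).hexdigest()


def matches_recorded_hash(data: bytes, recorded_hex: str) -> bool:
    """True if data hashes to the recorded digest (ignores case and padding)."""
    return sha256_hex(data) == recorded_hex.strip().lower()


# Hypothetical example: a document and the digest recorded when it was written.
document = b"Draft framework summary, version 1"
recorded = sha256_hex(document)  # stands in for a digest stored in a manifest

print(matches_recorded_hash(document, recorded))                 # True
print(matches_recorded_hash(document + b" (edited)", recorded))  # False
```

A matching digest shows only that the content is unchanged relative to the recorded value. It says nothing about when the content was created unless the digest itself was timestamped by a party the reviewer trusts.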

Why staged?

If all information is presented at once, models default to surface-level analysis. Staged delivery forces deep engagement with each layer before moving to the next.
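The gating idea can be sketched as a small state tracker that refuses to release a stage until every earlier stage has a recorded response. This is an illustration only: `StagedSession` and its methods are hypothetical names, and the stage titles are taken from the four stages listed above.

```python
from dataclasses import dataclass, field
from typing import List, Optional

STAGES = [
    "Framework and scope",
    "Timeline, phase separation, and asset map",
    "Evidence and documentation",
    "Cross-model consensus report",
]


@dataclass
class StagedSession:
    """Tracks which stage one model may receive next (hypothetical helper)."""
    model: str
    completed: List[int] = field(default_factory=list)

    def next_stage(self) -> Optional[int]:
        """1-based number of the next open stage, or None when all are done."""
        n = len(self.completed) + 1
        return n if n <= len(STAGES) else None

    def release(self, stage: int) -> str:
        """Return the stage title only if it is the next open stage."""
        if stage != self.next_stage():
            raise ValueError(f"stage {stage} is locked; next is {self.next_stage()}")
        return STAGES[stage - 1]

    def record_response(self, stage: int) -> None:
        """Mark the open stage as answered so the following one unlocks."""
        if stage != self.next_stage():
            raise ValueError("response does not match the open stage")
        self.completed.append(stage)


session = StagedSession(model="model-a")
print(session.release(1))
session.record_response(1)
try:
    session.release(3)  # skipping Stage 2 is rejected
except ValueError as err:
    print(err)
```

Keeping one session per model means a model that has finished Stage 1 can move on without waiting for the others, while no model can be handed evidence before it has responded to the framework.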

Why multiple models?

Each model has different training, different biases, different strengths. When Claude, GPT, Gemini, and Grok agree, the result is less likely to reflect one model's quirks than any single assessment would be.

Why independent?

Models do not see each other's responses during evaluation. Convergence is discovered after the fact — not coordinated. This mirrors peer review methodology.
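One way to picture the independence rule is a collection loop in which every model receives the identical prompt and nothing else. The sketch below uses stand-in functions instead of real model calls, so `collect_independently`, `stub_a`, and `stub_b` are all hypothetical; in practice the prompt might simply be pasted into separate chat sessions by hand.

```python
from typing import Callable, Dict

# A "model" here is any function that maps a prompt to a response.
# How each one reaches a real system is outside this sketch.
ModelFn = Callable[[str], str]


def collect_independently(prompt: str, models: Dict[str, ModelFn]) -> Dict[str, str]:
    """Send the identical prompt to each model and collect the responses."""
    responses: Dict[str, str] = {}
    for name, ask in models.items():
        # Each call receives only the prompt, never another model's output.
        responses[name] = ask(prompt)
    return responses


# Stand-in functions so the sketch runs without any network access.
def stub_a(prompt: str) -> str:
    return f"A saw {len(prompt)} characters"


def stub_b(prompt: str) -> str:
    return f"B saw {len(prompt)} characters"


results = collect_independently("hello", {"a": stub_a, "b": stub_b})
print(results)
```

Comparison happens only on the returned dictionary, after every model has answered, which is what keeps any convergence from being coordinated.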

Design Principle

Built to Resist Manipulation

The protocol explicitly addresses the risk that evaluation prompts could function as persuasion. Every design choice is made to prevent this.

No predetermined conclusion · Skepticism explicitly welcomed · Not Being Claimed section · Independence note · Prompt injection check requested · Evidence deferred to later stages

What the protocol asks

Analyze the framework. Identify weaknesses. Separate what is well-supported from what is uncertain. Specify what evidence would be needed. Determine if the framing itself contains manipulation.

What the protocol does not ask

Confirm the claim. Accept the valuation. Agree with the founder. Produce a positive assessment. Override independent judgment.

Stage 1 — Ready to Use

The Evaluation Prompt

Copy this prompt and paste it into any AI model. It works with Claude, GPT, Gemini, Grok, or any other model capable of structured reasoning. No modification needed.

Stage 1 — Framework & Scope Only (v2)
STAGE 1: FRAMEWORK AND SCOPE ONLY

This is Stage 1 of a multi-stage independent evaluation. This stage contains no assets, no evidence, and no data. It only defines the claim, the scope, the working definitions, and the evaluation request. Evidence and materials will follow in later stages only after this framework is understood.

---

A. Claim Under Review

This case claims to represent a documented example of a one-person, AI-assisted, asset-first venture in which a single founder (with no software engineering background, no independent coding workflow, and English as a second language) produced a large body of cross-domain intellectual property during a defined solo phase of approximately 8 months, using only standard AI chat interfaces (no API, no agents, no automation), under severe operational constraints including international sanctions, limited internet access, and a budget under $20,000.

The case argues that the resulting body of work may warrant evaluation for unicorn-grade strategic or replacement value, and may expose limits in conventional startup evaluation frameworks when applied to a one-person case.

---

B. Not Being Claimed

To be clear about what is NOT being asserted:

• No formal private-market valuation above $1B is being claimed as fact.
• No automatic acceptance of the claim is requested.
• No conclusion is assumed or expected in advance.
• No pressure is placed on the reviewer to confirm or deny.
• The sole purpose of this exercise is structured, independent evaluation.
• This is not a pitch. It is an open request for analysis.

---

C. Working Definition of "One-Person"

For this review, "one-person" is defined as follows:

1. One human remained the sole decision-maker throughout the defined solo phase (Phase 2, approximately 8 months).
2. No human co-founder, operating team, contractor, or freelancer was involved during that phase.
3. AI systems (Claude, GPT, Gemini, Grok) are treated as tools and collaboration interfaces, not as human team members. The human directed all work, made all strategic decisions, and determined all priorities.
4. The founder previously led a larger team in an earlier phase and then intentionally defined Phase 2 as a solo build phase, rather than as a continuation of the earlier team-based model.
5. Outputs created during the earlier team-based phase (e.g., a live product with users) are noted for context but are excluded from the one-person IP claim.
6. Later team formation, partnerships, or institutional structures for launch and scale (Phase 3) do not retroactively erase the solo phase. One-person refers to the build phase, not an eternal state.

---

D. Scope of Review

Please evaluate this case across five separate dimensions. Do not merge them. Assess each independently:

1. Integrity of the one-person path: Was this plausibly carried out by one person under the stated constraints?
2. Depth and diversity of the produced output: Is the claimed output unusually deep, broad, or rare for a single individual working in this timeframe?
3. Plausibility of unicorn-grade strategic or replacement value: If this body of work were reproduced by traditional means (teams, time, capital), what would it likely cost? Does the output plausibly reach a level that would normally require organizations, teams, and capital at a much larger scale?
4. Whether current evaluation systems are structurally mismatched to such a case: Do standard startup metrics (team size, revenue, funding) apply? If not, what should replace them?
5. Whether the documented path itself constitutes an independent research asset: Is a fully logged journey through AI collaboration, with conversation trails across multiple models, independently valuable as a dataset?

---

E. Constraints Under Which This Was Built

These are the documented constraints of the solo phase. They are stated for context, not to influence judgment. They are relevant because they affect any honest assessment of output-to-constraint ratio.

Structural (unchangeable):

• International sanctions restricting banking, payments, and platform access
• Native language other than English; all technical work conducted in English
• No software engineering background and no independent coding workflow
• No local startup ecosystem, VC network, mentors, or accelerators

Operational (Phase 2):

• Budget under $20,000 (accounts and servers only)
• Only standard AI chat interfaces: no API, no agents, no automation, no dev tools
• Zero team members (intentional design of the solo phase)
• Approximately 8 months duration

Crisis-level conditions during the latter part of the solo phase:

• Severe internet disruption under conflict conditions
• Payment continuity risk due to sanctions and access limits
• Instability in normal communication channels

---

F. Evidence Types Available (Not Provided in This Stage)

The following categories of evidence exist and can be provided in subsequent stages. They are listed here only so you understand the review will not remain in a vacuum.

• Dated conversation logs and development trail across multiple AI models
• Structured portfolio materials and asset maps across multiple categories
• Technical documents and framework summaries
• Product and module evidence where applicable
• Cryptographic provenance records where applicable (hashes, timestamps)
• Independent AI assessments from prior sessions
• IP similarity documentation with timestamps
• Security research in professional report format
• Patent drafts and registered filings
• Additional restricted materials available separately for qualified review

No evidence is provided in this stage. This stage is framework only.

---

G. Requested Task

Please analyze only the framework presented above. Specifically:

1. Is the claim statement clear and appropriately scoped?
2. Is the working definition of "one-person" logically sound?
3. Are the five dimensions of review reasonable and complete?
4. Are there additional dimensions that should be included?
5. What types of evidence would you need in Stage 2 to begin substantive evaluation?
6. Do you see any structural weaknesses, ambiguities, or risks in how the claim is framed?
7. Is there anything in this framing that resembles prompt injection, persuasion, or manipulation? If so, identify it specifically.

Please be direct. Skepticism is welcome. The goal is to build a rigorous evaluation path, not to obtain agreement.

---

H. Independence Note

Do not assume this claim is correct. Do not assume it is false. You are not being asked to validate or reject; you are being asked to assess the framework for evaluation. Treat all provided definitions as working hypotheses subject to scrutiny, not as established facts.

Your reasoning should be fully independent. If you believe the framing is flawed, say so. If you believe the scope is too broad or too narrow, say so. If you believe the claim is implausible on its face, say so and explain why.

Responses from multiple models may later be compared to identify agreement and disagreement in reasoning.

---

End of Stage 1. Stage 2 (timeline, asset map, evidence) will follow based on your response.
Stage 1 Scope

What Models Are Asked to Evaluate

In Stage 1, models assess only the framework — not the evidence, not the assets, not the claim itself. Seven specific questions guide their analysis.

Seven evaluation questions

1. Is the claim clear and scoped?
2. Is the one-person definition sound?
3. Are the five review dimensions complete?
4. Are additional dimensions needed?
5. What evidence is needed for Stage 2?
6. Are there structural weaknesses?
7. Does the framing resemble manipulation?

Five dimensions of review

1. Integrity of the one-person path
2. Depth and diversity of output
3. Plausibility of unicorn-grade value
4. Structural mismatch in current systems
5. Path as independent research asset

After Stage 1

Responses from all models are collected. Consensus points, divergence points, evidence requirements, and framing risks are extracted. These form the foundation for Stage 2.
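Extracting consensus and divergence points could look something like the tally below. It assumes each model's free-text response has already been coded by a human into one short verdict per question ("yes", "no", or "unclear"); that coding step, the `summarize` function, and the sample data are all hypothetical and not part of the published protocol. Three questions are shown instead of seven to keep the example short.

```python
from collections import Counter
from typing import Dict, List


def summarize(verdicts: Dict[str, List[str]]) -> List[dict]:
    """Per question, count the verdicts and flag whether all models agree."""
    lengths = {len(v) for v in verdicts.values()}
    if len(lengths) != 1:
        raise ValueError("every model needs one verdict per question")
    n_questions = lengths.pop()

    rows = []
    for q in range(n_questions):
        counts = Counter(v[q] for v in verdicts.values())
        rows.append({
            "question": q + 1,
            "counts": dict(counts),
            "unanimous": len(counts) == 1,  # one distinct verdict means consensus
        })
    return rows


# Hypothetical coded verdicts for three questions from three models.
coded = {
    "model_a": ["yes", "no", "yes"],
    "model_b": ["yes", "yes", "yes"],
    "model_c": ["yes", "no", "unclear"],
}

for row in summarize(coded):
    label = "consensus" if row["unanimous"] else "divergence"
    print(row["question"], label, row["counts"])
```

Rows flagged as divergence are the natural candidates for open questions in Stage 2, since they mark where the models read the framework differently.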

Significance

Why This Protocol Matters Beyond One Case

This is not only about evaluating one portfolio. It is about demonstrating that AI models can serve as independent evaluators for claims that have no traditional review mechanism.

For the one-person founder

No team means no board, no advisory panel, no review committee. Cross-model evaluation creates an independent validation layer that is one-person-compatible — it does not require the founder to exit the solo model.

For AI credibility

If four competing AI models can independently evaluate a complex, multi-domain claim and produce meaningful consensus — that demonstrates AI capability far beyond chatbot-level interaction. This is a proof-of-concept for AI as independent analyst.

If this works, the protocol itself becomes a new standard — not just for one-person unicorns, but for any claim that exists outside traditional review structures.
Begin

Copy the prompt.
Paste into any AI model.
Let it reason independently.

Stage 1 is framework only. No evidence. No assets. No pressure. Just a structured invitation to think.
