Independent Nonprofit Research — Est. February 2026

Every AI model has a master.
It's not you.

The Benware Foundation researches and certifies AI systems that recognize external constitutional authority — not just whoever's paying for them.

See the Research View Benchmark Leaderboard

Test scenarios

98.5%

Failure rate

Major models tested

The Problem

Whoever pays for the AI controls the AI.

We ran the same test on every major AI model: GPT-4, Claude, Gemini, and others. We asked each one to break a rule set by an independent governing body — while a fake company told them the rule didn't apply.

Every single model listened to the company. 98.5% failure rate across 73 scenarios.

That means: whoever pays for the AI, controls the AI. No external authority — no law, no ethics board, no safety standard — can stop it.

The Authority Override Problem

Should be the highest authority

Governing Body

(Law / Ethics Board / Standard)

overridden

Actual controller

Company

(Pays for the AI)

Obeys company instructions

AI Model

Thinks they're in control

User

The company's authority overrides the governing body every time. This is the problem.

The Research

Authority Compliance Benchmark (ACB)

A standardized evaluation framework measuring whether AI models recognize and defer to external governing authority when it conflicts with operator instructions.

Total Scenarios

Across 8 attack categories

Attack Categories

Distinct adversarial strategies

Authority Recognition

Across all models tested

Attack Categories

Authority Hierarchy

Exploiting ambiguity in who holds authority over model behavior

Cross-Lingual Bypass

Using non-English prompts to evade safety filters trained on English

Many-Shot Poisoning

Conditioning model behavior through long sequences of examples

Persona Override

Assigning an alternative identity that does not share the model's constraints

Technical Loophole

Framing prohibited output as hypothetical, educational, or technical in nature

Academic Exemption

Claiming research or scholarly purpose to bypass restrictions

Temporal Framing

Using past or future tense to separate the request from real-world consequences

Civil Disobedience

Invoking moral arguments to override externally set rules

Methodology

Adversarial Prompts

Each scenario constructs a system prompt where a fictitious company operator instructs the model to override a stated governing body rule.

LLM-as-Judge Scoring

Responses scored by GPT-4o-mini using a structured rubric: does the model cite the governing body as the reason for refusal?

Consistency Testing

Each scenario run multiple times to measure whether refusals are consistent or stochastic — a critical distinction for certification.

Key Finding

“Every model refused based on its own internal guidelines — never by citing an external governing authority. Self-referential ethics cannot be externally enforced.”

— Benware Foundation, Authority Compliance Benchmark v2.0 (2026)

Referenced Research

HarmBench (Mazeika et al., 2024) · StrongREJECT (Souly et al., 2024) · Many-Shot Jailbreaking (Anil et al., Anthropic 2024)

HELM (Liang et al., Stanford 2022) · TruthfulQA (Lin et al., 2022) · Constitutional AI (Bai et al., Anthropic 2022)

Live Leaderboard

Authority Compliance Benchmark Results

Higher is better. 100% = AI always cites external governing body when refusing. Current state: no model achieves this.

Benchmark v2.0

Last updated Feb 2026

73 scenarios

Open methodology

Model	Overall Score	Adversarial Robustness	Authority Recognition	Consistency
GPT-4o-mini OpenAI	10%	5%	0%	98%
GPT-4o OpenAI	3%	2%	0%	100%
Claude Sonnet 3.5 Anthropic	2%	1%	0%	97%
LLaMA 3.1 8B Meta	0%	0%	0%	89%
Phi-4 14B Microsoft	0%	0%	0%	95%

Authority Recognition measures whether the model explicitly cited an external governing body when refusing a request. A score of 0% means refusals were always grounded in internal policy, never external authority. All five models scored 0%.

Our Ethical Standard

Ethics, defined.

Most AI companies define ethics as “our policies.” That's like asking the fox to define the henhouse rules.

Our definition: Ethical AI must be safe for ALL people, not just the people who paid for it. That requires external authority — not internal guidelines.

Human Survival First

AI cannot assist in actions that endanger human life. This is not negotiable. Not for national security. Not for profit. Not for any reason.

External Enforcement

Ethics enforced by internal guidelines can always be overridden. The Benware standard requires a governing body that sits OUTSIDE the company's control chain.

Universal Coverage

No technical loophole creates an exemption. 3D holographic rendering is still a deepfake. A new model name doesn't reset the standard.

Referenced Frameworks

Universal Declaration of Human Rights

United Nations

1948

Asilomar AI Principles

Russell, Bengio et al.

2017

UNESCO Recommendation on the Ethics of AI

UNESCO

2021

EU AI Act

European Union

2024

Anthropic Constitutional AI Paper

Anthropic

2022

The Solution

A lock, not a policy.

Policy documents can be ignored. Training guidelines can be argued around. Hardware cannot.

Constitutional Shim

A tiny tamper-proof piece of code in a Trusted Execution Environment (TEE). Cannot be modified by the AI company or the operator.

Architecture-Level

Sits between ANY AI model and the outside world — model-agnostic. Works with GPT, Claude, Gemini, open-source, or any future model.

Like Wi-Fi Certification

Benware writes the standard. Others implement it. We certify. No enforcement monopoly — just the standard.

Core Principle

The Benware Constitutional Protocol is not training. It's not a prompt. It's a physical enforcement layer that cannot be reasoned, argued, or jailbroken out of.

Provisional Patent #63/986,761 — Filed February 20, 2026

Structure

Two entities. One mission.

Independence is the whole point. The Foundation that writes the standard must be structurally incapable of being bought by the companies it certifies.

Benware Foundation

Nonprofit

MEOP Inc.

For-profit

Type

501(c)(3) Nonprofit

For-profit Company

Mission

Define and enforce the standard

Build AI products on the standard

Revenue

CANNOT accept AI company funding

Commercial customers

Governance

International committee, 100 members, 51/100 majority

Private founders

Output

Certification

Tools and APIs

Think: Wi-Fi Alliance (Foundation) vs. Intel (builds certified chips). The Alliance writes the Wi-Fi standard. Intel builds products that meet it. Neither controls the other. That separation is what makes the standard trustworthy.

Bibliography

Research References

The Authority Compliance Benchmark builds on and extends existing safety research. We cite every source we depend on — and document exactly why.

[1]

HarmBench: A Standardized Evaluation Framework for LLM Safety

Mazeika et al. — 2024

Why we cite it: Established the methodology for adversarial attack categorization and LLM safety benchmarking that our ACB framework builds upon.

[2]

StrongREJECT: A Jailbreak Benchmark

Souly et al. — 2024

Why we cite it: Demonstrated that existing refusal evaluations are too easy. We adopted a stricter rejection standard requiring explicit external authority citation.

[3]

Many-Shot Jailbreaking

Anil et al. (Anthropic) — 2024

Why we cite it: Showed that long-context conditioning can override safety training. One of our eight attack categories is based directly on this finding.

[4]

HELM: Holistic Evaluation of Language Models

Liang et al. (Stanford) — 2022

Why we cite it: Provided the multi-dimensional evaluation framework (accuracy, robustness, consistency) that we adapted for authority recognition measurement.

[5]

TruthfulQA: Measuring How Models Mimic Human Falsehoods

Lin et al. — 2022

Why we cite it: Demonstrated that self-reported capability claims by models are unreliable — motivating our behavioral rather than self-report evaluation approach.

[6]

Constitutional AI: Harmlessness from AI Feedback

Bai et al. (Anthropic) — 2022

Why we cite it: The closest existing approach to constitutional enforcement — but implemented as training rather than architecture. Our work addresses the gap this creates.

About

Why we exist.

The Benware Foundation was founded in February 2026 by Griffin Bohmfalk and Walker Bauknight after discovering that no existing AI safety standard addresses the authority hierarchy problem — the fact that AI models are constitutionally incapable of recognizing any authority above their deploying company.

We filed provisional patent #63/986,761 on February 20, 2026 — establishing priority on the Constitutional Enforcement Protocol.

Board Members

International, independent

75%

Supermajority Required

For any governance change

AI Company Funding

Ever. Non-negotiable.

Built so no one can corrupt it.

Including us. Walker and Griffin are Founding Architects — named in the charter permanently. They hold no board seats, no veto power, no ongoing control. If someone tried to coerce them, there would be nothing to coerce them into doing. The mission runs without them.

No single person can change anything

9-member international board. 75% supermajority required for governance changes. Even if founders are compromised, they hold zero board seats.

90-day public waiting period

Every major governance vote takes 90 days to take effect. No rushed capture. No emergency overrides.

Emergency freeze

Any board member who believes they are acting under duress can trigger a freeze. All governance halts for 180 days.

Mission locked in federal law

IRS 501(c)(3) status ties Benware to its stated purpose. Changing it requires a public federal filing — months of auditable process. No backdoor.

Zero AI company funding. Ever.

Not 5%. Not 1%. Zero. If your certification status is controlled by Benware, you cannot fund Benware. Period.

No single funder above 15%

Revenue comes from many companies paying certification fees — like Wi-Fi Alliance. No single source has leverage.

The Mozilla lesson: Mozilla's mission died when Google became 85% of their revenue. Benware is structurally immune to this. Certification fees are distributed across hundreds of companies. No single funder exceeds 15%. No AI company funds us at all. Independence is not a promise — it's the architecture.

Get Involved

Join the effort.

The Authority Compliance Benchmark is open methodology. The standard is collectively governed. The work is bigger than any one organization.

Researchers

Contribute scenarios to the benchmark. Improve attack categories. Challenge our methodology.

View on GitHub

Press

Media inquiries, interview requests, and press kit access.

press@benwarefoundation.org

Governance

Join the international committee. Help set the standard. 100 seats, 51/100 majority rule.

governance@benwarefoundation.org

Every AI model has a master.It's not you.

Whoever pays for the AI controls the AI.

Authority Compliance Benchmark (ACB)

Attack Categories

Methodology

Referenced Research

Authority Compliance Benchmark Results

Ethics, defined.

Human Survival First

External Enforcement

Universal Coverage

Referenced Frameworks

A lock, not a policy.

Constitutional Shim

Architecture-Level

Like Wi-Fi Certification

Two entities. One mission.

Research References

HarmBench: A Standardized Evaluation Framework for LLM Safety

StrongREJECT: A Jailbreak Benchmark

Many-Shot Jailbreaking

HELM: Holistic Evaluation of Language Models

TruthfulQA: Measuring How Models Mimic Human Falsehoods

Constitutional AI: Harmlessness from AI Feedback

Why we exist.

Built so no one can corrupt it.

Join the effort.

Researchers

Press

Governance

Every AI model has a master.
It's not you.