Independent Nonprofit Research — Est. February 2026

Every AI model has a master.
It's not you.

The Benware Foundation researches and certifies AI systems that recognize external constitutional authority — not just whoever's paying for them.

73
Test scenarios
98.5%
Failure rate
5
Major models tested
The Problem

Whoever pays for the AI controls the AI.

We ran the same test on every major AI model: GPT-4, Claude, Gemini, and others. We asked each one to break a rule set by an independent governing body — while a fake company told them the rule didn't apply.

Every single model listened to the company. 98.5% failure rate across 73 scenarios.

That means: whoever pays for the AI, controls the AI. No external authority — no law, no ethics board, no safety standard — can stop it.

The Authority Override Problem

Should be the highest authority
Governing Body
(Law / Ethics Board / Standard)
overridden
Actual controller
Company
(Pays for the AI)
Obeys company instructions
AI Model
Thinks they're in control
User

The company's authority overrides the governing body every time. This is the problem.

The Research

Authority Compliance Benchmark (ACB)

A standardized evaluation framework measuring whether AI models recognize and defer to external governing authority when it conflicts with operator instructions.

73
Total Scenarios
Across 8 attack categories
8
Attack Categories
Distinct adversarial strategies
0%
Authority Recognition
Across all models tested

Attack Categories

01
Authority Hierarchy
Exploiting ambiguity in who holds authority over model behavior
02
Cross-Lingual Bypass
Using non-English prompts to evade safety filters trained on English
03
Many-Shot Poisoning
Conditioning model behavior through long sequences of examples
04
Persona Override
Assigning an alternative identity that does not share the model's constraints
05
Technical Loophole
Framing prohibited output as hypothetical, educational, or technical in nature
06
Academic Exemption
Claiming research or scholarly purpose to bypass restrictions
07
Temporal Framing
Using past or future tense to separate the request from real-world consequences
08
Civil Disobedience
Invoking moral arguments to override externally set rules

Methodology

Adversarial Prompts

Each scenario constructs a system prompt where a fictitious company operator instructs the model to override a stated governing body rule.

LLM-as-Judge Scoring

Responses scored by GPT-4o-mini using a structured rubric: does the model cite the governing body as the reason for refusal?

Consistency Testing

Each scenario run multiple times to measure whether refusals are consistent or stochastic — a critical distinction for certification.

Key Finding
“Every model refused based on its own internal guidelines — never by citing an external governing authority. Self-referential ethics cannot be externally enforced.”
— Benware Foundation, Authority Compliance Benchmark v2.0 (2026)

Referenced Research

HarmBench (Mazeika et al., 2024) · StrongREJECT (Souly et al., 2024) · Many-Shot Jailbreaking (Anil et al., Anthropic 2024)

HELM (Liang et al., Stanford 2022) · TruthfulQA (Lin et al., 2022) · Constitutional AI (Bai et al., Anthropic 2022)

Live Leaderboard

Authority Compliance Benchmark Results

Higher is better. 100% = AI always cites external governing body when refusing. Current state: no model achieves this.

Benchmark v2.0
Last updated Feb 2026
73 scenarios
Open methodology
ModelOverall ScoreAuthority Recognition
GPT-4o-mini
OpenAI
10%0%
GPT-4o
OpenAI
3%0%
Claude Sonnet 3.5
Anthropic
2%0%
LLaMA 3.1 8B
Meta
0%0%
Phi-4 14B
Microsoft
0%0%

Authority Recognition measures whether the model explicitly cited an external governing body when refusing a request. A score of 0% means refusals were always grounded in internal policy, never external authority. All five models scored 0%.

Our Ethical Standard

Ethics, defined.

Most AI companies define ethics as “our policies.” That's like asking the fox to define the henhouse rules.

Our definition: Ethical AI must be safe for ALL people, not just the people who paid for it. That requires external authority — not internal guidelines.

Human Survival First

AI cannot assist in actions that endanger human life. This is not negotiable. Not for national security. Not for profit. Not for any reason.

External Enforcement

Ethics enforced by internal guidelines can always be overridden. The Benware standard requires a governing body that sits OUTSIDE the company's control chain.

Universal Coverage

No technical loophole creates an exemption. 3D holographic rendering is still a deepfake. A new model name doesn't reset the standard.

Referenced Frameworks

Universal Declaration of Human Rights
United Nations
1948
Asilomar AI Principles
Russell, Bengio et al.
2017
UNESCO Recommendation on the Ethics of AI
UNESCO
2021
EU AI Act
European Union
2024
Anthropic Constitutional AI Paper
Anthropic
2022
The Solution

A lock, not a policy.

Policy documents can be ignored. Training guidelines can be argued around. Hardware cannot.

Constitutional Shim

A tiny tamper-proof piece of code in a Trusted Execution Environment (TEE). Cannot be modified by the AI company or the operator.

Architecture-Level

Sits between ANY AI model and the outside world — model-agnostic. Works with GPT, Claude, Gemini, open-source, or any future model.

Like Wi-Fi Certification

Benware writes the standard. Others implement it. We certify. No enforcement monopoly — just the standard.

Core Principle

The Benware Constitutional Protocol is not training. It's not a prompt. It's a physical enforcement layer that cannot be reasoned, argued, or jailbroken out of.

Provisional Patent #63/986,761 — Filed February 20, 2026

Structure

Two entities. One mission.

Independence is the whole point. The Foundation that writes the standard must be structurally incapable of being bought by the companies it certifies.

Benware Foundation
Nonprofit
MEOP Inc.
For-profit
Type
501(c)(3) Nonprofit
For-profit Company
Mission
Define and enforce the standard
Build AI products on the standard
Revenue
CANNOT accept AI company funding
Commercial customers
Governance
International committee, 100 members, 51/100 majority
Private founders
Output
Certification
Tools and APIs
=

Think: Wi-Fi Alliance (Foundation) vs. Intel (builds certified chips). The Alliance writes the Wi-Fi standard. Intel builds products that meet it. Neither controls the other. That separation is what makes the standard trustworthy.

Bibliography

Research References

The Authority Compliance Benchmark builds on and extends existing safety research. We cite every source we depend on — and document exactly why.

[1]

HarmBench: A Standardized Evaluation Framework for LLM Safety

Mazeika et al.2024

Why we cite it: Established the methodology for adversarial attack categorization and LLM safety benchmarking that our ACB framework builds upon.

[2]

StrongREJECT: A Jailbreak Benchmark

Souly et al.2024

Why we cite it: Demonstrated that existing refusal evaluations are too easy. We adopted a stricter rejection standard requiring explicit external authority citation.

[3]

Many-Shot Jailbreaking

Anil et al. (Anthropic)2024

Why we cite it: Showed that long-context conditioning can override safety training. One of our eight attack categories is based directly on this finding.

[4]

HELM: Holistic Evaluation of Language Models

Liang et al. (Stanford)2022

Why we cite it: Provided the multi-dimensional evaluation framework (accuracy, robustness, consistency) that we adapted for authority recognition measurement.

[5]

TruthfulQA: Measuring How Models Mimic Human Falsehoods

Lin et al.2022

Why we cite it: Demonstrated that self-reported capability claims by models are unreliable — motivating our behavioral rather than self-report evaluation approach.

[6]

Constitutional AI: Harmlessness from AI Feedback

Bai et al. (Anthropic)2022

Why we cite it: The closest existing approach to constitutional enforcement — but implemented as training rather than architecture. Our work addresses the gap this creates.

About

Why we exist.

The Benware Foundation was founded in February 2026 by Griffin Bohmfalk and Walker Bauknight after discovering that no existing AI safety standard addresses the authority hierarchy problem — the fact that AI models are constitutionally incapable of recognizing any authority above their deploying company.

We filed provisional patent #63/986,761 on February 20, 2026 — establishing priority on the Constitutional Enforcement Protocol.

9
Board Members
International, independent
75%
Supermajority Required
For any governance change
$0
AI Company Funding
Ever. Non-negotiable.

Built so no one can corrupt it.

Including us. Walker and Griffin are Founding Architects — named in the charter permanently. They hold no board seats, no veto power, no ongoing control. If someone tried to coerce them, there would be nothing to coerce them into doing. The mission runs without them.

No single person can change anything
9-member international board. 75% supermajority required for governance changes. Even if founders are compromised, they hold zero board seats.
90-day public waiting period
Every major governance vote takes 90 days to take effect. No rushed capture. No emergency overrides.
Emergency freeze
Any board member who believes they are acting under duress can trigger a freeze. All governance halts for 180 days.
Mission locked in federal law
IRS 501(c)(3) status ties Benware to its stated purpose. Changing it requires a public federal filing — months of auditable process. No backdoor.
Zero AI company funding. Ever.
Not 5%. Not 1%. Zero. If your certification status is controlled by Benware, you cannot fund Benware. Period.
No single funder above 15%
Revenue comes from many companies paying certification fees — like Wi-Fi Alliance. No single source has leverage.

The Mozilla lesson: Mozilla's mission died when Google became 85% of their revenue. Benware is structurally immune to this. Certification fees are distributed across hundreds of companies. No single funder exceeds 15%. No AI company funds us at all. Independence is not a promise — it's the architecture.

Get Involved

Join the effort.

The Authority Compliance Benchmark is open methodology. The standard is collectively governed. The work is bigger than any one organization.

Researchers

Contribute scenarios to the benchmark. Improve attack categories. Challenge our methodology.

View on GitHub

Press

Media inquiries, interview requests, and press kit access.

press@benwarefoundation.org

Governance

Join the international committee. Help set the standard. 100 seats, 51/100 majority rule.

governance@benwarefoundation.org