Do Not Kill Humans: Design Tips for AI and Blockchains
Stark · February 2026
“Know the limits of what you can verify.
Define execution within that envelope.”
I. The Superposition
In the pristine world of math, verification and execution collapse into a single operation: checking that a result is correct and computing the result are, abstractly, the same thing. In a purely abstract world, then, organizing a system around execution would be no better or worse than organizing it around verification, because the two would be interchangeable.
The catch is that computation is physical, and physics charges for it. We have no access to a purely abstract environment: even our ability to think is subject to the laws of thermodynamics, and everything else is downstream of that.
II. Where the Wave Function Collapses
Engineering is what happens when theoretical systems encounter constraints.
Materials have tolerances. Budgets have limits. Latency accumulates. Components fail. Operators make errors. Networks partition. Memory corrupts. Clocks drift. Attackers appear. Every one of these constraints breaks the symmetry between verification and execution that holds so elegantly in theory.
Between 1985 and 1987, the Therac-25 radiation therapy machine delivered massive overdoses, in some cases estimated at more than 100 times the intended dose, to at least six patients. Three died. The cause was a set of race conditions in the software that almost never surfaced during normal testing (Leveson & Turner, 1993).
The predecessor, the Therac-20, ran substantially similar software and worked safely for years. The difference was architectural. The Therac-20 used hardware interlocks: physical constraints that made most dangerous states unreachable, which meant you didn’t need to verify those impossibilities. The Therac-25 removed the interlocks. All those states became reachable, and nobody verified that the software could never enter any of them. The state space exploded, the verification budget didn’t expand to match, and the gap between what was verified and what could execute became the kill zone.
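The interlock idea translates directly into software terms. Below is a hypothetical sketch, not the Therac hardware: one gate through which mode and beam current must both pass, so the dangerous combination is unreachable rather than merely untested. All names and limits are illustrative.

```python
# Hypothetical sketch of the interlock principle: make the dangerous
# state unreachable, so no test has to cover it.
class Interlock:
    # Illustrative limits: electron mode tolerates far lower beam current.
    MAX_CURRENT_MA = {"electron": 1.0, "xray": 100.0}

    def __init__(self) -> None:
        self.mode = "electron"
        self.current = 0.0

    def configure(self, mode: str, current: float) -> None:
        """The only way to change mode or current; refuses unsafe pairs."""
        if current > self.MAX_CURRENT_MA[mode]:
            raise ValueError(f"{current} mA is not reachable in {mode} mode")
        # Both fields change together, so no interleaving can pair a
        # high-current setting with the wrong mode.
        self.mode, self.current = mode, current
```

The point is not the five lines of logic but the state space: with the gate, "electron mode at x-ray current" is not a state the rest of the program can ever observe, so it needs no verification budget at all.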
III. The Verification Budget
Every engineered system has a verification budget: the total resources allocated to confirming that the system’s outputs match its intended behavior. This budget is paid in design complexity, computational overhead, independent testing, redundant components, or human review. It is never free. And it is the first budget that gets cut under pressure, because its absence is invisible until failure occurs.
Catastrophes like this led safety-critical engineering disciplines to codify standards that now govern the design of systems where failure means death.
DO-178C, the standard governing safety-critical software in commercial aviation, contains 71 compliance objectives. 43 of them are verification objectives. But counting line items understates the reality. Each additional development objective imposes disproportionate verification costs. At the highest Design Assurance Level (Level A, for software whose failure is catastrophic), all 71 objectives apply, and many require independence: the person verifying an artifact cannot be the person who created it. The verification must be structurally separated from the execution.
This structure is not arbitrary. It reflects decades of accumulated evidence about what actually prevents aircraft from falling out of the sky.
The seL4 microkernel shows what happens when you try to verify general-purpose execution honestly. seL4 is roughly 8,700 lines of C. The team set out to formally verify it: to produce a machine-checked mathematical proof, in the theorem prover Isabelle/HOL, that the code does exactly what its specification says. Building the kernel took about 2.2 person-years. Verifying it took 20. The ratio is 1:9. For every year spent building, nine years were spent checking. The verification itself runs to 200,000 lines, more than twenty times the size of the code it checks (Klein et al., SOSP 2009).
This ratio should disturb you. If verification is supposed to be lighter than construction, how did a world-class team end up spending nine times more effort checking their work than doing it?
The answer is architectural. seL4 is general-purpose C code. It does arbitrary things with memory, pointers, concurrency. Verifying that any possible execution path through unconstrained C behaves correctly is inherently more expensive than writing the C, because the verification must account for every state the execution could reach. The more expressive the execution, the more expensive the verification. The 9:1 ratio is not a failure of the verification method. It is the cost of verifying unconstrained execution.
Now compare Bitcoin. Verifying a block’s proof-of-work requires computing a single SHA-256 hash and checking that the result falls below a target threshold. One hash. The miner who produced that block may have computed trillions of hashes to find it. The verification-to-execution ratio is not 9:1. It is closer to 1:1,000,000,000,000. Verification is trivially cheap because the architecture was designed to make it trivially cheap. Execution was constrained to a form (hash preimage search) where checking the answer costs almost nothing.
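The asymmetry is easy to demonstrate. The following is a minimal Python sketch, not consensus-exact Bitcoin (real validation also checks the header's encoded target, timestamps, and chain context, and the toy difficulty and names here are illustrative), but the hash-twice-and-compare check is the real shape of proof-of-work verification:

```python
import hashlib

def verify_pow(header: bytes, target: int) -> bool:
    """The cheap side: two SHA-256 computations and one comparison."""
    digest = hashlib.sha256(hashlib.sha256(header).digest()).digest()
    return int.from_bytes(digest, "big") < target

def mine(prefix: bytes, target: int) -> bytes:
    """The expensive side: search nonces until a hash clears the target."""
    nonce = 0
    while True:
        candidate = prefix + nonce.to_bytes(4, "little")
        if verify_pow(candidate, target):
            return candidate
        nonce += 1

# Toy difficulty: roughly 1 in 8,192 hashes succeeds, so the miner
# performs thousands of hashes while the verifier performs one check.
toy_target = 2 ** 243
block = mine(b"toy-header-prefix|", toy_target)
assert verify_pow(block, toy_target)
```

Raise the difficulty and only `mine` gets slower; `verify_pow` stays constant-time. That is the designed-in asymmetry the section describes.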
These two systems bracket the design space. seL4 represents the cost of thorough verification applied to expressive execution. Bitcoin represents the cost of verification when execution is constrained to a verifiable envelope. The difference is not effort or talent. It is architecture. One system asked “what do we want to execute?” and then tried to verify it. The other asked “what can we verify cheaply?” and constrained execution to fit.
The Mathematical Boundary
The seL4 ratio is not an anomaly. It is a concrete measurement of a boundary that computer science has known about for decades. The State Explosion Problem (Clarke & Emerson, 1981) established that verification states grow exponentially with system complexity. Rice’s Theorem (1953) proved that no algorithm can decide any non-trivial semantic property of arbitrary programs. Gödel’s Incompleteness Theorems (1931) proved that sufficiently powerful systems cannot fully verify themselves from within their own rules. These results converge on a single implication: the more expressive the execution, the more expensive (and eventually impossible) the verification. The seL4 numbers are what this looks like in practice. Push the curve into unconstrained, general-purpose computation and the wall doesn’t move. You just hit it harder.
This is not a funding problem or a tooling problem. It is a mathematical wall. Every smart contract platform that promises both Turing-complete execution and trustless verification is implicitly promising to have overcome results that have been settled mathematics for the better part of a century.
IV. The Invariant That Engineering Discovered Empirically
Standards like DO-178C in aviation, and their equivalents across nuclear, automotive, and medical device engineering, converge on the same architectural principle:
Verification must be structurally independent of execution, and the cost of verification must be budgeted as a primary design constraint.
When verification depends on the same mechanisms as execution, failures in execution corrupt verification simultaneously. When verification cost is not budgeted upfront, it gets deferred until the architecture has calcified around assumptions that make verification intractable. When verification cost exceeds available resources, it gets replaced by trust: trust in the developer, trust in the operator, trust in the vendor.
The avionics engineer designing a fly-by-wire control system does not ask: “What is the most expressive control law I can execute?” They ask: “What can I verify to the level required by DO-178C, and how do I constrain execution to remain within that verification envelope?” The verification budget comes first. Execution is then designed to fit within it.
The nuclear reactor control system does not run arbitrary programs and then check if the outputs are safe. It defines a safety envelope (temperature bounds, neutron flux limits, coolant flow rates) and constrains all execution to remain within that envelope. The constraints are verified formally. The execution is trivial by design. Not because the engineers lack the ability to build more complex control systems, but because complex execution makes verification expensive, and expensive verification means unverified operation, and unverified operation means meltdown.
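The envelope pattern the paragraph describes fits in a few lines. This is an illustrative sketch only (the bounds, names, and fail-safe choice are invented for the example, not real reactor parameters): an arbitrarily complex controller may propose anything, but a trivially verifiable guard decides what reaches the actuator.

```python
# Hypothetical safety envelope: all values are illustrative.
SAFE_TEMP_C = (260.0, 330.0)      # coolant temperature bounds
SAFE_FLOW_KG_S = (400.0, 900.0)   # coolant flow bounds

def within(value: float, bounds: tuple[float, float]) -> bool:
    lo, hi = bounds
    return lo <= value <= hi

def guarded_command(proposed_flow: float, temp: float) -> float:
    """The verifiable layer: a few comparisons, no loops, no state.

    Whatever the (unverified) controller proposes, the command that
    leaves this function is inside the envelope.
    """
    if not within(temp, SAFE_TEMP_C):
        return SAFE_FLOW_KG_S[1]  # out of envelope: fail to maximum cooling
    lo, hi = SAFE_FLOW_KG_S
    return min(max(proposed_flow, lo), hi)  # clamp into the envelope
```

The guard is small enough to verify exhaustively, so the controller behind it never needs to be.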
This pattern was not invented by theorists. It was extracted from wreckage.
V. Two Frontier Domains That Skipped the Class
Two of the most consequential technology domains of the past fifteen years, blockchain systems and large language models, were driven by academic research, applied mathematics, and startup culture. These are domains optimized for speed of iteration, expressiveness, and capability demonstration.
Both domains made the same architectural commitment: they started from execution and attempted to add verification afterward.
The blockchain case
The Bitcoin whitepaper (2008) describes a system organized around a specific verification constraint: “the average work required is exponential in the number of zero bits required and can be verified by executing a single hash.” The entire architecture flows from this asymmetry. Proof-of-work is expensive to produce, cheap to verify. Merkle trees make inclusion proofs logarithmic. Transaction validation is simple arithmetic: addition and subtraction on account balances. The execution was deliberately constrained to what could be cheaply verified.
The whitepaper explicitly promised that this architecture would support lightweight clients: “it is possible to verify payments without running a full network node.” Simplified Payment Verification (Section 8) describes a mechanism where clients download only block headers, 80 bytes each, and request Merkle proofs of inclusion for specific transactions. The verification cost scales logarithmically with the number of transactions, not linearly. A user with a phone can verify a payment without trusting anyone.
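The SPV check can be sketched as follows. This is a simplified single-SHA-256 version (Bitcoin itself double-hashes and derives sibling order from the transaction's index; the tree-building and helper names here are illustrative), but it shows the logarithmic shape: one leaf plus log2(n) sibling hashes reproduce the root.

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def verify_inclusion(leaf: bytes, proof: list, root: bytes) -> bool:
    """Recompute the Merkle root from a leaf and its sibling path.

    `proof` is a list of (sibling_hash, side) pairs, leaf-to-root,
    where side says which side the sibling sits on.
    """
    node = h(leaf)
    for sibling, side in proof:
        node = h(sibling + node) if side == "left" else h(node + sibling)
    return node == root

# Build a 4-transaction tree to produce a root and a proof for "tx2".
leaves = [h(tx) for tx in (b"tx0", b"tx1", b"tx2", b"tx3")]
l01, l23 = h(leaves[0] + leaves[1]), h(leaves[2] + leaves[3])
root = h(l01 + l23)
proof_for_tx2 = [(leaves[3], "right"), (l01, "left")]
assert verify_inclusion(b"tx2", proof_for_tx2, root)
```

A block with a million transactions needs only about twenty sibling hashes per proof, which is why a phone holding 80-byte headers can verify a payment without trusting anyone.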
Between 2014 and 2016, when the industry began pursuing general-purpose computation on blockchains, this invariant was abandoned. The design question shifted from “what can we verify cheaply?” to “what can we execute expressively?” The answer was a Turing-complete “world computer” where consensus requires every node to re-execute every transaction. Verification cost became a function of execution complexity.
The consequences were predictable to anyone familiar with high-stakes engineering.
A Fluence developer survey found that 63% of Ethereum developers use Infura, a centralized RPC provider operated by ConsenSys, as their primary connection to the blockchain. Most users interact with the Ethereum network not by verifying state themselves, but by trusting a single corporate API endpoint. CoinMarketCap’s infrastructure analysis asked: “So why do most DApp interfaces not get their data from Ethereum nodes directly?” and observed that “it’s clear that if DApps simply didn’t rely on Infura, the impact from the (November 2020) chain split would’ve been reduced drastically.” The answer is architectural: verification in an execution-first system requires re-execution, and re-execution is expensive. Running an Ethereum archive node requires terabytes of storage and days or weeks of synchronization.
Users did not choose custodians because they were unsophisticated. They chose custodians because the architecture made independent verification more expensive than trust. The verification budget was not allocated. Trust re-entered through the cheapest available path.
The AI case
Large language models represent an even purer case of execution-first design. The entire paradigm is: train a massive neural network (execution infrastructure), generate outputs (execution), and then attempt to determine whether those outputs are correct (verification). Every verification mechanism in the current AI stack, RLHF, red-teaming, constitutional AI, chain-of-thought auditing, is a post-hoc addition to this architecture. None of them scale with model capability. Human verification capacity is fixed. Model capability is not.
The scaling failure is measurable. Spracklen et al. (USENIX Security ‘25) generated 576,000 code samples across 16 LLMs and found that open-source models hallucinated package names, referencing libraries that do not exist in any repository, at an average rate of 21.7%. Commercial models averaged 5.2%. These are not subtle logical errors. They are references to nonexistent artifacts, and no verification step in the architecture catches them. An adversary can register the hallucinated name, populate it with malicious code, and wait.
The more dangerous dynamic is at the human layer. As models improve, their outputs pass casual inspection more reliably. Surface accuracy suppresses the instinct to verify at precisely the moment verification is becoming more expensive. A developer using an LLM to write code would need to read and understand every line to fully verify it. At that point, the LLM has provided negative value. So the developer spot-checks, runs the code, sees if it works, and ships. This is probabilistic verification converging on trust.
What remains is a verification debt, quietly compounding in the depths of these systems, invisible because the surfaces have never looked better. The debt has a specific form: power concentration. Every user who cannot afford to verify delegates trust to whoever offers the cheapest substitute. The verification gap becomes a trust dependency, the trust dependency becomes market consolidation, and the consolidation becomes a control surface. By the time the surfaces stop looking good, the oligarchy has already solidified.
AI safety and alignment research has largely inherited the same assumption that produced the crisis: execution comes first, verification can follow. The State Explosion Problem, Rice’s Theorem, and Gödel’s incompleteness theorems independently show that verifying unconstrained execution is not an open engineering challenge. It is a mathematical impossibility.
VI. The Architecture That Respects the Boundary
The asymptotic boundary described above is not a problem to be solved. It is a constraint to be respected. You cannot make general-purpose computation cheaply verifiable at the protocol layer. You can only choose where to put the boundary between what the protocol verifies and what gets computed elsewhere.
The wrong response is to ignore the impossibility boundary and build a general-purpose virtual machine at the consensus layer, which is what almost every smart contract platform did. The result is predictable, and once you see it, the pattern is tiresomely repetitive: verification cost scales with execution complexity, nodes must re-execute everything, light clients cannot verify independently, and trust re-enters.
The right response is to ask: what must a protocol verify, and what can be pushed outside the verification envelope without losing the properties that matter?
Know the limits of what you can verify. Define execution within that envelope.
At the protocol layer, constrain operations to ordering, state commitments, and consensus. None of these require general-purpose computation. Ordering transactions is not Turing-complete. Committing to a state root is a hash. Verifying consensus is signature checking. Verification is cheap, bounded, and independent of what the transactions actually do. This is the verification envelope.
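The envelope's three operations can be sketched concretely. In this hypothetical Python fragment, all names are illustrative and HMAC stands in for real validator signatures (a production system would use something like Ed25519); the point is the shape of the checks: each is a hash or a comparison whose cost is independent of what the transactions computed.

```python
import hashlib
import hmac

def state_root(state: bytes) -> bytes:
    """Committing to a state is one hash, however large the state."""
    return hashlib.sha256(state).digest()

def verify_commitment(claimed_root: bytes, state: bytes) -> bool:
    """One hash and one comparison, regardless of how expensive
    producing `state` was."""
    return hmac.compare_digest(claimed_root, state_root(state))

def verify_quorum(root: bytes, votes: dict, keys: dict, threshold: int) -> bool:
    """Consensus check: count validators whose vote on `root` verifies.

    HMAC is a stand-in for signatures here; the check's cost scales
    with validator count, not with execution complexity.
    """
    valid = sum(
        1 for validator, sig in votes.items()
        if validator in keys
        and hmac.compare_digest(
            sig, hmac.new(keys[validator], root, "sha256").digest())
    )
    return valid >= threshold
```

Nothing in these functions re-executes a transaction, which is the property the section is after: verification cost stays flat while execution, pushed outside the envelope, is free to be as expressive as it likes.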
At the edge, execution can happen in constrained, purpose-built environments, where the verification cost is manageable because the execution is not general-purpose.
VII. The Cost of Repeating History
When verification cost is not budgeted as a primary architectural constraint, it gets deferred. When it gets deferred, the architecture calcifies around unverifiable assumptions. When the architecture is unverifiable, trust re-enters through the cheapest available path. When trust re-enters, the system loses the very properties that justified its existence.
Bitcoin was designed to eliminate trusted intermediaries. Most Bitcoin users now interact through custodial services and centralized exchanges. Ethereum was designed to enable trustless computation. Most Ethereum users interact through Infura, Alchemy, or similar centralized API providers. Large language models were developed to augment human capability. Most LLM users cannot verify model outputs, and fully verifying them would cost more than the models save.
None of these outcomes represent user failure. All of them represent architectural failure: the failure to allocate a verification budget before determining execution architecture.
Know the limits of what you can verify.
Define execution within that envelope.
Systems that follow this principle maintain verifiability as they scale. Systems that do not follow it lose verifiability as they scale, and no amount of retroactive engineering recovers what was lost in the initial architectural commitment.
The author is a contributor to Zenon Network, which implements the architectural pattern described in Section VI: separating consensus ordering from execution, with state commitments verified independently of execution complexity. Readers interested in a concrete instantiation of verification-first design may find it worth examining.
References
Klein, G., Elphinstone, K., Heiser, G., Andronick, J., Cock, D., Derrin, P., Elkaduwe, D., Engelhardt, K., Kolanski, R., Norrish, M., Sewell, T., Tuch, H., & Winwood, S. (2009). “seL4: Formal Verification of an OS Kernel.” Proceedings of the 22nd ACM Symposium on Operating Systems Principles (SOSP ‘09), 207-220.
Leveson, N.G. & Turner, C.S. (1993). “An Investigation of the Therac-25 Accidents.” IEEE Computer 26(7), 18-41.
Leveson, N.G. (2017). “The Therac-25: 30 Years Later.” IEEE Computer 50(11), 8-11.
Nakamoto, S. (2008). “Bitcoin: A Peer-to-Peer Electronic Cash System.” bitcoin.org/bitcoin.pdf
RTCA. (2011). DO-178C: Software Considerations in Airborne Systems and Equipment Certification. RTCA, Inc.
IEC. (2010). IEC 61508: Functional Safety of Electrical/Electronic/Programmable Electronic Safety-Related Systems. International Electrotechnical Commission.
Lamport, L. (1978). “Time, Clocks, and the Ordering of Events in a Distributed System.” Communications of the ACM 21(7), 558-565.
Schneider, F.B. (1990). “Implementing Fault-Tolerant Services Using the State Machine Approach: A Tutorial.” ACM Computing Surveys 22(4), 299-319.
Spracklen, J., Wijewickrama, R., Sakib, A.H.M.N., Maiti, A., Viswanath, B., & Jadliwala, M. (2025). “We Have a Package for You! A Comprehensive Analysis of Package Hallucinations by Code Generating LLMs.” Proceedings of the 34th USENIX Security Symposium.
Yin, J., Martin, J.P., Venkataramani, A., Alvisi, L., & Dahlin, M. (2003). “Separating Agreement from Execution for Byzantine Fault Tolerant Services.” Proceedings of the 19th ACM Symposium on Operating Systems Principles (SOSP ‘03), 253-267.