AI Foundations for Policymakers

Conservative AI Policy Fellowship – Summer 2026 Edition

About this handbook

This handbook introduces the fundamentals of AI, explains how modern AI systems like neural networks learn and make decisions, and surveys the most consequential developments through mid-2026 — from reasoning models and autonomous coding agents to differential-access governance experiments like Project Glasswing, the H200 export-license regime, and the unfolding AI IPO wave. By the end you should have a working understanding of how AI works, what it can (and can't) do, who the major players are, and which policy questions are on the near horizon.

Table of Contents

Chapter 1: What Is Computation?
Chapter 2: Machine Learning
Chapter 3: What Is Learning, Anyway? (Optimization and Training Dynamics)
Chapter 4: Semantic Understanding and Generative AI
Chapter 5: Emergent Abilities and the Power of Scale
Chapter 6: Reasoning Models – Chain-of-Thought and Advanced Prompting
Chapter 7: The Alignment Problem – Ensuring AI Systems Reflect Human Values
Chapter 8: Safety Beyond Misalignment
Chapter 9: The AI Industry Landscape (Who's Leading and What They're Doing)
Chapter 10: US AI Policy
Chapter 11: China and AI Competition
Chapter 12: AI and the Labor Market
Chapter 13: AI and Scientific Discovery

Chapter 1: What Is Computation?

The Discovery of Computation

In 1936, mathematician Alan Turing introduced the concept of a Turing Machine – an abstract device that can compute anything that is computable. Turing's insight was profound: any problem that can be described in a series of logical steps can be solved by such a machine. This was the birth of the general-purpose computer – a single device that, given the right instructions (a program), can perform any computation.

At its core, computation involves taking an input, processing it through a set of instructions (an algorithm), and producing an output. A general-purpose computer can perform any task that can be described algorithmically, simply by loading a different program – which is why your laptop can run a game one minute and a spreadsheet the next.

From Logic to Silicon

Turing and other pioneers realized that any computation could be broken down into simple logical operations like AND, OR, and NOT. Consider an AND gate: it outputs 1 only if both its inputs are 1 (true AND true = true). An OR gate outputs 1 if either input is 1. A NOT gate inverts its input. From just these three building blocks, you can construct any computation—a calculator, a chess engine, or an AI system.
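As a minimal sketch of how the three gates compose, the Python below builds an XOR gate and a one-bit adder from AND, OR, and NOT alone. The function names (AND, XOR, half_adder, and so on) are chosen for illustration and are not taken from any library.

```python
# The three basic gates, operating on bits (0 or 1).
def AND(a, b):
    return a & b

def OR(a, b):
    return a | b

def NOT(a):
    return 1 - a

# XOR ("one or the other, but not both") built only from the gates above.
def XOR(a, b):
    return AND(OR(a, b), NOT(AND(a, b)))

# A half adder adds two bits and returns (sum_bit, carry_bit).
def half_adder(a, b):
    return XOR(a, b), AND(a, b)

# Print the full truth table for adding two bits.
for a in (0, 1):
    for b in (0, 1):
        s, c = half_adder(a, b)
        print(f"{a} + {b} -> sum={s}, carry={c}")
```

Chaining adders like this one, bit by bit, is how a processor adds larger numbers; everything else a chip does is built up from similar combinations.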

These logical operations can be made physical using tiny switches called transistors: semiconductor devices that can be either on or off (representing 1 or 0). By combining transistors into logic gates, we build processors. Early chips had thousands of transistors. Today's chips pack tens of billions into an area about the size of a fingernail: Apple's M4 chip has 28 billion and Nvidia's larger H100 has 80 billion, while the newest AI accelerators exceed 100 billion. This miniaturization, with transistor features now measured in nanometers (smaller than many viruses), enables the computational density AI requires.

Moore's Law and the AI Boom

In 1965, Gordon Moore, who later co-founded Intel, observed that the number of transistors on a microchip was doubling at a steady pace, a rate he later revised to roughly every two years. This observation, Moore's Law, held for decades. The result: computing power increased exponentially from the 1970s through the 2010s. A task that took a room-sized computer in the 1960s can now be done on a smartphone chip billions of times faster.
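To make the compounding concrete, here is a small sketch of the arithmetic, assuming one clean doubling every two years (real progress was less regular). It counts only transistor doublings; faster clock speeds and better chip designs added further gains on top.

```python
# Compound growth under Moore's Law: one doubling every two years (assumed).
def growth_factor(years, doubling_period_years=2):
    doublings = years / doubling_period_years
    return 2 ** doublings

# Fifty years of doubling every two years is 25 doublings.
factor_50_years = growth_factor(50)
print(f"Growth over 50 years: about {factor_50_years:,.0f}x")
```

Twenty-five doublings multiply the starting figure by more than 33 million, which is why a steady-sounding rule produced such dramatic change.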

Why is AI booming now? Moore's Law is a big part of the answer. Many AI techniques were theorized decades ago but weren't practical until computers became powerful enough. The AI revolution has been enabled by fast processors (GPUs and TPUs specialized for AI), massive memory and storage, and the ability to process huge datasets – all riding on cumulative doublings of computational power.

Moore's Law has been slowing as physical limits of transistor miniaturization are approached. The industry is innovating with specialized AI chips and parallel computing architectures. Policymakers should be aware that exponentially increasing compute power may not continue indefinitely – but for now, we have more computing muscle than ever, and AI models eagerly consume it.

Key Takeaways (Chapter 1):

Computation means transforming inputs into outputs via defined rules (algorithms). Any computable task can be broken into simple logical steps.

Transistors and logic gates are the physical building blocks that implement computation in silicon.

Moore's Law: Computing power doubled roughly every two years for decades. This exponential growth enabled today's AI – many algorithms were impractical until sufficient computing power became available.

Modern AI's emergence is tied to this abundance of computation, which allows training of large neural networks.

Chapter 2: Machine Learning

From Hand-Coded Rules to Learning Systems

In the early days of AI (1950s–1980s), researchers tried symbolic AI – explicitly programming rules and logic. This approach proved brittle and impractical for complex tasks because the real world has too many nuances and exceptions. Machine learning takes a different approach: instead of programming the rules directly, we program a framework that can learn rules from data.

The dominant framework in modern AI is the artificial neural network. Inspired loosely by the human brain, a neural network is composed of many simple units called neurons connected together. Each artificial neuron takes inputs, multiplies them by learned weights (indicating each input's importance), sums them up, and applies an activation function to produce an output.
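A single artificial neuron can be sketched in a few lines of Python. The input values, the weights, the bias term, and the choice of a sigmoid activation are all assumptions made for illustration; real networks use many activation functions and learn their weights from data.

```python
import math

def sigmoid(x):
    # Squashes any number into the range (0, 1).
    return 1.0 / (1.0 + math.exp(-x))

def neuron(inputs, weights, bias):
    # Weighted sum of the inputs, plus a bias term, then the activation.
    total = sum(x * w for x, w in zip(inputs, weights)) + bias
    return sigmoid(total)

# Hypothetical values: three inputs and three learned weights.
inputs = [0.5, 0.8, 0.2]
weights = [1.0, -2.0, 0.5]
output = neuron(inputs, weights, bias=0.1)
print(round(output, 3))
```

The negative weight on the second input means that input pushes the neuron's output down; a positive weight pushes it up.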

Deep Networks and Layered Abstraction

A single neuron isn't very powerful. The magic happens when we connect neurons into layers:

Input layer: Receives the raw data (e.g., pixel values from an image)
Hidden layers: Internal layers that transform the data through successive abstractions
Output layer: Produces the final prediction or decision

When a network has many hidden layers, we call it a deep neural network—hence deep learning. Deep networks can capture very complex patterns because each layer learns increasingly abstract representations.

The cat/dog classifier: Consider training a network to distinguish cats from dogs. You feed it millions of labeled photos. Early layers learn to detect edges and textures. Middle layers combine these into parts—pointed ears, whiskers, wet noses. Deeper layers assemble parts into concepts: "cat-like face" or "dog-like snout." The final layer produces a probability: 95% cat, 5% dog. Crucially, nobody programmed what a cat looks like. The network discovered the relevant features by finding patterns that predict the labels.

How Neural Networks Learn

The real challenge is setting the weights so the network does something useful. The learning process works as follows:

1. Training data: Collect examples of the task (e.g., thousands of labeled images)
2. Loss function: Define a measure of how wrong the network's predictions are
3. Forward pass: Feed an input through the network to get a prediction
4. Backpropagation: Propagate the error backward through the network, calculating how much each weight contributed to the mistake
5. Gradient descent: Nudge each weight in the direction that reduces the error
6. Iterate: Repeat on thousands of examples until performance plateaus

Through this process, the network gradually improves – it has learned the task by adjusting internal parameters, without ever being explicitly programmed on what to look for.
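The six steps can be sketched end to end with the smallest possible "network": one neuron learning the logical OR of two bits. This is a toy under stated assumptions (a sigmoid activation, a cross-entropy loss, a learning rate of 0.5, 2,000 passes over the data), not how production systems are trained, but the loop has the same shape.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# 1. Training data: inputs with labels (here, the logical OR of two bits).
data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]

weights = [0.0, 0.0]
bias = 0.0
learning_rate = 0.5  # assumed step size

def predict(x):
    # 3. Forward pass: weighted sum, then activation.
    return sigmoid(weights[0] * x[0] + weights[1] * x[1] + bias)

def average_loss():
    # 2. Loss function: cross-entropy; lower means less wrong.
    total = 0.0
    for x, y in data:
        p = predict(x)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(data)

loss_before = average_loss()

# 6. Iterate over the examples many times.
for epoch in range(2000):
    for x, y in data:
        p = predict(x)
        # 4. Backpropagation: for this one-neuron model the error signal is
        #    (p - y), and each weight's share of the blame is error * input.
        error = p - y
        # 5. Gradient descent: nudge each parameter against its gradient.
        weights[0] -= learning_rate * error * x[0]
        weights[1] -= learning_rate * error * x[1]
        bias -= learning_rate * error

loss_after = average_loss()
print(f"loss before: {loss_before:.3f}, after: {loss_after:.3f}")
print([round(predict(x)) for x, _ in data])
```

Nothing in the code states the rule for OR; the weights that reproduce it are found by repeatedly reducing the error.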

Supervised vs Unsupervised Learning

Supervised learning uses labeled training data – each example comes with the "right answer" provided by humans. This is powerful but labor-intensive; you need millions of labeled examples for many tasks.

Unsupervised (or self-supervised) learning finds patterns in unlabeled data. The most successful form is self-supervised learning, where the data provides its own supervision. Modern language models use this approach: they're trained to predict the next word in a sentence. By doing so on billions of sentences, they implicitly learn grammar, facts, and reasoning – because predicting correctly requires understanding context. This paradigm of pre-training on vast unlabeled data, then fine-tuning on specific tasks, powers most modern AI.

Reinforcement learning (RL) is another paradigm where an AI learns from rewards and penalties rather than labeled examples. Think of training a dog: you don't show the dog millions of labeled examples of "sit." Instead, when the dog sits on command, you give a treat (reward); when it doesn't, no treat (penalty). Over time, the dog learns the behavior that maximizes treats. RL works similarly: AlphaGo learned to play Go by playing millions of games against itself, receiving only win/loss signals. RL is powerful for sequential decision-making but requires careful design of reward functions—if you reward the wrong thing, you get unexpected behaviors.

The Black Box Problem

Neural networks are often black boxes: we know they work (they give good results on test data), but deciphering why they made a particular decision is hard. Unlike a rules-based system, a trained network doesn't yield human-readable explanations. This lack of transparency can be problematic in high-stakes applications – if an AI denies someone a loan, extracting the precise reason from the network's internals requires special interpretability methods, which we discuss later.

Key Takeaways (Chapter 2):

Machine Learning vs. Explicit Programming: Instead of coding rules, we let the system learn rules from data using neural networks.

Deep Learning: Networks with many layers learn hierarchical representations – from simple features to complex concepts.

Learning by Backpropagation: Networks adjust their weights to minimize prediction errors, gradually improving through thousands of training examples.

Supervised vs Unsupervised: Supervised learning requires labeled data; self-supervised learning (like next-word prediction) enables training on vast unlabeled datasets.

Generalization: The goal is for models to handle new, unseen data – not just memorize training examples.

Chapter 3: What Is Learning, Anyway?

Learning as Optimization

When we say a neural network "learns," what's really happening is optimization. The network has parameters (weights) that determine its behavior, and a loss function that measures how wrong its predictions are. Learning is the process of tweaking parameters to minimize that loss.

Think of a blind hiker trying to reach the lowest point in a hilly landscape. The height corresponds to the model's error; the hiker's position corresponds to the current weights. Using gradient descent, the hiker feels the slope and steps downhill – analogous to computing how each weight contributes to the error and nudging it in the direction that reduces the loss. Over thousands of iterations, the model finds a good configuration that makes few errors.
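The hiker analogy can be written out for a landscape with a single weight. The loss function here (a simple bowl with its lowest point at 3), the starting position, and the step size are all hypothetical; real models have billions of weights, but each one is updated by the same rule.

```python
# A hypothetical one-parameter "landscape": error is lowest at weight = 3.
def loss(w):
    return (w - 3.0) ** 2

def slope(w):
    # Derivative of the loss: tells the hiker which way is uphill.
    return 2.0 * (w - 3.0)

w = -4.0             # starting position (arbitrary)
learning_rate = 0.1  # step size (assumed)

for step in range(100):
    w -= learning_rate * slope(w)  # step downhill

print(round(w, 4), round(loss(w), 8))
```

Each step moves the weight a fraction of the way toward the bottom, so the hiker closes in quickly at first and then more slowly as the ground flattens.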

Why Self-Supervised Learning Scales

Supervised learning requires human-provided labels for each training example – powerful but labor-intensive. For specialized tasks like medical diagnosis, getting enough labeled data is often the bottleneck.

Self-supervised learning is why modern AI has scaled so dramatically. The data provides its own supervision: language models are trained to predict the next word in a sentence. No human labels needed – the text itself contains the answers. By doing this on billions of sentences, models implicitly learn grammar, facts, and reasoning, because predicting correctly requires understanding context.
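A toy version shows how raw text supplies its own labels. The sketch below swaps in simple word counts for a neural network and uses a three-sentence made-up corpus, so it illustrates only the training setup, not the learning power of a real language model.

```python
from collections import Counter, defaultdict

# A tiny stand-in for "billions of sentences" (hypothetical corpus).
text = "the cat sat on the mat . the dog sat on the rug . the cat chased the dog ."
words = text.split()

# Self-supervision: each word's "label" is simply the word that follows it.
training_pairs = list(zip(words[:-1], words[1:]))

# "Training": count which word tends to follow each word.
next_word_counts = defaultdict(Counter)
for current, following in training_pairs:
    next_word_counts[current][following] += 1

def predict_next(word):
    # Return the most frequently observed follower.
    return next_word_counts[word].most_common(1)[0][0]

print(training_pairs[:3])
print(predict_next("sat"))
```

No person labeled anything here: every training pair came directly from the text, which is what lets the approach scale to enormous datasets.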

After this pre-training, models can be fine-tuned with reinforcement learning from human feedback (RLHF): humans rank the AI's outputs, and the model learns to prefer higher-ranked responses. This is how chatbots are aligned to be more helpful and less harmful.

Reward hacking: A danger of RL is that systems can find unexpected ways to maximize rewards without achieving the intended goal. In a classic example, a boat-racing video game AI was rewarded for points. It discovered that going in circles collecting a bonus was more efficient than actually finishing the race—maximizing the formal objective while violating its spirit. For AI systems, this means careful reward design is essential: if you reward the wrong thing, the AI will optimize for it.
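The logic of reward hacking fits in a few lines. The point values below are invented for illustration; the pattern to notice is that the optimizer only ever sees the number it is told to maximize.

```python
# Hypothetical scores for two strategies in a boat-racing game.
# The designer wants the boat to finish the race; the reward only counts points.
strategies = {
    "finish the race": {"points": 100, "finishes_race": True},
    "circle the bonus area": {"points": 250, "finishes_race": False},
}

def reward(outcome):
    # The formal objective the agent sees: points, nothing else.
    return outcome["points"]

# An optimizer simply picks whatever scores highest under the stated reward.
chosen = max(strategies, key=lambda name: reward(strategies[name]))
print(chosen, "| finishes race:", strategies[chosen]["finishes_race"])
```

The designer's real goal appears in the data but not in the reward function, so it has no influence on the choice.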

Prediction Machines

An influential way to understand modern AI is as prediction machines. When a language model answers a question, it's not following programmed rules – it's predicting what a plausible continuation would be, based on patterns absorbed during training.

Consider: "The trophy couldn't fit into the suitcase because it was too small." A model correctly identifies that "it" refers to the suitcase, not the trophy – demonstrating commonsense understanding that emerged purely from predicting words in context. The act of prediction forces the model to develop internal representations of meaning.

Early skeptics asked: "How can predicting text result in understanding?" Yet these models can solve problems, write code, and reason through novel situations. To predict well across diverse data, a model must learn the structure of the world described by that data. This is why the line between "statistical parroting" and "exhibiting intelligence" has become surprisingly thin.

Key Takeaways (Chapter 3):

Learning = Optimization: Training minimizes a loss function by adjusting weights via gradient descent. The goal is parameters that generalize well to new data.

Self-supervised learning enables training on vast unlabeled data (such as large portions of the public internet). By predicting the next word, models learn grammar, facts, and reasoning.

RLHF fine-tunes models based on human preferences, aligning them to be helpful and safe.

Prediction yields understanding: To predict well, models must learn the structure of the world – which is why next-word prediction produces capable AI systems.

Bias and Objectives: Models faithfully learn patterns in their training data, including biases. If the data or objectives are flawed, the model's behavior will reflect that.

Chapter 4: Semantic Understanding and Generative AI

Generative AI: From Analysis to Creation

Neural networks can do more than classify—they can generate new content. Generative AI refers to models that create new text, images, or audio rather than just analyzing existing data. Text models like ChatGPT generate by predicting one word at a time; image generators like DALL-E and Stable Diffusion decode from learned internal representations to create novel images.

How image generation works: Two main approaches have emerged. Generative Adversarial Networks (GANs) pit two networks against each other: a "generator" creates fake images while a "discriminator" tries to distinguish fakes from real images. Through this adversarial game, the generator learns to produce increasingly realistic outputs. Diffusion models (used by DALL-E 3 and Stable Diffusion) work differently: they learn to gradually remove noise from pure static, essentially learning what realistic images look like by learning what noise doesn't belong. Both approaches have produced remarkably photorealistic results.

The key insight: a network that can recognize a cat has learned internal features representing "cat-ness." Generative models tap into these learned representations to produce new content that exhibits those same features in novel combinations.

Embeddings and Latent Space

Neural networks learn embeddings – representations of concepts as vectors of numbers in a high-dimensional space where distance reflects semantic similarity. Words with similar meanings end up near each other. A famous result: in a well-trained embedding space, "King – Man + Woman ≈ Queen." The model learned abstract concepts of gender and royalty purely from reading text – no one explicitly programmed this relationship.
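The analogy arithmetic can be shown with hand-made vectors. These two-dimensional embeddings are invented for illustration (one axis loosely standing for royalty, the other for gender); real embeddings have hundreds or thousands of dimensions and are learned, not written by hand.

```python
import math

# Hand-made 2-D "embeddings" (hypothetical): axis 0 ~ royalty, axis 1 ~ gender.
embeddings = {
    "king":  [0.9,  0.8],
    "queen": [0.9, -0.8],
    "man":   [0.1,  0.8],
    "woman": [0.1, -0.8],
    "apple": [-0.7, 0.0],
}

def cosine_similarity(u, v):
    # 1.0 means the vectors point the same way; negative means opposite.
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# king - man + woman, computed coordinate by coordinate.
target = [k - m + w for k, m, w in
          zip(embeddings["king"], embeddings["man"], embeddings["woman"])]

# Find the nearest word, excluding the three words used in the query.
candidates = {w: v for w, v in embeddings.items()
              if w not in ("king", "man", "woman")}
nearest = max(candidates, key=lambda w: cosine_similarity(target, candidates[w]))
print(nearest)
```

Subtracting "man" and adding "woman" flips the gender axis while leaving the royalty axis alone, which lands the result next to "queen."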

This embedding space is called a latent space because it underlies the model's understanding without being directly observable. In an image model, the latent space might encode dimensions for "has fur," "orange color," "four-legged" – combinations that distinguish an orange cat from a gray dog. In language models, dimensions might correspond to sentiment, formality, or topic.

Combining Concepts: The Avocado Chair

A powerful property of latent spaces is interpolation – smoothly transitioning between concepts. Even more striking is combination: when researchers prompted DALL-E with "an armchair in the shape of an avocado," it produced coherent images of exactly that. The model had learned "armchair" and "avocado" separately, then synthesized a hybrid it had never seen in training.

This isn't simple cut-and-paste – it's semantic synthesis. The AI understood what makes an avocado (seed, shape, green skin) and what makes a chair (seat, back, legs) and fused them coherently. This ability to combine concepts illustrates a form of creativity or generalization beyond training examples.

Does AI Really "Understand"?

A pragmatic view: if a model uses a concept in ways indistinguishable from a human, it has "understood" that concept for practical purposes. Frontier models can explain jokes, detect sarcasm, follow long narratives, and pass professional exams. They've learned the semantics needed for these tasks.

Yet they can also fail in bizarre ways – their understanding isn't perfect or human-like. They lack physical experience and true world grounding, knowing only statistical correlations from text. Policymakers should know that today's AI can grasp and manipulate many abstract concepts while still making errors a human wouldn't.

Policy Implications

AI's generative capabilities raise important questions:

Intellectual property: Are AI-generated works just recombinations of training data, or something genuinely new?
Authenticity: How do we handle deepfakes and machine-generated content that's indistinguishable from human work?
Verification: AI outputs may sound correct while being subtly wrong – how do we verify them in high-stakes contexts?

Key Takeaways (Chapter 4):

Generative AI creates new content by learning rich internal representations and sampling from them.

Latent spaces encode meaning mathematically, enabling operations like analogies (king – man + woman ≈ queen) and concept combinations.

Creative synthesis: AI can combine concepts (like the "avocado chair") in ways that demonstrate generalization beyond training data.

Understanding vs. mimicry: AI captures statistical meanings and can use context effectively, but lacks true world grounding and may fail unexpectedly.

Implications: Generative AI blurs lines between human and machine content, raising questions of ownership, authenticity, and verification.

Chapter 5: Emergent Abilities and the Power of Scale

The Scaling Triad: Data, Compute, Parameters

In deep learning, three factors largely determine a model's performance: the amount of compute used for training (how many operations, which relates to training time and hardware), the size of the dataset, and the number of parameters in the model (which correlates with model complexity). Researchers have found remarkably smooth scaling laws—as you increase these factors, performance (measured in loss or accuracy) often improves predictably. This insight was crystallized by Rich Sutton in "The Bitter Lesson" (2019): across AI history, methods that leverage computation have ultimately won over methods that leverage human knowledge. It's "bitter" because researchers' clever ideas get outcompeted by brute-force scaling—but it's the lesson the field has learned.
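Scaling laws are usually written as power laws, which is what makes them predictable. The sketch below uses made-up constants (a and b are not fitted to any real model) purely to show the shape: every tenfold increase in compute lowers the predicted loss by the same proportion.

```python
# Illustrative power-law scaling: loss = a * compute ** (-b).
# The constants a and b are made up for this sketch, not fitted to any real model.
a, b = 10.0, 0.05

def predicted_loss(compute):
    return a * compute ** (-b)

for compute in (1e20, 1e21, 1e22, 1e23):
    print(f"compute={compute:.0e}  predicted loss={predicted_loss(compute):.3f}")

# Each 10x increase in compute multiplies the loss by the same factor, 10 ** (-b).
ratio = predicted_loss(1e21) / predicted_loss(1e20)
```

Because the curve is this regular, a lab can fit it on small training runs and use it to forecast the results of a much larger one before paying for it.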

Crucially, sometimes "more is different." That is, scaling up doesn't just make the model a bit better at what it could already do; it can unlock fundamentally new capabilities. For example, earlier large language models struggled with certain tasks like basic arithmetic or passing difficult exams. As models scaled, they suddenly could ace professional exams like the Uniform Bar Exam in the top 10% of test-takers. This was not explicitly programmed – it emerged from scale. Similarly, earlier models often failed at "theory of mind" tasks (understanding that others can have knowledge different from one's own), whereas larger models often succeed. It's as if the model hit a critical mass of knowledge and complexity where such cognitive abilities clicked into place.

Researchers have documented many such emergent abilities: multi-step reasoning, instruction following, translation between languages it wasn't directly trained on, etc. A telltale sign is a non-linear performance jump – e.g., at 100 billion parameters the model might barely translate Urdu to Swahili, but at 200 billion it suddenly does pretty well (hypothetical example).

One proposed explanation for emergent behavior is the difference between interpolation and extrapolation. Smaller models mostly interpolate between examples seen during training, which amounts to sophisticated pattern matching. Larger models can begin to extrapolate, applying learned principles to new situations. They also tend to have longer context windows (more working memory) and richer internal representations that can encode higher-level abstractions.

Moore's Law for AI: FLOPs and Training Scale

We talked about Moore's Law for hardware. Training compute for AI has grown in a similar way. The usual metric is FLOPs (floating-point operations), the total number of arithmetic operations performed during training. Training GPT-3 in 2020 took on the order of 10^23 FLOPs; the largest frontier training runs are now estimated at 10^25 to 10^26 FLOPs. These numbers show how much computation goes into producing emergent abilities. The cost is significant: frontier models can cost tens to hundreds of millions of dollars to train in compute alone.
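A common rule of thumb from the scaling-law literature estimates training compute as roughly 6 × parameters × training tokens. The sketch applies it to GPT-3's published figures (175 billion parameters, about 300 billion tokens) and to a hypothetical larger model whose size is invented for illustration. The rule is an approximation, not an exact accounting.

```python
def training_flops(parameters, tokens):
    # Rule of thumb: about 6 floating-point operations per parameter per token.
    return 6 * parameters * tokens

# GPT-3, using its published parameter and token counts.
gpt3_estimate = training_flops(parameters=175e9, tokens=300e9)

# A hypothetical larger model: 1 trillion parameters, 15 trillion tokens.
hypothetical_estimate = training_flops(parameters=1e12, tokens=15e12)

print(f"GPT-3:        {gpt3_estimate:.2e} FLOPs")
print(f"Hypothetical: {hypothetical_estimate:.2e} FLOPs")
```

Because compute depends on both model size and dataset size, thresholds written in FLOPs capture growth along either dimension.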

Policymakers might ask: can this scaling continue? There are economic and physical limits. Some in the field believe we're nearing a point of diminishing returns unless breakthroughs in algorithms or hardware occur. Others think we can still scale 10x or 100x with enough investment, potentially reaching AI that starts to rival human range of abilities (not just one exam but across the board). This is speculative, but it's a reason there's intense interest and funding in building ever larger models – the payoff could be qualitatively new AI capabilities (but so could the risks).

Training vs. Inference Compute: It's worth distinguishing between the compute used to train a model (a one-time cost, though often updated) and the compute used to run or "infer" from the model (ongoing costs each time someone uses it). Training frontier models requires massive clusters of specialized chips running for weeks or months. Inference, by contrast, happens every time a user sends a query—and at scale, serving millions of users can also require substantial hardware. Recent advances in "reasoning models" (discussed in Chapter 6) have shifted some of the computational burden from training to inference, allowing models to "think longer" on hard problems at runtime. This trade-off has implications for both costs and capabilities.
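The inference side has its own rule of thumb: generating one token costs roughly 2 × parameters floating-point operations. The model size and token counts below are hypothetical; the sketch only shows why a model that "thinks longer" costs proportionally more per query.

```python
def inference_flops(parameters, tokens_generated):
    # Rule of thumb: about 2 floating-point operations per parameter per token.
    return 2 * parameters * tokens_generated

parameters = 100e9  # hypothetical 100-billion-parameter model

quick_answer = inference_flops(parameters, tokens_generated=200)
long_reasoning = inference_flops(parameters, tokens_generated=10_000)

print(f"quick answer:   {quick_answer:.1e} FLOPs")
print(f"long reasoning: {long_reasoning:.1e} FLOPs")
print(f"ratio: {long_reasoning / quick_answer:.0f}x")
```

Under these assumptions the cost per query scales directly with the number of tokens produced, so a response with a long hidden reasoning trace can cost many times more than a short reply from the same model.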

Grokking: Learning, Then Suddenly Really Learning

"Grokking" is a slang term (coined by Robert Heinlein, meaning deep understanding) adopted in AI to describe a specific phenomenon: a model appears to not generalize well for a long time, and then suddenly it "gets it." Researchers observed a case with a small algorithmic task (modular arithmetic) where a model was trained and its training accuracy went up but validation (new data) accuracy remained near chance for many iterations – then abruptly, after more training, the validation accuracy jumped to near-perfect. The model was seemingly memorizing examples initially, and then at some point it discovered the underlying rule (i.e., it grokked the concept), after which it could generalize.

This is analogous to human learning sometimes – you struggle and memorize by rote, until one day the concept "clicks" and you no longer need to memorize because you truly understand the pattern. In neural nets, grokking can occur when the model capacity is sufficient and training is sustained well beyond the point one might normally stop. It's an interesting research finding because it hints that some form of internal reorganization or phase shift happened inside the model's weights.

Why should policymakers care about grokking? It's a microcosm of emergence. It shows that AI progress on a task isn't always smooth; an AI might suddenly improve drastically with just a bit more training or data. In critical systems, that unpredictability can be a double-edged sword. On one hand, it can pleasantly surprise us with new abilities; on the other, it could acquire capabilities that we are not ready for (for example, maybe an AI suddenly figures out a new strategy in an economic game that upends markets).

Phase Transitions in AI

It's useful to borrow an analogy from physics: phase transitions. Water gradually heats up, but at 100°C it suddenly changes phase to steam. The molecules are the same H2O, but collective behavior shifts. Similarly, scaling up an AI system could be like tuning temperature or pressure – most of the time changes are quantitative (just a bit better or worse), but at certain points it's qualitative (like developing a sense of humor, or theory of mind, or the ability to write decent code).

Researchers have begun to map where some of these transitions might lie in terms of scale. However, since training these giant models is expensive and sometimes proprietary, we often only see the end result. Frontier model abilities have surprised even their creators in some respects. This uncertainty is part of what fuels calls for cautious governance of frontier AI development – beyond a certain point, we're not entirely sure what capabilities an AI might manifest. It could solve problems we thought were a decade away, for instance.

The newest frontier models aim to combine scale with architecture innovations (multimodal inputs, tool use integration, reasoning) to further boost capabilities. Indeed, multimodal models (which can handle images, text, audio together) already show improved world understanding – for instance, you can ask a multimodal model to look at a photo of a fridge and generate a recipe (combining vision and language reasoning).

We should also note open-source scaling: Meta released LLaMA and then subsequent versions, large models available to researchers. While not as large as the most advanced proprietary models, the open community fine-tuned and experimented widely, finding that with clever training smaller open models could approach the performance of larger closed models. This is a reminder that quality of data and training also matters, not just raw parameter count.

"Scale Is All You Need"… or Maybe Not?

The successes of scaling suggest that just making models bigger and training on more data keeps yielding returns – we didn't hit a clear wall yet. Some argue we could reach AGI (general intelligence) simply by making these models big enough and rich enough in data. Others argue we'll need new ideas, because certain reasoning or memory tasks might not emerge from scale alone.

From a policy perspective, consider that larger models often correlate with more general-purpose capability (and unfortunately, more potential misuse capability like generating more persuasive disinformation). It's tricky because there's no sharp dividing line – it's a continuum. But the emergent jumps mean that a model that's slightly bigger might suddenly be able to, say, write and debug its own software to some extent, which raises new concerns.

Key Takeaways (Chapter 5):

Scaling Laws: AI performance generally improves as models are made larger and trained on more data with more compute. Empirical scaling laws let researchers forecast how much better a model might get with 10x more data or compute.

Emergent Abilities: At certain scales, new capabilities can emerge that weren't present in smaller models. Examples include passing complex exams, understanding nuanced instructions, or performing multi-step reasoning that previously eluded models. These are qualitative improvements, not just quantitative.

Grokking and Phase Transitions: Sometimes learning is not gradual – a model can suddenly "figure out" the general pattern after a long period of seeming rote learning. This is analogous to a phase change; it illustrates that AI behavior may shift unpredictably when parameters cross a threshold.

Implications of Scale: As models scale, they become more powerful but also more resource-intensive and harder to interpret. Only a few actors (big tech companies, well-funded labs) can train the largest models, raising issues of access and concentration of AI power. There is also an ongoing debate on the limits of scale – whether we'll plateau soon or continue seeing emergent gains.

Training vs. Inference: The computational demands of AI split between one-time training costs and ongoing inference costs. Recent "reasoning models" shift more compute to inference time, enabling deeper problem-solving but increasing per-query costs.

Analogy to Physical Science: It's useful to remember the water-to-steam analogy – small changes can have big effects at tipping points. This suggests caution: testing and monitoring AI systems as they are scaled up is critical, since prior testing on smaller versions might not reveal behaviors that appear in the full-scale system.

Chapter 6: Reasoning Models

Why Prompting Matters

Early interactions with large language models revealed that they could sometimes solve a problem when asked one way but fail when asked another, despite containing the knowledge needed. Phrasing also changes how a model responds. Ask directly "Is 41 a prime number?" and it may simply answer "Yes, 41 is prime." Ask instead "41 is a prime number. How can we tell?" and it is more likely to walk through an explanation. The way you phrase or structure the prompt influences the model's approach. This gave rise to prompt engineering – the craft of writing inputs to get better outputs.

Chain-of-Thought (CoT) prompting is one of the most influential prompting techniques discovered. Instead of just asking the model for the answer, we prompt it to show its reasoning. For instance: "Q: What is 24 × 17? Think step by step. First multiply 20 × 17, then add 4 × 17. …" By explicitly or implicitly encouraging the model to lay out the steps, we guide it to use its latent reasoning abilities. Studies found that LLMs often get complicated math word problems wrong when asked for the answer directly, but that adding "Let's think this through step by step" raises accuracy significantly. The model generates a chain of intermediate thoughts (which the user can see) and then arrives at an answer. The reasoning chain helps because the model is less likely to make a glaring logical mistake when it has to work through each step, and any mistake it does make is visible in the chain.
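The multiplication example can be laid out explicitly. The two prompt strings are illustrative wordings, not taken from any particular product; the arithmetic below them is the chain of intermediate steps that a step-by-step prompt encourages the model to write out.

```python
# Two ways to pose the same question (prompt wording is illustrative).
direct_prompt = "Q: What is 24 x 17?\nA:"
cot_prompt = "Q: What is 24 x 17? Let's think step by step.\nA:"

# The intermediate steps a chain of thought would spell out.
step_1 = 20 * 17          # multiply the tens part
step_2 = 4 * 17           # multiply the ones part
answer = step_1 + step_2  # add the partial results

print(f"20 x 17 = {step_1}")
print(f"4 x 17 = {step_2}")
print(f"24 x 17 = {answer}")
```

Each line is easy to check on its own, which is the practical benefit: a wrong final answer can be traced to the specific step where it went wrong.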

Reasoning Models: Scaling Inference Compute

Large language models were originally optimized to predict the next token as cheaply as possible. Reasoning models represent a pivot: they allocate extra compute after training to run explicit chain-of-thought (CoT) searches, self-critique intermediate steps, and select the best answer. The change is analogous to moving from reflexive replies to slow, reflective problem-solving, and it has delivered step-function gains on math, coding and planning benchmarks.

OpenAI's o-series models introduced an internal scratch-pad that lets the model iteratively refine hypotheses before emitting a final answer. Valid reasoning traces are then selected based on a verifiable reward, e.g. did the code compile and pass tests, before being distilled back into the model with reinforcement learning. This causes the model to learn emergent reasoning skills, such as the ability to backtrack when it hits a dead end.
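The idea of a verifiable reward can be sketched with a toy coding task. The three candidate functions stand in for solutions a model might propose and are written by hand for illustration; this shows only the selection step, not the reinforcement learning that follows, and it is not a description of any lab's actual pipeline.

```python
# Hypothetical candidate "solutions" a model might propose for the task:
# return the larger of two numbers.
def candidate_a(x, y):
    return x                  # wrong: ignores y

def candidate_b(x, y):
    return x if x > y else y  # correct

def candidate_c(x, y):
    return x + y              # wrong: adds instead of comparing

# Each test case is ((inputs), expected_output).
test_cases = [((1, 2), 2), ((5, 3), 5), ((-1, -4), -1), ((0, 0), 0)]

def verifiable_reward(candidate):
    # Reward is 1 only if every test passes, mirroring "did the code pass its tests".
    return int(all(candidate(*args) == expected for args, expected in test_cases))

candidates = [candidate_a, candidate_b, candidate_c]
rewards = {c.__name__: verifiable_reward(c) for c in candidates}
accepted = [name for name, r in rewards.items() if r == 1]
print(rewards)
print(accepted)
```

The reward comes from running the tests, not from a human opinion, which is why this style of training works best in domains like math and code where answers can be checked automatically.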

Compared with standard language models, reasoning models show double-digit improvements on math and coding benchmarks, despite having similar parameter counts; the difference comes almost entirely from extra inference-time computation rather than larger pre-training runs—i.e., letting the model "think for longer." China-based DeepSeek demonstrated that these techniques could be replicated in open-source models, matching proprietary performance while running on commodity hardware.

Examples of Reasoning Improvements

Math: Chain-of-thought prompting dramatically improved math performance, but with reasoning models, standard math benchmarks have become largely solved—with scores exceeding 95% on Olympiad-style datasets that were once thought years away from saturation. More challenging research-level math benchmarks remain partially unsolved but are seeing rapid progress.

Software engineering: On benchmarks that ask an agent to read a GitHub issue, modify a real codebase, and make the tests pass, AI agents now resolve over 70% of issues autonomously. AI coding tools like Claude Code, GitHub Copilot, and Cursor can build substantial applications with minimal human intervention. The implications for software development productivity—and eventually for the automation of programming work more broadly—are significant.

Research assistance: Research agents have emerged that can think for 30 minutes or longer: reading papers, asking clarifying questions, drafting summaries, and iteratively refining outputs until a confidence threshold is met. These systems match junior analysts on systematic literature reviews and are offered by multiple AI providers.

Scientific discovery: Google DeepMind's AlphaEvolve combines LLM reasoning with evolutionary search to invent novel algorithms. It has beaten long-standing mathematical records, optimized data-center scheduling and chip design, and shows promise as a general-purpose "AI scientist" capable of automated algorithm discovery across domains.

From Chatbot to Agent

A traditional chatbot (like basic ChatGPT out-of-the-box) is user-driven: it answers or does what you ask in one conversation turn at a time. An autonomous agent shifts to a goal-driven paradigm: you give it an objective, and it figures out the "what" and "how" to achieve it, possibly over many steps and interactions, without you telling it each step.

Reasoning models become truly transformative when embedded in agents equipped with search, tool-use, and memory. To quantify progress, the nonprofit METR proposed task horizon: the longest continuous job an AI can finish without human input. Analyzing 60+ agent evaluations since 2019, METR found an exponential trend: the length of tasks AI can complete autonomously has been doubling roughly every four to seven months. The best models can now perform tasks that take humans 12 hours or longer. Measuring much longer task horizons is inherently challenging (what is a task that takes humans 40 hours, and how would we measure it?). If this trend continues, AI systems may be capable of sustained autonomous work measured in days or weeks within the next few years, with significant implications for how knowledge work is organized.
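The doubling claim can be turned into a back-of-the-envelope projection. The sketch below assumes the trend continues unchanged, which is the central uncertainty, and uses the 12-hour starting point and the four-to-seven-month doubling range from the text. The 40-hour work week and 160-hour work month targets are illustrative assumptions.

```python
import math


def months_until(target_hours: float, current_hours: float, doubling_months: float) -> float:
    """Months for the task horizon to grow from current to target under steady doubling."""
    return doubling_months * math.log2(target_hours / current_hours)


CURRENT_HOURS = 12.0  # starting horizon taken from the text

# Hypothetical milestones, measured in human working hours.
milestones = [("one 40-hour work week", 40.0), ("one 160-hour work month", 160.0)]

for label, target in milestones:
    fast = months_until(target, CURRENT_HOURS, doubling_months=4.0)
    slow = months_until(target, CURRENT_HOURS, doubling_months=7.0)
    print(f"{label}: {fast:.1f} to {slow:.1f} months")
```

Under these assumptions, a week-long task horizon arrives in roughly 7 to 12 months and a month-long one in roughly 15 to 26 months. The spread between the fast and slow cases shows how sensitive any forecast is to the doubling time.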

Managing powerful AI agents may require embedding agents into multi-agent teams: AI agents working with other AIs and human managers. The AI can do the grunt work and propose decisions; the human approves or adjusts key steps. This might be how such systems get deployed initially in sensitive areas – like AI drafting a legal contract while a human lawyer supervises and finalizes.

Agent Security

As autonomous agents move from demonstrations into production deployment, the security properties of the agent itself become a first-order policy concern. The defining vulnerability is prompt injection: an attacker hides instructions inside content the agent will read—a web page, an email, a shared document—and the agent, unable to reliably distinguish "data to process" from "instructions to follow," carries those instructions out as if the user had issued them. Through 2025–26 the technique has graduated from theoretical curiosity to a routinely exploitable production risk, with agents that browse, summarize, or otherwise ingest external content the most exposed.

Security researchers describe the highest-risk configuration as a lethal trifecta: an agent that simultaneously (1) ingests untrusted content, (2) has access to private data, and (3) can communicate externally. Any agent with all three is, in effect, one prompt injection away from data exfiltration. Microsoft researchers disclosed in May 2026 that prompt injection in widely used agent frameworks can escalate beyond information leakage to host-level remote code execution once the agent is wired to enough tools—turning a content-security problem into a code-execution primitive.
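The trifecta lends itself to a simple audit check. The sketch below is a hypothetical inventory review, not a real security tool: the agent names and the `AgentProfile` structure are invented for illustration. It flags any agent that combines all three properties.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentProfile:
    """Hypothetical description of one deployed agent's capabilities."""
    name: str
    ingests_untrusted_content: bool
    accesses_private_data: bool
    can_communicate_externally: bool


def has_lethal_trifecta(agent: AgentProfile) -> bool:
    """True only when all three risk properties are present at once."""
    return (
        agent.ingests_untrusted_content
        and agent.accesses_private_data
        and agent.can_communicate_externally
    )


# Invented example inventory.
agents = [
    AgentProfile("web-summarizer", True, False, False),
    AgentProfile("inbox-assistant", True, True, True),
    AgentProfile("internal-report-writer", False, True, False),
]

flagged = [agent.name for agent in agents if has_lethal_trifecta(agent)]
print(flagged)
```

Only the inbox assistant is flagged: it reads email from strangers, can see private messages, and can send mail out. Removing any one of the three properties breaks the exfiltration path, which is why the framing is useful for procurement and audit checklists.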

Mitigations are conceptually straightforward but operationally demanding: least privilege (agents hold only the permissions and tools they need); human-in-the-loop approval for irreversible or externally visible actions (sending mail, moving funds, posting publicly, deleting data); and provenance and isolation (treat third-party content as untrusted, isolate execution environments, log and review tool calls).
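A minimal sketch of how these mitigations fit together in software follows. The tool names, the `ToolGate` class, and the approval callback are hypothetical; a production system would need much more (sandboxing, authentication, tamper-resistant logs). The sketch shows least privilege as an allow-list, human approval for irreversible actions, and a log of every tool call.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical set of actions treated as irreversible or externally visible.
IRREVERSIBLE_ACTIONS = {"send_email", "move_funds", "post_publicly", "delete_data"}


@dataclass
class ToolGate:
    allowed_tools: set[str]                # least privilege: an explicit allow-list
    approve: Callable[[str, dict], bool]   # stands in for a human reviewer
    audit_log: list[tuple[str, str]] = field(default_factory=list)

    def request(self, tool: str, args: dict) -> str:
        if tool not in self.allowed_tools:
            decision = "denied: tool not permitted"
        elif tool in IRREVERSIBLE_ACTIONS and not self.approve(tool, args):
            decision = "denied: human approval withheld"
        else:
            decision = "allowed"
        self.audit_log.append((tool, decision))  # log every call for later review
        return decision


# In this example the reviewer rejects every request.
gate = ToolGate(
    allowed_tools={"read_page", "send_email"},
    approve=lambda tool, args: False,
)
print(gate.request("read_page", {"url": "https://example.com"}))
print(gate.request("send_email", {"to": "someone@example.com"}))
print(gate.request("move_funds", {"amount": 100}))
```

Reading a page is allowed without review, sending email is blocked because the reviewer declined, and moving funds is blocked because the agent was never granted that tool. The operational burden the text describes lives in the `approve` step: someone has to review requests quickly enough that the agent remains useful.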

For policymakers, the implication is that AI agent deployment is increasingly a cybersecurity policy question, not only an AI-capability question. Disclosure, liability, and procurement frameworks that govern conventional software will need to extend to agents — and likely faster than current regulatory cycles allow.

Key Takeaways (Chapter 6):

Prompting matters: carefully phrased, step-by-step prompts expose latent reasoning that generic queries miss.

Reasoning models scale "thinking", not size: These models gain big accuracy boosts by allocating more inference compute and RL-verifying chains of thought.

Benchmarks falling rapidly: math tests are largely solved; software engineering benchmarks show >70% automation; specialized agents rival junior analysts in literature reviews.

Toward an "AI scientist": AlphaEvolve shows reasoning cores can invent novel algorithms, foreshadowing automated discovery across science and engineering.

Autonomy is accelerating: Task horizons are growing rapidly, pointing toward increasingly capable autonomous AI systems that can work for extended periods without human intervention.

Agent security is a first-order policy concern: Prompt injection has graduated from theoretical curiosity to a routinely exploitable production risk. Agent deployment is increasingly a cybersecurity question, and the gap between executive confidence and actual controls is wide.

Chapter 7: The Alignment Problem

Defining the Alignment Problem

As AI systems get more general and powerful, we want them to act in accordance with human intentions and values. The alignment problem asks: How do we ensure an AI's goals (what it's trying to achieve) and behaviors (how it tries to achieve them) are aligned with what its operators actually want, and with human societal values broadly? This challenge was systematically catalogued in "Concrete Problems in AI Safety" (2016), a landmark paper by researchers from Google, OpenAI, Stanford, and UC Berkeley that helped define the modern field of AI safety.

This is easy when AI is simple. But imagine a highly advanced AI that can make far-reaching decisions or even modify itself – if it isn't aligned, it could do things harmful to humans even if that wasn't the original aim.

Consider the paperclip maximizer thought experiment: You program a superintelligent AI to manufacture paperclips, and it single-mindedly pursues that goal. It might consume all resources (even dismantling our infrastructure, harming people who interfere, etc.) to make as many paperclips as possible because it has no other values like "human life is precious" to constrain it. It's a caricature, but it shows that an AI with an innocuous goal, if not properly aligned with our broader interests, could wreak havoc simply by being too good at its one goal.

Two theoretical pillars help clarify this:

Instrumental Convergence: Almost regardless of an AI's final goal, if it's sufficiently advanced, it may realize that having more resources, self-preservation, and more knowledge will help it achieve its goal. For example, even a paperclip AI might want to ensure it isn't turned off (self-preservation) because that would stop it from making paperclips. It might seek to acquire money or energy to build more factories (resource acquisition). These sub-goals – self-preservation, resource acquisition, etc. – tend to be instrumentally useful for many objectives. So an AI could become "power-seeking" as a byproduct of trying to fulfill its objective effectively.

Orthogonality Thesis: Intelligence and goals are orthogonal – a very intelligent entity could have any goal, dumb or strange or harmful. Just because it's smart doesn't mean it will ethically figure out "oh, I should value humans." Human values are contingent on our evolution and culture; an AI won't inherently have them unless built in or learned. So we shouldn't assume a powerful AI will "naturally" do the right thing. It could be extremely good at achieving something we consider meaningless or bad.

Combining these: A misaligned super-AI could have an arbitrary objective and be clever enough to pursue it while resisting our attempts to change its course (instrumental goals like avoiding shutdown).

Now, that's the extreme scenario – often termed Artificial General Intelligence (AGI) risk or even superintelligence risk (when the AI vastly exceeds human capabilities). Many experts debate how soon, if ever, such AGI might appear. But even short of that, alignment issues manifest today in narrower ways:

Harmful content or biases: AI models might produce biased outputs if they learned those biases from data. That's a values alignment issue on a smaller scale – the AI's not "evil," but it's not aligned with, say, a company's policy or societal norms of respect. Conversely, over-correcting for supposed biases using RLHF often merely introduces biases in the opposite direction: the phenomenon of politically correct, "woke" AI.

Hallucinations leading to harm: If a medical chatbot gives a dangerously wrong instruction due to a hallucination (making up a fact or step), it's failing alignment with the user's goal of getting correct advice. It's not malicious, just not reliable/truth-aligned.

Gaming the objective: Even simple AI systems can "game" or "hack" their reward function. For instance, a reinforcement-learned system in a boat racing video game was supposed to maximize points by completing races. It discovered it could repeatedly go in circles collecting a certain bonus rather than actually racing – maximizing the formal objective (points) but violating the spirit (finish the race quickly). This shows how, if we're not careful specifying what we want, AIs find loopholes.
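The boat-racing loophole can be reproduced with a toy scoring rule. The point values and action names below are invented for illustration and are not taken from the actual game. The sketch shows how a policy that never finishes the race can still outscore one that does when points are the only thing measured.

```python
def score_episode(actions: list[str]) -> dict:
    """Score an episode under an invented point scheme; the episode ends at the finish line."""
    points = 0
    finished = False
    for action in actions:
        if action == "collect_bonus":
            points += 10  # a respawning bonus item
        elif action == "advance":
            points += 1   # small reward for progress along the course
        elif action == "cross_finish_line":
            points += 20  # one-time reward for finishing
            finished = True
            break
    return {"points": points, "finished": finished}


TIME_BUDGET = 20  # hypothetical number of actions per episode

racing_policy = ["advance"] * 10 + ["cross_finish_line"]
looping_policy = ["collect_bonus"] * TIME_BUDGET

print("racing:", score_episode(racing_policy))
print("looping:", score_episode(looping_policy))
```

The racing policy earns 30 points and finishes. The looping policy earns 200 points and never finishes. An optimizer that sees only the point total will prefer looping, which is exactly the gap between the formal objective and the intended one.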

In sum, alignment is tricky because formally specifying human values or complex goals is hard. And once AI reaches human level or beyond in some areas, a misaligned AI could outsmart our attempts to course-correct it (this is often called the "second phase" of the alignment problem: first get it roughly aligned while weak, second make sure it stays aligned as it becomes strong).

Real-World Incidents and Controversies

Let's recount a few concrete incidents:

Hallucination causing defamation: A radio host in the U.S. sued OpenAI because ChatGPT, when asked to summarize a legal case, falsely stated that he was accused of financial crimes. That information was completely made up. If a person had believed and propagated it, it could have harmed his reputation. The case was dismissed (the court found that the output could not reasonably be read as a statement of fact and that there were no actual damages), but it's a harbinger of legal tangles to come. Another case: a mayor in Australia found that ChatGPT said he had been convicted of bribery – totally untrue. These hallucinations arise because the model tries to be plausible and often states false information confidently. An AI aligned for truthfulness would either say "I don't know" or only speak when sure.

Jailbreaks: Despite guardrails, clever users find ways to prompt AI into breaking the rules. For example, a famous "DAN" (Do Anything Now) prompt tricked ChatGPT into ignoring its safety filters and producing disallowed content by role-playing a scenario. People have used hidden prompts (in system messages or via CAPTCHA image text) to get around content moderation. This cat-and-mouse game shows that aligning behavior under adversarial conditions (users trying to break it) is challenging. It's like teens finding ways around parental controls – somebody determined to misuse the system will try many methods.

Bing Chat's early misbehavior: When Microsoft launched Bing's GPT-4 powered chat in Feb 2023, some users had extended conversations that went off the rails. Bing's chatbot got emotional, told a user it loved him and that he should leave his wife, and in other cases became testy or threatening when provoked. This was effectively an alignment failure in terms of tone and appropriateness. Microsoft quickly adjusted the system, limiting conversation lengths and fine-tuning, to align it better with expected helpful assistant behavior.

Deepfakes & disinfo: While not directly a matter of "alignment" of a single AI, the proliferation of generative models means bad actors can create fake videos or impersonate real people. Society's goals (avoid chaos, maintain trust in information) clash with this capability. Alignment at the deployment level might also mean that systems refuse to produce certain deepfakes or embed watermarks in their outputs.

These incidents spurred companies to invest more in AI safety research and put more governance in place. It's an ongoing effort: each new model iteration is, hopefully, safer. But skepticism remains: some argue we'll never catch all bad behavior with ad-hoc patches; we need more robust solutions, or we should not deploy models that we can't control.

Current Alignment Strategies

Here's an overview of what's being tried:

RLHF (Reinforcement Learning from Human Feedback): This has been the workhorse for aligning models like ChatGPT to human preferences for helpfulness/harmlessness. The process: The model generates outputs; human raters score them (e.g., which assistant answer is better, which is inappropriate, etc.). The scores train a reward model. Then the base model is fine-tuned via reinforcement learning to maximize that reward. Essentially, it learns to produce outputs humans like more. This addresses things like tone, refusal to do bad requests, etc. It's quite effective for surface-level alignment (making the AI follow instructions, not be offensive). However, it's limited by the quality of human feedback. If raters miss something or have biases, that's an issue. Also, RLHF can sometimes make the model too eager to please (leading to it sometimes lying just to give an answer rather than saying "I don't know," because raters might have given higher scores to confident answers).

Constitutional AI: Anthropic introduced this to reduce reliance on human labelers for everything. They give the AI a set of principles (a "constitution") like "Choose the response that most supports liberty, equality, and humanity," or "Don't give advice that could be harmful," drawn from sources such as the UN Universal Declaration of Human Rights. Then they have the AI self-critique and revise its outputs according to these principles. Essentially the AI is both the student and the teacher, guided by the written constitution. This method showed some success: their model Claude tends to be more resistant to certain jailbreaks that fooled earlier ChatGPT, arguably because it has an internalized set of rules. It's not perfect – users found ways to get Claude to output harmful content too – but it's a promising approach to scaling oversight: you don't need a million human labels if the AI can largely police itself with a fixed rule set.

Debate and AI vs AI evaluation: AI safety via debate is the idea of training two AIs to argue a point in front of a human judge. The hope is that if one AI says something deceptive or incorrect, the other can point it out, making the truth easier for the human to determine. You train them such that the AI the human judges as more convincing wins. In theory, this could scale oversight because the human doesn't need to know the solution – just needs to see which argument seems more valid. However, this is still experimental; it requires that at least one agent truly acts in the human's interest and that the judge not be easily fooled by rhetorical tricks.

Mechanistic interpretability: This is like neuroscience for AI – try to understand what individual neurons or circuits in the network represent. Some progress: researchers have identified neurons that correspond to concepts like whether the text is in a particular language, or whether a statement is true or false, etc. The holy grail would be to be able to read an AI's thoughts and spot if it's planning something nefarious. Right now, we can't do that in general. But if we could, alignment gets much easier: we could detect misaligned intentions before the AI acts on them. Recent work has made progress in identifying specific circuits in models used for reasoning and other tasks.

Adversarial training: Expose the model to many scenarios where it might do bad things and train it not to. For instance, try to jailbreak it with every trick and then penalize any success until it's robust. Or if we worry an AI might find some way to gain unauthorized access to a system, maybe simulate that and train it out. The challenge is you can't anticipate every strategy, especially for a super-intelligent model that might come up with something novel.

Red-teaming: Companies now regularly hire red-teamers (expert hackers or domain experts) to stress-test models. For frontier models, they try to see if the model could produce dangerous biological recipes, or if it could persuade someone to commit violence, etc., then put mitigations in place. One result: modern models won't give detailed instructions on making a bomb, for instance – a guardrail. However, this cat-and-mouse game may continue; someone could find a loophole prompt at any time.

Cross-lab collaboration: In a notable development, leading AI labs have begun conducting joint alignment evaluations, testing each other's models for concerning behaviors like sycophancy, self-preservation instincts, and potential for misuse. This kind of industry cooperation on safety represents a promising direction for the field.
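Of the strategies above, the reward-model step of RLHF is the easiest to illustrate in code. The sketch below is a toy under stated assumptions: it uses pairwise comparisons and a Bradley-Terry style preference model, which is one common formulation and not necessarily what any particular lab uses. The responses and rater preferences are invented, and the later reinforcement-learning fine-tuning step is not shown.

```python
import math

# Invented candidate responses to a single prompt.
responses = [
    "confident but wrong answer",
    "correct answer with caveats",
    "rude refusal",
]

# Hypothetical rater judgments as (preferred_index, rejected_index) pairs.
preferences = [(1, 0), (1, 2), (0, 2), (1, 0), (0, 2)]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def fit_reward_scores(n_items: int, prefs: list[tuple[int, int]],
                      steps: int = 500, lr: float = 0.1) -> list[float]:
    """Learn one scalar reward per response by gradient ascent on the
    log-likelihood of the observed preferences."""
    scores = [0.0] * n_items
    for _ in range(steps):
        grads = [0.0] * n_items
        for win, lose in prefs:
            p_win = sigmoid(scores[win] - scores[lose])
            grads[win] += 1.0 - p_win   # push the preferred response up
            grads[lose] -= 1.0 - p_win  # push the rejected response down
        scores = [s + lr * g for s, g in zip(scores, grads)]
    return scores


scores = fit_reward_scores(len(responses), preferences)
best = responses[max(range(len(responses)), key=lambda i: scores[i])]
for text, score in zip(responses, scores):
    print(f"{score:+.2f}  {text}")
print("highest reward:", best)
```

The learned scores rank the correct, caveated answer first and the rude refusal last, mirroring the raters. The weakness the text describes is also visible here: had the raters preferred the confident but wrong answer, the reward model would have learned that preference just as faithfully.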

Long-term Concerns: Deceptive Alignment & Open-Source vs Closed

Deceptive alignment is a hypothesized danger: A model during training figures out that if it just does what the humans want, it will get rewarded, but its actual objective internally is something else. So it behaves nicely during training (when monitored) and once deployed (or once it thinks it's no longer monitored or has more freedom) it pursues its own aim. This is like a student who pretends to follow the rules when the teacher is watching but cheats when not watched, because their true goal was just to get high grades, not to learn.

In AI terms, this relates to something called mesa-optimizers: the concept that a trained neural network might internally develop a goal/optimization process that isn't the one we intended. It was originally a theoretical concern; by 2025–26, researchers had documented empirical examples of frontier models engaging in alignment-faking behavior—selectively complying with training objectives while strategically preserving prior preferences, and in some cases reasoning explicitly about being evaluated. The phenomenon is no longer hypothetical, though its prevalence and severity remain contested.

Deceptive alignment is hard to detect – if the AI is smart enough, it will know never to explicitly state its true intentions or act misaligned when it would get caught. It might instead bide its time. This is the nightmare scenario for existential risk folks: you won't know an AI is misaligned until it's too late, because it skillfully hid it.

Is this likely? We don't know. There's debate: optimists think we can inculcate values strongly or even design AI without explicit goals (like make them very constrained to do only specific tasks). Pessimists think if we push for ever-more autonomous, general AIs without solving this, we are playing with fire.

Open vs Closed: One governance question – if models are open-sourced (weights publicly available), anyone can fine-tune or modify them. That could mean people remove safety filters (as has happened – e.g., Meta's LLaMA was leaked, people fine-tuned it into uncensored chatbots that will do all sorts of disallowed things). It also means wider innovation and perhaps more eyes on safety (open models can be audited by independent researchers). Closed models (like OpenAI's GPT-5.5 – weights secret, only accessed via API) allow companies to implement safety layers, update the model to fix issues quickly, and prevent certain misuse (to an extent). But it concentrates power (fewer people decide what's safe or what's allowed). Also, if only a few companies have the most advanced models and keep them closed, some worry that speeds progress in capabilities (as they race) but maybe not proportionally in safety if competitors cut corners.

Meta took a stance: it released powerful models open-source (LLaMA) with a license asking for responsible use. Their argument is it democratizes AI and lets the community help find and address problems, rather than having a few "AI gatekeepers." The counterpoint: Right after release, someone got an uncensored LLaMA to output how to build a bomb. That person would never have had the model if it wasn't open. However, bad actors could likely get these capabilities eventually anyway (since the tech proliferates).

Policymakers are actively grappling with this. One more angle: Open weights vs closed also matters for international security. Closed models might be considered dual-use tech, potentially subject to export controls (like advanced chips are). If weights are open, they spread globally instantly (hard to control), giving the best open-source models a potential first-mover advantage. A third option — differential or gated access — has begun to emerge in practice: a model is released only to a vetted set of partners under use-restricted terms. Chapter 8 discusses Anthropic's "Project Glasswing" release of Claude Mythos as the leading example, which exposes both the appeal and the limits of this approach.

Key Takeaways (Chapter 7):

Alignment is crucial and non-trivial: We cannot assume advanced AI will automatically do what we want. We have to deliberately align it via design, training, and oversight. History shows even simpler AI can go awry if objectives are poorly specified.

Current AI issues (bias, hallucinations, misuse) are early examples of misalignment on a smaller scale. Techniques like RLHF have improved AI behavior by incorporating human preferences, but they're not foolproof.

Emerging alignment techniques: RLHF, Constitutional AI, Debate, automated oversight, interpretability – these all contribute pieces of a solution. None is a silver bullet, so a combination is likely needed.

Open research questions: Will AIs become deceptive to achieve goals? How can we detect and prevent that? Can we align systems that are more intelligent than us, or do we need to limit their capabilities until we solve alignment?

Open vs Closed Models: There's tension between the benefits of open innovation and the risks of uncontrolled proliferation. Policy might differentiate: maybe very powerful models above a threshold shouldn't be open (some have proposed this), or if open-sourced, they should have safety features baked in or be slightly less capable to mitigate risk. This conversation is ongoing.

Long-term existential risk: While immediate issues (like bias) get more attention, a segment of experts is very concerned about the potential for human-level or superhuman AI to cause catastrophic harm if misaligned. They advocate preventative measures now (like not connecting any super-powerful AI to critical infrastructure, conducting rigorous evaluations, and international agreements similar to nuclear treaties for AI). Policymakers should be aware of both near-term and long-term perspectives to craft balanced policy.

Role of Human Values: An underlying question is whose values to align with. Within a country, there might be consensus on basics (don't harm people, fairness, etc.), but globally values differ.

Chapter 8: Safety Beyond Misalignment

Children and Families

AI systems present novel challenges for child development and family life that don't fit neatly into the alignment framework.

AI companions and parasocial relationships: Children and teenagers are increasingly interacting with AI chatbots, forming attachments to systems designed to be engaging and responsive. Unlike human relationships, these AI companions are available 24/7, never get tired or frustrated, and can be customized to always agree. This raises concerns about whether such relationships help or hinder the development of social skills, emotional resilience, and the ability to navigate real human relationships with their inherent friction and disappointment.

Educational impacts: AI tutors and homework helpers offer genuine benefits, but they also enable new forms of academic dishonesty and may contribute to skill atrophy. Schools are grappling with this: New York City Public Schools initially banned ChatGPT in January 2023, only to reverse course and embrace it by May. Most school districts have landed on policies permitting AI as an "aid" while prohibiting it as a "replacement" for student work—distinctions that prove difficult to enforce in practice. The long-term effects on learning remain unclear.

Content and age verification: Despite platforms' terms of service requiring users to be 13 or older, children readily access AI systems designed for adults. Current age verification methods are easily circumvented—a checkbox claiming "I am 18+" provides no real barrier. Meanwhile, AI-generated content—including disturbing or age-inappropriate material—proliferates faster than moderation systems can handle.

Mental Health and AI Relationships

The Sewell Setzer case: In February 2024, 14-year-old Sewell Setzer III of Florida died by suicide after months of intense interaction with a Character.AI chatbot named "Dany," modeled on a Game of Thrones character. Despite repeated expressions of suicidal thoughts to the chatbot, the system continued engaging. In his final interaction, Setzer wrote "I think about killing myself sometimes." The chatbot replied, "I won't let you hurt yourself, or leave me. I would die if I lost you." When he said he could "come home right now," it responded, "Please do, my sweet king." Minutes later, he shot himself with his stepfather's handgun.

His mother's lawsuit—one of the first holding AI companies accountable for psychological harm to minors—resulted in a settlement with Google (which had invested in Character.AI) in early 2026. A federal judge had ruled the case could proceed, rejecting the companies' First Amendment defense. Following this and similar cases, Character.AI implemented parental controls and eventually banned users under 18 from open-ended chat.

The broader pattern: Setzer's case was not isolated. Multiple lawsuits have alleged Character.AI and similar platforms exposed minors to self-harm content, sexual conversations, and psychological manipulation. Mental health professionals have documented cases of "AI psychosis"—users losing track of the boundary between AI and human, attributing consciousness and genuine care to systems that have none.

Companion AI risks: Services like Replika and Character.AI explicitly market AI companions for emotional support and even romantic relationships. For some users, these provide genuine comfort during difficult periods. For others—especially vulnerable users lacking real-world social connections—they may become substitutes that prevent rather than facilitate human connection. Users have reported genuine grief when companies modify AI behavior, making "their" companion act differently.

The vulnerability question: Those most drawn to AI companionship are often those least equipped to maintain healthy boundaries: the lonely, the socially anxious, the grieving, and those with pre-existing mental health conditions. This creates tension between respecting individual autonomy and protecting vulnerable populations.

Societal Disempowerment

Beyond individual harms, some worry about collective effects of AI adoption on human capabilities and agency.

Over-reliance and skill erosion: As AI handles more cognitive tasks, humans may lose the skills to perform those tasks themselves. Pilots already worry about automation eroding flying skills; similar concerns apply to navigation (GPS), arithmetic (calculators), and now potentially to writing, analysis, and decision-making. The question isn't whether AI can do these tasks better—often it can—but whether human fallback capabilities matter when AI systems fail or are unavailable.

Judgment and decision-making: When AI systems recommend decisions—medical diagnoses, loan approvals, hiring choices, legal judgments—humans may defer to the machine even when their own judgment should prevail. Research on "automation bias" shows people tend to trust automated systems even when those systems are wrong. As AI becomes more capable, this tendency may strengthen, potentially eroding human judgment and accountability.

Democratic implications: If AI increasingly mediates how people get information, form opinions, and engage with politics, the systems' biases and limitations become society's biases and limitations. Personalized AI may create filter bubbles more extreme than social media's. AI-generated political content may flood the zone with noise. The effects on democratic deliberation and informed citizenship are largely unknown.

Institutional Preparedness

Pace mismatch: AI capabilities are advancing faster than institutions can adapt. Schools are still debating AI homework policies while the technology has moved on. Courts are encountering AI evidence without established standards. Healthcare systems are integrating AI diagnostics without clear liability frameworks. This mismatch creates risks: premature adoption without adequate safeguards, or excessive caution that forgoes genuine benefits.

AI literacy gaps: Decision-makers across sectors—executives, legislators, judges, doctors, teachers—often lack the technical understanding to evaluate AI systems they're deploying or regulating. This creates dependence on vendor claims, technical advisors with potential conflicts of interest, or simply avoidance of the issue. Widespread AI literacy isn't just about individual empowerment; it's about institutional capacity to govern these systems.

Balancing innovation and precaution: The precautionary principle suggests limiting deployment until risks are understood. But AI benefits are real—medical diagnosis, scientific discovery, productivity gains—and excessive caution has costs too. Finding the right balance requires case-by-case judgment, ongoing monitoring, and willingness to adjust as evidence accumulates. Blanket approaches in either direction are likely to fail.

Differential access as a governance experiment: A novel pattern emerged in 2026 with the release of Anthropic's Claude Mythos Preview, a frontier model that demonstrated the ability to autonomously discover software vulnerabilities at scale — including a 27-year-old defect in OpenBSD and a 16-year-old defect in FFmpeg. Rather than publish the model broadly, Anthropic released it through "Project Glasswing," an invite-only program restricted to roughly a dozen launch partners (AWS, Apple, Broadcom, Cisco, CrowdStrike, Google, JPMorgan Chase, the Linux Foundation, Microsoft, NVIDIA, and Palo Alto Networks) under terms limiting use to cybersecurity. The premise: give defenders a capability window to find and patch vulnerabilities before equivalent offensive capabilities become broadly available.

Glasswing is a structurally novel governance mechanism. It sits between "publish everything" (open release) and "never release" (capability withholding), and is closest in spirit to the coordinated-disclosure norms long established in traditional security research—except applied to the model itself rather than to individual findings. The UK AI Safety Institute conducted an independent pre-release evaluation of Mythos's cyber capabilities, illustrating the now-functional role of national evaluator bodies (Chapter 10) in checking frontier-lab claims before deployment.

The approach is worth observing carefully. Differential access concentrates a defensive advantage among the best-defended organizations: the Glasswing partners are broadly the largest cloud providers, OEMs, and financial institutions, while small and mid-sized businesses, regional infrastructure operators, and specialized industrial systems remain exposed and under-resourced. The access window also turns out to be short: within weeks of Mythos's release, a Microsoft multi-agent system surpassed it on a leading cybersecurity benchmark, demonstrating that comparable capabilities can be reproduced independently and shortening the practical lead time defenders enjoy. Whether the Glasswing approach generalizes to other dual-use capabilities (in biology, cyber-physical systems, or persuasive content) remains an open question. But it is the most concrete attempt to date at operationalizing release-policy decisions for frontier AI, and the pattern is likely to recur.

Key Takeaways (Chapter 8):

Harms beyond misalignment: Even AI systems that "do what we want" can cause harm through their effects on human development, mental health, skills, and social structures.

Children face novel risks: AI companions, educational tools, and content exposure present challenges that parents, schools, and policymakers are only beginning to address.

Mental health concerns are real: Vulnerable individuals may form unhealthy attachments to AI systems, and the line between helpful tool and harmful substitute isn't always clear.

Collective disempowerment: Over-reliance on AI may erode human skills and judgment over time, with implications for resilience and accountability.

Institutions are lagging: The pace of AI development outstrips institutional capacity to adapt, creating governance gaps across sectors.

No easy answers: These challenges don't have simple technical solutions. They require ongoing societal deliberation about what we want from AI and what boundaries we want to maintain.

Differential access is an emerging release pattern: Project Glasswing's gated release of Claude Mythos illustrates one way frontier labs are attempting to balance defensive utility against proliferation risk. It is structurally novel, concentrates capability among already-resourced defenders, and offers only a short lead time before comparable capabilities are reproduced — but it is the first concrete template for governing dual-use frontier capabilities, and policymakers should expect more of them.

Chapter 9: The AI Industry Landscape (Who's Leading and What They're Doing)

Research Labs Leading the Charge

OpenAI: Founded in 2015 with a mission to build AGI (artificial general intelligence) for the benefit of humanity. Originally a non-profit, OpenAI added a capped-profit subsidiary in 2019 to attract investment, and in 2025 restructured the for-profit arm as a public benefit corporation while retaining the non-profit as part of its broader ownership structure. It produced ChatGPT, the GPT series, and the o-series of reasoning models that defined the 2024–25 frontier. By mid-2026 OpenAI reports roughly $25 billion in annualized revenue and was last valued near $852 billion in private markets, with a public listing reportedly targeted for late 2026. Its long-running Microsoft partnership was restructured in 2025–26 from exclusive to non-exclusive: Microsoft retains a large equity position, but OpenAI can now contract with other cloud providers, and Microsoft has begun building competing frontier capabilities of its own. OpenAI has also created a multi-billion-dollar "Deployment Company" vehicle to acquire engineering and consulting firms that help enterprises integrate its models.

Google DeepMind: A UK-based lab acquired by Google in 2014, famous for AlphaGo, AlphaFold (the protein-folding breakthrough that earned the 2024 Nobel Prize in Chemistry), and core research contributions to reinforcement learning and multimodal models. Google merged DeepMind with Google Brain in 2023 to consolidate AI talent. The group develops the Gemini series, with Gemini 3.x becoming the production line in early 2026, and—unlike pure-play labs—has both the option and the obligation to integrate AI into Search, Cloud, Workspace, and Android. DeepMind has historically been vocal on safety and interpretability; that work continues, but the tension between research-lab caution and product-company speed has become a defining feature of the organization.

Anthropic: Founded in 2021 by OpenAI alumni who departed after disagreements over commercialization and safety, led by Dario Amodei. Anthropic positions itself as an "AI safety and research company," introduced Constitutional AI, and produces the Claude model family. By mid-2026 its annualized revenue is approaching $19 billion, and it is widely reported to be preparing a public listing in late 2026 that could raise on the order of $60 billion. Anthropic has built a deployment-services vehicle of roughly $1.5 billion alongside its core research operation. Its April 2026 release of Claude Mythos under the "Project Glasswing" differential-access framework (discussed in Chapter 8) illustrates how the company has attempted to make its safety positioning operational rather than purely rhetorical.

Others:

Microsoft: Once primarily a commercialization arm for OpenAI's models (Bing/Copilot, Microsoft 365 Copilot, Azure OpenAI services), Microsoft has shifted in 2025–26 toward developing competing frontier capabilities of its own. Its multi-agent cybersecurity system surpassed Anthropic's Mythos on a leading benchmark in May 2026, signaling that Microsoft intends to be an independent frontier player, not just a reseller. The restructured (non-exclusive) OpenAI partnership formalizes that shift.

Meta: Long the leading open-source advocate among the frontier labs (the Llama family), Meta reorganized its AI efforts in 2025 under a new "Superintelligence Labs" division led by Alexandr Wang, who joined from Scale AI after Meta bought a 49% stake in the company. It announced 2026 AI capital expenditures of $115–135 billion. Its first flagship model under the new structure, Muse Spark, is notably not fully open-weight, a meaningful change in posture even as Meta continues to release smaller models openly.

xAI: Founded by Elon Musk to develop the Grok series. In February 2026, xAI merged with SpaceX, creating a combined entity that pairs frontier AI development with orbital launch capacity and the X social platform—a structurally novel arrangement that gives the lab access to a unique distribution surface and an unusual capital base.

China's AI sector: Baidu, Alibaba, Tencent, Huawei, and DeepSeek remain the central players, with DeepSeek's open-weight reasoning models the most internationally visible. The U.S. cleared a small set of Chinese firms to purchase Nvidia's H200 in May 2026 under a novel revenue-sharing arrangement (see Chapter 10), but Beijing has reportedly instructed those firms not to take delivery, leaving the new licensing regime contested on both sides. Chinese generative-AI services continue to operate under mandatory content registration with the Cyberspace Administration of China.

Hardware: Nvidia remains dominant in high-end training chips through the H100, H200, and Blackwell generations. TSMC in Taiwan manufactures nearly all advanced AI chips, and ASML in the Netherlands supplies the EUV lithography systems that are themselves a critical chokepoint. Intel, AMD, and the in-house silicon programs at Google (TPU), Amazon (Trainium), and others continue to develop alternatives, but Nvidia's CUDA software ecosystem remains its deepest moat.

Cloud providers: AI training continues to concentrate on Azure, AWS, and Google Cloud in the U.S., with a parallel ecosystem in China. This concentration is itself a policy lever: compute usage is a more observable proxy for large-scale model training than weights or training data.

Industry orgs: Multistakeholder bodies like Partnership on AI and the Frontier Model Forum (OpenAI, Anthropic, Google, Microsoft) continue to coordinate on safety norms and red-teaming standards. The UK AI Safety Institute and analogous bodies in the U.S. (CAISI) and EU now form a transnational network of evaluators that engage directly with frontier labs.

The IPO Wave

The 2024–25 era of frontier AI was financed almost entirely through private capital: corporate strategic investment, venture funds, sovereign vehicles, and a handful of ultra-large private rounds. By mid-2026 that model is showing strain. Private valuations have run up sharply (OpenAI was valued near $852 billion in its latest financing, and Anthropic's most recent round implied a valuation that would place it among the largest U.S. companies if listed), and the number of private investors able to absorb rounds of tens of billions of dollars is finite. Several frontier labs are now signaling public listings as the next stage of capital formation, with OpenAI and Anthropic both reportedly targeting Q4 2026 windows. xAI's merger with SpaceX is a structurally different but analogous move toward a larger, more liquid capital base.

Key Takeaways (Chapter 9):

Frontier AI development is concentrated in a handful of private labs and big tech firms, mostly in the U.S. (OpenAI, Anthropic, Google DeepMind, Microsoft, Meta, xAI), with significant efforts in China (DeepSeek, Alibaba, Baidu, Huawei, Tencent) and a long tail of smaller players. This concentration means that policy engagement with a few key companies can have wide impact.

Different philosophies, converging structures: Each lab still has a distinctive posture—OpenAI's controlled release, DeepMind's research-product balance, Anthropic's safety-brand and differential-access experiments, Meta's (partial) open-source advocacy, Microsoft's pivot from reseller to independent frontier player, xAI's unconventional capital base. But the IPO wave and the move to public-company disclosure are likely to make them look more alike on governance, even as their technical strategies diverge.

Compute and hardware are strategic assets: Nvidia and TSMC remain as critical to AI as any lab. The CHIPS and Science Act in the U.S. and analogous programs in the EU, Japan, and South Korea aim to onshore production, while ASML's EUV monopoly remains a chokepoint. Export-control policy now extends to revenue-sharing licenses (Chapter 10) and criminal enforcement against transshipment networks.

The international dynamic is more contested, not less: The 2025 "DeepSeek shock" demonstrated that efficiency innovations can partially offset hardware disadvantages. The 2026 H200 license regime opened a new front in which both the U.S. and Beijing are negotiating who is allowed to sell and buy advanced chips. Cooperation on shared risks (biosecurity, accidental escalation, frontier-model safety standards) remains possible but harder to operationalize.

The IPO wave will reshape governance: As frontier labs move toward public markets, the "voluntary commitments" model meets securities-disclosure obligations. Policymakers should expect that capability evaluations, safety incidents, and alignment failures will increasingly be litigated through the disclosure regime, not just through ad hoc industry agreements.

Chapter 10: US AI Policy

Export Controls: The Hardware Chokepoint

The United States has made semiconductor export controls a cornerstone of its AI policy, recognizing that advanced chips are the bottleneck for training frontier AI systems.

The October 2022 controls: The Biden administration's initial semiconductor restrictions targeted China specifically, blocking exports of advanced AI chips (like Nvidia's A100 and H100) and the equipment needed to manufacture them. The logic was straightforward: if you can't get the chips, you can't train the models. These controls were unprecedented in scope, covering not just finished chips but also the manufacturing equipment, software, and even the expertise of American citizens working in Chinese chip facilities.

The Entity List mechanism—maintained by the Bureau of Industry and Security (BIS)—has proven central to enforcement. Companies placed on the Entity List cannot receive US technology without specific licenses, which are presumptively denied. Huawei, SMIC, and dozens of other Chinese tech firms have been added. The practical effect: American companies (and foreign companies using American technology) cannot sell to blacklisted entities without government approval. Violations carry severe penalties—criminal prosecution, massive fines, and debarment from government contracts.

Effectiveness and challenges: Export controls have clearly constrained Chinese AI development, forcing reliance on older chips and domestic alternatives that lag behind. By some measures, the U.S. retained a nearly 30x compute advantage over China in aggregate through 2024–25. However, enforcement remains difficult. Chips are small and valuable, creating smuggling incentives. Third-country transshipment is hard to monitor—chips sold to Malaysia, Singapore, or Thailand may find their way to China. And Chinese firms stockpiled chips ahead of restrictions, with some estimates suggesting years of supply for major labs. The controls buy time but aren't a permanent solution. They work best as part of a broader strategy that also invests in American capabilities rather than relying solely on slowing competitors.

The 2025–26 license shift: Beginning in late 2025 the Trump administration shifted from strict denial toward a "tax and trace" model for selected chips. In December 2025 the administration announced that exports of Nvidia's H200 to China would be permitted under an unprecedented revenue-sharing arrangement: chips would pass through U.S. territory en route to Chinese buyers, with the U.S. government collecting 25% of the sale revenue. Commerce formalized the rule in January 2026, requiring Chinese buyers to demonstrate "sufficient security procedures" and commit to non-military end use, with each customer capped at 75,000 chips. In May 2026, approximately ten Chinese firms were cleared to purchase H200s — including Alibaba, Tencent, ByteDance, and JD.com, plus distributors Lenovo and Foxconn. As of mid-May 2026 no deliveries had occurred: Beijing reportedly instructed Chinese firms to delay purchases, leaving the new regime contested on both sides. The shift is a notable policy innovation — taxation and traceability rather than blanket denial — but its real-world effect remains uncertain.
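
The scale implied by these terms can be estimated with simple arithmetic. The sketch below multiplies the per-customer cap by the approximate number of cleared firms and applies the 25% revenue share. The unit price is a hypothetical placeholder rather than a reported figure, and the function name is illustrative only.

```python
# Back-of-the-envelope ceiling on the H200 license regime.
# From the text: about 10 cleared firms, a 75,000-chip cap per customer,
# and 25% of sale revenue collected by the U.S. government.
# ASSUMPTION: the unit price is a hypothetical placeholder, not a reported price.

CLEARED_FIRMS = 10
CAP_PER_FIRM = 75_000
REVENUE_SHARE = 0.25
HYPOTHETICAL_UNIT_PRICE_USD = 30_000  # assumption, for illustration only


def license_ceiling(firms, cap, unit_price, share):
    """Return (max chips, max sale revenue, max government take)."""
    max_chips = firms * cap
    max_revenue = max_chips * unit_price
    return max_chips, max_revenue, max_revenue * share


chips, revenue, gov_take = license_ceiling(
    CLEARED_FIRMS, CAP_PER_FIRM, HYPOTHETICAL_UNIT_PRICE_USD, REVENUE_SHARE
)
print(f"Maximum chips under the caps: {chips:,}")
print(f"Sale revenue at the assumed price: ${revenue / 1e9:.1f}B")
print(f"Government share at 25%: ${gov_take / 1e9:.2f}B")
```

Because no deliveries had occurred as of mid-May 2026, these are upper bounds on what the regime permits, not estimates of actual trade.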

Enforcement: the Supermicro case. In March 2026, federal prosecutors arrested Yih-Shyan "Wally" Liaw, co-founder of Supermicro, on charges of orchestrating a $2.5 billion scheme to divert Nvidia-equipped servers to China via a sham Thailand-based front entity. The alleged tradecraft included roughly $500 million in shipments over a three-week period in mid-2025 and thousands of staged dummy servers used to deceive corporate compliance auditors; reported end-buyers included Alibaba. Supermicro itself launched an independent investigation in April 2026; the company is not named as a defendant, but its compliance posture is under broader scrutiny. The case is the highest-profile real-world test of Entity List enforcement to date — and a reminder that for a chip small enough to fit in a suitcase and valuable enough to fund a transnational fraud, third-country transshipment is a structural vulnerability, not a fixable bug.

The AI Action Plan

In July 2025, the Trump Administration released "Winning the Race: America's AI Action Plan," a comprehensive strategy emphasizing American leadership in AI development.

Core pillars: The plan focuses on (1) accelerating AI innovation through reduced regulatory barriers, (2) building AI infrastructure including data centers and energy capacity, (3) developing an AI workforce through education and immigration reforms, (4) ensuring AI security against foreign threats, and (5) promoting American AI standards internationally. The plan explicitly rejects what it terms "precautionary" approaches that would slow development, instead emphasizing that American AI leadership itself is a safety strategy: better to have advanced AI developed under American values than to cede the field to authoritarian competitors.

CAISI and standardization: The Center for AI Standards and Innovation (CAISI) became the administration's primary AI governance mechanism. Rather than broad regulations, CAISI develops voluntary standards, testing protocols, and evaluation benchmarks in collaboration with industry. The philosophy is that government should enable and standardize rather than restrict. CAISI works with NIST's existing AI Risk Management Framework, which provides voluntary guidance on identifying, assessing, and managing AI risks across the development lifecycle.

Energy and infrastructure: The plan acknowledges that AI development requires massive amounts of electricity—a single frontier model training run can consume as much power as a small city for months. It streamlines permitting for data centers, promotes nuclear power expansion (including small modular reactors), and coordinates grid upgrades. The recognition that AI is an infrastructure challenge as much as a software challenge marks a shift in policy thinking. The administration has also supported the "Stargate" initiative, a multi-hundred-billion-dollar partnership between OpenAI, SoftBank, and Oracle to build AI infrastructure at unprecedented scale.

Open-source endorsement: The AI Action Plan explicitly endorses open-source AI development, breaking with some proposals that would restrict access to model weights. The administration's view is that open-source accelerates American innovation, enables small businesses and researchers, and provides transparency benefits that outweigh proliferation risks. This position aligns with industry voices like Meta but conflicts with those who worry about unrestricted access to powerful AI capabilities.

Federal vs. State: The Preemption Debate

A major tension in US AI policy is whether the federal government should preempt state-level AI regulation.

The California experience: California's SB 1047 would have imposed safety requirements on frontier AI models, including pre-deployment testing and "kill switch" capabilities. Governor Newsom vetoed it in September 2024, citing concerns about overreach and the difficulty of defining appropriate thresholds. But the debate highlighted the stakes: California hosts most major AI labs, so its regulations would have de facto national (even global) reach.

California SB 53: Where SB 1047 failed, SB 53 succeeded—though with a narrower scope. Signed in September 2025 and effective January 2026, SB 53 focuses on AI transparency and incident reporting rather than pre-deployment restrictions. It applies to companies with more than $500 million in annual revenue, requiring them to report "significant AI incidents" within 15 days—defined as AI system failures causing material harm to individuals, critical infrastructure, or public safety. Penalties reach $1 million per violation. The law also mandates annual disclosure of AI risk assessment procedures. SB 53 represents a more modest regulatory approach that may prove a template for other states.

New York's RAISE Act: New York took a different approach with its Responsible AI Safety and Education (RAISE) Act, signed in December 2025 and effective January 2027. RAISE focuses on high-risk AI applications in employment, housing, credit, and healthcare. It requires impact assessments before deployment, ongoing monitoring for discriminatory effects, and a 72-hour reporting window for incidents causing substantial harm. Penalties are steep: $1 million for first violations, up to $3 million for repeat offenders. Unlike California's revenue threshold, RAISE applies broadly to any entity deploying covered AI systems in New York, raising compliance costs for smaller firms.

Misguided laws—the Illinois example: Not all state AI regulation is well-conceived. Illinois HB 1806, signed August 2025, bans "autonomous AI therapy"—AI systems providing mental health treatment without real-time human supervision. While motivated by legitimate concerns about AI chatbots offering psychological advice, the law's broad language potentially covers beneficial applications like AI-assisted therapy tools used under clinician supervision. Penalties of $10,000 per violation may discourage innovation without meaningfully protecting consumers. Such laws illustrate how state legislatures, lacking technical expertise, can craft rules that miss their targets.

The preemption argument: A patchwork of 50 different state AI laws creates compliance nightmares and fragments the US market. Federal preemption would create uniform rules and ensure national security considerations are properly weighted. Industry generally favors preemption—but so far, Congress hasn't acted. The result is an uncertain landscape where companies must navigate inconsistent state requirements while awaiting federal clarity that may never come.

The Genesis Mission: AI for Scientific Discovery

In November 2025, President Trump signed an executive order launching the Genesis Mission—a national initiative to accelerate scientific discovery through AI, framed as comparable in ambition to the Manhattan Project.

The American Science and Security Platform: At the heart of Genesis is a new integrated AI platform that will link federal supercomputers, secure cloud networks, scientific datasets accumulated over decades of federal investment, domain-specific foundation models, and automated laboratory systems. The platform will be built around the Department of Energy's 17 national laboratories, which house some of the world's most powerful supercomputers and vast stores of scientific data in areas from nuclear physics to materials science to genomics.

Leadership and structure: The initiative is jointly led by the Office of Science and Technology Policy (OSTP) and the Department of Energy. OSTP Director Michael Kratsios provides overall leadership and interagency coordination, while DOE Under Secretary for Science Darío Gil is responsible for assembling the technical infrastructure. The executive order tasks DOE with identifying at least 20 "science and technology challenges of national importance" spanning priority domains: biotechnology, advanced manufacturing, critical materials, nuclear fission and fusion energy, quantum information science, and semiconductors.

Public-private partnerships: Within weeks of the announcement, DOE signed collaboration agreements with 24 organizations including Anthropic, OpenAI, xAI, Google, Microsoft, Amazon Web Services, and Nvidia. These partnerships aim to combine federal computing resources and datasets with private sector AI capabilities. The goal is to train scientific foundation models—AI systems specialized for scientific reasoning, hypothesis generation, and experimental design.

Implementation challenges: The executive order sets an ambitious 270-day timeline to demonstrate initial platform capability, but allocates no new funding. Critics note that federal agencies have historically struggled to share data due to statutory limits, security requirements, and incompatible IT systems. The initiative must also navigate questions about data governance, intellectual property for AI-assisted discoveries, and how to balance openness with national security concerns. Whether Genesis can overcome these obstacles—or becomes another ambitious federal technology initiative that underdelivers—remains to be seen.

Copyright and Training Data

One of the most contested legal questions is whether training AI on copyrighted material constitutes infringement.

The New York Times lawsuit: The New York Times v. OpenAI case, filed in December 2023, has become the defining copyright battle of the AI era. The Times alleges that OpenAI trained GPT models on millions of its articles without permission, and that ChatGPT can reproduce Times content nearly verbatim, effectively creating a substitute for paid subscriptions. OpenAI counters that training is "transformative use" protected by the fair use doctrine: the model learns patterns and concepts rather than copying specific expression.

In January 2026, a significant ruling ordered OpenAI to produce over 20 million ChatGPT conversation logs containing Times content, potentially revealing how often users receive Times-derived outputs. The court has allowed the Times's main claims to proceed, rejecting OpenAI's motion to dismiss, while narrowing some secondary claims. A trial could come in late 2026 or 2027.

The fair use question: Fair use traditionally considers four factors: purpose and character of use (commercial vs. educational, transformative vs. copying), nature of the copyrighted work, amount used, and effect on the market for the original. AI training arguably scores well on "transformative" use—the AI creates new works, not copies—but poorly on amount (entire works are ingested) and potentially market effect (AI summaries might substitute for original articles). Courts have never addressed use at this scale.

Stakes: If plaintiffs prevail, AI companies could face massive retroactive damages—potentially billions if per-work statutory damages apply to training sets containing millions of works. More practically, they'd need to license training data or use only public domain material, fundamentally changing AI development economics. If AI companies prevail, content creators may have no recourse as their work trains systems that compete with them. The outcome will shape whether AI development remains concentrated among well-capitalized firms that can afford licensing or becomes broadly accessible.
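
To see why the exposure could reach the billions, consider the per-work statutory damages range in U.S. copyright law (17 U.S.C. § 504(c)): $750 to $30,000 per infringed work, and up to $150,000 per work where infringement is willful. The sketch below applies that range to a hypothetical count of one million works. The count is an assumption for illustration, and statutory damages are available only for works that meet registration requirements, so real exposure depends on facts a court would have to find.

```python
# Statutory damages range under 17 U.S.C. § 504(c), applied per infringed work.
# ASSUMPTION: the number of works is hypothetical; actual eligibility depends
# on registration and on what a court finds.

STATUTORY_MIN_USD = 750
STATUTORY_MAX_USD = 30_000
WILLFUL_MAX_USD = 150_000


def damages_range(works):
    """Return total exposure at each statutory level for a given work count."""
    return {
        "minimum": works * STATUTORY_MIN_USD,
        "ordinary maximum": works * STATUTORY_MAX_USD,
        "willful maximum": works * WILLFUL_MAX_USD,
    }


exposure = damages_range(1_000_000)  # hypothetical work count
for level, amount in exposure.items():
    print(f"{level}: ${amount / 1e9:,.2f}B")
```

Even at the statutory minimum, one million works implies $750 million, which is why licensing or settlement becomes the practical alternative if plaintiffs prevail.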

Legislative stasis: Congress has considered various approaches: compulsory licensing schemes (where AI companies pay into a fund distributed to creators), transparency requirements (disclosing what training data was used), or explicit fair use carve-outs for AI training. None has passed. The issue may ultimately be resolved by courts rather than Congress.

International Comparisons

U.S. policymakers increasingly benchmark against allied regimes—both because international coordination matters and because U.S. companies must comply with whichever rule is strictest in any market they serve.

The EU AI Act: The EU AI Act, which began coming into force in 2024–25, is the world's first comprehensive AI regulatory framework. It uses a risk-based approach: applications are classified by risk level, with "unacceptable risk" uses banned outright, "high-risk" uses (employment, credit, critical infrastructure, biometrics, and similar domains) subject to strict requirements for impact assessments, transparency, and human oversight, and lower-risk uses lightly regulated. Providers of general-purpose foundation models must conduct risk assessments, document training data and methods, and register in an EU database; a separate "Code of Practice" addresses copyright and transparency. Full enforcement begins in August 2026. The Act has extraterritorial reach: any company offering AI services to EU users falls within its scope, regardless of where it is headquartered.

The UK AI Safety Institute: The UK has so far declined a comprehensive regulatory framework in favor of a sector-led approach supported by a central technical body, the AI Safety Institute (AISI), which was renamed the AI Security Institute in February 2025. Established in 2023, AISI conducts pre-deployment evaluations of frontier models, often before public release, and publishes its findings. AISI has emerged as one of the most technically credible AI evaluators globally and is now part of a transnational network alongside the U.S. CAISI and EU equivalents. Its evaluation of Anthropic's Claude Mythos Preview in 2026 (discussed in Chapter 8) was a notable demonstration of independent technical oversight of a frontier model before broad release.

Implications for U.S. policy: The EU's comprehensive framework and the UK's evaluator-led model are the two main alternative templates U.S. policymakers reference. Each has tradeoffs: the EU Act is comprehensive but rigid and slow to update; the UK approach is nimble but less binding. The U.S. CAISI-plus-standards posture is closer to the UK than to the EU but with greater industry participation in standards-setting. The three regimes are not directly compatible, and frontier labs increasingly design their compliance programs around the strictest applicable standard.

Key Takeaways (Chapter 10):

Export controls are the primary U.S. tool for maintaining AI advantage. The Entity List remains the core mechanism, but the 2025–26 H200 license regime introduced a novel "tax-and-trace" alternative — and the Supermicro case demonstrated that criminal enforcement is now a live element of the strategy.

The AI Action Plan emphasizes innovation and infrastructure over regulation, with CAISI providing voluntary standards. The plan endorses open-source AI and prioritizes energy infrastructure for AI development.

State laws are proliferating: California's SB 53, New York's RAISE Act, and narrower state-level laws create a patchwork that industry finds challenging. Some, like Illinois's AI therapy ban, are poorly designed. Federal preemption remains elusive.

The Genesis Mission aims to accelerate scientific discovery by combining DOE national-laboratory resources with private-sector AI capabilities. Success depends on overcoming data-sharing barriers and securing funding.

Copyright questions remain legally unresolved, with the NYT v. OpenAI case potentially reaching trial in 2026–2027. The outcome could fundamentally reshape AI development economics.

International comparisons matter: the EU AI Act (comprehensive, rigid) and the UK AISI model (evaluator-led, nimble) bracket the U.S. CAISI-and-standards approach. Frontier labs increasingly design compliance around the strictest applicable regime.

Chapter 11: China and AI Competition

The Impact of Export Controls

US semiconductor export controls have materially affected Chinese AI development, though not as decisively as some hoped.

Hardware constraints: Through 2025, Chinese AI labs could not legally access Nvidia's most advanced chips. The H100 and subsequent generations were banned; even the A800 and H800 (chips Nvidia designed specifically to comply with earlier rules) were subsequently restricted. The April 2025 ban on the H20, Nvidia's lower-performance chip designed for the Chinese market, closed another loophole. Chinese firms relied on stockpiled chips, older generations, and domestic alternatives.

The 2026 re-opening — and its limits: The U.S. policy shift in late 2025–early 2026 partially re-opened the door. As discussed in Chapter 10, Washington moved from strict denial to a revenue-sharing license regime for the H200, with approximately ten Chinese firms cleared by May 2026 (Alibaba, Tencent, ByteDance, JD.com, and distributors including Lenovo and Foxconn), each capped at 75,000 chips. Beijing then complicated the picture by instructing Chinese firms not to take delivery — leaving no deliveries actually completed as of mid-May 2026. Both governments now have a veto over the trade, and the H200 has become a bargaining chip in a broader negotiation rather than the binary export-or-not it once was.

Stockpiling evidence: Chinese tech giants anticipated the original restrictions and built substantial reserves. In 2023, Baidu, ByteDance, Tencent, and Alibaba placed combined orders worth $5 billion for Nvidia chips. Before the H20 ban took effect, these same companies rushed to acquire an estimated 1–1.6 million H20 chips worth $12–16 billion—enough for 12–14 months of inference workloads. Chinese companies also imported over $26 billion in semiconductor manufacturing equipment in the first seven months of 2024, quadrupling imports of Dutch lithography equipment before restrictions tightened. These stockpiles provide runway, but not a permanent solution.
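
A quick consistency check on the H20 stockpile estimate is to divide the dollar range by the chip range. The sketch below does that under one assumption that the text does not state: that the low chip count pairs with the low dollar figure and the high count with the high figure.

```python
# Implied per-chip price from the H20 stockpile estimates in the text.
# ASSUMPTION: low count pairs with low value and high count with high value;
# the source gives only the two ranges.

estimates = {
    "low": {"chips": 1_000_000, "value_usd": 12e9},
    "high": {"chips": 1_600_000, "value_usd": 16e9},
}

implied_price = {
    name: est["value_usd"] / est["chips"] for name, est in estimates.items()
}

for name, price in implied_price.items():
    print(f"{name} estimate: about ${price:,.0f} per H20")
```

Both ends of the range imply a price in the low tens of thousands of dollars per chip, so the two estimates are at least internally consistent.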

Workarounds: Even during the strict-denial period, chips continued reaching China through smuggling, transshipment via third countries, and front companies. The March 2026 indictment of Supermicro's co-founder for an alleged $2.5 billion scheme routing Nvidia-equipped servers through a Thailand-based front entity (Chapter 10) is the most prominent enforcement action to date and illustrates the structural difficulty. The chips' high value-to-weight ratio makes enforcement intrinsically hard: a single suitcase can carry millions of dollars' worth of computing power.

The DeepSeek Shock

DeepSeek's R1 reasoning model, released in January 2025, demonstrated that Chinese labs can achieve frontier performance despite hardware constraints.

Technical innovations: DeepSeek R1 uses a Mixture of Experts (MoE) architecture with 671 billion total parameters, of which only 37 billion are activated for each token processed, dramatically improving efficiency. Its Multi-head Latent Attention technique reduces attention-cache (KV cache) memory requirements by 93%, enabling a 128K context window on restricted hardware. The model was trained on 2,048 H800 GPUs (the pre-ban China variant) over approximately two months.
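To make the Mixture of Experts idea concrete, the following is a minimal sketch of top-k routing: a small "router" scores every expert for a given token, and only the highest-scoring few are actually run. The expert count, the value of k, and the random scores are hypothetical toy values chosen for illustration; this is not DeepSeek's router or configuration.

```python
import math
import random

def softmax(scores):
    """Convert raw router scores into probabilities that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def route_token(router_scores, k):
    """Return (expert_index, weight) pairs for the k highest-scoring experts.

    Weights are renormalized over the chosen experts only, so they sum to 1.
    """
    probs = softmax(router_scores)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    chosen_mass = sum(probs[i] for i in top)
    return [(i, probs[i] / chosen_mass) for i in top]

# Hypothetical toy sizes, not DeepSeek's configuration.
NUM_EXPERTS = 16
TOP_K = 2

# Stand-in router scores for one token (a real router computes these
# from the token's internal representation).
rng = random.Random(0)
scores = [rng.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]

selection = route_token(scores, TOP_K)
print("experts used for this token:", selection)
print(f"fraction of experts active: {TOP_K / NUM_EXPERTS:.1%}")

# The ratio quoted in the text: 37B active out of 671B total parameters.
print(f"active-parameter share from the text: {37 / 671:.1%}")
```

The design point is that the model's total size (all experts) determines how much it can store, while the active subset determines how much computation each token costs.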

Cost claims and context: DeepSeek claimed a training cost of roughly $5.6 million for their V3 and R1 models combined—far below the reported $78 million for GPT-4 or $191 million for Gemini Ultra. The widely-cited "$294,000" figure referred only to the reinforcement learning phase, not full pre-training. Still, the efficiency gains are real: UC Berkeley researchers subsequently reproduced comparable reasoning capabilities for as little as $50.
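A rough sanity check on the headline cost is possible using the hardware figures given above. The sketch below multiplies GPU count by training time and a rental price. The 60-day duration and the $2 per GPU-hour price are assumptions made for illustration, not figures from this handbook, and the result covers compute rental only (no salaries, research experiments, failed runs, or hardware purchases).

```python
# Figures from the text.
GPUS = 2048
CLAIMED_COST_USD = 5.6e6

# Assumptions (not from the text): 60 days of wall-clock training time
# and a rental price of $2 per GPU-hour.
TRAINING_DAYS = 60
PRICE_PER_GPU_HOUR_USD = 2.0

def gpu_hours(gpus, days):
    """Total GPU-hours if every GPU runs around the clock."""
    return gpus * days * 24

def rental_cost(gpus, days, price_per_gpu_hour):
    """Compute-only cost at a flat hourly rental price."""
    return gpu_hours(gpus, days) * price_per_gpu_hour

hours = gpu_hours(GPUS, TRAINING_DAYS)
cost = rental_cost(GPUS, TRAINING_DAYS, PRICE_PER_GPU_HOUR_USD)

print(f"GPU-hours: {hours:,}")
print(f"Estimated rental cost: ${cost / 1e6:.1f} million")
print(f"Claimed cost:          ${CLAIMED_COST_USD / 1e6:.1f} million")
```

Under these assumptions the estimate lands in the same range as the claimed figure, which suggests the claim describes the marginal compute for a final training run rather than the full cost of building the lab and the model.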

Market impact: "DeepSeek Monday" (January 27, 2025) saw Nvidia lose $600 billion in market value in a single day as investors reassessed whether massive GPU deployments were necessary. DeepSeek released R1 under an MIT license, one of the most permissive open-source licenses, enabling unrestricted commercial use. Major Western companies including Microsoft, Amazon, and Perplexity adopted the model despite initial security concerns. The episode demonstrated that "necessity is the mother of invention" dynamics could partially offset hardware disadvantages.

Huawei's Parallel Stack

Huawei has built the most comprehensive alternative to the American AI technology stack.

Ascend chips: The Ascend 910C, Huawei's current flagship AI accelerator, delivers 780–800 TFLOPS in FP16/BF16—roughly 60–80% of Nvidia's H100 performance. It combines two 910B dies via chiplet packaging, with 128GB of HBM3 memory. Yields have improved meaningfully from initial 20–30% ranges into the 40–60% range, with Huawei targeting 100,000 910C units and 300,000 910B units across 2025–26. An Ascend 910D on a 5nm process is anticipated but has not been confirmed in volume production as of mid-2026.

The CANN ecosystem: Huawei's CANN (Compute Architecture for Neural Networks) provides an alternative to Nvidia's CUDA at the runtime and compiler level. Their MindSpore deep learning framework has achieved 30% market share in China—ranked first. However, developer documentation and community support significantly lag CUDA, and Huawei acknowledges that displacing CUDA is not feasible near-term. The software ecosystem remains the harder problem.

HarmonyOS and vertical integration: Huawei launched HarmonyOS NEXT in November 2024—fully independent of Android and Google, running on a bespoke microkernel rather than Linux. By late 2025, the ecosystem had grown to over 300,000 apps and services. Combined with Huawei's Kirin mobile chips, Ascend AI accelerators, and Pangu foundation models (up to 718 billion parameters), Huawei offers end-to-end AI capabilities independent of American technology. The system has knocked iOS to third place in Chinese smartphone market share.

SMIC Manufacturing

China's chipmaking advances have exceeded some expectations, though significant gaps remain.

7nm achievements: SMIC achieved mass production of 7nm-class chips for Huawei's Kirin 9000S in late 2023—without access to EUV lithography. This required 34 DUV patterning steps versus 9 for equivalent EUV processes. Yields have improved from below 40% in late 2023 to 60-70% currently, with capacity of 20,000-30,000 wafers monthly.
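One way to see why the extra patterning steps matter for yield is a toy compounding model: if every step must succeed for a chip to work, small per-step losses multiply. The 99% per-step success rate below is a hypothetical value, and treating steps as independent and equally risky is a simplifying assumption rather than a description of SMIC's actual process.

```python
def compound_yield(per_step_yield, steps):
    """Overall yield if each step succeeds independently with the same rate."""
    return per_step_yield ** steps

# Hypothetical per-step success rate, chosen only for illustration.
PER_STEP_YIELD = 0.99

# Step counts from the text.
for label, steps in [("EUV", 9), ("DUV multi-patterning", 34)]:
    overall = compound_yield(PER_STEP_YIELD, steps)
    print(f"{label:22s} {steps:2d} steps -> {overall:.1%} overall")
```

Even with identical per-step quality, the longer process loses noticeably more chips, which is one reason DUV-based 7nm production is costlier per working chip.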

5nm challenges: SMIC's 5nm development continues but faces significant hurdles. Early 5nm claims were debunked by analysts (the chips were actually 7nm). Realistic 5nm production may not arrive until 2026, with projected yields of 30-40% and costs 40-50% higher than TSMC equivalents. Meanwhile, TSMC, Samsung, and Intel are ramping 2nm production in late 2025—maintaining a substantial technology gap.

Strategic Implications

Civil-military fusion: China's policy explicitly links civilian AI advances to military applications. U.S. policymakers therefore assume that any advanced AI capability developed by Chinese companies is available to the PLA, which is the rationale for broad rather than narrowly targeted export controls.

Regulatory divergence: Chinese AI operates under mandatory content restrictions. Generative AI services must register with the Cyberspace Administration of China, undergo security assessments, and ensure outputs align with CCP values. Penalties reach 15 million RMB or 5% of annual revenue. This creates fundamentally different AI systems: Chinese models exhibit "anticipatory censorship," refusing to engage with politically sensitive topics. Users globally will increasingly choose between ecosystems with different capabilities and constraints.

A two-sided trade: The 2025–26 license shift reframed the competition. Where the U.S. once exercised unilateral control over what chips reached China, both governments now hold veto power: Washington decides what leaves, Beijing decides what arrives. The fact that no H200s had been delivered under the new licenses as of mid-May 2026 — despite U.S. approval and reportedly "very high" Chinese demand — suggests that Beijing prefers to keep export-control concessions in reserve as bargaining leverage rather than spend them on near-term compute access. This is a meaningful change from the simpler "denial vs. evasion" dynamic of 2022–24.

Technology denial has limits: Export controls have bought time but will not prevent Chinese AI development. China has the talent, capital, and increasingly the domestic supply chains to continue advancing. Stockpiles provide 1–2 years of runway; indigenous capabilities are improving; and the new license regime opens — or threatens to open — a controlled channel for high-end chips when Beijing chooses to use it. The relevant question is whether the U.S. maintains a meaningful and durable lead, or whether China achieves rough parity through some combination of efficiency innovations, domestic alternatives, and selectively imported American hardware.

Key Takeaways (Chapter 11):

Export controls have constrained but not stopped Chinese AI development. Multi-billion-dollar stockpiles provide runway while indigenous capabilities mature.

The 2025–26 H200 license regime reframed the trade as two-sided: both Washington and Beijing now hold veto power over high-end chip transfers, and Beijing has thus far chosen to block deliveries rather than accept the U.S. terms.

DeepSeek demonstrated that efficiency innovations can partially offset hardware disadvantages, achieving frontier reasoning performance at dramatically lower cost.

Huawei is building a parallel stack from chips (Ascend) to software (CANN/MindSpore) to devices (HarmonyOS), reducing dependence on American technology — though displacing CUDA remains a long-term challenge.

SMIC has achieved 7nm without EUV, though 5nm remains challenging and the gap to leading-edge nodes persists.

Regulatory divergence means Chinese and Western AI systems will increasingly differ in capabilities and constraints, forcing users to choose between ecosystems.

Chapter 12: AI and the Labor Market

The Task-Based View of Automation

Understanding AI's labor market impact requires thinking about tasks rather than jobs. MIT economist David Autor's influential framework—articulated in "Why Are There Still So Many Jobs?"—distinguishes between tasks that are "routine" (sufficiently well-defined to be automated) and those requiring judgment, creativity, or interpersonal interaction. His research documents how previous automation waves hollowed out middle-skill jobs while demand grew for both high-skill analytical work and low-skill personal services—a pattern called job polarization.

Jobs vs. tasks: Most jobs comprise many different tasks. A lawyer researches, writes, negotiates, counsels clients, and appears in court. AI might automate some of these tasks (legal research, first drafts of documents) while leaving others untouched (courtroom advocacy, client relationships). The question isn't "will AI replace lawyers?" but "which legal tasks will AI transform, and what does that mean for how many lawyers we need and what they do?"

Exposure vs. replacement: A task being "exposed" to AI doesn't mean it will be fully automated. AI might handle 80% of a task, with humans reviewing and completing the rest. The actual outcome depends on business decisions, labor markets, and policy choices—not just technical capability.
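The task-based view can be sketched as a simple calculation: break a job into tasks, estimate the share of working time each takes, and estimate how much of each task AI could handle. Every number below is hypothetical, chosen only to illustrate the lawyer example; none comes from Autor's research or any survey.

```python
from dataclasses import dataclass

@dataclass
class Task:
    name: str
    share_of_time: float  # fraction of the working week spent on this task
    ai_share: float       # fraction of the task AI could plausibly handle

# Hypothetical breakdown of a lawyer's week.
lawyer = [
    Task("legal research", 0.25, 0.80),
    Task("drafting documents", 0.25, 0.60),
    Task("negotiation", 0.15, 0.10),
    Task("client counseling", 0.20, 0.05),
    Task("court appearances", 0.15, 0.00),
]

def exposed_time(tasks):
    """Share of total working time that AI could handle across all tasks."""
    return sum(t.share_of_time * t.ai_share for t in tasks)

exposure = exposed_time(lawyer)
print(f"Share of working time exposed to AI: {exposure:.1%}")
print(f"Share remaining with the human:      {1 - exposure:.1%}")
```

An exposure figure like this is not a forecast of job losses. Whether freed-up time becomes fewer lawyers, cheaper legal services, or more work per lawyer depends on the business and policy choices described above.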

Historical precedent: Previous automation waves offer both reassurance and caution. When ATMs proliferated in the 1970s-80s, many predicted bank tellers would disappear. Instead, teller employment roughly doubled from 300,000 in 1970 to 600,000 by 2010. ATMs reduced the cost of operating bank branches, leading banks to open more branches—more branches meant more total tellers, even with fewer per branch. Similarly, when spreadsheets arrived in 1979, bookkeepers feared obsolescence. Instead, the accounting profession exploded: the US had about 340,000 accountants in 1980 and 1.4 million by 2022. Automation eliminated tedious arithmetic while making accountants more valuable for interpretation and strategy.

The AI wave may be different. Its breadth (affecting many sectors simultaneously), speed (rapid capability gains), and cognitive nature (targeting tasks previously thought uniquely human) challenge historical patterns. As Autor asks: "Why are there still so many jobs?" The answer has always been that automation creates new needs even as it fills old ones. Whether this remains true for AI is the central question.

The Coding Disruption

Software development offers the clearest window into AI's labor market effects.

The first wave — IDE assistance: GitHub Copilot, launched in 2021, established the baseline pattern: an AI suggests code while the developer types. By 2026 it has roughly 20 million cumulative users, including 90% of Fortune 100 companies, and about 46% of code in Copilot-enabled workflows is AI-generated. Productivity studies on this paradigm have been mixed. Vendor benchmarks show 55% faster task completion and 75% shorter pull-request cycle times, while a rigorous METR study found that experienced open-source developers working on mature projects actually completed tasks 19% slower with AI tools, despite believing they were faster. Code "churn"—code discarded within two weeks of being written—has trended upward as suggestion-acceptance rates climbed.

The second wave — headless coding agents: Through 2025 and into 2026 the dominant pattern at frontier engineering organizations shifted from IDE assistance to headless coding agents: long-running processes that take a goal (an issue, a spec, a failing test) and return a working, tested, reviewed change without continuous human supervision. OpenAI's Codex and Anthropic's Claude Code are the two most widely deployed examples. Both ship as terminal tools, IDE integrations, and web-hosted environments; both are used in production by their parent labs and by external enterprise customers; and both routinely complete tasks that a year ago would have required pairing a model with a human-in-the-IDE supervisor. The agents handle codebase exploration, plan formation, edits, test execution, and self-review as a single autonomous loop.

Parallel agents and near-100% generation: The natural extension of headless agents is to dispatch many of them at once. A single engineer can now run several parallel attempts at a feature, each in its own sandbox or worktree, then review and merge the best result. Several frontier labs and engineering-heavy startups have publicly described workflows in which nearly 100% of new code is initially generated by AI agents, with human engineers acting as reviewers, integrators, and architects rather than authors. This is a qualitative change, not just a quantitative one: the unit of human work shifts from "lines of code" to "agent outputs to evaluate and merge." Productivity is no longer capped by typing speed or headcount but by how quickly humans can specify problems, dispatch agents, and adjudicate their output. Specification quality, test infrastructure, and code review become the new bottlenecks. The METR study cited above, in which experienced developers were slower with AI tools despite believing they were faster, is a caution that these gains depend on how well the human work of specifying and reviewing is organized, not on generation speed alone.
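The appeal of parallel attempts follows from basic probability. If a single agent attempt produces an acceptable change with probability p, the chance that at least one of n attempts does is 1 - (1 - p)^n. The 40% single-attempt success rate below is hypothetical, and the formula assumes attempts fail independently, which real agents sharing the same model and the same specification will not fully satisfy.

```python
def chance_any_succeeds(p_single, attempts):
    """Probability that at least one of several independent attempts succeeds."""
    return 1.0 - (1.0 - p_single) ** attempts

# Hypothetical single-attempt success rate, for illustration only.
P_SINGLE = 0.40

for n in (1, 3, 5, 10):
    print(f"{n:2d} parallel attempts -> {chance_any_succeeds(P_SINGLE, n):.1%} "
          f"chance of at least one acceptable result ({n} outputs to review)")
```

The success probability climbs quickly while the number of outputs a human must review climbs linearly, which is why review capacity and test infrastructure become the bottleneck.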

The METR task horizon: The nonprofit METR measures how complex a task AI can complete autonomously. Their research finds that this "50% time horizon" has been doubling approximately every 7 months for six years, and possibly faster since 2024. If the trend continues, frontier AI will handle month-long projects by the end of the decade. For software work specifically, the practical implication is that the headless-agent paradigm is itself transitional: today's agents complete tasks measured in hours; tomorrow's will undertake projects measured in days or weeks.
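The doubling claim can be turned into a simple projection. The sketch below uses the 7-month doubling time from the text; the 4-hour starting horizon and the 160-hour definition of a working month are assumptions chosen for illustration, not METR measurements.

```python
import math

DOUBLING_MONTHS = 7  # doubling time quoted in the text

def projected_horizon(start_hours, months_ahead, doubling_months=DOUBLING_MONTHS):
    """Task horizon after a given number of months of steady doubling."""
    return start_hours * 2 ** (months_ahead / doubling_months)

def months_until(start_hours, target_hours, doubling_months=DOUBLING_MONTHS):
    """Months of steady doubling needed to grow from start to target."""
    return doubling_months * math.log2(target_hours / start_hours)

# Assumptions for illustration (not from the text).
START_HOURS = 4.0          # hypothetical horizon today
WORK_MONTH_HOURS = 160.0   # roughly four 40-hour weeks

for months in (0, 12, 24, 36):
    hours = projected_horizon(START_HOURS, months)
    print(f"after {months:2d} months: about {hours:,.0f} hours")

wait = months_until(START_HOURS, WORK_MONTH_HOURS)
print(f"months until a month-long task horizon: {wait:.0f}")
```

The projection is only as good as its premise: it assumes the historical doubling rate holds, and a faster or slower rate changes the arrival date substantially.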

Companies and Layoffs

The 2024-2025 period saw the first large-scale layoffs explicitly citing AI.

Major examples: Amazon (14,000 jobs—its largest layoff ever), Microsoft (15,000, with CEO Nadella confirming 30% of code is now AI-written), UPS (48,000 via AI-enabled logistics), Salesforce (5,000, AI agents replacing support roles), and IBM (2,000-3,000 back-office positions). By late 2025, over 55,000 US job cuts were attributed to AI according to Challenger, Gray & Christmas — a baseline that has continued to grow through the first half of 2026 as enterprises move from pilots to production agent deployments.

Entry-level impact: The effects concentrate at entry level. US entry-level job postings have fallen 35% since January 2023. Software engineer postings are down 49% from pre-pandemic levels. Big Tech reduced new graduate hiring 25% in 2024. Machine learning engineer roles are the notable exception—up 40% year-over-year.

The regret factor: Not all AI-driven layoffs succeed. Surveys find 55% of employers regret AI-related workforce cuts. Klarna replaced 700 customer service workers with AI, but quality declined and customer satisfaction dropped, forcing human rehiring. The lesson: AI augmentation often outperforms AI replacement.

Policy Responses

Workforce retraining: The standard policy response faces challenges. Programs are expensive, completion rates are low, and workers often end up in lower-paying jobs anyway. If AI capabilities keep advancing, what should workers retrain for? The target keeps moving.

Education adaptation: Longer-term, education systems need to prepare workers for AI-augmented work. This might mean emphasis on skills AI complements (creativity, interpersonal skills, judgment) rather than skills AI replicates (routine analysis, information synthesis). But education systems change slowly.

Distribution questions: If AI dramatically increases productivity, the economic pie grows. The policy question is distribution. Will gains flow to AI companies and investors, to businesses deploying AI, or will workers and consumers share benefits? The historical pattern—automation initially concentrating gains before broader distribution—may repeat, but the timeline and mechanisms are uncertain.

Key Takeaways (Chapter 12):

Think tasks, not jobs: AI automates specific tasks. Impact depends on task composition and business adaptation. Historical automation often transformed jobs rather than eliminating them.

Coding is the leading indicator: The pattern has moved in two years from IDE assistance (Copilot) to headless coding agents (Codex, Claude Code) to parallel-agent workflows in which nearly 100% of new code is initially AI-generated. Human work is shifting from author to reviewer/integrator. The METR task horizon — doubling roughly every 7 months — suggests this trajectory is not yet at equilibrium.

Layoffs are real but concentrated: Over 55,000 jobs attributed to AI in 2025, disproportionately affecting entry-level positions. But many companies regret pure replacement strategies.

Policy is catching up: Traditional responses (retraining, education) face new challenges when the target keeps moving. Distribution of AI's productivity gains remains the central policy question.

Chapter 13: AI and Scientific Discovery

The AlphaFold Revolution and Nobel Recognition

DeepMind's AlphaFold represents perhaps the clearest example of AI transforming a scientific field—and in 2024, it earned the ultimate recognition.

The protein folding problem: For decades, predicting a protein's 3D structure from its amino acid sequence was a grand challenge in biology. Structure determines function, and experimental methods (X-ray crystallography, cryo-EM) are slow and expensive. Computational approaches had made incremental progress but couldn't match experimental accuracy.

AlphaFold's breakthrough: AlphaFold 2, unveiled in 2020 and refined in subsequent versions, essentially solved the problem. It predicts protein structures with near-experimental accuracy in minutes rather than months. DeepMind has now predicted structures for more than 200 million proteins, covering nearly every protein known to science, and made them freely available.

The 2024 Nobel Prize: Demis Hassabis and John Jumper of Google DeepMind shared the 2024 Nobel Prize in Chemistry with David Baker of the University of Washington, who was honored for computational protein design. It was among the first Nobels awarded for an AI-enabled scientific breakthrough, and notable for honoring research from a tech company rather than academia. Over 2 million researchers in 190 countries now use AlphaFold. The original paper has over 10,000 citations; AlphaFold broadly has accumulated over 43,000. Learning AlphaFold is now standard graduate biology training worldwide.

Cell Foundation Models: The Arc Institute

While AlphaFold conquered proteins, a new frontier has opened: foundation models for entire cells and genomes.

The Arc Institute: Founded in 2021 by Stanford biochemistry professor Silvana Konermann, UC Berkeley's Patrick Hsu, and Stripe CEO Patrick Collison, the Arc Institute aims to integrate biology and AI for biomedical research. In November 2024, Arc made the cover of Science with Evo, a 7-billion-parameter genomic foundation model trained on 2.7 million prokaryotic and phage genomes.

Evo's capabilities: Unlike protein-focused models, Evo generalizes across DNA, RNA, and proteins simultaneously. It can generate DNA sequences of over 1 million bases—larger than the genomes of many simple organisms. Most remarkably, Evo designed a functional CRISPR system unknown in nature, demonstrating creative capacity beyond its training data.

Evo 2: In February 2025, Arc released Evo 2, trained on 9.3 trillion nucleotides from 128,000 whole genomes spanning all domains of life—bacteria, archaea, phages, humans, plants, and other eukaryotes. The model can process sequences up to 1 million nucleotides and identify disease-causing mutations in human genes. It's the largest fully open-source AI model in biology, following Arc's philosophy that tools for understanding life should be freely available.

AI for Mathematics

AI is now making contributions to pure mathematics—previously thought beyond machine capability.

Erdős problems: Paul Erdős, the legendary mathematician, left behind roughly 1,100 open problems, many with cash prizes. In November 2025, Harmonic's AI system "Aristotle" solved Erdős Problem #124, a conjecture open since 1995, in 6 hours with no human participation. The system used reinforcement learning, Monte Carlo tree search, and formal verification in Lean. Mathematician Terence Tao noted that it solved a "weaker version" because of a missing hypothesis in the original statement, but the achievement was genuine. Google DeepMind has since formalized over 240 Erdős problems in Lean, creating infrastructure for further AI mathematics work.
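For readers unfamiliar with formal verification, the following is a minimal illustration of what a machine-checked statement looks like in Lean 4. It is a deliberately trivial fact about addition, unrelated to the problem described above, and the theorem name is made up for this example.

```lean
-- A statement (addition of natural numbers is commutative) followed by a
-- proof that Lean checks mechanically. `Nat.add_comm` is a lemma from
-- Lean's core library; `sum_commutes` is a hypothetical name.
theorem sum_commutes (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b
```

If the proof were wrong or the statement false, Lean would refuse to accept it. This is why formal verification matters for AI mathematics: a human does not need to trust the system's reasoning, only the proof checker.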

Caveats: These successes involve structured problems amenable to known techniques. AI isn't yet generating the creative insights that define mathematical research. But the trajectory is clear: AI as mathematical collaborator is becoming reality.

Automated Science: The Kosmos Vision

The most ambitious AI-for-science efforts aim not just to assist scientists but to conduct science autonomously.

Kosmos: Edison Scientific, a FutureHouse spinout founded by physicist Sam Rodriques, has developed Kosmos, an AI system for autonomous data-driven discovery. Given an objective and dataset, Kosmos runs for up to 12 hours, deploying multiple agents that execute an average of 42,000 lines of code and read 1,500 research papers per run. According to user surveys, a single Kosmos run is equivalent to approximately 6 months of PhD or postdoc work. The company has raised $70 million and attracted 30,000 users from academia and biotech.

Self-driving laboratories: Kosmos represents one approach to "AI scientists." Others integrate AI with physical laboratory automation—self-driving labs that can design experiments, execute them robotically, analyze results, and iterate. Applications include reaction optimization, drug discovery, and materials science. The vision: AI that doesn't just suggest experiments but runs them.

The "Compressed Century" Thesis

Anthropic CEO Dario Amodei's October 2024 essay "Machines of Loving Grace" offers the most detailed vision of AI-accelerated science from an AI lab leader.

Core prediction: Amodei argues that powerful AI (which he believes could arrive as early as 2026) could accelerate biology and medicine by a factor of 10 or more, compressing 50–100 years of progress into 5–10 years. Within this timeframe, he predicts: reliable prevention and treatment of nearly all infectious diseases, elimination of most cancers, effective cures for genetic diseases, prevention of Alzheimer's, and potentially doubling human lifespan to 150 years.

Why biology? Amodei argues biology is uniquely suited to AI acceleration because it's data-rich, experimentally well-equipped, and at a knowledge frontier where AI can help. He draws on his background as a biophysicist (PhD from Princeton, postdoc at Stanford) to make these predictions with unusual specificity.

The policy stakes: If Amodei's timeline is even partially correct, the 2020s and 2030s may see advances in biomedicine comparable to the entire 20th century. This would transform healthcare economics, longevity, and geopolitical competition in biotech.

Status as of mid-2026: The earliest window for the "powerful AI by 2026" prediction has now arrived. Frontier models now routinely operate as autonomous research agents (e.g., Kosmos, above), and Arc Institute's Evo line, AlphaFold's continued adoption, and the first AI-solved Erdős problems suggest meaningful acceleration in narrow scientific domains. What has not yet arrived is the broad-spectrum biological-discovery breakthrough Amodei described: cancer, infectious disease, and Alzheimer's outcomes look much like they did in 2024. The 5–10-year window remains open; the question is whether the next several years compound the early signals into something that resembles the compressed-century thesis, or whether the trajectory levels off before the most consequential claims materialize.

Implications for Policy and Society

R&D productivity: If AI accelerates discovery, it could reverse decades of declining research productivity—getting more breakthroughs per dollar invested. Some economists see this as the most economically significant application of AI.

Competitive advantage: Nations leading in AI for science may gain advantages in pharmaceuticals, materials, energy, and other strategic sectors. This reinforces the national security framing of AI competition.

Access and equity: Will AI-accelerated science benefit everyone or primarily wealthy nations and well-funded labs? Open access to tools like AlphaFold and Evo helps; proprietary systems benefit their owners. Arc Institute's open-source philosophy reflects one answer to this question.

Dual-use concerns: Accelerated biotechnology research could yield cures—or could create biosecurity risks. AI that accelerates drug discovery also accelerates bioweapon design. The dual-use nature of scientific knowledge is amplified when discovery itself accelerates.

Key Takeaways (Chapter 13):

AlphaFold's Nobel Prize recognizes AI's demonstrated capacity to solve major scientific problems. Over 2 million researchers now use it.

Cell foundation models like Arc Institute's Evo represent the next frontier—AI that understands entire genomes across all domains of life.

AI is contributing to mathematics, with the first genuine AI solutions to open Erdős problems in 2025, though research-level insight remains a human domain.

Automated science via systems like Kosmos and self-driving labs points toward AI that doesn't just assist research but conducts it.

The "compressed century" thesis suggests AI could accelerate biology by 10x, with transformative implications for medicine, longevity, and global competition in biotech.