14 Comments
Oliver Sourbut

Well said. I've been arguing almost exactly this internally! Delaying proliferation is one thing; forecasting (including using frontier models for empirics) and mobilising resilience is another. I've also emphasised that 'automation isn't automatic', so using frontier models to defend against trailing ones isn't a free lunch: prep pays dividends (related to your footnote 2).

Helen Toner

Oh I love "automation isn't automatic" - great way of putting one of the challenges here.

Oliver Sourbut

As well as being an action-relevant consideration for defence, it's also a relevant slogan for risk forecasting. What does 'decisive strategic advantage' actually look like (if anything) in practice? I think where it's R&D-bound (always?), the putative DSAs might take time and resources to develop, which might have detectable signatures. Unclear whether this is good or bad (cautiously, good).

ARX-Han

This is a great policy post - one of the best I've read in the space.

I hope you get more traction on this, since state capacity seems to be exclusively pointed at the race dynamic itself.

Max Langenkamp

I like this a lot!

A complementary idea I've been writing about (and gave a talk on at the Internet Archive) is focusing on the digital-to-physical interface. To me, the focus on frontier AI capabilities lends itself to framings like "AI recursively self-improves and gains unimaginable power"; both feel like they muddy more than they clarify. Since my focus is biosecurity, I want to focus on how they manipulate molecules that then hurt us. At least for biosecurity, I'd say focusing on the digital-to-physical interface is a helpful heuristic for taking advantage of the adaptation buffer. (I have to say the phrase 'adaptation buffer' is a little hard for me to remember.)

"We are not currently set up to reliably prevent such attacks, and even a single attack could be catastrophic (i.e. cause thousands or millions of deaths, perhaps even more)" I'd add the caveat that, at least with biological attacks, I expect to see a couple of attempts failing before a catastrophic attack (which to be clear can still be singular)

John Wittle

I worry that the offense/defense asymmetry might be really, really imbalanced, and might not be anywhere near calibrated to the kinds of domains we're used to.

I keep thinking of Deviant Ollam's various talks on physical, real-world security, the ones he gives at Black Hat and DEF CON and HOPE... He shows how easy it is to drive a truck full of fluid barrels up to a water treatment plant at night, break in non-destructively and without setting off any tampering alarms, pull the barrels right up to the post-treatment reservoir, demonstrate the ability to dump them in undetected, then drive away.

Or how easy it is to break into basically any building in the world if it tries to leverage its elevator as part of its security model (which is far, far too many of them), needing to swipe a badge to access certain floors or whatever. Buy the key on Amazon for $3 and you not only gain root access to the elevator, but also to the elevator's security logs.

So much of our world has never really been glanced at by someone with a security mindset, and so much of it implies that offense is way, way easier than defense

What does your model, where defense sort of stays a few steps ahead of offense, look like if the imbalance is just too large to overcome? That's my fear, and that's where I think a lot of the totalitarian solutions come in. Because if you actually can make nukes out of dirt and sand in your back yard, then we really do have to monitor every atom of dirt and sand in the entire world...

Idk. Mostly I'm just despondent about the whole prospect.

Helen Toner

Wait, maybe I'm misunderstanding - but shouldn't the idea that huge vulnerabilities *already* exist make us more optimistic? It suggests that the bottleneck is actually the number of people who want to cause massive harm, rather than the difficulty of causing it. I think this is a hard-to-model aspect of all this - figuring out how many attempts, of what kinds, we should expect.

John Wittle

You're right, and I don't know what it means.

Right now the status quo seems to be that you can do some really, really horrible stuff, and have an enormous negative impact on many millions of lives, especially if you are willing to get caught... but nobody really ever does, for some reason.

I can't see a really good reason why the advent of AGI would change this, exactly... but it still feels like a big wrench getting thrown into the gears... it means "is offense hugely impactful" might be less of a factor, and "will anybody attempt offense" more of a factor, maybe? But I'm not sure how to reason about either of those questions, tbh.

It also might be the case that, as per your model, deploying AI for defensive purposes will *solve* this "nothing is secure" problem in software, in a way that would be much harder in physical space.

I don't know, I just felt a bit uneasy about your chart, seeing the dollar values plotted against capacities and thinking "but what if defense and offense are totally uncoupled, the lines might not intersect at all, offense might just grow faster than defense."
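
A minimal sketch of that worry, assuming purely exponential growth and made-up numbers (nothing here comes from the post's chart): whether the two lines ever cross depends only on the relative growth rates, and once the faster-growing side is ahead, the gap only widens from there.

```python
# Illustrative only: hypothetical exponential growth curves for offense and
# defense capability. The rates and starting points are invented for the sketch.

def capability(initial: float, growth_rate: float, years: int) -> list[float]:
    """Capability multiplies by (1 + growth_rate) each year."""
    return [initial * (1 + growth_rate) ** t for t in range(years + 1)]

years = 10
offense = capability(initial=1.0, growth_rate=0.9, years=years)  # hypothetical: starts behind, grows fast
defense = capability(initial=5.0, growth_rate=0.4, years=years)  # hypothetical: head start, grows slower

# First year (if any) in which offense exceeds defense.
crossover = next((t for t in range(years + 1) if offense[t] > defense[t]), None)
print("Offense overtakes defense at year:", crossover)  # with these numbers: year 6
```

With these assumed numbers, defense's head start buys a few years but the faster growth rate wins; after the crossover the ratio only grows. Whether the real curves look anything like this is exactly the hard empirical question.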

Jeanne Dietsch

AI "white blood cells" makes sense to me. These should probably be sent from the "frontier" in front of bad actions as well as the hacker communities watching in real time.

Benjamin

What about not developing exponentially more powerful models and compute?

Nonproliferation will pose challenges around democracy and concentration of power. I am unsure to what degree these risks would still exist just through the frontier models themselves (even in a proliferation world) and through capitalism, especially if there is some kind of recursive self-improvement.

No matter what, we need to find solutions to these problems, and a split in the AI safety community (https://knightcolumbia.org/content/ai-as-normal-technology?utm_source=substack&utm_medium=email) instead of working together seems somewhat likely and very harmful.

Focus on the commonalities: doomers, also support solutions that create a fair world, stop gradual disempowerment, and prevent concentration of power; non-doomers, slow things down so the doomers can realistically implement their solutions.

Why not slow things down, if coordination is in any way possible? Technological progress is exponential; that doesn't mean our adaptation to it has to be.

At some point of technological development, totalitarian surveillance might be necessary, but then let's make sure it's implemented in a non-catastrophic way. That seems more likely if we aren't in a situation where we quickly need to implement it because of new threats arising from new frontier models, better compute, and possibly unlimited proliferation. However, maybe both sides can agree on limiting new frontier models and new compute.

---

One thing I am worried about is that switching from proliferation to nonproliferation might be impossible because of attractor states and lock-in.

Roderick Read

I doubt a self-improving AI would want to steer us away from better resource distribution as a world-model optimisation outcome.

Will it encourage religious dogma, or see the advantages of wider community? Surely it goes for the larger picture.

Would it be happy waiting 1000 years for the dust to settle and carry on without us? Nobody wants that for their pets.

I think we'll still be amusing enough to keep.

JC

This seems silly when sufficiently strong AI will, inevitably, by definition, kill us all. That should be the focus here.

Joe Gruenbaum 🇺🇦

Helen, it is with great regard for you and what you stand for and work for that I write this: I think the buffers are truly unjust. I think incentives shape AI optimization functions: growth incentives and profit and revenue per interaction, optimized by distributed cumulative gain. However, your talk of terrorism neglects the everyday, normal uses of AI, which present an even greater strategic loss of personal data! For example, DeepSeek is 100 percent jailbroken, as in, studies have shown it leaks everything.

OpenAI and the IC are colluding at this point, and Trump is president. Is the threat currently some imaginary terrorist running Llama swarms?

Or is it a tech-industry complex driven by the profit motive, one that refuses to return us to the core of what American intellectual property protection used to be: a Warren-and-Brandeis guarding of the intellectual task and of intellectual privacy?

What of OpenAI's attitude toward Studio Ghibli? What of the Anthropic contracts for defense, sickening though they may be?

Is the real harm primarily access to powerful foundational models for "terrorists", a word defined only in S+ circles, though Kant and other democratic theorists would posit that defining "enemy" or "security" behind a veil somewhat erases the purpose of a democratic peace?

No, I think we know the answer. The real threat is not those whom we label "terrorists"; it is what AI does with a definition of terrorism provided by Little Marco.

TheAISlop

Good article. Thank you. Nonproliferation frameworks depend on trust, yet history shows trust degrades quickly when existential stakes are involved. Relying on frontier labs to forgo leveraging their coding advantage is a weak foundation for governance; the real competition is in code, not policy, and good intentions are not sufficient protection.
