Iran and the opportunities and limits of an “IAEA for AI”
The situation in Iran shows why an IAEA for AI would not be sufficient to prevent AI-related international crises.
Various people have suggested (or critiqued) an “IAEA for AI” – an international institution or set of international institutions, modeled after the International Atomic Energy Agency, that would help address safety and security risks related to AI.
Different people have different versions of this in mind, but I’d say the two key ingredients are:
Access: The IAEA for AI would have access to (some subset of) the world’s compute supply chains and thereby know where the biggest datacenters are, much like the IAEA keeps tabs on enriched uranium. It would be able to inspect those datacenters to make sure GPUs aren’t going missing and that model weights aren’t being snuck out on thumb drives, and it would be able to audit the AI systems trained or deployed at the largest datacenters against some set of safety and security standards.
Credibility: The IAEA for AI would have to be trusted by all (or almost all) parties as not only having expertise, but also being solely focused on ascertaining the facts. This would be accomplished through rigorous processes, professional norms, and a highly international staff. It would have to know what it’s talking about and also be seen as a neutral arbiter that is not “putting its thumb on the scale” of a particular conclusion.
Despite the actual IAEA arguably having these properties to a large extent, and no one really doubting its top-line conclusions about Iran, bombs are still falling.
I’m personally a fan of the “IAEA for AI” concept, and I’d say the nuclear/compute analogy is stronger than many people think (actually, I think computing supply chains are much easier to monitor than nuclear ones: there is only one TSMC and one ASML, but there are nine, and maybe soon ten, nuclear powers).
At the same time, the current situation in Iran shows that even if an “IAEA for AI” is necessary for some purposes, it won’t be sufficient for resolving the tricky geopolitical issues raised by AI.
Aligned vs. misaligned interests
One reason that an IAEA for AI might eventually be necessary is that the world actually agrees on a lot of things related to AI, and could benefit from having a common set of baseline rules. People generally don’t want AI to go rogue, they don’t want it to be easy to use AI to create biological weapons, and so on.
Actually implementing seemingly commonsense safety and security standards like this one may be easier said than done, just as nuclear safety and non-proliferation are difficult in practice, but there is reason for hope based on the fact that some risks are ~universally objectionable. Screenshot from here.
So it may be possible to negotiate some set of bare minimum safety and security standards, and then audit everyone (at least everyone who has very large amounts of computing power) against those standards. This would give everyone the confidence that at least everyone’s doing the basics, and that no one has to completely throw caution to the wind in the “AI race.”
Breathing room in an otherwise very competitive situation is valuable. The IAEA offers a precedent here: it helps to ensure safe nuclear energy production (the world has avoided a Chernobyl-tier safety incident since Chernobyl) while also helping to avoid worst-case security risks like the diversion of nuclear weapons to terrorists.
But as Iran reminds us, just because most people have a lot of common ground most of the time doesn’t mean there won’t sometimes be conflict, and we should expect that to be true of AI, too. Iran seems to think (or at least to be entertaining the idea) that it will be more secure if it builds nuclear weapons, even though this makes a lot of other people (especially Israel) feel less secure. An IAEA for AI, on its own, would not tell us what to do when someone doesn’t want to play by the rules, or when the rules don’t address the situation at hand (e.g., because the rules have a “lowest common denominator” characteristic, only covering the very most egregious risks).
The last mile problem
As I alluded to above, people generally seem to agree on the facts of the situation in Iran – but not quite. There’s a “last mile” problem in the sense that there are some details of the situation, such as whether Iran has actually decided to weaponize its nuclear capabilities, that the US and Israel apparently disagree about. I’m not a nuclear expert, but my rough sense is that some things are much easier to verify through inspections than others, and the “weaponization decision” part is trickier than the “large-scale uranium enrichment” part. I think it’s pretty likely that there will be analogies to this situation in AI.
For example, it may be very feasible to keep track of the top 10 largest datacenters and the top 10 most capable AI models in the world, since each one will involve multi-billion-dollar investments. But it will be much harder to keep track of the top 100 datacenters or top 100 models, let alone the top 1,000, etc. And it may also be very difficult to keep track of whether a company has disabled some critically important safety feature that it found was slowing down its effort to create an automated AI R&D loop.
There’s a concept in nuclear non-proliferation of “breakout time”: how long, once a decision has been made to build a nuclear weapon, it takes to actually have one, and how this varies with the starting point, the technical capabilities of the country, etc. The Iran nuclear agreement (the JCPOA) was built around the idea of freezing Iran’s nuclear weapon potential in place and carefully verifying that Iran wasn’t moving any closer to a weapon – achieving a breakout time of around a year. Longer is generally better, since there is then more time to negotiate a resolution to an emerging crisis.
(For the avoidance of doubt, I think it’s plausible we could have avoided the current crisis if the JCPOA were still in place. But even if that’s true, the larger point stands: “the IAEA existing” is insufficient on its own.)
Screenshot taken from here.
With AI, breakout times could conceivably be very short – it could take hours, minutes, or even seconds for some very specific but important change to occur. I don’t think this is self-evident, but it’s at least plausible. Perhaps the most dangerous activities will take months or more. We just don’t know yet.
Fortunately, the more capable and dangerous AI gets, the more capacity we’ll have to enlist highly intelligent and, critically, credible “AI auditors” to help us out. This raises the question: how feasible is it to automate an IAEA’s ability to keep track of that last mile, helping us “fight fire with fire”? And for what kinds of risks will this be feasible? I don’t think anyone knows, but I think people interested in AI governance should think more about this kind of issue.