Reasons that AI Can Help with AI Governance
How much it can help exactly is unclear, but AI is clearly a part of the solution to its own problems.
Introduction
It’s easy to overstate the case for AI being the solution to AI-related problems. For example, one simple case for open source AI absolutism is something like “good guys with AI will always beat bad guys with AI.” This sounds nice but embeds a bunch of assumptions about offense-defense balance scaling, the resources available to different actors, etc. And it’s quite easy to think of ways in which that perspective could be wrong.
For example, it seems like non-AI-related defenses against pandemics (e.g., physical stuff like ventilation and far-UVC) will be essential in order to prepare for a world in which AI “democratizes” the ability to create biological weapons. It doesn’t matter how much AI you throw at better vaccines if you can’t distribute those vaccines fast enough to stop a super contagious outbreak. It’s a simple reality that problems and solutions don’t always occur at the same “level” or in the same “domain” (e.g., in this case, the domains of bits vs. atoms).
But it’s also easy to understate the case for AI being helpful in cleaning up its own messes, i.e., by not considering this at all. AI is a general purpose technology – so of course it can be used to help with a lot of stuff, including governing AI. Insofar as there are significant ways that AI can be usefully deployed in service of governing AI, the overall AI governance challenge will be easier than it otherwise would be, and I think there are many such ways. I’m not sure exactly how much it can help, but I give three reasons below why we can be confident the answer is “more than zero.”
Reason 1: It’s Happening Already in Safety and Security
There are tons of ways that AI can help with AI-related safety and security. The examples below provide a proof of concept of the “AI can help somewhat with AI governance” thesis (since safety and security are parts of AI governance), and they also give some suggestive evidence that there could be more such cases outside of the safety/security context since there’s nothing obviously special about these domains.
(Note that I’m mostly using examples from OpenAI since it’s the company I’m most familiar with, but there are related efforts at various organizations.)
AI of course introduces safety issues, but also can itself be used in various positive ways for safety purposes. AI can help humans oversee other AIs, by spotting flaws in the “target” AI’s reasoning. It can help monitor for misuse and other safety issues associated with AI as it’s being used. It can generate feedback during the reinforcement learning process. And more generally, it can be used to automate research on safety.
Some of these are more speculative but others — reinforcement learning, abuse monitoring — are happening at large scale at many companies everyday. It’s unclear how much this should make us optimistic about AI safety being likely to get solved, and we shouldn’t get complacent, but clearly there are some use cases.
Similarly, in the domain of security, again AI can create issues but also can help improve security in various ways. There is already at least one documented case of language models helping find vulnerabilities in real code. There are tons of other ideas being explored in this space, and again some are more speculative and some are daily practices.
The security-related applications of AI aren’t just in computer security but also in other areas that might be threatened by AI like epistemic security: AI can be used maliciously to create risks to public knowledge and democratic decision-making, e.g., via more heavily automated disinformation campaigns, but it can also help mitigate those same risks. For example, in a recent report on several interrupted disinformation campaigns, authors at OpenAI wrote:
Throughout this investigation, our security teams leveraged ChatGPT to analyze, categorize, translate, and summarize interactions from adversary accounts. This enabled us to rapidly derive insights from large datasets while minimizing the resources required for this work.
Reason 2: The training/inference gap
It’s more expensive to train a single AI system than to run that same system. This asymmetry has various implications, some of which are scary from a governance perspective (e.g., an unsafe AI system could be scaled up a lot after being trained, using the same compute).
But there are also ways in which this asymmetry could be helpful governance-wise. The training/inference gap means that “old AIs can gang up on new AIs”: by the time you create the first AI of a new generation, there will be gazillions of copies of older AIs, and they can be “on deck” to spot the smallest sign of danger in the new AI, and to nip shenanigans in the bud.
Compute can be leveraged asymmetrically not just by creating a lot of copies of a system for defensive and beneficial purposes, but also scaling up performance on a very specific task via scaling up test-time compute. You might, for example, take an existing AI system and sample from it thousands or millions of times, or with a very long chain of thought, or both, in order to find signs of problems in a new AI system.
Note that this point applies not just to catastrophic safety issues like “making sure the AI doesn’t kill everyone” but also a wide range of governance tasks.
Suppose that two companies or countries are trying to work out an agreement but the negotiations have stalled. In theory, they might love the idea of bringing in a “mediator AI,” that would – given confidential information from both sides that provably would not be leaked – suggest the best possible compromise according to mutually agreed-upon criteria. In order to be useful, both sides would need to be extremely sure that the system works as advertised, and both sides could throw a ton of compute (using their own AIs) at scrutinizing the proposed mediator AI before using it.
As with the open source absolutism argument we considered above, there are lots of assumptions being baked in here. I’m not saying it’s an inevitability that this asymmetry saves us. Critically, in order for this argument to work, you need to make sure that a sizable fraction of the world’s compute is actually being used for this purpose, and not just helping with homework and ads and generating dumb blog post graphics, etc. How to ensure that this potential asymmetry is actually leveraged in practice is something I’d like to see more work on.
Another way in which the training/inference asymmetry can work in our favor here is that “good AIs” can leverage a lot of training and test-time compute in order to specialize in specific areas. These specializations can confer an advantage over more generic “bad AIs.” Of course, some bad AIs may also be specialized (e.g., for cyber-offense), but some of the concerns around AI safety specifically relate to progress in highly general systems of the kind that are the primary focus of major AI companies today, so it’s worth thinking about what all can be done about that.
For example, AIs specialized specifically for cyber-defense, or detecting signs of deception in another AI, might have an advantage over an overall more sophisticated AI that is less expert in those areas. This again could give “the past and present” of humans and AI an edge over the potentially dangerous “future.”
Reason 3: Improving human decision-making
I mentioned mediator AIs as an example of a good use of compute, but there’s a more general point here, which is that AI can help humans make better decisions. Governance is, after all, at least today, about humans deciding stuff, often with the aid of technology. So how can AI specifically help humans make better decisions?
There’s a growing literature on this that I won’t be able to do justice to here, but I’ll briefly share some of my personal hot takes on exciting directions:
Mediator AIs (see related discussion above). These could reduce transaction costs (for finding a cooperative solution) and provide a workaround to the issue of asymmetrical information, both of which can be major drivers of conflicts between people and groups. This can be helpful for improving decision-making generally but, per the subject of this blog post, it can be helpful in negotiating solutions to AI-related issues. For example, legislators from different parties in one country could use such an AI to help them find common ground in addressing the labor market impacts of AI, or multiple countries could do so when negotiating an agreement on AI safety.
Advisor AIs. While everything is getting disrupted (in good and bad ways) by AI, human decision-makers will have an increasingly intelligent source of advice during this process – AI. The intelligence on these advisor AIs can be cranked up for especially high stakes decisions with more test-time compute, as discussed above. I’m particularly excited about the use of AI to help humans avoid groupthink in collective decision-making settings.
Simulator AIs. Increasingly realistic simulations of humans (and AIs) in various situations can be carried out, again leveraging AI, in order to think through novel situations. If appropriately designed and used, these simulator AIs could be particularly useful in uncovering simulations that challenge decision-makers’ assumptions about how things might play out, and which help surface creative solutions that wouldn’t have been considered otherwise.
None of these use cases is being treated as an urgent area to invest and innovate today, so they are quite speculative right now. But we (you?) could change that. The underlying “base” AI capabilities will be there eventually, but will the ideal model behavior be well-studied? Will the right UX have been discovered? Will there be promising pilot experiments that motivate more investment, and consideration of high-profile uses?
Conclusion
I’ve given three reasons that AI could help us govern AI, besides the fact that it’s a general purpose technology. That doesn’t mean that it actually will be used to the maximum possible extent (or anywhere close to that). I’d also like to see more systematic philanthropic and government support for these kinds of applications; more work on these applications by people in academia, civil society, and industry who have the freedom to do so; and more analysis of policy levers for increasing fraction of compute applied towards defensive and beneficial uses of AI rather than malicious and harmful uses.