Scoring Humanity’s Progress on AI Governance
Note: this was originally published on Medium in May 2023.
Succeeding at AI governance (not merely avoiding catastrophe but also building a much better and fairer world) seems doable, and is not especially mysterious; it just takes a lot of work.
If we succeed (which is of course not certain), it seems like it will require some or all of the following ingredients (and maybe some others):
Shared understanding of the challenge
Technical tooling
Regulatory infrastructure
Legitimacy
Societal resilience
Differential technological development
I don’t have an airtight argument for why these are jointly sufficient (or close enough to jointly sufficient that pushing on them is clearly helpful, with whatever is missing tacked on later). But I think it’s a good starting point, and we should invest a lot in all of these.
If the above are all achieved, then we’ll (approximately) have a rough understanding of the issues at stake, have the regulatory and technical tools to do something about them, and be resilient to big societal changes. And we will (as suggested by “differential technological development”) be both maximizing the percentage of AI applications that are beneficial and minimizing the prevalence of harmful AI. That’s a pretty good situation. Issues will surely come up, but we should be in a pretty good position to handle them.
Speed of investment in each of these domains is important given that we don’t know how much time we have (more on this below).
In considering what scores to assign, I roughly imagined that we achieve AGI/superintelligence sometime this decade (see Discussion for more), but I averaged together a range of possibilities within that window, and “baked in” the current trajectory of progress in each area. So e.g. an A- doesn’t mean that we’re ready today, but that if things continue roughly as they currently seem (to me) to be trending, I expect we’ll end up judging our progress as A- level-ish in retrospect. And there is some “grading on a curve” going on, such that I’d probably never actually give us an A or A+ in any category, since I’d want to encourage people to look for things we might be missing.
Shared understanding
What we need to do: Create a shared understanding of the upsides and downsides of AI to align perceived interests and incentives.
Why it’s necessary: As argued in Askell, Brundage, and Hadfield (2019), shared beliefs in high collective upsides and high collective downsides from AI make cooperation more likely.
Why it’s possible: AI has the potential to be highly positive-sum by helping solve shared societal problems; it is non-excludable/non-rival once created (i.e. it can be shared by multiple parties); and its risks could be global in nature.
Grade: B- — The general public agrees on the risks being significant, and supports regulation. Policymakers are now talking about AI a lot, and there are some moves in the direction of taking race dynamics seriously. I am cautiously optimistic that the increasing severity/scale of the societal impacts of AI will make the shared stakes even more salient over time.
Note: My overall gut level of optimism is closer to this grade than the ones below, because I think this category is a leading indicator of the others, but we’ll see. I mostly do not take that into account in the scores below, so as to preserve some independence across categories, which is why those scores are often much worse than this one.
Technical tooling
What we need to do: Make rapid technical progress in safety and policy research (including alignment, interpretability, dangerous capability and proliferation evaluations, and proof-of-learning) so that we have the tools we need at the right time in order to ensure risks are appropriately managed.
Why it’s necessary: We need the technical capacity to manage risks and govern AI, in addition to the desire and incentives to do so.
Why it’s possible: I’m not sure how to prove this but I don’t know of any reason to think it’s physically impossible to govern AI, just arguments for why it may be challenging.
Grade: C- — A fair number of people are working on this, but many more should be, and a lot of the investment in it seems inefficient (e.g. people starting random “AI safety startups” with unclear theories of change and having to deal with a bunch of management/fundraising overhead).
Regulatory infrastructure
What we need to do: Further incentivize sufficient safety by regulating high-stakes development and deployment via mechanisms like reporting and licensing requirements (to ensure regulatory visibility into frontier AI research and deployment), third-party auditing of risk assessments, incident reporting, liability insurance, compute governance, and confidence-building measures. Regulation should cover AI inputs (e.g. compute), processes (e.g. safe development), and outputs (e.g. first-party use, API-based deployment, and individual use cases), and should take an “ecosystem approach,” i.e. take into account that AI risks cannot be fully managed at the point of inference (e.g. discovering and addressing a particular deceptive use of AI may depend heavily on context, and will require more wide-ranging interventions touching on social media policy enforcement, advertising regulation, etc.). A toy sketch of what a compute-based reporting trigger could look like appears at the end of this section.
Why it’s necessary: By default, real or perceived local incentives may not be totally aligned with the common good, and people may not have the means of coordinating or the information needed to coordinate.
Why it’s possible: We have solved(ish) other big regulatory issues before, and I am not aware of a slam dunk argument against this one being solvable.
Grade: C+ — As of the past few months, policymakers have started paying serious attention to this issue. It remains to be seen whether the AI Act + the recent burst of White House attention + the general raised profile of AI will lead to sufficiently fast progress here.
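As a purely illustrative aside (not part of the original post), here is a toy sketch of how one of the mechanisms above, a compute-based reporting trigger, might be expressed as a concrete rule. Every specific in it is hypothetical: the threshold, the field names, and the framing as code rather than statute. The point is just to make “regulating inputs, processes, and outputs” a bit more tangible.

```python
# Toy sketch of a compute-based reporting/licensing trigger.
# All numbers and names here are hypothetical, chosen only for illustration.
from dataclasses import dataclass

# Hypothetical input-side threshold: training runs above this many FLOP
# would have to be reported to a regulator before deployment.
REPORTING_THRESHOLD_FLOP = 1e26

@dataclass
class TrainingRun:
    developer: str
    total_training_flop: float
    safety_evals_completed: bool  # stand-in for "process" requirements

def requires_report(run: TrainingRun) -> bool:
    """Input-based trigger: sufficiently large training runs must be reported."""
    return run.total_training_flop >= REPORTING_THRESHOLD_FLOP

def may_deploy(run: TrainingRun) -> bool:
    """Process/output-side condition: reportable runs must clear safety evals first."""
    return (not requires_report(run)) or run.safety_evals_completed

if __name__ == "__main__":
    run = TrainingRun("example-lab", total_training_flop=3e26, safety_evals_completed=False)
    print(requires_report(run))  # True: above the hypothetical threshold
    print(may_deploy(run))       # False: safety evals not yet completed
```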
Legitimacy
What we need to do: Experiment with novel means of public input into private decisions, increase the extent to which democratic governments are informed about and involved in AI development and deployment, and increase the transparency of private decision-making.
Why it’s necessary: Absent some means of legitimate decision-making, AI may enable consolidation of power or be managed in a way that is susceptible to groupthink, self-favoritism, and other failure modes.
Why it’s possible: We have developed lots of ways of eliciting and aggregating human values (elections, surveys, etc.) and while there are some impossibility results for specific kinds of outcomes, under more reasonable assumptions it seems possible to have legitimate-ish decision-making.
Grade: D+ — There’s been relatively little effort on this to date. OpenAI is investing a lot in it at the moment, and I am excited for that to continue, but I am setting a high bar for what counts as success here (roughly something like equal influence on AI development/deployment across people — see e.g. Danielle Allen’s latest book for some related concepts). Another reason it’s hard to get a high score is that authoritarian governments being in power (and governing AI development/deployment in their jurisdictions) might put some upper limit on the score I’d assign here.
Societal resilience
What we need to do: To prevent pollution of the epistemic commons and increase humanity’s ability to navigate this turbulent period, invest heavily in informed public deliberation via proof of personhood, media provenance, AI literacy, and assistive AI technologies. To prepare for rapid AI-enabled economic disruption, shore up social safety nets globally and begin global distribution of AI-enabled productivity gains (leveraging proof-of-personhood techniques).
Why it’s necessary: A small group of humans plus many copies of AIs could capture society by creating the false impression of consensus; reality could splinter into many echo chambers; decision-making in crises could become impossible; there could be civil wars or the toppling of legitimate governments due to economic churn and political misuse of generative AI; AI could enable stable totalitarianism; etc.
Why it’s possible: We used to have a world where we could be totally sure human-looking outputs were human-generated; there are lots of ways to (partially) restore that assurance, such as cryptography and biometrics (a minimal sketch of the cryptographic piece appears at the end of this section).
Grade: F — We already have deepfakes, text-to-speech (TTS) scams, etc. running amok, and there doesn’t yet seem to be broad appreciation that provenance, watermarking, etc. are all “whack-a-mole” situations with respect to both modalities and providers. The politics of AI job displacement is about to blow up, and policymakers (and even most people in AI) don’t really appreciate the extent of this. Also, solving these things seems hard (lots of collective action issues around standard design, implementation, and building demand for their use, plus free speech concerns, privacy concerns with biometric-based proof-of-personhood solutions, an unclear and contested endgame for the future of work, etc.).
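To make the cryptography point above concrete, here is a minimal sketch (not from the original post, and not any particular standard such as C2PA) of content provenance via digital signatures: a publisher or capture device signs content with a private key, and anyone with the matching public key can verify that the content is unaltered and came from that party. It assumes the third-party Python package cryptography, and it deliberately glosses over the hard parts the grade above refers to: key distribution, adoption across modalities and providers, and privacy.

```python
# Minimal provenance-signing sketch (illustrative only).
# Assumes the third-party "cryptography" package: pip install cryptography
from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

# The publisher (e.g. a newsroom or camera vendor) generates a keypair once and
# distributes the public key out of band (the hard, institutional part).
private_key = Ed25519PrivateKey.generate()
public_key = private_key.public_key()

def sign_content(content: bytes) -> bytes:
    """Produce a detached signature to ship alongside the content."""
    return private_key.sign(content)

def verify_content(content: bytes, signature: bytes) -> bool:
    """Return True only if the signature matches this content and this key."""
    try:
        public_key.verify(signature, content)
        return True
    except InvalidSignature:
        return False

if __name__ == "__main__":
    article = b"Original photo or article bytes go here."
    sig = sign_content(article)
    print(verify_content(article, sig))                 # True: authentic and unmodified
    print(verify_content(article + b" (edited)", sig))  # False: tampering breaks the signature
```

Even granting the cryptography, the bottlenecks are social and institutional: who holds keys, who checks signatures, and what the absence of a signature is taken to mean. That is why the grade above is low despite this technical piece being well understood.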
Differential technological development
(context on the term: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4213670)
What we need to do: Leverage all of the asymmetric advantages that humanity has over future misaligned AIs, malicious actors, and authoritarian leaders. These include more time to prepare, greater (initial) numbers of people and AIs, greater (initial) amounts of compute, a compelling objective many can rally around (avoiding human disempowerment or extinction), and having “the initiative.” Use these to invest in societal defenses such as dramatically improved cybersecurity, physical defenses, AI-enabled strategic advice, AI literacy, and a capacity for fast institutional adaptation.
Why it’s necessary: If we don’t leverage our advantages and fortify society, we will eventually be cognitively outmatched by the first unintentionally or intentionally misaligned AGIs that arise, and/or they will be used for really bad purposes. So we need to plan for that in advance.
Why it’s possible: We have the advantages above, and have steered technological trajectories before (e.g. purchase guarantees for vaccines).
Grade: D+ — Basically everything good happening with AI is happening because of “the market” (which is good at some things and not others). Many AI applications are of dubious value. There has yet to be a significant push to shift the balance of AI uses in a big way (other than regulation of unsafe stuff and nominal/not super well prioritized “AI for good” investments). AI contributing to military conflict is a serious possibility.
Discussion
Note that I don’t say much here about AGI or superintelligence, or about any particular technical capability. There is a reason for this: I think the structural issues involved in AI policy are similar at most points in time. The specific safety and other issues will change, but there will be common themes, such as there being a small number of leading labs with disproportionate computing power, steady diffusion of capabilities to lesser labs, and AI being a general-purpose technology that can be explicitly directed towards some purposes (e.g. defensive rather than offensive) instead of others.
The temporal dimension of AI governance is tricky to reason about: some irreversible events and trends are possible or perhaps even likely, so in that sense time is finite. But many strategic concepts (like differential technological development) are “atemporal,” in that we should be doing similar things regardless of the current state of capabilities. And some interventions will speed up over time, conditional on the right ingredients being in place, since the stakes will become more salient and the tools more capable.
Acknowledgments/disclaimers: Thanks to Sam Manning, Girish Sastry, Gretchen Krueger, Josh Achiam, William Saunders, Daniel Kokotajlo, and others for helpful discussions. All errors here are my own and I am only speaking for myself, not necessarily anyone else at OpenAI.