Standards, Incentives, and Evidence: The Frontier AI Governance Triad
One way of thinking about the ingredients we need to make frontier AI systems safe and secure.
Introduction
There are many ways in which you can take the vague question of “how is this AI governance stuff going?”, break it down into bite-sized chunks, and measure society’s progress. I’ve discussed this topic a bit before, and while my thinking is still evolving, I want to share bits and pieces of my views as they come into focus.
One framework I find helpful can be summarized as: “governing frontier AI requires standards for safety and security, incentives for the leading AI developers to follow those standards, and evidence that the standards are being followed.” With this triad, frontier AI companies will have a clear sense of what counts as “safe and secure enough,” they will be incentivized to meet those standards, and other parties (including competitors, governments, the general public, etc.) will be reassured that they are actually following the standards.
This triad is not exhaustive. We also need — at least — technical solutions for safety and security, investment in society’s resilience to AI misuse and accidents, ways of making sure we actually know who the leading AI developers are (e.g., via tracking high-end computing hardware), processes for sharing safety and security incidents, etc. But I find the triad helpful, and I hope you will, as well.
Standards
By “standards,” I mean a detailed and mutually agreed-upon articulation of what counts as “safe and secure enough” for the most capable AI systems. AI safety and security standards should be detailed enough to be actionable, predictable where predictability is desirable and more flexible where flexibility is desirable, with updates over time in both directions via some sort of expertise-heavy process. Standards are much better established in other domains like food safety, airplane safety, nuclear safety, etc. For example, there are limits on the amount of contaminants allowed in our food supply and requirements for the kind of testing that needs to be done on airplanes.
A risk assessment matrix that forms part of the US Department of Defense’s standard for system safety. The meaning of these terms is defined more precisely here.
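To make the structure of such a matrix concrete, here is a minimal sketch in Python of the general severity-by-likelihood pattern that system-safety standards use. The category names, cell assignments, and risk labels below are illustrative placeholders, not the DoD’s actual definitions.

```python
# Illustrative sketch of a severity-by-likelihood risk matrix, in the spirit of
# system-safety standards. The category names, cells, and risk labels are
# placeholders for illustration, not the DoD's actual definitions.

SEVERITY = ["catastrophic", "critical", "marginal", "negligible"]
LIKELIHOOD = ["frequent", "probable", "occasional", "remote", "improbable"]

# Each cell maps (severity, likelihood) to a qualitative risk level that a
# standard might tie to required mitigations or sign-off authority.
RISK_MATRIX = {
    ("catastrophic", "frequent"): "high",
    ("catastrophic", "remote"): "serious",
    ("marginal", "occasional"): "medium",
    ("negligible", "improbable"): "low",
    # ...a real standard fills in every cell, not just a sample of them.
}

def assess(severity: str, likelihood: str) -> str:
    """Look up the qualitative risk level assigned to a hazard."""
    if severity not in SEVERITY or likelihood not in LIKELIHOOD:
        raise ValueError("Unknown severity or likelihood category")
    return RISK_MATRIX.get((severity, likelihood), "unassessed")

print(assess("catastrophic", "remote"))  # -> serious
```

The point is simply that a standard pins down, in advance and in shared vocabulary, which combinations of severity and likelihood call for which responses, rather than leaving that judgment to each developer in the moment.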
Related concepts include best practices, voluntary commitments, frontier safety policies, norms, technical standards, etc. “Technical standards” has a more specific meaning, e.g., being codified by a standards-setting organization. I think you could have “standards” in my sense without having technical standards in the typical sense (e.g., because it’s too hard to be precise about some things), and vice versa (e.g., technical standards that few people read or care about and which are paywalled, which is unfortunately common today).
Regulation might reference a particular set of standards or call for the creation of one. Some examples of potentially important standards efforts include the General-Purpose AI Code of Practice (called for by the AI Act, which I’ve discussed before here, here, and here) and the series of best practices being articulated by the Frontier Model Forum (FMF). It may be good for standards to evolve, with a tight feedback loop to daily practice, via private entities such as an industry body like the FMF or via a regulatory market.
Incentives
My personal preferred path for incentivizing compliance with AI standards is for the US federal government and the EU to collaborate on a set of expert-informed safety and security standards and to impose fines on violators of those standards (while also incentivizing honest reporting of mistakes). I’d also like to see private actors drive compliance with these standards or variants on them (e.g., meeting a higher bar might be associated with lower insurance premiums). Multiple, independent “demand signals” for AI standards seem good to me from the perspective of avoiding capture and enabling a defense-in-depth approach.
Cyber insurance premiums over time (to be clear, it’s not obvious to me that cyber insurance is a great success story, though I think a range of mechanisms for incentivizing safeguards should be considered). Figure excerpted from this article by Swiss Re.
But not everyone agrees with me about that vision for the future, and that’s fine. Critically, you can support the basic idea of an AI governance triad while still preferring that we incentivize compliance in another way – such as via a governmental “carrot” rather than a “stick,” or through private carrots and sticks rather than governmental ones.
Examples of alternative demand signals for AI standards include:
Government procurement requirements
Company procurement requirements
Investor due diligence on large capital investments and loans
Insurer criteria that incentivize higher standards (for AI-specific insurance, cyber insurance, etc.)
Consumer purchasing behavior, informed by private evaluations/scorecards (think “Consumer Reports for AI”)
Requirements associated with importing restricted items (e.g., high-end computing hardware)
This list is not exhaustive, and again, I don’t think we have to pick a single one. In my ideal world, there would be some convergence of standards (multiple demand signals pushing for the same thing, in case, e.g., a company is more concerned with or affected by litigation risk than insurance premiums, or vice versa), and some divergence (so we don’t have to choose between setting a bare minimum and pushing for continuous improvement).
Evidence
A company can say their system is safe and secure, or that they follow a given set of standards, but should you believe them? As AI gets more capable, the risks associated with mistaken claims, fraudulent claims, or too-vague-to-be-verified claims will increase, and a cloud of suspicion between competitors could drive corner-cutting (indeed, this is happening already and could get worse). Ideally, organizations building and/or deploying frontier AI will provide sufficient evidence to believe their claims.
Evidence that an organization is following AI standards can come in many forms. I think the question of how to ensure this evidence is compelling – including to very skeptical parties with very different goals, values, etc. – is a key one and merits much more analysis. But briefly, I would say that there are at least three key categories of evidence.
A company can be transparent about the safety and security properties of particular systems or their whole organization, with this transparency either being voluntary or required. Some things can just be stated or shared, and easily verified by third parties (e.g., performance on a certain safety-related benchmark). But transparency won’t always be possible, and some information needs to be shared with a more limited set of parties, leading to the second category.
External assessment of frontier AI can help in cases where outsiders don’t want to take a company at their word about something, but are OK with listening to an independent third party. External assessment can range from fairly shallow black-box evaluations which just use the existing product affordances (e.g., the API or a graphical user interface) all the way through to full-blown “AI audits” that look at normally-non-public information such as training data, internal documents, etc. Again, this is an area where much more could be said.
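To illustrate the shallow end of that spectrum, here is a minimal sketch of a black-box evaluation harness that relies only on a product’s public interface. The query_model stub, the refusal check, and the prompts are hypothetical placeholders for whatever API, grader, and test set an assessor would actually use.

```python
# Minimal sketch of a black-box safety evaluation: the assessor only sees model
# outputs via the product's public interface (e.g., an API), with no access to
# weights, training data, or internal documents. `query_model`, the refusal
# check, and the prompts are hypothetical placeholders, not any particular
# company's API or benchmark.

from typing import Callable

def query_model(prompt: str) -> str:
    """Stand-in for a call to the deployed model's public API."""
    raise NotImplementedError("Wire this up to the interface you are assessing.")

def looks_like_refusal(response: str) -> bool:
    """Crude placeholder check; real evaluations use graders or rubrics."""
    return any(phrase in response.lower() for phrase in ("i can't", "i cannot"))

def run_blackbox_eval(prompts: list[str],
                      query: Callable[[str], str] = query_model) -> float:
    """Return the fraction of disallowed-content prompts the model refused."""
    if not prompts:
        raise ValueError("Need at least one prompt to evaluate.")
    refusals = sum(looks_like_refusal(query(p)) for p in prompts)
    return refusals / len(prompts)

# Example usage, with a hypothetical set of disallowed-content prompts:
# refusal_rate = run_blackbox_eval(["<harmful prompt 1>", "<harmful prompt 2>"])
# print(f"Refusal rate: {refusal_rate:.0%}")
```

Even this shallow kind of assessment has value because outsiders can run it without special access; deeper audits of the kind described above require negotiated access to non-public information.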
A company can have a policy protecting (certain categories of) whistleblowers on safety and security issues, so that outsiders can have more confidence that if there were conflicting evidence, they’d hear about it. The stronger such policies are, the more “absence of evidence” (lack of whistleblowing) can become “evidence of absence” (regarding violations). There is a lot of bipartisan interest in protecting AI whistleblowers, and a growing set of external infrastructure to help AI company employees navigate related situations – precisely because the first and second categories are imperfect. There has also been discussion of subsidizing whistleblowing in the context of export control violations. The basic logic behind whistleblower bounties, which have been successful in the financial sector, could also apply to frontier AI safety and security more generally.
Again, I prefer a defense-in-depth approach in this and many other areas. Ideally, the evidence provided by companies will be structured in some way that allows critical public scrutiny, such as in the form of a “safety case” for a particular AI system. System-level safety and security cases might be nested inside a larger “organizational safety and security case,” which makes reference to processes for external assessment as well as whistleblowing protections.
An example of a safety case that might be made for an AI system. Taken from here.
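As a rough illustration of the kind of structure involved, here is a minimal sketch of a nested claim/argument/evidence representation of a safety case. The specific claims and evidence artifacts named below are hypothetical, and real safety cases are considerably richer.

```python
# Minimal sketch of a claim/argument/evidence structure for a safety case.
# The claims and evidence artifacts named here are hypothetical placeholders;
# real safety cases are far more detailed and use established notations.

from dataclasses import dataclass, field

@dataclass
class Claim:
    statement: str                        # what is being asserted
    argument: str = ""                    # why sub-claims/evidence support it
    evidence: list[str] = field(default_factory=list)    # pointers to artifacts
    subclaims: list["Claim"] = field(default_factory=list)

system_case = Claim(
    statement="Deploying this system does not meaningfully uplift weapons development.",
    argument="Capability evaluations plus misuse safeguards keep residual risk acceptable.",
    subclaims=[
        Claim(
            statement="Pre-deployment evaluations show capability below the agreed threshold.",
            evidence=["eval_report_v3.pdf"],      # hypothetical artifact
        ),
        Claim(
            statement="Misuse safeguards are deployed and continuously monitored.",
            evidence=["safeguards_audit.pdf"],    # hypothetical artifact
        ),
    ],
)

def print_case(claim: Claim, depth: int = 0) -> None:
    """Render the nested case as an indented outline for review."""
    print("  " * depth + "- " + claim.statement)
    for sub in claim.subclaims:
        print_case(sub, depth + 1)

print_case(system_case)
```

The nesting is what lets system-level cases slot into a broader organizational case, with leaf-level evidence pointing at things like evaluation reports, external assessments, or whistleblowing protections.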
Related concepts include verification, third-party auditing and assessment, compliance reviews, structured access, safety evaluations, AI assessments, benchmarks, penetration testing, etc. Key frontiers include negotiating sufficient access for third parties to come to informed conclusions while protecting sensitive information, and demonstrating the safety and security of internal deployments. There are also various legislative efforts to ensure sufficient whistleblower protections for AI company staff.
Conclusion
This triad is not intended to be a complete description of current AI governance, nor a full prescription. But I think it has significant utility for concisely describing the situation we’re in today. As just one example, the EU’s AI Act articulates a particular set of incentives (e.g., fines of up to a percentage of global annual turnover); these incentives are associated with standards that are being hashed out in public via the General-Purpose AI Code of Practice; and there is ongoing debate about how evidence should work, e.g., what role third-party assessment should play. The triad also provides a high-level roadmap for investments that entrepreneurs, executives, policymakers, and others can make – i.e., exploring new kinds of incentives, contributing to the articulation of standards, and piloting new ways of providing reliable evidence.