Feedback on the Second Draft of the General-Purpose AI Code of Practice
Compliance with the most intensive requirements is (mostly) feasible for big companies, but that doesn't mean all is well.
Introduction
In a previous blog post that I co-authored with Dean Ball, we provided feedback on the first draft of the General-Purpose AI Code of Practice (COP). See that post for more context, but to briefly recap, the COP is a way of complying with the EU AI Act. The second draft of the COP was published a few weeks ago.
Some of what we said in the first post still applies to the second draft, and some doesn’t, but I won’t repeat any of that content here. Instead, below I will first summarize my overall view of where things are at in the process, and then share fresh reactions to the second draft.
Note that the bulk of this post will be pretty inside-baseball-y and will assume some knowledge of the COP, which can be found here, though maybe you should wait for the third draft if you’re just starting out. I won’t be offended if you don’t read all of this, unless you’re one of these people. I suspect most of you will probably be interested in the big picture section, though.
Also note that I am not a lawyer and didn’t have time to consult one on this, so it’s possible I’m misinterpreting some things, and of course this is not legal advice regarding compliance.
The Big Picture
The chairs and vice-chairs iterating on the COP will hear many different perspectives on the big picture. Some will say that the COP is a case of tyrannical overreach from European bureaucrats that is impossible to comply with, and others will say that it’s a toothless document that caves to industry and VC lobbying.
Here’s my take, which is somewhere in between.
Generally, things seem to be moving in the direction of greater specificity and coherence, and of reduced overreach and under-reach, from one draft to the next. This suggests that the feedback process is working reasonably well and that there is a genuine desire to balance various factors. I would also add that many aspects of the current draft are very well-considered, though I won’t enumerate everything I like about it below (e.g., I think it’s the right call to allow external assessors who aren’t EU citizens).
At the same time, it has clearly been hard for the chairs and vice-chairs to keep up with an insane schedule, and I really wish the European Parliament had asked for this process to be kicked off much sooner (like when the Act was first passed) rather than a few months out from it going into force. This rushedness shows up in, e.g., the amount of unfinished business in each version that seems to be held up by time rather than by a need for more feedback. I don’t think there’s a solution to this now, but it’s worth being aware of so that frustration is directed in the right place (i.e., at Parliament, not the chairs and vice-chairs).
With one exception, big companies should not have too much trouble complying with the provisions on “models with systemic risks,” the part I’m paying closest attention to. Models with systemic risks are, for now, basically models that require more compute to train and are therefore assumed to pose higher risks of various kinds; I’ll circle back on the future of this framework below.
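To make the current trigger concrete: the Act presumes systemic risk above a fixed training-compute cutoff (10^25 FLOP, per Article 51), and the Commission can also designate models on other grounds. Below is a minimal sketch of that compute-based presumption; the model names and compute figures are purely illustrative.

```python
# A minimal sketch (mine, not the COP's) of how the compute-based presumption works.
# Under the AI Act (Article 51), a general-purpose AI model is presumed to have
# systemic risk if its cumulative training compute exceeds 10^25 FLOP; the Commission
# can also designate models on other grounds, which this sketch ignores.
# The example model names and compute figures below are purely illustrative.

SYSTEMIC_RISK_FLOP_THRESHOLD = 1e25


def presumed_systemic_risk(training_compute_flop: float) -> bool:
    """Return True if a model is presumed to pose systemic risk by compute alone."""
    return training_compute_flop > SYSTEMIC_RISK_FLOP_THRESHOLD


if __name__ == "__main__":
    for name, flop in [("hypothetical frontier model", 3e25),
                       ("hypothetical mid-size model", 5e23)]:
        status = "systemic risk presumed" if presumed_systemic_risk(flop) else "below threshold"
        print(f"{name}: {status}")
```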
Since only a small number of companies can afford to make models with systemic risks as currently defined, it is reasonable to have at least slightly higher compliance burdens for these kinds of models, though it shouldn’t be so extreme that even big companies can’t comply. It looks like we’ve avoided that failure mode, at least – my assessment is that it should take no more than one additional program manager, lawyer, and engineer to comply with these provisions if a company is already taking reasonable actions to assess and mitigate major risks (which most relevant companies have already committed to doing anyway via voluntary commitments). Perhaps it will be less than this, and I will consider ways to make things more efficient below. And perhaps it’s more, but it’s certainly not several dozen, again assuming a reasonable starting point.
However, just because compliance is feasible for big companies doesn’t mean that the provisions will achieve their original goals. If compliance is inefficient or mis-targeted with respect to the real risks at stake, then the AI Act could be counterproductive by giving safety a bad name within companies and distracting employees and governments from attending to more pressing issues. I’m not saying that this will happen, since I think the COP will continue to improve, but I’m just contextualizing why I think it’s worth helping to make the COP as efficient and effective as it can be.
A recurring theme below is that whether the COP makes sense depends in large part on the threshold for determining that a model has systemic risks. I assume below, unless stated otherwise, that the provisions on systemic risks will apply to only a small number of companies, but I revisit this issue in a few specific cases.
Lastly, I’ll note that the working groups are clearly struggling to manage some fundamental tensions baked into the AI Act, particularly on open source. I don’t think that these tensions are resolvable by the AI Office, and either require different solutions from other political bodies, or an amendment to the AI Act down the road.
The biggest tension, in short, is between having requirements be proportional to risk, on the one hand, and treating small businesses and open source as basically untouchable, on the other. In general, yes, bigger companies will have bigger and more risky models, and yes, more compute-intensive systems/companies deserve a closer look, and yes, it’s good to avoid crushing open source. But baking these loose correlations into the AI Act as immutable axioms leads to all sorts of strange contortions, like saying that someone open sourcing a model posing systemic risk to the EU as a whole should not be required to have anyone on staff red teaming it, but that those who deploy that same model via an API (where there are more affordances for preventing harm, including stopping deployment or changing the model) should have to do some red teaming before deployment. Isn’t that exactly backwards?
This is what happens when there are multiple conflicting constraints built into the legislation as a result of lobbying from various directions, and this should be a cautionary tale for the US going forward.
Diving into the Details
Change over time
Both the first and second drafts have noted that there is not yet a plan for updating the COP over time.
I hope that the third version begins to shed more light on this, and would like to suggest one possible answer:
The COP should be updated quarterly, though there should be a grace period such that it’s OK to be in compliance with a version that is up to one quarter old. Less frequent updates than this would lead to being way out of date relative to the state-of-the-art, and more frequent updates (or a lack of a grace period) would make it hard to plan ahead.
The update process should consider not just changes to the COP but also the trigger for systemic risk, since these cannot be cleanly separated. If needed, Parliament should pass legislation allowing these to be jointly optimized (right now, the chairs and vice-chairs are designing the COP in isolation, which is tricky).
As is currently happening with the drafting process, the AI Office should provide indications of “where their head is at,” including possible future changes being contemplated. This would appropriately thread the needle between making big sudden changes, on the one hand, and never making big changes (even when they’re needed), on the other.
The feedback process should include, in addition to Futurium, a secure email address to which any party (including those who haven’t registered as stakeholders) can send input. Additionally, there should be at least quarterly opportunities to register as a stakeholder in Futurium or any other higher-bandwidth modes of engagement.
There should be some sort of baked-in presumption in favor of simplification and lightening the compliance burden, to counteract what will likely be a default tendency for the rules to grow more and more detailed over time. This could include, e.g., a target page length of less than 50 pages, reliance as much as possible on best practices codified via another body, a presumption in favor of deleting or having to re-justify certain rules on a yearly basis, or various other approaches.
Inefficiency in reporting processes
As noted in the big picture section above, there are various ways in which compliance could be feasible but still not ideal, such as by being inefficient in a way that gives negative associations to safety and policy (and the EU), or focusing on the wrong things. I have some concerns of this nature when it comes to the documentation required for explaining development/deployment processes at a company generally and for specific models.
In short, I think the EU should make explicit that compliance can occur via public documents that substantively address all or most of the required information, with companies simply supplementing these privately and/or publicly with specific outputs for EU purposes, as opposed to creating entirely separate documents from scratch containing all of the required information. Already, companies do a lot to produce “system cards” (as well as API documentation) that address a large fraction of the topics covered in the COP, often due to voluntary commitments. It would be inefficient to require all of this information to be rewritten in very slightly different ways, and right now it’s not clear whether this is permissible.
Alternative compliance options
As discussed elsewhere, I think it’s valuable to explore ways of verifying compliance with AI rules that leverage the judgment of a human “on the ground” more so than formal documentation and technical mechanisms (though all of these have a place). One way of doing this in an EU context is to give external assessors deep access to a company (e.g., by being “employees for a week” every few months) and have them report to the EU that they found the company to be in compliance, either documenting the details themselves or providing them upon request. Essentially, the idea is to substitute “the human touch” (and someone who will over time have a lot of experience assessing compliance and writing it up) for company employees doing paperwork. This option may be particularly important for small businesses, which may struggle to parse the long COP document and prefer a more lightweight option, though I would also like to see innovation in compliance processes at big companies.
Recall the concern about giving safety a bad name within companies, and generally making things worse. While having a third party poking around at the company may not be everyone’s cup of tea, it may be the better option compared to an employee (who should perhaps be working on the substance of safety rather than its documentation) doing the same, and a third party would get more efficient at this over time compared to someone who does it part-time and intermittently.
Note that this is sort of a substitute for the recommendation above on “inefficiency in reporting processes” – if there isn’t a burdensome reporting process to begin with, then this idea is less important, but there may still be value in exploring this kind of approach even if the efficient-reporting problem is solved. For example, a small company that does not have experience writing system cards might want help getting started, and could then repurpose the documentation produced by a third-party assessor.
As discussed further below, a single assessor may not be able to cover everything, at least in the near term, so it may not be literally one employee for a week; but perhaps a few people with expertise in different areas could join a company for a week and thereby make things more efficient overall, since, again, assessors will be quite familiar with the COP if they do this routinely.
Over and under-inclusive reporting requirements
As noted in the first post, the details of the reporting requirements in the first table seem not fully justified, and I won’t repeat all of what was said there, but will revisit one theme.
Some things that should be public are not very explicitly called out in the second draft, such as a system’s “goals” (e.g., as reflected in its “constitution,” rule-based reward models, etc.). And conversely, some things that are called out, like synthetic data details, are not obviously appropriate to demand in full as they can be highly sensitive. Likewise for forecasts of future capabilities. It has not yet been demonstrated that the European Union can protect such IP from theft, and unless/until that is demonstrated, there should be a recalibration of what information is requested in areas that aren’t specifically called out in the AI Act and which are not clearly required in order to assess safety risks.
If and when the EU demonstrates a strong capacity to protect information (e.g., having a SCIF set aside for this purpose and clear standards for screening of personnel who would receive the shared information), and there’s a stronger case made that full information on methods is essential for the AI Office to do its job, there should be more of an expectation of sharing information in such secure settings. But for now, I believe the EU should (unless statutorily required) generally be requesting transparency on capabilities of AI systems, goals of systems, evaluations of risks, and mitigations for those risks, not methods for producing the capabilities to begin with, or forecasts of future capabilities. Sometimes these are closely intertwined, but sometimes they are not.
Note that this is one of the recommendations on which I am most unsure. I could see a good argument for (e.g.) it being in the public interest for current and expected future capabilities to be fully open, though currently this seems risky to me, and the COP would not achieve this anyway – it would just require the information to be given to EU authorities, not the public. It seems to me that, for now, at least unless and until company security has been dramatically improved and compute governance essentially solved (such that there’s not much anyone can do to “weaponize” information about enticing capabilities, such as by stealing them or scaling up to replicate them), there should be some discretion in what to say about current and future non-commercial capabilities.
As is, it seems to me that sharing forecasts with the EU in an insecure way will just create an insecure repository of information at the EU AI Office that bad actors will use to decide whom to hack and for what exactly. I also think policymakers are not currently bottlenecked on information about the extremely fast pace of AI progress, for which the evidence about the overall big-picture trajectory is abundant. But I could be wrong about some of these things.
Summing up, in some areas there is a clear public interest in knowing more than the COP currently requires, e.g., what AI systems are “optimizing for.” And conversely, some of this information should probably not have to be disclosed as it adds little to the EU’s ability to make good decisions, while potentially exposing sensitive IP.
Responsibility for misuse
In the bullet on the risk of “large-scale, illegal discrimination,” the COP says that “this risk category does not include the intentional misuse of models.” If I understand correctly, the intent is to prevent platforms from being held responsible for bad actors doing bad things with their platforms. But if we’re talking about illegal discrimination happening at a large scale, then (besides being illegal) that would be a violation of any platform’s terms of use. And elsewhere in the COP, it says that after deployment, companies should have “safeguards to prevent misuse.” So I am not sure what this caveat is accomplishing, and it seems misguided to me.
What about fine-tuning?
The COP says that “During training and post-training enhancements (e.g. fine-tuning, reinforcement learning, or similar methods, but only when done by the model provider themselves or on their behalf), Signatories commit to assessing and potentially mitigating systemic risks at pre-defined milestones at which the model could potentially become significantly more capable.” I don’t know what the “but only when done by the model provider themselves or on their behalf” part means. Who is responsible for mitigating risks of an API-based fine-tuning service? It seems to me that a big chunk of the responsibility, if not all of it, should be on the part of the company making that service available, since they know the most about the model’s risks and are in the best position to implement safeguards.
Incentives for innovation on deep access
I was excited to see the following: “To facilitate model elicitation, Signatories commit to working towards giving more model evaluators, internal and external, grey- and white-box access to their general-purpose AI models with systemic risk, considering a cost-benefit analysis with regards to model security (see Measure 12.2).”
“Working towards” such deep access is hugely important, and relates to some things I’ve discussed elsewhere. But how quickly are companies supposed to work towards this? Given the lack of KPIs, the absence of any clear incentive to do more than the bare minimum, and the complexity and multi-dimensionality of the problem/opportunity space being explored, it’s not clear to what extent this provision will get any traction.
I’d like to suggest that the AI Office think about this as something to incentivize, such that if you allow deeper access in one area, you are “rewarded” by a lighter paperwork burden or some other benefit. This relates to (and can be combined with) the point I mentioned above about using external assessors as “employees for a week.” Perhaps the COP could say something like “In recognition of the benefits to the larger ecosystem of innovation in this area, proactive innovation in this area and public sharing of lessons learned will be recognized by the AI Office as substituting for documentation of [Measures XYZ].” I’d have to be more intimately familiar with the COP to have good suggestions for Measures XYZ.
Tool and best practice sharing
Measure 10.8 says that signatories will make “state-of-the-art evidence collection best practices widely accessible to relevant actors in the AI ecosystem,” and gives KPIs with specific numbers of actors to share this information with. The intent of the measure is to help the wider ecosystem, including small and early-stage businesses, which will presumably not have the social connections to know how to get in touch with big companies in order to benefit from small-scale information sharing. So it would be more efficient and better for the ecosystem for such information to simply be published openly, and for big companies to make themselves available to smaller and earlier-stage companies for assistance that is not well covered in such public materials (e.g., by periodically holding office hours or maintaining a publicly shared email address for questions).
Security details
It’s great to see more granularity in the discussion of security this time around, though I have a few uncertainties about the current details.
In particular, the COP states that signatories should implement and advance research that could help prepare for SL4 security (discussed here; see also this post) if they expect to be targeted by state-level adversaries. But what should this expectation be based on? I’m not sure anyone has a great answer other than “prepare for the worst,” which isn’t very actionable for smaller companies in particular. It’s possible that in some cases SL3 will be overkill, and likewise, in some cases SL4 may not be enough.
As with many other details in the COP, much depends on the threshold for models with systemic risk. I’d suggest the following solution for jointly adjusting the COP, the security details, and the systemic risk cutoff: whether to expect certain categories of attackers should be based on the estimated budget of those attackers in relation to the cost of the IP in question. If, say, the systemic risk threshold corresponds to a billion-dollar budget, then companies should be planning to ultimately get to SL5. If it’s much lower, then a proportionately lower level of protection makes sense. This is a very rough sketch, but may be a good starting point for a solution to this issue. Additionally, there should be some sort of explicit plan (whether reflected in the COP or not) for the EU to harmonize its decision-making with US and Five Eyes threat intelligence.
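To sketch what I have in mind (purely as an illustration with made-up numbers, not a proposal for specific cutoffs), the target security level could be read off the estimated budget of attackers who would plausibly go after the model weights, which in turn scales with the value of the IP:

```python
# A rough, illustrative sketch of the proportionality idea: pick a target security
# level (SL1-SL5) based on the estimated budget of attackers who would plausibly
# target the model weights. The budget floors and attacker descriptions below are
# made-up placeholders, not numbers from the COP or the security-levels framework.

ILLUSTRATIVE_BUDGET_FLOORS = [
    (1e9, "SL5"),  # top-priority operations by the most capable states
    (1e7, "SL4"),  # routine state programs and the best-resourced non-state actors
    (1e5, "SL3"),  # organized criminal groups and insider threats
    (1e3, "SL2"),  # opportunistic professionals
]


def target_security_level(estimated_attacker_budget_usd: float) -> str:
    """Map an estimated attacker budget to an illustrative target security level."""
    for budget_floor, level in ILLUSTRATIVE_BUDGET_FLOORS:
        if estimated_attacker_budget_usd >= budget_floor:
            return level
    return "SL1"  # only low-effort, amateur attacks expected


if __name__ == "__main__":
    # If the systemic risk threshold corresponds to IP worth a ~$1B attack effort,
    # the company should be planning to ultimately reach SL5; a much cheaper model
    # implies a proportionately lower target.
    print(target_security_level(1e9))  # -> SL5
    print(target_security_level(5e5))  # -> SL3
```

The point is not these particular numbers but the shape of the rule: the security expectation moves together with the systemic risk cutoff rather than being fixed independently of it.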
Pooling resources
There are several areas where the COP says that companies should do things that I don’t think they should be doing on their own, for efficiency reasons. For example, research on SL4 (mentioned above) should ideally be done in collaboration with other companies, academia, governments, etc. Other examples include, non-exhaustively: engagement with external experts and stakeholders, which requires skill and investment to do effectively; anonymous reporting channels, which could benefit from common infrastructure (e.g., like this); bug bounties, which also may benefit from shared infrastructure; and support for external scientific research.
It would be valuable to explicitly call out the permissibility and encouragement of collaboration in these specific cases, and/or in a cross-cutting manner in one of the introductory sections.
Safety versus non-safety resourcing
I mentioned above that there was one exception to my claim that it is feasible to comply.
The main exception to my claim re: compliance requiring up to 3 marginal employees (for companies with mature safety/security efforts already) is the set of KPIs for Measure 10.4. These prescribe that the “[a]verage percentage of engineering hours spent on model elicitation for model evaluations used as evidence for the highest risk tiers as compared to the largest internal non-safety project [should be] 75%.” And then there’s a lower number for lower risk tiers.
That’s a super high percentage – basically this means that companies either need to nearly double their engineering staff, or dramatically decrease their effort on “non-safety projects.” I can see where they got this number – I suspect there was a mistaken assumption that the largest non-safety project can’t be that big, since companies have a lot of different projects, so how big can 75% of it be? But this misses the point that for big companies, projects are not equally sized. The largest non-safety project may involve dozens or even hundreds of people working on (e.g.) the next frontier model’s pretraining and post-training.
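To make the arithmetic concrete, here is a back-of-the-envelope version with hypothetical staffing numbers (none of these figures come from any real company):

```python
# Back-of-the-envelope illustration of why the 75% KPI is so demanding.
# All staffing figures are hypothetical.

largest_nonsafety_project_engineers = 200  # e.g., pretraining + post-training of the next frontier model
kpi_ratio = 0.75                           # KPI: elicitation hours vs. the largest non-safety project

required_elicitation_engineers = kpi_ratio * largest_nonsafety_project_engineers
print(required_elicitation_engineers)  # 150.0 engineer-equivalents on elicitation alone
```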
Also, I think the whole idea of a clean distinction between safety and non-safety projects is incorrect and harmful, since these sorts of things are and should be blended together in practice.
So I would just suggest scrapping these KPIs entirely and reconsidering what makes sense here.
Proportionality violation
As alluded to in the section on the big picture above, the COP authors are struggling with inherent tensions baked into the AI Act by legislators. Nevertheless, it’s worth stating explicitly that in many cases, the KPIs for SMEs and open source models are much more lenient for no clear reason grounded in actual safety and security considerations.
As just one example, consider the KPI for “Percentage of internal staff hours spent on exploratory safety research as part of all research hours,” which is 10% for non-SMEs and 0% for SMEs and/or providers who exclusively release open-source models. Even those who think that open source models are in some sense inherently safer, or who are worried about crushing small businesses, would, I think, be hard pressed to coherently justify an infinitely lower safety research burden.
Incentive alignment on KPIs related to assessment results
It’s great for companies to actually be in compliance, and for assessments to accurately conclude this. But it’s bad if they’re judged to be in compliance and they’re actually not. If assessment results can’t be trusted, the whole system falls apart, so it’s important to consider the incentives created by KPIs very carefully.
This is why I worry about the lack of qualifiers on KPI 15.2, which states “Adequacy and adherence assessments yield a positive result.”
It seems likely to me that many assessments will, and should, yield mixed results, and companies should not be encouraged to bias the process in a way that yields an unambiguously positive assessment (to be clear, such biases do not require any malicious intent on the part of companies, and may occur in small and subtle ways that add up over time to a watered-down system of compliance). I would suggest, then, that there at least be some sort of qualifier such as “generally positive” or “largely positive” to make explicit the expectation of imperfection, which may be the best that can be hoped for in the near future, at least. There could also be some sort of rubric so that the positivity of assessments is judged objectively, though this will require thoughtful design to avoid the same kinds of issues discussed above.
The role of external assessments in deployment decision-making
Given the nascent ecosystem of external assessment for AI systems, it is important to have clearly defined scopes for such assessments: there are no “one-stop shops” available yet, and there may never be, since truly mastering all the different considerations would essentially amount to starting a parallel company, and talent is finite. It is more realistic to have different experts assessing different things.
This mindset has several implications for the current draft:
This requirement doesn’t necessarily make sense as stated: “If Signatories disagree with any recommendations or conclusions reached by external assessors, they commit to specifying their disagreement in the Model Report.” Conclusions may sometimes make sense to respond to, but I’m wary of this as a general requirement. In general, assessors should probably not be making net recommendations on deployment, since if they do, they will often be going beyond their expertise in a specific kind of risk. The most capable external assessors today, for example, do not tend to claim expertise beyond a specific area or small number of areas. I don’t want the COP to encourage companies to waste time responding to off-base recommendations and sweeping conclusions based on limited information and expertise; rather, external experts should focus on what they do best – assessing risks in specific areas and stating their conclusions clearly.
Likewise, the COP says that “a Model Report will describe whether and how external expert assessments informed a decision to place the model on the market, such as through assessment of risks.” Again, external assessors should be aiding assessment of risks, not determining final net decision-making, which involves many factors that assessors will not be privy to.
The COP says, “Where external assessors submit reports outlining identified risks, testing methodologies, and proposed mitigations to Signatories, Signatories commit to including them in their Model Reports.” Again, given the ecosystem issues described above, these reports may often be of uneven quality and it’d be burdensome to have to continually update, e.g., one’s system cards based on whatever conclusions third parties arrive at. It’s much more important that they be allowed to publish those findings themselves, and that the terms negotiated by companies do not forbid this. It’s also important that regulators have access to any findings upon request. Bundling everything together could dilute the quality, focus, and readability of documentation like system cards and Model Reports.
KPI 16.2.3 says that “Independent external assessors, under Measure 16.2, receive access to deployed models within 30 days of a formal request, barring exceptional circumstances.” This is vague regarding the type of access in question and the kind of assessors being discussed. If it’s simply black-box access, then shouldn’t they have access already by virtue of the product being on the market? If it’s deeper access than that, there will likely be some costs on the company side of providing access (such as staff time), and the terms and intent should be spelled out more here before inclusion in the final COP.
Loophole re: unreported models
If an AI system truly has “systemic risk,” it does not necessarily have to be deployed on the EU market in order to cause harm. It could cause harm to the EU by being deployed commercially elsewhere in a way that still affects the EU, or due to an incident that occurs prior to any external deployment, or because the system is stolen and then misused, or because it’s a non-commercial but deployed system (e.g., for military purposes).
Yet the COP says – based on how the AI Act was designed – that signatories “may choose not to notify the AI Office of a model that meets the classification criteria if they have strong reason to believe they will not put it on the EU market.” This is intended to acknowledge the fact that “some models may be trained solely for internal purposes or as preliminary experiments, with no intention of being placed on the EU market.”
If I understand correctly, this is a big loophole, but it is also not necessarily fixable from within the COP, since the AI Act is only triggered if things are deployed or will be deployed on the EU market. But I wanted to include this section as a brief reminder that even if we were to achieve perfectly efficient and effective implementation of the AI Act, the problem of mitigating the risks that the AI Act is intended (in part) to address would not be fully solved.
Perhaps this can be patched in the future, or perhaps the US and other countries will just need to sort out the risks associated with models that aren’t commercially deployed in the EU through other means. I suspect that these unreported models will constitute a growing share of risk over time since there are many reasons to develop powerful (and dangerous) AI systems, not all of which involve deployment on the EU market.
Conclusion
The people writing the Code of Practice are not in an enviable position – they have to consider a zillion stakeholders’ opinions, they have an insane timeline, and they are constrained by the AI Act in terms of which problems they can and can’t solve. I appreciate their efforts even while I point out the remaining room for improvement. 🫡
To sum up, I think compliance with the systemic risk provisions is (mostly) feasible for big companies. But just because compliance is technically feasible doesn’t mean we should accept AI regulation becoming a box-ticking exercise that ruffles feathers internally among people trying to ship things while not materially increasing the safety of development and deployment practices. So continued innovation will be needed here, and I tried to suggest some ways to do that, including different ways of assessing compliance with the COP as well as quarterly updates to both the COP and the systemic risk threshold.
Thinking a bit further out (several months, since things move quickly in AI), there is a fundamental issue here that goes way beyond the AI Act. If the EU keeps ratcheting up the bar for what counts as a model with systemic risk, then yes, they will achieve the goal of focusing on the highest risk systems. But what if more and more models do in fact pose systemic risks over time? Do we focus only on models that pose systemic-er risks?
Ultimately, the world needs to be resilient to misuse and reckless use of models that were once considered to have systemic risks. If that’s not possible, then we’re in big trouble, regardless of what the AI Office does. I am hopeful that concerted investment will get us there, and regulators like the AI Office can over time focus less on absolute thresholds like the ability to hack or generate bioweapons etc., since we will have learned to deal with those, and focus more on relative capabilities. But this is a hypothesis that hasn’t been proven, and I don’t think we should bet the farm on everything being sorted out with innovation.
It’s important that we get the General-Purpose AI Code of Practice right because it’s one of the only “shots on goal” today for building the institutional muscle to do this kind of triaged AI governance, and we don’t yet know if the US will take its own shot in the near future. If you think you could help with this, consider providing feedback on the third draft which is coming soon.
Acknowledgments: Thanks to Dean Ball for helpful comments on earlier versions of this post. Views expressed here are my own.