Zach Stein-Perlman, 6 February 2023
Strategy is the activity or project of doing research to inform interventions to achieve a particular goal. AI strategy is strategy from the perspective that AI is important, focused on interventions to make AI go better. An analytic frame is a conceptual orientation that makes salient some aspects of an issue, including cues for what needs to be understood, how to approach the issue, what your goals and responsibilities are, what roles to see yourself as having, what to pay attention to, and what to ignore.
This post discusses ten strategy frames, focusing on AI strategy. Some frames are comprehensive approaches to strategy; some are components of strategy or prompts for thinking about an aspect of strategy. This post focuses on meta-level exploration of frames, but the second and final sections include some object-level thoughts within a frame.
Sections are overlapping but independent; focus on the sections that aren't already in your toolbox of approaches to strategy.
Epistemic status: exploratory, brainstormy.
Make a plan
See Jade Leung's Priorities in AGI governance research (2022) and How can we see the impact of AI strategy research? (2019).
One output of strategy is a plan describing relevant (kinds of) actors' behavior. More generally, we can aim for a playbook: something like a function from (sets of observations about) world-states to plans. A plan is good insofar as it improves important decisions, in expectation, in the counterfactual where you try to implement it.
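As a toy illustration of the playbook-as-function idea, here is a minimal sketch; the types and the decision rule are hypothetical placeholders, not a real playbook:

```python
# Toy sketch of a "playbook": a function from observations about the
# world-state to a plan. All names and the rule below are placeholders.
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Observation:
    description: str  # e.g. "a leading lab reports human-level performance on X"

@dataclass
class Plan:
    actions: List[str]  # recommended actions for relevant actors

# A playbook maps a set of observations about the world-state to a plan.
Playbook = Callable[[List[Observation]], Plan]

def example_playbook(observations: List[Observation]) -> Plan:
    """Illustrative decision rule only; a real playbook would be far richer."""
    if any("human-level" in o.description for o in observations):
        return Plan(actions=["prioritize coordination between leading labs"])
    return Plan(actions=["keep building strategic clarity"])
```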
To make a plan or playbook, identify (kinds of) actors that might be affectable, then figure out
- what they could do,
- what it would be good for them to do,
- what their incentives are (if relevant), and then
- how to cause them to act better.
It is also possible to focus on decisions rather than actors: determine which decisions you want to affect (presumably because they are important and affecting them seems tractable) and how you can affect them.
For AI, relevant actors include AI labs, states (particularly America), non-researching non-governmental organizations (particularly standard-setters), compute providers, and the AI risk and EA communities.
Insofar as an agent (not necessarily an actor that can take directly important actions) has unique abilities and is likely to try to execute good ideas you have, it can be useful to focus on what that agent can do, or on how to leverage its unique abilities, rather than backchaining from what would be good.
Affordances
As in the previous section, a natural way to improve the future is to identify relevant actors, determine what it would be good for them to do, and cause them to do those things. "Affordances" in strategy are potential partial future actions that could be communicated to relevant actors such that they would take relevant actions. The motivation for searching for and improving affordances is that there probably exist actions that would be great and that relevant actors would be happy to take, but that they would not devise or recognize by default. Finding great affordances is aided by a deep understanding of how an actor thinks and of its incentives, as well as a deep external understanding of the actor, in order to notice its blind spots and identify feasible actions. Separately, the actor's participation would sometimes be vital.
Affordances are relevant not just to cohesive actors but also to unstructured groups. For example, for AI strategy, finding affordances for ML researchers (as individuals or for collective action) could be useful. Perhaps there also exist great potential affordances that don't depend much on the actor: generally useful actions that people just aren't aware of.
For AI, two relevant kinds of actors are states (particularly America) and AI labs. One way to discover affordances is to brainstorm the kinds of actions particular actors can take, then find creative new plans within that list. Going less meta, I made lists of the kinds of actions states and labs can take that might be strategically significant, since such lists seem valuable and I haven't seen anything like them.
Kinds of things states can do that might be strategically relevant (or consequences or characteristics of possible actions):
- Regulate (and enforce regulation in their jurisdiction and investigate potential violations)
- Expropriate property and nationalize companies (in their territory)
- Perform or fund research (notably including via Manhattan/Apollo-style projects)
- Acquire capabilities (notably including military and cyber capabilities)
- Support particular people, companies, or states
- Disrupt or attack particular people, companies, or states (outside their territory)
- Affect what other actors believe at the object level
- Share information
- Make information salient in a way that predictably affects beliefs
- Express attitudes that others will observe
- Negotiate with other actors, or affect other actors' incentives or meta-level beliefs
- Make agreements with other actors (notably including contracts and treaties)
- Establish standards, norms, or principles
- Make unilateral declarations (as an international legal commitment) [less important]
Kinds of things AI labs can do, or choose not to do, that might be strategically relevant (or consequences or characteristics of possible actions):
- Deploy an AI system
- Pursue capabilities
- Pursue dangerous (and more or less alignable) systems
- Pursue systems that enable dangerous (and more or less alignable) systems
- Pursue weak AI that is mostly orthogonal to progress on dangerous systems, for a particular (strategically significant) task or goal
  - This could enable or abate catastrophic risks other than unaligned AI
- Do alignment (and related) research (or: decrease the alignment tax by doing technical research)
- Advance global capabilities
- Publish capabilities research
- Cause investment or spending on big AI projects to increase
- Advance alignment (or: decrease the alignment tax) in ways other than doing technical research
- Support and coordinate with external alignment researchers
- Attempt to align a particular system (or: try to pay the alignment tax)
- Interact with other labs
  - Coordinate with other labs (notably including coordinating to avoid dangerous systems)
  - Make themselves transparent to each other
  - Make themselves transparent to an external auditor
  - Merge
  - Effectively commit to share upsides
  - Effectively commit to stop and assist
  - Affect what other labs believe at the object level (about AI capabilities or risk in general, or regarding particular memes)
  - Negotiate with other labs, or affect other labs' incentives or meta-level beliefs
- Affect public opinion, media, and politics
  - Publish research
  - Make demos or public statements
  - Release or deploy AI systems
- Improve their culture or operational adequacy
  - Improve operational security
  - Affect attitudes of effective leadership
  - Affect attitudes of researchers
- Make a plan for alignment (e.g., OpenAI's); share it; update and improve it; and coordinate with capabilities researchers, alignment researchers, or other labs if relevant
- Make a plan for what to do with powerful AI (e.g., CEV or some specification of the long reflection); share it; update and improve it; and coordinate with other actors if relevant
- Improve their ability to make themselves (selectively) transparent
- Try to better understand the future, the strategic landscape, risks, and possible actions
- Acquire resources
  - E.g., money, hardware, talent, influence over states, status/prestige/trust
- Capture scarce resources
  - E.g., language data from language model users
- Affect other actors' resources
- Affect the flow of talent between labs or between projects
- Plan, execute, or participate in pivotal acts or processes
(These lists also exist on the AI Impacts wiki, where they may be improved in the future: Affordances for states and Affordances for AI labs. These lists are written from an alignment-focused and misuse-aware perspective, but prosaic risks may be important too.)
Maybe making or reading lists like these can help you find good tactics. But revolutionary affordances are necessarily not things that are already part of an actor's behavior.
Maybe making lists of relevant things relevant actors have done in the past would illustrate possible actions, build intuition, or aid communication.
This frame seems like a potentially useful complement to the standard approach of backchaining from goals to relevant actors' actions. And it seems good to understand the actions that should be items on lists like these (both understanding those list-items well and expanding or reframing the lists) so you can notice opportunities.
Intermediate goals
No great sources are public, but for illustrations of this frame see "Catalysts for success" and "Scenario variables" in Marius Hobbhahn et al.'s What success looks like (2022). On goals for AI labs, see Holden Karnofsky's Nearcast-based "deployment problem" analysis (2022).
An intermediate/instrumental goal is a goal that is useful because it promotes one or more final/terminal goals. ("Goal" sounds discrete and binary, like "there exists a treaty to prevent dangerous AI development," but goals should often be continuous, like "gain resources and influence.") Intermediate goals are useful because we often need more specific and actionable goals than "make the future go better" or "make AI go better."
Understanding what specifically it would be good for people to do is a bottleneck on people doing useful things. If the AI strategy community had greater strategic clarity, in terms of knowledge about the future and particularly about intermediate goals, it could better utilize people's labor, influence, and resources. Perhaps an overlapping strategy framing is finding or unlocking effective opportunities to spend money. See Luke Muehlhauser's A personal take on longtermist AI governance (2021).
It is also sometimes useful to consider goals concerning particular actors.
Threat modeling
For illustrations of threat modeling for the technical component of AI misalignment, see the DeepMind safety team's Threat Model Literature Review and Clarifying AI X-risk (2022), Sam Clarke and Sammy Martin's Distinguishing AI takeover scenarios (2021), and GovAI's Survey on AI existential risk scenarios (2021).
The goal of threat modeling is to deeply understand one or more risks in order to inform interventions. A great causal model of a threat (or class of possible failures) can let you identify points of intervention and determine what countering the threat would require.
A related project involves assessing all threats (in a certain class) rather than a particular one, to help account for and prioritize between different threats.
Technical AI safety research informs AI strategy via threat modeling. A causal model of (part of) AI risk can generate a model of AI risk abstracted for strategy, with relevant features made salient and irrelevant details black-boxed. This abstracted model gives us information, including necessary and sufficient conditions or intermediate goals for averting the relevant threats. These in turn can inform affordances, tactics, policies, plans, influence-seeking, and more.
Theories of victory
I'm not aware of great sources, but for an illustration of this frame see Marius Hobbhahn et al.'s What success looks like (2022).
Considering theories of victory is another natural frame for strategy: consider scenarios in which the future goes well, then find interventions to nudge our world toward those worlds. (Insofar as it is not clear what the future going well means, this approach also involves clarifying that.) To find interventions to make our world like a victorious scenario, I sometimes try to find necessary and sufficient conditions for the victory-making aspect of that scenario, then consider how to cause those conditions to hold.
Great threat-model analysis might be a good input to theory-of-victory analysis, clarifying the threats and what their solutions must look like. And it could be useful to consider scenarios in which the future goes well and scenarios in which it doesn't, then examine the differences between those worlds.
Tactics and policy development
Collecting progress on possible government policies, see GovAI's AI Policy Levers (2021) and GCRI's Policy ideas database.
Given a model of the world and high-level goals, we must figure out how to achieve those goals in the messy real world. For a given goal: what would cause success, which of those possibilities are tractable, and how might they become more likely to occur? Or: what are necessary and sufficient conditions for success, and how might those come about in the real world?
Memes & frames
I'm not aware of great sources on memes and frames in strategy, but see Jade Leung's How can we see the impact of AI strategy research? (2019). See also the academic literature on framing, e.g. Robert Entman's Framing (1993).
("Frames" in this context refers to the lenses through which people interpret the world, not the analytic, research-y frames discussed in this post.)
If certain actors held certain attitudes, they would make better decisions. One way to affect attitudes is to spread memes. A meme could be explicit agreement with a particular proposition; the attitude that certain organizations, projects, or goals are (seen as) shameful; the attitude that certain ideas are or are not sensible and respectable; or merely a tendency to pay more attention to something. The goal of meme research is to find good memes (memes that would improve decisions if widely accepted, or accepted by a particular set of actors, and that are tractable to spread) and to figure out how to spread them. Meme research is complemented by work actually causing those memes to spread.
For example, potential good memes in AI safety include things like "AI is powerful but not robust," and specifically "[specification gaming or Goodhart or distributional shift or adversarial attack] is a big deal." Perhaps "misalignment as catastrophic accidents" is easier to understand than "misalignment as powerseeking agents," or vice versa. And perhaps misuse risk is easy to understand and unlikely to be catastrophically misunderstood, but less valuable if spread.
A frame tells people what to notice and how to make sense of an aspect of the world. Frames can be internalized by a person or contained in a text. Frames for AI might relate to consciousness, Silicon Valley, AI racism, national security, or specific kinds of applications such as chatbots or weapons.
Higher-level research might also be useful, on topics like how to communicate ideas about AI safety, or even how to communicate ideas in general and how groups form beliefs.
This approach to strategy could also involve researching how to stifle bad memes, like perhaps "powerful actors are incentivized to race for highly capable AI" or "we need a Manhattan Project for AI."
Exploration, world-modeling, and forecasting
Sometimes strategy depends heavily on particular questions about the world and the future.
More generally, you can reasonably expect that increasing clarity about important-seeming aspects of the world and the future will inform strategy and interventions, even without thinking about specific goals, actors, or interventions. For AI strategy, exploration includes central questions about the future of AI and relevant actors, understanding the effects of possible actions, and perhaps also topics like decision theory, acausal trade, digital minds, and anthropics.
Constructing a map is part of many different approaches to strategy. This roughly involves understanding the landscape and finding analytically useful concepts, like reframing "victory requires causing AI systems to be aligned" as "it is necessary and sufficient to cause the alignment tax to be paid," so it is necessary and sufficient to reduce the alignment tax and to increase the amount of tax that would be paid such that the latter is greater.
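One rough way to write that reframing in symbols (my notation):

```latex
% Rough formalization: let T_align be the alignment tax and
% W the amount of tax that relevant labs would be willing to pay.
\text{victory via alignment}
  \iff \text{the alignment tax is paid}
  \iff W \ge T_{\text{align}}
```

So interventions can aim either to lower the tax or to raise willingness to pay it.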
One exploratory, world-model-y goal is a high-level understanding of the strategic landscape. One possible approach to this goal is making a map of relevant possible events, phenomena, actions, propositions, uncertainties, variables, and/or analytic nodes.
Nearcasting
Discussing nearcasting, see Holden Karnofsky's AI strategy nearcasting (2022). Illustrating nearcasting, see Karnofsky's Nearcast-based "deployment problem" analysis (2022).
Holden Karnofsky defines "AI strategy nearcasting" as
trying to answer key strategic questions about transformative AI, under the assumption that key events (e.g., the development of transformative AI) will happen in a world that is otherwise relatively similar to today's. One (but not the only) version of this assumption would be "Transformative AI will be developed soon, using methods like those AI labs focus on today."
When I think about AI strategy nearcasting, I ask:
- What would a near future in which powerful AI could be developed look like?
- In that possible world, what goals should we have?
- In that possible world, what important actions might relevant actors take?
  - And what facts about the world make those actions possible? (For example, some actions would require that a lab has certain AI capabilities, or that most people believe a certain thing about AI capabilities, or that all leading labs believe in AI risk.)
- In that possible world, what interventions are available?
- Relative to that possible world, how should we expect the real world to be different?
  - And how do those differences affect the goals we should have and the interventions available to us?
Nearcasting seems to be a useful tool for
- predicting relevant events concretely and
- forcing you to notice how you think the world will be different in the future and how that matters.
Leverage
I'm not aware of other public writeups on leverage. See also Daniel Kokotajlo's What considerations influence whether I have more influence over short or long timelines? (2020). Related concept: crunch time.
When doing strategy and planning interventions, what should you focus on?
A major subquestion is: how should you prioritize attention between possible worlds? Ideally you would prioritize working on the worlds where your work has the highest expected value, or something like the worlds with the greatest product of probability and how much better they would go if you worked on them. But how can you guess which worlds are high-leverage for you to work on? There are various reasons to prioritize certain possible worlds, both in reasoning about strategy and in evaluating possible interventions. For example, it seems higher-leverage to work on making AI go well conditional on human-level AI appearing in 2050 than in 3000: the former is more foreseeable, more affectable, and more neglected.
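In symbols, a crude version of this heuristic (my gloss, not a formula from any source):

```latex
% Crude expected-leverage heuristic for prioritizing work on a possible world w:
% weight w by its probability times the improvement your work would make if w obtains.
\text{Leverage}(w) \;\approx\; P(w) \times
  \big( V(w \mid \text{you work on } w) - V(w \mid \text{you don't}) \big)
```

The considerations below can be read as adjustments to these weights when the second factor is hard to estimate directly.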
We currently lack a good account of leverage, so (going less meta) I'll begin one for AI strategy here. Given a baseline of weighting possible worlds by their probability, all else equal, you should generally:
- Upweight worlds that you have more control over and that you can better plan for
- Upweight worlds with short-ish timelines (since others will exert more influence over AI in long-timelines worlds, since we have more clarity about the nearer future, and since we can revise strategies in long-timelines worlds)
- Account for future strategy research
  - For example, if you focus on the world in 2030 (or assume that human-level AI is developed in 2030), you may be deferring, not neglecting, some work on 2040
  - For example, if you focus on worlds in which important events happen without much advance warning or clearsightedness, you may be deferring, not neglecting, some work on worlds in which important events happen foreseeably
- Focus on what you can better plan for and influence; for AI, perhaps this means:
  - Short timelines
  - The deep learning paradigm continues
  - Powerful AI is resource-intensive
  - Maybe some propositions about risk awareness, warning shots, and world-craziness
- Upweight worlds where the probability of victory is relatively close to 50%
- Upweight more neglected worlds (think on the margin)
- Upweight short-timelines worlds insofar as there is more non-AI existential risk in long-timelines worlds
- Upweight analysis that better generalizes to or improves other worlds
- Notice the possibility that you live in a simulation (if that is decision-relevant; unfortunately, the practical implications of living in a simulation are currently unclear)
- Upweight worlds that you have better personal fit for analyzing
- Upweight worlds where you have more influence, if relevant
- Consider side effects of doing strategy work, including what you gain knowledge about, testing your fit, and gaining credible signals of fit
In practice, I tentatively think the biggest (analytically useful) considerations for weighting worlds beyond probability are generally:
- Short timelines
  - More foreseeable
  - More affectable
  - More neglected (by the AI strategy community)
    - Future people can work on the further future
    - The AI strategy field is likely to be bigger in the future
  - Less planning or influence exerted from outside the AI strategy community
- Fast takeoff
  - Shorter, less foreseeable a certain time in advance, and less salient to the world in advance
  - More neglected by the AI strategy community; the community would have a longer clear-sighted period to work on slow takeoff
  - Less planning or influence exerted from outside the AI strategy community
(But there are presumably diminishing returns to focusing on particular worlds, at least at the community level, so the community should diversify the worlds it analyzes.) And I'm most confused about
- Upweighting worlds where the probability of victory is closer to 50% (I'm confused about what the probability of victory is in various possible worlds),
- How leverage relates to variables like the total influence exerted to affect AI (the rest of the world exerting influence means you have less relative influence insofar as you are pulling the rope along similar axes, but some interventions are amplified by something like greater attention on AI) (and related variables like attention on AI and general craziness due to AI), and
- The probability and implications of living in a simulation.
A background assumption or approximation in this section is that you allocate research toward a world and the research is effective just if that world obtains. This assumption is somewhat crude: the impact of most research isn't so binary, fully effective in some possible futures and completely ineffective in the rest. And thinking in terms of influence over a world is crude: influence depends on the person and on the intervention. Nevertheless, reasoning about leverage in terms of worlds to allocate research toward might sometimes be useful for prioritization. And we might discover a better account of leverage.
Leverage considerations should include not just prioritizing between possible worlds but also prioritizing within a world. For example, it seems high-leverage to focus on important actors' blind spots and on certain important decisions or "crunchy" periods. And for AI strategy, it may be high-leverage to focus on the first few deployments of powerful AI systems.
Strategy work is complemented by
- actually executing interventions, especially causing actors to make better decisions,
- gaining resources to better execute interventions and improve strategy, and
- field-building to better execute interventions and improve strategy.
An individual's strategy work is complemented by informing the relevant community of their findings (e.g., for AI strategy, the AI strategy community).
In this post, I don't try to make an ontology of AI strategy frames, do comparative analysis of frames, or argue about the AI strategy community's prioritization between frames. But these all seem like reasonable things for someone to do.
Related sources are linked above where relevant; see also Sam Clarke's The longtermist AI governance landscape (2022), Allan Dafoe's AI Governance: Opportunity and Theory of Impact (2020), and Matthijs Maas's Strategic Perspectives on Long-term AI Governance (2022).
If I wrote a post on "Framing AI governance," it would substantially overlap with this list and would substantially draw on The longtermist AI governance landscape. See also Allan Dafoe's AI Governance: A Research Agenda (2018) and hanadulset and Caroline Jeanmaire's A Map to Navigate AI Governance (2022). I don't know whether an analogous "Framing technical AI safety" would make sense; if so, I would be excited about such a post.
Many thanks to Alex Gray. Thanks also to Linch Zhang for discussion of leverage and to Katja Grace, Eli Lifland, Rick Korzekwa, and Jeffrey Heninger for comments on a draft.