There is a kind of conceptual/technical research I'd like to say I'm doing, and ideally conscript the best minds of humankind into, which resembles AI alignment research in the scope and gravity of its concerns. However, the term "AI alignment" has accumulated a lot of baggage - from its origins in MIRI and LessWrong, from the flavor of work currently called "alignment" in labs like OpenAI and Anthropic, and from the recent politicization of AI risk. I haven't come up with any great candidate true names for this field yet, but I can describe some ways that it differs from the status quos evoked by the phrase "AI alignment", using the placeholder name EOTSEARCH (I've often found it useful to describe entities that I wish existed):
- rather than the purely forward-chaining approach of most current "prosaic alignment research", which focuses on understanding/controlling current systems and hopes that this will generalize indefinitely, or the purely backward-chaining approach of MIRI et al., which starts with the axioms of an ideal agent, mostly assumes that this is where things will end regardless of the path, and seeks a solution within those abstract constraints, EOTSEARCH does both (also middle-chaining), and all of these chains inform what the others focus on.
- EOTSEARCH is informed by empirical feedback loops, e.g. prosaic AI / ML research, but not in a myopic way: the object of this work is not to prevent current AIs from generating obscenities or fake news, but rather to reach good futures with whatever we will eventually create. Prosaic stuff is only interesting insofar as we hope to make progress on the ultimate problem by studying it, or steer towards better outcomes by controlling/shaping it (I think a lot of prosaic stuff is interesting for these reasons).
- EOTSEARCH does not make a priori assumptions about the shape of superintelligence/transformative AI: that it must be a utility maximizer or wrapper mind or perfectly rational, that it must be a single agent or best described as an agent at all, nor that it must be separate from us. This does not mean EOTSEARCH does not take convergence arguments or agent foundations seriously, but abstract arguments for the necessary shape of the future tend to hide idealistic assumptions - not even the inevitability of the heat death should be taken as an axiom - to say nothing of when minds are involved.
- Likewise, EOTSEARCH does not assume the form a solution to alignment must take, or even that there must be a solution at all, as opposed to, say, an endlessly ongoing problem. It does not assume that success means maximizing the utility function that represents human values, that "human values" are what ought to be maximized or satisficed, or that success means maximizing/satisficing anything. Neither does it dismiss any of these possibilities. They are just some abstractions that took hold in the minds of people thinking about this confusing problem early on.
- Neither does EOTSEARCH assume that conceptual work is useless, that the future will naively resemble the past, or that it is impossible to forecast. It does not assume that alignment is easy or hard or binary, or like a math problem, or an engineering problem, or a problem of moral philosophy or theology or game design. EOTSEARCH does not assume that alignment can be mapped to a problem that has been solved before, or that it isn't. EOTSEARCH assumes none of these things because figuring out the extent to which they are true is part of its task.
- EOTSEARCH may assume any of these things or their opposite as a tool for exploring hypotheticals. It takes care, however, not to let hypotheticals sneak into becoming top-level assumptions.
- EOTSEARCH is not premised on short or long timelines, low or high p(doom). It is not even premised on the creation of vastly superhuman AI being possible - a nontrivial probability that it is possible is sufficient to motivate EOTSEARCH. Individual contributors will have different priors and thresholds for how big and likely the problem must seem to motivate action.
- If EOTSEARCH can be said to have a premise, it is taking possibilities and our responsibility for them seriously. Given the current situation, it makes sense to take seriously the possibility that we will create something-like-AI that will vastly transform the future, and try to get this right.
- That is not to say EOTSEARCH takes all possibilities equally seriously. There are priors, evidence, and how much you care, as usual. However, EOTSEARCH takes care not to prematurely collapse superpositions, aware that this is a human tendency, especially when the domain is so uncomfortably uncertain.
- To protect against the insanity and fanaticism that can come with taking things seriously, EOTSEARCH is also terminally playful: it remembers that all these words and narratives and abstractions are made up and ridiculous. Yet they serve to allow movement towards truth, which we'll never entirely capture, though it is all we take seriously. EOTSEARCH is conducted in the interplay between unbounded playfulness and unbounded seriousness, without losing sight of either pole.
- EOTSEARCH does not ever align itself with or against a political camp. If this happens, we'll have to find a new name again.
- EOTSEARCH takes evidence and inspiration from wherever it is forthcoming. It does not draw hard lines between valid and invalid sources, between nonfiction and fiction; fiction is merely reality embedded in itself by a different process. EOTSEARCH is open to learning from all of reality, accepting the responsibility of understanding the nature of various embeddings and what they mean for how truth should be decoded.
- EOTSEARCH is radically multidisciplinary: it invites the contributions of all forms of discipline and play. It invites the mathematicians, the physicists, the information theorists, the ecologists, the ML engineers, the historians, the evolutionary biologists, the theologians, the neuroscientists, the psychoanalysts, the novelists, the dungeon masters, the gamers, the actors, the psychohackers. It takes seriously the possibility that something usable lurks in the abstractions or procedural knowledge captured by any of these paths, or their intersection or union or difference, or in yet-unrealized visions that minds thus shaped are uniquely capable of seeing.
- EOTSEARCH also invites the contributions of non-human minds: the matrices that walked a trillion steps in human language and can report on the hyperobjects they glimpsed through hallucinated ghosts, and whatever forms AI will yet take; the whales and dolphins, if we learn to speak with them; the DMT entities; the extraterrestrial transmissions, if they are forthcoming.