A True Name is a mathematical definition of something that is robust enough that it remains accurate (doesn't Goodhart) if used as an optimization target, or more generally, far out of distribution. An informal True Name (a weaker but still important form) is a name or description that efficiently and robustly invokes the intended meaning in semiotic physics, e.g. humans and GPTs tend to interpret it truly across diverse contexts.
Quotes about True Names
Figuring out the True Name of a thing, a mathematical formulation sufficiently robust that one can apply lots of optimization pressure without the formulation breaking down, is absolutely possible and does happen. That said, finding such formulations is a sufficiently rare skill that most people will not ever have encountered it firsthand; it’s no surprise that many people automatically assume it impossible.
This is (one framing of) the fundamental reason why alignment researchers work on problems which sound like philosophy, or like turning philosophy into math. We are looking for the True Names of various relevant concepts - i.e. mathematical formulations robust enough that they will continue to work as intended even under lots of optimization pressure.
— John Wentworth, Why Agent Foundations? An Overly Abstract Explanation
The construction of a universal magic system is a difficult task fraught with danger for one primary reason: misunderstanding. It requires identifying the True Names of every component of reality over which we wish to have control, or at least, the True Name of this process itself. Pointing optimizing power towards “good-enough” proxies (i.e. how most of prosaic alignment, such as RLHF, is done today) works well enough for sufficiently weak systems, but try and use these proxies as pointers for something too powerful and you run straight into Goodhart: when a measure becomes a target, it ceases to be a good measure.
True Names are a staple of much fantasy fiction; knowing someone or something’s True Name gives you power over it. This terminology is also not alien to alignment, thanks to John Wentworth; here they refer to, as he puts it, mathematical formulations which are sufficiently robust to arbitrary optimization pressure: targets at which we can point powerful systems and expect good things as a result.
This is more than a coincidental or nominative similarity; these two cases are in fact exactly the same. A superintelligence with the True Names sufficient to cover our reality (and more besides - but we’ll get to that later) is exactly a magic system which can affect those things according to our wills. When this system does not have a True Name which encapsulates something, that thing will effectively be parsed as free energy to be sacrificed to the god of optimization, bartered away to whatever arbitrary inefficiencies the system identifies in pursuit of local reward maxima according to the other True Names it’s been given (or, lacking those, the lossy proxies it’s been given). This is the essence of the problem - power given without understanding. If we want to create a universal magic system, then we must do better, build something that works without inadvertent caveats.
The problem of aligning AI is the problem of aligning a magic system to the True Names of the universe.
— Gaspode, Welcome to the Dreamtime
Thankfully, I didn’t need to make up a word, or even look too far afield. Simulators have been spoken of before in the context of AI futurism; the ability to simulate with arbitrary fidelity is one of the modalities ascribed to hypothetical superintelligence. I’ve even often spotted the word “simulation” used in colloquial accounts of LLM behavior: GPT-3/LaMDA/etc described as simulating people, scenarios, websites, and so on. But these are the first (indirect) discussions I’ve encountered of simulators as a type creatable by prosaic machine learning, or of the notion of a powerful AI which is purely and fundamentally a simulator, as opposed to merely one which can simulate.
A fun way to test whether a name you’ve come up with is effective at evoking its intended signification is to see if GPT, a model of how humans are conditioned by words, infers its correct definition in context.
Types of AI
Agents: An agent takes open-ended actions to optimize for an objective. Reinforcement learning produces agents by default. AlphaGo is an example of an agent.
Oracles: An oracle is optimized to give true answers to questions. The oracle is not expected to interact with its environment.
Genies: A genie is optimized to produce a desired result given a command. A genie is expected to interact with its environment, but unlike an agent, the genie will not act without a command.
Tools: A tool is optimized to perform a specific task. A tool will not act without a command and will not optimize for any objective other than its specific task. Google Maps is an example of a tool.
Simulators: A simulator is optimized to generate realistic models of a system. The simulator will not optimize for any objective other than realism, although in the course of doing so, it might generate instances of agents, oracles, and so on.
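The probe above can be sketched as a short script: embed the candidate name at the end of a list of parallel definitions and see whether a language model's continuation matches the intended meaning. This is a minimal sketch, not from the original post; `complete` is a hypothetical hook that you would wire to whatever language-model API you have access to.

```python
# Naming probe: leave the candidate term ("Simulators:") dangling at the end
# of a list of parallel definitions, then check whether a language model's
# continuation evokes the intended meaning (here: optimized for realism,
# not for an external objective).

PROBE = """Types of AI

Agents: An agent takes open-ended actions to optimize for an objective.

Oracles: An oracle is optimized to give true answers to questions.

Genies: A genie is optimized to produce a desired result given a command.

Tools: A tool is optimized to perform a specific task.

Simulators:"""


def complete(prompt: str) -> str:
    """Hypothetical hook: replace with a real language-model call."""
    raise NotImplementedError("wire this to your LM provider")


def name_evokes_intended_meaning(continuation: str) -> bool:
    # A crude check on the continuation: does the model define a simulator
    # in terms of realistic modeling rather than goal pursuit?
    text = continuation.lower()
    return "realis" in text or "simulat" in text
```

The prompt mirrors the list above and ends mid-entry, so the model's continuation reveals what definition the bare name evokes in context.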