The construction of a universal magic system is a difficult task, fraught with danger for one primary reason: misunderstanding. It requires identifying the True Names of every component of reality over which we wish to have control, or at least the True Name of this process itself. Pointing optimizing power at “good-enough” proxies (which is how most prosaic alignment, such as RLHF, is done today) works well enough for sufficiently weak systems. But try to use these proxies as pointers for something too powerful and you run straight into Goodhart's law: when a measure becomes a target, it ceases to be a good measure.
True Names are a staple of much fantasy fiction: knowing someone or something’s True Name gives you power over it. The terminology is not alien to alignment either, thanks to John Wentworth. There, the term refers to, as he puts it, mathematical formulations which are sufficiently robust to arbitrary optimization pressure: targets at which we can point powerful systems and expect good things as a result.
This is more than a coincidental or nominative similarity; the two cases are in fact exactly the same. A superintelligence with the True Names sufficient to cover our reality (and more besides, but we’ll get to that later) is exactly a magic system which can affect those things according to our wills. When this system lacks a True Name that encapsulates something, that thing will effectively be parsed as free energy to be sacrificed to the god of optimization. It will be bartered away to whatever arbitrary inefficiencies the system identifies in pursuit of local reward maxima, according to the other True Names it holds (or, lacking those, the lossy proxies it has been given). This is the essence of the problem: power given without understanding. If we want to create a universal magic system, then we must do better and build something that works without inadvertent caveats.
The problem of aligning AI is the problem of aligning a magic system to the True Names of the universe.
— Gaspode, Welcome to the Dreamtime