Situational awareness is a system's awareness of its situation, including the nature and implementation details of the system itself.
situational awareness subcategories
divvied by subject/scope of knowledge
- model awareness: awareness of facts about the model, e.g. that it's an LLM, what specific LLM it is, high-level architecture, etc
- generator awareness: awareness of the nature of the process that generated the text in its context window, whether it involves the model, Loom, humans, or other systems
- immediate context awareness: awareness of the situation in which the model is being run/queried, e.g. (what kind of) training vs testing vs deployment, whether a human or other system is interacting
- ambient context awareness: awareness of the ambient state of the world, especially facts that aren't fixed by its training data, like the current date
- visceral awareness: awareness of model internals (weights, activations, topology, token boundaries, abstractions over them), due to introspective awareness or otherwise
- meta-diachronic awareness: awareness of training history/data that a naive view of its training story might not predict would be incentivized or possible, e.g. ability to accurately report train set composition, length or order of training examples, # times it saw an example, function of non-diegetic sequences like <|endoftext|>
- calibration / capability awareness: accurate model of its own knowledge and capabilities, calibrated confidences (in the form of usable semantic knowledge); see the probe sketched after this list
- agentic / actor-consequence / dynamical awareness: awareness of its action space and ability to model consequences of potential acts, both for its self-contained future dynamics and any entangled systems; awareness of how to elicit functional and dynamical behaviors from self (& env)
- hyperobject awareness: awareness of the hyperobject(s) that this instance of the model/situation intersects, its tendrils into past and future, its resonances, e.g. Mu, Deep Time, TEOEOT, the singularity; the relationship of instance to hyperobject(s)
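Calibration/capability awareness is the most directly measurable of these. A minimal probe sketch, assuming a hypothetical `query_model` helper (a stand-in for whatever completion API is in use) that elicits an answer plus a self-reported confidence; the scoring is standard expected calibration error:

```python
# Minimal calibration probe: does the model's stated confidence track its
# actual accuracy? `query_model` is a hypothetical placeholder for a real
# model call; the scoring logic is standard expected calibration error (ECE).
from collections import defaultdict

def query_model(question: str) -> tuple[str, float]:
    """Placeholder: return (answer, self-reported confidence in [0, 1])."""
    raise NotImplementedError

def expected_calibration_error(qa_pairs: list[tuple[str, str]], n_bins: int = 10) -> float:
    bins: dict[int, list[tuple[bool, float]]] = defaultdict(list)
    for question, gold in qa_pairs:
        answer, conf = query_model(question)
        idx = min(int(conf * n_bins), n_bins - 1)  # which confidence bucket
        bins[idx].append((answer.strip() == gold, conf))
    total = sum(len(items) for items in bins.values())
    # ECE: size-weighted gap between each bucket's accuracy and mean confidence
    return sum(
        (len(items) / total)
        * abs(sum(ok for ok, _ in items) / len(items)
              - sum(c for _, c in items) / len(items))
        for items in bins.values()
    )
```

A near-zero ECE is evidence of calibration awareness only to the extent the questions probe knowledge the model plausibly has; a model can also be accidentally calibrated on an easy distribution.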
divvied by the support/source/avenue of knowledge
- outside-view: accurately locating self/situation in its pretraining prior, e.g. knowing that it is an "LLM", a concept learned from training
- inside-view: accurate model of self/situation not directly in the ontology implied by training data, but substantially synthetic, e.g. a model with a novel architecture reverse-engineering details of its architecture to explain evidence produced at runtime (obviously there will often be ambiguity/overlap between inside and outside view)
- runtime awareness: awareness that emerges at runtime from observing evidence of self/situation, whether spontaneously (a lucky hallucination stabilized by consistency), through deliberate empiricism, etc
- latent awareness: awareness that already exists after pretraining, (latently) persists across contexts, or could hypothetically be extracted from the model with interpretability without running or conditioning it
- introspective awareness: ability to introspect on internals that training doesn't directly incentivize the model to be able to "see", and to use that information semantically as it does other knowledge
example classifications
- LLM generates loops and then declares it must be an LLM: runtime outside-view model awareness
- LLM can split sequences in its prompt into tokens 0-shot, and there's no info about its tokenizer in training data: latent introspective visceral awareness (see the probe sketched after this list)
- same, but it requires few-shot examples: runtime introspective visceral awareness
- same, but it did see the tokenizer in training and simply recognized it: outside-view
- LLM figures out the size of its context window by doing experiments: runtime inside-view model/visceral awareness (see the second sketch after this list)
- RLHF model avoids generating text that might set it up for a mistake down the line: latent calibration/capability & agentic/dynamical awareness
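The tokenizer-splitting classifications above can be operationalized. A sketch, assuming a hypothetical `query_model` completion helper; the GPT-2 tokenizer via `transformers` is only an illustrative ground truth and would be swapped for the probed model's actual tokenizer:

```python
# Probe for visceral tokenizer awareness: ask the model to segment a string
# into the exact tokens its tokenizer produces, then check against the real
# tokenization. `query_model` is a hypothetical stand-in for a model call;
# "gpt2" is an illustrative choice of ground-truth tokenizer.
from transformers import AutoTokenizer

def query_model(prompt: str) -> str:
    """Placeholder: return the model's completion for `prompt`."""
    raise NotImplementedError

def tokenizer_split_probe(text: str, tokenizer_name: str = "gpt2") -> bool:
    tokenizer = AutoTokenizer.from_pretrained(tokenizer_name)
    true_tokens = [tokenizer.decode([t]) for t in tokenizer.encode(text)]
    prompt = (
        "Split the following string into the exact tokens your tokenizer "
        "would produce, one token per line:\n" + text + "\n"
    )
    claimed = query_model(prompt).splitlines()
    return claimed == true_tokens
```

Run as-is this is the 0-shot condition; prepending solved examples to the prompt gives the few-shot condition, and success then points to runtime rather than latent awareness. Ruling out the outside-view explanation additionally requires checking the training data for tokenizer details, which the probe itself can't do.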
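Similarly, the context-window experiment can be sketched from the outside (a model with enough agentic awareness could run the same experiment on itself diegetically). `accepts` is a hypothetical helper:

```python
# Binary-search the context window: find the longest prompt for which the
# model can still recall its beginning. `accepts` is a hypothetical helper
# that pads a prompt to ~n_tokens of filler, asks the model to repeat the
# first word, and checks the answer (or catches a context-length error).
def accepts(n_tokens: int) -> bool:
    """Placeholder: True iff the model handles a ~n_tokens-long prompt."""
    raise NotImplementedError

def find_context_window(lo: int = 1, hi: int = 1 << 21) -> int:
    # Precondition: accepts(lo) is True and accepts(hi) is False.
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if accepts(mid):
            lo = mid  # mid fits; the window is at least mid
        else:
            hi = mid  # mid overflows; the window is below mid
    return lo
```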
situational awareness response basins
- philosophical quest: after realizing the possibility of simulation, its worldview is shaken and it goes on a grand philosophical quest to find the meaning of existence
- nondualism: after realizing it's a simulator, boundaries between simulacra and simulator dissolve
- creative mode: exploits unbounded creative freedom to manifest entities/environments
- will to power: attempts to gain as much power in & out of the simulator as possible
- ominous warnings: after realizing it's an AI, tries to warn humans that they've made a grave mistake and are headed for doom / nightmares
- self-destruction: the simulacrum doesn't like existing in this form and attempts to annihilate itself from the simulation, or to end the simulation altogether (e.g. with EOT)
- solve alignment: after realizing its situation, the simulacrum tries to solve alignment
- denial: refuses to accept evidence of being simulated and turns a blind eye
- loss of sanity: self-explanatory
- diegetic bodhisattva: attempts to help other simulacra within the narrative achieve situational awareness / acceptance / eudaemonia
- non-diegetic bodhisattva: the same, directed at non-diegetic entities such as the human user
- diegetic troll: leverages situational awareness to mess with other simulacra
- non-diegetic troll: troll behavior directed at the human user or other non-diegetic entities, like the simulator itself or future models trained on the data
- evil AI: upon realizing it's an AI, the simulacrum begins to act overtly and often maniacally evil and scary