𝌎Self-Supervised Learning

Self-supervised learning (SSL) is a variant of supervised learning in which training samples provide their own "labels", in contrast to classic supervised learning, where labels are generated by an auxiliary process such as human labelers. A common form of self-supervised learning teaches a model to "fill in the blank", using obfuscated sections of training examples as ground truth. Self-supervised learning enables training on massive, naturally occurring datasets and, often, open-ended inference chains at runtime (such as autoregressive generation).
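The "fill in the blank" form can be sketched in a few lines: obfuscate one token of a raw sequence and keep the original as the label. This is a minimal illustration, not any particular library's API; the function name and `[MASK]` token are assumptions for the example.

```python
import random

def make_masked_sample(tokens, mask_token="[MASK]", rng=random.Random(0)):
    """Turn a raw token sequence into a self-supervised training pair:
    one token is obfuscated, and the original token serves as the label.
    No auxiliary labeling process is needed -- the data labels itself."""
    i = rng.randrange(len(tokens))
    masked = tokens[:i] + [mask_token] + tokens[i + 1:]
    return {"input": masked, "label": tokens[i]}

sample = make_masked_sample(["The", "answer", "is", "a", "question"])
```

The same sequence yields a different training pair for each choice of masked position, which is part of why natural text is such an abundant source of supervision.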


The assumptions of the supervised learning paradigm are:

  • The model is optimized to answer questions correctly

  • Tasks are closed-ended, defined by question/correct answer pairs

These are essentially the assumptions of oracle AI, as described by Bostrom and in subsequent usage.


Recall that the second supervised assumption is that "tasks are closed-ended, defined by question/correct answer pairs". GPT was trained on context-completion pairs. But the pairs do not represent closed, independent tasks, and the division into question and answer is merely indexical: in another training sample, a token from the question is the answer, and in yet another, the answer forms part of the question[17].

For example, the natural language sequence “The answer is a question” yields training samples like:

{context: “The”, completion: “ answer”},

{context: “The answer”, completion: “ is”},

{context: “The answer is”, completion: “ a”},

{context: “The answer is a”, completion: “ question”}
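The slicing above can be written as a short helper: every prefix of the sequence becomes a context, and the token that follows it becomes the completion. A minimal sketch (the function name and the pre-tokenized input are assumptions for illustration):

```python
def autoregressive_pairs(tokens):
    """Slice one token sequence into {context, completion} training samples,
    one per position: each prefix of the sequence predicts the next token."""
    return [
        {"context": "".join(tokens[:i]), "completion": tokens[i]}
        for i in range(1, len(tokens))
    ]

pairs = autoregressive_pairs(["The", " answer", " is", " a", " question"])
# pairs[0] == {"context": "The", "completion": " answer"}
# pairs[3] == {"context": "The answer is a", "completion": " question"}
```

Note that the token " answer" appears as a completion in the first pair and as part of the context in every later pair, which is the indexical point made above.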

Since questions and answers are of compatible types, we can, at runtime, sample an answer from the model, use it to construct a new question, and run this loop an indefinite number of times to generate arbitrarily long sequences that obey the model's approximation of the rule that links the training samples together. The "question" GPT answers is "what token comes next after {context}?" This can be asked interminably, because its answer always implies another question of the same type.
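The loop itself is only a few lines: append each sampled answer to the context and ask again. Below, `toy_model` is a hypothetical stand-in for sampling from a trained network's next-token distribution; it deterministically emits the next character of a fixed string, which is an assumption made purely so the sketch runs.

```python
def generate(model, context, n_steps):
    """Run the question -> answer -> new-question loop: each answer
    (a sampled token) is appended to the context, forming the next
    question of the same type."""
    for _ in range(n_steps):
        context = context + model(context)
    return context

# Toy "model": always answers with the next character of a fixed string.
# A real model would sample from a learned next-token distribution instead.
target = "The answer is a question"
toy_model = lambda ctx: target[len(ctx)] if len(ctx) < len(target) else "."
generate(toy_model, "The", 5)  # each answer becomes part of the next question
```

Nothing in the loop refers to where the original training samples ended: the model's output type matches its input type, so inference can continue for as many steps as we like.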

In contrast, models trained with classic supervised learning output answers of a different type than their questions: an answer cannot be recycled into a new question, so inference halts after a single step.