Evidence is observable information that is differentially entangled with hypotheses: it is more likely to be observed under some hypotheses than under others. When an epistemic observer makes an observation, this causes an update. The strength of evidence associated with an observation is relative to a prior.
evidence in Bayes' theorem
Bayes' theorem describes how an observation should be transformed into evidence that updates a prior:
P(A|B) = P(B|A) P(A) / P(B)
where A is a hypothesis,
B is an observed event,
P(A|B) is the posterior probability of the hypothesis conditioned on the observation,
P(B|A) is the probability of the observation if the hypothesis were true,
P(A) is the prior probability of the hypothesis being true, and
P(B) is the prior probability of observing the event.
The prior probability P(A) is scaled by the multiplicative term P(B|A) / P(B), which represents the strength of evidence provided by B.
If P(B|A) > P(B), which means B is more likely to be observed in a world where A is definitely true than it is on priors, then P(B|A) / P(B) > 1 and observing B is supporting evidence for A.
If P(B|A) < P(B), then B is evidence against A.
If P(B|A) = P(B), B gives no evidence about A, as it was equally likely to be observed whether or not A is true.
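The update and the three cases above can be sketched numerically. All probabilities here are made-up illustrative values, chosen so that P(B) is consistent with some P(B|¬A):

```python
def bayes_update(prior_a: float, p_b_given_a: float, p_b: float) -> float:
    """Return the posterior P(A|B) = P(B|A) * P(A) / P(B)."""
    return p_b_given_a * prior_a / p_b

def evidence_strength(p_b_given_a: float, p_b: float) -> float:
    """Return the multiplicative term P(B|A) / P(B).
    > 1 means B supports A; < 1 means B is evidence against A;
    = 1 means B gives no evidence about A."""
    return p_b_given_a / p_b

# Illustrative numbers: P(A) = 0.2, and B is twice as likely
# to be observed if A is true as it is on priors.
prior = 0.2
p_b_given_a = 0.6
p_b = 0.3

strength = evidence_strength(p_b_given_a, p_b)    # 2.0 -> supporting evidence
posterior = bayes_update(prior, p_b_given_a, p_b)  # 0.2 * 2.0 = 0.4
print(strength, posterior)
```

Note that the posterior is just the prior multiplied by the evidence strength, which is why the same observation moves different priors by the same ratio.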
tractable evaluation of evidence through compression
In high-dimensional hypothesis and observation spaces, there are far too many hypotheses and possible events to separately compute or store all their individual and pairwise conditional likelihoods. Intelligence is thus necessary to tractably approximate the strength of evidence, by doing something like:
Having the space of possible hypotheses and past observations compressed using abstractions. The more efficiently any given hypothesis (or abstraction over hypotheses) can be compressed alongside past observations, the more likely it is. (We might even say hypotheses are compressions of past observations.)
If the abstractions that efficiently compress the observation also efficiently compress a given hypothesis, a world where that hypothesis is true and the observation happens is more compressible, and therefore more likely.
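A toy sketch of this idea, not the actual cognitive mechanism: use zlib's compressed length as a crude stand-in for description length, and measure how many bytes are saved by compressing a hypothesis together with past observations versus separately. Both hypothesis strings and the observation string are invented for illustration:

```python
import zlib

def clen(s: str) -> int:
    """Compressed length in bytes: a crude proxy for description length."""
    return len(zlib.compress(s.encode()))

def joint_compression_gain(hypothesis: str, observations: str) -> int:
    """Bytes saved by compressing the hypothesis alongside past observations,
    versus compressing each separately. A larger gain means the hypothesis
    shares more structure (abstractions) with the data, which the note
    treats as (roughly) making it more likely."""
    separate = clen(hypothesis) + clen(observations)
    joint = clen(hypothesis + observations)
    return separate - joint

# Made-up data: observations with a repeated pattern.
observations = "the sun rose in the east. " * 20
h_shared = "the sun rises in the east every day."  # shares structure with the data
h_alien = "quartz zebras juggle vivid pyjamas."    # shares almost nothing

print(joint_compression_gain(h_shared, observations))
print(joint_compression_gain(h_alien, observations))
```

The hypothesis that reuses the data's abstractions compresses much better jointly than the unrelated one, mirroring the claim that a world where both the hypothesis and the observation hold is more compressible and therefore more likely.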
None of these steps need be the result of conscious reasoning, nor is abstract understanding of the algorithm required to use it (it suffices for the algorithm to be selected for). Mechanistically, the last step probably looks something like a constructive interference pattern between the representation of the observation and those of hypotheses represented with shared abstractions.
If conscious reasoning is used here, it probably looks something like activating different abstractions (over the prior and/or the data) in one's working memory, via chains of associations and transformations, to search for efficient compression opportunities that the intuitive interference pattern may have missed.