The brain trains on magnitude and acts on sign.
That is to say, there are two different kinds of "module" relevant to this problem as you described it, but they're not "RL and other"; they're both "other". The learning parts are not, precisely speaking, doing reinforcement learning, at least not by the algorithm you described. They're learning the whole map of value, like a topographic map. Then the acting parts find themselves on the map and figure out which way leads upward toward better outcomes.
More precisely then: The brain learns to predict value and acts on the gradient of predicted value.
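If a concrete sketch helps, here's a toy version of that split in Python. All the states and numbers are made up, and it's an illustration of the idea rather than a model of any actual circuit: the learning part fills in a signed value map from experience, and the acting part just climbs that map locally.

```python
import numpy as np

# Toy illustration (invented numbers, not a brain model): a "learning part"
# that fills in a signed value map over states, and an "acting part" that
# climbs the learned map locally.
rng = np.random.default_rng(0)
n_states = 20
xs = np.arange(n_states)

# True value field: a "hungry lion" pit around state 3 and a "ripe banana"
# peak around state 15, spread out so the map has a usable slope everywhere
# (think of being able to smell both from a distance).
true_value = 3.0 * np.exp(-((xs - 15) / 4.0) ** 2) - 5.0 * np.exp(-((xs - 3) / 4.0) ** 2)

learned_value = np.zeros(n_states)

def learn(state, outcome, lr=0.5):
    """Update the value map toward the experienced outcome. Big positive and
    big negative outcomes both move the map a lot; near-zero outcomes barely
    register ("trains on magnitude")."""
    learned_value[state] += lr * (outcome - learned_value[state])

def act(state):
    """Move to whichever of {left, stay, right} has the highest predicted
    value, i.e. follow the sign of the local slope of the learned map
    ("acts on sign" / acts on the gradient)."""
    candidates = [max(state - 1, 0), state, min(state + 1, n_states - 1)]
    return max(candidates, key=lambda s: learned_value[s])

# Learning phase: wander around and record what each state was worth.
for _ in range(500):
    s = rng.integers(n_states)
    learn(s, true_value[s])

# Acting phase: starting between the lion and the banana, gradient climbing
# on the *learned* map walks away from the lion and settles on the banana.
state = 8
for _ in range(12):
    state = act(state)
print("ended at state", state, "predicted value", round(float(learned_value[state]), 2))
```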
The learning parts are trying to find both opportunities and threats, but not unimportant, mundane, static facts. This is why, for example, people are very good at remembering and obsessing over intensely negative events that happened to them -- which they would not be able to do in the RL model the post describes! We're also OK at remembering intensely positive events that happened to us. But ordinary observations of no particular value mostly make no lasting impression. You could test this with a series of three experiments: in each, flash several random emoji on a screen, and each time a specific emoji appears, either (A) penalize the subject, such as with a shock; (B) reward the subject, such as with sweet liquid when they're thirsty; or (C) give the subject a stimulus of no significant magnitude, positive or negative, such as changing the pitch of a quiet ongoing buzz they were not told was relevant. I'd expect subjects in both conditions A and B to reliably identify the key emoji, whereas I'd expect quite a few subjects in condition C to miss it.
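If you want to see why I'd expect that pattern, here's a back-of-the-envelope sketch assuming a single rule where association strength scales with the magnitude of the outcome, not its sign. All the numbers (pairings, noise, detection threshold) are invented purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

def trace_strength(outcome_magnitude, n_pairings=30, noise=0.5):
    """Association strength after n pairings: each pairing adds an amount
    proportional to |outcome|, plus noise from everything else on screen."""
    return n_pairings * outcome_magnitude + rng.normal(0, noise * np.sqrt(n_pairings))

# Hypothetical magnitudes for the three conditions described above.
conditions = {
    "A: shock (|value| = 1.0)": 1.0,
    "B: sweet liquid (|value| = 1.0)": 1.0,
    "C: irrelevant pitch change (|value| = 0.05)": 0.05,
}

detection_threshold = 3.0  # strength needed to reliably report the key emoji
for name, magnitude in conditions.items():
    strengths = [trace_strength(magnitude) for _ in range(100)]
    detected = np.mean([s > detection_threshold for s in strengths])
    print(f"{name}: key emoji identified by {detected:.0%} of simulated subjects")
```

With these made-up parameters, A and B come out essentially perfect while a large fraction of C-condition subjects fall below threshold, which is the prediction above.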
By learning associations with a degree of value, whether positive or negative, it's possible to then act on the gradient in pursuit of whichever available option has the highest value. This works reliably, and it means we can not only avoid hungry lions and seek nice ripe bananas, but also compare two negatives or two positives and choose appropriately: like whether you jump off a dangerous cliff to avoid the hungry lion, or whether you eat the nice ripe banana yourself or share it with your lover to your mutual delight. The gradient can be used whether we're in a good situation or a bad one. You could test this by adapting the previous experiment: associate multiple emoji with stimuli of various values (big shock, medium shock, little shock, plain water, slightly sweet water, sweeter water, various pitch changes in a background buzz), show two screens each with several random emoji, and have the subject receive the effect of the first screen unless they tap the second. I'd expect subjects to learn to reliably act to get the better of the two options, regardless of sign, and to be most reliable when the magnitude difference is large.
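The choice rule itself is almost trivial once the values are learned; here's a sketch, with hypothetical value numbers standing in for the stimuli above.

```python
# Compare the two options' learned (signed) values and take the better one.
# The numbers are hypothetical stand-ins for the stimuli listed above.
learned_values = {
    "big shock": -3.0, "medium shock": -2.0, "little shock": -1.0,
    "plain water": 0.5, "slightly sweet water": 1.0, "sweeter water": 2.0,
    "pitch change": 0.0,
}

def choose(first_screen, second_screen):
    """Tap the second screen only if its predicted value beats the first's,
    i.e. act on the sign of the value difference."""
    if learned_values[second_screen] > learned_values[first_screen]:
        return second_screen
    return first_screen

# The same rule handles two negatives, two positives, or a mix:
print(choose("big shock", "little shock"))              # little shock (lesser evil)
print(choose("slightly sweet water", "sweeter water"))  # sweeter water
print(choose("medium shock", "plain water"))            # plain water
```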
For an alternative way of explaining this situation, see Fox's comment, which I endorse.
OK, now to finally get around to motivated reasoning. The thoughts that will be promoted to your attention for action are those that are predicted to lead to the best value. You can roughly separate that into two aspects: "salience = probability of being right * value achieved if right". Motivated reasoning happens when the "value achieved if right" dominates the "probability of being right". And, well, that's pretty much always the case for abstract issues where we don't get clear feedback on probabilities. The solution for aspiring skeptics is to heap social rewards on being right and on using methods that help us be more right. Or to stick to less abstract claims. You could test this by again adapting the emoji experiment: instead of a certain reward or penalty, deliver it with varying probabilities.
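To put that decomposition into concrete (made-up) numbers:

```python
# Toy numbers for "salience = P(being right) * value achieved if right".
# Both candidate thoughts and their numbers are invented for illustration.
def salience(p_right, value_if_right):
    return p_right * value_if_right

thoughts = {
    "comforting conclusion":    (0.2, 10.0),  # probably wrong, but great if true
    "uncomfortable conclusion": (0.8, 1.0),   # probably right, but low payoff
}

for name, (p, v) in thoughts.items():
    print(f"{name}: salience = {salience(p, v):.1f}")

# The comforting thought wins attention (2.0 vs 0.8) despite being less
# likely. Adding value to *being right* (social reward for accuracy, or
# concrete claims with fast feedback) changes which thought wins.
```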
Source: I trained monkeys to do neuroscience experiments.
Are there any books or sources you might point to that go into more detail on these two different maps?
I'm wondering particularly how they are arranged, for example, and how their communication is coordinated.
Is that value map always operating? How stable is it? Is there some other map for mundane facts, separate from the value map?
Thanks in advance :)