39 Comments

Not sold on the "visual-cortex-is-not-a-reinforcement-learner" conclusion. If the objective is to maximize total reward (the reinforcement learning objective), then surely having your day ruined by spotting a tiger is better than ignoring the tiger and having your day much more ruined by being eaten by said tiger. (i.e., the visual cortex is "clever" and incurs a small cost now in order to save you a big cost later.) Total reward is the same reason humans undertake activities with delayed payoffs.
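The arithmetic behind this can be sketched in a few lines. This is my own toy illustration, not anything from the post: the reward values and tiger probability are made up, and the point is only that a guaranteed small cost beats a rare huge one in expectation.

```python
# Toy sketch (illustrative numbers I made up): compare the expected total
# reward of "notice the tiger" vs. "ignore the tiger".
ALERT_COST = -1        # small cost: your day is ruined by spotting the tiger
EATEN_COST = -1000     # large cost: being eaten by said tiger
P_TIGER = 0.01         # assumed probability the stimulus really is a tiger

def expected_reward(notice_tiger: bool) -> float:
    """Expected total reward of noticing vs. ignoring a possible tiger."""
    if notice_tiger:
        return ALERT_COST            # always pay the small alarm cost
    return P_TIGER * EATEN_COST      # occasionally pay the huge cost

assert expected_reward(True) > expected_reward(False)   # -1 > -10
```

So even with a 99% false-alarm rate, the alarm-prone policy has higher total reward, which is the commenter's point about delayed/avoided payoffs.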


Sounds like a plausible experiment would be something like: have somebody show you pictures of circles. You report whether each one looks like a circle or a square (be honest!). Each time you say circle, the person gives you an electric shock. See whether, after long enough, you start genuinely seeing squares.

My guess is this never happens - do you guess the opposite?
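For what it's worth, the prediction falls out of even a toy model of the setup. The following is my own bandit-style sketch (the numbers and the model are assumptions, not anything from the thread): reward shapes the *report*, but nothing in the learning loop has access to the percept, so the percept can't change.

```python
import random

random.seed(0)

# Toy sketch (model and values assumed by me): a reward-driven reporting
# policy sits on top of a fixed perceptual signal. Shocks can change what
# the subject *says*, but nothing in this loop touches what the subject sees.
q = {"circle": 0.0, "square": 0.0}   # action values for the verbal report
LEARNING_RATE = 0.1

for _ in range(200):
    percept = "circle"               # the visual system keeps seeing a circle
    # epsilon-greedy choice of what to report
    if random.random() > 0.1:
        report = max(q, key=q.get)
    else:
        report = random.choice(list(q))
    reward = -1.0 if report == "circle" else 0.0   # shock for saying "circle"
    q[report] += LEARNING_RATE * (reward - q[report])

assert q["square"] > q["circle"]     # the agent learns to *say* "square"
assert percept == "circle"           # the percept itself never changed
```

The reporting policy converges on "square" while the perceptual variable is untouched, which is one way of cashing out "you'd lie, not hallucinate".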

Comment deleted (Feb 1, 2022)

I'm not sure why you're bringing up trolley problems. Phil and I have different scientific theories about how the brain works; it's hardly unfair to test them by experiment.

Comment deleted (Feb 1, 2022, edited)

I'd sign up for it. I'm curious what would happen: I don't think I would start seeing squares, but it would be nice if that was confirmed.


What if this reinforcement learning could only happen during a limited developmental window? Would you be willing to sign up your toddler?


I just did the experiment on myself (using "hitting myself very hard" as a standin for electric shocks) and got the predicted result.

I don't think this was necessary (except in order to win this argument) though - I think it's a *good* thought experiment, in the sense that just clarifying what the experiment would be makes everyone agree on how it would turn out. Essentially I was asking Phil "How does your theory survive the fact that in this particular concrete situation we both agree X would happen?" It's *possible* he could say "I disagree that would be the result", but then I would have learned something interesting about his model!

Comment deleted (Feb 2, 2022, edited)

The whole point of these thought experiments is that if someone *can* be painted into a corner by something like this, that illustrates a problem with their approach.

IMO it's a great tool for investigating and refining beliefs, and if you find yourself consistently annoyed by it, that likely means your beliefs are ill-considered.

Comment deleted (Feb 2, 2022, edited)

The point of the original trolley problem isn't to come up with a particular answer, it is to show how utilitarian reasoning is a) uncomfortable, b) culturally variable.

Comment deleted (Feb 2, 2022, edited)

I think we read Scott pretty differently then. I didn't read his question as forcing Phil to say yes at all, and I would imagine most people who say what Phil said would say no, and elaborate on what they imagine the reason for that to be.

I also don't think what you're describing is really a classic case of the trolley problem. Yes, if you claim things that aren't like it are like it, that's bad, but IMO it's a perfectly interesting question without some kind of unreasonable comparison. Nor do I think it's an unfair one--both choices are uncomfortable to many people.


That's the wrong timescale for your visual system to change. Changing so fast would have been unstable. A small learning rate doesn't mean the objective isn't a reinforcement-learning one.
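A quick sketch of the "small learning rate" point, with numbers I picked for illustration: two learners chasing the same reward target differ only in step size, so the slow one moves toward the same optimum, just later.

```python
# Toy sketch (illustrative numbers): two learners with very different step
# sizes optimize the same objective; a small step size only slows
# convergence, it does not change what is being maximized.
def run(learning_rate: float, steps: int) -> float:
    estimate = 0.0
    target = 1.0                     # the reward signal both learners chase
    for _ in range(steps):
        estimate += learning_rate * (target - estimate)
    return estimate

fast = run(0.5, 50)
slow = run(0.001, 50)
assert abs(fast - 1.0) < 1e-6        # fast learner is essentially converged
assert 0 < slow < fast               # slow learner moves the same way, slower
```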


Trolley problems are actually extremely common and can be found almost everywhere - the law, the medical system, engineering. It's probably the most useful thought experiment of all time.

The thing everyone in the world discussed for two years straight ("should we lock down the economy to stop the virus?") was essentially a form of the problem.

> the overwhelming majority of cases, it's just that the person making the analogy wants what they want and presenting the situation as a trolley problem is an effective method by which to emotionally manipulate their audience into agreeing with them.

I don't agree. A feature of the problem is that there's *no* obviously correct answer. Ethical arguments can be advanced for either course of action.

Comment deleted (Feb 2, 2022, edited)

China's lockdown policy involved a totalitarian state literally welding people inside their homes and detaining them to quarantine camps.

Was it worth it? I don't know. But it definitely wasn't a painless process. Tens of millions of ordinary Chinese had to suffer in some way to make it happen.

Comment deleted (Feb 2, 2022, edited)

We have no idea what is actually happening in China because the CCP is so disconnected from Reality, it couldn't tell the truth if it wanted to.

The connections in a modern computer, stacked end to end, run one to a hundred meters, while the connections in a human brain would stretch all the way to the Moon. We know so little about how things work, from the quantum level to the collective minds of species, society, and the Universe, that our knowledge is more of a hindrance than an asset in directing how to live our lives. And it's easy to get fooled into thinking one knows what one doesn't, and when such people are authoritarians, everybody suffers.

Comment deleted (Feb 2, 2022, edited)

I'd say that the big reason it wasn't a trolley problem is that the choice was between two completely unknown sets of consequences rather than two predictable ones. It's not "do we kill a baby and a poodle or four elderly shoplifters", it was "do we allow an unknown amount of bad shit to happen, or an unknown amount of different bad shit to happen?"

Maybe "completely unknown" is an overstatement since you could make some sort of predictions, but the error bars on your predictions are large enough that it's a prediction problem rather than a values problem.

For what it's worth, "eliminating covid" would have been the right move, but once the virus escaped China it would have required _every_ country on Earth to do the lockdown to eliminate the thing... and then it becomes a massive coordination problem that is unlikely to have worked out, given that most countries are too poor to just shut down everything for eight weeks and expect everyone to survive. Here in Australia we've eliminated covid more times than I can remember, but since the rest of the world refused to get with the program it didn't do much long-term good, it just kept sneaking in again and again.


It's not at all clear to me that China's approach was better than ours, and framing the cost of lockdowns as "the wealthy take a hit to their portfolios" seems disingenuous to the point of absurdity.


I would totally sign up for it, as long as the shock was non-fatal and left no permanent damage. Color me abnormal, I guess.


I take the unrealistic aspect of a trolley problem to be knowing exactly what happens given each choice, and only being in doubt about the ethical weight.


I'm pretty sure this was an episode of Star Trek-- four or five lights?


Itself based on O'Brien's torture of Winston Smith in *Nineteen Eighty-Four*.


Now I'm wondering whether and to what extent the famous basketball gorilla experiments would give different results in live action than on video.

http://theinvisiblegorilla.com/gorilla_experiment.html


This reminds me of that Star Trek: The Next Generation episode where Picard is tortured for days by Cardassians, who are trying to get him to say the wrong number when asked "How many lights are there?" At the end of the episode, he tells Riker that he actually saw the number they wanted him to say.


I thought it's kind of mainstream science that evolution has 'programmed' us to see things that aren't there if it's good for us. For example, here's a quote from Why Buddhism is True:

"Suppose you’re hiking through what you know to be rattlesnake terrain, and suppose you know that only a year ago, someone hiking alone in this vicinity was bitten by a rattlesnake and died. Now suppose there’s a stirring in the brush next to your feet. This stirring doesn’t just give you a surge of fear; you feel the fear that there is a rattlesnake near you. In fact, as you turn quickly toward the disturbance and your fear reaches its apex, you may be so clearly envisioning a rattlesnake that, if the culprit turns out to be a lizard, there will be a fraction of a second when the lizard looks like a snake. This is an illusion in a literal sense: you actually believe there is something there that isn’t there; in fact, you actually “see” it.

These kinds of misperceptions are known as “false positives”; from natural selection’s point of view, they’re a feature, not a bug. Though your brief conviction that you’ve seen a rattlesnake may be wrong ninety-nine times out of a hundred, the conviction could be lifesaving the other one time in a hundred. And in natural selection’s calculus, being right 1 percent of the time in matters of life or death can be worth being wrong 99 percent of the time, even if in every one of those ninety-nine instances you’re briefly terrified."

Seeing a square when there's really a circle is a pretty extreme example (which I guess is why it got called a 'trolley problem'). But seeing a rattlesnake when there's a weird-shaped stick in the grass (and the penalties for false positives and false negatives are vastly different) seems pretty plausible to me.
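The "natural selection's calculus" in the quote is just an expected-cost comparison, which can be made concrete. The costs and probability below are mine, chosen only to match the quote's 1-in-100 framing:

```python
# Back-of-the-envelope sketch of the expected-cost calculus quoted above,
# with illustrative numbers I made up.
P_SNAKE = 0.01                 # 1 in 100 rustles is really a rattlesnake
COST_FRIGHT = -1.0             # brief terror at a false alarm
COST_BITE = -10_000.0          # fitness cost of a fatal bite

# Flinch at every rustle: you always pay the fright cost, never the bite.
always_flinch = (1 - P_SNAKE) * COST_FRIGHT + P_SNAKE * COST_FRIGHT
# Never flinch: you occasionally pay the catastrophic cost.
never_flinch = P_SNAKE * COST_BITE

assert always_flinch > never_flinch   # being wrong 99% of the time wins
```

Being "wrong ninety-nine times out of a hundred" comes out ahead whenever the bite cost exceeds a hundred fright costs.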


This is not an adequate comparison. "Genuinely seeing squares" would do nothing to change the electric shocks; *claiming* you saw squares would, and that is certainly what you would have learned to do.

I think Phil's disagreement comes rather from thinking on a different timescale. The total reward maximization model is correct, but it works in evolutionary time: a visual cortex that correctly identifies objects is "reinforced" by evolution because this maximizes total reward. This is precisely what originates the *epistemic* architecture. But over the lifetime of a person, the visual cortex does not do any reinforcement learning (at least after infancy).


Speculations about microscopic processes a billion miles away are a complete waste of time in the long run. We have the parameters of the Standard Model that explains the properties of the atoms and molecules that enable the construction of the cells the visual cortex is filled with, yet we can't get our most powerful computers to accurately predict the behavior of a single molecule of H2O. Considering that a single cell is built from hundreds or thousands of far more complex molecules, and that the behavior of everything at the core is a probabilistic quantum-mechanical information process, it might be wise to listen to those with knowledge about God the Creator and life's purpose rather than scientists whose god is random mutation.
