Lexicap

Yann LeCun: Dark Matter of Intelligence and Self-Supervised Learning | Lex Fridman Podcast #258


[Lex] The following is a conversation with Yann LeCun, his second time on the podcast. He is the chief AI scientist at Meta, formerly Facebook, professor at NYU, Turing Award winner, one of the seminal figures in the history of machine learning and artificial intelligence, and someone who is brilliant and opinionated in the best kind of way, and so is always fun to talk to. This is the Lex Fridman Podcast. To support it, please check out our sponsors in the description. And now, here's my conversation with Yann LeCun.
[Lex] You co-wrote the article "Self-Supervised Learning: The Dark Matter of Intelligence." Great title, by the way, with Ishan Misra. So let me ask, what is self-supervised learning, and why is it the dark matter of intelligence?
[Yann] I'll start with the dark matter part. There is obviously a kind of learning that humans and animals are doing that we currently are not reproducing properly with machines or with AI, right? So the most popular approaches to machine learning today, or paradigms, I should say, are supervised learning and reinforcement learning. And they are extremely inefficient. Supervised learning requires many samples to learn anything, and reinforcement learning requires a ridiculously large number of trials and errors for a system to learn anything. And that's why we don't have self-driving cars.
[Lex] That was a big leap from one to the other. Okay, so to solve difficult problems, you have to have a lot of human annotation for supervised learning to work. And to solve those difficult problems with reinforcement learning, you have to have some way to maybe simulate that problem, such that you can do that large-scale kind of learning that reinforcement learning requires.
[Yann] Right, so how is it that most teenagers can learn to drive a car in about 20 hours of practice, whereas even with millions of hours of simulated practice, a self-driving car can't actually learn to drive itself properly? So, obviously, we're missing something, right? And it's quite obvious to a lot of people; the immediate response you get from many people is, well, humans use their background knowledge to learn faster. And they're right. Now, how was that background knowledge acquired? And that's the big question. So now you have to ask, how do babies in the first few months of life learn how the world works? Mostly by observation, because they can hardly act in the world. And they learn an enormous amount of background knowledge about the world that may be the basis of what we call common sense. This type of learning is not learning a task, it's not being reinforced for anything, it's just observing the world and figuring out how it works. Building world models, learning world models. How do we do this? And how do we reproduce this in machines? So self-supervised learning is one instance, or one attempt, at trying to reproduce this kind of learning.
[Lex] Okay, so you're looking at just observation, so not even the interacting part of a child. It's just sitting there watching mom and dad walk around, pick up stuff, all of that. That's what we mean by background knowledge.
[Yann] Perhaps not even watching mom and dad.
[Lex] Just having eyes open or having eyes closed, or the very act of opening and closing eyes, that the world appears and disappears, all that basic information. And you're saying that the reason humans are able to learn to drive quickly, some faster than others, is because of the background knowledge built up in the many years leading up to it, the physics of basic objects, all that.
[Yann] That's right. I mean, the basic physics of objects, yes. You don't even need to know how a car works, because that you can learn fairly quickly. The example I use very often is you're driving next to a cliff. And you know in advance, because of your understanding of intuitive physics, that if you turn the wheel to the right, the car will veer to the right, will run off the cliff, fall off the cliff, and nothing good will come out of this, right? But if you are a sort of tabula rasa reinforcement learning system that doesn't have a model of the world, you have to repeat falling off this cliff thousands of times before you figure out it's a bad idea. And then a few more thousand times before you figure out how to not do it. And then a few more million times before you figure out how to not do it in every situation you ever encounter.
[Lex] So self-supervised learning still has to have some source of truth being told to it by somebody. So you have to figure out a way, without human assistance, or without a significant amount of human assistance, to get that truth from the world. So the mystery there is how much signal is there, how much truth is there that the world gives you, whether it's the human world, like you watch YouTube or something like that, or it's the more natural world. So how much signal is there?
[Yann] So here's the trick: there is way more signal in a self-supervised setting than there is in either a supervised or reinforcement setting. And this goes back to my analogy of the cake, the LeCake, as someone has called it, where you try to figure out how much information you ask the machine to predict and how much feedback you give the machine at every trial. In reinforcement learning, you give the machine a single scalar. You tell the machine, you did good, you did bad. And you only tell this to the machine once in a while. When I say you, it could be the universe telling the machine, right? But it's just one scalar. And so as a consequence, you cannot possibly learn something very complicated without many, many, many trials where you get many, many feedbacks of this type. In supervised learning, you give a few bits to the machine at every sample. Let's say you're training a system to recognize images on ImageNet; there are 1,000 categories, so that's a little less than 10 bits of information per sample. But self-supervised learning, here is the setting. Ideally, we don't know how to do this yet, but ideally you would show a machine a segment of a video and then stop the video and ask the machine to predict what's going to happen next. And so you let the machine predict, and then you let time go by and show the machine what actually happened, and hope the machine will learn to do a better job at predicting next time around. There's a huge amount of information you give the machine, because it's an entire video clip of the future after the video clip you fed it in the first place.
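To make the cake comparison concrete, here is a rough back-of-the-envelope sketch in Python of how much feedback each paradigm gives the learner per sample; the frame size and bit counts are illustrative assumptions, not figures from the conversation.

    import math

    # Reinforcement learning: a single scalar reward, and only once in a while.
    rl_bits_per_trial = 1

    # Supervised learning: one of 1,000 ImageNet categories per labeled image.
    supervised_bits_per_sample = math.log2(1000)   # ~9.97 bits, "a little less than 10"

    # Self-supervised video prediction: the target is an entire future clip.
    # Even a single small 64x64 grayscale frame at 8 bits per pixel dwarfs the above.
    self_supervised_bits_per_frame = 64 * 64 * 8   # 32,768 bits for just one frame

    print(rl_bits_per_trial, round(supervised_bits_per_sample, 2), self_supervised_bits_per_frame)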
[Lex] So both for language and for vision, there's a subtle, seemingly trivial construction, but maybe that's representative of what is required to create intelligence, which is filling the gap. So... filling the gaps.
[Yann] Filling the gaps.
[Lex] It sounds dumb, but is it possible that you could solve all of intelligence in this way? For language, just give a sentence and continue it, or give a sentence with a gap in it, some words blanked out, and you fill in what words go there. For vision, you give a sequence of images and predict what's going to happen next, or you fill in what happened in between. Do you think it's possible that that formulation alone, as a signal for self-supervised learning, can solve intelligence for vision and language?
[Yann] I think that's our best shot at the moment. So whether this will take us all the way to human-level intelligence or something, or just cat-level intelligence, is not clear, but among all the possible approaches that people have proposed, I think it's our best shot. So I think this idea of an intelligent system filling in the blanks, either predicting the future, inferring the past, filling in missing information. I'm currently filling in the blank of what is behind your head and what your head looks like from the back, because I have basic knowledge about how humans are made. And I don't know what word you're gonna say, at which point you're gonna speak, whether you're gonna move your head this way or that way, which way you're gonna look, but I know you're not gonna just dematerialize and reappear three meters down the hall, because I know what's possible and what's impossible.
[Lex] So you have a model of what's possible and what's impossible, and then you'd be very surprised if the impossible happens, and then you'll have to reconstruct your model.
[Yann] Right, so that's the model of the world. It's what tells you, you know, what fills in the blanks. So given your partial information about the state of the world, given by your perception, your model of the world fills in the missing information, and that includes predicting the future, re-predicting the past, filling in things you don't immediately perceive.
[Lex] And that doesn't have to be purely generic vision or visual information or generic language. You can go to specifics, like predicting what control decision you make when you're driving in a lane. You have a sequence of images from a vehicle, and then you have information, if you recorded it on video, of where the car ended up going, so you can go back in time and predict where the car went based on the visual information. That's very specific, domain-specific.
[Yann] Right, but the question is whether we can come up with a sort of generic method for training machines to do this kind of prediction or filling in the blanks. So right now, this type of approach has been unbelievably successful in the context of natural language processing. Every modern natural language processing system is pre-trained in a self-supervised manner to fill in the blanks. You show it a sequence of words, you remove 10% of them, and then you train some gigantic neural net to predict the words that are missing. And once you've pre-trained that network, you can use the internal representation learned by it as input to something that you train supervised, or whatever. That's been incredibly successful. Not so successful in images, although it's making progress, and it's based on sort of manual data augmentation. We can go into this later, but what has not been successful yet is training from video. So getting a machine to learn to represent the visual world, for example, by just watching video. Nobody has really succeeded in doing this.
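As a rough illustration of the fill-in-the-blanks pretraining described here, below is a minimal, hypothetical PyTorch sketch: mask a fraction of the tokens and train a network to predict the missing ones. The model size, masking rate, and random stand-in tokens are all assumptions for illustration, not the recipe of any particular system.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    vocab_size, d_model, seq_len, mask_rate = 1000, 64, 32, 0.15
    MASK_ID = 0  # reserve token id 0 as the blank/mask symbol (assumption)

    embed = nn.Embedding(vocab_size, d_model)
    encoder = nn.TransformerEncoder(
        nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True), num_layers=2)
    to_vocab = nn.Linear(d_model, vocab_size)  # a score for every word in the dictionary

    tokens = torch.randint(1, vocab_size, (8, seq_len))   # stand-in for real text
    mask = torch.rand(tokens.shape) < mask_rate           # choose roughly 15% of positions
    corrupted = tokens.masked_fill(mask, MASK_ID)         # blank them out

    hidden = encoder(embed(corrupted))                    # contextual representations
    logits = to_vocab(hidden)                             # scores over the vocabulary at each position
    loss = F.cross_entropy(logits[mask], tokens[mask])    # predict only the blanked words
    loss.backward()

After pretraining on lots of text this way, the internal representations (here, the hidden activations) are what get reused as input to a downstream supervised task, which is the point made above.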
[Lex] Okay, well, let's kind of give a high-level overview. What's the difference in kind and in difficulty between vision and language? So you said people haven't been able to really crack the problem of vision open in terms of self-supervised learning, but that may not necessarily be because it's fundamentally more difficult. Maybe, when we're talking about passing the Turing test in the full spirit of the Turing test, language might be harder than vision. That's not obvious. So in your view, which is harder, or perhaps are they just the same problem? The farther we get toward solving each, the more we realize it's all the same thing. It's all the same cake.
[Yann] We may eventually come up with methods that make them look essentially like the same cake, but currently they're not. And the main issue with learning world models or learning predictive models is that the prediction is never a single thing, because the world is not entirely predictable. It may be deterministic or stochastic; we can get into the philosophical discussion about it, but even if it's deterministic, it's not entirely predictable. And so if I play a short video clip and then ask you to predict what's going to happen next, there are many, many plausible continuations for that video clip. And the number of continuations grows with the interval of time that you're asking the system to make a prediction for. And so one big question with self-supervised learning is how you represent this uncertainty, how you represent multiple discrete outcomes, how you represent a sort of continuum of possible outcomes, et cetera.
[Yann] And if you are sort of a classical machine learning person, you say, oh, you just represent a distribution, right?
[Lex] Yeah.
[Yann] And that we know how to do when we're predicting words, missing words in the text, because you can have a neural net give a score for every word in the dictionary. It's a big list of numbers, maybe 100,000 or so. And you can turn them into a probability distribution that tells you, when I say a sentence, "the cat is chasing the blank in the kitchen," there are only a few words that make sense there. It could be a mouse, or it could be a laser spot, or something like that, right? And if I say "the blank is chasing the blank in the savanna," you also have a bunch of plausible options for those two words, right? Because you have kind of an underlying reality that you can refer to to fill in those blanks. So you cannot say for sure, in the savanna, if it's a lion or a cheetah or whatever, and you cannot know if it's a zebra or a gnu or a wildebeest. But it's the same thing: you can represent that uncertainty by just a long list of numbers. But if I show you a short video clip and ask you to predict what comes next, it's not a discrete set of potential frames. You have to have some way of representing a sort of infinite number of plausible continuations of multiple frames in a high-dimensional continuous space. And we just have no idea how to do this properly.
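The contrast being drawn can be sketched in a few lines: for a missing word, a network can output one score per dictionary entry and normalize them into a distribution, but there is no finite list of future video frames to score. The vocabulary size and frame resolution below are illustrative assumptions.

    import torch

    # Discrete case: one score per word, softmax gives a valid distribution.
    vocab_scores = torch.randn(100_000)              # a big list of numbers, one per word
    word_dist = torch.softmax(vocab_scores, dim=0)   # sums to 1 over the whole dictionary

    # Continuous case: a single 256x256 RGB frame already lives in a
    # ~196,608-dimensional continuous space, so you cannot enumerate
    # the possible next frames and softmax over them.
    frame_dim = 256 * 256 * 3
    print(word_dist.sum().item(), frame_dim)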
[Lex] Finite, high dimensional. So, just like the words, they try to get it down to a small finite set of, like, under a million, something like that. I mean, it's kind of ridiculous that we're doing a distribution over every single possible word for language, and it works. It feels like that's a really dumb way to do it.
[Yann] Something like that.
[Lex] I mean, it seems to me like there should be some more compressed representation of the distribution of the words.
[Yann] You're right about that.
[Lex] And so... I agree. Do you have any interesting ideas about how to represent all of reality in a compressed way, such that you can form a distribution over it?
[Yann] That's one of the big questions. How do you do that? Another thing that really is stupid about, I shouldn't say stupid, but simplistic about current approaches to self-supervised learning in NLP, in text, is that not only do you represent a giant distribution over words, but for multiple words that are missing, those distributions are essentially independent of each other. And you don't pay too much of a price for this. So the system, in the sentence that I gave earlier, if it gives a certain probability for lion and cheetah, and then a certain probability for gazelle, wildebeest, and zebra, those two distributions are independent of each other. And it's not the case that those things are independent. Lions actually attack bigger animals than cheetahs do. So, you know, there's a huge independence hypothesis in this process, which is not actually true. The reason for this is that we don't know how to properly represent distributions over combinatorial sequences of symbols, essentially, because the number of combinations grows exponentially with the length of the sequence. And so we have to use tricks for this, but those techniques kind of get around it, or don't even deal with it. So the big question is, would there be some sort of abstract latent representation of text that would say that, you know, when I switch lion for gazelle...
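The independence assumption can be made concrete with a toy example; the words and probabilities below are purely illustrative and not from any trained model. With two blanks predicted by two separate distributions, the joint probability is forced to factorize, so the model cannot express that the lion goes with the larger prey animal.

    import torch

    predators = ["lion", "cheetah"]
    prey = ["gazelle", "wildebeest"]

    p_predator = torch.tensor([0.5, 0.5])   # marginal distribution over the first blank
    p_prey = torch.tensor([0.5, 0.5])       # marginal distribution over the second blank

    # Factorized joint: every predator/prey pairing gets the same 0.25,
    # even though some pairings should be much more plausible than others.
    joint = torch.outer(p_predator, p_prey)
    print(joint)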
[Lex] Yeah, so this independence assumption... Let me throw some criticism at you that I often hear. So this kind of filling in the blanks is just statistics. You're not learning anything, like the deep underlying concepts. You're just mimicking stuff from the past. You're not learning anything new such that you can use it to generalize about the world. Or, okay, let me just say the crude version: it's just statistics, it's not intelligence. What do you have to say to that? What do you usually say to that?
[Yann] I don't get into those discussions because they are kind of pointless. So first of all, it's quite possible that intelligence is just statistics. It's just statistics of a particular kind. Yes.
[Lex] Is it possible that intelligence is just statistics?
[Yann] Yeah. But what kind of statistics? So if you're asking the question, do the models of the world that we learn have some notion of causality? Yes. So if the criticism comes from people who say, current machine learning systems don't care about causality, which, by the way, is wrong, I agree with them. Your model of the world should have your actions as one of the inputs. And that will drive you to learn causal models of the world, where you know what intervention in the world will cause what result. Or you can do this by observation of other agents acting in the world and observing the effect, other humans, for example. So I think at some level of description, intelligence is just statistics.
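A minimal sketch of a world model that takes the agent's own action as one of its inputs, trained on observed consequences; the architecture, dimensions, and random stand-in data are assumptions for illustration only.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    state_dim, action_dim = 16, 4

    # Predict the next state from the current state and the action taken.
    world_model = nn.Sequential(
        nn.Linear(state_dim + action_dim, 128),
        nn.ReLU(),
        nn.Linear(128, state_dim),
    )
    opt = torch.optim.Adam(world_model.parameters(), lr=1e-3)

    # One training step on observed (state, action, next_state) triples, which could
    # come from the agent's own interventions or from watching other agents act.
    state = torch.randn(32, state_dim)
    action = torch.randn(32, action_dim)
    next_state = torch.randn(32, state_dim)   # stand-in for what actually happened

    pred = world_model(torch.cat([state, action], dim=-1))
    loss = F.mse_loss(pred, next_state)
    opt.zero_grad()
    loss.backward()
    opt.step()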
[Lex] Yes.
[Yann] But that doesn't mean you don't have models that have deep mechanistic explanations for what goes on. The question is, how do you learn them? That's the question I'm interested in. Some people say that those mechanistic models have to come from someplace else. They have to come from human designers, they have to come from I don't know what. And obviously we learn them. Or if we don't learn them as individuals, nature learned them for us through evolution. So regardless of what you think, those processes have been learned somehow.
[Lex] So if you look at the human brain, when we humans introspect about how the brain works, it seems like when we think about what intelligence is, we think about the high-level stuff, like the models we've constructed, concepts from cognitive science, like memory and reasoning modules, almost like these high-level modules. Does this serve as a good analogy? Are we ignoring the dark matter, the basic low-level mechanisms, just like we ignore the way the operating system works when we're just using the high-level software? We're ignoring that, at the low level, the neural network might be doing something like statistics. Sorry to use this word. It probably...