1
00:00:00,080 --> 00:00:01,360
So this is a moment of truth.

2
00:00:01,360 --> 00:00:06,200
There's going to be a lot of moment of truth moments this week, but I do hope that you still have to

3
00:00:06,240 --> 00:00:07,040
hand somewhere.

4
00:00:07,080 --> 00:00:12,920
The prop from last week, with all of your notes written down of where we got to in our price, we were

5
00:00:12,920 --> 00:00:21,280
able to get a three handle with 39.85, was our best price from our specialist model, and that that

6
00:00:21,280 --> 00:00:23,360
is what we are up against here.

7
00:00:23,680 --> 00:00:31,400
How does the strongest model on the planet equipped with a Rag knowledge base lookup, compare with

8
00:00:31,600 --> 00:00:38,680
a model that is more than a thousand times smaller and quantized down to four bit, but fine tuned.

9
00:00:38,880 --> 00:00:41,520
So this is definitely the battle of the Titans.

10
00:00:41,880 --> 00:00:43,720
This is the function.

11
00:00:43,720 --> 00:00:44,640
It's as simple as this.

12
00:00:44,640 --> 00:00:46,680
Let's first do a quick experiment.

13
00:00:46,680 --> 00:00:51,280
You remember that the price of that distortion pedal is $219.

14
00:00:51,360 --> 00:00:52,480
Let's do Ragnar.

15
00:00:52,480 --> 00:00:57,480
So when I run this one thing, it's going to it's going to first look up in chroma, find those similar

16
00:00:57,480 --> 00:01:03,550
items, and then it's going to make that call to GPT five one in the cloud to come up with an estimate,

17
00:01:03,550 --> 00:01:06,030
and it comes up with $229.

18
00:01:06,030 --> 00:01:08,230
So that one is $10 off.

19
00:01:08,510 --> 00:01:10,430
That would that's certainly very close.

20
00:01:10,430 --> 00:01:12,350
That is that is surprising.

21
00:01:12,350 --> 00:01:17,190
But what we don't know is, is whether this is going to whether this is going to be true across the

22
00:01:17,190 --> 00:01:19,230
board, whether they're going to be outliers.

23
00:01:19,270 --> 00:01:20,390
Let's find out.

24
00:01:20,430 --> 00:01:22,950
Let's run this off it goes.

25
00:01:23,110 --> 00:01:24,030
It's it's going off.

26
00:01:24,030 --> 00:01:26,510
There's already there's some greens and there's some reds.

27
00:01:26,510 --> 00:01:28,470
You can see it's moving nice and quickly.

28
00:01:28,710 --> 00:01:33,950
And I'm too this is too exciting that I'm not going to do the thing of putting you on pause and coming

29
00:01:33,950 --> 00:01:39,350
back because because we need to to live this moment together and be part of of all of this as it ticks

30
00:01:39,350 --> 00:01:39,710
up.

31
00:01:39,910 --> 00:01:45,110
Um, and, uh, yeah, I should say that even though we've set the seed here, I noticed that there

32
00:01:45,110 --> 00:01:49,390
is some a bit of randomness in here, so I don't know exactly what answer we're going to get.

33
00:01:49,390 --> 00:01:51,310
It's a bit different every time I run it.

34
00:01:51,350 --> 00:01:52,710
We are two thirds the way through.

35
00:01:52,750 --> 00:01:58,870
We're in the home stretch coming up 30 left to go, 20 left to do ten.

36
00:01:59,270 --> 00:02:04,100
The answer is about to appear, which is going to be better training versus inference.

37
00:02:04,100 --> 00:02:05,500
Here's the result visually.

38
00:02:05,900 --> 00:02:11,180
And there's the number 30.19 rag for the win.

39
00:02:11,500 --> 00:02:12,180
Wow.

40
00:02:12,620 --> 00:02:21,820
So, uh, shockingly and amazingly, uh, the frontier model plus rag has crushed even our fine tuned

41
00:02:21,820 --> 00:02:22,380
model.

42
00:02:22,380 --> 00:02:27,620
And this is a different result to this time last year, where last year the specialist fine tuned model

43
00:02:27,620 --> 00:02:29,940
was still stronger even than Rag.

44
00:02:30,060 --> 00:02:31,180
But not anymore.

45
00:02:31,220 --> 00:02:38,020
Now, the combination of having the frontier model with GPT five, one's power, with all of its worldly

46
00:02:38,060 --> 00:02:43,660
knowledge and all of its abilities, combined with the with the expert knowledge of being able to look

47
00:02:43,660 --> 00:02:50,460
up relevant content in a chroma database has allowed it to be really, really sharp.

48
00:02:50,500 --> 00:02:52,620
Be within $30.

49
00:02:52,780 --> 00:02:57,300
Uh, not within within $31 $30.19 is the error.

50
00:02:57,500 --> 00:02:59,720
This is really tremendous.

51
00:02:59,720 --> 00:03:01,160
The visual says it all.

52
00:03:01,320 --> 00:03:05,520
We have just built we have just surpassed all of our previous numbers.

53
00:03:05,520 --> 00:03:08,840
This is now the strongest model on the planet.

54
00:03:09,120 --> 00:03:15,040
Yes, I hear you talking me down the strongest model on the planet for this very specific task.

55
00:03:15,080 --> 00:03:15,920
And it's not really that.

56
00:03:15,920 --> 00:03:17,480
It's the model that's the strongest.

57
00:03:17,480 --> 00:03:23,840
It's that this workflow, the model plus the rag lookup together, has allowed us to be the strongest

58
00:03:23,840 --> 00:03:24,680
on the planet.

59
00:03:24,680 --> 00:03:28,640
So maybe I'm getting ahead of myself, but still, it's really awesome.

60
00:03:28,640 --> 00:03:35,760
It's a terrific result, but it turns out that we can do slightly better.

61
00:03:36,760 --> 00:03:37,640
You're like, what?

62
00:03:37,960 --> 00:03:38,360
Ha!

63
00:03:38,920 --> 00:03:41,160
How could we possibly do better than this?

64
00:03:41,400 --> 00:03:42,600
Only slightly better.

65
00:03:42,600 --> 00:03:45,120
But we can do slightly better.

66
00:03:45,480 --> 00:03:49,880
And it's because of this mysterious thing called an ensemble.

67
00:03:50,240 --> 00:03:57,240
If you have different models that come up with different answers, then you can often build another

68
00:03:57,240 --> 00:04:03,070
model, which is a combination of those models and have it beat all of their performance.

69
00:04:03,390 --> 00:04:05,790
Now, something about that might sound counterintuitive to you.

70
00:04:05,790 --> 00:04:11,870
You might say, hang on, how could that possibly work if all of these models are out by more than than,

71
00:04:11,910 --> 00:04:17,430
I don't know, let's say 100, then how could some combination of them get an error that's less than

72
00:04:17,430 --> 00:04:18,190
100?

73
00:04:18,350 --> 00:04:20,150
How could that possibly work?

74
00:04:20,710 --> 00:04:24,350
And that that is the magic of something called an ensemble.

75
00:04:24,510 --> 00:04:28,150
And and I mean, the data scientists amongst you for this is all old hat.

76
00:04:28,190 --> 00:04:32,310
But famously, ensembles really, really came into being with something called the Netflix prize, which

77
00:04:32,310 --> 00:04:40,190
is when Netflix famously won or offered $1 million to anyone that could could be the best, uh, beating

78
00:04:40,310 --> 00:04:44,430
the existing model they had to recommend what movie you might want to watch next.

79
00:04:44,470 --> 00:04:48,870
And the winners of the of the Netflix prize did it by building a big ensemble model.

80
00:04:48,990 --> 00:04:52,390
Um, but ensembles are things that data scientists love and build all the time.

81
00:04:52,510 --> 00:04:57,380
And in fact, the random forest model that we that we saw before is in itself an ensemble model.

82
00:04:57,660 --> 00:05:03,740
So how is it that bringing together multiple models is able to do better than any of the constituents?

83
00:05:03,780 --> 00:05:09,260
Well, one easy way to explain it is this supposing that we do have a model that is estimating prices,

84
00:05:09,260 --> 00:05:14,420
and suppose that actually the price of all of our goods are exactly $100.

85
00:05:14,460 --> 00:05:17,100
All of them $100 for every single product.

86
00:05:17,380 --> 00:05:20,620
And this model is always off by exactly $10.

87
00:05:20,660 --> 00:05:23,780
It's either $10 too much or $10 too little.

88
00:05:23,780 --> 00:05:26,660
So it always either guesses 110 or 90.

89
00:05:26,700 --> 00:05:28,700
Every single time, it's always off by ten.

90
00:05:28,940 --> 00:05:34,380
Then when we run this model, the total average error is ten because it's off by ten every single time.

91
00:05:34,380 --> 00:05:35,700
So the average is ten.

92
00:05:36,140 --> 00:05:40,340
And let's say we have another model and it has much the same properties.

93
00:05:40,340 --> 00:05:42,300
It's also always off by ten.

94
00:05:42,300 --> 00:05:46,220
And it's also always either gets it wrong by too much or too little.

95
00:05:46,380 --> 00:05:52,660
So both of these two models, model A and model B, both have an error of ten because they're both always

96
00:05:52,660 --> 00:05:54,020
guessing off by ten.

97
00:05:54,340 --> 00:05:59,410
Now let's suppose we combine these two models simply by taking the average.

98
00:05:59,410 --> 00:06:03,330
We just take this model plus this model divided by two, the average of those two models.

99
00:06:03,570 --> 00:06:06,330
What's going to be the error of the combined model?

100
00:06:06,770 --> 00:06:15,650
Well, if it's if we're most unlucky and every time model A guesses too much by $10, model B also guesses

101
00:06:15,650 --> 00:06:16,930
too much by $10.

102
00:06:16,930 --> 00:06:19,370
And every time A is down, B is down.

103
00:06:19,410 --> 00:06:20,730
Worst case.

104
00:06:21,090 --> 00:06:27,850
In that eventuality, we would always again be off by ten in the average, and so the combined error

105
00:06:27,890 --> 00:06:28,890
would be ten.

106
00:06:29,170 --> 00:06:35,370
But if there's ever a case that that they go in opposite directions, then immediately the average error

107
00:06:35,410 --> 00:06:37,290
would be less than ten.

108
00:06:37,690 --> 00:06:42,250
And that that hopefully I give you this example to give you some intuition.

109
00:06:42,250 --> 00:06:46,530
It sounds it sounds a bit counterintuitive, but but actually it makes sense.

110
00:06:46,530 --> 00:06:51,570
It's quite possible there'll be ways to combine different models that would iron out the noise between

111
00:06:51,570 --> 00:06:56,760
them, particularly if they use different methodologies, and it would allow the hole to be better than

112
00:06:56,760 --> 00:06:58,200
the sum of the parts in some way.

113
00:06:58,400 --> 00:07:01,320
And that's the sort of theory behind ensembles.

114
00:07:01,320 --> 00:07:03,960
And that that is what we're going to do right now.

115
00:07:04,080 --> 00:07:09,120
So let's start by going back to our our modal pricer our specialist.

116
00:07:09,280 --> 00:07:16,720
I got a little function here specialist that is going to to to call modal in the cloud and return a

117
00:07:17,720 --> 00:07:18,800
summary from modal.

118
00:07:19,040 --> 00:07:20,320
I've also got this little thing here.

119
00:07:20,360 --> 00:07:24,760
Get price that strips out a number when some of our models return text.

120
00:07:25,320 --> 00:07:25,880
Okay.

121
00:07:26,080 --> 00:07:31,800
Now there's another model that that that we're going to use as well, which is the neural network that

122
00:07:31,800 --> 00:07:37,360
I unveiled at the end of week six that you may have also trained yourself if you wish to.

123
00:07:37,640 --> 00:07:45,200
I've uploaded the weights to a file, deep neural networks, PyTorch, and it's up on the Google Drive

124
00:07:45,200 --> 00:07:47,120
right here that you can download those weights.

125
00:07:47,120 --> 00:07:50,400
And you should put it right here in this directory deep neural network.

126
00:07:51,160 --> 00:07:52,320
It's just over a gigabyte.

127
00:07:52,320 --> 00:07:53,800
So it might take a little bit of time to download.

128
00:07:53,800 --> 00:07:58,380
And I'm not going to put it in git because it's too big for GitHub, so you can download it and put

129
00:07:58,380 --> 00:07:58,980
it there.

130
00:07:59,180 --> 00:08:04,460
And then I've made a class deep neural network inference, which you can just load in.

131
00:08:04,500 --> 00:08:07,140
We don't actually need I don't think we need that.

132
00:08:07,340 --> 00:08:12,940
Uh, we can just load this in and then we can load in our weights.

133
00:08:13,100 --> 00:08:17,500
And that will give us another way to, uh, to to run this.

134
00:08:17,820 --> 00:08:21,220
This will give us a way to, to to run deep neural network.

135
00:08:21,220 --> 00:08:26,660
So now I've got a, a specialist model and a deep neural network model.

136
00:08:26,660 --> 00:08:29,780
And we can now have an ensemble model.

137
00:08:29,980 --> 00:08:37,260
And it's going to take the frontier rag as price one, the specialist model as price two and the deep

138
00:08:37,260 --> 00:08:39,260
neural network as price three.

139
00:08:39,700 --> 00:08:44,940
Now, what you're supposed to do in these things is now do a linear regression.

140
00:08:44,940 --> 00:08:53,700
Now run this over a ton of data points and run linear regression to say, okay, what weighted combination

141
00:08:53,700 --> 00:08:59,570
of these different models would give you the best outcome on some validation data.

142
00:08:59,850 --> 00:09:03,050
And in fact, that is what I did in this course last year.

143
00:09:03,370 --> 00:09:05,690
But it's a bit of a of a distraction.

144
00:09:05,690 --> 00:09:06,970
It takes some time to do.

145
00:09:07,010 --> 00:09:11,130
If you're interested in doing it, then I would I would very much encourage you to give that a try.

146
00:09:11,170 --> 00:09:15,810
It doesn't take long, but it's about half an hour of faffing around with a linear regression model.

147
00:09:15,810 --> 00:09:20,330
And I don't think it's important to illustrate the point because I could just quickly rough, rough

148
00:09:20,370 --> 00:09:27,570
strokes say, look, let's pick 80% of the first price, 10% of the second price, 10% of the third

149
00:09:27,570 --> 00:09:28,170
price.

150
00:09:28,170 --> 00:09:36,370
Combining these three prices, the 80% goes to our frontier model with Rag, 10% goes to our specialist

151
00:09:36,370 --> 00:09:39,210
model, 10% goes to the deep neural network.

152
00:09:39,210 --> 00:09:42,570
Let's just try combining them and see what happens.

153
00:09:42,650 --> 00:09:51,490
So what we can then do is simply run evaluate our usual test with the ensemble model with this test

154
00:09:51,570 --> 00:09:54,210
test data set and see how it performs.