1
00:00:00,080 --> 00:00:01,480
Welcome to day three.

2
00:00:01,520 --> 00:00:02,920
I have bad news for you.

3
00:00:02,920 --> 00:00:07,640
Today is one of the very rare days when we are not actually going to do a lab.

4
00:00:07,680 --> 00:00:10,960
There's no lab today, but I have something better.

5
00:00:11,880 --> 00:00:13,120
As you will see.

6
00:00:13,240 --> 00:00:15,480
But first, let's quickly do what we always do.

7
00:00:15,520 --> 00:00:17,280
Recap what we've done so far.

8
00:00:17,320 --> 00:00:19,440
You know the progression to become proficient.

9
00:00:19,600 --> 00:00:26,000
You've got your first impression of frontier models: OpenAI, Gemini, Ollama, and various models through

10
00:00:26,120 --> 00:00:29,200
Ollama, like DeepSeek distilled and Llama 3.2.

11
00:00:29,680 --> 00:00:36,400
And you've built something that can summarize web pages using the OpenAI API and hopefully Ollama as

12
00:00:36,400 --> 00:00:36,680
well.

13
00:00:36,960 --> 00:00:41,280
What you're going to be able to do by the end of today is compare a bunch of frontier models.

14
00:00:41,280 --> 00:00:44,480
You'll know the difference between chat models and reasoning models.

15
00:00:44,480 --> 00:00:47,480
You'll appreciate what they do well and where they struggle.

16
00:00:47,480 --> 00:00:48,600
Let's get to it.

17
00:00:48,640 --> 00:00:55,000
So there are three different breeds of LLMs that reflect what they've been trained to do, the

18
00:00:55,000 --> 00:00:57,040
tasks that they've set out to accomplish.

19
00:00:57,240 --> 00:01:05,530
And the starting point is known as a base model: an LLM that's just there to take a sequence of information

20
00:01:05,530 --> 00:01:09,330
as the input and to predict what would come next.

21
00:01:09,530 --> 00:01:11,090
That's all it does.

22
00:01:11,250 --> 00:01:13,730
And these base models, you don't come across them very often.

23
00:01:13,730 --> 00:01:16,770
We will do later in the course when we get to work with our own.

24
00:01:16,890 --> 00:01:21,250
But they are the model before it's been taught how to do things like chat with someone.

25
00:01:21,250 --> 00:01:23,290
It's just about completing the sequence.

26
00:01:23,610 --> 00:01:28,570
But you have a base model yourself and you use it probably quite often and it's in your pocket.

27
00:01:28,610 --> 00:01:33,810
If you bring up your phone and you go to write a text message and you use predictive text,

28
00:01:33,810 --> 00:01:37,850
if you say something like "hello there", and then you see what it suggests would come next, and you repeatedly

29
00:01:37,850 --> 00:01:40,730
press that button, that is like using a base model.

30
00:01:40,730 --> 00:01:46,810
It's just completing the sequence, giving you the most likely output that could come next after a particular

31
00:01:46,810 --> 00:01:47,410
input.

32
00:01:47,410 --> 00:01:48,130
Repeatedly.

33
00:01:48,130 --> 00:01:52,690
And every time you select a word that goes into the sequence, and then it's predicting the next thing

34
00:01:52,730 --> 00:01:55,890
to come after this sequence, that is a base model.

35
00:01:55,890 --> 00:01:59,860
And before ChatGPT came out in 2022.

36
00:02:00,100 --> 00:02:04,300
Before that, there were the earlier versions of the models, like GPT-3.

37
00:02:04,540 --> 00:02:10,260
That was just a base model and people that were using it like myself at the time, we were very familiar

38
00:02:10,260 --> 00:02:14,060
with, like, you put in some text and then it would start predicting what would come next, and there

39
00:02:14,060 --> 00:02:17,380
were various ways that you could force it to try and answer questions.

40
00:02:17,380 --> 00:02:19,900
You'd say, like question in the prompt.

41
00:02:19,900 --> 00:02:26,140
You'd give it, like, "Q:" and a question, and then "A:" and the answer; "Q:", another question; "A:",

42
00:02:26,180 --> 00:02:26,820
an answer.

43
00:02:26,820 --> 00:02:30,540
And then you'd put "Q:" and the actual question you wanted to ask.

44
00:02:30,540 --> 00:02:35,900
And then you'd put "A:", and that would prompt it to be thinking more like it's in question

45
00:02:35,900 --> 00:02:38,180
answer mode and to give you an answer.

46
00:02:38,460 --> 00:02:41,420
And that was the way a lot of people used these base models.

47
00:02:41,420 --> 00:02:48,580
And OpenAI had a brainwave and thought, hang on, we could train our models a bit more with data structured

48
00:02:48,580 --> 00:02:55,540
in this way, with kind of one message, one response, one message, one response, and doing that became known

49
00:02:55,540 --> 00:03:03,070
as making a chat variant, also known as an instruct variant: a chat model or an instruct model.

50
00:03:03,070 --> 00:03:07,310
It's a model that's been trained to work in this kind of prompt style.

51
00:03:07,470 --> 00:03:11,990
And they came up with this idea that there should be one overall piece of information that describes

52
00:03:11,990 --> 00:03:17,550
the whole chat, and then there should be a message from the user to the AI and its

53
00:03:17,550 --> 00:03:22,950
reply, and then another message. And that first, overall construct became known as the system

54
00:03:22,950 --> 00:03:23,550
prompt.

55
00:03:23,870 --> 00:03:25,830
You knew I was going to say that.

56
00:03:25,870 --> 00:03:28,630
And the next message was called the user prompt.

57
00:03:28,630 --> 00:03:31,710
And then the assistant reply, and user prompt, assistant reply.

58
00:03:31,990 --> 00:03:33,110
And training

59
00:03:33,110 --> 00:03:36,830
a model in this way became known as making a chat variant.

60
00:03:37,030 --> 00:03:42,790
There was a particular approach they used called reinforcement learning from human feedback, RLHF, and

61
00:03:42,790 --> 00:03:47,430
it was that that got us from normal GPT to ChatGPT.

62
00:03:47,470 --> 00:03:48,870
It was the chat variant.

63
00:03:48,870 --> 00:03:50,710
And so then chat models were all the rage.

64
00:03:50,710 --> 00:03:53,030
And we were all chatting with ChatGPT.

65
00:03:53,310 --> 00:03:57,880
And quite quickly people noticed that there were some tricks, some prompt engineering tricks to get

66
00:03:57,880 --> 00:03:59,160
more out of the model.

67
00:03:59,160 --> 00:04:02,200
And one of them, which became known as chain-of-thought

68
00:04:02,200 --> 00:04:08,040
prompting, was as simple as this: if you ask the model to do something, you add as the last sentence

69
00:04:08,240 --> 00:04:10,480
"Please think step by step."

70
00:04:10,680 --> 00:04:16,920
And just by virtue of saying that, you'd get something that would apparently do better. It would

71
00:04:16,920 --> 00:04:23,000
go through things methodically, and the sequence that it would predict would end up being more likely

72
00:04:23,000 --> 00:04:27,160
to solve the problem just because you told it to think step by step.

73
00:04:27,160 --> 00:04:29,880
Which seems super hokey, but it kind of works.

74
00:04:29,880 --> 00:04:32,240
And so again, that gave people a brainwave.

75
00:04:32,280 --> 00:04:38,120
Maybe we could train the model with lots of examples that show it thinking step by step, and then show

76
00:04:38,120 --> 00:04:39,640
it getting to a conclusion.

77
00:04:39,640 --> 00:04:42,080
And so doing that training,

78
00:04:42,280 --> 00:04:46,560
training a chat model so that it would then think through what it was doing

79
00:04:46,560 --> 00:04:47,400
before it did it,

80
00:04:47,400 --> 00:04:52,120
became known as making a reasoning model or making a thinking model.

81
00:04:52,120 --> 00:04:54,930
And that led to these reasoning models.

82
00:04:54,930 --> 00:05:02,450
They are models that have been trained to first output their thinking steps and then give the answer.

83
00:05:02,450 --> 00:05:08,210
And we saw that, I think, at the very beginning, on the first day, in the instant gratification part,

84
00:05:08,210 --> 00:05:14,770
when we saw OpenAI's OSS model thinking things through before it gave its reply, and that

85
00:05:14,770 --> 00:05:16,690
is what a reasoning model is.

86
00:05:16,690 --> 00:05:22,370
It does the thought process first. And now some of the most modern models are what's known as

87
00:05:22,410 --> 00:05:23,290
hybrid models.

88
00:05:23,290 --> 00:05:24,330
But that's really

89
00:05:24,370 --> 00:05:29,890
just a variation on the reasoning or thinking model that's able to decide how much thinking

90
00:05:29,890 --> 00:05:30,730
it does.

91
00:05:30,890 --> 00:05:34,570
And in some modes it's much more similar to just being a chat model.

92
00:05:34,570 --> 00:05:36,850
It hardly does any reasoning at all.

93
00:05:36,850 --> 00:05:42,810
And in others it will go ahead and reason quite a lot, and it decides how much to reason based on the

94
00:05:42,810 --> 00:05:49,090
question you ask: how much of a puzzle it is. Or if it's just in a chat mode, kind of just

95
00:05:49,090 --> 00:05:52,740
a simple "hi there", then it won't bother reasoning about the answer to that.

96
00:05:52,780 --> 00:05:57,380
And so this kind of model that's able to decide how much thinking to do is what's known as a hybrid

97
00:05:57,380 --> 00:05:57,860
model.

98
00:05:57,860 --> 00:06:00,060
And Gemini 2.5 Pro is a hybrid model.

99
00:06:00,060 --> 00:06:02,740
So is Grok 4, and so is GPT-5.

100
00:06:02,780 --> 00:06:06,860
These latest models are all examples of hybrid models.

101
00:06:07,100 --> 00:06:11,940
And then with the latest version of the open-source model Qwen, they have both a hybrid model and

102
00:06:11,940 --> 00:06:17,260
they have models which are just chat or just reasoning, in case you want to select that, if you just

103
00:06:17,260 --> 00:06:19,260
want one in a chat mode.

104
00:06:19,420 --> 00:06:25,340
And the amount of reasoning that a reasoning model does is sometimes called its reasoning budget or

105
00:06:25,340 --> 00:06:26,980
its reasoning effort.

106
00:06:26,980 --> 00:06:30,140
And there's a technique called budget forcing.

107
00:06:30,260 --> 00:06:35,500
That's when you make a reasoning model think longer: you make it do more reasoning.

108
00:06:35,500 --> 00:06:38,180
And there are various tricks to achieving this.

109
00:06:38,380 --> 00:06:43,500
And, quite remarkably, many of them are hacky and hokey.

110
00:06:43,700 --> 00:06:48,380
And I think when I explain this to you, you'll be like, really?

111
00:06:48,620 --> 00:06:53,030
But it turns out that there's a great paper called S1 that you can Google.

112
00:06:53,030 --> 00:06:55,310
I'll put a link in the resources.

113
00:06:55,470 --> 00:07:00,390
S1 explains, and this was a discovery from January of 2025,

114
00:07:00,630 --> 00:07:05,470
that what you could do, when a model comes up with its thinking

115
00:07:05,510 --> 00:07:12,270
trace, if you wanted it to do some more thinking, not just to get to its conclusion, but to

116
00:07:12,310 --> 00:07:18,510
go back and do more thinking, there was this very complicated mathematical trick that involved some

117
00:07:18,510 --> 00:07:20,390
really complex calculus.

118
00:07:20,950 --> 00:07:21,910
No it didn't.

119
00:07:22,470 --> 00:07:27,150
It involved just adding the word "wait" into the thinking trace.

120
00:07:27,150 --> 00:07:30,830
So it's come up with something, it said something like, I should consider blah, blah, blah, blah

121
00:07:30,830 --> 00:07:31,110
blah.

122
00:07:31,310 --> 00:07:34,950
And then you just insert in this sequence the word "wait".

123
00:07:35,430 --> 00:07:41,390
And by virtue of doing that, it kind of continues that sequence as the most likely words to come after

124
00:07:41,390 --> 00:07:41,910
"wait".

125
00:07:41,910 --> 00:07:44,950
And it says like, wait, I should rethink this, am I sure?

126
00:07:44,990 --> 00:07:46,270
Let me reconsider.

127
00:07:46,630 --> 00:07:54,480
And so these scientists discovered that simply adding the word "wait" periodically into

128
00:07:54,480 --> 00:08:00,880
the sequence as you're making it predict the next tokens, causes it to reflect on its reasoning and

129
00:08:00,880 --> 00:08:07,320
reason a bit deeper, and challenge itself and weigh up the arguments that it's made and consider whether

130
00:08:07,320 --> 00:08:08,320
they're still accurate.

131
00:08:09,040 --> 00:08:12,920
And so just adding the word "wait" got better outcomes.

132
00:08:12,920 --> 00:08:15,160
And that technique is known as budget forcing.

133
00:08:15,160 --> 00:08:17,360
And there are sort of various other ways of doing it.

134
00:08:17,360 --> 00:08:19,240
But that's the best-known one.

135
00:08:19,440 --> 00:08:23,840
And the first time you hear that, you're like, surely it's got to be something more sophisticated

136
00:08:23,840 --> 00:08:24,520
than that.

137
00:08:24,560 --> 00:08:26,400
Well, it's not. Read the paper and you'll see.

138
00:08:26,440 --> 00:08:33,160
And so, generally speaking, reasoning models with a high reasoning budget perform better in all of

139
00:08:33,160 --> 00:08:36,240
the different benchmarks, really, almost across the board.

140
00:08:36,360 --> 00:08:40,400
And we'll be looking at benchmarks and leaderboards and everything else in week four.

141
00:08:40,400 --> 00:08:41,840
So there'll be plenty of time to do that.

142
00:08:41,840 --> 00:08:44,680
But generally they do very well, these reasoning models.

143
00:08:44,680 --> 00:08:48,570
And so you might be saying to yourself, why don't we always use a reasoning model?

144
00:08:48,570 --> 00:08:49,970
Why even have the hybrids?

145
00:08:49,970 --> 00:08:51,530
Why have chat at all?

146
00:08:51,730 --> 00:08:53,210
And there are some reasons.

147
00:08:53,210 --> 00:08:56,970
It's not always better. Reasoning models are better for problem solving,

148
00:08:57,170 --> 00:08:59,330
particularly if you don't mind waiting a bit.

149
00:08:59,370 --> 00:09:04,850
They will always perform better at puzzles, for sure, and they're just generally more intelligent,

150
00:09:04,890 --> 00:09:10,730
like they score higher on any of the kind of intelligence-related tests. Chat models,

151
00:09:10,730 --> 00:09:15,010
they are, of course, faster because they don't need to produce all of this reasoning trace.

152
00:09:15,330 --> 00:09:20,690
So they're better for interactive use cases. If you want to have a chat, then the chat models are better,

153
00:09:20,690 --> 00:09:21,330
shockingly.

154
00:09:21,490 --> 00:09:23,690
So they are better in that use case.

155
00:09:23,690 --> 00:09:27,810
They are obviously faster and cheaper because they don't need to produce all of these

156
00:09:27,850 --> 00:09:30,970
bits in the middle, the thinking tokens.

157
00:09:31,010 --> 00:09:33,210
We'll talk more about the costs and so on later.

158
00:09:33,370 --> 00:09:35,690
But so they are better at that.

159
00:09:35,850 --> 00:09:41,370
And I put a question mark here because people aren't sure about this, but it does seem

160
00:09:41,370 --> 00:09:47,540
that chat models are often better at more creative content generation.

161
00:09:47,540 --> 00:09:52,420
If you want something to write an email for you, sometimes you find that the reasoning models

162
00:09:52,460 --> 00:09:53,780
kind of overthink.

163
00:09:53,780 --> 00:10:01,020
Perhaps what they come up with is quite cold and analytical, whereas chat models tend to produce

164
00:10:01,060 --> 00:10:04,100
more fluid content.

165
00:10:04,460 --> 00:10:05,820
But this sounds hand-wavy.

166
00:10:05,820 --> 00:10:08,140
There aren't really good metrics on this.

167
00:10:08,180 --> 00:10:11,660
It's more of an anecdotal way that people think about it.

168
00:10:11,660 --> 00:10:14,380
You should try it yourself and see if you agree with that.

169
00:10:14,780 --> 00:10:21,980
And a base model is better in the very specific case when you are trying to train a model to do something

170
00:10:21,980 --> 00:10:26,940
different, to give it a new skill, which is something we'll be doing later in the course.

171
00:10:26,940 --> 00:10:30,180
And when you're doing that, it's better to start with a base model.

172
00:10:30,180 --> 00:10:34,460
You don't necessarily want to be in a chat construct or in a reasoning construct.

173
00:10:34,460 --> 00:10:37,700
You want the opportunity to train it to have some different construct.

174
00:10:37,700 --> 00:10:40,940
And in that case, it's better to start with a base model.

175
00:10:40,940 --> 00:10:43,980
So that's when you use each of those three flavors of model.