1
00:00:00,120 --> 00:00:01,960
Okay, fine.

2
00:00:01,960 --> 00:00:03,400
Welcome to week eight.

3
00:00:03,440 --> 00:00:04,520
Day three.

4
00:00:05,120 --> 00:00:10,360
Today is about leveling up yet again with a new skill.

5
00:00:10,600 --> 00:00:12,280
Structured outputs.

6
00:00:12,280 --> 00:00:18,960
Such an important skill in the way that LLMs work, and particularly with

7
00:00:19,000 --> 00:00:24,760
agentic workflows. Already, of course, you can use frontier and open-source models, you can use tools

8
00:00:24,760 --> 00:00:29,560
and RAG, you can fine-tune both frontier models and open-source models.

9
00:00:29,560 --> 00:00:33,960
But also, importantly, you can now deploy a fine-tuned model with Modal.

10
00:00:34,240 --> 00:00:42,000
You can use RAG with frontier models, with a big setup with 800,000 similar items, and add

11
00:00:42,040 --> 00:00:45,920
together an ensemble model that combines the best of multiple models.

12
00:00:46,080 --> 00:00:46,520
Okay.

13
00:00:46,560 --> 00:00:50,560
Today is going to be about structured outputs.

14
00:00:50,800 --> 00:00:55,720
And importantly, there are many, many applications of

15
00:00:55,720 --> 00:00:56,360
structured outputs.

16
00:00:56,360 --> 00:01:03,950
One of them that we will do today is using it as a way to parse unstructured data and turn unstructured

17
00:01:03,950 --> 00:01:08,510
data into a structured form, which is a business task, a commercial task that you come up against

18
00:01:08,510 --> 00:01:09,430
all the time.

19
00:01:09,470 --> 00:01:11,790
And LLMs are just great at it.

20
00:01:12,150 --> 00:01:13,070
Let's get into it.

21
00:01:13,110 --> 00:01:15,630
So structured outputs then.

22
00:01:15,750 --> 00:01:20,990
So there's a way that it feels like structured outputs work.

23
00:01:20,990 --> 00:01:22,510
And then there's what they actually are.

24
00:01:22,550 --> 00:01:23,790
It's a bit like tool calling.

25
00:01:23,790 --> 00:01:29,150
Tool calling feels like the LLM is suddenly able to make calls to tools, but in practice it's

26
00:01:29,150 --> 00:01:31,150
something a little bit more pedestrian.

27
00:01:31,230 --> 00:01:38,910
So the way it feels with structured outputs is that you can define a Python class, and you have to

28
00:01:38,910 --> 00:01:45,750
make it a subclass of this thing called BaseModel, which is a Pydantic class,

29
00:01:45,750 --> 00:01:48,110
which we've touched on a couple of times.

30
00:01:48,110 --> 00:01:48,510
So.

31
00:01:48,510 --> 00:01:52,550
So you make a Python class, and it's a subclass of something called BaseModel.

32
00:01:52,550 --> 00:01:58,710
And when you're doing that, it means that you define each of the attributes of this class fairly carefully.

33
00:01:59,030 --> 00:02:00,910
So you define a class that way.

34
00:02:01,350 --> 00:02:08,670
And you can tell OpenAI (OpenAI invented this, and now most of the other providers support it as well,

35
00:02:08,670 --> 00:02:09,390
but not all).

36
00:02:09,590 --> 00:02:15,950
You can tell a provider: hey, when you do what I'm about to ask you to do in my

37
00:02:15,950 --> 00:02:19,830
prompting, when you respond, don't respond with natural language.

38
00:02:19,830 --> 00:02:25,910
Respond with an object, a populated object, a Python object with the fields filled in.

39
00:02:26,310 --> 00:02:27,470
That's weird, right?

40
00:02:27,550 --> 00:02:29,390
It's like it's a natural language model.

41
00:02:29,430 --> 00:02:35,550
How is it suddenly able to not predict tokens, but instead craft an object, a Python object, and

42
00:02:35,550 --> 00:02:36,750
that's what you get back?

43
00:02:36,830 --> 00:02:38,550
Well, that's the magic of structured outputs.

44
00:02:38,550 --> 00:02:39,950
And that's how it feels.

45
00:02:39,950 --> 00:02:41,110
And that's really cool.

46
00:02:41,110 --> 00:02:42,430
And it allows you to do so much.

47
00:02:42,470 --> 00:02:49,950
It allows you to put structure around these very flexible calls to LLMs.

48
00:02:49,990 --> 00:02:52,550
It allows you to constrain what you get back.

49
00:02:52,550 --> 00:02:55,070
It allows you to then take actions based on it.

50
00:02:55,070 --> 00:03:01,910
So it allows you to put so much organization and structure around how you orchestrate between

51
00:03:01,910 --> 00:03:02,950
multiple LLMs.

52
00:03:03,270 --> 00:03:06,710
But, yeah, this is how it feels.

53
00:03:06,830 --> 00:03:10,540
But the reality, as is often the case, is a little bit more pedestrian.

54
00:03:10,540 --> 00:03:11,780
So here's the reality.

55
00:03:11,780 --> 00:03:18,580
The reason why you make a subclass of a Pydantic BaseModel is that what Pydantic is all about is defining

56
00:03:18,620 --> 00:03:20,540
a JSON schema.

57
00:03:20,940 --> 00:03:26,740
If you have something that's a subclass of BaseModel, you can convert it to JSON and

58
00:03:26,740 --> 00:03:29,380
you can convert the JSON back to the object again.

59
00:03:29,380 --> 00:03:37,300
So the Pydantic class that you set up is used to generate a JSON schema, something which says that

60
00:03:37,300 --> 00:03:41,140
the JSON must conform to the following spec and it lays it out.

61
00:03:41,140 --> 00:03:42,780
It needs to have these fields.

62
00:03:42,780 --> 00:03:47,340
And you put a natural language description by your fields, and that gets included in the JSON

63
00:03:47,340 --> 00:03:47,980
schema.
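
A minimal sketch of that round trip, assuming Pydantic v2 is installed. The `Deal` class, its fields, and the sample JSON string are hypothetical and only for illustration; no model is called here, and the hard-coded string stands in for what a model might return.

```python
from pydantic import BaseModel, Field


# Hypothetical example class; the name and fields are illustrative only.
class Deal(BaseModel):
    product_description: str = Field(description="A short summary of the product")
    price: float = Field(description="The price of the product in dollars")


# The class can produce a JSON schema, including the field descriptions.
schema = Deal.model_json_schema()
print(schema["properties"]["price"]["description"])

# Stand-in for JSON text a model might return (an assumption, not real output).
raw_json = '{"product_description": "Wireless headphones", "price": 59.99}'

# JSON text -> populated Python object, and back to JSON text again.
deal = Deal.model_validate_json(raw_json)
print(deal.price)
print(deal.model_dump_json())
```

If the JSON text is missing a field or has the wrong type, `model_validate_json` raises a validation error instead of returning an object.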

64
00:03:48,460 --> 00:03:54,620
And OpenAI just shoves something extra in the system prompt that says, hey, when you're generating

65
00:03:54,620 --> 00:04:00,340
your response, you have to respond in JSON and it needs to conform to this schema, please.

66
00:04:00,380 --> 00:04:02,540
And so the model just generates tokens.

67
00:04:02,540 --> 00:04:03,980
All it does is generate tokens.

68
00:04:03,980 --> 00:04:05,820
And the tokens it generates are JSON.

69
00:04:05,820 --> 00:04:09,380
And models love generating JSON because they're trained on tons of it.

70
00:04:09,380 --> 00:04:12,500
And it conforms to the schema because it's in the system prompt.

71
00:04:12,500 --> 00:04:20,140
And then the OpenAI Python client library goes and takes that JSON and converts it to an instance

72
00:04:20,140 --> 00:04:24,380
of the class that you want populated with those JSON fields.

73
00:04:24,500 --> 00:04:28,860
And lo and behold, you get back, apparently, a Python object.

74
00:04:29,260 --> 00:04:31,100
That's how it actually works.

75
00:04:31,100 --> 00:04:36,260
It's not particularly magical, but it gives you this feeling that you can talk Python objects with

76
00:04:36,260 --> 00:04:39,060
models instead of just talking natural language.

77
00:04:39,100 --> 00:04:44,260
Except there is one thing about it that actually is quite magical.

78
00:04:44,420 --> 00:04:49,420
There's one thing about it that is like an invention that is like, wow, that's clever.

79
00:04:49,620 --> 00:04:53,900
One thing about structured outputs that, even if you knew about everything so far, you may not

80
00:04:53,900 --> 00:04:55,860
have known: this little feature.

81
00:04:55,900 --> 00:04:59,540
At least this is how it's implemented with OpenAI and with some other providers.

82
00:04:59,540 --> 00:05:01,460
And it's really, really clever.

83
00:05:01,460 --> 00:05:05,740
And to tell you about it, I need to remind you of the way that inference works.

84
00:05:05,860 --> 00:05:11,660
Remember, when a model generates the next token, it doesn't literally generate a token.

85
00:05:11,660 --> 00:05:13,740
That's not what comes out of the neural network.

86
00:05:13,740 --> 00:05:20,130
What comes out after you apply the softmax function is a set of probabilities, all the probabilities

87
00:05:20,130 --> 00:05:22,330
of all the possible next tokens.

88
00:05:22,330 --> 00:05:29,730
And then there is some code which says okay, I'll pick the most likely or okay, I will sample from

89
00:05:29,730 --> 00:05:31,970
these according to their probability distributions.
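
A small sketch of those two selection strategies, using only the standard library. The three-word vocabulary and the raw scores are made up for illustration; a real model produces one score per token across a vocabulary of many thousands of tokens.

```python
import math
import random


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    highest = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - highest) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Toy vocabulary and raw scores (illustrative values only).
vocab = ["cat", "dog", "fish"]
logits = [2.0, 1.0, 0.1]
probs = softmax(logits)

# Option 1: pick the most likely token.
greedy_token = vocab[probs.index(max(probs))]

# Option 2: sample a token according to the probabilities.
rng = random.Random(0)  # seeded so the run is repeatable
sampled_token = rng.choices(vocab, weights=probs, k=1)[0]

print(probs, greedy_token, sampled_token)
```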

90
00:05:32,570 --> 00:05:34,250
Well, there is a trick.

91
00:05:34,410 --> 00:05:39,610
And it's a trick that's called inference-time constrained decoding.

92
00:05:39,650 --> 00:05:41,450
That's the technical name for it.

93
00:05:41,450 --> 00:05:44,290
And even from those words you probably already know what I'm going to say.

94
00:05:44,770 --> 00:05:50,970
Basically, when the model generates this probability distribution, OpenAI have written some Python

95
00:05:50,970 --> 00:05:53,170
code that says, hang on a second.

96
00:05:53,450 --> 00:05:55,370
This model needs to conform.

97
00:05:55,370 --> 00:05:59,530
This needs to generate something which conforms to a particular JSON spec.

98
00:05:59,730 --> 00:06:05,250
Let me look through all the possible next tokens and see if there are any tokens here that would break

99
00:06:05,290 --> 00:06:06,530
the spec, a token

100
00:06:06,530 --> 00:06:12,970
that would mean that what comes out wouldn't conform to the spec, and I will simply zero out the probability

101
00:06:12,970 --> 00:06:17,730
so that the model, whatever happens, it's not going to generate a token that will break the JSON spec.

102
00:06:17,770 --> 00:06:19,930
It's only going to be able to generate.

103
00:06:19,930 --> 00:06:25,930
I'm only going to keep probabilities for things which would stay in line with the spec.

104
00:06:26,330 --> 00:06:27,970
And then we'll generate...

105
00:06:28,170 --> 00:06:31,010
Then we'll select our next token, and that's the trick.

106
00:06:31,050 --> 00:06:31,850
And isn't that clever?

107
00:06:31,850 --> 00:06:32,770
It's so simple.

108
00:06:32,770 --> 00:06:35,730
Once you've heard it you're like, I would have thought of that, but you didn't think of it.

109
00:06:36,130 --> 00:06:39,130
But once you've heard it, it's like, of course, of course.

110
00:06:39,130 --> 00:06:46,170
And that means that actually the model is constrained, that it has to generate text that conforms to

111
00:06:46,210 --> 00:06:46,810
the spec.
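
A toy sketch of the masking idea, using only the standard library. Everything here is an illustrative assumption: the tiny vocabulary, the hard-coded rule that the output must look like `{"price":<digits>}`, and the `fake_model_logits` stand-in for a neural network. It is not OpenAI's actual code; production systems derive the allowed tokens from the JSON schema itself rather than from a hand-written function.

```python
import json
import math

# Toy vocabulary: some JSON pieces plus some chatty tokens.
VOCAB = ["{", "}", '"price"', ":", "1", "2", "Sure", "!"]


def softmax(logits):
    highest = max(logits)
    exps = [math.exp(x - highest) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fake_model_logits(prefix):
    """Hypothetical stand-in for a model: it always prefers chatty tokens."""
    return [0.5, 1.8, 0.2, 0.1, 1.5, 0.3, 3.0, 2.0]


def allowed_tokens(prefix):
    """Hand-written rule for which tokens keep the output valid as {"price":<digits>}."""
    n = len(prefix)
    if n == 0:
        return {"{"}
    if n == 1:
        return {'"price"'}
    if n == 2:
        return {":"}
    if n == 3:
        return {"1", "2"}  # at least one digit is required
    if prefix[-1] == "}":
        return set()  # the object is closed, so we are done
    return {"1", "2", "}"}  # more digits, or close the object


def constrained_greedy_decode(max_tokens=10):
    prefix = []
    for _ in range(max_tokens):  # cap the length in case the output never closes
        allowed = allowed_tokens(prefix)
        if not allowed:
            break
        probs = softmax(fake_model_logits(prefix))
        # Zero out every token that would break the spec, then renormalize.
        masked = [p if tok in allowed else 0.0 for tok, p in zip(VOCAB, probs)]
        total = sum(masked)
        masked = [p / total for p in masked]
        prefix.append(VOCAB[masked.index(max(masked))])
    return "".join(prefix)


result = constrained_greedy_decode()
print(result)              # {"price":1}
print(json.loads(result))  # {'price': 1}
```

Without the mask, this toy model's top choice at every step would be `Sure`; with the mask, only spec-conforming tokens can ever be selected.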

112
00:06:46,930 --> 00:06:50,570
The worst case is that it never generates a conclusion.

113
00:06:50,570 --> 00:06:51,890
It sort of keeps going or something.

114
00:06:51,890 --> 00:06:54,130
So there's still some failure scenarios.

115
00:06:54,130 --> 00:07:01,290
But aside from that, when you're using OpenAI, it will always generate an output that is consistent

116
00:07:01,290 --> 00:07:03,650
with the Pydantic JSON schema.

117
00:07:03,650 --> 00:07:10,170
And so it is a way to guarantee that the response from the model will populate your Python object.

118
00:07:10,170 --> 00:07:11,610
And that's really clever.

119
00:07:11,650 --> 00:07:11,930
Okay.

120
00:07:11,970 --> 00:07:13,690
And with that we're going to go straight to the lab.

121
00:07:13,690 --> 00:07:16,610
We're going to get back to coding.

122
00:07:16,610 --> 00:07:20,770
And we're going to focus on the scanner agent. I will see you in Cursor.