1
00:00:00,760 --> 00:00:01,360
Okay.

2
00:00:01,400 --> 00:00:07,040
And so what we have now is the worker node that is defined right here.

3
00:00:07,040 --> 00:00:08,280
And it's quite long.

4
00:00:08,600 --> 00:00:11,720
Um, and uh, it doesn't even fit on one screen.

5
00:00:12,240 --> 00:00:16,480
I think I expanded the font size a bit so that, uh, but that does have that drawback.

6
00:00:16,480 --> 00:00:20,360
Maybe I will make it a little bit smaller for a second, and we'll make it bigger again in a minute.

7
00:00:20,560 --> 00:00:24,000
Uh, go down a little bit so you can at least see it all in one screen.

8
00:00:24,280 --> 00:00:28,640
Uh, so this here is the is the, uh, is worker.

9
00:00:29,000 --> 00:00:35,600
So it's got a pretty meaty system message, which I have built up based on my experiments.

10
00:00:35,600 --> 00:00:37,720
And you will need to keep doing so too.

11
00:00:38,080 --> 00:00:41,400
I added in here the current date and time.

12
00:00:41,520 --> 00:00:46,960
I actually, for a bit made another tool to to come up with the current date and time, but I realized

13
00:00:46,960 --> 00:00:51,920
that then I had to come in here and prompt it to be sure to use that tool.

14
00:00:51,920 --> 00:00:53,880
And then I thought to myself, okay, this is silly.

15
00:00:54,400 --> 00:00:59,400
I always want it to know the current date and time I put that in as a tool, and then I'm telling it

16
00:00:59,400 --> 00:01:01,920
in the prompt to be sure to use that tool.

17
00:01:01,920 --> 00:01:03,360
And that's just such a waste.

18
00:01:03,360 --> 00:01:06,390
This is an example of something that doesn't need to be a tool.

19
00:01:06,390 --> 00:01:09,030
It needs to be inserted in the prompt every time.

20
00:01:09,030 --> 00:01:12,990
It is what we'll call a resource when we when we get to MCP time.

21
00:01:12,990 --> 00:01:16,710
But it's like it's like a piece of information that we want to add in there.

22
00:01:16,710 --> 00:01:20,790
So I just shove in the current date and time in the prompt.

23
00:01:21,150 --> 00:01:28,750
I also, I had an interesting thing happen when I was using the Python tool that I saw that for some

24
00:01:28,750 --> 00:01:33,990
reason, GPT four and mini misunderstood how the tool worked and thought that the code it would put

25
00:01:33,990 --> 00:01:39,670
in there, whatever that evaluates to, would be what it would receive in the response from the tool,

26
00:01:39,670 --> 00:01:44,390
where in fact, you had to put in a print statement in that code if you want to actually get back some

27
00:01:44,390 --> 00:01:45,030
text.

28
00:01:45,190 --> 00:01:46,310
And it didn't do that.

29
00:01:46,510 --> 00:01:51,070
Uh, and so as a result, it was going backwards and forwards, trying to rerun again and again and

30
00:01:51,070 --> 00:01:53,470
not getting any results and seeming confused.

31
00:01:53,790 --> 00:01:59,030
Uh, and uh, so I added in this line, you have a tool to run Python code, but note that you need

32
00:01:59,030 --> 00:02:02,230
to include a print statement if you want to receive output.

33
00:02:02,270 --> 00:02:04,190
And then it started to work immediately.

34
00:02:04,190 --> 00:02:07,870
So maybe that's just a GPT four a mini anomaly.

35
00:02:07,910 --> 00:02:10,220
Maybe if I use the big GPT four, it would be fine.

36
00:02:10,220 --> 00:02:12,100
It would have figured it out or it would just know.

37
00:02:13,060 --> 00:02:17,580
But or maybe the way that the tool, the description in the tool isn't clear enough.

38
00:02:17,580 --> 00:02:21,580
And I could always wrap it in a, in a, in another tool that would make that clearer.

39
00:02:21,820 --> 00:02:25,620
Um, but uh, but regardless, this, this fixed it.

40
00:02:25,620 --> 00:02:30,460
And this is such a good example of the experimental nature of this kind of work that you need to be

41
00:02:30,500 --> 00:02:32,820
able to come in and just shove something like that in the prompt.

42
00:02:32,860 --> 00:02:36,620
Maybe this won't be needed for what you do, but it was needed for me.

43
00:02:37,140 --> 00:02:41,660
Um, and there was another example of, of where I came across something similar.

44
00:02:41,900 --> 00:02:47,460
Um, and, uh, yeah, you'll probably find it yourself if you if you look through this, you'll see

45
00:02:47,500 --> 00:02:52,460
other times when I had to tweak things, uh, to, to handle some cases.

46
00:02:52,820 --> 00:03:00,260
Um, so anyway, other than that, this is all identical to what we have in Jupyter in the, uh, notebook.

47
00:03:00,500 --> 00:03:02,300
Let me expand here.

48
00:03:03,340 --> 00:03:04,980
Um, okay.

49
00:03:05,020 --> 00:03:11,580
And now this is the router, the worker router, the decision, the condition on whether or not the

50
00:03:11,620 --> 00:03:14,320
worker should go to tools or to evaluate it.

51
00:03:14,680 --> 00:03:20,720
This is the utility method that converts our messages into a nice user assistant.

52
00:03:20,720 --> 00:03:21,680
User assistant.

53
00:03:22,160 --> 00:03:24,400
And finally not not not finally.

54
00:03:24,400 --> 00:03:29,480
But the big the big guy is the other node the evaluator.

55
00:03:29,720 --> 00:03:35,680
And this has again, a lot of pretty substantive prompting that I have tweaked over time.

56
00:03:35,920 --> 00:03:38,880
That describes you're an evaluator, what you're there to do.

57
00:03:39,240 --> 00:03:44,480
Uh, it's got the formatted conversation, the success criteria, the last response.

58
00:03:44,640 --> 00:03:47,520
And then, uh, responding with your feedback.

59
00:03:47,880 --> 00:03:55,600
Uh, and, um, I remember now, I added in this, I noticed that the evaluator was quite harsh and

60
00:03:55,600 --> 00:04:01,480
never seemed to sort of trust that what the, um, assistant said it had done, what the worker said

61
00:04:01,480 --> 00:04:02,240
it did.

62
00:04:02,240 --> 00:04:05,560
The evaluator always said, you know, I don't know if this actually happened.

63
00:04:05,680 --> 00:04:09,760
So I put I put in here the assistant has access to a tool to write files.

64
00:04:09,760 --> 00:04:13,640
If the assistant says they've written a file, then you can assume that they've done so.

65
00:04:13,920 --> 00:04:18,350
Overall, you should give the assistant the benefit of the doubt if they say they've done something,

66
00:04:18,350 --> 00:04:20,830
but you should reject if you feel that more work is needed.

67
00:04:20,830 --> 00:04:22,390
So you know, again.

68
00:04:22,430 --> 00:04:26,270
And maybe this time I've gone too far and I'm going to make it accept too many times.

69
00:04:26,270 --> 00:04:32,870
It's something that requires constant tweaking and refinement as you find cases that are that are that

70
00:04:32,870 --> 00:04:33,870
work or don't work.

71
00:04:33,910 --> 00:04:37,830
And of course, it's always good to add an example to give real concrete examples.

72
00:04:38,110 --> 00:04:43,110
And there's some trade off, because the more information you put in here, uh, the, the harder it

73
00:04:43,110 --> 00:04:48,750
is for Gpt4 to be coherent because it's just got a lot more information to absorb.

74
00:04:48,950 --> 00:04:55,150
Uh, but, um, yeah, it's definitely it's definitely something that I found that giving these kinds

75
00:04:55,150 --> 00:04:58,550
of hints and examples has helped me get better outcomes.

76
00:04:58,550 --> 00:05:00,030
And you will need to experiment.

77
00:05:00,830 --> 00:05:01,350
Okay.

78
00:05:01,390 --> 00:05:07,150
So that is the evaluator we've then got at the end of it, I'll just mention again, remember at the

79
00:05:07,150 --> 00:05:11,510
end of the evaluator, we, we we evoke the LLM with output.

80
00:05:11,550 --> 00:05:16,590
And because it's, it's one that has a structured outputs which is what that with output means, uh,

81
00:05:16,630 --> 00:05:23,340
it returns back an object, an eval result object populated, and then we pluck out the fields of that

82
00:05:23,340 --> 00:05:30,700
object, and we populate them in our new state, and we return the new state as all nodes take an old

83
00:05:30,700 --> 00:05:36,380
state, return a new state, and then this route based on evaluation, this is again another of these

84
00:05:36,380 --> 00:05:37,580
condition branches.

85
00:05:37,740 --> 00:05:44,500
We take, uh, we see whether either the success criteria is met or user input is needed.

86
00:05:44,500 --> 00:05:49,860
In either of those situations we need to end, but otherwise we're going to bounce back to the worker

87
00:05:49,900 --> 00:05:51,340
to give it another shot.

88
00:05:52,060 --> 00:05:52,420
Okay.

89
00:05:52,460 --> 00:05:53,820
And then here is the build graph.

90
00:05:53,820 --> 00:05:59,620
And this after I made such a song and dance about this in the first couple of, of uh, of days of this

91
00:05:59,620 --> 00:06:00,140
week.

92
00:06:00,140 --> 00:06:02,700
Now this is like the easy part of the whole thing.

93
00:06:02,940 --> 00:06:09,340
We create our graph builder for, for the state of the class that we have created.

94
00:06:09,580 --> 00:06:14,740
And then we add our worker, we add our tools, we add our evaluator, the three nodes.

95
00:06:15,020 --> 00:06:17,380
We add our our edges.

96
00:06:17,820 --> 00:06:19,980
Uh, um, conditional edge.

97
00:06:20,220 --> 00:06:21,380
This is not a conditional.

98
00:06:21,420 --> 00:06:26,130
This is the if a tool is run, it needs to come back to the worker a conditional edge to choose between

99
00:06:26,130 --> 00:06:29,610
the worker and ending and the start going into the worker.

100
00:06:29,770 --> 00:06:32,130
And then we compile our graph.

101
00:06:33,050 --> 00:06:33,690
Okay.

102
00:06:33,810 --> 00:06:40,770
And then I've got this run super step function which is the one that actually invokes the graph.

103
00:06:41,050 --> 00:06:44,650
So run super step then is pretty straightforward.

104
00:06:44,890 --> 00:06:51,290
Uh, I've got uh, the uh, the random Uuid I've set as, as an instance variable sidekick ID.

105
00:06:51,290 --> 00:06:53,130
So I set up the config this way.

106
00:06:53,330 --> 00:06:57,930
And then the state the initial state that we will use to invoke our graph.

107
00:06:58,130 --> 00:07:01,850
It is the message from the user for the success criteria.

108
00:07:01,850 --> 00:07:06,730
It's either the success criteria that's passed in or if that's if that's not set, if it's null or an

109
00:07:06,730 --> 00:07:08,970
empty string, then I use this default.

110
00:07:08,970 --> 00:07:10,810
The answer should be clear and accurate.

111
00:07:11,170 --> 00:07:13,410
Um, feedback on work is set to none.

112
00:07:13,410 --> 00:07:15,210
And these two are both false initially.

113
00:07:15,210 --> 00:07:19,210
And then we call our graph a invoke to kick it off.

114
00:07:19,530 --> 00:07:26,090
And then we pluck back the user's thing, the user's message, the reply, and the feedback from it.

115
00:07:26,250 --> 00:07:29,590
And we construct our history and that is what we reply.

116
00:07:30,590 --> 00:07:35,670
And then I've also got this at the end like a clean up function.

117
00:07:35,910 --> 00:07:40,510
And as I say, it's kind of as you'll see this gets called when resources get cleaned up.

118
00:07:40,510 --> 00:07:43,670
And I'm not 100% sure if this is always cleaning everything up.

119
00:07:43,670 --> 00:07:46,470
And so over time I will I will try and keep an eye on this.

120
00:07:46,470 --> 00:07:52,190
And I may refine this as as we go, as I get, get to see whether this is properly cleaning things up.

121
00:07:52,190 --> 00:07:55,630
So if it looks a bit different when you're looking at this code, then I might have found a better way

122
00:07:55,630 --> 00:07:58,830
to do this that's more reliably cleaning resources.

123
00:07:59,070 --> 00:08:01,270
Um, after after they've been used.

124
00:08:01,710 --> 00:08:07,310
And I'm talking particularly about, of course, about the browser that we spawn this headless browser.

125
00:08:07,310 --> 00:08:12,790
And the thing to be aware of is, okay, once we've done that, if we then kick off a new sidekick process,

126
00:08:12,830 --> 00:08:14,350
it spawns another browser.

127
00:08:14,390 --> 00:08:16,070
What have we done to that first browser?

128
00:08:16,070 --> 00:08:16,950
Have we closed it?

129
00:08:16,990 --> 00:08:20,070
Have we quit the browser that's running behind the scenes?

130
00:08:20,350 --> 00:08:23,230
Uh, or running in front of the scenes as it would happen?

131
00:08:23,430 --> 00:08:29,430
Uh, so, um, yeah, I've, uh, put this in to do that.

132
00:08:29,830 --> 00:08:30,310
Okay.

133
00:08:30,510 --> 00:08:33,310
And now on to the user interface, the app.