1
00:00:00,080 --> 00:00:03,800
I know it feels like we've been on this slide forever, but it's such an important slide.

2
00:00:03,840 --> 00:00:05,360
Ten juicy techniques.

3
00:00:05,520 --> 00:00:06,440
Number six.

4
00:00:06,720 --> 00:00:08,240
Query expansion.

5
00:00:08,280 --> 00:00:10,080
This is one that a lot of people use.

6
00:00:10,080 --> 00:00:11,000
This is just saying.

7
00:00:11,040 --> 00:00:11,640
Oh, okay.

8
00:00:11,680 --> 00:00:16,240
Look, we could add to step five.

9
00:00:16,400 --> 00:00:19,120
Don't just ask the model to write one query.

10
00:00:19,160 --> 00:00:23,880
Ask it to write a bunch of them, a bunch of things that we could call our database lookup for.

11
00:00:23,920 --> 00:00:28,360
Let's have three different RAG queries and collect similar documents for each.

12
00:00:28,400 --> 00:00:30,480
That's called query expansion.

13
00:00:30,640 --> 00:00:31,640
Super common.

14
00:00:32,000 --> 00:00:38,160
And then reranking. Reranking is saying we might have ended up with lots of different chunks from our

15
00:00:38,160 --> 00:00:42,680
data store, particularly if we've done number six, query expansion. We might have a lot.

16
00:00:42,720 --> 00:00:49,080
People like to say we have k times n: you have k, the amount that you

17
00:00:49,080 --> 00:00:54,120
retrieve every time, times n, the number of queries. You've got potentially a bunch of chunks.

18
00:00:54,320 --> 00:01:00,560
Well, we could then take all of these chunks and try and reorder them, with the ones that are most

19
00:01:00,560 --> 00:01:05,120
relevant for the question at the top and all the way down to the least relevant for the question.

20
00:01:05,120 --> 00:01:06,320
And how are we going to do that?

21
00:01:06,360 --> 00:01:08,000
Well, we could have an LLM do that.

22
00:01:08,000 --> 00:01:11,640
We could call an LLM and say, you're not responsible for answering the question.

23
00:01:11,640 --> 00:01:12,960
Someone else is going to do that.

24
00:01:13,000 --> 00:01:19,640
You're responsible simply for ordering these chunks in order of most important to least important.

25
00:01:19,640 --> 00:01:20,920
Do that step.

26
00:01:21,080 --> 00:01:26,520
And then potentially we might chop off some of the unimportant ones when we finally send it to an LLM

27
00:01:26,560 --> 00:01:28,360
to actually craft the answer.

28
00:01:28,520 --> 00:01:30,600
So that is reranking.

29
00:01:30,680 --> 00:01:32,320
Again, super popular.

30
00:01:32,320 --> 00:01:36,640
And you can imagine, particularly if you've done some of the previous steps, reranking can be really

31
00:01:36,640 --> 00:01:40,760
important to avoid polluting the context with tons and tons of chunks.

32
00:01:40,920 --> 00:01:41,520
Okay.

33
00:01:41,560 --> 00:01:45,280
Number eight, we're on the home stretch: hierarchical RAG.

34
00:01:45,560 --> 00:01:51,960
This is trying to fix a particular Achilles heel of RAG that I mentioned before, which is that RAG

35
00:01:51,960 --> 00:01:56,480
is really bad at answering questions which span lots of documents.

36
00:01:56,480 --> 00:01:58,080
It has a really hard time with that.

37
00:01:58,080 --> 00:02:03,830
And so there are some pretty simple hacks that try and get around this.

38
00:02:03,830 --> 00:02:10,750
And one of them is to summarize your knowledge base at various levels of roll up, like a hierarchy,

39
00:02:10,910 --> 00:02:14,910
have a kind of a summary of all products, a summary of all employees.

40
00:02:15,110 --> 00:02:21,830
And then when you do your RAG lookup, do a RAG lookup across the summaries first to get the kind of

41
00:02:21,870 --> 00:02:27,670
coarse-grained information, and then drill down to chunks that come from the fine-grained information.

42
00:02:27,670 --> 00:02:33,230
And that means that, for example, you could have an employee summary document that lists all employees

43
00:02:33,230 --> 00:02:36,510
and their salaries in one condensed document.

44
00:02:36,870 --> 00:02:43,750
And that way, if someone asks a question like how many employees have a salary of under X, then hopefully

45
00:02:43,750 --> 00:02:49,310
that chunk with this compressed data, this summarized data about salaries would be surfaced.

46
00:02:49,310 --> 00:02:54,030
And so the LLM would have a hope of answering it.

47
00:02:54,030 --> 00:02:56,590
Otherwise, without that, it's impossible to answer

48
00:02:56,630 --> 00:03:02,470
a "how many" kind of question, because it would require chunks from so much of the knowledge base.

49
00:03:02,750 --> 00:03:05,030
So that's a pro move: hierarchical RAG.

50
00:03:05,070 --> 00:03:11,310
And you can probably tell it is very hand-wavy, very hacky, because you could roll up in one way,

51
00:03:11,310 --> 00:03:13,870
but that might not be aligned with the question that someone asked.

52
00:03:13,870 --> 00:03:17,990
They might ask a question that's like, how many documents starting with the letter A do you have?

53
00:03:18,030 --> 00:03:20,310
And obviously you haven't figured that one out.

54
00:03:20,310 --> 00:03:24,190
You haven't put that in some hierarchy by alphabet.

55
00:03:24,190 --> 00:03:27,190
So, you know, it's always whack-a-mole.

56
00:03:27,190 --> 00:03:30,470
It's always someone asks a question that spans your documents.

57
00:03:30,470 --> 00:03:31,710
You don't have a good answer to it.

58
00:03:31,710 --> 00:03:36,390
So you think about a new hierarchical summary document that you could slip in.

59
00:03:36,390 --> 00:03:39,190
And that's the point of hierarchical RAG. Okay.

60
00:03:39,230 --> 00:03:42,790
And we end with two fancy ones that people love to talk about.

61
00:03:42,790 --> 00:03:45,030
One of them is graph RAG.

62
00:03:45,790 --> 00:03:46,910
Graph RAG.

63
00:03:47,350 --> 00:03:53,550
It actually sounds really, really fancy, but it's a fairly simple idea, which is that often

64
00:03:53,590 --> 00:03:56,790
the documents in your knowledge base aren't isolated.

65
00:03:56,790 --> 00:03:59,100
They have some relationship with each other.

66
00:03:59,100 --> 00:04:03,220
So, like, an employee might have a boss, a manager, who is another employee.

67
00:04:03,460 --> 00:04:09,340
So you could really think that your documents, or even your chunks, are frequently related

68
00:04:09,340 --> 00:04:10,660
to other chunks.

69
00:04:10,860 --> 00:04:17,060
And when you're doing your RAG query, if, for example, you remember there's metadata associated

70
00:04:17,060 --> 00:04:23,420
with each chunk, if we'd been careful in that metadata to specify other chunks that are related, which could

71
00:04:23,420 --> 00:04:28,020
be as simple as other chunks in the same document, or it could be something like the chunk associated

72
00:04:28,020 --> 00:04:29,460
with a boss or something like that.

73
00:04:29,700 --> 00:04:30,260
Then.

74
00:04:30,300 --> 00:04:36,820
Then when you look up that chunk, you could also shove into the context the related chunks that

75
00:04:36,820 --> 00:04:42,540
are kind of one hop or two hops away in terms of these relationships that you've recorded in

76
00:04:42,540 --> 00:04:43,420
the metadata.

77
00:04:43,740 --> 00:04:48,060
And of course, taking it a step further, as you probably guessed, I was going to say there are special

78
00:04:48,060 --> 00:04:55,900
databases called graph databases, like Neo4j and many others, that are specially designed to store

79
00:04:56,220 --> 00:04:59,140
information as nodes with edges.

80
00:04:59,180 --> 00:05:02,940
A way to think about things in terms of relationships between entities.

81
00:05:02,940 --> 00:05:07,980
And if you store it in a graph database like that, then it's even easier for you to find a chunk that's

82
00:05:07,980 --> 00:05:13,300
close to a vector, and then include that and include all the chunks that are one or two neighbors

83
00:05:13,300 --> 00:05:15,460
away in the graph database.

84
00:05:15,820 --> 00:05:17,300
So it's a trendy thing to do.

85
00:05:17,500 --> 00:05:20,100
It tends to work only in reasonably.

86
00:05:20,140 --> 00:05:25,340
It tends to work well only in quite specific situations where your data does have a lot of relationships

87
00:05:25,340 --> 00:05:26,300
between it.

88
00:05:26,300 --> 00:05:31,100
And I would argue that most of the time you can handle this just by using the metadata.

89
00:05:31,140 --> 00:05:37,740
The need for a graph database, trendy though they are, is not as huge unless you've very specifically

90
00:05:37,740 --> 00:05:42,740
got a problem that lends itself well to that kind of relationship-based database.

91
00:05:42,740 --> 00:05:46,220
So definitely one to watch and potentially experiment with.

92
00:05:46,380 --> 00:05:50,020
But yeah, it's quite a fancy one.

93
00:05:50,020 --> 00:05:53,580
And then last but not least is agentic RAG.

94
00:05:53,780 --> 00:05:54,940
You may have heard of it.

95
00:05:54,980 --> 00:05:56,260
What is this?

96
00:05:56,460 --> 00:05:59,660
Well, it's a different way of thinking about it.

97
00:05:59,700 --> 00:06:01,060
It's much the same stuff.

98
00:06:01,060 --> 00:06:07,500
But what it's saying is, look, we've designed this very linear idea: the user asks a question.

99
00:06:07,700 --> 00:06:10,180
Then we go and do a vector-based lookup.

100
00:06:10,180 --> 00:06:12,740
And we always shove that in the context.

101
00:06:12,740 --> 00:06:14,500
And then we call the LLM.

102
00:06:14,900 --> 00:06:19,380
Well, that's a bit, you know, 2024, whatever.

103
00:06:19,780 --> 00:06:24,180
You know, 2024 is ancient history these days.

104
00:06:24,340 --> 00:06:32,700
It's a bit 2024. Nowadays the way that we like to think about things is: let

105
00:06:32,700 --> 00:06:34,820
the LLM make the decisions.

106
00:06:34,900 --> 00:06:39,820
So instead of just doing a vector lookup and putting all that in the context, call an LLM and give

107
00:06:39,820 --> 00:06:45,260
it some tools, give it a tool that could do a vector lookup based on a query, maybe give it some more

108
00:06:45,260 --> 00:06:50,460
tools, like a SQL tool that can run SQL on a database, or if it's the files that we've got in our

109
00:06:50,460 --> 00:06:56,330
knowledge base, just do a string lookup in the files if it wants to. Give it access to a bunch of tools

110
00:06:56,330 --> 00:07:02,210
and then call this like your retrieval agent or something, and then say, here's a user question.

111
00:07:02,250 --> 00:07:04,370
Go to town, go and figure this out.

112
00:07:04,410 --> 00:07:08,410
You can come up with your queries, you can make your vector lookups.

113
00:07:08,410 --> 00:07:10,170
And you can also do a SQL query.

114
00:07:10,170 --> 00:07:13,730
You can also look for ticket prices by calling this API.

115
00:07:13,890 --> 00:07:18,170
You can do all these different things and use this as your way of retrieving context.

116
00:07:18,530 --> 00:07:25,410
Now, in large part it is equivalent to some of the steps before. If you just give it a vector

117
00:07:25,410 --> 00:07:27,610
database then it's going to call back.

118
00:07:27,770 --> 00:07:33,970
It's very similar, you can imagine, to doing query rewriting and then query expansion,

119
00:07:33,970 --> 00:07:38,210
and then it's running the queries and then it's basically reranking because it's going to figure out

120
00:07:38,210 --> 00:07:39,290
which ones matter.

121
00:07:39,290 --> 00:07:42,250
And that's what will eventually be used to answer the question.

122
00:07:42,250 --> 00:07:44,890
So it's doing the ones before number ten.

123
00:07:44,930 --> 00:07:51,410
Number ten is about doing the ones before it, but letting the LLM decide which to do and in which order.

124
00:07:51,570 --> 00:07:56,330
And potentially it can be in a loop so that if it does a few of them and it's not getting results,

125
00:07:56,330 --> 00:07:57,490
it can keep going.

126
00:07:57,770 --> 00:08:04,890
So agentic RAG has all of the great things and some of the caveats about working with this kind of

127
00:08:04,930 --> 00:08:07,010
more flexible approach.

128
00:08:07,290 --> 00:08:12,650
It can be very powerful and get answers where you've not been able to get them before, because it can

129
00:08:12,650 --> 00:08:13,570
keep working at it.

130
00:08:13,610 --> 00:08:18,890
Keep experimenting until the model decides to stop making tool calls, and then it's satisfied.

131
00:08:19,130 --> 00:08:22,450
But also it's of course, less predictable.

132
00:08:22,490 --> 00:08:23,410
It's harder.

133
00:08:23,410 --> 00:08:27,690
You can put metrics around it and measure it and get good results, but then you could run the same

134
00:08:27,690 --> 00:08:30,530
thing tomorrow and get bad results and that would be horrible.

135
00:08:30,650 --> 00:08:36,650
So, you know, it has the downsides in terms of repeatability, robustness and so on that you often

136
00:08:36,650 --> 00:08:40,130
find with these more, what we call, autonomous kinds of approaches.

137
00:08:40,290 --> 00:08:44,290
And if you're interested in more about this, then of course my agentic course covers some of this.

138
00:08:44,330 --> 00:08:51,090
In the last week we have a graph-based memory, along with a vector-based memory as well.

139
00:08:51,090 --> 00:08:56,760
We cover that a few times. That is the sort of new agentic approach to doing the same thing.

140
00:08:56,880 --> 00:09:03,520
But very much keep in mind that it is still in large part carrying out many of the previous steps

141
00:09:03,520 --> 00:09:09,720
on this list, just doing it through tool calling rather than through our code orchestrating step by step.

142
00:09:09,760 --> 00:09:11,120
Finally, I will end this slide.

143
00:09:11,120 --> 00:09:13,880
I know I've been on this slide for 20 minutes.

144
00:09:14,240 --> 00:09:19,760
I'm going to end this by saying a lot of people are saying that RAG is dead.

145
00:09:19,800 --> 00:09:20,760
You hear that a lot.

146
00:09:20,800 --> 00:09:22,080
You see lots of memes on this.

147
00:09:22,080 --> 00:09:23,840
And there are typically two different reasons for this.

148
00:09:23,840 --> 00:09:25,440
People come at it from two directions.

149
00:09:25,440 --> 00:09:29,400
One of them is that RAG is dead because context windows are so big.

150
00:09:29,400 --> 00:09:35,600
Now you can put all your knowledge base in a context window and let the transformer use attention

151
00:09:35,600 --> 00:09:38,960
to figure out what's relevant. That's surely the best.

152
00:09:38,960 --> 00:09:41,640
That's one reason people say that RAG is dead.

153
00:09:42,360 --> 00:09:45,080
And I think that's nonsense.

154
00:09:45,080 --> 00:09:50,800
Because, sure, you could say that, but clearly there are going to be cases when your knowledge

155
00:09:50,800 --> 00:09:55,720
base is much, much, much bigger, like in week eight of this course, when we're going to have a monstrously

156
00:09:55,720 --> 00:09:56,840
large knowledge base.

157
00:09:57,000 --> 00:10:02,440
And okay, maybe you could put all that into an LLM call, or maybe you could carve it up and make many

158
00:10:02,480 --> 00:10:05,680
LLM calls, one after another and then another one to judge it.

159
00:10:05,680 --> 00:10:08,480
But wow, that's not going to be an efficient use of time.

160
00:10:08,760 --> 00:10:15,560
At the very least, you could throw out 90% very easily with some kind of a vector

161
00:10:15,560 --> 00:10:18,440
based search or any other technique on this list.

162
00:10:18,440 --> 00:10:24,640
So it just seems to me that however big the context window gets, this kind of approach to try

163
00:10:24,640 --> 00:10:28,680
and remove irrelevant context is always going to be important.

164
00:10:29,080 --> 00:10:35,480
But the other reason people say that RAG is dead is because of number ten, because of agentic RAG.

165
00:10:35,600 --> 00:10:44,200
Now that we have agents, this kind of pipeline where you do a vector-based lookup and then you surface

166
00:10:44,200 --> 00:10:47,600
relevant content, that approach is kind of old school.

167
00:10:47,720 --> 00:10:55,030
The new approach is to equip an agent with the ability to decide for itself how to dig into the

168
00:10:55,030 --> 00:11:01,710
data, to do the research itself using any of these techniques, and figure out how best to retrieve relevant

169
00:11:01,710 --> 00:11:03,750
context and then answer the question.

170
00:11:04,070 --> 00:11:06,030
That's a super interesting point.

171
00:11:06,030 --> 00:11:10,950
But my answer to that would be, predictably enough, you knew I was going to say it: long live RAG.

172
00:11:11,190 --> 00:11:13,150
It's such a familiar trope.

173
00:11:13,350 --> 00:11:18,590
But the reason I say long live RAG is that, come on, that's just RAG by a different name.

174
00:11:18,790 --> 00:11:24,790
Sure, you could call it agentic RAG and sound like you're so modern, so fashionable.

175
00:11:25,110 --> 00:11:30,590
But okay, it's still vector-based or encoder-based retrieval of relevant context.

176
00:11:30,590 --> 00:11:32,110
Or maybe you're using a graph database.

177
00:11:32,110 --> 00:11:38,630
Whatever you're doing, you're using retrieval techniques to augment your generation from

178
00:11:38,630 --> 00:11:39,350
an LLM.

179
00:11:39,350 --> 00:11:41,950
And so it's still retrieval-augmented generation.

180
00:11:41,950 --> 00:11:48,350
It's still RAG, even if it's agentic RAG, whatever it is. By whatever name you choose to call it, as far as

181
00:11:48,350 --> 00:11:50,470
I'm concerned, that's still RAG.

182
00:11:50,670 --> 00:11:51,750
Long live RAG.