Cohere, RAG, and LLMs

I recently posted about a Canadian AI startup that secured a $450 million funding round. It was a big deal. Even the Canadian Prime Minister posted on X congratulating Cohere and proudly letting the world know that the team is based in Toronto: https://x.com/JustinTrudeau/status/1815456500221100319

One of the really interesting things about Cohere is that Patrick Lewis is its director of Machine Learning. That’s significant because Patrick is the lead author of one of the most important articles in AI since the famous 2017 paper Attention Is All You Need, which introduced the transformer architecture behind LLMs like ChatGPT.

In the 2020 article Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks, Lewis and his colleagues introduced a model that pairs a language model with a retrieval component, addressing some of the limitations of LLMs and dramatically improving their utility.

You might recall that in the early days of ChatGPT it could not provide current information in response to queries. There was a knowledge cut-off date. This was sometimes framed as a safety feature, but it is also a consequence of the way LLMs work. They provide responses using pre-trained parametric memory, which is a technical way of saying that LLMs don’t search for answers: they generate them. They do so from what amounts to a static knowledge source, fixed at the point in time when training ended.

LLMs are incredibly impressive even if they are not super current. But what if it were possible to provide your LLM with more current information? Or information not in the training data?

One way to do this is through user input: you simply paste the material into the prompt. There was a lot of excitement earlier this year about the expansion of the ‘context window’. We have moved from a few thousand tokens to millions of tokens in a context window, which means we can now give an LLM an entire textbook and ask questions about that text.
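To make that concrete, here is a minimal sketch of the ‘put the whole document in the prompt’ approach. It is only a sketch under stated assumptions: book.txt stands in for your document, call_llm() is a hypothetical placeholder for whatever chat API you actually use, and the tiktoken library (an OpenAI-style tokenizer) is used just to check whether the text fits in a given context window.

```python
# A minimal sketch of the "stuff the whole document into the context window" approach.
# Assumptions: book.txt is a plain-text copy of the document, call_llm() is a hypothetical
# stand-in for whatever chat API you use, and tiktoken is installed for token counting.

import tiktoken

CONTEXT_WINDOW = 128_000  # how many tokens the model accepts; varies widely by model


def count_tokens(text: str) -> int:
    """Count tokens with an OpenAI-style tokenizer (a rough proxy for other models)."""
    encoder = tiktoken.get_encoding("cl100k_base")
    return len(encoder.encode(text))


def call_llm(prompt: str) -> str:
    """Hypothetical placeholder: wire this up to the LLM provider of your choice."""
    return "(model response would appear here)"


with open("book.txt", encoding="utf-8") as f:
    book = f.read()

question = "What does chapter 3 say about retrieval?"
prompt = f"Answer the question using only the text below.\n\n{book}\n\nQuestion: {question}"

if count_tokens(prompt) <= CONTEXT_WINDOW:
    print(call_llm(prompt))
else:
    print("Too long for this model's context window; this is where retrieval comes in.")
```

The obvious limitation is that everything has to fit in the window at once, and the model re-reads the whole document for every question; retrieval-based approaches avoid that by fetching only the relevant passages.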

In fact, if you are using Adobe Acrobat, this kind of document question-answering has already been integrated: you can ask questions about the PDF you are reading. I found a very early platform called Sharly.AI a couple of years ago that would let you have a conversation about several PDF documents and would even propose questions you might like to ask based on their content.

The exciting thing about RAG is that it enables an LLM to retrieve relevant information from much larger datasets and to ground its answers in that retrieved material rather than in parametric memory alone. It uses a ‘pre-trained neural retriever’ to find the most relevant passages and pass them to the generator, and the architecture and training mechanisms are quite fascinating. It’s a little technical, so I’m going to explore it in depth in a separate post. For now, if you are using a tool like Perplexity, you can see just how powerful the combination of LLMs with RAG can be. If not, I highly recommend you check it out now. If you have lost interest in LLMs, this will be an exciting discovery!
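In the meantime, here is a toy sketch of the retrieve-then-generate pattern that RAG is built around. To be clear about the assumptions: the paper’s model uses a pre-trained dense neural retriever over millions of Wikipedia passages; in this sketch I’ve swapped in a simple TF-IDF retriever from scikit-learn so the shape of the pipeline is easy to see, and call_llm() is again a hypothetical placeholder.

```python
# Toy retrieve-then-generate sketch. The paper's RAG model uses a pre-trained dense
# neural retriever; here a simple TF-IDF retriever from scikit-learn stands in for it
# so the overall shape of the pipeline is visible. call_llm() is a hypothetical placeholder.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# A tiny "knowledge base" (in practice: millions of passages chunked from your documents).
passages = [
    "Cohere is an AI company headquartered in Toronto.",
    "Retrieval-Augmented Generation pairs a retriever with a text generator.",
    "The transformer architecture was introduced in the 2017 paper Attention Is All You Need.",
]


def retrieve(query: str, k: int = 2) -> list[str]:
    """Return the k passages most similar to the query (TF-IDF stand-in for a neural retriever)."""
    vectorizer = TfidfVectorizer()
    passage_vectors = vectorizer.fit_transform(passages)
    query_vector = vectorizer.transform([query])
    scores = cosine_similarity(query_vector, passage_vectors)[0]
    ranked = sorted(range(len(passages)), key=lambda i: scores[i], reverse=True)
    return [passages[i] for i in ranked[:k]]


def call_llm(prompt: str) -> str:
    """Hypothetical placeholder for a real chat API call."""
    return "(model response would appear here)"


question = "What is retrieval-augmented generation?"
context = "\n".join(retrieve(question))
prompt = f"Answer using only the context below.\n\nContext:\n{context}\n\nQuestion: {question}"
print(call_llm(prompt))
```

The key design point is that only the handful of retrieved passages go to the generator, so the knowledge base can be far larger than any context window and can be updated without retraining the model.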
