RAG vs. Fine-Tuning

Intro: A Brief Overview of RAG and Fine-Tuning

RAG (Retrieval Augmented Generation) and Fine-Tuning are both powerful ways of enhancing Large Language Models (LLMs). However, each has its own strengths and weaknesses, as well as use cases and particular situations where one is the better choice. Two of the biggest challenges with generative AI right now are enhancing the models and dealing with their limitations.

If a model wasn't trained on a particular set of information, it won't be able to give an accurate or up-to-date answer about it. The popular LLMs of today are very general, so the question becomes: how do we specialize them for specific use cases and adapt them to enterprise applications? Your data is one of the most important assets you can work with in the field of AI, and techniques such as RAG and Fine-Tuning allow you to supercharge the capabilities your application delivers.

RAG (Retrieval Augmented Generation)

RAG is a way to increase the capabilities of a model by retrieving external, up-to-date information, augmenting the original prompt with it, and then generating a response using that added context. This is powerful because one of the limitations of an LLM is that it cannot answer well when the relevant information isn't in its context. RAG mitigates that limitation.

Instead of getting an incorrect or possibly hallucinated answer, we work with what's known as a corpus of information. (This could be structured data, PDF documents, spreadsheets, or anything else relevant to our specific organization or the knowledge we need to specialize in.) When a query comes in, a component known as a retriever pulls the documents most relevant to the question and passes that knowledge, along with the original prompt, to the large language model. The model then combines its own intuition with that contextualized information to generate a response.
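To make this flow concrete, here is a minimal, self-contained sketch of the retrieve-augment-generate loop in Python. The keyword-overlap retriever is a toy stand-in (a real system would use vector embeddings and a vector database), and all names here are illustrative assumptions rather than any particular library's API:

    # A toy corpus standing in for an organization's knowledge base.
    CORPUS = [
        "Our premium plan includes 24/7 support and a 99.9% uptime SLA.",
        "Refunds are processed within 5 business days of a request.",
        "The API rate limit is 100 requests per minute per key.",
    ]

    def retrieve(query: str, corpus: list[str], k: int = 2) -> list[str]:
        """Score each document by word overlap with the query; return the top k."""
        q_words = set(query.lower().split())
        return sorted(corpus,
                      key=lambda d: len(q_words & set(d.lower().split())),
                      reverse=True)[:k]

    def build_prompt(query: str, context_docs: list[str]) -> str:
        """Augment the original question with the retrieved context."""
        context = "\n".join(f"- {doc}" for doc in context_docs)
        return ("Answer the question using only the context below.\n"
                f"Context:\n{context}\n\n"
                f"Question: {query}")

    query = "How fast are refunds processed?"
    prompt = build_prompt(query, retrieve(query, CORPUS))
    print(prompt)  # This augmented prompt is what gets sent to the LLM.

The only step omitted is the final model call, which simply sends the augmented prompt to whatever LLM you're using.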

This is powerful because we can get better responses from the model using our proprietary and confidential information without needing to retrain it. It is a popular and effective way to enhance a model's capabilities without doing any fine-tuning.

Fine-Tuning

As the name implies, Fine-Tuning involves taking a foundation large language model and specializing it in a certain domain or area using labeled, targeted data. The data is provided to the model, and after some processing we have a specialized model for a specific use case (e.g., one that talks in a certain style or with a tone that represents our organization or company).

Then, when the model is queried by a user or another system, we get a response in the tone, specialty, and domain we want. Essentially, this can be thought of as "baking in" context and intuition - it becomes part of the model's weights, in contrast to being supplemented on top with a technique like RAG.
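As a rough illustration, supervised fine-tuning usually starts with labeled prompt/response pairs serialized into a training file. The sketch below writes examples in a chat-style JSONL layout similar to what several hosted fine-tuning APIs accept; the exact field names vary by provider, so treat this schema as an assumption and check your provider's documentation:

    import json

    # Hypothetical labeled examples teaching the model our organization's tone.
    examples = [
        {
            "messages": [
                {"role": "system",
                 "content": "You are Acme Corp's support assistant. Be warm and concise."},
                {"role": "user", "content": "My order hasn't arrived yet."},
                {"role": "assistant",
                 "content": "I'm sorry about the delay! Let's track your order down together."},
            ]
        },
        # ...hundreds or thousands more examples like this...
    ]

    # Each training example goes on its own line of the JSONL file,
    # which the fine-tuning job then consumes.
    with open("train.jsonl", "w") as f:
        for example in examples:
            f.write(json.dumps(example) + "\n")

After training on enough examples like these, the desired tone no longer needs to be spelled out in every prompt.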


RAG vs. Fine-Tuning

Both of these techniques can enhance a model's accuracy, output quality, and performance. However, depending on the use case, each has its own strengths and weaknesses. The direction you choose can greatly affect performance, accuracy, output quality, compute costs, and much more.

RAG

With Retrieval Augmented Generation, because we're working with a corpus of information, it is perfect for dynamic data sources such as databases and other repositories where we want to continuously pull information and keep it up to date for the model to use. At the same time, hallucinations are mitigated because the retriever passes real information into the prompt as context. Being able to provide the sources for that information is important in systems where we need trust and transparency in our use of AI.

Looking at the whole system, an efficient retrieval component is essential to how we select the data that goes into the model's limited context window, and maintaining that retrieval pipeline is an ongoing cost to plan for. At the same time, what the system is doing is supplementing information on top of the model: we are not enhancing the base model itself, just giving it the relevant contextual information it needs.
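One simple way to get that trust and transparency is to carry source metadata through retrieval and into the prompt, so the model can cite where each fact came from. A small sketch - the document structure, file names, and prompt wording are all illustrative assumptions:

    # Retrieved documents carry metadata so answers can cite provenance.
    retrieved = [
        {"source": "refund_policy.pdf", "page": 2,
         "text": "Refunds are processed within 5 business days."},
        {"source": "faq.md", "page": 1,
         "text": "Refund requests can be submitted from the account page."},
    ]

    # Number each snippet and label it with its origin.
    context = "\n".join(
        f"[{i + 1}] ({doc['source']}, p.{doc['page']}) {doc['text']}"
        for i, doc in enumerate(retrieved)
    )

    prompt = ("Answer using the numbered sources below, citing them like [1].\n\n"
              f"{context}\n\n"
              "Question: How long do refunds take?")
    print(prompt)

The model's answer can then reference [1] or [2], and the application can render those citations as links back to the original documents.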

Fine-Tuning

Fine-Tuning is different from RAG in that we are actually "baking in" the context and intuition. Because of this, we have greater influence over how the model behaves and reacts in different situations (e.g., should it respond like an insurance adjuster? Should it specialize in summarizing documents?). Whatever we want the model to do, Fine-Tuning can help get it there.

Because this context and intuition is baked into the model's weights, Fine-Tuning can greatly improve inference speed and cost, along with a variety of other factors that come with running models. For example, we can use smaller prompts and context windows to get the responses we want. Additionally, as we specialize these models, they can get smaller and smaller for specific use cases. For these reasons, Fine-Tuning is great for running specialized models in a variety of settings.
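To illustrate the prompt-size point: with a base model, style and domain guidance has to ride along on every request, while a fine-tuned model has absorbed it into its weights. A hypothetical comparison (the prompts and placeholder document are illustrative):

    # Base model: instructions and few-shot examples consume tokens on every call.
    base_model_prompt = (
        "You are a legal assistant. Always use formal language, cite sections "
        "by number, and summarize in exactly three bullet points.\n"
        "Example 1: ...\nExample 2: ...\n"  # few-shot examples, often hundreds of tokens
        "Document: <contract text>\n"
        "Summarize the document."
    )

    # Fine-tuned model: that guidance lives in the weights, so the
    # per-request prompt shrinks to just the task itself.
    finetuned_prompt = "Document: <contract text>\nSummarize the document."

    print(len(base_model_prompt), "vs", len(finetuned_prompt), "characters per call")

Multiplied across millions of requests, that difference in prompt size translates directly into lower inference cost.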

It is important to keep in mind that Fine-Tuning still has the cutoff problem RAG was designed to address: the model only knows what it saw up to the point it was trained, and after that there is no additional information we can give it without retraining.


Use Cases

When choosing between RAG and Fine-Tuning, it is key to consider your AI-enabled application's priorities and requirements. This starts with the data: is the data you're working with slow-moving or fast-moving? If we need up-to-date external information available in context every time we use the model, that is a great fit for RAG - for instance, a product documentation chatbot whose responses always reflect the latest information.

Additionally, it is important to be aware of the industry you're in. Fine-Tuning is powerful for industries that have their own nuances in writing style, terminology, and vocabulary. For example, a legal document summarizer could be a perfect use case for fine-tuning.

Sources

Another important point is sources, and having transparency behind our models. With RAG, being able to show the context and where the information came from is a major advantage - the product documentation chatbot mentioned previously is an excellent example. The same holds in a variety of other specialties where having the source alongside the information in the prompt is very important.

On the other hand, we may have resources such as our organization's past data that we can use to train a model, letting it become accustomed to the data we're going to be working with. Using the legal summarizer example from earlier, we could feed the model past legal cases and documents so it understands the domain it's working in and produces better, more desirable outputs.

RAG plus Fine-Tuning

The best option may be a combination of both RAG and Fine-Tuning. For example, say we run a financial news reporting service. We can fine-tune a model to be native to the finance industry and understand its "lingo," and give it past financial records so it learns how we work in that specific industry. With RAG, it can also pull the most up-to-date news and data, and present them with the sources, transparency, and trust that an end user making decisions needs.

This is where a combination of Fine-Tuning and RAG is noteworthy and optimal: we can build applications that take advantage of RAG to retrieve information and keep it up to date, and Fine-Tuning to specialize the model and hone it in a certain domain.
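Architecturally, the combination is just the RAG loop from earlier pointed at a fine-tuned model instead of a base one. In the sketch below, the client, model ID, and corpus are all illustrative placeholders, not a real provider's API:

    class StubClient:
        """Stand-in for a real LLM client with a fine-tuned model deployed."""
        def generate(self, model: str, prompt: str) -> str:
            return f"[{model}] would answer based on:\n{prompt}"

    def retrieve(query: str, corpus: list[str], k: int = 2) -> list[str]:
        """Toy keyword-overlap retriever (same idea as the earlier sketch)."""
        q = set(query.lower().split())
        return sorted(corpus,
                      key=lambda d: len(q & set(d.lower().split())),
                      reverse=True)[:k]

    corpus = [
        "Acme Corp shares rose 4% today after strong Q2 earnings.",
        "The central bank held interest rates steady this quarter.",
    ]

    query = "What moved Acme Corp stock today?"
    context = "\n".join(f"- {d}" for d in retrieve(query, corpus))
    prompt = f"Context:\n{context}\n\nQuestion: {query}"

    # RAG supplies the fresh facts; the fine-tuned model (here a stub)
    # supplies the finance-specific tone and intuition baked into its weights.
    client = StubClient()
    print(client.generate(model="ft:finance-reporter-v1", prompt=prompt))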


Terms

  • Hallucinations - Where an AI model generates coherent but inaccurate information.
  • Corpus - The collection of documents or information that serves as the knowledge base for a RAG system; it contains the data from which relevant information is retrieved to augment the generation process.
  • Retriever - A retriever in Retrieval Augmented Generation (RAG) is a component responsible for finding and extracting relevant information from an external knowledge base to support the generation process.
  • Weights - LLM model weights are numerical values that define the strength of connections between neurons across different layers in the model. These weights are a crucial component of the model's parameters, which determine its behavior and performance.

Written: January 05, 2025