[experiment] [backlog] Remembering facts: RAG vs Fine Tuning


My use case is to have an LLM query my personal conversations and other digital activity with questions like:

  • What were the major events in my relationship with X during Feb–Oct 2023?
  • What have I learned from that period? That is, which mistakes that showed up then do I now make less often, if at all, in my relationship with Y, based on our Telegram conversations?

My understanding is there are at least these ways to solve this problem:

RAG

Very standard RAG, which I have yet to try.

Chunk the messages, for example one chunk per message: [chat name, timestamp, author, message]. Embed all these chunks. At query time, embed the question, retrieve the most relevant chunks, add them to the prompt, and get the answer.
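The pipeline above can be sketched end to end. This is a minimal toy: the bag-of-words `embed` stands in for a real embedding model, and the sample messages are invented for illustration.

```python
import math
from collections import Counter

def embed(text):
    # Toy bag-of-words "embedding"; a real system would call a neural
    # embedding model here instead.
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# One chunk per message: (chat name, timestamp, author, message).
chunks = [
    ("family", "2023-02-01 10:00", "X", "we should plan the trip to the lake"),
    ("family", "2023-03-05 21:13", "me", "sorry about yesterday, I overreacted"),
    ("work",   "2023-04-11 09:30", "boss", "the deadline moved to Friday"),
]

def retrieve(question, k=2):
    # Rank chunks by similarity between the question and the message text.
    q = embed(question)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c[3])), reverse=True)
    return ranked[:k]

question = "what happened with the trip to the lake?"
context = retrieve(question)
prompt = "Context:\n" + "\n".join(str(c) for c in context) + "\n\nQuestion: " + question
```

The final `prompt` is what would be sent to the LLM along with the question.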

Fine Tuning

If the goal is for the fine-tuned model to reply as I would, or to simulate my contacts, then simply feeding all conversations in as linear text makes sense. I have never actually done this either, so it is a good idea to try.

Fine-tuning algorithm ideas:

  • A sliding window of 100 messages, training the model to predict the 101st message.
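Constructing the training pairs for this scheme is straightforward. A sketch, assuming messages are `(chat, timestamp, author, text)` tuples and the fine-tuning API takes prompt/completion pairs:

```python
def sliding_window_examples(messages, window=100):
    # Each example: `window` prior messages as the prompt, the next
    # message's text as the completion.
    examples = []
    for i in range(window, len(messages)):
        context = "\n".join(
            f"[{chat} | {ts} | {author}] {text}"
            for chat, ts, author, text in messages[i - window:i]
        )
        examples.append({"prompt": context, "completion": messages[i][3]})
    return examples

# Tiny illustration with a window of 2 instead of 100:
msgs = [("chat", f"2023-02-{d:02d}", "X", f"message {d}") for d in range(1, 6)]
pairs = sliding_window_examples(msgs, window=2)
# pairs[0]["completion"] == "message 3"
```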

My hypothesis is that this is not aligned with the goal. The goal is to remember facts, so fine-tuning should feed the model very limited information about the prior conversation, so that it cannot simply emulate the style of the person whose response we are predicting, but instead has to remember the actual message they sent.

Algorithm ideas:

  • Pre-process all messages and classify them as important / not important. For the unimportant ones, assign a topic class: we may want to query how often we talk about class C, without the details of the conversation, because it is not substantial.
  • Alternative pre-processing: find batches of related messages and summarize each batch into a single text chunk, with optional quotes and references to the original messages.
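The batching step in the second idea can be approximated by segmenting on time gaps. A minimal sketch: `batch_by_time` and the 2-hour gap are my assumptions, and `summarize` is a placeholder where a real pipeline would call an LLM.

```python
from datetime import datetime, timedelta

def batch_by_time(messages, gap_hours=2):
    # Start a new batch whenever the silence since the previous message
    # exceeds gap_hours. Messages are (chat, iso_timestamp, author, text).
    batches, prev = [], None
    for msg in messages:
        ts = datetime.fromisoformat(msg[1])
        if prev is None or (ts - prev) > timedelta(hours=gap_hours):
            batches.append([])
        batches[-1].append(msg)
        prev = ts
    return batches

def summarize(batch):
    # Placeholder: a real pipeline would ask an LLM to summarize the
    # batch, keeping quotes and references to the original messages.
    return f"{len(batch)} messages from {batch[0][1]} to {batch[-1][1]}"

msgs = [
    ("c", "2023-02-01T10:00", "X", "hi"),
    ("c", "2023-02-01T10:30", "me", "hey"),
    ("c", "2023-02-01T15:00", "X", "new topic"),
]
batches = batch_by_time(msgs)  # two batches: [10:00, 10:30] and [15:00]
```

Each summary then becomes one training example (or one RAG chunk), which keeps the fine-tuning data focused on content rather than style.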

Thoughts:

  • A sliding window might be perfectly fine for remembering: the way to decrease the loss on the test set is to remember the messages, so with enough passes the model should memorize all of them. The window can even be of length 1, just the message being predicted, with the prompt [chat, time, author] -> message. In that sense the follow-up algorithms are more of an optimization.
  • It would be interesting to benchmark the different approaches.
  • I am also sure someone has done this already. TODO: find posts / papers on this topic.
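The window-of-1 variant from the first thought above is trivial to construct: the prompt is only the metadata, so lowering the training loss forces the model to memorize message content rather than imitate style. A sketch, again assuming `(chat, timestamp, author, text)` tuples and a prompt/completion fine-tuning format:

```python
def metadata_to_message_pairs(messages):
    # Prompt is just [chat, time, author]; completion is the message itself.
    return [
        {"prompt": f"[{chat} | {ts} | {author}] ->", "completion": f" {text}"}
        for chat, ts, author, text in messages
    ]

pairs = metadata_to_message_pairs(
    [("family", "2023-02-01 10:00", "X", "let's plan the lake trip")]
)
# pairs[0]["prompt"] == "[family | 2023-02-01 10:00 | X] ->"
```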