RAG 流程及论文串烧

文档切片

文档切片的五个层次

https://medium.com/@anuragmishra_27746/five-levels-of-chunking-strategies-in-rag-notes-from-gregs-video-7b735895694d#b123

Basic RAG 与 Advanced RAG

https://pub.towardsai.net/advanced-rag-techniques-an-illustrated-overview-04d193d8fec6

Basic RAG 的一般流程

basic RAG pipeline

Vanilla RAG case in brief looks the following way: you split your texts into chunks, then you embed these chunks into vectors with some Transformer Encoder model, you put all those vectors into an index and finally you create a prompt for an LLM that tells the model to answers user’s query given the context we found on the search step.
In the runtime we vectorise user’s query with the same Encoder model and then execute search of this query vector against the index, find the top-k results, retrieve the corresponding text chunks from our database and feed them into the LLM prompt as context.

prompt 设计：

def question_answering(context, query):prompt = f"""Give the answer to the user query delimited by triple backticks ```{query}```\using the information given in context delimited by triple backticks ```{context}```.\If there is no relevant information in the provided context, try to answer yourself, but tell user that you did not have any relevant context to base your answer on.Be concise and output the answer of size less than 80 tokens."""
response = get_completion(instruction, prompt, model="gpt-3.5-turbo")
answer = response.choices[0].message["content"]
return answer