Summary
In this chapter, we learned how to add information to memory, retrieve it, and easily include it in our prompts. Because LLMs are stateless and limited by their prompt sizes, we covered techniques for saving information between sessions and for reducing prompt size while still including the relevant portions of the conversation in the prompt.
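To recap the two ideas, here is a minimal sketch of persisting conversation history between sessions and trimming it to fit a prompt budget. The names (ChatMemory-style helpers, HISTORY_PATH, the character budget) are illustrative stand-ins, not Semantic Kernel APIs, and a real application would count tokens rather than characters:

```python
# Illustrative sketch only: persist chat turns between sessions and
# trim them to a prompt budget. Not the Semantic Kernel memory API.
import json
from pathlib import Path

HISTORY_PATH = Path("chat_history.json")  # hypothetical storage location


def load_history() -> list[dict]:
    """Restore earlier turns so the stateless LLM can 'remember' them."""
    if HISTORY_PATH.exists():
        return json.loads(HISTORY_PATH.read_text())
    return []


def save_history(history: list[dict]) -> None:
    """Persist the conversation so the next session can reload it."""
    HISTORY_PATH.write_text(json.dumps(history, indent=2))


def trim_to_budget(history: list[dict], max_chars: int = 4000) -> list[dict]:
    """Keep the most recent turns that fit inside the prompt budget.

    A crude character budget stands in for proper token counting.
    """
    kept: list[dict] = []
    used = 0
    for turn in reversed(history):  # walk from the newest turn backward
        cost = len(turn["content"])
        if used + cost > max_chars:
            break
        kept.append(turn)
        used += cost
    return list(reversed(kept))  # restore chronological order
```

On each request, the application would call load_history(), append the new user turn, build the prompt from trim_to_budget(history), and call save_history() after the model responds.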
In the next chapter, we will see how to use a vector database to retrieve far more information from memory, and how to use a technique called retrieval-augmented generation (RAG) to organize and present that information in a useful way. RAG is common in enterprise applications: you trade away a little of the creativity offered by LLMs, but in return you gain additional precision, the ability to show references, and the ability to draw on large amounts of data that you own and control.
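The overall shape of that technique can be previewed with a short sketch. The keyword-overlap search and the in-memory document list below are deliberately naive stand-ins for the vector database and embedding search introduced in the next chapter; only the retrieve-then-generate pattern is the point:

```python
# Illustrative sketch of the RAG pattern: retrieve relevant passages,
# then ground the prompt in them. The "search" here is naive keyword
# overlap, standing in for a real vector-similarity search.
DOCS = [
    {"source": "article-1", "text": "Semantic Kernel orchestrates LLM calls."},
    {"source": "article-2", "text": "Vector databases enable similarity search."},
]


def search(query: str, top_k: int = 3) -> list[dict]:
    """Rank documents by keyword overlap with the query."""
    words = set(query.lower().split())
    scored = sorted(
        DOCS,
        key=lambda d: len(words & set(d["text"].lower().split())),
        reverse=True,
    )
    return scored[:top_k]


def build_rag_prompt(question: str) -> str:
    """Ground the prompt in retrieved passages so answers can cite sources."""
    context = "\n".join(f"[{d['source']}] {d['text']}" for d in search(question))
    return (
        "Answer using only the sources below, and cite them.\n"
        f"Sources:\n{context}\n\nQuestion: {question}"
    )
    # In a real application, this prompt is then sent to the LLM.
```

Restricting the model to retrieved sources is exactly the trade described above: less free-form creativity, more precision and traceable references.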
For our application, we are going to load thousands of academic articles into a vector database and have Semantic Kernel search...