What is summarization?
Summarization is one of the most common natural language processing tasks. Its goal is to condense a long original piece (text, audio, video, etc.) into a shorter text version while retaining the important information from the original.
The same limitations of retrieval-augmented generation (RAG) that we discussed in the previous chapters apply to summarization tasks: hallucinations, lack of “real” reasoning capabilities, and difficulty processing complex formatting. Paying close attention to the input document and, for example, adding a rewriting step that improves its legibility may help you generate better results.
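A rewriting step can take many forms, including asking a model to rewrite the document. As a minimal sketch of the cheapest variant, the function below does a purely rule-based cleanup of text extracted from a PDF or web page. The function name `clean_for_summarization` and the specific rules are illustrative assumptions, not a method prescribed in this chapter.

```python
import re


def clean_for_summarization(raw: str) -> str:
    """Light, rule-based cleanup before a document is sent to a summarizer.

    Hypothetical helper: the rules below are assumptions chosen for illustration.
    """
    # Normalize Windows line endings so the patterns below only deal with "\n".
    text = raw.replace("\r\n", "\n")
    # Rejoin words hyphenated across a line break: "summa-\nrization" -> "summarization".
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)
    # Treat blank lines as paragraph breaks, then unwrap each paragraph:
    # single line breaks, tabs, and repeated spaces collapse to one space.
    paragraphs = [" ".join(p.split()) for p in re.split(r"\n\s*\n", text)]
    # Drop empty paragraphs and separate the rest with one blank line.
    return "\n\n".join(p for p in paragraphs if p)
```

Note that the de-hyphenation rule also joins genuine hyphenated compounds that happen to break at the end of a line, so it is worth spot-checking the output on your own documents.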
In practice, when we work on summarization tasks, we typically want to control for a few factors:
- Tone of voice
- Output format (tables, bullet lists with multiple levels of indentation, a concise paragraph, etc.)
- Keeping the most important points of the original documents...
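
One way to control the factors listed above is to state each of them explicitly in the prompt. The sketch below only builds the prompt string and does not call any model. The names `SummarySpec` and `build_summary_prompt`, the field names, and the `<document>` delimiters are hypothetical choices made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class SummarySpec:
    """Factors we want to control (hypothetical container, not a library type)."""
    tone: str = "neutral"
    output_format: str = "a concise paragraph"
    must_keep: list[str] = field(default_factory=list)


def build_summary_prompt(document: str, spec: SummarySpec) -> str:
    """Assemble a plain-text summarization prompt from the document and the spec."""
    lines = [
        "Summarize the document below.",
        f"Tone of voice: {spec.tone}.",
        f"Output format: {spec.output_format}.",
    ]
    # List the points that must survive only when the caller supplied some.
    if spec.must_keep:
        lines.append("Make sure the summary covers these points:")
        lines.extend(f"- {point}" for point in spec.must_keep)
    # A guard against hallucinated details; it reduces the risk, it does not remove it.
    lines.append("Use only information stated in the document.")
    # Delimit the document so instructions and content are not mixed up.
    lines.extend(["", "<document>", document.strip(), "</document>"])
    return "\n".join(lines)


if __name__ == "__main__":
    spec = SummarySpec(
        tone="formal",
        output_format="a bulleted list",
        must_keep=["quarterly revenue", "main risks"],
    )
    print(build_summary_prompt("Revenue grew this quarter. Risks remain.", spec))
```

Keeping the factors in a small structure rather than in a hand-edited string makes it easier to vary one factor at a time and compare the resulting summaries.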