Exploring LLMs
An LLM is a model with a very large number of parameters that has been trained on a huge amount of text data. When we say “a very large number of parameters,” we usually mean billions. For example, GPT-3 from OpenAI, the model family from which ChatGPT was originally fine-tuned, had 175 billion parameters, whereas LaMDA from Google had 137 billion parameters. These days, the boundary is shifting downward, which means we can use much smaller LLMs – for example, Phi-2 from Microsoft consists of 2.7 billion parameters [5], and Gemini Nano-1 from Google has 1.8 billion parameters [6].
What’s so special about these models? They’re statistical sequence models trained to predict either the next token in a sequence (auto-regressive models, such as GPT) or a masked-out token (masked language models, such as BERT). Now is a good time to mention that LLMs don’t operate on words directly: they first transform a piece of text into tokens (a process known as tokenization). A token is the basic unit an LLM operates on – it can be a whole word, a part of a word, or even a single character or punctuation mark.
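To make this concrete, here is a minimal sketch of tokenization and next-token prediction using the Hugging Face transformers library. The choice of GPT-2 is an assumption made purely for illustration (it is small and freely available); any auto-regressive LLM works the same way:

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# GPT-2 is an illustrative choice, not a recommendation
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

text = "Large language models predict the next"

# Tokenization: split the text into tokens and map each to an integer ID
tokens = tokenizer.tokenize(text)
input_ids = tokenizer(text, return_tensors="pt").input_ids
print(tokens)                 # e.g. ['Large', 'Ġlanguage', 'Ġmodels', ...]
print(input_ids[0].tolist())  # the corresponding token IDs

# Auto-regressive prediction: the model assigns a score (logit) to every
# token in its vocabulary as a candidate continuation of the sequence
with torch.no_grad():
    logits = model(input_ids).logits

# Greedily pick the highest-scoring next token and turn it back into text
next_id = int(logits[0, -1].argmax())
print(tokenizer.decode([next_id]))
```

Note how “language” appears as “Ġlanguage” in the output: the Ġ marks a leading space in GPT-2’s byte-pair-encoding vocabulary, a small reminder that tokens are not simply words.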