In general, a one-hot vector is used to represent a categorical variable, that is, a variable that takes its value from a predefined list of values. One-hot vectors give us a simple way to represent tokens as vectors, which certain use cases require. In such a vector, every entry is 0 except the one corresponding to the token being represented, and that entry is set to 1. As you may have guessed, these are binary vectors.
For example, weather can be represented as a categorical variable with the values hot and cold. In this scenario, the one-hot vectors would be as follows:
vec(hot) = <0, 1>
vec(cold) = <1, 0>
There are two bits here: the second bit is 1 to denote hot, and the first bit is 1 to denote cold. The size of the vector is 2 because the variable has only two possible values, hot and cold.
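To make the encoding concrete, here is a minimal sketch in Python. The helper name `one_hot` is hypothetical, and the category order `["cold", "hot"]` is an assumption chosen so that the output matches the vectors above; any fixed ordering works as long as it is used consistently.

```python
from typing import List

def one_hot(value: str, categories: List[str]) -> List[int]:
    """Hypothetical helper: one-hot encode `value` against an ordered category list."""
    if value not in categories:
        raise ValueError(f"unknown category: {value!r}")
    vec = [0] * len(categories)       # one slot per category, all zeros
    vec[categories.index(value)] = 1  # set only the slot for this category
    return vec

# Assumed ordering: index 0 = cold, index 1 = hot (matches the example above)
weather = ["cold", "hot"]
print(one_hot("hot", weather))   # [0, 1]
print(one_hot("cold", weather))  # [1, 0]
```

Note that the vector length always equals the number of categories, which is why the weather vectors have exactly two entries.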
Hey! So how does this idea carry over to NLP?
In NLP, each of the terms present in the vocabulary can be thought of as a category, just as we had two categories to represent weather conditions. Now...