Word Representation in Vector Space
This section will cover the different architectures for computing continuous vector representations of words from a corpus. These representations capture the similarity of words in terms of meaning. The section will also introduce a new Python library, Gensim, for performing this task.
Word Embeddings
Word embeddings are a collection of techniques and methods for mapping the words and sentences of a corpus to vectors of real numbers. Word embeddings represent each word in terms of the contexts in which it appears. The main task of word embeddings is to perform a dimensionality reduction, from a sparse space with one dimension per word to a continuous, lower-dimensional vector space.
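As a preview of what this looks like in practice, here is a minimal sketch using Gensim's Word2Vec (assuming the Gensim 4.x API; the toy corpus and parameter values are illustrative assumptions, not from the text). It trains embeddings on a few tokenized sentences and maps each word to a dense, low-dimensional vector:

from gensim.models import Word2Vec

# Toy corpus: a list of tokenized sentences (illustrative only)
corpus = [
    ["i", "am", "good"],
    ["i", "am", "great"],
    ["good", "and", "great", "are", "similar"],
]

# Train a small Word2Vec model; vector_size sets the dimensionality
# of the continuous vector space (Gensim 4.x parameter name)
model = Word2Vec(corpus, vector_size=10, window=2, min_count=1, seed=1)

# Each word is now a dense 10-dimensional vector instead of a
# sparse vector with one dimension per vocabulary word
print(model.wv["good"])                      # 10 real numbers
print(model.wv.similarity("good", "great"))  # cosine similarity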
To better understand what that means, let's have a look at an example. Imagine we have two similar sentences, such as these:
I am good.
I am great.
Now, encoding the words in these sentences as one-hot vectors, we have something like this:
I → [1,0,0,0]
Am → [0,1,0,0]
Good → [0,0,1,0]
Great → [0,0,0,1]
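To make the encoding concrete, here is a short Python sketch that builds these one-hot vectors (the four-word vocabulary is taken from the sentences above):

# Vocabulary collected from the two example sentences
vocab = ["I", "am", "good", "great"]

def one_hot(word, vocab):
    # A one-hot vector has a 1 at the word's index and 0 elsewhere
    vec = [0] * len(vocab)
    vec[vocab.index(word)] = 1
    return vec

for word in vocab:
    print(word, one_hot(word, vocab))
# I [1, 0, 0, 0]
# am [0, 1, 0, 0]
# good [0, 0, 1, 0]
# great [0, 0, 0, 1]

Note that each word occupies its own dimension, so the vector length grows with the vocabulary; this is exactly the sparse, one-dimension-per-word space that word embeddings reduce to a continuous vector space.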