Introducing graph datasets
The graph datasets we’re going to use in this chapter are richer than Zachary’s Karate Club: they have more nodes, more edges, and include node features. In this section, we will introduce them to give us a good understanding of these graphs and how to process them with PyTorch Geometric. Here are the two datasets we will use:
- The Cora dataset
- The Facebook Page-Page dataset
Let’s start with the smaller one: the popular Cora dataset.
The Cora dataset
Introduced by Sen et al. in 2008 [1], Cora (no license) is the most popular dataset for node classification in the scientific literature. It represents a network of 2,708 publications, where each connection is a reference. Each publication is described as a binary vector of 1,433 unique words, where 0 and 1 indicate the absence or presence of the corresponding word, respectively. This representation is also called a binary bag of words in natural language processing...
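To make the binary bag-of-words idea concrete, here is a minimal sketch in plain Python. The five-word vocabulary, the sample sentence, and the helper name binary_bag_of_words are all hypothetical and chosen only for illustration; Cora's actual vocabulary has 1,433 words, and its preprocessing is not reproduced here.

```python
# Hypothetical toy vocabulary (assumption); Cora's real one has 1,433 words.
vocabulary = ["graph", "neural", "network", "learning", "genetic"]


def binary_bag_of_words(text, vocabulary):
    """Return a 0/1 vector marking which vocabulary words occur in text."""
    # Lowercase and split on whitespace; a set discards word counts,
    # so only presence or absence is recorded.
    tokens = set(text.lower().split())
    return [1 if word in tokens else 0 for word in vocabulary]


# "graph" appears twice, but its entry is still 1 (presence, not frequency).
document = "Learning on a graph with a graph neural model"
features = binary_bag_of_words(document, vocabulary)
print(features)  # [1, 1, 0, 1, 0]
```

Each position in the output corresponds to one vocabulary word, so a Cora node feature vector can be read the same way, only with 1,433 positions instead of five.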