Neo4j Cookbook

You're reading from Neo4j Cookbook: Harness the power of Neo4j to perform complex data analysis over the course of 75 easy-to-follow recipes

Product type: Paperback
Published: May 2015
ISBN-13: 9781783287253
Length: 226 pages
Edition: 1st Edition
Author: Ankur Goel

Importing data from the CSV format to Neo4j

Graph data comes in different formats, and sometimes it's a combination of two or more formats. It is important to know the various ways of importing data in these formats into Neo4j. In this recipe, you will learn how to import data stored in CSV files into the Neo4j graph database server. A sample CSV file is shown as follows:

[Figure: a sample CSV file]

Getting ready

To get started with this recipe, install Neo4j by using the steps from the earlier recipes of this chapter.

How to do it...

There are several methods you can use to import data in CSV or Excel format into Neo4j; they are described in the following sections.

Using a batch importer

There is an excellent tool written by Michael Hunger, which can be cloned from https://github.com/jexp/batch-import.

The CSV file has to be converted into the format specified in the readme file. The tool is very flexible with regard to the number of properties and the type of each property, and nodes and relationships can be kept in the same file or spread across multiple files. Example files are provided in the sample directory of the repository. To run the tool, use the following commands:

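$ wget https://dl.dropboxusercontent.com/u/14493611/batch_importer_22.zip
$ unzip batch_importer_22.zip
# Download sample nodes.csv and rels.csv from the sample directory of the GitHub repo
$ ./import.sh test.db nodes.csv rels.csv
# Stop the Neo4j server, move the old store aside, then copy in the new one (test.db is a directory)
$ mv ${NEO4J_ROOT}/data/graph.db ${NEO4J_ROOT}/data/graph.db.bak
$ cp -r test.db ${NEO4J_ROOT}/data/graph.db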

Each parameter in the command has been fully explained in the readme file.

Note

The batch import tool also supports a parallel batch inserter, which can speed up importing a large number of nodes and relationships.

The batch importer's author claims a benchmark of 2 billion nodes and 20 billion relationships imported in 11 hours (roughly 500K elements per second) on an EC2 high-I/O instance.
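As a quick sanity check on the claimed benchmark, the arithmetic below divides the total number of imported elements by the elapsed time. It uses only the figures quoted for the tool and suggests the rate is closer to 555K elements per second, so the 500K figure looks like a rounded-down value.

```python
# Throughput implied by the batch importer's claimed benchmark:
# 2 billion nodes plus 20 billion relationships imported in 11 hours
nodes = 2_000_000_000
relationships = 20_000_000_000
elapsed_seconds = 11 * 60 * 60  # 39,600 seconds

elements_per_second = (nodes + relationships) / elapsed_seconds
print(f"{elements_per_second:,.0f} elements/second")  # 555,556 elements/second
```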

Using custom scripts

For a smaller number of nodes and relationships, custom scripts written in any language of your choice can import data from CSV files. Custom scripts let you handle erroneous rows, leave out redundant columns, and adapt the import in other ways.
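A minimal Python sketch of that kind of pre-processing, using only the standard csv module, might look like this. The column layout (an ID, a label, a name, and one redundant fourth column) and the helper name clean_rows are hypothetical; adapt them to your own file.

```python
import csv
import io

def clean_rows(lines, keep=(0, 2), expected_columns=4):
    """Yield only the wanted columns, skipping malformed rows.

    Hypothetical layout: id,label,name,notes (the notes column is redundant).
    """
    for line_no, row in enumerate(csv.reader(lines), start=1):
        # Reject rows with the wrong number of fields or an empty name
        if len(row) != expected_columns or not row[2].strip():
            print(f"skipping line {line_no}: {row}")
            continue
        yield [row[i].strip() for i in keep]

sample = io.StringIO("1,Person,Alice,x\n2,Person,,y\n3,Person,Bob\n4,Person,Carol,z\n")
print(list(clean_rows(sample)))
```

Lines 2 and 3 are reported and skipped, and only the ID and name columns of the valid rows reach the import step.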

The exact script depends on the layout of the CSV file. A script for importing nodes could look as follows:

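#Bash Script for importing nodes; reads the CSV file from standard input
NEO4J_ROOT="/var/lib/neo4j"
while IFS= read -r LINE
do
  # Take the node name from the third comma-separated column
  name=$(echo "$LINE" | awk -F "," '{print $3}')
  # Pass the whole shell command as one -c argument; </dev/null stops
  # neo4j-shell from consuming the remaining CSV lines on stdin
  "${NEO4J_ROOT}/bin/neo4j-shell" -c "mknode --np \"{'name':'${name}'}\" -v" </dev/null
done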
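If you drive neo4j-shell from Python instead of Bash, building each command as an argument list avoids the shell-quoting pitfalls of the loop above. This sketch only builds the argument lists. The NEO4J_ROOT default is carried over from the script, the helper name mknode_commands is hypothetical, and each list could then be passed to subprocess.run.

```python
import csv
import io

def mknode_commands(lines, neo4j_root="/var/lib/neo4j"):
    """Build one neo4j-shell argument list per CSV row (name from the third column)."""
    for row in csv.reader(lines):
        name = row[2]
        # The whole shell command travels as a single -c argument,
        # so no extra shell quoting is needed around it
        shell_cmd = "mknode --np \"{'name':'%s'}\" -v" % name
        yield [f"{neo4j_root}/bin/neo4j-shell", "-c", shell_cmd]

for cmd in mknode_commands(io.StringIO("1,Person,Alice\n")):
    print(cmd)
    # To actually run it: subprocess.run(cmd, check=True)
```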

Tip

Downloading the example code

You can download the example code files from your account at http://www.packtpub.com for all the Packt Publishing books you have purchased. If you purchased this book elsewhere, you can visit http://www.packtpub.com/support and register to have the files e-mailed directly to you.

Similar scripts can be written for relationships too, as shown here:

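#Bash Script for creating relationships; reads the CSV file from standard input
#Format of csv should be startnode,endnode,type,direction
#(node IDs, relationship type, and a direction assumed to be OUT or IN)
NEO4J_ROOT="/var/lib/neo4j"
while IFS="," read -r start end type direction
do
  echo "${start},${end},${type},${direction}"
  # For incoming relationships, swap the endpoints so the arrow points at startnode
  if [ "$direction" = "IN" ]; then tmp=$start; start=$end; end=$tmp; fi
  # neo4j-shell -c runs a single command, so use one Cypher statement per row
  "${NEO4J_ROOT}/bin/neo4j-shell" -c "MATCH (a), (b) WHERE id(a) = ${start} AND id(b) = ${end} CREATE (a)-[:${type}]->(b);" </dev/null
done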
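The direction column is the easiest part of a relationship row to get wrong. The following Python sketch normalizes each row so that the relationship always runs from the first matched node to the second, then renders it as a Cypher statement. The direction values OUT and IN, matching nodes by internal ID, and the helper name rel_statement are assumptions for illustration.

```python
import csv
import io

def rel_statement(start, end, rel_type, direction):
    """Render one startnode,endnode,type,direction row as a Cypher CREATE statement."""
    direction = direction.strip().upper()
    if direction == "IN":
        # An incoming relationship on startnode is an outgoing one from endnode
        start, end = end, start
    elif direction != "OUT":
        raise ValueError(f"unknown direction: {direction!r}")
    return (f"MATCH (a), (b) WHERE id(a) = {int(start)} AND id(b) = {int(end)} "
            f"CREATE (a)-[:{rel_type}]->(b);")

rows = io.StringIO("1,2,KNOWS,OUT\n3,4,WORKS_FOR,IN\n")
for row in csv.reader(rows):
    print(rel_statement(*row))
```

Converting the IDs with int() also rejects rows whose node IDs are not numeric before anything reaches the database.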

This task can also be achieved in Python using the py2neo module, as shown in the following script:

#Sample Python code to create nodes from a CSV file (py2neo 2.0 API)
import csv
from py2neo import Graph, Node

graph_db = Graph("http://localhost:7474/db/data/")
# newline='' is what the csv module expects for files opened in text mode
with open('nodes.csv', newline='') as ifile:
    reader = csv.reader(ifile)
    for row in reader:
        # Create one node per row, named after the third column
        graph_db.create(Node(name=row[2]))
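For files too large to send one node per request, rows are commonly grouped into fixed-size chunks so that each chunk can be sent as a single batch or transaction. The chunking itself needs only the standard library. The helper name chunks and the chunk size are illustrative, and how each chunk is sent depends on the py2neo version you use.

```python
import csv
import io
from itertools import islice

def chunks(iterable, size):
    """Yield successive lists of at most `size` items from any iterable."""
    it = iter(iterable)
    while True:
        block = list(islice(it, size))
        if not block:
            return
        yield block

reader = csv.reader(io.StringIO("1,Person,A\n2,Person,B\n3,Person,C\n4,Person,D\n5,Person,E\n"))
for block in chunks(reader, 2):
    names = [row[2] for row in block]
    print(names)  # send `names` to Neo4j in one request here
```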

Similar Python code can be written to create relationships. The py2neo module can also build a batch request, in which a whole list of operations with their parameters is sent to the server at once, as shown in the following code:

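#Batch request using the legacy index API (py2neo 2.0)
from py2neo import Graph
from py2neo.legacy import LegacyWriteBatch

records = [(101, "A"), (102, "B"), (103, "C")]
graph_db = Graph("http://localhost:7474/db/data/")
batch = LegacyWriteBatch(graph_db)
for emp_no, name in records:
    # Create the node only if none is indexed under Employees/emp_no yet
    batch.get_or_create_indexed_node("Employees", "emp_no", emp_no,
                                     {"emp_no": emp_no, "name": name})
nodes = batch.submit()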
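get_or_create_indexed_node creates a node only when no node is indexed yet under the given key and value, which makes it safe to re-run the import. The toy in-memory model below sketches that get-or-create behavior in plain Python so the semantics can be seen without a server. The dictionary standing in for the Employees index and the helper name get_or_create are purely illustrative.

```python
def get_or_create(index, key, value, properties):
    """Return the entry stored under (key, value), creating it only if absent."""
    entry_key = (key, value)
    if entry_key not in index:
        index[entry_key] = dict(properties)
    return index[entry_key]

employees = {}  # stands in for the "Employees" index
records = [(101, "A"), (102, "B"), (103, "C"), (101, "A-duplicate")]
for emp_no, name in records:
    get_or_create(employees, "emp_no", emp_no, {"emp_no": emp_no, "name": name})

print(len(employees))  # 3: the second 101 reuses the existing entry
print(employees[("emp_no", 101)]["name"])  # A
```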

How it works...

The batch importer achieves its performance by skipping all transactional behavior, which means giving up ACID guarantees. If the batch import fails, the database can be left broken, possibly irrecoverably, and all of its data may be lost.
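Because a failed batch import can leave the store it writes to unusable, the commands in this recipe build the new database in a separate directory (test.db) and copy it into place only afterwards. The following Python sketch shows that staging pattern. The run_import callable is a hypothetical stand-in for the actual import step, and the live store is replaced only when the import finishes without raising.

```python
import shutil
import tempfile
from pathlib import Path

def import_via_staging(live_store: Path, run_import) -> None:
    """Build a new store in a staging directory; swap it in only on success."""
    # Stage next to the live store so the final move is a cheap rename
    staging = Path(tempfile.mkdtemp(prefix="staging-", dir=live_store.parent))
    try:
        run_import(staging)  # may raise; the live store is untouched if it does
    except Exception:
        shutil.rmtree(staging, ignore_errors=True)
        raise
    backup = live_store.with_name(live_store.name + ".bak")
    if backup.exists():
        shutil.rmtree(backup)  # keep only the most recent backup
    if live_store.exists():
        live_store.rename(backup)  # keep the previous store as a backup
    shutil.move(str(staging), str(live_store))
```

A failing import therefore costs only the staging directory, never the store the server was using.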

See also

Custom scripts can be written for the REST interface as well as for the embedded interface of Neo4j. For the full set of py2neo recipes, refer to the py2neo cookbook at http://py2neo.org/2.0/cookbook.html.
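For the REST interface, one option is Neo4j's transactional Cypher endpoint, which accepts a JSON document containing a list of statements with parameters. The sketch below only builds that payload with the standard library. The endpoint path /db/data/transaction/commit is the one used by Neo4j 2.x and 3.x, the Person label and the helper name build_payload are illustrative, and the {rows} parameter syntax is the Neo4j 2.x form (later versions write $rows).

```python
import json

def build_payload(names):
    """Build a request body for the transactional Cypher HTTP endpoint."""
    # One parameterized statement creates all nodes; Neo4j 2.x parameter syntax
    statement = "UNWIND {rows} AS row CREATE (:Person {name: row.name})"
    return {
        "statements": [
            {"statement": statement,
             "parameters": {"rows": [{"name": n} for n in names]}}
        ]
    }

body = json.dumps(build_payload(["Alice", "Bob"]))
print(body)
# POST `body` to http://localhost:7474/db/data/transaction/commit
# with Content-Type: application/json, e.g. via urllib.request
```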
