Search icon CANCEL
Subscription
0
Cart icon
Your Cart (0 item)
Close icon
You have no products in your basket yet
Save more on your purchases! discount-offer-chevron-icon
Savings automatically calculated. No voucher code required.
Arrow left icon
Explore Products
Best Sellers
New Releases
Books
Videos
Audiobooks
Learning Hub
Free Learning
Arrow right icon
Arrow up icon
GO TO TOP
Learning Neo4j 3.x

You're reading from   Learning Neo4j 3.x Effective data modeling, performance tuning and data visualization techniques in Neo4j

Arrow left icon
Product type Paperback
Published in Oct 2017
Publisher Packt
ISBN-13 9781786466143
Length 316 pages
Edition 2nd Edition
Languages
Tools
Arrow right icon
Authors (2):
Arrow left icon
Jerome Baton Jerome Baton
Author Profile Icon Jerome Baton
Jerome Baton
Rik Van Bruggen Rik Van Bruggen
Author Profile Icon Rik Van Bruggen
Rik Van Bruggen
Arrow right icon
View More author details
Toc

Table of Contents (17) Chapters Close

Preface 1. Graph Theory and Databases FREE CHAPTER 2. Getting Started with Neo4j 3. Modeling Data for Neo4j 4. Getting Started with Cypher 5. Awesome Procedures on Cypher - APOC 6. Extending Cypher 7. Query Performance Tuning 8. Importing Data into Neo4j 9. Going Spatial 10. Security 11. Visualizations for Neo4j 12. Data Refactoring with Neo4j 13. Clustering 14. Use Case Example - Recommendations 15. Use Case Example - Impact Analysis and Simulation 16. Tips and Tricks

The Property Graph model of graph databases

The NoSQL category of graph databases, as we have seen previously, is in a class of its own. In many ways, this is because the underlying data model of graph databases--the Property Graph data model--is of a very specific nature, which we will explain a bit further.

First of all, let's define the Property Graph data model. Essentially, it means that we will be storing our data in the graph database.

A graph structure means that we will be using vertices and edges (or nodes and relationships, as we prefer to call these elements) to store data in a persistent manner. As a consequence, the graph structure enables us to perform the following:

  • Represent data in a much more natural way, without some of the distortions of the relational data model
  • Apply various types of graph algorithms on these structures

In short, it enables us to treat the graph nature of that data as an essential part of our capabilities. One of the key capabilities that we will find in the remainder of this book is the capability to traverse the graph: to walk on its nodes and relationships and hop from one node to the next by following the explicit pointers that connect the nodes. This capability--sometimes also referred to as index free adjacency, which essentially means that you can find adjacent/neighbouring nodes without having to do an index lookup--is key to the performance characteristics that we will discuss in later paragraphs.

However, it is important to realize that the property graph model is not suited for all graph structures. Specifically, it is optimized for the following:

  • Directed graphs: The links between nodes (also known as the relationships) have a direction.
  • Multirelational graphs: There can be multiple relationships between two nodes that are the same. These relationships, as we will see later, will be clearly distinct and of a different type.
  • Storing key-value pairs as the properties of the nodes and relationships.

In the different kinds of properties that can belong to the different elements of the graph structure, the most basic ones, of course, are properties assigned to vertices and edges:

An example of a simple property graph

Let's investigate this model in a bit more detail. When looking closer at this, we find the following interesting aspects of this model:

  • There is no fixed schema. The database, in and of itself, does not impose that you have to have a schema; although most software professionals will agree that having some kind of schema as you move closer to production is probably not a bad idea.
  • Partly because of the schema-less nature of the database, it seems to be a very nice fit for dealing with semi-structured data. If one node or relationship has fewer or more properties, we do not have to alter the design for this; we can just deal with that difference in structure automatically and work with it in exactly the same way.
  • Nodes and node properties seem to be quite easy to understand. In relational terms, one can easily compare nodes with records in a table. It's as if the property graph contains lots and lots of single-row tables, that is, the nodes of the graph. Nodes will have properties just like records/rows in a table will have fields/columns.
  • Relationships are a bit different. They always have a start and endpoint, and therefore have a direction. They cannot be dangling, but can be self-referencing (the same node as the start and endpoint). However, the real power lies in the following facts:
    • Relationships are explicit: They are not inferred by some kind of constraint or established at query time through a join operation. They are equal citizens in the database; they have the same expressive power as the nodes representing the entities in the database.
    • Relationships can have properties too: They can have values associated with them that can specify the length, capacity, or any other characteristic of that relationship. This is terribly important and very different from anything we know from the relational world.

In Neo4j then, this data model has been enriched with a couple of key concepts that extend the core property graph model. Two concepts are important, and related but different: node labels and relationship types.

Node labels

Node labels are a way of semantically categorizing the nodes in your graph. A node can have zero, one, or more labels assigned to it--similar to how you would use labels in something like your Gmail inbox. Labels are essentially a set-oriented concept in a graph structure: they allow you to easily and efficiently create subgraphs in your database, which can be useful for many different purposes, such as querying on only a part of your database content. One of the most important things that you can do with labels is create some kind of typing structure or schema in your database without having to do this yourself (which is what people used to do all the time before the advent of labels in Neo4j 2.0). A node with one label is comparable to a row in a table. There is no comparison with the (so-called) relational world for a node with several labels. Although not required, a node should have at least one label.

Relationship types

Relationship types achieve something similar to what you do with node labels, but with relationships. The purpose of doing so, however, is mostly very different. Relationship types are mandatory properties of relationships (every relationship must have one and only one type--two nodes can be linked by several relations) and will be used during complex, deep traversals across the graph, when only certain kinds of paths from node to node are deemed important by a specific query.

This should give you a good understanding of the basic data model that we will be using during the remainder of this book. Neo4j implements a very well-documented version of the property graph database, and as we will see later, is well-suited for a wide variety of different use cases. Let's explore the reasons for using a graph database like Neo4j a bit more before proceeding.

You have been reading a chapter from
Learning Neo4j 3.x - Second Edition
Published in: Oct 2017
Publisher: Packt
ISBN-13: 9781786466143
Register for a free Packt account to unlock a world of extra content!
A free Packt account unlocks extra newsletters, articles, discounted offers, and much more. Start advancing your knowledge today.
Unlock this book and the full library FREE for 7 days
Get unlimited access to 7000+ expert-authored eBooks and videos courses covering every tech area you can think of
Renews at $19.99/month. Cancel anytime
Banner background image