Learning Redis: Design efficient web and business solutions with Redis

Chapter 1. Introduction to NoSQL

In this chapter, you will learn about the emerging realm of NoSQL and get introduced to various classifications in the NoSQL domain. We will also understand the position of Redis in the NoSQL domain. We'll cover the following topics:

  • Data in Enterprise
  • NoSQL
  • Use cases for NoSQL

An Internet-enabled world

We live in interesting times; in the last decade, many changes have transformed the way we experience the Internet and the ecosystem around it. In this chapter, we will focus on some of the reasons behind this progress and discuss the developments happening in the world of data storage.

The following figure is a rough sketch of how cyberspace has evolved, based on data collected from the Internet, and gives a rough idea of the growth experienced by Internet-based services:

Figure: Evolution: Social media, processors and cores, databases (NoSQL)

The preceding chart indicates that the hardware industry saw a paradigm shift around the middle of the first decade. Instead of new processors coming out with increased clock speeds, newer generations of processors came with multiple cores, and the number of cores increased with each subsequent release. Gone were the days when a big machine with lots of memory and a powerful processor could solve any problem or, in other words, when an Enterprise could depend on vertical scaling to solve its performance issues. What this signaled, in a way, was that parallel computing was the future and that it would be deployed on commodity machines.

With the hardware industry signaling the arrival of parallel computing, the newer generation of solutions had to be distributed and parallel in nature. This means that they needed to have logic executed in parallel and data stored in distributed datastores; in other words, horizontal scaling was the way to go. Moreover, with Web 2.0, there was an emergence of social media, online gaming, online shopping, collaborative computing, cloud computing, and so on. The Internet was becoming a ubiquitous platform.

The popularity of the Internet and the number of people using it were increasing by the day, and so was the amount of time spent on the Internet. Another important aspect was that users across geographies were coming together in this Internet-enabled world. There are many reasons for this. For one, websites were becoming intelligent and were engaging end users far more effectively than their predecessors. Another factor that made Internet adoption faster and easier was the arrival of innovative handheld devices, such as smartphones and tablets. Nowadays, the compute power of these handheld devices can be compared to that of computers. In this dynamically changing world, Internet-based software solutions and services are expanding the horizon of social media, which brings people together on a common platform. This created new business domains, such as social-Enterprise media, where social media bridges with the Enterprise. This was definitely going to have an impact on traditional Enterprise solutions.

The Internet effect made Enterprise solutions undergo a metamorphic shift. The shift in Enterprise architecture went from a nuanced set of requirements, typically expected from Enterprise solutions, to adopting newer requirements, which were the bastion of social media solutions. Nowadays, Enterprise solutions are integrating with social media sites to know what their customers are talking about; they themselves have started creating platforms and forums where the customer can come and contribute their impressions about products and services. All this data exchange happens in real time and needs a highly concurrent and scalable ecosystem. To sum it up, Enterprise solutions want to adopt the features of social media solutions, and this has a direct and proportional bearing on the nonfunctional requirements of their architectures. Features such as fault management, real-time big data crunching, eventual consistency, high numbers of reads and writes, responsiveness, horizontal scalability, manageability, maintainability, agility, and so on, and their impact on Enterprise architecture, are being looked at with renewed interest. Techniques, paradigms, frameworks, and patterns that were used in social media architecture are being studied and reapplied in Enterprise architecture.

One of the key layers in any solution (social media or Enterprise) is the data layer. The data layer is made up of the data itself, the way it is arranged and managed, and the choice of datastore. From a designer's perspective, data handling in any datastore is governed by three perspectives: consistency, availability, and partition tolerance, better known as Eric Brewer's CAP theorem. While it is desirable to have all three, in reality a data layer can guarantee only a combination of two of them. This means that the data in a solution can have one of the following combinations:

  • Availability-partition tolerance: this combination has to forgo consistency in data handling.
  • Availability-consistency: this combination has to forgo partition tolerance, which limits the amount of data that the data layer can handle.
  • Consistency-partition tolerance: this combination has to forgo availability.

The CAP theorem has a direct bearing on the behavior of the system, read/write speeds, concurrency, maintainability, clustering patterns, fault tolerance, data loads, and so on.
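A minimal sketch can make this trade-off concrete. The following Python code models two replicas of a single value. The `Replica` class and its `mode` flag are hypothetical illustrations, not any datastore's API, and reads are left unrestricted for brevity. During a partition, a CP replica refuses a write it cannot replicate. An AP replica accepts the write and lets the copies diverge:

```python
class Replica:
    """One copy of a single value; `peer` is the other replica (hypothetical model)."""

    def __init__(self, name, mode):
        assert mode in ("CP", "AP")
        self.name = name
        self.mode = mode
        self.value = None
        self.peer = None
        self.partitioned = False  # True when the link to the peer is down

    def write(self, value):
        if self.partitioned:
            if self.mode == "CP":
                # Consistency first: refuse a write that cannot reach the peer.
                return False
            # Availability first: accept locally; the peer is now stale.
            self.value = value
            return True
        # Healthy network: apply the write on both replicas synchronously.
        self.value = value
        self.peer.value = value
        return True

    def read(self):
        return self.value


def make_pair(mode):
    a, b = Replica("a", mode), Replica("b", mode)
    a.peer, b.peer = b, a
    return a, b


def partition(a, b):
    a.partitioned = b.partitioned = True


cp_a, cp_b = make_pair("CP")
cp_a.write("v1")
partition(cp_a, cp_b)
print(cp_a.write("v2"), cp_a.read(), cp_b.read())  # False v1 v1

ap_a, ap_b = make_pair("AP")
ap_a.write("v1")
partition(ap_a, ap_b)
print(ap_a.write("v2"), ap_a.read(), ap_b.read())  # True v2 v1
```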

The most common approach when designing the data model is to arrange it in a relational and normalized way. This works well when the data is in transactional mode, needs consistency, and is structured, that is, it has a fixed schema. This approach of normalizing data appears over-engineered when the data is semistructured, has a tree-like structure, or is schema-less, where consistency can be relaxed. The end result of making semistructured data fit into a structured data model is the explosion of tables and a complicated data model to store simple data.

Due to the lack of alternatives, solutions have relied too heavily on RDBMS to address concerns regarding data handling. The problem with this approach is that RDBMS, which was primarily designed to address the consistency and availability aspects of data handling, also started to store data whose main concern was partition tolerance. The end result was a bloated RDBMS with a very complex data model. This started to hurt the nonfunctional requirements of solutions in the areas of fault management, performance, scalability, manageability, maintainability, and agility.

Another area of concern was data interpretation, which is very important while designing the data layer. In a solution, different groups of stakeholders view and interpret the same data differently. For example, let's say that we have an e-commerce website that sells products. Three basic functional domains come into play in the design of this data layer: inventory management, account management, and customer management. From a core business standpoint, all the domains need atomicity, consistency, isolation, and durability (ACID) properties in their data management. From the CAP theorem point of view, they need consistency and availability.

However, if the website needs to understand its customers in real time, an analytics team needs to analyze data from the inventory management, account management, and customer management domains, in addition to other data that it might collect separately in real time. The analytics team views the same data very differently from the other teams. For them, consistency is less of a concern, because they are more interested in the overall statistics, and a little inconsistent data will have no impact on the overall report. If all the data required for analytics from these domains is kept in the same data model as the core business data, the analytics will run into difficulty, because it now has to work with highly normalized structured data that is optimized for business operations. The analytics team would also prefer to have their data denormalized for faster analysis.

Running real-time analytics on this normalized data in an RDBMS requires heavy compute resources, which will hurt the performance of the core business during business hours. So, it is better for the overall business to create separate data models for these domains, one for business and one for analytics, each maintained separately because they have separate concerns. We will see in subsequent topics why RDBMS is not the right fit for analytics and some other use cases, and how NoSQL solves the problem of the explosion of data.
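The following sketch shows what a separate analytics model might look like. It assumes three hypothetical normalized tables held as plain Python structures. Each order is joined once with its customer and product into flat, denormalized rows, and after that, aggregates need no further joins. The table contents and helper names are illustrative only:

```python
from collections import defaultdict

# Normalized, business-side model (hypothetical sample data): each fact lives once.
customers = {1: {"name": "Asha", "city": "Pune"}, 2: {"name": "Ben", "city": "Leeds"}}
products = {10: {"title": "Learning Redis", "price": 39.99},
            11: {"title": "Graph Primer", "price": 25.00}}
orders = [
    {"order_id": 100, "customer_id": 1, "product_id": 10, "qty": 2},
    {"order_id": 101, "customer_id": 2, "product_id": 10, "qty": 1},
    {"order_id": 102, "customer_id": 1, "product_id": 11, "qty": 1},
]


def denormalize(orders, customers, products):
    """Join each order with its customer and product into one flat analytics row."""
    rows = []
    for o in orders:
        c = customers[o["customer_id"]]
        p = products[o["product_id"]]
        rows.append({
            "order_id": o["order_id"],
            "city": c["city"],
            "title": p["title"],
            "revenue": round(p["price"] * o["qty"], 2),
        })
    return rows


def revenue_by_city(rows):
    """Aggregate the flat rows directly, with no joins at query time."""
    totals = defaultdict(float)
    for r in rows:
        totals[r["city"]] += r["revenue"]
    return {city: round(total, 2) for city, total in totals.items()}


flat = denormalize(orders, customers, products)
print(revenue_by_city(flat))  # {'Pune': 104.98, 'Leeds': 39.99}
```

The denormalized copy repeats the city and title on every row. That repetition is the price paid for join-free analysis, and it is acceptable because the analytics side tolerates small inconsistencies.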

The NoSQL primer

The term NoSQL, popularly expanded as "Not only SQL", was coined by Carlo Strozzi in 1998 and reintroduced by Eric Evans in 2009. This is an exciting area in data handling that, in a way, has filled many of the gaps in the data handling layer. Before NoSQL emerged as an alternative way to store data, SQL-oriented databases (RDBMS) were the only choice developers had for positioning or retrofitting their data. In other words, RDBMS was the one hammer used to nail all data problems. When NoSQL and its different categories started emerging, data models and data sizes that were not suited to RDBMS found a natural home in NoSQL. There was also a shift in the consistency model, from ACID to BASE properties.

ACID properties represent the consistency and availability of the CAP theorem. These properties are exhibited by RDBMS and stand for the following:

  • Atomicity: In a transaction, all operations will complete or none will be completed (rollback)
  • Consistency: The database is in a consistent state at the start and end of a transaction, and a transaction cannot leave it in an intermediate state
  • Isolation: There will be no interference among the concurrent transactions
  • Durability: Once a transaction commits, it will remain so even after the server restarts or fails
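Atomicity is the easiest of these properties to see in code. The following sketch uses Python's built-in `sqlite3` module as a stand-in RDBMS. The `account` table and the `transfer` helper are hypothetical. The credit is applied before the debit. So when the debit violates the `CHECK` constraint, the rollback has to undo work that has already happened:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account (id TEXT PRIMARY KEY, balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst: both updates commit, or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes the block
            conn.execute("UPDATE account SET balance = balance + ? WHERE id = ?", (amount, dst))
            conn.execute("UPDATE account SET balance = balance - ? WHERE id = ?", (amount, src))
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected a negative balance; the credit to dst was rolled back too.
        return False


def balances(conn):
    return dict(conn.execute("SELECT id, balance FROM account ORDER BY id"))


print(transfer(conn, "alice", "bob", 30), balances(conn))   # True {'alice': 70, 'bob': 80}
print(transfer(conn, "alice", "bob", 500), balances(conn))  # False {'alice': 70, 'bob': 80}
```

Using the connection as a context manager is what ties both updates to one transaction. It guarantees that a failure in the second statement also discards the first.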

BASE properties are exhibited by NoSQL; they represent the availability and partition tolerance of the CAP theorem. They basically give up on the strong consistency shown by RDBMS. BASE stands for the following features:

  • Basically available: This guarantees a response to a request even if the data is stale.
  • Soft state: The state of the data can change even when there is no request to change it. Suppose two nodes hold replicas of the same data. If a request changes the state on one node, the state on the other node does not change during the lifespan of that request. Instead, the other node changes its state later, through an asynchronous process triggered by the datastore. This is what makes the state soft.
  • Eventually consistent: Due to the distributed nature of the nodes, the system will eventually become consistent.

    Note

    As a result, data writes and reads should be faster and easier.
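The following toy model shows the three BASE properties together. It assumes a single primary and in-process replicas, and the class and key names are hypothetical. A write is acknowledged immediately, and replicas keep answering reads with whatever they hold. A background step, simulated here by explicit `replicate_once()` calls, eventually brings every replica into agreement:

```python
from collections import deque


class Node:
    def __init__(self, name):
        self.name = name
        self.data = {}


class EventuallyConsistentStore:
    """A primary plus replicas; writes are acknowledged before replication (hypothetical model)."""

    def __init__(self, replica_count):
        self.primary = Node("primary")
        self.replicas = [Node(f"replica-{i}") for i in range(replica_count)]
        self.pending = deque()  # replication log not yet applied to the replicas

    def write(self, key, value):
        self.primary.data[key] = value
        self.pending.append((key, value))  # replicated later, asynchronously
        return "OK"  # basically available: respond without waiting for replicas

    def read(self, key, replica_index):
        # A replica always answers, even if its copy is stale (soft state).
        return self.replicas[replica_index].data.get(key)

    def replicate_once(self):
        """Stand-in for the background process: apply one pending change everywhere."""
        if self.pending:
            key, value = self.pending.popleft()
            for r in self.replicas:
                r.data[key] = value

    def converged(self):
        return all(r.data == self.primary.data for r in self.replicas)


store = EventuallyConsistentStore(replica_count=2)
store.write("stock:learning-redis", 5)
store.write("stock:learning-redis", 4)
print(store.read("stock:learning-redis", 0))  # None (replica has not caught up yet)
store.replicate_once()
print(store.read("stock:learning-redis", 0))  # 5 (an older value)
store.replicate_once()
print(store.read("stock:learning-redis", 0), store.converged())  # 4 True
```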

Another interesting development took place in the field of software development. Vertical scalability had reached its limit, and solutions had to be designed to scale horizontally. As a result, the data layer also had to be distributed and partition tolerant. Social media solutions, online gaming, and game theory-based websites also started gaining prominence. Game theory-based sites do targeted marketing, rewarding users based on their purchase history with the site, and they need real-time analytics. Social media needed to synchronize huge amounts of data across geographies in the shortest possible time, and the gaming world was interested in high performance. E-commerce sites wanted to know about their customers and products in real time. They also wanted to profile their customers to anticipate their needs before the customers themselves realized them. The categories of NoSQL that emerged, based on different data models, are as follows:

  • Graph-oriented NoSQL
  • Document-oriented NoSQL
  • Key-value oriented NoSQL
  • Column-oriented NoSQL

Graph-oriented NoSQL

Graph databases are a special kind of NoSQL database. The data models stored by graph databases are graph structures, which are a bit different from those of other datastores. A graph structure consists of nodes, edges, and properties. One way to understand graph databases is to think of them as mind maps with bidirectional relationships. This means that if A is related to B and B is related to C, the relationships can be traversed in either direction: C can be reached from A, and A can be reached from C. Graph databases tend to solve the problems that arise from relationships formed among unstructured entities at runtime, and these relationships can be bidirectional. RDBMS also has a concept of relationships, called table joins, but these relationships are on structured data and cannot be bidirectional.

Moreover, these table joins add complexity to the data model through foreign keys and impose performance penalties on join-based queries as the dataset grows over time. A few of the most promising graph datastores are Neo4j, FlockDB, and OrientDB.

To understand this better, let's take a sample use case and see how easy it becomes to solve complex graph-based business use cases with graph-oriented NoSQL. The following figure is a sample use case, which an e-commerce website might be interested in solving. The use case is to capture visitors' purchase history and people's relationships in the microblogging component of the website.

Figure: Sample module for graph DB

Business entities such as the publisher, author, customer, product, and so on are represented as nodes in the graph. Relationships such as authored by, author, publisher, published by, and so on are represented by edges in the graph. Interestingly, a nonbusiness node, such as user-1, which is from the blogging site, can be represented in the graph along with its relationship, follows, with the other node, user-2. By combining the business and nonbusiness entities, the website can find target customers for the products. In the graph, both nodes and edges have properties that are used while running analytics.

The following set of questions can be easily answered by a graph database based on the relationships stored in the system:

  • Who authored Learning Redis?

    Answer: Vinoo Das

  • How are Packt Publishing and Learning Redis related?

    Answer: Publisher

  • Who has their own NoSQL book published by Packt Publishing?

    Answer: user-2

  • Who is following the customer who has purchased Learning Redis and is interested in NoSQL?

    Answer: user-1

  • List all the NoSQL books that cost less than X USD and that can be bought by the followers of user-2.

    Answer: Learning Redis
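A graph database answers these questions by traversing edges rather than joining tables. The following plain-Python sketch assumes a small, hypothetical subset of the relationships in the sample figure. It indexes every edge in both directions, so the same relationship can be followed either way. A real graph database such as Neo4j would use its own query language instead:

```python
from collections import defaultdict

# Hypothetical edges loosely modelled on the sample figure: (from, relation, to).
EDGES = [
    ("Learning Redis", "authored_by", "Vinoo Das"),
    ("Learning Redis", "published_by", "Packt Publishing"),
    ("user-2", "purchased", "Learning Redis"),
    ("user-2", "interested_in", "NoSQL"),
    ("user-1", "follows", "user-2"),
]


class Graph:
    def __init__(self, edges):
        # Index every edge in both directions so it can be traversed either way.
        self.out = defaultdict(list)
        self.inc = defaultdict(list)
        for src, rel, dst in edges:
            self.out[src].append((rel, dst))
            self.inc[dst].append((rel, src))

    def targets(self, node, rel):
        """Nodes reached from `node` along outgoing edges labelled `rel`."""
        return [dst for r, dst in self.out[node] if r == rel]

    def sources(self, node, rel):
        """Nodes that point at `node` with an edge labelled `rel` (reverse traversal)."""
        return [src for r, src in self.inc[node] if r == rel]

    def relations(self, a, b):
        """Labels of edges between a and b, in either direction."""
        return ([r for r, dst in self.out[a] if dst == b]
                + [r for r, dst in self.out[b] if dst == a])


g = Graph(EDGES)
print(g.targets("Learning Redis", "authored_by"))         # ['Vinoo Das']
print(g.relations("Packt Publishing", "Learning Redis"))  # ['published_by']

# Followers of any customer who bought Learning Redis and is interested in NoSQL.
buyers = [c for c in g.sources("Learning Redis", "purchased")
          if "NoSQL" in g.targets(c, "interested_in")]
print([f for c in buyers for f in g.sources(c, "follows")])  # ['user-1']
```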

Document-oriented NoSQL

Document-oriented datastores are designed to store data with the philosophy of storing a document. To put it simply, the data here is arranged in the form of a book. A book can be divided into any number of chapters, each chapter can be divided into any number of topics, each topic can be further divided into subtopics, and so on.

Figure: Composition of a book

If the data has a similar structure, that is, it is hierarchical and does not have a fixed depth or schema, then document-oriented datastores are the perfect option to store such data. MongoDB and CouchDB (Couchbase) are two well-known document-oriented datastores that are getting a lot of attention these days. Like a book, which has indexes for faster searches, these datastores also have the indexes of keys stored in memory for faster searches.

Document-oriented datastores store data in XML, JSON, and other formats. They can hold scalar values, maps, lists, and tuples as values. In RDBMS, data is viewed as rows stored in a tabular form. Here, by contrast, data is stored in a hierarchical, tree-like structure in which every value is always associated with a key. Another unique feature is that document-oriented datastores are schema-less. The following figure shows an example of how data is stored in a document-oriented datastore, using JSON as the storage format. One of the beauties of document-oriented datastores is that the information can be stored the way you think of the data. This, in a way, is a paradigm shift from RDBMS, where the data is broken into various smaller parts and then stored in rows and columns in a normalized way.

Figure: Composition of sample data in JSON format
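The following sketch mirrors this idea in code. It treats a Python dictionary as a hypothetical collection that stores each document as JSON text under a key. A nested book document is addressed by a path of keys and indexes. A second document with a different shape is accepted without any schema change. This is not the API of MongoDB or CouchDB:

```python
import json

# A hypothetical document collection: every document is stored under a key,
# and documents in the same collection need not share a schema.
collection = {}


def put(doc_id, document):
    # Store a serialized copy, as a document store would persist JSON text.
    collection[doc_id] = json.dumps(document)


def get_path(doc_id, *path):
    """Walk a key/index path into a stored document, e.g. ("chapters", 0, "title")."""
    node = json.loads(collection[doc_id])
    for step in path:
        node = node[step]
    return node


put("book:1", {
    "title": "Learning Redis",
    "chapters": [
        {"title": "Introduction to NoSQL",
         "topics": [{"title": "NoSQL",
                     "subtopics": ["Graph", "Document", "Key-value", "Column"]}]},
    ],
})
# A second document with a different shape is accepted as-is (schema-less).
put("book:2", {"title": "Untitled draft", "tags": ["nosql"], "chapters": []})

print(get_path("book:1", "chapters", 0, "topics", 0, "subtopics", 1))  # Document
print(get_path("book:2", "tags"))  # ['nosql']
```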

The two most famous document-oriented stores in use are MongoDB and CouchDB, and it will be interesting to pit them against each other in order to have a better overview.

Salient features of MongoDB and CouchDB

Both MongoDB and CouchDB are document-oriented, but they differ in various aspects that matter to anyone who wants to learn about document-oriented datastores and adopt them in their projects. Following are some features of MongoDB and CouchDB:

  • Insertion of small and large data sets: Both MongoDB and CouchDB are very good for the insertion of small data sets. MongoDB is a tad better than CouchDB when it comes to the insertion of large data sets. Overall, speed is consistently very good in both of these document datastores.
  • Random reads: Both MongoDB and CouchDB are fast when it comes to read speeds. MongoDB is a tad better when it comes to reading large data sets.
  • Fault tolerance: Both MongoDB and CouchDB have comparable and good fault tolerance capability. CouchDB uses Erlang/OTP as the underlying technology platform for its implementation. Erlang is a language and a platform that was developed to build fault-tolerant, scalable, and highly concurrent systems. The fact that Erlang acts as the backbone of CouchDB gives it a very good fault-tolerant capability. MongoDB uses C++ as the primary language for its underlying implementation. Industry adoption and a proven track record in the area of fault tolerance give MongoDB a good edge in this area.
  • Sharding: MongoDB has an in-built sharding capability, whereas CouchDB does not. Nevertheless, Couchbase, which is another document datastore built on top of CouchDB, has an automatic sharding capability.
  • Load balancing: MongoDB and CouchDB have a good load balancing capability. However, since the underlying technology of CouchDB, that is, the actor paradigm, has good provision for load balancing, it can be said that CouchDB scores over MongoDB in this capability.
  • Multi-data center support: CouchDB has multi-data center support, whereas MongoDB, at the time of researching for this book, didn't have this support. However, given the popularity of MongoDB, we can expect it in the future.
  • Scalability: Both CouchDB and MongoDB are highly scalable.
  • Manageability: Both CouchDB and MongoDB have good manageability.
  • Client: CouchDB uses JSON for data exchange, whereas MongoDB uses BSON (binary JSON), a format that originated with MongoDB.

Column-oriented NoSQL

Column-oriented NoSQL is designed with the philosophy of storing data in columns rather than rows. This is diametrically opposite to the way data is stored in RDBMS, that is, in rows. Column-oriented databases are designed from the ground up to be highly scalable and hence are distributed in nature. They give up on consistency to achieve this massive scalability.

The following figure depicts a small, illustrative inventory of smart tablets. The idea is to show how the data is stored in RDBMS compared to a columnar database:

Figure: Presentation of data in columns and rows

An RDBMS stores the preceding tabular data on the hard disk in the format shown here:

Figure: Data serialized as rows

The source of the information in the preceding screenshot is http://en.wikipedia.org/wiki/Column-oriented_DBMS.

The same data in a columnar datastore will be stored as shown in the following figure; here, the data is serialized in columns:

Figure: Data serialized as columns
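The difference between the two layouts can be sketched in a few lines. The inventory values below are hypothetical, and the `;`, `,`, and `:` text format is only a stand-in for a real on-disk format. Row-oriented storage keeps each record together, while column-oriented storage keeps each column together. As a result, a single-column aggregate only needs that column's contiguous values:

```python
# Hypothetical tablet inventory; values must not contain ',', ';' or ':' in this toy format.
columns = ["id", "brand", "model", "stock"]
rows = [
    (1, "BrandA", "Tab 10", 12),
    (2, "BrandB", "Pad Air", 7),
    (3, "BrandA", "Tab 8", 30),
]


def serialize_rows(rows):
    """Row-oriented layout (RDBMS style): all fields of one record sit together."""
    return ";".join(",".join(str(v) for v in row) for row in rows)


def serialize_columns(columns, rows):
    """Column-oriented layout: all values of one column sit together."""
    return ";".join(
        name + ":" + ",".join(str(row[i]) for row in rows)
        for i, name in enumerate(columns)
    )


def read_column(serialized, name):
    """Return the values of one column, which are stored contiguously in the columnar layout."""
    for segment in serialized.split(";"):
        col, _, values = segment.partition(":")
        if col == name:
            return values.split(",")
    raise KeyError(name)


print(serialize_rows(rows))
# 1,BrandA,Tab 10,12;2,BrandB,Pad Air,7;3,BrandA,Tab 8,30
col_data = serialize_columns(columns, rows)
print(col_data)
# id:1,2,3;brand:BrandA,BrandB,BrandA;model:Tab 10,Pad Air,Tab 8;stock:12,7,30
print(sum(int(v) for v in read_column(col_data, "stock")))  # 49
```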

In a world where vertical scalability is reaching its limit and horizontal scalability is the way organizations want to store data, columnar datastores can store petabytes of data very cost-effectively. Google, Yahoo!, Facebook, and others have pioneered storing data in a columnar way, and the well-known volume of data these companies store is the proof of the pudding. HBase and Cassandra are two of the well-known products that are columnar in nature and can store huge amounts of data. Both datastores are built for distributed scale, although their consistency models differ. Both are implemented in Java, and it will be interesting to compare them in order to get a better overview.

Salient features of HBase and Cassandra

HBase is a datastore that belongs to the category of column-oriented datastores. It came into existence after Hadoop became popular with its HDFS file storage system, which was inspired by the Google File System paper published in 2003. The fact that HBase is based on Hadoop makes it an excellent choice for data warehousing and large-scale data processing and analysis. HBase presents a table-like view of data over the existing Hadoop ecosystem. This view is row-oriented, similar to the way we view data in an RDBMS, but internally the data is stored in a column-oriented way. HBase stores row data against a row key, sorted by that row key.

HBase has components such as the Region Server, which can be plugged into the DataNode provided with Hadoop. This means that the Region Server is colocated with the DataNode and acts as a gateway for interacting with HBase clients. Behind the scenes, the HBase master handles DDL operations. It also manages Region assignments and the associated bookkeeping. Zookeeper nodes take care of cluster information and management, including state management. HBase clients interact directly with Region Servers to put and get data. Zookeeper (used to coordinate between the master and slave nodes), the Name Node, and the HBase master node do not participate directly in the exchange of data between the HBase client and the Region Server nodes.

Figure: HBase node setup
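The sorted row-key layout is what makes range scans cheap. The following simplified model is not HBase's client API. It keeps row keys sorted with Python's `bisect` module and stores each row as a map of hypothetical `family:qualifier` columns. It then uses split keys to decide which contiguous key range (region) owns a row:

```python
import bisect


class SortedTable:
    """Rows kept sorted by row key; each row is a {'family:qualifier': value} map (simplified model)."""

    def __init__(self, split_keys):
        # Split keys divide the key space into contiguous ranges ("regions").
        self.split_keys = sorted(split_keys)
        self.keys = []   # sorted row keys
        self.rows = {}   # row key -> columns

    def put(self, row_key, column, value):
        if row_key not in self.rows:
            bisect.insort(self.keys, row_key)
            self.rows[row_key] = {}
        self.rows[row_key][column] = value

    def get(self, row_key):
        return self.rows.get(row_key)

    def scan(self, start, stop):
        """Yield rows with start <= key < stop, in key order, by slicing the sorted keys."""
        lo = bisect.bisect_left(self.keys, start)
        hi = bisect.bisect_left(self.keys, stop)
        for key in self.keys[lo:hi]:
            yield key, self.rows[key]

    def region_of(self, row_key):
        """Index of the key range (region) that owns row_key."""
        return bisect.bisect_right(self.split_keys, row_key)


t = SortedTable(split_keys=["order#200", "order#400"])
for n in (101, 250, 399, 404, 150):
    t.put(f"order#{n}", "d:status", "shipped" if n % 2 else "pending")

print([k for k, _ in t.scan("order#100", "order#300")])  # ['order#101', 'order#150', 'order#250']
print(t.region_of("order#101"), t.region_of("order#250"), t.region_of("order#404"))  # 0 1 2
```

Row keys compare as strings. Zero-padded or fixed-width numeric parts, as in these sample keys, keep the lexical order meaningful.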

Cassandra is a datastore that belongs to the category of column-oriented datastores and also shows some features of a key-value datastore. Cassandra was initially developed at Facebook and later handed over to the Apache open source community. It is best suited for real-time transaction processing and real-time analytics.

One of the key differentiators between Cassandra and HBase is that HBase depends on the existing architecture of Hadoop, whereas Cassandra is standalone in nature. Cassandra takes its inspiration from Amazon's Dynamo for the way it distributes data. In short, the architectural approach of HBase makes the Region Server and DataNodes dependent on other components, such as the HBase master, Name Node, and Zookeeper. The nodes in Cassandra, by contrast, manage these responsibilities themselves and thus do not depend on external components.

A Cassandra cluster can be viewed as a ring of nodes, a few of which are seeds. Seeds are like any other node but are responsible for keeping up-to-date cluster state data. If a seed node goes down, a new seed can be elected among the available nodes. The data is distributed evenly across the ring based on the hash value of the row key. In Cassandra, data can be queried according to its row key. Clients for Cassandra come in many flavors. For example, Thrift is one of the most native clients for interacting with the Cassandra ring. There are also clients that expose the Cassandra Query Language (CQL) interface, which closely resembles SQL.

Figure: Cassandra nodes setup
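The way Cassandra spreads rows around the ring is based on consistent hashing. The following sketch is a generic consistent hash ring, not Cassandra's implementation. MD5 is used only as a convenient stable hash, and the node names and the number of virtual points per node are illustrative choices rather than Cassandra's partitioner settings. Each node owns several points on the ring to even out the distribution. When a fourth node is added, only the keys that now fall on the new node's points change owner:

```python
import bisect
import hashlib


def token(value):
    """Stable position on the ring; MD5 is used only as a convenient stable hash."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class Ring:
    def __init__(self, nodes, vnodes=8):
        # Each node owns several points (virtual nodes) to spread data more evenly.
        self.points = sorted((token(f"{node}#{i}"), node) for node in nodes for i in range(vnodes))
        self.tokens = [t for t, _ in self.points]

    def owner(self, row_key):
        """The first ring point clockwise from the row key's hash owns the row."""
        i = bisect.bisect_right(self.tokens, token(row_key)) % len(self.points)
        return self.points[i][1]


keys = [f"user:{n}" for n in range(1000)]
before = Ring(["node-a", "node-b", "node-c"])
after = Ring(["node-a", "node-b", "node-c", "node-d"])

moved = sum(before.owner(k) != after.owner(k) for k in keys)
print(moved, "of", len(keys), "keys changed owner")
```

With plain `hash(key) % node_count` placement, adding a node would reassign most keys. On the ring, the keys that move are only those the new node takes over.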

Following are some features of HBase and Cassandra:

  • Insertion of small and large data sets: Both HBase and Cassandra are very good at the insertion of small data sets. Both datastores use multiple nodes to distribute writes, and both write the data first to memory-based storage, such as RAM, which makes their insertion performance good.
  • Random reads: Both HBase and Cassandra are fast when it comes to read speeds. In HBase, consistency was one of the key features kept in mind when designing the architecture. In Cassandra, data consistency was kept tunable, but one has to sacrifice speed in order to have higher consistency.
  • Eventual consistency: HBase has strong consistency and Cassandra has eventual consistency, but interestingly, the consistency model in Cassandra is tunable. It can be tuned to have better consistency, but one has to give up performance in the read and write speeds.
  • Load balancing: HBase and Cassandra have load balancing built into them. The idea is to have many commodity-grade nodes serving reads and writes. Cassandra uses consistent hashing to distribute the load between the nodes. HBase instead splits each table into row-key ranges (regions) that are assigned to Region Servers.
  • Sharding: HBase and Cassandra both have sharding capability. This is essential since both claim to give good performance on commodity-grade nodes, which have limited disk and memory space.
  • Multi-data center support: Of the two, Cassandra has multi-data center support.
  • Scalability: HBase and Cassandra have very good scalability, which was one of their design requirements.
  • Manageability: Of the two, Cassandra has better manageability. In Cassandra there are only nodes to manage. In HBase, many components need to work in tandem, such as Zookeeper, DataNode, Name Node, and Region Server.
  • Client: Both HBase and Cassandra have clients in Java, Python, Ruby, Node.js, and many more, making it easy to work with heterogeneous environments.
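Tunable consistency is commonly reasoned about with replica counts. Suppose there are N replicas, a write is acknowledged by W of them, and a read consults R of them. The read and write sets are guaranteed to overlap when R + W > N. Cassandra exposes this choice through consistency levels such as ONE and QUORUM. The following sketch is a simplified, hypothetical model of that arithmetic, not Cassandra's client API. It deliberately writes to the first W replicas and reads from the last R, which is the worst case for overlap, and it keeps the newest version among the replicas it reads:

```python
from dataclasses import dataclass


@dataclass
class Versioned:
    version: int
    value: str


class TunableStore:
    """N replicas; a write reaches W of them, a read consults R of them (simplified model)."""

    def __init__(self, n):
        self.replicas = [Versioned(0, "") for _ in range(n)]
        self.version = 0

    def write(self, value, w):
        self.version += 1
        # Deterministic for the demo: the write reaches only the first w replicas.
        for i in range(w):
            self.replicas[i] = Versioned(self.version, value)

    def read(self, r):
        # Worst case for the demo: ask the last r replicas and keep the newest version.
        newest = max(self.replicas[-r:], key=lambda v: v.version)
        return newest.value

    @staticmethod
    def is_strong(n, r, w):
        # Read and write sets must share at least one replica.
        return r + w > n


store = TunableStore(n=3)
store.write("v1", w=3)  # write reaches every replica
store.write("v2", w=2)  # QUORUM-style write
print(store.read(r=2), TunableStore.is_strong(3, 2, 2))  # v2 True
print(store.read(r=1), TunableStore.is_strong(3, 1, 2))  # v1 False  (stale read)
```

Raising R or W buys consistency, but every request then waits on more replicas. This is the speed trade-off described in the list above.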

Key-value oriented NoSQL

Key-value datastores are probably the fastest and simplest kind of NoSQL database. In their most basic form, they can be understood as a big hash table. From a usage perspective, every value stored in the database has a key. The key is used to look up its value, and deleting the key deletes the value. Some popular key-value databases are Redis, Riak, Amazon's DynamoDB, and Project Voldemort.
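A key-value store is, at its core, a hash table. Python's built-in `dict` is already one, so the following sketch spells out the mechanism explicitly instead. The hash of the key selects a bucket, collisions are chained in a list, and deleting the key removes its value. The `HashTable` class is a teaching sketch, not Redis's implementation:

```python
class HashTable:
    """A tiny separate-chaining hash table: the core idea behind a key-value store."""

    def __init__(self, buckets=8):
        self.buckets = [[] for _ in range(buckets)]
        self.size = 0

    def _bucket(self, key):
        # hash(key) picks the bucket, so lookups don't scan the whole store.
        return self.buckets[hash(key) % len(self.buckets)]

    def set(self, key, value):
        bucket = self._bucket(key)
        for i, (k, _) in enumerate(bucket):
            if k == key:
                bucket[i] = (key, value)  # overwrite the existing key
                return
        bucket.append((key, value))
        self.size += 1
        if self.size > 2 * len(self.buckets):
            self._resize()

    def get(self, key, default=None):
        for k, v in self._bucket(key):
            if k == key:
                return v
        return default

    def delete(self, key):
        """Deleting the key deletes its value; returns True if the key existed."""
        bucket = self._bucket(key)
        for i, (k, _) in enumerate(bucket):
            if k == key:
                del bucket[i]
                self.size -= 1
                return True
        return False

    def _resize(self):
        # Double the bucket count and rehash, keeping chains short as the store grows.
        items = [pair for bucket in self.buckets for pair in bucket]
        self.buckets = [[] for _ in range(2 * len(self.buckets))]
        for k, v in items:
            self._bucket(k).append((k, v))


kv = HashTable()
kv.set("user:1:name", "Vinoo")
kv.set("user:1:name", "Vinoo Das")
print(kv.get("user:1:name"))  # Vinoo Das
print(kv.delete("user:1:name"), kv.get("user:1:name"))  # True None
```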

How does Redis fare in some of the nonfunctional requirements as a key-value datastore?

Redis is one of the fastest key-value stores and is seeing very fast adoption across the industry, cutting across many domains. Since this book focuses on Redis, let's briefly look at how Redis fares on some of the nonfunctional requirements. We will discuss them at length as the book progresses:

  • Insertion of data sets: Insertion of data sets is very fast in key-value datastores, and Redis is no exception.
  • Random reads: Random reads are very fast in key-value datastores. In Redis, all the keys are stored in memory. This ensures faster lookups, so read speeds are higher. It would be ideal to keep all the keys and values in memory, but this has a drawback: memory requirements become very high. Redis addressed this by introducing a feature called virtual memory (since deprecated). Virtual memory keeps all the keys in memory but writes the least recently used values to disk.
  • Fault tolerance: Fault handling in Redis depends on the cluster's topology. Redis uses the master-slave topology for its cluster deployment. All the data in the master is asynchronously copied to the slaves. If the master node fails, one of the slave nodes can be promoted to master using Redis Sentinel.
  • Eventual consistency: Redis uses a master-slave topology, which means that once the master is updated, all the slave nodes are updated asynchronously. In Redis, clients use slaves in read-only mode. So the master might hold the latest written value while a client reading from a slave still gets a stale value, because the master has not yet updated the slaves. This lag can cause inconsistency for a brief moment.
  • Load balancing: Redis has a simple way of achieving load balancing. As previously discussed, the master is used to write data and slaves are used to read data. So, clients should either have logic built into them to spread read requests evenly across the slave nodes or use third-party proxies, such as Twemproxy, to do so.
  • Sharding: Datasets can be bigger than the memory available on a single node. Presharding the data across various peer nodes makes this a horizontally scalable option.
  • Multi-data center support: Redis and key-value NoSQL do not provide inherent multi-data center support with consistent replication. We can have the master node in one data center and slaves in another, but we will have to live with eventual consistency.
  • Scalability: When it comes to scaling and data partitioning, the Redis server lacks the logic to do so. The logic to partition the data across many nodes should reside with the client, or the client should use third-party proxies such as Twemproxy.
  • Manageability: As a key-value NoSQL datastore, Redis is simple to manage.
  • Client: There are clients for Redis in Java, Python, and Node.js that implement the REdis Serialization Protocol (RESP).
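The partitioning and read-spreading logic lives with the client, so it can be sketched without a server. The following router is hypothetical: the host names, the `WRITE_COMMANDS` subset, and the class names are illustrative assumptions, and it does not talk to Redis. It preshards keys across two master/slave groups using a stable CRC32 hash. Writes go to the owning master, and reads are spread round-robin across that master's slaves. Reads served this way may be stale, as described under eventual consistency:

```python
import itertools
import zlib

WRITE_COMMANDS = {"SET", "DEL", "INCR"}  # the subset of write commands this sketch knows about


class Shard:
    """One master for writes plus read-only slaves (host names are placeholders)."""

    def __init__(self, master, slaves):
        self.master = master
        self.slaves = slaves
        self._next_slave = itertools.cycle(slaves)

    def read_target(self):
        # Round-robin over the slaves spreads read load evenly.
        return next(self._next_slave)


class ShardedClient:
    """Hypothetical client-side router: presharding by key plus a master/slave read-write split."""

    def __init__(self, shards):
        self.shards = shards

    def shard_for(self, key):
        # crc32 is stable across processes, unlike Python's per-process randomized hash() for str.
        return self.shards[zlib.crc32(key.encode("utf-8")) % len(self.shards)]

    def route(self, command, key):
        shard = self.shard_for(key)
        return shard.master if command.upper() in WRITE_COMMANDS else shard.read_target()


client = ShardedClient([
    Shard("redis-a-master:6379", ["redis-a-slave1:6379", "redis-a-slave2:6379"]),
    Shard("redis-b-master:6379", ["redis-b-slave1:6379", "redis-b-slave2:6379"]),
])
print(client.route("SET", "cart:42"))
print([client.route("GET", "cart:42") for _ in range(3)])
```

The modulo placement keeps the sketch short. However, changing the number of shards would remap most keys, which is why the data is presharded up front.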

Use cases of NoSQL

Understand your business first; this will help you understand your data. It will also give you deep insight into the kind of data layer you need. The idea is to follow a top-down design methodology. Deciding on the persistence mechanism first and then fitting the data for the business use case into it (a bottom-up design methodology) is a bad idea. So, define your business requirements first, decide on the roadmap for the future, and then decide on the data layer. When working through the business requirements specification, it is also important to factor in the nonfunctional requirements for every business use case, which I believe is paramount.

Failing to capture the nonfunctional requirements alongside the business or functional requirements causes problems when the system goes to performance testing or, worse, when it goes live. If you feel that the data model requires NoSQL from a functional requirement standpoint, then ask a few questions, as follows:

  • What type of NoSQL do you need for the data model?
  • How big can the data grow, and how much scalability is required?
  • How will you handle node failure? What is its impact on your business use case?
  • Which is better as the data grows: data replication or infrastructure investment?
  • What are the strategies for handling read/write loads and how much concurrency is planned?
  • What is the level of data consistency required for the business use case?
  • How will the data reside (on a single data center or multiple data centers across geographies)?
  • What are the clustering strategies and data sync strategies?
  • What are the data backup strategies?
  • What kind of network topology do you plan to use? What is the impact of network latency on performance?
  • How comfortable is the team with handling, monitoring, administering, and developing in a polyglot persistence environment?
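
The question of how big the data can grow is worth answering with a back-of-the-envelope calculation, because Redis keeps its dataset in memory and the memory of a single node is usually the first limit you hit. The sketch below is a rough estimate under stated assumptions, not a sizing tool: the function name `estimate_shards`, the overhead factor, the headroom fraction, and the workload numbers are all hypothetical placeholders to replace with measurements from your own data.

```python
import math


def estimate_shards(num_keys, avg_key_bytes, avg_value_bytes,
                    overhead_factor, node_memory_bytes, headroom=0.5):
    """Rough count of Redis master nodes needed to hold a dataset in memory.

    overhead_factor: assumed multiplier for per-key bookkeeping and allocator
        fragmentation; measure it on real data rather than trusting a guess.
    headroom: fraction of each node's RAM kept free for growth and for the
        copy-on-write memory a forked background save can consume.
    Returns (estimated dataset bytes, number of shards).
    """
    raw_bytes = num_keys * (avg_key_bytes + avg_value_bytes)
    dataset_bytes = raw_bytes * overhead_factor
    usable_per_node = node_memory_bytes * (1 - headroom)
    return dataset_bytes, math.ceil(dataset_bytes / usable_per_node)


GIB = 1024 ** 3
# Hypothetical workload: 100 million keys, 30-byte keys, 200-byte values.
size, shards = estimate_shards(100_000_000, 30, 200, overhead_factor=1.5,
                               node_memory_bytes=32 * GIB)
print("about %.1f GiB across %d shards" % (size / GIB, shards))
```

With these made-up numbers it reports about 32.1 GiB spread over 3 shards; if each shard also gets one slave for read scaling and failover, the node count doubles.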

Here is a summary of some NoSQL databases and where they are placed according to the CAP theorem (consistency, availability, and partition tolerance). The following chart does not claim to be exhaustive; it is a snapshot of the most popular ones:

NoSQL databases placed as per CAP theorem

Let's analyze how companies are using NoSQL; this will give us ideas on how to use NoSQL effectively in our own solutions:

  • Big data: The very term evokes a picture of hundreds or thousands of servers crunching petabytes of data for analysis. The case for using NoSQL datastores for big data is self-evident. Columnar databases, one of the NoSQL patterns, are the obvious choice for this kind of workload. Being distributed, these solutions also offer parallel computation, write availability, and scalability, with no single point of failure. The following sample list shows companies that have successfully used columnar datastores and the Hadoop ecosystem in their business:
    • Spotify uses Hadoop for data aggregation, reporting, and analysis
    • Twitter uses Hadoop to process tweets and log files
    • Netflix uses Cassandra as the backend datastore for its streaming services
    • Zoho uses Cassandra to generate inbox previews for mail services
    • Facebook uses Cassandra for its Instagram operations
    • Facebook uses HBase in its message infrastructure
    • Su.pr uses HBase for real-time data storage and the analytics platform
    • HP IceWall SSO uses HBase to store user data in order to authenticate users for their web-based single sign-on solution
  • Heavy read/write: This nonfunctional requirement instantly brings to mind a social or gaming website. Enterprises with this requirement can take inspiration from the following choices of NoSQL:
    • LinkedIn uses Voldemort (a key-value datastore) to serve millions of reads and writes per day, each in under a few milliseconds
    • Wooga (a social network game and mobile developer) uses Redis for its gaming platform; some of its games have more than a million users a day
    • Twitter handles 200 million tweets a day using NoSQL stores such as Cassandra, HBase, Memcached, and FlockDB, alongside RDBMS such as MySQL
    • Stack Overflow uses Redis to serve 30 million registered users a month
  • Document store: The growth of Web 2.0 adoption and the rise in Internet content are creating data that is schema-less in nature. A document-oriented NoSQL store, designed specifically for this kind of data, makes the developer's job simpler and the solution more stable. The following are examples of companies that use different document stores:
    • SourceForge uses MongoDB to store front pages, project pages, and download pages; Allura on SourceForge is based on MongoDB
    • MetLife uses MongoDB as the datastore for The Wall, a customer service platform
    • Semantic News Portal uses CouchDB to store news data
    • Vermont Public Radio's website homepage uses CouchDB to store news headlines, commentaries, and more
    • AOL Advertising uses Couchbase (a new avatar of CouchDB) to serve billions of impressions a month to more than 100 million users
  • Real-time experience and e-commerce platform: Shopping carts, user profile management, voting, user session management, real-time page counters, real-time analytics, and more are services that companies offer to give end users a real-time experience. The following are examples of companies that use NoSQL for real-time experiences and e-commerce platforms:
    • Flickr push uses Redis to push real-time updates
    • Instagram uses Redis to store hundreds of millions of media items against keys and to serve them in real time
    • Digg uses Redis for its page views and user clicks solution
    • Best Buy uses Riak for its e-commerce platform
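
Several of the real-time services in the last bullet map naturally onto basic Redis commands: a page counter onto `INCR`, a user session onto `SETEX` (a value with a time-to-live), and a shopping cart onto a hash updated with `HINCRBY`. So that the patterns can run without a server, the sketch below uses a hypothetical in-memory class, `MiniKV`, that imitates just those commands; it is not a Redis client. The key names (`page:views:home`, `session:abc123`, `cart:user:42`) are illustrative conventions, not taken from the companies above, and the injectable `clock` parameter exists only so that expiry can be tested without waiting.

```python
import time


class MiniKV:
    """Hypothetical in-memory stand-in that imitates a few Redis commands.

    This is not a Redis client; it only mimics the semantics of INCR,
    SETEX/GET (values with a time-to-live), and HINCRBY/HGETALL so the
    real-time patterns can run without a server.
    """

    def __init__(self, clock=time.monotonic):
        self._data = {}
        self._expires = {}          # key -> absolute expiry time
        self._clock = clock

    def _alive(self, key):
        # Drop a key whose time-to-live has elapsed when it is accessed.
        expiry = self._expires.get(key)
        if expiry is not None and self._clock() >= expiry:
            self._data.pop(key, None)
            del self._expires[key]
        return key in self._data

    def incr(self, key):
        # Like INCR: a missing key counts as 0 before incrementing.
        value = self._data[key] + 1 if self._alive(key) else 1
        self._data[key] = value
        return value

    def setex(self, key, seconds, value):
        # Like SETEX: set the value and its time-to-live in one step.
        self._data[key] = value
        self._expires[key] = self._clock() + seconds

    def get(self, key):
        return self._data[key] if self._alive(key) else None

    def hincrby(self, key, field, amount):
        # Like HINCRBY: create the hash and the field on first use.
        if not self._alive(key):
            self._data[key] = {}
        fields = self._data[key]
        fields[field] = fields.get(field, 0) + amount
        return fields[field]

    def hgetall(self, key):
        return dict(self._data[key]) if self._alive(key) else {}


kv = MiniKV()
kv.incr("page:views:home")                    # real-time page counter
kv.setex("session:abc123", 1800, "user:42")   # 30-minute user session
kv.hincrby("cart:user:42", "sku:1001", 2)     # shopping cart as a hash
print(kv.get("page:views:home"), kv.get("session:abc123"),
      kv.hgetall("cart:user:42"))
```

With a real client, each method call would instead send the corresponding command to the Redis server, where operations such as `INCR` are atomic and the data is shared by every application instance.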

Summary

In this chapter, you saw how the Internet world is undergoing a paradigm shift, how the NoSQL world has evolved, and how social media is championing NoSQL adoption. You also saw the various alternatives in the NoSQL world and how they compare. Finally, you saw where Redis fits in the NoSQL ecosystem.

In the next chapter, we will take a plunge into the world of Redis.

Description

This book is for SQL developers who want to learn about Redis, the key-value database for scalability and performance. Prior understanding of a programming language is essential; however, no knowledge of NoSQL is required.

What you will learn

  • Familiarise yourself with NoSQL and install Redis
  • Build solutions and enhance your web applications in Redis
  • Understand the persistence mechanism for better scalability
  • Configure and tune the server to improve performance
  • Identify bottlenecks and learn how to handle fault management in Redis
  • Learn about backups and recovery strategies for the Redis environment
  • Discover the commands and functions of Redis

Product Details

Publication date: Jun 26, 2015
Length: 318 pages
Edition: 1st
Language: English
ISBN-13: 9781783980123

Frequently bought together

Redis Essentials: $43.99
Mastering Redis: $60.99
Learning Redis: $48.99
Total: $153.97

Table of Contents

10 Chapters
1. Introduction to NoSQL
2. Getting Started with Redis
3. Data Structures and Communicating Protocol in Redis
4. Functions in the Redis Server
5. Handling Data in Redis
6. Redis in Web Applications
7. Redis in Business Applications
8. Clustering
9. Maintaining Redis
Index

Customer reviews

Rating distribution
Average rating: 2.3 out of 5 (3 ratings)
5 star 0%
4 star 0%
3 star 66.7%
2 star 0%
1 star 33.3%

William, Aug 27, 2015 (3 stars, Amazon Verified review)
I having a hard time following the writing style of this Author. It has been a struggle. Technical content is good.

Simon, Aug 02, 2015 (3 stars, Amazon Verified review)
Words missing from sentences. It's Neo4j, not Neo4i. Doesn't appear to have been proof read before going in to production.

Victor Plescan, Jan 25, 2019 (1 star, Amazon Verified review)
I had a rough understanding of what Redis is, and before applying it in production I decided to read a book to get a better understanding. This book had an alright rating (4.0 on Goodreads) and was in Java, exactly what I was needed.

If there would be a book Understanding Redis, this will be the opposite of it. The author shies away from explanatory parts and sticks the code or diagrams whatever possible. As result, the book is clattered by unnecessary code and meaningless diagrams.

There is no clear audience for the author, he starts explaining data-types and Big O notations like for non-technical audience but then complementing it with analogies from Java language. As result, I believe it’s not readable for a non-technical person and definitely boring for a Software Engineer, even without any knowledge in Redis.

Most of the “learning” is just lists of commands with a brief description and “Execute the program and analyse the result yourself” note. The code examples are mostly are full listings of the programmes, with imports, getters/setters, System.out any other Java-specific stuff, probably to keep book long. In the end, it’s just a clutter of commands with brief text explaining what the program or command does.

E. g. Chapter of “Redis in Web Applications” will just lead through the whole code of a simplified app author created. By “lead”, I mean will put the class listing and tell in one-two phrases what it does, in general terms. That’s all.

Any documentation would provide better structured and explanatory information about Redis. Usually, books excel at explaining the idea, the use, the design patterns, best practices, building up the understanding of the technology. None of that here.

I bought a kindle version from Amazon, the GO-TO menu is not divided by chapters, which makes the book difficult to navigate, only by page or location.

I seldom rate books 1 star, even if I don’t like it, I try to find the prons to give it at least two stars, but I cannot call this a book. I wish I could give it 0 stars. The ratio of price vs value makes it nothing short then a scam, in my opinion.

And if anything above will not discourage you from reading it, worth note that the Redis 2.6 was used in the book. Which wasn’t the latest version even at the time of publishing.

FAQs

What is the delivery time and cost of print books?

Shipping Details

USA:

Economy: Delivery to most addresses in the US within 10-15 business days

Premium: Trackable Delivery to most addresses in the US within 3-8 business days

UK:

Economy: Delivery to most addresses in the U.K. within 7-9 business days.
Shipments are not trackable

Premium: Trackable delivery to most addresses in the U.K. within 3-4 business days!
Add one extra business day for deliveries to Northern Ireland and Scottish Highlands and islands

EU:

Premium: Trackable delivery to most EU destinations within 4-9 business days.

Australia:

Economy: Can deliver to P. O. Boxes and private residences.
Trackable service with delivery to addresses in Australia only.
Delivery time ranges from 7-9 business days for VIC and 8-10 business days for Interstate metro
Delivery time is up to 15 business days for remote areas of WA, NT & QLD.

Premium: Delivery to addresses in Australia only
Trackable delivery to most P. O. Boxes and private residences in Australia within 4-5 days based on the distance to a destination following dispatch.

India:

Premium: Delivery to most Indian addresses within 5-6 business days

Rest of the World:

Premium: Trackable delivery to most countries in the Americas within 4-7 business days

Asia:

Premium: Delivery to most Asian addresses within 5-9 business days

Disclaimer:
All orders received before 5 PM U.K. time start printing on the next business day, so the estimated delivery times also start from the next day. Orders received after 5 PM U.K. time (in our internal systems) on a business day, or at any time on the weekend, begin printing on the second business day after. For example, an order placed at 11 AM today will begin printing tomorrow, whereas an order placed at 9 PM tonight will begin printing the day after tomorrow.


Unfortunately, due to several restrictions, we are unable to ship to the following countries:

  1. Afghanistan
  2. American Samoa
  3. Belarus
  4. Brunei Darussalam
  5. Central African Republic
  6. The Democratic Republic of Congo
  7. Eritrea
  8. Guinea-Bissau
  9. Iran
  10. Lebanon
  11. Libyan Arab Jamahiriya
  12. Somalia
  13. Sudan
  14. Russian Federation
  15. Syrian Arab Republic
  16. Ukraine
  17. Venezuela

What are customs duties/charges?

Customs duties are charges levied on goods when they cross international borders. They are taxes imposed on imported goods. These duties are charged by special authorities and bodies created by local governments and are meant to protect local industries, economies, and businesses.

Do I have to pay customs charges for the print book order?

Orders shipped to the countries listed under EU27 will not bear customs charges; these are paid by Packt as part of the order.

For the list of EU27 countries, see www.gov.uk/eu-eea.

For shipments to countries outside the EU27, customs duties or localized taxes may apply. These are charged by the recipient country, must be paid by the customer, and are not included in the shipping charges on the order.

How do I know my customs duty charges?

The amount of duty payable varies greatly depending on the imported goods, the country of origin and several other factors like the total invoice amount or dimensions like weight, and other such criteria applicable in your country.

For example:

  • If you live in Mexico, and the declared value of your ordered items is over $50, for you to receive a package, you will have to pay additional import tax of 19% which will be $9.50 to the courier service.
  • Whereas if you live in Turkey, and the declared value of your ordered items is over €22, for you to receive a package, you will have to pay additional import tax of 18% which will be €3.96 to the courier service.

How can I cancel my order?

Cancellation Policy for Published Printed Books:

You can cancel any order within 1 hour of placing the order. Simply contact [email protected] with your order details or payment transaction id. If your order has already started the shipment process, we will do our best to stop it. However, if it is already on the way to you then when you receive it, you can contact us at [email protected] using the returns and refund process.

Please understand that Packt Publishing cannot provide refunds or cancel any order except in the cases described in our Return Policy (for example, when Packt Publishing agrees to replace your printed book because it arrived damaged or with a material defect). Otherwise, Packt Publishing will not accept returns.

What is your returns and refunds policy?

Return Policy:

We want you to be happy with your purchase from Packtpub.com. We will not hassle you with returning print books to us. If the print book you receive from us is incorrect, damaged, doesn't work or is unacceptably late, please contact Customer Relations Team on [email protected] with the order number and issue details as explained below:

  1. If you ordered an eBook, video, or print book incorrectly or accidentally, please contact the Customer Relations Team on [email protected] within one hour of placing the order and we will replace the item or refund its cost.
  2. If your eBook or video file is faulty, or a fault occurs while the eBook or video is being made available to you (that is, during download), contact the Customer Relations Team on [email protected] within 14 days of purchase and they will be able to resolve the issue for you.
  3. You will have a choice of replacement or refund of the problem items (damaged, defective, or incorrect).
  4. Once the Customer Care Team confirms that you will be refunded, you should receive the refund within 10 to 12 working days.
  5. If you are only requesting a refund of one book from a multiple order, then we will refund you the appropriate single item.
  6. Where the items were shipped under a free shipping offer, there will be no shipping costs to refund.

If your printed book arrives damaged or with a material defect, contact our Customer Relations Team on [email protected] within 14 days of receipt of the book with appropriate evidence of the damage, and we will work with you to secure a replacement copy if necessary. Please note that each printed book you order from us is individually made by Packt's professional book-printing partner on a print-on-demand basis.

What tax is charged?

Currently, no tax is charged on the purchase of any print book (subject to change based on the laws and regulations). A localized VAT fee is charged only to our European and UK customers on eBooks, Video and subscriptions that they buy. GST is charged to Indian customers for eBooks and video purchases.

What payment methods can I use?

You can pay with the following methods:

  1. Visa Debit
  2. Visa Credit
  3. MasterCard
  4. PayPal