Search icon CANCEL
Subscription
0
Cart icon
Your Cart (0 item)
Close icon
You have no products in your basket yet
Arrow left icon
Explore Products
Best Sellers
New Releases
Books
Videos
Audiobooks
Learning Hub
Free Learning
Arrow right icon
Ceph Cookbook
Ceph Cookbook

Ceph Cookbook: Practical recipes to design, implement, operate, and manage Ceph storage systems , Second Edition

Arrow left icon
Profile Icon Karan Singh Profile Icon Hackett Profile Icon Umrao
Arrow right icon
€18.99 per month
Full star icon Full star icon Full star icon Full star icon Half star icon 4.2 (9 Ratings)
Paperback Nov 2017 466 pages 2nd Edition
eBook
€20.98 €29.99
Paperback
€36.99
Subscription
Free Trial
Renews at €18.99p/m
Arrow left icon
Profile Icon Karan Singh Profile Icon Hackett Profile Icon Umrao
Arrow right icon
€18.99 per month
Full star icon Full star icon Full star icon Full star icon Half star icon 4.2 (9 Ratings)
Paperback Nov 2017 466 pages 2nd Edition
eBook
€20.98 €29.99
Paperback
€36.99
Subscription
Free Trial
Renews at €18.99p/m
eBook
€20.98 €29.99
Paperback
€36.99
Subscription
Free Trial
Renews at €18.99p/m

What do you get with a Packt Subscription?

Free for first 7 days. $19.99 p/m after that. Cancel any time!
Product feature icon Unlimited ad-free access to the largest independent learning library in tech. Access this title and thousands more!
Product feature icon 50+ new titles added per month, including many first-to-market concepts and exclusive early access to books as they are being written.
Product feature icon Innovative learning tools, including AI book assistants, code context explainers, and text-to-speech.
Product feature icon Thousands of reference materials covering every tech concept you need to stay up to date.
Subscribe now
View plans & pricing
Table of content icon View table of contents Preview book icon Preview Book

Ceph Cookbook

Ceph – Introduction and Beyond

In this chapter, we will cover the following recipes:

  • Ceph – the beginning of a new era
  • RAID – the end of an era
  • Ceph – the architectural overview
  • Planning a Ceph deployment
  • Setting up a virtual infrastructure
  • Installing and configuring Ceph
  • Scaling up your Ceph cluster
  • Using Ceph clusters with a hands-on approach

Introduction

Ceph is currently the hottest software-defined storage (SDS) technology and is shaking up the entire storage industry. It is an open source project that provides unified software-defined solutions for block, file, and object storage. The core idea of Ceph is to provide a distributed storage system that is massively scalable and high performing with no single point of failure. From the roots, it has been designed to be highly scalable (up to the exabyte level and beyond) while running on general-purpose commodity hardware.

Ceph is acquiring most of the traction in the storage industry due to its open, scalable, and reliable nature. This is the era of cloud computing and software-defined infrastructure, where we need a storage backend that is purely software-defined and, more importantly, cloud-ready. Ceph fits in here very well, regardless of whether you are running a public, private, or hybrid cloud.

Today's software systems are very smart and make the best use of commodity hardware to run gigantic-scale infrastructures. Ceph is one of them; it intelligently uses commodity hardware to provide enterprise-grade robust and highly reliable storage systems.

Ceph has been raised and nourished with the help of the Ceph upstream community with an architectural philosophy that includes the following:

  • Every component must scale linearly
  • There should not be any single point of failure
  • The solution must be software-based, open source, and adaptable
  • The Ceph software should run on readily available commodity hardware
  • Every component must be self-managing and self-healing wherever possible

The foundation of Ceph lies in objects, which are its building blocks. Object storage such as Ceph is the perfect provision for current and future needs for unstructured data storage. Object storage has its advantages over traditional storage solutions; we can achieve platform and hardware independence using object storage. Ceph plays meticulously with objects and replicates them across the cluster to avail reliability; in Ceph, objects are not tied to a physical path, making object location independent. This flexibility enables Ceph to scale linearly from the petabyte to the exabyte level.

Ceph provides great performance, enormous scalability, power, and flexibility to organizations. It helps them get rid of expensive proprietary storage silos. Ceph is indeed an enterprise-class storage solution that runs on commodity hardware; it is a low-cost yet feature-rich storage system. Ceph's universal storage system provides block, file, and object storage under one hood, enabling customers to use storage as they want.

In the following section we will learn about Ceph releases.

Ceph is being developed and improved at a rapid pace. On July 3, 2012, Sage announced the first LTS release of Ceph with the code name Argonaut. Since then, we have seen 12 new releases come up. Ceph releases are categorized as Long Term Support (LTS), and stable releases and every alternate Ceph release are LTS releases. For more information, visit https://Ceph.com/category/releases/.

Ceph release name

Ceph release version

Released On

Argonaut

V0.48 (LTS)

July 3, 2012

Bobtail

V0.56 (LTS)

January 1, 2013

Cuttlefish

V0.61

May 7, 2013

Dumpling

V0.67 (LTS)

August 14, 2013

Emperor

V0.72

November 9, 2013

Firefly

V0.80 (LTS)

May 7, 2014

Giant

V0.87.1

Feb 26, 2015

Hammer

V0.94 (LTS)

April 7, 2015

Infernalis

V9.0.0

May 5, 2015

Jewel

V10.0.0 (LTS)

Nov, 2015

Kraken

V11.0.0

June 2016

Luminous

V12.0.0 (LTS)

Feb 2017
Here is a fact: Ceph release names follow an alphabetic order; the next one will be an M release. The term Ceph is a common nickname given to pet octopuses and is considered a short form of Cephalopod, which is a class of marine animals that belong to the mollusk phylum. Ceph has octopuses as its mascot, which represents Ceph's highly parallel behavior, similar to octopuses.

Ceph – the beginning of a new era

Data storage requirements have grown explosively over the last few years. Research shows that data in large organizations is growing at a rate of 40 to 60 percent annually, and many companies are doubling their data footprint each year. IDC analysts have estimated that worldwide, there were 54.4 exabytes of total digital data in the year 2000. By 2007, this reached 295 exabytes, and by 2020, it's expected to reach 44 zettabytes worldwide. Such data growth cannot be managed by traditional storage systems; we need a system such as Ceph, which is distributed, scalable and most importantly, economically viable. Ceph has been especially designed to handle today's as well as the future's data storage needs.

Software-defined storage – SDS

SDS is what is needed to reduce TCO for your storage infrastructure. In addition to reduced storage cost, SDS can offer flexibility, scalability, and reliability. Ceph is a true SDS solution; it runs on commodity hardware with no vendor lock-in and provides low cost per GB. Unlike traditional storage systems, where hardware gets married to software, in SDS, you are free to choose commodity hardware from any manufacturer and are free to design a heterogeneous hardware solution for your own needs. Ceph's software-defined storage on top of this hardware provides all the intelligence you need and will take care of everything, providing all the enterprise storage features right from the software layer.

Cloud storage

One of the drawbacks of a cloud infrastructure is the storage. Every cloud infrastructure needs a storage system that is reliable, low-cost, and scalable with a tighter integration than its other cloud components. There are many traditional storage solutions out there in the market that claim to be cloud-ready, but today, we not only need cloud readiness, but also a lot more beyond that. We need a storage system that should be fully integrated with cloud systems and can provide lower TCO without any compromise to reliability and scalability. Cloud systems are software-defined and are built on top of commodity hardware; similarly, it needs a storage system that follows the same methodology, that is, being software-defined on top of commodity hardware, and Ceph is the best choice available for cloud use cases.

Ceph has been rapidly evolving and bridging the gap of a true cloud storage backend. It is grabbing the center stage with every major open source cloud platform, namely OpenStack, CloudStack, and OpenNebula. Moreover, Ceph has succeeded in building up beneficial partnerships with cloud vendors such as Red Hat, Canonical, Mirantis, SUSE, and many more. These companies are favoring Ceph big time and including it as an official storage backend for their cloud OpenStack distributions, thus making Ceph a red-hot technology in cloud storage space.

The OpenStack project is one of the finest examples of open source software powering public and private clouds. It has proven itself as an end-to-end open source cloud solution. OpenStack is a collection of programs, such as Cinder, Glance, and Swift, which provide storage capabilities to OpenStack. These OpenStack components require a reliable, scalable, and all in one storage backend such as Ceph. For this reason, OpenStack and Ceph communities have been working together for many years to develop a fully compatible Ceph storage backend for the OpenStack.

Cloud infrastructure based on Ceph provides much-needed flexibility to service providers to build Storage-as-a-Service and Infrastructure-as-a-Service solutions, which they cannot achieve from other traditional enterprise storage solutions as they are not designed to fulfill cloud needs. Using Ceph, service providers can offer low-cost, reliable cloud storage to their customers.

Unified next-generation storage architecture

The definition of unified storage has changed lately. A few years ago, the term unified storage referred to providing file and block storage from a single system. Now because of recent technological advancements, such as cloud computing, big data, and internet of Things, a new kind of storage has been evolving, that is, object storage. Thus, all storage systems that do not support object storage are not really unified storage solutions. A true unified storage is like Ceph; it supports blocks, files, and object storage from a single system.

In Ceph, the term unified storage is more meaningful than what existing storage vendors claim to provide. It has been designed from the ground up to be future-ready, and it's constructed such that it can handle enormous amounts of data. When we call Ceph future ready, we mean to focus on its object storage capabilities, which is a better fit for today's mix of unstructured data rather than blocks or files. Everything in Ceph relies on intelligent objects, whether it's block storage or file storage. Rather than managing blocks and files underneath, Ceph manages objects and supports block-and-file-based storage on top of it. Objects provide enormous scaling with increased performance by eliminating metadata operations. Ceph uses an algorithm to dynamically compute where the object should be stored and retrieved from.

The traditional storage architecture of SAN and NAS systems is very limited. Basically, they follow the tradition of controller high availability; that is, if one storage controller fails, it serves data from the second controller. But, what if the second controller fails at the same time, or even worse, if the entire disk shelf fails? In most cases, you will end up losing your data. This kind of storage architecture, which cannot sustain multiple failures, is definitely what we do not want today. Another drawback of traditional storage systems is their data storage and access mechanism. They maintain a central lookup table to keep track of metadata, which means that every time a client sends a request for a read or write operation, the storage system first performs a lookup in the huge metadata table, and after receiving the real data location, it performs the client operation. For a smaller storage system, you might not notice performance hits, but think of a large storage cluster—you would definitely be bound by performance limits with this approach. This would even restrict your scalability.

Ceph does not follow this traditional storage architecture; in fact, the architecture has been completely reinvented. Rather than storing and manipulating metadata, Ceph introduces a newer way: the CRUSH algorithm. CRUSH stands for Controlled Replication Under Scalable Hashing. Instead of performing a lookup in the metadata table for every client request, the CRUSH algorithm computes on demand where the data should be written to or read from. By computing metadata, the need to manage a centralized table for metadata is no longer there. Modern computers are amazingly fast and can perform a CRUSH lookup very quickly; moreover, this computing load, which is generally not too much, can be distributed across cluster nodes, leveraging the power of distributed storage. In addition to this, CRUSH has a unique property, which is infrastructure awareness. It understands the relationship between various components of your infrastructure and stores your data in a unique failure zone, such as a disk, node, rack, row, and data center room, among others. CRUSH stores all the copies of your data such that it is available even if a few components fail in a failure zone. It is due to CRUSH that Ceph can handle multiple component failures and provide reliability and durability.

The CRUSH algorithm makes Ceph self-managing and self-healing. In the event of component failure in a failure zone, CRUSH senses which component has failed and determines the effect on the cluster. Without any administrative intervention, CRUSH self-manages and self-heals by performing a recovering operation for the data lost due to failure. CRUSH regenerates the data from the replica copies that the cluster maintains. If you have configured the Ceph CRUSH map in the correct order, it makes sure that at least one copy of your data is always accessible. Using CRUSH, we can design a highly reliable storage infrastructure with no single point of failure. This makes Ceph a highly scalable and reliable storage system that is future-ready. CRUSH is covered more in detail in Chapter 9, Ceph Under the Hood.

RAID – the end of an era

The RAID technology has been the fundamental building block for storage systems for years. It has proven successful for almost every kind of data that has been generated in the last 3 decades. But all eras must come to an end, and this time, it's RAID's turn. These systems have started showing limitations and are incapable of delivering to future storage needs. In the course of the last few years, cloud infrastructures have gained a strong momentum and are imposing new requirements on storage and challenging traditional RAID systems. In this section, we will uncover the limitations imposed by RAID systems.

RAID rebuilds are painful

The most painful thing in a RAID technology is its super-lengthy rebuild process. Disk manufacturers are packing lots of storage capacity per disk. They are now producing an extra-large capacity of disk drives at a fraction of the price. We no longer talk about 450 GB, 600 GB, or even 1 TB disks, as there is a larger capacity of disks available today. The newer enterprise disk specification offers disks up to 4 TB, 6 TB, and even 10 TB disk drives, and the capacities keep increasing year by year.

Think of an enterprise RAID-based storage system that is made up of numerous 4 TB or 6 TB disk drives. Unfortunately, when such a disk drive fails, RAID will take several hours and even up to days to repair a single failed disk. Meanwhile, if another drive fails from the same RAID group, then it would become a chaotic situation. Repairing multiple large disk drives using RAID is a cumbersome process.

RAID spare disks increases TCO

The RAID system requires a few disks as hot spare disks. These are just free disks that will be used only when a disk fails; else, they will not be used for data storage. This adds extra cost to the system and increases TCO. Moreover, if you're running short of spare disks and immediately a disk fails in the RAID group, then you will face a severe problem.

RAID can be expensive and hardware dependent

RAID requires a set of identical disk drivers in a single RAID group; you would face penalties if you change the disk size, rpm, or disk type. Doing so would adversely affect the capacity and performance of your storage system. This makes RAID highly choosy about the hardware.

Also, enterprise RAID-based systems often require expensive hardware components, such as RAID controllers, which significantly increases the system cost. These RAID controllers will become single points of failure if you do not have many of them.

The growing RAID group is a challenge

RAID can hit a dead end when it's not possible to grow the RAID group size, which means that there is no scale-out support. After a point, you cannot grow your RAID-based system, even though you have money. Some systems allow the addition of disk shelves but up to a very limited capacity; however, these new disk shelves put a load on the existing storage controller. So, you can gain some capacity but with a performance trade-off.

The RAID reliability model is no longer promising

RAID can be configured with a variety of different types; the most common types are RAID5 and RAID6, which can survive the failure of one and two disks, respectively. RAID cannot ensure data reliability after a two-disk failure. This is one of the biggest drawbacks of RAID systems.

Moreover, at the time of a RAID rebuild operation, client requests are most likely to starve for I/O until the rebuild completes. Another limiting factor with RAID is that it only protects against disk failure; it cannot protect against a failure of the network, server hardware, OS, power, or other data center disasters.

After discussing RAID's drawbacks, we can come to the conclusion that we now need a system that can overcome all these drawbacks in a performance and cost-effective way. The Ceph storage system is one of the best solutions available today to address these problems. Let's see how.

For reliability, Ceph makes use of the data replication method, which means it does not use RAID, thus overcoming all the problems that can be found in a RAID-based enterprise system. Ceph is a software-defined storage, so we do not require any specialized hardware for data replication; moreover, the replication level is highly customized by means of commands, which means that the Ceph storage administrator can manage the replication factor of a minimum of one and a maximum of a higher number, totally depending on the underlying infrastructure.

In an event of one or more disk failures, Ceph's replication is a better process than RAID. When a disk drive fails, all the data that was residing on that disk at that point of time starts recovering from its peer disks. Since Ceph is a distributed system, all the data copies are scattered on the entire cluster of disks in the form of objects, such that no two object's copies should reside on the same disk and must reside in a different failure zone defined by the CRUSH map. The good part is that all the cluster disks participate in data recovery. This makes the recovery operation amazingly fast with the least performance problems. Furthermore, the recovery operation does not require any spare disks; the data is simply replicated to other Ceph disks in the cluster. Ceph uses a weighting mechanism for its disks, so different disk sizes is not a problem.

In addition to the replication method, Ceph also supports another advanced way of data reliability: using the erasure-coding technique. Erasure-coded pools require less storage space compared to replicated pools. In erasure-coding, data is recovered or regenerated algorithmically by erasure code calculation. You can use both the techniques of data availability, that is, replication as well as erasure-coding, in the same Ceph cluster but over different storage pools. We will learn more about the erasure-coding technique in the upcoming chapters.

Ceph – the architectural overview

The Ceph internal architecture is pretty straightforward, and we will learn about it with the help of the following diagram:

  • Ceph monitors (MON): Ceph monitors track the health of the entire cluster by keeping a map of the cluster state. They maintain a separate map of information for each component, which includes an OSD map, MON map, PG map (discussed in later chapters), and CRUSH map. All the cluster nodes report to monitor nodes and share information about every change in their state. The monitor does not store actual data; this is the job of the OSD.
  • Ceph object storage device (OSD): As soon as your application issues a write operation to the Ceph cluster, data gets stored in the OSD in the form of objects.

This is the only component of the Ceph cluster where actual user data is stored, and the same data is retrieved when the client issues a read operation. Usually, one OSD daemon is tied to one physical disk in your cluster. So in general, the total number of physical disks in your Ceph cluster is the same as the number of OSD daemons working underneath to store user data on each physical disk.

  • Ceph metadata server (MDS): The MDS keeps track of file hierarchy and stores metadata only for the CephFS filesystem. The Ceph block device and RADOS gateway do not require metadata; hence, they do not need the Ceph MDS daemon. The MDS does not serve data directly to clients, thus removing the single point of failure from the system.
  • RADOS: The Reliable Autonomic Distributed Object Store (RADOS) is the foundation of the Ceph storage cluster. Everything in Ceph is stored in the form of objects, and the RADOS object store is responsible for storing these objects irrespective of their data types. The RADOS layer makes sure that data always remains consistent. To do this, it performs data replication, failure detection, and recovery, as well as data migration and rebalancing across cluster nodes.
  • librados: The librados library is a convenient way to gain access to RADOS with support to the PHP, Ruby, Java, Python, C, and C++ programming languages. It provides a native interface for the Ceph storage cluster (RADOS) as well as a base for other services, such as RBD, RGW, and CephFS, which are built on top of librados. librados also supports direct access to RADOS from applications with no HTTP overhead.
  • RADOS block devices (RBDs): RBDs, which are now known as the Ceph block device, provide persistent block storage, which is thin-provisioned, resizable, and stores data striped over multiple OSDs. The RBD service has been built as a native interface on top of librados.
  • RADOS gateway interface (RGW): RGW provides object storage service. It uses librgw (the Rados Gateway Library) and librados, allowing applications to establish connections with the Ceph object storage. The RGW provides RESTful APIs with interfaces that are compatible with Amazon S3 and OpenStack Swift.
  • CephFS: The Ceph filesystem provides a POSIX-compliant filesystem that uses the Ceph storage cluster to store user data on a filesystem. Like RBD and RGW, the CephFS service is also implemented as a native interface to librados.
  • Ceph manager: The Ceph manager daemon (ceph-mgr) was introduced in the Kraken release, and it runs alongside monitor daemons to provide additional monitoring and interfaces to external monitoring and management systems.

Planning a Ceph deployment

A Ceph storage cluster is created on top of the commodity hardware. This commodity hardware includes industry-standard servers loaded with physical disk drives that provide storage capacity and some standard networking infrastructure. These servers run standard Linux distributions and Ceph software on top of them. The following diagram helps you understand the basic view of a Ceph cluster:

As explained earlier, Ceph does not have a very specific hardware requirement. For the purpose of testing and learning, we can deploy a Ceph cluster on top of virtual machines. In this section and in the later chapters of this book, we will be working on a Ceph cluster that is built on top of virtual machines. It's very convenient to use a virtual environment to test Ceph, as it's fairly easy to set up and can be destroyed and recreated anytime. It's good to know that a virtual infrastructure for the Ceph cluster should not be used for a production environment, and you might face serious problems with this.

Setting up a virtual infrastructure

To set up a virtual infrastructure, you will require open source software, such as Oracle VirtualBox and Vagrant, to automate virtual machine creation for you. Make sure you have the software installed and working correctly on your host machine. The installation processes of the software are beyond the scope of this book; you can follow their respective documentation in order to get them installed and working correctly.

Getting ready

You will need the following software to get started:

  • Oracle VirtualBox: This is an open source virtualization software package for host machines based on x86 and AMD64/Intel64. It supports Microsoft Windows, Linux, and Apple macOS X host operating systems. Make sure it's installed and working correctly. More information can be found at https://www.virtualbox.org.

Once you have installed VirtualBox, run the following command to ensure the installation was successful:

      # VBoxManage --version
  • Vagrant: This is software meant for creating virtual development environments. It works as a wrapper around virtualization software, such as VirtualBox, VMware, KVM, and so on. It supports the Microsoft Windows, Linux, and Apple macOS X host operating systems. Make sure it's installed and working correctly. More information can be found at https://www.vagrantup.com/. Once you have installed Vagrant, run the following command to ensure the installation was successful:
      # vagrant --version
  • Git: This is a distributed revision control system and the most popular and widely adopted version control system for software development. It supports Microsoft Windows, Linux, and Apple macOS X operating systems. Make sure it's installed
    and working correctly. More information can be found at http://git-scm.com/.

Once you have installed Git, run the following command to ensure the installation was successful:

       # git --version

How to do it...

Once you have installed the mentioned software, we will proceed with virtual machine creation:

  1. git clone ceph-cookbook repositories to your VirtualBox host machine:
      $ git clone https://github.com/PacktPublishing/Ceph-Cookbook-Second-Edition
  1. Under the cloned directory, you will find vagrantfile, which is our Vagrant configuration file that basically instructs VirtualBox to launch the VMs that we require at different stages of this book. Vagrant will automate the VM's creation, installation, and configuration for you; it makes the initial environment easy to set up:
      $ cd Ceph-Cookbook-Second-Edition ; ls -l
  1. Next, we will launch three VMs using Vagrant; they are required throughout this chapter:
      $ vagrant up ceph-node1 ceph-node2 ceph-node3
If the default Vagrant provider is not set to VirtualBox, set it to VirtualBox. To make it permanent, it can be added to user .bashrc file:
# export VAGRANT_DEFAULT_PROVIDER=virtualbox
# echo $VAGRANT_DEFAULT_PROVIDER
  1. Run vagrant up ceph-node1 ceph-node2 ceph-node3:
  1. Check the status of your virtual machines:
      $ vagrant status ceph-node1 ceph-node2 ceph-node3
The username and password that Vagrant uses to configure virtual machine is vagrant, and Vagrant has sudo rights. The default password for the root user is vagrant.
  1. Vagrant will, by default, set up hostnames as ceph-node<node_number> and IP address subnet as 192.168.1.X and will create three additional disks that will be used as OSDs by the Ceph cluster. Log in to each of these machines one by one and check whether the hostname, networking, and additional disks have been set up correctly by Vagrant:
      $ vagrant ssh ceph-node1
$ ip addr show
$ sudo fdisk -l
$ exit
  1. Vagrant is configured to update hosts file on the VMs. For convenience, update the /etc/hosts file on your host machine with the following content:
      192.168.1.101 ceph-node1
192.168.1.102 ceph-node2
192.168.1.103 ceph-node3
  1. Update all the three VM's to the latest CentOS release and reboot to the latest kernel:
  1. Generate root SSH keys for ceph-node1 and copy the keys to ceph-node2 and ceph-node3. The password for the root user on these VMs is vagrant. Enter the root user password when asked by the ssh-copy-id command and proceed with the default settings:
      $ vagrant ssh ceph-node1
$ sudo su -
# ssh-keygen
# ssh-copy-id root@ceph-node1
# ssh-copy-id root@ceph-node2
# ssh-copy-id root@ceph-node3
  1. Once the SSH keys are copied to ceph-node2 and ceph-node3, the root user from ceph-node1 can do an ssh login to VMs without entering the password:
      # ssh ceph-node2 hostname
# ssh ceph-node3 hostname
  1. Enable ports that are required by the Ceph MON, OSD, and MDS on the operating system's firewall. Execute the following commands on all VMs:
      # firewall-cmd --zone=public --add-port=6789/tcp --permanent
# firewall-cmd --zone=public --add-port=6800-7100/tcp --permanent
# firewall-cmd --reload
# firewall-cmd --zone=public --list-all
  1. Install and configure NTP on all VMs:
      # yum install ntp ntpdate -y
# ntpdate pool.ntp.org
# systemctl restart ntpdate.service
# systemctl restart ntpd.service
# systemctl enable ntpd.service
# systemctl enable ntpdate.service

Installing and configuring Ceph

To deploy our first Ceph cluster, we will use the ceph-ansible tool to install and configure Ceph on all three virtual machines. The ceph-ansible tool is a part of the Ceph project, which is used for easy deployment and management of your Ceph storage cluster. In the previous section, we created three virtual machines with CentOS 7, which have connectivity with the internet over NAT, as well as private host-only networks.

We will configure these machines as Ceph storage clusters, as mentioned in the following diagram:

Creating the Ceph cluster on ceph-node1

We will first install Ceph and configure ceph-node1 as the Ceph monitor and the Ceph OSD node. Later recipes in this chapter will introduce ceph-node2 and ceph-node3.

How to do it...

Copy ceph-ansible package on ceph-node1 from the Ceph-Cookbook-Second-Edition directory.

  1. Use vagrant as the password for the root user:
      # cd Ceph-Cookbook-Second-Edition
# scp ceph-ansible-2.2.10-38.g7ef908a.el7.noarch.rpm root@ceph-node1:/root
  1. Log in to ceph-node1 and install ceph-ansible on ceph-node1:
      [root@ceph-node1 ~]# 
yum install ceph-ansible-2.2.10-38.g7ef908a.el7.noarch.rpm -y
  1. Update the Ceph hosts to /etc/ansible/hosts:
  1. Verify that Ansible can reach the Ceph hosts mentioned in /etc/ansible/hosts:
  1. Create a directory under the root home directory so Ceph Ansible can use it for storing the keys:
>
  1. Create a symbolic link to the Ansible group_vars directory in the /etc/ansible/ directory:
  1. Go to /etc/ansible/group_vars and copy an all.yml file from the all.yml.sample file and open it to define configuration options' values:
  1. Define the following configuration options in all.yml for the latest jewel version on CentOS 7:
  1. Go to /etc/ansible/group_vars and copy an osds.yml file from the osds.yml.sample file and open it to define configuration options' values:
  1. Define the following configuration options in osds.yml for OSD disks; we are co-locating an OSD journal in the OSD data disk:
  1. Go to /usr/share/ceph-ansible and add retry_files_save_path option in ansible.cfg in the [defaults] tag:
  1. Run Ansible playbook in order to deploy the Ceph cluster on ceph-node1:

To run the playbook, you need site.yml, which is present in the same path: /usr/share/ceph-ansible/. You should be in the /usr/share/ceph-ansible/ path and should run following commands:

      # cp site.yml.sample site.yml
# ansible-playbook site.yml

Once playbook completes the Ceph cluster installation job and plays the recap with failed=0, it means ceph-ansible has deployed the Ceph cluster, as shown in the following screenshot:

You have all three OSD daemons and one monitor daemon up and running in ceph-node1.

Here's how you can check the Ceph jewel release installed version. You can run the ceph -v command to check the installed ceph version:

Scaling up your Ceph cluster

At this point, we have a running Ceph cluster with one MON and three OSDs configured on ceph-node1. Now we will scale up the cluster by adding ceph-node2 and ceph-node3 as MON and OSD nodes.

How to do it…

A Ceph storage cluster requires at least one monitor to run. For high availability, a Ceph storage cluster relies on an odd number of monitors and more than one, for example, 3 or 5, to form a quorum. It uses the Paxos algorithm to maintain quorum majority. You will notice that your Ceph cluster is currently showing HEALTH_WARN; this is because we have not configured any OSDs other than ceph-node1. By default, the data in a Ceph cluster is replicated three times, that too on three different OSDs hosted on three different nodes.

Since we already have one monitor running on ceph-node1, let's create two more monitors for our Ceph cluster and configure OSDs on ceph-node2 and ceph-node3:

  1. Update the Ceph hosts ceph-node2 and ceph-node3 to /etc/ansible/hosts:
  1. Verify that Ansible can reach the Ceph hosts mentioned in /etc/ansible/hosts:
  1. Run Ansible playbook in order to scale up the Ceph cluster on ceph-node2 and ceph-node3:

Once playbook completes the ceph cluster scaleout job and plays the recap with failed=0, it means that the Ceph ansible has deployed more Ceph daemons in the cluster, as shown in the following screenshot.

You have three more OSD daemons and one more monitor daemon running in ceph-node2 and three more OSD daemons and one more monitor daemon running in ceph-node3. Now you have total nine OSD daemons and three monitor daemons running on three nodes:

  1. We were getting a too few PGs per OSD warning and because of that, we increased the default RBD pool PGs from 64 to 128. Check the status of your Ceph cluster; at this stage, your cluster is healthy. PGs - placement groups are covered in detail in Chapter 9, Ceph Under the Hood.

Using the Ceph cluster with a hands-on approach

Now that we have a running Ceph cluster, we will perform some hands-on practice to gain experience with Ceph using some basic commands.

How to do it...

Below are some of the common commands used by Ceph admins:

  1. Check the status of your Ceph installation:
      # ceph -s or # ceph status
  1. Check Ceph's health detail:
      # ceph health detail
  1. Watch the cluster health:
      # ceph -w
  1. Check Ceph's monitor quorum status:
      # ceph quorum_status --format json-pretty
  1. Dump Ceph's monitor information:
      # ceph mon dump
  1. Check the cluster usage status:
      # ceph df
  1. Check the Ceph monitor, OSD, pool, and placement group stats:
      # ceph mon stat
# ceph osd stat
# ceph osd pool stats

# ceph pg stat
  1. List the placement group:
      # ceph pg dump
  1. List the Ceph pools in detail:
      # ceph osd pool ls detail
  1. Check the CRUSH map view of OSDs:
      # ceph osd tree
  1. Check Ceph's OSD usage:
      # ceph osd df
  1. List the cluster authentication keys:
      # ceph auth list

These were some basic commands that you learned in this section. In the upcoming chapters, you will learn advanced commands for Ceph cluster management.

Left arrow icon Right arrow icon

Key benefits

  • •Implement a Ceph cluster successfully and learn to manage it.
  • •Recipe based approach in learning the most efficient software defined storage system
  • •Implement best practices on improving efficiency and security of your storage cluster
  • •Learn to troubleshoot common issues experienced in a Ceph cluster

Description

Ceph is a unified distributed storage system designed for reliability and scalability. This technology has been transforming the software-defined storage industry and is evolving rapidly as a leader with its wide range of support for popular cloud platforms such as OpenStack, and CloudStack, and also for virtualized platforms. Ceph is backed by Red Hat and has been developed by community of developers which has gained immense traction in recent years. This book will guide you right from the basics of Ceph , such as creating blocks, object storage, and filesystem access, to advanced concepts such as cloud integration solutions. The book will also cover practical and easy to implement recipes on CephFS, RGW, and RBD with respect to the major stable release of Ceph Jewel. Towards the end of the book, recipes based on troubleshooting and best practices will help you get to grips with managing Ceph storage in a production environment. By the end of this book, you will have practical, hands-on experience of using Ceph efficiently for your storage requirements.

Who is this book for?

This book is targeted at storage and cloud engineers, system administrators, or anyone who is interested in building software defined storage, to power your cloud or virtual infrastructure. If you have basic knowledge of GNU/Linux and storage systems, with no experience of software defined storage solutions and Ceph, but eager to learn then this book is for you

What you will learn

  • • Understand, install, configure, and manage the Ceph storage system
  • • Get to grips with performance tuning and benchmarking, and learn practical tips to help run Ceph in production
  • • Integrate Ceph with OpenStack Cinder, Glance, and Nova components
  • • Deep dive into Ceph object storage, including S3, Swift, and Keystone integration
  • • Configure a disaster recovery solution with a Ceph Multi-Site V2 gateway setup and RADOS Block Device mirroring
  • • Gain hands-on experience with Ceph Metrics and VSM for cluster monitoring
  • • Familiarize yourself with Ceph operations such as maintenance, monitoring, and troubleshooting
  • • Understand advanced topics including erasure-coding, CRUSH map, cache pool, and general Ceph cluster maintenance

Product Details

Country selected
Publication date, Length, Edition, Language, ISBN-13
Publication date : Nov 24, 2017
Length: 466 pages
Edition : 2nd
Language : English
ISBN-13 : 9781788391061
Vendor :
Canonical
Languages :
Tools :

What do you get with a Packt Subscription?

Free for first 7 days. $19.99 p/m after that. Cancel any time!
Product feature icon Unlimited ad-free access to the largest independent learning library in tech. Access this title and thousands more!
Product feature icon 50+ new titles added per month, including many first-to-market concepts and exclusive early access to books as they are being written.
Product feature icon Innovative learning tools, including AI book assistants, code context explainers, and text-to-speech.
Product feature icon Thousands of reference materials covering every tech concept you need to stay up to date.
Subscribe now
View plans & pricing

Product Details

Publication date : Nov 24, 2017
Length: 466 pages
Edition : 2nd
Language : English
ISBN-13 : 9781788391061
Vendor :
Canonical
Languages :
Tools :

Packt Subscriptions

See our plans and pricing
Modal Close icon
€18.99 billed monthly
Feature tick icon Unlimited access to Packt's library of 7,000+ practical books and videos
Feature tick icon Constantly refreshed with 50+ new titles a month
Feature tick icon Exclusive Early access to books as they're written
Feature tick icon Solve problems while you work with advanced search and reference features
Feature tick icon Offline reading on the mobile app
Feature tick icon Simple pricing, no contract
€189.99 billed annually
Feature tick icon Unlimited access to Packt's library of 7,000+ practical books and videos
Feature tick icon Constantly refreshed with 50+ new titles a month
Feature tick icon Exclusive Early access to books as they're written
Feature tick icon Solve problems while you work with advanced search and reference features
Feature tick icon Offline reading on the mobile app
Feature tick icon Choose a DRM-free eBook or Video every month to keep
Feature tick icon PLUS own as many other DRM-free eBooks or Videos as you like for just €5 each
Feature tick icon Exclusive print discounts
€264.99 billed in 18 months
Feature tick icon Unlimited access to Packt's library of 7,000+ practical books and videos
Feature tick icon Constantly refreshed with 50+ new titles a month
Feature tick icon Exclusive Early access to books as they're written
Feature tick icon Solve problems while you work with advanced search and reference features
Feature tick icon Offline reading on the mobile app
Feature tick icon Choose a DRM-free eBook or Video every month to keep
Feature tick icon PLUS own as many other DRM-free eBooks or Videos as you like for just €5 each
Feature tick icon Exclusive print discounts

Frequently bought together


Stars icon
Total 115.97
Mastering Ceph
€36.99
Learning Ceph
€41.99
Ceph Cookbook
€36.99
Total 115.97 Stars icon
Banner background image

Table of Contents

14 Chapters
Ceph – Introduction and Beyond Chevron down icon Chevron up icon
Working with Ceph Block Device Chevron down icon Chevron up icon
Working with Ceph and OpenStack Chevron down icon Chevron up icon
Working with Ceph Object Storage Chevron down icon Chevron up icon
Working with Ceph Object Storage Multi-Site v2 Chevron down icon Chevron up icon
Working with the Ceph Filesystem Chevron down icon Chevron up icon
Monitoring Ceph Clusters Chevron down icon Chevron up icon
Operating and Managing a Ceph Cluster Chevron down icon Chevron up icon
Ceph under the Hood Chevron down icon Chevron up icon
Production Planning and Performance Tuning for Ceph Chevron down icon Chevron up icon
The Virtual Storage Manager for Ceph Chevron down icon Chevron up icon
More on Ceph Chevron down icon Chevron up icon
An Introduction to Troubleshooting Ceph Chevron down icon Chevron up icon
Upgrading Your Ceph Cluster from Hammer to Jewel Chevron down icon Chevron up icon

Customer reviews

Top Reviews
Rating distribution
Full star icon Full star icon Full star icon Full star icon Half star icon 4.2
(9 Ratings)
5 star 66.7%
4 star 0%
3 star 22.2%
2 star 11.1%
1 star 0%
Filter icon Filter
Top Reviews

Filter reviews by




Toby366 Mar 02, 2018
Full star icon Full star icon Full star icon Full star icon Full star icon 5
Easy to follow helped me so much!
Amazon Verified review Amazon
Allen Murphey May 19, 2021
Full star icon Full star icon Full star icon Full star icon Full star icon 5
Despite being a few versions behind a fast-moving system, this book has been a life-saver for getting up and going with Ceph quickly. The checklist/recipe format is extremely reference friendly, but the overall book is a readable introduction as well. But it is definitely hands-on and that has definitely helped me.
Amazon Verified review Amazon
John May 04, 2018
Full star icon Full star icon Full star icon Full star icon Full star icon 5
I've worked with Ceph for over six years. Production Ceph deployments often start at the petabyte level, and grow from there. Additionally, Ceph supports file, block and object interfaces on the same storage platform. Ceph is open-source, free and powerful. It is also a de facto back end for OpenStack Nova, Glance and Cinder. These factors make Ceph both very exciting and a little bit intimidating for first time users.By contrast, learning how to use Ceph usually starts on a much smaller scale--often with a personal computer and some virtual machines. The Ceph Cookbook identifies all of the required software (open source, and free), and step-by-step guidance for deploying a Ceph cluster on a small scale. The skills you learn from the Ceph Cookbook can transfer directly to production environments. Ceph Cookbook also provides detailed step-by-step guidance on how to use the file, block and object interfaces, including their advanced features such as taking snapshots, disaster recovery, and even how to integrate Ceph with OpenStack! With the Ceph Cookbook, you can learn Ceph concepts and features on a small scale and transfer those skills directly to large scale production environments.By following the step-by-step examples provided by Ceph veterans, Vikhyat, Michael and Karan, the Ceph Cookbook will take you from a novice level to a competent Ceph user in short order. The Ceph Cookbook is well worth it!
Amazon Verified review Amazon
Amazon Customer Jan 29, 2018
Full star icon Full star icon Full star icon Full star icon Full star icon 5
I was a newby to Ceph and software defined storage, and this book has helped greatly with my success with the product. Highly recommend!!
Amazon Verified review Amazon
Joe Q. Jan 25, 2018
Full star icon Full star icon Full star icon Full star icon Full star icon 5
I have been working with Ceph for almost 2 years now, and many times I find myself searching many different documents to find the required commands or configurations that I require. Since aquiring this book I have been able to perform all of my daily tasks while referencing the book for any steps or procedures I need to manage my clusters, I no longer need to bounce between multiple guides. All of the steps are very detailed and easy to follow, and even include nice screenshots. I would recommend this cookbook for anyone just starting out with Ceph or anyone that deploys or manages Ceph on a daily basis. This is an excellent resource to have handy!
Amazon Verified review Amazon
Get free access to Packt library with over 7500+ books and video courses for 7 days!
Start Free Trial

FAQs

What is included in a Packt subscription? Chevron down icon Chevron up icon

A subscription provides you with full access to view all Packt and licnesed content online, this includes exclusive access to Early Access titles. Depending on the tier chosen you can also earn credits and discounts to use for owning content

How can I cancel my subscription? Chevron down icon Chevron up icon

To cancel your subscription with us simply go to the account page - found in the top right of the page or at https://subscription.packtpub.com/my-account/subscription - From here you will see the ‘cancel subscription’ button in the grey box with your subscription information in.

What are credits? Chevron down icon Chevron up icon

Credits can be earned from reading 40 section of any title within the payment cycle - a month starting from the day of subscription payment. You also earn a Credit every month if you subscribe to our annual or 18 month plans. Credits can be used to buy books DRM free, the same way that you would pay for a book. Your credits can be found in the subscription homepage - subscription.packtpub.com - clicking on ‘the my’ library dropdown and selecting ‘credits’.

What happens if an Early Access Course is cancelled? Chevron down icon Chevron up icon

Projects are rarely cancelled, but sometimes it's unavoidable. If an Early Access course is cancelled or excessively delayed, you can exchange your purchase for another course. For further details, please contact us here.

Where can I send feedback about an Early Access title? Chevron down icon Chevron up icon

If you have any feedback about the product you're reading, or Early Access in general, then please fill out a contact form here and we'll make sure the feedback gets to the right team. 

Can I download the code files for Early Access titles? Chevron down icon Chevron up icon

We try to ensure that all books in Early Access have code available to use, download, and fork on GitHub. This helps us be more agile in the development of the book, and helps keep the often changing code base of new versions and new technologies as up to date as possible. Unfortunately, however, there will be rare cases when it is not possible for us to have downloadable code samples available until publication.

When we publish the book, the code files will also be available to download from the Packt website.

How accurate is the publication date? Chevron down icon Chevron up icon

The publication date is as accurate as we can be at any point in the project. Unfortunately, delays can happen. Often those delays are out of our control, such as changes to the technology code base or delays in the tech release. We do our best to give you an accurate estimate of the publication date at any given time, and as more chapters are delivered, the more accurate the delivery date will become.

How will I know when new chapters are ready? Chevron down icon Chevron up icon

We'll let you know every time there has been an update to a course that you've bought in Early Access. You'll get an email to let you know there has been a new chapter, or a change to a previous chapter. The new chapters are automatically added to your account, so you can also check back there any time you're ready and download or read them online.

I am a Packt subscriber, do I get Early Access? Chevron down icon Chevron up icon

Yes, all Early Access content is fully available through your subscription. You will need to have a paid for or active trial subscription in order to access all titles.

How is Early Access delivered? Chevron down icon Chevron up icon

Early Access is currently only available as a PDF or through our online reader. As we make changes or add new chapters, the files in your Packt account will be updated so you can download them again or view them online immediately.

How do I buy Early Access content? Chevron down icon Chevron up icon

Early Access is a way of us getting our content to you quicker, but the method of buying the Early Access course is still the same. Just find the course you want to buy, go through the check-out steps, and you’ll get a confirmation email from us with information and a link to the relevant Early Access courses.

What is Early Access? Chevron down icon Chevron up icon

Keeping up to date with the latest technology is difficult; new versions, new frameworks, new techniques. This feature gives you a head-start to our content, as it's being created. With Early Access you'll receive each chapter as it's written, and get regular updates throughout the product's development, as well as the final course as soon as it's ready.We created Early Access as a means of giving you the information you need, as soon as it's available. As we go through the process of developing a course, 99% of it can be ready but we can't publish until that last 1% falls in to place. Early Access helps to unlock the potential of our content early, to help you start your learning when you need it most. You not only get access to every chapter as it's delivered, edited, and updated, but you'll also get the finalized, DRM-free product to download in any format you want when it's published. As a member of Packt, you'll also be eligible for our exclusive offers, including a free course every day, and discounts on new and popular titles.