Search icon CANCEL
Subscription
0
Cart icon
Your Cart (0 item)
Close icon
You have no products in your basket yet
Arrow left icon
Explore Products
Best Sellers
New Releases
Books
Videos
Audiobooks
Learning Hub
Free Learning
Arrow right icon
Arrow up icon
GO TO TOP
Hadoop 2.x Administration Cookbook

You're reading from   Hadoop 2.x Administration Cookbook Administer and maintain large Apache Hadoop clusters

Arrow left icon
Product type Paperback
Published in May 2017
Publisher Packt
ISBN-13 9781787126732
Length 348 pages
Edition 1st Edition
Tools
Arrow right icon
Author (1):
Arrow left icon
Aman Singh Aman Singh
Author Profile Icon Aman Singh
Aman Singh
Arrow right icon
View More author details
Toc

Table of Contents (14) Chapters Close

Preface 1. Hadoop Architecture and Deployment FREE CHAPTER 2. Maintaining Hadoop Cluster HDFS 3. Maintaining Hadoop Cluster – YARN and MapReduce 4. High Availability 5. Schedulers 6. Backup and Recovery 7. Data Ingestion and Workflow 8. Performance Tuning 9. HBase Administration 10. Cluster Planning 11. Troubleshooting, Diagnostics, and Best Practices 12. Security Index

Adding nodes to the cluster

Over a period of time, our cluster will grow in data and there will be a need to increase the capacity of the cluster by adding more nodes.

We can add Datanodes to the cluster in the same way that we first configured the Datanode started the Datanode daemon on it. But the important thing to keep in mind is that all nodes can be part of the cluster. It should not be that anyone can just start a Datanode daemon on his laptop and join the cluster, as it will be disastrous. By default, there is nothing preventing any node being a Datanode, as the user has just to untar the Hadoop package and point the file "core-site.xml" to the Namenode and start the Datanode daemon.

Getting ready

For the following steps, we assume that the cluster that is up and running with Datanodes is in a healthy state and we need to add a new Datanode in the cluster. We will login to the Namenode and make changes there.

How to do it...

  1. ssh to Namenode and edit the file hdfs-site.xml to add the following property to it:
    <property>
    <name>dfs.hosts</name>
    <value>/home/hadoop/includes</value>
    <final>true</final>
    </property>
  2. Make sure the file includes is readable by the user hadoop.
  3. Restart the Namenode daemon for the property to take effect:
    $ hadoop-daemons.sh stop namenode
    $ hadoop-daemons.sh start namenode
    
  4. A restart of Namenode is required only when any property is changed in the file. Once the property is in place, Namenode can read the changes to the contents of the includes file by simply refreshing the nodes.
  5. Add the dn1.cluster1.com node to the file excludes:
    $ cat includes
    dn1.cluster1.com
    
  6. The file includes or excludes can contain a list of multiple nodes, one node per line.
  7. After adding the node to the file, we just need to reload the file by entering the following:
    $ hadoop dfsadmin -refreshNodes
    
  8. After some time, the node will be available in the cluster and can be seen:
    $ hdfs dfsadmin -report
    

How it works...

The file /home/hadoop/includes will contain a list of all the Datanodes that are allowed to join a cluster. If the file includes is blank, then all Datanodes are allowed to join the cluster. If there is both an include and exclude file, the list of nodes must be mutually exclusive in both the files. So, to decommission the node dn1.cluster.com from the cluster, it must be removed from the includes file and added to the excludes file.

There's more...

In addition to controlling the nodes as we described, there will be firewall rules in place and separate VLANs for Hadoop clusters to keep the traffic and data isolated.

You have been reading a chapter from
Hadoop 2.x Administration Cookbook
Published in: May 2017
Publisher: Packt
ISBN-13: 9781787126732
Register for a free Packt account to unlock a world of extra content!
A free Packt account unlocks extra newsletters, articles, discounted offers, and much more. Start advancing your knowledge today.
Unlock this book and the full library FREE for 7 days
Get unlimited access to 7000+ expert-authored eBooks and videos courses covering every tech area you can think of
Renews at $19.99/month. Cancel anytime
Banner background image