Starting multiple instances as part of a replica set
In this recipe, we will look at starting multiple servers on the same host as a cluster. A single mongo server is enough for development or for applications that are not mission critical. Crucial production deployments need high availability: if one server instance fails, another takes over and the data remains available to query, insert, or update. Clustering is an advanced concept, and we cannot do it justice in one recipe. Here, we only scratch the surface; other recipes in the administration section later in the book go into more detail. In this recipe, we will start multiple mongo server processes on the same machine for testing purposes. In a production environment, they would run on different machines (or virtual machines) in the same or even different data centers.
Let's see briefly what a replica set is. As the name suggests, it is a set of servers that are replicas of each other in terms of data. How they are kept in sync, along with other internals, is deferred to later recipes in the administration section. One thing to remember is that write operations happen on only one node, the primary. All queries also go to the primary by default, though we may explicitly permit read operations on secondary instances. An important fact to remember is that replica sets are not meant to achieve scalability by distributing read operations across the nodes of the set. Their sole objective is to ensure high availability.
Getting ready
Though not a prerequisite, taking a look at the Starting a single node instance using command-line options recipe will make things easier if you are not familiar with the various command-line options and their significance when starting a mongo server. Additionally, the binaries and setup described in the single server setup must be in place before we continue with this recipe. Let's sum up what we need to do.
We will start three mongod processes (mongo server instances) on our localhost.
We will create three data directories, /data/n1, /data/n2, and /data/n3, for Node1, Node2, and Node3, respectively. Similarly, we will redirect the logs to /logs/n1.log, /logs/n2.log, and /logs/n3.log. The following image will give you an idea of how the cluster would look:
How to do it…
Let's take a look at the steps in detail:
- Create the /data/n1, /data/n2, /data/n3, and /logs directories for the data and logs of the three nodes, respectively. On the Windows platform, you can choose the c:\data\n1, c:\data\n2, c:\data\n3, and c:\logs\ directories, or any other directories of your choice, for the data and logs, respectively. Ensure that these directories have appropriate write permissions for the mongo server to write the data and logs.
- Start the three servers as follows. Users on the Windows platform need to skip the --fork option as it is not supported:
$ mongod --replSet repSetTest --dbpath /data/n1 --logpath /logs/n1.log --port 27000 --smallfiles --oplogSize 128 --fork
$ mongod --replSet repSetTest --dbpath /data/n2 --logpath /logs/n2.log --port 27001 --smallfiles --oplogSize 128 --fork
$ mongod --replSet repSetTest --dbpath /data/n3 --logpath /logs/n3.log --port 27002 --smallfiles --oplogSize 128 --fork
- Start the mongo shell and connect to any of the running mongo servers. In this case, we connect to the first one (listening on port 27000). Execute the following command:
$ mongo localhost:27000
- Try to execute an insert operation from the mongo shell after connecting to it:
> db.person.insert({name:'Fred', age:35})
This operation should fail as the replica set has not been initialized yet. More information can be found in the How it works… section.
- The next step is to start configuring the replica set. We start by preparing a JSON configuration in the shell as follows:
cfg = {
  '_id': 'repSetTest',
  'members': [
    {'_id': 0, 'host': 'localhost:27000'},
    {'_id': 1, 'host': 'localhost:27001'},
    {'_id': 2, 'host': 'localhost:27002'}
  ]
}
- The last step is to initiate the replica set with the preceding configuration as follows:
> rs.initiate(cfg)
- Execute rs.status() on the shell after a few seconds to see the status. By then, one of the nodes should have become the primary and the remaining two should have become secondaries.
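The directories from the first step can also be created by a script. The following is a minimal Node.js sketch, not part of the original recipe; the function name prepareDirs and its base parameter are hypothetical, and it assumes a Node.js version whose fs.mkdirSync supports the recursive option.

```javascript
const fs = require('fs');
const path = require('path');

// Create <base>/data/n1 .. <base>/data/nN and <base>/logs.
// `base` is a hypothetical root directory; the recipe itself uses "/".
function prepareDirs(base, nodeCount) {
  const dataDirs = [];
  for (let i = 1; i <= nodeCount; i++) {
    const dir = path.join(base, 'data', 'n' + i);
    // recursive: true creates missing parents and does not fail
    // if the directory already exists.
    fs.mkdirSync(dir, { recursive: true });
    dataDirs.push(dir);
  }
  const logDir = path.join(base, 'logs');
  fs.mkdirSync(logDir, { recursive: true });
  return { dataDirs, logDir };
}
```

Calling prepareDirs('/', 3) would produce the layout used in this recipe, provided the user running the script has write permission at that location.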
How it works…
We described the common command-line options in detail earlier, in the Starting a single node instance using command-line options recipe.
As we are starting three independent mongod services, we have three dedicated database paths on the filesystem. Similarly, each process has its own log file. We then start three mongod processes with the database and log file paths specified. As this setup is for test purposes and is started on the same machine, we use the --smallfiles and --oplogSize options to keep disk usage low. As these processes run on the same host, we also choose the ports explicitly to avoid port conflicts. The ports that we chose here are 27000, 27001, and 27002. When we start the servers on different hosts, we may or may not choose a separate port; we can very well use the default one whenever possible.
The --fork option demands some explanation. By choosing this option, we start the server as a background process from our operating system's shell and get control back in the shell, where we can then start more such mongod processes or perform other operations. In the absence of the --fork option, we cannot start more than one process per shell and would need to start the three mongod processes in three separate shells.
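The three command lines differ only in the node number and the port, so they can be generated rather than typed. The following is a minimal JavaScript sketch under that assumption; the function name mongodCommands and its parameters are hypothetical, and the options are exactly those used in this recipe.

```javascript
// Build the mongod command line for each member of a local test replica set.
// useFork should be false on Windows, where --fork is not supported.
function mongodCommands(setName, basePort, nodeCount, useFork) {
  const commands = [];
  for (let i = 1; i <= nodeCount; i++) {
    const args = [
      'mongod',
      '--replSet', setName,
      '--dbpath', '/data/n' + i,
      '--logpath', '/logs/n' + i + '.log',
      '--port', String(basePort + i - 1), // one distinct port per process
      '--smallfiles',
      '--oplogSize', '128',
    ];
    if (useFork) args.push('--fork');
    commands.push(args.join(' '));
  }
  return commands;
}

// Print the three commands used in this recipe.
mongodCommands('repSetTest', 27000, 3, true).forEach((c) => console.log(c));
```

The sketch only prints the commands; running them is still done from the operating system's shell as described above.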
If we take a look at the logs generated in the log directory, we should see the following lines in them:
[rsStart] replSet can't get local.system.replset config from self or any seed (EMPTYCONFIG)
[rsStart] replSet info you may need to run replSetInitiate -- rs.initiate() in the shell -- if that is not already done
Though we started three mongod processes with the --replSet option, we still haven't configured them to work with each other as a replica set. This command-line option only tells the server on startup that the process will run as part of a replica set. The name of the replica set is the value passed to this option on the command line. This also explains why the insert operation executed on one of the nodes failed before the replica set was initialized. In mongo replica sets, there can be only one primary node, where all the inserting and querying happens. In the image shown, the N1 node is the primary and listens on port 27000 for client connections. All the other nodes are secondary instances, which sync themselves with the primary; querying is disabled on them by default. Only when the primary goes down does one of the secondaries take over and become the primary node. However, it is possible to query a secondary for data, as shown in the image; we will see how to query from a secondary instance in the next recipe.
Well, all that is left now is to configure the replica set by grouping the three processes that we started. This is done by first defining a JSON object as follows:
cfg = {
  '_id': 'repSetTest',
  'members': [
    {'_id': 0, 'host': 'localhost:27000'},
    {'_id': 1, 'host': 'localhost:27001'},
    {'_id': 2, 'host': 'localhost:27002'}
  ]
}
There are two fields, _id and members: the unique ID of the replica set, and an array of the hostnames and port numbers of the mongod server processes that are part of this replica set, respectively. Using localhost to refer to the host is usually discouraged; in this case, as we started all the processes on the same machine, it is acceptable. It is preferred that you refer to the hosts by their hostnames even if they are running on localhost. Note that you cannot mix localhost and hostnames in the same configuration; it is either one or the other. To configure the replica set, we then connect to any one of the three running mongod processes; in this case, we connect to the first one and then execute the following from the shell:
> rs.initiate(cfg)
The _id field in the cfg object passed has a value that is the same as the value we gave to the --replSet option on the command prompt when we started the server processes. Not giving the same value would throw the following error:
{ "ok" : 0, "errmsg" : "couldn't initiate : set name does not match the set name host Amol-PC:27000 expects" }
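Both rules mentioned here (the _id must match the --replSet value, and localhost must not be mixed with hostnames) can be checked before calling rs.initiate(). The following is a minimal JavaScript sketch of such a check; buildConfig and checkConfig are hypothetical helper names, not part of the mongo shell, and the check only inspects the document without contacting any server.

```javascript
// Build a replica set configuration document for members on one host.
function buildConfig(setName, host, ports) {
  return {
    _id: setName,
    members: ports.map((port, i) => ({ _id: i, host: host + ':' + port })),
  };
}

// Return a list of problems found; an empty list means both checks passed.
function checkConfig(cfg, expectedSetName) {
  const problems = [];
  if (cfg._id !== expectedSetName) {
    problems.push('set name "' + cfg._id +
      '" does not match --replSet "' + expectedSetName + '"');
  }
  // Count members addressed as localhost; it must be all of them or none.
  const isLocal = (m) => m.host.split(':')[0] === 'localhost';
  const localCount = cfg.members.filter(isLocal).length;
  if (localCount !== 0 && localCount !== cfg.members.length) {
    problems.push('members mix localhost and hostnames');
  }
  return problems;
}

const cfg = buildConfig('repSetTest', 'localhost', [27000, 27001, 27002]);
console.log(checkConfig(cfg, 'repSetTest')); // []
```

The object returned by buildConfig has the same shape as the cfg document defined earlier, so it could be passed to rs.initiate() once the list of problems is empty.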
If all goes well and the initiate call is successful, we should see something similar to the following JSON response on the shell:
{"ok" : 1}
In a few seconds, you should see a different prompt in the shell from which we executed this command. The prompt should now show whether the node is a primary or a secondary. The following is an example of the shell connected to a primary member of the replica set:
repSetTest:PRIMARY>
Executing rs.status() should give us some stats on the replica set's status, which we will explore in depth in a recipe in the administration section later in the book. For now, the stateStr field is the important one; it holds the state of each member as text, such as PRIMARY or SECONDARY.
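To pick out just the member states from the status document, a small helper is enough. The following JavaScript sketch assumes a document shaped like the output of rs.status(), with a members array whose entries carry name and stateStr; the sample object is hypothetical and trimmed to those two fields, and memberStates and primaryOf are illustrative names.

```javascript
// Map each member's name to its stateStr.
function memberStates(status) {
  const states = {};
  for (const m of status.members) {
    states[m.name] = m.stateStr;
  }
  return states;
}

// Return the name of the primary member, or null if none is elected yet.
function primaryOf(status) {
  const primary = status.members.find((m) => m.stateStr === 'PRIMARY');
  return primary ? primary.name : null;
}

// Hypothetical, trimmed sample of what rs.status() might return.
const sampleStatus = {
  set: 'repSetTest',
  members: [
    { name: 'localhost:27000', stateStr: 'PRIMARY' },
    { name: 'localhost:27001', stateStr: 'SECONDARY' },
    { name: 'localhost:27002', stateStr: 'SECONDARY' },
  ],
};

console.log(primaryOf(sampleStatus)); // localhost:27000
```

In the mongo shell, the same functions could be applied to the result of rs.status() instead of the sample object.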
There's more…
Look at the Connecting to the replica set in the shell to query and insert data recipe to perform more operations from the shell after connecting to a replica set. Replication isn't as simple as we saw here. See the administration section for more advanced recipes on replication.
See also
If you are looking to convert a standalone instance to a replica set, then the instance with the data needs to become a primary first, and then empty secondary instances will be added to which the data will be synchronized. Refer to the following URL on how to perform this operation:
http://docs.mongodb.org/manual/tutorial/convert-standalone-to-replica-set/