In this recipe, we will look at how to optimize disk I/O by storing each database in its own directory.
Separating directories per database
Getting ready
Ensure you have a MongoDB database installation ready.
How to do it...
- Start the mongod daemon with no special parameters:
/data/mongodb/bin/mongod --dbpath /data/db
- Connect to the mongo shell, create a test db, and insert a sample document:
mongo localhost:27017
> use mydb
> db.mycol.insert({foo:1})
- Inspect the /data/db directory structure. It should look something like this:
ls -al /data/db
total 244
drwxr-xr-x 4 root root 4096 May 21 08:45 .
drwxr-xr-x 10 root root 4096 May 21 08:42 ..
-rw-r--r-- 1 root root 16384 May 21 08:43 collection-0-626293768203557661.wt
-rw-r--r-- 1 root root 16384 May 21 08:43 collection-2-626293768203557661.wt
-rw-r--r-- 1 root root 16384 May 21 08:43 collection-5-626293768203557661.wt
drwxr-xr-x 2 root root 4096 May 21 08:45 diagnostic.data
-rw-r--r-- 1 root root 16384 May 21 08:43 index-1-626293768203557661.wt
-rw-r--r-- 1 root root 16384 May 21 08:43 index-3-626293768203557661.wt
-rw-r--r-- 1 root root 16384 May 21 08:43 index-4-626293768203557661.wt
-rw-r--r-- 1 root root 16384 May 21 08:43 index-6-626293768203557661.wt
drwxr-xr-x 2 root root 4096 May 21 08:42 journal
-rw-r--r-- 1 root root 16384 May 21 08:43 _mdb_catalog.wt
-rw-r--r-- 1 root root 6 May 21 08:42 mongod.lock
-rw-r--r-- 1 root root 16384 May 21 08:44 sizeStorer.wt
-rw-r--r-- 1 root root 95 May 21 08:42 storage.bson
-rw-r--r-- 1 root root 49 May 21 08:42 WiredTiger
-rw-r--r-- 1 root root 4096 May 21 08:42 WiredTigerLAS.wt
-rw-r--r-- 1 root root 21 May 21 08:42 WiredTiger.lock
-rw-r--r-- 1 root root 994 May 21 08:45 WiredTiger.turtle
-rw-r--r-- 1 root root 61440 May 21 08:45 WiredTiger.wt
- Shut down the previous mongod instance.
- Create a new db path and start mongod with the --directoryperdb option:
mkdir /data/newdb
/data/mongodb/bin/mongod --dbpath /data/newdb --directoryperdb
- Connect to the mongo shell, create a test db, and insert a sample document:
mongo localhost:27017
> use mydb
> db.mycol.insert({bar:1})
- Inspect the /data/newdb directory structure. It should look something like this:
ls -al /data/newdb
total 108
drwxr-xr-x 7 root root 4096 May 21 08:42 .
drwxr-xr-x 10 root root 4096 May 21 08:42 ..
drwxr-xr-x 2 root root 4096 May 21 08:41 admin
drwxr-xr-x 2 root root 4096 May 21 08:42 diagnostic.data
drwxr-xr-x 2 root root 4096 May 21 08:41 journal
drwxr-xr-x 2 root root 4096 May 21 08:41 local
-rw-r--r-- 1 root root 16384 May 21 08:42 _mdb_catalog.wt
-rw-r--r-- 1 root root 0 May 21 08:42 mongod.lock
drwxr-xr-x 2 root root 4096 May 21 08:41 mydb
-rw-r--r-- 1 root root 16384 May 21 08:42 sizeStorer.wt
-rw-r--r-- 1 root root 95 May 21 08:41 storage.bson
-rw-r--r-- 1 root root 49 May 21 08:41 WiredTiger
-rw-r--r-- 1 root root 4096 May 21 08:42 WiredTigerLAS.wt
-rw-r--r-- 1 root root 21 May 21 08:41 WiredTiger.lock
-rw-r--r-- 1 root root 986 May 21 08:42 WiredTiger.turtle
-rw-r--r-- 1 root root 28672 May 21 08:42 WiredTiger.wt
How it works...
We start by running a mongod instance with no special parameters except for --dbpath. In step 2, we create a new database, mydb, and insert a document into the collection mycol using the mongo shell. This creates the data files for the new db, which we can see in step 3 by inspecting the directory structure of our main database path, /data/db. Among other files, you can see that collection data files begin with collection-<number> and their index files begin with index-<number>. As expected, the files for all databases sit together directly under our db path.
If you are curious and wish to find out which file belongs to which collection, run the following commands in the mongo shell:
> use mydb
> var curiosity = db.mycol.stats()
> curiosity['wiredTiger']['uri']
statistics:table:collection-5-626293768203557661
The last part of this string, that is, collection-5-626293768203557661, corresponds to the file collection-5-626293768203557661.wt in our /data/db path.
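The mapping from the uri string to a file name can be sketched as a small shell helper. The function name wt_uri_to_file is hypothetical, and the sketch assumes the string always starts with statistics:table: and that the data file carries the .wt suffix, as in the listing from step 3:

```shell
#!/bin/sh
# Hypothetical helper: derive the expected data file path from a
# WiredTiger statistics URI as printed by db.mycol.stats().
# Assumption: the URI has the form "statistics:table:<ident>" and the
# file on disk is named "<ident>.wt" under the dbpath.
wt_uri_to_file() {
    dbpath=$1
    uri=$2
    ident=${uri#statistics:table:}    # strip the fixed prefix
    printf '%s/%s.wt\n' "$dbpath" "$ident"
}

wt_uri_to_file /data/db "statistics:table:collection-5-626293768203557661"
# prints /data/db/collection-5-626293768203557661.wt
```

You can pass the result to ls -l to confirm that the file exists. When mongod runs with --directoryperdb, we would expect the identifier to include the database directory as a prefix, but treat that as an assumption to verify on your own instance.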
Moving on, in steps 4 and 5, we stop the previous mongod instance, create a new path for our data files, and start a new mongod instance, this time with the --directoryperdb parameter. As before, in step 6 we insert a sample document into the mycol collection of a new database called mydb. In step 7, we look at the directory listing of our data path and can see that it now contains a subdirectory which, as you guessed, matches our database name, mydb. If you look inside this directory, that is, /data/newdb/mydb, you should see a collection file and an index file.
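If you prefer a configuration file over command-line flags, the same setup can be sketched as follows. The storage.dbPath and storage.directoryPerDB settings are the config-file counterparts of --dbpath and --directoryperdb; the CONF_DIR variable and the file name mongod-perdb.conf are hypothetical and only used for illustration:

```shell
#!/bin/sh
# Write a minimal config file equivalent to:
#   mongod --dbpath /data/newdb --directoryperdb
# CONF_DIR and the file name are hypothetical; adjust them to your layout.
CONF_DIR=${CONF_DIR:-$(mktemp -d)}
CONF_FILE="$CONF_DIR/mongod-perdb.conf"

cat > "$CONF_FILE" <<'EOF'
storage:
  dbPath: /data/newdb
  directoryPerDB: true
EOF

echo "Wrote $CONF_FILE"
# Then start the daemon with the binary path used in this recipe:
#   /data/mongodb/bin/mongod --config "$CONF_FILE"
```

Keep in mind that this setting has to match the way the data directory was first initialized, which is why the recipe starts from a fresh db path rather than reusing /data/db.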
So one might ask, why go through all this trouble to have separate directories for databases? Well, in certain application scenarios, if a database carries a particularly heavy workload, you should consider storing it on a separate disk or volume. Ideally, this should be a physically separate disk or a RAID volume built from separate physical disks. This isolates that database's disk I/O from other operations, including the MongoDB journal. It also helps you separate your fault domains. One thing to keep in mind is that the journal is stored separately, that is, outside the database's directory. So, using separate disks for databases means the journal does not have to contend for the same disk I/O path.
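One way to put a single database on its own volume is to mount that volume at the database's subdirectory. The following is only a sketch under stated assumptions: /dev/sdb1 is a hypothetical, already formatted device, mongod has been shut down beforehand, and the script defaults to a dry run that prints the commands instead of executing them:

```shell
#!/bin/sh
# Sketch: move one database directory onto a separate volume.
# DEVICE is a hypothetical block device that is already formatted.
# DRY_RUN=1 (the default) only prints each command, prefixed with "+".
DB_DIR=${DB_DIR:-/data/newdb/mydb}
DEVICE=${DEVICE:-/dev/sdb1}
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

plan_separate_volume() {
    # mongod must be stopped before its files are touched.
    run mv "$DB_DIR" "$DB_DIR.old"        # keep the original files aside
    run mkdir "$DB_DIR"                   # empty mount point
    run mount "$DEVICE" "$DB_DIR"         # attach the dedicated volume
    run cp -a "$DB_DIR.old/." "$DB_DIR/"  # copy data, preserving ownership
}

plan_separate_volume
```

Review the printed plan first and set DRY_RUN=0 only once it matches your environment. The mount also needs to be made persistent (for example, through /etc/fstab) so that the volume is in place before mongod starts.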