David Lopez b756db6a4e | ||
---|---|---|
create_user_ssh | ||
playbooks | ||
zabbix | ||
Debian.yml | ||
LICENSE.md | ||
README.md | ||
hola | ||
hosts | ||
site.yml |
README.md
Deploying a sharded, production-ready MongoDB cluster with Ansible
- Requires Ansible 1.2
- Expects CentOS/RHEL 6 hosts
A Primer
The above diagram shows how MongoDB differs from the traditional relational database model. In an RDBMS, the data associated with 'user' is stored in a table, and the records of users are stored in rows and columns. In MongoDB, the 'table' is replaced by a 'collection' and the individual 'records' are called 'documents'. One thing to notice is that the data is stored as key/value pairs in BJSON format.
Another thing to notice is that NoSQL-style databases have a looser consistency model. As an example, the second document in the users collection has an additional field of 'last name'.
Data Replication
Data backup is achieved in MongoDB via replica sets. As the figure above shows, a single replication set consists of a replication master (active) and several other replications slaves (passive). All the database operations like add/delete/update happen on the replication master and the master replicates the data to the slave nodes. mongod is the process which is responsible for all the database activities as well as replication processes. The minimum recommended number of slave servers are 3.
Sharding (Horizontal Scaling) .
Sharding works by partitioning the data into separate chunks and allocating different ranges of chunks to different shard servers. The figure above shows a collection which has 90 documents which have been sharded across the three servers: the first shard getting ranges from 1-29, and so on. When a client wants to access a certain document, it contacts the query router (mongos process), which in turn contacts the 'configuration node', a lightweight mongod process) that keeps a record of which ranges of chunks are distributed across which shards.
Please do note that every shard server should be backed by a replica set, so that when data is written/queried copies of the data are available. So in a three-shard deployment we would require 3 replica sets and primaries of each would act as the sharding server.
Here are the basic steps of how sharding works:
-
A new database is created, and collections are added.
-
New documents get updated when clients update, and all the new documents goes into a single shard.
-
When the size of collection in a shard exceeds the 'chunk_size' the collection is split and balanced across shards.
Deploying MongoDB Ansible
Deploy the Cluster
The diagram above illustrates the deployment model for a MongoDB cluster deployed by Ansible. This deployment model focuses on deploying three shard servers, each having a replica set, with the backup replica servers serving as the other two shard primaries. The configuration servers are co-located with the shards. The mongos servers are best deployed on separate servers. This is the minimum recommended configuration for a production-grade MongoDB deployment. Please note that the playbooks are capable of deploying N node clusters, not limited to three. Also, all the processes are secured using keyfiles.
Prerequisite
Edit the group_vars/all file to reflect the below variables.
-
iface: 'eth1' # the interface to be used for all communication
. -
Set a unique
mongod_port
variable in the inventory file for each MongoDB server. -
The default directory for storing data is
/data
, please do change it if required. Make sure it has sufficient space: 10G is recommended.
Deployment Example
The inventory file looks as follows:
#The site wide list of mongodb servers
[mongo_servers]
mongo1 mongod_port=2700
mongo2 mongod_port=2701
mongo3 mongod_port=2702
#The list of servers where replication should happen, including the master server.
[replication_servers]
mongo3
mongo1
mongo2
#The list of mongodb configuration servers, make sure it is 1 or 3
[mongoc_servers]
mongo1
mongo2
mongo3
#The list of servers where mongos servers would run.
[mongos_servers]
mongos1
mongos2
Build the site with the following command:
ansible-playbook -i hosts site.yml
Verifying the Deployment
Once configuration and deployment has completed we can check replication set
availability by connecting to individual primary replication set nodes, mongo --host 192.168.1.1 --port 2700
and issue the command to query the status of
replication set, we should get a similar output.
web2:PRIMARY> rs.status()
{
"set" : "web2",
"date" : ISODate("2013-03-19T10:26:35Z"),
"myState" : 1,
"members" : [
{
"_id" : 0,
"name" : "web2:2013",
"health" : 1,
"state" : 1,
"stateStr" : "PRIMARY",
"uptime" : 102,
"optime" : Timestamp(1363688755000, 1),
"optimeDate" : ISODate("2013-03-19T10:25:55Z"),
"self" : true
},
{
"_id" : 1,
"name" : "web3:2013",
"health" : 1,
"state" : 2,
"stateStr" : "SECONDARY",
"uptime" : 40,
"optime" : Timestamp(1363688755000, 1),
"optimeDate" : ISODate("2013-03-19T10:25:55Z"),
"lastHeartbeat" : ISODate("2013-03-19T10:26:33Z"),
"pingMs" : 1
}
],
"ok" : 1
}
We can check the status of the shards as follows: connect to the mongos service
mongo localhost:8888/admin -u admin -p 123456
and issue the following command to get
the status of the Shards:
mongos> sh.status()
--- Sharding Status ---
sharding version: { "_id" : 1, "version" : 3 }
shards:
{ "_id" : "web2", "host" : "web2/web2:2013,web3:2013" }
{ "_id" : "web3", "host" : "web3/web2:2014,web3:2014" }
databases:
{ "_id" : "admin", "partitioned" : false, "primary" : "config" }
We can also make sure the sharding works by creating a database, a collection, and populate it with documents and check if the chunks of the collection are balanced equally across nodes. The below diagram illustrates the verification step.
The above mentioned steps can be tested with an automated playbook.
Issue the following command to run the test. Pass one of the mongos servers in the servername variable.
ansible-playbook -i hosts playbooks/testsharding.yml -e servername=server1
Once the playbook completes, we check if the sharding has succeeded by logging on to any mongos server and issuing the following command. The output displays the number of chunks spread across the shards.
mongos> sh.status()
--- Sharding Status ---
sharding version: { "_id" : 1, "version" : 3 }
shards:
{ "_id" : "bensible", "host" : "bensible/bensible:20103,web2:20103,web3:20103" }
{ "_id" : "web2", "host" : "web2/bensible:20105,web2:20105,web3:20105" }
{ "_id" : "web3", "host" : "web3/bensible:20102,web2:20102,web3:20102" }
databases:
{ "_id" : "admin", "partitioned" : false, "primary" : "config" }
{ "_id" : "test", "partitioned" : true, "primary" : "web3" }
test.test_collection chunks:
bensible 7
web2 6
web3 7
Scaling the Cluster
To add a new node to the existing MongoDB Cluster, modify the inventory file as follows:
#The site wide list of mongodb servers
[mongoservers]
mongo1 mongod_port=2700
mongo2 mongod_port=2701
mongo3 mongod_port=2702
mongo4 mongod_port=2703
#The list of servers where replication should happen, make sure the new node is listed here.
[replicationservers]
mongo4
mongo3
mongo1
mongo2
#The list of mongodb configuration servers, make sure it is 1 or 3
[mongoc_servers]
mongo1
mongo2
mongo3
#The list of servers where mongos servers would run.
[mongos_servers]
mongos1
mongos2
Make sure you have the new node added in the replicationservers section and execute the following command:
ansible-playbook -i hosts site.yml
Verification
The newly added node can be easily verified by checking the sharding status and seeing the chunks being rebalanced to the newly added node.
$/usr/bin/mongo localhost:8888/admin -u admin -p 123456
mongos> sh.status()
--- Sharding Status ---
sharding version: { "_id" : 1, "version" : 3 }
shards:
{ "_id" : "bensible", "host" : "bensible/bensible:20103,web2:20103,web3:20103" }
{ "_id" : "web2", "host" : "web2/bensible:20105,web2:20105,web3:20105" }
{ "_id" : "web3", "host" : "web3/bensible:20102,web2:20102,web3:20102" }
{ "_id" : "web4", "host" : "web4/bensible:20101,web3:20101,web4:20101" }
databases:
{ "_id" : "admin", "partitioned" : false, "primary" : "config" }
{ "_id" : "test", "partitioned" : true, "primary" : "bensible" }
test.test_collection chunks:
web4 3
web3 6
web2 6
bensible 5