diff --git a/LICENSE.md b/LICENSE.md
new file mode 100644
index 0000000..2b437ec
--- /dev/null
+++ b/LICENSE.md
@@ -0,0 +1,4 @@
+Copyright (C) 2013 AnsibleWorks, Inc.
+
+This work is licensed under the Creative Commons Attribution 3.0 Unported License.
+To view a copy of this license, visit http://creativecommons.org/licenses/by/3.0/deed.en_US.
diff --git a/README.md b/README.md
index 417df9c..7665887 100644
--- a/README.md
+++ b/README.md
@@ -1,2 +1,290 @@
-# vn-ansible-sistemas
+## Deploying a sharded, production-ready MongoDB cluster with Ansible
+------------------------------------------------------------------------------
+- Requires Ansible 1.2
+- Expects CentOS/RHEL 6 hosts
+
+### A Primer
+---------------------------------------------
+
+![Alt text](images/nosql_primer.png "Primer NoSQL")
+
+The above diagram shows how MongoDB differs from the traditional relational
+database model. In an RDBMS, the data associated with 'user' is stored in a
+table, and the records of users are stored in rows and columns. In MongoDB, the
+'table' is replaced by a 'collection' and the individual 'records' are called
+'documents'. One thing to notice is that the data is stored as key/value pairs
+in BSON format.
+
+Another thing to notice is that NoSQL-style databases impose a much looser
+schema. As an example, the second document in the users collection has an
+additional 'last name' field.
+
+### Data Replication
+------------------------------------
+
+![Alt text](images/replica_set.png "Replica Set")
+
+Data redundancy is achieved in MongoDB via _replica sets_. As the figure above
+shows, a single replica set consists of a replication master (active) and
+several replication slaves (passive). All database operations such as
+add/delete/update happen on the replication master, and the master replicates
+the data to the slave nodes. _mongod_ is the process responsible for all
+database activity as well as the replication process. The recommended minimum
+is a replica set with three members.
+
+### Sharding (Horizontal Scaling)
+------------------------------------------------
+
+![Alt text](images/sharding.png "Sharding")
+
+Sharding works by partitioning the data into separate chunks and allocating
+different ranges of chunks to different shard servers. The figure above shows a
+collection with 90 documents that has been sharded across three servers: the
+first shard getting the range 1-29, and so on. When a client wants to access a
+certain document, it contacts the query router (the _mongos_ process), which in
+turn contacts the 'configuration node' (a lightweight _mongod_ process) that
+keeps a record of which ranges of chunks are distributed across which shards.
+
+Please note that every shard server should be backed by a replica set, so that
+copies of the data remain available when it is written or queried. A
+three-shard deployment therefore requires three replica sets, with the primary
+of each acting as the shard server.
+
+Here are the basic steps of how sharding works:
+
+1) A new database is created, and collections are added.
+
+2) New documents are written as clients make updates, and initially all of the
+new documents go into a single shard.
+
+3) When the size of a collection in a shard exceeds the 'chunk_size', the
+collection is split and balanced across the shards.
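+
+As a concrete example of these steps, the following mongo shell session drives
+sharding by hand from a _mongos_ query router. It is illustrative only: the
+database name, collection name and shard key are arbitrary, while the port and
+admin credentials match the ones used in the verification section below.
+
+    # connect to a mongos query router (the hostname is an example from the inventory below)
+    /usr/bin/mongo mongos1:8888/admin -u admin -p 123456
+
+    mongos> sh.enableSharding("test")                               // step 1: allow the 'test' database to be sharded
+    mongos> sh.shardCollection("test.test_collection", { _id: 1 })  // choose _id as the shard key for the collection
+    mongos> use test
+    mongos> for (var i = 0; i < 100000; i++) { db.test_collection.insert({ _id: i, value: "x" }) }   // step 2: new documents initially land on a single shard
+    mongos> sh.status()   // step 3: once the collection outgrows 'chunk_size', chunks are split and balanced across shards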
+
+
+### Deploying MongoDB with Ansible
+--------------------------------------------
+
+#### Deploy the Cluster
+----------------------------
+
+![Alt text](images/site.png "Site")
+
+The diagram above illustrates the deployment model for a MongoDB cluster
+deployed by Ansible. This deployment model focuses on three shard servers,
+each backed by a replica set, with each server also hosting the backup
+(secondary) replica members of the other two shards. The configuration servers
+are co-located with the shards. The _mongos_ servers are best deployed on
+separate servers. This is the minimum recommended configuration for a
+production-grade MongoDB deployment. Please note that the playbooks can deploy
+N-node clusters, not just three. Also, all the processes are secured using
+keyfiles.
+
+#### Prerequisites
+
+Edit the group_vars/all file to set the variables below.
+
+1) `iface: 'eth1' # the interface to be used for all communication`.
+
+2) Set a unique `mongod_port` variable in the inventory file for each MongoDB
+server.
+
+3) The default directory for storing data is `/data`; change it if required,
+and make sure it has sufficient space (10 GB is recommended).
+
+### Deployment Example
+
+The inventory file looks as follows:
+
+    #The site wide list of mongodb servers
+    [mongo_servers]
+    mongo1 mongod_port=2700
+    mongo2 mongod_port=2701
+    mongo3 mongod_port=2702
+
+    #The list of servers where replication should happen, including the master server.
+    [replication_servers]
+    mongo3
+    mongo1
+    mongo2
+
+    #The list of mongodb configuration servers, make sure it is 1 or 3
+    [mongoc_servers]
+    mongo1
+    mongo2
+    mongo3
+
+    #The list of servers where mongos servers would run.
+    [mongos_servers]
+    mongos1
+    mongos2
+
+Build the site with the following command:
+
+    ansible-playbook -i hosts site.yml
+
+
+#### Verifying the Deployment
+---------------------------------------------
+
+Once configuration and deployment have completed, we can check replica set
+availability by connecting to an individual primary replica set node, e.g.
+`mongo --host 192.168.1.1 --port 2700`, and issuing the command below to query
+the replica set status. We should get output similar to the following.
+
+
+    web2:PRIMARY> rs.status()
+    {
+        "set" : "web2",
+        "date" : ISODate("2013-03-19T10:26:35Z"),
+        "myState" : 1,
+        "members" : [
+            {
+                "_id" : 0,
+                "name" : "web2:2013",
+                "health" : 1,
+                "state" : 1,
+                "stateStr" : "PRIMARY",
+                "uptime" : 102,
+                "optime" : Timestamp(1363688755000, 1),
+                "optimeDate" : ISODate("2013-03-19T10:25:55Z"),
+                "self" : true
+            },
+            {
+                "_id" : 1,
+                "name" : "web3:2013",
+                "health" : 1,
+                "state" : 2,
+                "stateStr" : "SECONDARY",
+                "uptime" : 40,
+                "optime" : Timestamp(1363688755000, 1),
+                "optimeDate" : ISODate("2013-03-19T10:25:55Z"),
+                "lastHeartbeat" : ISODate("2013-03-19T10:26:33Z"),
+                "pingMs" : 1
+            }
+        ],
+        "ok" : 1
+    }
+
+
+We can check the status of the shards as follows: connect to the mongos
+service, `mongo localhost:8888/admin -u admin -p 123456`, and issue the
+following command to get the status of the shards:
+
+
+    mongos> sh.status()
+    --- Sharding Status ---
+      sharding version: { "_id" : 1, "version" : 3 }
+      shards:
+        { "_id" : "web2", "host" : "web2/web2:2013,web3:2013" }
+        { "_id" : "web3", "host" : "web3/web2:2014,web3:2014" }
+      databases:
+        { "_id" : "admin", "partitioned" : false, "primary" : "config" }
+
+
+We can also make sure sharding works by creating a database and a collection,
+populating it with documents, and checking whether the chunks of the collection
+are balanced equally across the nodes. The diagram below illustrates the
+verification step.
+
+-------------------------------------------------------------------------------
+
+![Alt text](images/check.png "check")
+
+The above-mentioned steps can be tested with an automated playbook.
+
+Issue the following command to run the test. Pass one of the _mongos_ servers
+in the _servername_ variable.
+
+    ansible-playbook -i hosts playbooks/testsharding.yml -e servername=server1
+
+
+Once the playbook completes, we can check that sharding has succeeded by
+logging on to any mongos server and issuing the following command. The output
+displays the number of chunks spread across the shards.
+
+    mongos> sh.status()
+    --- Sharding Status ---
+      sharding version: { "_id" : 1, "version" : 3 }
+      shards:
+        { "_id" : "bensible", "host" : "bensible/bensible:20103,web2:20103,web3:20103" }
+        { "_id" : "web2", "host" : "web2/bensible:20105,web2:20105,web3:20105" }
+        { "_id" : "web3", "host" : "web3/bensible:20102,web2:20102,web3:20102" }
+      databases:
+        { "_id" : "admin", "partitioned" : false, "primary" : "config" }
+        { "_id" : "test", "partitioned" : true, "primary" : "web3" }
+
+            test.test_collection chunks:
+
+            bensible    7
+            web2        6
+            web3        7
+
+
+
+
+### Scaling the Cluster
+---------------------------------------
+
+![Alt text](images/scale.png "scale")
+
+To add a new node to the existing MongoDB cluster, modify the inventory file as
+follows:
+
+    #The site wide list of mongodb servers
+    [mongo_servers]
+    mongo1 mongod_port=2700
+    mongo2 mongod_port=2701
+    mongo3 mongod_port=2702
+    mongo4 mongod_port=2703
+
+    #The list of servers where replication should happen, make sure the new node is listed here.
+    [replication_servers]
+    mongo4
+    mongo3
+    mongo1
+    mongo2
+
+    #The list of mongodb configuration servers, make sure it is 1 or 3
+    [mongoc_servers]
+    mongo1
+    mongo2
+    mongo3
+
+    #The list of servers where mongos servers would run.
+    [mongos_servers]
+    mongos1
+    mongos2
+
+Make sure you have added the new node in the _replication_servers_ section and
+execute the following command:
+
+    ansible-playbook -i hosts site.yml
+
+### Verification
+-----------------------------
+
+The newly added node can easily be verified by checking the sharding status and
+seeing the chunks being rebalanced onto it.
+
+    $ /usr/bin/mongo localhost:8888/admin -u admin -p 123456
+    mongos> sh.status()
+    --- Sharding Status ---
+      sharding version: { "_id" : 1, "version" : 3 }
+      shards:
+        { "_id" : "bensible", "host" : "bensible/bensible:20103,web2:20103,web3:20103" }
+        { "_id" : "web2", "host" : "web2/bensible:20105,web2:20105,web3:20105" }
+        { "_id" : "web3", "host" : "web3/bensible:20102,web2:20102,web3:20102" }
+        { "_id" : "web4", "host" : "web4/bensible:20101,web3:20101,web4:20101" }
+      databases:
+        { "_id" : "admin", "partitioned" : false, "primary" : "config" }
+        { "_id" : "test", "partitioned" : true, "primary" : "bensible" }
+
+            test.test_collection chunks:
+
+            web4        3
+            web3        6
+            web2        6
+            bensible    5
+
+
diff --git a/hosts b/hosts
index e69de29..1c945a8 100644
--- a/hosts
+++ b/hosts
@@ -0,0 +1,29 @@
+#The site wide list of mongodb servers
+
+# the mongo servers need a mongod_port variable set, and they must not conflict.
+[mongo_servers]
+mongo1 mongod_port=2700
+mongo2 mongod_port=2701
+mongo3 mongod_port=2702
+mongo4 mongod_port=2703
+
+#The list of servers where replication should happen, by default include all servers
+[replication_servers]
+mongo4
+mongo3
+mongo1
+mongo2
+
+#The list of mongodb configuration servers, make sure it is 1 or 3
+[mongoc_servers]
+mongo1
+mongo2
+mongo3
+
+
+#The list of servers where mongos servers would run.
+[mongos_servers]
+mongo1
+mongo2
+
+
diff --git a/site.yml b/site.yml
new file mode 100644
index 0000000..2caa62f
--- /dev/null
+++ b/site.yml
@@ -0,0 +1,22 @@
+---
+# This playbook deploys the whole MongoDB cluster with replication and sharding.
+
+- hosts: all
+  roles:
+    - role: common
+
+- hosts: mongo_servers
+  roles:
+    - role: mongod
+
+- hosts: mongoc_servers
+  roles:
+    - role: mongoc
+
+- hosts: mongos_servers
+  roles:
+    - role: mongos
+
+- hosts: mongo_servers
+  tasks:
+    - include: roles/mongod/tasks/shards.yml