First Step to Redis Cluster
Here, I will be talking about setting up a Redis cluster on your local machine, its distributed storage concept and how it handles failover to utilize its performance.
Before starting off, let’s get some brief idea on Redis and Redis cluster.
What is Redis?
- Redis is an in-memory, key-value store.
- In-memory store: Redis keeps the data in the cache and it does not write to the disk. This makes reading/writing data very fast. (However, Redis has an option to write data to the disk)
- Key-value store: Redis can store data as key-value pairs.
eg: SET "name" "Varuni"
"name" is the Key and "Varuni" is the Value
2. It is a No-SQL database.
3. Uses data structures to store data.
4. Interaction with data is command-based.
Redis as a Database
Getting data from a disk can be time-consuming. In order to increase the performance, we can put the requests that need to be serviced very fast into the Redis memory and service from there while keeping the rest of the data in the main database.
Master-Slave Replication of Redis
Redis can replicate data to any number of slaves. That is, it lets the slaves have the exact copy of their master. This greatly helps in performance optimization.
Thus, you can use Redis as a database, a cache or a message broker. If you are a complete newbie to Redis, you might want to read this, which gives a detailed introduction 🙂
What is a Redis Cluster?
A Redis cluster is simply a data sharding strategy. It automatically partitions data across multiple Redis nodes. It is an advanced version of Redis that achieves distributed storage and prevents a single point of failure.
In Summary, a Redis cluster is,
- Horizontally scalable: We can continue to add more nodes as the capacity requirement increases.
- Auto data sharding: can automatically partition and split data among the nodes.
- Fault tolerant: even though we lose a node, we can still continue to operate without losing any data.
- Decentralized cluster management: no single node acts as an orchestrator of the entire cluster, every node participates in the cluster configuration (via gossip protocol)
Redis cluster topology
The minimal cluster that works as expected requires;
- Minimum 3 Redis master nodes
- Minimum 3 Redis slaves, 1 slave per master (to allow minimal fail-over mechanism)
Distributed Storage of Redis Cluster
Every key that you save into a Redis cluster is associated with a hash slot. There are 0–16383 slots in a Redis cluster. Thus, a Redis cluster can have a maximum of 16384 master nodes (however the suggested max size of nodes is ~ 1000 nodes). Each master node in a cluster handles a subset of the 16384 hash slots.
The distributed algorithm that Redis Cluster uses to map keys to hash slots is,
HASH_SLOT = CRC16(key) mod HASH_SLOTS_NUMBER
CRC stands for Cyclic Redundancy Check.
For example, let’s assume the key space is divided into 10 slots (0–9). Each node will hold a subset of the hash slots.
A given key “name” is at slot:
slot = CRC16(“name”) % 16384
Failure Handling in Redis Cluster
Redis has introduced the Master-Slave concept to increase data availability by preventing the single point of failure. Every master node in a Redis cluster has at least one slave node. When the Master node fails or becomes unreachable, the cluster will automatically choose its slave node/one of the slave nodes and make that one the new Master. Therefore, failure in one node will not stop the entire system from working.
How to detect failures?
Every node has a unique ID in the cluster. This ID is used to identify each and every node across the entire cluster using the gossip protocol.
So, a node keeps the following information within itself;
- node ID, IP, and port
- a set of flags
- what is the Master of the node if it is flagged as “slave”
- last time a node was pinged
- last time the pong was received
N.B. When you ping a Redis node, if it is working properly, it will reply with a pong.
Nodes in a cluster always keep gossiping with each other, enabling them to auto-discover other nodes.
e.g. If A knows B, and B knows C, eventually B will send gossip messages to A about C. Then A will register C as part of the network and will try to connect with C.
There are two flags that are used for failure detection namely PFAIL and FAIL.
- PFAIL (Possible FAILure): a non-acknowledged failure type.
- FAIL: This tells that a node is failing, and it was confirmed by the majority of the masters within a fixed amount of time.
The node-to-node communication follows the binary protocol (Cluster Bus Protocol) which is optimized for bandwidth and speed, but the node-client communication follows the ASCII protocol.
Setup Redis Cluster on your local machine
First, you need to install Redis on your local machine. Here I am using Ubuntu 18.04.1 to set up Redis.
$ sudo apt update
$ sudo apt install redis-server
$ sudo gedit /etc/redis/redis.conf
//open the file and search for "supervised". It is set to No by default. Change it to "systemd"
$ sudo systemctl restart redis.service //restart the redis service just in case if it is not started --> sudo systemctl enable redis-server (this should work hopefully! 😊)
$ sudo systemctl status redis //check whether the Redis service is running.If it is running successfully, you will get an output like below.
Now to deploy a Redis cluster, there are 2 ways to do this.
- By creating empty Redis instances running in cluster mode
- By using the create-cluster script
Now for me, the latter is easier and quicker. But if you want to deploy a cluster using the first method, this would help you to do that.
Alright, let’s deploy a Redis Cluster
- Go inside your Redis distribution, locate the “utils” folder, there is another folder inside that called “create-cluster”. You’ll see a script called “create-cluster” inside that folder. This is a simple bash script which would start a 6 nodes cluster with 3 masters and 3 slaves by default ( you can change the default values as required)
path: <redis-distribution>/utils/create-cluster
2. Start the cluster
create-cluster start
Output:
This would start 6 nodes at ports 30001, 30002, 30003, 30004, 30005 and 30006 by default.
You can also change the default port number. Open the create-cluster script using a text editor. Find PORT=30000, and set it with your desired port number.
e.g. PORT=7000 //this will start nodes from port number 7001 onwards
3. Create the cluster
create-cluster create
Output:
Make sure you reply “yes” when the redis-cli utility prompts you to accept the layout.
4. Test the cluster
You can use any well-known redis-clients available or redis-cli utility for this. I will be using redis-cli to interact with the cluster.
e.g.
redis-cli -c -p 30001
set name1 varuni //writing data, using key-value pairs
get name1 //reading data, using the key
N.B. Here, the Redis cluster nodes are able to redirect a client to the right node when retrieving data. There are some enhanced clients which can cache the map between hash slots and node addresses, and directly use the right connection to the right node.
5. Stop the cluster
create-cluster stop
Output:
For Your Knowledge
Replication Vs. Sharding
Replication is also known as mirroring. It copies all data in the master node to the slave node/s.
Sharding is also known as partitioning. It splits the data up by key.
e.g.
With sharding, keys 1 and 3 are stored on machine A, and keys 2 and 4 on machine B.
With replication, all the keys 1, 2, 3, and 4 are stored on both machine A and B
Replication Or Clustering?
If you have more data than RAM in a single machine, use a Redis cluster to shard the data across multiple databases.
If you have less data than RAM in a machine, set up a master/slave replication with a sentinel in front to handle the failover.
A sentinel handles health checks of the masters/slaves, and will automatically promote a slave if a master is unreachable. You need to have at least 3 sentinels running so that they can agree on reachability of nodes, and to ensure the sentinels do not have a single point of failure.
Why Do you need a minimum of 3 masters?
During the failure detection, the majority of the master nodes are required to come to an agreement. If there are only 2 masters, say A and B and B failed, then the A master node cannot reach to a decision according to the protocol. The A node needs another third node, say C, to tell A that it also cannot reach B.
📝 Read this story later in Journal.
👩💻 Wake up every Sunday morning to the week’s most noteworthy stories in Tech waiting in your inbox. Read the Noteworthy in Tech newsletter.