Monday, April 11, 2011

Feature Preview: The Multi-Threaded Slave

The Need for Multi-Threaded Slaves!

The MySQL replication team has come up with a feature that looks very promising for improving slave scalability. Although there is some work that one can do on the master side, it is the work already done at the slave side that this blog post highlights...

At its core, MySQL replication is single-threaded! In detail, the tiniest unit of work for replication is an event, and a group of events forms a transaction. Events are pushed by the master to the slaves by a thread, known as "dump thread". At the slave, a reader ("IO thread") reads event-by-event and writes them to a local persistent queue, the "relay log". Then a single threaded applier, the "SQL thread", reads and applies events sequentially.

Contrary to the master, which executes transactions concurrently, the slave serializes execution of each and every transaction, even if they were executed concurrently on the master and are guaranteed to be conflict free. In fact, should that be the case, then such transactions could be applied in parallel thus taking advantage of multi-core/cpu hardware. Furthermore, recent optimizations on the master side enabled much better scale-up, thence enforcing the need for more scalable slave execution as well.

Today, Andrei Elkin has actually delivered a talk on this subject at Collaborate 11: Boosting MySQL Replication Performance Through the Multi-Threaded Slave. Roughly two hours earlier, a prototype - feature preview based on 5.6 codebase - was published in MySQL Labs for you to go and test it out. NOTE: the package is not intended for using in production. It IS a feature preview.

What is a Multi-Threaded Slave?

While multi-threaded is a rather broad term, it is loosely used in the current context, targeting a specific technique of parallelization of transaction execution. In other words, multi-threaded slave refers to inter-transactional parallelization while applying transactions to the slave's databases. This makes replication more scalable, especially on multi-core architectures.

The fact is that even the most common hardware today is multi-core and given that replication strives to exhibit good scalability always, the benefits from having a multi-core machine at the slave side must be exploited. Apparently, that is not completely the case yet, because every now and then we hear someone saying that the slave is not able to keep up with its master or that it is "lagging behind"! On such settings inter-transactional parallelization comes to the rescue, as it fits a big number of cases, is relatively easy to implement and is engine agnostic.

Earlier today, the MySQL replication team has put up for download, on MySQL Labs, a prototype that fits one particular scenario: apply, in parallel, transactions that operate on different database sets. In fact, it turns out to be quite a commonplace. In detail, it is often the case that the application partitions its data logically per database, thus the workload pattern is such, that a set of unrelated and non-conflicting transactions are executed concurrently, and on different databases, at the master. The replication slave can take advantage of this fact and execute such transactions, concurrently as well, i.e., by applying them in parallel. In practice, and on such scenarios, a multi-threaded slave would read transactions from the relay log and assign them to different worker threads, depending on the database the transaction is working on. Transactions operating on the same database would then be guaranteed to be serialized. On the other hand, transactions executing on different databases would be concurrently executed and committed independently. This is depicted in the following diagram:



In this figure we can find the usual stages that transactions go through during the apply procedure at the slave. It is not fundamentally different from what we have today on latest MySQL GA releases, except that now there is an additional "workers" stage. In a nutshell, the execution flow goes like this: IO thread receives events/transactions and queues them in the relay log; SQL coordinator thread reads them from the log file and queues the event on a specific worker queue. Selecting which queue to use is done by checking which database the event should be executed against (the best case scenario is the one that all events in a transaction change the same database, otherwise the scheduling may have some negative impact); Workers dequeue the events from their queues and execute them. If the event is a COMMIT operation, the worker then commits the transaction independently from other workers.

Some Special Cases

There are some special cases that one needs to think of. What about cross-database transactions? What about slave positions ? What about stopping and restarting the slave ? What about crashing and recovering?

I don't want to risk getting tangled into details, so lets not go into much detail and discuss them briefly:
  • For cross-database transactions, the slave waits until all preceding transactions that are working on the same database set are over. However, multiple databases in the same transaction can result in a partition itself that the slave can parallelize, e.g., if T1 changes "db1" and "db2" and T2 changes "db3" and "db4", then there is room for the slave to execute both transactions in parallel, despite the fact that both change more than one database;
  • On a multi-threaded slave, there are several workers, each committing independently, therefore usual concept of "last executed position" is not very meaningful. Some concurrent transactions may commit in a different order than the one that is established in the binary log, thus execution gaps may exist throughout execution time. The slave needs to track such gaps, and for that purpose it keeps track of each worker position and stores them persistently on system tables ("mysql.slave_worker_info");
  • Stopping and restarting the server is somewhat analogous to crashing and recovering it. On restart the slave checks each worker position, thus finding out which transactions need to be replayed since the last checkpoint (position in the log that has no execution gaps before it). This takes care of existing gaps,and the new workers can start executing from a base position.

State of the Art

For the following examples, a multi-threaded enabled slave server is required, replication is assumed to be setup, but the slave threads should not have been started. In addition, the slave should be configured to use the new system tables to store positions (not really necessary, but it turns the whole experience much more appealing). As such, one needs to start the slave MySQL server with the (additional) following options:

--relay-log-info-repository="TABLE" --master-info-repository="TABLE" --worker-info-repository="TABLE"

Now that the servers are started, lets begin by changing the number of workers that will be available for executing transactions and start the slave:
mysql> STOP SLAVE;
Query OK, 0 rows affected (0.00 sec)

mysql> SET GLOBAL mts_slave_parallel_workers=2;
Query OK, 0 rows affected (0.00 sec)

mysql> SELECT @@mts_slave_parallel_workers;
+------------------------------+
| @@mts_slave_parallel_workers |
+------------------------------+
| 2 |
+------------------------------+
1 row in set (0.00 sec)

mysql> START SLAVE;
Query OK, 0 rows affected (0.09 sec)

mysql> SELECT USER,STATE FROM INFORMATION_SCHEMA.PROCESSLIST WHERE USER='system user';
+-------------+------------------------------------------------------------------+
| USER | STATE |
+-------------+------------------------------------------------------------------+
| system user | Slave has read all relay log; waiting for the slave I/O thread t |
| system user | Waiting for an event from sql thread |
| system user | Waiting for an event from sql thread |
| system user | Waiting for master to send event |
+-------------+------------------------------------------------------------------+
4 rows in set (0.00 sec)

mysql> SELECT * FROM mysql.slave_worker_info\G
*************************** 1. row ***************************
Master_id: 2
Worker_id: 0
Relay_log_name:
Relay_log_pos: 0
Master_log_name:
Master_log_pos: 0
Checkpoint_relay_log_name:
Checkpoint_relay_log_pos: 0
Checkpoint_master_log_name:
Checkpoint_master_log_pos: 0
Checkpoint_seqno: 0
Checkpoint_group_size: 64
Checkpoint_group_bitmap:
*************************** 2. row ***************************
Master_id: 2
Worker_id: 1
Relay_log_name:
Relay_log_pos: 0
Master_log_name:
Master_log_pos: 0
Checkpoint_relay_log_name:
Checkpoint_relay_log_pos: 0
Checkpoint_master_log_name:
Checkpoint_master_log_pos: 0
Checkpoint_seqno: 0
Checkpoint_group_size: 64
Checkpoint_group_bitmap:
2 rows in set (0.00 sec)

In the instructions above, we have changed the number of workers and then we started the slave and showed the status of all "system user" processes entries. There is one SQL thread that acts as coordinator, and is waiting for the IO thread to queue more events into the relay log. Additionally, we can find two worker threads and an IO thread that are waiting for events. Finally, we also printed out the position entries for each worker by inspecting the "mysql.slave_worker_info" table.

Next, lets issue a couple of statements that fit the multi-threaded use cases described previously. Thence, we need to create a couple of databases, some tables and issue some DML statements over those tables. On the master, we issue the following:

mysql> CREATE DATABASE db1;
Query OK, 1 row affected (0.00 sec)

mysql> CREATE DATABASE db2;
Query OK, 1 row affected (0.00 sec)

mysql> CREATE TABLE db2.t1 (a INT);
Query OK, 0 rows affected (0.03 sec)

mysql> CREATE TABLE db1.t1 (a INT);
Query OK, 0 rows affected (0.04 sec)

mysql> INSERT INTO db1.t1 VALUES (1), (2); INSERT INTO db2.t1 VALUES (3),(4);
Query OK, 2 rows affected (0.00 sec)
Records: 2 Duplicates: 0 Warnings: 0

Query OK, 2 rows affected (0.00 sec)
Records: 2 Duplicates: 0 Warnings: 0

At this point we have inserted into two tables that are on different databases. Since they are non-conflicting, the slave server will execute these two statements concurrently, one on each slave worker. Lets have a look at each worker positions:

mysql> SELECT * FROM mysql.slave_worker_info\G
*************************** 1. row ***************************
Master_id: 2
Worker_id: 0
Relay_log_name: ./slave-relay-bin.000003
Relay_log_pos: 1081
Master_log_name: master-bin.000001
Master_log_pos: 927
Checkpoint_relay_log_name: ./slave-relay-bin.000003
Checkpoint_relay_log_pos: 626
Checkpoint_master_log_name: master-bin.000001
Checkpoint_master_log_pos: 472
Checkpoint_seqno: 1
Checkpoint_group_size: 64
Checkpoint_group_bitmap:
*************************** 2. row ***************************
Master_id: 2
Worker_id: 1
Relay_log_name: ./slave-relay-bin.000003
Relay_log_pos: 854
Master_log_name: master-bin.000001
Master_log_pos: 700
Checkpoint_relay_log_name: ./slave-relay-bin.000003
Checkpoint_relay_log_pos: 626
Checkpoint_master_log_name: master-bin.000001
Checkpoint_master_log_pos: 472
Checkpoint_seqno: 0
Checkpoint_group_size: 64
Checkpoint_group_bitmap:
2 rows in set (0.00 sec)


The table shows us that both workers were engaged during the apply activities. Worker #1 handled insert for one database, while worker #2 handled the other insert (on the other database).

This is the very basics of multi-threaded slave, thence you should go and try/play with it yourself!

Tuning

The multi-threaded slave can be tuned by changing some of its parameters. The following are just a few (those that are somewhat more relevant). Since this is still work in progress, it can be that they get renamed or even removed before the feature is actually released in a GA version. Anyway, if you're playing with the feature preview, the following might be helpful:

  --mts-slave-parallel-workers=#
Number of worker threads for executing events in parallel.

--mts-pending-jobs-size-max=#
Max size of Slave Worker queues holding yet not
applied events.

A quick and dirty list of parameters that the server can actually accept related to multi-threaded slave, can be obtained by running mysqld --help as in the following command:

shell> ./bin/mysqld --verbose --help | grep mts

[... some error messages might show up here ...]

--mts-checkpoint-group=#
--mts-checkpoint-period=#
--mts-coordinator-basic-nap=#
--mts-exp-slave-local-timestamp
--mts-partition-hash-soft-max=#
Number of records in the mts partition hash below which
--mts-pending-jobs-size-max=#
--mts-slave-parallel-workers=#
--mts-slave-worker-queue-len-max=#
all queues are governed by mts_pending_jobs_size_max.
--mts-worker-underrun-level=#
mts-checkpoint-group 512
mts-checkpoint-period 300
mts-coordinator-basic-nap 5
mts-exp-slave-local-timestamp FALSE
mts-partition-hash-soft-max 16
mts-pending-jobs-size-max 16777216
mts-slave-parallel-workers 0
mts-slave-worker-queue-len-max 40000
mts-worker-underrun-level 0

Summary

This blog post presents the need for a Multi-threaded slave and stresses the fact that Oracle's MySQL replication team delivered a feature preview based on the MySQL 5.6 codebase. The user can download it from MySQL Labs.

Apart from explaining the needs, it also details what its use cases are and explains briefly how it actually works internally.

It finishes by presenting a few MySQL commands that one can use in the feature preview server to actually make use of several threads at the slave.

2 comments:

  1. Thank you for taking the time to provide these details. I look forward to trying this on test servers in production.

    ReplyDelete