Row events, Before and After images
In RBR, one row event contains changes to one or more rows in a given table. A row change, in its turn, may consist of one or two full row copies of the row that is being changed. These are generally known as row images. The first one, known as before image (BI), contains data as it was before the row was modified. The second one, known as after image (AI), is what the row looks like after the changes were done. Each image has different purpose: the BI is used for locating, in the storage engine, the row to be updated/deleted, while the AI, is used for replaying the actual changes. Needless to say, not both images are needed for every operation. Deletes only need the BI, inserts, on the other hand, just need the AI, while updates need both.
This is pretty neat stuff already, however, there is room for improvements, in particular to address storage overuse. Logging can be optimized in such a way that writing full images into the binary log can be avoided, and instead, log only those parts that are meaningful for replication. Not only does this help with reducing storage space, but also enables less network bandwidth consumption - smaller events are transmitted to the slaves - as well as smaller mysql memory footprint - less buffer usage for handling row events. So… How can one tackle this ?
Optimizing Before Image logging
The one and only usage for such image is to help finding the correct row to be updated or deleted at the slave's storage engine. Locating a row is done by considering one of the following search methods: search by primary key, by unique key, by an index scan or by doing a table scan - depending on what indexes are available on the slave's table. As such, assuming that the index structures are the same on master and slaves' tables, BIs can be logged based on the uniqueness of the index structures available. If there is a primary or unique key (with non-nullable columns), the fields that compose the key are enough to locate the row at the slave, thus BI can be composed of just those fields. On the other hand, if there is no unique key, all fields in the row need to be written into the binary log. Here is an example:
MASTER> CREATE TABLE t1 (c1 INT PRIMARY KEY, c2 char(1), c3 TEXT);
MASTER> INSERT INTO t1 VALUES (1, 'a', '1MB_text...');
MASTER> DELETE FROM t1 WHERE c1=1;
Logging the full BI in this case, means that each row deleted has a correspondent before image stored in the binary log sizing up to (4 + 1 + 1024*1024) bytes ~ 1MB. Deleting 10 rows in the table, easily grows the binary log to 10 MB (assuming that c3 is always 1MB large). Now, if only the primary key is logged, then for the same 10 rows, only 40 bytes are needed for storing the BI. Lets see how that plays out for the scenario above by trying out this feature on a real mysql session:
mysql> SET binlog_format=ROW;
Query OK, 0 rows affected (0.00 sec)
mysql> CREATE TABLE t1 (c1 INT PRIMARY KEY, c2 char(1), c3 LONGTEXT);
Query OK, 0 rows affected (0.10 sec)
mysql> SHOW MASTER STATUS;
+-------------------+----------+--------------+------------------+
| File | Position | Binlog_Do_DB | Binlog_Ignore_DB |
+-------------------+----------+--------------+------------------+
| master-bin.000001 | 238 | | |
+-------------------+----------+--------------+------------------+
1 row in set (0.00 sec)
mysql> SET @text= repeat('a', 1048576);
Query OK, 0 rows affected (0.02 sec)
mysql> INSERT INTO t1 VALUES (1, 'a', @text);
Query OK, 1 row affected (0.04 sec)
mysql> SHOW MASTER STATUS;
+-------------------+----------+--------------+------------------+
| File | Position | Binlog_Do_DB | Binlog_Ignore_DB |
+-------------------+----------+--------------+------------------+
| master-bin.000001 | 1049037 | | |
+-------------------+----------+--------------+------------------+
1 row in set (0.00 sec)
mysql> RESET MASTER;
Query OK, 0 rows affected (0.30 sec)
mysql> DELETE FROM t1 WHERE c1=1;
Query OK, 1 row affected (0.01 sec)
Rows matched: 1 Changed: 1 Warnings: 0
mysql> SHOW MASTER STATUS;
+-------------------+----------+--------------+------------------+
| File | Position | Binlog_Do_DB | Binlog_Ignore_DB |
+-------------------+----------+--------------+------------------+
| master-bin.000001 | 1048913 | | |
+-------------------+----------+--------------+------------------+
1 row in set (0.00 sec)
As one can see here, the DELETE operation resulted in a binary log entry that takes up to 1048913 bytes, although only the primary key would be needed to locate and delete the row at the slave. Now, if the server is set to log only minimal images, then it will instead log just a few bytes:
mysql> INSERT INTO t1 VALUES (1, 'a', @text);Looking at the output from the last 'SHOW MASTER STATUS', one finds that indeed the binlog size for the delete operation decreased from 1048913 to 331, that is 1MB less than when using a full row.
Query OK, 1 row affected (0.04 sec)
mysql> RESET MASTER;
Query OK, 0 rows affected (0.37 sec)
mysql> SET SESSION binlog_row_image='MINIMAL';
Query OK, 0 rows affected (0.00 sec)
mysql> SHOW MASTER STATUS;
+-------------------+----------+--------------+------------------+
| File | Position | Binlog_Do_DB | Binlog_Ignore_DB |
+-------------------+----------+--------------+------------------+
| master-bin.000001 | 114 | | |
+-------------------+----------+--------------+------------------+
1 row in set (0.00 sec)
mysql> DELETE FROM t1 WHERE c1=1;
Query OK, 1 row affected (0.00 sec)
Rows matched: 1 Changed: 1 Warnings: 0
mysql> SHOW MASTER STATUS;
+-------------------+----------+--------------+------------------+
| File | Position | Binlog_Do_DB | Binlog_Ignore_DB |
+-------------------+----------+--------------+------------------+
| master-bin.000001 | 331 | | |
+-------------------+----------+--------------+------------------+
1 row in set (0.00 sec)
Optimizing the After Image Logging
Assuming that the table definitions are pretty much the same on both master and slave(s), the server can avoid logging the entire AI and log only the columns that were actually changed/inserted, much like in SBR. Lets look at an example:
MASTER> CREATE TABLE t1 (c1 INT PRIMARY KEY, c2 char(1), c3 CHAR(255) DEFAULT 'aaaaaaaaaaaaaaaaaaaa');
MASTER> INSERT INTO t1(c1,c2) VALUES (1, 'a');
When the server is set to log full rows, the insert operation will generate an AI containing c1, c2 and c3 values. This means that an additional 20 bytes ('aaaaaaaaaaaaaaaaaaaa') will be logged pointlessly. On the other hand, if the server is set to log only minimal rows, the extra 20 bytes are avoided. Lets check what happens with a real sample:
mysql> SET binlog_format=ROW;
Query OK, 0 rows affected (0.00 sec)
mysql> SET SESSION binlog_row_image='FULL';
Query OK, 0 rows affected (0.00 sec)
mysql> CREATE TABLE t1 (c1 INT PRIMARY KEY, c2 char(1), c3 VARCHAR(1024) DEFAULT 'aaaaaaaaaaaaaaaaaaaa');
Query OK, 0 rows affected (0.10 sec)
mysql> RESET MASTER;
Query OK, 0 rows affected (0.37 sec)
mysql> SHOW MASTER STATUS;
+-------------------+----------+--------------+------------------+
| File | Position | Binlog_Do_DB | Binlog_Ignore_DB |
+-------------------+----------+--------------+------------------+
| master-bin.000001 | 114 | | |
+-------------------+----------+--------------+------------------+
1 row in set (0.00 sec)
mysql> INSERT INTO t1(c1,c2) VALUES (1, 'a');
Query OK, 1 row affected (0.00 sec)
mysql> SHOW MASTER STATUS;
+-------------------+----------+--------------+------------------+
| File | Position | Binlog_Do_DB | Binlog_Ignore_DB |
+-------------------+----------+--------------+------------------+
| master-bin.000001 | 356 | | |
+-------------------+----------+--------------+------------------+
1 row in set (0.00 sec)
We can find that the event logged, corresponding to the INSERT operation, grows the binary log up to 356 bytes. Setting the server to log only minimal images we get the following:
mysql> SET SESSION binlog_row_image='MINIMAL';
Query OK, 0 rows affected (0.00 sec)
mysql> DELETE FROM t1;
Query OK, 1 row affected (0.00 sec)
mysql> RESET MASTER;
Query OK, 0 rows affected (0.34 sec)
mysql> SHOW MASTER STATUS;
+-------------------+----------+--------------+------------------+
| File | Position | Binlog_Do_DB | Binlog_Ignore_DB |
+-------------------+----------+--------------+------------------+
| master-bin.000001 | 114 | | |
+-------------------+----------+--------------+------------------+
1 row in set (0.00 sec)
mysql> INSERT INTO t1(c1,c2) VALUES (1, 'a');
Query OK, 1 row affected (0.00 sec)
mysql> SHOW MASTER STATUS;
+-------------------+----------+--------------+------------------+
| File | Position | Binlog_Do_DB | Binlog_Ignore_DB |
+-------------------+----------+--------------+------------------+
| master-bin.000001 | 334 | | |
+-------------------+----------+--------------+------------------+
1 row in set (0.00 sec)
Again, looking at the last SHOW MASTER STATUS command, one finds that the event entry, related to the INSERT operation, grows the binary log to 334 instead of 356 bytes, i.e., 22 bytes less (2 bytes for c3 metadata + 20 bytes for c3 content).
The case for UPDATE operations
Updates generate pairs of images for each row changed. One needs the BI to locate the row and the AI to actually update it. Thus, this operation benefits from both optimizations that were described previously for the INSERT and DELETE operations. The following picture depicts what columns each image contains in previous MySQL versions and compares against 5.6.2 with partial rows logging enabled (assuming same queries and table definition as in the example shown before). One can easily see that in the green shaded box (5.6.2 DM binary log with partial logging turned on) BI and AI contain far less columns (the dark green boxes with "Cn" on them):

BLOBs special case
Apart from full and minimal rows, there is also a special option to optimize just the logging of blob fields, if setting binlog_row_image to 'NOBLOB';
In this case, replication will avoid logging blob fields whenever possible (BI: when they are not needed to locate the row, AI: when they are not modified). The other fields are still logged as if it the option binlog_row_image had been set to 'FULL'.
Summary
This post presents a new feature, available on MySQL 5.6.2 DM release, which enables more flexible logging of ROW events. It gives the user the freedom to choose whether the server should log full or partial rows, which is a major asset when either changing a large number of rows and/or changing sizable rows.
There are some assumptions when using this feature, for instance:
- The tables on both master and slave should have the same index structures, or at least the index set on the master should be a subset of slave's index set. If not, unexpected behavior may happen;
- The table definition should be the same with respect to default values, otherwise on INSERTs there may be different content inserted on the master and the slave (just like in SBR);
- When setting binlog_row_image to 'MINIMAL' or 'NOBLOB', the logged rows are incomplete, therefore, if the user requires that the binary log contains full rows, then he should not use these options.
Go and try it out. Have fun!
Now I can use RBR!
ReplyDelete