Monday 23 June 2014

How MySQL optimizer decides which filesort algorithm to be used while processing an order by clause.

Today when i was going through the variable "max_length_for_sort_data" got to know some good information how order by internally works and which kind of algorithm it uses to process order by. It is something interesting to me, so decided to share this information with you guys also.

As of MySQL 5.6 There are 2 variations of file sort algorithm used as explained follows :

    1) The original filesort algorithm
    2) The modified filesort algorithm

There are few differences between original and modified file sort algorithms like how it retrieves and process the data from the table. For more details go through.

I would like to focus mainly on the following :
    a) The "original filesort" algorithm retrieves the rowid and the columns specified in the order by clause and put them in the sort buffer size, sort the data in the soft buffer size and again retrieve the data for other columns specified in the select statement from the table. so it will read the data from the table twice.
    b) But, when it comes to the "modified filesort" algorithm it will retrieve the rowid and all the columns specified in the select statement from the table to sort_buffer_size, then sort the data based on the columns specified in order by clause and sends the output.For more details go through.

Suppose that a table t1 has four VARCHAR columns a, b, c, and d and that the optimizer uses filesort for this query:

SELECT * FROM t1 ORDER BY a, b;

The query sorts by a and b, but returns all columns, so the columns referenced by the query are a, b, c, and d. Depending on which filesort algorithm the optimizer chooses, the query executes as follows:

For the original algorithm, sort buffer tuples have these contents: (fixed size a value, fixed size b value,row ID into t1)
The optimizer sorts on the fixed size values. After sorting, the optimizer reads the tuples in order and uses the row ID in each tuple to read rows from t1 to obtain the select list column values.

For the modified algorithm, sort buffer tuples have these contents: (fixed size a value, fixed size b value,a value, b value, c value, d value)
The optimizer sorts on the fixed size values. After sorting, the optimizer reads the tuples in order and uses the values for a, b, c, and d to obtain the select list column values without reading t1 again.

So how mysql decides which filesort algorithm to be used?

Based on the value specified by "max_length_for_sort_data", mysql decides which filesort algorithm to be used. Lets go through more detail how it will work.

For a given query optimizer first calculates the "row_size" which includes the rowid,pointers and the columns data specified in the select statement's select clause and order by clause, if the row_size > max_length_for_sort_data then optimizer uses the "original filesort" algorithm, if the row_size <= max_length_for_sort_data then optimizer uses the "modified filesort" algorithm.

The following is the detailed example which illustrates how exactly it works :

mysql> create table t1 (a varchar(10), b varchar(10), c varchar(10), d varchar(10));
Query OK, 0 rows affected (0.00 sec)

mysql> insert into t1 values('1','1','1','1'),('2','2','2','2'),('3','3','3','3');
Query OK, 3 rows affected (0.17 sec)
Records: 3  Duplicates: 0  Warnings: 0


mysql> SET OPTIMIZER_TRACE="enabled=on",END_MARKERS_IN_JSON=on;
Query OK, 0 rows affected (0.00 sec)

mysql>  SET OPTIMIZER_TRACE_MAX_MEM_SIZE=90000000;
Query OK, 0 rows affected (0.00 sec)

CASE #1 :

mysql> set max_length_for_sort_data=128;
Query OK, 0 rows affected (0.00 sec)

NOTE : I am reducing the value of "max_length_for_sort_data" from its default (1024) to 128 for testing.

mysql>  select * from t1 order by a,b;
+------+------+------+------+
| a    | b    | c    | d    |
+------+------+------+------+
| 1    | 1    | 1    | 1    |
| 2    | 2    | 2    | 2    |
| 3    | 3    | 3    | 3    |
+------+------+------+------+

mysql> select * from information_schema.optimizer_trace\G
*************************** 1. row ***************************
                            QUERY: select * from t1 order by a,b
                            TRACE: {
  "steps": [
    {
      "join_preparation": {
        "select#": 1,
        "steps": [
          {
            "expanded_query": "/* select#1 */ select `t1`.`a` AS `a`,`t1`.`b` AS `b`,`t1`.`c` AS `c`,`t1`.`d` AS `d` from `t1` order by `t1`.`a`,`t1`.`b`"
          }
        ] /* steps */
      } /* join_preparation */
    },
    {
      "join_optimization": {
        "select#": 1,
        "steps": [
          {
            "table_dependencies": [
              {
                "table": "`t1`",
                "row_may_be_null": false,
                "map_bit": 0,
                "depends_on_map_bits": [
                ] /* depends_on_map_bits */
              }
            ] /* table_dependencies */
          },
          {
            "rows_estimation": [
              {
                "table": "`t1`",
                "table_scan": {
                  "rows": 3,
                  "cost": 1
                } /* table_scan */
              }
            ] /* rows_estimation */
          },
          {
            "considered_execution_plans": [
              {
                "plan_prefix": [
                ] /* plan_prefix */,
                "table": "`t1`",
                "best_access_path": {
                  "considered_access_paths": [
                    {
                      "access_type": "scan",
                      "rows": 3,
                      "cost": 1.6,
                      "chosen": true,
                      "use_tmp_table": true
                    }
                  ] /* considered_access_paths */
                } /* best_access_path */,
                "cost_for_plan": 1.6,
                "rows_for_plan": 3,
                "sort_cost": 3,
                "new_cost_for_plan": 4.6,
                "chosen": true
              }
            ] /* considered_execution_plans */
          },
          {
            "attaching_conditions_to_tables": {
              "original_condition": null,
              "attached_conditions_computation": [
              ] /* attached_conditions_computation */,
              "attached_conditions_summary": [
                {
                  "table": "`t1`",
                  "attached": null
                }
              ] /* attached_conditions_summary */
            } /* attaching_conditions_to_tables */
          },
          {
            "clause_processing": {
              "clause": "ORDER BY",
              "original_clause": "`t1`.`a`,`t1`.`b`",
              "items": [
                {
                  "item": "`t1`.`a`"
                },
                {
                  "item": "`t1`.`b`"
                }
              ] /* items */,
              "resulting_clause_is_simple": true,
              "resulting_clause": "`t1`.`a`,`t1`.`b`"
            } /* clause_processing */
          },
          {
            "refine_plan": [
              {
                "table": "`t1`",
                "access_type": "table_scan"
              }
            ] /* refine_plan */
          }
        ] /* steps */
      } /* join_optimization */
    },
    {
      "join_execution": {
        "select#": 1,
        "steps": [
          {
            "filesort_information": [
              {
                "direction": "asc",
                "table": "`t1`",
                "field": "a"
              },
              {
                "direction": "asc",
                "table": "`t1`",
                "field": "b"
              }
            ] /* filesort_information */,
            "filesort_priority_queue_optimization": {
              "usable": false,
              "cause": "not applicable (no LIMIT)"
            } /* filesort_priority_queue_optimization */,
            "filesort_execution": [
            ] /* filesort_execution */,
            "filesort_summary": {
              "rows": 3,
              "examined_rows": 3,
              "number_of_tmp_files": 0,
              "sort_buffer_size": 63224,
              "sort_mode": "<sort_key, rowid>"
            } /* filesort_summary */
          }
        ] /* steps */
      } /* join_execution */
    }
  ] /* steps */
}
MISSING_BYTES_BEYOND_MAX_MEM_SIZE: 0
          INSUFFICIENT_PRIVILEGES: 0
1 row in set (0.01 sec)

From the above output if we look at "filesort_information" section, "sort_mode": "<sort_key, rowid>" which indicate that the filesort algorithm used is "original filesort".

I have tried with different values for "max_length_for_sort_data" and at last it worked for me at the value of "max_length_for_sort_data=171", why because the row_lenth is 171 bytes ( it didn' worked for me even the values is 170, and it worked in all the cases if the value is >=171).

CASE #2 :

mysql> set max_length_for_sort_data=171;
Query OK, 0 rows affected (0.00 sec)

mysql> select * from t1 order by a,b;
+------+------+------+------+
| a    | b    | c    | d    |
+------+------+------+------+
| 1    | 1    | 1    | 1    |
| 2    | 2    | 2    | 2    |
| 3    | 3    | 3    | 3    |
+------+------+------+------+
3 rows in set (0.00 sec)

mysql> select * from information_schema.optimizer_trace\G                                                                                      *************************** 1. row ***************************
                            QUERY: select * from t1 order by a,b
                            TRACE: {
  "steps": [
    {
      "join_preparation": {
        "select#": 1,
        "steps": [
          {
            "expanded_query": "/* select#1 */ select `t1`.`a` AS `a`,`t1`.`b` AS `b`,`t1`.`c` AS `c`,`t1`.`d` AS `d` from `t1` order by `t1`.`a`,`t1`.`b`"
          }
        ] /* steps */
      } /* join_preparation */
    },
    {
      "join_optimization": {
        "select#": 1,
        "steps": [
          {
            "table_dependencies": [
              {
                "table": "`t1`",
                "row_may_be_null": false,
                "map_bit": 0,
                "depends_on_map_bits": [
                ] /* depends_on_map_bits */
              }
            ] /* table_dependencies */
          },
          {
            "rows_estimation": [
              {
                "table": "`t1`",
                "table_scan": {
                  "rows": 3,
                  "cost": 1
                } /* table_scan */
              }
            ] /* rows_estimation */
          },
          {
            "considered_execution_plans": [
              {
                "plan_prefix": [
                ] /* plan_prefix */,
                "table": "`t1`",
                "best_access_path": {
                  "considered_access_paths": [
                    {
                      "access_type": "scan",
                      "rows": 3,
                      "cost": 1.6,
                      "chosen": true,
                      "use_tmp_table": true
                    }
                  ] /* considered_access_paths */
                } /* best_access_path */,
                "cost_for_plan": 1.6,
                "rows_for_plan": 3,
                "sort_cost": 3,
                "new_cost_for_plan": 4.6,
                "chosen": true
              }
            ] /* considered_execution_plans */
          },
          {
            "attaching_conditions_to_tables": {
              "original_condition": null,
              "attached_conditions_computation": [
              ] /* attached_conditions_computation */,
              "attached_conditions_summary": [
                {
                  "table": "`t1`",
                  "attached": null
                }
              ] /* attached_conditions_summary */
            } /* attaching_conditions_to_tables */
          },
          {
            "clause_processing": {
              "clause": "ORDER BY",
              "original_clause": "`t1`.`a`,`t1`.`b`",
              "items": [
                {
                  "item": "`t1`.`a`"
                },
                {
                  "item": "`t1`.`b`"
                }
              ] /* items */,
              "resulting_clause_is_simple": true,
              "resulting_clause": "`t1`.`a`,`t1`.`b`"
            } /* clause_processing */
          },
          {
            "refine_plan": [
              {
                "table": "`t1`",
                "access_type": "table_scan"
              }
            ] /* refine_plan */
          }
        ] /* steps */
      } /* join_optimization */
    },
    {
      "join_execution": {
        "select#": 1,
        "steps": [
          {
            "filesort_information": [
              {
                "direction": "asc",
                "table": "`t1`",
                "field": "a"
              },
              {
                "direction": "asc",
                "table": "`t1`",
                "field": "b"
              }
            ] /* filesort_information */,
            "filesort_priority_queue_optimization": {
              "usable": false,
              "cause": "not applicable (no LIMIT)"
            } /* filesort_priority_queue_optimization */,
            "filesort_execution": [
            ] /* filesort_execution */,
            "filesort_summary": {
              "rows": 3,
              "examined_rows": 3,
              "number_of_tmp_files": 0,
              "sort_buffer_size": 202091,
              "sort_mode": "<sort_key, additional_fields>"
            } /* filesort_summary */
          }
        ] /* steps */
      } /* join_execution */
    }
  ] /* steps */
}
MISSING_BYTES_BEYOND_MAX_MEM_SIZE: 0
          INSUFFICIENT_PRIVILEGES: 0
1 row in set (0.01 sec)

From the above output if we look at "filesort_information" section, "sort_mode": "<sort_key, additional_fields>" which indicate that the filesort algorithm used is "modidied filesort" and also observed that sort buffer size used is high (202091) in case of "modified filesort" algorithm and it is low (63224) in case of "original filesort" algorithm, it is because of in case of "modified filesort" algorithm it keeps all select columns data in the sort buffer.

CONCLUSION : For better execution of order by queries we may need to increase this value to some extent based on requirements and before change this value do some testing and then only proceed. Most of the cases all the columns specified in the select statement’s select clause might not be specified in the order by clause, if that is the case leaving this value to the default might give the better performance than changing this value.  

For information about the optimizer trace, see MySQL Internals: Tracing the Optimizer


MySQL InternalsMySQL HighAvailabilityMySQL Performance TuningMySQL Query OptimizationMySQL performanceMySQL FabricMySQL HAMySQL InstallationMySQL UpgradeInnoDB Performance TuningInnoDB Buffer Pool SizeMySQL Performance TuningMySQL ClusterMySQL Latest NewsNews and EventsMySQL Customers

Tuesday 3 June 2014

How MySQL Manages InnoDB Buffer Pool Internally

InnoDB manages the pool as a list, using a variation of the least recently used (LRU) algorithm. When room is needed to add a new block to the pool, InnoDB evicts the least recently used block and adds the new block to the middle of the list. This “midpoint insertion strategy” treats the list as two sublists:

    At the head, a sublist of “new” (or “young”) blocks that were accessed recently.

    At the tail, a sublist of “old” blocks that were accessed less recently.


This algorithm keeps blocks that are heavily used by queries in the new sublist. The old sublist contains less-used blocks; these blocks are candidates for eviction.

The LRU algorithm operates as follows by default:

    3/8 of the buffer pool is devoted to the old sublist.

    The midpoint of the list is the boundary where the tail of the new sublist meets the head of the old sublist.

    When InnoDB reads a block into the buffer pool, it initially inserts it at the midpoint (the head of the old sublist). A block can be read in because it is required for a user-specified operation such as an SQL query, or as part of a read-ahead operation performed automatically by InnoDB.

    Accessing a block in the old sublist makes it “young”, moving it to the head of the buffer pool (the head of the new sublist). If the block was read in because it was required, the first access occurs immediately and the block is made young. If the block was read in due to read-ahead, the first access does not occur immediately (and might not occur at all before the block is evicted).

    As the database operates, blocks in the buffer pool that are not accessed “age” by moving toward the tail of the list. Blocks in both the new and old sublists age as other blocks are made new. Blocks in the old sublist also age as blocks are inserted at the midpoint. Eventually, a block that remains unused for long enough reaches the tail of the old sublist and is evicted.

By default, blocks read by queries immediately move into the new sublist, meaning they will stay in the buffer pool for a long time. A table scan (such as performed for a mysqldump operation, or a SELECT statement with no WHERE clause) can bring a large amount of data into the buffer pool and evict an equivalent amount of older data, even if the new data is never used again. Similarly, blocks that are loaded by the read-ahead background thread and then accessed only once move to the head of the new list. These situations can push frequently used blocks to the old sublist, where they become subject to eviction.

For more information go through http://dev.mysql.com/doc/refman/5.6/en/innodb-buffer-pool.html



MySQL InternalsMySQL HighAvailabilityMySQL Performance TuningMySQL Query OptimizationMySQL performanceMySQL FabricMySQL HAMySQL InstallationMySQL UpgradeInnoDB Performance TuningInnoDB Buffer Pool SizeMySQL Performance TuningMySQL ClusterMySQL Latest NewsNews and EventsMySQL Customers