MySQL Internals: January 2014

Friday, 10 January 2014

Tips to MySQL Tuning and Optimization

MySQL is a powerful open-source database. With more and more database driven applications, people have been pushing MySQL to its limits. Here are 101 tips for tuning and optimizing your MySQL install. Some tips are specific to the environment they are installed on, but the concepts are universal. I have divided them up into several categories to help you with getting the most out of MySQL:

MySQL Server Hardware and OS Tuning:

1. Have enough physical memory to load your entire InnoDB file into memory – InnoDB is much faster when the file can be accessed in memory rather than from disk.
2. Avoid Swap at all costs – swapping is reading from disk, its slow.
3. Use Battery-Backed RAM.
4. Use an advanced RAID – preferably RAID10 or higher.
5. Avoid RAID5 – the checksum needed to ensure integrity is costly.
6. Separate your OS and data partitions, not just logically, but physically – costly OS writes and reads will impact your database performance.
7. Put your mysql temp space and replication logs on a separate partition than your data – background writes will impact your database when it goes to write/read from disk.
8. More disks equals more speed.
9. Faster disks are better.
10. Use SAS over SATA.
11. Smaller disks are faster than larger disks, especially in RAID configs.
12. Use Battery-Backed Cache RAID controllers.
13. Avoid software raids.
14. Consider using Solid State IO Cards (not disk drives) for your data partition – these cards can sustain over 2GB/s writes for almost any amount of data.
15. On Linux set your swappiness value to 0 – no reason to cache files on a database server, this is more of a web server or desktop advantage.
16. Mount filesystem with noatime and nodirtime if available – no reason to update database file modification times for access.
17. Use XFS filesystem – a faster, smaller filesystem than ext3 and has more options for journaling, also ext3 has been shown to have double buffering issues with MySQL.
18. Tune your XFS filesystem log and buffer variables – for maximum performance benchmark.
19. On Linux systems, use NOOP or DEADLINE IO scheduler – the CFQ and ANTICIPATORY scheduler have been shown to be slow vs NOOP and DEADLINE scheduler.
20. Use a 64-bit OS – more memory addressable and usable to MySQL.
21. Remove unused packages and daemons from servers – less resource stealing.
22. Put your host that use MySQL and your MySQL host in a hosts file – no dns lookups.
23. Never force kill a MySQL process – you will corrupt your database and be running for the backups.
24. Dedicate your server to MySQL – background processes and other services can steal from the db cpu time.

MySQL Configuration:

25. Use innodb_flush_method=O_DIRECT to avoid a double buffer when writing.
26. Avoid O_DIRECT and EXT3 filesystem – you will serialize all your writes.
27. Allocate enough innodb_buffer_pool_size to load your entire InnoDB file into memory – less reads from disk.
28. Do not make innodb_log_file_size too big, with faster and more disks – flushing more often is good and lowers the recovery time during crashes.
29. Do not mix innodb_thread_concurrency and thread_concurrency variables – these two values are not compatible.
30. Allocate a minimal amount for max_connections – too many connections can use up your RAM and lock up your MySQL server.
31. Keep thread_cache at a relatively high number, about 16 – to prevent slowness when opening connections.
32. Use skip-name-resolve – to remove dns lookups.
33. Use query cache if your queries are repetitive and your data does not change often – however using query cache on data that changes often will give you a performance hit.
34. Increase temp_table_size – to prevent disk writes.
35. Increase max_heap_table_size – to prevent disk writes.
36. Do not set your sort_buffer_size too high – this is per connection and can use up memory fast.
37. Monitor key_read_requests and key_reads to determine your key_buffer size – the key read requests should be higher than your key_reads, otherwise you are not efficiently using your key_buffer.
38. Set innodb_flush_log_at_trx_commit = 0 will improve performance, but leaving it to default (1), you will ensure data integrity, you will also ensure replication is not lagging
39. Have a test environment where you can test your configs and restart often, without affecting production.

MySQL Schema Optimization:

40. Keep your database trim.
41. Archive old data – to remove excessive row returns or searches on queries.
42. Put indexes on your data.
43. Do not overuse indexes, compare with your queries.
44. Compress text and blob data types – to save space and reduce number of disk reads.
45. UTF 8 and UTF16 is slower than latin1.
46. Use Triggers sparingly.
47. Keep redundant data to a minimum – do not duplicate data unnecessarily.
48. Use linking tables rather than extending rows.
49. Pay attention to your data types, use the smallest one possible for your real data.
50. Separate blob/text data from other data if other data is often used for queries when blob/text are not.
51. Check and optimize tables often.
52. Rewrite InnoDB tables often to optimize.
53. Sometimes, it is faster to drop indexes when adding columns and then add indexes back.
54. Use different storage engines for different needs.
55. Use ARCHIVE storage engine for Logging tables or Auditing tables – this is much more efficient for writes.
56. Store session data in memcache rather than MySQL – memcache allows for auto-expiring values and prevents you from having to create costly reads and writes to MySQL for temporal data.
57. Use VARCHAR instead CHAR when storing variable length strings – to save space since CHAR is fixed length and VARCHAR is not (utf8 is not affected by this).
58. Make schema changes incrementally – a small change can have drastic effects.
59. Test all schema changes in a development environment that mirrors production.
60. Do NOT arbitrarily change values in your config file, it can have disastrous affects.
61. Sometimes less is more in MySQL configs.
62. When in doubt use a generic MySQL config file.

Query Optimization:

63. Use the slow query log to find slow queries.
64. Use EXPLAIN to determine queries are functioning appropriately.
65. Test your queries often to see if they are performing optimally – performance will change over time.
66. Avoid count(*) on entire tables, it can lock the entire table.
67. Make queries uniform so subsequent similar queries will use query cache.
68. Use GROUP BY instead of DISTINCT when appropriate.
69. Use indexed columns in WHERE, GROUP BY, and ORDER BY clauses.
70. Keep indexes simple, do not reuse a column in multiple indexes.
71. Sometimes MySQL chooses the wrong index, use USE INDEX for this case
72. Check for issues using SQL_MODE=STRICT.
73. Use a LIMIT on UNION instead of OR for less than 5 indexed fields.
74. Use INSERT ON DUPLICATE KEY or INSERT IGNORE instead of UPDATE to avoid the SELECT prior to update.
75. Use a indexed field and ORDER BY instead of MAX.
76. Avoid using ORDER BY RAND().
77. LIMIT M,N can actually slow down queries in certain circumstances, use sparingly.
78. Use UNION instead of sub-queries in WHERE clauses.
79. For UPDATES, use SHARE MODE to prevent exclusive locks.
80. On restarts of MySQL, remember to warm your database, to ensure that your data is in memory and queries are fast.
81. Use DROP TABLE then CREATE TABLE instead of DELETE FROM to remove all data from a table.
82. Minimize the data in your query to only the data you need, using * is overkill most of the time.
83. Consider persistent connections instead of multiple connections to reduce overhead.
84. Benchmark queries, including using load on the server, sometimes a simple query can have affects on other queries.
85. When load increases on your server, use SHOW PROCESSLIST to view slow/problematic queries.
86. Test all suspect queries in a development environment where you have mirrored production data.

MySQL Backup Procedures:

87. Backup from secondary replicated server.
88. Stop replication during backups to prevent inconsistencies on data dependencies and foreign constraints.
89. Stop MySQL altogether and take a backup of the database files.
90. Backup binary logs at same time as dumpfile if MySQL dump used – to make sure replication does not break.
91. Do not trust an LVM snapshot for backups – this could create data inconsistencies that will give you issues in the future.
92. Make dumps per table for easier single table recovery – if data is isolated from other tables.
93. Use –opt when using mysqldump.
94. Check and Optimize tables before a backup.
95. When importing temporarily disable foreign constraints for a faster import.
96. When importing temporarily disable unique checks for a faster import.
97. Calculate size of database/tables data and indexes after each backup to monitor growth.
98. Monitor slave replication for errors and delay with a cron script.
99. Perform Backups regularly.
100. Test your backups regularly.

MySQL Internals, MySQL HighAvailability, MySQL Performance Tuning, MySQL Query Optimization, MySQL performance, MySQL Fabric, MySQL HA, MySQL Installation, MySQL Upgrade, InnoDB Performance Tuning, InnoDB Buffer Pool Size, MySQL Performance Tuning, MySQL Cluster MySQL Latest News, News and Events, MySQL Customers

Thursday, 9 January 2014

MySQL Error 1356 mysqldump Couldn’t execute

MySQL Error mysqldump Couldn’t execute references invalid table(s) or column(s) or function(s) or definer/invoker of view lack rights to use them (1356)

If you get an error messages like “mysqldump: Couldn’t execute ‘SHOW FIELDS FROM `store_information`’: View ‘database.viewname’ references invalid table(s) or column(s) or function(s) or definer/invoker of view lack rights to use them (1356)” then a number of things might be wrong:
» The view is “corrupted” (this could happen if the table the view is of has been updated in a way so that the view is no longer correct – for instance if the table structure is updated)
» The user you are using are lacking permissions (to the view/table (that the view is referencing) )
» If the database is having any temporary tables then there is a chance to get this kind of error as well.

What I usually do when this happens to solve it is to:
1.Verify if the user running the mysqldump has the permissions (or test as root if you can – as root has more permissions).
If this is the case: update the permissions on the user running the dump – or change to another user

2.Try to find out if the table the view is referencing has been updated (think back what have I done check with the rest of the team)

Or if you do not wish to/can not do this, check a previous backup up the base table (from when mysqldump worked with the view) and compare the structure with the current structure in the database.

If this is the case: I just drop the view and recreate the view so that it gets corrected in the underlying db structure and re-run the dump.

Overview of MySQL Architecture

MySQL is based on tiered architecture,consisting of both subsystems and support components that interact with each other to read,parse,and execute queries,and to cache and return query results.
MySQL architecture consists of five primary subsystems that work together to respond to a request made to MySQL database server.

1) Query Engine

SQL Interface :

The SQL interface provides the mechanisms to receive commands and transmit results to the user. The MySQL SQL interface was built to the ANSI SQL standard and accepts the same basic SQL statements as most ANSI-compliant database servers. Although many of the SQL commands supported in MySQL have options that are not ANSI standard, the MySQL developers have stayed very close to the ANSI SQL standard.Connections to the database server are received from the network communication pathways and a thread is created for each. The threaded process is the heart of the executable pathway in the MySQL server. MySQL is built as a true multi threaded application whereby each thread executes independently of the other threads (except for certain helper threads). The incoming SQL command is stored in a class structure and the results are transmitted to the client by writing the results out to the network communication protocols. Once a thread has been created,the MySQL server attempts to parse the SQL command and store the parts in the internal data structure.

Parser :

When a client issues a query, a new thread is created and the SQL statement is forwarded to the parser for syntactic validation (or rejection due to errors). The MySQL parser is implemented using a large Lex-YACC script that is compiled with Bison. The parser constructs a query structure used to represent the query statement (SQL) in memory as a tree structure (also called an abstract syntax tree) that can be used to execute the query.

Query Optimizer :

The MySQL query optimizer subsystem is considered by some to be misnamed. The optimizer used is a SELECT-PROJECT-JOIN strategy that attempts to restructure the query by first doing any restrictions (SELECT) to narrow the number of tuples to work with, then performs the projections to reduce the number of attributes (fields) in the resulting tuples, and finally evaluates any join conditions. While not considered a member of the extremely complicated query optimizer category, the SELECT-PROJECT-JOIN strategy falls into the category of heuristic optimizers. In this case, the heuristics (rules) are simply
• Horizontally eliminate extra data by evaluating the expressions in the WHERE (HAVING) clause.
• Vertically eliminate extra data by limiting the data to the attributes specified in the attribute list. The exception is the storage of the attributes used in the join clause that may not be kept in the final query.
• Evaluate join expressions. This results in a strategy that ensures a known-good access method to retrieve data in an efficient manner. Despite critical reviews, the SELECT-PROJECT-JOIN strategy has proven effective at executing the typical queries found in transaction processing.

The first step in the optimizer is to check for the existence of tables and access control by the user. If there are errors, the appropriate error message is returned and control returns to the thread manager, or listener. Once the correct tables have been identified, they are opened
and the appropriate locks are applied for concurrency control. Once all of the maintenance and setup tasks are complete, the optimizer uses the internal query structure and evaluates the WHERE conditions (a restrict operation) of the query. Results are returned as temporary tables to prepare for the next step. If UNION operators are present, the optimizer executes the SELECT portions of all statements in a loop before continuing. The next step in the optimizer is to execute the projections. These are executed in a similar manner as the restrict portions, again storing the intermediate results as temporary tables and saving only those attributes specified in the column specification in the SELECT statement.
Lastly, the structure is analyzed for any JOIN conditions that are built using the join class, and then the join::optimize() method is called. At this stage the query is optimized by evaluating the expressions and eliminating any conditions that result in dead branches or always true or always false conditions (as well as many other similar optimizations). The optimizer is attempting to eliminate any known-bad conditions in the query before executing the join. This is done because joins are the most expensive and time consuming of all of the relational operators. It is also important to note that the join optimization step is performed for all queries that have a WHERE or HAVING clause regardless of whether there are any join conditions. This enables developers to concentrate all of the expression evaluation code in one place. Once the join
optimization is complete, the optimizer uses a series of conditional statements to route the query to the appropriate library method for execution.

Query Execution :

Execution of the query is handled by a set of library methods designed to implement a particular query. For example, the mysql_insert() method is designed to insert data. Likewise, there is a mysql_select() method designed to find and return data matching the WHERE clause. This library of execution methods is located in a variety of source code files under a file of a similar name (e.g., sql_insert.cc or sql_select.cc). All of these methods have as a parameter a thread object that permits the method to access the internal query structure and eases execution. Results from each of the execution methods are returned using the network communication pathways library. The query execution library methods are clearly implemented using the interpretative model of query execution.

Query Cache :

While not its own subsystem, the query cache should be considered a vital part of the query optimization and execution subsystem. The query cache is a marvelous invention that caches not only the query structure but also the query results themselves. This enables the system to
check for frequently used queries and shortcut the entire query optimization and execution stages altogether. This is another of the technologies that is unique to MySQL. Other database system cache queries, but no others cache the actual results. As you can appreciate, the query cache must also allow for situations where the results are “dirty” in the sense that something has changed since the last time the query was run (e.g., an INSERT, UPDATE, or DELETE was run against the base table) and that the cached queries may need to be occasionally purged.

2) Buffer Manager/Cache and Buffers

The caching and buffers subsystem is responsible for ensuring that the most frequently used data (or structures, as you will see) are available in the most efficient manner possible. In other words, the data must be resident or ready to read at all times. The caches dramatically increase the response time for requests for that data because the data is in memory and thus no additional disk access is necessary to retrieve it. The cache subsystem was created to encapsulate all of the caching and buffering into a loosely coupled set of library functions. Although you will find the caches implemented in several different source code files, they are considered part of the same subsystem.
A number of caches are implemented in this subsystem. Most of the cache mechanisms use the same or similar concept of storing data as structures in a linked list. The caches are implemented in different portions of the code to tailor the implementation to the type of data
that is being cached. Let’s look at each of the caches.

Table Cache :

The table cache was created to minimize the overhead in opening, reading, and closing tables (the .FRM files on disk). For this reason, the table cache is designed to store metadata about the tables in memory. This makes it much faster for a thread to read the schema of the table without having to reopen the file every time. Each thread has its own list of table cache structures. This permits the threads to maintain their own views of the tables so that if one thread is altering the schema of a table (but has not committed the changes) another thread may use that table with the original schema. The structure used is a simple one that includes all of the metadata information for a table. The structures are stored in a linked list in memory and associated with each thread.

Record Cache :

The record cache was created to enhance sequential reads from the storage engines. Thus the record cache is usually only used during table scans. It works like a read-ahead buffer by retrieving a block of data at a time, thus resulting in fewer disk accesses during the scan. Fewer disk accesses generally equates to improved performance. Interestingly, the record cache is also used in writing data sequentially by writing the new (or altered) data to the cache first and then writing the cache to disk when full. In this way write performance is improved as well. This sequential behavior (called locality of reference) is the main reason the record cache is most often used with the MyISAM storage engine, although it is not limited to MyISAM. The record cache is implemented in an agnostic manner that doesn’t interfere with the code used to
access the storage engine API. Developers don’t have to do anything to take advantage of the record cache as it is implemented within the layers of the API.

Key Cache :

The key cache is a buffer for frequently used index data. In this case, it is a block of data for the index file (B-tree) and is used exclusively for MyISAM tables (the .MYI files on disk). The indexes themselves are stored as linked lists within the key cache structure. A key cache is created when a MyISAM table is opened for the first time. The key cache is accessed on every index read. If an index is found in the cache, it is read from there; otherwise, a new index block must be read from disk and placed into the cache. However, the cache has a limited size and is tunable by changing the key_cache_block_size configuration variable. Thus not all blocks of the index file will fit into memory.
So how does the system keep track of which blocks have been used?
The cache implements a monitoring system to keep track of how frequent the index blocks are used. The key cache has been implemented to keep
track of how “warm” the index blocks are. Warm in this case refers to how many times the index block has been accessed over time. Values for warm include BLOCK_COLD, BLOCK_WARM, and BLOCK_HOT. As the blocks cool off and new blocks become warm, the cold blocks are purged and the warm blocks added. This strategy is a least recently used (LRU) page-replacement strategy—the same algorithm used for virtual memory management and disk buffering in operating systems—that has been proven to be remarkably efficient even in the face of much more sophisticated page replacement algorithms. In a similar way, the key cache keeps track of the index blocks that have changed (called getting “dirty”). When a dirty block is purged, its data is written back to the index file on disk before being replaced. Conversely, when a clean block is purged it is simply removed from memory.

Privilege Cache :

The privilege cache is used to store grant data on a user account. This data is stored in the same manner as an access control list (ACL), which lists all of the privileges a user has for an object in the system. The privilege cache is implemented as a structure stored in a first in, last out (FILO) hash table. Data for the cache is gathered when the grant tables are read during user authentication and initialization. It is important to store this data in memory as it saves a lot of time reading the grant tables.

Hostname Cache :

The hostname cache is another of the helper caches, like the privilege cache. It too is implemented as a stack of a structure. It contains the hostnames of all the connections to the server. It may seem surprising, but this data is frequently requested and therefore in high demand and
a candidate for a dedicated cache.

Miscellaneous :

A number of other small cache mechanisms are implemented throughout the MySQL source code. One example is the join buffer cache used during complex join operations. For example, some join operations require comparing one tuple to all the tuples in the second table. A cache in this case can store the tuples read so that the join can be implemented without having to reread the second table into memory multiple times.

3) The Storage Manager

The storage manager interfaces with the operating system to write data to the disk efficiently. Because the storage functions reside in a separate subsystem, the MYSQL engine operates at a level of abstraction away from the operating system. The storage manager writes to disk all of the data in the user tables. indexes, and logs as well as the internal system data.

4)The Transaction Manager
The function of the Transaction manager is to facilitate concurrency in data access. This subsystem provides a locking facility to ensure that multiple simultaneous users access the data in consistent way. Without corrupting or damaging the data. Transaction control takes place via the lock manager subcomponent, which places and release locks on various objects being used in transaction.

5) Recovery Manager.
The Recovery Manager’s job is to keep copies of data for retrieval later, in case of loss of data. It also logs commands that modify the data and other significant events inside the database So far, only InnoDB and BDB table handlers.

Subsystem Interaction and control Flow :

The Query engine requests the data to be read or written to buffer manager to satisfy a users query.it depend on the transaction manager for locking of the data so that the concurrency is ensured. To perform table creation and drop operations, the query engine accesses the storage manager directly, bypassing the buffer manager.to create or delete the files in the file system.

The buffer manager caches data from the storage manager for efficient retrieval by the query engine. It depends on the transaction manager to check the locking status of data before it performs any modification action.

The Transaction manager depends on the Query cache and storage manager to place locks on data in memory and in the file system.
The recovery manager uses the Storage manager to store command/event logs and backups of the data in the file system. It depends on the transaction manager to obtain locks on the log files being written. The recovery manager also needs to use the buffer manager during recovery from crashes.

The storage manager depends on the operating system file system for persistent storage and retrieval of data. It depends on the transaction manager to obtain locking status information.

MySQL Internals, MySQL HighAvailability, MySQL Performance Tuning, MySQL Query Optimization, MySQL performance, MySQL Fabric, MySQL HA, MySQL Installation, MySQL Upgrade, InnoDB Performance Tuning, InnoDB Buffer Pool Size, MySQL Performance Tuning, MySQL Cluster MySQL Latest News, News and Events, MySQL Customers