One of our customers, who has a somewhat problematic web application, would like us to log all queries for a period of 24 hours.

I don't know how much data this will actually write.

The number of queries in any 24-hour period would be somewhere around 5M.
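
(A rough back-of-the-envelope figure, assuming an average of a few hundred bytes per logged statement: each general log entry is the full statement text plus a small header, so 5M queries would come to somewhere between 1 and 3 GB.)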

I could either allow a couple of GB of ramdisk, or mount an NFS share located on an otherwise unused JBOD.

Questions

  1. What would happen if the destination the log is written to filled up?

  2. Would the performance of mysqld be affected in any way (other than by the general log write itself) if the (dedicated) NFS share started to perform slowly under heavy I/O?

Thanks in advance,

3molo

3 Answers


Instead of using the general log, how about going with a query profiler? In fact, you could do query profiling without using any of MySQL's log files and while the queries are still running.

To do that, use mk-query-digest or its successor pt-query-digest to poll the processlist.

I learned how to use mk-query-digest as a replacement for the slow log from this YouTube video: http://www.youtube.com/watch?v=GXwg1fiUF68&feature=colike

Here is the script I wrote to run the query digest program:

#!/bin/sh

# Exit if another digest is already running
RUNFILE=/tmp/QueriesAreBeingDigested.txt
if [ -f ${RUNFILE} ] ; then exit ; fi
touch ${RUNFILE}   # create the lock file so overlapping runs exit early

MKDQ=/usr/local/sbin/mk-query-digest
RUNTIME=${1}
COPIES_TO_KEEP=${2}
DBVIP=${3}

# Resolve utilities (rm and ls may be aliased, hence the tail/awk cleanup)
WHICH=/usr/bin/which
DATE=$(${WHICH} date)
ECHO=$(${WHICH} echo)
HEAD=$(${WHICH} head)
TAIL=$(${WHICH} tail)
AWK=$(${WHICH} awk)
SED=$(${WHICH} sed)
CAT=$(${WHICH} cat)
WC=$(${WHICH} wc)
RM=$(${WHICH} rm | ${TAIL} -1 | ${AWK} '{print $1}')
LS=$(${WHICH} ls | ${TAIL} -1 | ${AWK} '{print $1}')

# Exit if this host itself holds the DBVIP
HAS_THE_DBVIP=$(/sbin/ip addr show | grep "scope global secondary" | grep -c "${DBVIP}")
if [ ${HAS_THE_DBVIP} -eq 1 ] ; then exit ; fi

DT=$(${DATE} +"%Y%m%d_%H%M%S")
UNIQUETAG=$(${ECHO} ${SSH_CLIENT}_${SSH_CONNECTION}_${DT} | ${SED} 's/\./ /g' | ${SED} 's/ //g')

# Profile the processlist for the requested runtime
cd /root/QueryDigest
OUTFILE=QP_${DT}.txt
HOSTADDR=${DBVIP}
${MKDQ} --processlist h=${HOSTADDR},u=queryprofiler,p=queryprofiler --run-time=${RUNTIME} > ${OUTFILE}

# Rotate out old copies
QPFILES=QPFiles.txt
QPFILES2ZAP=QPFiles2Zap.txt
${LS} QP_[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]_[0-9][0-9][0-9][0-9][0-9][0-9].txt > ${QPFILES}
LINECOUNT=$(${WC} -l < ${QPFILES})
if [ ${LINECOUNT} -gt ${COPIES_TO_KEEP} ]
then
    (( DIFF = LINECOUNT - COPIES_TO_KEEP ))
    ${HEAD} -${DIFF} < ${QPFILES} > ${QPFILES2ZAP}
    for QPFILETOZAP in $(${CAT} ${QPFILES2ZAP})
    do
        ${RM} ${QPFILETOZAP}
    done
fi

rm -f ${QPFILES2ZAP}
rm -f ${QPFILES}
rm -f ${RUNFILE}

Make sure

  • you have a user called queryprofiler, whose password is queryprofiler and who has only the PROCESS privilege (see the grant sketch below)
  • you put */20 * * * * /root/QueryDigest/ExecQueryDigest.sh 1190s 144 10.64.95.141 in the crontab to run every 20 minutes (each profile runs for 20 minutes less 10 seconds, the last 144 copies are kept, and the script only runs against the specified DBVIP [alter the script to bypass the DBVIP check])
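
For reference, creating such a user might look like this (a minimal sketch in MySQL 5.x syntax; tighten the host mask and pick a real password in production):

-- PROCESS is a global privilege, so it must be granted ON *.*
GRANT PROCESS ON *.* TO 'queryprofiler'@'%' IDENTIFIED BY 'queryprofiler';
FLUSH PRIVILEGES;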

The output is a file listing the 20 worst-running queries, ranked by (number of times the query was called) × (average seconds per call).

Here is sample output of the query-profiling summary from mk-query-digest:

# Rank Query ID           Response time    Calls   R/Call     Item
# ==== ================== ================ ======= ========== ====
#    1 0x812D15015AD29D33   336.3867 68.5%     910   0.369656 SELECT mt_entry mt_placement mt_category
#    2 0x99E13015BFF1E75E    25.3594  5.2%     210   0.120759 SELECT mt_entry mt_objecttag
#    3 0x5E994008E9543B29    16.1608  3.3%      46   0.351321 SELECT schedule_occurrence schedule_eventschedule schedule_event schedule_eventtype schedule_event schedule_eventtype schedule_occurrence.start
#    4 0x84DD09F0FC444677    13.3070  2.7%      23   0.578567 SELECT mt_entry
#    5 0x377E0D0898266FDD    12.0870  2.5%     116   0.104199 SELECT polls_pollquestion mt_category
#    6 0x440EBDBCEDB88725    11.5159  2.3%      21   0.548376 SELECT mt_entry
#    7 0x1DC2DFD6B658021F    10.3653  2.1%      54   0.191949 SELECT mt_entry mt_placement mt_category
#    8 0x6C6318E56E149036     8.8294  1.8%      44   0.200667 SELECT schedule_occurrence schedule_eventschedule schedule_event schedule_eventtype schedule_event schedule_eventtype schedule_occurrence.start
#    9 0x392F6DA628C7FEBD     8.5243  1.7%       9   0.947143 SELECT mt_entry mt_objecttag
#   10 0x7DD2B294CFF96961     7.3753  1.5%      70   0.105362 SELECT polls_pollresponse
#   11 0x9B9092194D3910E6     5.8124  1.2%      57   0.101973 SELECT content_specialitem content_basecontentitem advertising_product organizations_neworg content_basecontentitem_item_attributes
#   12 0xA909BF76E7051792     5.6005  1.1%      55   0.101828 SELECT mt_entry mt_objecttag mt_tag
#   13 0xEBE07AC48DB8923E     5.5195  1.1%      54   0.102213 SELECT rssfeeds_contentfeeditem
#   14 0x3E52CF0261A7C3FF     4.4676  0.9%      44   0.101536 SELECT schedule_occurrence schedule_occurrence.start
#   15 0x9D0BCD3F6731195B     4.2804  0.9%      41   0.104401 SELECT mt_entry mt_placement mt_category
#   16 0x7961BD4C76277EB7     4.0143  0.8%      18   0.223014 INSERT UNION UPDATE UNION mt_session
#   17 0xD2F486BA41E7A623     3.1448  0.6%      21   0.149754 SELECT mt_entry mt_placement mt_category mt_objecttag mt_tag
#   18 0x3B9686D98BB8E054     2.9577  0.6%      11   0.268885 SELECT mt_entry mt_objecttag mt_tag
#   19 0xBB2443BF48638319     2.7239  0.6%       9   0.302660 SELECT rssfeeds_contentfeeditem
#   20 0x3D533D57D8B466CC     2.4209  0.5%      15   0.161391 SELECT mt_entry mt_placement mt_category

Above this summary, the report prints a detailed section, including a histogram, for each of these 20 worst-performing queries.

Here is the detail section for the first entry:

# Query 1: 0.77 QPS, 0.28x concurrency, ID 0x812D15015AD29D33 at byte 0 __
# This item is included in the report because it matches --limit.
#              pct   total     min     max     avg     95%  stddev  median
# Count         36     910
# Exec time     58    336s   101ms      2s   370ms   992ms   230ms   393ms
# Lock time      0       0       0       0       0       0       0       0
# Users                  1      mt
# Hosts                905 10.64.95.74:54707 (2), 10.64.95.74:56133 (2), 10.64.95.80:33862 (2)... 901 more
# Databases              1     mt1
# Time range 1321642802 to 1321643988
# bytes          1   1.11M   1.22k   1.41k   1.25k   1.26k   25.66   1.20k
# id            36   9.87G  11.10M  11.11M  11.11M  10.76M    0.12  10.76M
# Query_time distribution
#   1us
#  10us
# 100us
#   1ms
#  10ms
# 100ms  ################################################################
#    1s  ###
#  10s+
# Tables
#    SHOW TABLE STATUS FROM `mt1` LIKE 'mt_entry'\G
#    SHOW CREATE TABLE `mt1`.`mt_entry`\G
#    SHOW TABLE STATUS FROM `mt1` LIKE 'mt_placement'\G
#    SHOW CREATE TABLE `mt1`.`mt_placement`\G
#    SHOW TABLE STATUS FROM `mt1` LIKE 'mt_category'\G
#    SHOW CREATE TABLE `mt1`.`mt_category`\G
# EXPLAIN
SELECT `mt_entry`.`entry_id`, `mt_entry`.`entry_allow_comments`, `mt_entry`.`entry_allow_pings`, `mt_entry`.`entry_atom_id`, `mt_entry`.`entry_author_id`, `mt_entry`.`entry_authored_on`, `mt_entry`.`entry_basename`, `mt_entry`.`entry_blog_id`, `mt_entry`.`entry_category_id`, `mt_entry`.`entry_class`, `mt_entry`.`entry_comment_count`, `mt_entry`.`entry_convert_breaks`, `mt_entry`.`entry_created_by`, `mt_entry`.`entry_created_on`, `mt_entry`.`entry_excerpt`, `mt_entry`.`entry_keywords`, `mt_entry`.`entry_modified_by`, `mt_entry`.`entry_modified_on`, `mt_entry`.`entry_ping_count`, `mt_entry`.`entry_pinged_urls`, `mt_entry`.`entry_status`, `mt_entry`.`entry_tangent_cache`, `mt_entry`.`entry_template_id`, `mt_entry`.`entry_text`, `mt_entry`.`entry_text_more`, `mt_entry`.`entry_title`, `mt_entry`.`entry_to_ping_urls`, `mt_entry`.`entry_week_number` FROM `mt_entry` INNER JOIN `mt_placement` ON (`mt_entry`.`entry_id` = `mt_placement`.`placement_entry_id`) INNER JOIN `mt_category` ON (`mt_placement`.`placement_category_id` = `mt_category`.`category_id`) WHERE (`mt_entry`.`entry_status` = 2  AND `mt_category`.`category_basename` IN ('business_review' /*... omitted 3 items ...*/ ) AND NOT (`mt_entry`.`entry_id` IN (53441))) ORDER BY `mt_entry`.`entry_authored_on` DESC LIMIT 4\G

There is no performance impact in doing this, because a single DB connection is held open to poll the processlist for the duration you specify, and hundreds of processlist queries over one connection are totally lightweight. Consequently, you will not need an NFS share or any special hardware for query performance analysis.

Give it a Try !!!

UPDATE

mk-query-digest can use either the processlist (via live DB Connection) or tcpdump (via pipe). Here are the options:

--processlist

   --processlist
       type: DSN
       Poll this DSN's processlist for queries, with "--interval" sleep between.
       If the connection fails, mk-query-digest tries to reopen it once
       per second. See also "--mirror".
   --interval
       type: float; default: .1
       How frequently to poll the processlist, in seconds.
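
For example, a one-off hour-long processlist poll, reusing the queryprofiler account and DBVIP from the script above (the half-second interval and one-hour runtime are arbitrary example values):

# poll the processlist every .5s for one hour and digest the results
mk-query-digest --processlist h=10.64.95.141,u=queryprofiler,p=queryprofiler \
    --interval .5 --run-time=3600s > /tmp/digest.txt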

--tcpdump

       tcpdump
           mk-query-digest does not actually watch the network (i.e. it
           does NOT "sniff packets").  Instead, it's just parsing the
           output of tcpdump.  You are responsible for generating this
           output; mk-query-digest does not do it for you.  Then you send
           this to mk-query-digest as you would any log file: as files on
           the command line or to STDIN.

       The parser expects the input to be formatted with the following
       options: "-x -n -q -tttt".  For example, if you want to capture
       output from your local machine, you can do something like

         tcpdump -i eth0 port 3306 -s 65535 -c 1000 -x -n -q -tttt \
           | mk-query-digest --type tcpdump

       The other tcpdump parameters, such as -s, -c, and -i, are up to
       you.  Just make sure the output looks like this:

         2009-04-12 09:50:16.804849 IP 127.0.0.1.42167 > 127.0.0.1.3306: tcp 37
             0x0000:  4508 0059 6eb2 4000 4006 cde2 7f00 0001
             0x0010:  ....

       Remember tcpdump has a handy -c option to stop after it captures
       some number of packets!  That's very useful for testing your
       tcpdump command.

After 8.5 years, I decided to update this post with a version of my wrapper script using pt-query-digest instead of mk-query-digest:

#!/bin/bash
#
# qdlive : wrapper around pt-query-digest
#
# BASE_FOLDER     : target folder for this wrapper's output
# PT_QUERY_DIGEST : absolute path to the pt-query-digest script
# MYCNF           : absolute path to a my.cnf with user credentials to run
#                   pt-query-digest and mysqladmin
# QD_FOLDER       : folder where the pt-query-digest reports are stored

BASE_FOLDER=${HOME}/qdlive
MYCNF=${BASE_FOLDER}/my.cnf
ERRLOG=${BASE_FOLDER}/err.log
PT_QUERY_DIGEST=/usr/bin/pt-query-digest
QD_FOLDER=${BASE_FOLDER}/reports
mkdir -p ${QD_FOLDER}
cd ${BASE_FOLDER}

# 1st parameter : number of minutes for pt-query-digest to run
# 2nd parameter : number of copies of the pt-query-digest output to keep
[ "${1}" == "" ] && exit
[ "${2}" == "" ] && exit
[ ! "${3}" == "" ] && exit   # no third parameter allowed

# Minutes must be a 1- to 3-digit number in the range 2..60
OK1=$(echo "${1}" | grep -c "^[1-9]$")
OK2=$(echo "${1}" | grep -c "^[1-9][0-9]$")
OK3=$(echo "${1}" | grep -c "^[1-9][0-9][0-9]$")
(( OK = OK1 + OK2 + OK3 ))
if [ ${OK} -eq 0 ]
then
    DT=$(date +"%Y-%m-%d %H:%M:%S")
    echo "${DT} : Invalid Minutes (${1})" >> ${ERRLOG}
    exit
fi
OK=1
[ ${1} -lt 2 ] && OK=0
[ ${1} -gt 60 ] && OK=0
if [ ${OK} -eq 0 ]
then
    DT=$(date +"%Y-%m-%d %H:%M:%S")
    echo "${DT} : Invalid Minutes (${1})" >> ${ERRLOG}
    exit
fi
RUNTIME_MINUTES=${1}
(( RUNTIME_SECONDS = RUNTIME_MINUTES * 60 ))

# Copies must be a 1- to 3-digit number in the range 7..300
OK1=$(echo "${2}" | grep -c "^[1-9]$")
OK2=$(echo "${2}" | grep -c "^[1-9][0-9]$")
OK3=$(echo "${2}" | grep -c "^[1-9][0-9][0-9]$")
(( OK = OK1 + OK2 + OK3 ))
if [ ${OK} -eq 0 ]
then
    DT=$(date +"%Y-%m-%d %H:%M:%S")
    echo "${DT} : Invalid Copies (${2})" >> ${ERRLOG}
    exit
fi
OK=1
[ ${2} -lt 7 ] && OK=0
[ ${2} -gt 300 ] && OK=0
if [ ${OK} -eq 0 ]
then
    DT=$(date +"%Y-%m-%d %H:%M:%S")
    echo "${DT} : Invalid Copies (${2})" >> ${ERRLOG}
    exit
fi
COPIES_TO_KEEP=${2}

# Parse my.cnf, which should contain:
#   [mysql]
#   host = ....
#   user = ....
#   password = ...
MYSQL_HOST=localhost
MYSQL_HOST=$(grep ^host ${MYCNF} | awk '{print $3}')
MYSQL_USER=$(grep ^user ${MYCNF} | awk '{print $3}')
MYSQL_PASS=$(grep ^password ${MYCNF} | awk '{print $3}')
MYSQL_PORT=3306
MYSQL_AUTH="--defaults-file=${MYCNF}"
PTOOL_AUTH="-F ${MYCNF}"

# Make sure mysqld is running
MYSQL_RUNNING=$(mysqladmin ${MYSQL_AUTH} ping 2>&1 | grep -c "mysqld is alive")
if [ ${MYSQL_RUNNING} -eq 0 ]
then
    DT=$(date +"%Y-%m-%d %H:%M:%S")
    echo "${DT} : mysqld Not Running" >> ${ERRLOG}
    exit
fi

# Execute the query digest
DT=$(date +"%Y%m%d_%a_%H%M%S")
QD_OUTPUT=${QD_FOLDER}/qd_${DT}_${RUNTIME_MINUTES}m.rpt
${PT_QUERY_DIGEST} --processlist ${MYSQL_HOST} ${PTOOL_AUTH} --run-time=${RUNTIME_SECONDS}s > ${QD_OUTPUT} 2>&1

# Rotate old reports, oldest first (the reports live in QD_FOLDER)
cd ${QD_FOLDER}
COPIES_ON_HAND=$(ls -l | grep -c rpt$)
if [ ${COPIES_ON_HAND} -gt ${COPIES_TO_KEEP} ]
then
    (( COPIES_TO_ZAP = COPIES_ON_HAND - COPIES_TO_KEEP ))
    FILELIST=""
    SPC=""
    for FIL in $(ls *.rpt | head -${COPIES_TO_ZAP})
    do
        FILELIST="${FILELIST}${SPC}${FIL}"
        SPC=" "
    done
    for FIL in ${FILELIST} ; do rm -f ${FIL} ; done
fi
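
A hypothetical crontab entry to drive it, matching the old 20-minute cadence (the script name qdlive.sh and its path are placeholders):

# run a 19-minute digest every 20 minutes; keep the last 144 reports (two days' worth)
*/20 * * * * ${HOME}/qdlive/qdlive.sh 19 144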

RolandoMySQLDBA

Like Rolando, I would recommend using pt-query-digest to capture this data, but instead of using the --processlist option, capture data with tcpdump. The processlist method will not capture everything and timing data will only be as accurate as the granularity of your interval. tcpdump will get every query. What you get out of the processlist that you don't get from tcpdump is additional information about what state the queries were in.

Here is how to do this. It could easily be scripted:

tcpdump -s 65535 -x -nn -q -tttt -i any -c 9999999 port 3306 | gzip -c > /tmp/tcpdump.txt.gz
gunzip -c /tmp/tcpdump.txt.gz | pt-query-digest --type tcpdump > /tmp/digest.txt

The only issue is that (to my knowledge) tcpdump cannot capture data over an interval of time - it only understands packets. So, either make the value of -c (the number of packets to capture) some very large number, or kill the tcpdump manually after 24 hours... again, that can be scripted, as in the sketch below. You can use pt-query-digest filters to select the exact time range, if that is important.
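
For example, where coreutils timeout is available, the capture above can be bounded to exactly 24 hours (86400 seconds); everything else matches the earlier command, minus the -c packet limit:

# timeout stops tcpdump after 24 hours; gzip then sees end-of-input and finalizes the archive
timeout 86400 tcpdump -s 65535 -x -nn -q -tttt -i any port 3306 | gzip -c > /tmp/tcpdump.txt.gz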

I would send the tcpdump output to a destination that has plenty of space. In the example above I am gzipping it, but that isn't necessary if you have sufficient space. pt-query-digest is very CPU-intensive, so I wouldn't run it right on the production server - copy the data somewhere else and work with it there.

EDIT: I forgot to mention that if you don't want the summary data, you can extract just the list of queries by adding --no-report --print, like so:
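
# reuses the capture file from above; /tmp/queries.txt is just an example output path
gunzip -c /tmp/tcpdump.txt.gz | pt-query-digest --type tcpdump --no-report --print > /tmp/queries.txt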

Aaron Brown

Activating the query log in MySQL will have a negative impact on performance, especially on disk I/O, even if the data goes somewhere else. I have found it better to use a third-party tool like Jet Profiler for MySQL to analyse queries. It is much like SQL Server Profiler, except that it also gives you graphs, pie charts, and other valuable information. Whatever you use, it is always better to analyse queries from another machine than to activate the query log on the server directly.

To answer your question, though: it is impossible to say exactly how big the file will get, since that depends on the number and the size of the queries MySQL processes. Be careful not to place the query log on the same partition as MySQL's data files; otherwise, once it fills up, MySQL will simply stop and wait for someone to free up space.
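
If you do enable the general log despite all this, note that on MySQL 5.1+ it can be pointed at a dedicated filesystem and toggled at runtime, so you can cut it off the moment space runs low (a sketch; the path is an example):

-- point the general log at a dedicated mount, then enable it
SET GLOBAL general_log_file = '/mnt/querylog/general.log';
SET GLOBAL general_log = 'ON';
-- ... 24 hours later ...
SET GLOBAL general_log = 'OFF';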

http://www.jetprofiler.com/