I have 4 tables (I've specified main columns for each):
monitors(id)
monitor_node(monitor_id, node_id, average_response_time)
monitor_checks(id, monitor_id, node_id, response_time)
nodes(id)
Their relations:
1 monitor - N checks
N monitors - N nodes
1 check - 1 monitor & 1 node
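To make the structure concrete, the tables look roughly like this (a sketch; the column types here are my assumption, the full definitions are in the DB Fiddle below):

-- Sketch of the schema; types are assumed for illustration.
CREATE TABLE monitors (id INT PRIMARY KEY);
CREATE TABLE nodes (id INT PRIMARY KEY);
-- One row per monitor/node pair; holds the precomputed average.
CREATE TABLE monitor_node (
  monitor_id INT NOT NULL,
  node_id INT NOT NULL,
  average_response_time DECIMAL(8,2),
  PRIMARY KEY (monitor_id, node_id)
);
-- One row per check; grows to hundreds of millions of rows.
CREATE TABLE monitor_checks (
  id BIGINT AUTO_INCREMENT PRIMARY KEY,
  monitor_id INT NOT NULL,
  node_id INT NOT NULL,
  response_time DECIMAL(8,2)
);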
So I'm logging a lot of checks for each monitor; monitor_checks will have millions of rows (up to roughly 500 million).
Each time I insert another batch of checks (~1k rows), I need to recalculate the average response time per node (table monitor_node, column average_response_time).
I'm pretty sure I'm doing this the wrong way and need a faster solution. What I do now: after inserting 1k rows into monitor_checks, I calculate the average response time for each monitor (grouping by monitor_id). Based on this info I build an array and use insert ... on duplicate key update on the monitor_node table to perform a bulk update. Besides the average response time I calculate a few other attributes, which go into the same bulk update.
Building the array and executing the insert ... on duplicate key update is fast enough.
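That upsert step looks roughly like this (a sketch with values taken from the desired result below; the extra attributes are omitted, and a unique key on (monitor_id, node_id) is assumed):

-- Bulk upsert of the precomputed averages, one row per (monitor_id, node_id) pair.
INSERT INTO monitor_node (monitor_id, node_id, average_response_time)
VALUES
  (1, 2, 0.25),
  (2, 2, 0.65)
ON DUPLICATE KEY UPDATE
  average_response_time = VALUES(average_response_time)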
The slow query is:
select monitor_id, avg(response_time) as avg_response_time
from `monitor_checks`
where `node_id` = 2
group by `monitor_id`
which takes roughly 10-20 seconds for ~4 million rows.
I also realised that the average doesn't have to be based on all checks; the last 50-100 rows per monitor would do. But I couldn't figure out how to do this efficiently. I found a fancy solution for grouping the checks per monitor and limiting each group to some number of rows (How to get latest 2 records of each group), sketched below, but it took far too long as well.
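The windowed version I tried looked roughly like this (a sketch using MySQL 8.0 window functions, ordering by id as a proxy for recency; N = 100):

select monitor_id, avg(response_time) as avg_response_time
from (
    select monitor_id, response_time,
           -- number the checks per monitor, newest first
           row_number() over (partition by monitor_id order by id desc) as rn
    from `monitor_checks`
    where `node_id` = 2
) as latest
where rn <= 100
group by monitor_id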
So the question is: how do I quickly recalculate the average response times in monitor_node when inserting ~1k rows into monitor_checks?
DB Fiddle: https://dbfiddle.uk/?rdbms=mysql_8.0&fiddle=bd95afc030361bf1d87f8bc5c3935c2f
Final desired result:
monitor_id  node_id  average_response_time
1           1        0.30
1           2        0.25
2           1        0.55
2           2        0.65