Opened 4 years ago

Closed 3 years ago

#1146 closed defect (fixed)

Recount functions are incredibly poor on performance/resources and timeout

Reported by: _ck_ Owned by: sambauers
Priority: high Milestone: 1.0.3
Component: Administration Version: 1.0.2
Severity: critical Keywords: has-patch tested
Cc:

Description

All of the recount functions need to be rewritten to work completely internal with mysql instead of external loops. There is a serious growing problem with query counts that can go into the MILLIONS.

If a forum has half a million topics, many of the recount functions first fetch ALL the topic id's and THEN loop through them, writing to meta for each one separately. Updatemeta uses at least 3 queries each.

So to use just one of the recount functions on a forum with a half million topics, you currently pull an array of the topic id's several megabytes in size into ram, then proceed to use 500,000 x 3 = 1.5 MILLION QUERIES to update the meta.

Then you repeat that for many of the recount functions. By the time you are done, you've used over 10 MILLION QUERIES and melted even the most robust of mysql servers into a puddle (but probably timed out most PHP setups well before that).

I have an internal recount function in Post Count Plus that does the recount completely in mysql and handles a half million posts in 6 seconds. The manual loop method can't even finish in the 30 second timeout on most PHP installs.

Attachments (3)

fast-recount.php (409 bytes) - added by _ck_ 4 years ago.
part 1 of 2, fast recount for bbPress 0.9
fast-recount-admin.php (2.5 KB) - added by _ck_ 4 years ago.
part 2 of 2, fast recount for bbPress 0.9
faster_tag_recount.patch (18.9 KB) - added by sambauers 3 years ago.
Faster tag recount functions

Download all attachments as: .zip

Change History (28)

comment:1   _ck_4 years ago

Example:

Here is how you are doing the topic_voices recount in 1.0
It uses 3-4 queries per topic because updatemeta reads, then writes, then reads, as well as flush the cache each time and other memory management. On a large forum this function might not even finish, hitting the php timeout and exhausting mysql resources:

		if ( $topics = (array) $bbdb->get_results("SELECT topic_id FROM $bbdb->topics ORDER BY topic_id") ) {
			$messages[] = __('Counted voices');
			foreach ($topics as $topic) {
				$topic_id = (int) $topic->topic_id;
				if ( $voices = $bbdb->get_col( $bbdb->prepare( "SELECT DISTINCT poster_id FROM $bbdb->posts WHERE topic_id = %s AND post_status = '0';", $topic_id ) ) ) {
					$voices = count( $voices );
					bb_update_topicmeta( $topic_id, 'voices_count', $voices );
				}
			}
			unset($topics, $topic, $topic_id);
		}

Here is how I do it in only 2 queries in my Topic Views plugin for 0.9
It is lightning fast as it's done 100% inside of mysql.

		$query="SELECT 'voices_count',topic_id,count(DISTINCT poster_id) as meta_value  FROM $bbdb->posts WHERE post_status = 0 GROUP BY topic_id";
		$bbdb->query("DELETE FROM $bbdb->topicmeta WHERE meta_key = 'voices_count'");  	 
		$bbdb->query("INSERT INTO $bbdb->topicmeta (meta_key, topic_id, meta_value) $query");  	 

comment:2   _ck_4 years ago

This little query will recount all posts for all topics, 100% in mysql, no external php looping requires and only one query from php

INSERT INTO bb_topics (topic_id, topic_posts)
(SELECT topic_id, count(post_status) as topic_posts
FROM bb_posts WHERE post_status = '0' GROUP BY topic_id)
ON DUPLICATE KEY UPDATE topic_posts=VALUES(topic_posts)

comment:3   _ck_4 years ago

in addition to recounting posts in topics above:

recount topics in forums in one query

INSERT INTO bb_forums (forum_id, topics)
(SELECT forum_id, count(topic_status) as topics
FROM bb_topics WHERE topic_status = '0' GROUP BY forum_id)
ON DUPLICATE KEY UPDATE topics=VALUES(topics)

recount posts in forums in one query

INSERT INTO bb_forums (forum_id, posts)
(SELECT forum_id, count(post_status) as posts
FROM bb_posts WHERE post_status = '0' GROUP BY forum_id)
ON DUPLICATE KEY UPDATE posts=VALUES(posts)
  • Milestone set to 1.1
  • Owner set to sambauers
  • Priority changed from normal to high
  • Severity changed from normal to critical
  • Status changed from new to assigned

These are a great start. Recounting is currently a dangerous affair for anyone with any reasonable amount of data.

Prioritising this for 1.1 and maybe before that if I can get time. I'd appreciate more of your SQL query writing if you have time.

It would be nice to just have a single recount button with no options if we can get these fast enough.

One small limitation on this is that sometimes we can't do queries which query more than one table very reliably.

For example, I was just rewriting the recount of topics replied for each user in this way, but my user tables are in a separate DB from the rest of the tables.

As a result, the sub-query wouldn't work. The solution was to manually build the values to insert from the sub-query that is run separately.

Three queries (DELETE, SELECT, INSERT) instead of two (DELETE, INSERT WITH SUB QUERY).

Still multiple times faster than before though.

A really (really, really) big table might run into a MAX_QUERY_SIZE limit too, we can probably work around that quite simply though.

comment:7   _ck_4 years ago

Oh sorry, I forgot to update this ticket, I've now rewritten a few of the recounting functions for bbPress 0.9 - though not all of them yet and a couple of them will have to be changed for 1.0

http://bbshowcase.org/plugins/fast-recount.zip

comment:8   _ck_4 years ago

Oh and I highly recommend you keep them separate and not just one big recount button.

Even the faster versions take several seconds on very large forums and it's possible that PHP will still timeout if they are all done at once. I would NOT even put in a "check all" button, people should look and select only what they need and do it in multiple passes if necessary to keep server loads down.

(In [2353]) First pass at faster recounting - no improved tag recounting yet (that will be harder). See #1146. Props _ck_.

(In [2354]) Remove unimplemented form element from recount page. See #1146.

(In [2355]) Do any inserts 10000 rows at a time to limit exposure to query size limits (excluding tag counting for now). See #1146.

(In [2356]) Add TODO comments to tag recount functions. See #1146.

_ck_4 years ago

part 1 of 2, fast recount for bbPress 0.9

_ck_4 years ago

part 2 of 2, fast recount for bbPress 0.9

(In [2360]) branches 0.9: Better recount queries. Props _ck_. See #1146.

comment:14 follow-up: ↓ 15   Nightgunner54 years ago

bb_recount_topic_tags is missing a global $bbdb;

comment:15 in reply to: ↑ 14   Gautam Gupta3 years ago

Replying to Nightgunner5:

bb_recount_topic_tags is missing a global $bbdb;

That's fixed. Check [2391].

  • Milestone changed from 1.1 to 1.5
  • Keywords needs-patch added
  • Milestone changed from 1.5 to 1.1
  • Summary changed from recount functions are incredibly poor on performance/resources and timeout to Recount functions are incredibly poor on performance/resources and timeout
  • Version changed from 1.0 to 1.0.2

Now, only 3 functions are left to be optimized.

I added a first attempt at faster tag recounting. Unfortunately it was necessary to bypass the taxonomy API completely. It was also necessary to change the way each function returns. Recount functions used to return a message, now they return an array with the first value being an "exit" code and the second being the message. Recount facilities added by plugins can still just return a message if they like or an array similar to the core functions.

  • Keywords needs-testing added; needs-patch removed

Faster tag recount functions

Just reattached patch to fix a bad message being displayed in some cases when requesting a "Remove deleted topics from users' favorites" recount.

  • Keywords has-patch added
  • Keywords tested added; needs-testing removed

The patch works great, I just tested it on a high activity forum. Here's the log that came back:

Counting the number of posts in each topic… Complete!
Counting the number of voices in each topic… Complete!
Counting the number of deleted posts in each topic… Complete!
Counting the number of topics in each forum… Complete!
Counting the number of posts in each forum… Complete!
Counting the number of topics to which each user has replied… Complete!
Counting the number of topic tags in each topic… Complete!
Counting the number of topics in each topic tag… Complete!
Deleting topic tags with no topics… Already run with success.
Removing deleted topics from user favorites… Nothing to remove!
Cleaning up bbPM messages…


  • Milestone changed from 1.1 to 1.0.3

This should just be added to the next minor release.

I conducted a statistical test, and here are the results (without any plugins active):

  • No. of queries taken to normally load the recount page: 7
  • No. of queries taken by the functions which were not optimized (not including 7 above): 39
  • No. of queries taken by the functions which were optimized by the patch (not including 7 above): 15

That makes a difference of 24 queries! (More than 50% which is great)

IMO, this can go into the core.

  • Resolution set to fixed
  • Status changed from assigned to closed

(In [2441]) Performance improvements for tag recount functions. Fixes #1146. Props sambauers, _ck_

Note: See TracTickets for help on using tickets.