Opened 10 months ago

Last modified 7 weeks ago

#1925 new defect

Performance: Optimize _bbp_last_active_time to use an indexed field for very large deployments

Reported by: vibol Owned by:
Priority: normal Milestone: 2.4
Component: General Version: 2.1.2
Severity: normal Keywords: dev-feedback
Cc: vibol, georgemamadashvili@…, wordpress@…, jared@…, stephen@…, nashwan.doaqan@…

Description

Background ticket: http://bbpress.trac.wordpress.org/ticket/1885

Background Conversation:

Vibol: For the first query, last_active_time now looks to be the culprit. It's a postmeta lookup with a sort.

JJJ: Meta queries are always going to be a slow like this. There isn't an available index we can use here to avoid the meta query. We *could* hack in repurposing post_date as the freshness time, but we lose the actual date the topic was posted. It would take some rearchitecting to move the original post creation date to postmeta, to make the index available.

Problem: The _bbp_last_active_time query is doing a full table scan and sort against the postmeta table when retrieving forum index pages. The slowdown is linear with size of table scan. Here's an example slow query:

# Time: 120804 19:52:17
# Query_time: 21.460766  Lock_time: 0.000051 Rows_sent: 25  Rows_examined: 313770
SET timestamp=1344127937;
SELECT SQL_CALC_FOUND_ROWS  wp_posts.* FROM wp_posts  INNER JOIN wp_postmeta ON (wp_posts.ID = wp_postmeta.post_id) WHERE 1=1  AND wp_posts.post_parent = 17  AND wp_posts.post_type = 'topic' AND (wp_posts.post_status = 'publish' OR wp_posts.post_status = 'closed') AND (wp_postmeta.meta_key = '_bbp_last_active_time' ) GROUP BY wp_posts.ID ORDER BY wp_postmeta.meta_value DESC LIMIT 0, 25;

Expected behavior: Should be fast if the data is indexed, but the postmeta table does not have indexes against it.

Guidance: I'm looking for someone to fix this, or, I am happy to put in the time to fix it with guidance on desired approach. This is a non-trivial issue, so someone with more experience in WP/BBP is ideal.

Attachments (2)

mysql-slow.log (403.2 KB) - added by vibol 9 months ago.
1925.diff (8.7 KB) - added by MZAWeb 5 months ago.

Download all attachments as: .zip

Change History (18)

  • Cc georgemamadashvili@… added

So... We're sort of stuck. The approach you quoted from our previous conversation is the only available index in the posts table that makes sense to try to use. We could also, maybe, try a subquery using the ID and post_parent indexes, but that only works if we can guarantee that newer replies will always have a higher ID. This *should* be the case, but it's hard to know for sure given the way the importer works, and how other forum software stores its data.

It's a problem not easily solved due to our limited schema and a lack of indexes on the postmeta table. Happy that you're around looking into it too.

As I dig through this, it seems like there are more issues that crop up. See the following set of queries that happen from forum view to thread reply.

# Query_time: 14.640429  Lock_time: 0.000133 Rows_sent: 25  Rows_examined: 313770
SET timestamp=1345373401;
SELECT SQL_CALC_FOUND_ROWS  wp_posts.ID FROM wp_posts  INNER JOIN wp_postmeta ON (wp_posts.ID = wp_postmeta.post_id) WHERE 1=1  AND wp_posts.post_parent = 17  AND wp_posts.post_type = 'topic' AND (wp_posts.post_status = 'publish' OR wp_posts.post_status = 'closed') AND (wp_postmeta.meta_key = '_bbp_last_active_time' ) GROUP BY wp_posts.ID ORDER BY wp_postmeta.meta_value DESC LIMIT 0, 25;
# Time: 120819  5:51:22
# Query_time: 11.424189  Lock_time: 0.000054 Rows_sent: 1  Rows_examined: 1623849
SET timestamp=1345373482;
SELECT COUNT( DISTINCT post_author ) FROM wp_posts WHERE ( post_parent = 1623928 AND post_status = 'publish' AND post_type = 'reply' ) OR ( ID = 1623928 AND post_type = 'topic' );
# Time: 120819  5:51:50
# Query_time: 27.115751  Lock_time: 0.000070 Rows_sent: 55996  Rows_examined: 56000
SET timestamp=1345373510;
SELECT ID FROM wp_posts WHERE post_parent = 17 AND post_status IN ( 'publish', 'closed' ) AND post_type = 'topic' ORDER BY ID DESC;

The last one I'll have to try to attach because it's too big to fit in this field.

vibol9 months ago

WP posts have four date fields. post_date, post_date_gmt, post_modified and post_modified_gmt. Feels hacky but we could use the post_modified or post_modified_gmt in the forums and topics records for the last_active_time. Even without index, not having to join against postmeta will increase the performance a lot

Last edited 6 months ago by MZAWeb (previous) (diff)
  • Cc wordpress@… added

Sadly, since all of the core database columns are used by WordPress already, reusing a column for our own purposes is probably dangerous, and will take more work to override the way WordPress naturally writes to those columns than it's probably worth.

post_modified *almost* makes sense, but we should be fully aware of what WordPress core uses that for before hijacking it. If it's used for comparing revisions, we're out of luck; if it's not used at all, we're at risk of it being deprecated and disappearing completely (like post_category).

  • Milestone changed from 2.3 to 2.4

Query times are the average of 20 runs. First couple of runs ignored as to get MySQL "hot".

DB Size:

forums 5
topics 29208
replies 95755

Main query now (ordering by meta_value):

SELECT SQL_CALC_FOUND_ROWS wp_posts.ID
FROM   wp_posts
       INNER JOIN wp_postmeta
               ON ( wp_posts.ID = wp_postmeta.post_id )
WHERE  1 = 1
   AND wp_posts.post_parent = 4
   AND wp_posts.post_type = 'topic'
   AND ( wp_posts.post_status = 'publish'
          OR wp_posts.post_status = 'closed'
          OR wp_posts.post_status = 'private' )
   AND ( wp_postmeta.meta_key = '_bbp_last_active_time' )
GROUP  BY wp_posts.ID
ORDER  BY wp_postmeta.meta_value DESC
LIMIT  0, 15

Avg. time: 162ms

Main query without the join:

SELECT SQL_CALC_FOUND_ROWS wp_posts.ID
FROM   wp_posts
WHERE  1 = 1
   AND wp_posts.post_parent = 4
   AND wp_posts.post_type = 'topic'
   AND ( wp_posts.post_status = 'publish'
          OR wp_posts.post_status = 'closed'
          OR wp_posts.post_status = 'private' )
 GROUP  BY wp_posts.ID
ORDER  BY wp_posts.post_modified DESC
LIMIT  0, 15	

Avg. time: 14.9ms

I'm cleaning the patch to switch all uses of _bbp_last_active_time meta to post_modified. Will post asap.

MZAWeb5 months ago

comment:9 follow-up: ↓ 13   MZAWeb5 months ago

First pass

1) The batch update script in tools.php is not working. Need to fix.
2) I need to change the importer in includes/admin/converters/bbPress1.php, but I don't have a clue how. Need to research how it works. (Never played with the importer)

Finally swinging back around to this. MZAWeb, are you using your changes in production?

Another field that has come to my attention is menu_order. It's primarily used for menu items, but is seldom used for Posts. There are already many plugins that use it to allow users to manually order their Pages and Posts. It's an int(11) so it should be able to fit a timestamp until about 2033.

In either case (post_modified or menu_order), altering the index on the table to add the newly referenced field should help speed things up as well since post_modified and menu_order are not in the indices.

I'd be curious to hear your thoughts.

An update.

First off, thank you MZAWeb for contributing your patch. The patch makes a significant difference in performance for my use case, enough so that I intend to use it in production.

The only concern would be curiosity into where WordPress updates last_modified--I believe it does this if a Post is edited (in this case, a Topic). I'm still mulling over the idea of using menu_order, simply because that is a field that is seldom changed by WordPress.

I'll put up changes I made to MZAWeb's patch that prevents an infinite recursion when a Forum is created. I'll also include an index that I've added to further speed up the query.

  • Cc jared@… added

comment:13 in reply to: ↑ 9   netweb8 weeks ago

  • Cc stephen@… added

Replying to MZAWeb:

2) I need to change the importer in includes/admin/converters/bbPress1.php, but I don't have a clue how. Need to research how it works. (Never played with the importer)

This part should be pretty straightforward (for me), I'll test your patch and add a patch for the importers here next week.

comment:14 follow-up: ↓ 15   alex-ye7 weeks ago

  • Cc nashwan.doaqan@… added

I really like this ticket it will help to get a better performance , I read your conversation about using last_modified or menu_order but It seems not a good idea for me , And you know why ! ...

There are some other options we can use :
1 - Create a new table "wp_forum_data" like an example ,
with an indexed columns for the most needed things .
2 - Extend the wp_posts table columns .

comment:15 in reply to: ↑ 14 ; follow-up: ↓ 16   MZAWeb7 weeks ago

Replying to alex-ye:

I really like this ticket it will help to get a better performance , I read your conversation about using last_modified or menu_order but It seems not a good idea for me , And you know why ! ...

Actually, I'm not sure why it doesn't seem a good idea for you. Care to elaborate? I'm on the fence myself and would love to hear arguments.

There are some other options we can use :
1 - Create a new table "wp_forum_data" like an example ,
with an indexed columns for the most needed things .
2 - Extend the wp_posts table columns .

I totally get the arguments in favor of that, I really do. But I'm quite sure adding a new table or field it's not gonna happen.

comment:16 in reply to: ↑ 15   alex-ye7 weeks ago

Replying to MZAWeb:

Care to elaborate? I'm on the fence myself and would love to hear arguments.

The main arguments is :

1 - WordPress is still use the post_modified, post_modified_gmt and menu_order , So using this fields by bbPress in another meaning can cause many bugs as my expectation .

2 - _bbp_last_active_time is not the only meta value which make SQL queries slow , bbPress use many meta fields , and it may be add more in the future

3 - What if the keymaster want to know the really last modified forum ?! , last_active_time and post_modified have a confuse meaning ! , This is not logical to me

I totally get the arguments in favor of that, I really do. But I'm quite sure adding a new table or field it's not gonna happen.

I think adding a new table can be good idea if we have many meta to index , but can you explain why you refuse to add a new column ?

Note: See TracTickets for help on using tickets.