Skip to:
Content

bbPress.org

Opened 17 months ago

Closed 2 months ago

Last modified 2 months ago

#3395 closed enhancement (fixed)

Akismet spam cleanup functionality

Reported by: stephdau Owned by: johnjamesjacoby
Milestone: 2.6.7 Priority: high
Severity: normal Version:
Component: Extend - Akismet Keywords: has-patch commit
Cc:

Description

This was first proposed on Slack here.

In the WP AK plugin, we have a scheduled function Akismet::delete_old_comments_meta() to delete old akismet_as_submitted meta records, because they are pretty large, and not useful at all after a while.

Since then, @johnjamesjacoby suggested:

We could go as far as moving it out of post meta and into a custom post type with a post_parent relationship to the topic/reply. That would allow for some longer data retention, without the surplus of metadata keys.

I'm attaching a patch with what I had started before JJJ add his comment, since I have it, but leaving any fancier concept to the bbPress team to bounce around.

Note that my patch does not actually have the scheduling of said cleanup, as I wasn't sure where it would be best to trigger that in bbPress. See:

/**
 * The above will need to be scheduled with something like:
 * if ( ! wp_next_scheduled( 'akismet_scheduled_delete' ) ) {
 * 	wp_schedule_event( time(), 'daily', 'akismet_scheduled_delete' );
 * }
 */

Also note that the patch in wholly untasted, since I didn't get to keep going. :)

Attachments (5)

3395.diff (5.9 KB) - added by stephdau 17 months ago.
3395.2.diff (8.6 KB) - added by johnjamesjacoby 5 months ago.
3395.3.diff (3.7 KB) - added by johnjamesjacoby 2 months ago.
3395.4.diff (6.3 KB) - added by johnjamesjacoby 2 months ago.
3395.5.diff (7.1 KB) - added by johnjamesjacoby 2 months ago.
Constant removed. Default 1000.

Download all attachments as: .zip

Change History (32)

@stephdau
17 months ago

#1 @stephdau
17 months ago

Another note: I will be off on a sabbatical in Sept/Oct/Nov, so I'll tag my fellow Akismet devs on this ticket, so they can keep up with it, and your team.

Ping @cfinke and @procifer. Tag! ;p

Last edited 17 months ago by stephdau (previous) (diff)

#2 @johnjamesjacoby
13 months ago

  • Keywords has-patch needs-testing added
  • Milestone changed from Awaiting Review to 2.7
  • Owner set to johnjamesjacoby
  • Priority changed from normal to high
  • Status changed from new to assigned

Patch looks really good to me. Let's milestone this for 2.7, and I'll be sure to test this thoroughly then. 🙏

#3 @stephdau
13 months ago

Thanks!

#4 @stephdau
5 months ago

Any chance this could make it in sooner rather than later?
It's been 13 months now. :)

#5 @johnjamesjacoby
5 months ago

3395.2.diff includes the following changes:

  • Removed/improved some related code docs
  • Replace return; statements with break; inside of while loops, so that table optimizations still run
  • Pass more variables into hooks, to make them potentially more useful later
  • Add a maybe_optimize_postmeta() method to encapsulate some repeated code
  • Add docs on filters
  • Relocate filter names, because I think they were inverted (count vs. batch)?

@stephdau what do you think, old friend? 🍕

#6 @stephdau
5 months ago

Looks brilliant. Thanks!

#7 @johnjamesjacoby
4 months ago

  • Keywords commit added; needs-testing removed
  • Milestone changed from 2.7 to 2.6.7

#8 @johnjamesjacoby
4 months ago

In 7203:

Akismet: introduce a few clean-up action hook methods.

  • delete_old_spam() - deletes old spam topics & replies from the queue after 15 days.
  • delete_old_spam_meta() - deletes _bbp_akismet_as_submitted entries after 15 days.
  • delete_orphaned_spam_meta() - deletes post meta that no longer have corresponding posts in the database.

Props stephdau, johnjamesjacoby.

In branches/2.6, for 2.6.7.

See #3395.

#9 @johnjamesjacoby
4 months ago

  • Resolution set to fixed
  • Status changed from assigned to closed

In 7204:

Akismet: introduce a few clean-up action hook methods.

  • delete_old_spam() - deletes old spam topics & replies from the queue after 15 days.
  • delete_old_spam_meta() - deletes _bbp_akismet_as_submitted entries after 15 days.
  • delete_orphaned_spam_meta() - deletes post meta that no longer have corresponding posts in the database.

Props stephdau, johnjamesjacoby.

In trunk, for 2.7.0.

Fixes #3395.

#10 @johnjamesjacoby
4 months ago

In 7205:

Akismet: hook clean-up methods from r7203 into WP Cron.

This change takes stephdau's advice and schedules a daily cron on the akismet_scheduled_delete event (but only when actually adding to the Akismet post histories, to try to narrow the scope slightly).

In branches/2.6, for 2.6.7.

See #3395.

#11 @johnjamesjacoby
4 months ago

In 7206:

Akismet: hook clean-up methods from r7204 into WP Cron.

This change takes stephdau's advice and schedules a daily cron on the akismet_scheduled_delete event (but only when actually adding to the Akismet post histories, to try to narrow the scope slightly).

In trunk, for 2.7.0.

See #3395.

#12 @dd32
2 months ago

FYI; These cleanup methods appear to be very heavy on some sites, for example, on WordPress.org/bbPress.org/Buddypress.org the following errors are being triggered:

E_ERROR: Allowed memory size of 268435456 bytes exhausted (tried to allocate 16777224 bytes) in wp-includes/wp-db.php:1544
or
E_ERROR: Maximum execution time of 100 seconds exceeded in wp-includes/wp-db.php:1447

Source: <Cron request, or wp-cli cron runner>

Current filter: akismet_scheduled_delete
- attached to priority 10:
    BBP_Akismet->delete_old_spam,
    BBP_Akismet->delete_old_spam_meta,
    BBP_Akismet->delete_orphaned_spam_meta,
    Akismet::delete_old_comments,
    Akismet::delete_old_comments_meta,
    Akismet::delete_orphaned_commentmeta

I'm not certain what Query / or which hooked filter is running during those errors, but this started happening around October 12th, which is about the time that a sync of other code from bbPress to w.org happened.

The maximumn time error appears to be in wpdb::prepare() and the OOM appears to be within print_error(), suggesting an extra-large SQL query being run too given how much memory it's trying to store.

Potentially related, is that Akismet always defines AKISMET_DELETE_LIMIT as 100,000 where as the above change uses 10,000 otherwise. I've raised that with the Akismet team.

#13 @dd32
2 months ago

  • Resolution fixed deleted
  • Status changed from closed to reopened

Ahah, I found it. A little different than I thought.

[24-Nov-2021 07:00:42 UTC] База даних WordPress повернула помилку Unknown column 'post_id' in 'where clause' у відповідь на запит DELETE FROM wp_posts WHERE post_id IN ( '175', '221', '299', '461', '494', '498' ), виконаний

do_action_ref_array('akismet_scheduled_delete'), WP_Hook->do_action, WP_Hook->apply_filters, BBP_Akismet->delete_old_spam, hyperdb->query, wpdb->print_error, hyperdb->get_caller
  • includes/extend/akismet.php

     
    10371037                        $format_string = implode( ", ", array_fill( 0, count( $spam_ids ), '%s' ) );
    10381038
    10391039                        // Run the delete queries
    1040                         $wpdb->query( $wpdb->prepare( "DELETE FROM {$wpdb->posts} WHERE post_id IN ( " . $format_string . " )", $spam_ids ) );
     1040                        $wpdb->query( $wpdb->prepare( "DELETE FROM {$wpdb->posts} WHERE ID IN ( " . $format_string . " )", $spam_ids ) );
    10411041                        $wpdb->query( $wpdb->prepare( "DELETE FROM {$wpdb->postmeta} WHERE post_id IN ( " . $format_string . " )", $spam_ids ) );
    10421042
    10431043                        // Clean the post cache for these topics & replies

This ticket was mentioned in Slack in #bbpress by dd32. View the logs.


2 months ago

#15 @stephdau
2 months ago

Well... That's a facepalm moment... Great catch, @dd32

#16 follow-up: @johnjamesjacoby
2 months ago

3395.3.diff includes the following changes:

  • Incorporate feedback from @dd32
  • Replace p.id with p.ID in 2 more queries
  • Reduce interval from 10000 to 1000
  • Incorporate AKISMET_DELETE_LIMIT constant and _bbp_akismet_delete_spam_limit filter into the other 2 clean-up methods, allowing them to be adjusted when necessary

Another idea is to use bbp_is_large_install() (or some other database query) to guess at an appropriate limit. That seems out of scope to getting this fixed up, though.

#17 @johnjamesjacoby
2 months ago

3395.4.diff iterates on 3395.3.diff in the following ways:

  • Turns duplicate LIMIT code into a new method: get_delete_limit()
  • Turns duplicate DAY code into a new method: get_delete_interval()
  • Both methods accept filter parameters, so they can continue to be unique
  • Removes superfluous intval() calls since they are cast as (int) previously
  • Minor string formatting changes for consistency

#18 in reply to: ↑ 16 ; follow-ups: @dd32
2 months ago

Replying to johnjamesjacoby:

  • Reduce interval from 10000 to 1000

I'm not sure that's really useful here.. Since Akismet always defines it as 100,000: https://plugins.trac.wordpress.org/browser/akismet/trunk/akismet.php?marks=43#L40

Sites can't even override it without PHP warnings by defining it first :shrug:

#19 @dd32
2 months ago

#3444 was marked as a duplicate.

#20 in reply to: ↑ 18 ; follow-up: @stephdau
2 months ago

Replying to dd32:

Sites can't even override it without PHP warnings by defining it first :shrug:

True. We'll look into that, but they can override the limit by using the `akismet_delete_comment_limit` filter.

#21 in reply to: ↑ 20 @stephdau
2 months ago

Or _bbp_akismet_delete_spam_limit in bbPress' case.

#22 in reply to: ↑ 18 @johnjamesjacoby
2 months ago

Replying to dd32:

Sites can't even override it without PHP warnings by defining it first :shrug:

I'd like to propose that bbPress simply omit the usage of the constant.

The complexity and quantity of meta data in bbPress is almost-always guaranteed to be larger than Comments, so using that same value for both may actually be counter-intuitive to administrators.

Thoughts? 🧐

@johnjamesjacoby
2 months ago

Constant removed. Default 1000.

#23 @stephdau
2 months ago

That works. We might make the constant obsolete in a future AK release, since it's not being used for what it's meant to.

Releasing the fix is definitely needed ASAP, as we're getting a non-zero number of support requests for it, such as this one.

#24 @johnjamesjacoby
2 months ago

In 7226:

Akismet: improvements to clean-up routines, based on user feedback.

  • Use correct ID column for the posts database table
  • Use correct post_id column for the postmeta database table
  • Reduce row limit to 1000 from 100000 to avoid lengthy table locks in active forums
  • Remove usage of constant, that may be phased out eventually
  • Update related code docs
  • Introduce helper methods for applying dynamically named filters

In trunk for 2.7. See #3395.

#25 @johnjamesjacoby
2 months ago

  • Resolution set to fixed
  • Status changed from reopened to closed

In 7227:

Akismet: improvements to clean-up routines, based on user feedback.

  • Use correct ID column for the posts database table
  • Use correct post_id column for the postmeta database table
  • Reduce row limit to 1000 from 100000 to avoid lengthy table locks in active forums
  • Remove usage of constant, that may be phased out eventually
  • Update related code docs
  • Introduce helper methods for applying dynamically named filters

In branches/2.6 for 2.6.9. Fixes #3395.

#26 @stephdau
2 months ago

🙇 Thanks most kindly, JJJ.

#27 @johnjamesjacoby
2 months ago

It is my genuine pleasure, @stephdau 🙏

Thank you, too, @dd32 💕

Note: See TracTickets for help on using tickets.