Opened 11 months ago
Closed 4 months ago
#1884 closed enhancement (fixed)
Strip phpBB bbcode, bbcode_uid and magic urls during import
| Reported by: | | Owned by: | |
|---|---|---|---|
| Priority: | normal | Milestone: | 2.3 |
| Component: | Importers | Version: | 2.0 |
| Severity: | normal | Keywords: | has-patch |
| Cc: | | | |
Description
Currently, during an import from phpBB, all data is taken raw from the source database tables. Ideally, phpBB's BBCode decorations (such as the 'bbcode_uid') and 'magic_url' markup should be stripped during import.
BBCode 'quote' with bbcode_uid '177gyb6t'
eg. [quote:177gyb6t]
BBCode 'img' with image width '799' and height '628'
eg. [IMG:799:628]
URL truncated by phpBB's magic_url, which shortens URLs longer than 55 characters
eg. http://example.com/subdir/sub/wind … se-preview
Further reading on these phpBB features:
http://wiki.phpbb.com/Tutorial.Parsing_text
http://wiki.phpbb.com/Strip_bbcode
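As a rough illustration of the request above, the 'bbcode_uid' suffix and the image dimensions could be removed with two `preg_replace` calls. This is only a minimal sketch and not the code from the attached patches: the function name `strip_phpbb_uid` is hypothetical, and it assumes the uid for each post is known (for example, read from the same source row as the post text).

```php
<?php
// Hypothetical helper, not the code from the attached patches.
// $uid is assumed to be the post's bbcode_uid value, e.g. '177gyb6t'.
function strip_phpbb_uid(string $text, string $uid): string
{
    // Remove ":<uid>" wherever it sits directly before a closing bracket.
    $text = preg_replace('/:' . preg_quote($uid, '/') . '(?=\])/', '', $text);

    // Collapse [IMG:<width>:<height>] to a plain tag, keeping the tag's case.
    return preg_replace('/\[(img):\d+:\d+\]/i', '[$1]', $text);
}

echo strip_phpbb_uid('[quote:177gyb6t]test text[/quote:177gyb6t]', '177gyb6t'), "\n";
// prints: [quote]test text[/quote]
```

Tags that carry extra segments before the uid would need additional rules; the sketch only covers the two forms shown in the description.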
Attachments (4)
Change History (22)
comment:3
Coolkevman — 10 months ago
For what it's worth, I've more or less implemented some BBCode clean-up code when I developed my migration plugin from e107 to WordPress+bbPress.
You'll find my code there: http://plugins.trac.wordpress.org/browser/e107-importer/trunk/e107-importer.php#L1289 (see the pre_cleanup_markup() and post_cleanup_markup() methods). This might inspire you for the feature you're requesting.
Example raw data from phpBB
[quote=someone]test text[/quote]
Parsing http://bbpress.trac.wordpress.org/browser/branches/plugin/bbp-admin/bbp-parser.php#L831 & http://bbpress.trac.wordpress.org/browser/branches/plugin/bbp-admin/bbp-parser.php#L681
(bbp-parser.php is the NBBC BBCode Parser http://sourceforge.net/projects/nbbc/)
    $title = "Quote:";
else
    $title = htmlspecialchars(trim($default)) . " wrote:";
return "\n<div class=\"bbcode_quote\">\n<div class=\"bbcode_quote_head\">" . $title . "</div>\n"
    . "<div class=\"bbcode_quote_body\">" . $content . "</div>\n</div>\n";
}
Resulting in bbPress
<div class="bbcode_quote">
<div class="bbcode_quote_head">someone wrote:</div>
<div class="bbcode_quote_body">test text</div>
</div>
The base bbcode_ CSS classes can be easily styled with CSS. Any phpBB BBCodes with a 'bbcode_uid' need to be stripped during the import/conversion, otherwise bbp-parser.php leaves the BBCodes intact, eg. [quote:177gyb6t]
(I'll create a new ticket for bbcode_ CSS class names to be added to bbpress.css & bbpress-rtl.css)
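To make the point about 'bbcode_uid' concrete, here is a deliberately simplified stand-in for a BBCode parser. It is not the NBBC code in bbp-parser.php; the function name `render_simple_quote` and the pattern are assumptions for illustration, and only the CSS class names are taken from the output shown above. It shows that a pattern written for `[quote]` and `[quote=name]` does not match a tag that still carries a uid.

```php
<?php
// Simplified stand-in for a BBCode parser, for illustration only.
// It is NOT the NBBC implementation in bbp-parser.php.
function render_simple_quote(string $text): string
{
    return preg_replace_callback(
        '/\[quote(?:=([^\]]+))?\](.*?)\[\/quote\]/s',
        function (array $m): string {
            // $m[1] is an empty string when no "=name" part was given.
            $title = $m[1] === ''
                ? 'Quote:'
                : htmlspecialchars(trim($m[1])) . ' wrote:';

            return '<div class="bbcode_quote"><div class="bbcode_quote_head">' . $title
                . '</div><div class="bbcode_quote_body">' . $m[2] . '</div></div>';
        },
        $text
    );
}

// A uid-tagged quote is not recognised and comes back unchanged.
echo render_simple_quote('[quote:177gyb6t]test text[/quote:177gyb6t]'), "\n";
// prints: [quote:177gyb6t]test text[/quote:177gyb6t]
```

Passing `[quote=someone]test text[/quote]` to the same function produces the `bbcode_quote` markup, which is why the uid has to be removed before parsing.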
comment:6
johnjamesjacoby — 6 months ago
- Owner set to netweb
Note: 1884.diff is only a 'work in progress'
Hope to finish the horrid last regex in the morning and then upload the finished diff
Due to the vast array of URL formats being imported from phpBB this ticket needs to be put on hold until #1932 is resolved and then the URL 'preg_replace' rules in this ticket can be fully tested.
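One possible shape for such a URL rule is sketched below. It rests on an assumption about how phpBB stores magic URLs, namely as an anchor wrapped in `<!-- m -->` comments with the full address in `href` and the shortened text as the link label. The function name `restore_magic_urls` is hypothetical, and the rules in the actual patch may differ.

```php
<?php
// Assumption: magic URLs are stored as
//   <!-- m --><a class="postlink" href="FULL">SHORTENED</a><!-- m -->
// Hypothetical helper; the real patch's rules may differ.
function restore_magic_urls(string $text): string
{
    // Replace the whole wrapped anchor with the untruncated href value.
    return preg_replace(
        '/<!-- m --><a class="postlink" href="([^"]+)">.*?<\/a><!-- m -->/s',
        '$1',
        $text
    );
}
```

The sketch leaves any HTML entities in the `href` value untouched and ignores other link variants, which is part of why the range of URL formats needs testing against real data.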
Will finish this off shortly and watch for any gotchas related to #WP23050
- Milestone changed from 2.3 to 2.4
Moving to 2.4 so we can focus on importers then. If an urgent/compelling patch comes in, we can sneak this in to 2.3.
comment:11
netweb — 5 months ago
- Keywords has-patch added
comment:12
netweb — 5 months ago
- Milestone changed from 2.4 to 2.3
This is now finished and ready for commit, moving back to 2.3
comment:13
hugoheinz — 5 months ago
@netweb: I tested your patch (1884.2.diff).
It seems that the $phpbb_uid variable is not used in the callback_html function. It should be set to $field, right?
However, if I do so, the conversion of the [quote="xyz"] tags does not work properly for me.
comment:14
netweb — 5 months ago
- Keywords has-patch removed
Hugo,
Thanks for testing it, I will update the patch shortly.
The fix now strips custom phpBB codes from $field with regex first, then the clean $field is parsed with parser.php to decode the remaining HTML | 8 new phpBB user profile fields mapped to _usermeta | Whitespace & inline docs also updated for consistency across import tools.
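The strip-then-parse order described above can be outlined as follows. This is a sketch under stated assumptions rather than the patched callback_html: `clean_then_parse` is a hypothetical name, the uid is passed in explicitly, and `$parser` is a stand-in for whatever BBCode parser is used.

```php
<?php
// Hypothetical outline of a "clean first, parse second" callback.
// $parser stands in for the real BBCode parser (an assumption).
function clean_then_parse(string $field, string $uid, callable $parser): string
{
    // Step 1: strip the phpBB-specific uid the parser does not understand.
    $field = str_replace(':' . $uid . ']', ']', $field);

    // Step 2: hand the cleaned text to the generic BBCode parser.
    return $parser($field);
}

// Stub parser for demonstration: it only understands plain [b]...[/b].
$stub = function (string $s): string {
    return preg_replace('/\[b\](.*?)\[\/b\]/s', '<strong>$1</strong>', $s);
};

echo clean_then_parse('[b:177gyb6t]bold[/b:177gyb6t]', '177gyb6t', $stub), "\n";
// prints: <strong>bold</strong>
```

Calling the stub directly on the uid-tagged text returns it unchanged, so the order of the two steps matters.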
comment:15
netweb — 5 months ago
- Keywords needs-testing added
An FAQ/Known Issues docs page is now on the codex http://codex.bbpress.org/import-forums/phpbb/
comment:16
hugoheinz — 4 months ago
Works fine for me now. Thanks a lot!
comment:17
netweb — 4 months ago
- Keywords has-patch added; needs-testing removed
Detailed breakdown of all changes contained in 1884.4.diff (inclusive of all diffs in this ticket)
- Cleaned whitespace and updated inline docs and phpdoc for consistency across all import tools.
- Fixed '_bbp_forum_parent_id'
- Added Forum topic count -> '_bbp_topic_count'
- Added Forum reply count -> '_bbp_reply_count'
- Added Forum total topic count -> '_bbp_total_topic_count'
- Added Forum status -> '_bbp_status'
- Fixed Forum dates -> to_type = 'forum' and to_fieldname = 'post_date'
- Fixed Forum dates -> to_type = 'forum' and to_fieldname = 'post_date_gmt'
- Fixed Forum dates -> to_type = 'forum' and to_fieldname = 'post_modified'
- Fixed Forum dates -> to_type = 'forum' and to_fieldname = 'post_modified_gmt'
- Added Topic reply count -> '_bbp_reply_count'
- Added Topic total reply count -> '_bbp_total_reply_count'
- Added Topic date -> '_bbp_last_active_time'
- Added Topic author ip -> '_bbp_author_ip'
- Added phpbb user ICQ -> '_bbp_phpbb_user_icq'
- Added phpbb user MSNM -> '_bbp_phpbb_user_msnm'
- Added phpbb user Jabber -> 'jabber'
- Added phpbb user Occupation -> '_bbp_phpbb_user_occ'
- Added phpbb user Interests -> '_bbp_phpbb_user_interests'
- Added phpbb user Signature -> '_bbp_phpbb_user_sig'
- Added phpbb user Location -> '_bbp_phpbb_user_from'
- Added phpbb user avatar filename -> '_bbp_phpbb_user_avatar'
- Added function callback_topic_reply_count
- Added function callback_html (Strips custom phpBB 'magic_url' and 'bbcode_uid' from $field first, before passing $field to parser.php)
- Added bbPress @mentions to converted phpBB [quote] BBCodes
- Resolution set to fixed
- Status changed from new to closed
(In [4703]) Updates to phpBB converter:
- Cleaned whitespace and updated inline docs and phpdoc for consistency across all import tools.
- Fixed '_bbp_forum_parent_id'
- Added Forum topic count -> '_bbp_topic_count'
- Added Forum reply count -> '_bbp_reply_count'
- Added Forum total topic count -> '_bbp_total_topic_count'
- Added Forum status -> '_bbp_status'
- Fixed Forum dates -> to_type = 'forum' and to_fieldname = 'post_date'
- Fixed Forum dates -> to_type = 'forum' and to_fieldname = 'post_date_gmt'
- Fixed Forum dates -> to_type = 'forum' and to_fieldname = 'post_modified'
- Fixed Forum dates -> to_type = 'forum' and to_fieldname = 'post_modified_gmt'
- Added Topic reply count -> '_bbp_reply_count'
- Added Topic total reply count -> '_bbp_total_reply_count'
- Added Topic date -> '_bbp_last_active_time'
- Added Topic author ip -> '_bbp_author_ip'
- Added phpbb user ICQ -> '_bbp_phpbb_user_icq'
- Added phpbb user MSNM -> '_bbp_phpbb_user_msnm'
- Added phpbb user Jabber -> 'jabber'
- Added phpbb user Occupation -> '_bbp_phpbb_user_occ'
- Added phpbb user Interests -> '_bbp_phpbb_user_interests'
- Added phpbb user Signature -> '_bbp_phpbb_user_sig'
- Added phpbb user Location -> '_bbp_phpbb_user_from'
- Added phpbb user avatar filename -> '_bbp_phpbb_user_avatar'
- Added function callback_topic_reply_count
- Added function callback_html (Strips custom phpBB 'magic_url' and 'bbcode_uid' from $field first, before passing $field to parser.php)
- Added bbPress @mentions to converted phpBB [quote] BBCodes
- Props netweb.
- Fixes #1884.
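The @mentions item in the list above could look roughly like the sketch below. It is not taken from [4703]: the function name `quotes_to_mentions`, the accepted tag formats, and the output markup are all assumptions, and it expects the 'bbcode_uid' to have been stripped already.

```php
<?php
// Hypothetical sketch; tag formats and output markup are assumptions.
function quotes_to_mentions(string $text): string
{
    // Opening tag with an author: [quote="name"] or [quote=&quot;name&quot;]
    $text = preg_replace(
        '/\[quote=(?:"|&quot;)(.*?)(?:"|&quot;)\]/',
        '<em>@$1 wrote:</em><blockquote>',
        $text
    );

    // Closing tag becomes the end of the blockquote.
    return str_replace('[/quote]', '</blockquote>', $text);
}

echo quotes_to_mentions('[quote="xyz"]hi[/quote]'), "\n";
// prints: <em>@xyz wrote:</em><blockquote>hi</blockquote>
```

Whether the resulting `@name` resolves to a real mention depends on the imported username matching, which this sketch does not check.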
I also have the need for this, but with Invision. I imagine the callback should be built into each converter to remove all special code. Perhaps some standard functions like the [quote] shortcode can be converted to a format that the bbPress quotes plugin can recognize.