Opened 12 years ago
Closed 11 years ago
#2194 closed defect (bug) (fixed)
Import parser truncates topics & replies when source contains &nbsp;
Milestone: 2.5
Priority: normal
Severity: major
Version: trunk
Component: API - Importers
Keywords: needs-testing
Cc: john@…, stephen@…
Description
While testing 2.3 beta 1, I ran multiple imports from SimplePress and found that some of the posts were being cut off. I believe I've tracked it down: when doing an import, any post that contains the following string gets truncated:
<p>&nbsp;</p>
If the above shows up in a post, the content before it appears properly, but everything after it is truncated.
Attachments (4)
Change History (22)
#3
12 years ago
- Cc stephen@… added
- Component changed from General to Importers
- Summary changed from Import from SimplePress chopping off post content to Import parser truncates topics & replies when source contains &nbsp;
Confirmed with all importers, not just Simple:Press: if the source topic or reply contains a &nbsp;, that topic or reply will be truncated.
Looking into options for a fix in parser.php
Notes:
&nbsp; is actually a no-break space, typically used for multiple spaces,
e.g. text with a double--space (where each - stands in for a space).
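A minimal sketch of checking imported content for a no-break space in either of the two forms discussed in this ticket: the HTML entity &nbsp; and the literal Unicode character U+00A0. bbPress itself is PHP; Python is used here only for illustration, and the function names are hypothetical, not part of any importer.

```python
import re

# A no-break space can reach the importer in two forms: the HTML entity
# "&nbsp;" or the literal Unicode character U+00A0 (written "\u00a0" here).
NBSP_PATTERN = re.compile(r"&nbsp;|\u00a0")


def find_nbsp(text: str) -> list[str]:
    """Return every no-break space found in `text`, in order of appearance."""
    return NBSP_PATTERN.findall(text)


def contains_nbsp(text: str) -> bool:
    """True if `text` holds a no-break space in either form."""
    return NBSP_PATTERN.search(text) is not None


if __name__ == "__main__":
    samples = [
        "text with a double&nbsp; space",  # HTML entity form
        "text with a double\u00a0 space",  # literal U+00A0 form
        "plain text",                      # no no-break space at all
    ]
    for sample in samples:
        print(repr(sample), contains_nbsp(sample))
```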
#4
12 years ago
- Keywords reporter-feedback added
I'm attaching a patch, 2194.1.diff, that is specifically for Simple:Press.
I have tried to keep the scope of this as narrow as possible, so it will ONLY replace the Unicode no-break space in '<p>\u00A0</p>' with the standard HTML entity, '<p>&nbsp;</p>'. If there are other &nbsp; occurrences not wrapped in <p></p> tags, the same issue will occur; if it does, let me know and I will modify the patch.
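A hypothetical sketch of a replacement scoped as narrowly as the one described above, assuming the "Unicode" form is the literal U+00A0 character inside an otherwise empty paragraph. It shows why the scope is limited: a U+00A0 outside <p></p> is left untouched.

```python
# Hypothetical narrow fix: only an empty paragraph whose sole content is the
# literal U+00A0 character is rewritten; nothing else is touched.
UNICODE_EMPTY_PARA = "<p>\u00a0</p>"
HTML_EMPTY_PARA = "<p>&nbsp;</p>"


def narrow_fix(content: str) -> str:
    """Rewrite '<p>' + U+00A0 + '</p>' as '<p>&nbsp;</p>'; leave the rest as-is."""
    return content.replace(UNICODE_EMPTY_PARA, HTML_EMPTY_PARA)


if __name__ == "__main__":
    before = "First para<p>\u00a0</p>text with a double\u00a0 space"
    # The empty paragraph is rewritten, but the inline U+00A0 survives: that
    # is the case a patch scoped this narrowly cannot cover.
    print(repr(narrow_fix(before)))
```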
This is not being added to parser.php at this stage, as far too much testing is required and all testing thus far breaks all kinds of htmlspecialchars codes, e.g. &lt; and &gt;. None of the forum databases I have include HTML or Unicode &nbsp; in the topic or reply content, and I think this is an edge case specific to SimplePress.
If you can test this and let me know, that would be great.
#5
12 years ago
@netweb Not sure if you saw my comment before your patch, but the database isn't storing <p>&nbsp;</p>, just &nbsp;.
Here's a screenshot of what the post looks like: http://d.pr/i/KQrO
That said, I applied the patch and ran a test anyway, and as expected, it didn't work.
#6
12 years ago
@vegasgeek,
Can you please dump that row from your wp_sfposts table with phpMyAdmin and attach it to this ticket?
- Open the wp_sfposts table
- Click Export
- Export Method: Custom - display all possible options
- Rows: Dump some row(s)
- Number of rows: 1
- Row to begin at: 1234 (whatever row number that post is)
- Click 'Go' at the bottom of the page
It should look similar to the sp_sfposts.sql file I just attached to this ticket.
#8
12 years ago
- Severity changed from normal to major
I uploaded 2194.2.diff, which updates the regex to match 'new line' -> &nbsp; -> 'carriage return'
and replace it with an HTML line break, <br>.
It works in this specific case using the post you supplied in the SQL.
Again, if there are ANY &nbsp; elsewhere in your data, the same behaviour will occur and the topic or reply will be truncated.
I am not really happy with this as a solution to the core problem and will need to look at creating a patch for parser.php; for now, though, this can be used as a workaround for the issue.
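A rough sketch of the 2194.2 idea, assuming CRLF line endings so that a blank spacer line is stored as \r\n&nbsp;\r\n. The pattern matches the 'new line' -> &nbsp; -> 'carriage return' core of that sequence. This sketch keeps the two newline characters and swaps only the &nbsp; for <br>, which may differ from what the actual patch does. Inline &nbsp; is deliberately left alone, which is the limitation noted above.

```python
import re

# Assumption: paragraphs are separated by CRLF, so a blank spacer line is
# stored as "\r\n&nbsp;\r\n". This pattern matches its "\n&nbsp;\r" core,
# i.e. 'new line' -> &nbsp; -> 'carriage return'.
SPACER = re.compile(r"\n&nbsp;\r")


def nbsp_lines_to_br(content: str) -> str:
    """Replace the &nbsp; on a spacer line with <br>, keeping the newlines."""
    return SPACER.sub("\n<br>\r", content)


if __name__ == "__main__":
    post = "Line one\r\n&nbsp;\r\nLine two with a double&nbsp; space"
    # Only the spacer line changes; the inline &nbsp; is still there.
    print(repr(nbsp_lines_to_br(post)))
```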
#9
12 years ago
I should mention that I solved this issue a different way. I did a search/replace on the wp_sfposts table to replace &nbsp; with nothing. The results for me ended up being spot on. What do you think of doing a str_replace( '&nbsp;', '', $data) on just the one table prior to the rest of the conversion? It might mess up a little formatting (which was minimal, if not unnoticeable, I might add), but I'd personally take that over truncated posts.
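For comparison, the str_replace() approach amounts to deleting every &nbsp; before conversion. This Python sketch mirrors PHP's str_replace( '&nbsp;', '', $data ); the row layout and the post_content column name are assumptions for illustration.

```python
def strip_nbsp(content: str) -> str:
    """Python equivalent of PHP's str_replace('&nbsp;', '', $content)."""
    return content.replace("&nbsp;", "")


def preprocess_posts(rows: list[dict], column: str = "post_content") -> list[dict]:
    """Strip &nbsp; from one column of every row before the import runs.

    `column` defaults to a hypothetical column name; adjust to the real schema.
    Other fields in each row are copied unchanged.
    """
    return [{**row, column: strip_nbsp(row[column])} for row in rows]


if __name__ == "__main__":
    posts = [
        {"post_id": 1, "post_content": "First part<p>&nbsp;</p>Second part"},
        {"post_id": 2, "post_content": "text with a double&nbsp; space"},
    ]
    for row in preprocess_posts(posts):
        # Row 1 keeps both parts (an empty <p></p> remains); row 2 loses one
        # of its two spaces, the small formatting cost mentioned above.
        print(row)
```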
Side note: I have kept a copy of the database prior to conversion so that I can help test patches for this as needed.
#10
12 years ago
I originally tried replacing all the &nbsp; occurrences but had issues depending on whether it was used as a line break or inline in the topic/reply text, and whether it was the HTML &nbsp; or the Unicode U+00A0.
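One way to handle both contexts, sketched here under assumptions rather than taken from any patch on this ticket: a line containing nothing but no-break spaces (optionally inside an empty <p></p>) is treated as a spacer and becomes <br />, and any remaining no-break space, in either the &nbsp; or U+00A0 form, is treated as inline text and becomes a plain space. The function name normalize_nbsp is hypothetical.

```python
import re

# Either form of no-break space: the HTML entity or the literal U+00A0.
NBSP = r"(?:&nbsp;|\u00a0)"

# A "spacer" line: nothing but no-break spaces (and plain spaces/tabs),
# optionally wrapped in an empty <p></p>. The lookahead allows a trailing
# "\r" so CRLF content is matched without consuming the line ending.
SPACER_LINE = re.compile(
    rf"^[ \t]*(?:<p>)?(?:{NBSP}[ \t]*)+(?:</p>)?[ \t]*(?=\r?$)",
    re.MULTILINE,
)
INLINE_NBSP = re.compile(NBSP)


def normalize_nbsp(content: str) -> str:
    """Hypothetical normalizer for the line-break and inline cases."""
    # 1. A spacer line becomes an explicit HTML line break.
    content = SPACER_LINE.sub("<br />", content)
    # 2. Any no-break space left over is inline text: make it a plain space.
    return INLINE_NBSP.sub(" ", content)


if __name__ == "__main__":
    post = "Line one\n&nbsp;\nLine two with a double\u00a0 space"
    print(repr(normalize_nbsp(post)))
```

Splitting the work into two passes keeps the spacer-line rule from ever seeing inline occurrences, so an inline no-break space is never turned into a line break.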
The built-in NBBC BBCode parser (parser.php) should be able to parse these properly. I have tested a couple of options to patch it, but it needs a great deal more testing against all the current forum importers to ensure it works correctly and doesn't break anything else.
For now we can use the above patch and/or a manual search & replace with phpMyAdmin and I will work on testing a patch for a future release.
#11
12 years ago
I applied patch 2194.2.diff and re-ran the import process. Happy to report, it ran perfectly and didn't truncate the posts. Woot!
#12
12 years ago
Cool... Glad it worked, but I am still hesitant to include this in the upcoming 2.3 release.
#13
12 years ago
I hear ya. As I said, my forums are already moved, so I'm not itching for it to get into 2.3, but I'm happy to help test as needed.
#14
12 years ago
- Milestone changed from Awaiting Review to 2.4
Non-break spaces are just spaces, not line breaks.
Moving to 2.4 milestone.
Just ran another test. In the database the text doesn't have <p></p> wrapped around it; it's simply &nbsp;. But by running a search/replace on the database first to replace &nbsp; with <br />, the import seems to have worked flawlessly.
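The database-side search/replace can be sketched with Python's built-in sqlite3 module standing in for MySQL and phpMyAdmin. The wp_sfposts table name comes from this ticket; the post_id and post_content columns are assumptions. MySQL's REPLACE(str, from_str, to_str) takes its arguments in the same order as SQLite's replace().

```python
import sqlite3

# In-memory stand-in for the wp_sfposts table; the post_id and post_content
# column names are assumptions for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE wp_sfposts (post_id INTEGER PRIMARY KEY, post_content TEXT)")
conn.executemany(
    "INSERT INTO wp_sfposts (post_id, post_content) VALUES (?, ?)",
    [(1, "Line one\n&nbsp;\nLine two"), (2, "No special spacing here")],
)

# The search/replace run before the import: every &nbsp; becomes <br />.
conn.execute("UPDATE wp_sfposts SET post_content = replace(post_content, '&nbsp;', '<br />')")
conn.commit()

# Map post_id -> post_content so the result is easy to inspect.
rows = dict(conn.execute("SELECT post_id, post_content FROM wp_sfposts"))
print(rows)
```

Note that this turns every &nbsp;, including inline ones, into a line break, even though non-break spaces are just spaces; keeping a copy of the table before running it, as mentioned in #9, makes it easy to retry.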