I'm trying to use the WordPress importer to import blog posts from my WordPress blog (running WordPress 2.7). Some of my posts contain foreign-language characters, and it looks like XMLReader is choking on them. Here are the warnings I'm getting:

warning: XMLReader::read() [xmlreader.read]: /home/arthur/Dev/adaptingtoscarcity/a2s.com/public_html/sites/default/files/wordpress/wordpress.2010-02-26.xml:6247: parser error : Entity 'iacute' not defined in /home/arthur/Dev/adaptingtoscarcity/a2s.com/public_html/sites/all/modules/contrib/wordpress_import-DRUPAL-6--2/wordpress_import.module on line 1425.
warning: XMLReader::read() [xmlreader.read]: cena, a Town Very Close to the Confluence of the Canal Ahogada and the Rí in /home/arthur/Dev/adaptingtoscarcity/a2s.com/public_html/sites/all/modules/contrib/wordpress_import-DRUPAL-6--2/wordpress_import.module on line 1425.
warning: XMLReader::read() [xmlreader.read]: ^ in /home/arthur/Dev/adaptingtoscarcity/a2s.com/public_html/sites/all/modules/contrib/wordpress_import-DRUPAL-6--2/wordpress_import.module on line 1425.
warning: XMLReader::read() [xmlreader.read]: An Error Occured while reading in /home/arthur/Dev/adaptingtoscarcity/a2s.com/public_html/sites/all/modules/contrib/wordpress_import-DRUPAL-6--2/wordpress_import.module on line 1425.
This file does not appear to be a valid WXR file. The file is either corrupted or invalid XML. In some versions of WordPress, the export function can produce malformed XML. Please see README.txt (included in the module archive) for further guidance.

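The first warning, 'Entity 'iacute' ...', relates, I believe, to the HTML character entity '&iacute;', which corresponds to 'í'. XML itself predefines only five named entities (amp, lt, gt, quot, apos), so the parser rejects any other name unless the document declares it.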
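To illustrate the point about the undefined entity, here is a minimal sketch in Python (not the PHP XMLReader code the module uses, so the exact error text will differ). The helper name `parses_ok` and the sample `<title>` strings are made up for this example. It only shows that a strict XML parser rejects an HTML named entity such as '&iacute;' but accepts the equivalent numeric character reference.

```python
import xml.etree.ElementTree as ET


def parses_ok(xml_text):
    """Return True if xml_text is well-formed XML, False otherwise."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# &iacute; is an HTML named entity; without a DTD declaring it,
# an XML parser treats it as undefined.
html_entity = "<title>R&iacute;o</title>"

# &#237; is the numeric character reference for the same character
# and needs no declaration.
numeric_ref = "<title>R&#237;o</title>"

print(parses_ok(html_entity))
print(parses_ok(numeric_ref))
```

The first call should report a parse failure and the second should succeed, which matches the behaviour described in the warnings above.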

Comment | File | Size | Author
#3 | wordpress.2010-02-26.txt | 762.73 KB | awjrichards

Comments

awjrichards commented:

I took care of this by replacing all of the foreign character references with their ASCII equivalents. However, when I run the import now, it appears to read the XML, but then I am presented with a white page showing what looks like output from print_r().
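For anyone wanting to pre-process the export file in a similar way, the following is a rough sketch, again in Python rather than PHP. Note that it is a variation on what I did: instead of substituting ASCII equivalents, it rewrites HTML named entities as numeric character references so the accented characters are preserved. The function name `html_entities_to_numeric` is hypothetical, and the sketch assumes the entities appear in ordinary text; entities inside CDATA sections would also be rewritten, which may or may not be what you want.

```python
import re
from html.entities import name2codepoint

# The five entities XML predefines; these must be left untouched.
XML_PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}

# Matches named entity references such as &iacute; or &ntilde;
ENTITY_RE = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def html_entities_to_numeric(text):
    """Rewrite HTML named entities as numeric character references."""
    def repl(match):
        name = match.group(1)
        # Leave XML's own entities and unknown names exactly as they are.
        if name in XML_PREDEFINED or name not in name2codepoint:
            return match.group(0)
        return "&#%d;" % name2codepoint[name]

    return ENTITY_RE.sub(repl, text)


print(html_entities_to_numeric("R&iacute;o &amp; Ca&ntilde;a"))
```

Running this over the contents of the export file before importing should remove the "Entity ... not defined" class of errors, though it will not fix other kinds of malformed XML.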

lavamind commented:

Assigned: Unassigned » lavamind
Status: Active » Postponed (maintainer needs more info)

Hi, could you please provide a sample WXR file so I can try to reproduce this problem? Thanks.

awjrichards commented:

Status: Postponed (maintainer needs more info) » Needs review
Status | File | Size
new | wordpress.2010-02-26.txt | 762.73 KB

Sure - it is attached.

It is saved as a .txt file, but it is XML (drupal.org does not allow files with an .xml extension to be uploaded).

lavamind commented:

Status: Needs review » Closed (duplicate)

This is a duplicate of #693220: Import chokes on XML errors produced by Wordpress

Please see the README in the current development version.