Issue Summary
In case others run into trouble importing WordPress categories or tags containing ampersands, here's some information that may be useful. This applies if you have a category (or tag) name that looks like this in WordPress:
War & Peace
The XML file you get when exporting your WordPress data will then contain entries like this:
<wp:cat_name><![CDATA[War &amp; Peace]]></wp:cat_name>
And when you use the wordpress_import module to bring the data into Drupal, your category will end up like this:
Drupal Category Name: War &amp; Peace
Drupal Category Path Alias: tags/wordpress-tag/war-amp-peace
You'll notice the escaped ampersand (&amp;) in the category name, and the resulting "amp" within the path alias.
I believe WordPress is at fault here for escaping ampersands within category/tag names in CDATA blocks, since they are text and not HTML. There was a similar issue with ampersands in WordPress feed categories in the past that was resolved by using literal ampersands in category/tag CDATA blocks.
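To illustrate why the escaping matters, here is a minimal sketch using Python's standard XML parser. It is not part of wordpress_import; the sample document and the namespace URI are assumptions made up for the demonstration. It shows that entity references inside a CDATA block are kept as literal text, while the same reference in ordinary element text is decoded.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URI, used only so the wp: prefix parses in this sample.
WP_NS = "http://wordpress.org/export/1.0/"

# Hypothetical fragment: the same name once inside CDATA, once as plain text.
sample = (
    '<channel xmlns:wp="' + WP_NS + '">'
    '<wp:cat_name><![CDATA[War &amp; Peace]]></wp:cat_name>'
    '<title>War &amp; Peace</title>'
    '</channel>'
)

root = ET.fromstring(sample)

# CDATA content is literal: the parser does not decode &amp; here.
cdata_text = root.find("{" + WP_NS + "}cat_name").text

# Ordinary element text: the parser decodes &amp; to &.
plain_text = root.find("title").text

print(cdata_text)  # War &amp; Peace
print(plain_text)  # War & Peace
```

Any importer that reads the CDATA value as-is will therefore carry the escaped form into the category name.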
As a result, I'm not necessarily recommending any changes to the wordpress_import module (though wordpress_import could perform un-escaping automatically) but I wanted to report my workaround. I ran a text search/replace to un-escape the offending ampersands within the WordPress XML file before using wordpress_import. Here are the regular expressions I used.
To update the main category/tag entries:
Find: (_name><!\[CDATA\[[^\]&]+)&amp;([^\]&]+\]\]></wp:)
Replace with: \1&\2
To update category/tag associations for posts and pages:
Find: (<category[^>]*><!\[CDATA\[[^\]&]+)&amp;([^\]&]+\]\]></category>)
Replace with: \1&\2
You'll need a text editor that supports regular expression searches, like Notepad++. These patterns only match categories/tags that contain a single ampersand. For example, they won't match a category named "Pride & Prejudice & Zombies".
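If your export contains names with more than one ampersand, a small script may be easier than repeated search/replace. The following is only a sketch of the same idea in Python, not something I ran against a full export. The function names are my own, and it assumes, like the patterns above, that category/tag names contain no "]" character.

```python
import re

# Captures the CDATA payload of elements whose tag ends in "_name"
# (e.g. wp:cat_name, wp:tag_name) and of <category ...> elements.
# Assumption: the payload contains no "]" character.
CDATA_RE = re.compile(
    r'((?:_name|<category[^>]*)><!\[CDATA\[)'  # opening tag end + CDATA start
    r'([^\]]*)'                                # the category/tag name
    r'(\]\]></(?:wp:|category>))'              # CDATA end + closing tag start
)


def unescape_ampersands(xml_text):
    """Replace every &amp; with & inside the matched CDATA payloads only."""
    def fix(match):
        name = match.group(2).replace("&amp;", "&")
        return match.group(1) + name + match.group(3)
    return CDATA_RE.sub(fix, xml_text)


def fix_export_file(src_path, dst_path):
    """Hypothetical helper: read an export file and write a corrected copy."""
    with open(src_path, encoding="utf-8") as src:
        fixed = unescape_ampersands(src.read())
    with open(dst_path, "w", encoding="utf-8") as dst:
        dst.write(fixed)


sample = (
    '<wp:cat_name><![CDATA[War &amp; Peace]]></wp:cat_name>\n'
    '<category domain="tag"><![CDATA[Pride &amp; Prejudice &amp; Zombies]]></category>\n'
    '<title>War &amp; Peace</title>\n'
)
print(unescape_ampersands(sample))
```

Text outside the matched CDATA blocks, such as the title element in the sample, is left untouched, so legitimately escaped ampersands elsewhere in the file stay valid XML. Write to a copy of the export rather than overwriting the original.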
I'm using the latest WordPress 2.8.6 and wordpress_import 6.x-2.x-dev (dated 2009-Dec-10).
Comments
#1
I created a new ticket in WordPress Trac: http://core.trac.wordpress.org/ticket/14584