
WordPress post and category list plugin

Project:Wordpress Import
Version:6.x-2.x-dev
Component:Documentation
Category:support request
Priority:normal
Assigned:workon
Status:active
Issue tags:amp, ampersand, category, cdata, escaping, tag, WordPress, wxr, xml

Issue Summary

In case others run into trouble importing WordPress categories or tags containing ampersands, here's some information that may be useful. This applies if you have a category (or tag) name that looks like this in WordPress:

War & Peace

The XML file you get when exporting your WordPress data will then contain entries like this:

<wp:cat_name><![CDATA[War &amp; Peace]]></wp:cat_name>

And when you use the wordpress_import module to bring the data into Drupal, your category will end up like this:

Drupal Category Name: War &amp; Peace
Drupal Category Path Alias: tags/wordpress-tag/war-amp-peace

You'll notice the escaped ampersand &amp; in the category name, and the resulting amp within the path alias.
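The escaped entity leaks into the alias because the slug is built from the already-escaped text. The following sketch uses a hypothetical stand-in (`toy_slug`) in Python. It is not Pathauto's actual algorithm, and real aliases depend on Pathauto's punctuation settings.

```python
import re

def toy_slug(name: str) -> str:
    # Hypothetical simplified slugifier: lowercase, collapse every run of
    # non-alphanumeric characters into one hyphen, trim hyphens at the ends.
    return re.sub(r"[^a-z0-9]+", "-", name.lower()).strip("-")

print(toy_slug("War &amp; Peace"))  # war-amp-peace
print(toy_slug("War & Peace"))      # war-peace
```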

I believe WordPress is at fault here for escaping ampersands within category/tag names in CDATA blocks, since those names are plain text, not HTML. An XML parser does not decode entities inside a CDATA section, so the importer receives the literal string "&amp;" rather than "&". There was a similar issue with ampersands in WordPress feed categories in the past that was resolved by using literal ampersands in category/tag CDATA blocks.
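The CDATA behaviour can be checked with Python's standard `xml.etree.ElementTree` module. This is a minimal sketch. The fragment is trimmed down, and the `wp` namespace URI is an assumption, since real exports declare it on the `<rss>` root. It also shows that `html.unescape`, applied after parsing, recovers the intended name, which is one way an importer could un-escape names automatically.

```python
import html
import xml.etree.ElementTree as ET

# Trimmed-down WXR fragment; the namespace URI is assumed for illustration.
WXR = """<rss xmlns:wp="http://wordpress.org/export/1.0/">
<wp:category><wp:cat_name><![CDATA[War &amp; Peace]]></wp:cat_name></wp:category>
</rss>"""
NS = {"wp": "http://wordpress.org/export/1.0/"}

root = ET.fromstring(WXR)
raw_name = root.find("wp:category/wp:cat_name", NS).text

# CDATA content is passed through verbatim, so the entity survives parsing.
print(raw_name)                 # War &amp; Peace

# Decoding HTML entities after parsing recovers the intended name.
print(html.unescape(raw_name))  # War & Peace
```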

As a result, I'm not necessarily recommending any changes to the wordpress_import module (though wordpress_import could perform un-escaping automatically), but I wanted to report my workaround. I ran a text search/replace to un-escape the offending ampersands within the WordPress XML file before using wordpress_import. Here are the regular expressions I used.

To update the main category/tag entries:

Find: (_name><!\[CDATA\[[^\]&]+)&amp;([^\]&]+\]\]></wp:)
Replace with: \1&\2
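If you would rather script the replacement than use an editor, the same pattern works with Python's `re` module. This is a minimal sketch, and the function name `fix_entry_names` is illustrative:

```python
import re

# The "main entries" pattern above, written as a Python raw string.
ENTRY_PATTERN = re.compile(r"(_name><!\[CDATA\[[^\]&]+)&amp;([^\]&]+\]\]></wp:)")

def fix_entry_names(xml_text: str) -> str:
    """Un-escape a single &amp; inside <wp:..._name> CDATA blocks."""
    # In a Python replacement string, \1 and \2 are group references
    # and "&" is an ordinary literal character.
    return ENTRY_PATTERN.sub(r"\1&\2", xml_text)

before = "<wp:cat_name><![CDATA[War &amp; Peace]]></wp:cat_name>"
print(fix_entry_names(before))
# <wp:cat_name><![CDATA[War & Peace]]></wp:cat_name>
```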

To update category/tag associations for posts and pages:

Find: (<category[^>]*><!\[CDATA\[[^\]&]+)&amp;([^\]&]+\]\]></category>)
Replace with: \1&\2
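The association pattern can be scripted the same way. In this sketch the `domain`/`nicename` attributes in the sample line are illustrative rather than copied from a real export; the `[^>]*` part of the pattern accepts any attributes, or none.

```python
import re

# The post/page association pattern above, as a Python raw string.
ASSOC_PATTERN = re.compile(
    r"(<category[^>]*><!\[CDATA\[[^\]&]+)&amp;([^\]&]+\]\]></category>)"
)

def fix_post_categories(xml_text: str) -> str:
    """Un-escape a single &amp; inside <category> CDATA blocks."""
    return ASSOC_PATTERN.sub(r"\1&\2", xml_text)

before = '<category domain="category" nicename="war-peace"><![CDATA[War &amp; Peace]]></category>'
print(fix_post_categories(before))
```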

You'll need a text editor that supports regular expression searches, like Notepad++. These patterns only handle names containing exactly one ampersand, with text on both sides of it. For example, they won't match a category named "Pride & Prejudice & Zombies".
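To lift the single-ampersand limit, one option (an alternative, not the workaround described above) is to match each targeted CDATA block and replace every &amp; inside it using a callback. This sketch assumes each name sits on a single line. It only touches `<wp:..._name>` and `<category>` elements, so &amp; in post bodies such as `<content:encoded>`, where it is legitimate HTML, is left alone. The name `unescape_all_amps` is hypothetical.

```python
import re

# Opening tag of a <wp:..._name> or <category ...> element, then its CDATA body.
CDATA_NAME = re.compile(r"(<wp:\w*_name>|<category[^>]*>)<!\[CDATA\[(.*?)\]\]>")

def unescape_all_amps(xml_text: str) -> str:
    """Replace every &amp; inside the targeted CDATA blocks with a literal &."""
    def fix(match: re.Match) -> str:
        opening, body = match.group(1), match.group(2)
        return f"{opening}<![CDATA[{body.replace('&amp;', '&')}]]>"
    return CDATA_NAME.sub(fix, xml_text)

line = "<wp:cat_name><![CDATA[Pride &amp; Prejudice &amp; Zombies]]></wp:cat_name>"
print(unescape_all_amps(line))
```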

I'm using the latest WordPress 2.8.6 and wordpress_import 6.x-2.x-dev (dated 2009-Dec-10).

Comments

#1

Component:Documentation» Code
Assigned to:Anonymous» lavamind

I created a new ticket in WordPress Trac: http://core.trac.wordpress.org/ticket/14584

#2

Title:Ampersands in WordPress category/tag names» WordPress post and category list plugin
Component:Code» Documentation
Category:bug report» support request
Priority:minor» normal
Assigned to:lavamind» workon

Good post and very useful information. Here is the new W4 post list plugin: create a list of your categories, posts, or category posts and show it in your site's widget area, or in post or page content via a shortcode. You get the appropriate options while creating or updating a list, and only the list ID is needed when showing it with the shortcode, so there are no heavy shortcode parameters to fill in.

Visit the W4 post list website for more information.