I wrote a module, wp_feeds (Wordpress XML for Feeds), that has many (most?) of the same features as wordpress_import, but uses Feeds to do so in a highly flexible and adaptable way. There is no technical reason I am aware of that it can't do everything wordpress_import does. Additionally, it imports the comment hierarchy for threaded comments.
It would be great to combine efforts.
Comments
Comment #1
lavamind commented
Of course, I'm quite open to combining efforts in order to create the best possible module to import Wordpress data into Drupal.
I haven't looked into your module in much detail, but from what I understand from the docs, wp_feeds_wxr_importer is based on Wordpress's own parser, which is based on regular expressions. However, for Wordpress Import, we've developed a real XML parser, based on PHP's XMLReader.
The main advantage is that XMLReader allows us to process WXR files in a stream-based fashion. This means that there's no size limit to the WXR file being imported. For large Wordpress sites with thousands of posts, this is important. The main downside, however, is that this parser is much more fickle and prone to fail when Wordpress produces faulty XML (mainly when 3rd party plugins are installed).
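To illustrate what stream-based processing means here, the following is a minimal sketch in Python rather than PHP. It is not code from Wordpress Import. It uses the standard library's incremental parser as a stand-in for XMLReader, and the sample document and the function name iter_item_titles are assumptions made up for the example.

```python
import io
import xml.etree.ElementTree as ET

# Tiny made-up stand-in for a WXR export; a real file would be opened from disk.
SAMPLE_WXR = b"""<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:wp="http://wordpress.org/export/1.0/">
  <channel>
    <title>Example blog</title>
    <item><title>First post</title><wp:post_id>1</wp:post_id></item>
    <item><title>Second post</title><wp:post_id>2</wp:post_id></item>
  </channel>
</rss>
"""


def iter_item_titles(stream):
    """Yield the title of each <item> as soon as that item has been read."""
    # iterparse reads the stream incrementally instead of building the
    # whole document tree up front.
    for _event, elem in ET.iterparse(stream, events=("end",)):
        if elem.tag == "item":
            yield elem.findtext("title")
            # Discard the item's children once it has been handled, so
            # already-processed posts do not pile up in memory.
            elem.clear()


titles = list(iter_item_titles(io.BytesIO(SAMPLE_WXR)))
print(titles)
```

Because each item is handled and then discarded, memory use depends mostly on the size of a single post rather than on the size of the whole export.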
Now, I'm not really familiar with how Feeds works, or how your module taps into Feeds to do its magic, but if you think there's room for our parser in wp_feeds_wxr_importer, we're in business.
Comment #2
Bevan commented
wp_feeds does not use "Wordpress' own parser". It uses Feeds' SimplePie parser. I have only tested it with WXR files up to 200 items and 1.5 MB, so I don't know about its upper limits. However, Feeds is built with scalability and flexibility in mind, so I expect it should handle this.
If not, Wordpress Import module's XMLReader parser could be turned into a sub-class of FeedsParser, which wp_feeds_wxr_importer can then use. This would also be useful for other users of Feeds. The developer's guide to Feeds describes the architecture surrounding this.
Like Wordpress Import, SimplePie/Feeds also fails when WXR files have markup tags with an XML namespace (xmlns) that has not been declared, such as the atom namespace in the WXR files I am working with.
Would you be able to try out wp_feeds_wxr_parser on a large WXR file? Or perhaps privately send me a large WXR file so that I can test it? Note that the code for wp_feeds is not available on Drupal.org yet (see the project page for more detail). PM me and I will email you the unlicensed, copyrighted code.
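For anyone wanting to reproduce the undeclared-namespace failure mentioned above, here is a small sketch in Python (not PHP, and not code from either module). The fragment is made up, and patching the declaration onto the root element is only an assumed workaround for the sake of the example, not something wp_feeds or Wordpress Import is known to do.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: the atom: prefix is used but never declared.
BROKEN = (
    '<rss version="2.0"><channel>'
    '<atom:link href="http://example.com/feed"/>'
    "</channel></rss>"
)

# Same fragment with the Atom namespace declared on the root element.
FIXED = BROKEN.replace(
    '<rss version="2.0">',
    '<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">',
)


def parses(xml_text):
    """Return True if a namespace-aware XML parser accepts the text."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


broken_ok = parses(BROKEN)
fixed_ok = parses(FIXED)
print(broken_ok, fixed_ok)
```

A strict parser rejects the first fragment because the prefix is unbound, and accepts the second once the namespace is declared.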