Import images, clean MS meta data, change input format

scafmac - March 27, 2009 - 19:08
Project: Import Typepad / MoveableType
Version: 5.x-1.0
Component: Code
Category: feature request
Priority: normal
Assigned: Unassigned
Status: patch (to be ported)
Description

This patch doesn't actually need to be ported, but I know the maintainer would probably prefer multiple patches. So the "to be ported" piece is only if the maintainer decides to break these enhancements into multiple issues & patches. This is the best I have time to offer right now.

The patch contains the following enhancements:

  • Offers to copy images from TypePad to Drupal, both for <img> tags and for any links (<a>) that enclose them; updates the href & img src references to use the Drupal copies of the images. Although it does use IMCE settings (if IMCE is installed) to determine the destination in Drupal and the acceptable image extensions, it does not resize images or check user quotas. When importing photos, you need to make sure allow_url_fopen is enabled & max_execution_time is set high enough in your php.ini. The files are copied to the filesystem, not "uploaded", so you shouldn't have to monkey with the max memory or upload sizes.
  • Allows selecting an input format other than the default one for all imported posts.
  • Offers to remove empty tags - focus is to remove empty <p> & repeating <br> tags, but any empty tag will be removed
  • Offers to strip Microsoft garbage from posts that were pasted from MS Office; removes meta tags & any included inline styles.
  • Hides taxonomy fieldset if no categories found in any posts to be imported.
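
To illustrate the image handling described in the first item, here is a minimal sketch of the reference-rewriting step. It is written in Python for brevity and is not the patch code (the patch itself is PHP). The names ALLOWED_EXTENSIONS, DEST_DIR and rewrite_image_refs are hypothetical, and the extension list and destination directory are assumed values standing in for whatever IMCE would supply. The sketch only builds the list of files to copy; the actual copy is left out.

```python
import posixpath
import re
from urllib.parse import urlparse

# Assumed stand-ins for the values the patch reads from IMCE settings.
ALLOWED_EXTENSIONS = {"jpg", "jpeg", "png", "gif"}
DEST_DIR = "/files/imported"

# Captures the URL in src="..." on <img> tags and href="..." on <a> tags.
REF_PATTERN = re.compile(
    r'(<img\b[^>]*?\bsrc=|<a\b[^>]*?\bhref=)(["\'])(.*?)\2',
    re.IGNORECASE | re.DOTALL,
)


def is_image_url(url):
    """True if the URL path ends in one of the allowed image extensions."""
    ext = posixpath.splitext(urlparse(url).path)[1].lstrip(".").lower()
    return ext in ALLOWED_EXTENSIONS


def local_path(url):
    """Map a remote image URL to its (hypothetical) destination path."""
    return posixpath.join(DEST_DIR, posixpath.basename(urlparse(url).path))


def rewrite_image_refs(html):
    """Return (new_html, to_copy), where to_copy maps remote URL to local path."""
    to_copy = {}

    def replace(match):
        prefix, quote, url = match.groups()
        if not is_image_url(url):
            return match.group(0)  # leave ordinary links untouched
        to_copy[url] = local_path(url)
        return f"{prefix}{quote}{to_copy[url]}{quote}"

    return REF_PATTERN.sub(replace, html), to_copy
```

Because both the link and the image it wraps go through the same mapping, a file referenced twice ends up in to_copy only once.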

That's it for now. All of these enhancements are optional and off by default. The code is well commented for easier integration.
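
For anyone who wants a feel for the two cleanup options, the following is a rough Python sketch, not the patch code. The function names and the keyword flags are hypothetical; the flags default to off to mirror the opt-in behaviour described above. Collapsing repeated <br> tags into a single one, and treating "inline style" as both <style> blocks and style attributes, are assumptions on my part.

```python
import re

# An element whose content is empty or whitespace only, e.g. <p> </p>.
EMPTY_TAG = re.compile(
    r"<([a-zA-Z][a-zA-Z0-9]*)\b[^>]*>\s*</\1\s*>", re.IGNORECASE
)
# Two or more consecutive <br> tags, optionally separated by whitespace.
REPEATED_BR = re.compile(r"(?:<br\s*/?>\s*){2,}", re.IGNORECASE)
META_TAG = re.compile(r"<meta\b[^>]*>", re.IGNORECASE)
STYLE_BLOCK = re.compile(
    r"<style\b[^>]*>.*?</style\s*>", re.IGNORECASE | re.DOTALL
)
STYLE_ATTR = re.compile(
    r"""\s+style\s*=\s*(["']).*?\1""", re.IGNORECASE | re.DOTALL
)


def remove_empty_tags(html):
    """Drop empty elements (repeating until stable) and collapse <br> runs."""
    previous = None
    while previous != html:
        previous = html
        html = EMPTY_TAG.sub("", html)
    return REPEATED_BR.sub("<br />", html)


def strip_ms_markup(html):
    """Remove <meta> tags, <style> blocks and style="..." attributes."""
    html = META_TAG.sub("", html)
    html = STYLE_BLOCK.sub("", html)
    return STYLE_ATTR.sub("", html)


def clean_post(html, remove_empty=False, strip_ms=False):
    """Apply only the cleanups that were explicitly switched on."""
    if strip_ms:
        html = strip_ms_markup(html)
    if remove_empty:
        html = remove_empty_tags(html)
    return html
```

Running the empty-tag pass in a loop matters because removing an inner empty <p> can leave its parent empty as well.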

Attachment    Size
patch.txt     9.02 KB

#1

scafmac - March 30, 2009 - 15:35

I've found a couple of deficiencies in the patch and have a new patch. Basically they involve teasers. Currently the teasers are just copied from the import teaser field. As far as I can tell from the files I'm importing, they are all just exact duplicates of the bodies. Does anyone know if that is always the case?

This patch offers to rebuild the teaser the "drupal way" after all other clean up & photo importing is finished. This means it will use the Drupal core node_teaser API call.
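
As a rough idea of what rebuilding a teaser from the cleaned body involves, here is a simplified Python sketch. It is only loosely modeled on how node_teaser behaves (honor an explicit <!--break--> marker, otherwise trim to a length at a sensible boundary) and is not Drupal's implementation. The name build_teaser, the list of break points and the default size are assumptions for illustration.

```python
BREAK_MARKER = "<!--break-->"
# Candidate cut points, tried in order of preference (assumed list).
BREAK_POINTS = ("</p>", "<br />", "<br>", "\n", ". ")


def build_teaser(body, size=600):
    """Return a teaser for body, at most roughly size characters long."""
    # An explicit marker always wins.
    marker_at = body.find(BREAK_MARKER)
    if marker_at != -1:
        return body[:marker_at]

    # A size of 0 means "no limit"; short bodies are returned whole.
    if size == 0 or len(body) <= size:
        return body

    # Otherwise cut at the last preferred break point inside the limit.
    head = body[:size]
    for point in BREAK_POINTS:
        cut = head.rfind(point)
        if cut != -1:
            return head[: cut + len(point)].rstrip()
    return head  # no break point found: hard cut
```

Running this after the cleanup and image steps means the teaser is built from the final body rather than from the duplicate that came with the import.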

Another problem that isn't addressed in the patch is links that do not point directly at the image file. Some blogs I'm importing link directly to a larger version of an image, and some link to "...image.html?images/photo/year/month/imagename.jpg" or something like that. Those links are not updated correctly. The images are still imported because, at least in all of the ones I checked, the image is the exact same one that is used as the src in the following img tag. So the image is imported due to the img tag, but the a->href isn't updated. Although this is a bug, it is a "good" one because it prevents the image from being imported twice.
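
One possible way to handle these indirect links, sketched in Python and not part of the patch: look for an image file name in either the path or the query string of the href, and compare it with the src of the img inside the link. The function names and the example URLs in the usage note are hypothetical, and the extension list is an assumption.

```python
import posixpath
from urllib.parse import urlparse

IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png", ".gif")  # assumed list


def image_name_in_link(href):
    """Return the image file name an href refers to, or None."""
    parts = urlparse(href)
    # A direct link carries the name in the path; a viewer page such as
    # image.html?... carries it in the query string.
    for candidate in (parts.path, parts.query):
        name = posixpath.basename(candidate)
        if name.lower().endswith(IMAGE_EXTENSIONS):
            return name
    return None


def link_matches_image(href, img_src):
    """True if href and img_src appear to refer to the same image file."""
    name = image_name_in_link(href)
    src_name = posixpath.basename(urlparse(img_src).path)
    return name is not None and name == src_name
```

With a made-up link such as http://example.typepad.com/blog/image.html?images/2009/03/cat.jpg wrapping an img whose src ends in images/2009/03/cat.jpg, link_matches_image returns True, so the href could be pointed at the copy that was already imported without copying the file a second time.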

This patch replaces the previous one - it contains all of the enhancements.

Attachment    Size
patch.txt     9.6 KB
 
 
