Image handling - Image nodes? Imagefield integration?
| Project: | Import HTML |
| Version: | 5.x-1.2 |
| Component: | Code |
| Category: | feature request |
| Priority: | normal |
| Assigned: | Unassigned |
| Status: | active |
Hi Dan,
During my latest experiments with Import HTML, I ended up quite unhappy with the image handling. Basically, I don't like the idea of a bunch of files ending up without "proper" file-to-node relations. Since this has IMHO been a weak spot in Drupal for years, I'm not even sure yet what exactly I want, beyond being able to re-use the images in a sensible way after the import (e.g. tag the images with categories and re-order them in dynamically created image galleries, or expose them to users through something like image assist).
Maybe other users of the Import HTML module are willing to share their experiences or some kind of "best practice" recommendations?
Basically I'd like to be able to get the images into Image nodes, or at least into CCK imagefields (not sure yet which road I'll take). Because of the different handling of directory structures (flat/hierarchical file storage), this is not trivial, and it would add dependencies to at least one or two other modules like Image plus something like Uploadpath - or - Imagefield plus {whatever}. However, it could add considerable value to the Import HTML module.
Have you ever thought about adding image handling to Import HTML? I'm not sure how much work this would be, but if others also jump in, I'd like to (partially) sponsor the development.
Thanks and greetings,
-asb

#1
I'm a huge fan of proper image handling, and usually support the images-as-nodes approach.
And yes, dropping the imported images in a big bag is not a full solution. Still, it's no worse than the original source data - it's a direct clone of the semantic structure (or lack thereof). Actually improving your content is a task beyond just importing.
Initially I DID consider, and code, a method to attach embedded images via the files table. This was wrong, as Drupal did not allow multiple ownership and re-use of those files! Deleting one node with attachments removed the image for everyone else. And I couldn't actually see the advantage anyway, so I stopped trying to do that.
Image.module's images-as-nodes approach, however, was not appropriate for most legacy sites I tested on. The embedded images identified were always just a single resolution, and had no distinguishable metadata or other reason to be treated as nodes.
There are sites that may benefit more from image nodes, but I have used a combination of import_html and image_import in those cases.
I don't work with imagefield much myself, but if it actually supports image re-use, it could be worked in, I guess.
Due to the batch handling of import_html, the images and the files are uploaded independently of each other, in either order. Pages are not actually parsed for their resources, and all resources found are just put in an expected location. The page and resource just meet up again later.
Creating a semantic connection between page and resource would be hard under the current structure, and would make it difficult to process subdirectories in batches. Each file is processed atomically.
HOWEVER a separate filter process/module could be added that actually analyzes the HTML references, checks the existence of the referred files and adds a connection of some sort. import_html tries to do too much already. Developing this feature as a support/clean-up utility makes more sense to me.
I've already written several things very similar in a filter
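To make the clean-up idea concrete, here is a rough sketch of what such a post-import pass might look like. It is written in Python purely for illustration; it is not import_html code and not a Drupal API. The names `ImageRefCollector`, `find_image_links` and `files_root` are all made up for this example, and it assumes image references are site-relative paths under a single files directory. It only reports which referenced images exist; how the "connection of some sort" gets stored is left open.

```python
import os
from html.parser import HTMLParser
from urllib.parse import urlparse, unquote


class ImageRefCollector(HTMLParser):
    """Collect the src attribute of every <img> tag in a page body."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing tags such as <img ... />.
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def find_image_links(html, files_root):
    """Split the local image references in `html` into found and missing.

    `files_root` is a hypothetical directory where the import put its
    resources. External URLs (anything with a scheme or host) are skipped.
    """
    collector = ImageRefCollector()
    collector.feed(html)

    found, missing = [], []
    for src in collector.sources:
        parsed = urlparse(src)
        if parsed.scheme or parsed.netloc:
            continue  # external or data: reference, nothing to check locally
        relative = unquote(parsed.path).lstrip("/")
        path = os.path.normpath(os.path.join(files_root, relative))
        if os.path.isfile(path):
            found.append(src)
        else:
            missing.append(src)
    return found, missing
```

Running something like this over each imported page after the batch has finished would give a list of page-to-image pairs that a later step could turn into image nodes or imagefield values, without touching the import itself.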
Good thoughts...