Hello,

I have just installed Drupal 4.7 on a Linux server and created several sections and subsections, taxonomy terms and so on.

Now, I have a *lot* of already existing articles for this website on my hard disk, in HTML format, one file per article. The question is how can I insert all of them in Drupal with a shell/mysql/python/perl/whatever script?

Basically I want to make a single tar file of all the HTML pages plus the script, upload it to the server, launch the script via ssh and have all articles inserted in the right section/subsection, with the correct creation date (that of the original HTML file, not the date when the script was run), title, teaser, taxonomy tags, etc.
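As a starting point for the "correct creation date, title and teaser" part, here is a minimal Python sketch of how that information could be pulled out of each file before anything is sent to Drupal. It is only an illustration under assumptions: the function name `describe_article`, the 300-character teaser length and the choice of the file's modification time as the "creation date" are all hypothetical, and the step that actually creates the node is not shown.

```python
import os
from html.parser import HTMLParser


class _TitleAndText(HTMLParser):
    """Collect the <title> text and the visible text of a page."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.text = []
        self._in_title = False
        self._skip = 0  # depth inside <script>/<style>

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if self._in_title:
            self.title += data
        elif not self._skip:
            self.text.append(data)


def describe_article(path, teaser_chars=300):
    """Return title, teaser, timestamp and raw HTML for one file (hypothetical helper)."""
    with open(path, encoding="utf-8", errors="replace") as fh:
        html = fh.read()
    parser = _TitleAndText()
    parser.feed(html)
    # Collapse runs of whitespace so the teaser is one clean line.
    body_text = " ".join(" ".join(parser.text).split())
    return {
        "title": parser.title.strip(),
        "teaser": body_text[:teaser_chars],
        # Assumption: the file's modification time stands in for the creation date.
        "created": int(os.path.getmtime(path)),
        "body": html,
    }
```

Note that the modification time only survives the upload if the archive preserves it (tar stores it by default), so the files should be extracted rather than copied one by one.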

One thing that is not mandatory, but would of course be a huge time saver, is smart handling of internal links. Several of these HTML files refer to each other, like "to know more, read this page (= file:///main/some_section/some_subsection/specific_article)". Could the script also make sure that all the resulting Drupal nodes link correctly to each other?
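One way to approach the internal links is a second pass: once every article has been imported and its node ID is known, rewrite the `file://` links to point at the matching nodes. The Python sketch below assumes a hypothetical `path_to_node` dictionary built during the import, assumes links are written as `href="file://..."` with double quotes, and assumes nodes are reachable as `/node/N` (clean URLs enabled); adjust all three to the real site.

```python
import re

# Matches href="file:///some/path" with an optional #fragment.
_FILE_LINK = re.compile(r'href="file://(/[^"#]*)(#[^"]*)?"')


def rewrite_internal_links(html, path_to_node):
    """Replace file:// links with /node/N links (hypothetical helper).

    path_to_node maps a local path such as "/main/a/b/article"
    to the ID of the node created from that file.
    """
    def swap(match):
        local_path = match.group(1)
        fragment = match.group(2) or ""
        nid = path_to_node.get(local_path)
        if nid is None:
            return match.group(0)  # unknown target: leave the link untouched
        return 'href="/node/%d%s"' % (nid, fragment)

    return _FILE_LINK.sub(swap, html)
```

Links whose target was not imported are left as they are, so they can be found and fixed by hand afterwards.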

I have no problem modifying an existing script or coding any missing parts myself, but of course I'd really like to start with a working example, or at least to know exactly what I should study to do this job.

Scripts, explanations, tricks and comments on how to pre-process the HTML files before inserting them, so that they are reliably Drupal/CSS compatible, would also be great. Consider that many of these HTML files were generated by OpenOffice, by converting existing OpenOffice files that use paragraph styles with the "Save as" or "Export" functions.
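For the pre-processing step, a rough Python sketch of one possible clean-up is shown below: keep only what is inside `<body>`, drop `<font>` and `<span>` wrappers, and strip presentational attributes so the site's own CSS takes over. The function name `clean_export` and the list of attributes to remove are assumptions, and regular expressions are a blunt tool for HTML, so the result should be checked on a few real files before it is trusted.

```python
import re


def clean_export(html):
    """Strip word-processor styling from an exported page (hypothetical helper)."""
    # Keep only the content of <body>, if there is one.
    body = re.search(r"<body[^>]*>(.*)</body>", html, re.I | re.S)
    if body:
        html = body.group(1)
    # Remove <font> and <span> tags but keep the text they wrap.
    html = re.sub(r"</?(font|span)\b[^>]*>", "", html, flags=re.I)
    # Remove presentational attributes (assumed list; extend as needed).
    html = re.sub(r'\s+(style|class|lang|align)="[^"]*"', "", html, flags=re.I)
    return html.strip()
```

If the paragraph styles carry meaning (for example a "quote" style), it may be better to map those class names to plain tags before stripping them.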

Again, even a step by step flow list in normal language of what the script(s) should do would be great.

Thanks in advance for any feedback,

O.

Comments

jakeg’s picture

I'd suggest downloading devel.module (only available as CVS/HEAD) which comes with some mass-content-generation PHP scripts. Copy them and hack away at them to meet your requirements. It'd be web based rather than command line.

Basically, you just define your $node->values in a loop, with a node_save() at the end of each iteration. You can read in your existing files using PHP's various file-related functions, such as file().

Jake
---
School and university yearbooks and Drupal web services, London

Christoph C. Cemper’s picture

just created a little script that works here

http://drupal.org/node/68153#comment-127897

Christoph C. Cemper’s picture

I just came across the same problem here http://drupal.org/node/68153

do you have a good solution already?

I have already worked out your "step by step flow list in normal language of what the script(s) should do".

ortles’s picture

Cemper,

I have looked at your script, but that is only to update the teasers, isn't it?

How do you upload the content/ create the node in the first place?

No, I haven't found a solution yet. I was only able to get the domain for the site that needs this feature last week, and I am still doing basic Drupal settings.

What I would need is something that reads a .txt file like this:

ARTICLE_CATEGORIES="travel/england/london, food/foreign/restaurants"
TEXT="path/to/some/html/file/on/my/computer"
CREATED_ON_DATE="YYYY/MM/DD"
AUTHOR="John Doe"

etc etc.... (other variables like, for example, "create_a_forum_for_this_node")

and uploads the node to the server, just as if I had inserted everything by hand
in the proper Drupal form.
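Reading a descriptor file like the one above is the easy part. The Python sketch below parses the `KEY="value"` lines into a dictionary, splits the categories into their path components and converts the date. The function name `parse_descriptor` is hypothetical, and the date is assumed to be a real value in the YYYY/MM/DD layout (the literal placeholder "YYYY/MM/DD" would raise an error).

```python
import re
from datetime import datetime

# One KEY="value" pair per line, keys in capitals with underscores.
_LINE = re.compile(r'^([A-Z_]+)="(.*)"\s*$')


def parse_descriptor(text):
    """Turn a KEY="value" descriptor into a dict (hypothetical helper)."""
    fields = {}
    for line in text.splitlines():
        match = _LINE.match(line.strip())
        if match:
            fields[match.group(1)] = match.group(2)
    # "travel/england/london, food/..." -> [["travel", "england", "london"], ...]
    cats = fields.get("ARTICLE_CATEGORIES", "")
    fields["ARTICLE_CATEGORIES"] = [
        c.strip().split("/") for c in cats.split(",") if c.strip()
    ]
    if "CREATED_ON_DATE" in fields:
        fields["CREATED_ON_DATE"] = datetime.strptime(
            fields["CREATED_ON_DATE"], "%Y/%m/%d"
        ).date()
    return fields
```

Lines that do not match the pattern are silently ignored here; a stricter script might want to report them instead.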

How can I do this? Do you have a solution for this part?

TIA,
O.

andybold’s picture

If your blogapi is set to MovableType, then maybe mtsend.py would fit the bill.

Find it at http://scott.yang.id.au/2002/12/mtsendpy/

I have used it in the past to post to a Drupal site. For reasons that I forget (I think because I started to use the Performancing plugin for Firefox), I changed my blogapi settings to MetaWeblog, so I haven't used mtsend.py for a while.

Worked great while it lasted though. I had a shell script that would fire up 'vi' with a template, and when I quit 'vi' the script gave me the option of posting with mtsend.py, or saving as a local draft.

Anyway - might be useful for you.
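For anyone who would rather script the blogapi route directly than go through mtsend.py, here is a minimal Python sketch using the standard `xmlrpc.client` module and the MetaWeblog `newPost` call. Several things are assumptions to verify against the actual site: the endpoint URL passed in, what the site expects as `blog_id`, and whether the server honours a `dateCreated` value supplied by the client. The helper names `build_post` and `send_post` are hypothetical.

```python
import xmlrpc.client


def build_post(title, body_html, created=None):
    """Build the struct sent to metaWeblog.newPost (hypothetical helper)."""
    post = {"title": title, "description": body_html}
    if created is not None:
        # Assumption: the server may or may not respect this date.
        post["dateCreated"] = xmlrpc.client.DateTime(created)
    return post


def send_post(endpoint, blog_id, user, password, post, publish=True):
    """Send one post over XML-RPC and return the server's reply."""
    server = xmlrpc.client.ServerProxy(endpoint)
    return server.metaWeblog.newPost(blog_id, user, password, post, publish)
```

A loop over the local HTML files would call `build_post` once per file and pass the result to `send_post`; nothing is sent over the network until `send_post` is called.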