I was recently contracted by a friend of mine to fix a website which was 'built' overseas in India. (http://www.probeinternational.org/)

Much to my surprise the website was actually a cleverly hidden rip-off of this Open Source software. I don't know what they've done to the coding of the site, or if they have even changed anything aside from removing almost every reference to this website, but the site is a total cluster-fuck. Aside from the obvious design flaws, there are all kinds of irrelevant data types inside the Drupal system, and the site just doesn't work smoothly. There are broken links, and pages displaying incorrectly. Some articles are one data type and other articles are another for no apparent reason. This causes all kinds of weird issues when trying to add content. For some Categories you have to use one content type and for others you have to use a different one, even though you're still just adding an article. Some Categories will display the Latest Article and others won't; on some pages I can view all the articles, and on other pages I get a 404 error. I can't comprehend why the site functions the way it does, and I can't make heads or tails of how they are organizing things. It's...just...fucked.

Here's an example of all the data types I feel are redundant:

Probe Alerts
    Notices of special news or reports, appeals for writing to MPs, activists jailed etc.
China stories
    Narratives associated with locations

Page
    If you want to add a static page, like a contact page or an about page, use a page.
Page without a title
    Same as a page, but with no title.

News and Opinion
    PI news content type
PI in the News
    Web version of the PI in the News mailing list
Sources
    Publications such as book, articles, reports etc. 
Story
    Stories are articles in their simplest form: they have a title, a teaser and a body, but can be extended by other modules. The teaser is part of the body too. Stories may be used as a personal blog or for news articles.

Oral History
Oral page
Oral History NEW

These data types are littered all over the place. Articles all under the same section might have been added as a Page, or a Page without a title, or a News and Opinion, or a Source, and this is causing all kinds of weird problems.

I won't even get into the Categories, because I don't even know what some of them are or mean. What I want to do is simplify everything, and use the nodes appropriately. I know how I want to organize the nodes, and I know what kind of content types I will need. I just want one content type for articles, one content type for a static page, etc.

How difficult is it going to be to amalgamate these content types? Is this even possible? Since the data structure is the same for most of the content types, I figured I could run an SQL query that changes them all to the same content type, but I don't have access to the webspace yet (I just want to get a head start on things here). Or is there a better way to do a batch conversion of content types?
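Something along these lines is roughly what I have in mind. It is only an untested sketch: SQLite stands in for the site's real database, the node table is cut down to a few columns, and the type names (news_opinion, source, and so on) are made up, since I can't see the real machine names yet.

```python
import sqlite3

# Hypothetical machine names; the real ones would come from the site's node.type column.
OLD_TYPES = ("news_opinion", "pi_in_the_news", "source", "story")
NEW_TYPE = "article"


def merge_types(conn, old_types, new_type):
    """Point every node of the old types at one new type; return rows changed."""
    placeholders = ", ".join("?" for _ in old_types)
    cur = conn.execute(
        f"UPDATE node SET type = ? WHERE type IN ({placeholders})",
        (new_type, *old_types),
    )
    conn.commit()
    return cur.rowcount


# Throwaway in-memory table with a simplified version of the node columns.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE node (nid INTEGER PRIMARY KEY, vid INTEGER, type TEXT, title TEXT)"
)
conn.executemany(
    "INSERT INTO node VALUES (?, ?, ?, ?)",
    [
        (1, 1, "news_opinion", "Dam update"),
        (2, 2, "source", "A report"),
        (3, 3, "page", "About us"),
    ],
)

changed = merge_types(conn, OLD_TYPES, NEW_TYPE)
types = [row[0] for row in conn.execute("SELECT type FROM node ORDER BY nid")]
print(changed, types)
```

Obviously this would only be safe for types that share the same fields, and I'd run it on a copy of the database first.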

Any suggestions would be appreciated. I really feel terrible for this organization. They spent thousands upon thousands of dollars for a bunk website that was basically just stolen and made worse. Now they have to pay me to fix it, and I would like to be able to do this as quickly and cheaply as I can for them.

Comments

zlex

I forgot to mention that I plan on starting with a fresh version of Drupal, and just working off the old backend. I'm sure I will have to convert the db to work with the new version, but I don't know what they've done to the software, if anything. So I'd rather start fresh and clean.

dnewkerk

You mentioned that the content types (for the most part at least) do not have any custom fields... which is a good thing (for simplifying this). I'm not an expert and I'm sure more developers will fill in the blanks better, however I've done a manual database import of one of my sites from a custom system into Drupal, so I have some experience.

The "guts" of every node of every type (besides custom CCK fields/data) is stored in a combination of the node table and the node_revisions table. The node table has "almost" everything, but the body field of each node is in node_revisions. This is because Drupal has a wiki-like revisions system for content, so if it is enabled for a given content type under the Workflow > Create new revision setting, you can go back to previous edits. The most current node title is kept in both node and node_revisions. You can use the nid (node ID) and vid (revision ID, aka version ID) fields to join the node/node_revisions tables to get the correct set of current data from both.
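A minimal sketch of that join, run against a throwaway SQLite database so it can be tried anywhere. The two tables are cut down to the columns mentioned above (the real ones have more), and the sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE node (nid INTEGER PRIMARY KEY, vid INTEGER, type TEXT, title TEXT);
CREATE TABLE node_revisions (vid INTEGER PRIMARY KEY, nid INTEGER, title TEXT, body TEXT);
-- One node with two revisions; node.vid points at the current one.
INSERT INTO node VALUES (1, 2, 'story', 'Second draft');
INSERT INTO node_revisions VALUES (1, 1, 'First draft', 'old body');
INSERT INTO node_revisions VALUES (2, 1, 'Second draft', 'current body');
""")

# Joining on both nid and vid keeps only the current revision of each node.
CURRENT_SQL = """
SELECT n.nid, n.type, n.title, r.body
FROM node n
JOIN node_revisions r ON r.nid = n.nid AND r.vid = n.vid
ORDER BY n.nid
"""

current_rows = conn.execute(CURRENT_SQL).fetchall()
print(current_rows)
```

Joining on nid alone would return one row per revision, which is why vid is part of the condition.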

Since I'm not a very good programmer yet and don't know how to use the Drupal API to process data and programmatically make new nodes with it (e.g. node_save()), what I would personally do in your case (particularly since the content types are simple) would be to write a number of SQL queries to directly extract the desired data out of the old site's database, and insert it into the new site's node and node_revisions tables.
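As a rough illustration of the extract-and-insert idea (not the exact queries from my own conversion), the sketch below copies the current revision of each node from an "old" database into a "new" one and renames content types on the way. Assumptions: SQLite stands in for MySQL, the schema is simplified to a handful of columns, the type names in TYPE_MAP are hypothetical, and node IDs are kept as-is.

```python
import sqlite3

# Simplified stand-in schema; the real tables have more columns (uid, status, created, ...).
SCHEMA = """
CREATE TABLE node (nid INTEGER PRIMARY KEY, vid INTEGER, type TEXT, title TEXT);
CREATE TABLE node_revisions (vid INTEGER PRIMARY KEY, nid INTEGER, title TEXT, body TEXT);
"""

# Hypothetical old type -> new type mapping; unmapped types are copied unchanged.
TYPE_MAP = {"news_opinion": "article", "source": "article", "page_no_title": "page"}


def copy_current_nodes(old, new, type_map):
    """Copy each node's current revision from old to new, renaming types."""
    rows = old.execute(
        "SELECT n.nid, n.vid, n.type, n.title, r.body "
        "FROM node n JOIN node_revisions r ON r.nid = n.nid AND r.vid = n.vid"
    ).fetchall()
    for nid, vid, old_type, title, body in rows:
        new_type = type_map.get(old_type, old_type)
        new.execute("INSERT INTO node VALUES (?, ?, ?, ?)", (nid, vid, new_type, title))
        new.execute(
            "INSERT INTO node_revisions VALUES (?, ?, ?, ?)", (vid, nid, title, body)
        )
    new.commit()
    return len(rows)


old = sqlite3.connect(":memory:")
old.executescript(SCHEMA)
old.executescript("""
INSERT INTO node VALUES (1, 1, 'news_opinion', 'Dam update');
INSERT INTO node VALUES (2, 3, 'page_no_title', 'Contact');
INSERT INTO node_revisions VALUES (1, 1, 'Dam update', 'body one');
INSERT INTO node_revisions VALUES (2, 2, 'Contact (old)', 'stale');
INSERT INTO node_revisions VALUES (3, 2, 'Contact', 'body two');
""")
new = sqlite3.connect(":memory:")
new.executescript(SCHEMA)

copied = copy_current_nodes(old, new, TYPE_MAP)
result = new.execute(
    "SELECT n.nid, n.type, r.body FROM node n "
    "JOIN node_revisions r ON r.vid = n.vid ORDER BY n.nid"
).fetchall()
print(copied, result)
```

Older revisions are deliberately left behind here; if the revision history matters, the SELECT would need to pull every node_revisions row instead of only the current one.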

In case it's helpful, I logged the steps and SQL queries needed when I converted my custom site to Drupal. They are specific to Drupal 5, but are "almost" the same for Drupal 6. I'll likely be updating them also to work with D6 in the next few days. Alternately there's also the Node Import module which might be useful (just export the needed data of the old site into a CSV file to import it with the module on the new site).
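For the CSV route, the export itself can be quite small. This is a sketch under the same assumptions as before (SQLite instead of the real database, simplified tables, invented sample data). The column headers are my own choice for illustration, so check what the Node Import module expects before relying on them.

```python
import csv
import io
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE node (nid INTEGER PRIMARY KEY, vid INTEGER, type TEXT, title TEXT);
CREATE TABLE node_revisions (vid INTEGER PRIMARY KEY, nid INTEGER, title TEXT, body TEXT);
INSERT INTO node VALUES (1, 1, 'story', 'Dam update');
INSERT INTO node_revisions VALUES (1, 1, 'Dam update', 'First line, then more.');
""")


def export_nodes_csv(conn, out):
    """Write the current revision of every node to out as CSV; return the row count."""
    writer = csv.writer(out)
    writer.writerow(["nid", "type", "title", "body"])  # header names are an assumption
    count = 0
    for row in conn.execute(
        "SELECT n.nid, n.type, n.title, r.body "
        "FROM node n JOIN node_revisions r ON r.nid = n.nid AND r.vid = n.vid "
        "ORDER BY n.nid"
    ):
        writer.writerow(row)  # the csv module quotes commas and newlines in the body
        count += 1
    return count


buffer = io.StringIO()
exported = export_nodes_csv(conn, buffer)

# Read it back to confirm the body survived the round trip intact.
parsed = list(csv.reader(io.StringIO(buffer.getvalue())))
print(exported, parsed)
```

To write a real file, pass open("nodes.csv", "w", newline="", encoding="utf-8") instead of the StringIO buffer.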

There are also very helpful table references in the book Pro Drupal Development. There's an eBook version available if you'd like it right away. The 1st edition covers Drupal 5, and the 2nd edition is Drupal 6. Since the current site you're working on is Drupal 5 (according to the changelog.txt) I'd suggest probably going with an initial Drupal 5 setup, and then just upgrading via the normal process to Drupal 6 once you're done importing the data.

As you mentioned, definitely start with a fresh copy of Drupal's files. The other developers likely modified the originals in who knows what ways, which will mean unforeseeable problems down the line. I also wouldn't try to salvage the original database (just get the data out into a clean database).

If you need to "mass categorize" nodes into your own taxonomy terms after importing, there are a few modules that could help with that. Possibly Views Bulk Operations, or Taxonomy Multi Editor (maybe others too). Or if there is some sort of taxonomy on the old site categorizing things, you could pull out the term ID along with each node so you can put it back (or map it back) to a taxonomy term on the new site. I have an SQL example in my code for this, or Node Import module can probably help handle this.
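The "map it back" idea could look something like the sketch below. It is not the SQL from my own notes, just an illustration: it assumes a term_node table with nid and tid columns (which is how I understand Drupal 5 stores node-to-term links), uses SQLite as a stand-in, and the old-to-new term IDs in TID_MAP are invented.

```python
import sqlite3

TERM_NODE_SCHEMA = "CREATE TABLE term_node (nid INTEGER, tid INTEGER);"

# Hypothetical mapping from old-site term IDs to new-site term IDs.
# Two old terms (10 and 11) are folded into one new term here.
TID_MAP = {10: 1, 11: 1, 12: 2}


def remap_terms(old, new, tid_map):
    """Copy node/term links to the new site using tid_map; return unmapped links."""
    pairs = set()  # a set avoids duplicate links when old terms are merged
    skipped = []
    for nid, tid in old.execute("SELECT nid, tid FROM term_node"):
        if tid in tid_map:
            pairs.add((nid, tid_map[tid]))
        else:
            skipped.append((nid, tid))
    new.executemany("INSERT INTO term_node VALUES (?, ?)", sorted(pairs))
    new.commit()
    return skipped


old = sqlite3.connect(":memory:")
old.executescript(TERM_NODE_SCHEMA)
old.executemany(
    "INSERT INTO term_node VALUES (?, ?)", [(1, 10), (1, 11), (2, 12), (3, 99)]
)
new = sqlite3.connect(":memory:")
new.executescript(TERM_NODE_SCHEMA)

skipped = remap_terms(old, new, TID_MAP)
links = new.execute("SELECT nid, tid FROM term_node ORDER BY nid, tid").fetchall()
print(links, skipped)
```

The skipped list is worth reviewing by hand, since those are nodes whose old category has no home on the new site yet.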

Good luck!

-- David
davidnewkerk.com | absolutecross.com
View my Drupal lessons & guides