Why is the ID column a varchar and not an integer (surely int would be more efficient)?

I was going to ask why it was called id - but then I noticed it was joint with the type column which provides the option for tid's, vid's AND nid's. I see no oppertunity where its going to be a character though, surely?!

I assume no lookups are done purely on the ID itself? I assume a filter is done on the type first.

Comments

Robrecht Jacques’s picture

Currently the following is stored in the type and id columns of the nodewords table:
- "node" + nid for meta tags associated with node pages,
- "term" + tid for meta tags associated with term pages,
- "vocabulary" + vid for meta tags associated with vocabulary pages.

In 4.7.x-dev currently there is support for the panels module too (this is not released yet as it needs a patch to panels.module, when the author applies it, I'll release 4.7.x-1.1).
In that case, "panels" + did (the panel id) is stored for panel pages.

You are right, these are all integers instead of varchars. The reason varchar was chosen is because I envisionned supporting any page. The "id" of a page is then the "path". Currently there is only support for one such case: the front-page, in which case "page" + "" (the 'id' of the front-page is an empty string) is stored in the table.

I'm working on support for views.module too (not yet in 4.7.x-dev, but probably will be this week). When asked whether "views" + vid (the views id) is a good way to store the meta tags, the author of views.module said it was not and that I should use the views path instead.

So, basically, there are now two reasons to keep varchar as id:
- using "page" + "path" for any page (that can not be identified in any other way) - a planned feature on medium to long term,
- using "views" + "path" for views.module (because that's what the author advised) - a planned feature on short term.

The detection mechanism to know on which page we are is implemented in _nodewords_detect_type_and_ids() which looks at the path. The meta tags itself are loaded with one query on type and id. The nodewords table is not joined with any other table.

Any thoughts on that appreciated. If you know a db scheme that would be more efficient db wise, I'm happy to hear it.

Now that I think of it, it may be more efficient to store "node/nid" in the nodewords table instead of/in addition to an nid (same with term, vocabulary, panels, views, any page, ...) and query immediately on $_GET['q']. It would make the detection more straightforward.

nicholasthompson’s picture

Interesting points - and thanks for that explanation.

I can see a VERY valid case for using the whole path (the system path, obviously). There might need to be a little research into the most efficient index size so that the index doesn't end up doing a 1:1 list of the column (in that case, its no quicker to use the index than to do a scan of the column for the Database - AFAIK).

I am also incredibly pleased to hear about the potential Views integration! :-)

BioALIEN’s picture

Version: 4.7.x-1.0 » 4.7.x-1.x-dev

I think this is the right place to discuss the database structure.

Having upgraded a site to Drupal5.x and the 5.x-1.0 of this module I decided to test its speed with the Devel module. While everything runs smoothly, I'm getting strange readings. This module seems to be firing the SQL queries twice for every page view. This is also slowing down the site when tagging 250+ nodes with this module.

I will post the Devel logs later today if it helps but Robrecht I think the database definitely needs some optimisations.

R.Muilwijk’s picture

Version: 4.7.x-1.x-dev » 5.x-1.9

ID is still a varchar. When looking through the code I don't see any functionality which needs it in the module and it could just be a integer.

Leeteq’s picture

Title: Database structure » Meta tags tables structure

I assume that since we are still on 1.x, there has not been any major changes to the tables structure yet (since 5.0)?

I have not investigated, so can anyone confirm #3: "firing the SQL queries twice for every page view", and if so, give a brief rationale for doing double queries?. Practical to know.

Robrecht Jacques’s picture

There is no rationale for double queries - indeed they should not be present. Did you enable the Meta tags block? This might explain it (in this case the query should be cached). If you did not enable the block I'm unsure what triggers the double queries... I'll investigate.

The database structure has not changed. The reason the ID is VARCHAR is:

So, basically, there are now two reasons to keep varchar as id:
- using "page" + "path" for any page (that can not be identified in any other way) - a planned feature on medium to long term,
- using "views" + "path" for views.module (because that's what the author advised) - a planned feature on short term.

avpaderno’s picture

Status: Active » Closed (won't fix)

As the Drupal 5 version is not supported anymore, I am changing the status of this report.

This report is for an old Drupal 5 version; I would suggest to install version 5.x-1.12, which is the latest official release.