Hi all.
I'm writing this post to share a solution I've found for generating a standards-compliant Google Sitemap for multilanguage Drupal 6.x and 7.x sites.
Because I was not satisfied with any of the Drupal Google Sitemap modules, I've written a script that you can simply upload to the root of your site. It automatically generates a Google sitemap with an entry for every menu link and every node, adding the language prefix where needed.
The main difference from the other scripts I've seen is that I've followed the database approach instead of scanning pages link by link like a crawler: I match the "node" and "menu" Drupal database tables against "url_alias". For example, this is one resulting entry:

<url>
<loc>
http://www.yoursite.org/it/this-is-a-test-content/
</loc>
<lastmod>2009-05-20</lastmod>
<changefreq>weekly</changefreq>
<priority>0.8</priority>
</url>

The "node/NID" path is used when a piece of content has no URL alias, and the "/en/" prefix is not inserted for English, the default language.
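
As a rough illustration of the rule above, here is a small sketch in Python rather than the script's PHP. The function name, its parameters, and the trailing slash on unaliased "node/NID" paths are assumptions made for the example, not taken from the script.

```python
from xml.sax.saxutils import escape


def build_url_entry(base_url, nid, alias=None, language="en",
                    default_language="en", lastmod=None,
                    changefreq="weekly", priority="0.8"):
    """Return one <url> element as a string (hypothetical helper)."""
    # Fall back to the internal "node/NID" path when there is no URL alias.
    path = alias if alias else "node/%d" % nid
    # Only non-default languages get a prefix such as "it/".
    prefix = "" if language == default_language else language + "/"
    # Strip stray slashes so the pieces never join with a double slash.
    loc = "%s/%s%s/" % (base_url.rstrip("/"), prefix, path.strip("/"))
    lines = ["<url>", "<loc>%s</loc>" % escape(loc)]
    if lastmod:
        lines.append("<lastmod>%s</lastmod>" % lastmod)
    lines.append("<changefreq>%s</changefreq>" % changefreq)
    lines.append("<priority>%s</priority>" % priority)
    lines.append("</url>")
    return "\n".join(lines)


print(build_url_entry("http://www.yoursite.org", 12,
                      alias="this-is-a-test-content",
                      language="it", lastmod="2009-05-20"))
```

Calling it without an alias and with the default language would give a "node/12" location with no language prefix.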

I've set "priority" to 0.8 and "changefreq" to weekly for nodes, and "priority" to 0.9 and "changefreq" to monthly for menu links, but you can easily change these values in the source code.

Please note that this is not intended as a Drupal module but as a simple script; maybe in the future I'll work on releasing a module, but for the moment I hope that this script can help anyone who, like me, needs to submit a multilanguage sitemap for a Drupal site to Google.

I've set up the script to connect to the database automatically, so you just have to upload it to the root of your site and open it in the browser.

If you add this line before the last RewriteRule in the ".htaccess" file, you can submit "http://www.yoursite.org/sitemap.xml" to Google.

 RewriteRule ^sitemap\.xml$ sitemap.php [L]

Note that this script doesn't work with PostgreSQL databases; it is written for MySQL only.

Any help and feedback is greatly appreciated.

Thank you.

Download Multilanguage Drupal Google Sitemap Generator script

Comments

milksamsa’s picture

Awesome job!
You should include the predefined namespace, though, as suggested by Google; otherwise you'll get a warning in Google Webmaster Tools:

<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">

bobighorus’s picture

In the current version of the script I've used "<urlset xmlns="http://www.google.com/schemas/sitemap/0.9">" as Google suggests, but I'll run a test to look into your issue.
Thank you for your note.

avpaderno’s picture

Join forces with others clearly states:

Do not try to duplicate functionality because "you don't really like how it's done there". That only adds clutter. Work to improve an existing module rather than introduce yet another random module, as that leads to confusion and frustration for the development, support and end user communities.

This means you should not create another project that does the same thing done by XML sitemap.

bobighorus’s picture

Thanks for your contribution; I really appreciate it.
But, as you can read above, this is not a module: it's just a Drupal script that I hope could be useful to other Drupal community users, in order to "Join forces with others".
Since I needed to generate a Google sitemap suitable for a multilanguage site, I wrote this code myself; none of the other modules would work.
You wrote: "you should not create another project that does the same thing done by XML sitemap"; yes, it's true; in fact, the "XML sitemap" module seems to have some problems with multilanguage sites.
Thank you for your attention.

avpaderno’s picture

I was referring to what you wrote:

maybe in the future I'll work on releasing a module

The meaning of another project that does the same thing done by XML sitemap is that another project that creates and outputs the content of a sitemap in XML format is considered a duplicate of XML sitemap. If XML sitemap has some bugs, that is not a sufficient reason for somebody to create another project that creates an XML sitemap starting from the node data contained in the Drupal core database tables.

bobighorus’s picture

OK. In the future maybe I'll work on making XML Sitemap work properly.

avpaderno’s picture

The script calls drupal_set_message(), but the messages passed to that function will never appear to the user unless the script calls theme('page'), which it does not do, and which it should never do.

The script correctly bootstraps Drupal, but then uses MySQL functions to access the database. The correct way, especially after bootstrapping Drupal, is to use the Drupal core functions, such as db_query(), db_fetch_object(), db_fetch_array(), db_result(), etc.

The script should use the Drupal core functions; as it stands, it doesn't allow third-party modules to filter the returned list of nodes, and the nodes it lists in the sitemap might not be accessible to the anonymous user.

After calling drupal_bootstrap(DRUPAL_BOOTSTRAP_DATABASE), the $language variable is not initialized, yet; the part of the script that checks the value of that variable is perfectly useless.
To initialize $language, you should call drupal_bootstrap() passing at least DRUPAL_BOOTSTRAP_LANGUAGE; if you followed what cron.php does, it would be enough to call drupal_bootstrap(DRUPAL_BOOTSTRAP_FULL).

The correct way to return a Drupal URL is to call url(), but the script cannot call it because the bootstrap level is not high enough for that.

It should also be said that the script is suitable only for web sites with few nodes, because it doesn't even check the limits a sitemap has; a sitemap that doesn't respect the limitation it has is normally not accepted by a search engine.
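
For reference, the sitemaps.org protocol caps a single sitemap file at 50,000 URLs, and larger sites list several sitemap files in a sitemap index. Below is a minimal sketch of that splitting, written in Python rather than PHP. The helper names and the "sitemap-N.xml" file names are hypothetical, and the sketch ignores the protocol's separate file size limit.

```python
MAX_URLS_PER_SITEMAP = 50000
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def chunk_urls(urls, size=MAX_URLS_PER_SITEMAP):
    """Yield consecutive slices of at most `size` URLs."""
    for start in range(0, len(urls), size):
        yield urls[start:start + size]


def build_sitemap_index(base_url, chunk_count):
    """Return a sitemap index pointing at sitemap-1.xml, sitemap-2.xml, ...

    The file naming scheme is an assumption made for this example.
    """
    lines = ['<?xml version="1.0" encoding="UTF-8"?>',
             '<sitemapindex xmlns="%s">' % SITEMAP_NS]
    for number in range(1, chunk_count + 1):
        lines.append("<sitemap><loc>%s/sitemap-%d.xml</loc></sitemap>"
                     % (base_url.rstrip("/"), number))
    lines.append("</sitemapindex>")
    return "\n".join(lines)


# Example: 120,001 made-up URLs need three sitemap files.
urls = ["http://www.yoursite.org/node/%d" % n for n in range(120001)]
chunks = list(chunk_urls(urls))
print(build_sitemap_index("http://www.yoursite.org", len(chunks)))
```

Each chunk would then be written out as its own urlset document, and only the index would be submitted to the search engine.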

bobighorus’s picture

Dear Kiam, thank you so much; as I wrote above, this is just a young script and I surely have to work on Drupal compliance. Anyway, I think that what matters right now for any Drupal user is that the script does what it promises: it generates a Google Sitemap for a multilanguage Drupal site. What can you say about the other modules?

Anyway, you're not totally right about what you wrote.

You wrote: "After calling drupal_bootstrap(DRUPAL_BOOTSTRAP_DATABASE), the $language variable is not initialized, yet; the part of the script that checks the value of that variable is perfectly useless."

You're totally wrong: $language does not refer to DRUPAL_BOOTSTRAP_LANGUAGE; it refers to the result of "$array_nid", which comes from the first query I made. It's not a default variable. It's completely different. Understand the code first!

You wrote: "a sitemap that doesn't respect the limitation it has is normally not accepted by a search engine."

You're wrong: Google, according to the specifications, accepts 50000 URLs. If there are more URLs, only the URLs that exceed that limit are not taken. But, anyway, do you know a typical Drupal website with more than 50000 URLs?

Thank you for your contribution, but please read the code carefully before posting. Otherwise it leads to confusion in the end user community. Thank you.

avpaderno’s picture

It's true that $language is initialized by the script before it uses it, but it overrides a Drupal global variable.

The script does what it promises, but it does it in the wrong way, and it probably works only in particular configurations.

  • I tested it on my test site, and it doesn't even return the correct URL for the links.
    The output it returns (I show only the first lines) is something like:
    <urlset xmlns='http://www.google.com/schemas/sitemap/0.9'>
      <url>
        <loc>http://localhost:8888/dr61//node-29001-devel_content/</loc>
        <lastmod>2009-05-16</lastmod>
        <changefreq>weekly</changefreq>
        <priority>0.8</priority>
      </url>
      <url>
        <loc>http://localhost:8888/dr61//node-29000-devel_content/</loc>
        <lastmod>2009-05-16</lastmod>
        <changefreq>weekly</changefreq>
        <priority>0.8</priority>
      </url>
    

    It doesn't seem to handle the language correctly: in my case I had one node in Latin with a translation in English, and none of the links reported contains the language prefix (la or en, in this case).

  • It spends time getting the parameters to access the database, when Drupal already has those parameters; if it called the Drupal functions, they would not even be required.
  • It spends time calling a function that is supposed to show the message passed as argument in the page being shown. That doesn't really make sense when the output is not an HTML page; moreover, the messages passed to that function will never appear to the user because there isn't any call to theme('page') that would actually show them (and again, that should never happen, because the content being output is neither HTML nor XHTML).
  • It cannot be used by anyone who has PostgreSQL, and that is not stated anywhere; users who are going to try the script (at their own risk) should know that before they even download it.

The script has been written for Drupal, but it seems to have been written by somebody who is not a Drupal developer and who does not have the necessary knowledge.

bobighorus’s picture

You wrote: "It's true that $language is initialized by the script before it uses it, but it overrides a Drupal global variable."

Have you ever heard of the PHP extract() function?

avpaderno’s picture

Yes, I know extract() as I also know the database abstraction functions that are available from Drupal; I will never call drupal_set_message() in code that will never output the messages passed to that function.

The idea may be good, but it has been implemented in the wrong way. Also, it would be helpful to state the limits of the script before others assume it suits their needs.

bobighorus’s picture

I will never call drupal_set_message() in code that will never output the messages passed to that function.

It's a mistake due to a bad copy/paste. I don't think it's such a terrible thing! Especially for an error message function that will never be displayed!

bobighorus’s picture

You wrote:
"The script has been written for Drupal, but it seems to have been written by somebody who is not a Drupal developer and who does not have the necessary knowledge."

I don't define myself as a nerd, but I have experience developing for Drupal, especially in porting modules to multilanguage, "ulisting" for example.

I have written since the first post that this is not a module but a quick script for generating a Google multilanguage sitemap for a Drupal website. It's just a stub.

Thank you for your contribution.

avpaderno’s picture

Google, according to the specifications, accepts 50000 URLs. If there are more URLs, only the URLs that exceed that limit are not taken. But, anyway, do you know a typical Drupal website with more than 50000 URLs?

Actually, most of the users who report that XML sitemap causes PHP timeouts have something close to 50,000 links, and maybe more.
The script would probably cause a timeout even with 5,000 links, also because it doesn't call set_time_limit(240).

bobighorus’s picture

Most users report that XML sitemap simply doesn't work for multilanguage sites; I tested the XML sitemap module and the output is a blank page.
This is the reason why I've written the script.
If XML sitemap works well, why did I have to waste my time writing this code?
Why don't people work on fixing their own stuff instead of writing against other people's?

avpaderno’s picture

The problem of the blank page is not related to whether the site is multilingual: the problem is with the cache files not being created.

Why don't people work on fixing their own stuff instead of writing against other people's?

I am fixing the code of the XML sitemap branch I am developing; you would know that if you read the project issue queue. Are you fixing the code of your script?

avpaderno’s picture

The parameters required to access the database are already known to Drupal, but the script tries to get them from the URL used to call it.
This means the script has to be called with that information in the URL, while Drupal (after it bootstraps) is able to provide it. The script would not need that information at all if it used the Drupal core functions for database access: db_query() takes neither the database password nor the user name as a parameter.

bobighorus’s picture

Once again, I think you've read the code too fast. Haven't you?

avpaderno’s picture

I think you've read the code too fast.

Somebody has read the Drupal API documentation so fast that he didn't notice the functions Drupal provides; either that, or he has not even started to read the documentation.

milksamsa’s picture

I'm a total noob when it comes to coding.
I obviously understand Kiam's point of view, since he's the maintainer of the XML Sitemap Module.
Since I really believe that sharing knowledge to build better Drupal modules should be mandatory in most countries of the world, I recommend that you two start a collaboration.

Bobighorus's script is working on my Drupal website. That's all I needed, and I appreciated his suggestion, since I don't understand jargon but needed a working sitemap, as many people on drupal.org do...

It could obviously be better. As XML Sitemap should.

Thank you both.

Milk

avpaderno’s picture

The point is that you don't make such code public, with all the defects it has, and announce it as if it were the next big thing. The script actually has more bugs than XML sitemap, and it does much less than XML sitemap does.

The worst is that he even announced it on the XML sitemap issue queue, as if the script were much better than XML sitemap itself.
I don't think that is the right thing to do, especially if the proposed code doesn't even seem to have been written by somebody who has the necessary knowledge to write code for Drupal.

bobighorus’s picture

You wrote: "The worst is that he even announced it on the XML sitemap issue queue, as if the script were much better than XML sitemap itself."

These are your words, not mine. I wrote in the XML sitemap issue queue because I think that this script, at a time when the XML sitemap module doesn't work properly, can be useful to someone.

You wrote: "The script actually has more bugs than XML sitemap, and it does much less than XML sitemap does."
Are you really sure about this? I'm not; this is the reason I wrote this script.

The most important point is sharing knowledge, I think.

avpaderno’s picture

Let me give an example.
I go to a project issue queue, I read an issue report, and I write that I use code that is slightly different, and I give the link to the code I am using; when the others check that code, they discover it has more bugs than the code used in the project.

If I offer an alternative, it must be a better alternative, not an alternative that has more bugs and does less.

bobighorus’s picture

I agree with you.
I never wrote that this script is the solution to all the bad things in this world.
I simply had a need on my Drupal site; none of the modules worked properly, so I decided to write a script myself and share it with others.
I have written since the first post that this is a script, not a complete module.
I think that, even if there are mistakes from a developer's point of view, this could be a good and helpful thing for somebody; on my site and, as far as I can read until now, on someone else's, it works perfectly.
I hope you understand this.
Thank you.

avpaderno’s picture

I have written since the first post that this is a script, not a complete module.

Whether it's a script or a complete module doesn't make any difference; after all, a module is just a PHP script that follows some rules in writing the code, nothing more.

It would have been better if you had reported that your script has some limits, and that it's not meant to be used with PostgreSQL, nor is it optimized (even if calling a function that will never show its output is not exactly an optimization problem).

bobighorus’s picture

So this is a collaborative effort?

It would have been better if you had reported that your script has some limits

In fact, I really appreciate your feedback; this is the sense of community, even if you could express it in a different tone.

and that it's not meant to be used with PostgreSQL

I've updated the description in my first post accordingly.

even if calling a function that will never show its output is not exactly an optimization problem

Once again: I think you're exaggerating and being pedantic! You're dwelling on a minor issue. And you don't really read my previous answers before posting.

jacob.letter’s picture

I understand Kiam, but I really appreciate bobighorus's script.
Maybe it could be better from a developer's point of view like Kiam's, but from my side, as an end user, this script is perfect for my needs and it works really well.
Thank you.

dave reid’s picture

bobighorus, I wanted to say thank you for sharing your code. There have been lots of people like yourself (me included) that went and wrote something that would work for them. I'm now the maintainer in charge of the rewrite of the 6.x-2.x version of xmlsitemap.module and I'd encourage you to give it a try and help us make the module better and more reliable so people don't have to continue writing their own scripts. Participation is deeply appreciated in the issue queue (http://drupal.org/project/issues/xmlsitemap) and we'd love to see you there!

bobighorus’s picture

Thank you so much, Reid!
I really like your positive comment; this is the spirit of Drupal and of the Open Source in general!
I hope I can help with the XML Sitemap module rewrite.

bobighorus’s picture

I'm very annoyed by the sterile polemics, by the aggressive tone I've had to read, and by the discrediting of my work and professionalism from a jealous and not collaborative guy.
This is not the right way to encourage this community, in my opinion.
Finally, once again: this is a stub, a script and not a module, which I wrote for my own needs and decided to share with others, celebrating the spirit of Open Source, and so the spirit of Drupal; this script has some limits that you can read about above, but for the moment I hope and believe that it can be helpful, because it works.
I will not answer any further polemics, and I will try to help in the XML Sitemap module issue queue.
Any useful feedback is deeply appreciated.
That's all folks.

avpaderno’s picture

... except if you say how the script should be coded, and it's not (or what the script doesn't do, and it should).

bobighorus’s picture

aggressive and pedantic tone from a jealous and not collaborative guy.

avpaderno’s picture

It's hard to be jealous of whoever wrote the script as it is now.

bobighorus’s picture

ok.

avpaderno’s picture

The script is not meant to work on Drupal installations in general, nor is it optimized. It makes some assumptions about the actual configuration that might not hold for your Drupal site; in particular, it does not support all the database engines that Drupal supports, but is written only for MySQL.

You use it at your own risk.

prodosh’s picture

Why not be positive about this and help the contributor improve his script instead of picking on him? This script fills a gap that is not covered by existing modules, so let's be constructive and either work on implementing the additional functions the contributor provides in XML Sitemap or help the contributor make a great module out of this. And let us not forget that Open Source is also about choice.


avpaderno’s picture

The script doesn't use any Drupal function to verify whether the anonymous user (which also includes search engines) has access to the links being added. This can cause search engines to report that some links returned a 403 error.
If you have modules that change the node access permissions Drupal normally applies, then you are going to get error reports in the webmaster tools pages that some search engines offer; this is particularly true for Google Webmaster Tools, which reports in more detail every error it finds when crawling the links listed in the sitemap.

avpaderno’s picture

If you want to use an alternative script that creates the sitemap content from the list of Drupal nodes, and that resolves the issues present in the script described here, look at #468590: Script to generate a sitemap content from the node data.

The script correctly sets the changefreq field of the sitemap to a value calculated from the creation timestamp and the change timestamp; it doesn't use a static value for each link. It uses 0.5 as priority, but you can change that to the value you like better; the value of the priority is the same for all the links added to the sitemap.
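
To give an idea of what deriving changefreq from timestamps can look like, here is a purely hypothetical sketch in Python. It is not the formula used by the linked script, which is not reproduced in this thread; the function name and the thresholds are assumptions chosen only for illustration.

```python
DAY = 24 * 60 * 60  # seconds in a day

# Hypothetical thresholds: (maximum age in seconds, changefreq value).
THRESHOLDS = [
    (DAY, "hourly"),
    (7 * DAY, "daily"),
    (30 * DAY, "weekly"),
    (365 * DAY, "monthly"),
]


def guess_changefreq(created, changed, now):
    """Map node timestamps (Unix seconds) to a sitemap changefreq value."""
    # Use the most recent of the two timestamps as the last activity.
    last_touched = max(created, changed)
    age = now - last_touched
    for limit, label in THRESHOLDS:
        if age < limit:
            return label
    # Anything untouched for a year or more is treated as rarely changing.
    return "yearly"


now = 1242777600  # an arbitrary reference time
print(guess_changefreq(now - 400 * DAY, now - 3 * DAY, now))
```

The returned strings are all values that the sitemap protocol allows for changefreq; a real implementation could just as well look at the interval between creation and last change instead.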

EDIT: I forgot to say that the script has been developed for Drupal 6; there is no guarantee that it works on Drupal 7 too.

jacob.letter’s picture

I ran tests on three different client sites, with a standard Drupal 6 distribution and the i18n module: the script on this page (bobighorus') works well; no issues.
For the moment it's all I need.
I hope this was helpful.
Thanks.

Jacob

paku’s picture

Hi all,

First of all, I would like to say I have been working with Kiam's module for months now. That's why I would like to thank him for the work he has put into it.

But....

I had to switch my sites to an international version, and after some investigation I found I am only able to submit the EN version of the site.

So I am unable to submit the full site XML to Google using Kiam's module. Or please advise how I can do it, PLEASE DO.

That's why I am now installing the script.

It's simple: some (a lot of) people need ANY solution for ML sites; that's the point.

Kiam: with your experience, it should not be a big deal to handle such a problem :)

Paul

paku’s picture

Life's not easy.

For me it's not working.

Here is what I've got:

www.4x4.org.pl/site/sitemap2.xml

www.4x4.info.pl/site/sitemap2.xml

So I'm still looking for a multilanguage multisite Google sitemap :(

Paul

momo18’s picture

The 6.x-2.x-dev version of the XML Sitemap module I just tried out has multi-language capabilities and generates sitemaps in multiple languages, for example sitemap.xml, he/sitemap.xml, nl/sitemap.xml and so on. I tried the 6.x non-dev version before, and that clearly didn't provide multi-language functionality. You can see it in action on my web design services site at http://www.itwrite.com. So far it seems to be working OK.

Regards,
Moses