Hi all.
I'm writing this post about a solution that I've found for generating a standards-compliant Google Sitemap for multilanguage Drupal 6.x and 7.x sites.
Because I was not satisfied with any of the Drupal Google Sitemap modules, I've written a script that you can simply upload to the root of your site to automatically generate a Google sitemap for every menu link and for each node, with a multilanguage prefix if needed.
The basic difference from the other scripts I've seen around is that I've followed the database approach instead of scanning pages link by link like a crawler: I match the "node" and "menu" Drupal database tables against "url_alias" and, for example, this is a result:
<url>
<loc>http://www.yoursite.org/it/this-is-a-test-content/</loc>
<lastmod>2009-05-20</lastmod>
<changefreq>weekly</changefreq>
<priority>0.8</priority>
</url>
The "node/NID" path is used when a content item has no URL alias, and the "/en/" prefix is not inserted for English, the default language.
I've set "priority" to 0.8 and "changefreq" to weekly for nodes, and "priority" to 0.9 and "changefreq" to monthly for menu links, but you can easily change this in the source code.
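The URL rules described above can be sketched in a few lines. This is only an illustration in Python, not the script itself (which runs inside Drupal and reads the database); the function names `build_loc` and `url_entry` and the `DEFAULT_LANGUAGE` constant are hypothetical.

```python
from xml.sax.saxutils import escape

DEFAULT_LANGUAGE = "en"  # assumption: English is the site's default language


def build_loc(base_url, nid, language, alias=None):
    """Build a node URL: use the alias when there is one, else node/NID.

    The language prefix is skipped for the default language.
    """
    path = alias if alias else f"node/{nid}"
    prefix = "" if language == DEFAULT_LANGUAGE else f"{language}/"
    return f"{base_url.rstrip('/')}/{prefix}{path}"


def url_entry(loc, lastmod, changefreq="weekly", priority="0.8"):
    """Render one <url> element; the defaults are the node values from the post."""
    return (
        "<url>"
        f"<loc>{escape(loc)}</loc>"
        f"<lastmod>{lastmod}</lastmod>"
        f"<changefreq>{changefreq}</changefreq>"
        f"<priority>{priority}</priority>"
        "</url>"
    )


loc = build_loc("http://www.yoursite.org", 12, "it", "this-is-a-test-content")
print(url_entry(loc, "2009-05-20"))
```

For a menu link you would pass `changefreq="monthly"` and `priority="0.9"` instead of the node defaults.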
Note that this is not intended as a Drupal module but as a simple script; maybe in the future I'll work on releasing a module, but for the moment I hope this script can be helpful to anyone who, like me, needs to submit a multilanguage sitemap for a Drupal site to Google.
I've set up the script to connect to the database automatically, so you just have to upload it to the root of your site and launch it in the browser.
If you add this code before the last RewriteRule in the ".htaccess" file, you can submit "http://www.yoursite.org/sitemap.xml" to Google.
RewriteRule ^sitemap\.xml$ sitemap.php [L]
Note that this script doesn't work with a PostgreSQL database.
Any help and feedback is greatly appreciated.
Thank you.
Download Multilanguage Drupal Google Sitemap Generator script
Comments
Google Sitemap
Awesome job!
You should include the predefined namespace though, as suggested by Google; otherwise you'll get a warning in Google Webmaster Tools:
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
In the current version of the
In the current version of the script I've used "<urlset xmlns="http://www.google.com/schemas/sitemap/0.9">", as Google suggests, but I'll run a test to investigate your issue.
Thank you for your note.
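For anyone who wants to see what the suggested root element looks like in a complete document, here is a minimal sketch in Python (not the script's own code) that wraps a list of URLs in a `<urlset>` using the sitemaps.org namespace quoted above. The `build_urlset` helper is hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace suggested in the comment above.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_urlset(locs):
    """Return a sitemap document (as a string) listing the given URLs."""
    urlset = ET.Element("urlset", {"xmlns": SITEMAP_NS})
    for loc in locs:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
    return ET.tostring(urlset, encoding="unicode")


print(build_urlset(["http://www.yoursite.org/it/this-is-a-test-content/"]))
```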
The policy is to not create duplicate projects
In "Join forces with others" it is clearly stated that:
This means you should not create another project that does the same thing done by XML sitemap.
Thanks for your contribution, I
Thanks for your contribution, I really appreciate it.
But, as you can read above, this is not a module: it's just a Drupal script that I hope could be useful to other Drupal community users, in order to "Join forces with others".
Since I needed to generate a Google sitemap suitable for a multilanguage site, I wrote this code myself; no other module would work.
You wrote: "you should not create another project that does the same thing done by XML sitemap"; yes, that's true; in fact, the "XML sitemap" module seems to have some problems with multilanguage sites.
Thank you for your attention.
I was referring to what you wrote
I was referring to what you wrote:
The meaning of "another project that does the same thing done by XML sitemap" is that another project that creates and outputs the content of a sitemap in XML format is considered a duplicate of XML sitemap. If XML sitemap has some bugs, that is not a sufficient reason for somebody to create another project that creates an XML sitemap starting from node data contained in Drupal core database tables.
Ok. In the future maybe I'll
Ok. In the future maybe I'll work on making XML Sitemap work properly.
The script has some defects
The script calls drupal_set_message(), but the messages passed to that function will never appear to the user unless the script calls theme('page'), which it doesn't do, and which should never be done from the script.
The script correctly bootstraps Drupal, but then uses MySQL functions to access the database. The correct way, especially after bootstrapping Drupal, is to use the Drupal core functions, such as db_query(), db_fetch_object(), db_fetch_array(), db_result(), etc.
The script should use the Drupal core functions; as it is, it doesn't allow third-party modules to filter the returned list of nodes, and the nodes it lists in the sitemap might not be accessible to the anonymous user.
After calling drupal_bootstrap(DRUPAL_BOOTSTRAP_DATABASE), the $language variable is not initialized, yet; the part of the script that checks the value of that variable is perfectly useless.
To initialize $language, you should call drupal_bootstrap() passing at least DRUPAL_BOOTSTRAP_LANGUAGE; if you followed what cron.php does, it would be enough to call drupal_bootstrap(DRUPAL_BOOTSTRAP_FULL).
The correct way to return a Drupal URL is to call url(), but the script cannot call it because the bootstrap level is not high enough for that.
It should also be said that the script is suitable only for web sites with few nodes, because it doesn't even check the limits a sitemap has; a sitemap that doesn't respect the limitation it has is normally not accepted by a search engine.
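As a rough illustration of the limit check mentioned above, the sketch below splits a list of URLs into chunks that each stay within the 50,000-URL limit of the sitemap protocol. It is written in Python for brevity and is not part of the script. The `split_into_sitemaps` helper is hypothetical, and the sketch deliberately ignores the separate file-size limit.

```python
MAX_URLS_PER_SITEMAP = 50000  # URL limit per sitemap file in the sitemap protocol


def split_into_sitemaps(locs, limit=MAX_URLS_PER_SITEMAP):
    """Split a list of URLs into consecutive chunks of at most `limit` items."""
    return [locs[i:i + limit] for i in range(0, len(locs), limit)]


# Hypothetical site with 120,001 node URLs.
locs = [f"http://www.yoursite.org/node/{nid}" for nid in range(120001)]
chunks = split_into_sitemaps(locs)
print([len(chunk) for chunk in chunks])
```

Each chunk would then be written to its own sitemap file and listed in a sitemap index file.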
You're not totally right. But it does what it promises.
Dear Kiam, thank you so much; as I wrote above, this is just a young script and I surely have work to do on Drupal compliance; but anyway, I think the important thing at this moment for any Drupal user is that the script does what it promises: a Google Sitemap for a multilanguage Drupal site. What can you say about other modules?
Anyway, you're not totally right about what you wrote.
You wrote: "After calling drupal_bootstrap(DRUPAL_BOOTSTRAP_DATABASE), the $language variable is not initialized, yet; the part of the script that checks the value of that variable is perfectly useless."
You're totally wrong: $language does not refer to DRUPAL_BOOTSTRAP_LANGUAGE; it refers to the result of "$array_nid" that comes from the first query I made. It's not a default variable. It's completely different. Understand the code first!
You wrote: "a sitemap that doesn't respect the limitation it has is normally not accepted by a search engine."
You're wrong: Google, according to the specifications, accepts 50,000 URLs. If there are more URLs, only the ones that exceed that limit are not taken. But anyway, do you know a typical Drupal website with more than 50,000 URLs?
Thank you for your contribution, but please read the code carefully before posting. It creates confusion in the end-user community. Thank you.
That is true
It's true that the $language is initialized from the script before it uses it, but it override a Drupal global variable.
The script does what it promises, but it does it in the wrong way, and it probably works only in particular configurations.
The output it returns (I show only the first lines) is something like:
It doesn't seem to handle the language correctly, because in my case I had one node in Latin with a translation in English; in that case, none of the links reported contain the language prefix (la, or en, in this case).
theme('page') that would really show the messages to the user (and again, that should never happen because the content being output is not HTML, nor XHTML).
The script has been written for Drupal, but it seems written by somebody who is not a Drupal developer, nor he has not the necessary knowledge.
You wrote: "It's true that
You wrote: "It's true that the $language is initialized from the script before it uses it, but it override a Drupal global variable."
Have you ever heard of the PHP extract(array[]) function?
Did you use any of the Drupal database abstraction functions?
Yes, I know extract(), as I also know the database abstraction functions that are available from Drupal; I will never call drupal_set_message() in code that will never output the messages passed to that function.
The idea can be good, but it has been implemented in the wrong way. Also, it would be helpful to report the limits of the script before others think it suits their needs.
I will never call
It's a mistake due to a bad copy/paste. I don't think it's such a terrible thing! Especially for an error message function that will never be displayed!
You wrote: "The script has
You wrote:
"The script has been written for Drupal, but it seems written by somebody who is not a Drupal developer, nor he has not the necessary knowledge."
I don't define myself as a nerd, but I have experience in Drupal development, especially in the multilanguage porting of some modules, like "ulisting" for example.
I've said since the first post that this is not a module but a quick script for generating a Google multilanguage sitemap for a Drupal website. It's just a stub.
Thank you for your contribution.
The script would probably cause a PHP timeout with 5000 links
Actually, most of the users who report that XML sitemap causes PHP timeouts have something close to 50,000 links, and maybe more.
The script would probably cause a timeout even with 5,000 links, also because it doesn't call set_time_limit(240).
Most of the users report
Most of the users report that XML sitemap simply doesn't work for multilanguage sites; I tested the XML sitemap module and the output is a blank page.
This is the reason why I've written the script.
If XML sitemap works well, why did I have to waste my time writing this code?
Why don't people work on fixing their own stuff instead of writing against others'?
The blank page is output for sites that are not multilingual too
The blank page problem is not related to whether the site is multilingual or not: the problem is that the cache files are not being created.
I am fixing the code of the XML sitemap branch I am developing; you would know that if you read the project issue queue. Are you fixing the code of your script?
The password to access the database is taken from the URL
The parameters required to access the database are already known from Drupal, but the script tries to get them from the URL used to call it.
This means the script has to be called with that information in the URL, while Drupal (after it bootstraps) is able to provide that information, which the script would not need if it used the Drupal core functions for database access; db_query() takes neither the database password nor the database user name as a parameter.
Another time, I think you've
Once again, I think you've read the code too fast, haven't you?
OK. Take that one out too, but it's still 2 out of 5
Somebody has read the Drupal API documentation so fast that he didn't notice the functions Drupal has, which can be called; either that, or he has not even started reading the documentation.
It does what I need.
I'm a total noob when it comes to coding.
I obviously understand Kiam's point of view, since he's the maintainer of the XML Sitemap Module.
Since I really believe that sharing knowledge to build better Drupal modules should be mandatory in most countries of the world, I recommend that you two start a collaboration.
Bobighorus's script is working on my drupal website, that's all I needed and I appreciated his suggestion since I don't understand jargon, but needed a working sitemap as many people on drupal.org do...
It could obviously be better. As XML Sitemap should.
Thank you both.
Milk
That is not the point
The point is that you don't make such code public, with all the defects it has, and announce it as if it were the next big thing. The script actually has more bugs than XML sitemap, and it does much less things than XML sitemap does.
The worst is that he even announced it on the XML sitemap issue queue, like if the script was much better than XML sitemap itself.
I don't think that is a correct thing to do, especially if the proposed code doesn't even seem written by somebody who has the necessary knowledge to write code for Drupal.
The most important point,
You wrote: "The worst is that he even announced it on the XML sitemap issue queue, like if the script was much better than XML sitemap itself."
These are your words, not mine. I wrote in the XML sitemap issue queue because I think that this script, at a time when the XML sitemap module doesn't work properly, can be useful for someone.
You wrote: "The script actually has more bugs than XML sitemap, and it does much less things than XML sitemap does."
Are you really really sure about this? I'm not; this is the reason I wrote this script.
The most important point is sharing knowledge, I think.
Let me make an example
Let me make an example.
I go to a project issue queue, I read an issue report and I write that I use code that is slightly different, and I give the link for the code I am using; when the others check the code I am using, they discover it has more bugs than the code being used in the project.
If I offer an alternative, it must be a better alternative, not an alternative that has more bugs, and does less.
I agree with you. I never
I agree with you.
I never wrote that this script is the solution to all the bad things in this world.
I simply had a need for my Drupal site; none of the modules worked properly, so I decided to write a script myself and share it with others.
I've said since the first post that this is a script, not a complete module.
I think that, even if there are mistakes from a developer's point of view, this could be a good and helpful thing for somebody; on my site and, from what I can read so far, on someone else's, it works perfectly.
I hope you understand this.
Thank you.
You should document the limits of the script
Whether it's a script or a complete module doesn't make any difference; after all, a module is just a PHP script that follows some rules in writing the code, nothing more.
It would have been better if you had reported that your script has some limits: it's not meant to be used with PostgreSQL, nor is it optimized (even if calling a function that will never show its output is not exactly an optimization problem).
What a collaborative spirit!
So this is a collaborative effort?
In fact I really appreciate your feedback, this is the sense of community; even if you could express it in another tone.
I've updated my first description post.
Once again: I think you're exaggerating and being pedantic! You're dwelling on a minor question. And you don't really read my previous answers before posting.
I understand kiam but I
I understand kiam but I really appreciate bobighorus's script.
Maybe it could be better from a developer's point of view like kiam's, but from my side, as an end user, this script is perfect for my needs and it works really well.
Thank you.
Thanks for sharing!
bobighorus, I wanted to say thank you for sharing your code. There have been lots of people like yourself (me included) that went and wrote something that would work for them. I'm now the maintainer in charge of the rewrite of the 6.x-2.x version of xmlsitemap.module and I'd encourage you to give it a try and help us make the module better and more reliable so people don't have to continue writing their own scripts. Participation is deeply appreciated in the issue queue (http://drupal.org/project/issues/xmlsitemap) and we'd love to see you there!
Thank you!
Thank you so much, Reid!
I really like your positive comment; this is the spirit of Drupal and of the Open Source in general!
I hope I can help with the XML Sitemap module rewrite.
Final consideration
I'm very annoyed by the sterile polemics, by the aggressive tone I've had to read, and by the disparagement of my work and professionalism from a jealous and uncooperative guy.
In my opinion, this is not the correct way to encourage this community.
Finally, once again: this is a stub, a script, not a module, that I wrote for my own needs and decided to share with others, celebrating the spirit of Open Source, and so the spirit of Drupal; this script has some limits that you can read about above, but for the moment I hope and believe that it can be helpful, because it works.
I won't answer any further polemics, and I'll try to help in the XML Sitemap module issue queue.
Any useful feedback is deeply appreciated.
That's all folks.
Any useful feedback is deeply appreciated...
... except if you say how the script should be coded, and it's not (or what the script doesn't do, and it should).
aggressive and pedantic tone
The aggressive and pedantic tone doesn't come from me
It's hard to be jealous of whoever wrote the script as it is now.
ok.
ok.
The script is not written for a generic Drupal installation
The script is not meant to work on the generality of Drupal installations, nor is it optimized. It makes some assumptions about the actual configuration that might not be met on your Drupal site; in particular, it is not meant to support all the database engines that Drupal supports, but is written only for MySQL.
You use it at your own risk.
Let's be constructive
Why not be positive about this and help the contributor improve his script instead of picking on him? This script fills a gap that is not covered by existing modules, so let's be constructive and either work on implementing the additional functions the contributor provides in XML Sitemap or help the contributor make a great module out of this. And let us not forget that Open Source is also about choice.
Added links might not be accessible to the anonymous user
The script doesn't use any Drupal function to verify whether the anonymous user (which also includes the search engines) has access to the links being added. This can cause the search engines to report that some links caused a 403 error message.
If you have some modules that change the node access permissions that Drupal normally applies, then you are going to have some error reports in the webmaster tools page some of the search engines offer; this is particularly true for Google Webmaster Tools, which reports in a more detailed way every error it finds when crawling the links reported in the sitemap.
Alternative script
If you want to use an alternative script that creates the sitemap content from the list of Drupal nodes, and that resolves the issues present in the script described here, look at #468590: Script to generate a sitemap content from the node data.
The script correctly sets the changefreq field of the sitemap with the value calculated from the creation timestamp and the change timestamp; it doesn't use a static value for each link. It uses 0.5 as priority, but you can change that to the value you like better; the value of the priority is the same for all the links added to the sitemap.
EDIT: I forgot to say that the script has been developed for Drupal 6; there are no guarantees that it works on Drupal 7 too.
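To give an idea of what deriving changefreq from the two timestamps could look like, here is a hedged sketch in Python. It is not the code of the linked script: the `changefreq_from_timestamps` function, the thresholds, and the choice of "yearly" for nodes that were never edited are all assumptions made for illustration.

```python
# Hypothetical thresholds (in seconds) mapping the gap between creation and
# last change to a changefreq value; the linked script may use other rules.
THRESHOLDS = [
    (3600, "hourly"),
    (86400, "daily"),
    (604800, "weekly"),
    (2592000, "monthly"),
]


def changefreq_from_timestamps(created, changed):
    """Guess a changefreq value from a node's created/changed Unix timestamps."""
    if changed <= created:
        return "yearly"  # assumption: a node that was never edited rarely changes
    interval = changed - created
    for limit, label in THRESHOLDS:
        if interval <= limit:
            return label
    return "yearly"


print(changefreq_from_timestamps(0, 3 * 86400))
```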
It sounds good!
I ran tests on three different sites for my clients, with a standard Drupal 6 distribution and the i18n module: the script on this page (bobighorus') works well; no issues.
For the moment it's all I need.
I hope I was helpful.
Thanks.
Jacob
Hi all, First of all I would
Hi all,
First of all I would like to say I have been working with Kiam's module for months now. That's why I would like to thank him for the work he has put into it.
But....
I had to switch my sites to an international version and after some investigation found that I am only able to submit the EN site version.
So I am unable to submit the full site XML to Google using Kiam's module. Or please advise how I can do it, PLEASE.
That's why I am just installing the script.
It's simple: some (a lot of) people need ANY solution for ML sites, that's the point.
Kiam: with your experience it should not be a big deal to handle such a problem :)
Paul
Life's not easy. For me it's
Life's not easy.
For me it's not working.
Here is what I've got:
www.4x4.org.pl/site/sitemap2.xml
www.4x4.info.pl/site/sitemap2.xml
So I'm still looking for a multilanguage multisite Google sitemap :(
Paul
Latest version of XML Sitemap Has Multi-Language Capabilities
The 6.x-2.x-dev version of the XML Sitemap module I just tried out has multi-language capabilities, and generates sitemaps in multiple languages, for example sitemap.xml, he/sitemap.xml, nl/sitemap.xml and so on. I tried out the 6.x non-dev version before, and that clearly didn't provide multi-language functionality. You can see it in action on my web design services site at http://www.itwrite.com So far it seems to be working OK.
Regards,
Moses