This is just a thought I had, which is not relative to a particular branch, or a particular Drupal version.
As it is, all times that the data of a node, or a user is changed, all the sitemap must be rewritten. I mean that the process of asking to all the modules the links for the sitemap is started, the links are collected, and the sitemap content is generated (whatever it's the content that goes in cache files, or the content directly returned from the web server).
I noticed that:
- I could have a sitemap made of a single chunk, which contains a single link; this means that sitemap.xml could always be a list of chunks, even if that is only one.
- The limit of the links in each chunk is 50,000, but that doesn't mean all the chunks must contain the same number of links; I could have a sitemap chunk containing 20 links, and the other containing 20,000 links.
Considering this, I imagine that each module could populate a different set of sitemap chunks; this would allow to the code to act as follow:
- The sitemap is tagged as needing update; xmlsitemap.module asks to the other modules how many chunks they will use.
- xmlsitemap.module advises the other modules that they must rebuild the cache files. Each module verifies if its data have changed (it's enough they use a Drupal variable for this), and rebuilds the cache files for its sitemap chunks; if a module doesn't have changed data, it will simply do nothing.
This would reduce the time took for rebuilding the sitemap content, as it doesn't make sense to ask to xmlsitemap_user.module to return all its links (in whichever way it does it) only because a node has been changed.
This would also resolve another problem related with the user permissions. If the anonymous user doesn't have permission to access the content, xmlsitemap_node.module will report that it needs 0 chunks; in the case the anonymous user doesn't have the permission to see the user profiles, then xmlsitemap_user.module will report that it doesn't need any chunks.
I am thinking to change 6.x-1 to reflect the changes I described here.
Comments
Comment #1
avpadernoThe phrase should be read as If the anonymous user doesn't have ANYMORE permission to access the content. I know that this is maybe a particular case, but the module should be able to return the correct sitemap in each circumstances.
Rather than to set the status of each single node link present in a database table, the module must simply returns 0 as number of chunks it uses.
Comment #2
avpadernoPlease pardon my bad English; I cannot even use the time excuse, as it's almost noon for me.
Comment #3
avpadernoI wrongly set the referring version, but like I said this a topic that is not specific for a particular branch.
Comment #4
avpadernoI am assigning this task to myself.
Comment #5
avpadernoThis is not even happening in a normal case, as it's more probable that the user would enable at least two different modules (xmlsitemap.module is required from the other modules, and at least xmlsitemap_node.module will be enabled); therefore, even assigning a different set of chunks to each module, the sitemap chunks present will be not less than two.
Comment #6
dropchew commentedSubscribe. I was going to bring this up but you beat me to it lol.
Then you will be doing changes to 1.x and 2.x will be scrape in the end?
Comment #7
Anonymous (not verified) commentedI've retitled this so that I understand it.
Is this practical in practice? If I have chunks created because of the number of entries or the size of the file will this interfere? I don't have a good feeling about this.
Comment #8
Anonymous (not verified) commentedYes, I agree. These changes should not go in 1.x.
Comment #9
avpadernoYes, it is.
As the code actually works, the cache files are created for:
This will not be changed with the new code. The differences are the following:
The consequences of [1] are:
If a module must remove all its links from the sitemap because the anonymous user doesn't have a particular permission (this could happen to xmlsitemap_node.module, when the anonymous user doesn't have anymore the access content permission), it simply reports that it needs 0 (zero) chunks.
There are some details that needs to be handled, but that is nothing so hard to do.
Comment #10
avpadernoPerfect; then we can simply sit down, and wait the 6.x-2 branch will have an official release.
When I said that it doesn't make any sense to develop two different branches, I was said that it makes sense since it gives the users an official release before the 6.x-2 branch comes out with an official release.
Does that mean the 6.x-1 branch must keep its bugs? I don't think; differently we are going to give the users a buggy version telling them that they must wait until the 6.x-2 branch will come out with a perfect version. Does that make any sense? What would the purpose of giving a buggy version to the final users be? I hope that the purpose is not to make the users think See? The 6.x-2 version is better of the crappy code Kiam has been able to put together in all this time.
I quote also what Darren Oh said in #454190-8: Remove xmlsitemap_menu.module:
Comment #11
avpadernoComment #12
Anonymous (not verified) commentedThen we (Kiam and Earnie) need to come to an agreement because I'm of the opinion that 6.x-1.0 should be continually improving the code but getting a stable code base for more work in 6.x-1.1. Our goal was to get a 6.x-1.0 release to use. It used to be close to that, now I'm not sure where we sit. The code is changing too much to make a 1.0 release which goes against the goal that was stated by Darren Oh. At the rate you're changing the code my guess is that 6.x-2.0 will release before we even have a 6.x-1.0 release ready.
If you claim inefficiencies as bugs, then yes. Bugs that cause the non production of sitemap.xml to not produce results then no. Ever changing code always produces new bugs that need fixed.
Comment #13
avpadernoIt's hard to believe that, also because I have almost all the code already; the only part I still have to make is the part that populates the sitemap, which is not populated when the sitemap is requested. In the code I changed, I also removed the central database need.
The reference to what Darren Oh said was because you tagged the task as a 6.x-2 issue; Dave will do what he wants in that branch, but that must not stop us to do the correct thing for the 6.x-1 branch.
Comment #14
avpadernoComment #15
avpadernoComment #16
avpadernoThe changes described here have been implemented, and the new code has been committed in CVS.
Comment #17
avpadernoComment #19
avpaderno