Closed (won't fix)
Project:
XML sitemap
Version:
5.x-1.4
Component:
xmlsitemap
Priority:
Critical
Category:
Bug report
Assigned:
Unassigned
Reporter:
Created:
19 Dec 2007 at 10:04 UTC
Updated:
19 May 2008 at 06:34 UTC
Comments
Comment #1
darren oh commented
Why not use a separate files directory for each site and include a symlink to your avatars directory?
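The layout darren oh suggests can be sketched as follows. This is only an illustration of the idea; the directory names (`sites/example.com`, `shared/avatars`) are hypothetical, not paths from the module.

```shell
# Hypothetical layout: each site gets its own files directory,
# with a single shared avatars directory symlinked into it.
mkdir -p sites/default/files sites/example.com/files shared/avatars

# Point the per-site avatars path at the shared directory.
ln -s "$(pwd)/shared/avatars" sites/example.com/files/avatars

# Each site now writes its sitemap into its own files directory,
# while avatar lookups resolve to the shared location.
ls -l sites/example.com/files/avatars
```

Each site's sitemap files then stay isolated, while content that genuinely must be shared is reached through the symlink.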
Comment #2
niteman commented
The fact is that I'm considering that solution, but I'm not sure about the implications. It will need hard and careful testing, since a couple of modules I use store some files under that tree.
Nevertheless, I think a module that stores files under the Drupal file system should take care of multisite implications, or at least provide a setting to override the default file location.
Regards
Comment #3
wayland76 commented
I have to agree with NITEMAN. I'm using the Multidomain module ( http://drupal.org/project/multidomain ) and it does this by default. While it may be possible to change this, there will presumably be more people using this module as time goes on, and more people reporting the problem.
The problem mentioned by NITEMAN is the most obvious conflict between these modules. I hope to report the other problems as bugs, but I'd prefer to see this fixed first; it may resolve some of the other things I'm seeing.
If you want to see my XML sitemap, try http://www.jdarx.info/sitemap.xml
Comment #4
wayland76 commented
I've attached a patch that contains the fix suggested by NITEMAN. I haven't tested it, but the PHP is valid (I tested that). The two things I haven't done:
1. If the line in xmlsitemap/xmlsitemap.install that does an SQL DELETE needs alteration, I haven't altered it
2. This change moves the sitemap; it would be nice if there was an alias of some sort from the old address
HTH,
Comment #5
wayland76 commented
Darren, is there any chance you could advise on what needs to happen to this patch before you'll accept it into the core?
Comment #6
wayland76 commented
(Sorry, not core, but CVS)
Comment #7
wayland76 commented
...other than testing, I mean :). I know it needs:
- Testing
- Comments above the new functions
Is there anything else?
Comment #8
darren oh commented
It needs to uninstall cleanly. That means no left-over files. If multiple sites are sharing the same files directory, each site needs a way to keep track of its files, and a way to detect when a directory is not used by any other site and can be deleted.
Comment #9
darren oh commented
Comment #10
darren oh commented
Further clarification: a distinction must be made between multiple sites sharing the same files directory and one site with multiple domains. If we support both, the uninstall script must be able to uninstall the files for all of its domains without touching the files of other sites.
Comment #11
wayland76 commented
Ok, I'll work on the uninstallation stuff; I might be able to make some sort of intelligent comment after working on it.
Comment #12
wayland76 commented
Hmm. Well, I changed the code (not attached) to be recursive. I'm not clear what you're saying in your comment 10 above. I have essentially multiple sites sharing one codebase and one database, and they are separated using the Domain Access module. The recursive delete will work in this case.
I'm not sure what you mean by "multiple sites sharing the same files directory". Are you referring to the situation where two sites share a codebase, but not a database?
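The recursive delete being discussed might look roughly like this. This is a sketch only, not the code from the patch; the function name and example path are hypothetical, and the point is that deletion is scoped to one site's directory so other sites' files are untouched.

```php
<?php
// Sketch: recursively remove one site's sitemap directory, e.g. on
// uninstall. Symlinks are unlinked rather than descended into, so a
// shared directory reached via symlink is not wiped out.
function xmlsitemap_rmdir_recursive($dir) {
  if (!is_dir($dir)) {
    return;
  }
  foreach (scandir($dir) as $entry) {
    if ($entry === '.' || $entry === '..') {
      continue;
    }
    $path = $dir . '/' . $entry;
    if (is_dir($path) && !is_link($path)) {
      xmlsitemap_rmdir_recursive($path);
    }
    else {
      unlink($path);
    }
  }
  rmdir($dir);
}

// Hypothetical usage: remove only this site's per-site folder.
// xmlsitemap_rmdir_recursive('files/xmlsitemap/example.com');
```

Skipping symlinks during recursion matters in exactly the setup darren oh described in comment #1, where a symlinked directory may belong to another site.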
Comment #13
wayland76 commented
Btw, I have been testing it on my site and, while the above patch didn't work 100%, I now have it working and commented. The only thing I haven't tested is the uninstall stuff.
Comment #14
darren oh commented
Yes. It's not technically correct to use the term "multi-site" to refer to anything else. Most of what has been discussed in this issue has actually been duplicate domains.
Comment #15
wayland76 commented
Ok, I have a question, but first I have to ensure I understand the situation.
My experience to date has been that if I manually delete the XML file, it will be automatically regenerated on the next sitemap access.
So, how about if we deleted all the files? Then, if xmlsitemap is uninstalled from a multisite setup, everything gets deleted, but regenerates without problems. Is that a problem?
Comment #16
wayland76 commented
Ok, I decided to hope that my comment above is correct, and upload an updated patch. It seems to work for me.
Comment #17
wayland76 commented
(Btw, if there were some easy way to figure out which files belong to which sites, I'd give it a go. But if you're using the Multidomain module, you find out one way; if you're using the Domain Access module, it's another way; if you're manually setting prefixes, it's yet another way; and there could be others in the future. That's why I went for the approach that I did.)
Comment #18
darren oh commented
If we're going to depend on the files getting regenerated, we need to make sure the gss.xsl file is automatically regenerated, too.
Comment #19
wayland76 commented
Ok. I've moved the gss.xsl to the per-site folder as well.
Now that I've removed hook_enable, I found that this patch didn't install cleanly against 1.4.
Maybe it's naughty of me, but, while I think I've been developing against HEAD, I've been testing by applying that HEAD patch to 1.4.
Comment #20
TKS commented
Subscribing...
Comment #21
wayland76 commented
Darren, can we ask what the status of this is?
Comment #22
Comitto commented
There are more problems that seem not to be covered by the xmlsitemap_fix_multisites-3.txt patch.
One can access a site by HTTPS or a non-default port. (Fixed in attached patch 3b.)
There is also a problem with which sitemap URL should be posted to search engines.
You may want to submit only your preferred URL. My opinion is that there should be an additional configurable variable to set the preferred site URL.
There is also a possibility to optimize the caching mechanism. A huge amount of resources may be spent building the sitemap, so if we build the sitemap with the root URL replaced by some tag (e.g. @@@ROOT_URL@@@), we will be able to simply replace this marker using str_replace with the requested URL (e.g. https://www.drupal.org:443/), so the map will be built only once per site, no matter how many URLs are used to access the root site.
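Comitto's caching idea can be sketched in a few lines. The placeholder string comes from the comment itself; the function names and the stand-in sitemap body are hypothetical, since the real module would build the content from the database.

```php
<?php
// Sketch of the placeholder-caching idea: build the sitemap once with
// a token in place of the root URL, then substitute the requested base
// URL on each delivery.
define('XMLSITEMAP_ROOT_TOKEN', '@@@ROOT_URL@@@');

// Stand-in for the expensive build step (done once, then cached).
function xmlsitemap_build_cached() {
  return '<url><loc>' . XMLSITEMAP_ROOT_TOKEN . '/node/1</loc></url>';
}

// Cheap per-request step: one str_replace, whatever host, scheme, or
// port the site was reached through.
function xmlsitemap_render($cached, $base_url) {
  return str_replace(XMLSITEMAP_ROOT_TOKEN, rtrim($base_url, '/'), $cached);
}
```

As wayland76 notes in comment #24, this only helps when every URL serves the same content; with Domain Access, different domains can have different node sets, so the cache would need to be per-site, not global.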
Comment #23
wayland76 commented
Hmm. The newest patch (3b) doesn't seem to work with multiple hosts; we need to use HTTP_HOST instead of SERVER_NAME.
Comment #24
wayland76 commented
Also, the caching suggestion wouldn't work universally -- I have different content on different sites.
Comment #25
wayland76 commented
Ok, new patch attached that fixes the two aforementioned problems. The patch diffs against HEAD.
Comment #26
wayland76 commented
Apologies to any who tested the previous patch; this works better.
Comment #27
hass commented
This will not work behind a proxy:
$proto = $_SERVER['HTTPS'] ? 'https' : 'http';
...and I saw some code style issues; try to use the coder module, please.
Comment #28
wayland76 commented
I'll use coder and submit another patch.
I don't understand in what circumstances a proxy would affect the code above. If $_SERVER['HTTPS'] is unset (the default), then $proto is 'http', which is fine for all non-HTTPS connections. But my understanding was that HTTPS connections (unlike HTTP connections) don't go through a proxy, but are direct connections, which also means there's no problem.
The getenv was bad though; it's now fixed in the version on my home machine (which I'll keep working on).
Comment #29
hass commented
Aehm, yep... proxy is wrong... a load balancer with SSL acceleration may break this.
Comment #30
wayland76 commented
@hass: Well, I'm afraid I don't have a spare one of those to test with :). If someone can tell me another way to make this work which works with that, I'll go with it, but I'm afraid this is the only way I know at the moment :).
But I still have to make another patch :).
Comment #31
hass commented
Well, in such a case the webserver cannot know -- maybe you can use a relative path!? Not sure if this is possible, but it would work. Otherwise leave it as is.
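For the load-balancer case hass raises, one common workaround (not part of the patch discussed here, and an assumption about the deployment) is that SSL-accelerating balancers often add an X-Forwarded-Proto header, which can be checked before falling back to $_SERVER['HTTPS']. The function name below is hypothetical.

```php
<?php
// Sketch: derive the base URL for sitemap links, tolerating an
// SSL-accelerating load balancer that sets X-Forwarded-Proto.
// Trusting this header is only safe when the balancer strips any
// client-supplied copy of it.
function xmlsitemap_base_url() {
  if (!empty($_SERVER['HTTP_X_FORWARDED_PROTO'])) {
    $proto = $_SERVER['HTTP_X_FORWARDED_PROTO'];
  }
  elseif (!empty($_SERVER['HTTPS']) && $_SERVER['HTTPS'] !== 'off') {
    $proto = 'https';
  }
  else {
    $proto = 'http';
  }
  // HTTP_HOST (not SERVER_NAME) reflects the host the client asked
  // for, which is what a multi-domain setup needs (see comment #23).
  $host = isset($_SERVER['HTTP_HOST']) ? $_SERVER['HTTP_HOST'] : 'localhost';
  return $proto . '://' . $host;
}
```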
Comment #32
fellow commented
I use Domain Access and have unique URLs for my nodes (only subdomain.example.com/node1, not example.com/node1). In sitemap.xml I see example.com/node1 instead of subdomain.example.com/node1... I tried to apply xmlsitemap_fix_multisites-5.txt (using Cygwin) but get "xmlsitemap.install: command not found". What should I do? Thank you!
Comment #33
wayland76 commented
@fellow: A couple of questions:
1. What URL are you accessing your sitemap at?
2. Do you get that error when you're applying the patch, or in Drupal afterwards, or in your apache log, or what?
If it helps, you'll find you have two sitemaps, probably something like:
http://example.com/sitemap.xml
http://subdomain.example.com/sitemap.xml
You may also find that you need to patch the domain module, as specified here: http://drupal.org/node/216148
HTH,
Comment #34
fellow commented
1. As an anonymous user:
At example.com/sitemap.xml I see all nodes, including nodes only associated with subdomain1.example.com, subdomain2..., help.example.com...
The URL is example.com/helptopic, not help.example.com/helptopic. An anonymous user can't access example.com/helptopic, only help.example.com/helptopic.
At subdomain1.example.com/sitemap.xml everything is OK: only nodes from subdomain1.
2. I used Cygwin (http://drupal.org/node/32875). Please see the attachment.
I have Domain Access 5.x-1.4; which patch should I apply?
Thank you.
Comment #35
wayland76 commented
@fellow: Well, you'll have to read the surrounding code, but the last one is probably the sort of thing you're looking for. But it's not a proper patch; it's just a matter of pasting the code into the right place and checking that it looks sensible. My advice is to back up first :).
Comment #36
wayland76 commented
Note: Comment #35 related to the patch to Domain Access that I linked above, not to any patch here.
Comment #37
wayland76 commented
Oh, btw -- you have to check out HEAD, and patch that; this patch fails against the latest XML Sitemap because of other changes to HEAD.
Comment #38
wayland76 commented
Ok, new patch (against HEAD); please test this and let me know what you think.
@hass: This fixes the getenv and the coder problems, but doesn't address the load balancer. That's probably as good as it's going to get at this point.
Comment #39
fellow commented
Now it works for me! Thank you.
Comment #40
wayland76 commented
Well, that's two people it works for, then. Others?
Comment #41
Comitto commented
Works OK.
Comment #42
darren oh commented
I am working on a number of performance enhancements for large sites which will allow us to generate the sitemap directly from the database without slowing down the site. At that point we will no longer use the file system.
Comment #43
hass commented
@Darren: Why are you not going to use the Batch API for D6? Aside from that, the above patch is more about the i18n support we need in D6... as a side effect we get the multi-site issue fixed.
Comment #44
wayland76 commented
@Darren: Can I also mention that less than 50% of this code is related to the filesystem, and even those parts mostly provide abstraction, which will hopefully make it easier to remove them when the time comes.
Basically, this issue is holding up the D6 port, so there are a lot of people keen to see it go ahead :).
Comment #45
darren oh commented
Understood. I will not be ignoring the work here; I just wanted everyone to know that other issues require changes that will affect this issue.
Comment #46
wayland76 commented
@Darren: are you able to give us a timeframe on this? (i.e. 1 week vs. 1 month vs. 6 months?)
Comment #47
hass commented
I'd like to understand what you are changing there, too. The approach would be interesting...
Comment #48
darren oh commented
See issue 201644 for current status.
Comment #49
darren oh commented
Please use the new 2.x-dev release if you require this feature. The 2.x branch no longer uses the files directory.