Closed (fixed)
Project:
Boost
Version:
6.x-1.x-dev
Component:
Cron Crawler
Priority:
Major
Category:
Bug report
Assigned:
Unassigned
Issue tags:
Reporter:
Created:
18 Aug 2010 at 08:17 UTC
Updated:
3 Jan 2014 at 01:42 UTC
Comments
Comment #1
agence web coheractio commented
Anyone else having this issue?
Comment #2
mikeytown2 commented
What about pages already in the cache? Does it work correctly with those? My guess is that you're loading URLs from the alias table. Code in question
This query has been heavily optimized so that it can load millions of aliases in a short amount of time. Looking at the code, I might have a solution...
Comment #3
mikeytown2 commented
Comment #4
agence web coheractio commented
Works perfectly with the patch.
Many thanks for this great module.
Just one comment: there is a missing "." in
$url = $base_url . '/' $row['language'] . '/' . $row['dst'];
It should be
$url = $base_url . '/'. $row['language'] . '/' . $row['dst'];
Laurent
Agence Web Coheractio
Comment #5
mikeytown2 commented
Committed.
Comment #7
edjay commented
I am setting this issue back to active because I think this patch can cause errors.
Since installing the latest dev release, I see in my Apache access logs that the crawler looks for my nodes in the wrong place. It prefixes the path with '/fr' although my site is not multilingual.
My default language is French and it is the only language enabled in the language interface; the prefix 'fr' is set on the "admin/settings/language/edit/fr" page. I removed the prefix to see whether Boost would change the path where it looks for my nodes, but it made no difference.
In the 'url_alias' table, the nodes have kept 'fr' in the 'language' column, and I see that the patch does not check whether the path must be prefixed at all. It only checks that the node language matches the active language or that the node language is empty.
In my case, for example, Boost has to know whether the prefix indicated in the 'url_alias' table must be included as a prefix (no, in this case).
What's more, $row['language'] is used to rewrite the URL. The $row['prefix'] variable, which seems to correspond better to the site settings, is not used.
This is what I changed temporarily. I know it is not the right way, but it works with my configuration for now.
if (empty($row['language']) || $language->language != $row['language'] || empty($language->prefix)) {
  $url = $base_url . '/' . $row['dst'];
}
else {
  $url = $base_url . '/' . $language->prefix . '/' . $row['dst'];
}
I'll see if I have time to correct that.
Sorry for my English!
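A minimal standalone sketch of the condition in the snippet above, for anyone who wants to check its behaviour outside Drupal. The function name example_crawler_url(), the example.com base URL and the sample alias are assumptions made for illustration; they are not part of Boost. The $language argument only mimics the two fields of Drupal's global $language object that the snippet reads.

```php
<?php
// Hypothetical helper (not Boost code): wraps the condition from comment #7.
// $language stands in for Drupal's global $language object and only needs
// the 'language' and 'prefix' fields. $row mimics one url_alias row.
function example_crawler_url($base_url, $language, array $row) {
  if (empty($row['language']) || $language->language != $row['language'] || empty($language->prefix)) {
    // Language-neutral alias, alias in another language, or no prefix configured.
    return $base_url . '/' . $row['dst'];
  }
  // Alias language matches the active language and a prefix is configured.
  return $base_url . '/' . $language->prefix . '/' . $row['dst'];
}

// Sample data (assumed): a French alias on a site whose only language is French.
$row = array('language' => 'fr', 'dst' => 'mon-article');
$fr_no_prefix = (object) array('language' => 'fr', 'prefix' => '');
$fr_prefixed = (object) array('language' => 'fr', 'prefix' => 'fr');

echo example_crawler_url('http://example.com', $fr_no_prefix, $row), "\n";
echo example_crawler_url('http://example.com', $fr_prefixed, $row), "\n";
```

With the prefix field left empty the first call yields http://example.com/mon-article, and with the prefix set to 'fr' the second yields http://example.com/fr/mon-article. Note that an alias whose language differs from the active language never gets a prefix under this condition.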
Comment #8
edjay commented
Comment #9
mikeytown2 commented
Comment #10
ressa
This still happens with the latest dev version (6.x-1.x-dev 2010-Dec-21) when you enable "Crawl All URL's in the url_alias table". It prefixes the path with the language (for example '/fr') although the site is not multilingual, and it can't find those pages because they don't exist.
EDIT: I have now added the prefix under "Statically cache specific pages" excluding 'fr/*' -- perhaps a temporary fix?
Comment #11
ressa
Excluding 'fr/*' under "Statically cache specific pages" didn't work; the crawler still visits the 'fr/' URLs...
Comment #12
mladenu commented
I had a problem where the crawler wouldn't crawl the entire url_alias table (only the built-in English aliases, not those in my Serbian (sr) language), and I solved it as follows. The original code:
function boost_crawler_add_alias_to_table() {
  // Insert batch of html URL's into boost_crawler table
  global $base_url, $language;
  if (!variable_get('boost_crawl_url_alias', FALSE)) {
    return TRUE;
  }
With "$language" removed from the global declaration:
function boost_crawler_add_alias_to_table() {
  // Insert batch of html URL's into boost_crawler table
  global $base_url;
  if (!variable_get('boost_crawl_url_alias', FALSE)) {
    return TRUE;
  }
Everything works fine now and the crawler does its job...
Comment #13
mikeytown2 commented
Looks like I can't use the high-performance logic along with i18n... I need to call url() instead of trying to glue the URL together in SQL.
Functions that should help: language_list('enabled'), language_default()
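A rough standalone sketch of the idea in this comment: pick the prefix from the language of each alias row instead of from the active language only. It is not the code that was committed and it does not call Drupal. The helper name example_alias_url(), the sample languages and aliases, and the shape of $languages (language objects keyed by language code, each with a 'prefix' field) are all assumptions made for illustration; inside Drupal that list would have to come from the language functions named above.

```php
<?php
// Hypothetical helper (not Boost code). $languages is an assumed stand-in
// for the enabled language objects, keyed by language code; each object
// has a 'prefix' field that may be empty.
function example_alias_url($base_url, array $languages, array $row) {
  $code = isset($row['language']) ? $row['language'] : '';
  // No prefix for language-neutral aliases, for languages that are not
  // in the list, and for languages without a configured prefix.
  if ($code === '' || !isset($languages[$code]) || empty($languages[$code]->prefix)) {
    return $base_url . '/' . $row['dst'];
  }
  return $base_url . '/' . $languages[$code]->prefix . '/' . $row['dst'];
}

// Sample data (assumed): English without a prefix, Serbian with prefix 'sr'.
$languages = array(
  'en' => (object) array('language' => 'en', 'prefix' => ''),
  'sr' => (object) array('language' => 'sr', 'prefix' => 'sr'),
);

$rows = array(
  array('language' => 'en', 'dst' => 'about'),
  array('language' => 'sr', 'dst' => 'o-nama'),
  array('language' => '', 'dst' => 'contact'),
);

foreach ($rows as $row) {
  echo example_alias_url('http://example.com', $languages, $row), "\n";
}
```

Under these assumptions the English and language-neutral aliases get no prefix, while the Serbian alias is crawled under /sr/. The trade-off is the one described in the comment: the URL is built in PHP per row rather than glued together in the SQL query.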
Comment #14
bohz commented
Same problem as #7 here.
The fix worked for me too.
Thanks a lot!
Comment #15
hedac commented
I'm on the latest dev and I still have problems with this too...
The crawler goes to /en/alias and gets a 404 error. The alias has its language set to English, but the default site language is English, so there should be no /en/ in the URL...
The alias entries with no language assigned, or with other languages, are working OK.
Comment #16
hedac commented
OK, I have it working now... I changed the code from #7 to:
Comment #17
bgm commented
Thanks for reviving this issue. I have reviewed and committed to 6.x-1.x the patch in #16/#7.