Ice Scan is a product search engine for consumer electronics, computer hardware and software, telephony, mobile phones, household appliances and much more.
www.icescan.com - USA version
Ice Scan uses the domain and i18n modules to build a fully multi-domain, multi-site and multi-lingual system: 12 languages, 11 domains, 11 Solr databases - all from 1 Drupal core using 1 MySQL database, on 1 server.
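To picture how a single Drupal installation can serve many sites, here is a minimal Python sketch (not Drupal, Domain Access or i18n code) of the per-request lookup such a setup performs: the request's hostname selects the interface language and the Solr index to query. The hostnames come from the domain list in this post. The language codes, the `SiteConfig` type, `resolve_site()` and the Solr core names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SiteConfig:
    language: str   # hypothetical language code served on this domain
    solr_core: str  # hypothetical name of the Solr index for this domain


# Hypothetical routing table: the hostnames are Ice Scan's, but the
# language codes and Solr core names are illustrative assumptions.
SITES = {
    "www.icescan.com": SiteConfig("en", "icescan_us"),
    "www.icescan.co.uk": SiteConfig("en", "icescan_uk"),
    "www.icescan.de": SiteConfig("de", "icescan_de"),
    "www.icescan.fr": SiteConfig("fr", "icescan_fr"),
    "www.icescan.nl": SiteConfig("nl", "icescan_nl"),
    "www.icescan.it": SiteConfig("it", "icescan_it"),
    "www.icescan.es": SiteConfig("es", "icescan_es"),
}


def resolve_site(host: str) -> Optional[SiteConfig]:
    """Return per-domain settings for a Host header, ignoring case and any port."""
    hostname = host.split(":", 1)[0].lower()
    return SITES.get(hostname)


print(resolve_site("www.icescan.de:8080"))
```

Every domain shares the same code and database; only the small per-domain record differs, which is what keeps the setup down to one core and one server.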
When I read that TheFind.com had been funded with 20 million USD in investment capital (back in 2005), I knew it could be done on Drupal at no cost. And it looks like I'm going to be right. I imagined these large corporations spent millions of dollars on their 'patented search technologies' alone. But a similar, equally powerful search technology ships for free with Drupal's Apache Solr Search Integration project. No need to reinvent the wheel.
This is a personal project and still just a trial run for a much bigger project. Eventually the idea is to compete with TheFind.com and Google Shopping. The system will also be released as a Drupal installation profile, so others can set up their own affiliate product search system. Such a profile will take a while: expect it around late 2011 or early 2012, built on Drupal 7 when the time comes. I first need to focus on building the new business and a series of such search sites.
This Drupal 6.x site now has almost half a million nodes and 5 million taxonomy tags (SKU, brand, EAN, etc.), and will keep growing to 10-20 million product nodes. At that point "Ice Scan" will have reached its predetermined limit. But another project in the planning stage will go much further, to the scale of Google and TheFind (500 million products and more).
I will occasionally report back to the community on the obstacles we had to overcome.
Lessons so far from dealing with 500K nodes:
- avoid taxonomy_get_tree() at ALL costs! It loads every term of a vocabulary into memory and is just not intended for large sites. If you really need it, write a custom workaround.
- watch out for Pathauto: it uses taxonomy_get_tree(), so don't use path patterns with child terms (our PHP memory usage constantly spiked to 1200 MB because of taxonomy_get_tree()).
- switch to Pressflow, the performance-oriented Drupal fork (http://www.pressflow.org).
- stick to memcache for anonymous users; it works great (see http://2bits.com/ for a video from DrupalCon SF 2010). There are a lot of caching modules out there, but all you need is memcache and plenty of RAM.
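A minimal sketch of one such workaround, assuming you only need a term's direct children (for example, to render one level of a category menu): ask the hierarchy table for that single parent instead of building the whole tree. It is written in Python against in-memory SQLite stand-ins for Drupal 6's `term_data` and `term_hierarchy` tables. The table and column names follow the Drupal 6 schema, but `get_children()` and the sample terms are hypothetical, and on a real site the query would go through Drupal's database layer.

```python
import sqlite3

# Stand-ins for Drupal 6's {term_data} and {term_hierarchy} tables,
# reduced to the columns this query needs.
SCHEMA = """
CREATE TABLE term_data (tid INTEGER PRIMARY KEY, vid INTEGER, name TEXT, weight INTEGER DEFAULT 0);
CREATE TABLE term_hierarchy (tid INTEGER, parent INTEGER);
CREATE INDEX term_hierarchy_parent ON term_hierarchy (parent);
"""


def get_children(conn: sqlite3.Connection, vid: int, parent: int) -> list[tuple[int, str]]:
    """Fetch only the direct children of one term, not the whole vocabulary."""
    rows = conn.execute(
        """
        SELECT td.tid, td.name
        FROM term_data td
        INNER JOIN term_hierarchy th ON td.tid = th.tid
        WHERE td.vid = ? AND th.parent = ?
        ORDER BY td.weight, td.name
        """,
        (vid, parent),
    )
    return rows.fetchall()


# Hypothetical sample vocabulary: Brand > {Apple > iPhone, Samsung}
conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany(
    "INSERT INTO term_data (tid, vid, name) VALUES (?, ?, ?)",
    [(1, 1, "Brand"), (2, 1, "Apple"), (3, 1, "Samsung"), (4, 1, "iPhone")],
)
conn.executemany(
    "INSERT INTO term_hierarchy (tid, parent) VALUES (?, ?)",
    [(1, 0), (2, 1), (3, 1), (4, 2)],
)
print(get_children(conn, vid=1, parent=1))  # [(2, 'Apple'), (3, 'Samsung')]
```

The memory cost of this query grows with the number of children of one term, not with the size of the vocabulary, which is the point when a vocabulary holds millions of tags.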
P.S. Why the name "Ice Scan"? Because good short domain names are very hard to come by. I follow the principle "build first, brand later". If a project succeeds and the money comes in, you can always buy a cool 4-letter domain.
Ice Scan's search technology is based on Drupal's Apache Solr Search Integration:
http://drupal.org/project/apachesolr
Also part of the series are:
http://ca.icescan.com - Canada
http://www.icescan.co.uk - United Kingdom
http://www.icescan.de - Germany
http://www.icescan.fr - France
http://www.icescan.be - Belgium
http://www.icescan.nl - Netherlands
http://www.icescan.it - Italy
http://www.icescan.es - Spain
http://jp.icescan.com - Japan (* planned, not live)
http://www.icescan.in - India (* planned, not live)
More domains will be added as we scale up to 10-20 million products in total.
The 5 most important Drupal modules for building such a site:
http://drupal.org/project/apachesolr (with custom modifications and additional patches)
http://drupal.org/project/feeds (with patches for mapping on import, and parser-level batching)
http://drupal.org/project/i18n
http://drupal.org/project/domain
http://drupal.org/project/views
Morningtime also developed a series of custom modules, some of which will eventually be released to the Drupal module repository. Among them is the ApacheSolr-Money integration module, with a price slider to filter results.
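The Morningtime module itself isn't shown here. As a hedged illustration of the general idea, a price slider can map its two handles onto a Solr range filter query, `fq=field:[low TO high]`, where square brackets mean inclusive bounds and `*` leaves one end open. The Python sketch below only builds the request parameters; the field name `price` and both helper functions are hypothetical.

```python
from typing import Optional
from urllib.parse import urlencode


def price_filter(field: str, low: Optional[float], high: Optional[float]) -> str:
    """Build an inclusive Solr range filter; None on either side means open-ended."""
    lo = "*" if low is None else str(low)
    hi = "*" if high is None else str(high)
    return f"{field}:[{lo} TO {hi}]"


def search_params(q: str, low: Optional[float], high: Optional[float]) -> str:
    """URL-encode a keyword query plus the slider's price range as a filter query."""
    # "price" is a hypothetical name for the indexed price field.
    return urlencode({"q": q, "fq": price_filter("price", low, high)})


print(price_filter("price", None, 100))   # price:[* TO 100]
print(search_params("laptop", 50, 200))   # q=laptop&fq=price%3A%5B50+TO+200%5D
```

Putting the range in `fq` rather than in `q` keeps it out of relevance scoring, and Solr can cache filter queries independently of the keyword search.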
You can also vote for Ice Scan on DrupalSites.net:
http://www.drupalsites.net/weblink/ice-scan
Comments
Very interesting
Very interesting information.
I'm glad that you've listed out the main modules and the problems you are having with some of the modules/functions.
Please keep the information coming. It will be interesting to see your progress with Drupal.
And GOOD LUCK!
Dead project?
Hi, interesting. Is it a dead project now?
$100,000
He quoted me $100,000 when I asked how much he would charge to sell me a license.