Ice Scan is a product search engine for consumer electronics, computer hardware and software, telephony, mobile phones, household appliances, and much more...

www.icescan.com - USA version

Ice Scan uses the domain and i18n modules to build a fully multi-domain, multi-site and multi-lingual system: 12 languages, 11 domains, 11 Solr databases - all from 1 Drupal core using 1 MySQL database, on 1 server.

When I read that TheFind.com had been funded with 20 million USD of investment capital (back in 2005), I knew it could be done on Drupal for nothing. And it looks like I'm going to be right. I imagined these large corporations spent millions of dollars on their 'patented search technologies' alone. But a similar, equally powerful search technology ships for free in Drupal's Apache Solr Search Integration project. No need to reinvent the wheel.

This is a personal project and still just a try-out for a much bigger project. Eventually the idea is to compete with TheFind.com and Google Shopping. The system will also be released as a Drupal installation profile, so others can set up their own affiliate product search system. That release will take a while: expect it around late 2011 or early 2012, built on Drupal 7. I first need to focus on building the new business and a series of such search sites.

This Drupal 6.x site now has almost half a million nodes and 5 million taxonomy tags (SKU, brand, EAN, etc.), and it will continue to grow to 10-20 million product nodes. At that point "Ice Scan" will have reached its pre-determined limit. But another project in planning will go much further, to the scale of Google and TheFind (i.e. 500 million products and more).

I will occasionally report back to the community on the obstacles that had to be overcome.

So far, dealing with 500K nodes:

- avoid taxonomy_get_tree() at ALL costs! It loads every term of a vocabulary into memory and is just not intended for large sites. Write custom workarounds if you really need it.
- watch out for pathauto: it uses taxonomy_get_tree(), so don't use path patterns that include child terms (our PHP memory constantly spiked to 1200MB because of taxonomy_get_tree())
- switch to the Drupal "fork" Pressflow, http://www.pressflow.org, for performance
- stick to memcache for anonymous users, it works great (see http://2bits.com/ for a SF2010 video). There are a lot of caching modules out there, but all you need is memcache and a lot of RAM.
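
To illustrate the first tip, a custom workaround could look roughly like the sketch below. It is not the code Ice Scan runs: it is written in Python against an in-memory SQLite copy of the Drupal 6 term_data and term_hierarchy tables, and the helper names direct_children and build_demo_db, as well as the sample terms, are made up for the example. The idea is simply to ask the database for the direct children of one term when you need them, instead of building the whole vocabulary tree in memory.

```python
import sqlite3


def build_demo_db():
    """Create a tiny in-memory stand-in for two Drupal 6 taxonomy tables.

    Only the columns needed for the example are included (assumption:
    the real tables have more columns, e.g. term_data.description).
    """
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE term_data (
            tid INTEGER PRIMARY KEY,
            vid INTEGER NOT NULL,
            name TEXT NOT NULL,
            weight INTEGER NOT NULL DEFAULT 0
        );
        CREATE TABLE term_hierarchy (
            tid INTEGER NOT NULL,
            parent INTEGER NOT NULL,
            PRIMARY KEY (tid, parent)
        );
        """
    )
    # Hypothetical sample terms, all in vocabulary 1.
    terms = [
        (1, 1, "Electronics", 0),
        (2, 1, "Cameras", 0),
        (3, 1, "Phones", 0),
        (4, 1, "DSLR", 0),
    ]
    # parent = 0 marks a root term, as in Drupal 6.
    hierarchy = [(1, 0), (2, 1), (3, 1), (4, 2)]
    conn.executemany("INSERT INTO term_data VALUES (?, ?, ?, ?)", terms)
    conn.executemany("INSERT INTO term_hierarchy VALUES (?, ?)", hierarchy)
    return conn


def direct_children(conn, parent_tid, vid):
    """Return [(tid, name), ...] for the direct children of one term.

    Only one level is fetched, so memory use depends on the number of
    children of this term, not on the size of the whole vocabulary.
    """
    cursor = conn.execute(
        """
        SELECT d.tid, d.name
        FROM term_data d
        JOIN term_hierarchy h ON h.tid = d.tid
        WHERE h.parent = ? AND d.vid = ?
        ORDER BY d.weight, d.name
        """,
        (parent_tid, vid),
    )
    return cursor.fetchall()


if __name__ == "__main__":
    conn = build_demo_db()
    for tid, name in direct_children(conn, 1, 1):
        print(tid, name)
```

On a real site the same query would run against MySQL, and an index on term_hierarchy.parent matters once there are millions of terms.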

P.S. Why the name "Ice Scan"? Because good short domain names are very hard to come by. I follow the principle "build first, brand later". If a project becomes a success, and the money comes, you can always opt to buy a cool 4-letter domain.

Ice Scan's search technology is based on Drupal's Apache Solr Search Integration,
http://drupal.org/project/apachesolr

Also part of the series are:

http://ca.icescan.com - Canada
http://www.icescan.co.uk - United Kingdom
http://www.icescan.de - Germany
http://www.icescan.fr - France
http://www.icescan.be - Belgium
http://www.icescan.nl - Netherlands
http://www.icescan.it - Italy
http://www.icescan.es - Spain
http://jp.icescan.com - Japan (* planned, not live)
http://www.icescan.in - India (* planned, not live)

More domains will be added, as we scale it up to 10-20 million products total.

Top 5 most important Drupal modules to create such a site:

http://drupal.org/project/apachesolr (with custom modding, additional patches)
http://drupal.org/project/feeds (with patches for mapping on import, and parser-level batching)
http://drupal.org/project/i18n
http://drupal.org/project/domain
http://drupal.org/project/views

Morningtime also developed a series of custom modules. Some of these will eventually be released to the Drupal module repository, among them the ApacheSolr-Money integration module, which adds a price slider to filter results.

You can also vote for Ice Scan on DrupalSites.net:
http://www.drupalsites.net/weblink/ice-scan

Comments

cookiesunshinex

Very interesting information.

I'm glad that you've listed out the main modules and the problems you are having with some of the modules/functions.

Please keep the information coming. It will be interesting to see your progress with Drupal.

and GOOD LUCK!

HenLab

Hi, interesting. Is it a dead project now?

Garry Egan

He quoted me $100,000 when I asked how much to sell me a license.

Mathijs Koenraadt - Morningtime
11:06 AM (6 hours ago)
I'll give you a copy of the newest system (see an example at www.shopcircuit.nl/search) for $100,000.

I'll be happy to install it for you on your servers. It runs on MongoDB, Solr, Memcached etc.