| Project: | Memetracker |
| Version: | 6.x-1.1-alpha6 |
| Component: | Miscellaneous |
| Category: | support request |
| Priority: | normal |
| Assigned: | Unassigned |
| Status: | active |
| Issue tags: | hosting, installation requirement |
Issue Summary
Hello,
Newbie user here. I am interested in installing Memetracker on a live Web server. I've tried installing it on a local machine with limited success -- I am importing feeds, but I am unable to create clusters, I believe because the machine I am using (a 2002-era Vaio running XP) doesn't have enough processing power (one of the symptoms: There are lots of timeouts when I try to run cron).
I have some past experience working with commercial hosting companies like Go Daddy and Host Excellence, but am concerned that they might not be suitable for memetracker, for the following two reasons:
- Many commercial services may not support the installation of Pycluster, Python Numeric, Drupal, or other required components of Memetracker.
- I suspect a shared hosting service may not offer sufficient processing resources to support tracking memes (based on the comment here: http://drupal.org/node/281365#comment-923835 )
So, here are my questions:
- What are the baseline processing and hosting requirements that I should be looking for in a commercial service?
- What hosting services do you use for your Memetracker installation, or which ones would you recommend? (I have seen one recommended on another thread here, http://drupal.org/node/333326, but would like to hear of some others)
Many thanks!
Ian
Comments
#1
re: cron timing out -- you'll probably need to increase PHP's max_execution_time on your machine (see this issue: http://drupal.org/node/287376#comment-1061521)
On shared hosting -- it was reported in this comment (http://drupal.org/node/283114#comment-1707822) that you could run memetracker's Python dependencies in a virtual Python environment. I haven't tried it myself, but it seems feasible if you're fairly comfortable with Linux / Python.
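Before committing to a host, it may help to confirm that the Python interpreter you would be using can actually import the dependencies. Here is a minimal sketch of such a check. The import names `Pycluster` and `Numeric` are assumptions based on the package names mentioned in this thread, and `check_modules` is a hypothetical helper, not part of memetracker. It avoids newer syntax so it should run on older Python 2 interpreters as well as Python 3.

```python
# Assumed import names for memetracker's Python dependencies (not confirmed
# by the module's documentation); adjust them for your installation.
ASSUMED_MODULES = ["Pycluster", "Numeric"]


def check_modules(names):
    """Return a dict mapping each module name to True if it can be imported."""
    results = {}
    for name in names:
        try:
            __import__(name)
            results[name] = True
        except ImportError:
            results[name] = False
    return results


if __name__ == "__main__":
    for name, found in sorted(check_modules(ASSUMED_MODULES).items()):
        print("%s: %s" % (name, "found" if found else "MISSING"))
```

Run it with the same interpreter that cron would use (for example, the one inside the virtual environment). Any line reporting MISSING means that dependency still needs to be installed there.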
On answering the question for baseline processing / hosting requirements -- that's hard to say. The CPU time to calculate memes increases exponentially with the number of articles you're processing. For example, if you set up memetracker to find memes from the past day, you've added 7 feeds, and there are only 15 articles, memetracker will take barely a second to process things. But if you have, say, 800 articles you're trying to cluster, it can take 1-2 minutes (that's what it takes, anyway, on my 1.6 GHz dual-core processor). Your RAM usage balloons as well the more articles you cluster. Clustering 800 articles will use ~350 MB.
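For a rough feel of why the work grows so much faster than the article count, here is a back-of-the-envelope sketch. It assumes the clustering step has to compare every article with every other article at least once. That is an assumption about how the clustering works, not something measured from memetracker, and `pairwise_comparisons` is a hypothetical helper.

```python
def pairwise_comparisons(n_articles):
    """Number of distinct article pairs: n * (n - 1) / 2."""
    return n_articles * (n_articles - 1) // 2


# The two article counts mentioned in the comment above.
for n in (15, 800):
    print("%d articles -> %d pairs" % (n, pairwise_comparisons(n)))
```

Under that assumption, 15 articles give 105 pairs and 800 articles give 319600 pairs, so the number of comparisons rises far more steeply than the number of articles. Actual run time and memory also depend on article length and on the clustering settings, so treat this only as intuition and measure on your own data.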
I'd suggest getting memetracker working on your local machine to get a feel for how things work and how much cpu/memory you'll need for your application and go from there. As far as hosting goes -- I've heard lots of good things about http://www.rackspace.com/ -- their virtual-private-servers are fairly inexpensive.
#2
Thanks Kyle.
If anyone else is reading this thread, and has Memetracker running, could you share some information about the system you are using, the number of feeds and articles, and the time it takes to cluster? It could be helpful to people setting up or trying to optimize their settings.
-- Ian
#3
I have a machine with 6 GB RAM, and Drupal's PHP has a 2 GB max limit, so now the memetracker/machinelearningapi cron run can use about that much. I cluster about 100 feeds (200-240 articles per day).
Very often cron dies, with disastrous results even at that limit - table mememtracker_search_x will show unwarranted duplicates, and manual cleanup of the database is required once or twice a day.
I am wondering if this module is worth anything at all, given that it has not been updated in a very long time and the code is primitive in many respects from a production point of view. I am searching around for alternatives, and any suggestion is welcome.
#4
Thanks deltab. Is there a link for your site, or is this a local experiment not facing the Web?
Despite the implementation and operational bugs, I think the module is definitely worth something -- it's free and lets people experiment with memetrackers and clustering. It could be the foundation of great things, but naturally it's hard to build it up with just one author who has full-time obligations elsewhere.
What would be the best course for taking this module to the next level? Elsewhere in the Drupal ecosystem, how are successful modules built up?
Ian
#5
Hi newsio, it is a local experiment - I am holding back on a public deployment just because the Memetracker code needs a lot of work. Some kind of throttle controls (using native Drupal stuff) are absolutely the first priority, I guess - the way I am handling memory is with Elysia Cron, a very long expiration time, and the 2GB limit set, and the Naive Bayes features in memetracker are only limited. You will notice some "need to do still" markers in the code, and the Todo file has many good ideas that need to be implemented.
So I think it needs a lot of work (a) from the finishing point of view, and (b) in using other core Drupal capabilities or modules to become a more effective module.
Yes, the module is worth a lot. I don't know how modules are built up, and I am not a coder, but I can contribute a lot from a testing and analysis point of view. I think successful modules are built through usage, no?