Bulk submission of existing nodes
robertDouglass - July 4, 2008 - 16:20
| Project: | Calais |
| Version: | 5.x-1.5 |
| Component: | Code |
| Category: | feature request |
| Priority: | normal |
| Assigned: | febbraro |
| Status: | closed |
Description
When you install and enable this module, there is no convenient way to submit a site's existing nodes to Calais. One option would be to create a table at install time listing the IDs of all nodes currently on the site, and then work through them in batches on cron.
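A minimal sketch of the install-time queue idea, written in Python rather than Drupal PHP so that it runs standalone. The class `BulkSubmissionQueue`, the `submit` callback (standing in for a single Calais request) and the `batch_size` parameter are all hypothetical; a real module would keep the node IDs in a database table created by its `hook_install()` rather than in memory.

```python
from collections import deque
from typing import Callable, Iterable


class BulkSubmissionQueue:
    """Hypothetical in-memory stand-in for an install-time table of node ids."""

    def __init__(self, existing_nids: Iterable[int]) -> None:
        # Snapshot of every node id present on the site at install time.
        self._pending = deque(existing_nids)

    def run_cron(self, submit: Callable[[int], None], batch_size: int = 10) -> list[int]:
        """Submit at most batch_size queued nodes and return the ids sent this run."""
        processed: list[int] = []
        while self._pending and len(processed) < batch_size:
            nid = self._pending[0]
            submit(nid)  # if this raises, nid stays at the head of the queue
            self._pending.popleft()  # dequeue only after a successful submission
            processed.append(nid)
        return processed

    def remaining(self) -> int:
        """Number of nodes still waiting to be submitted."""
        return len(self._pending)
```

Each cron run drains at most `batch_size` nodes, so a site with 25 legacy nodes and a batch size of 10 would finish on the third run. Because a node is only removed after `submit` returns, a failed request leaves it queued for the next run.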

#1
Hey Robert,
Thanks for the submission, and yes, this is on the list of upcoming features. It is kinda tricky, though: we don't want to just assign any/all tags to content, because those could be numerous/overwhelming. Calais just released a relevancy rating with each term, so we were considering implementing this in conjunction with that, letting you set a relevancy threshold a term must meet before it is applied. With our publishing folks it is not so cut and dried to just assign everything, so we are trying to devise a workable solution that allows flexibility for everyone.
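The threshold idea can be sketched as a simple filter. This assumes each returned term carries a numeric relevance score between 0.0 and 1.0; the thread does not say how Calais actually encodes relevancy, and the `Term` type and `terms_to_apply` function are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Term:
    name: str
    relevance: float  # assumed to lie in [0.0, 1.0]; the real Calais format may differ


def terms_to_apply(terms: list[Term], threshold: float) -> list[Term]:
    """Keep only the terms whose relevance meets the configured threshold."""
    return [t for t in terms if t.relevance >= threshold]
```

With a threshold of 0.5, a term scored 0.9 or 0.5 would be applied and one scored 0.2 would be dropped, which keeps low-relevance tags from overwhelming the content.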
Any suggestions from your experiences?
Thanks,
Frank
#2
I think at a minimum there should be an option to start a batch process (using the Batch API in D6) that sends all existing nodes (of a content type?) to the service. This will be relevant to every Drupal site with legacy content that adopts OC. I've started down the path of hitting the edit tab and pressing submit on every node... fingers get tired =)
#3
It would be cool if OpenCalais had a bulk service, because sending each node in a separate HTTP request is very costly.
#4
Moshe, I agree with you. They currently don't have a bulk API (I also saw your question on the Calais Forum), but this is one of the next features on our list; regrettably, it will have to be one node at a time, though. We'll make the number of nodes per batch run configurable.
#5
Hmmmm, actually, there is a not-so-"hackish" way to process existing nodes even now. Calais processing is invoked in hook_nodeapi, so anything that saves a node can trigger it if Calais is enabled for that node type. Writing a very simple cron task that goes through nodes and invokes node_save() on them will do the job. Now, that is a code-level approach; for non-geeks, a user interface is probably needed. One thing to keep in mind: such a UI console must not try to process all nodes in one go!!! Since the queue of nodes may be large, processing could hang or freeze. I guess, ideally, it should work something like the Search re-index screen, where you trigger the process and then it handles one batch on every cron run?
#6
subscribe
#7
A bulk processor has been committed to the DRUPAL--6-2 branch. A backport is unlikely, as the Batch API it uses does not exist in D5.
#8
Automatically closed -- issue fixed for two weeks with no activity.