Closed (won't fix)
Project: OpenCalais
Version: 6.x-2.1
Component: Code
Priority: Normal
Category: Task
Assigned:
Reporter:
Created: 30 Oct 2008 at 23:57 UTC
Updated: 22 Jul 2010 at 23:38 UTC
Comments
Comment #1
bacchus101 commented
So am I correct in reading this that a custom CCK body field will not be processed by Calais?
I was using Calais, but needed to change to a custom CCK field for the body so that I could manipulate the output via Contemplate. Now I am no longer getting term extraction.
Comment #2
febbraro commented
No, this currently works to the best of my knowledge. This issue is just a placeholder to make sure I test a whole bunch of CCK field configurations, etc. But if things broke for you, something may be happening.
Can you try printing out $node->body in calais_process_node in the calais.module file? That will tell us whether it is getting the body from the CCK field.
Comment #3
bacchus101 commented
Thanks for your quick reply. I ended up getting it sorted out by using the default body field and making some adjustments with Contemplate. Once back to the standard body, Calais immediately started providing terms.
Before that I had a custom multi-row text field that was substituting for the body within the custom content type and Calais was not pulling the terms from that specific field (nor for instance does it pull from the image field description.)
I'd try to help debug (if there is indeed an issue beyond my installation) but I am not sure how to implement the debug code that you need from above. I'm less of a coder and more of a high functioning copy and paster. ;)
Comment #4
febbraro commented
That sounds like a resounding "not working well for CCK fields". I'll address this ASAP and try to get a new release out in the coming days.
Comment #5
febbraro commented
bacchus101: Here is a quick patch that puts all CCK fields into the content that is submitted to Calais. I'm thinking we may want to allow configuration of *which* fields get submitted, but for now they will all get in there.
Comment #6
bacchus101 commented
It doesn't seem to be completing the patch.
...and this ends up in the .rej file.
Am I approaching this wrong? Thanks.
Comment #7
febbraro commented
Oops. Sorry, I made the patch against my dev version and not the version from the latest release. Give this a whirl.
Comment #8
bacchus101 commented
Alright, patch applied successfully.
I am still testing things out, but thus far I see "no joy" with term extraction from CCK fields.
I created a new content type and kept the body field and added a text field. I then created two identical articles with the content type, one pasted only into the body and the other pasted only into the text field. The one with the body field suggested and chose terms for Calais upon publication, the other did not.
I don't want to tie you up with this as I have a vast array of 3rd party modules that could be causing unforeseen issues, unless of course it is an area you feel like actively pursuing.
Comment #9
febbraro commented
No worries. I appreciate you testing the patches out, and this will ultimately make this module much better for everyone else, so it is no hassle at all.
In calais_process_node, where the patch was applied (line 106 or so), can you print out what $body is before it gets sent for processing? It might also help to print out $node->content. $node->content is supposed to hold an array of all of the CCK fields for that node and should contain all the data you are looking to get processed.
Let me know what turns up.
Comment #10
bacchus101 commented
I attached a file showing both the working and non-working versions of the article (with data in the body field and data instead in the CCK field.)
Comment #11
febbraro commented
So the return of
Is empty?
Comment #12
bacchus101 commented
This is where my lack of knowledge will handicap this investigation. I am not sure how to find out what that statement is returning.
Comment #13
febbraro commented
That statement is in the function calais_process_node in the file calais.module, around line 110 or so. It was added by the patch in #7.
You can do something like this....
Let me know if that helps.
Comment #14
bacchus101 commented
When I add a print under that statement, it outputs the text from the CCK field in question upon updating the node.
Comment #15
febbraro commented
OK, that is good. That means it is grabbing the text and attempting to send it to Calais for processing.
Can you now print out the return from Calais itself?
opencalais/includes/Calais.inc around line 81 is
Thanks. I really need to make this a debug option for folks, this is not the first time :)
Comment #16
bacchus101 commented
Here it is:
Comment #17
bacchus101 commented
I am not sure what happened between point A and point B, but it is now pulling the terms for some CCK content types.
Here is one that is not:
...and another:
Comment #18
bacchus101 commented
Yes, it is working now with new content. I just posted a test page in a CCK field and it rendered the terms.
I am not sure what I did between when I first installed the patch and where I am now, but I have been doing a variety of things with my Drupal installation over the last few hours (adding and removing other modules), so I can't be sure what changed the environment.
Comment #19
febbraro commented
OK, the 1st one is working, which is great.
The 2nd one is not working; you also need to apply the patch on #344279: no tags created from comment #17
The 3rd one: I have a feeling that the text is too short to find any entities within it.
Comment #20
bacchus101 commented
That is what I thought with the 3rd one (although I was secretly hoping it could cull data from short descriptions of images and such.)
I applied the other patch and all is well. It created the terms for the 2nd one.
Thanks!
Comment #21
febbraro commented
Nice. Glad it is all worked out. Enjoy.
Comment #23
Johnny vd Laar commented
The problem with this line:
$loaded_node = node_build_content($node);
is that it changes the $node->body variable.
We have built a module that sends $node->body to another web service as well, but the node_build_content function wrapped $node->body in HTML tags.
So we replaced this line with:
$loaded_node = node_build_content(clone $node);
so that the function works on a clone of the $node object and thus doesn't change the original $node object.
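The effect described above can be reproduced without Drupal. The following is a minimal sketch, assuming PHP 5 or later (where objects are passed by handle). example_build_content() is a hypothetical stand-in for node_build_content(); it only mimics the body wrapping reported in this comment and is not the real function.

```php
<?php
// Hypothetical stand-in for node_build_content(): it rewrites the body on the
// object it receives and returns that same object. Not the real Drupal code.
function example_build_content($node) {
  $node->body = '<p>' . $node->body . '</p>';
  return $node;
}

$node = new stdClass();
$node->body = 'Plain text';

// Passing a clone: only the copy is rewritten, the caller's object is untouched.
$built = example_build_content(clone $node);
$body_after_clone = $node->body;

// Passing the object itself: the caller's body is rewritten as a side effect.
$built_in_place = example_build_content($node);
$body_after_direct = $node->body;

print $body_after_clone . "\n";
print $body_after_direct . "\n";
```

Note that clone makes a shallow copy, so any nested objects inside the node would still be shared with the original.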
Comment #24
Johnny vd Laar commented
Forgot to reopen the issue.
Comment #25
febbraro commented
@Johnny vd Laar: I created a new issue just for this. #412796: Calais changes the node body when agregating CCK fields
Comment #26
Anonymous (not verified) commented
Does it make sense to let Calais process all CCK field labels? In my case, I have a list of hundreds of CCK fields per node type. The labels are not interesting (e.g. WebDAV Support). Now all my nodes get a Calais term for WebDAV Support, which is nonsense.
So I'd need to be able to specify which CCK fields to process and when, e.g. only if WebDAV Support == Yes. Then it would make sense again for me.
Comment #27
Anonymous (not verified) commented
I decided to increase the relevancy threshold for terms. But it's not a real solution. CCK field labels shouldn't be indexed, in my opinion.
Comment #28
febbraro commented
We can't expect this module to know what you do or don't intend to get processed with OpenCalais. Labels might be a part of the content for all I know. If they are not useful, then why are they rendered in the node body itself? That said, I do provide a hook, hook_calais_body_alter, that will allow you to change the body however you want, for example by rendering it without the labels.