@ phase 2 people
(This should probably be in the OpenCalais issue queue but I'm consolidating this as an OpenPublish feature request.)
Jeff from Adaptive Themes sent me the Adaptive Theme subtheme for OpenPublish. I'm very excited to get my hands dirty with it. I think that there is going to be a huge potential for sophisticated ways of organizing content.
I've installed OpenPublish 1.6 and the Adaptive Theme and am set to get to work. My big goal is to utilize OpenCalais, Context and Gpanels. I'm hoping that the Gpanels will let me effectively use the Block Accordion module because accordions are cool.
I would like to have links on each page to similar content on my site, all other content by the author and content from around the web. When I first used the More Like This module it didn't really return the content that I wanted, so I had to remove it. I'm going to try again. However, what I'm really interested in is understanding how to use the Semantic Proxy and the Feed API. I would love to have 100 feeds coming into my site and have OpenCalais parse the destination content to create relevant links in articles on my site to other sites in the industry throughout the web. This would bring the real power of an aggregate website such as the Huffington Post to this little newspaper. How do I do that????????? <--- Good question.
The second thing that I really need OpenCalais to do is not be so judgmental with people and places. I need OpenCalais to do its thing, which is great, but it also needs to tag all people, businesses and places in each piece of content on the website no matter its relevancy ranking. The last time I checked, the relevancy rank was set globally for the whole node type, not for particular entities within the node type. Is this possible? That would be amazing because I could create a database of people, businesses and places. Perhaps I could create a module to do this?
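To make the per-entity idea concrete, here is a minimal sketch in Python. It is not the OpenCalais module's code: the `Entity` class, the threshold names and the entity type labels are all hypothetical, and it assumes each extracted entity carries a relevance score between 0 and 1. A global threshold applies by default, and selected entity types override it.

```python
from dataclasses import dataclass


@dataclass
class Entity:
    name: str
    type: str         # hypothetical labels, e.g. "Person", "Company", "City"
    relevance: float  # assumed to be a score between 0.0 and 1.0


# Hypothetical settings: one global threshold plus per-type overrides.
GLOBAL_THRESHOLD = 0.5
TYPE_THRESHOLDS = {"Person": 0.0, "Company": 0.0, "City": 0.0}


def select_tags(entities, global_threshold=GLOBAL_THRESHOLD,
                type_thresholds=TYPE_THRESHOLDS):
    """Keep an entity if it meets the threshold for its own type,
    falling back to the global threshold when the type has no override."""
    kept = []
    for entity in entities:
        threshold = type_thresholds.get(entity.type, global_threshold)
        if entity.relevance >= threshold:
            kept.append(entity)
    return kept


extracted = [
    Entity("Jane Doe", "Person", 0.10),
    Entity("budget", "IndustryTerm", 0.20),
    Entity("Acme", "Company", 0.05),
    Entity("election", "IndustryTerm", 0.80),
]
tagged_names = [e.name for e in select_tags(extracted)]
```

With these settings every person, business and place is kept regardless of score, while other entity types still have to clear the global threshold.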
BTW, the website is becoming a huge success. Thanks for the OpenPublish platform.
Comments
Comment #1
mmorris commented
Here are some pointers.
Check out this FAQ about getting the most out of your semantic content: http://www.opensourceopenminds.com/openpublish/faq/how-do-i-configure-ca.... (This is a new post, btw, so don't think you missed it :) We've posted a document that may help you in determining the ideal threshold settings for Calais and More Like This. It's very, very configurable and I'm confident you will find the right mix for your content. Your suggestion of being able to control thresholds not just at the content type level but also at the Calais entity level is a great one, and I'm sure it could be added to the OpenCalais module. If you're serious about doing this, please contact febbraro.
Getting your 100 feeds set up and semantically tagged shouldn't be too hard. If it helps, here is another FAQ you might be interested in: http://www.opensourceopenminds.com/openpublish/faq/how-do-you-populate-y.... The hardest part will be getting your content organized into groups. Not sure if you have a controlled vocabulary for doing that already, but you might have to make some manual decisions about where a piece of content "lives" based on how it gets semantically tagged.
Couple other things you should know. Check out the Calais Tag Modification settings (/admin/settings/calais/calais-tagmods) where you can blacklist terms you don't want and rename terms that you do want. You could also install the Taxonomy Manager module which gives you tons of features for merging and managing terms. With all that you've got a pretty powerful set of features at your hands.
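As a rough illustration of what blacklisting and renaming terms amounts to, here is a small Python sketch. It is not how the Calais Tag Modification settings are implemented; the function name and the sample terms are made up for the example.

```python
def apply_tag_mods(terms, blacklist, renames):
    """Drop blacklisted terms, rename the rest where a rename exists,
    and collapse duplicates created by renaming (case-insensitive)."""
    result = []
    seen = set()
    for term in terms:
        cleaned = term.strip()
        key = cleaned.lower()
        if key in blacklist:
            continue
        final = renames.get(key, cleaned)
        if final.lower() not in seen:
            seen.add(final.lower())
            result.append(final)
    return result


# Hypothetical settings; keys are lowercase so matching ignores case.
blacklist = {"advertisement"}
renames = {"nyc": "New York City"}

cleaned_terms = apply_tag_mods(
    ["NYC", "New York City", "Advertisement", "Boston"], blacklist, renames
)
```

Here "NYC" is renamed and merged with the existing "New York City" term, and "Advertisement" is dropped.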
Good luck, and congrats on your site's success!
Mike
Comment #2
alfthecat commented
Hi,
I'd really like to join this discussion because I'm really impressed with Openpublish and Open Calais. Please forgive me for shooting the next four questions into this discussion, I know the configurations can be complex and answering them may be too...
I guess my questions are a bit more of the same, I looked at the suggested documentation but couldn't find the right answer for me.
Semantic Proxy would be where my most pressing question lies. When I create a feed (in the open publish website) I have it process the items into Articles. This works fine. I configured the mapping by adding a CCK field as a link and I map the original URL to it. I also point Calais towards this CCK field as a source for its semantic processing. This seems to work fine, no errors and all the nodes get treated when I run bulk processing. However, I'm having a hard time getting the full content of feeds returning to the individual article nodes. If I visit the original url I get the full article from the source website of the feed. With some feeds it just feels to me like calais should have been able to grab the full text, but I have no way of being sure. Have I missed a vital step in my configurations or is there an external reason why semantic proxy couldn't provide the full text of the article?
The second question is about mapping images. Open Publish is set up with CCK fields to control a neat and uniform display of pictures. I really love this, but when trying to map the images I noticed that I get no source containing an image that I can map to the appropriate CCK field of Open Publish. I realize, of course, that it may depend on the distributor of a particular feed to supply a tag along with the original image. However, none of the feeds I tried, from various sources, gave me the right mapping sources to populate the image CCK fields. Images do get included in the body of certain articles though. So my question nr. 2 is: is there a smart way to have the images included in a feed appear in the appropriate CCK fields?
Now for question nr. 3.... With the articles that do arrive in full in my Open Publish website, I often get the ads as well. I tried tinkering with input filters and the Feed API settings but I ended up losing a lot of original content as well. Is there a way of blocking ads?
Finally, question nr. 4: Some of the feeds coming in include premium content, for which I guess Semantic Proxy typically fails to retrieve the full body text. Can I somehow prevent certain nodes from being automatically created based on certain keywords? Like 'You need to register a premium account to read the rest....'? And is it possible to automatically create a field displaying a text like 'Read the full article on example.com', with example.com being substituted by the original URL of the source content?
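To show what I mean in question 4, here is a minimal sketch in Python. It is not Feed API code: the item layout (a dict with `url` and `body`), the marker phrase list and the function names are all assumptions for illustration. It skips items whose body contains a paywall phrase and builds the 'Read the full article on ...' text from the host of the original URL.

```python
from urllib.parse import urlparse

# Hypothetical marker phrases, matched case-insensitively.
PAYWALL_MARKERS = ("you need to register a premium account",)


def is_premium_teaser(body, markers=PAYWALL_MARKERS):
    """Return True if the body contains any paywall marker phrase."""
    text = body.lower()
    return any(marker in text for marker in markers)


def read_more_text(url):
    """Build the link text from the host part of the original URL."""
    host = urlparse(url).netloc
    if host.startswith("www."):
        host = host[len("www."):]
    return f"Read the full article on {host}"


def filter_items(items):
    """Drop premium teasers and add a 'read_more' text to the rest."""
    kept = []
    for item in items:
        if is_premium_teaser(item["body"]):
            continue
        kept.append({**item, "read_more": read_more_text(item["url"])})
    return kept


items = [
    {"url": "http://www.example.com/news/1", "body": "Full story text."},
    {"url": "http://www.example.com/news/2",
     "body": "You need to register a premium account to read the rest...."},
]
imported = filter_items(items)
```

Only the first item would be turned into a node here, with the extra text "Read the full article on example.com".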
Thanks in advance for considering these questions!
Comment #3
alfthecat commented
I'd like to correct my previous post on the subject of question number 1... I noticed now that no additional content is retrieved at all....
Some of my feeds distribute the full content themselves, but Semantic Proxy never seems to deliver content by itself.....
Comment #4
alfthecat commented
Found a solution for the More Like This terms not being populated by Calais on nodes imported through Feed API.
See this post: http://drupal.org/node/384170 (You'll need to modify the MLT module's code a little, but for me it worked great. Even all the existing nodes were immediately picked up by MLT!)
Comment #5
emcconnell commented
Is the above fix still the best way to correct the More Like This checkbox being unchecked when importing a feed into Article nodes?
Thanks!
Ed