Hi,
I'm not uploading it to CVS because I don't want to struggle with removing it later (for example, if the name changes or some files, like the parser file, get removed).
The code is still messy and needs more work, but the whole thing seems to work (i.e. it can aggregate news items ;)
We have 3 modules for now (an opml module will be the 4th):
1. leech module - handles the url and leeches it at cron run.
2. leech_news module - if the leeched data is xml, it parses it and creates news items.
3. node_template module - used for creating "templates" of nodes. A template node is then copied by the news module to create a news item (so it's not like agg2, where the feed node is copied and becomes the item node). Now you can have "story" (or whatever you created with CCK) type news items :)
The XML parsing part will change for sure. It will no longer be a 3-stage process (apply changes to the text data, parse the xml into a php object, use the object to create nodes). I plan to make the parser file contain the code for creating nodes, with the main part of the leech_news module just saving them... or maybe just calling the parser part to save nodes when it's ready...
Another idea is to have the parser call some api function (like it's done with nodeapi and leechapi) for each tag opening and closing, so basically the way expat parsing works. That way each module can modify the generated object while the xml is parsed, and we won't need a 3rd pass (and other modules won't depend on the current way xml is parsed into php). That will also allow autotaxonomy to work (currently there's no simple way for it to work with the leech module), but it may slow down parsing (I can't be sure until I try it ;)
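To make the expat-style idea concrete, here is a minimal sketch in Python (not Drupal code). It uses the standard `xml.parsers.expat` module; `parse_with_hooks` and `tag_hook` are hypothetical names standing in for the parser and for a module hook such as autotaxonomy, and the feed is made up. Every hook gets called on each tag open and close, so it can modify the item while it is being built and no separate pass is needed.

```python
import xml.parsers.expat


def parse_with_hooks(xml_text, hooks):
    """Parse xml_text and call every hook on each tag open and close."""
    items = []
    state = {"item": None, "field": None}

    def start(name, attrs):
        if name == "item":
            state["item"] = {}
        elif state["item"] is not None:
            state["field"] = name
            state["item"].setdefault(name, "")
        for hook in hooks:
            hook("open", name, state["item"])

    def end(name):
        # Hooks run before the item is stored, so they can still change it.
        for hook in hooks:
            hook("close", name, state["item"])
        if name == "item":
            items.append(state["item"])
            state["item"] = None
        state["field"] = None

    def chars(data):
        # expat may deliver text in several chunks, so append.
        if state["item"] is not None and state["field"]:
            state["item"][state["field"]] += data

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = start
    parser.EndElementHandler = end
    parser.CharacterDataHandler = chars
    parser.Parse(xml_text, True)
    return items


def tag_hook(op, name, item):
    """Hypothetical autotaxonomy-like hook: derive terms from the title."""
    if op == "close" and name == "item" and item is not None:
        words = item.get("title", "").lower().split()
        item["terms"] = [w for w in words if len(w) > 4]


FEED = (
    "<rss><channel><title>Feed</title>"
    "<item><title>Hello drupal world</title>"
    "<link>http://example.com/1</link></item>"
    "</channel></rss>"
)
items = parse_with_hooks(FEED, [tag_hook])
```

The cost mentioned above is visible here: every hook is called twice per tag, so parsing time grows with the number of hooks times the number of tags.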
So... this is just for brave people wanting to try and test it. Do not test it on "production" sites!
I'm not responsible for any data loss (it shouldn't happen, but be warned :).
To test it:
1. enable all 3 modules.
2. create a node (it can be unpublished and unpromoted) and save it.
3. view it, and go to the "node template" tab to create a template from it.
4. create a node which will leech news (you'll see some ajax magic if you allow for it - highly recommended :).
5. click the "leech data now" link on the new node, or just wait for cron to run.
Thanks,
Regards
ahwayakchih
| Comment | File | Size | Author |
|---|---|---|---|
| #53 | leech_3.zip | 31.56 KB | ahwayakchih |
| #15 | leech_2.zip | 30.66 KB | ahwayakchih |
| #11 | leech_1.zip | 30.6 KB | ahwayakchih |
| #10 | leech_0.zip | 31.23 KB | ahwayakchih |
| #1 | leech.zip | 30.17 KB | ahwayakchih |
Comments
Comment #1
ahwayakchih commented:
Small bugfix update :).
Comment #2
rooey commented:
What version of Drupal is this for?
Comment #3
ahwayakchih commented:
4.7.3
Comment #4
colorado commented:
I've got this new version of Agg2 installed, and all 3 modules enabled on a fresh 4.7.3 install, and I enabled the Leech feature for the Story content type. No other modules are installed, except for the default drupal modules.
I created a new node and added http://obama.senate.gov/podcast/index.xml in the Source Url field. I don't understand how the Node Template feature works, so I left those PHP code fields in the Template tab empty/blank. So far nothing seems to aggregate when I click on "Leech Data Now" (no new nodes/no new feed items are created). Am I missing something?
Also, can you please provide a copy-and-paste example of what is supposed to be in those PHP code fields in the Template tab?
THANK YOU and I will be happy to work on testing this module!
Comment #5
ahwayakchih commented:
Hi,
You need to first create a "template" for news items. On my test site I created a story node called "Story template" (how original ;) and made it into a template: click the "template" tab when viewing the node you want to be "cloned" for news items, set up some title (I made it the same as the node title, but it can be different) and save it. The PHP fields are not required.
Then create another node which will leech news items. It's like an aggregator2_feed node, only it can be of any type you set up leech to support (you've set up story, so create a story :). Enter the url, and after a few seconds you should see new form fields show up below. Set up the options (they are almost the same as they were for agg2 feeds), select the template to be used (the one you created earlier) and save.
Now it should work :).
Also be sure to check permissions if you're not trying all of that on an admin account (I tested everything as admin, so there may be some bugs there).
As for PHP fields:
"Loading PHP code" is run once, when the template is loaded (templates are cached for the duration of the whole "open this page" process). You can use it to modify the "original" node's data. Some modules (including leech :) don't like it when more than one node uses the same values (leech doesn't allow duplicate values in the url field), so this PHP code is mainly there to make them work OK while we wait for some additional module/code.
"Saving PHP code" is run for each new node created from the template, so once for each news item (when it's first created). It is run at save time, so you can change some data depending on what data is set.
But the simplest example would be to change the body (and/or teaser). For example, if the content of the news is "Hello world" and the author name is "drupal", you could try to change it:
to get the following body of the node
drupal writes: "Hello world".
Of course that can (and should) be done in the theme, not here, but it's just an example. One could call drupal functions, module functions, etc. in this code. So, for example, it could be used to automatically download images linked in the html content of the body and create image nodes for them (I don't know what for, it's just an example; and again, it would be better to have a special module for that, but if there is none and one wants to get/test the functionality fast, this can be a good start before creating a whole module :).
You could also change the node depending on who creates it.
You can also make more templates and use CCK to create more types of nodes. Then, in the content type settings, use different defaults for different types, and allow each kind of user to create only one type of leech node. In the end, different users will have different default settings (for example, less trusted users will create leech nodes which are frozen and moderated by default, while more trusted users will be able to create leech nodes which are "active" from the start, etc.).
Regards
ahwayakchih
Comment #6
colorado commented:
OK, here's one little tiny bug I found with the ajax form that appears in the page editor when a Source URL is entered:
Once I enter a Source URL, it cannot be removed. If I remove the Source URL and Submit, it re-appears. I have to delete the node entirely (Just changing the URL does work, though). Not really a problem, but could be confusing.
Comment #7
colorado commented:
In the old Aggregator2 there was a Description field that automatically pulled description data from the feed - this seems to be missing in this new version. In addition, this new version forces the user to manually enter text into the Body field (the last Agg2 version had no Body field).
This version IS AGGREGATING ITEMS - WOOHOO!!
Comment #8
colorado commented:
At the end of the leech process, arriving at "The page cannot be displayed" can be confusing to users. It would be better to give the user some sort of confirmation page instead.
Comment #9
jt6919 commented:
I'm trying very, very hard to use this new version of aggregator2, and I'm finding the instructions very, very confusing and can't get anything to work at all. I disabled my old agg2 module, then deleted the files from the server. I uploaded the new files and a fresh copy of the orig agg2 code and enabled the module again. It said the tables were created (for leech, etc) fine. It listed the 2 feeds I had in the database from before fine. When I went to a post and clicked "view source", an agg2 page came up, but watchdog had a ton of errors:
so I proceeded to follow your instructions. I created a new "story". Then I made it a "template". I can access the template under admin->node templates. Now I go to create a new node like your instructions say. Do I create a story by doing create->story? If so, there is no field to 'enter url' as you instruct - it's just the normal create story page. If I go to node templates - my only option is to view my new template. Where is this "url" field where the magic ajax takes place? I've tried to add 2 feeds without using the leech thing (that I can't figure out) and they won't refresh at all under admin->aggregator2.
could you be more specific about what to do and where to do it? I've been using Drupal a long time on many, many sites - and I don't know what you mean.
Comment #10
ahwayakchih commented:
Thanks for the reports :)
I'm attaching updated version.
Yes, You're right. Do You think module should delete leech from database? What about feed data? Should it be deleted too (and if it should, what about items)?
Fixed.
Fixed.
Sorry for confusing you. Leech is a bit harder to set up than agg2, and my English is poor :(
First of all, leech and agg2 do not share any data for now (so it should be safe to use both at the same time; of course it can make cron run out of time ;).
You've created the template OK. What you need to do is go to admin/settings/content-types and configure a node type (for example: the story type) to enable the leech module for that type.
After that you should be able to see the url field on the story edit page.
Comment #11
ahwayakchih commented:
I forgot to remove the leech/leech_access directory from the zip file. Here's an updated version.
For those curious - leech_access is just a cut&paste of old code. There will be a leech_access module in the future, and it will allow setting up a username and password for http(s) logins. There are also some basic plans sketched to make it possible to log in through regular "login" html forms... but that's just a plan. First I need to clean up the leech "core" and leech_news :).
Comment #12
Rob T commented:
Thanks so much for your work on leech and the prior aggregator 2.
The one thing I don't like about aggregator2 is that I can't ever get a validatable rss feed from my drupal. Is there a chance that this can be addressed?
Comment #13
colorado commented:
I replaced leech.zip files with leech_1.zip files, but now the ajax form does not appear at all when I enter the Source URL.
Comment #14
ahwayakchih commented:
I used http://feedvalidator.org/ and it just warns that source and dc:source are the same and shouldn't be used together. It's wrong, because RSS (http://blogs.law.harvard.edu/tech/rss#ltsourcegtSubelementOfLtitemgt) says that the "source" tag contains a link to the xmlified version (which is closer to a link to the rss feed than to a link to the html article AFAIK), while DublinCore (http://dublincore.org/documents/library-application-profile/index.shtml#...) talks about a URI (which means a direct link to the article AFAIK).
Anyway... I'll make both rss:source and dc:source optional, and the admin will be able to turn off one or both of them. So I will not hear about this "not validating" ever again ;)
Strange. What browser do you use? I made the module pass a lot more data in javascript now (now it warns about errors, and passes the whole node data too), so maybe it's too much for the browser.
Comment #15
ahwayakchih commented:
Updated version:
- shows themed messages
- output of source and dc:source tags is optional and can be changed on admin/settings/leech_news page
Thanks
Regards
ahwayakchih
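As an illustration of the two optional tags discussed above, here is a small Python sketch (not the module's PHP). `build_item` and its two flags are hypothetical stand-ins for the settings on admin/settings/leech_news. It assumes the reading given in comment #14: the RSS `source` element carries the feed's XML url, and `dc:source` carries the direct article link.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def build_item(title, link, feed_title, feed_url,
               emit_source=True, emit_dc_source=True):
    """Build one RSS <item>; each source tag can be switched off."""
    item = ET.Element("item")
    ET.SubElement(item, "title").text = title
    ET.SubElement(item, "link").text = link
    if emit_source:
        # RSS 2.0 <source>: url attribute points at the originating feed.
        source = ET.SubElement(item, "source", {"url": feed_url})
        source.text = feed_title
    if emit_dc_source:
        # Dublin Core source: here, the direct link to the article.
        ET.SubElement(item, "{%s}source" % DC_NS).text = link
    return ET.tostring(item, encoding="unicode")
```

Turning off one of the two flags should be enough to silence the validator warning about both tags being present.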
Comment #16
jt6919 commented:
I can't thank you enough....one thing I want to clarify that doesn't seem to be addressed at all is that this is NOT a new version of aggregator2 at all, is it? It's a NEW module called "leech", right? If so, in this thread you should instruct people to remove aggregator2, look for "Leech" in the admin section instead, and use this fully instead of that. Am I right?
Also - does this version accept feed URLs longer than 255 chars (which is essential)?? I am going to test this new version now, going over what you posted yesterday, and then report back....
Comment #17
jt6919 commented:
I added a new 'story' (feed) and used http://musicthing.blogspot.com/atom.xml for the feed, did 'leech data now' and got nothing. This feed works fine for me in bloglines. I had to edit the 'story' to change the category, and when I saved and came back it reset the last checked info to zero. I tried to 'leech data now' again - but got nothing. At least 1.) nothing is listed after the 'leech data now' and 2.) no items are listed on the 'view items' page.
But then when I went to admin->content, there were all kinds of new stories from when I leeched data for that feed?!? Then I go to the leech page for 'view items' (/leech_news/sources/482) and even though admin-> content has all kinds of things listed - this page has nothing? Why is this? Where do I go to see all the stories created by that template feed??
thanks for any help - I'll keep testing...
Comment #18
jt6919 commented:
One thing I notice is that every time you edit one of the leech feeds, it resets the last checked info to zero. I wish the 'leech overview' showed how many stories a feed has leeched, and a link to view them all (for that feed).
I just leeched the ask Yahoo feed, and after I did 'leech data now' on the confirmation page that showed how long until last update - it didn't say it leeched anything at all (or not). When I click 'view items' this time I actually see them, unlike the last feed I did.
Comment #19
jt6919 commented:
Also, the length of the URL in the URL: field when you add a new leech 'story' - the form field maxlength property is hardcoded in the html to 127 characters? Why is this? It was varchar 255 in aggregator 2 - and that's not even long enough. For eBay and amazon RSS feeds you need at least 300 characters +.
How can we make the URL field larger to be able to submit all RSS feeds, especially ones with long URLs?
Comment #20
colorado commented:
I'm using IE6, and also have now updated to leech_2.zip. The error information popup says this:
Line: 154
Char: 5
Error: Unknown runtime error
Code: 0
In Firefox it works fine :-)
Comment #21
jt6919 commented:
Since you can't have really long URLs (yet) with leech, I just took the eBay URL I wanted to parse and set it up over at Feedburner to make it shorter. However, look how it chops up the page when formatting the leeched data:
http://smorgasbord.net/leech_news/sources/513
Comment #22
jt6919 commented:
Also, leech is not honoring options selected....like in my last post, even though I selected and saved not to promote to front page, on the next hourly update - it promoted everything it leeched to the home page (even though every single node had promote to home page unchecked).
Comment #23
hickory commented:
Here's an attempt at some instructions:
1) Create a new content type: 'feed item'.
2) Create a new 'feed item' node, then in its Template tab add a title - 'feed item template' - and press save.
3) Create a new content type: 'feed'.
4) Edit the content type 'feed': enable Leech, set it to use the 'feed item template' as a template and set the other options as required.
5) Create a new 'feed' node: give it a title and the URL for the feed to be fetched.
6) Use the 'leech data now' link under your new 'feed' node and it should fetch the feed and turn each entry into a 'feed item'.
It's still not working perfectly for me (only fetching one item; creating duplicate nodes each time) but at least it's doing something :-)
Comment #24
jt6919 commented:
The instructions will help new people visiting this page for sure....this module is so close to being there - it just has a few hiccups for me:
1.) When you tell it not to promote to homepage - it does it anyway...at least it (usually) honors the number of items to promote.
2.) For some feeds it parses them wrong, and makes your whole homepage nest like an ordered, indented list
3.) When you "leech data now" and the (feed) page updates - it doesn't tell you that it leeched anything at all, you have to check admin->content to see what it got.
4.) Unfortunately there is no good way to list your feed items categorically....meaning that you will create individual nodes for each feed item, and you can list them in a term, and by viewing that term you can see all the posts that feed leeched....but you also see all other items for that term. Which limits you to categorizing feed items by themselves sometimes. What is needed is if I create a node "Ask Yahoo" using my template and leech items.....when I go to "Ask Yahoo" - it should list all feed items that were created/exist with that node/template....or at least give the option for me to turn this on/off. Otherwise, it's very, very difficult to navigate and/or approve your feed items
Last - it's been almost 5 days since we have heard anything from AHWAYAKCHIH in any way at all for any issues or bugs in this new version (leech) or any version that I could find for aggregator2. Are you still out there AHWAYAKCHIH? I pray you are, because this module is so darn close to being what everyone needs....please respond and thank you for all your hard work to date!!! You and all your work on this are very, very appreciated!!!!
Comment #25
jt6919 commented:
One last item I forgot - when an item is leeched, and then you "edit" it yourself (adding or changing anything in the node at all), and then it 'updates' on the next run - all of your changes will be lost in favor of the new update. There should be some way to override this, other than just making (that node) never update again (frozen) - because then you would never get any additional updated content for that node at all. But otherwise, you are relegated to just posting anything additional you want to say in a 'comment' to that node.
Comment #26
jt6919 commented:
Also, when leech updates and it promotes to the home page - it does it in a very strange way. If I have one node/template and tell it to promote 3 things to the home page, and then an hour later it updates again - since it already promoted 3 things to the home page (from that node/template) it will NOT promote any more (from that node/template) until either those 3 items expire or you demote them manually. This is not effective at all - since I end up demoting them all myself manually every time.
Comment #27
sethcohn commented:
I like the idea of this... but it's missing something critical that would make it great: let the template parse the rest of the item from the feed.
One thing missing and sorely needed: the ability to pull other info out of the feed item.
You aren't storing the item's other info, for instance, enclosure location, image url, or other info.
Since leech_news_item only gets certain fields, there is no good way to get the missing ones (yet).
Ideally, YOU DON'T have to parse it all, just make it available to the template, so it can be parsed then.
If you store the item array inside leech_news_item, as a new field (ie the full item as array), the template can then dig it out (since we'd have php to parse it), and work magic for displaying it (so it'll do images as images, use an mp3 player for enclosures, build links, use theming tags, whatever). Right now, it's lost, since the node only stores the description/title/etc and the leech_news_item only stores things like author and guid. Save the entire item, and this is suddenly the missing link lots of people are waiting for: complete aggregating and display of any feed (including enclosures), for any purpose.
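A rough Python sketch of that suggestion (illustration only, not the module's code): the whole parsed item is serialized into one extra field next to the columns that are already stored, and display code digs the enclosure back out later. The `raw_item` field and both function names are assumptions made up for this example.

```python
import json


def store_item(parsed_item):
    """Return the row to save: the usual columns plus the full item."""
    return {
        "guid": parsed_item.get("guid"),
        "author": parsed_item.get("author"),
        # Hypothetical extra field holding everything the parser found.
        "raw_item": json.dumps(parsed_item),
    }


def render_enclosure(row):
    """Template-side helper: build a link from the stored enclosure."""
    item = json.loads(row["raw_item"])
    enclosure = item.get("enclosure")
    if not enclosure:
        return ""
    return '<a href="%s">Download (%s)</a>' % (
        enclosure["url"], enclosure.get("type", "unknown"))
```

The parser never has to know about enclosures or images this way; it only has to keep what it read.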
Comment #28
sethcohn commented:
I can't seem to get the templating working. Pointers? Bug? Can someone post a few working examples?
Comment #29
sethcohn commented:
I've added some error checking code to check on this, and I'm convinced the templating isn't modifying the node correctly... $node->body comes back unchanged, even when I completely replace it with something new in the save templating.
Anyone else having the same problem: the templating is just ignored?
Comment #30
Christoph C. Cemper commented:
Hi,
this sounds like a completely new code base - i.e. a rewrite
before changing from agg2 to leech I'd rather want to know about
- new features + advantages
- a migration path
can you help me with that? I feel I missed some previous information when I got here
thanks,christoph
Comment #31
hadishon commented:
I'm running drupal 4.7.3 and I've installed Leech.
I can't seem to find how to add a feed url.
I followed the instructions above. I created a story node, clicked on template and then 'create.' Then when I clicked on 'create content' story again... I didn't see anywhere to put the feed url.
I looked in all the menus that leech created and can only find blacklist urls.
I read about ajax fields to put feed urls in but I don't see them... where are they?
Do I need CCK for this to work? Do I need to uninstall Agg2 first? Any help would be appreciated.
Comment #32
publishing commented:
?? --- Does anyone have the new leech version of Agg2 up and running on their site?
Mr. Ahwayakchih, will there be a stable version of the "Leech" module available soon?
Thank u for your effort on this project, it is such a vital tool for many users.
Comment #33
m3avrck commented:
This sounds great!
However, will there be a stable DRUPAL-4-7 branch made for Aggregator2 before these changes go in? Development should be kept to HEAD.
If we can create the branch, then you commit these changes to HEAD and work on it there and not have to upload files like this.
If you need help doing this please let me know, thanks!
Comment #34
jt6919 commented:
I don't think anyone will work on this again....it's been weeks since ahwayakchih posted anything related to either leech or aggregator 2 - it doesn't look like he's coming back. The other 2 developers who posted code to aggregator 2 haven't done anything for months. It would appear that both leech and aggregator 2 have been abandoned.
Comment #35
asimmonds commented:
Restoring title
Comment #36
jt6919 commented:
I got an email from the assigned developer of aggregator2 and author of leech_news ("ahwayakchih"), and he is coming back Sept 18!! Just wait a little over a week for development to begin again and the bugs to get worked out in a brand new version of leech_news...
Comment #37
colorado commented:
Woo-HOO!!
Comment #38
dfletcher commented:
Thank you ahwayakchih!
Comment #39
hickory commented:
Nothing seems to be getting stored in leech_news_item, using the leech_1.zip files.
Comment #41
funana commented:
Hey ahwayakchih,
I hope you will continue your wonderful work here! We all wait for you :))
Comment #42
jurriaanroelofs commented:
Is this module stable yet?
Comment #43
jt6919 commented:
Absolutely not...also - I think I spoke too soon since the maintainer never came back and hasn't worked on it now for over 2 months.
Comment #44
micheleannj commented:
Can someone clarify why a new version of Agg2 is/was needed? I'm using Aggregator2 on my development site and it seems to work fine, though I'm going to make some modifications to suit our needs... is there a reason I should abandon it and try this other thing instead?
Comment #45
jt6919 commented:
If it works fine for you - great. It hasn't worked out of the box bug free for anyone I've talked to so far. It won't accept feed URLs over 256 chars, it won't read some types of feeds, and it creates tons of watchdog errors - amongst other things.
Comment #46
jboeger commented:
Oh how I would love a 4.7 version of Aggregator 2. This is quite frustrating!!
Comment #47
nathanraft commented:
Well it looks like we need a new approach.
How about we put together some $$ and get a developer who knows how to build an aggregator working on this?
I see that some people are happy with Agg2 but given all the bugs (lack of ability to update a feed properly, old parsing settings, etc...) I just think that we should start from scratch here and try and get something built that will continue to be maintained and move towards 5.0 +.
I will pledge $300.00 to start this off. If it seems to be going further I will put more down.
Any others?
Any developers out there that feel that they have what it takes to do this properly? Let me know!
Comment #48
jboeger commented:
Maybe we could turn these guys onto Drupal...
http://www.geckotribe.com/rss/
I've been using Carp and it's great!
Comment #49
jt6919 commented:
I wouldn't worry about trying to get a new version of agg2 anymore.
read here:
http://groups.drupal.org/node/1489
and here:
http://groups.drupal.org/node/1485
looks like we're very close to having the regular aggregator in Drupal do what is needed (and more) in v. 5.0
Comment #50
nathanraft commented:
OK.. Well that's good news! Then we will have to wait until this gets rolling a bit more....
In the meantime, is there anyone out there who has a well-working version of 4.7 aggregator 2 that they can post here? Ideally one with feed updates working pretty well and categories for feed items showing up?
Comment #51
jt6919 commented:
Sure, it's the one in #11 of this post...
Comment #52
ahwayakchih commented:
Hi,
Here is a version with a few things fixed, and the first part of OPML import.
I see there is work on a new module already, so I guess this work is obsolete now... maybe someone will want to try it anyway :)
Regards
ahwayakchih
Comment #53
ahwayakchih commented:
... and of course I forgot to attach a file ;]
Comment #54
two2the8 commented:
I wouldn't say that this work is obsolete -- feedparser is pretty experimental, and I'd bet that it won't work until 5.0 is released. In the meantime (and afterward, too), all of us on 4.7 sites need something to use!
Thanks for all your work on the leech module!
Comment #55
colorado commented:
YES!! I also agree it is NOT obsolete -- this leech module was getting soooo close, it would provide so much usefulness to so many if it were finished. Ahwayakchih pleeeeeez finish it if you can...
Comment #56
alex_b commented:
leech isn't obsolete, it's cool! let's not give up!
Comment #57
alex_b commented:
found a bug in duplicate handling, line 1103 of leech_news.module in leech_3.zip, in leech_news_parse_feed()
should not be
$result = db_result(db_query("SELECT COUNT(nid) FROM {leech_news_item} WHERE link = '%s'", $temp->link));
but
$result = db_result(db_query("SELECT COUNT(nid) FROM {leech_news_item} WHERE link = '%s'", $temp));
in order to find duplicates.
i won't submit a patch here, i am waiting for leech being submitted as proper module.
cheers,
alex
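For readers who want to see the duplicate check in isolation, here is a self-contained Python sketch using an in-memory SQLite table (not the module's PHP, and not a patch). The table name and the `nid` and `link` columns come from the query above; the `title` column, the function names and the sample feed are assumptions for the example. Whatever value is bound to the placeholder has to be the item's link string itself, otherwise the count is always 0 and every run creates duplicates.

```python
import sqlite3


def is_duplicate(conn, link):
    """True if an item with this exact link is already stored."""
    row = conn.execute(
        "SELECT COUNT(nid) FROM leech_news_item WHERE link = ?",
        (link,),
    ).fetchone()
    return row[0] > 0


def save_new_items(conn, items):
    """Insert only the items whose link is not stored yet."""
    saved = 0
    for item in items:
        if is_duplicate(conn, item["link"]):
            continue
        conn.execute(
            "INSERT INTO leech_news_item (link, title) VALUES (?, ?)",
            (item["link"], item["title"]),
        )
        saved += 1
    return saved


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE leech_news_item "
    "(nid INTEGER PRIMARY KEY, link TEXT, title TEXT)"
)
feed = [
    {"link": "http://example.com/1", "title": "First"},
    {"link": "http://example.com/2", "title": "Second"},
]
first_run = save_new_items(conn, feed)
second_run = save_new_items(conn, feed)
```

With a working check, the first run saves both items and the second run saves none.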
Comment #58
alex_b commented:
sorry, line nr in #57 probably wrong -alex
Comment #59
alex_b commented:
hi,
there are a couple of feeds that don't get leeched. i can create them, but leech wouldn't pluck their items.
http://news.google.com/news?hl=en&ned=us&q=drupal&ie=UTF-8&
http://news.google.com/news?hl=en&output=rss
http://rss.csmonitor.com/feeds/top (feedburner feed)
this is one that works:
http://www.saveroe.com/blog/feed
i could narrow the problem down to leech_news_parse_feed(&$data) in leech_news.module, where the program branches into the section for RSS feeds.
the problem here is that &$data['rss'][0]['channel'][0]['item'] does not contain any data. if you print out $data (print_r to screen) you can see why: in the case of the first three feed urls i posted, the array looks misarranged. somehow sibling data of [title] fields become child data.
this bug most probably originates in xml_preg_parser_parse_xml_data() in xml_preg_parser.module. i couldn't nail it down though - the parsing recursion is quite complex for my little brain. would be cool if the creator of that mighty function could have a quick look at it. i'd appreciate it a lot.
thanks for the awesome work so far,
alex
the modules i work with are from leech_3.zip
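A small Python sketch of how the misplaced items could be located in such a nested structure (a diagnostic aid only, not a fix for the parser). The nesting mirrors the `$data['rss'][0]['channel'][0]['item']` path from the comment; the exact shape of the parser output below that path, and both function names, are assumptions.

```python
def get_rss_items(data):
    """Return the item list at the expected path, or [] if it is absent."""
    try:
        items = data["rss"][0]["channel"][0]["item"]
    except (KeyError, IndexError, TypeError):
        return []
    return items if isinstance(items, list) else []


def find_paths(node, key, path=()):
    """List every path in the nested structure that ends in `key`."""
    found = []
    if isinstance(node, dict):
        for k, value in node.items():
            if k == key:
                found.append(path + (k,))
            found.extend(find_paths(value, key, path + (k,)))
    elif isinstance(node, list):
        for index, value in enumerate(node):
            found.extend(find_paths(value, key, path + (index,)))
    return found


# Well-formed: items are siblings of the channel title.
good = {"rss": [{"channel": [{"title": ["Feed"],
                              "item": [{"title": ["One"]}]}]}]}
# Misarranged, as described above: items ended up under the title.
bad = {"rss": [{"channel": [{"title": [{"item": [{"title": ["One"]}]}]}]}]}
```

Running `find_paths(data, "item")` on a feed that yields no items shows where the parser actually put them.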
Comment #60
alex_b commented:
hi,
leech has its own project page now:
http://drupal.org/project/leech
this is the latest functional version of leech (*not* leech_3.zip, due to its parsing issues).
i would like to encourage you to post your bug reports and comments over at the new page.
leech is also in the CVS repository now (finally!):
http://cvs.drupal.org/viewcvs/drupal/contributions/modules/leech/
soon there will be a "Download latest release" link on the project page, in the meantime pull the latest copy from the repository (http://drupal.org/node/321)
cheers,
alex