Can you point me in the right direction for formatting the regex filter? I am trying to test this module by aggregating two feeds and pumping them into my identi.ca account: a simple news feed from my Tiny Tiny RSS installation (the "published feeds" feed, if you are familiar with it) and my last.fm "loved tracks" feed.

For the news feed, I just want to post the title of the news article and a link to the article.

For the last.fm feed, I want to grab the feed item and output it with a bit of extra text, such as "I just loved at last.fm".

I just don't understand the regex filter. Any help is appreciated.

Also, THANK YOU FOR THIS MODULE! It is exactly what I have been waiting for; if I knew how to code, I would have written it myself! I have been using rssdent to pump feeds to identi.ca for some time, but it's just too buggy, and I do not like using services like twitterfeed for security reasons.

Thanks again

Comments

cleaver

Thanks for using this module... here's a much-needed sample of a regex filter. I've been focusing on getting the module code ready for release, so at this point the documentation is nil.

It matches the feed from Environment Canada for weather.

An item is formatted as a pseudo-RSS feed item, as follows:

<item>
  <title>Current Conditions: Mostly Cloudy, 17.8°C</title>
  <link>http://www.weatheroffice.gc.ca/city/pages/on-143_metric_e.html</link>
  <description><b>Observed at:</b> Toronto Pearson Int'l Airport 4:00 PM EDT Sunday 19 September 2010 <br>
<b>Condition:</b> Mostly Cloudy <br>
<b>Temperature:</b> 17.8°C <br>
<b>Pressure / Tendency:</b> 102.1 kPa falling<br>

<b>Visibility:</b> 24.1 km<br>
<b>Humidity:</b> 52 %<br>
<b>Dewpoint:</b> 7.9°C <br>
<b>Wind:</b> NNE 13 km/h<br>
<b>Air Quality Health Index:</b> 2 </description>
</item>

My Regex match to pull out the parts is like this:

/<item><title>Current (Conditions.+?)<\/title>.*?Humidity:<\/b>.*?(\d+).*?Wind:<\/b> (.+?)<.*/

The Regex replace to format the display is like so:

$1 Hum:$2% Wind:$3

The resulting output of the replacement is:

Conditions: Mostly Cloudy, 17.8°C Hum:52% Wind:NNE 13 km/h
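If you want to try a match/replace pair outside Drupal, here is a minimal Python sketch that roughly mimics preg_replace for patterns like the one above. It is a stand-in, not the module's code, and it rests on assumptions: the item is flattened onto one line with no whitespace between tags (the `<item><title>` in the pattern implies that), only same-character delimiters such as /.../ are handled, only the i and s modifiers are honoured, and $1-style references are converted to Python's \g<1>. The function name preg_replace_like is made up.

```python
import re


def preg_replace_like(delimited: str, replacement: str, subject: str) -> str:
    """Rough stand-in for PHP's preg_replace (a sketch, not the module's code).

    Assumes a same-character delimiter such as /.../ or #...#; bracket pairs
    like {...} are not handled. Only the i and s modifiers are honoured.
    """
    delim = delimited[0]
    end = delimited.rindex(delim)  # the last delimiter closes the pattern
    body, modifiers = delimited[1:end], delimited[end + 1:]
    flags = 0
    if "i" in modifiers:
        flags |= re.IGNORECASE
    if "s" in modifiers:
        flags |= re.DOTALL  # let . match newlines, like PCRE's s modifier
    # PHP writes backreferences as $1, $2...; Python's re wants \g<1>, \g<2>...
    py_replacement = re.sub(r"\$(\d+)", r"\\g<\1>", replacement)
    # "\/" in the body is just an escaped slash, which Python's re also accepts.
    return re.sub(body, py_replacement, subject, flags=flags)


# The Environment Canada sample item, flattened onto one line (assumed format).
item = (
    "<item><title>Current Conditions: Mostly Cloudy, 17.8°C</title>"
    "<link>http://www.weatheroffice.gc.ca/city/pages/on-143_metric_e.html</link>"
    "<description><b>Observed at:</b> Toronto Pearson Int'l Airport 4:00 PM EDT "
    "Sunday 19 September 2010 <br><b>Condition:</b> Mostly Cloudy <br>"
    "<b>Temperature:</b> 17.8°C <br><b>Pressure / Tendency:</b> 102.1 kPa falling<br>"
    "<b>Visibility:</b> 24.1 km<br><b>Humidity:</b> 52 %<br>"
    "<b>Dewpoint:</b> 7.9°C <br><b>Wind:</b> NNE 13 km/h<br>"
    "<b>Air Quality Health Index:</b> 2 </description></item>"
)

match_re = r"/<item><title>Current (Conditions.+?)<\/title>.*?Humidity:<\/b>.*?(\d+).*?Wind:<\/b> (.+?)<.*/"
result = preg_replace_like(match_re, "$1 Hum:$2% Wind:$3", item)
print(result)
```

If the items you get still contain line breaks, "." will not cross them by default; adding the s modifier (/.../s), which PHP's PCRE also supports, changes that.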

Things to watch out for:

- I'm using the PHP PCRE functions, so it will behave with all the quirks of preg_replace.

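- This means the Regex match statement has to match the entire RSS item, because preg_replace only replaces the text that matches; anything outside the match stays in the output. That's why I have the ".*" at the end, to match everything after Wind: up to and including </item>.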

- You must put a delimiter at the beginning and end of the Regex match: a slash, or another begin/end pair. I let you choose, in case you are matching a lot of slashes.
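The reason for the trailing ".*" is easy to see in a quick Python sketch, since Python's re.sub, like preg_replace, replaces only the text that matches. Python has no pattern delimiters, so the patterns below are written without the slashes (in PHP, picking # as the delimiter would likewise let you write </title> without escaping it), and Python spells $1 as \g<1>. The sample item is made up.

```python
import re

# Made-up one-line item, standing in for whatever the module feeds the filter.
item = "<item><title>Hello world</title><link>http://example.com/a</link></item>"

# Without a trailing .* only the matched prefix is replaced; the rest survives.
partial = re.sub(r"<item><title>(.+?)</title>", r"\g<1>", item)

# With a trailing .* the match runs to the end, so only the replacement remains.
whole = re.sub(r"<item><title>(.+?)</title>.*", r"\g<1>", item)

print(partial)  # Hello world<link>http://example.com/a</link></item>
print(whole)    # Hello world
```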

NOTE: I'm assuming you are familiar with regular expressions; if you need more info in that direction, please let me know.

Also, if you have any ideas for easier matching of feeds, please share your ideas. I'm hoping to improve this module over time.

alienseer23

Yeah, so I am totally lost as far as regex in general is concerned :) Any good direction is much appreciated!

Perhaps there could be an extra option on the admin page, where the "type of feed" drop-down currently offers only "TweeRSS regex": something that pulls just the most basic info, like the title of an item and (or as) a link, called something like "TweeRSS Simple"?

Or, could there be a "test feed" button that you click after entering the feed info and before saving it? It would pull down all of the potential items (i.e. Conditions, Humidity, Wind), display them in a list, and offer check boxes to select what gets republished to Twitter, making the regex language an under-the-hood detail.

All I am looking for is a simple way to grab the title of the feed item and a link to it to send out to Twitter, essentially replacing the need for twitterfeed! It would be great to be able to add pre- and/or post-text to the tweet as well. If you can show this using regex, that would be great too!

Thanks!

cleaver

First, you'll need to be sure the feed is working properly: just look at example.com/aggregator/sources, where you should see your feed data. If you just set it up, you will need to run cron. Cron should be running on a schedule, and whenever it runs, your Twitter feed will be updated. Currently, there is no point in running it more often than every 15 minutes, since your feeds won't update any quicker than that.

Next, here's a Regex match expression that will match just the title. The subexpression between the brackets (parentheses) is the text returned. Everything else, including the <title> tags, is ignored.

/.+?<title>(.+?)<\/title>.*/

Now, to send this to Twitter, the Regex replace should be:

$1

That is, $1 prints what was matched between the first pair of brackets, $2 what was matched between the second pair, and so on.

To add a prefix and suffix, just put them before and after the $1.

Example:

My title: $1, is great
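Here is the title-only filter run through Python's re.sub as a quick check, once with the example prefix and suffix and once with the "I just loved at last.fm" prefix from the original question. The last.fm item below is made up, and Python writes $1 as \g<1>; otherwise the pattern is the same one as above, minus the delimiters.

```python
import re

# Made-up one-line item, as a last.fm "loved tracks" entry might look once flattened.
item = "<item><title>Some Artist - Some Track</title><link>http://example.com/track/1</link></item>"

# PHP pattern /.+?<title>(.+?)<\/title>.*/ without the delimiters.
pattern = r".+?<title>(.+?)</title>.*"

# PHP replacement "My title: $1, is great"; Python spells $1 as \g<1>.
titled = re.sub(pattern, r"My title: \g<1>, is great", item)

# Same pattern, with the prefix asked about for the last.fm feed.
loved = re.sub(pattern, r"I just loved at last.fm: \g<1>", item)

print(titled)  # My title: Some Artist - Some Track, is great
print(loved)   # I just loved at last.fm: Some Artist - Some Track
```

One thing to keep in mind: the leading .+? needs at least one character before <title>, which is fine when the item starts with <item>. If the text the filter sees could begin directly with <title> (a guess about the module's input, not something confirmed here), .*? would be the more forgiving choice.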

Now that I look at the examples, I'd have to say that regular expressions are a bit confusing, but very powerful. I'll see if I can create a simple "title match" submodule.

Cheers,

CB

alienseer23

THANK YOU!

Question: in your example, will the output $1 link to the original article, i.e. a title AS a link? If not, is there a way to make the title a link, or is it only possible to output a title AND a link?

Using HTML, where $2 is the link?

My title: <a href="$2">$1</a> is great!

cleaver

Twitter won't accept the anchor tag, so the best thing would be to put the link at the end of the post. The Twitter module can use the URL shortening service TinyURL, and I think it can give you even more choices if you configure the Shorten module (http://drupal.org/project/shorten) as well.

The Regex match should be:

/.+?<title>(.+?)<\/title>.*?<link>(.+?)<\/link>.*/

Then your Regex replace would be:

My title: $1 my link: $2

Of course, the post will be cut off at 140 characters, so if the title is too long, the link could be truncated.
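To illustrate the truncation problem, here is a Python sketch that applies the title-and-link pattern to a made-up item with a long title and compares a plain cut at 140 characters with a hypothetical alternative that shortens the title instead of the link. The fit_post helper is invented for illustration; as far as this thread says, the module itself just cuts the post.

```python
import re

LIMIT = 140  # post length limit mentioned in the thread

# Made-up item with an overly long title.
item = (
    "<item><title>" + "A very long headline " * 8 + "</title>"
    "<link>http://example.com/news/12345</link></item>"
)

# PHP: /.+?<title>(.+?)<\/title>.*?<link>(.+?)<\/link>.*/ without delimiters.
m = re.match(r".+?<title>(.+?)</title>.*?<link>(.+?)</link>.*", item)
title, link = m.group(1).strip(), m.group(2)

# What a plain cut at 140 characters does: the link falls off the end.
naive = f"My title: {title} my link: {link}"[:LIMIT]


def fit_post(title: str, link: str, prefix: str = "My title: ",
             middle: str = " my link: ", limit: int = LIMIT) -> str:
    """Hypothetical helper: shorten the title, never the link, so the post fits."""
    room = limit - len(prefix) - len(middle) - len(link)
    if room <= 3:
        # Not even "..." fits next to the link; fall back to the link alone.
        return link[:limit]
    if len(title) > room:
        title = title[: room - 3].rstrip() + "..."
    return prefix + title + middle + link


post = fit_post(title, link)
print(naive)
print(post)
```

A URL shortener (TinyURL or the Shorten module) helps in the same direction, since a shorter link leaves more room for the title.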

This makes me think that I could create a simple version of the filter that uses tokens, e.g. !title, !link.
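A token filter along those lines could look roughly like the Python sketch below. Everything in it is hypothetical (the render_tokens name, the two-token set, and the assumption that each item has a single <title> and <link>); the point is only that a user would type a template such as "I just loved at last.fm: !title !link" instead of a regex.

```python
import re


def render_tokens(template: str, item: str) -> str:
    """Hypothetical token filter: replaces !title and !link with the item's values."""
    values = {}
    for tag in ("title", "link"):
        found = re.search(rf"<{tag}>(.*?)</{tag}>", item, re.DOTALL)
        values[tag] = found.group(1).strip() if found else ""
    # A function replacement inserts the values verbatim, so a "\" or "$"
    # inside a title cannot be misread as a backreference.
    return re.sub(r"!(title|link)\b", lambda m: values[m.group(1)], template)


item = "<item><title>Some Artist - Some Track</title><link>http://example.com/track/1</link></item>"
print(render_tokens("I just loved at last.fm: !title !link", item))
# I just loved at last.fm: Some Artist - Some Track http://example.com/track/1
```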

alienseer23

OK, I set up my two feeds this way and got this on a cron run:

Fatal error: Call to undefined function twitter_account_load() in /path/to/drupal/sites/default/modules/tweetrss/tweetrss.module on line 71

cleaver

Just to help me keep track of everything, could you create a new issue in the queue with the error message above? Since the regex problem is sorted out, I should close this issue.

Also, to help me diagnose the problem, please answer these questions:

1. Which version of Twitter module are you using?

2. Is your Twitter account configured with OAuth? Which version?

3. Have you got all the Twitter submodules enabled?

4. Can you post to Twitter using the twitter_post functionality (e.g. when creating a new page or blog post)?

Thanks.

alienseer23

Status: Active » Closed (fixed)