Create a CCK date field mapper

alex_b - January 30, 2008 - 19:03
Project:Feed Element Mapper
Version:5.x-1.x-dev
Component:Code
Category:feature request
Priority:normal
Assigned:ekes
Status:patch (code needs work)
Description

Create a CCK date field mapper that maps any tag that contains a date to a CCK date module style field. Together with an iCal parser this would make FeedAPI the solution for parsing iCal feeds.

http://drupal.org/node/214688

#1

amanire - February 3, 2008 - 16:42

I need this feature. I'm not sure if I have the time this week, but I would like to give it a try. I will contact you by email for advice.

#2

alex_b - February 3, 2008 - 18:12

Sounds great :)

#3

ekes - February 15, 2008 - 17:29
Assigned to:Anonymous» ekes
Status:active» patch (code needs work)

Here's my first attempt at it.

As it stands the code works with the timestamps created by the parsers [1] and with textual dates in feeds [2].

I'll keep working on it soon:
* verifying the text date parsing (see below)
* looking into timezones
* multiple dates for a single field (this would also be the case with iCal start and end times)
* other possible arrays
That is, unless someone else wants to ;-)
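A minimal sketch of the two inputs mentioned above: a UNIX timestamp already produced by a parser, or a textual date taken straight from the feed. The function name is hypothetical and not part of feedapi_mapper. strtotime() stands in for whatever parsing the mapper actually does.

```php
<?php
// Hypothetical helper: accept either a parser-generated UNIX timestamp or a
// textual feed date, and return a UNIX timestamp (NULL if unparseable).
function sketch_feed_value_to_timestamp($value) {
  if (is_numeric($value)) {
    // Parsers hand over dates that are already timestamps.
    return (int) $value;
  }
  // Textual dates (RFC 822, W3CDTF, ...) are left to strtotime().
  $timestamp = strtotime($value);
  return $timestamp === FALSE ? NULL : $timestamp;
}

echo sketch_feed_value_to_timestamp('Tue, 01 Jan 2008 00:00:00 GMT'), "\n";
```

One caveat with the is_numeric() check: a compact date such as iCal's 20080215 is also numeric and would be misread as a timestamp. Such formats need to be recognised before this branch.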

[1] The parsers are inconsistent about what information they create the timestamp from:

parser_simplepie - uses get_date() - which returns in order of preference:
ATOM_10: published - updated
ATOM_03: issued - created - modified
*: pubDate - (DC11):date - (DC10):date

parser_common_syndication - returns in order of preference
ATOM: published
RSS1.0/RDF: (DC):date - pubDate
RSS0.91/2.0: pubDate
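Both parsers pick the first date element present from a fixed order of preference. A sketch of that selection step follows, assuming the item has already been flattened into an array keyed by element name. The real parsers work on XML or SimplePie objects, and the function name is hypothetical.

```php
<?php
// Hypothetical sketch: return the first non-empty date element, following
// the given order of preference. NULL if none of the elements is present.
function sketch_pick_date(array $item, array $preference) {
  foreach ($preference as $key) {
    if (isset($item[$key]) && $item[$key] !== '') {
      return $item[$key];
    }
  }
  return NULL;
}

// Order listed above for parser_common_syndication on RSS 1.0/RDF.
$rdf_order = array('dc:date', 'pubDate');
$item = array(
  'pubDate' => 'Fri, 15 Feb 2008 17:29:00 GMT',
  'dc:date' => '2008-02-15T17:29:00Z',
);
echo sketch_pick_date($item, $rdf_order), "\n"; // prints 2008-02-15T17:29:00Z
```

Because each parser uses a different order, the same item can produce different timestamps depending on which parser is enabled.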

[2] This is not rigorously tested (yet). Below is a list of the RSS and Atom date formats given in the specifications (not counting other extensions and badly formed feeds). iCal also needs to be added:

== RSS and Atom Dates ==

(in RSS item / Atom entry - not at channel level)

=== RSS 0.90 ===

Has no date format

=== RSS 0.91+ ===

* pubDate (RFC 822) - Publication date

=== RSS 1.0+ ===

No date format in specification http://purl.org/rss/1.0/spec

Official Module Dublin Core http://purl.org/rss/1.0/modules/dc/ has
* dc:date (W3CDTF) - Publication date

Unofficial Modules:
* Aggregation http://web.resource.org/rss/1.0/modules/aggregation/
  * ag:timestamp (ISO 8601)
* Audio http://web.resource.org/rss/1.0/modules/audio/
  * audio:year (YYYY)
* Context http://nurture.nature.com/rss/modules/mod_context.html
  * various; not 100% clear from the page above
* Qualified Dublin Core http://web.resource.org/rss/1.0/modules/dcterms/
  * dcterms:* (see below)
* e-mail http://web.resource.org/rss/1.0/modules/email/
  * email:date
* event http://web.resource.org/rss/1.0/modules/event/
  * ev:startdate (W3CDTF) Start date (timezone can be implied by ev:location!)
  * ev:enddate (W3CDTF) End date, optional
* prism http://nurture.nature.com/rss/modules/mod_prism.html
  * prism:coverDate (W3CDTF)
  * prism:coverDisplayDate
  * prism:creationDate
  * prism:embargoDate
  * prism:expirationDate
  * prism:modificationDate
  * prism:publicationDate
  * prism:receptionDate
* RSS 091 http://web.resource.org/rss/1.0/modules/rss091/
  * rss091:pubDate (RFC 822)
  * rss091:lastBuildDate
* Service Status http://web.resource.org/rss/1.0/modules/servicestatus/
  * ss:lastChecked (W3CDTF)
  * ss:lastSeen
* Streaming http://web.resource.org/rss/1.0/modules/streaming/
  * str:live.scheduledStartTime (W3CDTF)
  * str:live.scheduledEndTime (W3CDTF)

=== RSS 2.0 ===

As RSS 0.91

Can be extended; the dc: and dcterms: namespaces (below) are the most likely.

=== Atom 1.0 ===

* published (RFC3339) http://atompub.org/rfc4287.html#element.published time early in lifecycle of entry
* updated (RFC3339) http://atompub.org/rfc4287.html#element.updated most recent significant modification

=== Dublin Core ===

As in RSS 1.0 (also possible in RSS 2.0 and Atom)

If added with namespaces as:
xmlns:dc="http://purl.org/dc/elements/1.1/"
xmlns:dcterms="http://purl.org/dc/terms/"

W3CDTF is only recommended(!), not required

* dc:date

* dcterms:created
* dcterms:valid (could be a range)
* dcterms:available
* dcterms:issued
* dcterms:modified
* dcterms:dateAccepted
* dcterms:dateCopyrighted
* dcterms:dateSubmitted

=== SSE ===

xmlns:sx="http://www.microsoft.com/schemas/rss/sse"
For RSS http://msdn2.microsoft.com/en-us/xml/bb190613.aspx

Elements:
* sx:history
* sx:update
* sx:conflict
All of these can have the attribute when, and they can be nested.

I guess these might actually end up in an array return rather than as text?
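If the SSE dates do come back as an array, collecting them could look like the sketch below. It uses PHP's DOM extension to gather every when attribute from elements in the SSE namespace, however deeply they are nested. The function name and the sample item are illustrative only and are not taken from a real feed or parser.

```php
<?php
define('SKETCH_SSE_NS', 'http://www.microsoft.com/schemas/rss/sse');

// Hypothetical sketch: return every 'when' attribute on SSE elements
// (sx:history, sx:update, sx:conflict, ...) in document order.
function sketch_sse_when_values($item_xml) {
  $doc = new DOMDocument();
  if (!@$doc->loadXML($item_xml)) {
    return array();
  }
  $whens = array();
  // '*' matches all elements in the SSE namespace, at any depth.
  foreach ($doc->getElementsByTagNameNS(SKETCH_SSE_NS, '*') as $element) {
    if ($element->hasAttribute('when')) {
      $whens[] = $element->getAttribute('when');
    }
  }
  return $whens;
}

// Illustrative item with an sx:update nested inside sx:history.
$xml = '<item xmlns:sx="http://www.microsoft.com/schemas/rss/sse">'
  . '<sx:sync id="item_1" version="2">'
  . '<sx:history when="Tue, 01 Nov 2005 09:43:33 GMT" by="REO1750">'
  . '<sx:update when="Mon, 31 Oct 2005 09:43:33 GMT" by="REO1750"/>'
  . '</sx:history></sx:sync></item>';
print_r(sketch_sse_when_values($xml));
```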

=== Formats ===

RFC822 (RFC1123 RFC2822)
W3CDTF http://www.w3.org/TR/NOTE-datetime
ISO 8601 http://en.wikipedia.org/wiki/ISO_8601
RFC3339 http://tools.ietf.org/html/rfc3339 ISO 8601 subset
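The following sketch assumes PHP 5.3+ and normalises the formats listed above to a single ISO 8601 UTC string with PHP's DateTime class. It pads W3CDTF's reduced-precision forms (YYYY and YYYY-MM) before parsing. A bare year such as 2008 on its own can otherwise be read as a time of day (20:08). The function name is hypothetical. Dates without a zone are treated as UTC here, which is an assumption and not a rule from any of the specs.

```php
<?php
// Hypothetical sketch: normalise RFC 822, W3CDTF and RFC 3339 dates to
// "YYYY-MM-DDTHH:MM:SSZ" in UTC. Returns FALSE if the text cannot be parsed.
function sketch_normalize_feed_date($text) {
  $text = trim($text);
  // W3CDTF reduced precision: pad YYYY and YYYY-MM to a full date.
  if (preg_match('/^\d{4}$/', $text)) {
    $text .= '-01-01';
  }
  elseif (preg_match('/^\d{4}-\d{2}$/', $text)) {
    $text .= '-01';
  }
  try {
    // UTC is only used when the text has no zone of its own (assumption).
    $date = new DateTime($text, new DateTimeZone('UTC'));
  }
  catch (Exception $e) {
    return FALSE;
  }
  $date->setTimezone(new DateTimeZone('UTC'));
  return $date->format('Y-m-d\TH:i:s\Z');
}

foreach (array('Sat, 07 Sep 2002 00:00:01 GMT', '2002-09-07T02:00:01+02:00',
    '1997-07-16T19:20+01:00', '1997-07') as $sample) {
  echo $sample, ' => ', sketch_normalize_feed_date($sample), "\n";
}
```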

Attachment: feedapi_mapper_date.inc_.txt (1.71 KB)

#4

MrKatz - February 19, 2008 - 17:45

Works for me!

Thanx for that feature. Saves me a lot of time!

Alex

#5

alex_b - February 19, 2008 - 18:37

This looks great. I just had a look at the code. What's missing is a check for the type of the field to be mapped; see the link mapper, feedapi_mapper_link:
http://cvs.drupal.org/viewvc.py/drupal/contributions/modules/feedapi_map...

and the use of feedapi_mapper_content_is_cck_type().

Will test as well. Need to get some time first.

Alex

#6

gustav - February 20, 2008 - 08:13

This is great.

Have you started on iCal yet (see http://drupal.org/node/214688)? If not, perhaps I can find some time to take a look at it over the next weekend.

#7

ekes - February 20, 2008 - 12:39

Right, I hadn't seen feedapi_mapper_content_is_cck_type() before (I should cvs up more often :-/ )
The date mapper does do a check for type though:

  $field = content_fields($field_name);
  if (! ($field['type'] == 'date' || $field['type'] == 'datestamp')) {
    // if not a date just return
    return;
  }

It is actually useful doing it this way because then the $field is available to the rest of the mapper (see further down the mapper code).

The code in feedapi_mapper_content seems to cycle through all the possible fields of the content type, not just the field at hand (which is already known). Could it not be simplified to return the field array, or FALSE, for the field at hand, as above? It would become something like:

// Return the CCK field definition if its type is one of $field_types,
// otherwise FALSE.
function feedapi_mapper_content_cck_field($field_name, $field_types) {
  $field = content_fields($field_name);
  if (in_array($field['type'], $field_types)) {
    return $field;
  }
  else {
    return FALSE;
  }
}

Or have I missed something about content_type?

#8

amanire - February 22, 2008 - 05:51

Great job, ekes! I'm mapping ISO dates with the greatest of ease now and I feel a bit guilty. The only issue that I've noticed so far is that there is no option to map to a Date To field. Not sure if that falls within the scope of date_feedapi_mapper(), though.

#9

alex_b - February 28, 2008 - 21:18

@ekes - thanks for the content_fields() tip - I incorporated your suggestion: http://cvs.drupal.org/viewvc.py/drupal/contributions/modules/feedapi_map...

Attention: this change alters the parameter list of feedapi_mapper_content_cck_field().

I will take a look at the actual patch now.

#10

alex_b - February 28, 2008 - 22:45
Status:patch (code needs work)» patch (code needs review)

I did a basic test (mapped unix timestamps to date) and committed:

http://cvs.drupal.org/viewvc.py/drupal/contributions/modules/feedapi_map...

Thank you for this really important contribution.

I read the code and I don't quite understand the calls to date_text_make_dbdate() and date_set_date(), since in the end we only set the $node->$field component anyway.

I took an alternative approach, which is actually a by-product of a bug I was chasing, but is simpler. Please have a look.

Attachment: feedapi_mapper_date.patch (2.86 KB)

#11

ekes - March 8, 2008 - 15:58

The reason for the convoluted dbdate thing was the // TODO Timezone bit. As I understand it, dbdate does timezone handling.

There certainly is timezone information in some iCal feeds (sadly in both a standard and a non-standard way, even from Apple themselves :( ). I'm guessing that if people want to attach timezones to other feeds to correct the times displayed/stored, it will be needed there too.

I have to say I've not worked through the implications of timezones for pulling feeds yet. I'm about to start trying to pull in some iCal feeds, and I'll see.

#12

ekes - March 10, 2008 - 14:48

Version 2 of the Date API changes the functions for creating and accessing dates. It looks like integrating timezone handling is easier. At the moment, however, it does seem to drop some of the strtotime() handling in date parsing, which would certainly be needed for some of the formats above.

I'm not sure whether this mapper should work with Version 2 only, or with both?

#13

alex_b - March 12, 2008 - 00:16

> The reason for the convoluted dbdate thing was the // TODO Timezone bit. dbdate as I understand it does timezone handling.

The principal problem with time zone handling is that when SimplePie hands us the time, it is already a UNIX timestamp, and the time zone has been lost. I don't know whether it converts the time to local time, but I guess it does not.

As for storing the time zone on the Drupal side, I haven't looked, but I expect it's a property of the CCK date field. Is it part of the actual date string, something like "2003-10-12 12:20 -0500"? I'm just guessing; we should find out.
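A short illustration, assuming PHP 5.3+, of why the zone is gone once a date becomes a timestamp. A UNIX timestamp is an absolute instant, so two strings with different offsets can produce the same number. The offset survives only while the original string, or a separate offset value, is kept.

```php
<?php
// Same instant written with two different offsets.
$a = new DateTime('2003-10-12 12:20 -0500');
$b = new DateTime('2003-10-12 17:20 +0000');

// The timestamps are identical, so the -0500 cannot be recovered from them.
var_dump($a->getTimestamp() === $b->getTimestamp()); // bool(true)

// The DateTime object built from the string still knows the offset.
echo $a->format('O'), "\n";  // -0500
echo $a->getOffset(), "\n";  // -18000 (seconds east of UTC)
```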

The question now is how the iCal parser is going to hand us time zones - but I know that you have some power over that :)

Alex

#14

alex_b - March 12, 2008 - 00:18

> I'm not sure whether this mapper should work with Version 2 only, or with both?

Is version 1 going to be supported in the future, or is version 2 going to replace version 1? If v2 replaces v1, let's go with v2.

#15

ekes - March 12, 2008 - 12:26

Timezones:

Many of the dates in RSS/Atom feeds are pretty timezone-unaware. For the moment I'm just passing them to the date_api, assuming it will treat them as local time when there is no additional information. I can see a point in the future when it would be desirable for users to enter the timezone associated with each feed; I'm planning for it, but not planning to do it at the moment.
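Supporting a per-feed timezone later could be as simple as the sketch below. It relies on PHP's DateTime constructor: the zone passed as the second argument is used only when the date string carries no zone of its own. The function name and the per-feed setting are hypothetical. The output is converted to UTC here for comparison.

```php
<?php
// Hypothetical sketch: interpret a zone-less feed date in the feed's zone
// (a setting that does not exist yet), then convert it to UTC.
function sketch_parse_with_feed_zone($text, $feed_zone) {
  // $feed_zone is ignored by DateTime if $text contains its own offset.
  $date = new DateTime($text, new DateTimeZone($feed_zone));
  $date->setTimezone(new DateTimeZone('UTC'));
  return $date->format('Y-m-d\TH:i:s');
}

// No zone in the string: Europe/Berlin (UTC+1 in mid-March 2008) applies.
echo sketch_parse_with_feed_zone('2008-03-12 12:00', 'Europe/Berlin'), "\n";
// Explicit offset in the string: the feed zone is ignored.
echo sketch_parse_with_feed_zone('2008-03-12 12:00 -0500', 'Europe/Berlin'), "\n";
```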

With iCal, the timezone can be given with the *TIME element or, it seems, set for the whole feed using X-WR-TIMEZONE (undocumented as far as I can tell, but very commonly used). This should be passed to the date_api.
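One possible precedence order, sketched below, assumes PHP 5.3+. A trailing Z (UTC) wins, then an explicit TZID parameter on the property, then the calendar-wide X-WR-TIMEZONE, then a default. The function name and this precedence are assumptions for illustration and are not taken from an existing iCal parser. TZID values that are not PHP/Olson zone names (common in Outlook exports) would make DateTimeZone throw and would need mapping first.

```php
<?php
// Hypothetical sketch: turn one iCal DATE-TIME property line into a UTC ISO
// string. $calendar_zone stands for the feed's X-WR-TIMEZONE, if any.
function sketch_ical_datetime($line, $calendar_zone = 'UTC') {
  if (strpos($line, ':') === FALSE) {
    return FALSE;
  }
  // "NAME;PARAM=...:VALUE" (split at the first colon).
  list($head, $value) = explode(':', $line, 2);
  $zone = $calendar_zone;
  if (substr($value, -1) === 'Z') {
    // A trailing Z means the value is already in UTC.
    $zone = 'UTC';
    $value = substr($value, 0, -1);
  }
  elseif (preg_match('/;TZID=([^;:]+)/', $head, $m)) {
    // An explicit TZID parameter overrides the calendar-wide zone.
    $zone = $m[1];
  }
  $date = DateTime::createFromFormat('!Ymd\THis', $value, new DateTimeZone($zone));
  if ($date === FALSE) {
    return FALSE;
  }
  $date->setTimezone(new DateTimeZone('UTC'));
  return $date->format('Y-m-d\TH:i:s');
}

echo sketch_ical_datetime('DTSTART;TZID=America/New_York:20080312T090000'), "\n";
echo sketch_ical_datetime('DTSTART:20080312T140000Z', 'Europe/London'), "\n";
echo sketch_ical_datetime('DTSTART:20080312T090000', 'Europe/London'), "\n";
```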

The Date fields themselves may or may not correct for timezones, depending on how they are set up. As far as I can see, that is not a concern here, as long as the information is passed to date_api.

I've already written, in testing code, a sub_field for the from/single date mapper and a sub_field for the to date. These accept text dates, but they should also accept iCal arrays, which include the timezone separately.
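A sketch of a from/to item follows, assuming PHP 5.3+. It borrows the item keys (value, value2, timezone, offset, offset2) from the patch excerpt in #17 and stores values in UTC. Whether the Date module expects exactly this shape is an assumption, and the function names are hypothetical.

```php
<?php
// Hypothetical helper: format a copy of $date in UTC.
function sketch_to_utc_iso(DateTime $date) {
  $copy = clone $date;
  $copy->setTimezone(new DateTimeZone('UTC'));
  return $copy->format('Y-m-d\TH:i:s');
}

// Hypothetical sketch: build one date field item from a from date and an
// optional to date. Zone and offset are read before converting to UTC.
function sketch_date_item($from, $to = NULL) {
  $utc = new DateTimeZone('UTC');
  $start = new DateTime($from, $utc);
  $item = array(
    'value' => sketch_to_utc_iso($start),
    'timezone' => $start->getTimezone()->getName(),
    'offset' => $start->getOffset(),
  );
  if ($to !== NULL) {
    $end = new DateTime($to, $utc);
    $item['value2'] = sketch_to_utc_iso($end);
    $item['offset2'] = $end->getOffset();
  }
  return $item;
}

print_r(sketch_date_item('2008-03-12T09:00:00-04:00', '2008-03-12T10:30:00-04:00'));
```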

Date API 2:

It seems that v2 is going to replace v1. Even though Calendar, for example, is still being ported to v2, I'm now more tempted to move on to working with v2 only.

#16

alex_b - March 12, 2008 - 15:03

I just wrote a patch that exposes timezone information for parser_simplepie:

http://drupal.org/node/233285#comment-766610

The suggested format is, for example, 2005-12-23T12:10-0500. Does this look good? Let's use the same format for the iCal parser. I had a quick look at how to feed the Date module a time zone; what's the right way to do this?
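The suggested format can be written and read back with PHP's own format characters. O is the +hhmm/-hhmm offset, and a leading ! in createFromFormat() resets unparsed fields. A sketch assuming PHP 5.3+, with hypothetical names. This format has no seconds, so any seconds in the source date are dropped.

```php
<?php
// Hypothetical constant for the "2005-12-23T12:10-0500" style.
define('SKETCH_FEED_DATE_FORMAT', 'Y-m-d\TH:iO');

function sketch_format_feed_date(DateTime $date) {
  return $date->format(SKETCH_FEED_DATE_FORMAT);
}

function sketch_parse_feed_date($text) {
  // Returns a DateTime carrying the parsed offset, or FALSE.
  return DateTime::createFromFormat('!' . SKETCH_FEED_DATE_FORMAT, $text);
}

$date = sketch_parse_feed_date('2005-12-23T12:10-0500');
echo $date->getOffset(), "\n";              // -18000
echo sketch_format_feed_date($date), "\n";  // 2005-12-23T12:10-0500
```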

Alex

#17

alex_b - March 17, 2008 - 22:42

This patch adds date 2 and timezones support.

Missing: to field support.

This patch assumes an ISO date with timezone delivered as described in comment #16.

Please check time zone parsing and use of date api - there might be more efficient ways or more date API compliant ways to create date field information.

The code extract below shows the core part of the changes to feedapi_mapper_date.inc. This is what runs on feed aggregation ($op = 'map'), which currently happens in nodeapi('prepare').

        // Defaults, so both variables are defined even when no time zone is
        // detected (numeric timestamps, or strings without a zone).
        $zone_name = NULL;
        $zone_offset = 0;
        // Convert UNIX timestamps to ISO.
        if (is_numeric($feed_element)) {
          $feed_element = date(DATE_FORMAT_ISO, $feed_element);
        }
        else {
          // Time is not numeric, try to detect time zone.
          $parsed = date_parse($feed_element);
          if (isset($parsed['zone'])) {
            // This is strange: date_parse() returns an inverse offset in minutes.
            $zone_offset = $parsed['zone'] * -60;
            $zone_name = _feedapi_mapper_date_get_zone_name($zone_offset);
          }
        }
        $date = date_make_date($feed_element, $zone_name);
        if ($iso = date_format($date, DATE_FORMAT_ISO)) {
          $items = $node->$field_name;
          $items[0]['value'] = $iso;
          // @todo: to date
          // $items[0]['value2'] = '2007-04-02T06:03:02';
          $items[0]['timezone'] = $zone_name;
          $items[0]['offset'] = $zone_offset;
          // $items[0]['offset2'] = -14400;
          $node->$field_name = $items;
        }
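The excerpt relies on date_parse()'s zone value, whose units and sign have changed across PHP versions. It also calls _feedapi_mapper_date_get_zone_name(), whose body isn't shown. A sketch assuming PHP 5.3+ reads the offset from a DateTime object instead and guesses a zone name with timezone_name_from_abbr(). Both function names below are hypothetical. The name is only a guess, since many zones share an offset and DST is ignored here. Strings without a zone fall back to PHP's default timezone.

```php
<?php
// Hypothetical stand-in for _feedapi_mapper_date_get_zone_name(): guess a
// zone name for an offset in seconds, falling back to UTC.
function sketch_zone_name_for_offset($offset) {
  $name = timezone_name_from_abbr('', $offset, 0);
  return $name === FALSE ? 'UTC' : $name;
}

// Returns array(zone name, offset in seconds east of UTC).
function sketch_detect_zone($feed_element) {
  if (is_numeric($feed_element)) {
    // UNIX timestamps carry no zone information.
    return array('UTC', 0);
  }
  $date = new DateTime($feed_element);
  $offset = $date->getOffset();
  return array(sketch_zone_name_for_offset($offset), $offset);
}

print_r(sketch_detect_zone('2005-12-23T12:10:00-0500'));
```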

Attachment: feedapi_mapper_date_2_timezones.patch (3.1 KB)

#18

alex_b - March 17, 2008 - 22:41

Cleaned up debugging code.

Attachment: feedapi_mapper_date_2_timezones.patch (3.13 KB)

#19

ekes - March 19, 2008 - 13:03

Taking a first look at integrating feedapi_mapper_date_2_timezones.patch with the from and to date code, I notice it uses date_parse(), which needs PHP > 5.1.2 and has no equivalent in the date_php4 includes.

I'll experiment with it, but I'm pretty sure that if the format is a recognised one, the Date API's date_make_date() is fine. It's only for some of the other weirdness that strtotime() (or an equivalent; see #233431: Has strtotime functionality gone from api date object creation?) will be needed.
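One way to keep the code from breaking where date_parse() is missing is to guard the call with function_exists(). strtotime() remains the parser in both cases, and date_parse() is only asked whether the text carried its own zone. This is a hypothetical sketch of that pattern, not code from the patch.

```php
<?php
// Hypothetical sketch: returns array(timestamp or FALSE, has_explicit_zone).
function sketch_parse_feed_date_compat($text) {
  $has_zone = FALSE;
  if (function_exists('date_parse')) {
    // date_parse() sets is_localtime when the text includes a zone/offset.
    $parts = date_parse($text);
    $has_zone = !empty($parts['is_localtime']);
  }
  // Without a zone, strtotime() uses the server's default timezone.
  return array(strtotime($text), $has_zone);
}

var_dump(sketch_parse_feed_date_compat('2008-03-19 13:03 +0100'));
```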

#20

alex_b - March 19, 2008 - 15:27
Status:patch (code needs review)» patch (code needs work)

Timezone support only for PHP > 5.1.2 seems fine to me. The code shouldn't break on PHP 4, though.

What do you think about the exotic format conversion next to the comment 'This is strange'?

#21

ekes - March 20, 2008 - 16:03

It seems date2 has gone the way of using the PHP 5 date/time functions and objects, and recreating them for PHP 4. It seems to me that date_create() does everything needed, together with its companion date_*_get() functions.
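The procedural PHP 5 functions mentioned here fit together like this. date_create() builds the object, and the date_*_get() family reads parts back out. This is a minimal sketch using core PHP (5.3+ for date_timestamp_get()), not Date API code.

```php
<?php
$date = date_create('2008-03-20T16:03:00+01:00');

$offset = date_offset_get($date);                    // seconds east of UTC
$zone = timezone_name_get(date_timezone_get($date)); // offset-style name here
$stamp = date_timestamp_get($date);                  // UNIX timestamp

echo $offset, "\n"; // 3600
echo $zone, "\n";   // +01:00
echo $stamp, "\n";
```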

This isn't fully tested, as I'm having some difficulties retaining the timezone with the CVS version of date2 for hand-submitted forms.

I've added 'to' and 'from' date support, and handling for the full iCal DTSTART and DTEND arrays (which can have timezones in them).

Attachment: feedapi_mapper_date.inc_.gz (1.01 KB)

#22

alex_b - March 20, 2008 - 17:57

> I'm having some difficulties retaining the timezone with the CVS version of date2 for hand-submitted forms

Is this related? http://drupal.org/node/235424

#23

DanielTheViking - April 29, 2008 - 14:25

Subscribing.

#24

activelyOUT - May 3, 2008 - 21:36

subscribing

#25

david_g - July 10, 2008 - 15:21

Hello,

I've been assigned by Alex to look into this addition. There seem to be some remaining issues with date 2 that make it somewhat unreliable. However, I believe these changes mean that the date mapper is storing the dates properly, even if the date module sometimes garbles them on display and edit. It seems that the date module expects dates to be stored in UTC in all cases, with the timezone and offset optional. The new version should handle that adequately. One thing to watch out for is a bug in the date_ical_date function in date_api_ical.inc that sets the date to null about half the time. The line:
$date = date_timezone_set($date, timezone_open($to_tz));
should be:
date_timezone_set($date, timezone_open($to_tz));
This is because date_timezone_set() returns NULL on success and FALSE on failure, not a date object in either case. With this change applied to date_api_ical.inc, try out the new feedapi_mapper_date.inc file.
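The safe pattern is to call date_timezone_set() for its side effect, because it changes the date object in place, and never assign its return value back to the date. A minimal sketch with core PHP functions that converts a local time to UTC for storage:

```php
<?php
$date = date_create('2008-07-10T15:21:00', timezone_open('Europe/London'));

// Modifies $date in place; the return value is deliberately not used.
date_timezone_set($date, timezone_open('UTC'));

echo date_format($date, 'Y-m-d\TH:i:s'), "\n"; // 2008-07-10T14:21:00
```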

David

Attachment: feedapi_mapper_date.inc_.tar_.gz (10 KB)

#26

geodaniel - July 7, 2008 - 13:38

subscribing

#27

KarenS - July 31, 2008 - 21:34

Subscribing. I haven't had time to try this out, but I think the bug in #25 is fixed in the latest RC. If there are any other bugs related to getting this working, be sure to post them in the Date issue queue with a link back to this issue.
