Working XPath and mapping examples [#838172]

This is a thread for people to post their working XPath and mapping settings. Hopefully this will help those who are struggling to get to grips with XPath, particularly non-coders like me. We can create a handbook page once we have enough examples.

Comments

Comment #1

podox commented 9 July 2010 at 09:58

*UPDATED 9 JULY 2010*

These examples were provided by our institution's resident XPath/XML expert (not me). Hopefully I have the details correct. Note that I'm using the feeds_xmlparser code from http://github.com/chrisirhc/feeds_xmlparser which allows for namespaces.

Example Feed: http://rss.oucs.ox.ac.uk/engfac/notshakespeare-audio/rss20.xml

XPath: /rss/channel/item or //item

/rss/channel/item will only match item elements that are direct children of /rss/channel, e.g.:

<rss>
    <channel> 
        <item>blah</item>
        <item>more blah</item>
        <item>another blah</item>
    </channel>
</rss>

//item will match item elements anywhere in the XML, at any depth. For this particular RSS feed, either XPath setting works. With my settings, 6 nodes are created from this feed, one for each item.
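To see the difference outside Feeds, here is a minimal sketch using the XPath 1.0 engine that ships with the JDK (javax.xml.xpath). It is not the Feeds parser itself, just a way to test expressions. The sample document is hypothetical: it adds a third item nested inside a made-up extra element, so the two expressions return different results.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class ItemPaths {
    // Hypothetical feed: two items directly under channel, one nested deeper
    // inside a made-up <extra> element.
    static final String SAMPLE =
            "<rss><channel>"
            + "<item>blah</item>"
            + "<item>more blah</item>"
            + "<extra><item>stray blah</item></extra>"
            + "</channel></rss>";

    // Returns the text content of every node the expression matches, in document order.
    static List<String> select(String xml, String expression) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate(expression, doc, XPathConstants.NODESET);
            List<String> values = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                values.add(nodes.item(i).getTextContent());
            }
            return values;
        } catch (Exception e) {
            throw new IllegalStateException("Could not evaluate " + expression, e);
        }
    }

    public static void main(String[] args) {
        // Only direct children of /rss/channel.
        System.out.println(select(SAMPLE, "/rss/channel/item")); // [blah, more blah]
        // Every item element, however deeply nested.
        System.out.println(select(SAMPLE, "//item"));            // [blah, more blah, stray blah]
    }
}
```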

Mapping Sources:

1. title - maps the item title e.g. "The Spanish Tragedy: Thomas Kyd"
2. link - maps the URL e.g. http://media.podcasts.ox.ac.uk/engfac/fhs/01_spanish_tragedy.mp3
3. category - maps *all* instances of category. So if I mapped 'category' to a Taxonomy field, I would get the following comma-separated values for the first node:

english, language, jacobean, elizabethan, theatre, renaissance, Q323, 1, ukoer, 106104

In my feed, Q323, 1, ukoer and 106104 are auto-generated from certain category choices in the feed source, and I want to keep them separate from the user-entered free-text keywords (english, language, etc.). So instead I use:

4. category[not(@domain)] - this excludes any category element that has a domain attribute. Fortunately, those are exactly the ones I don't want, so this gives me:

english, language, jacobean, elizabethan, theatre, renaissance
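The effect of the predicate can be checked with the same JDK XPath engine. This sketch assumes the mapping works roughly like this: each source expression is evaluated relative to the context item, and the matches are joined with commas, as in the taxonomy values above. The item is a cut-down, hypothetical version of the first item in the feed.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class CategoryFilter {
    // Cut-down, hypothetical item: two free-text keywords and two categories with a domain.
    static final String SAMPLE =
            "<rss><channel><item>"
            + "<category>english</category>"
            + "<category>language</category>"
            + "<category domain=\"http://rss.oucs.ox.ac.uk/jacs_codes\">Q323</category>"
            + "<category domain=\"http://www.jisc.ac.uk/oer/\">ukoer</category>"
            + "</item></channel></rss>";

    // Evaluates `source` relative to the first node matched by `context`
    // and joins the matched values with ", ".
    static String map(String context, String source) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(SAMPLE)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            Node item = (Node) xpath.evaluate(context, doc, XPathConstants.NODE);
            NodeList nodes = (NodeList) xpath.evaluate(source, item, XPathConstants.NODESET);
            List<String> values = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                values.add(nodes.item(i).getTextContent());
            }
            return String.join(", ", values);
        } catch (Exception e) {
            throw new IllegalStateException("Could not evaluate " + source, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(map("//item", "category"));               // english, language, Q323, ukoer
        System.out.println(map("//item", "category[not(@domain)]")); // english, language
    }
}
```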

Now, I may want to map Q323, 1, ukoer, 106104 to individual taxonomy or CCK fields. Here is how I do that:

5. category[@domain='http://rss.oucs.ox.ac.uk/transcripts_available'] - maps 1
6. category[@domain='http://rss.oucs.ox.ac.uk/jacs_codes'] - maps Q323
7. category[@domain='http://www.jisc.ac.uk/oer/'] - maps ukoer
8. category[@domain='http://www.itunesu.com/feed'] - maps 106104

Perhaps I might want to combine some of these into a single field, for example ukoer and Q323, as two comma-separated values in a single taxonomy field.

9. category[@domain='http://www.jisc.ac.uk/oer/' or @domain='http://rss.oucs.ox.ac.uk/jacs_codes'] - maps Q323 and ukoer
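Items 5 to 9 can be tried the same way. This is again a sketch with the JDK's javax.xml.xpath, not the Feeds parser. The item is hypothetical, but it uses the domain values and category values from the post, in the same order as the list under item 3.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class DomainCategories {
    // Hypothetical item: one free-text keyword plus the four domain categories from the post.
    static final String SAMPLE =
            "<rss><channel><item>"
            + "<category>english</category>"
            + "<category domain=\"http://rss.oucs.ox.ac.uk/jacs_codes\">Q323</category>"
            + "<category domain=\"http://rss.oucs.ox.ac.uk/transcripts_available\">1</category>"
            + "<category domain=\"http://www.jisc.ac.uk/oer/\">ukoer</category>"
            + "<category domain=\"http://www.itunesu.com/feed\">106104</category>"
            + "</item></channel></rss>";

    // Evaluates `source` relative to the first item and joins the matches with ", ".
    static String map(String source) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(SAMPLE)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            Node item = (Node) xpath.evaluate("//item", doc, XPathConstants.NODE);
            NodeList nodes = (NodeList) xpath.evaluate(source, item, XPathConstants.NODESET);
            List<String> values = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                values.add(nodes.item(i).getTextContent());
            }
            return String.join(", ", values);
        } catch (Exception e) {
            throw new IllegalStateException("Could not evaluate " + source, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(map("category[@domain='http://rss.oucs.ox.ac.uk/transcripts_available']")); // 1
        System.out.println(map("category[@domain='http://rss.oucs.ox.ac.uk/jacs_codes']"));            // Q323
        System.out.println(map("category[@domain='http://www.jisc.ac.uk/oer/']"));                     // ukoer
        System.out.println(map("category[@domain='http://www.itunesu.com/feed']"));                    // 106104
        // Either domain: values come back in document order.
        System.out.println(map("category[@domain='http://www.jisc.ac.uk/oer/'"
                + " or @domain='http://rss.oucs.ox.ac.uk/jacs_codes']"));                             // Q323, ukoer
    }
}
```

The combined expression returns the values in document order (Q323 before ukoer), whatever order the conditions appear in inside the predicate.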

The namespaces were initially a little trickier to figure out.

10. *[local-name()='author'] - maps itunes:author, e.g. Emma Smith. Strictly, this maps every element whose local name is author, whatever its namespace prefix, which would cause problems if I had an additional element called, say, podcast:author. Fortunately I don't, but let's say I did:

11. *[starts-with(name(),'itunes:') and local-name()='author'] - maps the element whose name begins with the prefix 'itunes:' and whose local name is 'author'. But we can simplify this:

12. *[name()='itunes:author'] - maps itunes:author - this is the one we would use
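Items 10 to 12 can be compared side by side with the JDK's javax.xml.xpath on a namespace-aware document. In this hypothetical item, the itunes prefix is bound to the usual iTunes podcast namespace URI; the podcast:author element and its namespace URI are made up, as in the post. The last expression, which tests namespace-uri() instead of the prefix, is an extra option not mentioned above.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class NamespacedAuthor {
    // Hypothetical item: the itunes URI is the usual iTunes namespace,
    // the podcast prefix and its URI are made up for the example.
    static final String SAMPLE =
            "<rss xmlns:itunes=\"http://www.itunes.com/dtds/podcast-1.0.dtd\""
            + " xmlns:podcast=\"http://example.org/podcast\">"
            + "<channel><item>"
            + "<itunes:author>Emma Smith</itunes:author>"
            + "<podcast:author>Someone Else</podcast:author>"
            + "</item></channel></rss>";

    // Evaluates `source` relative to the first item and joins the matches with ", ".
    static String map(String source) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // so local-name() and namespace-uri() are populated
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(SAMPLE)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            Node item = (Node) xpath.evaluate("//item", doc, XPathConstants.NODE);
            NodeList nodes = (NodeList) xpath.evaluate(source, item, XPathConstants.NODESET);
            List<String> values = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                values.add(nodes.item(i).getTextContent());
            }
            return String.join(", ", values);
        } catch (Exception e) {
            throw new IllegalStateException("Could not evaluate " + source, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(map("*[local-name()='author']"));                                      // Emma Smith, Someone Else
        System.out.println(map("*[starts-with(name(),'itunes:') and local-name()='author']"));   // Emma Smith
        System.out.println(map("*[name()='itunes:author']"));                                     // Emma Smith
        System.out.println(map("*[local-name()='author'"
                + " and namespace-uri()='http://www.itunes.com/dtds/podcast-1.0.dtd']"));        // Emma Smith
    }
}
```

Matching on name() ties the mapping to whatever prefix the feed happens to declare. The namespace-uri() form keeps working if a feed binds the iTunes namespace to a different prefix.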

I'll continue to post more detailed examples from this feed once I've had a chance to test them all. Please post your own examples below including a link to your feed.

Comment #2

fereira commented 16 July 2010 at 10:55

Can this thread also be used to post XML that needs parsing, so that others can help come up with the right XPath for the mapping? I'm working on something that uses multiple namespaces, has nested elements, and keeps the unique ID in an attribute. Using feeds_xpathparser I'm able to grab the title field (dc:title), and although it detects all of the unique elements (it tells me that it created 8 nodes), it keeps updating the same node, so only the last title shows up. Using the feeds_xmlparser module, it fails with the same "mysql_real_escape_string() expects parameter 1" error mentioned here: http://drupal.org/node/800430. In any case, here's the sample XML I'm working with:

http://mayfly.mannlib.cornell.edu/agrisdata/agris.xml

Comment #3

mokko commented 17 July 2010 at 09:21

This reminds me of a problem I had earlier, when the source XML had several title elements per item, which I tried to map to Drupal's (regular node) title. However, there can be only one Drupal title per node. Try using a CCK field for the title instead, since it can hold multiple values. At least with xmlparser, the error message should then disappear. I wouldn't be surprised if this also applies to the XPath module, but I haven't tried that one yet.

PS: Since you ask, I think it would be better to open a new thread for each separate problem.
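mokko's point can be checked outside Drupal: one source expression can match several elements within a single context item, and a single-valued target such as the node title can hold only one of them. The sketch below (JDK javax.xml.xpath, hypothetical item with two title elements) shows the matches. The positional predicate title[1] is included only as one possible way to keep the first value; it is an alternative to the multi-value CCK field suggested above, not something the comment proposes.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class MultipleTitles {
    // Hypothetical item with two title elements.
    static final String SAMPLE =
            "<rss><channel><item>"
            + "<title>Main title</title>"
            + "<title>Alternative title</title>"
            + "</item></channel></rss>";

    // Returns every value `source` matches relative to the first item.
    static List<String> values(String source) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(SAMPLE)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            Node item = (Node) xpath.evaluate("//item", doc, XPathConstants.NODE);
            NodeList nodes = (NodeList) xpath.evaluate(source, item, XPathConstants.NODESET);
            List<String> out = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                out.add(nodes.item(i).getTextContent());
            }
            return out;
        } catch (Exception e) {
            throw new IllegalStateException("Could not evaluate " + source, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(values("title"));    // [Main title, Alternative title]
        System.out.println(values("title[1]")); // [Main title]
    }
}
```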

Comment #4

valeriap commented 20 May 2011 at 13:15

Hi John,

I'm having a similar problem with the same XML.
I managed to get everything except the following attributes:

(Context: //ags:resources/ags:resource)
@ags:ARN
dc:title/@xml:lang

I also tried:
attribute::ags:ARN
dc:title[not(*)][1]/attribute::xml:lang
but neither works.

If I map dc:title[not(*)][1] to the GUID instead (it should really be mapped to ags:ARN), the records import smoothly; otherwise Feeds imports only the first record.

It seems to be a problem with reading the attributes...

Valeria
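For anyone testing the same expressions: in a standalone XPath 1.0 evaluator, a prefixed name such as @ags:ARN or @xml:lang only matches if the prefix is bound to the right namespace URI, while a prefix-free test such as @*[local-name()='ARN'] needs no binding. This sketch uses the JDK's javax.xml.xpath with a small NamespaceContext; it does not show whether Feeds' own parser registers these prefixes. The record, the ARN values, and the ags and dc namespace URIs are assumptions, so copy the real xmlns declarations from the file you are parsing.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class AgrisAttributes {
    // Assumed namespace URIs: check the xmlns declarations in the real file.
    static final String AGS = "http://purl.org/agmes/1.1/";
    static final String DC = "http://purl.org/dc/elements/1.1/";

    // Hypothetical two-record document in the same shape as the question.
    static final String SAMPLE =
            "<ags:resources xmlns:ags=\"" + AGS + "\" xmlns:dc=\"" + DC + "\">"
            + "<ags:resource ags:ARN=\"ARN-0001\"><dc:title xml:lang=\"en\">First</dc:title></ags:resource>"
            + "<ags:resource ags:ARN=\"ARN-0002\"><dc:title xml:lang=\"fr\">Second</dc:title></ags:resource>"
            + "</ags:resources>";

    // Binds the prefixes used in the expressions; "xml" is always the XML namespace.
    static final NamespaceContext PREFIXES = new NamespaceContext() {
        @Override
        public String getNamespaceURI(String prefix) {
            if (prefix == null) throw new IllegalArgumentException("null prefix");
            switch (prefix) {
                case "ags": return AGS;
                case "dc": return DC;
                case "xml": return XMLConstants.XML_NS_URI;
                default: return XMLConstants.NULL_NS_URI;
            }
        }

        @Override
        public String getPrefix(String namespaceURI) {
            throw new UnsupportedOperationException();
        }

        @Override
        public Iterator<String> getPrefixes(String namespaceURI) {
            throw new UnsupportedOperationException();
        }
    };

    // Evaluates `source` as a string against each record matched by the context.
    static List<String> perRecord(String source) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(SAMPLE)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            xpath.setNamespaceContext(PREFIXES);
            NodeList records = (NodeList) xpath.evaluate(
                    "//ags:resources/ags:resource", doc, XPathConstants.NODESET);
            List<String> values = new ArrayList<>();
            for (int i = 0; i < records.getLength(); i++) {
                values.add((String) xpath.evaluate(source, records.item(i), XPathConstants.STRING));
            }
            return values;
        } catch (Exception e) {
            throw new IllegalStateException("Could not evaluate " + source, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(perRecord("@ags:ARN"));                      // [ARN-0001, ARN-0002]
        System.out.println(perRecord("@*[local-name()='ARN']"));        // [ARN-0001, ARN-0002]
        System.out.println(perRecord("dc:title[not(*)][1]/@xml:lang")); // [en, fr]
    }
}
```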
