When writing text fields to the ARC triplestore, no language tags are attached.

See http://groups.drupal.org/node/170079 for an example use case.

One issue I can see is that, for any particular text field, a language could be set at the field level (if using Content translation module), at the node/entity level (using node translation), or not at all (for monolingual sites). As a result we should look in more than one place for the correct language and use sensible defaults.

I can also imagine that we might not want to attach language tags for 'Language neutral' nodes, for example.

Comment | File | Size | Author
#6 | rdfx_get_rdf_model-1259494-6.patch | 844 bytes | djevans

Comments

scor’s picture

What is the current setup of your site? Do you have a node for each language, or are you using a global non-English language on your site (e.g. all nodes are in the same language)?

The language should be added to the RDF graph which is built in memory based on the data provided by the entity API. Once the language tag is reflected in this RDF graph, it should be carried through in the RDF serializations and SPARQL endpoint.

alejandrodf’s picture

Hi Scor,

The site is in English and Spanish. I have a node for each language.

Should I add the language when I create/update the node? I thought I had to add it when the function _rdfui_mapping_save is called, but I don't know the values of the parameters yet. I need to debug it first!

scor’s picture

I did some quick debugging. From what I can tell using the core locale and content translation modules, language is only supported for nodes. Even though fields support multiple languages, their value is in the LANGUAGE_NONE (und) array key. The language of each node is $node->language, so use that to detect the language of a node.

You have to look into rdfx_get_rdf_model() and the rdfx_add_*() helper functions, in particular rdfx_add_literal(). With ARC, you can use 'lang' => 'en' in the $index to add a language tag. You might have to add an extra parameter to rdfx_add_literal() for the language tag.
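A minimal sketch of the extra-parameter idea, not the actual rdfx code: the function name example_add_literal() and its $langcode parameter are hypothetical, and the array layout only mirrors the 'value', 'type' and 'lang' keys mentioned in this thread.

```php
<?php
/**
 * Hypothetical helper: appends a plain literal to an ARC-style index.
 *
 * The name and the $langcode parameter are assumptions made for illustration.
 */
function example_add_literal(array &$index, $uri, $predicate, $value, $langcode = '') {
  $object = array(
    'value' => $value,
    'type' => 'literal',
  );
  // Only set the 'lang' key when a language code was actually passed in.
  if ($langcode !== '') {
    $object['lang'] = $langcode;
  }
  $index[$uri][$predicate][] = $object;
}

$index = array();
example_add_literal($index, 'http://example.com/node/1', 'http://purl.org/dc/terms/title', 'Hola', 'es');
example_add_literal($index, 'http://example.com/node/1', 'http://purl.org/dc/terms/title', 'Untagged');
```

Keeping the parameter optional would let existing callers stay unchanged while node-aware callers pass $node->language.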

alejandrodf’s picture

It works!!!

I'll copy the first version of the code. Just two lines!! However, I should add some conditionals to avoid empty lang tags on a single-language site. I'll create one to test and correct the code this weekend.

Thanks scor for your help :)

function rdfx_add_literal(&$index, $uri, $property, EntityMetadataWrapper $wrapper, $name) {
  $predicates = rdfx_get_predicates($wrapper, $name);
  $object_value = $property->value();
  // New: language of the wrapped entity, e.g. 'en' or 'es'.
  $lang = $wrapper->value()->language;

  // Extracts datatype and callback from the RDF mapping.
  $datatype = '';
  if ($wrapper instanceof EntityDrupalWrapper) {
    $entity = $wrapper->value();
    if (!empty($entity->rdf_mapping[$name]['datatype'])) {
      $datatype = $entity->rdf_mapping[$name]['datatype'];
    }
    if (!empty($entity->rdf_mapping[$name]['callback']) && function_exists($entity->rdf_mapping[$name]['callback'])) {
      $object_value = $entity->rdf_mapping[$name]['callback']($object_value);
    }
  }

  foreach ($predicates as $predicate) {
    $index[$uri][$predicate][] = array(
      'value' => $object_value,
      'type' => 'literal',
      'datatype' => $datatype,
      // New: attach the language tag to the literal.
      'lang' => $lang,
    );
  }
}

Anonymous’s picture

I would just like to comment on how rad it is that you worked with scor to figure this out, alejandrodf... nice work! And a big thanks to djevans for pointing in the right direction.

If you post this as a patch, then you'll get a nifty credit to your name on your drupal.org profile :) You can find out more about patching on Jacine's blog or in my screencast (though my screencast is about patching core, so I'd recommend Jacine's first).

djevans’s picture

Status | File | Size
new | rdfx_get_rdf_model-1259494-6.patch | 844 bytes

alejandrodf, I'm sure you know this already, but as per the code in #4 'language neutral' nodes will have a $lang value of LANGUAGE_NONE, or 'und'. This isn't a valid language tag so (AFAIK) you should replace it with a blank string.
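As a rough sketch of that replacement (the function name example_rdf_lang_tag() is made up for illustration; LANGUAGE_NONE is defined by Drupal 7 core and is only redefined here so the snippet runs on its own):

```php
<?php
// Drupal 7 core defines LANGUAGE_NONE as 'und'; defined here only so the
// sketch runs outside of Drupal.
if (!defined('LANGUAGE_NONE')) {
  define('LANGUAGE_NONE', 'und');
}

/**
 * Hypothetical helper: maps a Drupal langcode to an RDF language tag.
 *
 * Returns an empty string for language-neutral content, meaning
 * "do not attach a language tag".
 */
function example_rdf_lang_tag($langcode) {
  if ($langcode === NULL || $langcode === LANGUAGE_NONE) {
    return '';
  }
  return $langcode;
}
```

In the code from #4, this would wrap the value read from the entity before it is stored under the 'lang' key.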

I have a separate patch for rdfx.module which accounts for entity (field-level) translation, where a node may have certain fields translated into (say) Spanish.
I've altered the signature of rdfx_get_rdf_model() so that you can pass an $info array with a 'langcode' => 'es' key/value pair.

I think the best place to call this function would be inside a function implementing either 'hook_rdf_model_alter' or 'hook_entity_translation_save'. You'd then need to make alterations to the returned model before writing it to the triplestore (to remove non-translatable fields).

This patch assumes that 1237078_1_rdf_get_rdf_model.patch has been applied.

scor’s picture

Title: Language tags aren't attached to string literals » Support for language tags in string literals
Category: bug » feature
Status: Active » Needs work

"However I should add some conditionals to avoid empty lang tags in single language site."

@alejandrodf A single-language site could still use the language tag, I think. Looking forward to your patch! Thanks :)

@djevans your patch looks interesting and in line with the entity API. Questions:

"I have a separate patch for rdfx.module which accounts for entity (field-level) translation, where a node may have certain fields translated into (say) Spanish."

I'm curious to know if this use case is possible with core alone: to have different langcode for fields in the same node: for example an English and a Spanish version of the body field (within the same node). afaik, content translation will create a separate node for each language. What are the use cases / modules leading to multiple languages in the same entity?

"I think the best place to call this function would be inside a function implementing either 'hook_rdf_model_alter'"

The thing is, it makes sense to invoke such an alter hook inside rdfx_get_rdf_model() itself (see #1240778: drupal_alter() support in rdfx_get_rdf_model() - waiting for review), so calling rdfx_get_rdf_model() from an implementation of that hook would end up in an infinite loop.

"You'd then need to make alterations to the returned model before writing it to the triplestore (to remove non-translatable fields)."

I'm not following. Could you detail this use case a bit more please? (You could alter the RDF graph before it's saved to the triple store by implementing hook_rdf_model_alter(), see link to issue above.)

"This patch assumes that 1237078_1_rdf_get_rdf_model.patch has been applied."

Committed that patch, so make sure to git pull before rolling your patches, folks :)

scor’s picture

More questions which we haven't addressed:

1. Should we allow sites which only use one single site wide language to disable language tags in RDF?

2. How to decide whether a literal field value should inherit the node language or not?
- We could check the field language and skip the language tag if it's LANGUAGE_NONE, but afaik with nodes using the content translation module, all fields remain LANGUAGE_NONE even if the node has its own language, so I don't think we can use that.
- Does the use case where a given field instance can have multiple languages really exist in the wild in contrib?

Typical examples of literals which should not get a language tag are dates and numbers. In fact RDF 1.0 does not allow a typed literal (datatyped) to have a language tag, so we can ignore the langcode if a datatype is defined in the RDF mapping. But RDF 1.1 might allow both, so it's worth discussing how to define such a language tag on/off setting, maybe in the RDF API.

There are also some cases where a text string should not get a language tag, such as the username (as is the case in Drupal 7 core RDFa).
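The datatype rule could look roughly like this. This is only a sketch: example_literal_object() is a hypothetical name, and the array keys follow the ones used in the code from #4.

```php
<?php
/**
 * Hypothetical helper: builds a literal, ignoring the language for typed literals.
 */
function example_literal_object($value, $datatype = '', $langcode = '') {
  $object = array(
    'value' => $value,
    'type' => 'literal',
  );
  if ($datatype !== '') {
    // Typed literal: RDF 1.0 does not allow a language tag here,
    // so $langcode is deliberately ignored.
    $object['datatype'] = $datatype;
  }
  elseif ($langcode !== '') {
    // Plain literal with a known language.
    $object['lang'] = $langcode;
  }
  return $object;
}

$date = example_literal_object('2011-08-25', 'xsd:date', 'en');
$title = example_literal_object('Hello', '', 'en');
```

Cases like the username would still need a separate on/off setting, since they are plain literals with no datatype to key on.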

alejandrodf’s picture

Edit by alejandrodf. Wrong comment.

gábor hojtsy’s picture

@scor: Not sure I understand all questions, just running in :)

(1) I think there is value in tagging data with its language even if it's a single language on the site.
(2) Second part: the contrib entity_translation module provides a UI for field translation, which stores translated values under the field.
First part: even entity_translation uses LANGUAGE_NONE for non-translatable fields, since the node (entity) should store the "base" language information, that is, the default language for that node/entity.

Hope this helps. Let me know if I misunderstood the questions :)
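One possible reading of the above, as a sketch only: prefer the field-level langcode when the field is translated, and otherwise fall back to the entity's base language. The function name is hypothetical, and whether untranslatable fields should inherit the entity language at all is still the open question 2.

```php
<?php
// Drupal 7 core defines LANGUAGE_NONE as 'und'; defined here only so the
// sketch runs outside of Drupal.
if (!defined('LANGUAGE_NONE')) {
  define('LANGUAGE_NONE', 'und');
}

/**
 * Hypothetical helper: picks the langcode to use for one field value.
 *
 * @param string $field_langcode
 *   The array key the value is stored under, e.g. 'es' or LANGUAGE_NONE.
 * @param string $entity_langcode
 *   The base language of the entity, e.g. $node->language.
 */
function example_effective_langcode($field_langcode, $entity_langcode) {
  if ($field_langcode !== LANGUAGE_NONE) {
    // Translated field value: the field-level language wins.
    return $field_langcode;
  }
  // Non-translatable field: fall back to the entity's base language,
  // which may itself be LANGUAGE_NONE.
  return $entity_langcode;
}
```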

Anonymous’s picture

"Should we allow sites which only use one single site wide language to disable language tags in RDF?"

I'm guessing that you brought this up because SPARQL endpoints (if they are implemented to spec) do not return results if the stored literal carries a language tag and the literal in the query does not.

I haven't looked at how the code for this works, but maybe it is something that the caller could specify... for example, that SPARQL endpoint module could specify that it doesn't want language tagged data (or could simply not use the language tag). This way, the user could specify in the endpoint UI whether they want content to be language tagged.
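To illustrate why the tag matters for matching, here is a simplified sketch of literal comparison. It is not how any particular endpoint is implemented; example_literals_match() is a made-up name, and datatypes are left out for brevity.

```php
<?php
/**
 * Hypothetical helper: compares two literals the way RDF term equality
 * treats plain literals, where the language tag is part of the term.
 */
function example_literals_match(array $a, array $b) {
  // Language tags are compared case-insensitively; a missing tag is ''.
  $lang_a = isset($a['lang']) ? strtolower($a['lang']) : '';
  $lang_b = isset($b['lang']) ? strtolower($b['lang']) : '';
  return $a['value'] === $b['value'] && $lang_a === $lang_b;
}

$stored = array('value' => 'Drupal', 'lang' => 'en');
$query_untagged = array('value' => 'Drupal');
$query_tagged = array('value' => 'Drupal', 'lang' => 'EN');
```

With these values the untagged query literal does not match the stored one, while the tagged one does.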

scor’s picture

Thanks Gábor for stopping by, your answers re entity_translation module are helpful. It'll give us a contrib module to develop + test this patch.

@Lin, yes, that's the reason I brought it up. I fear though that the right choice is not something a Drupal administrator is likely to know in advance: this really depends on the consumer and the SPARQL queries which will be run against a Drupal endpoint. It might also be a matter of personal preference: some people might publish their English-only dataset without language tags, others might stick @en everywhere... I don't think there is a best practice, I'll ask on semantic-web@w3.org.

We could however export both a language tag-less and a language-tagged literal, for convenience when running SPARQL queries, as an interim solution at least.
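A sketch of what that interim export might look like, assuming the same array layout as the code in #4 (the function name example_literal_variants() is hypothetical):

```php
<?php
/**
 * Hypothetical helper: returns the literal objects to export for one value.
 *
 * Always includes the untagged literal; adds a tagged copy when a
 * language code is known.
 */
function example_literal_variants($value, $langcode = '') {
  $variants = array(
    array('value' => $value, 'type' => 'literal'),
  );
  if ($langcode !== '') {
    $variants[] = array('value' => $value, 'type' => 'literal', 'lang' => $langcode);
  }
  return $variants;
}

$both = example_literal_variants('Hello', 'en');
$untagged_only = example_literal_variants('Hello');
```

The obvious cost is that every tagged string is stored twice in the triplestore.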

scor’s picture

Issue tags: +RDF, +sprint

tagging

djevans’s picture

Assigned: Unassigned » djevans

no2e’s picture

I don't understand the case of this issue, but I want to chime in to clarify something about language tags; maybe it is of help:

"as per the code in #4 'language neutral' nodes will have a $lang value of LANGUAGE_NONE, or 'und'. This isn't a valid language tag so (AFAIK) you should replace it with a blank string."

und is a valid language tag which stands for "Undetermined".

"Typical examples of literals which should not get a language tag are dates and numbers."

There is the language tag zxx which stands for "No linguistic content; Not applicable".

scor’s picture

Thanks @no2e for chiming in. In both of these quotes, people were referring to the language tag of the plain literal in RDF. The issue is about how to avoid outputting 'und' or 'zxx' in RDF, where no language tag should be present in the case of an undetermined language in Drupal.