When writing text fields to the ARC triplestore, no language tags are attached.
See http://groups.drupal.org/node/170079 for an example use case.
One issue I can see is that, for any particular text field, a language could be set at the field level (if using the Content translation module), at the node/entity level (using node translation), or not at all (on monolingual sites). As a result, we should look in more than one place for the correct language and use sensible defaults.
I can also imagine that we might not want to attach language tags for 'Language neutral' nodes, for example.
Comments
Comment #1
scor commented: What is the current setup of your site? Do you have a node for each language, or are you using a single site-wide non-English language (i.e. all nodes are in the same language)?
The language should be added to the RDF graph which is built in memory based on the data provided by the entity API. Once the language tag is reflected in this RDF graph, it should be carried through in the RDF serializations and SPARQL endpoint.
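To illustrate how a tag in the in-memory graph surfaces in a serialization, here is a minimal sketch. It assumes the ARC2 index layout used in this thread, `$index[$subject][$predicate][] = array('value' => ..., 'type' => 'literal', 'lang' => ...)`. The helper `example_literal_to_ntriples()` is hypothetical and is not part of ARC or RDFx. It only shows that a `'lang'` entry becomes an `@es` suffix in N-Triples syntax.

```php
<?php
// Hypothetical helper (not part of ARC or RDFx): renders one ARC2-style
// literal object in N-Triples syntax so the effect of 'lang' is visible.
function example_literal_to_ntriples(array $object) {
  // Escape backslashes, double quotes and line breaks as N-Triples requires.
  $escaped = strtr($object['value'], array(
    '\\' => '\\\\',
    '"' => '\\"',
    "\n" => '\\n',
    "\r" => '\\r',
  ));
  $literal = '"' . $escaped . '"';
  // A language tag is appended as @tag; this sketch checks it first.
  if (!empty($object['lang'])) {
    return $literal . '@' . $object['lang'];
  }
  // A typed literal is appended as ^^<datatype IRI>.
  if (!empty($object['datatype'])) {
    return $literal . '^^<' . $object['datatype'] . '>';
  }
  return $literal;
}

// In-memory graph with one language-tagged literal (example URIs).
$index = array();
$index['http://example.com/node/1']['http://purl.org/dc/terms/title'][] = array(
  'value' => 'Hola mundo',
  'type' => 'literal',
  'lang' => 'es',
);

$object = $index['http://example.com/node/1']['http://purl.org/dc/terms/title'][0];
echo example_literal_to_ntriples($object), "\n"; // prints "Hola mundo"@es
```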
Comment #2
alejandrodf commented: Hi Scor,
The site is in English and Spanish. I have a node for each language.
Should I add the language when the node is created/updated? I thought I had to add it when _rdfui_mapping_save() is called, but I don't know the values of its parameters yet. I need to debug it first!
Comment #3
scor commented: I did some quick debugging. From what I can tell using the core Locale and Content translation modules, language is only supported for nodes. Even though fields support multiple languages, their values are stored under the LANGUAGE_NONE ('und') array key. The language of each node is in $node->language, so use that to detect the language of a node.
You have to look into rdfx_get_rdf_model() and the rdfx_add_*() helper functions, in particular rdfx_add_literal(). With ARC, you can use 'lang' => 'en' in the $index to add a language tag. You might have to add an extra parameter to rdfx_add_literal() for the language tag.
Comment #4
alejandrodf commented: It works!!!
Here is the first version of the code. Just two lines!! However, I should add some conditionals to avoid empty lang tags on single-language sites. I'll set one up to test and fix the code this weekend.
Thanks scor for your help :)
function rdfx_add_literal(&$index, $uri, $property, EntityMetadataWrapper $wrapper, $name) {
  $predicates = rdfx_get_predicates($wrapper, $name);
  $object_value = $property->value();
  // New line 1: take the language from the entity itself ($node->language),
  // e.g. 'en', 'es', or LANGUAGE_NONE ('und') for language-neutral nodes.
  $lang = $wrapper->value()->language;
  // Extracts datatype and callback from the RDF mapping.
  $datatype = '';
  if ($wrapper instanceof EntityDrupalWrapper) {
    $entity = $wrapper->value();
    if (!empty($entity->rdf_mapping[$name]['datatype'])) {
      $datatype = $entity->rdf_mapping[$name]['datatype'];
    }
    if (!empty($entity->rdf_mapping[$name]['callback']) && function_exists($entity->rdf_mapping[$name]['callback'])) {
      $object_value = $entity->rdf_mapping[$name]['callback']($object_value);
    }
  }
  foreach ($predicates as $predicate) {
    $index[$uri][$predicate][] = array(
      'value' => $object_value,
      'type' => 'literal',
      'datatype' => $datatype,
      // New line 2: ARC reads the literal's language tag from the 'lang' key.
      'lang' => $lang,
    );
  }
}
Comment #5
Anonymous (not verified) commented: I would just like to comment on how rad it is that you worked with scor to figure this out, alejandrodf... nice work! And a big thanks to djevans for pointing the right direction.
If you post this as a patch, then you'll get a nifty credit to your name on your drupal.org profile :) You can find out more about patching on Jacine's blog or in my screencast (though my screencast is about patching core, so I'd recommend Jacine's first).
Comment #6
djevans commented: alejandrodf, I'm sure you know this already, but as per the code in #4, 'Language neutral' nodes will have a $lang value of LANGUAGE_NONE, or 'und'. This isn't a valid language tag, so (AFAIK) you should replace it with a blank string.
I have a separate patch for rdfx.module which accounts for entity (field-level) translation, where a node may have certain fields translated into (say) Spanish.
I've altered the signature of rdfx_get_rdf_model() so that you can pass an $info array with a 'langcode' => 'es' key/value pair.
I think the best place to call this function would be inside a function implementing either 'hook_rdf_model_alter' or 'hook_entity_translation_save'. You'd then need to make alterations to the returned model before writing it to the triplestore (to remove non-translatable fields).
This patch assumes that 1237078_1_rdf_get_rdf_model.patch has been applied.
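A minimal sketch of the blank-string replacement suggested in #6 for 'Language neutral' nodes. `example_rdf_lang_tag()` is a hypothetical helper. It assumes that an empty 'lang' value means "no language tag" for ARC. `LANGUAGE_NONE` is only defined here so that the snippet runs outside Drupal 7, where core already defines it as 'und'.

```php
<?php
// Drupal 7 defines LANGUAGE_NONE as 'und'; define it only when running
// this sketch outside Drupal.
if (!defined('LANGUAGE_NONE')) {
  define('LANGUAGE_NONE', 'und');
}

// Hypothetical helper: maps a Drupal langcode to the value stored under the
// ARC 'lang' key. Language-neutral (LANGUAGE_NONE) or missing langcodes
// become '' so that no language tag is emitted.
function example_rdf_lang_tag($langcode) {
  if ($langcode === NULL || $langcode === '' || $langcode === LANGUAGE_NONE) {
    return '';
  }
  return $langcode;
}
```

In the #4 code, this would replace the raw `$wrapper->value()->language` value before it is written under `'lang'`.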
Comment #7
scor commented: @alejandrodf I think a single-language site could still use the language tag. Looking forward to your patch! Thanks :)
@djevans your patch looks interesting and in line with the entity API. Questions:
I'm curious to know whether this use case is possible with core alone: having different langcodes for fields in the same node, for example an English and a Spanish version of the body field within the same node. AFAIK, Content translation creates a separate node for each language. What are the use cases / modules leading to multiple languages in the same entity?
The thing is, invoking such an alter hook makes sense inside rdfx_get_rdf_model() (see #1240778: drupal_alter() support in rdfx_get_rdf_model() - waiting for review), so calling rdfx_get_rdf_model() from a hook_rdf_model_alter() implementation would end up in an infinite loop.
I'm not following. Could you detail this use case a bit more, please? (You could alter the RDF graph before it's saved to the triplestore by implementing hook_rdf_model_alter(); see the link to the issue above.)
Committed that patch, so make sure to git pull before rolling your patches, folks :)
Comment #8
scor commented: More questions which we haven't addressed:
1. Should we allow sites which use only a single site-wide language to disable language tags in RDF?
2. How to decide whether a literal field value should inherit the node language or not?
- We could check the field language and skip the language tag if it's LANGUAGE_NONE, but AFAIK with nodes using the Content translation module, all fields remain LANGUAGE_NONE even if the node has its own language, so I don't think we can rely on that.
- Does the use case where a given field instance can have multiple languages really exist in the wild in contrib?
Typical examples of literals which should not get a language tag are dates and numbers. In fact, RDF 1.0 does not allow a typed (datatyped) literal to have a language tag, so we can ignore the langcode if a datatype is defined in the RDF mapping. But RDF 1.1 might allow both, so it's worth discussing how to define such a language tag on/off setting, maybe in the RDF API.
There are also some cases where a text string should not get a language tag, such as the username (as is the case in Drupal 7 core RDFa).
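A minimal sketch of the RDF 1.0 rule from #8, assuming the ARC2 index layout used in this thread. `example_literal_object()` is a hypothetical helper. For a typed literal such as a date or number, it keeps the datatype and drops the langcode. Here, omitting the 'datatype' and 'lang' keys, rather than storing empty strings as the #4 code does, is a choice made for this sketch. Whether ARC treats the two forms identically is an assumption worth checking.

```php
<?php
// Hypothetical helper: builds one ARC2-style literal object. A datatyped
// literal never gets a language tag (RDF 1.0), and an empty $lang means
// "no language tag".
function example_literal_object($value, $datatype = '', $lang = '') {
  $object = array(
    'value' => $value,
    'type' => 'literal',
  );
  if ($datatype !== '') {
    // Typed literal (date, number, ...): keep the datatype, ignore $lang.
    $object['datatype'] = $datatype;
  }
  elseif ($lang !== '') {
    // Plain literal with a known language: attach the tag.
    $object['lang'] = $lang;
  }
  return $object;
}

// A date from the RDF mapping keeps its datatype even on a Spanish node.
$date = example_literal_object('2011-08-20T10:00:00', 'http://www.w3.org/2001/XMLSchema#dateTime', 'es');
// A text value on the same node gets the node's language.
$body = example_literal_object('Hola mundo', '', 'es');
```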
Comment #9
alejandrodf commented: Edit by alejandrodf. Wrong comment.
Comment #10
Gábor Hojtsy commented: @scor Not sure I understand all the questions, just running in :)
(1) I think there is value in tagging data with its language even if there is only a single language on the site.
(2) Second part: the contrib entity_translation module provides a UI for field translation which stores translated values under the field
First part: even entity_translation uses LANGUAGE_NONE for non-translatable fields, since the node (entity) should store the "base" language information, i.e. the default language of that node/entity.
Hope this helps. Let me know if I misunderstood the questions :)
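Combining #3 and #10 suggests one possible resolution order. First, a field value stored under its own langcode, as entity_translation does, keeps that langcode. Second, a value stored under LANGUAGE_NONE inherits the entity's base language. Third, if neither applies, no tag is emitted. The sketch below assumes Drupal 7's `$entity->field_name[$langcode][$delta]` storage. `example_resolve_langcode()` is hypothetical and not part of RDFx. It does not handle the datatype or username exceptions raised in #8.

```php
<?php
// Drupal 7 defines LANGUAGE_NONE as 'und'; define it only when running
// this sketch outside Drupal.
if (!defined('LANGUAGE_NONE')) {
  define('LANGUAGE_NONE', 'und');
}

// Hypothetical helper: picks the language tag for one field value.
// $field_langcode is the key the value is stored under (the 'es' in
// $node->body['es'][0]); $entity_langcode is the entity's base language
// ($node->language). Returns '' when no tag should be emitted.
function example_resolve_langcode($field_langcode, $entity_langcode) {
  // A translated field value carries its own language.
  if (!empty($field_langcode) && $field_langcode !== LANGUAGE_NONE) {
    return $field_langcode;
  }
  // Non-translatable fields are stored under LANGUAGE_NONE: inherit the
  // entity's base language.
  if (!empty($entity_langcode) && $entity_langcode !== LANGUAGE_NONE) {
    return $entity_langcode;
  }
  // Monolingual or 'Language neutral' content: no tag.
  return '';
}
```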
Comment #11
Anonymous (not verified) commented: I'm guessing that you brought this up because SPARQL endpoints (if they are implemented to spec) do not return results when the stored data specifies a language but the query does not.
I haven't looked at how the code for this works, but maybe it is something that the caller could specify... for example, the SPARQL endpoint module could specify that it doesn't want language-tagged data (or could simply not use the language tag). This way, the user could specify in the endpoint UI whether they want content to be language-tagged.
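Here is a minimal sketch of RDF literal term equality over ARC2-style arrays, which makes the mismatch described in #11 concrete. `example_literals_equal()` is hypothetical. Two literals are the same term only if their text, datatype and language tag all match. So a query pattern using the plain literal "Hola mundo" does not match the stored "Hola mundo"@es, even though the text is identical.

```php
<?php
// Hypothetical helper: RDF term equality for two literals modelled as
// ARC2-style arrays. Missing keys are treated as ''.
function example_literals_equal(array $a, array $b) {
  foreach (array('value', 'datatype', 'lang') as $key) {
    $left = isset($a[$key]) ? $a[$key] : '';
    $right = isset($b[$key]) ? $b[$key] : '';
    if ($key === 'lang') {
      // Language tags are case-insensitive, so compare them lowercased.
      $left = strtolower($left);
      $right = strtolower($right);
    }
    if ($left !== $right) {
      return FALSE;
    }
  }
  return TRUE;
}

// What a query asks for vs. what the triplestore holds.
$plain = array('value' => 'Hola mundo', 'type' => 'literal');
$tagged = array('value' => 'Hola mundo', 'type' => 'literal', 'lang' => 'es');
```

With these two arrays, `example_literals_equal($plain, $tagged)` is FALSE, which is why a tag-less query finds nothing.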
Comment #12
scor commented: Thanks Gábor for stopping by; your answers regarding the entity_translation module are helpful. It'll give us a contrib module to develop and test this patch against.
@Lin, yes, that's the reason I brought it up. I fear, though, that leaving the decision to the Drupal administrator won't work well, since they are unlikely to know in advance: it really depends on the consumer and the SPARQL queries that will be run against a Drupal endpoint. It might also be a matter of personal preference: some people might publish their English-only dataset without language tags, others might stick @en everywhere... I don't think there is a best practice; I'll ask on semantic-web@w3.org.
We could, however, export both language-tagless and language-tagged literals for convenience when running SPARQL queries, at least as an interim solution.
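Here is a sketch of that interim idea, assuming the ARC2 index layout used in this thread. `example_literal_variants()` is a hypothetical helper. It emits each text value twice: once without a tag, and once with one when a language is known. A SPARQL pattern written either way then finds a match. The cost is doubling the literal triples for every tagged value.

```php
<?php
// Hypothetical helper: returns the ARC2-style objects to add under one
// predicate: always a tag-less plain literal, plus a language-tagged copy
// when a language is known.
function example_literal_variants($value, $lang) {
  $objects = array(
    array('value' => $value, 'type' => 'literal'),
  );
  if ($lang !== '') {
    $objects[] = array('value' => $value, 'type' => 'literal', 'lang' => $lang);
  }
  return $objects;
}

// Add both variants of a node title to the in-memory graph (example URIs).
$index = array();
$uri = 'http://example.com/node/1';
$predicate = 'http://purl.org/dc/terms/title';
foreach (example_literal_variants('Hola mundo', 'es') as $object) {
  $index[$uri][$predicate][] = $object;
}
```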
Comment #13
scor commented: tagging
Comment #14
djevans commented
Comment #15
no2e commented: I don't fully understand the use case behind this issue, but I want to chime in to clarify something about language tags; maybe it helps:
und is a valid language tag which stands for "Undetermined".
There is the language tag zxx which stands for "No linguistic content; Not applicable".
Comment #16
scor commented: Thanks @no2e for chiming in. In both of these quotes, people were referring to the language tag of the plain literal in RDF. The issue is about how to avoid outputting 'und' or 'zxx' in RDF, where no language tag should be present when the language is undetermined in Drupal.