Updated: Comment #44
Problem/Motivation
The RDF module assumes that each prefix is mapped to a unique namespace URI and that this namespace will never change in time. This is because these prefixes are used as is in Drupal config, so changing the namespace would mean a change in the meaning of the mapping. This is easy to avoid in core since there is only one single implementation of hook_rdf_namespaces(), but contrib or custom modules might try to register a prefix which is already registered by another module with a different namespace, and thus end up with conflicting namespaces for the same prefix. This is unlikely to happen in practice given that popular namespaces usually have a unique best practice prefix associated to them, but we need a way to warn developers if they try to register conflicting namespaces.
Proposed resolution
Throw an exception if this situation occurs.
Remaining tasks
Review patch.
User interface changes
none
API changes
none. No change record required since conflicting namespaces were already not supported in Drupal 7, it's just that Drupal 8 will complain loudly about it instead of silently ignoring them.
Older report:
Problem/Motivation
RDFa supports prefix-based indirection, a mechanism for abbreviating URIs. The prefix is a pointer to a full URI. When the local term is combined with the full URI, it creates the meaningful identifier for the term.
The Drupal implementation of this mechanism is flawed. It mistakes the prefix as the global identifier for the vocabulary, instead of the URI it points to. The mappings are stored in the database using CURIEs; e.g. schema:name. However, the mapping between prefix and URI can be changed independently.
Drupal's two collaborating hooks, hook_rdf_namespaces and hook_rdf_mapping, do not properly handle the indirection. The namespace mapping is set in hook_rdf_mapping, which compiles the list of namespace mappings dynamically on every call. However, the CURIEs are not dynamically generated, they are stored (using prefixes) in the database. (For reference, there are two [1] [2] previous issues regarding this).
This results in two problems.
Intra-module inconsistency
The namespace mappings are computed and then temporarily cached, so changes to the mapping are registered with every cache clear. The CURIEs that use those namespace mappings are stored in the database and never recomputed. Thus, if a namespace mapping is changed in a module and it uses a different prefix, the CURIEs will no longer expand properly.
Inter-module inconsistency
hook_rdf_mapping keep track of which module defined which namespace mapping. This causes the same problems that global variables pose in a modular system like Drupal—it is easy for two mappings with the same prefix to collide. If two modules define different namespace mappings for the same prefix, then the terms from one module will expand to the wrong URI.
Proposed resolution
Use a CURIE array for any mappings that use CURIEs. This CURIE array would be structured as follows:
array(
'local_name' => 'interactionCount',
'prefix' => 'schema',
'prefix_module' => 'rdf',
);
Then, compute prefixes so that if the same prefix is used twice, one of them gets a number appended to the end. For example, if a contrib module schema
defines the prefix "schema" to map to http://ex.com/my-schema
, then any mappings which use the prefix "schema" with the prefix_module "schema" would have its CURIEs processed out with the prefix "schema1". Any mappings that use the prefix "schema" with the prefix_module "rdf" would have its CURIE processed out with the prefix "schema".
API changes
- Removes hook_rdf_namespaces and replaces with configuration object.
- Changes mapping with CURIE from:
properties: - schema:author
to
properties: - local_name: 'author' prefix: 'schema' prefix_module: 'rdf
Comment | File | Size | Author |
---|---|---|---|
#68 | interdiff.txt | 2.82 KB | kay_v |
#68 | 8482667-68-rdf-conflicting-namespace.patch | 4.67 KB | kay_v |
#63 | rdf-conflicting-namespace-1778410-63.patch | 4.66 KB | ashepherd |
#63 | interdiff.txt | 1.15 KB | ashepherd |
#59 | interdiff.txt | 985 bytes | ashepherd |
Comments
Comment #1
Anonymous (not verified) CreditAttribution: Anonymous commentedComment #2
Anonymous (not verified) CreditAttribution: Anonymous commented#1791678: Change the way RDF mappings are stored describes a way to store each term mapping as its own record, instead of as a single serialized array.
Comment #3
scor CreditAttribution: scor commentedThe advanced registry handling is interesting and worth considering, thanks for posting this. It's powerful but more complex compared to the full URI. I'm not opposed to it though. Performance wise, although the advanced registry would require more queries and joins to build the mappings for each bundle, it would not have a major impact since the mappings used for entity rendering would be cached (the only impact would be when initializing the caches). In D7 we went for a serialized storage because we didn't need to filter down the mappings within a bundle, this might change in D8, let's see. I'd like to discuss the use cases you're hoping to solve with this approach.
I agree, this would be more straight forward to do with a db query.
Hum, are you thinking of the local vocab vs external mappings? I don't think the local vocab should be stored alongside the external mappings, I think the local vocab terms should be generated on the fly when needed and not clutter the external RDF mappings. Also, how is your idea above different from what you advocated a while back? I think that having one external vocabulary in one serialization and another combination of external vocabs in another serialization would lead to confusion and inconsistencies. I make one exception to this rule though, the local vocab, which is not necessary for some case and should be optional. My point is, external vocabs should remain together in all serializations. There might be additional data in some serializations, but the same data item should not have different mappings depending on which serialization you used to retrieve it. makes sense?
How do you expect people to debug this, compared to having the mappings serialized with a row for each bundle?
How would the import work in the context of features for example? Imagine I want to export my mappings and import them on another site, would there be any primary key id conflicts?
How does your proposal related to evoc? Are you trying to provide a solution for Neologism's vocabulary terms management (replacement for evoc)? I think at best Neologism could leverage this system, but it would require more fine grained provenance of the namespaces beyond the module level, and have groups instead.
Comment #4
Anonymous (not verified) CreditAttribution: Anonymous commentedI'm suggesting that we allow people to choose a subset of vocabulary terms to expose the data. For example, if the startDate of an event is mapped to both schema:startDate and rdf_repo:startingTime, then the user could specify that only Schema.org is used in the HTML data, but both are used in a Turtle version, which will be consumed by the RDF repo that defines the rdf_repo vocab.
I believe providing such a subset is different than allowing developers to totally rearrange the model and insert triples that aren't in the model. If you provide a subset, there is still compatibility between the HTML data version and the other serializations... nothing that is said in the one contradicts what is said in the other. You can't say the same thing for an _alter.
I think this is related more to the #1791678: Change the way RDF mappings are stored than this issue, but I'll answer it here.
I honestly don't think that people are debugging the RDF mapping by examining the serialized array in the database... for a person, it is hard to parse a serialized array, especially if the mapping is large.
I believe that providing a row for each property makes it way easier to debug from the database than it is now.
You would simply write the RDF mapping feature code to handle this upon feature installation. Just because a particular ID is used to connect multiple things in feature code, that doesn't mean that the same ID needs to be saved to the database when the feature code is processed.
It is not related to Evoc. It is simply a way to fix our broken prefix-based indirection. It doesn't include any vocabulary import.
Fixing the prefix-based indirection should be the main goal. I don't think it's worthwhile to consider Neologism too heavily in current D8 development, as there isn't any clear momentum for it to be ported to D7 at this point and it has a limited user base. However, this proposal would provide a foundation for Neologism, as it is based on a plan I worked on with the current Neologism project lead to ensure that RDFx in D7 would work for them.
Comment #5
tim.plunkettComing from #1356170: Remove all uses of array_merge_recursive, or document why they are being used instead of NestedArray::mergeDeep(), rdf_get_namespaces() depends on module_invoke_all() using array_merge_recursive(), and then preforms its own filtering.
Furthermore, the actual behavior is that any conflicts result in the prefix not being added at all.
See the last assertion in GetRdfNamespacesTest::testGetRdfNamespaces().
This issue implies that none of the hook-based solutions are ideal, whether the first or last takes priority or none do. Any advice on how to move forward in that issue while not complicating this issue further?
Comment #6
scor CreditAttribution: scor commentedFYI, it's very likely that this part of the rdf module will be refactored and we could definitely keep this requirement in mind when rewriting it.
Comment #7
Anonymous (not verified) CreditAttribution: Anonymous commentedI'm starting to think that an RdfVocabulary config entity might make sense. Then we could attach relevant metadata to the vocabulary as well.
Comment #8
Anonymous (not verified) CreditAttribution: Anonymous commentedThis patch creates an RdfNamespaces config entity and uses that to store the mappings. It also removes hook_rdf_namespaces.
I'm working on top of the patch in #1869600: Refactor RDF mappings to be inline with the new Entity Field API, so it might not apply directly.
Comment #9
tim.plunkettI just have some feedback from the ConfigEntity front, attached as an interdiff.
I've also rerolled the above, also using the other issue as a starting point.
I think this is a reasonable usecase of them, but will there ever be a UI to create them? Otherwise it will be our first ConfigEntity with no UI.
Comment #11
Anonymous (not verified) CreditAttribution: Anonymous commentedThanks Tim :)
The fails in Tim's patch are because the patch it depends on fails RDFa placement tests.
Since Tim's patch:
Comment #13
tim.plunkettThe config key maps to the name of the property in the ConfigEntity, which are lowerCamelCase, so that needs to be reverted.
Comment #14
Anonymous (not verified) CreditAttribution: Anonymous commentedI'm confused, the handbook page on the Configuration API uses underscores, http://drupal.org/node/1762632... see some_string, some_int, some_bool
Comment #15
Anonymous (not verified) CreditAttribution: Anonymous commentedThis patch adds RdfNamespaceManager::prepareCuries() and tests for it. The method accepts an array of CURIES in the following form:
The resulting array would be:
Comment #17
Anonymous (not verified) CreditAttribution: Anonymous commentedReroll.
Comment #19
Anonymous (not verified) CreditAttribution: Anonymous commentedRenaming to better describe the issue. I'll be rerolling shortly.
Comment #20
Anonymous (not verified) CreditAttribution: Anonymous commentedRerolled. This will probably fail miserably.
Comment #22
Anonymous (not verified) CreditAttribution: Anonymous commentedI've incorporated the changes from the RDF mapping issue, which should help with test failures.
In addition, I've changed the config naming to be a bit more standard and have removed the custom storage controller.
Comment #24
Anonymous (not verified) CreditAttribution: Anonymous commentedThis patch introduces a RdfUri class which handles conversion between a CURIE array and a CURIE string.
Comment #26
scor CreditAttribution: scor commentedare there situations where $type may not be an array? is that to allow full URIs as types, or related to the temporary CURIEs as array approach?
ok. did you want to add a @todo to remember to evaluate that later, so we don't forget it?
The docs say that $uri_info can be a string, but in this case why is it not stored in some property of the object? also looks like the tests don't test the case where CURIEs are strings (at least in the interdiff, but might have missed something).
Comment #27
scor CreditAttribution: scor commentedI'm not sure where the idea of CURIE as array came from? not that I'm against it, but curious to know why you think it's better than CURIEs as strings. Doesn't it add a level of complexity for the developer who sets up the mappings in the YAML file? or is the array just a data structure used behind the scenes in the code logic, but no visible in YAML? If this impact the YAML, could you post an example.
Also, it seems the issue summary is outdated after these discussions. For example, it seems we won't really need to worry about any kind of database storage since we're using a config entity now.
Comment #28
Anonymous (not verified) CreditAttribution: Anonymous commentedThe CURIEs as arrays have been part of this patch since around #15, and I discussed it in #11. Using the array makes it possible to indicate which module defines the prefix. Yes, it would impact the YAML... the CURIE string would be replaced by the array.
Prefixes should never be used as persistent identifiers in and of themselves... URIs are identifiers. Prefixes in RDF are inherently rebind-able. The prefix is just a temporary alias, which could be re-aliased in the next request. Therefore, you shouldn't use them as identifiers in your persistence layer.
Regarding adding complexity... it doesn't add complexity, it simply reveals complexity. What we are doing by using CURIE strings, to borrow phrasing from Benji Nowack, isn't reducing complexity... we're just hiding it. And we're hiding it in a way that makes things break. When someone just pastes "schema:name" into a YAML file, they don't understand that some other part of the system that they've never looked at or touched will dictate whether this is a valid CURIE or not. By making it clear that they have to indicate the prefix_module, we signal how exactly this works. We take the dark magic out of it.
We could still provide UIs that do not expose this complexity, because it could be handled in the code behind the scenes. But I don't think that hiding the complexity at the API level is a good thing.
Regarding the implementation comments (the todo and the docs inconsistency), this patch was simply meant to get your feedback on the approach before fully committing to it.
Comment #29
scor CreditAttribution: scor commentedoh, I didn't know you were talking about #15. sure that looks good, let's go ahead with that.
Comment #30
Anonymous (not verified) CreditAttribution: Anonymous commentedThis patch adds documentation for the RdfUri class, adds support for passing in full URIs, and makes the tests that were failing pass.
Comment #32
scor CreditAttribution: scor commentedGreat news! The main refactoring patch for RDF was committed (#1869600: Refactor RDF mappings to be inline with the new Entity Field API), so this issue can move forward now.
Comment #33
Anonymous (not verified) CreditAttribution: Anonymous commentedOK, this now includes the updated Schema.org mappings, and removes the temp function.
Comment #34
scor CreditAttribution: scor commentedI appreciate the concerns in #28, I think the way you solved it with the array is good. Given that we won't have a mapping UI in core, should we try to flexible model in YAML to let people edit their mappings there directly? I found it quite handy when testing the other the earlier RDF patches to be able to do just edit my YAML files using CURIEs directly. I feel that we're losing in terms of DX when we're switching from:
to
Is there a way we could still support the advanced syntax for those who want to define fancy prefixes, but let those who just want to use the prefixes defined by core (which are known, stable, and always enabled) write CURIEs like in the first snippet above? In other words, if it's a string, then use the prefixes defined by rdf.module. In fact if I use a string in the YAML, it works, and it outputs as is, so this scenario is possible, except there is no validation. So it's up to the developer to check there is no typo. Just wondering whether we should document this practice, or even encourage it for the majority of the users?
Comment #35
Anonymous (not verified) CreditAttribution: Anonymous commentedI think that inconsistency will only confuse the issue for novice users. I would prefer to have the API be explicit and consistent.
Comment #36
Anonymous (not verified) CreditAttribution: Anonymous commentedAlso, we can't really do validation on CURIE strings if we also want to allow URIs for the values. Even though CURIEs are not URIs, they fit the URI pattern.
Therefore, if we throw exceptions for "prefixes" that we don't recognize, we might be rejecting a valid URI.
So if we want to do validation to ensure that the prefix is defined and also allow arbitrary schemes in URIs, I think we have to use the array structure.
Comment #37
Anonymous (not verified) CreditAttribution: Anonymous commentedReroll.
Comment #37.0
Anonymous (not verified) CreditAttribution: Anonymous commentedUpdated issue summary.
Comment #37.1
Anonymous (not verified) CreditAttribution: Anonymous commentedSwitch proposed solution.
Comment #37.2
Anonymous (not verified) CreditAttribution: Anonymous commentedAdd explanation of CURIE processing.
Comment #37.3
Anonymous (not verified) CreditAttribution: Anonymous commentedAdded API changes.
Comment #37.4
Anonymous (not verified) CreditAttribution: Anonymous commentedFix broken end tag.
Comment #38
tim.plunkettAs part of #1802750: [Meta] Convert configurable data to ConfigEntity system
Comment #39
catchThis looks like it'll require a HEAD-HEAD upgrade path if it goes in after beta, so tagging for upgrade path.
Comment #40
Anonymous (not verified) CreditAttribution: Anonymous commentedTop. This needs review.
Comment #41
catch#37: 1778410-37-rdfnamespace-config__includes1784234.patch queued for re-testing.
Comment #42.0
(not verified) CreditAttribution: commentedChange h3 to h2.
Comment #43
alexpottRestoring tags eaten by the d6 tag monster
Comment #44
scor CreditAttribution: scor commentedThis is unfortunate, but we can live with this. I don't think the added complexity in the mapping (and performance impact) proposed in this issue is worth it, especially given that it would only benefit a very small portion of use cases. Let's look at the audience of this issue first: the majority of the sites will only use a handful of popular vocabularies (e.g. schema.org and possibly SKOS, provenance, VOID, dcat), and that's all they will need. In the 0.1% of the other use cases, they might require more of the long tail of vocabularies. To avoid any kind of prefix conflict, I've created a documentation page: RDF namespaces registry. It lists all the popular vocabulary prefixes that people can pick from as needed. Secondly, people should never change their prefix bindings. I've never seen this problem in the wild. I don't think Drupal needs to support such feature.
Let's close this issue.
Comment #45
jneubert CreditAttribution: jneubert commented+1 for closing.
For the 0.1% using long tail vocabularies, and don't find their vocabs on RDF namespaces registry, perhaps it could be suggested there to
Comment #46
Anonymous (not verified) CreditAttribution: Anonymous commentedSo there was at least one case of this in contrib where two modules defined the
og
prefix differently. Yes, one didn't do it in a standard way, but that doesn't mean that users who installed the two modules should expect the prefix declaration to fail.Silent failure is bad design for something that few people understand or think to check. The system could at least throw an exception when prefixes are defined in a way that would cause the system to drop the prefix from the list of namespace declarations.
That is my two cents. As I am resigning as a maintainer of this module, I will not be adding any further objections.
Comment #47
scor CreditAttribution: scor commentedThanks Lin for the suggestion, this is something that would be trivial to add so that this problem gets raised as early as possible during site development process.
NB: I will be offline until Dec 23rd and then on and off until Jan 7th.
Comment #48
scor CreditAttribution: scor commentedComment #49
adorsk CreditAttribution: adorsk commentedHere's a first pass at this, let me know what you think.
Comment #50
adorsk CreditAttribution: adorsk commentedComment #52
Stefan Freudenberg CreditAttribution: Stefan Freudenberg commentedI have created a new patch with a test that checks whether the exception is thrown.
Comment #53
Stefan Freudenberg CreditAttribution: Stefan Freudenberg commentedComment #54
scor CreditAttribution: scor commentedThis looks great, Stefan!
Let's avoid adding a new test GetRdfNamespacesConflictTest, and instead, test for the exception in the existing GetRdfNamespacesTest (the tests will run faster that way). For that, all you need is to enable the rdf_conflicting_namespaces module midway through testGetRdfNamespaces(), and place your code that catches the exception right afterwards....
Comment #55
Stefan Freudenberg CreditAttribution: Stefan Freudenberg commentedSee attached patch and interdiff implementing previous comment.
Comment #56
scor CreditAttribution: scor commentedindentation issues here.
Maybe add a short comment to explain what's going on here, why we're suddenly enabling that module.
Comment #57
scor CreditAttribution: scor commentedComment #58
ashepherd CreditAttribution: ashepherd commentedAddressed the problems of comment 56
Comment #59
ashepherd CreditAttribution: ashepherd commentedMissed a space in the comment. This patch removes that extraneous space character
Comment #62
scor CreditAttribution: scor commented@ashepherd: The files rdf_conflicting_namespaces.info.yml and rdf_conflicting_namespaces.module are missing from the last patches (they were present in #55). Patch #55 should be re-applied so you can get these two files, and then you need to "git add" these 2 files and commit them to your branch for them to be part of your patch. (the other files that are affected by the patch will fail to get patched, but that's fine, you can just ignore them).
Comment #63
ashepherd CreditAttribution: ashepherd commentedadded the missing rdf_conflicting_namespaces files that were present in #55
Comment #64
scor CreditAttribution: scor commentedThanks @ashepherd, this patch looks good now.
(aside: Nice to see the issue metadata switches back in the d.o comment form!)
Comment #65
ashepherd CreditAttribution: ashepherd commented@scor, no problem! thanks for showing me the ropes
Comment #66
alexpottWe don't use t() in test messages. Use String::format to do placeholder replacement.
Weird formatting
Also noticed whilst reviewing that rdf_get_namespaces has a completely superfluous call to function_exists() - this is checked in
ModuleHandler::getImplementations()
- can we open a follow up to remove this?Comment #67
scor CreditAttribution: scor commentedsince rdf_get_namespaces() is being refactored in this patch, let's fix the function_exists() in the next reroll.
Comment #68
kay_v CreditAttribution: kay_v commentedimplementing changes requested in #66
however, assuming t in rdf.module at
throw new Exception(t('Tried ...
should be maintained per Coding standards > PHP exceptions
also omitting fix for function_exists() per #67
Comment #69
scor CreditAttribution: scor commentedAlex's comments in #66 have been addressed. back to RTBC.
Comment #70
andypostComment #71
scor CreditAttribution: scor commented@andypost: Is the Needs change record tag in light of what's written in the OP:
Happy to be proven wrong, but wanted to check if you had seen this ^^.
Comment #72
andypost@scor Sorry, missed that hook is not removed now
hook_rdf_namespaces
, that was in original summaryComment #73
alexpottCommitted d63062a and pushed to 8.x. Thanks!