Problem/Motivation

JSON-LD is a serialization of the RDF data model. The RDF data model uses URIs for property names. We have an issue to decide the URI template we use for vocabulary terms, #1784198: Decide on a URI structure for site generated RDF vocabulary.

When people go to that URI, best practice is to return information about the property, for example a label and description. You can also give structural information such as the domain and range. The domain is the bundle that the property is attached to, and the range is the kind of value it can take.

Proposed resolution

Use RDF Schema to describe entities and properties.

The attached diagram (rdfs-site-schema.png) shows an example of the information that one might find at the URI site:Entity/Node/Article.

There may be additional information that we want to add that does not have corresponding properties in RDFS. We could mix in other vocabularies or create new properties for it.

Files:
Comment | File | Size | Author
#16 | 1831286-16-site-schema.patch | 21.4 KB | linclark
    PASSED: [[SimpleTest]]: [MySQL] 49,239 pass(es).
#16 | interdiff.txt | 7.56 KB | linclark
#15 | 1831286-15-site-schema.patch | 15.88 KB | linclark
    FAILED: [[SimpleTest]]: [MySQL] 49,230 pass(es), 1 fail(s), and 0 exception(s).
#15 | interdiff.txt | 10.35 KB | linclark
#5 | 1831286-05-site-schema.patch | 15.04 KB | linclark
    PASSED: [[SimpleTest]]: [MySQL] 48,232 pass(es).
(none) | rdfs-site-schema.png | 558.64 KB | linclark

Comments

A JSON-LD example of the white, yellow and green sections above. I haven't included the blue part for simplicity.

{
  "@context": {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "site": "http://ex.org/site-schema/",
    "site-cs": "http://ex.org/site-schema/content-staging/",
    "parent": {
      "@type": "@id",
      "@id": "rdfs:subClassOf"
    },
    "domain": {
      "@type": "@id",
      "@id": "rdfs:domain"
    },
    "range": {
      "@type": "@id",
      "@id": "rdfs:range"
    }
  },
  "@graph":
  [
    {
      "@id": "site:Entity/Node/Article",
      "@type": "rdfs:Class",
      "rdfs:label": "Article",
      "rdfs:comment": "Use articles for time-sensitive content like news, press releases or blog posts.",
      "parent": "site:Entity/Node"
    },
    {
      "@id": "site:Entity/Node",
      "@type": "rdfs:Class",
      "rdfs:label": "Node"
    },
    {
      "@id": "site:Entity/Node/Article/field_body_value",
      "@type": "rdf:Property",
      "rdfs:label": "Body - value",
      "domain": "site:Entity/Node/Article",
      "range": "rdf:HTML"
    },
    {
      "@id": "site:Entity/Node/Article/field_body_summary",
      "@type": "rdf:Property",
      "rdfs:label": "Body - summary",
      "domain": "site:Entity/Node/Article",
      "range": "rdf:HTML"
    }
  ]
}
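To make it concrete how the @context turns short names like "domain" and "site:..." into full URIs, here is a rough Python sketch. It is not a real JSON-LD processor: it only handles prefix:suffix compact IRIs and the term aliases used in this example, it works on a trimmed copy of the document (one property, no "@type"), and the helper names expand and describe are made up for illustration.

```python
import json

# Trimmed copy of the example above: one property node, no "@type".
DOC = json.loads("""
{
  "@context": {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "site": "http://ex.org/site-schema/",
    "domain": {"@type": "@id", "@id": "rdfs:domain"},
    "range": {"@type": "@id", "@id": "rdfs:range"}
  },
  "@graph": [
    {
      "@id": "site:Entity/Node/Article/field_body_value",
      "rdfs:label": "Body - value",
      "domain": "site:Entity/Node/Article",
      "range": "rdf:HTML"
    }
  ]
}
""")


def expand(value, context):
    """Expand a term alias or a prefix:suffix compact IRI to a full IRI."""
    definition = context.get(value)
    if isinstance(definition, dict):
        # Term alias such as "domain": resolve its "@id" first.
        return expand(definition["@id"], context)
    prefix, sep, suffix = value.partition(":")
    if sep and isinstance(context.get(prefix), str):
        return context[prefix] + suffix
    # Unknown prefix or plain string: leave it untouched.
    return value


def describe(node, context):
    """Return (subject, predicate, object) triples for one graph node."""
    subject = expand(node["@id"], context)
    triples = []
    for key, value in node.items():
        if key == "@id":
            continue
        definition = context.get(key)
        # Only terms declared with "@type": "@id" hold IRIs; others are literals.
        is_iri = isinstance(definition, dict) and definition.get("@type") == "@id"
        obj = expand(value, context) if is_iri else value
        triples.append((subject, expand(key, context), obj))
    return triples


TRIPLES = describe(DOC["@graph"][0], DOC["@context"])
for triple in TRIPLES:
    print(triple)
```

The point of the sketch is only that "domain" and "range" end up as rdfs:domain and rdfs:range statements about the field property, while the label stays a plain string.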

I've added a ticket to VIE about this, so that Drupal's front-end editing features would then be able to utilize the type data.

So I guess the piece of info we currently can't generate is "value : rdf:HTML / summary : rdf:HTML / format : xsd:string" in the blue box.
I.e., a description of the field type 'columns' in rdfs:range terms...

Related to this discussion, the official Schema.org vocabulary description uses the same kind of RDFS structure. They now use it to import new terms.

@yched Correct. I believe that we could probably handle this mapping within Drupal's data type system. Since few people will be defining their own data types, I don't think that it will result in too much additional complexity for module developers. We'll have to see if there are any edge cases that can't be handled using a mapping in data type info.

Status: Active » Needs review
New file: 1831286-05-site-schema.patch (15.04 KB)
PASSED: [[SimpleTest]]: [MySQL] 48,232 pass(es).

Just posting my first pass for the moment.

This patch provides two site-generated schemas:

  • Content Staging
  • Syndication

It provides routes and controllers for terms. So far, the only supported term type is bundle. The properties it returns for the bundle correspond to the following RDF:

<http://d8.l/site-schema/syndication/entity_test/entity_test> <http://www.w3.org/2000/01/rdf-schema#isDefinedBy> <http://d8.l/site-schema/syndication/> .
<http://d8.l/site-schema/syndication/entity_test/entity_test> <http://www.w3.org/2000/01/rdf-schema#subClassOf> <http://d8.l/site-schema/syndication/entity_test> .
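For anyone who wants to see how those two statements fall out of the URI structure, here is a small Python sketch. It is not the patch's code: the function names bundle_triples and to_ntriples are invented for illustration, and the predicates are written as full RDFS IRIs so the output is valid N-Triples.

```python
RDFS = "http://www.w3.org/2000/01/rdf-schema#"


def bundle_triples(base_url, schema, entity_type, bundle):
    """Build the two statements a bundle term returns in this first pass."""
    schema_uri = f"{base_url}/site-schema/{schema}/"
    entity_uri = f"{schema_uri}{entity_type}"
    bundle_uri = f"{entity_uri}/{bundle}"
    return [
        # The bundle term belongs to the schema (vocabulary) it lives in.
        (bundle_uri, RDFS + "isDefinedBy", schema_uri),
        # The bundle is a subclass of its entity type.
        (bundle_uri, RDFS + "subClassOf", entity_uri),
    ]


def to_ntriples(triples):
    """Serialize IRI-only triples as N-Triples lines."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in triples)


NT = to_ntriples(
    bundle_triples("http://d8.l", "syndication", "entity_test", "entity_test")
)
print(NT)
```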

Edit: Forgot to mention, some of the routing code comes from fuhby's work during the Gent sprint, so he should also get commit credit on this.

+++ b/core/modules/rdf/lib/Drupal/rdf/RdfConstants.php
@@ -0,0 +1,25 @@
+abstract class RdfConstants {
+  const RDF_TYPE            = 'http://www.w3.org/1999/02/22-rdf-syntax-ns#type';
+  // RDF Schema terms.
+  const RDFS_DOMAIN         = 'http://www.w3.org/2000/01/rdf-schema#domain';
+  const RDFS_IS_DEFINED_BY  = 'http://www.w3.org/2000/01/rdf-schema#isDefinedBy';
+  const RDFS_RANGE          = 'http://www.w3.org/2000/01/rdf-schema#range';
+  const RDFS_SUB_CLASS_OF   = 'http://www.w3.org/2000/01/rdf-schema#subClassOf';
+  // XSD datatypes.
+  const XSD_INTEGER         = 'http://www.w3.org/2001/XMLSchema#integer';
+  const XSD_DOUBLE          = 'http://www.w3.org/2001/XMLSchema#double';
+  const XSD_BOOLEAN         = 'http://www.w3.org/2001/XMLSchema#boolean';
+  const XSD_STRING          = 'http://www.w3.org/2001/XMLSchema#string';
+}

Wha? Why can't these be put into some more logical class?

+++ b/core/modules/rdf/lib/Drupal/rdf/SiteSchema/EntitySchema.php
@@ -0,0 +1,54 @@
+  /**
+   * Overrides \Drupal\rdf\SiteSchema\SchemaBase::getProperties().
+   */
+  public function getProperties() {
+    $properties = parent::getProperties();
+    return $properties;
+  }

This method seems redundant...

+++ b/core/modules/rdf/lib/Drupal/rdf/SiteSchema/SchemaController.php
@@ -0,0 +1,50 @@
+    $serializer = drupal_container()->get('serializer');
+    $siteSchema = new SiteSchema($schema_path);

If you make this class ContainerAware, then you can just call $this->container->get('serializer');

I don't fully understand all the RDF gibberish, but I can help with the general architecture bits. :-)

Thanks for the offer of help for the general architecture bits, it is much appreciated.

Wha? Why can't these be put into some more logical class?

It is very possible that we will be using these constants in the JSON-LD Normalizer. Depending on how we denormalize, we might be using the term definition to process an entity. For example, we might use the subClassOf relationship to figure out the identifier for the entity type's term definition from the bundle's term definition, in which case we'd be using RdfConstants::RDFS_SUB_CLASS_OF. I broke the constants out into their own class because I figured that would make it cleaner to include in other places. It's an idea I borrowed from Markus Lanthaler's JSON-LD parser.
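A rough sketch of the lookup I have in mind, in Python rather than PHP to keep it self-contained. The TERM_DEFINITIONS dictionary and the entity_type_term function are hypothetical stand-ins for whatever the normalizer ends up using; only the subClassOf URI itself is real.

```python
# Same value as RdfConstants::RDFS_SUB_CLASS_OF in the patch.
RDFS_SUB_CLASS_OF = "http://www.w3.org/2000/01/rdf-schema#subClassOf"

# Hypothetical term definitions keyed by term URI (illustration only).
TERM_DEFINITIONS = {
    "http://d8.l/site-schema/syndication/node/article": {
        RDFS_SUB_CLASS_OF: "http://d8.l/site-schema/syndication/node",
    },
    "http://d8.l/site-schema/syndication/node": {},
}


def entity_type_term(bundle_uri, definitions):
    """Follow subClassOf from a bundle term to its entity type term."""
    parent = definitions.get(bundle_uri, {}).get(RDFS_SUB_CLASS_OF)
    if parent is None:
        raise KeyError(f"No subClassOf statement for {bundle_uri}")
    return parent


print(entity_type_term(
    "http://d8.l/site-schema/syndication/node/article", TERM_DEFINITIONS
))
```

Any code that does this kind of lookup needs the constant, whether or not it otherwise touches SiteSchema.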

Let me know if that still sounds weird.

Though you didn't bring this up, I do want to make sure it's clear to reviewers: this isn't the way I expect most RDF terms to be handled. This is a special group of terms that are used to define the semantics of the semantics, which is why they are handled differently (e.g. not configurable, not bothering with namespace prefixing, etc).

This method seems redundant...

I was imagining that this would add something like

site:(node|article) rdf:type rdfs:Class

and possibly something like
site:(node|article) rdfs:subClassOf drupal:Entity

...both of which apply to both entities and bundles. I decided not to add many properties until it's clearer how we're going to use the term descriptions provided by the schema (and also until this architecture had been reviewed).

I'll make the controller container aware. I expect that code to change once we have the REST/conneg work done, but better to have it as clean as possible for now.

The redundant method: currently all it's doing is calling its parent implementation and returning the result. Basically it's a no-op, and if you eliminate it then the parent method gets called. That's what's redundant about it. :-)

I don't fully understand what you said for the RDF constants, probably because of my limited understanding of RDF. Mostly, I just dislike having constant-only-classes as they feel like a hack. Do they not "belong to" any other class? They can be used wherever, and will get lazy loaded, as long as they're on a class somewhere.

Sorry, I wasn't making myself clear. The function is currently redundant. However, I plan to add properties there. I can remove it for now, though my next patch might be adding properties there anyway.

If the constants belong to a class, it is SiteSchema. Do you have the same problem with the SchemaConstants class? The list of constants in both classes will be expanding. I personally prefer the organization of separating them out into discrete classes with distinct purposes, but am open to dumping them all in SiteSchema, depending on how others feel.

How far do you foresee them expanding? (And we should probably get some input from others here; I know I'm not the only person to dislike constant-classes, but that position is not universal.)

The SchemaConstants class will have a few more paths (e.g. for fields, field properties, field types). Other than that, I don't think that there should be anything else to include in that set of constants. In addition, those constants are tied to our application (and specifically the SiteSchema portion of our application), so I think making those a part of SiteSchema could make sense regardless of whether we keep the other constant class.

The RdfConstants class could include quite a few more constants. For example, we could potentially add constants to map our internal datatypes to their XSD equivalents. There are already a few XSD constants in the class, but the full tree of XSD simple types is considerably larger. I'm not saying that we will definitely add a bunch of XSD constants, but it may make sense down the line.
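To illustrate what such a mapping could look like, here is a Python sketch. The four XSD URIs are the ones already in RdfConstants; the internal data type names on the left ("integer", "float", "boolean", "string") and the xsd_range function are assumptions for illustration, not what the data type system necessarily calls them.

```python
XSD = "http://www.w3.org/2001/XMLSchema#"

# Assumed internal data type names mapped to the XSD constants in the patch.
DATATYPE_TO_XSD = {
    "integer": XSD + "integer",
    "float": XSD + "double",
    "boolean": XSD + "boolean",
    "string": XSD + "string",
}


def xsd_range(data_type):
    """Return the XSD datatype URI for an internal type, or None if unmapped."""
    return DATATYPE_TO_XSD.get(data_type)


print(xsd_range("float"))
print(xsd_range("some_custom_type"))
```

Types without a mapping return None here, which is where the edge cases would show up.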

Like I said above, this isn't how we will handle the URIs for user configurable mappings. This means that there's a finite number of URIs we would add... probably just terms from RDF, RDFS, and XSD. However, it's a large enough list that it could make the SiteSchema class less readable. These URI constants are also likely to be used in multiple places outside of the SiteSchema, in files that wouldn't otherwise reference the SiteSchema class.

Status: Needs review » Needs work

I had a chat about this general issue with Lin at BADCamp while she was working on the diagram in the OP, but I'm realizing I never reported here that I'm in agreement.

The convention in RDF vocabularies is to ucfirst() the class names (e.g. Article) and keep the properties starting with lower case. This convention exists because it makes it easy to know what kind of term you're dealing with (Class or property). It's only a convention and RDF does not require it, so we could break this convention if we think it doesn't make sense in Drupal's use case. In particular I'm thinking that machine names are usually all lower case, so ucfirst()'ing machine names might look odd for developers and string matching would have to be case-insensitive. Also, would this have any impact on the routing? ucfirst() might also look like we're dealing with human labels (Article) when really we've altered the machine name article, so it could be misleading.

I also note that we aren't dealing with a traditional global site vocabulary, but rather with micro site vocabularies, since the terms are nested: http://d8.l/site-schema/syndication/Node is a class but also a vocabulary defining http://d8.l/site-schema/syndication/Node/Article... unless you include / in terms, in which case Node/Article is a term of the http://d8.l/site-schema/syndication/ vocabulary. Not a big deal in the end, since all we care about, and all that matters in RDF, is the full URIs.

Another question I brought up is how deep each path such as http://d8.l/site-schema/syndication/ should publish its child terms: just the immediate children (entity types in this case), or all the nested terms (entity type * bundles * fields * properties)? This could have a performance impact unless it's cached. Note that it does not change very often, so it could well be cached via EntityNG.

I've tried the patch but could not preview any kind of JSON-LD schema at site-schema/syndication/node/article. I also note that the tests do not cover the actual routing, which they probably should. Does this patch depend on some other patch for the HTTP response to work? I did see the two routes in the router table, but my browser gave me a Drupal access denied.

In particular I'm thinking that machine names are usually all lower case, so ucfirst()'ing machine names might look odd

If you look at the patch, you'll see that they don't use ucfirst. I used more traditional style in the diagram thinking that it might be clearer to the IKS folks which URIs were classes and which were properties.

I also note that we aren't dealing with a traditional global site vocabulary

If you look at the patch, you'll see that it does actually break it conceptually into two global site vocabularies. I use rdfs:isDefinedBy... which "may be used to indicate an RDF vocabulary in which a resource is described."

I know that some have suggested that a vocabulary is everything in the URI that comes before the local name, but that reliance on URI structure has never seemed sensible to me.

how deep should each path such as http://d8.l/site-schema/syndication/ publish its children terms

I believe that it should show the full vocabulary. It can definitely be cached, since it will only change when a content type or field instance is added/modified.
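As a sketch of the caching idea (Python, purely illustrative): build the full vocabulary once per schema path and throw the cache away whenever a content type or field instance changes. VocabularyCache, its builder argument and the fake_builder below are invented names, not Drupal's cache API.

```python
class VocabularyCache:
    """Cache the full term list per schema path until explicitly invalidated."""

    def __init__(self, builder):
        self._builder = builder  # callable: schema_path -> list of term URIs
        self._cache = {}
        self.builds = 0          # counts expensive rebuilds, for demonstration

    def get(self, schema_path):
        if schema_path not in self._cache:
            self._cache[schema_path] = self._builder(schema_path)
            self.builds += 1
        return self._cache[schema_path]

    def invalidate(self):
        """Call when a content type or field instance is added or modified."""
        self._cache.clear()


def fake_builder(schema_path):
    # Stand-in for walking entity types, bundles, fields and properties.
    return [f"{schema_path}node", f"{schema_path}node/article"]


cache = VocabularyCache(fake_builder)
cache.get("site-schema/syndication/")
cache.get("site-schema/syndication/")   # served from cache, no rebuild
cache.invalidate()                      # e.g. a field instance was added
cache.get("site-schema/syndication/")   # rebuilt once after invalidation
print(cache.builds)
```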

browser gave me a Drupal access denied.

This is probably because the access patch went in since I posted this.

Side note: Based on discussion elsewhere, I'm going to moderate my position somewhat on constant-classes. I don't like them, but it seems we're not going to be able to get away from them. Let's try to only use them where really necessary, though.

New files: interdiff.txt (10.35 KB), 1831286-15-site-schema.patch (15.88 KB)
FAILED: [[SimpleTest]]: [MySQL] 49,230 pass(es), 1 fail(s), and 0 exception(s).

Just did a little clean-up. I will be making larger changes in the next patch.

  • Moved SchemaConstants into their appropriate classes. The RdfConstants class remains as it was.
  • Added a property (rdf:type) in EntitySchema::getProperties.
  • Changed CONTENT_STAGING schema to CONTENT_DEPLOYMENT. Some folks outside of this initiative have been using "content staging" to mean something else, so I want to make sure that it's clear we're talking about moving content from one site to another.

Status: Needs work » Needs review
New files: interdiff.txt (7.56 KB), 1831286-16-site-schema.patch (21.4 KB)
PASSED: [[SimpleTest]]: [MySQL] 49,239 pass(es).

This patch fixes the access denied error and adds a JSON-LD normalizer for schema objects.

#1852812: Use cache to get entity type/bundle metadata from JSON-LD @type URI depends on this patch, so I've recommended committing it as part of that issue. If anyone wants to give input, please provide it there. We can add more properties to the schemas and more functionality in follow-up issues.