Problem/Motivation

RDF relies on embedding markup inline with objects on the page. This was extremely challenging to implement in the first place, does not have good coverage in contrib, and is getting outdated.

There are various contrib alternatives:

https://www.drupal.org/project/schema_metatag - 25,000 users (depends on metatag)
https://www.drupal.org/project/jsonld
https://www.drupal.org/project/json_ld_schema
https://www.drupal.org/project/jsld

However, those projects don't have exact parity with RDF module. For example, schema_metatag requires configuration, and its current config model requires altering other config objects rather than being able to ship default config that would provide parity with the default RDF config that's currently in the Standard profile.

Proposed resolution

  • Deprecate RDF in Drupal 9.4 core. Remove it from Drupal 10.0 core.
  • Move RDF to contrib, thereby making it easy for existing sites to keep using it in the short to medium term.
  • Create a Change Record that mentions both the RDF contrib module and alternatives, such as schema_metatag.

Remaining tasks

  • A maintainer is needed for the RDF contrib module.
  • Once we have that maintainer, create an implementation issue to do what's in the proposed resolution.

User interface changes

API changes

Data model changes

Release notes snippet

Original issue summary

As a maintainer of RDF module, I don't have much confidence that the module provides the reliability we need for Drupal 8. We don't have a solution for compound fields (such as addressfield). Not all core field formatters are tested, to say nothing of contrib field formatters. And we have criticals that haven't seen any major movement in many months.

In the meantime, Google has announced that it will consume JSON that is embedded in pages using <script> tags, and other search engines have followed suit. As I outline in my companion blog post, this is preferable to HTML data formats (RDFa, microdata) for a number of reasons. The ones relevant to this issue are:

  1. It would mean we could use the same pipeline for REST’s serialization and HTML data
  2. It removes complexity from the field formatters and theme layer
  3. It makes it easier to replace the implementation from contrib

While this is a large change in terms of the description on the Drupal 8 box, I'm fairly certain it will actually require less work than getting our RDFa up to snuff and has a better chance of being reviewable by other people, which this work hasn't really been in the past.

Comments

webchick’s picture

Assigned: Unassigned » scor

Should probably get scor's opinion on this.

Is this actually a beta blocker? It seems to me if we do choose to remove this module, we could do so all the way up until a very late RC?

linclark’s picture

Oh, I just assumed that removing one module and adding another was a beta blocker, but I'm totally cool with it if it isn't.

nod_’s picture

I read the blog post as well but just to be sure, can contrib still implement RDFa or is there code in other core modules added to support the current RDFa module that wouldn't be possible to do from contrib?

Just asking since that was the case for overlay and it means that module is dead and impossible to implement from contrib. Not a big deal for overlay but I could see people actually wanting RDFa at some point.

linclark’s picture

The only issue that would make it potentially harder, if it were reverted, is #1778122: Enable modules to inject attributes into field formatters, so that RDF attributes get output.

If a contrib module wanted to support RDFa in the same way that D7 supports it, it would be entirely possible to do from contrib even if the issue is reverted. I believe it would even be possible to support the more reliable RDFa Lite processing model (which is one of the big differences between D7 and D8) from contrib if that issue is reverted, though it would be messier.

nod_’s picture

Ok, perfect. thanks :)

Crell’s picture

Lin, is the net change here that we'd have fewer semantic attributes sprinkled throughout the page, but instead have, essentially, an alternate "machine-targeted" version (in semi-JSON-LD) embedded within the page? Vis:

<html>
<head>
<script>
  A node rendered to JSON-LD here.
</script>
</head>
<body>
  The same node rendered to HTML here.
</body>
</html>

Or something vaguely along those lines?

If that's correct, then in concept I am very +1. As we've discussed in the past on REST team calls, at this point I think the idea of comingling essentially two different data models into one XML tree (HTML and RDFa) was a mistake from the get-go and they should be separate documents simply linked to each other.

My concerns would be:

1) You've expressed in the past that our data model really doesn't map to JSON-LD well at all. That's why we dropped it in favor of HAL. Would it map better to "JGoogle-LD" (or whatever), or are we still looking at a hard mismatch?

2) What would be the net impact on file size? We would presumably be removing a lot of attribute markup, but adding a big string to the header. Would that be a net win on file size, big net loss, or close enough that we shouldn't bother caring? (My gut feeling is that the non-data markup on the page vastly dwarfs the amount of text we're talking about here either way, but I figure it's important to ask.)

Logistically, Lin is correct that D8's improved pluggability should make this much easier to do. To that end, I'd suggest an approach of ripping out RDFa entirely and planning to ship 8.0 with "none of the above". All of the various bits to make this happen (new encoder for serializer, hooks into the HTML Head, etc.) can happen in a contrib module for the moment, and evolve way faster than core.

Then when that's done we can fold it back into core. If that happens by 8.0, great. If not, it's exactly the sort of functionality we can add to 8.1 and show off as a hot-new-thing for 8.1. The important part for now is just ensuring that the hooks (conceptually, not hook functions) that we'd need are in core. Actually using them can then be a separate non-blocking task.

linclark’s picture

Yes, your understanding of how this would be included in the page is basically correct... though in the example in their docs, Google places the JSON object in the body (rather than the head).
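
For illustration only, the embedding discussed here can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not how Drupal or Google implement it: the node dictionary, its keys, and both function names are hypothetical, and the choice of the schema.org `Article` type is just an example.

```python
import json


def node_to_jsonld(node: dict) -> dict:
    """Map a hypothetical node dict to a schema.org Article description."""
    return {
        "@context": "https://schema.org",
        "@type": "Article",
        "headline": node["title"],
        "datePublished": node["created"],
        "author": {"@type": "Person", "name": node["author"]},
    }


def jsonld_script_tag(data: dict) -> str:
    """Serialize data into a <script type="application/ld+json"> element."""
    payload = json.dumps(data, ensure_ascii=False)
    # Escape "<" so a value containing "</script>" cannot close the element early.
    payload = payload.replace("<", "\\u003c")
    return f'<script type="application/ld+json">{payload}</script>'
```

The resulting string can be placed in either the head or the body; the HTML rendering of the node carries no extra attributes.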

You've expressed in the past that our data model really doesn't map to JSON-LD well at all.

The hardest thing about the work before was building in the flexibility to handle mapping the same entity type to multiple domain models which have incompatible ways of structuring data. And then doing so in a way that wasn't super confusing to users. If we are targeting a very specific domain model, this becomes easier.

What would be the net impact on file size?

If a person wants to expose every single property on their page as part of the JSON, then this would be true... for example, if they want to include the body as one of the properties. However, if they just expose the data that is necessary for rich snippets, I expect the difference will only be slight, since those are usually small bits of data and we often have to introduce a span element or the like to mark them up in HTML.

catch’s picture

I think it'd be fine to leave the API support we put in specifically for RDFa in core, but move the module to contrib. For module removal this issue should be assigned to Dries sooner rather than later but agree scor would be good to hear from.

Even if there's a bigger file size, having fewer attributes all over the place ought to be a bit better for browser performance. Whether any of that is measurable compared to the rest of the stuff on the page is a different issue.

I'd probably go for a <script> tag at the bottom of the page, then it absolutely doesn't block anything else getting rendered/parsed?

With JSON-LD, what happens if a block is rendered in isolation (client side include for example)? Can it extend the main JSON lump with its own separate bit? Or would we have to merge those? How does it work in general for multiple things rendered on the same page?

Also we'd need to ensure that the JSON-LD data that's added during rendering is compatible with render caching - it's got similar problems to drupal_set_*() drupal_add_*() vs. #attached. i.e. if the node is rendered from the render cache, then the JSON-LD should be too - rather than not at all, or from scratch each time.

Dave Reid’s picture

/me wonders how this works with Views or more complex things that aren't necessarily "only one primary thing on a page" since this would switch from inline data to one big chunk of data.

scor’s picture

As appealing as your proposal sounds, I think it would be a big step backwards from all the efforts we've put into RDFa and the theme layer over the years, and comes with its own trade offs. Here are my arguments against your approach:

  • it breaks the DRY principle.
  • it leads to additional (duplicated) content in the page, which sometimes isn't a big deal in terms of markup size, but it can be quite significant for types of content such as job postings where the meat is split into several chunks of text.
  • it leads to some doubts as to which fields/properties to add in the HTML-JSON, as opposed to just give everything to search engines and let them pick what they like. This may seem irrelevant in the context of rich snippet, but is much more important in the general context of the Knowledge Graph
  • no indication that JSON or a JSON variant is a better fit for schema.org or more supported by search engines compared to the other HTML syntaxes. Google has said they support JSON for some use cases (e.g. gmail), but not sure it's actually working for all the other use cases like rich snippets for example.
  • you still need a way to map your fields / content to schema.org in some way, which is handled by the RDF module at the moment.

Overall it seems too much of a drastic change at the last minute to an approach that is still very new (like a few months old) and we're not sure how the JSON-in-HTML is going to pan out in the coming years. On the other hand we know that search engines are comfortable extracting data from traditional HTML as they have done for years, so this support is not going away. Google is currently able to extract RDFa from Drupal 7 and present results in the form of rich snippets (recipe, person), so why take that away? (Note that Google's webmaster documentation is admittedly outdated as has been reported before). I agree the markup generated by D7 leaves a lot to be desired, but in comparison the equivalent markup in D8 is much more efficient.

Regarding the technical concerns you have on the reliability of the markup:

  • compound fields (such as addressfield): compound fields have a fairly predictable mapping and therefore don't need to be made overly customizable in config. Addressfield for example has consistent mapping (PostalAddress, streetAddress, addressCountry, etc). The street property of your formatter is always 'streetAddress', and I would qualify any situation where this is not the case (can't think of any) as enough of an edge case to call for a custom implementation (e.g. custom formatter).
  • Not all core field formatters are tested: that's something actively being worked on (META). A bunch of us have weekly calls / 2 hour code sprints where we have from 2 to 4 attendees every week. Two of these people also came to the code sprint last Saturday at MIT and worked on core RDF issues on their own initiative (the sprint was general and not specifically on RDF).
  • And we have criticals that haven't seen any major movement in many months: regarding "RDF namespaces collisions result in invalid CURIEs", while this is an issue in theory, I don't think it's worth the added complexity given that 99.9% of the users will never run into it (based on schema.org being a major use case). I've talked about the compound field issue further up.
  • Overall when you look at the situation in D7 compared to the great progress that was made in D8, we're in much better shape. Credit is due here to you Lin for your refactoring work and all the other contributors: kay_v, jesse.d, cwells. Let's also not forget all the others who funded the work.
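
To make the namespace collision issue mentioned in this list concrete: a CURIE such as `dc:title` is only meaningful if its prefix resolves to exactly one namespace URI. The following is a minimal Python sketch of the kind of check that "throw an exception when namespaces collide" implies. It is not the RDF module's code; the function name, the exception class, and the shape of the input (module name mapped to its declared prefixes) are all assumptions made for illustration.

```python
class NamespaceCollisionError(Exception):
    """Raised when one prefix is declared with two different namespace URIs."""


def collect_namespaces(declarations: dict) -> dict:
    """Merge per-module prefix declarations into one prefix -> URI map.

    `declarations` maps a (hypothetical) module name to its
    {prefix: namespace URI} dictionary.
    """
    merged = {}
    owners = {}  # prefix -> first module that declared it, for the error message
    for module, prefixes in declarations.items():
        for prefix, uri in prefixes.items():
            if prefix in merged and merged[prefix] != uri:
                raise NamespaceCollisionError(
                    f"Prefix '{prefix}' is mapped to {merged[prefix]} by "
                    f"{owners[prefix]} and to {uri} by {module}"
                )
            merged[prefix] = uri
            owners.setdefault(prefix, module)
    return merged
```

Two modules declaring the same prefix with the same URI is harmless under this sketch; only a conflicting URI raises.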

So obviously, a big -1 from me! Here is my counter proposal: give more time to fix the remaining major issues and re-evaluate the situation before RC1.

pwolanin’s picture

-1 from me also.

Certainly you could turn off the RDFa and add a JSON mapping or add the JSON mapping to a later 8.x release when the spec is more stable?

This seems like a bit of a distraction compared to deep API issues for beta.

linclark’s picture

compound fields (such as addressfield): compound fields have a fairly predictable mapping and therefore don't need to be made overly customizable in config.

You miss the point of what I'm saying... I'm not talking about config. AddressField outputs multiple properties in one blob of HTML. In order to mark up the individual properties, you need to actually have code in the formatter (or in a very hard-coded preprocess function) which knows where to place attributes, and add spans if needed.

So the way you would handle this in schemaorg_contrib is by having a long list of conditionals in hook_preprocess_field... if this is addressfield, use the tightly coupled addressfield special preprocessor. This seems unsustainable to me.
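
By way of contrast, here is a rough Python sketch of how a compound field could be described in a JSON-based approach: the field's columns are mapped to properties in data, with no need to know where each value ends up in the rendered HTML. The column names (`thoroughfare`, `locality`, and so on) and the function name are hypothetical; only the schema.org `PostalAddress` property names are real.

```python
# Hypothetical field columns mapped to schema.org PostalAddress properties.
ADDRESS_MAPPING = {
    "thoroughfare": "streetAddress",
    "locality": "addressLocality",
    "postal_code": "postalCode",
    "country": "addressCountry",
}


def address_to_jsonld(field_value: dict) -> dict:
    """Describe one compound address value as a nested PostalAddress object."""
    result = {"@type": "PostalAddress"}
    for column, prop in ADDRESS_MAPPING.items():
        value = field_value.get(column)
        if value:  # skip empty or missing columns
            result[prop] = value
    return result
```

The formatter's markup is never touched, which is the decoupling argued for in this issue.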

linclark’s picture

Certainly you could turn off the RDFa and add a JSON mapping or add the JSON mapping to a later 8.x release when the spec is more stable?

To me, it's less of an issue of what I would do personally. It's more about what kind of implicit promises we make to users when we say we support something.

The data exposed by Drupal 7's RDFa was unreliable. We can chalk that up to the fact that it was the first pass at implementing it. But if we continue to say that we support it in Drupal 8, we should make sure that it really works.

Having been close to this work for a number of years, and having seen the patterns of contribution around it, there is nothing at this point that leads me to believe that we will have solid support.

jneubert’s picture

I'm writing as a user of RDFa in Drupal 7. The RDFa support in core was the one thing which convinced me, and allowed me to convince my organization (the German National Library of Economics), to use Drupal for a Labs website which publishes information about projects for humans and machines. It worked quite well in this use case with the Drupal 7 standard features. The limitations I met were not general reliability, but rather missing support for nested RDF structures, which I could work around via custom field templates. This experience, and the strong commitment of the core development team to RDFa, made me kind of a Drupal evangelist in the library linked data community.

An approach as suggested by Lin in this post wouldn't have worked for our use case, simply because schema.org does not provide the expressivity we wanted in the descriptions of projects. Nor would it work for some other use cases (a thesaurus of economics and a press archive application), which we implemented with "handcrafted" RDFa applications before Drupal 7 was published. I'd hope that Drupal 8 allows us to build such applications on a solidly grounded framework with much less effort, letting us combine schema.org and domain-specific vocabularies easily.

Having RDFa dropped now in core would indeed feel like a serious backlash.

webchick’s picture

@jneubert Thanks for your perspective as a Drupal user/evangelist in this space. By chance, are you or someone else from your organization able to help tackle some of the issues in #1778226: [META] Fix RDF module, particularly #1778410: Throw exception when RDF namespaces collide (the only D8 release-blocker I'm aware of related to RDF)? I think at least part of the genesis of this issue is the fact that those issues aren't moving in a timely manner due to a lack of contributors to semantic web stuff.

linclark’s picture

If you're already writing your own field formatters to make things work, then this change (to move it from core to contrib) wouldn't have any impact on how it works for you.

linclark’s picture

Off the top of my head, these two would also be release blockers:

  1. #1778194: RDF module can't handle compound fields
  2. #1777688: RDFa output incorrect when not using entity template (Views, Panels, etc) or when render array is altered .... though, TBH, I can't remember if we did disable it completely in Views as scor suggested in his comment

webchick’s picture

If they're actual release blockers, they should be marked as critical so we can track 'em. But make sure the issue summary has a justification for why it must hold up the entire release of Drupal 8 if it's not fixed before then, and why it could not be fixed in a later point release (e.g. 8.0.1 or 8.1.0).

jneubert’s picture

@lin: To be more precise: I could use 10 or so fields just as they are, and had to hack only one field template, where the nested structure was required (still using the RDF Mapping API for the inner level).
@webchick: Unfortunately, I'm not coding on the level which is required for these patches (just trying to learn about the basics of OO-PHP and Symfony). I'd be happy to help testing, however.

jneubert’s picture

Re. #1777688: RDFa output incorrect when not using entity template (Views, Panels, etc) or when render array is altered, particularly views integration: I'd love to see this (because it would allow publishing OAI-ORE aggregations). But if it is not available, that's the state of affairs currently, and maybe I or somebody else may be able to extend RDFa support to this use in the future.

linclark’s picture

Late response to @Dave Reid:

/me wonders how this works with Views or more complex things that aren't necessarily "only one primary thing on a page" since this would switch from inline data to one big chunk of data.

We currently don't support inline data very well in Views unless you are using the entity system to render (which most Views I've seen do not). I note this in #1777688: RDFa output incorrect when not using entity template (Views, Panels, etc) or when render array is altered. It would be possible to support it with inline data, but would require a lot more markup than we currently output.

linclark’s picture

Late response to @catch:

With JSON-LD, what happens if a block is rendered in isolation (client side include for example)? Can it extend the main JSON lump with its own separate bit? Or would we have to merge those? How does it work in general for multiple things rendered on the same page?

There is nothing in JSON-LD that would block us from doing this (it is, in fact, one of the things RDF is really good at).

However, Google implements its own processors and they tend to not support anything more complicated than what they show in their examples... meaning that something which should work based on the standard does not actually work for Google.

Their testing tool is still in beta, and when I tried providing two items with the same ID, it gave me an itemtype error (which it really shouldn't). So it's unclear how that would work.
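
As a rough illustration of what merging separately rendered pieces could look like on the producing side, the Python sketch below collects fragments into a single `@graph` and combines fragments that share an `@id`. This is a deliberately naive sketch, not a JSON-LD processor: it assumes every fragment uses the same schema.org context, and on conflicting keys the later fragment simply wins, whereas real RDF merging would keep both values. The function name and the example identifiers are hypothetical, and nothing here says how Google's processor would treat the result.

```python
def merge_fragments(fragments: list) -> dict:
    """Combine JSON-LD fragments into one document with a single @graph."""
    nodes = {}       # "@id" -> merged node, in first-seen order
    anonymous = []   # fragments without an "@id" cannot be merged, only appended
    for fragment in fragments:
        node_id = fragment.get("@id")
        if node_id is None:
            anonymous.append(dict(fragment))
        elif node_id in nodes:
            nodes[node_id].update(fragment)  # later fragment wins on conflict
        else:
            nodes[node_id] = dict(fragment)
    return {
        "@context": "https://schema.org",
        "@graph": list(nodes.values()) + anonymous,
    }
```

A block rendered in isolation would contribute its own fragment, and the page-level step would call something like `merge_fragments` once before output.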

Also we'd need to ensure that the JSON-LD data that's added during rendering is compatible with render caching - it's got similar problems to drupal_set_*() drupal_add_*() vs. #attached. i.e. if the node is rendered from the render cache, then the JSON-LD should be too - rather than not at all, or from scratch each time.

I think this should be fine. The change frequency of the JSON-LD should be the same as the HTML (or the RDFa) would be as far as I can see.
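
The render caching concern can be sketched as follows, in Python and purely for illustration. The idea is that the cached entry holds the HTML and its attached JSON-LD together, so a cache hit returns both and never one without the other. The `RenderCache` class, the cache key format, and `build_node` are hypothetical stand-ins, not Drupal's render cache API.

```python
from typing import Callable, Dict, Tuple

Rendered = Tuple[str, dict]  # (html, attached JSON-LD)


class RenderCache:
    """Toy cache that stores markup and its attached data as one entry."""

    def __init__(self) -> None:
        self._entries: Dict[str, Rendered] = {}
        self.builds = 0  # counts cache misses, for demonstration

    def render(self, cache_key: str, build: Callable[[], Rendered]) -> Rendered:
        if cache_key not in self._entries:
            self.builds += 1
            self._entries[cache_key] = build()
        return self._entries[cache_key]


def build_node() -> Rendered:
    """Hypothetical builder producing HTML plus the JSON-LD attached to it."""
    html = "<article><h2>Example title</h2></article>"
    jsonld = {"@type": "Article", "headline": "Example title"}
    return html, jsonld
```

Because both values are produced by the same build step and invalidated by the same key, their change frequency is identical by construction.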

webchick’s picture

Assigned: scor » Unassigned

scor chimed in, so unassigning. This should ultimately be assigned to Dries for the final decision, but it feels like we aren't quite done with the discussion yet.

jneubert’s picture

Some additional, not primarily technical thoughts - as I hope, not out of scope for this issue: As far as I can see, there seems to exist no specification for the "Google-flavored JSON-LD", besides the short example in the Gmail Actions description. So all bets are off as to what extent something similar but slightly different will work in another sphere of the Google world, such as search results. Google seems to provide notoriously inaccurate and/or outdated structured data documentation, which may differ from the results of the rich snippets tool. The latter could provide some orientation, but again has explicitly no binding to what Google indeed will do when preparing search results. In each of these three very loosely coupled areas - actual system behavior, testing/validation tool and specification - things may change without further notice. (And of course other search engines using schema.org will interpret markup slightly differently again.)

So a commitment to "Schema.org-focused JSON" sounds to me like a commitment to continuous reverse engineering, with an unknown number of moving parts (because we don't even know if Google will interpret the same JSON structures the same way for different subject areas, e.g., events vs. hotel ratings vs. health information).

A whole lot of SEO experts and companies try to keep up with this. As a developer of a data interface, I wouldn't like to - but that's up to the ones who actually create and maintain the code. However, this also puts a heavy burden on everybody else (outside Google), who just want to use the data which is published through this data interface. Jeni Tennison (W3C TAG) has published a thoughtful article about Schema.org and the Responsibility of Monopoly, where she writes about the lack of "clarity, detail, and conformance criteria within the schema.org vocabulary specification":

"Without that specificity, we get into a world where Bing, Facebook and any other search engines will spend a lot of time and effort trying to reverse engineer Google behaviour to extract the same data as they do. They might even sometimes manage to introduce useful quirks of interpretation of their own, but that’s unlikely given that their constrained engineering effort will naturally be focused on matching Google. This also forms a massive barrier to entry (as if those weren’t already significant) to potential new search engines. Overall, the lack of specificity suppresses innovation in the market.

And of course publishers, writers and tool creators are left struggling to keep up."

To me, this seems just as true with regard to an apparently almost unspecified use of JSON syntax, compliant with none of the existing standards, and tightly coupled with and restricted to the schema.org vocabulary. Smaller institutions or companies which want to consume data published by Drupal and other sources that follow Google's course of action are in the very same situation, but can put only a vanishingly small fraction of the resources of Bing, Facebook or Yandex into dealing with it.

Also for these reasons, I'd plead for carrying on with the commitment to RDFa as a widely accepted standard. All the more so as extensive reverse engineering already seems to show that Google deals with RDFa quite well.

Damien Tournoud’s picture

Totally +1.

Trying to mix HTML and machine-readable metadata in the same pipeline has proven to be nearly impossible for everything except the most trivial use cases.

We tried to add support for RDFa in both Addressfield and the Commerce Price Field (both since Drupal 7), and it didn't go very far.

The RDFa serialization introduces a strong dependency between the HTML theming and the machine-readable metadata, so it reduces the flexibility of both.

Kudos to @linclark for the forward thinking here.

jneubert’s picture

If a common feeling among developers should evolve that RDFa in the cases described above is impossible or just too hard to integrate, and/or that it is not acceptable to start with a solution broken in some known places (non-essential in my eyes, but others can judge this better) - please consider resorting to another standard-based solution (such as JSON-LD, which has reached Proposed Recommendation status in November), which

  1. allows the use of standard-based tools for parsing, validation, and integration into Linked Data workflows,
  2. supports the parallel use of multiple vocabularies, e.g., schema.org + domain vocabularies, and
  3. is supported by a RDF Mapping API and UI in Drupal core.

This would require change down the chain, but would furthermore support the use of Drupal in Linked Data publishing.
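
To illustrate the second point in the list above, here is a small Python sketch of a JSON-LD document whose `@context` declares two vocabularies side by side, plus a toy prefix expansion. It is only a sketch: the example URL, the project values, and both function names are made up, DOAP is used merely as an example of a domain vocabulary, and `expand_term` handles simple `prefix:local` terms only, not the full JSON-LD expansion algorithm.

```python
def project_description() -> dict:
    """A hypothetical project described with schema.org plus a domain vocabulary."""
    return {
        "@context": {
            "schema": "http://schema.org/",
            "doap": "http://usefulinc.com/ns/doap#",
        },
        "@id": "http://example.org/project/1",
        "@type": ["schema:CreativeWork", "doap:Project"],
        "schema:name": "Example project",
        "doap:programming-language": "PHP",
    }


def expand_term(term: str, context: dict) -> str:
    """Expand 'prefix:local' to a full URI if the prefix is declared."""
    prefix, _, local = term.partition(":")
    if prefix in context and local:
        return context[prefix] + local
    return term  # keywords such as "@id" and undeclared prefixes pass through
```

A consumer that only understands schema.org can ignore the `doap:` terms, while a Linked Data tool can resolve both.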

scor’s picture

Quick update: I went through the major issues / blockers and updated them.
- #1778194: RDF module can't handle compound fields and #1778410: Throw exception when RDF namespaces collide can both be closed (see rationales in the respective issues).
- I closed the views issue since the bug was fixed in another issue: #1777688: RDFa output incorrect when not using entity template (Views, Panels, etc) or when render array is altered. Full views support should be left for contrib, core should only support RDFa in the regular entity_view() output.

jessebeach’s picture

I would like to voice a vote for process here. We are 3 months away from an aggressive date to cut a beta.

We are reaching a point in the product lifecycle where launching the whole is more important than perfecting any single sub-system. We should never forget the lessons of Duke Nukem Forever.

It makes me very very nervous to start talking about removing and replacing large sub-systems at this point in the 8.0 release cycle.

I do not want to stifle this conversation. I would like to remove the beta-blocking status and take off the table the proposal to get this very risky change into the initial Drupal 8 release.

webchick’s picture

Issue tags: -beta blocker

Yeah, I agree this is not a beta blocker, at least unless proven otherwise. The most likely scenarios to come out of this issue are either 1) closed (won't fix) because RDF gets enough work done on it, or 2) RDF module gets removed, and json_embed module or whatever happens in contrib, perhaps moved into core in 8.1.x or a later minor release. Either of which could happen anytime between now and RC1, most likely.

xtfer’s picture

Or both (1) and (2), theoretically, however I think this is a step in the right direction.

bkudrle’s picture

Just some more thoughts along the lines of @jneubert...

The incorporation of RDFa and Semantic Web technologies into Drupal 7 was a strong attraction for me. It led to a couple of academic publications that in one sense evangelized the use of Drupal 7 for scientific applications. So to remove it from Drupal 8 core would seem a step backwards IMHO.

Also, in a case with much larger impact, the Structured Dynamics Group has created their Open Semantic Framework with Drupal 7 at its core. I am not familiar enough with the details of this framework, but their website says that Lullabot took over management of one of the sites based on this framework, so that is a good endorsement of the viability of this framework and, by extension, the advantage of Semantic Web technologies in core.

I have not been able to work much on Drupal for a couple of years, but I hope to in the near future with Drupal 8 and just wanted to voice the sentiment that it would be really nice in many ways to keep Drupal's cutting edge concerning the Semantic Web technologies by keeping these technologies in Drupal 8 core.

jhedstrom’s picture

Is this still under consideration for 8.0.x?

pwolanin’s picture

Title: Remove RDF module in favor of Schema.org-focused JSON embedded in the page » Deprecate RDF module in favor of Schema.org-focused JSON embedded in the page
Version: 8.0.x-dev » 8.1.x-dev
Category: Task » Feature request
Status: Active » Postponed

If we are going to do it during the 8.x cycle, we'd need to add another model, while leaving RDF available. This would be a new feature that might be suitable for 8.1.x

Version: 8.1.x-dev » 8.2.x-dev

Drupal 8.1.0-beta1 was released on March 2, 2016, which means new developments and disruptive changes should now be targeted against the 8.2.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

Version: 8.2.x-dev » 8.3.x-dev

Drupal 8.2.0-beta1 was released on August 3, 2016, which means new developments and disruptive changes should now be targeted against the 8.3.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

Version: 8.3.x-dev » 8.4.x-dev

Drupal 8.3.0-alpha1 will be released the week of January 30, 2017, which means new developments and disruptive changes should now be targeted against the 8.4.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

Version: 8.4.x-dev » 8.5.x-dev

Drupal 8.4.0-alpha1 will be released the week of July 31, 2017, which means new developments and disruptive changes should now be targeted against the 8.5.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

Version: 8.5.x-dev » 8.6.x-dev

Drupal 8.5.0-alpha1 will be released the week of January 17, 2018, which means new developments and disruptive changes should now be targeted against the 8.6.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

Version: 8.6.x-dev » 8.7.x-dev

Drupal 8.6.0-alpha1 will be released the week of July 16, 2018, which means new developments and disruptive changes should now be targeted against the 8.7.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

Berdir’s picture

Status: Postponed » Active

Reviving this in light of #3031710: Remove scor from MAINTAINERS.txt.

As far as I see, there has not been a single commit to rdf.module in the last 3 years that was not about coding standards, deprecations or migrations. The reality is that the module is unmaintained, has been for a long time, and IMHO is also not meaningful anymore.

I'm not quite sure how we'd approach this in 8.x, we could add a hook_requirements() error that will tell users that this module will be removed in 9.x and they must either uninstall it or switch to a contrib version. If nobody else steps up, I'm willing to create a contrib project but I have no plans to actually maintain it, but if someone wants to, they can reach out then.

Sites that need the module can then already switch to the contrib version and it will use that instead and the requirements warning will go away.

I don't think that core needs to provide a replacement functionality to be able to replace it, modules like poll were also removed in the past without a 1:1 replacement. There are alternatives in contrib, we recently used https://www.drupal.org/project/schema_metatag successfully.

andypost’s picture

Issue tags: +Drupal 9

Version: 8.7.x-dev » 8.8.x-dev

Drupal 8.7.0-alpha1 will be released the week of March 11, 2019, which means new developments and disruptive changes should now be targeted against the 8.8.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

borisson_’s picture

I'm not quite sure how we'd approach this in 8.x, we could add a hook_requirements() error that will tell users that this module will be removed in 9.x and they must either uninstall it or switch to a contrib version.

I think that a requirements hook in rdf.module seems like the best way to go here. Maybe we should add this to system.module as info as well so that people know that this can happen to their site in 9.x even when they don't have the rdf module installed?

e0ipso’s picture

It seems that there are two topics here. One is whether or not to remove RDF, and the other is coming up with a core process to remove a module in the next major release.

All the data here is compelling. I'll give my +1 to move rdf to contrib without a core replacement.

That, of course, will drop a feature, so we'll need product owners involved. Also, it would potentially impact end users, so we'll need to come up with a process to have them uninstall or switch to the contrib solution. I'm foggy on this point, but maybe @alexpott has thoughts.

naveenvalecha’s picture

A few questions: I don't see this in the deprecation policy. How do we deprecate a module in a minor release cycle? It looks like this is a topic for discussion: how to deprecate a module?
+1 to deprecating it in the 8.x cycle, as there is no active maintainer of the module.

We can also get the stats of the rdf module from the drupal infra team and add it to the issue. That would also be compelling data to take the decision.

catch’s picture

We already have deprecated modules in core, but we don't have a policy for actually deprecating stable modules as such. See #3013276: [META] Remove deprecated modules on the Drupal 9 branch.

xjm’s picture

So we should talk about whether we want to deprecate the module before 8.8.0-beta1, and if so, if we're comfortable doing it without adding a different feature to core as in the title (because that's unlikely to happen in the next month).

larowlan’s picture

My 2c here

- we actively use the RDF module on production sites to output microdata
- clients still actively ask for microdata

What other options are there for microdata on Drupal 8 at present?

If we remove this from core, is there a {thing} we can point people towards as the 'current best approach'?

Berdir’s picture

We used https://www.drupal.org/project/schema_metatag successfully in a project. https://www.drupal.org/project/json_ld_schema was also mentioned somewhere but I have no personal experience with it.

colan’s picture

Rather than constantly throwing things out of core, and then bringing other things back in, can we perhaps develop some sort of framework for one or more plug-in managers, which sit in core, for these types of things? Plug-ins can then sit in contrib, generally. If this is possible, wouldn't it be much more sustainable than the never-ending story of "What's the best machine-readable thingee to use now?"

Schema.org Metatag looks good, yes, but it assumes schema.org (obviously), which clearly isn't always what folks want.

Besides JSON LD Schema API, there's also JSON-LD REST Services, which does RDF.

Ultimately, it would be nice if I could go to some core admin form, and:

  • Choose one or more data formats, and
  • Choose one or more namespace / vocabulary standards

...depending on which plug-ins are installed.

As this is a bigger architectural issue, it might be better to resolve it first, either by postponing this issue on a new one, or repurposing it.

For what it's worth, I came across this issue while doing research on Exposing Drupal's Taxonomy Data on the Semantic Web. As I'm not the only one to run into these types of issues, it would be fantastic if we could agree on an architecture that helps everyone.

webchick’s picture

That indeed seems like a good idea, if we can figure out a way to make it performant.

Wearing my "product manager" hat, the #1 thing we need to accomplish over the next few months is make the upgrade from Drupal 8 to 9 easy. https://dri.es/making-drupal-upgrades-easy-forever

Given that, I'm not super keen on deprecating/removing any more stuff in D8 unless there's a very good reason (e.g. security). Every single one of these changes adds to a growing list of things our end users need to tweak/fiddle with between major versions, and it becomes "death by a thousand cuts" to the point that people say "eff it" and choose to replatform onto something else entirely that requires less tweaking, and less fiddling.

The main reasons given here seem to be that no one's maintaining it (OTOH, it's also not a major source of bugs), and that the recommended standards have shifted (although, at least according to one source of data I was able to find, http://webdatacommons.org/structureddata/#results-2018-1, RDFa is definitely not the most prevalent but is still in fairly widespread use). It also seems to be part of the HTML5 spec: https://www.w3.org/TR/rdfa-in-html/

I dunno, this isn't really my area; it's much more a "frameworky" feature. Just a general plea from the product managers to stay laser focused on making sure D9 is as smooth a trip as possible. 🙏

catch’s picture

@colan #51, that seems tricky, because RDF is rendered directly with different components (partly why it was so difficult to introduce and why supporting it for different new elements in contrib is hard), whereas JSON LD is a blob of data on the page somewhere. i.e. it's not just a different format but a different delivery mechanism too. This might work for some options, but not at all for what we currently have in core.

@webchick #52. While I agree 8.x-9.x smoothness should be the priority, this issue (or a previous one) drifted during the 8.0.x alpha phase because we were trying to get 8.0.x released. So if we don't deprecate in 8.8.x, I think we should try to deprecate quite soon after 9.x is opened, for 10.0.x. That way we won't be trying to discuss this two weeks before 11.0.x is opened.

For me personally, while RDF is unmaintained, it is also, as webchick points out, pretty stable/harmless, so I think deprecating for 10.0.x is a reasonable option here. I do think, given there are equal or better options in contrib, that just allowing contrib to provide this is fine. If we do try to do that, the change record should summarise the pros and cons of the different contrib modules to help people choose.

webchick’s picture

If we're discussing deprecating it in Drupal 9 for removal in Drupal 10, that definitely could be on the table.

Then I would personally prefer to see it replaced with something else, versus just removed and booted to contrib. Drupal being able to generate some form of semantic output by default seems desirable as a core feature. This is important for SEO, etc. (If this is not possible, so be it, but that would be my preference.)

I have no idea if JSON is the preferable/modern way of doing this, so would defer to others on that. "Microdata" is the #1 thing on that set of bar graph charts, so if those are the same thing, even better!

Version: 8.8.x-dev » 8.9.x-dev

Drupal 8.8.0-alpha1 will be released the week of October 14th, 2019, which means new developments and disruptive changes should now be targeted against the 8.9.x-dev branch. (Any changes to 8.9.x will also be committed to 9.0.x in preparation for Drupal 9’s release, but some changes like significant feature additions will be deferred to 9.1.x.) For more information see the Drupal 8 and 9 minor version schedule and the Allowed changes during the Drupal 8 and 9 release cycles.

catch’s picture

A new issue that has come up is that easyrdf (which we don't use at runtime, but which is a big dependency for RDF module's test coverage) is not PHP 7.4 compatible, and the project looks more or less unmaintained. This means we'll either need to fork it or otherwise refactor RDF module's test coverage in order for it to pass on PHP 7.4.

This is the beginnings of a maintenance burden for RDF which we've not really run into until the past couple of weeks, but should be considered here I think.

larowlan’s picture

Version: 8.9.x-dev » 9.1.x-dev

Drupal 8.9.0-beta1 was released on March 20, 2020. 8.9.x is the final, long-term support (LTS) minor release of Drupal 8, which means new developments and disruptive changes should now be targeted against the 9.1.x-dev branch. For more information see the Drupal 8 and 9 minor version schedule and the Allowed changes during the Drupal 8 and 9 release cycles.

catch’s picture

Issue summary: View changes

Updated the issue summary a bit, could use more work.

ressa’s picture

I recently added structured metadata to a web site, and after a bit of research found that JSON LD is currently one of the more popular formats, and will most likely become the dominant one. I implemented it with the https://www.drupal.org/project/metatag and https://www.drupal.org/project/schema_metatag modules.

Between 2018 and 2019 (http://webdatacommons.org/structureddata/2019-12/stats/stats.html) JSON LD usage by domains increased by ~1,265,000 whereas RDFa usage decreased by ~343,000:

Usage by domains 2018 to 2019

html-embedded-jsonld  3,835,046 -> 5,100,519
html-rdfa             1,382,497 -> 1,039,623
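
As a quick check of the rounded figures, the differences can be recomputed from the domain counts quoted above. This is only a minimal sketch; the dictionary layout and the helper name `change` are illustrative, and the counts are copied from this comment rather than fetched from Web Data Commons.

```python
# Domain counts as quoted in the comment above (2018 -> 2019).
usage = {
    "html-embedded-jsonld": (3_835_046, 5_100_519),
    "html-rdfa": (1_382_497, 1_039_623),
}


def change(before, after):
    """Return the absolute and the percentage change between two counts."""
    delta = after - before
    return delta, 100 * delta / before


changes = {fmt: change(*counts) for fmt, counts in usage.items()}

for fmt, (delta, pct) in changes.items():
    # Signed output makes growth vs decline obvious at a glance.
    print(f"{fmt}: {delta:+,} domains ({pct:+.1f}%)")
```

The absolute differences come out at 1,265,473 and -342,874, which match the rounded ~1,265,000 and ~343,000 above.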

html-microdata is still number 1, but like RDFa it is embedded within the HTML of the website, which could complicate the addition and removal of structured data. JSON LD on the other hand is a chunk of separate data.

See also Schema.org And Metadata in Drupal.

catch’s picture

According to the stats on #3158669: By default deprecate non-experimental modules that are used by less 5% of sites before the next major version, around a quarter of remaining RDF usages are from Drupal sites, probably because RDF module is enabled by default in the standard profile (i.e. many sites won't have made a conscious decision to use it).

ressa’s picture

That's interesting @catch, but isn't that only the Drupal 8 stats? Looking at #2867597: Top Drupal 7 and Drupal 8 core sub-modules, it seems like RDF is also enabled by default in Drupal 7, so could we add another ~538,000 (80% of 672,250 sites using D7) if we include Drupal 7?

It looks like Drupal 9 also has RDF enabled by default, so a rough estimate could be 80% of the total number of installs, excluding Drupal 5 and 6, which is 1,075,115 - 32,518 = 1,042,597. 80% of that result is ~834,000, which means that Drupal 7, 8 and 9 could account for as many as ~834,000 out of the 1,039,623 domains currently using RDF, which is more than 80%.
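
The estimate above can be reproduced in a few lines. This is a rough sketch only: the 80% share is the assumption made in this comment, the variable names are illustrative, and install counts and crawled domains are different measures, so the resulting fraction is indicative at best.

```python
# Figures as quoted in the comment above.
total_installs = 1_075_115   # all reported Drupal installs
d5_d6_installs = 32_518      # Drupal 5 and 6 installs, excluded
rdf_share = 0.80             # assumed share of sites with RDF enabled
rdfa_domains = 1_039_623     # domains using RDFa in the 2019 stats

# Installs on Drupal 7, 8 and 9.
eligible = total_installs - d5_d6_installs

# Estimated number of those installs with RDF enabled.
estimated = round(eligible * rdf_share)

# Share of all RDFa domains that Drupal could account for.
fraction = estimated / rdfa_domains

print(f"eligible installs: {eligible:,}")
print(f"estimated RDF-enabled installs: {estimated:,}")
print(f"share of RDFa domains: {fraction:.1%}")
```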

catch’s picture

@ressa that sounds right!

Version: 9.1.x-dev » 9.2.x-dev

Drupal 9.1.0-alpha1 will be released the week of October 19, 2020, which means new developments and disruptive changes should now be targeted for the 9.2.x-dev branch. For more information see the Drupal 9 minor version schedule and the Allowed changes during the Drupal 9 release cycle.

xjm’s picture

Project: Drupal core » Drupal core ideas
Version: 9.2.x-dev »
Component: rdf.module » Idea

Moving this to the Ideas queue for discussion there. (We separate the policy discussion from implementation for these since there are different needs.)

xjm’s picture

Title: Deprecate RDF module in favor of Schema.org-focused JSON embedded in the page » [Policy] Deprecate RDF module in favor of Schema.org-focused JSON embedded in the page
Gábor Hojtsy’s picture

catch’s picture

I asked @webchick if she had more thoughts since #59, and she said that she still has the same opinion (i.e. would prefer replacing rather than removing with no core alternative), but wouldn't block signing off on removal either.

For me personally, our last chance to deprecate RDF for Drupal 10 is in the next few months, whereas we could introduce a new schema.org experimental module at any time from now through the Drupal 10 cycle. So it would be good to deprecate and point people to contrib alternatives for now. Then, if there's a good core candidate (either from contrib or from scratch), I do think it's something useful to have in core, since pretty much any public-facing site benefits from it.

Gábor Hojtsy’s picture

Reading most of the recent comments, I am not sure how the contrib options are equal or better as per @catch from #53. For example, jsonld says it depends on the core RDF mappings in the first place. So it's not so much a replacement as a different output format (on a different endpoint?). How would wrapping the data in a different format solve the problems with compound fields and others?

catch’s picture

How would wrapping the data in a different format solve the problems with compound fields and others?

I'm not the best person to answer this, but as I understand it the main difference is this:

RDF: Requires markup inline next to the things being represented, so for example an author name might have some RDF markup right next to it on the page, telling the machine what it is. This was technically very challenging when developing RDF; we had to make changes to the render system, and every element needs to be compatible.

JSON/everything else: uses a 'blob of JSON' in the header somewhere, which holds the metadata about the page author, along with metadata for other page elements.
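
To make the difference concrete, here is a minimal sketch that renders the same metadata both ways. It is not how the RDF module or any contrib module works internally; the `page` dictionary, its field names and both helper functions are hypothetical, chosen only to contrast annotations woven into the markup with a single separate blob.

```python
import json
from html import escape

# Hypothetical page metadata; the field names are illustrative only.
page = {"title": "Cases", "author": "SWIS", "author_url": "https://swis.nl"}


def render_rdfa(meta):
    """Inline approach: every visible element carries its own annotation."""
    return (
        '<article vocab="https://schema.org/" typeof="WebPage">'
        f'<h1 property="name">{escape(meta["title"])}</h1>'
        '<div property="author" typeof="Organization">'
        f'<span property="name">{escape(meta["author"])}</span> '
        f'<a property="url" href="{escape(meta["author_url"], quote=True)}">website</a>'
        '</div></article>'
    )


def render_jsonld(meta):
    """Blob approach: one script element, independent of the visible markup."""
    data = {
        "@context": "https://schema.org",
        "@type": "WebPage",
        "name": meta["title"],
        "author": {
            "@type": "Organization",
            "name": meta["author"],
            "url": meta["author_url"],
        },
    }
    # Escape "<" so a value can never close the script element early.
    payload = json.dumps(data).replace("<", "\\u003c")
    return f'<script type="application/ld+json">{payload}</script>'


print(render_rdfa(page))
print(render_jsonld(page))
```

In the first function the metadata cannot be produced without touching the markup of each element, which is the coupling described above. In the second, the visible templates never need to know the metadata exists.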

bbrala’s picture

To add to that, a simple example of JSON-LD:

<script type="application/ld+json">{
    "@context": "https://schema.org",
    "@graph": [
        {
            "@type": "WebPage",
            "@id": "Cases",
            "author": {
                "@type": "Organization",
                "@id": "swis",
                "name": "SWIS",
                "url": "https://swis.nl",
                "logo": {
                    "@type": "ImageObject",
                    "url": "https://swis.nl/logo.png"
                }
            },
            "publisher": {
                "@type": "Organization",
                "@id": "swis",
                "name": "SWIS",
                "url": "https://swis.nl",
                "logo": {
                    "@type": "ImageObject",
                    "url": "https://swis.nl/logo.png"
                }
            },
            "timestamp": {
                "@type": "BlockchainTransaction",
                "identifier": "376a816ed7cc9f7824b5b37589cac0c757c0a9f36f86ab956aab74c30282cf22",
                "hash": "9b77669513c68b5686a71b2f4a69a965e91e52ab140ec9b8c20111d28ab63ba2",
                "hashLink": "https://www.swis.nl/wordproof/hashinput/90",
                "recordedIn": {
                    "@type": "Blockchain",
                    "name": "eosio_main"
                }
            }
        }
    ]
}</script>
Niklan’s picture

Issue summary: View changes

Added another contrib module for that.

catch’s picture

How would wrapping the data in a different format solve the problems with compound fields and others?

To expand on #68, here's an example of the extremely close coupling of RDF with the theme layer:

https://api.drupal.org/api/drupal/core%21modules%21rdf%21rdf.module/func...

effulgentsia’s picture

From a framework management perspective, I agree that embedded JSON-LD would be a better implementation for Drupal than RDFa. I don't think JSON-LD was far enough along when we first put RDFa into Drupal core, but now it is.

I share @webchick's concern in #68 about putting core, and therefore the Standard profile, into a state where no schema.org data of any kind is output. That does feel like a regression to me as well. However, core doesn't include any SEO modules to begin with, so people who want SEO-friendly sites already have to get https://www.drupal.org/project/metatag and others from contrib. Given that, having to also get https://www.drupal.org/project/schema_metatag from contrib doesn't seem like a big extra leap.

Originally when we put RDF into core, the thinking was to make the internet better overall by having Drupal sites include machine-readable structured data by default, even for sites that don't proactively want to increase their search engine friendliness. We'd be losing that by punting to contrib, because then only sites motivated by SEO reasons will end up taking the extra step of installing the contrib module.

I realize that Drupal core's RDF module is currently unmaintained, but what's the current state of the problems/risks associated with that? Is it just the dev dependency on #56 that's the biggest risk, or have other significant issues surfaced since then?

I guess my overall feeling is that in terms of framework management, +1 on removing an unmaintained core module that has a solution in contrib that both implements a better technical decision (for 2021, even though that wasn't the better technical decision in 2010) and is better maintained. However, I also don't think those reasons alone are enough to outweigh a desire from product managers to have some schema.org output within Drupal generated pages as a product feature of the Standard profile, unless we have more solid arguments for how the RDF module is creating problems for us.

DamienMcKenna’s picture

+1. This is past due.

I would suggest removing RDF and leaving the space to contrib modules, or at the very least separating the two goals (1. removing RDF. 2. adding modern microdata formatting to core).

Disclaimer: I'm somewhat biased as lead maintainer of Metatag.

catch’s picture

@effulgentsia

unless we have more solid arguments for how the RDF module is creating problems for us.

The main issue from my perspective is the close coupling with the theme layer and the reliance on preprocess, for example https://api.drupal.org/api/drupal/core%21modules%21rdf%21rdf.module/func...

We have longstanding (although not very active) issues to massively reduce our reliance on preprocess, for example #2702061: Unify & simplify render & theme system: component-based rendering (enables pattern library, style guides, interface previews, client-side re-rendering).

Switching to a format which is 'single blob of data somewhere on the page' as opposed to 'lots of little bits of data intertwined with HTML' immediately removes all of that need for preprocess.

effulgentsia’s picture

Thanks! Yeah, I can see how #76 is compelling.

The biggest barrier that I see for existing RDF module users switching to https://www.drupal.org/project/schema_metatag, other than having to find it and composer require it, is that that module doesn't come with default config (and arguably shouldn't), which means when you first enable it, you don't get any JSON-LD output at all, and have to explicitly add the metatags that you want and populate them with the correct tokens for where that data is in the site. Core's rdf module doesn't include default config either, but the Standard profile includes default rdf mappings, so the user doesn't need to do any work to have their site outputting structured data.

I wonder if a prerequisite for removing RDF module from core should be to create a contrib module that provides default config for schema_metatag that outputs (approximately) equivalent data that you currently get with Standard profile's RDF config. That way, the migration instruction for existing D9 sites (that didn't customize their RDF mappings, which is probably the vast majority of them) could just be to install that module.

I don't know enough about Metatag module to know how easy or hard this would be. For example, the metatag.metatag_defaults.node config object might already exist on the site (for other, not Schema.org, metatags), and I don't know if schema_metatag's metatags would need to be merged into that config object or if such a module could provide its own config objects that don't conflict with existing metatag_defaults ones.

DamienMcKenna’s picture

There's an architectural gap in Metatag around assigning new default values when a submodule is enabled with new meta tags. Right now it isn't technically (easily?) possible, but maybe it's something to look at in #2826669.

webchick’s picture

Apparently one of the reasons this isn't moving forward is that I voiced strong opinions about this back in the day. I no longer have strong opinions about this (or indeed, most Drupal things). :)

Less flippantly, unlike with something like Multisite, where there is always end-user pushback whenever we propose removing it, this issue has been around for almost a decade, and has seen no similar pushback. And it's a fair point that anyone who cares about SEO has to go and download several contrib modules today anyway (sad panda).

So if the rest of the reasons to do this are valid, no need to hold it up on my account.

effulgentsia’s picture

Taking one tag off per #79.

For framework manager review, I'm +1 for moving the RDF module as-is from core to contrib, so if that's what we want to make this issue's scope, I'd remove that tag. For the current issue title, I'm not confident in that due to #77. However, I would not be opposed to another framework manager removing that tag if they don't think #77 needs to be a blocker to us recommending schema_metatag as the preferred replacement.

catch’s picture

Title: [Policy] Deprecate RDF module in favor of Schema.org-focused JSON embedded in the page » [Policy] Deprecate RDF module and move it to contrib
Issue tags: -Needs framework manager review

For deprecating from core, providing the same RDF module in contrib should be enough - that allows people to maintain their current site configuration without any changes.

However I think it's also a good idea in the change record to point out schema metatag (and other modules) exist too.

quietone’s picture

Issue summary: View changes
Issue tags: -Needs issue summary update
catch’s picture

I think all the release managers are agreed on moving RDF to contrib, so removing that tag.

We need a core implementation issue(s) next, probably one issue to isolate RDF support to the module, one to deprecate in 9.4.x, and one to remove in 10.0.x

effulgentsia’s picture

Status: Active » Reviewed & tested by the community

I think #83 makes this policy issue RTBC, so doing so for visibility. To mark it fixed, we probably need to open the implementation issue per #83. Do we also want to communicate this decision in some way now, or wait until the implementation issues are done before doing that?

catch’s picture

I think we need someone to volunteer to create the contrib project before we can do the core implementation issues. Communication can probably happen as the implementations are landing or even afterwards.

effulgentsia’s picture

Issue summary: View changes
effulgentsia’s picture

Issue summary: View changes

Updated IS for #85 and other cleanup.

effulgentsia’s picture

Issue summary: View changes
quietone’s picture

Added child issue for tracking the move of RDF from core to contrib, #3267267: [Meta] Tasks to deprecate RDF

quietone’s picture

As for implementation there is also #3273976: [Meta] Tasks to remove RDF from core and move to contrib and the issue to approve RDF maintainers for the contrib version #3304913: Offering to co-maintain RDF.

There is nothing left to do in the remaining issues.

quietone’s picture

Status: Reviewed & tested by the community » Fixed

This work has been completed in core.

Thanks!

Status: Fixed » Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.