Problem/Motivation

One of the main reasons to adopt Media Entity in core was the need to be able to work with remote media (especially videos). We need to provide a base implementation of this in core Media that supports the most popular online video providers, ideally in a very extensible way.

Proposed resolution

Implement support for oEmbed in the Media module, supporting a subset of popular video providers by default (specifically, Vimeo and YouTube). Supporting oEmbed generally will give us, and contrib, the latitude to easily support other remote media assets in the future (e.g., Twitter, Instagram, other video providers).

Security considerations

From https://oembed.com/#section3

When a consumer displays HTML (as with video embeds), there's a vector for XSS attacks from the provider. To avoid this, it is recommended that consumers display the HTML in an iframe, hosted from another domain. This ensures that the HTML cannot access cookies from the consumer domain.

Drupal does not require multiple domains so it is not possible for oEmbed video fields to be secure out-of-the-box. Since #2522002: Do not strip www. from cookie domain by default because that leaks session cookies to subdomains has been committed to Drupal 8 it is possible to use a subdomain, for example: video.example.com, as long as example.com is redirected to www.example.com. We need make it simple to configure a different domain to serve the oEmbed content from and we need to have sufficient security warnings to inform users that their configuration is not secure.

Additionally we need to ensure that content creators can not enter arbitrary URLs. A content creator should only be able to link to sources from a list configured by someone with a trusted permission. The defaults for this configuration need to be sensible and restrictive. Plus the configuration field needs documentation about the security implications of adding sources.

(related comments: #139 #137 #132)

Remaining tasks

  • decide between specific provider or oEmbed based solution
  • implement plugin and extensive tests

CommentFileSizeAuthor
#311 2831944-3-311.patch166.44 KBalexpott
#311 310-311-interdiff.txt12.28 KBalexpott
#310 2831944-3-310.patch166.67 KBalexpott
#310 309-310-interdiff.txt3.21 KBalexpott
#309 2831944-3-309.patch165.59 KBalexpott
#309 306-309-interdiff.txt3.06 KBalexpott
#306 interdiff-2831944-305-306.txt32.21 KBphenaproxima
#306 2831944-306.patch164.51 KBphenaproxima
#305 2831944-3-305.patch160.76 KBalexpott
#305 299-305-interdiff.txt19.22 KBalexpott
#299 2831944-2-299.patch157.4 KBalexpott
#299 294-299-interdiff.txt1016 bytesalexpott
#294 2831944-2-294.patch157.31 KBalexpott
#294 293-294-interdiff.txt11.18 KBalexpott
#293 interdiff-2831944-291-293.txt2.64 KBphenaproxima
#293 2831944-293.patch157.57 KBphenaproxima
#291 interdiff-2831944-289-291.txt6.11 KBphenaproxima
#291 2831944-291.patch157.48 KBphenaproxima
#289 interdiff-2831944-287-289.txt2.49 KBphenaproxima
#289 2831944-289.patch153.94 KBphenaproxima
#287 interdiff-2831944-281-287.txt8.46 KBphenaproxima
#287 2831944-287.patch154.01 KBphenaproxima
#281 interdiff-2831944-278-281.txt2.06 KBphenaproxima
#281 2831944-281.patch153.57 KBphenaproxima
#278 interdiff-2831944-275-278.txt3.55 KBphenaproxima
#278 2831944-278.patch153.57 KBphenaproxima
#275 interdiff-2831944-273-275.txt11.41 KBphenaproxima
#275 2831944-275.patch149.91 KBphenaproxima
#273 interdiff-2831944-273-fix.txt1.01 KBphenaproxima
#273 interdiff-2831944-263-273.txt3.31 KBphenaproxima
#273 2831944-273.patch150.69 KBphenaproxima
#273 2831944-273-FAIL.patch150.66 KBphenaproxima
#263 interdiff-2831944-258-263.txt5.47 KBphenaproxima
#263 2831944-263.patch149.82 KBphenaproxima
#258 interdiff-2831944-257-258.txt1.54 KBphenaproxima
#258 2831944-258.patch149.64 KBphenaproxima
#257 interdiff-2831944-254-257.txt7.78 KBphenaproxima
#257 2831944-257.patch149.31 KBphenaproxima
#254 interdiff-2831944-251-254.txt588 bytesphenaproxima
#254 2831944-254.patch148.16 KBphenaproxima
#251 interdiff-2831944-250-251.txt2.1 KBphenaproxima
#251 2831944-251.patch148.17 KBphenaproxima
#250 interdiff-2831944-244-250.txt25.28 KBphenaproxima
#250 2831944-250.patch148.01 KBphenaproxima
#244 interdiff-2831944-240-244.txt29.24 KBphenaproxima
#244 2831944-244.patch145.14 KBphenaproxima
#240 2831944-240.patch139.24 KBphenaproxima
#239 interdiff-2831944-235-239.txt24.66 KBphenaproxima
#239 2831944-239.patch138.43 KBphenaproxima
#235 interdiff-2831944-234-235.txt1.96 KBphenaproxima
#235 2831944-235.patch137.73 KBphenaproxima
#234 interdiff-2831944-232-234.txt3.97 KBchr.fritsch
#234 2831944-234.patch137.73 KBchr.fritsch
#232 interdiff-2831944-231-232.txt4.71 KBchr.fritsch
#232 2831944-232.patch137.5 KBchr.fritsch
#231 interdiff-2831944-228-231.txt22.97 KBchr.fritsch
#231 2831944-231.patch136.05 KBchr.fritsch
#228 interdiff-2831944-221-228.txt6.22 KBphenaproxima
#228 2831944-228.patch136.79 KBphenaproxima
#221 interdiff-2831944-217-221.txt1.83 KBphenaproxima
#221 2831944-221.patch136.62 KBphenaproxima
#217 interdiff-2831944-216-217.txt3.13 KBphenaproxima
#217 2831944-217.patch136.28 KBphenaproxima
#216 interdiff-2831944-214-216.txt1.65 KBphenaproxima
#216 2831944-216.patch133.67 KBphenaproxima
#214 interdiff-2831944-212-214.txt2.47 KBphenaproxima
#214 2831944-214.patch132.29 KBphenaproxima
#212 interdiff-2831944-211-212.txt9.18 KBphenaproxima
#212 2831944-212.patch130.77 KBphenaproxima
#211 interdiff-2831944-207-210.txt4.37 KBchr.fritsch
#211 2831944-210.patch130.44 KBchr.fritsch
#207 interdiff-2831944-205-207.txt6.85 KBchr.fritsch
#207 2831944-207.patch130.62 KBchr.fritsch
#205 interdiff-2831944-202-204.txt10.88 KBphenaproxima
#205 2831944-204.patch130.62 KBphenaproxima
#202 interdiff-2831944-200-202.txt30.91 KBphenaproxima
#202 2831944-202.patch128.01 KBphenaproxima
#200 interdiff-2831944-198-200.txt1.86 KBchr.fritsch
#200 2831944-200.patch122.32 KBchr.fritsch
#198 2831944-198.patch121.71 KBchr.fritsch
#198 interdiff-2831944-195-198.txt5.12 KBchr.fritsch
#196 Screen Shot 2018-04-23 at 8.44.48 PM.png231.41 KBrobpowell
#196 2831944.mp41.09 MBrobpowell
#195 interdiff-2831944-192-195.txt739 byteschr.fritsch
#195 2831944-195.patch121.52 KBchr.fritsch
#192 interdiff-2831944-190-192.txt12.41 KBchr.fritsch
#192 2831944-192.patch121.48 KBchr.fritsch
#190 interdiff-2831944-185-190.txt2.62 KBphenaproxima
#190 2831944-190.patch121.22 KBphenaproxima
#185 interdiff-2831944-184-185.txt4.78 KBphenaproxima
#185 2831944-185.patch120.71 KBphenaproxima
#184 interdiff-2831944-183-184.txt18.67 KBphenaproxima
#184 2831944-184.patch119.64 KBphenaproxima
#183 interdiff-2831944-179-183.txt1.29 KBchr.fritsch
#183 2831944-183.patch118.08 KBchr.fritsch
#179 interdiff-2831944-177-179.txt4.11 KBchr.fritsch
#179 2831944-179.patch118.03 KBchr.fritsch
#177 interdiff-2831944-172-177.txt25.43 KBchr.fritsch
#177 2831944-177.patch118.01 KBchr.fritsch
#172 interdiff-2831944-169-172.txt6.86 KBchr.fritsch
#172 2831944-172.patch113.66 KBchr.fritsch
#169 interdiff-2831944-167-169.txt8.64 KBchr.fritsch
#169 2831944-169.patch113.62 KBchr.fritsch
#167 interdiff-2831944-163-167.txt5.85 KBchr.fritsch
#167 2831944-167.patch110.76 KBchr.fritsch
#163 interdiff-2831944-162-163.txt14.86 KBphenaproxima
#163 2831944-163.patch110 KBphenaproxima
#162 interdiff-2831944-159-162.txt26.98 KBchr.fritsch
#162 2831944-162.patch109.83 KBchr.fritsch
#159 interdiff-2831944-154-159.txt51.21 KBphenaproxima
#159 2831944-159.patch105.75 KBphenaproxima
#154 interdiff-2831944-153-154.txt5.24 KBphenaproxima
#154 2831944-154.patch86.46 KBphenaproxima
#153 interdiff-2831944-150-153.txt63.71 KBphenaproxima
#153 2831944-153.patch86.27 KBphenaproxima
#150 interdiff-2831944-147-150.txt12.4 KBchr.fritsch
#150 2831944-150.patch86.47 KBchr.fritsch
#148 interdiff-2831944-142-147.txt1.06 KBchr.fritsch
#147 2831944-147.patch81.55 KBchr.fritsch
#147 2831944-147.patch81.55 KBchr.fritsch
#142 interdiff-2831944-136-142.txt6.59 KBchr.fritsch
#142 2831944-142.patch81.07 KBchr.fritsch
#136 interdiff-2831944-134-136.txt9.11 KBchr.fritsch
#136 2831944-136.patch80.36 KBchr.fritsch
#134 interdiff-2831944-128-134.txt13.41 KBchr.fritsch
#134 2831944-134.patch79.04 KBchr.fritsch
#131 iframe-approach.txt9.36 KBchr.fritsch
#123 interdiff-2831944-120-123.txt3.08 KBchr.fritsch
#123 2831944-123.patch74.39 KBchr.fritsch
#121 oEmbed_video.gif1.62 MBchr.fritsch
#120 interdiff-2831944-119-120.txt9.38 KBchr.fritsch
#120 2831944-120.patch74.49 KBchr.fritsch
#119 interdiff-2831944-116-119.txt12.18 KBchr.fritsch
#119 2831944-119.patch73.79 KBchr.fritsch
#116 interdiff-2831944-115-116.txt18.8 KBchr.fritsch
#116 2831944-116.patch73.74 KBchr.fritsch
#115 interdiff-2831944-113-115.txt12.25 KBchr.fritsch
#115 2831944-115.patch78.33 KBchr.fritsch
#113 interdiff-2831944-103-113.txt45.27 KBchr.fritsch
#113 2831944-113.patch76.39 KBchr.fritsch
#103 interdiff-2831944-98-103.txt29.33 KBchr.fritsch
#103 2831944-103.patch73.56 KBchr.fritsch
#98 interdiff-2831944-96-98.txt5.36 KBchr.fritsch
#98 2831944-98.patch71.49 KBchr.fritsch
#96 interdiff-2831944-92-95.txt7.31 KBchr.fritsch
#96 2831944-95.patch71.67 KBchr.fritsch
#92 interdiff-2831944-91-92.txt1.8 KBchr.fritsch
#92 2831944-92.patch72.53 KBchr.fritsch
#91 interdiff-2831944-87-91.txt22.6 KBchr.fritsch
#91 2831944-91.patch72.53 KBchr.fritsch
#87 interdiff-2831944-73-87.txt14.68 KBchr.fritsch
#87 2831944-87.patch73.07 KBchr.fritsch
#73 2831944-73.patch68.41 KBchr.fritsch
#71 2831944-71.patch70.11 KBchr.fritsch
#68 interdiff-2831944-67-68.txt5.07 KBchr.fritsch
#68 2831944-68.patch69.97 KBchr.fritsch
#67 interdiff-2831944-65-67.txt6.25 KBchr.fritsch
#67 2831944-67.patch68.18 KBchr.fritsch
#65 interdiff-2831944-64-65.txt3.13 KBchr.fritsch
#65 2831944-65.patch66.95 KBchr.fritsch
#64 interdiff-2831944-63-64.txt3.27 KBchr.fritsch
#64 2831944-64.patch67.52 KBchr.fritsch
#63 interdiff-60-63.txt13.82 KBseanB
#63 2831944-63.patch67 KBseanB
#60 interdiff-2831944-58-60.txt15.69 KBchr.fritsch
#60 2831944-60.patch67.34 KBchr.fritsch
#58 interdiff-2831944-51-58.txt13.38 KBchr.fritsch
#58 2831944-58.patch64.87 KBchr.fritsch
#52 interdiff-2831944-42-51.txt16.4 KBchr.fritsch
#52 2831944-51.patch64.98 KBchr.fritsch
#42 interdiff-2831944-40-42.txt6.49 KBchr.fritsch
#42 2831944-42.patch61.02 KBchr.fritsch
#40 interdiff-2831944-37-40.txt53.59 KBphenaproxima
#40 2831944-40.patch59.9 KBphenaproxima
#37 interdiff-2831944-36-37.txt11.77 KBchr.fritsch
#37 2831944-37.patch59.69 KBchr.fritsch
#36 interdiff-2831944-34-36.txt20.25 KBchr.fritsch
#36 2831944-36.patch55.71 KBchr.fritsch
#34 interdiff-2831944-32-34.txt2.64 KBchr.fritsch
#34 2831944-34.patch55.66 KBchr.fritsch
#32 interdiff-2831944-31-32.txt11.14 KBchr.fritsch
#32 2831944-32.patch57.83 KBchr.fritsch
#31 interdiff-2831944-30-31.txt12.01 KBchr.fritsch
#31 2831944-31.patch53.33 KBchr.fritsch
#30 interdiff-2831944-26-30.txt38.52 KBchr.fritsch
#30 2831944-30.patch50.04 KBchr.fritsch
#26 interdiff-2831944-10-26.txt6.04 KBchr.fritsch
#26 2831944-26.patch36.04 KBchr.fritsch
#17 youtube_media_type.gif1.59 MBslashrsm
#10 youtube_media_type1.mp42.64 MBmarcoscano
#10 interdiff-4-10.txt40.36 KBmarcoscano
#10 2831944-10.patch35.92 KBmarcoscano
#4 2831944_4.patch30.16 KBslashrsm
Support from Acquia helps fund testing for Drupal Acquia logo

Comments

slashrsm created an issue. See original summary.

seanB’s picture

The oEmbed option seems like the best approach to me, and delivers a whole bunch of video providers. It seems like a nice feature to be able to configure the list of supported providers, so maybe by default allow youtube and vimeo and add a settings page to allow extra providers?

tobiasb’s picture

Hint: Perhaps useful lib https://github.com/essence/essence

slashrsm’s picture

Status: Active » Needs work
FileSize
30.16 KB

WIP proof of concept. Attached patch adds a media handler for oEmbed and some configuration to be easy to test. There is also a formatter that will display the resource using HTML code returned by the oEmbed provider.

We need to decide if we want to implement oEmbed integration by ourselves or use an external library.

I'd also like to add ability to limit to certain providers, but this is a bit tricky since the official database of providers isn't really reliable. We could achieve that by offering site-builders to enter URL wildcards/regular expressions, but that isn't the best UX. Any thoughts about that would be welcome.

Obviously this depends on #2831274: Bring Media entity module to core as Media module.

tedbow’s picture

One thing about just making oEmbed formatters for the existing link field type is that now all existing link fields will have this option. I think this will be very confusing to site builders.

Should we make another field type oEmbed Link that just extends the link field type? Then we could limit the oEmbed link formatters to this oEmbed Link field type and maybe update existing link formatters to work with it too. That would also allow use to remove options that don't make sense. Internal links for example. Would oEmbed be used for internal links. I know it would possible but seems very edge case.

Gábor Hojtsy’s picture

I agree it sounds like a UX improvement if we can remove unnecessary extra options.

slashrsm’s picture

I'd rather use isApplicable() to solve that. Adding a bunch of new field types will also create confusion and UX problems (speaking generally, not just related to this issue).

The reason I didn't do that when initially working on this patch is that I've been trying to get PoC through the door before having to leave the sprint on the last day.

Version: 8.3.x-dev » 8.4.x-dev

Drupal 8.3.0-alpha1 will be released the week of January 30, 2017, which means new developments and disruptive changes should now be targeted against the 8.4.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

Dave Reid’s picture

I don't think there's an actual reason to have a separate Link field type. I think site builders would be confused at that separation, and in my experience are already good at just creating 'LInks' fields. FYI the URL Embed module already provides this formatter for link field types.

marcoscano’s picture

Status: Needs work » Needs review
Issue tags: +Needs tests, +Media Initiative
FileSize
35.92 KB
40.36 KB
2.64 MB

OK So let's see if we can move this forward.

In this patch:
- Re-rolled #4
- Made all modifications necessary to adapt it to the API changes that ended up happening in #2831274: Bring Media entity module to core as Media module
- Used ::isApplicable to only expose the oEmbed formatter to fields from media entities
- Moved default config entities from standard profile to the media module, along with some fine-tuning in the default display settings.
- Finished implementing a first version of the allowed_providers configuration setting
- Fixed some minor bugs, documentation and CS improvements

Remaining tasks:
- Validate the main approach (for example, decide if we want an external library to handle the oEmbed integration, etc)
- Decide what to do with the "Allowed Providers" validation. Due to the inconsistency of the oEmbed database, this validation will probably give false failures quite often. We can try to add workarounds for each specific case (an example is implemented in this patch for the YouTube provider), bit it's ugly and hard to maintain. Should we just skip this functionality entirely and always allow all providers?
- Define a default thumbnail icon (for now I'm using the "no-thumbnail.png" icon)
- Write tests

A quick demo of current 8.4.x HEAD with this patch applied:
youtube_media_type1.mp4

seanB’s picture

Just watched the video, this looks AMAZING! Only wondering if Youtube should be called Video (with only the youtube provider enabled). This way you can easily add other video provider without having to add a new media type for each provider.

Will look at the code later, but marcoscano thanks for picking this up!

marcoscano’s picture

Status: Needs work » Needs review

testbot: I know I know.. let's validate the approach first :)

Version: 8.4.x-dev » 8.5.x-dev

Drupal 8.4.0-alpha1 will be released the week of July 31, 2017, which means new developments and disruptive changes should now be targeted against the 8.5.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

martin107’s picture

I like the idea behind this issue.

everyone_who_has_worked_on_this++

I see from the remaining tasks in #10 ... there are still things to decide

So while I am itching to fix this ... i will just post a code review comment as things might change radically.

This last patch introduces new code with calls to drupal_set_message().... this has become outdated.

New code should use the messenger service see-

#2278383: Create an injectible service for drupal_set_message()

slashrsm’s picture

Issue summary: View changes
FileSize
1.59 MB

Converted screencast into animated gif.

slashrsm’s picture

I started gathering info about 3rd party oEmbed libraries: https://docs.google.com/document/d/111zSRYcnOLz9qlJ1RiP2ZzYlCeMPm1DhieX-...

We can use this info to eventually decide whether to use one of them or to implement our own.

vijaycs85’s picture

Only wondering if Youtube should be called Video (with only the youtube provider enabled). This way you can easily add other video provider without having to add a new media type for each provider.

+1 on that!

chr.fritsch’s picture

Component: file system » media system
phenaproxima’s picture

Issue tags: -D8Media

Re-tagging.

phenaproxima’s picture

Status: Needs review » Needs work

This patch is pretty nice, and a big selling point for Media in core. Great work so far.

I found a relatively minor suite of issues -- nothing major -- but my overarching concern is that I find the OEmbed service, and its interface, kind of confusing and a little haphazard. Granted, I don't know how oEmbed works, and I'm unfamiliar with all the terminology so maybe my feeling is totally invalid. However, I wonder if we can't maybe tighten up that API somewhat and do a better job of sealing the abstraction.

  1. +++ b/core/modules/media/config/install/media.type.youtube.yml
    @@ -0,0 +1,13 @@
    +description: "Use thie media type to store remote content such as youtube videos."
    

    Nit: "youtube" should be "YouTube". Also, language like "...such as YouTube videos" implies that maybe the name and label of this media type should not be YouTube. How about "Remote video"?

  2. +++ b/core/modules/media/src/OEmbed.php
    @@ -0,0 +1,286 @@
    +  /**
    +   * Cache key to be used to store providers info into cache.
    +   *
    +   * @var string
    +   */
    +  protected static $providersCacheKey = 'media:oembed:providers';
    +
    +  /**
    +   * URL of the JSON with providers info.
    +   *
    +   * @var string
    +   */
    +  protected static $providersUrl = 'http://oembed.com/providers.json';
    +
    +  /**
    +   * Cache lifetime for the providers info.
    +   */
    +  protected static $providersCacheLifetime = 60 * 60 * 24 * 7;
    

    These seem like they should be constants, not properties.

  3. +++ b/core/modules/media/src/OEmbed.php
    @@ -0,0 +1,286 @@
    +    if (!empty($this->providers)) {
    

    $this->providers could (theoretically) be an empty array, so this should be if (!isset($this->providers))

  4. +++ b/core/modules/media/src/OEmbed.php
    @@ -0,0 +1,286 @@
    +      $response = $this->clientFactory->fromOptions()->get(static::$providersUrl);
    +      if ($response->getStatusCode() !== 200) {
    +        drupal_set_message($this->t('Could not retrieve the providers list from the remote oEmbed database.'), 'error');
    +        $this->logger->error('Remote oEmbed providers database returned status code @code.', [
    +          '@code' => $response->getStatusCode(),
    +        ]);
    +        return FALSE;
    +      }
    

    Won't Guzzle normally throw an exception on a non-200 status code? If so, can we catch that exception and log it instead? It might give more detailed information that would be helpful for troubleshooting.

  5. +++ b/core/modules/media/src/OEmbed.php
    @@ -0,0 +1,286 @@
    +      $providers = \GuzzleHttp\json_decode($response->getBody()->getContents(), TRUE);
    

    Namespaced functions are, if I remember correctly, specific to PHP 5.6, so we would be violating Drupal's minimum PHP version requirement by doing this. Can we just use the Drupal\Component\Serialization\Json class to decode the response?

  6. +++ b/core/modules/media/src/OEmbed.php
    @@ -0,0 +1,286 @@
    +        $keyed_providers[bin2hex($provider['provider_name'])] = $provider;
    

    Whoa.

  7. +++ b/core/modules/media/src/OEmbed.php
    @@ -0,0 +1,286 @@
    +      // @TODO Figure out a way of making this validation more robust. Youtube,
    +      // for instance, does not come with schemes...
    +      if ($provider_info['provider_name'] === 'YouTube') {
    +        $provider_info['endpoints'][0]['schemes'] = [
    +          'http*://*youtube.com/*',
    +          'http*://*youtu.be/*',
    +        ];
    +      }
    

    Can we possibly store this kind of info in config, say, and make use of that? Hard-coding it is icky-sauce.

  8. +++ b/core/modules/media/src/OEmbed.php
    @@ -0,0 +1,286 @@
    +    $response = $this->clientFactory->fromOptions()->get($url);
    +    if ($response->getStatusCode() !== 200) {
    +      drupal_set_message($this->t('Could not retrieve the remote URL correctly.'), 'error');
    +      $this->logger->error('Resource for URL @url responded with status code @code.', [
    +        '@url' => $url,
    +        '@code' => $response->getStatusCode(),
    +      ]);
    +      return FALSE;
    +    }
    

    Again, would it be preferable here to take advantage of HTTP exceptions thrown by Guzzle?

  9. +++ b/core/modules/media/src/OEmbed.php
    @@ -0,0 +1,286 @@
    +    $dom = new \DOMDocument();
    +
    +    // TODO this causes annoying warning with YT videos. Fix it:
    +    // Warning: DOMDocument::loadHTML(): htmlParseEntityRef: no name in Entity
    +    if (!@$dom->loadHTML($content)) {
    +      drupal_set_message($this->t('The remote URL returned an incorrect document structure.'), 'error');
    +      return FALSE;
    +    }
    +
    +    $xpath = new \DOMXpath($dom);
    +    $result = $xpath->query("//link[@type='application/json+oembed']");
    +    if ($result->length > 0) {
    +      $this->discovered[$url] = $result->item(0)->getAttribute('href');
    +      return $this->discovered[$url];
    +    }
    +
    +    $result = $xpath->query("//link[@type='text/xml+oembed']");
    +    if ($result->length > 0) {
    +      $this->discovered[$url] = $result->item(0)->getAttribute('href');
    +      return $this->discovered[$url];
    +    }
    

    This seems potentially quite expensive. Can we cache it?

  10. +++ b/core/modules/media/src/OEmbed.php
    @@ -0,0 +1,286 @@
    +    // TODO handle providers that do not support discovery.
    

    Should we open a follow-up?

  11. +++ b/core/modules/media/src/OEmbed.php
    @@ -0,0 +1,286 @@
    +    $response = $this->clientFactory->fromOptions()->get($endpoint_url);
    +    if ($response->getStatusCode() !== 200) {
    +      drupal_set_message($this->t('Could not retrieve the remote URL.'), 'error');
    +      $this->logger->error('Remote resource returned status code @code.', [
    +        '@code' => $response->getStatusCode(),
    +      ]);
    +      return FALSE;
    +    }
    

    Should we leverage Guzzle HTTP exceptions here?

  12. +++ b/core/modules/media/src/OEmbed.php
    @@ -0,0 +1,286 @@
    +      $data = \GuzzleHttp\json_decode($content, TRUE);
    

    Once again, not sure we should be using namespaced functions instead of \Drupal\Component\Serialization\Json.

  13. +++ b/core/modules/media/src/OEmbedInterface.php
    @@ -0,0 +1,85 @@
    +   * Returns a multi-dimensional array where each provider is represented as a
    +   * single top-level array element. Each individual element is keyed by the
    +   * provider's name codified with bin2hex(), and has the following values:
    +   * - provider_name: Human readable name of the provider.
    +   * - provider_url: Main URL of the provider.
    +   * - endpoints: List of endpoints this provider exposes. Each endpoint is an
    +   *   array with the following values:
    +   *     - url: The endpoint's URL.
    +   *     - schemes: List of URL schemes supported by the provider.
    +   *     - formats: List of supported formats. Can be "json", "xml" or both.
    +   *     - discovery: Whether the provider supports oEmbed discovery.
    

    This smells like an ArrayPI. I kinda wish we just had a domain object to encapsulate all of this stuff. (One possible advantage of which is that such an object could implement CacheableDependencyInterface, for advanced caching possibilities...)

  14. +++ b/core/modules/media/src/OEmbedInterface.php
    @@ -0,0 +1,85 @@
    +   *   (optional) A list of providers to restrict to. If present, this needs to
    +   *   be an array where keys are the provider names (codified with bin2hex()),
    +   *   and the values are irrelevant. If ommitted, all providers will be
    

    Wait -- the values are irrelevant? Why not simply pass a simple array of provider names, then?

  15. +++ b/core/modules/media/src/Plugin/Field/FieldFormatter/OEmbedFormatter.php
    @@ -0,0 +1,103 @@
    +    // This formatter is only available for fields attached to media items.
    

    Um...why?

  16. +++ b/core/modules/media/src/Plugin/media/Source/OEmbed.php
    @@ -0,0 +1,272 @@
    +    if ($local_image = $this->getMetadata($media, 'thumbnail_local')) {
    +      return $local_image;
    +    }
    +    return $this->getMetadata($media, 'thumbnail_uri');
    

    Nit: This could be condensed to:

    return $this->getMetadata($media, 'thumbnail_local') ?: $this->getMetadata($media, 'thumbnail_uri')

phenaproxima’s picture

I'm picking this issue back up. I'd like to land it in 8.5.0; it'll be a highlight of the Media Initiative's progress.

What I think we are blocked on, at the moment, is two things:

  1. Do we want to use a third-party library to interact with oEmbed providers, or roll our own code?
  2. If we want a third-party library, we need sign-off from Dries.

So let's open the bike shed and hear from media folk. Would you prefer a third-party library, or rolling our own?

My preference is to use a third-party library. PIE FTW. Lower maintenance burden for us. The library (whichever one we choose) will be doing one thing, and doing it well. That is as it should be.

Let's hear from others.

chr.fritsch’s picture

I also prefer a third-party lib. For the same reason like @phenaproxima. Less code to write and maintain for us.

I think embed is most widely used and best maintained.

seanB’s picture

The libraries posted in the Google doc in #18 seem to be well supported and I don't think it would be a good idea to implement and support our own implementation if we don't have to. So big +1 to using a existing library.

We could debate on which one to choose, I don't have a strong preference at the moment, but I lean towards https://github.com/oscarotero/Embed since that seems to have the biggest community support (more contributors, installs, commits etc).

chr.fritsch’s picture

Status: Needs work » Needs review
FileSize
36.04 KB
6.04 KB

I moved the config files into the standard profile and renamed the youtube type into video.

chr.fritsch’s picture

I looked a bit deeper into the proposed libraries and now i am not 100% convinced anymore that it's a good idea to use one of them, because they are not fitting well into our needs.
Some of them are just using static functions and all of them are not providing a method to get the available providers.

Beside that, the stuff @slashrsm already implemented is super cool and works quite good.

phenaproxima’s picture

Status: Needs review » Needs work

I have some architectural concerns about this patch. I also find the documentation sketchy; there's no sense of what a "resource" or "provider" is, so I think that will need to be improved. Also, tests!

  1. +++ b/core/modules/media/src/OEmbed.php
    @@ -0,0 +1,286 @@
    +  protected static $providersUrl = 'http://oembed.com/providers.json';
    

    Should this be https? Also, should we maybe make this a constructor parameter so it can be overridden by a service provider (and be more amenable to unit testing)?

  2. +++ b/core/modules/media/src/OEmbed.php
    @@ -0,0 +1,286 @@
    +  /**
    +   * Static cache of fetched oEmbed resources.
    +   *
    +   * @var array
    +   */
    +  protected $resources;
    +
    +  /**
    +   * Static cache of discovered oEmbed resources.
    +   *
    +   * @var array
    +   */
    +  protected $discovered;
    

    What's the difference between a "fetched" and "discovered" oEmbed resource? Can the doc comments here explain that distinction?

  3. +++ b/core/modules/media/src/OEmbed.php
    @@ -0,0 +1,286 @@
    +  public function __construct(ClientFactory $client_factory, CacheBackendInterface $cache, LoggerChannelFactoryInterface $logger_factory) {
    

    This service should use UseCacheBackendTrait, which means the cache backend can be optional.

  4. +++ b/core/modules/media/src/OEmbed.php
    @@ -0,0 +1,286 @@
    +    if (!empty($this->providers)) {
    +      return $this->providers;
    +    }
    

    $this->provider could theoretically be an empty array, so this should be changed to a if (isset($this->providers)) check instead.

  5. +++ b/core/modules/media/src/OEmbed.php
    @@ -0,0 +1,286 @@
    +    elseif ($data = $this->cache->get(static::$providersCacheKey)) {
    

    Why does the cache key need to be a static property?

  6. +++ b/core/modules/media/src/OEmbed.php
    @@ -0,0 +1,286 @@
    +      $response = $this->clientFactory->fromOptions()->get(static::$providersUrl);
    

    Won't get() throw an exception if the response code is not 2xx? If so, we should catch that instead so we can potentially log more detailed error information.

  7. +++ b/core/modules/media/src/OEmbed.php
    @@ -0,0 +1,286 @@
    +      $providers = \GuzzleHttp\json_decode($response->getBody()->getContents(), TRUE);
    

    Function namespacing is a PHP 5.6+ thing. We should use \Drupal\Component\Serialization\Json::decode() here instead.

  8. +++ b/core/modules/media/src/OEmbed.php
    @@ -0,0 +1,286 @@
    +      // Some provider names may contain dot chars ("."), which are not allowed
    +      // in config keys. We store them in an array where keys are hex-converted
    +      // names, in order to allow an easy conversion back to their original
    +      // names when necessary.
    

    Can there be an example here, for clarity?

  9. +++ b/core/modules/media/src/OEmbed.php
    @@ -0,0 +1,286 @@
    +      $this->cache->set(self::$providersCacheKey, $keyed_providers, static::$providersCacheLifetime);
    

    The cache key and cache lifetime probably do not need to be static properties.

  10. +++ b/core/modules/media/src/OEmbed.php
    @@ -0,0 +1,286 @@
    +      if (!empty($provider_info['endpoints'])) {
    +        foreach ($provider_info['endpoints'] as $endpoint) {
    +          if (!empty($endpoint['schemes'])) {
    +            foreach ($endpoint['schemes'] as $scheme) {
    +              $regexp = str_replace(['.', '*'], ['\.', '.*'], $scheme);
    +              if (preg_match("|$regexp|", $url)) {
    +                return [
    +                  'provider' => $provider_name,
    +                  'endpoint' => $endpoint['url'],
    +                ];
    +              }
    +            }
    +          }
    +        }
    +      }
    

    This definitely needs a comment to explain what's going on here.

  11. +++ b/core/modules/media/src/OEmbed.php
    @@ -0,0 +1,286 @@
    +    $dom = new \DOMDocument();
    +
    +    // TODO this causes annoying warning with YT videos. Fix it:
    +    // Warning: DOMDocument::loadHTML(): htmlParseEntityRef: no name in Entity
    +    if (!@$dom->loadHTML($content)) {
    

    I believe \Drupal\Component\Utility\Html has some static methods that may prove helpful here.

  12. +++ b/core/modules/media/src/OEmbed.php
    @@ -0,0 +1,286 @@
    +    // TODO handle providers that do not support discovery.
    +
    +    return FALSE;
    

    Needs a follow-up issue or something.

  13. +++ b/core/modules/media/src/OEmbed.php
    @@ -0,0 +1,286 @@
    +      $data = \GuzzleHttp\json_decode($content, TRUE);
    

    Let's use \Drupal\Component\Serialization\Json::decode() instead.

  14. +++ b/core/modules/media/src/OEmbed.php
    @@ -0,0 +1,286 @@
    +    elseif ($format[0] == 'text/xml') {
    +      $data = \GuzzleHttp\json_decode(\GuzzleHttp\json_encode($content), TRUE);
    +
    +      if (!is_array($data)) {
    +        drupal_set_message($this->t('An error occurred while trying to retrieve the remote URL content.'), 'error');
    +        $this->logger->error('Remote resource returned incorrect response content. Data: @response', [
    +          '@response' => json_encode($data),
    +        ]);
    +        return FALSE;
    +      }
    +
    +      $this->resources[$endpoint_url] = $data;
    +      return $data;
    +    }
    

    Er, wat? If the format is XML, we encode it as JSON, then decode it, and expect an array? I don't get it. Definitely needs a comment.

  15. +++ b/core/modules/media/src/OEmbedInterface.php
    @@ -0,0 +1,85 @@
    +   * Returns a multi-dimensional array where each provider is represented as a
    +   * single top-level array element. Each individual element is keyed by the
    +   * provider's name codified with bin2hex(), and has the following values:
    +   * - provider_name: Human readable name of the provider.
    +   * - provider_url: Main URL of the provider.
    +   * - endpoints: List of endpoints this provider exposes. Each endpoint is an
    +   *   array with the following values:
    +   *     - url: The endpoint's URL.
    +   *     - schemes: List of URL schemes supported by the provider.
    +   *     - formats: List of supported formats. Can be "json", "xml" or both.
    +   *     - discovery: Whether the provider supports oEmbed discovery.
    +   *
    +   * @return array[]
    +   *   Information about oEmbed providers.
    +   */
    +  public function getProviders();
    

    This seems like an ArrayPI to me. Maybe we should implement a simple value object to encapsulate the provider info instead.

  16. +++ b/core/modules/media/src/OEmbedInterface.php
    @@ -0,0 +1,85 @@
    +   * @param array $allowed_providers
    +   *   (optional) A list of providers to restrict to. If present, this needs to
    +   *   be an array where keys are the provider names (codified with bin2hex()),
    +   *   and the values are irrelevant. If ommitted, all providers will be
    +   *   considered.
    

    If the values are irrelevant, this should be a simple array of codified provider names, rather than an associative array.

  17. +++ b/core/modules/media/src/OEmbedInterface.php
    @@ -0,0 +1,85 @@
    +  public function matchUrl($url, array $allowed_providers = []);
    

    Considering the doc comment, I find this method name a little odd. How about isProvider($url) instead?

  18. +++ b/core/modules/media/src/OEmbedInterface.php
    @@ -0,0 +1,85 @@
    +  /**
    +   * Runs oEmbed discovery and returns the endpoint URL if successful.
    +   *
    +   * @param string $url
    +   *   The resource's URL.
    +   *
    +   * @return string|bool
    +   *   URL of the oEmbed endpoint, or FALSE if the discovery was not successful.
    +   */
    +  public function doUrlDiscovery($url);
    +
    +  /**
    +   * Determines whether a given URL is an oEmbed resource.
    +   *
    +   * @param string $url
    +   *   URL of the resource.
    +   *
    +   * @return bool
    +   *   TRUE if the URL represents a valid oEmbed resource, or FALSE otherwise.
    +   */
    +  public function isOEmbedResource($url);
    

    I'm not sure, but it seems like these two methods can be combined. Not sure what one does that the other does not.

  19. +++ b/core/modules/media/src/Plugin/Validation/Constraint/OEmbedProviderConstraint.php
    @@ -0,0 +1,25 @@
    +  /**
    +   * The default violation message.
    +   *
    +   * @var string
    +   */
    +  public $message = 'The provider used is not allowed.';
    

    This message could be better. How about something like "The %provider provider is not allowed", with %provider being replaced at validation time with the actual name of the provider?

  20. +++ b/core/modules/media/src/Plugin/Validation/Constraint/OEmbedProviderConstraintValidator.php
    @@ -0,0 +1,64 @@
    +    // TODO check if resource belongs to an allowed provider.
    

    Follow-up?

  21. +++ b/core/modules/media/src/Plugin/Validation/Constraint/OEmbedProviderConstraintValidator.php
    @@ -0,0 +1,64 @@
    +    $foo = 'ar';
    

    Not sure this should be here :)

  22. +++ b/core/modules/media/src/Plugin/Validation/Constraint/OEmbedProviderConstraintValidator.php
    @@ -0,0 +1,64 @@
    +      $allowed_providers = array_map(function ($name) {
    +        return hex2bin($name);
    +      }, $source_config['allowed_providers']);
    

    There's no need for the closure here. Just this will work:

    $allowed_providers = array_map('hex2bin', $source_config['allowed_providers']);

  23. +++ b/core/modules/media/src/Plugin/Validation/Constraint/OEmbedResourceConstraintValidator.php
    @@ -0,0 +1,50 @@
    +   * Constructs a new TweetVisibleConstraintValidator.
    

    Wrong class name.

phenaproxima’s picture

More thoughts that occur to me...

  • I think the oEmbed service has too many responsibilities. We should split it up -- there should be a service to discover the providers, and a separate service to extract oEmbed information from a particular page. Both should consume UseCacheBackendTrait so that they can optionally cache what they find.
  • In general I think there is an over-reliance on boolean return values. I think we should be more aggressive about throwing exceptions. For example, when fetching the providers, it's a full-scale error condition if the fetched providers do not decode to an array. This is not something we need to quietly log -- we should throw an exception in that case.
  • Right now, the oEmbed service includes a specific alteration to make YouTube work properly, since its provider info does not include schemes. We should do this with an alter hook, I think -- just a simple hook that can alter the fetched provider data before it's cached.
chr.fritsch’s picture

Thanks for the extensive reviews @phenaproxima

I tried to implement the most of your recommendations.

I changed so much, that it would be nice to get some feedback if we are on the right track.

Since we are shipping a specific formatter with this patch, i think we should also think about #2865184: Allow MediaSource plugins provide default field form/view display settings a bit more.

chr.fritsch’s picture

Next wave of improvements, now with more tests.

chr.fritsch’s picture

Next wave, minor improvements and more tests

chr.fritsch’s picture

Status: Needs work » Needs review
FileSize
55.66 KB
2.64 KB

Fixed one test, and removed the one that already exists in the Functional space.

mtodor’s picture

Status: Needs review » Needs work

Nice work! I have tested this a bit and reviewed code too.

Here are some things that ware not nice when I tested:

  1. I tried twitch.tv oEmbed because it's listed on oembed.com - and I wasn't able to create media entity. Then I have seen that some providers are not supported. I'm not sure is that big concern, but at least it's relevant information.
  2. I tried to create youtube.com oEmbed video and it didn't work. I got error message: GuzzleHttp\Exception\RequestException: cURL error 60: SSL certificate problem: unable to get local issuer certificate (see http://curl.haxx.se/libcurl/c/libcurl-errors.html) in GuzzleHttp\Handler\CurlFactory::createRejection() (line 187 of .../test-sites/drupal_2831944/vendor/guzzlehttp/guzzle/src/Handler/CurlFactory.php).
  3. I tried to create flickr oembed and when I copy from flickr link to image I get https but in oembed check only http is support, so I can't save it. And when I change to http then it's possible to save it, but nothing is rendered, only title is correct.
  4. I tried to create gliphy and I have similar issue with rendering as for flickr, I can create but gliphy is not rendered.
  5. I had problem with thumbnail generation - for example Vimeo (https://vimeo.com/56980338).
  6. As I see there are lot of code here. Is there reason not to put this in sub-module?

And here are some code related concerns:

  1. +++ b/core/modules/media/media.module
    @@ -59,6 +59,22 @@ function media_entity_access(EntityInterface $entity, $operation, AccountInterfa
    +        'http*://*youtube.com/*',
    

    I'm not sure http* is a good way to check schema, because in this case, also https://www.drupal-youtube.com/watch?v=92a7Hj0ijLs is valid link.

  2. +++ b/core/modules/media/src/Exception/OEmbedProviderException.php
    @@ -0,0 +1,40 @@
    +  protected $providers;
    

    I don't see a reason for $providers in this exception. Since it's never set with some data.

  3. +++ b/core/modules/media/src/OEmbed.php
    @@ -0,0 +1,234 @@
    +        $data = Json::decode(Json::encode($content));
    

    This should be checked, it's kinda not logical.

  4. +++ b/core/modules/media/src/OEmbedProviderDiscovery.php
    @@ -0,0 +1,135 @@
    +      \Drupal::moduleHandler()->alter('oembed_providers', $providers);
    

    Should be injected.

  5. +++ b/core/modules/media/src/OEmbedProviderDiscovery.php
    @@ -0,0 +1,135 @@
    +        $keyed_providers[bin2hex($provider['provider_name'])] = $provider;
    

    I think this one can go inside foreach, instead of $filter and then also break since there is no reason looping when discovery and schema are found.

  6. +++ b/core/modules/media/src/Plugin/media/Source/OEmbed.php
    @@ -0,0 +1,307 @@
    +use GuzzleHttp\Exception\ClientException;
    +use GuzzleHttp\Exception\RequestException;
    

    These two Exceptions are not used in class, so uses can be removed.

  7. +++ b/core/modules/media/src/Plugin/media/Source/OEmbed.php
    @@ -0,0 +1,307 @@
    +    $main_property = $media->get($this->configuration['source_field'])->first()->mainPropertyName();
    +    $resource_url = $media->get($this->configuration['source_field'])->first()->get($main_property)->getString();
    

    It would be better to put $media->get($this->configuration['source_field'])->first() into variable and then there would not be need for $main_property

  8. +++ b/core/modules/media/src/Plugin/media/Source/OEmbed.php
    @@ -0,0 +1,307 @@
    +        $image_url = $this->getMetadata($media, 'thumbnail_url');
    

    I'm maybe missing something, but - if getMetadata is called with thumbnail_local_uri, then it will executed getMetadata with thumbnail_url , and that case will call getMetadata with thumbnail_local_uri and so on. I see it ass infinite recursive calls, or there is something I'm missing?

    And in general this getMetadata() function is quite difficult to read. It's like spaghetti.

  9. +++ b/core/profiles/standard/config/optional/media.type.video.yml
    @@ -0,0 +1,13 @@
    +description: "Use thie media type to store remote content such as YouTube videos."
    

    Small typo Use thie media

Good progress!

chr.fritsch’s picture

Status: Needs work » Needs review
FileSize
55.71 KB
20.25 KB

Thanks @mtodor for the review.

After the review i discussed the implementation with @mtodor in person and we basically found a way to support all the providers, doesn't matter if they support discovery and schemes or not.

If a provider doesn't support discovery, we can use the url key from the provider plus ?url=http://youtube.com/something

So i implemented that and and tested the current patch with Twitter, YouTube, Flickr, Twitch and Vimeo. Everything works OOTB :-)

Also fixed the nits from #35

Not sure if interdiff is usefull, because a lot of stuff was changed.

chr.fritsch’s picture

I added support for the maxheight and maxwidth oEmbed parameter in the formatter.

Also added some more comments and a few more tests.

aheimlich’s picture

+++ b/core/modules/media/src/Plugin/Field/FieldFormatter/OEmbedFormatter.php
@@ -0,0 +1,162 @@
+          $resource = $this->oEmbed->fetchResource($this->oEmbed->getResourceUrl($item->{$main_property}, $this->getSetting('maxwidth'), $this->getSetting('maxheight')));
+          $element[$delta] = [
+            '#theme' => 'media_oembed',
+            '#post' => (string) $resource['html'],
+          ];

Unless I'm misunderstanding something, the only oEmbed resource types that support the html response parameter are the video and rich types.

I realize that this issue is about remote videos, but nothing about the code in this patch actually prevents people from trying to embed, say, an image using the oEmbed field formatter. If my understanding is correct, then if someone were to try and embed an image using the oEmbed field formatter, nothing would show up, and they'd get an Undefined index "html" PHP notice.

I'd suggest either accommodating the other oEmbed resource types, or explicitly forbidding their use.
See https://oembed.com/, section 2.3.4 for details.

seanB’s picture

Thanks for the input! I think the oEmbed service should accommodate other types than video. We might want to have different formatters for each oEmbed type?

The media types with a oEmbed source should be able to configure which specific type of oEmbed resource it allows. We can use that information to validate the user input by type. Based on the supported resource type in the media source we can also determine which formatter is applicable.

Another approach could be to have different oEmbed media sources per resource type with a oEmbed source base class. This would mean adding more media sources but would probably make it a little easier to add source field constraints and configure supported formatters per source.

phenaproxima’s picture

I did some substantial refactoring on this patch.

  1. Both oEmbed services, and exceptions they throw, are now in the Drupal\media\OEmbed namespace.
  2. The OEmbed class, and its interface, are now called ResourceFetcher.
  3. I cleaned up both ResourceFetcher and ProviderDiscovery to make them (hopefully) clearer and more focused.
  4. The services don't bother logging errors -- they just throw exceptions.
  5. I got rid of the bin2hex encoding of oEmbed provider names. It was, according to a comment, only done to make the names compatible with config keys. I don't see why we have to store the names as keys.
  6. I made the services' caching a little less aggressive (i.e., removed the static caching).

We still need to implement support for all oEmbed types defined by the specification, and add a ton of test coverage.

chr.fritsch’s picture

Status: Needs work » Needs review
FileSize
61.02 KB
6.49 KB

Thanks, @phenaproxima for your work. That looks really good.

I just fixed some namespaces and docs. I also added a check if there is an HTML key in the formatter.

slashrsm’s picture

I finally managed to check the latest version of the patch. First of all I'd like to express my satisfaction with the direction we're taking. A lot of valuable points were made. Refactoring into two services makes a lot of sense and after a long thought I also independently came to the same conclusion about the 3rd party libraries; amount of code that they would replace is not huge and removing that wouldn't justify bringing in another huge dependency.

  1. +++ b/core/modules/media/src/OEmbed/ProviderDiscovery.php
    @@ -0,0 +1,133 @@
    +          if (preg_match("|$regexp|", $url)) {
    

    Would we need to escape in a strange case where the pipe character would appear in the URL? Is that even possible?

  2. +++ b/core/modules/media/src/OEmbed/ProviderDiscoveryInterface.php
    @@ -0,0 +1,59 @@
    +   * provider's name codified with bin2hex(), and has the following values:
    

    "following values" are missing.

  3. +++ b/core/modules/media/src/OEmbed/ResourceFetcher.php
    @@ -0,0 +1,248 @@
    +      $cached = $this->cacheGet('media:oembed_resource_url');
    +      if ($cached) {
    +        $this->urlCache = $cached->data;
    +      }
    

    Do we have any idea how big could this cache item become? If I understand it correctly count($media_items) ~= count($cached->data) AKA it could be huge.

    Huge cache items can cause problems from my experience.

  4. +++ b/core/modules/media/src/Plugin/media/Source/OEmbed.php
    @@ -0,0 +1,312 @@
    +    $form['thumbnails_location'] = [
    +      '#type' => 'textfield',
    +      '#title' => $this->t('Thumbnails location'),
    +      '#default_value' => $this->configuration['thumbnails_location'],
    +      '#description' => $this->t('Thumbnails will be fetched from the provider for local usage. This is the location where they will be placed.'),
    +    ];
    

    Do we want to validate this value to make sure nothing strange is put into it?

  5. +++ b/core/modules/media/src/Plugin/media/Source/OEmbed.php
    @@ -0,0 +1,312 @@
    +    $form['allowed_providers'] = [
    +      '#type' => 'checkboxes',
    +      '#title' => $this->t('Allowed providers'),
    +      '#default_value' => $this->configuration['allowed_providers'],
    +      '#options' => $provider_options,
    +      '#description' => $this->t('Optionally select the allowed oEmbed providers for this media type. If left blank, all providers will be allowed.'),
    +    ];
    +    return $form;
    

    This is a long list! Can we guarantee that all listed providers actually work? And more importantly, can we guarantee that any new providers that will be added to the list will?

    Would it make sense to maintain our own list? It could be based on the official one, but would only contain providers that we tested and can be sure are working correctly with in our integration. We could store it as a config object, which would allow us to easily add or update entries. Thoughts?

  6. +++ b/core/modules/media/templates/media-oembed.html.twig
    @@ -0,0 +1,9 @@
    +{{ post|raw }}
    

    We are displaying raw HTML that arrived from a 3rd party. Isn't that a potential security problem?

  7. +++ b/core/profiles/standard/config/optional/media.type.video.yml
    @@ -0,0 +1,13 @@
    +source_configuration:
    +  thumbnails_location: 'public://oembed_thumbnails'
    +  source_field: field_media_oembed
    

    We want to limit allowed providers in the default configuration.

martin107’s picture

Would we need to escape in a strange case where the pipe character would appear in the URL?

yes i am including a link with some interesting background reading

-- under the heading

"Common characters that need encoding"

The pipe char | - %7C.

https://developers.google.com/maps/documentation/urls/url-encoding

I hope that helps

chr.fritsch’s picture

#2865184: Allow MediaSource plugins provide default field form/view display settings landed, so let's use it to set the oEmbed formatter for the source field.

chr.fritsch’s picture

Also we should hide the name field and bring the source field to the top in prepareFormDisplay. Then we are aligned with the shipped configuration.

phenaproxima’s picture

We also need to fix (and have a test confirming) #45. Y'know, in addition to everything @slashrsm said in #44. :)

phenaproxima’s picture

Let's talk about #44.6:

We are displaying raw HTML that arrived from a 3rd party. Isn't that a potential security problem?

I would think so, yes. The oEmbed specification has this to say about video resources from a third party (source, section 2.3.4.2):

Consumers may wish to load the HTML in an off-domain iframe to avoid XSS vulnerabilities.

That, to me, implies that the oEmbed provider assumes no responsibility for XSS filtering. So it's our job to filter anything we receive from an oEmbed provider.

I discussed this with @chr.fritsch on Slack, and he told me that Facebook, for one, does include <script> tags in oEmbed resources. In my opinion, we definitely cannot trust that kind of thing, at least not out-of-the-box in core. I feel that our templates and code should filter everything we get from an oEmbed provider, but we should also provide some sort of mechanism, like an alter hook, to allow coders (NOT site builders, IMHO, due to how potentially dangerous this is) to disable XSS filtering for trusted resources on a case-by-case basis.

phenaproxima’s picture

Issue tags: +Needs security review

Tagging for security review.

slashrsm’s picture

I discussed this with @chr.fritsch on Slack, and he told me that Facebook, for one, does include

tags in oEmbed resources. In my opinion, we definitely cannot trust that kind of thing, at least not out-of-the-box in core. I feel that our templates and code should filter everything we get from an oEmbed provider, but we should also provide some sort of mechanism, like an alter hook, to allow coders (NOT site builders, IMHO, due to how potentially dangerous this is) to disable XSS filtering for trusted resources on a case-by-case basis.
The problem is that this embed codes won't work if we strip their scripts away. We would need to somehow make sure to include everything they need in a Drupal way. This will be very hard to do for all sites on the internet that support oEmbed. Another reason for a limited set of oEmbed providers that core will support (AKA #44.5)? This would allow us to guarantee that the providers we support not only work correctly when creating media but also when displaying it.
chr.fritsch’s picture

Status: Needs work » Needs review
FileSize
64.98 KB
16.4 KB

I fixed most of the mentioned things.

Still open are #44.1 and #44.6

We should also add some description to the source field, which providers are currently available.

phenaproxima’s picture

Another reason for a limited set of oEmbed providers that core will support

We could hard-code a list of providers which we know to be safe, then provide an API-level mechanism (an alter hook?) allowing developers, who are equipped to understand and mitigate the XSS risks, to add additional providers that they trust. As long as we don't expose that mechanism to site builders, I think we'd be OK.

seanB’s picture

We could hard-code a list of providers which we know to be safe, then provide an API-level mechanism (an alter hook?) allowing developers, who are equipped to understand and mitigate the XSS risks, to add additional providers that they trust.

If we encourage people to create a different source for each oEmbed provider (or a subset of them) as I suggested in #39, new providers can only be added by coders. This also solves the "maintain our own list" part. We could potentially build the Facebook, Twitter, Instagram types in contrib on top of an oEmbed base class.

It might take a lot of work since there could be an unlimited amount of providers. This also makes it harder to support regions outside of Noth America and Europe. For example China has its own set of social platforms for video etc. This is really something we have to look into a bit more since a remote video media type only allowing Youtube/Vimeo will exclude a large part of the world not using that as the main platform.

seanB’s picture

We just had a short conversation in Slack in which the following was proposed:

The oEmbed approach saves a lot of pain with the specific implementations of oEmbed providers. Basically we only have a couple of oEmbed types and there could be generic formatters for each type. This is a lot better than needing specific implementations for each provider. The real benefit would be when you start grouping the allowed providers for plugins.

  1. We can create a 'Remote video' source supporting Youtube and Vimeo (maybe some others).
  2. Add a alter hook to add more oEmbed providers to the source.
  3. Add a possibility to restrict the allowed providers per media type. So the media source supports Youtube and Vimeo, but the type only supports Youtube for example.
  4. We can create a 'Remote image' source supporting Instagram and Flickr (and or others).
  5. We can create a 'Remote content' source supporting Facebook and Slideshare (and or others).

This will allow people to create all the sources they want and change the list of providers each source allows. This issue will solve point 1, 2 and 3.

We can create follow ups if we want 3 and 4 (this could also be contrib though, but having them in Core would be great since this removes the need for a lot of contrib modules).

Any thoughts?

phenaproxima’s picture

Add a alter hook to add more oEmbed providers to the source.

My suggestion: define the allowed providers in the plugin definition. That way, developers can add additional ones without us needing to define a completely new hook.

chr.fritsch’s picture

Status: Needs work » Needs review
FileSize
64.87 KB
13.38 KB

I implemented the proposed approach from #56. But only the points 1.-3. because I want to see if I am going in the right direction.

chr.fritsch’s picture

Status: Needs work » Needs review
FileSize
67.34 KB
15.69 KB

Another push:

  • Replaced drupal_set_message with the new messenger service
  • Adjusted some comments
  • Fixed the tests
  • Added the list of allowed providers to the link widget description
  • Extended the formatter, it can handle all kind of oEmbed types now
  • Removed the oembed template and use inline_template form element instead
  • Added the link module dependency to media. Not 100% sure if this is correct. But we are using a link field as source field.

And now:
Merry Christmas everyone 🎄

kmajzlik’s picture

I think that Link should not be dependency of Media. We can use Document/Image media types only.

phenaproxima’s picture

Status: Needs review » Needs work

Definitely coming along; I like where this is heading!

  1. +++ b/core/modules/media/config/schema/media.schema.yml
    @@ -40,6 +40,17 @@ field.formatter.settings.media_thumbnail:
    +    maxwidth:
    +      type: integer
    +      label: 'Max-width'
    +    maxheight:
    +      type: integer
    +      label: 'Max-height'
    

    Can these be max_width and max_height, with the labels "Maximum width" and "Maximum height"?

  2. +++ b/core/modules/media/media.module
    @@ -71,9 +75,59 @@ function media_theme_suggestions_media(array $variables) {
    +function media_field_widget_form_alter(&$element, FormStateInterface $form_state, $context) {
    

    &$element and $context should have the array type hint.

    Also, we can remove the instanceof LinkWidget check if we implement hook_field_widget_WIDGET_TYPE_form_alter(), as well as the hard dependency on Link. Let's do that :)

  3. +++ b/core/modules/media/media.module
    @@ -71,9 +75,59 @@ function media_theme_suggestions_media(array $variables) {
    +  if ($context['widget'] instanceof LinkWidget && !empty($element['uri'])) {
    +
    +    /** @var \Drupal\media\MediaInterface $media */
    

    Nit: There's an extra blank line here.

  4. +++ b/core/modules/media/media.module
    @@ -71,9 +75,59 @@ function media_theme_suggestions_media(array $variables) {
    +    $source_config = $media->getSource()->getConfiguration();
    +    if ($source_config['source_field'] != $field_definition->getName()) {
    

    We should be using $media->getSource()->getSourceFieldDefinition(), rather than $source_config['source_field'].

  5. +++ b/core/modules/media/media.module
    @@ -71,9 +75,59 @@ function media_theme_suggestions_media(array $variables) {
    +    if (is_array($element['uri']['#description']) && !empty($element['uri']['#description']['#items'])) {
    

    Let's change the !empty to isset(), so that we add the message even if the description items are an empty array.

  6. +++ b/core/modules/media/src/Annotation/MediaSource.php
    @@ -105,4 +105,11 @@ class MediaSource extends Plugin {
    +  /**
    +   * Give media sources the possibility to provide their own annotation keys.
    +   *
    +   * @var array
    +   */
    +  public $settings = [];
    

    I'm not 100% sure we should be doing this. Annotations are not really supposed to be this free-form; besides, the annotations can already contain arbitrary keys (it's all transformed to an array anyway by the plugin discovery system).

  7. +++ b/core/modules/media/src/OEmbed/ProviderDiscovery.php
    @@ -0,0 +1,133 @@
    +    // We store them in an array where keys are hex-converted names, in order
    +    // to allow an easy conversion back to their original names when
    +    // necessary.
    

    We're no longer doing the hex conversion stuff, so we should probably remove this comment.

  8. +++ b/core/modules/media/src/Plugin/Field/FieldFormatter/OEmbedFormatter.php
    @@ -0,0 +1,201 @@
    + * Plugin implementation of the 'link' formatter.
    

    Nope, this is the 'oembed' formatter. :)

  9. +++ b/core/modules/media/src/Plugin/Field/FieldFormatter/OEmbedFormatter.php
    @@ -0,0 +1,201 @@
    + *   label = @Translation("oEmbed"),
    

    This label is exposed in the UI, so should we make it something more human-friendly?

  10. +++ b/core/modules/media/src/Plugin/Field/FieldFormatter/OEmbedFormatter.php
    @@ -0,0 +1,201 @@
    +  /**
    +   * The oEmbed service.
    +   *
    +   * @var \Drupal\media\OEmbed\ResourceFetcherInterface
    +   */
    +  protected $oEmbed;
    

    Let's rename this to $resourceFetcher.

  11. +++ b/core/modules/media/src/Plugin/Field/FieldFormatter/OEmbedFormatter.php
    @@ -0,0 +1,201 @@
    +   * Constructs a EntityReferenceEntityFormatter instance.
    

    Wrong class name.

  12. +++ b/core/modules/media/src/Plugin/Field/FieldFormatter/OEmbedFormatter.php
    @@ -0,0 +1,201 @@
    +   *   The plugin_id for the formatter.
    

    "plugin_id" should be "plugin ID".

  13. +++ b/core/modules/media/src/Plugin/Field/FieldFormatter/OEmbedFormatter.php
    @@ -0,0 +1,201 @@
    +   * @param \Drupal\media\OEmbed\ResourceFetcherInterface $oembed
    +   *   The oEmbed service.
    

    Let's rename the parameter to $resource_fetcher and update the description.

  14. +++ b/core/modules/media/src/Plugin/Field/FieldFormatter/OEmbedFormatter.php
    @@ -0,0 +1,201 @@
    +      'maxwidth' => NULL,
    +      'maxheight' => NULL,
    

    Can these be max_width and max_height, and also have a sane integer default?

  15. +++ b/core/modules/media/src/Plugin/Field/FieldFormatter/OEmbedFormatter.php
    @@ -0,0 +1,201 @@
    +      if (!empty($item->{$main_property})) {
    

    Rather than wrap nearly the whole function body in this if statement, can we do this:

    if (empty($item->{$main_property})) {
      continue;
    }
    
  16. +++ b/core/modules/media/src/Plugin/Field/FieldFormatter/OEmbedFormatter.php
    @@ -0,0 +1,201 @@
    +        try {
    +          $resource = $this->oEmbed->fetchResource($this->oEmbed->getResourceUrl($item->{$main_property}, $this->getSetting('maxwidth'), $this->getSetting('maxheight')));
    

    Again, rather than wrap so much code in a single try block, let's flatten this a bit:

    try {
      $resource = fetchResource(...);
    }
    catch (ResourceException $exception) {
      $this->messenger->addError($exception->getMessage());
      continue;
    }
    
    // ...do more stuff with $resource
    
  17. +++ b/core/modules/media/src/Plugin/Field/FieldFormatter/OEmbedFormatter.php
    @@ -0,0 +1,201 @@
    +          switch ($resource['type']) {
    

    Let's add a default case in which we throw an UnexpectedValueException if $resource['type'] is not one of the four known types.

  18. +++ b/core/modules/media/src/Plugin/Field/FieldFormatter/OEmbedFormatter.php
    @@ -0,0 +1,201 @@
    +              $element[$delta] = [
    +                '#type' => 'inline_template',
    +                '#template' => (string) $resource['html'],
    +              ];
    

    Is this safe, from an XSS standpoint?

  19. +++ b/core/modules/media/src/Plugin/Field/FieldFormatter/OEmbedFormatter.php
    @@ -0,0 +1,201 @@
    +      'maxwidth' => [
    +        '#type' => 'number',
    +        '#title' => $this->t('Max width'),
    +        '#default_value' => $this->getSetting('maxwidth'),
    +        '#size' => 5,
    +        '#maxlength' => 5,
    +        '#field_suffix' => $this->t('pixels'),
    +        '#min' => 0,
    +      ],
    +      'maxheight' => [
    +        '#type' => 'number',
    +        '#title' => $this->t('Max height'),
    +        '#default_value' => $this->getSetting('maxheight'),
    +        '#size' => 5,
    +        '#maxlength' => 5,
    +        '#field_suffix' => $this->t('pixels'),
    +        '#min' => 0,
    +      ],
    

    Should be max_width and max_height, and the labels should say "Maximum" instead of just "max"

  20. +++ b/core/modules/media/src/Plugin/Field/FieldFormatter/OEmbedFormatter.php
    @@ -0,0 +1,201 @@
    +  public function settingsSummary() {
    

    This method will also need maxwidth and maxheight converted to max_* (and the labels should be "Maximum").

  21. +++ b/core/modules/media/src/Plugin/Validation/Constraint/OEmbedProviderConstraintValidator.php
    @@ -0,0 +1,85 @@
    +      $this->context->addViolation($this->t('An error occurred while trying to retrieve the oEmbed provider database.'));
    

    It might be helpful to log the full exception message somewhere, for debugging purposes.

  22. +++ b/core/modules/media/src/Plugin/media/Source/OEmbed.php
    @@ -0,0 +1,327 @@
    +    $source_field = $this->getSourceFieldDefinition($media->bundle->entity)->getName();
    +    /** @var \Drupal\Core\Field\FieldItemInterface $source_field_item */
    +    $source_field_item = $media->get($source_field)->first();
    +
    +    $main_property = $source_field_item->mainPropertyName();
    

    We do this SO OFTEN...let's just add a public helper function to MediaSourceBase for it. MediaSourceBase::getSourceFieldValue($media)

  23. +++ b/core/modules/media/src/Plugin/media/Source/RemoteVideo.php
    @@ -0,0 +1,19 @@
    +/**
    + * Provides the media source plugin for oEmbed resources.
    + *
    + * @MediaSource(
    + *   id = "remote_video",
    + *   label = @Translation("Remote video"),
    + *   description = @Translation("Use remote video URL for reusable media."),
    + *   allowed_field_types = {"link", "string", "string_long"},
    + *   default_thumbnail_filename = "no-thumbnail.png",
    + *   settings = {
    + *    "supported_providers" = {"YouTube", "Vimeo"}
    + *   }
    + * )
    + */
    +class RemoteVideo extends OEmbed {}
    

    This doesn't need to be a class. If we make the OEmbed class non-abstract, we can add this plugin definition by implementing hook_media_source_info_alter(). Or better yet, we can have a deriver which simply generates a hard-coded set of plugin definitions based on the OEmbed plugin class implementation.

seanB’s picture

Had an hour so started on processing #62. Here is what I have so far.

1. Fixed
2. Fixed
3. Fixed
4. Not sure what this code is doing and what it's for? What is the reason for $source_config['source_field'] != $field_definition->getName()?
5. Fixed
6. This needs some more discussion.
7. Fixed
8. Fixed
9. Changed it to 'oEmbed content', not sure though.
10. Fixed
11. Fixed
12. Fixed
13. Fixed
14. Fixed
15. Fixed
16. Fixed
17. Fixed
18. Fixed
19. Fixed
20. Fixed
21. Still todo
22. We had this, but it was deleted since this was deemed unnessecary. I think it would be very helpful to add it back though. It would be good to scroll through the patchzilla issue to find out what the exact arguments were.
23. Still todo

chr.fritsch’s picture

chr.fritsch’s picture

Worked on #62.6. It seems to work without the new annotation key as well. So I removed that.

seanB’s picture

Discussed the link dependency with chr.fritsch in slack. We agreed link fields are best suited to store the links to oEmbed media. String fields are currently also allowed as a source field and we should probably remove that. Any thoughts?

chr.fritsch [12:15 PM] 
@seanb so I saw you removed the link dependency from the patch
@seanb The link field type is currently the default type for the source and we are using the link field in the optional config in standard. So if the link module is not enabled when you enable media, the remote video type will be broken

seanb [12:17 PM] 
@chr.fritsch That was mentioned in #62.2, but I guess the link field is a requirement for the oEmbed provider right?
@chr.fritsch Yeah, you are right, I guess as long as we ship with remote video I guess we depend on link.

chr.fritsch [12:19 PM] 
@seanb We could switch the config to use the string field, thats in core and not depending on a module

seanb [12:23 PM] 
@chr.fritsch I was thinking about that. Don't have a strong opinion on it, but I lean slightly towards using a link field. We are expecting a URL, and rendering the link would be easier with a link field. One of the reason for implementing media is remote media. I guess using link fields goes hand in hand with that. And I can't remember the last time I built a site without the link module.

chr.fritsch [12:25 PM] 
@seanb True story. So let’s re-add the link dependency
@seanb It’s possible to define multiple allowed_field_types. I don’t see a place how a user could create a video type with using a string field

seanb [12:29 PM] 
@chr.fritsch In which case would you need a string field?

chr.fritsch [12:30 PM] 
@seanb I don’t need it. oEmbed currently defines allowed_field_types = {“link”, “string”, “string_long”}, but how could someone use a string long field

seanb [12:32 PM] 
@chr.fritsch Maybe we should change that to just a link field? I guess the source for oEmbed should always be a URL the way it is written at the moment.
#61 in the issue makes a valid point about the link dependency, people could choose to not use remote video, but I guess the same argument could be made for the file field. You could choose to uninstall document and image as well and only use remote video. I guess we need to be a little opinionated and just require link and file as dependencies.
String fields for oEmbed should not be allowed though, that would be weird.
chr.fritsch’s picture

Based on the discussion with @seanB, I re-add the link dependency and removed the string types from the allowed_field_types.

Also:
- Fixed #62.4
- Changed the field name of the optional config to field_media_oembed_remote_video

Still open:
#62 21 & 22

chr.fritsch’s picture

Status: Needs work » Needs review
FileSize
69.97 KB
5.07 KB

Ok, fixed the last two open points

benjifisher’s picture

The new getSourceFieldDefinition() method is now added in #2934966: Make it easier to get source field values from media items.

I think this issue already needs a re-roll: there seems to be a conflict in media.schema.yml.

chr.fritsch’s picture

Status: Needs work » Needs review
FileSize
70.11 KB

Rebased on all the fancy things that landed in the last couple of hours 😃

Renamed the media type to "remote_video", because "video" is already used now by the local video type.

chr.fritsch’s picture

Status: Needs work » Needs review
FileSize
68.41 KB

Rebased on top of #2934966: Make it easier to get source field values from media items

I also reverted #62.18. We know it's not super safe to directly output what we get from the oEmbed service, but that's the reason why we only support a few oEmbed providers. Because we trust them.

phenaproxima’s picture

We know it's not super safe to directly output what we get from the oEmbed service, but that's the reason why we only support a few oEmbed providers. Because we trust them.

I tend to agree, but I think we need explicit review and sign-off from the security team for this approach.

alexpott’s picture

One quick thought that's going cause quite a bit of work... more review on the way.

+++ b/core/modules/media/media.info.yml
@@ -8,4 +8,5 @@ hidden: true
+  - link

This needs a corresponding hook_update_N / post update to install the link module. Not sure about what type of update it should be. I think hook_update_N because without link being installed the system is not correct.

alexpott’s picture

  1. +++ b/core/modules/media/src/OEmbed/ProviderDiscovery.php
    @@ -0,0 +1,128 @@
    +    $this->cacheSet($cache_id, $keyed_providers, time() + (60 * 60 * 24 * 7));
    

    This should use core's time service. Also 60*60*24*7 is a constant. Why make the program do the work? I'd add a comment about caching the result for a week.

  2. +++ b/core/modules/media/src/OEmbed/ProviderDiscoveryInterface.php
    @@ -0,0 +1,62 @@
    +   * @return array
    +   *   The provider information. Contains the following elements:
    +   *   - provider_name: Human readable name of the provider.
    +   *   - provider_url: Main URL of the provider.
    +   *   - endpoints: List of endpoints this provider exposes. Each endpoint is an
    +   *     array with the following values:
    +   *     - url: The endpoint's URL.
    +   *     - schemes: List of URL schemes supported by the provider.
    +   *     - formats: List of supported formats. Can be "json", "xml" or both.
    +   *     - discovery: Whether the provider supports oEmbed discovery.
    

    Can we create a value object to encapsulate the array and not pass around arrays of info. Same with endpoints.

chr.fritsch’s picture

Discussed "adding a link dependency" again with @phenaproxima and we agreed that adding a new dependency might not be the best solution. Instead of adding a new dependency we want to solve #2935973: Media source doesn't check if a field type exists before so that the source can fallback to a string field if link module is not present.

alexpott’s picture

I'm not sure that falling back to a string field if the link module is not present is a good idea.

What happens in these two situations:
1. Media installed
2. Remote field created
3. Link module installed

1. Media installed
2. Link module installed
3. Remote field created with exactly the same settings as the first time

Are the two fields different?

phenaproxima’s picture

1. Media installed
2. Remote field created
3. Link module installed

In this case, the remote media source will use the first available field type that it is compatible with. That will probably be a string field. If Link is subsequently installed, nothing changes for existing media types; new remote media types, though, will create a link field.

1. Media installed
2. Link module installed
3. Remote field created with exactly the same settings as the first time

In this case, the remote media source will create a link field, since that's what it most prefers.

This does not present a problem at all, because the media source will be compatible with either type of field, pretty much interchangeably.

alexpott’s picture

@phenaproxima I think that that behaviour is pretty unexpected though and it could lead to hard to debug bugs because you have to realise the code is behaving differently depending on the order things have occurred.

dawehner’s picture

I am wondering whether a solution to the dilemma out there would be to use string by default always, but allow people who wants more security (URL validation) or simply a title field to use the link field.

  1. +++ b/core/modules/media/media.module
    @@ -71,9 +75,59 @@ function media_theme_suggestions_media(array $variables) {
    +/**
    + * Implements hook_field_widget_WIDGET_TYPE_form_alter().
    + */
    +function media_field_widget_link_default_form_alter(array &$element, FormStateInterface $form_state, array $context) {
    +  if (!empty($element['uri'])) {
    +    /** @var \Drupal\media\MediaInterface $media */
    

    I would have expected an additional widget here to be honest.

  2. +++ b/core/modules/media/src/OEmbed/ProviderDiscovery.php
    @@ -0,0 +1,128 @@
    +  public function __construct(Client $http_client, $providers_url, CacheBackendInterface $cache_backend = NULL) {
    ...
    +    $this->cacheBackend = $cache_backend;
    +    $this->useCaches = isset($cache_backend);
    

    I don't think we need this optionalness. Why do we not pass in a null cache backend for this case?

  3. +++ b/core/modules/media/src/OEmbed/ResourceFetcherInterface.php
    @@ -0,0 +1,43 @@
    +   * @return array|bool
    

    Nitpick: Doesn't the implementation throws an error in the case of "bool"?

  4. +++ b/core/modules/media/src/Plugin/Validation/Constraint/OEmbedProviderConstraint.php
    @@ -0,0 +1,25 @@
    +  public $message = 'The @name provider is not allowed.';
    
    +++ b/core/modules/media/src/Plugin/Validation/Constraint/OEmbedProviderConstraintValidator.php
    @@ -0,0 +1,98 @@
    +    if (!in_array($provider['provider_name'], $allowed_providers, TRUE)) {
    +      $message = new FormattableMarkup($constraint->message, [
    +        '@name' => $provider['provider_name'],
    +      ]);
    +      $this->context->addViolation((string) $message);
    

    ❓ I'm curious, doesn't this make it impossible to translate it?

  5. +++ b/core/modules/media/tests/src/Functional/FieldFormatter/OEmbedFormatterTest.php
    @@ -0,0 +1,81 @@
    +          'uri' => 'https://twitter.com/drupaldevdays/status/935643039741202432',
    ...
    +    $assert_session->elementContains('css', "blockquote.twitter-tweet > p", 'Save the date for Drupal Developer Days Lisbon 2018');
    

    That is a really nice test!

Version: 8.5.x-dev » 8.6.x-dev

Drupal 8.5.0-alpha1 will be released the week of January 17, 2018, which means new developments and disruptive changes should now be targeted against the 8.6.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

samuel.mortenson’s picture

I am wondering whether a solution to the dilemma out there would be to use string by default always

I think this is something we should consider - what does the Link module provide that would improve the UX of the OEmbed source? To me the nicest part of the Link module is its handling of internal URLs, but for OEmbed almost all links will be external. In terms of security, if we go with normal string fields, the only part of the Link module we'll be missing is protocol validation (see \Drupal\link\Plugin\Validation\Constraint\LinkExternalProtocolsConstraintValidator). I think the oembed_resource/oembed_provider constraints will handle this for us, but it wouldn't hurt to enforce http/https URLs as well.

phenaproxima’s picture

Status: Needs review » Needs work

I can't assign it to him, but @robpowell has agreed to implement the changes requested since #73.

phenaproxima’s picture

Title: Implement media type plugin for remote video » Implement media source plugin for remote video via oEmbed

Changing the title for increased accuracy points. :)

robpowell’s picture

Assigned: Unassigned » robpowell

Assigned it to myself and will take a look tonight.

chr.fritsch’s picture

Status: Needs work » Needs review
FileSize
73.07 KB
14.68 KB

I started implementing value objects for Provider and Endpoints.

robpowell’s picture

@phenaproxima will not have time to work on this till saturday 1/27/18. Will change to unassigned.

robpowell’s picture

Assigned: robpowell » Unassigned
chr.fritsch’s picture

Status: Needs work » Needs review
FileSize
72.53 KB
22.6 KB

So I picked it up again.

I adjusted several things:

  1. I fixed some tests. Not sure what to do about the MediaUpdateTest
  2. I removed the interfaces for the value objects
  3. Removed the dependency to the link module. String field is now the one and only.
  4. Fixed #76.1
  5. I fixed everything from 81.
chr.fritsch’s picture

Damn, forgot to remove the interfaces from the patch

alexpott’s picture

A quick drive-by un complete review - just concentrating on the new value objects.

  1. +++ b/core/modules/media/src/OEmbed/Endpoint.php
    @@ -0,0 +1,102 @@
    +   * @param bool $supportDiscovery
    +   *   Whether the provider supports oEmbed discovery.
    ...
    +  public function __construct($url, array $schemes = [], array $formats = [], $supportDiscovery = FALSE) {
    

    The argument should be $supports_discovery

  2. +++ b/core/modules/media/src/OEmbed/Endpoint.php
    @@ -0,0 +1,102 @@
    +  public function supportDiscovery() {
    +    return $this->supportDiscovery;
    

    I think supportsDiscovery() is better.

  3. +++ b/core/modules/media/src/OEmbed/ProviderDiscovery.php
    @@ -0,0 +1,154 @@
    +    if ($cached) {
    
    +++ b/core/modules/media/src/OEmbed/ResourceFetcher.php
    @@ -0,0 +1,241 @@
    +      $format = 'json';
    +      if (!empty($endpoint->getFormats()) && is_array($endpoint->getFormats())) {
    +        $format = reset($endpoint->getFormats());
    +      }
    +      $endpoint_url = str_replace('{format}', $format, $endpoint->getUrl());
    

    Tempting to put this logic into the Endpoint value object.

  4. +++ b/core/modules/media/src/OEmbed/ResourceFetcher.php
    @@ -0,0 +1,241 @@
    +  protected function buildResourceUrl(Provider $provider_info, $url) {
    +    if (empty($provider_info->getEndpoints())) {
    +      throw new ProviderException('Provider @name does not define any endpoints.', $provider_info);
    +    }
    +
    +    $query = [
    +      'url' => $url,
    +    ];
    +    foreach ($provider_info->getEndpoints() as $endpoint) {
    +      $format = 'json';
    +      if (!empty($endpoint->getFormats()) && is_array($endpoint->getFormats())) {
    +        $format = reset($endpoint->getFormats());
    +      }
    +      $endpoint_url = str_replace('{format}', $format, $endpoint->getUrl());
    +
    +      return $endpoint_url . '?' . UrlHelper::buildQuery($query);
    +    }
    +    return FALSE;
    

    I think this could be part of the Provider and not here.

chr.fritsch’s picture

Status: Needs work » Needs review
FileSize
71.67 KB
7.31 KB

So fixed the last failing tests and improved some minor things.

phenaproxima’s picture

I just want to give a wholehearted +1 to @alexpott's comments in #95.3 and #95.4. The Provider and Endpoint value objects are definitely the place for that logic.

chr.fritsch’s picture

So I moved some logic into the value objects and fixed all the things from #95

chr.fritsch’s picture

phenaproxima’s picture

Status: Needs review » Needs work

Great work! More hacking ahead :)

  1. +++ b/core/modules/media/config/schema/media.schema.yml
    @@ -60,6 +71,20 @@ media.source.video_file:
    +media.source.oembed:*:
    +  type: media.source.field_aware
    +  label: 'Remote Video media source configuration'
    

    This config schema defines all oEmbed source derivatives, so the label should be something like "oEmbed media source configuration".

  2. +++ b/core/modules/media/media.module
    @@ -93,9 +95,58 @@ function media_theme_suggestions_media(array $variables) {
    +  $media_type = \Drupal::entityTypeManager()->getStorage('media_type')->load($media->bundle());
    +  if ($media_type->getSource() instanceof OEmbed) {
    

    We don't need to load the media type. $media->getSource() will work just fine.

  3. +++ b/core/modules/media/media.module
    @@ -93,9 +95,58 @@ function media_theme_suggestions_media(array $variables) {
    +    $provider = $media_type->getSource()->getMetadata($media, 'provider_name');
    +    $provider = \Drupal::transliteration()->transliterate($provider);
    

    We should add a new metadata attribute to return the transliterated machine-readable provider name, I think (provider_id or something like that).

  4. +++ b/core/modules/media/media.module
    @@ -93,9 +95,58 @@ function media_theme_suggestions_media(array $variables) {
    +  if (!empty($element['value'])) {
    

    Rather than wrap all of the function's logic in an if statement, let's just return early if $element['value'] is empty.

  5. +++ b/core/modules/media/media.module
    @@ -93,9 +95,58 @@ function media_theme_suggestions_media(array $variables) {
    +    $source_config = $media->getSource()->getConfiguration();
    +    // Only change the value field description on the source field.
    +    if ($source_config['source_field'] != $field_definition->getName()) {
    +      return;
    +    }
    

    We should be using $media->getSource()->getSourceFieldDefinition() here, not peeking directly into config. Keep those abstractions sealed! :)

  6. +++ b/core/modules/media/media.module
    @@ -93,9 +95,58 @@ function media_theme_suggestions_media(array $variables) {
    +    $source_definition = $media->getSource()->getPluginDefinition();
    +    $allowed_providers = $source_definition['supported_providers'];
    +    if (!empty($source_config['allowed_providers'])) {
    +      $allowed_providers = $source_config['allowed_providers'];
    +    }
    

    I think this is also a bit of a leaky abstraction. Maybe we can add a method to the OEmbed plugin which returns the allowed providers, and call that here.

  7. +++ b/core/modules/media/media.module
    @@ -93,9 +95,58 @@ function media_theme_suggestions_media(array $variables) {
    +    $message = t('These oEmbed providers are allowed: @providers', ['@providers' => implode(', ', $allowed_providers)]);
    

    I'm not sure we should use the technical term "oEmbed providers" here. How about something like "You can link to media from the following services: @providers"?

  8. +++ b/core/modules/media/src/OEmbed/Endpoint.php
    @@ -0,0 +1,109 @@
    +  public function __construct($url, array $schemes = [], array $formats = [], $supports_discovery = FALSE) {
    

    The endpoint should also get its provider passed in, and let's add a method to retrieve it as well.

  9. +++ b/core/modules/media/src/OEmbed/Endpoint.php
    @@ -0,0 +1,109 @@
    +      throw new ProviderException('Provider does not provide any endpoint URLs.');
    

    This should be an InvalidArgumentException, not a ProviderException. And let's adjust the message to match; something like "oEmbed endpoint must have a URL."

  10. +++ b/core/modules/media/src/OEmbed/Endpoint.php
    @@ -0,0 +1,109 @@
    +    $this->url = $url;
    

    We should probably verify that the URL is valid as well, and throw an exception if not. I'm not exactly certain how we should go about that...maybe freak out if parse_url() returns FALSE?

  11. +++ b/core/modules/media/src/OEmbed/Endpoint.php
    @@ -0,0 +1,109 @@
    +   * The URL will be build with the first defined format. If the endpoint
    

    "build" should be "built".

  12. +++ b/core/modules/media/src/OEmbed/Endpoint.php
    @@ -0,0 +1,109 @@
    +    if (!empty($this->getFormats()) && is_array($this->getFormats())) {
    +      $format = reset($this->getFormats());
    +    }
    

    Can we call $this->getFormats() once and reuse its return value? Also, under what circumstances would getFormats() *not* return an array? And shouldn't the constructor throw an exception if $formats is empty?

  13. +++ b/core/modules/media/src/OEmbed/Endpoint.php
    @@ -0,0 +1,109 @@
    +   * Returns list of supported formats.
    +   *
    +   * @return string[]
    +   *   List of formats.
    +   */
    

    Is there anywhere that I can get a list of the supported formats, and what they look like? (Even if it's not in the Drupal API?) An @see or @link would be very helpful here.

  14. +++ b/core/modules/media/src/OEmbed/Provider.php
    @@ -0,0 +1,100 @@
    +  public function __construct($name, $url, array $endpoints) {
    

    I'm not sure this should be an array of objects. Why don't we simply make getEndpoints() instantiate the Endpoint objects? That way the calling code doesn't need to do all that work.

  15. +++ b/core/modules/media/src/OEmbed/Provider.php
    @@ -0,0 +1,100 @@
    +    foreach ($this->getEndpoints() as $endpoint) {
    +      return $endpoint->getUrl() . '?' . UrlHelper::buildQuery($query);
    +    }
    

    Er...because the foreach loop returns, this will only iterate once, if at all. We should probably just retrieve the first endpoint's URL and go from there; no need to loop.

  16. +++ b/core/modules/media/src/OEmbed/ProviderDiscovery.php
    @@ -0,0 +1,154 @@
    +
    +  /**
    +   * Cache provider list for a week.
    +   */
    +  const CACHE_LIFETIME = 604800;
    

    Needs @var int.

  17. +++ b/core/modules/media/src/OEmbed/ProviderDiscovery.php
    @@ -0,0 +1,154 @@
    +        foreach ($endpoint->getSchemes() as $scheme) {
    +          // Convert scheme into a valid regular expression.
    +          $regexp = str_replace(['.', '*'], ['\.', '.*'], $scheme);
    +          if (preg_match("|$regexp|", $url)) {
    +            return $provider_info;
    +          }
    +        }
    

    I think we should move this logic into Endpoint as well. Something like $endpoint->matchUrl($url), which returns TRUE or FALSE if the given URL matches the endpoint.

  18. +++ b/core/modules/media/src/OEmbed/ProviderDiscoveryInterface.php
    @@ -0,0 +1,54 @@
    +  /**
    +   * Gets oEmbed providers information.
    +   *
    +   * Returns a multi-dimensional array where each provider is represented as a
    +   * single top-level array element. Each individual element is keyed by the
    +   * provider's name, and has the values like in
    +   * Drupal\media\OEmbed\ProviderDiscoveryInterface::get().
    +   *
    +   * @return \Drupal\media\OEmbed\Provider[]
    +   *   Information about oEmbed providers.
    +   *
    +   * @see \Drupal\media\ProviderDiscoveryInterface::get()
    +   *
    +   * @throws \Drupal\media\OEmbed\ProviderException
    +   */
    +  public function getAll();
    

    This doc comment needs adjustment! :)

  19. +++ b/core/modules/media/src/Plugin/Field/FieldFormatter/OEmbedFormatter.php
    @@ -0,0 +1,210 @@
    +      if (empty($resource['type'])) {
    +        $this->messenger->addError($this->t("oEmbed type couldn't be identified."));
    +        return $element;
    +      }
    

    We can remove this. The switch-case makes it redundant.

  20. +++ b/core/modules/media/src/Plugin/Field/FieldFormatter/OEmbedFormatter.php
    @@ -0,0 +1,210 @@
    +        default:
    +          throw new \UnexpectedValueException(sprintf('Unknown oEmbed resource type "%s"', $resource['type']));
    

    I think we should log an error here, rather than throw an exception.

  21. +++ b/core/modules/media/src/Plugin/Validation/Constraint/OEmbedProviderConstraintValidator.php
    @@ -0,0 +1,97 @@
    +    $media = $value->getEntity();
    +    $main_property = $value->getFieldDefinition()->getFieldStorageDefinition()->getMainPropertyName();
    +    $url = $value->first()->get($main_property)->getString();
    

    We can use $media->getSource()->getSourceFieldValue() here. :)

  22. +++ b/core/modules/media/src/Plugin/Validation/Constraint/OEmbedProviderConstraintValidator.php
    @@ -0,0 +1,97 @@
    +    $allowed_providers = $source_definition['supported_providers'];
    +    if (!empty($source_config['allowed_providers'])) {
    +      $allowed_providers = $source_config['allowed_providers'];
    +    }
    

    supported_providers? Never heard of it. This is another case for having a method on the OEmbed source which can retrieve the allowed providers.

  23. +++ b/core/modules/media/src/Plugin/Validation/Constraint/OEmbedResourceConstraintValidator.php
    @@ -0,0 +1,56 @@
    +    $main_property = $value->getFieldDefinition()->getFieldStorageDefinition()->getMainPropertyName();
    +    $url = $value->first()->get($main_property)->getString();
    

    Let's use $media->getSource()->getSourceFieldValue() here.

  24. +++ b/core/modules/media/src/Plugin/media/Source/OEmbed.php
    @@ -0,0 +1,316 @@
    +    $thumbnails_location = $form_state->getValue('thumbnails_location');
    +    if (!file_valid_uri($thumbnails_location)) {
    +      $form_state->setErrorByName('thumbnails_location', $this->t('@path is not a valid path.', ['@path' => $thumbnails_location]));
    +    }
    

    We should probably call, file_prepare_directory() here, and try to set up the location if it doesn't exist. However, I'm OK deferring that to a follow-up.

  25. +++ b/core/modules/media/src/Plugin/media/Source/OEmbed.php
    @@ -0,0 +1,316 @@
    +    parent::prepareFormDisplay($type, $display);
    +    $display->removeComponent('name');
    

    Why are we removing the name from the form display?

  26. +++ b/core/modules/media/src/Plugin/media/Source/OEmbedDeriver.php
    @@ -0,0 +1,31 @@
    +  public function getDerivativeDefinitions($base_plugin_definition) {
    +
    +    $this->derivatives = [
    

    Nit: There's an extra blank line here.

  27. +++ b/core/modules/media/src/Plugin/media/Source/OEmbedDeriver.php
    @@ -0,0 +1,31 @@
    +        'id' => 'remote_video',
    

    I don't think we need to call this "remote_video"; the fact that it's an oEmbed derivative implies that the video is remotely hosted. Let's just call it 'video', so that the full plugin ID will be 'oembed:video'.

samuel.mortenson’s picture

Some manual testing notes:

  1. I haven't been able to use a YouTube link yet - Vimeo seems to work fine, but I got an "The given URL does not match any known oEmbed providers." error when using any YouTube video URL.
  2. Why is there no "Name" field for this provider? Would users ever want to override the automatically set label coming from the oEmbed source?
  3. I'm not sure why YouTube and Vimeo are the only oEmbed providers supported, but as a Media user who wants to use another oEmbed provider, what would I do? Would I need to write custom code?
chr.fritsch’s picture

Assigned: Unassigned » chr.fritsch

Working on this

chr.fritsch’s picture

Fixed most of the things #100. Regarding the removed name field: The oEmbed providers are supporting the name of the resource. So we should use that by default. But we could also show the name field and make it non-required.

tstoeckler’s picture

Status: Needs work » Needs review
dawehner’s picture

  1. +++ b/core/modules/media/media.module
    @@ -93,9 +95,58 @@ function media_theme_suggestions_media(array $variables) {
       $suggestions[] = 'media__' . $media->bundle();
    ...
    +    $suggestions[] = 'media__oembed';
    

    I'm curious, shouldn't the bundle one be more specific then the generic media__oembed suggestion?

  2. +++ b/core/modules/media/media.module
    @@ -93,9 +95,58 @@ function media_theme_suggestions_media(array $variables) {
    +    if (!empty($element['value']['#description'])) {
    +      $element['value']['#description'] = [
    +        '#theme' => 'item_list',
    +        '#items' => [$element['value']['#description'], $message],
    +      ];
    +    }
    +    else {
    +      $element['value']['#description'] = [
    +        '#theme' => 'item_list',
    +        '#items' => [$message],
    +      ];
    +    }
    

    What about making this a bit more dense, by moving the description inline.

  3. +++ b/core/modules/media/src/OEmbed/Endpoint.php
    @@ -0,0 +1,109 @@
    +  public function __construct($url, array $schemes = [], array $formats = [], $supports_discovery = FALSE) {
    +    if (!$url) {
    +      throw new ProviderException('Provider does not provide any endpoint URLs.');
    +    }
    

    I like this level of defensiveness!

  4. +++ b/core/modules/media/src/OEmbed/Endpoint.php
    @@ -0,0 +1,109 @@
    +    if (!empty($this->getFormats()) && is_array($this->getFormats())) {
    +      $format = reset($this->getFormats());
    +    }
    

    It'd be nice to explain why we use the first one ... I guess there is no better idea? Also I think it would be nice if getFormat ensures to always returns an array. No reason to check for is_array

  5. +++ b/core/modules/media/src/OEmbed/Endpoint.php
    @@ -0,0 +1,109 @@
    +   *   TRUE or FALSE.
    

    Let's drop that :)

  6. +++ b/core/modules/media/src/OEmbed/Provider.php
    @@ -0,0 +1,100 @@
    +    if (empty($endpoints)) {
    +      throw new ProviderException('Provider @name does not define any endpoints.', $this);
    +    }
    +    $this->name = $name;
    

    Why do we not check $url in this case?

  7. +++ b/core/modules/media/src/OEmbed/Provider.php
    @@ -0,0 +1,100 @@
    +   * Builds the endpoint URL and returns it.
    +   *
    +   * @param string $url
    +   *   The canonical media URL.
    +   *
    +   * @return string
    +   *   URL of the oEmbed endpoint.
    +   */
    +  public function buildResourceUrl($url) {
    

    Let's add documentation that and why we return the first available endpoint.

  8. +++ b/core/modules/media/src/OEmbed/ProviderDiscovery.php
    @@ -0,0 +1,154 @@
    +      foreach ($provider['endpoints'] as $endpoint) {
    +        $formats = !empty($endpoint['formats']) ? $endpoint['formats'] : [];
    +        $schemes = !empty($endpoint['schemes']) ? $endpoint['schemes'] : [];
    +        $discovery = !empty($endpoint['discovery']) ? $endpoint['discovery'] : [];
    

    What about making this a bit easier by using $endpoint += ['formats' => [], 'schemes' => [], ...] ?

  9. +++ b/core/modules/media/src/OEmbed/ProviderDiscovery.php
    @@ -0,0 +1,154 @@
    +    if (isset($providers[$provider_name])) {
    +      return $providers[$provider_name];
    +    }
    +    else {
    +      throw new \InvalidArgumentException("Unknown provider '$provider_name'");
    +    }
    

    I'm personally a huge fan of having the common case being not inside a condition aka. if (!error_case) { ... throw an exception } ... continue normally

  10. +++ b/core/modules/media/src/OEmbed/ProviderDiscoveryInterface.php
    @@ -0,0 +1,54 @@
    +interface ProviderDiscoveryInterface {
    

    It would be nice to have at least one place which explains the overall concept, aka. how providers, fetchers and provider discovery work together.

  11. +++ b/core/modules/media/src/OEmbed/ProviderDiscoveryInterface.php
    @@ -0,0 +1,54 @@
    +   * @throws \Drupal\media\OEmbed\ProviderException
    +   */
    ...
    +   * @throws \InvalidArgumentException
    +   *   If there is no known oEmbed provider with the specified name.
    +   */
    

    Shouldn't this be the same interface?

  12. +++ b/core/modules/media/src/OEmbed/ProviderDiscoveryInterface.php
    @@ -0,0 +1,54 @@
    +   * @return \Drupal\media\OEmbed\Provider|bool
    +   *   Returns a single provider if found, otherwise FALSE.
    +   *
    +   * @see \Drupal\media\ProviderDiscoveryInterface::get()
    +   */
    +  public function getProviderByUrl($url);
    

    Interesting, why is this one returning a FALSE instead of throwing an exception?

  13. +++ b/core/modules/media/src/OEmbed/ProviderException.php
    @@ -0,0 +1,51 @@
    +  public function __construct($message, Provider $provider = NULL, \Exception $previous = NULL) {
    +    $variables = [
    ...
    +   * @return \Drupal\media\OEmbed\Provider
    +   *   The information about the oEmbed provider which caused the exception.
    +   */
    +  public function getProvider() {
    +    return $this->provider;
    +  }
    

    Why can $provider be empty?

  14. +++ b/core/modules/media/src/OEmbed/ResourceFetcher.php
    @@ -0,0 +1,207 @@
    +    // If the response is XML, magically convert it to JSON.
    +    if (strpos($format[0], 'text/xml') !== FALSE) {
    +      // Load XML into SimpleXMLElement.
    +      $content = simplexml_load_string($content, "SimpleXMLElement", LIBXML_NOCDATA);
    +      // Convert the SimpleXMLElement to JSON.
    +      $content = Json::encode($content);
    +    }
    

    That is good to know!

  15. +++ b/core/modules/media/src/OEmbed/ResourceFetcherInterface.php
    @@ -0,0 +1,43 @@
    +/**
    + * Interface for oEmbed service.
    + */
    +interface ResourceFetcherInterface {
    

    It looks like this documentation is out of date.

  16. +++ b/core/modules/media/src/Plugin/Field/FieldFormatter/OEmbedFormatter.php
    @@ -0,0 +1,210 @@
    +        $this->messenger->addError($this->t("Could not retrieve the remote URL."));
    ...
    +        $this->messenger->addError($this->t("oEmbed type couldn't be identified."));
    

    I'd actually log the error instead. In an API first world this isn't helpful. Also, the enduser doesn't care about that

  17. +++ b/core/modules/media/tests/src/Functional/FieldFormatter/OEmbedFormatterTest.php
    @@ -0,0 +1,81 @@
    +class OEmbedFormatterTest extends BrowserTestBase {
    

    It doesn't feel like we would actually need a browser test for that. Can't you trigger rendering the formatter directly and save a bit of setup code?

  18. +++ b/core/modules/media/tests/src/Functional/OEmbedProviderDiscoveryTest.php
    @@ -0,0 +1,107 @@
    +   * @dataProvider dataProviderNonExistingProviderDatabase
    +   */
    +  public function testNonExistingProviderDatabase($providerUrl, $exceptionClass, $exceptionMessage) {
    

    Nice!

  19. +++ b/core/modules/media/tests/src/Functional/OEmbedProviderDiscoveryTest.php
    @@ -0,0 +1,107 @@
    +
    

    Nitpick, let's drop this empty line

I haven't been able to use a YouTube link yet - Vimeo seems to work fine, but I got an "The given URL does not match any known oEmbed providers." error when using any YouTube video URL.

I had a look, for some reason, $schemes is empty for youtube:

Array ( [0] => Drupal\media\OEmbed\Endpoint Object ( [url:protected] => http://www.youtube.com/oembed [schemes:protected] => Array ( ) [formats:protected] => Array ( ) [supportsDiscovery:protected] => 1 ) )

I also naively tried things out on vimeo: I went up using https://vimeo.com/channels/staffpicks/251853214, which didn't work either :( Here is an idea: Could we do something more clever here: Show some message like this: "It looks like you are trying to provide a youtube video, these are the allowed schemes ...". In order to do that we could use a String distance function and find the most likely matching one.

Why is there no "Name" field for this provider? Would users ever want to override the automatically set label coming from the oEmbed source?

Interesting point. To be honest this feels like something which could be put into the configuration level, but it doesn't have to be the in the UI itself, given it reads, at least for me, like an edge case. Most people are fine with using the name of the provider they are given by the provider itself.

chr.fritsch’s picture

Thanks again for reviewing. Will work on that

tstoeckler’s picture

Read through the patch except for the test coverage and the exported config.

  1. +++ b/core/modules/media/media.module
    @@ -93,9 +96,48 @@ function media_theme_suggestions_media(array $variables) {
    +function media_field_widget_string_textfield_form_alter(array &$element, FormStateInterface $form_state, array $context) {
    

    So, instead of altering this would it not make sense to provide a dedicated widget with this description and then do all those various checks in ::isApplicable()

    This would also be nice if we want to improve the input of oembed fields in the future, i.e. provide a preview, or do some magic input processing via JS or whatever. Then it would be nicer to have a proper widget instead of a form alter. What do you think?

  2. +++ b/core/modules/media/media.module
    @@ -93,9 +96,48 @@ function media_theme_suggestions_media(array $variables) {
    +  /** @var \Drupal\media\MediaInterface|OEmbedInterface $media */
    

    The |OEmbedInterface is wrong, I think.

  3. +++ b/core/modules/media/media.module
    @@ -93,9 +96,48 @@ function media_theme_suggestions_media(array $variables) {
    +  $source = $media->getSource();
    

    This is going to fatal for non-media entities using the string widget.

  4. +++ b/core/modules/media/src/MediaSourceBase.php
    @@ -301,7 +301,9 @@ public function createSourceField(MediaTypeInterface $type) {
    -    $base_id = 'field_media_' . $this->getPluginId();
    ...
    +    $base_id = 'field_media_' . $this->getDerivativeId();
    

    So the question to me is whether we use the base plugin ID or the derivate ID or both. I.e. the resulting ID could be "field_media_oembed", "field_media_youtube" or "field_media_oembed_youtube". You chose the second one here, but I'm whether the first one actually wouldn't be more sensible? That would promote more re-use of the field data.

  5. +++ b/core/modules/media/src/OEmbed/ProviderDiscovery.php
    @@ -0,0 +1,145 @@
    +  const CACHE_LIFETIME = 604800;
    

    Generally I think we would call this "MAX_AGE" in core.

  6. +++ b/core/modules/media/src/OEmbed/ProviderDiscovery.php
    @@ -0,0 +1,145 @@
    +      $name = $provider['provider_name'];
    +      $keyed_providers[$name] = new Provider($provider['provider_name'], $provider['provider_url'], $provider['endpoints']);
    

    Should we validate, that $provider has the required keys?

  7. +++ b/core/modules/media/src/OEmbed/ProviderException.php
    @@ -0,0 +1,51 @@
    +    $message = (string) new FormattableMarkup($message, $variables);
    

    Let's not invoke the markup component, can we not just use string concatenation or strstr() or something?

  8. +++ b/core/modules/media/src/OEmbed/ResourceFetcher.php
    @@ -0,0 +1,207 @@
    +    // Let other modules alter the query string, because some oEmbed providers
    +    // are providing extra parameters. For example, Instagram also supports the
    +    // 'omitscript' parameter.
    +    $this->moduleHandler->alter('oembed_resource_url', $parsed_url);
    

    Should we give some context to modules doing this, then? I.e. the provider, for example?

  9. +++ b/core/modules/media/src/Plugin/Validation/Constraint/OEmbedProviderConstraintValidator.php
    @@ -0,0 +1,96 @@
    +class OEmbedProviderConstraintValidator extends ConstraintValidator implements ContainerInjectionInterface {
    

    Why can we not use the AllowedValuesConstraintValidator with an allowed values callback?

  10. +++ b/core/modules/media/src/Plugin/media/Source/OEmbed.php
    @@ -0,0 +1,348 @@
    +    $ready = file_prepare_directory($directory, FILE_CREATE_DIRECTORY | FILE_MODIFY_PERMISSIONS);
    +    if (!$ready) {
    

    Can we inline the call instead of having the extra $ready variable? That would be more standard with how file_prepare_directory() is used elsewhere.

chr.fritsch’s picture

Status: Needs work » Needs review
FileSize
76.39 KB
45.27 KB

I fixed most of the things from #110

Some notes to that:
1. Yes, but changing that would be an BC break, right?
6. The provider URL is not relevant for us. Endpoint URLs are much more important.
11. I don't think so. getAll() throws ProviderException if it's not possible to retrieve anything from the oEmbed API. get() throws InvalidArgumentException if someone tries to get a non-existent provider.
13. Because we also throw this exception during the fetching of oEmbed data from the API. At that point there are no instanciated providers.
17. I did that corresponding to FileAudioFormatterTest

I also did a big refactoring. The YouTube thing didn't worked, because in OEmbedProviderConstraintValidator we only checked the URLs against the schemes. YouTube has no schemes, but provides discovery. So getProviderByUrl() has to use the discovery as well. Which leads to use the resource fetcher inside the provider discovery.
So I reorganized the things a bit. The ProviderDiscovery is now a ProviderCollecter, becuase thats everything it does. The ResourceFetcher does now only fetching resources. And the new OEmbedManager (not satisfied with this name), has getProviderByUrl and getResourceUrl.

Didn't looked at #112. Thats the next thing on my list

phenaproxima’s picture

Status: Needs review » Needs work
  1. +++ b/core/modules/media/media.module
    @@ -93,9 +96,39 @@ function media_theme_suggestions_media(array $variables) {
    +  if ($media->getSource() instanceof OEmbedInterface) {
    +    $provider_id = $media->getSource()->getMetadata($media, 'provider_id');
    

    Can we call $media->getSource() once and reuse the return value?

  2. +++ b/core/modules/media/media.module
    @@ -93,9 +96,39 @@ function media_theme_suggestions_media(array $variables) {
    +  if ($source->getSourceFieldDefinition($media->bundle->entity)->getName() != $field_definition->getName()) {
    

    We should use !== here.

  3. +++ b/core/modules/media/media.services.yml
    @@ -2,9 +2,17 @@ services:
    +  media.oembed.oembed_manager:
    +    class: Drupal\media\OEmbed\OEmbedManager
    +    arguments: ['@media.oembed.provider_collector', '@media.oembed.resource_fetcher', '@http_client', '@module_handler', '@cache.default']
    

    Can we rename this service media.oembed_manager?

  4. +++ b/core/modules/media/src/MediaSourceBase.php
    @@ -301,7 +301,9 @@ public function createSourceField(MediaTypeInterface $type) {
    +    // Some media sources are using a deriver, so their plugin IDs are
    +    // containing a ':' which is not allowed for field names.
    +    $base_id = 'field_media_' . $this->getDerivativeId();
    

    If the plugin is not a derivative, won't getDerivativeId() return NULL?

  5. +++ b/core/modules/media/src/OEmbed/Endpoint.php
    @@ -0,0 +1,152 @@
    +    if (!empty($this->formats)) {
    +      // If the endpoint specifies multiple formats we use the fist one.
    +      $format = reset($this->getFormats());
    

    If we're going to use $this->formats in the empty() check, we might as well use it in the reset() call as well.

  6. +++ b/core/modules/media/src/OEmbed/Endpoint.php
    @@ -0,0 +1,152 @@
    +  public function supportsDiscovery() {
    +    return $this->supportsDiscovery;
    +  }
    

    For consistency, let's cast this to bool.

  7. +++ b/core/modules/media/src/OEmbed/Endpoint.php
    @@ -0,0 +1,152 @@
    +   * Tries to match an URL against the endpoint schemes.
    

    Nit: "an" should be "a".

  8. +++ b/core/modules/media/src/OEmbed/Endpoint.php
    @@ -0,0 +1,152 @@
    +   * @return bool
    +   *   TRUE wether URL matches against the endpoint schemes, otherwise FALSE.
    

    "TRUE if the URL matches..."

  9. +++ b/core/modules/media/src/OEmbed/OEmbedManager.php
    @@ -0,0 +1,202 @@
    +/**
    + * Class OEmbedManager.
    + */
    +class OEmbedManager implements OEmbedManagerInterface {
    

    Doc comment needs to be fleshed out.

  10. +++ b/core/modules/media/src/OEmbed/OEmbedManager.php
    @@ -0,0 +1,202 @@
    +      if (empty($provider_info->getEndpoints())) {
    +        continue;
    +      }
    

    This is redundant and can be removed.

  11. +++ b/core/modules/media/src/OEmbed/OEmbedManager.php
    @@ -0,0 +1,202 @@
    +        if (empty($endpoint->getSchemes())) {
    +          continue;
    +        }
    

    This is also redundant and can be removed.

  12. +++ b/core/modules/media/src/OEmbed/OEmbedManagerInterface.php
    @@ -0,0 +1,44 @@
    +   * @throws \Drupal\media\OEmbed\ResourceException
    +   */
    +  public function getProviderByUrl($url);
    

    We need to explain under which circumstances the ResourceException can be thrown.

  13. +++ b/core/modules/media/src/OEmbed/OEmbedManagerInterface.php
    @@ -0,0 +1,44 @@
    +   * Tries to match the URL against the database of oEmbed providers.
    +   *
    +   * @param string $url
    +   *   The URL to be tested.
    +   * @param int|null $max_width
    +   *   Max width for the oEmbed resource.
    +   * @param int|null $max_height
    +   *   Max height for the oEmbed resource.
    +   *
    +   * @return bool
    +   *   If a match was found, will return TRUE, otherwise FALSE.
    +   *
    +   * @see \Drupal\media\ResourceFetcherInterface::getProviders()
    +   *
    +   * @throws \Drupal\media\OEmbed\ResourceException
    +   */
    +  public function getResourceUrl($url, $max_width = NULL, $max_height = NULL);
    

    This doc comment seems out of date. Shouldn't getResourceUrl() return a URL? If it should return a bool, we should rename the method.

  14. +++ b/core/modules/media/src/OEmbed/Provider.php
    @@ -0,0 +1,107 @@
    +   * @param string[] $endpoints
    +   *   List of endpoints this provider exposes.
    +   *
    +   * @throws \Drupal\media\OEmbed\ProviderException
    +   */
    +  public function __construct($name, $url, array $endpoints) {
    

    $endpoints is not an array of strings. Also, we need to document the circumstances under which the ProviderException is thrown.

  15. +++ b/core/modules/media/src/OEmbed/Provider.php
    @@ -0,0 +1,107 @@
    +  public function getEndpoints() {
    +    return $this->endpoints;
    +  }
    

    I think that this method should instantiate the Endpoint objects, not the constructor.

  16. +++ b/core/modules/media/src/OEmbed/Provider.php
    @@ -0,0 +1,107 @@
    +   * Builds the endpoint URL and returns it.
    

    "Builds and returns the endpoint URL."

  17. +++ b/core/modules/media/src/OEmbed/Provider.php
    @@ -0,0 +1,107 @@
    +   * The oEmbed API could provide multiple endpoints, but never does this.
    +   * So the resource URL will be built from the first available endpoint.
    +   *
    +   * @param string $url
    +   *   The canonical media URL.
    +   *
    +   * @return string
    +   *   URL of the oEmbed endpoint.
    +   */
    +  public function buildResourceUrl($url) {
    

    This seems like it should be part of the Endpoint object, not the Provider object.

  18. +++ b/core/modules/media/src/OEmbed/ProviderCollectorInterface.php
    @@ -0,0 +1,40 @@
    +/**
    + * Interface for OEmbedProviderCollector service.
    + */
    +interface ProviderCollectorInterface {
    

    The doc comment needs to be more detailed.

  19. +++ b/core/modules/media/src/OEmbed/ProviderCollectorInterface.php
    @@ -0,0 +1,40 @@
    +   * Returns an array where each provider is represented as a provider object
    +   * element. Each individual element is keyed by the provider's name, and has
    

    Should be rephrased a bit: "Returns an array of Provider objects, keyed by provider name." Also, let's make this the @return description.

  20. +++ b/core/modules/media/src/OEmbed/ProviderCollectorInterface.php
    @@ -0,0 +1,40 @@
    +   * @throws \Drupal\media\OEmbed\ProviderException
    +   */
    +  public function getAll();
    

    We need to document the circumstances of the ProviderException.

  21. +++ b/core/modules/media/src/OEmbed/ResourceFetcherInterface.php
    @@ -0,0 +1,24 @@
    +   * @return array
    +   *   Resource information as returned from the oEmbed endpoint, or FALSE if
    +   *   the resource could not be fetched.
    +   *
    +   * @throws \Drupal\media\OEmbed\ResourceException
    +   */
    +  public function fetchResource($endpoint_url);
    

    Two things: 1) Document the circumstances of the ResourceException; and 2) Let's add a @see or @link to the expected structure of the oEmbed resource.

  22. +++ b/core/modules/media/src/Plugin/Field/FieldFormatter/OEmbedFormatter.php
    @@ -0,0 +1,219 @@
    +      catch (ResourceException $exception) {
    +        $this->logger->error("Could not retrieve the remote URL.");
    +        continue;
    +      }
    

    We should log the actual URL as well, plus additional information (HTTP status code? Response content?) if we can get it.

  23. +++ b/core/modules/media/src/Plugin/media/Source/OEmbed.php
    @@ -0,0 +1,349 @@
    +    $allowed_providers = $this->getSupportedProviders();
    +    if (!empty($this->configuration['allowed_providers'])) {
    +      $allowed_providers = $this->configuration['allowed_providers'];
    +    }
    +    return $allowed_providers;
    

    Nit: We can shorten this to return $this->configuration['allowed_providers'] ?: $this->getSupportedProviders();

  24. +++ b/core/modules/media/src/Plugin/media/Source/OEmbedInterface.php
    @@ -0,0 +1,28 @@
    +/**
    + * Interface for OEmbed source.
    + */
    +interface OEmbedInterface extends MediaSourceFieldConstraintsInterface {
    

    Can we improve the doc comment here?

  25. +++ b/core/modules/media/tests/src/FunctionalJavascript/MediaSourceRemoteVideoTest.php
    --- /dev/null
    +++ b/core/profiles/standard/config/optional/core.entity_form_display.media.remote_video.default.yml
    

    I think we should move all of this default configuration into a follow-up issue.

chr.fritsch’s picture

Status: Needs work » Needs review
FileSize
78.33 KB
12.25 KB

Ok, in this patch I addressed #112

I am still on it to address #114

chr.fritsch’s picture

tstoeckler’s picture

Awesome work! I really like the new setup with OEmbedManager/ProviderCollector/ResourceFetcher. I really found almost only nitpicks, the whole patch is very nice to read now. Yay!

  1. +++ b/core/modules/media/media.api.php
    @@ -20,6 +20,18 @@ function hook_media_source_info_alter(array &$sources) {
    +function hook_oembed_resource_url_alter(array &$parsed_url, \Drupal\media\OEmbed\Provider $provider) {
    

    Nice!

  2. +++ b/core/modules/media/media.module
    @@ -13,6 +13,8 @@
    +use Drupal\media\MediaInterface;
    

    This is unused.

  3. +++ b/core/modules/media/src/OEmbed/OEmbedManager.php
    @@ -0,0 +1,191 @@
    +        $this->urlCache[$url] = $cached->data;
    ...
    +    if ($url instanceof Url) {
    +      $url = $url->toString();
    +    }
    

    I guess the ->toString() should come before using it as an array key.

  4. +++ b/core/modules/media/src/Plugin/Field/FieldWidget/OEmbedWidget.php
    @@ -0,0 +1,53 @@
    + *     "string"
    + *   }
    

    Super-nitpick: Could add a trailing comma.

  5. +++ b/core/modules/media/src/Plugin/Validation/Constraint/OEmbedProviderConstraintValidator.php
    @@ -0,0 +1,91 @@
    +      $this->context->addViolation($this->t('An error occurred while trying to retrieve the oEmbed provider database.'));
    ...
    +      $this->context->addViolation($this->t('The given URL does not match any known oEmbed providers.'));
    ...
    +      $this->context->addViolation($constraint->message, [
    

    Let's add all of these as different messages, i.e. $constraint->unallowedMessage, $constraint->emptyMessage, $constraint->invalidMessage (or something like that).

  6. +++ b/core/modules/media/src/Plugin/media/Source/OEmbed.php
    @@ -0,0 +1,351 @@
    +    $display->getComponent($source_field)['weight'];
    

    I think this is leftover debug.

  7. +++ b/core/modules/media/tests/src/Functional/FieldFormatter/OEmbedFormatterTest.php
    @@ -0,0 +1,81 @@
    +          'uri' => 'https://twitter.com/drupaldevdays/status/935643039741202432',
    

    Nice! ;-)

  8. +++ b/core/modules/media/tests/src/Functional/OEmbedProviderCollectorTest.php
    @@ -0,0 +1,106 @@
    +    $body = $this->getMockBuilder('\GuzzleHttp\Psr7\Stream')
    ...
    +    $response = $this->getMockBuilder('\GuzzleHttp\Psr7\Response')
    ...
    +        'Drupal\media\OEmbed\ProviderException',
    ...
    +        'Drupal\media\OEmbed\ProviderException',
    ...
    +        'Drupal\media\OEmbed\ProviderException',
    ...
    +        'Drupal\media\OEmbed\ProviderException',
    

    Could use ::class for extra credit.

  9. +++ b/core/modules/media/tests/src/Functional/OEmbedProviderCollectorTest.php
    @@ -0,0 +1,106 @@
    +
    

    Superfluous newline.

  10. +++ b/core/modules/media/tests/src/Functional/OEmbedTest.php
    @@ -0,0 +1,63 @@
    +    $cacheBackend = $this->container->get('cache.default');
    ...
    +    $providerCollector = $this->container->get('media.oembed.provider_collector');
    +    $resourceFetcher = $this->container->get('media.oembed.resource_fetcher');
    ...
    +    $cacheBackend = $this->container->get('cache.default');
    ...
    +    $oEmbedProviderDiscovery = $this->container->get('media.oembed.provider_collector');
    +    $providerCollector = $this->container->get('media.oembed.provider_collector');
    

    Let's use snake_case here. Or use lowerCamelCase consistently throughout the class.

  11. +++ b/core/modules/media/tests/src/Functional/OEmbedTest.php
    @@ -0,0 +1,63 @@
    +    $this->assertSame('Let\'s Not Get a Drink Sometime', $resource['title']);
    

    Let's use double quotes here to avoid the escaping.

  12. +++ b/core/modules/media/tests/src/FunctionalJavascript/MediaSourceRemoteVideoTest.php
    @@ -0,0 +1,92 @@
    +  public static $modules = [
    +    'link',
    +  ];
    

    Missing @inheritdoc

  13. +++ b/core/modules/media/tests/src/FunctionalJavascript/MediaSourceRemoteVideoTest.php
    @@ -0,0 +1,92 @@
    +    $provided_fields = [
    +      'type',
    +      'title',
    +      'author_name',
    +      'author_url',
    +      'provider_name',
    +      'provider_url',
    +      'cache_age',
    +      'thumbnail_url',
    +      'thumbnail_local_uri',
    +      'thumbnail_local',
    +      'thumbnail_width',
    +      'thumbnail_height',
    +      'url',
    +      'width',
    +      'height',
    +      'html',
    +    ];
    

    @chr.fritsch told me I should tell him that we should also test that provider_id is returned (I would obviously have never seen this, because I am too focused on finding pointless nitpicks)

  14. +++ b/core/modules/media/tests/src/FunctionalJavascript/MediaSourceRemoteVideoTest.php
    @@ -0,0 +1,92 @@
    +    // Try to create a youtube item, which should not be allowed.
    ...
    +    $page->fillField("Remote video", 'https://clips.twitch.tv/SleepyBoringLapwingRuleFive');
    ...
    +    $assert_session->pageTextContains('The Twitch provider is not allowed.');
    

    s/youtube/Twitch/

  15. +++ b/core/modules/media/src/OEmbed/OEmbedManagerInterface.php
    @@ -0,0 +1,42 @@
    +   * Tries to match the URL against the database of oEmbed providers.
    ...
    +  public function getResourceUrl($url, $max_width = NULL, $max_height = NULL);
    

    This description doesn't really match the name getResourceUrl()

  16. +++ b/core/modules/media/src/OEmbed/OEmbedManagerInterface.php
    @@ -0,0 +1,42 @@
    +   * @return bool
    

    @daniel.bosen points out this is not actually TRUE.

  17. +++ b/core/modules/media/src/OEmbed/ResourceFetcherInterface.php
    @@ -0,0 +1,28 @@
    +   * @see https://oembed.com/#section2
    

    Very nice!

chr.fritsch’s picture

Assigned: chr.fritsch » Unassigned
Status: Needs work » Needs review
FileSize
73.79 KB
12.18 KB

After 14h of heavy hacking around, and fixing more than 100 suggestions, I am done for today. I think we made a good step forward today. Hopefully, tomorrow will be as productive as today 😀 Thank you everyone for continuously reviewing.

chr.fritsch’s picture

More cleanup and documentation changes.

chr.fritsch’s picture

Issue summary: View changes
FileSize
1.62 MB

Updated the video

alexpott’s picture

@chr.fritsch asked me to look at the security considerations. From oembed.com:

When a consumer displays HTML (as with video embeds), there's a vector for XSS attacks from the provider. To avoid this, it is recommended that consumers display the HTML in an iframe, hosted from another domain. This ensures that the HTML cannot access cookies from the consumer domain.

I think we have to make it easy and recommended to set-up your site in the secure way oembed.com recommends.

chr.fritsch’s picture

Just more cleanups

bhanuprakashnani’s picture

Status: Needs review » Reviewed & tested by the community

Patch applies cleanly.

phenaproxima’s picture

Status: Reviewed & tested by the community » Needs review

Thanks for checking it, but this is not quite ready to go. Still needs pretty thorough review and an explicit answer to the security concerns pointed out by @alexpott.

chr.fritsch’s picture

Ok, what about that:

I extended the oEmbedFormatter to define which additional HTML tags should pass the XSS filtering. Then I adjusted the deriver to pass in allowed HTML tags into the oEmbed source. This configuration will then used to configure the formatter in prepareViewDisplay.

chr.fritsch’s picture

Status: Needs work » Needs review
FileSize
75.25 KB
803 bytes

Ok, this obviously not a patch file...

alexpott’s picture

I was asked in slack what I think about filtering using Xss::filter($youtube_html, ['iframe']); and allowing custom tags to be allowed. I still think this is some way away from the oembed's best practice of serving the content from an different domain. I wonder if https://www.owasp.org/index.php/HTML5_Security_Cheat_Sheet#Sandboxed_frames helps us.

alexpott’s picture

Basically we need this:

This ensures that the HTML cannot access cookies from the consumer domain.

chr.fritsch’s picture

FileSize
9.36 KB

So I tried a bit the iFrame with sandbox parameter approach. It would work for us but we would have to add sandbox="allow-same-origin allow-scripts" to get YouTube and Vimeo running.

From https://html.spec.whatwg.org/multipage/iframe-embed-object.html#attr-ifr...

Setting both the allow-scripts and allow-same-origin keywords together when the embedded page has the same origin as the page containing the iframe allows the embedded page to simply remove the sandbox attribute and then reload itself, effectively breaking out of the sandbox altogether.

In my understanding, this would mean our effort is basically pointless 😢

phenaproxima’s picture

@alexpott, @chr.fritsch and I discussed this in Slack today. We came away with some pretty solid conclusions.

To display oEmbed content securely, we need to sandbox it in an iframe which has no access to the Drupal domain's cookies, in order to prevent session hijacking. Core, however, makes this rather difficult because session cookies are, by default, accessible to both the main domain and any of its subdomains. (See \Drupal\Core\Session\SessionConfiguration::getCookieDomain()).

So the first thing that needs to happen is that #2522002: Do not strip www. from cookie domain by default because that leaks session cookies to subdomains needs to be revisited (#2347877: Move DrupalKernel::initializeCookieGlobals() into a SessionConfiguration service was also cited as related literature). Session cookies should probably not apply to all subdomains, but that is potentially quite complicated. In any event, @alexpott determined that this work does not need to block Media, because we came up with a strategy for how to proceed...

With the current oEmbed architecture proposed in this patch, the ideal way to display oEmbed content securely is to show it in an iframe that is served from the same Drupal site, on an alternate domain (not a subdomain). So Media will:

  1. Add a controller whose only purpose is to display oEmbed content in an iframe. The oEmbed formatter will display video and rich content in an iframe using this controller.
  2. In the name of getting things to "just work", the oEmbed formatter will by default display the iframe on the main domain, which NOT secure. So we will pepper the UI with warnings and hook_help() documentation explaining why this is not secure.
  3. In order to give site builders something actionable they can do about this, we will add a new option to the formatter which allows the site builder to specify the domain from which to serve the iframe. Therefore, if they have an alternate domain that can serve the oEmbed content securely, they can go ahead and use that.
  4. Once core changes its session cookies such that they are inaccessible to subdomains, we will still have warnings and documentation in the UI, but we will encourage people to set up a subdomain for serving oEmbed content securely, rather than hoping they have an alternate domain. This will make it possible and easy for more people to set up oEmbed support in a secure way, since it's easier and cheaper to have a subdomain than an alternate domain.

@alexpott gave us this handy guide to determine if we're showing oEmbed content securely:

no alternative domain != secure
sub domain but cookie starts with a dot != secure
sub domain but cookie is specific = secure
completely different domain = secure

So, in essence: although we are going to allow people to do something that's not very secure, we are going to make it very obvious and easy to configure it in a secure manner.

Another snippet which came out of this conversation is the possibility of not supporting the http scheme for any oEmbed endpoint, at least not by default. There is no longer any good reason to be using HTTP instead of HTTPS in general, and no serious oEmbed provider should be preferring HTTP over HTTPS at this point. So it seems reasonable to force all endpoints to use the HTTPS scheme, or maybe not support providers which don't support secure endpoints. This could and probably should be done in a follow-up issue, though.

alexpott’s picture

So I've read through #2522002: Do not strip www. from cookie domain by default because that leaks session cookies to subdomains again and a patch was committed to 8.x for that so if you force to www on your domain your cookie domain will be .www.example.com</com>. I'm not sure why we are prefixing the dot though. But the good news is that in most sensible site configurations we'll be subdomain secure - apart from where you force <code>www.example.com to example.com

chr.fritsch’s picture

Ok, I started to implement the proposal from #132.

What I did so far:

  1. Implemented a controller to show an oEmbed resource
  2. The formatter now renders an iFrame for video and rich media
  3. The formatter can receive a domain and will then call the oEmbed route based on that domain

Still missing is the extensive warning and I suppose that some tests will fail now.

I will continue tomorrow morning on that.

tstoeckler’s picture

Wow, it's really cool to be surrounded by such smart people in this community. Very impressive research and solution-finding by all of you!

Some thoughts, all but 4. are just nitpicks, though.

  1. +++ b/core/modules/media/src/Controller/OEmbedController.php
    @@ -0,0 +1,66 @@
    + * Class OEmbedController.
    

    Our descriptions often aren't really meaningful, but I think we can do just a little bit better ;-)

  2. +++ b/core/modules/media/src/Controller/OEmbedController.php
    @@ -0,0 +1,66 @@
    +class OEmbedController extends ControllerBase {
    

    I think this is not needed.

  3. +++ b/core/modules/media/src/Controller/OEmbedController.php
    @@ -0,0 +1,66 @@
    +    $response = \Drupal::service('renderer')->render($element);
    

    I think the renderer should be injected.

  4. +++ b/core/modules/media/src/Controller/OEmbedController.php
    @@ -0,0 +1,66 @@
    +    return new Response($response);
    

    I was going to ask why you are not just returning the render array directly and let the request processing system render that. The answer, I think, is that if you were to do that the oEmbed HTML would be "decorated" by all the usual blocks, page template, ...

    I then thought that this could also be achieved by using a _wrapper_format in the URL that is requested in the formatter. Not sure if we already have one that just renders the controller result without any wrapper markup but I guess modals/dialogs have a similar problem, so maybe it would actually work.

    On the other hand, the resulting HTML that will be output by this template actually includes an <html> and <body> tag already, so if we were to utilize such a wrapper format, the resulting HTML on a page without that wrapper format would actually be invalid due to the nested HTML tags. So perhaps this is the best way to go.

    Maybe someone more into the request processing system (@Wim Leers ?!) can comment. Also, if we do stick with the manual rendering here could probably use a little code comment.

  5. +++ b/core/modules/media/src/Plugin/Field/FieldFormatter/OEmbedFormatter.php
    @@ -152,20 +164,26 @@ public function viewElements(FieldItemListInterface $items, $langcode) {
    +            $url = $this->urlGenerator->generateFromRoute('media.oembed.render', ['url' => $resource_url], ['base_url' => $this->getSetting('iframe_domain')]);
    

    Instead of using the URL generator, can't we just do $url->setOption('base_url', ...)? That would also avoid having to inject it.

  6. +++ b/core/modules/media/src/Plugin/Field/FieldFormatter/OEmbedFormatter.php
    @@ -152,20 +164,26 @@ public function viewElements(FieldItemListInterface $items, $langcode) {
    +            '#template' => '<iframe src="{{ url }}" frameborder="0" scrolling="false" allowtransparency="true" width="{{ width}}" height="{{ height }}"></iframe>',
    

    Maybe adding a link to the URL @alexpott posted above would be nice, as reasoning for why we are doing this.

  7. +++ b/core/modules/media/src/Plugin/Field/FieldFormatter/OEmbedFormatter.php
    @@ -192,10 +210,10 @@ public function settingsForm(array $form, FormStateInterface $form_state) {
    +      'iframe_domain' => [
    +        '#type' => 'url',
    

    Was about to complain about missing validation of the URL, but the URL element already does that for us. Nice!

  8. +++ b/core/modules/media/templates/media-oembed.html.twig
    @@ -0,0 +1,14 @@
    + * Default theme implementation to display a oEmbed resource.
    

    Super nitpick:
    a -> an

  9. +++ b/core/modules/media/templates/media-oembed.html.twig
    @@ -0,0 +1,14 @@
    +<html>
    +<body style="margin: 0">
    +
    +{{ post|raw }}
    

    Our regular html.html.twig indents the <body> element so we should do that here, as well.

    The {{ post|raw }} should also be indented (twice) and the empty line can be removed, I think.

  10. +++ b/core/modules/media/tests/src/Functional/FieldFormatter/OEmbedFormatterTest.php
    @@ -74,8 +74,8 @@ public function testRender() {
    -    $assert_session->elementContains('css', "blockquote.twitter-tweet > p", 'Save the date for Drupal Developer Days Lisbon 2018');
    +    $assert_session->elementExists('css', "iframe");
    +    $assert_session->elementAttributeContains('css', "iframe", 'src', '/media/oembed?url=https%3A//publish.twitter.com/oembed%3Furl%3Dhttps%253A//twitter.com/drupaldevdays/status/935643039741202432');
    

    You can use $this->getSession()->switchToIFrame() to actually test stuff within the iframe.

chr.fritsch’s picture

Thanks for reviewing again.

Fixed all the mentioned suggestions.

I am not sure if we should store the iFrame URL in the field formatter, or better create a settings page for that, because you probably want to configure the domain just once and not everytime you use the formatter.

samuel.mortenson’s picture

Status: Needs review » Needs work

Reviewing the patch from #136, there appear to be multiple security vulnerabilities:

  1. We're not using iframes in a way that would make us more secure. The iframe loads content from a local path (/media/oembed), which still renders whatever HTML the oembed resource returns. If an oembed contains malicious JS, that would still get executed in the context of a local path and have access to the current user's cookies.
  2. The /media/oembed path has no authentication and doesn't validate the passed-in URL. That means that you can easily trigger XSS with a spoofed oembed response. I've put together a quick POC of this, available to anyone testing this patch by visiting /media/oembed?url=https://cdn.rawgit.com/mortenson/deabebe5ff31e8afb6ced6b3a2a13b62/raw/f1fd7d413f0b0438c69ccf74a525864c40c0d180/oembed.json on their site. The Github gist is available here: https://gist.github.com/mortenson/deabebe5ff31e8afb6ced6b3a2a13b62
  3. simplexml_load_string is used in \Drupal\media\OEmbed\ResourceFetcher::fetchResource and is vulnerable to XXE. The core \Drupal\serialization\Encoder\XmlEncoder should be used instead, which extends the Symfony class which appropriately protects against XXE.
phenaproxima’s picture

Thanks for reviewing, @samuel.mortenson!

\Drupal\media\OEmbed\ResourceFetcher::fetchResource and is vulnerable to XXE. The core \Drupal\serialization\Encoder\XmlEncoder should be used instead, which extends the Symfony class which appropriately protects against XXE.

If we use this, won't Media have a dependency on Serialization? That's not the worst thing in the world, but it would be preferable not to introduce additional dependencies. Is there any chance of moving XmlEncoder into the \Drupal\Component namespace, or maybe just using the Symfony class directly?

samuel.mortenson’s picture

1.

+++ b/core/modules/media/src/OEmbed/OEmbedManager.php
@@ -0,0 +1,183 @@
+  protected function discoverResourceUrl($url) {
+    try {
+      $response = $this->httpClient->get($url);
+    }
+    catch (RequestException $e) {
+      throw new ResourceException('Could not fetch oEmbed resource.', $url, [], $e);
+    }
+
+    $document = Html::load((string) $response->getBody());
+    $xpath = new \DOMXpath($document);
+
+    return $this->findUrl($xpath, 'json') ?: $this->findUrl($xpath, 'xml');

This method also seems spoofable. I didn't create a full POC but managed to create an HTML file that had an oembed link to a JSON file with malicious content - which was fully validated until it was eventually passed to YouTube which didn't know how to handle the oembed. You can debug this locally by putting https://rawgit.com/mortenson/af20b4f7faecb1aa75362e04354ec64f/raw/ccea86... into a remote video source field.

Sources:
https://gist.github.com/mortenson/af20b4f7faecb1aa75362e04354ec64f
https://gist.github.com/mortenson/0d0d746685053b54df07df01e8e78649

We should validate the passed in $url to this method somehow, so that we're not requesting HTML from arbitrary sources. In addition to that, we should validate that the URL returned by this method is supported by a provider - right now we just request it and determine the provider based on the provider_name it contains:

+++ b/core/modules/media/src/OEmbed/OEmbedManager.php
@@ -0,0 +1,183 @@
+    if ($resource_url = $this->discoverResourceUrl($url)) {
+      $resource = $this->resourceFetcher->fetchResource($resource_url);
+      return $this->providerCollector->get($resource['provider_name']);
+    }

Which seems backwards to how this normally works.

2.

+++ b/core/modules/media/src/OEmbed/Endpoint.php
@@ -0,0 +1,168 @@
+      $regexp = str_replace(['.', '*'], ['\.', '.*'], $scheme);
+      if (preg_match("|$regexp|", $url)) {
+        return TRUE;

Can we use ^ and $ in the regex, for safety reasons? Also, when we convert "*" to ".*", that's going to allow any number of forward slashes as well. With the way this currently works, the scheme http://www.23hq.com/*/photo/* will match a lot of bad URLs like https://mortenson.coffee?foo=http://www.23hq.com/foo/photo/bar

alexpott’s picture

Issue summary: View changes

I've updated the issue summary with some of the security discussions from recent comments so that people don't have to read everything to know what has been considered.

Re XXE and the XML encoder - just create a new \Symfony\Component\Serializer\Encoder\XmlEncoder in our code. Moving the serializer seems unnecessary.

@samuel.mortenson++

alexpott’s picture

I've opened #2940879: Don't automatically set a cookie domain to make example.com's cookie as secure as www.example.com's cookie.

chr.fritsch’s picture

Status: Needs work » Needs review
FileSize
81.07 KB
6.59 KB

This patch should address #137 2/3.

Regarding 1:
For that reason, we are offering the possibility to define a different domain which would render the iframe content. See the field formatter settings for this.

chr.fritsch’s picture

Re XMLEncoder:
I am using Symfony\Component\Serializer\Encoder\XmlEncoder because Wim wants to kill Drupal\serialization\Encoder\XmlEncoder in #2926034: Deprecate 'xml' serialization format in Drupal 10

samuel.mortenson’s picture

@chr.fritsch So to use Oembed securely, Drupal users will have to register a new sub-domain that points to the same site, but does _not_ share cookies? Also, that sub-domain will share cookies by default if the primary domain does not redirect to www.domain.com. How many Drupal sites will actually set this up in production?

samuel.mortenson’s picture

For #145 I don't expect us to solve the oembed security problem, but I'm just trying to get a handle on what it would take to really use this securely. The more I research oembed the more I realize that basically everyone just trusts the HTML response, so the fact that we have a path forward for security is better than most implementations I've seen.

I think we can ignore #137.1 unless someone has a better idea than what we're doing now. I would love to use something like the iframe srcdoc attribute for this but it's not supported in IE11.

chr.fritsch’s picture

Status: Needs work » Needs review
FileSize
81.55 KB
81.55 KB

Re #139.1:
I am not sure how to solve that because not all providers are supporting schemas. That's the reason we also have to try to get a resource when schema fetching doesn't work.

Solved ##13.2

chr.fritsch’s picture

Forgot the interdiff

alexpott’s picture

Status: Needs review » Needs work
+++ b/core/modules/media/src/Plugin/Field/FieldFormatter/OEmbedFormatter.php
@@ -0,0 +1,269 @@
+      'iframe_domain' => [
+        '#type' => 'url',
+        '#title' => $this->t('iFrame domain'),
+        '#default_value' => $this->getSetting('iframe_domain'),
+      ],

This needs a description and an explanation of why using a sub domain is a good idea. Plus I think media should implement a hook_requirements warning that not using a separate iframe_domain is insecure.

Plus it would be great to have reviews on #2940879: Don't automatically set a cookie domain

+++ b/core/modules/media/src/OEmbed/Provider.php
@@ -0,0 +1,88 @@
+  /**
+   * The provider endpoints.
+   *
+   * @var \Drupal\media\OEmbed\Endpoint[]
+   */
+  protected $endpoints = [];
...
+    foreach ($this->endpoints as $endpoint) {
+      $endpoint += ['formats' => [], 'schemes' => [], 'discovery' => FALSE];
+      $endpoints[] = new Endpoint($endpoint['url'], $this, $endpoint['schemes'], $endpoint['formats'], $endpoint['discovery']);
+    }

This isn't an array of endpoint objects. It's an array of endpoint info stored in an array.

chr.fritsch’s picture

Status: Needs work » Needs review
FileSize
86.47 KB
12.4 KB

I talked to @phenaproxima and we came to the conclusion that the iFrame domain should be stored globally and not on formatter level. If you did the setup once, you want to use that in every formatter.

For that reason, I created a new settings form for media.

samuel.mortenson’s picture

Minor review from today:

  1. +++ b/core/modules/media/src/Controller/OEmbedController.php
    @@ -0,0 +1,106 @@
    +    catch (\Exception $exception) {
    +      $response = '';
    +    }
    

    Can we handle this more gracefully? We throw a number of specific exceptions with messages that could be valuable for end-users. If every exception also sets a Drupal message I suppose that's OK as well.

  2. +++ b/core/modules/media/src/OEmbed/ProviderException.php
    @@ -0,0 +1,46 @@
    +  /**
    +   * Gets the provider information, if available.
    +   *
    +   * @return \Drupal\media\OEmbed\Provider
    +   *   The information about the oEmbed provider which caused the exception.
    +   */
    +  public function getProvider() {
    +    return $this->provider;
    +  }
    

    Is this ever used? If not, I think we can remove it.

I also wanted to re-phrase #146 and say that my prior security concerns have been addressed.

dawehner’s picture

This is some amazing piece of work!

  1. +++ b/core/modules/media/src/Controller/OEmbedController.php
    @@ -0,0 +1,106 @@
    +    if (!($url = $request->get('url'))) {
    

    $request->get supports both POST and GET. I guess oembed just uses GET, so let's use $request->query->get() here.

  2. +++ b/core/modules/media/src/Controller/OEmbedController.php
    @@ -0,0 +1,106 @@
    +
    +    // Return a new response object to not get all the Drupal blocks around the
    +    // post.
    +    return new Response($response);
    

    Is there a reason we don't provide a cacheable response?

  3. +++ b/core/modules/media/src/Form/SettingsForm.php
    @@ -0,0 +1,96 @@
    +    if (!$this->oEmbedManager->isOEmbedUrlSecure($config->get('iframe_domain'))) {
    +      $message = $this->t('Rendering oEmbed content on the same domain is considered to be insecure.');
    +      $this->messenger()->addWarning($message);
    +    }
    

    I'm curious whether we could link to an article maybe, explaining why this is the case.

  4. +++ b/core/modules/media/src/OEmbed/OEmbedManager.php
    @@ -0,0 +1,196 @@
    +  public function isOEmbedUrlSecure($oEmbedUrl) {
    +    if (!$oEmbedUrl) {
    +      return FALSE;
    +    }
    +    $iframe = parse_url($oEmbedUrl);
    +    global $base_url;
    +    $system = parse_url($base_url);
    +    return $iframe['host'] !== $system['host'];
    +  }
    

    Note: We could use the request 'router.request_context' service instead of $base_url. It has a getCompleteBaseUrl to use.

  5. +++ b/core/modules/media/src/OEmbed/ProviderCollector.php
    @@ -0,0 +1,116 @@
    +  public function getAll() {
    +    $cache_id = 'media:oembed_providers';
    +
    +    $cached = $this->cacheBackend->get($cache_id);
    +    if ($cached) {
    +      return $cached->data;
    +    }
    +
    +    try {
    

    I'm curious whether this method should have a static cache.

  6. +++ b/core/modules/media/src/OEmbed/ResourceFetcherInterface.php
    @@ -0,0 +1,31 @@
    +   *
    +   * @return array
    +   *   Resource information as returned from the oEmbed endpoint, or FALSE if
    +   *   the resource could not be fetched.
    +   *
    +   * @see https://oembed.com/#section2
    ...
    +   */
    +  public function fetchResource($endpoint_url);
    

    I'm curious, could we put this result into its own value object, with nice getters for the common keys?

    type (required)
    The resource type. Valid values, along with value-specific parameters, are described below.
    version (required)
    The oEmbed version number. This must be 1.0.
    title (optional)
    A text title, describing the resource.
    author_name (optional)
    The name of the author/owner of the resource.
    author_url (optional)
    A URL for the author/owner of the resource.
    provider_name (optional)
    The name of the resource provider.
    provider_url (optional)
    The url of the resource provider.
    cache_age (optional)
    The suggested cache lifetime for this resource, in seconds. Consumers may choose to use this value or not.
    thumbnail_url (optional)
    A URL to a thumbnail image representing the resource. The thumbnail must respect any maxwidth and maxheight parameters. If this parameter is present, thumbnail_width and thumbnail_height must also be present.
    thumbnail_width (optional)
    The width of the optional thumbnail. If this parameter is present, thumbnail_url and thumbnail_height must also be present.
    thumbnail_height (optional)
    The height of the optional thumbnail. If this parameter is present, thumbnail_url and thumbnail_width must also be present.

    seems to be a list we could use here.

  7. +++ b/core/modules/media/src/Plugin/Validation/Constraint/OEmbedProviderConstraint.php
    @@ -0,0 +1,38 @@
    + */
    +class OEmbedProviderConstraint extends Constraint {
    
    +++ b/core/modules/media/src/Plugin/Validation/Constraint/OEmbedProviderConstraintValidator.php
    @@ -0,0 +1,116 @@
    +/**
    + * Validates the OEmbedProvider constraint.
    + */
    +class OEmbedProviderConstraintValidator extends ConstraintValidator implements ContainerInjectionInterface {
    

    +1 for adding constraints!

  8. +++ b/core/modules/media/tests/src/Functional/FieldFormatter/OEmbedFormatterTest.php
    @@ -0,0 +1,81 @@
    +class OEmbedFormatterTest extends BrowserTestBase {
    

    It feels like this could be totally a kernel test

phenaproxima’s picture

Apologies for the big interdiff. I did some refactoring, documentation fixes, and renaming of things. I also addressed some of @dawhener's feedback in #152:

  1. Fixed.
  2. Changed to CacheableResponse.
  3. I added a to-do, but let's file a follow-up for that documentation.
  4. Done.
  5. I decided to hold off for now because I'm not sure how much of a benefit that will confer, beyond just reducing the number of database reads.
  6. Punting for now; @chr.fritsch will work on that.
  7. :) I actually merged the two constraints into once, since I realized that they were both really just validating a resource URL.
  8. Deferring for now; let's get everything else locked down (especially the resource value objects) before refactoring the tests.
phenaproxima’s picture

Ugh. Forgot to complete the rename of thumbnails_location to thumbnails_uri.

dawehner’s picture

+++ b/core/modules/media/src/Controller/OEmbedController.php
@@ -94,13 +106,14 @@
-    catch (\Exception $exception) {
+    catch (\Exception $e) {
+      $this->logger->error($e->getMessage());
       $response = '';
     }
 
     // Return a new response object to not get all the Drupal blocks around the
     // post.
-    return new Response($response);
+    return new CacheableResponse($response);

I think in the case of an exception we don't want to cache it. Also not, when you don't assign any cacheabiity metdadata at all, it'll cache forever. Any idea what we could do there?

phenaproxima’s picture

Any idea what we could do there?

Yes. I'd like to have the resource value objects implement CacheableDependencyInterface, and we'll have the CacheableResponse use their cache metadata. But that is blocked on @chr.fritsch's implementation of the value objects.

phenaproxima’s picture

Status: Needs work » Needs review
FileSize
105.75 KB
51.21 KB

Apologies for another gigantic interdiff. It took some doing, but we implemented a Resource value object and made tests pass with it. The next step is to mock HTTP client requests in our tests, since we really can't rely on random junk from Vimeo in order for our tests to pass :)

aheimlich’s picture

  1. +++ b/core/modules/media/src/Controller/OEmbedController.php
    @@ -0,0 +1,129 @@
    +use Drupal\media\OEmbed\Resource\RichResource;
    

    This class doesn't seem to exist.

  2. +++ b/core/modules/media/src/Plugin/media/Source/OEmbed.php
    @@ -0,0 +1,392 @@
    +use Drupal\media\OEmbed\Resource\PhotoResource;
    

    This class also does not seem to exist.

phenaproxima’s picture

Issue tags: +Nashville2018

Per #151, security team concerns have been addressed. Also, we have been hacking heavily on this patch at DrupalCon Nashville. So, adjusting tags to match.

chr.fritsch’s picture

So I reworked the tests and added a bunch of fixtures to simulate oEmbed requests

phenaproxima’s picture

You're a rockstar, dude. That is amazing work -- totally removes any external Internet-based dependency from our tests, even the provider database at oembed.com. WOW!

I did a couple of minor changes, plus a bit of refactoring on the tests -- namely, I removed any need to mock the HTTP client service :) Trying to wrap a mock object around a real object, especially an object that has as many meta-methods as the HTTP client, is utterly painful with Prophecy. This way should be a lot easier.

This is awesome teamwork. I have a feeling that this is very close to ready. Don't be put off by the size of the interdiff; I just renamed a few files, and git is being stupid about it.

Also, really removing the security review tag this time, as per #161.

phenaproxima’s picture

Issue tags: +Needs usability review

Oh, and: we briefly discussed this patch in a Media Initiative meeting in Nashville. We explained how we are warning people about the insecure nature of third-party embeds (unless in a properly sandboxed iFrame), and @xjm asked us to run our solution past the usability folks. So I'm tagging this for usability review.

bkosborne’s picture

Coming into this real late. I scanned through the comments but didn't read them all carefully, but here's two concerns I have:

1) I get why the iframe solution is needed, and I've often balked at how oembed encourages people to blindly trust data from a third party. But even so, putting everything into an iframe seems to be really bad for UX, no? Iframes make styling much more difficult, especially when dealing with responsiveness. Also, many oembed providers will themselves just return the markup for an iframe, so we'd be embedding an iframe in an iframe in those cases. I think this is something you've all probably thought about, but there's really no good solution, and I don't have one, I just haven't seen this concern explicitly mentioned so here it is.

2) All the Guzzle requests to a 3rd party need to define network/response timeouts. Right now they're just defaulting to whatever Guzzle has as the default, which I think is 30 or 60 seconds. Imagine this scenario: You have an article on a site with 15 twitter or instragram media embeds. This is a popular article. Caches are rebuilt on your site (maybe a code deploy), and suddenly you have X amount of people hitting this page in a cold cache scenario. Every visit to this page results in 15 additional hits to Drupal, each of which is blocked on network I/O waiting for oembed data. If the oembed endpoint is down, or just slow, this can easily lock up your site.

I wrote about that concern in issue queues for the popular media entity provider modules, like #2910292: Performance issue if Instagram API is down. I think a good solution would be to introduce a configurable timeout, maybe 5 seconds, and also institute a "backoff" functionality, where if a certain amount of requests fail within X amount of time, all future requests won't even be attempted for Y number of seconds.

phenaproxima’s picture

All the Guzzle requests to a 3rd party need to define network/response timeouts. ... If the oembed endpoint is down, or just slow, this can easily lock up your site.

This is a valid concern for sure and I think it's one we can address.

This is a great idea:

I think a good solution would be to introduce a configurable timeout, maybe 5 seconds, and also institute a "backoff" functionality, where if a certain amount of requests fail within X amount of time, all future requests won't even be attempted for Y number of seconds.

However, it sounds like something that should be implemented upstream in Guzzle, if it's not already.

Any concerns about the iframe, though, I'd rather deal with in a follow-up. This patch is big and complex enough and we should not do anything to increase the scope. Security is the top priority, and what we came up with is good enough for jazz in that department. IMHO the surrounding issues should be secondary for the time being, but I defer to the UX team for the final word on that.

chr.fritsch’s picture

So let's give that another push. I also tested it heavily and I am feeling really good about it now.

ckrina’s picture

Issue tags: -Needs usability review

We need make it simple to configure a different domain to serve the oembed content from and we need to have sufficient security warnings to inform users that their configuration is not secure.

Just reviewed this with @chr.fritsch and @phenaproxima and here are some suggestions:

I’d be handful to let users know about this when they are creating the new media type in a non-intrusive nor scaring way. Just briefly inform them about the recommended step before “they create a problem that needs to be resolved” and ideally point them to the next actionable step for that.

I wouldn’t use the warning styles for that because they haven’t done anything wrong and it could be confusing having it inside a form, it might make them feel like they’re doing something wrong. So I’d recommend adding a really short text with a link to the actionable step only when a oembed is selected.

I’d also keep the warning message in the status page and in the config page until the domain is added.

Just removing the 'Needs usability review' for now, but feel free to add it again.

chr.fritsch’s picture

Thanks @ckrina

So in this patch, I implemented all the suggestions that were raised by the usability review.

samuel.mortenson’s picture

Status: Needs review » Needs work

Some quick review of the value object. Still need to read through the entire patch (again).

  1. +++ b/core/modules/media/src/Controller/OEmbedController.php
    @@ -93,27 +97,33 @@
    +      if ($resource instanceof RichResource) {
    +        // Render the content in a new render context so that cacheable metadata
    +        // will bubble up correctly.
    +        $content = $this->renderer->executeInRenderContext(new RenderContext(), function () use ($resource) {
    +          $element = [
    +            '#theme' => 'media_oembed',
    +            '#post' => [
    +              '#type' => 'inline_template',
    +              '#template' => $resource->getHtml(),
    +            ],
    +          ];
    +          return $this->renderer->render($element);
    +        });
    +      }
    

    Why are we using inline_template here? Doesn't this mean that OEmbed responses could return Twig template responses? :)

  2. +++ b/core/modules/media/src/OEmbed/Resource.php
    @@ -0,0 +1,335 @@
    +  protected function setType($type) {
    

    Missing documentation block.

  3. +++ b/core/modules/media/src/OEmbed/Resource.php
    @@ -0,0 +1,335 @@
    +    if (in_array($type, ['link', 'rich', 'photo', 'video'], TRUE)) {
    +      $this->type = $type;
    +    }
    

    Why is this list hard-coded? Is that based on a specification somewhere?

  4. +++ b/core/modules/media/src/OEmbed/Resource.php
    @@ -0,0 +1,335 @@
    +  protected function setThumbnailDimensions($width, $height) {
    

    Missing documentation block.

samuel.mortenson’s picture

  1. +++ b/core/modules/media/src/Controller/OEmbedController.php
    @@ -0,0 +1,131 @@
    +      $resource_url = $this->urlResolver->getResourceUrl($url, $request->get('max_width'), $request->get('max_height'));
    

    This should also use $request->query->get.

  2. +++ b/core/modules/media/src/OEmbed/Resource.php
    @@ -0,0 +1,357 @@
    +   * Returns the URL of the resource. Only applies to the 'photo' resource type.
    ...
    +   * Only applies to the 'rich' and 'video' resource types.
    

    Maybe we could have an object representing each kind of resource in the future so they don't all share attributes that may or may not be useful.

  3. +++ b/core/modules/media/src/OEmbed/ResourceFetcher.php
    @@ -0,0 +1,148 @@
    +    $this->useCaches = isset($cache_backend);
    
    +++ b/core/modules/media/src/OEmbed/UrlResolver.php
    @@ -0,0 +1,210 @@
    +    $this->useCaches = isset($cache_backend);
    

    Is $this->useCaches ever used?

  4. +++ b/core/modules/media/src/OEmbed/ResourceFetcher.php
    @@ -0,0 +1,148 @@
    +    list ($format) = $response->getHeader('Content-Type');
    

    Remove the space after list.

  5. +++ b/core/modules/media/src/OEmbed/ResourceFetcher.php
    @@ -0,0 +1,148 @@
    +    $content = (string) $response->getBody();
    ...
    +    if (strstr($format, 'text/xml') || strstr($format, 'application/xml')) {
    ...
    +    elseif (strpos($format, 'text/javascript') === FALSE && strpos($format, 'application/json') === FALSE) {
    

    Is this a valid way to determine the format of a response? I'm wondering if there's an API method for this.

  6. +++ b/core/modules/media/src/OEmbed/UrlResolver.php
    @@ -0,0 +1,210 @@
    +    if ($this->urlCache === NULL) {
    

    Can we set $this->urlCache to an empty array in the constructor to avoid this check?

    Also, if urlCache is only used in this method I wonder if we could remove it completely and instead just check from $this->cacheGet instead.

  7. +++ b/core/modules/media/src/Plugin/Field/FieldFormatter/OEmbedFormatter.php
    @@ -0,0 +1,248 @@
    +        // Render videos and rich content in a iFrame for security reasons.
    +        // @see: https://oembed.com/#section3
    +        $element[$delta] = [
    +          '#type' => 'inline_template',
    +          '#template' => '<iframe src="{{ url }}" frameborder="0" scrolling="false" allowtransparency="true" width="{{ width}}" height="{{ height }}"></iframe>',
    +          '#context' => [
    +            'url' => $url,
    +            'width' => $resource->getWidth(),
    +            'height' => $resource->getHeight(),
    +          ],
    +        ];
    

    Would using #type => html_tag be cleaner than using Twig for this?

  8. +++ b/core/modules/media/src/Plugin/media/Source/OEmbed.php
    @@ -0,0 +1,388 @@
    +    $form['thumbnails_uri'] = [
    +      '#type' => 'textfield',
    +      '#title' => $this->t('Thumbnails location'),
    +      '#default_value' => $this->configuration['thumbnails_uri'],
    +      '#description' => $this->t('Thumbnails will be fetched from the provider for local usage. This is the location where they will be placed.'),
    +    ];
    

    Should this be required? It could be set to an empty string as is, I think.

  9. +++ b/core/modules/media/tests/fixtures/oembed/lets-not-get-a-drink-sometime.json
    @@ -0,0 +1,16 @@
    +  "title": "Let's Not Get a Drink Sometime",
    ...
    +  "html": "<iframe src=\"http:\/\/www.collegehumor.com\/e\/40002870\" width=\"610\" height=\"343\" frameborder=\"0\" webkitAllowFullScreen allowFullScreen><\/iframe>",
    +  "thumbnail_url": "http:\/\/2.media.collegehumor.cvcdn.com\/19\/77\/1f1374e38e1e6552dd57ad12935a442a.jpg",
    
    +++ b/core/modules/media/tests/fixtures/oembed/lets-not-get-a-drink-sometime.xml
    @@ -0,0 +1,19 @@
    +  <html>&lt;iframe src="http://www.collegehumor.com/e/40002870" width="610" height="343" frameborder="0"
    +    webkitAllowFullScreen allowFullScreen&gt;&lt;/iframe&gt;
    +  </html>
    +  <thumbnail_url>http://2.media.collegehumor.cvcdn.com/19/77/1f1374e38e1e6552dd57ad12935a442a.jpg</thumbnail_url>
    

    Just confirming - with this and other example data, no external HTTP requests will ever be sent from tests right?

chr.fritsch’s picture

Status: Needs work » Needs review
FileSize
113.66 KB
6.86 KB

Thank you @samuel.mortenson for reviewing it again!

Fixed #170

Re #171
1. Fixed
2. We had that in the first place and decided then to merge everything into one object because of simplicity. The different types have most of the properties in common and we don't want to over-engineer it. This patch is already quite huge...
3. Yes in UseCacheBackendTrait
4. Fixed
5. I don't know. If there is something we could use I am open to suggestions.
6. Fixed
7. Fixed
8. Makes sense
9. Yes, that's what we are trying to achieve. I ran the tests without an internet connection, found one bug and fixed that.

samuel.mortenson’s picture

Status: Needs review » Reviewed & tested by the community
Issue tags: -Needs tests

Interdiff looks good - I can't find a better API-method for checking the Content-Type header, so I'm going to mark this as RTBC.

alexpott’s picture

Status: Reviewed & tested by the community » Needs work
  1. +++ b/core/modules/media/src/Controller/OEmbedController.php
    @@ -0,0 +1,128 @@
    +    if (!($url = $request->query->get('url'))) {
    

    Let's ensure that url is a string.

  2. +++ b/core/modules/media/src/Controller/OEmbedController.php
    @@ -0,0 +1,128 @@
    +      $resource_url = $this->urlResolver->getResourceUrl($url, $request->query->get('max_width'), $request->query->get('max_height'));
    

    Let's ensure that max_width and max_height are of the expected type.

  3. +++ b/core/modules/media/src/OEmbed/ProviderRepository.php
    @@ -0,0 +1,112 @@
    +    $providers = Json::decode((string) $response->getBody());
    +
    +    if (!is_array($providers) || empty($providers)) {
    +      throw new ProviderException('Remote oEmbed providers database returned invalid or empty list.');
    +    }
    +
    +    $keyed_providers = [];
    +    foreach ($providers as $provider) {
    +      $name = $provider['provider_name'];
    +      $keyed_providers[$name] = new Provider($provider['provider_name'], $provider['provider_url'], $provider['endpoints']);
    +    }
    

    Given the data is coming from a third party I think we need to be a bit more suspicious and do some stronger typing and checking of each value.

  4. +++ b/core/modules/media/src/OEmbed/Resource.php
    @@ -0,0 +1,358 @@
    +/**
    + * Value object representing an oEmbed resource.
    + */
    +class Resource implements RefinableCacheableDependencyInterface {
    
    +++ b/core/modules/media/src/OEmbed/ResourceFetcherInterface.php
    @@ -0,0 +1,30 @@
    +   * @return \Drupal\media\OEmbed\Resource
    ...
    +  public function fetchResource($url);
    

    I think we need to document that the Resource can contain data that should not be trusted. Also I think we should be more strict on the things we accept. Ie. width and height need to be ints.

phenaproxima’s picture

Also I think we should be more strict on the things we accept. Ie. width and height need to be ints.

That is already the case. See Resource::setThumbnailDimensions() and Resource::setDimensions(). If we don't get numeric data, we throw an exception. The Resource class is quite strict.

chr.fritsch’s picture

Assigned: Unassigned » chr.fritsch

I will work on all these things on my flight back from DrupalCon.

I also found out that OEmbedFormatterTest is still calling oEmbed and Twitter API. I will try to prevent that.

chr.fritsch’s picture

Assigned: chr.fritsch » Unassigned
Status: Needs work » Needs review
FileSize
118.01 KB
25.43 KB

Ok, here is a new patch where I fixed all the suggestions from @alexpott.

I also adjusted the OEmbedFormatterTest. Now it's not calling the oEmbed API anymore.

chr.fritsch’s picture

Status: Needs work » Needs review
FileSize
118.03 KB
4.11 KB

Oh, my fault.

Let's try to fix the tests.

alexpott’s picture

  1. +++ b/core/modules/media/src/Controller/OEmbedController.php
    @@ -103,7 +107,7 @@ public function render(Request $request) {
    -      $resource_url = $this->urlResolver->getResourceUrl($url, $request->query->get('max_width'), $request->query->get('max_height'));
    +      $resource_url = $this->urlResolver->getResourceUrl($url, $request->query->getInt('max_width', NULL), $request->query->getInt('max_height', NULL));
    

    I didn't know about getInt() but this is very nice indeed.

  2. +++ b/core/modules/media/src/OEmbed/Endpoint.php
    @@ -65,13 +65,21 @@ class Endpoint {
    +      // If the endpoint specifies multiple formats we use the fist one.
    

    First one?

  3. +++ b/core/modules/media/src/OEmbed/ProviderRepository.php
    @@ -89,8 +90,13 @@ public function getAll() {
    +        // Just skip all the invalid providers.
    

    Should we log?

  4. +++ b/core/modules/media/src/OEmbed/Resource.php
    @@ -8,6 +8,8 @@
    + * Data received from an oEmbed provider could by insecure.
    

    Looks good. I wish there was an annotation like @UserInput or something that denoted stuff is not to be trusted.

  5. +++ b/core/modules/media/src/OEmbed/Resource.php
    @@ -140,8 +144,26 @@ class Resource implements RefinableCacheableDependencyInterface {
    +    if (empty($version) || $version !== '1.0') {
    +      throw new \InvalidArgumentException("Resource version must be '1.0'");
    +    }
    

    Seems a bit strict.

phenaproxima’s picture

Seems a bit strict.

Section 2.3.4 of the oEmbed specification says that all resources must have a version number, and it must be 1.0 (source). I don't see any real reason not to validate that.

chr.fritsch’s picture

Status: Needs work » Needs review
FileSize
118.08 KB
1.29 KB

Addressed @alexpott's review:

2. Fixed
3. But where? I guess that would flood the watchdog table after every cache rebuild.
5. What @phenaproxima said

I also tried to fix the last test. Lets see.

phenaproxima’s picture

I did an in-depth review and fixed a bunch of small things I found. Remaining open questions are all documentation things:

  1. +++ b/core/modules/media/src/OEmbed/Resource.php
    @@ -0,0 +1,357 @@
    + * Data received from an oEmbed provider could by insecure.
    

    We need to expand on this sentence, or remove it entirely. We can't just leave a blithe comment like this one. In what ways might the data be insecure, and how might the threat(s) be mitigated by this object? If we can't explain these things here, we should remove this line.

  2. +++ b/core/modules/media/src/OEmbed/Resource.php
    @@ -0,0 +1,357 @@
    +   * The width of the optional thumbnail.
    

    Is the thumbnail actually optional, according to the spec?

  3. +++ b/core/modules/media/src/OEmbed/Resource.php
    @@ -0,0 +1,357 @@
    +   * The height of the optional thumbnail.
    

    Same question.

  4. +++ b/core/modules/media/src/OEmbed/Resource.php
    @@ -0,0 +1,357 @@
    +   * @param string $thumbnail_url
    +   *   A URL to a thumbnail image representing the resource.
    +   * @param int $thumbnail_width
    +   *   The width of the optional thumbnail.
    +   * @param int $thumbnail_height
    +   *   The height of the optional thumbnail.
    

    Is the thumbnail actually optional? If not, we should remove the word "optional" from the $thumbnail_width and $thumbnail_height descriptions.

phenaproxima’s picture

Fixed my remaining review points.

dawehner’s picture

  1. +++ b/core/modules/media/config/schema/media.schema.yml
    @@ -60,6 +82,20 @@ media.source.video_file:
    +    thumbnails_uri:
    +      type: string
    +      label: 'Thumbnail storage URI'
    

    Note: you can use type: uri here

  2. +++ b/core/modules/media/media.routing.yml
    @@ -24,3 +24,19 @@ entity.media.revision:
    +  requirements:
    +    _permission: 'access content'
    +    _csrf_token: 'TRUE'
    

    _crsf_token seems to be fine for anonyomus users. I was just pondering about that.

  3. +++ b/core/modules/media/src/Controller/OEmbedController.php
    @@ -0,0 +1,135 @@
    +    catch (ResourceException $e) {
    +      // @todo Log additional information from ResourceException, to help with
    +      // debugging.
    +      $this->logger->error($e->getMessage());
    +    }
    

    Should we do something to not cache these results?

  4. +++ b/core/modules/media/src/Plugin/media/Source/OEmbed.php
    @@ -0,0 +1,389 @@
    +      'thumbnail_local_uri' => $this->t('The local URI of the thumbnail'),
    +      'thumbnail_local' => $this->t('The local URL of the thumbnail'),
    

    I'm curious, what would be the advantage of storing the thumbnail locally?

  5. +++ b/core/modules/media/src/Plugin/media/Source/OEmbed.php
    @@ -0,0 +1,389 @@
    +    $domain = $this->configFactory->get('media.settings')->get('iframe_domain');
    +    if (!$this->urlResolver->isSecure($domain)) {
    +      array_unshift($form, [
    +        '#markup' => '<p>' . $this->t('It is potentially insecure to display oEmbed content in a frame that is served from the same domain as your main Drupal site, as this may allow execution of third-party code. <a href=":url">You can specify a different domain for serving oEmbed content here</a>.', [
    +          ':url' => Url::fromRoute('media.settings')->setAbsolute()->toString(),
    +        ]) . '</p>',
    +      ]);
    +    }
    

    I'm curious whether we could somehow instead of a warning ensure that people at least checked a checkbox to know that they are doing something insecure.

phenaproxima’s picture

Issue tags: +Needs followup

Thanks @dawehner!

I'm curious, what would be the advantage of storing the thumbnail locally?

If we don't, we're basically hotlinking to an unreliable external resource every time we want to display the thumbnail. I'll open a follow-up to expose a configuration option on the oEmbed source plugin to toggle whether or not the thumbnails are downloaded.

I'm curious whether we could somehow instead of a warning ensure that people at least checked a checkbox to know that they are doing something insecure.

I'm personally not in favor of that, but in any event it's outside the scope of this patch. I will open a follow-up issue for further discussion and possible implementation.

phenaproxima’s picture

Status: Needs work » Needs review
Issue tags: -Needs followup
FileSize
121.22 KB
2.62 KB

Fixed @dawehner's review points. Tests will still fail, though...not sure what I did to cause that.

Anyway, responding to #187:

  1. Done.
  2. Werd to that?
  3. Done -- set the max-age to 0.
  4. Opened #2962751: Allow site builders to choose whether oEmbed media item thumbnails should be stored locally
  5. Opened #2962753: Expose a way to suppress oEmbed security warnings
chr.fritsch’s picture

Status: Needs work » Needs review
FileSize
121.48 KB
12.41 KB

So I found the reason for the failing test.

+++ b/core/modules/media/src/OEmbed/Endpoint.php
@@ -69,17 +69,12 @@
-    $format = 'json';
-    if (!empty($this->formats)) {
-      // If the endpoint specifies multiple formats we use the first provided
-      // format.
-      $format = reset($this->formats);
-    }
-    $this->url = str_replace('{format}', $format, (string) $url);
-
-    if (!UrlHelper::isValid($this->url, TRUE)) {
-      throw new \InvalidArgumentException('oEmbed endpoint must have a valid URL');
+    if (!UrlHelper::isValid($url, TRUE) || !UrlHelper::isExternal($url)) {
+      throw new \InvalidArgumentException('oEmbed endpoint must have a valid external URL');
     }
+    // Use the first provided format to build the endpoint URL. If no formats
+    // are provided, default to JSON.
+    $this->url = str_replace('{format}', reset($this->formats) ?: 'json', $url);

The format replacement has to be done before checking the URL.

Also, I changed the MediaSourceOEmbedVideoTest to use the oEmbedTestTrait as well.

phenaproxima’s picture

+++ b/core/modules/media/src/Plugin/media/Source/OEmbed.php
@@ -164,9 +177,15 @@ public function getMetadata(MediaInterface $media, $name) {
+            catch (RequestException $exception) {
+              // Return NULL so the default image will be used.

Only one request: let's log the exception message.

Otherwise, I think we have played patch-pong long enough. Any further tweaking can be done in follow-ups. This is a big win for the Media Initiative, a (reasonably) secure implementation of oEmbed, and effectively OOTB support for remote media in Drupal. Fantastic work.

RTBC from me once the complaint above is fixed. Then it's on to @webchick for the moment of truth (WebchickTestCase).

phenaproxima’s picture

Issue summary: View changes

Updating the IS.

chr.fritsch’s picture

robpowell’s picture

I attempted to test #195. Here are the steps taken:

  1. Fresh checkout of 8.6.x (hash #d01bb4f633)
  2. Fresh DB install of 8.6.x drush si -y --account-pass admin
  3. enable Media module
  4. Create Remote video Media type admin/structure/media/add
  5. If the first action on the form is to select Media Source, the ajax will fail because the label is blank but no error is displayed to the user (see attached video)

Filling out the form in this incomplete state will give you the following error:

Error: Call to a member function getName() on null in Drupal\media\MediaSourceBase->prepareFormDisplay() (line 352 of
/var/www/docroot/core/modules/media/src/MediaSourceBase.php)

We should make it so the ajax doesn't fail and the form can be completed correctly (#2932226).

Secondly, when I filled the form out in correct order (ie title then media source), I was able to successfully save a new remote media type. However, there was a "File could not be created " error message and there was an incorrect thumbnail. When selecting this youtube video, I used the short link, https://youtu.be/dQw4w9WgXcQ.

Rick roll error message and incorrect thumbnail

phenaproxima’s picture

Status: Needs review » Needs work

Thanks for testing, @robpowell!

Regarding the first problem, my knee-jerk reaction is to assume that #2557299: Any AJAX call disregards machine name verification when AJAX is used and leads to a fatal error is responsible. I have rerolled that patch for 8.6.x and we should try to land it as quickly as possible.

And as for the second problem...damn, nice find. The thumbnail download code was changed in a recent patch, so it's possible we need to improve our testing around that. So I'm kicking this back for that.

chr.fritsch’s picture

Status: Needs work » Needs review
FileSize
5.12 KB
121.71 KB

Damn it. We had a test for that, but apparently, I broke that as well.

So I fixed that and I found out that we are still calling the API for retrieving a thumbnail image. So I mocked that as well.

chr.fritsch’s picture

Status: Needs work » Needs review
FileSize
122.32 KB
1.86 KB

Since testbot is using a subdirectory, I think we should use baseUrl instead of just scheme and host.

phenaproxima’s picture

Status: Needs work » Needs review
FileSize
128.01 KB
30.91 KB

I did some major refactoring on the tests, so I'm sorry for the big interdiff -- but it works for me locally now. Let's see how Drupal CI fares with it. The nature of the refactoring is so that we can do less HTTP request/response mocking, and more "put stuff into state so that we can actually write useful tests that don't need to do a universe of dirty work in order to properly test that things are working".

phenaproxima’s picture

I demoed this patch in today's UX call. I'm pleased to say that everyone really liked it, and the remaining complaints are all either relatively minor, or punt-able to follow-up issues.

  • The oEmbed formatter doesn't appear to respect the max width and max height settings. I created a Vimeo video, set the formatter to 150x150, and the resulting video (as displayed in an entity reference field) was still larger than those dimensions. I think we need to test this more thoroughly and fix the bug. Fixed in #212.
  • The iFrame stuff is hard to understand, and it should be documented better. We need to implement hook_help() on the media settings form to explain why the iFrame setting exists, and the iFrame domain field itself needs to specifically say that the domain you enter has to point to your Drupal site. Additionally, we we have opened a follow-up issue to validate that the domain does, in fact, point to the Drupal site: #2965979: [PP-1] Validate alternate domain for oEmbed iFrame.
  • Even though it's a potential security hazard, we may want to expose the ability for site builders to configure which oEmbed providers are available. We decided to open a follow-up for discussion and possible implementation, with the security team's blessing: #2965985: Allow site builders to enable oEmbed providers.
  • There is a "URL alias" field directly below the source field for oEmbed videos. This is not Media's fault, but it's especially confusing in this case. So I'm adding #2916809: Add fieldset/vertical tab for URL alias field as a related issue.
phenaproxima’s picture

Status: Needs work » Needs review
FileSize
130.62 KB
10.88 KB

Cleaned up doc blocks in new test stuff, and fixed a couple of small asks from the UX call.

chr.fritsch’s picture

Status: Needs work » Needs review
FileSize
130.62 KB
6.85 KB

That are really nice test improvements. I like the way of mocking the providers.

So I did a couple of things:

  • Fixed the remaining tests
  • Introduced a plugin definition key "source_field_label" and set the label at the correct place. I introduced a new key, because other media sources that support for example Flickr might want to set this to "Image URL" or something else
  • Fixed that the max width and height are taken into account.
phenaproxima’s picture

  1. +++ b/core/modules/media/src/Plugin/media/Source/OEmbed.php
    @@ -390,12 +390,18 @@ public function prepareFormDisplay(MediaTypeInterface $type, EntityFormDisplayIn
    +  public function createSourceField(MediaTypeInterface $type) {
    +    return parent::createSourceField($type)->set('label', $this->pluginDefinition['source_field_label']);
    +  }
    

    Thanks for fixing that :) I'm not sure how I feel about adding it to the plugin definition, though; I'd rather we made it part of the configuration form for *all* media sources. So maybe we should remove this for now, then open a follow-up issue to make this part of MediaSourceBase.

  2. +++ b/core/modules/media/src/Plugin/media/Source/OEmbedDeriver.php
    @@ -20,6 +20,7 @@ public function getDerivativeDefinitions($base_plugin_definition) {
    +        'source_field_label' => t('Video URL'),
    

    If we punt the label work to a follow-up, this can also be reverted.

+++ b/core/modules/media/tests/src/FunctionalJavascript/MediaSourceOEmbedVideoTest.php
@@ -90,7 +87,7 @@ public function testMediaOEmbedVideoSource() {
-    $this->assertFileExists("public://oembed_thumbnails/Vimeo_10311.png");
+    $this->assertFileExists("public://oembed_thumbnails/Vimeo_33797.png");

This is not correct anymore; the problem is, if you look at the JSON file for the Vimeo fixture, I changed the thumbnail URI to internal:/core/misc/druplicon.png. Which resulted in a new filename for the thumbnail.

alexpott’s picture

I wonder if we could achieve slightly cleaner mocking if instead of locking the http client we used an approach similar to https://lornajane.net/posts/2017/testing-api-calls-in-php-with-guzzle-mocks?

chr.fritsch’s picture

Status: Needs work » Needs review
FileSize
130.44 KB
4.37 KB

Crosspost with @alexpott

Addressing #209

1/2. Ok, reverted that change and will open a follow-up for that
3. Since the test refactoring in #202 the resource URL is different on every machine that runs the tests. That also means that the Thumbnail name changes every time. In order to prevent that, we are now using the source field value to generate a thumbnail name. I also changed from provider_name to type, because it's the only field where is guaranteed that it is set.

phenaproxima’s picture

A few more changes in this iteration:

  1. The oEmbed formatter was never really respecting the max_width and max_height settings when rendering resources. I have fixed this and added a test.
  2. The oEmbed thumbnails are now simply named after the UUID of the media item, rather than a hashing scheme.
  3. I reverted the "Video URL" field label change in OEmbed::createSourceField(). I think we should make that configurable on the media type form, and we should do that entirely in #2965988: Make the source field label configurable when creating a media type.
  4. Some minor clean-up, and a few new assertions, in OEmbedFormatterTest.
phenaproxima’s picture

I've opened a follow-up to make the field label configurable when creating a media type: #2965988: Make the source field label configurable when creating a media type.

And with that, all follow-ups needed so far are filed. So I'm removing the tag.

And regarding #210: in a quick Slack chat, @alexpott agreed to defer that work to the existing follow-up for better oEmbed tests (#2964636: Improve oEmbed tests), so I'm copying his comment there.

phenaproxima’s picture

I've added some verbiage explaining the iFrame stuff, as requested by the UX team in #203.

I don't think we need anything else in order for this to be RTBC.

phenaproxima’s picture

@alexpott thought it might be a good idea to have a change record, so I wrote one: https://www.drupal.org/node/2966029.

phenaproxima’s picture

Added test coverage of hook_oembed_resource_url_alter() as requested by @alexpott on Slack. I also added an update function to create the all-important oembed_providers value in media.settings.

phenaproxima’s picture

Tested the update path as requested by @alexpott on Slack, and wrote a little documentation on the OEmbed plugin, explaining how to extend it to support additional providers.

phenaproxima’s picture

Status: Needs work » Needs review

The failures are probably due to infrastructure stress because of the security update that came out a short time ago. Setting back to NR while I wait for the mania to subside, then I'll re-queue for testing.

phenaproxima’s picture

I discussed this briefly with @webchick in Slack and she felt that deferring the source field label changes to #2965988: Make the source field label configurable when creating a media type was going to block this patch. So, after a short stint in the bikeshed, we decided to rename the source field to "Remote video URL", but to do it somewhat generically, such that any additional derivatives of the OEmbed plugin will get a source field labeled "$plugin_label URL" by default. So, in the case of the remote video derivative, the source field is labeled "Remote video URL".

To me, that's a pretty good compromise until we figure out how to make the source field label configurable (which is what the follow-up is for).

tstoeckler’s picture

+++ b/core/modules/media/src/Plugin/media/Source/OEmbed.php
@@ -438,2 +438,14 @@
+    $label = (string) $this->t(':type URL', [
+      ':type' => $plugin_definition['label'],

Let's use "@" instead of ":" for the variable placeholder, though. ":" is just for urls e.g. in < a href = "" situations

Gábor Hojtsy’s picture

Reviewed #214 as requested on slack. I agree on the "Remote video URL" label change that was done later. This text was not changed in later patches yet as per the interdiffs.

  1. +++ b/core/modules/media/media.module
    @@ -59,6 +59,11 @@
    +      $output .= '<p>' . t('In order to mitigate the risks, Drupal always displays third-party assets in an iFrame, which effectively sandboxes any executable code running inside it. For even more security, Drupal can serve the iFrame from an alternate domain (that also points to your Drupal site), which you can configure on this page. This helps safeguard cookies and other sensitive information.') . '</p>';
    

    1. If others add settings to this screen, it will look super odd to have this help at the top and then a bunch of settings that are unrelated to iframes. Putting all of this explanation on the setting may be a bit too long (maybe we can look into cutting it), but I think that would make the page future-proof.

    2. To make sure it is clear to people that we talk about THIS site, IMHO that would be good to include. "Your Drupal site" may not make it clear we are talking about this same site.

  2. +++ b/core/modules/media/src/Form/SettingsForm.php
    @@ -68,13 +68,16 @@
    +      '#description' => $this->t('Enter a different domain from which to serve oEmbed content, including the <em>http://</em> or <em>https://</em> prefix, to prevent this. This domain needs to point back to your Drupal site, or existing oEmbed content may not display correctly.'),
    

    "to prevent this"? That looks like a half written sentence to me :)

    oEmbed content will not display *at all* if this is not set correctly, right?

phenaproxima’s picture

oEmbed content will not display *at all* if this is not set correctly, right?

It sort of depends on how it was set, but it's likely to result in a lot of iframes with 404 errors in them.

Gábor Hojtsy’s picture

I think it's better to be super clear here so people will not fiddle with this setting if their iframe is too wide or too dark or whatever :) "not displayed properly" is a very wide category :)

phenaproxima’s picture

Discussed #225 on Slack with Gábor and we decided that "...not displayed properly, or at all" would be better, clearer wording.

phenaproxima’s picture

Issue tags: +Needs followup

Today, @chr.fritsch and I jumped on a call with @webchick to test the patch out. Here's what we found:

  1. When creating a new media type that uses oEmbed, you are informed about the use of the iframe to display the oEmbed content. There is a link to configure the iframe domain, and that link should open in a new window. For clarity, we should also add "(opens in a new window)" after the link. Fixed in #228.
  2. The fact that we don't ship a "remote video" media type out of the box decreases the usability of the patch substantially. Therefore, the follow-up to create and ship that media type with Standard is the top priority when this patch lands.
  3. YouTube videos are by far the most common use case for oEmbed, but we don't have any test fixtures for YouTube videos. Instead, we used Vimeo. We should open a follow-up to either test YouTube fixtures in addition to Vimeo, or just remove the Vimeo fixtures entirely in favor of YouTube. Opened #2966655: Test oEmbed with a YouTube fixture
  4. The iframe/security warnings are good, but could use a few small tweaks. @webchick will post the specifics. Improved in #228, but needs validation from @webchick.
  5. There was general agreement that #223.1 is an excellent idea. Let's wrap the settings form's iframe stuff in its own fieldset, labeled "Security". Fixed in #228.
  6. It is possible to use the oEmbed formatter on any link, string, or string_long field. And when you do, there is no filtering by allowed providers. I discussed the security implications with @samuel.mortenson and @webchick, and the agreed-upon course of action is to make the formatter apply only to media types that use an OEmbedInterface source plugin. This is better UX (one way to do something instead of three), and is more secure. Fixed in #228.
  7. Attempting to embed a Flickr URL produced an exception. We should investigate that, fix it, and test it. Fixed in #231.
  8. Some oEmbed providers -- notably YouTube -- will completely ignore the max_width and max_height settings of the oEmbed formatter, which can result in a video WAY overflowing a frame that is far too small for it, and that looks extremely weird. Therefore, given a max_width/height and the width/height of the resource, the formatter should choose whichever value is bigger. Opened #2966656: Negotiate max width/height of oEmbed assets more intelligently
phenaproxima’s picture

Let's see how this does. I fixed:

  • @tstoeckler's feedback in #222.
  • Gábor's feedback in #223-226.
  • #227, points 1, 4 (partially), 5, and 6.
phenaproxima’s picture

Issue tags: -Needs followup

All needed follow-ups have been opened, so I'm removing the tag.

chr.fritsch’s picture

Status: Needs work » Needs review
FileSize
136.05 KB
22.97 KB

Regarding #227.2 I found a small bug in the formatter. I fixed that and added test coverage in the formatter test. By doing that I refactored the that test heavily. It's now testing with Media items because we limit the formatter to only work with media entities. I also introduced a dataProvider.

By testing the formatter I found that the provider_name is not a mandatory property. So we can not expect that it will be retrieved by oEmbed. So I adjusted the code a bit to take that into account.

I also noticed that the tests are still rely on an active internet connection and expect that all the oEmbed providers will respond. We should definitly fix that.

chr.fritsch’s picture

Ok, here is the fix for the oEmbedFormatterTest. ::discoverResourceUrl is now mocked and returns directly the local endpoint URL.

phenaproxima’s picture

Status: Needs review » Needs work

I don't really see anything to seriously complain about here. Just minor stuff.

  1. +++ b/core/modules/media/src/Plugin/Field/FieldFormatter/OEmbedFormatter.php
    @@ -264,7 +264,8 @@ public static function isApplicable(FieldDefinitionInterface $field_definition)
    +    return $applicable && $media_type && $media_type->getSource() instanceof OEmbedInterface;
    

    Do we really need to check if $media_type is set? If it's not, several things have gone wrong and it should be allowed to fail hard.

  2. +++ b/core/modules/media/tests/src/Functional/FieldFormatter/OEmbedFormatterTest.php
    @@ -38,64 +29,106 @@ class OEmbedFormatterTest extends BrowserTestBase {
    +  public function testRender($url, $resource_url, $field_settings, $selectors) {
    +
    +    $media_type = $this->createMediaType([], 'oembed:video');
    

    Nit: Extra blank line. Also, can we type hint $field_settings and $selectors as arrays, rename $field_settings to $formatter_settings, and document the method parameters?

+++ b/core/modules/media/tests/modules/media_test_oembed/src/UrlResolver.php
@@ -0,0 +1,38 @@
+ * Overrides the oEmbed url resolver service for testing purposes.

Nit: s/url/URL

chr.fritsch’s picture

Status: Needs work » Needs review
FileSize
137.73 KB
3.97 KB

Fixed #233

1. Yes, we have to check, because the method is called for every field. And on a 'Name' field, for example, no MediaType can be loaded
2. Fixed
3. Fixed

I also replaced \Unicode::strtolower() with mb_strtolower since #2849669: Fix \Drupal\Component\Utility\Unicode() because of the Symfony mbstring polyfill landed.

phenaproxima’s picture

Very minor doc comment fixes. Otherwise, I think this is ready for final review.

idebr’s picture

Status: Needs review » Needs work

What great work! This will make many Media users happy :)

I found a number of nitpicks in patch and I am somewhat hesitant to post them given the progress. However, I think they collectively would finish up the patch so here goes:

  1. +++ b/core/modules/media/config/schema/media.schema.yml
    @@ -5,6 +5,12 @@ media.settings:
    +    iframe_domain:
    +      type: string
    +      label: 'Domain from which to serve oEmbed content in an iFrame'
    +    oembed_providers:
    +      type: string
    +      label: 'The URL of the oEmbed providers database'
    

    Since we are expecting a URL, can we use "type: 'string'"?

  2. +++ b/core/modules/media/media.links.menu.yml
    @@ -4,3 +4,9 @@ entity.media_type.collection:
    +media.settings:
    +  title: 'Media settings'
    +  parent: system.admin_config_media
    +  description: 'Manage media settings.'
    +  route_name: media.settings
    

    Since this is an important configuration form, let's add it as a the 'configure' item in media.info.yml for discoverability.

  3. +++ b/core/modules/media/media.routing.yml
    @@ -24,3 +24,19 @@ entity.media.revision:
    +    _csrf_token: 'TRUE'
    
    +++ b/core/modules/media/src/Controller/OEmbedController.php
    @@ -0,0 +1,137 @@
    +   * Constructs a OEmbedController instance.
    

    "Constructs a" -> "Constructs an"

  4. +++ b/core/modules/media/src/Form/SettingsForm.php
    @@ -0,0 +1,105 @@
    +class SettingsForm extends ConfigFormBase {
    

    I think this violates our class naming convention, specifically #10:
    Classes and interfaces should have names that stand alone to tell what they do without having to refer to the namespace, read well, and are as short as possible without losing functionality information or leading to ambiguity.

    In other modules in Core, the settings form is typically prefixed with the module name, for example:
    - \Drupal\locale\Form\LocaleSettingsForm
    - \Drupal\statistics\StatisticsSettingsForm
    - \Drupal\update\UpdateSettingsForm

    I suggest this class is named MediaSettingsForm.

  5. +++ b/core/modules/media/src/Form/SettingsForm.php
    @@ -0,0 +1,105 @@
    +    $description = '<p>' . t('Displaying media assets from third-party services, such as YouTube or Twitter, can be risky. This is because many of these services return arbitrary HTML to represent those assets, and that HTML may contain executable JavaScript code. If handled improperly, this can increase the risk of your site being compromised.') . '</p>';
    +    $description .= '<p>' . t('In order to mitigate the risks, third-party assets are displayed in an iFrame, which effectively sandboxes any executable code running inside it. For even more security, the iFrame can be served from an alternate domain (that also points to your Drupal site), which you can configure on this page. This helps safeguard cookies and other sensitive information.') . '</p>';
    

    Use $this->t() over t() in classes.

  6. +++ b/core/modules/media/src/OEmbed/Endpoint.php
    @@ -0,0 +1,169 @@
    +   * @param array $schemes
    +   *   List of URL schemes supported by the provider.
    +   * @param array $formats
    +   *   List of supported formats. Can be "json", "xml" or both.
    

    The data type in the construct method does not match the @var declarations (string[] vs array). Let's bring these in line, personally I prefer string[]

  7. +++ b/core/modules/media/src/OEmbed/Provider.php
    @@ -0,0 +1,99 @@
    +   * @param array $endpoints
    +   *   List of endpoints this provider exposes.
    

    @param array[] $endpoints would clarify the expected datatype.

  8. +++ b/core/modules/media/src/OEmbed/ProviderRepository.php
    @@ -0,0 +1,122 @@
    +   * Low long the provider data should be cached, in seconds.
    

    "Low long" -> "How long"

  9. +++ b/core/modules/media/src/OEmbed/ProviderRepository.php
    @@ -0,0 +1,122 @@
    +   * Constructs OEmbed class.
    

    The class mentioned in the doc comment does not match the actual class name (ProviderRepository)

  10. +++ b/core/modules/media/src/OEmbed/ProviderRepository.php
    @@ -0,0 +1,122 @@
    +  public function __construct(Client $http_client, ConfigFactoryInterface $config_factory, TimeInterface $time, CacheBackendInterface $cache_backend = NULL, $max_age = 604800) {
    

    Let's cast the Client to ClientInterface instead. The same goes for ResourceFetcher, UrlResolver, OEmbed.

  11. +++ b/core/modules/media/src/OEmbed/ProviderRepositoryInterface.php
    @@ -0,0 +1,40 @@
    +   * @throws \InvalidArgumentException
    +   *   If there is known oEmbed provider with the specified name.
    

    If there is no known oEmbed provider with the specified name.

  12. +++ b/core/modules/media/src/OEmbed/Resource.php
    @@ -0,0 +1,387 @@
    +   * @see \Drupal\media\OEmbed\UrlResolverInterface::getResourceUrl()
    +   * @see https://oembed.com/#section2
    +   *
    +   * @var string
    

    phpcs: the @var tag must be the first tag in a member variable comment

  13. +++ b/core/modules/media/src/OEmbed/Resource.php
    @@ -0,0 +1,387 @@
    +   *   - $type is anything other than 'link', and $width and $height are empty.
    

    The code actually checks for: "$type is anything other than 'link', and $width or $height is empty."

  14. +++ b/core/modules/media/src/OEmbed/Resource.php
    @@ -0,0 +1,387 @@
    +   *   - $type is 'rich', or 'video' and $html is empty.
    

    The code actually checks for: "$type is 'rich' or 'video', and $html is empty."

  15. +++ b/core/modules/media/src/OEmbed/Resource.php
    @@ -0,0 +1,387 @@
    +   * @return string|NULL
    +   *   The title of the resource, if known.
    ...
    +   * @return string|NULL
    +   *   The name of the resource author, if known.
    ...
    +   * @return \Drupal\Core\Url|NULL
    +   *   The absolute URL of the resource author, or NULL if none is provided.
    ...
    +   * @return \Drupal\Core\Url|NULL
    +   *   The absolute URL of the thumbnail image, or NULL if there isn't one.
    ...
    +   * @return int|NULL
    +   *   The thumbnail width in pixels, or NULL if there is no thumbnail.
    ...
    +   * @return int|NULL
    +   *   The thumbnail height in pixels, or NULL if there is no thumbnail.
    ...
    +   * @return int|NULL
    +   *   The width of the resource in pixels, or NULL if the resource has no
    +   *   dimensions
    ...
    +   * @return int|NULL
    +   *   The height of the resource in pixels, or NULL if the resource has no
    +   *   dimensions.
    ...
    +   * @return string|NULL
    +   *   The HTML representation of the resource, if it has one.
    

    According to the Drupal coding standards, NULL should be written lowercase null in @param and @return types

  16. +++ b/core/modules/media/src/OEmbed/Resource.php
    @@ -0,0 +1,387 @@
    +   * @return \Drupal\media\OEmbed\Provider
    +   */
    

    Missing description for the @return value

  17. +++ b/core/modules/media/src/OEmbed/UrlResolver.php
    @@ -0,0 +1,207 @@
    +   * Constructs OEmbedManager class.
    

    The class name in the __construct method does not match the actual class name (UrlResolver)

  18. +++ b/core/modules/media/src/Plugin/Field/FieldFormatter/OEmbedFormatter.php
    @@ -0,0 +1,271 @@
    +   * Constructs a OEmbedFormatter instance.
    

    "Constructs a" -> "Constructs an"

  19. +++ b/core/modules/media/src/Plugin/Field/FieldFormatter/OEmbedFormatter.php
    @@ -0,0 +1,271 @@
    +        // Render videos and rich content in a iFrame for security reasons.
    

    "a iFrame" -> "an Iframe"

  20. +++ b/core/modules/media/src/Plugin/Field/FieldWidget/OEmbedWidget.php
    @@ -0,0 +1,60 @@
    +    $message = t('You can link to media from the following services: @providers', ['@providers' => implode(', ', $source->getAllowedProviderNames())]);
    

    The StringTranslationTrait is available from PluginBase, so you can use $this->t() rather that t() here

  21. +++ b/core/modules/media/src/Plugin/Validation/Constraint/OEmbedResourceConstraintValidator.php
    @@ -0,0 +1,133 @@
    +class OEmbedResourceConstraintValidator extends ConstraintValidator implements ContainerInjectionInterface {
    ...
    +   * Constructs a new OEmbedProviderConstraintValidator.
    

    The class name mentioned in the __construct method does not match the actual machine name (OEmbedResourceConstraintValidator)

samuel.mortenson’s picture

Reviewing #235:

  1. +++ b/core/modules/media/src/Controller/OEmbedController.php
    @@ -0,0 +1,137 @@
    +      // @todo Log additional information from ResourceException, to help with
    +      // debugging.
    
    +++ b/core/modules/media/src/OEmbed/Provider.php
    @@ -0,0 +1,99 @@
    +      // @todo Log the exception message to help with debugging.
    
    +++ b/core/modules/media/src/Plugin/Validation/Constraint/OEmbedResourceConstraintValidator.php
    @@ -0,0 +1,133 @@
    +    // @todo if $e is a ProviderException or ResourceException, log additional
    +    // debugging information contained in those exceptions.
    

    There are three @todo-s without issues.

  2. +++ b/core/modules/media/src/OEmbed/ResourceFetcher.php
    @@ -0,0 +1,145 @@
    +      // Bring XML into JSON format to be unified with the JSON response.
    +      $encoder = new XmlEncoder();
    +      $content = Json::encode($encoder->decode($content, 'xml'));
    ...
    +    $data = Json::decode($content);
    +    $this->cacheSet($cache_id, $data);
    

    XML data is being decoded then encoded into JSON, then immediately decoded again. I don't see how this would "unify" the data - why not just use the decoded XML?

phenaproxima’s picture

There are three @todo-s without issues.

Opened #2972846: Improve oEmbed exception logging, but we need to mention it in the patch.

phenaproxima’s picture

Status: Needs work » Needs review
FileSize
138.43 KB
24.66 KB

Fixed everything in #236, except:

Since we are expecting a URL, can we use "type: 'string'"?

We're already using type: string...

And I fixed #238.1, but I left the JSON/XML stuff alone. That came from the original oEmbed patch and I didn't want to mess with things I don't understand :)

phenaproxima’s picture

Fixed #238.2, and ran all tests with my Internet connection disabled in order to ensure that tests are properly isolated. I did get a failure, so we should probably look into that before calling this done.

Sorry for the lack of an interdiff -- had to reroll against 8.6.x HEAD, for some reason.

samuel.mortenson’s picture

Status: Needs review » Reviewed & tested by the community

This looks good to me - my prior feedback has been addressed. Marking as RTBC.

martin107’s picture

I can see the smallest of imperfections.

According to the test run output CS error sections there is a @file tags missing from ....

media_test_oembed.module

larowlan’s picture

Status: Reviewed & tested by the community » Needs review
  1. +++ b/core/modules/media/config/schema/media.schema.yml
    @@ -5,6 +5,12 @@ media.settings:
    +    iframe_domain:
    

    shouldn't there be a default for this in media.settings.yml?

  2. +++ b/core/modules/media/media.install
    @@ -120,3 +153,12 @@ function media_update_8500() {
    +    ->set('oembed_providers', 'https://oembed.com/providers.json')
    

    same deal here?

  3. +++ b/core/modules/media/media.module
    @@ -99,6 +106,18 @@ function media_theme_suggestions_media(array $variables) {
    +      $provider_id = \Drupal::transliteration()->transliterate($provider_id);
    +      $provider_id = mb_strtolower($provider_id);
    +      $suggestions[] = "media__oembed__$provider_id";
    

    its possible that the provider ID will contain characters that are invalid in PHP function names - e.g. should we be casting - to _ etc?

  4. +++ b/core/modules/media/src/Controller/OEmbedController.php
    @@ -0,0 +1,137 @@
    +      throw new MissingMandatoryParametersException('url parameter not provided');
    ...
    +      throw new \InvalidArgumentException('url parameter is invalid or not external');
    

    these bubbles up as a 500, should we instead use something that bubbles up as a 400?

  5. +++ b/core/modules/media/src/Form/MediaSettingsForm.php
    @@ -0,0 +1,105 @@
    +      '#type' => 'fieldset',
    

    is fieldset the right choice here? we did away with fieldsets for grouping (other than radios/checkboxes where that is their intended use) in #1168246: Freedom For Fieldsets! Long Live The DETAILS.

  6. +++ b/core/modules/media/src/MediaSourceBase.php
    @@ -301,7 +301,9 @@ public function createSourceField(MediaTypeInterface $type) {
    +    // Some media sources are using a deriver, so their plugin IDs are
    +    // containing a ':' which is not allowed for field names.
    

    nit: the english is slightly off here.

  7. +++ b/core/modules/media/src/OEmbed/Endpoint.php
    @@ -0,0 +1,169 @@
    +   *   The endpoint URL.
    

    should we document that this expects a {format} slug?

  8. +++ b/core/modules/media/src/OEmbed/Endpoint.php
    @@ -0,0 +1,169 @@
    +      if (preg_match("|^$regexp$|", $url)) {
    

    is there an existing prescribed format to which these conform or should we be using preg_quote here?

  9. +++ b/core/modules/media/src/OEmbed/ProviderException.php
    @@ -0,0 +1,36 @@
    +  public function __construct($message, Provider $provider = NULL, \Exception $previous = NULL) {
    

    is it worth adding a ::fromProvider factory method given that is the most likely argument folks will pass?

  10. +++ b/core/modules/media/src/OEmbed/Resource.php
    @@ -0,0 +1,388 @@
    +  public function __construct($type, $version, Provider $provider = NULL, $title = NULL, $author_name = NULL, $author_url = NULL, $cache_age = NULL, $thumbnail_url = NULL, $thumbnail_width = NULL, $thumbnail_height = NULL, $width = NULL, $height = NULL, $url = NULL, $html = NULL) {
    

    given the large number of optional parameters here, would it be worth having some factory methods?

    Resource::html()
    Resource::photo()

    etc

    So it was clear which arguments were required for each flavour.

    You could even go a step further and make the constructor protected so you had to use one of the factory methods.

    As it stands, the number of arguments, and the valid combinations for each type might end up being a bit 'trial and error' for developers.

  11. +++ b/core/modules/media/src/Plugin/Field/FieldFormatter/OEmbedFormatter.php
    @@ -0,0 +1,271 @@
    +      if (empty($item->{$main_property})) {
    ...
    +        $resource_url = $this->urlResolver->getResourceUrl($item->{$main_property}, $max_width, $max_height);
    ...
    +        $this->logger->error("Could not retrieve the remote URL (@url).", ['@url' => $item->{$main_property}]);
    ...
    +          '#url' => Url::fromUri($item->{$main_property}),
    

    is it worth putting this in a local variable to aid readability?

  12. +++ b/core/modules/media/src/Plugin/Field/FieldFormatter/OEmbedFormatter.php
    @@ -0,0 +1,271 @@
    +      if ($resource->getType() === 'link') {
    ...
    +      elseif ($resource->getType() === 'photo') {
    

    given these have special meaning both here and in the constructor for Resource, should they be constants?

  13. +++ b/core/modules/media/tests/src/Functional/UrlResolverTest.php
    @@ -0,0 +1,128 @@
    +        'https://twitter.com/swear_trek/status/975446772205092864',
    ...
    +        'https://vimeo.com/14782834',
    ...
    +        'http://www.collegehumor.com/video/40002870/lets-not-get-a-drink-sometime',
    

    i didn't check these, but we're semi-endorsing them by linking to them here - so are we sure they're something we want to be associated with?

  14. +++ b/core/modules/media/tests/src/Functional/UrlResolverTest.php
    @@ -0,0 +1,128 @@
    +      'XML resource' => [
    ...
    +        'http://www.collegehumor.com/oembed.json?url=video_collegehumor.html',
    

    xml resource with a .json url? is it worth a comment here to point out this uses a rel link that is in xml format?

phenaproxima’s picture

Addressed #243:

  1. I don't think so, because the value will vary from site to site. We can open a follow-up to look into potentially setting a default value in hook_install() or elsewhere, though.
  2. See #1.
  3. Fixed.
  4. Fixed.
  5. Ah, didn't know that! Fixed.
  6. Fixed.
  7. Fixed.
  8. These are meta-URLs, like 'https://vimeo.com/*', so I'm not sure we need preg_quote().
  9. Decided not to do this, because I'm not sure there's any real value in throwing a ProviderException without a message.
  10. Fixed.
  11. Fixed.
  12. Fixed.
  13. Fixed, but that Swear Trek one is (IMHO) really funny. :)
  14. Fixed.

In addition to this stuff, I fixed a few other things and addressed a couple of local test failures. Sorry for the rather weighty interdiff; the factory methods added a lot of code (mostly doxygen) to Resource.

samuel.mortenson’s picture

  1. +++ b/core/modules/media/src/OEmbed/Resource.php
    @@ -145,55 +166,14 @@
    -    if ($type !== 'link' && (empty($width) || empty($height))) {
    -      throw new \InvalidArgumentException('The resource must provide dimensions.');
    -    }
    

    I don't see that this check has moved to the factory method.

  2. +++ a/core/modules/media/tests/src/FunctionalJavascript/MediaJavascriptTestBase.php
    @@ -2,7 +2,6 @@
    -use Drupal\FunctionalJavascriptTests\DrupalSelenium2Driver;
    
    @@ -15,8 +14,6 @@
    -  protected $minkDefaultDriverClass = DrupalSelenium2Driver::class;
    -
    

    Why was this done?

phenaproxima’s picture

I don't see that this check has moved to the factory method.

The factory method calls setDimensions(), which will validate the incoming dimensions and throw an exception if they're not numbers greater than zero.

Why was this done?

Because switching to DrupalSelenium2Driver is out of scope for this patch, and that line is an artifact from my local testing which I forgot to remove from the previous patch.

samuel.mortenson’s picture

Status: Needs review » Reviewed & tested by the community

Got it - moving to RTBC based on my review of the interdiff from #244.

alexpott’s picture

re #243.13 - there are still links to www.collegehumor.com in the patch - do we need them?

Wim Leers’s picture

Status: Reviewed & tested by the community » Needs work

For years, I've wished for oEmbed support. Looks like it's now actually happening! 🎉😵🎉

I could not resist reviewing this…


Overall, zero mentions of @internal in this issue. We should make as much as possible @internal to avoid committing to supporting APIs that only one in a thousand or even fewer people will ever use.

  1. +++ b/core/modules/media/config/install/media.settings.yml
    @@ -1 +1,2 @@
    +oembed_providers: 'https://oembed.com/providers.json'
    

    This triggered my security OCD, but I see @alexpott already brought that up 👌

  2. +++ b/core/modules/media/config/schema/media.schema.yml
    @@ -5,6 +5,12 @@ media.settings:
    +      label: 'Domain from which to serve oEmbed content in an iFrame'
    

    Übernit: we usually write iframe.

  3. +++ b/core/modules/media/config/schema/media.schema.yml
    @@ -5,6 +5,12 @@ media.settings:
    +      label: 'The URL of the oEmbed providers database'
    

    Perhaps mention this is expected to be JSON, and point to the relevant section of the spec at https://oembed.com/?

  4. +++ b/core/modules/media/media.api.php
    @@ -20,6 +20,22 @@ function hook_media_source_info_alter(array &$sources) {
    + * @param \Drupal\media\OEmbed\Provider $provider
    

    Oh, nice, an actual value object!

  5. +++ b/core/modules/media/media.api.php
    @@ -20,6 +20,22 @@ function hook_media_source_info_alter(array &$sources) {
    +    $parsed_url['query']['foo'] = 'bar';
    

    Can't we make this a more meaningful example?

    For example, automatically make all YouTube URLs use the "nocookie" YouTube domain?

  6. +++ b/core/modules/media/media.routing.yml
    @@ -24,3 +24,19 @@ entity.media.revision:
    +media.oembed.render:
    +  path: '/media/oembed'
    +  defaults:
    +    _controller: '\Drupal\media\Controller\OEmbedController::render'
    +  requirements:
    +    _permission: 'access content'
    +    _csrf_token: 'TRUE'
    
    +++ b/core/modules/media/src/Controller/OEmbedController.php
    @@ -0,0 +1,138 @@
    +/**
    + * Controller which renders an oEmbed resource in a bare page (without blocks).
    + */
    +class OEmbedController implements ContainerInjectionInterface {
    ...
    +  /**
    +   * Renders an oEmbed resource.
    ...
    +  public function render(Request $request) {
    

    Interesting…

    wouldn't it better to call this route/method something like iframeSource? So that we can also explicitly document that the HTML returned by this is not to be trusted?

  7. +++ b/core/modules/media/src/Controller/OEmbedController.php
    @@ -0,0 +1,138 @@
    +    if (!UrlHelper::isValid($url, TRUE) || !UrlHelper::isExternal($url)) {
    +      throw new BadRequestHttpException('url parameter is invalid or not external');
    +    }
    

    Nit: why not split these up?

  8. +++ b/core/modules/media/src/Controller/OEmbedController.php
    @@ -0,0 +1,138 @@
    +    // Create a response object so that the frame content will not have all
    +    // the blocks and page elements normally rendered by Drupal.
    

    Return a response instead of a render array so that …

  9. +++ b/core/modules/media/src/Controller/OEmbedController.php
    @@ -0,0 +1,138 @@
    +    $response = new CacheableResponse(Markup::create(''));
    

    Why Markup::create()? AFAICT you can call the constructor without any parameters.

  10. +++ b/core/modules/media/src/Controller/OEmbedController.php
    @@ -0,0 +1,138 @@
    +      $resource_url = $this->urlResolver->getResourceUrl($url, $request->query->getInt('max_width', NULL), $request->query->getInt('max_height', NULL));
    

    TIL getInt() is a thing.

  11. +++ b/core/modules/media/src/Controller/OEmbedController.php
    @@ -0,0 +1,138 @@
    +      // Render the content in a new render context so that cacheable metadata
    +      // will bubble up correctly.
    

    We're not bubbling cacheable metadata. We're capturing the cacheability of the rendered HTML.

  12. +++ b/core/modules/media/src/Controller/OEmbedController.php
    @@ -0,0 +1,138 @@
    +          '#post' => Markup::create($resource->getHtml()),
    

    Perhaps add an explicit comment to document that using Markup::create() here creates a trusted ("safe") string, even though it's for external HTML. This is only okay because this response is going to be used in an iframe.

  13. +++ b/core/modules/media/src/OEmbed/Endpoint.php
    @@ -0,0 +1,169 @@
    +class Endpoint {
    

    Can we make this @internal?

  14. +++ b/core/modules/media/src/OEmbed/Endpoint.php
    @@ -0,0 +1,169 @@
    +   * List of supported formats.
    +   *
    +   * @var string[]
    

    Should this list that only json and xml are allowed?

  15. +++ b/core/modules/media/src/OEmbed/Endpoint.php
    @@ -0,0 +1,169 @@
    +   *   The endpoint URL. May contain a '{format}' placeholder.
    

    @code {format} @endcode?

  16. +++ b/core/modules/media/src/OEmbed/Endpoint.php
    @@ -0,0 +1,169 @@
    +   * @param string[] $formats
    +   *   List of supported formats. Can be "json", "xml" or both.
    ...
    +    $this->formats = array_map('mb_strtolower', $formats);
    

    Let's then also do assert(empty(array_diff($formats, ['json', 'xml'])));

  17. +++ b/core/modules/media/src/OEmbed/Provider.php
    @@ -0,0 +1,100 @@
    +class Provider {
    

    Can't be @internal because exposed in an API (the alter hook).

  18. +++ b/core/modules/media/src/OEmbed/Provider.php
    @@ -0,0 +1,100 @@
    +   * @var \Drupal\media\OEmbed\Endpoint[]
    ...
    +   * @param array $endpoints
    

    Typehint mismatch.

  19. +++ b/core/modules/media/src/OEmbed/Provider.php
    @@ -0,0 +1,100 @@
    +   *   The provider $url.
    

    Übernit: s/$url/URL/

  20. +++ b/core/modules/media/src/OEmbed/Resource.php
    @@ -0,0 +1,525 @@
    + * Value object representing an oEmbed resource.
    ...
    +class Resource implements RefinableCacheableDependencyInterface {
    
    1. Can we make this @internal?
    2. Note that because this uses RefinableCDI and not just CDI, this is a mutable value object.
  21. +++ b/core/modules/media/src/OEmbed/Resource.php
    @@ -0,0 +1,525 @@
    +  protected function setThumbnailDimensions($width, $height) {
    

    But apparently it already is mutable anyway.

    Are you sure you need this mutability? I don't think you do. It seems you only have this for some of the static helper methods on this class.

  22. +++ b/core/modules/media/src/OEmbed/ResourceException.php
    @@ -0,0 +1,63 @@
    +   * The oEmbed resource which caused the exception.
    ...
    +   * @var array
    ...
    +   * @param array $resource
    +   *   (optional) The oEmbed resource which caused the exception.
    ...
    +   * @return array
    +   *   The oEmbed resource which caused the exception.
    

    Why is this an array instead of a Drupal\media\OEmbed\Resource value object?

  23. +++ b/core/modules/media/src/OEmbed/UrlResolver.php
    @@ -0,0 +1,207 @@
    +  /**
    +   * Static cache of discovered oEmbed resources.
    +   *
    +   * A discovered oEmbed resource is the oEmbed URL for a specific media object.
    +   * This oEmbed URL is fetched from the canonical URL of the media object.
    +   *
    +   * @var array
    +   */
    +  protected $urlCache = [];
    

    An array of what? URLs? Drupal\media\OEmbed\Resource objects?

  24. +++ b/core/modules/media/src/OEmbed/UrlResolver.php
    @@ -0,0 +1,207 @@
    +  protected function findUrl(\DOMXPath $xpath, $format) {
    +    $result = $xpath->query("//link[@type='application/$format+oembed']");
    +    return $result->length ? $result->item(0)->getAttribute('href') : FALSE;
    +  }
    

    Nice!

  25. +++ b/core/modules/media/src/OEmbed/UrlResolver.php
    @@ -0,0 +1,207 @@
    +    $cache_id = 'media:oembed_resource_url:' . hash('sha256', $url . $max_width . $max_height);
    

    Why hash this?

  26. +++ b/core/modules/media/src/OEmbed/UrlResolver.php
    @@ -0,0 +1,207 @@
    +    // Let other modules alter the query string, because some oEmbed providers
    +    // are providing extra parameters. For example, Instagram also supports the
    +    // 'omitscript' parameter.
    

    Query string? This alter hooks allows you to alter much more than just the query string!

  27. +++ b/core/modules/media/src/Plugin/Field/FieldFormatter/OEmbedFormatter.php
    @@ -0,0 +1,276 @@
    +    foreach ($items as $delta => $item) {
    +      $main_property = $item->getFieldDefinition()->getFieldStorageDefinition()->getMainPropertyName();
    +      $value = $item->{$main_property};
    +
    +      if (empty($value)) {
    +        continue;
    +      }
    +
    +      try {
    +        $resource_url = $this->urlResolver->getResourceUrl($value, $max_width, $max_height);
    +        $resource = $this->resourceFetcher->fetchResource($resource_url);
    +      }
    +      catch (ResourceException $exception) {
    +        $this->logger->error("Could not retrieve the remote URL (@url).", ['@url' => $value]);
    +        continue;
    +      }
    

    If we'd move this into a helper method, then this could become:

    foreach ($this->helper($items) as $resource) {
    …
    

    i.e. then viewElements() would truly only contain view (render) logic.

  28. +++ b/core/modules/media/src/Plugin/Field/FieldFormatter/OEmbedFormatter.php
    @@ -0,0 +1,276 @@
    +        // Render videos and rich content in an Iframe for security reasons.
    

    Übernit: Iframe, that's yet another way of capitalizing it.

  29. +++ b/core/modules/media/src/Plugin/Field/FieldFormatter/OEmbedFormatter.php
    @@ -0,0 +1,276 @@
    +          '#cache' => [
    +            'tags' => $this->config->getCacheTags(),
    +          ],
    

    Shouldn't this also set a max age, per the max age setting for the resource providers?

  30. +++ b/core/modules/media/src/Plugin/Validation/Constraint/OEmbedResourceConstraint.php
    @@ -0,0 +1,46 @@
    +  public $unknownProviderMessage = 'The given URL does not match any known oEmbed providers.';
    ...
    +  public $disallowedProviderMessage = 'The @name provider is not allowed.';
    

    Given that this will likely be seen quite often by end users, how about prefixing it with Sorry, ?

  31. +++ b/core/modules/media/src/Plugin/media/Source/OEmbed.php
    @@ -0,0 +1,452 @@
    +  public function getMetadata(MediaInterface $media, $name) {
    

    Wow, cool to see all this metadata become available in Drupal!!! Structured data FTW :)

phenaproxima’s picture

Status: Needs work » Needs review
FileSize
148.01 KB
25.28 KB

Well, you know what they say...250th time's the charm.

I was able to address almost all of Wim's feedback in #249:

  1. Okay, cool.
  2. Fixed.
  3. Fixed, although I'm not seeing any place in the spec which says that the provider database must be JSON.
  4. I think you can thank @alexpott for that suggestion. :)
  5. Fixed.
  6. Renamed to OEmbedIframeController (and the route is now media.oembed_iframe) and added a comment on the class explaining that the HTML is not trusted.
  7. Fixed.
  8. Fixed.
  9. Markup has no explicit constructor, so the string itself will be NULL unless we use create(). I'd rather use create() to explicitly set the value to an empty string.
  10. Learning is fun!
  11. Fixed.
  12. Fixed.
  13. Done.
  14. Done.
  15. Done.
  16. Done.
  17. Okay, I left it as-is.
  18. It's not a mismatch; the endpoints are passed into Provider's constructor as an array of arrays, and converted to Endpoint objects, which are returned by getEndpoints().
  19. Fixed.
  20. Marked it internal and mentioned that it is mutable. I left setDimensions() as-is because it needs to be used from a few different code paths. It's protected anyway, so as far as consuming code is concerned, the dimensions are immutable.
  21. See previous point.
  22. Because it is only thrown before the Resource object has been created. We're using it to wrap InvalidArgumentExceptions thrown by Resource's factory methods, so as to provide additional debugging data. To clarify, I've added a subclass of it called RawResourceException, which is only to be thrown before the value object has been created.
  23. Fixed.
  24. Glad you liked it!
  25. So that very long or complicated URLs don't overflow the bounds of the cache ID column or introduce other weirdness.
  26. Nice catch. Fixed.
  27. Decided against this because, due to the way oEmbed works (i.e., passing max width/height to the provider, needing to send the resource URL to theme functions in certain cases), I found that the helper method actually made things a little less clear and introduced extraneous function calls. I'm willing to revisit this in a follow-up, though, if you'd like.
  28. Fixed.
  29. Fixed.
  30. How polite of you! Fixed.
  31. What can I say? Media's API is really nice.

As for @alexpott's question in #248: CollegeHumor is listed in the canonical oEmbed provider database, and they support serving resources in XML format, which makes them a useful and needed test case since we do support XML resources. I watched the video itself and it's totally G-rated (not sure what the international equivalent is, the G is the rating that Disney cartoons get).

phenaproxima’s picture

Minor doc comment fixes.

phenaproxima’s picture

Whoops. Fixing the bad assert().

Wim Leers’s picture

Status: Needs review » Needs work
  • +++ b/core/modules/media/src/Controller/OEmbedIframeController.php
    @@ -0,0 +1,152 @@
    +          // Even though the resource HTML is untrusted, Markup::create() will
    +          // create a trusted string. The only reason this is okay is because
    +          // we are serving it in an iframe, which will mitigate the potential
    +          // dangers of displaying third-party markup.
    

    👍

  • 9. I meant it could just be new CacheableResponse(). Remove Markup::create().
  • 20+21. Oh, both setters are protected indeed, all the more reason to NOT use RefinableCDI then! Because that means this value object is not mutable, except for its cacheability. That does not make sense.
  • 25. The cache system takes care of that already. No need for us to reimplement that here.
  • 27. Please don't worry about function calls. We can do many tens of thousands of function calls for the time it takes to do a single HTTP request to fetch the oEmbed representation for URL. If you prefer keeping it like this, fine.
  • WRT @internal:
    adam.hoenich [16:19]
    @wim.leers Also we need to talk about the "internal" stuff -- how much should be internal? Should we remove the service interfaces?
    wim.leers [17:03]
    as much as possible should be internal
    keep only things non-internal that will _commonly_ be used
    adam.hoenich [17:18]
    @wim.leers I'm not sure what else I should mark internal. The exceptions? The services?
    @wim.leers If I'm marking the services internal, I feel like I oughta remove the interfaces too
    One generally doesn't have an interface for internal things =P
    @wim.leers Are there any official guidelines for this?
    wim.leers [17:23]
    +1. Although you could say that the interfaces are there to be used, but they will only be finalized (_non-internal’d_) in a later release, once there are multiple contrib modules consuming it
    https://www.drupal.org/core/d8-bc-policy
    https://www.drupal.org/project/drupal/issues/2550249
    But really, make as much as possible internal, so that you don’t have to go through crazy things to fix bugs and break like the one custom module that does crazy things
    
phenaproxima’s picture

Status: Needs work » Needs review
FileSize
149.31 KB
7.78 KB
  1. Thanks!
  2. Done.
  3. Okay, I relent. I have fallen back to CacheableDependencyInterface and its trait. However, I'd like to say for the record that the reason I originally went with RefinableCacheableDependencyInterface is that oEmbed operations can be very expensive, especially at scale, and I wanted to ensure that skilled developers had some wiggle room to fine-tune the cacheability of resources as needed. However, seeing as how the Resource object is internal, I'm okay with revisiting this in the future if needed.
  4. Okay, fixed.
  5. Leaving it as-is, at least for now.
  6. I decided to mark the following classes explicitly internal:
    • OEmbedIframeController (covered as a controller under BC policy, but marked explicitly internal anyway)
    • ProviderException
    • RawResourceException
    • ResourceException
    • OEmbedFormatter (covered as a plugin under BC policy, but marked explicitly internal anyway)
    • OEmbedWidget (ditto)
    • OEmbedResourceConstraint
    • OEmbedResourceConstraintValidator
    • OEmbedDeriver

    Everything else is marked neither internal nor part of the API. The services and their interfaces are covered under the 1:1 class/interface rule, as are the new OEmbedInterface interface and OEmbed base class for media sources.

phenaproxima’s picture

This will fix the broken test.

It turns out that Twitter triggers an edge case which our tests are catching -- Twitter gives its oEmbed resources an insanely high cache lifetime of 100 damn years. When that number of seconds is added to the current request timestamp, it overflows the 'expire' column of most database cache tables, causing an SQL exception. To prevent that, we are going to arbitrarily limit resource cache lifetimes to 5 years.

Wim Leers’s picture

revisiting this in the future if needed.

👌

100 damn years

😂

It seems Twitter is forcing many HTTP clients to take care of edge cases, including us!


  1. +++ b/core/modules/media/src/OEmbed/ProviderException.php
    @@ -0,0 +1,40 @@
    + * @internal
    + *   This is an internal part of the oEmbed system and should only be used by
    + *   oEmbed-related code in Drupal core.
    + */
    +class ProviderException extends \Exception {
    

    I think this cannot be internal because Provider is not internal.

    Let's see what committers think.

  2. +++ b/core/modules/media/src/OEmbed/ProviderRepository.php
    @@ -0,0 +1,40 @@
    +class ProviderRepository implements ProviderRepositoryInterface {
    +++ b/core/modules/media/src/OEmbed/ProviderRepositoryInterface.php
    @@ -0,0 +1,40 @@
    +interface ProviderRepositoryInterface {
    

    Why not internal?

  3. +++ b/core/modules/media/src/OEmbed/ResourceFetcher.php
    @@ -0,0 +1,166 @@
    +class ResourceFetcher implements ResourceFetcherInterface {
    
    +++ b/core/modules/media/src/OEmbed/ResourceFetcherInterface.php
    @@ -0,0 +1,32 @@
    +interface ResourceFetcherInterface {
    

    Why not internal?

  4. +++ b/core/modules/media/src/OEmbed/UrlResolver.php
    @@ -0,0 +1,208 @@
    +class UrlResolver implements UrlResolverInterface {
    
    +++ b/core/modules/media/src/OEmbed/UrlResolverInterface.php
    @@ -0,0 +1,54 @@
    +interface UrlResolverInterface {
    

    Why not internal?

phenaproxima’s picture

Why not internal?

Quoth the BC policy:

(undocumented): If an API carries neither of these distinctions, we treat it as "we'll try very hard not to break it, but we may. These will receive a release notes mention.

I believe these interfaces/classes fall under that umbrella. This is a new API, so it's probably unwise to consider it "final except for critical security fixes" -- which rules out the use of the @api tag. On the other hand, these are stable interfaces with implementing classes -- so they are not exactly @internal, the 1:1 rule applies, and we don't anticipate changing them any time soon. The BC policy does not say that there should be any sort of marker on these interfaces, then, although I can add an explicit comment on all three if you'd like. But it seems clear that neither @api nor @internal applies in their case.

Wim Leers’s picture

Status: Needs review » Reviewed & tested by the community

I personally prefer explicitly communicating that they should be considered @internal, because most people don't ever read https://www.drupal.org/core/d8-bc-policy#interfaces. But that shouldn't hold up this patch.

Thanks for addressing all that feedback so quickly; I'm glad we squashed a few bugs that we'd have run into later otherwise, and I'm especially glad that this now increases D8's API surface far less!

phenaproxima’s picture

Fixed a few other very small nits -- mostly tiny things in comments -- plus:

  • Marked MediaSettingsForm @internal, which is in line with Drupal\aggregator\Form\SettingsForm.
  • Removed a call to hash() in the URL resolver, as per Wim's previous feedback about a similar call in the resource fetcher.
alexpott’s picture

Crediting reviewers who made comments material to the outcome of the patch.

alexpott’s picture

Issue tags: +8.6.0 release notes

Re #260.1 I think removing from @internal is fine. So happy to put it in as is.

alexpott’s picture

Status: Reviewed & tested by the community » Fixed

Committed 8f6fcbd and pushed to 8.6.x. Thanks!

diff --git a/core/modules/media/tests/modules/media_test_oembed/media_test_oembed.module b/core/modules/media/tests/modules/media_test_oembed/media_test_oembed.module
index c118d1a783..c8f8ba1f3d 100644
--- a/core/modules/media/tests/modules/media_test_oembed/media_test_oembed.module
+++ b/core/modules/media/tests/modules/media_test_oembed/media_test_oembed.module
@@ -1,5 +1,10 @@
 <?php
 
+/**
+ * @file
+ * Helper module for the Media OEmbed tests.
+ */
+
 use Drupal\media\OEmbed\Provider;
 
 /**

Fixed on commit.

  • alexpott committed 8f6fcbd on 8.6.x
    Issue #2831944 by chr.fritsch, phenaproxima, marcoscano, seanB, slashrsm...
marcoscano’s picture

Well done everyone! This is a huge win. 👏 👏 👏

A small minor bug seems to have slided into the formatter though: #2976795: Fix warning being thrown when configuring media type displays

steinmb’s picture

Fantastic work everyone!!

mondrake’s picture

I am afraid this has broken HEAD testing on SQLite https://www.drupal.org/pift-ci-job/979584

alexpott’s picture

Status: Fixed » Needs work

With #268 and #270 unfortunately I think this means we need a revert.

  • alexpott committed cc59fcb on 8.6.x
    Revert "Issue #2831944 by chr.fritsch, phenaproxima, marcoscano, seanB,...
phenaproxima’s picture

Here is a fail patch, and fix, for #2976795: Fix warning being thrown when configuring media type displays. I have also included an interdiff against #263, and an interdiff of the bug fix. Thanks for finding this one, @marcoscano!

I'll try to look into the SQLite problem too.

phenaproxima’s picture

@alexpott and I worked on this for a few hours over Slack.

I got stymied by issues related to the brittle nature of the HTTP request mocking in the oEmbed tests, but @alexpott discovered that SQLite is failing due to a database transaction lock conflict during the entity save process, triggering this exception:

<em class="placeholder">Drupal\Core\Entity\EntityStorageException</em>: SQLSTATE[23000]: Integrity constraint violation: 19 NOT NULL constraint failed: file_managed.uri: INSERT INTO {file_managed} (uuid, langcode, uid, filename, uri, filemime, filesize, status, created, changed) VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?); Array

As @alexpott said, we do not want to be in an HTTP request during a locking database transaction, SQLite or no SQLite. So we have some work to do in this department.

However, we agreed that a good (very temporary) solution is to skip the failing test on SQLite, and address the failure over in https://www.drupal.org/project/drupal/issues/2964636. We intend to refactor the media_test_oembed module so that it wraps intelligently around the HTTP client service, which should minimize or eliminate a lot of the problems we are experiencing in testing this functionality. So I will be bumping that issue to major.

In addition to skipping the broken test, I have made the following changes in this patch as a result of talking with Alex (and my own attempts to debug the problem):

  • In certain places were oEmbed-related exception messages are logged, I'm logging the entire exception chain (i.e., $exception->getPrevious()), since the oEmbed system uses exception wrapping a lot.
  • RawResourceException is now ResourceException, and it is only ever thrown before a Resource object is instantiated. The reason for this is that Resource is an immutable value object; once it's created, it's stable and needs not throw any exceptions. The dicey part is the fetching and parsing of the resource, and that's what ResourceException will be used for. I did this because @alexpott rightfully pointed out that RawResourceException cannot extend ResourceException because it will violate the Liskov substitution principle once Drupal embraces PHP 7 and its return type support.
alexpott’s picture

One thing the SQLite fails show is that the following line
$thumbnail_uri = $this->getSource()->getMetadata($this, $this->getSource()->getPluginDefinition()['thumbnail_uri_metadata_attribute']);
can end up with $thumbnail_uri = NULL

Which then breaks:

\Drupal\media\Entity\Media::updateThumbnail()
phenaproxima’s picture

Re #276: I opened an issue to describe that in detail: #2976875: oEmbed source plugin can return null thumbnail URI and break things

phenaproxima’s picture

In further discussion with @catch and @alexpott on Slack, we agreed to try moving the potentially brittle code which may trigger HTTP requests out of Media::preSave() -- which is called in the middle of a locked database transaction -- and into Media::save(). Hopefully this works as a quick-fix, but it definitely needs to be revisited so that remote API interactions are never taking place during an entity save.

tstoeckler’s picture

Hmm... doing this in Media::save() is not super nice, because we explicitly support $storage->save($entity) as a (or to be precise: the) method to save an entity, so it's pretty important that Entity::save() stays just a simple wrapper around that and is not overridden. I guess we can get away with it as a quick-fix/workaround, but just saying...

I know people aren't excited about hooks these days, but @hchonov opened #2892654: Entity save post transactional hook a while back because we had the same use-case at bio.logis. Maybe we can get some eyes one that.

phenaproxima’s picture

Status: Needs work » Needs review
FileSize
153.57 KB
2.06 KB

I guess we can get away with it as a quick-fix/workaround

That's the idea. It definitely sucks, but it's a temporary solution, not a permanent one. I have a few ideas on how to improve things, but they're out of scope for this issue.

Meanwhile, this should fix the test failures. I have also removed the SQLite test skip from #275. Fingers crossed.

phenaproxima’s picture

Here, for the record, is how I suggest we improve things in a follow-up(s)...

Following the SQLite fiasco, it's clear that Media has some serious implementation issues when it comes to dealing with remote data related to a media item. That includes things like thumbnails which need to be fetched from a remote source, or miscellaneous metadata from an oEmbed resource.

The problem boils down to the fact that we are trying to update thumbnails and metadata -- which, in the case of oEmbed, usually involves talking to some remote API over HTTP -- in the Media::preSave() method. This method is called once the entity saving process has begun, which means a database transaction has started. Interacting with a remote API is an expensive and complicated process, and the database transaction may introduce a lock which can break the API interactions, or vice versa, or any number of different kinds of weirdness. The long and short of it is that we do NOT want to be waiting on an HTTP request(s) while we are in the middle of a database transaction! Now that we (will) have a remote media implementation in core, we need to consider all situations in which a media source may interact with a remote API, and ensure that we encounter none of those situations during a destructive operation like an entity save.

When dealing with oEmbed (and really, any media source that deals with remote systems in any way), virtually any available metadata attribute may trigger a remote API interaction. Therefore, we should not be trying to update a thumbnail, or retrieve any sort of remote metadata, for oEmbed items during an entity save. Only before or after it.

So, this is what I propose: move all operations that may talk to a remote API -- that is, metadata mapping and thumbnail fetching -- into a queue, so that it can all be done when the entity is not in the middle of being saved. One possible way that could look:

  • Add a new method to MediaSourceInterface, called processQueue() or something like that. This method is responsible for doing all metadata mapping and thumbnail downloading for a media item.
  • Deprecate MediaTypeInterface's thumbnailDownloadsAreQueued() and setQueueThumbnailDownloadsStatus() and rename them to something like isQueueable() and setQueueable().
  • Change the media type form so that we explain to site builders what queueing means, and insert a bold statement that we HIGHLY recommend enabling queueing for any remote media type, e.g. oEmbed. In fact, we could even add a flag to the MediaSource annotation to indicate whether that particular media source recommends queueing.
  • Replace the ThumbnailDownloader queue worker with something called QueueProcessor or similar. All it does is call $media->getSource()->processQueue($media) for each item in the queue.
  • In Media::preSave(), call $this->getSource()->processQueue($this) if, and only if, the media type is configured not to allow queueing. Otherwise, set generic default values (or no value) for the thumbnail, and any mapped metadata fields.
  • For queued media types, expose a local task/action (or a button) on the media item edit form, which allows the site builder to manually trigger queue processing for that item. Just in case they can't bear to wait for the queue to run normally.

This approach would have the advantage of treating the thumbnail like a piece of remote data (which, for many media types, it is), and solving the as yet-unsolved debate about how to correctly handle thumbnails.

I would love to hear thoughts about this.

marcoscano’s picture

@phenaproxima @alexpott thanks for digging into this!

I like the approach from #282, but I would suggest a minor adjustment:

- We could maybe use the https://api.drupal.org/api/drupal/core%21lib%21Drupal%21Core%21Destructa... mechanism to trigger the first processing of the queue item still during the same php process that saved the entity, but after the response is sent back to the browser. If for some reason this processing fails, we can re-queue the item to be processed again later, but at least we have an opportunity to try to have everything set almost at the same moment of the entity save.

For the record, this could either be solved directly in #2878119: Whether queued or not, update the media thumbnail and metadata before beginning the entity save database transaction or the follow-up to solve this could supersede that issue.

alexpott’s picture

I think we have to solve #276 in this issue because we have to expect that the network request for the thumbnail will fail sometimes. At the moment that will stop you from saving an entity and there will be an error.

alexpott’s picture

Can do something like - when we create a media entity we always generate a thumbnail pointing to modules/media_entity/images/icons/generic.png and then update it once the media entity is saved and if $this->getSource()->getMetadata($this, $this->getSource()->getPluginDefinition()['thumbnail_uri_metadata_attribute']); is successful?

Wim Leers’s picture

I think we have to solve #276 in this issue because we have to expect that the network request for the thumbnail will fail sometimes. At the moment that will stop you from saving an entity and there will be an error.

Things will fail. YouTube will be down at some point. Everything will be down at some point.

phenaproxima’s picture

In discussion with @alexpott on Slack, we realized that the thumbnail logic of the oEmbed plugin is very convoluted. So I decided to untangle it.

This patch removes the thumbnail_url, thumbnail_local_uri, and thumbnail_local metadata attributes. There is now only thumbnail_uri, which will do the following things:

  1. If the resource has no thumbnail at all, the default thumbnail is returned. End of story.
  2. The local thumbnail URI is computed, and if it already exists, that URI is returned.
  3. If the URI does not exist, we try to download it. If the download fails, errors are logged and the default thumbnail is returned.
  4. But if everything goes swimmingly, the local thumbnail URI is returned.

Additionally, this patch renames the thumbnails_uri configuration option to thumbnails_directory, for clarity's sake.

My fingers and toes are crossed that this passes tests.

phenaproxima’s picture

Status: Needs work » Needs review
FileSize
153.94 KB
2.49 KB

This oughta fix the tests.

alexpott’s picture

+++ b/core/modules/media/src/Plugin/media/Source/OEmbed.php
@@ -0,0 +1,462 @@
+    $local_thumbnail_uri = "$directory/" . $media->uuid() . '.' . pathinfo($remote_thumbnail_url, PATHINFO_EXTENSION);

I've been thinking about this line. I think we need to include something that would vary when the thumbnail url varies. Because if the source is updated then we regenerate thumbnail but the file is going to exist. Also for the purposes of caching this feels problematic. Maybe crc32() the url or something.

phenaproxima’s picture

This patch implements two suggestions @alexpott made, both here and in Slack:

  1. In the name of landing this damn patch, media items will now use their default thumbnail (which never needs to be downloaded) in Media::preSave(), and it will be updated before the actual save operation takes place. This is really ugly, convoluted logic, but it can be (and in fact, must be, for sanity's sake) cleaned up in a follow-up issue.
  2. As requested in #290, the local thumbnail URI is cached by SHA256-ing the remote thumbnail URL.

This will definitely fail tests, but it shouldn't (hopefully) be too hard to fix those failures.

phenaproxima’s picture

Okay, this should fix the tests. Also made an additional small change @alexpott requested on Slack -- changed the SHA256 hash to Crypt::hashBase64().

alexpott’s picture

  1. Fix the test that's going to fail.
  2. Addresses a couple of coding standards
  3. Sorts out allowed_providers vs supported_providers. The provider's each plugin supports are now providers and the providers each oembed media source allows are just providers. The difference is confusing because they are just lists of providers on different levels. Also reduced the API because listed supported providers on the source level is not needed. Less API ftw
alexpott’s picture

For me the Media::save() stuff is still the wrong way around - we should save the media with the default thumbnail first and then attempt to get the remote thumbnail once the media entity is created.

phenaproxima’s picture

we should save the media with the default thumbnail first and then attempt to get the remote thumbnail once the media entity is created.

As in, save it once with the default thumbnail, fetch the remote thumbnail, then re-save it? I was trying to avoid a double-save, but if that's what we gotta do, that's what we gotta do...

alexpott’s picture

Status: Needs work » Needs review
FileSize
1016 bytes
157.4 KB

@phenaproxima it's only likely to be one double save right? To create the file entity pointing to the default thumbnail. After that all media entities will be linked to the same thumbnail - right?

Here's a fixed patch.

phenaproxima’s picture

I must be confused about what you're asking, because this is what I was thinking:

  1. During the preSave process, we set the media item to use the default thumbnail. That file entity may already exist, or it may not. If it doesn't exist, we create it, save it, and reference it. Otherwise, we just reference it.
  2. We save the media item.
  3. Then, after the initial save (but still in Media::save()), we try to update the thumbnail by fetching the real one. If we succeed in fetching it, we need to create a file entity for that, save it, reference it, and then re-save the media item so that it's storing the new reference.

That's a lot of saves, which is why I set the default thumbnail in preSave(), then request an update just before we do the actual save operation. What am I missing?

alexpott’s picture

@phenaproxima ah yes we'll need to re-save the media in order to update the reference. Maybe this is the best solution for now until we can discuss a better fix which involves queues. One thing I end up thinking is why are we even bothering doing the local thumbnail creation in the first place. Your content is already coupled to the availability of the third party and the whole behaviour of local thumbnails makes it super complex.

phenaproxima’s picture

It's unsustainably messy, no doubt about that. I know there's a follow-up issue (filed when the Media module was originally committed to core), about cleaning up thumbnail generation and update code. I'll find that issue and bump its priority.

alexpott’s picture

I've tried to do a naming things review - a few things have come up including a security issue. I've also got a patch that addresses some of this. I'll post in the next comment detailing what's been addressed.

  1. +++ b/core/modules/media/config/schema/media.schema.yml
    @@ -5,6 +5,12 @@ media.settings:
    +    iframe_domain:
    +      type: string
    +      label: 'Domain from which to serve oEmbed content in an iframe'
    
    +++ b/core/modules/media/media.install
    @@ -120,3 +153,12 @@ function media_update_8500() {
    +function media_update_8600() {
    +  \Drupal::configFactory()->getEditable('media.settings')
    +    ->set('oembed_providers', 'https://oembed.com/providers.json')
    +    ->save(TRUE);
    +}
    

    Should we provide a default value of an empty string? And set in the update? I think so because if we don't then config will change by just going to the form and pressing save. Also this is a uri too :)

  2. +++ b/core/modules/media/config/schema/media.schema.yml
    @@ -5,6 +5,12 @@ media.settings:
    +    oembed_providers:
    +      type: string
    +      label: 'The URL of the oEmbed providers database in JSON format'
    

    This should a type of uri plus the name could be better - maybe oembed_providers_url - atm it sounds like a list.

  3. +++ b/core/modules/media/media.api.php
    @@ -20,6 +20,23 @@ function hook_media_source_info_alter(array &$sources) {
    +function hook_oembed_resource_url_alter(array &$parsed_url, \Drupal\media\OEmbed\Provider $provider) {
    

    Is this new hook in the CR?

  4. +++ b/core/modules/media/media.module
    @@ -72,6 +73,11 @@ function media_theme() {
    +    'media_oembed' => [
    +      'variables' => [
    +        'post' => NULL,
    

    This theme function is only about iframes. It should be named accordingly and post is also not a great name. Let's called it media as that's what is being displayed.

  5. +++ b/core/modules/media/media.module
    @@ -99,6 +106,18 @@ function media_theme_suggestions_media(array $variables) {
    +  $source = $media->getSource();
    +  if ($source instanceof OEmbedInterface) {
    +    $suggestions[] = 'media__oembed';
    +
    +    $provider_id = $source->getMetadata($media, 'provider_name');
    +    if ($provider_id) {
    +      $provider_id = \Drupal::transliteration()->transliterate($provider_id);
    +      $provider_id = preg_replace('/[^a-z0-9_]+/', '_', mb_strtolower($provider_id));
    +      $suggestions[] = "media__oembed__$provider_id";
    +    }
    +  }
    

    A couple of thoughts here.
    1. Do we fear clashes when a bundle and view mode clash with the name oembed and the provider ID?
    2. Should we generalise this so its availabble for all sources not just oembed - we could use the source plugin ID.

  6. +++ b/core/modules/media/src/Controller/OEmbedIframeController.php
    @@ -0,0 +1,162 @@
    +    $url = $request->query->get('url');
    +    if (!$url) {
    +      throw new BadRequestHttpException('url parameter not provided');
    +    }
    +    if (!UrlHelper::isValid($url, TRUE)) {
    +      throw new BadRequestHttpException('url parameter is invalid');
    +    }
    +    if (!UrlHelper::isExternal($url)) {
    +      throw new BadRequestHttpException('url parameter is not external');
    +    }
    

    Been thinking about the security of this. I don't think we should be passing url or other info here. I think we should be passing the media ID and loading it and getting the info from there. This will stop users from being able to manipulate the url and displaying things on sites that the site did not intend to be displayed. Or we should also pass a token on the URL so the attributes can't be futzed with.

  7. +++ b/core/modules/media/src/Controller/OEmbedIframeController.php
    @@ -0,0 +1,162 @@
    +          '#post' => Markup::create($resource->getHtml()),
    

    This should use a special media markup object specific to the purpose. Like ViewsRenderPipeline.

  8. +++ b/core/modules/media/src/OEmbed/UrlResolver.php
    @@ -0,0 +1,213 @@
    +  /**
    +   * {@inheritdoc}
    +   */
    +  public function isSecure($url) {
    +    if (!$url) {
    +      return FALSE;
    +    }
    +    $url_host = parse_url($url, PHP_URL_HOST);
    +    $system_host = parse_url($this->requestContext->getCompleteBaseUrl(), PHP_URL_HOST);
    +
    +    // The URL is secure if its domain is not the same as the domain of the base
    +    // URL of the current request.
    +    return $url_host && $system_host && $url_host !== $system_host;
    +  }
    

    I don't think this is the right place for this. It's not really about URL resolution. Not sure where to put it though.

  9. +++ b/core/modules/media/src/OEmbed/ResourceFetcher.php
    @@ -0,0 +1,166 @@
    +    try {
    +      return call_user_func_array(Resource::class . '::' . $data['type'], $arguments);
    +    }
    

    This is clever but feels overly complex. We have the static methods - let's just call them. We don't have to shift arguments depending on type we just call each one.

alexpott’s picture

Patch addresses #304.1, .2, .4, .6 by using a token - but we need to add tests for a bad token, .7 and .9.

The answer to #304.3 is yep it is on the CR - nice one.

We still need to resolve #304.5, add tests for .6 and .8

phenaproxima’s picture

This patch:

  • Adds a kernel test of OEmbedIframeController to ensure that bad hashes raise the correct exception. In discussion with @alexpott on Slack, we also agreed to remove the URL validation, since the hash pretty much obviates it. So that's #304.6 done.
  • Discussed #304.5 with @lauriii and @alexpott, and we agreed that although the existing suggestions can't be changed for BC reasons, the new suggestions should have some prefixing. So the new suggestions will be:
    • media__source_$source_id (for non-derived sources only)
    • media__source_$source_id__derivative_$derivative_id (for derived sources only)
    • media__source_$source_id__derivative_$derivative_id__provider_$provider_id (only for oEmbed sources which know the provider name)
  • For #304.8, I moved isSecure() into a new service, marked internal, called media.oembed.iframe_url_helper, and renamed the existing (extensive) tests of isSecure() accordingly.

Status: Needs review » Needs work

The last submitted patch, 306: 2831944-306.patch, failed testing. View results
- codesniffer_fixes.patch Interdiff of automated coding standards fixes only.

Berdir’s picture

I fear the people who worked so much on this aren't going to like this, but I wanted to bring it up..

So we have this 165kb patch that adds a lot of very new, pretty complex (security-wise for example) stuff to the media module, which means it is stable. We also just found a pretty serious security issue, *after* it has been committed for the first time.

There was also a lot of discussion about API stability, @internal and so on.

What if.. we add this instead as a media_oembed module that we can mark as experimental? That means the barrier for getting it committed is much lower. And the only reason media got into core as non-experimental (although hidden) module was that it was fairly close conceptually to the stable contrib module, while this is large amount of completely new code. The media module is growing rapidly and while it made sense to add all the basic media source plugins directly into that module, we also have a bunch of separate modules for various field types for example.

alexpott’s picture

Status: Needs work » Needs review
FileSize
3.06 KB
165.59 KB

Patch attached fixes the test fails.

@Berdir thinking about the suggestion to put this in another module. That would make it easier to get this into core, I agree. And perhaps be a little less risky. But I'm not sure that adding another alpha experimental module for this is a good idea. Here's my thinking:

  • I think the idea that media supports oembed out-of-the-box is a good one. It means that the site builder only needs to enable media to be able to embed youtube etc properly. This feels like great out-of-the-box functionality.
  • There is only one new piece of API added by the patch
  • The "security" issue found yesterday turned out to not be a security issue - @samuel.mortenson had identified it before - it's just that it was so secure that media would not be displayed to anonymous users!
  • Most of the patch is just consuming the existing media APIs

I don't think this patch far away now.

I need to review @phenaproxima's changes in #306 more.

alexpott’s picture

alexpott’s picture

Reviewed #306 more. All looks good. Patch attached makes the test mocking consistent and stops using test class properties unnecessarily. Also use iFrameUrlHelper consistently to avoid confusion with UrlHelper.

samuel.mortenson’s picture

Status: Needs review » Reviewed & tested by the community

My review of the latest interdiffs looks good to me. I'm not sure if we need another core-person to response to #308, but I agree with @alexpott.

borisson_’s picture

I agree with @alexpott's reponse to #308.

alexpott’s picture

Status: Reviewed & tested by the community » Fixed

Second time lucky!

Committed 00c2069 and pushed to 8.6.x. Thanks!

  • alexpott committed 00c2069 on 8.6.x
    Issue #2831944 by chr.fritsch, phenaproxima, alexpott, marcoscano,...

Status: Fixed » Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.

xjm’s picture

What update instructions do module developers or site owners need from this issue? If it's just a highlighted new feature of 8.6 it should be tagged "8.6.0 highlights" instead.

xjm’s picture

2dareis2do’s picture

"Enter a different domain from which to serve oEmbed content, including the http:// or https:// prefix. This domain needs to point back to this site, or existing oEmbed content may not display correctly, or at all."

If I understand this correctly there advice here is to set up a subdomain that points to the main url (301 redirect) and add this to your media config? What am I missing here.? When following these instructions I simply get a 404 error where the iframe would be?

Any detail on how the is done would be useful. Do I need to set up a drupal multisite config to achieve this?

I am sure this is a great security feature, but as with quite a lot of things in Drupal the documentation is either hard or simply lacking.