This project is not covered by Drupal’s security advisory policy.

This module provides a Migrate process plugin that extracts a substring ($needle) from a source string ($haystack) using a regular expression. It returns the first captured group ($matches[1]), or the original $haystack if no match is found.

Use Case

The migrate_process_extract_regex module is useful when developers are
handling remote images with the remote_stream_wrapper module. Sometimes the
URLs of remote media assets (pictures etc.) do not contain a valid query string
and the assets will fail to import.

For example, this URL is handled fine:

`https://www.ilfordrecorder.co.uk/resources/images/17245750/?type=og-image`

However, the import does not work for other images whose URLs have a query string appended, e.g.

`https://www.telegraph.co.uk/content/dam/news/2023/02/03/TELEMMGLPICT0003...`

Therefore, to avoid exceptions and to ensure the image can be handled by remote_stream_wrapper, it may be better to strip the query string as part of the migration process. The configuration that handles this would be along these lines:

      -
        plugin: migrate_process_extract_regex
        regex:  /^(.*)\?/ 

You can change the regular expression used here to suit your use case.
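
As a rough illustration of what this pattern captures, the sketch below calls PHP's preg_match() directly rather than going through the plugin. The URLs and variable names are hypothetical and exist only for this example. It also shows one thing worth knowing when adapting the pattern: `(.*)` is greedy, so it captures up to the last `?` in the string, whereas a character class such as `[^?]*` stops at the first one.

```php
<?php

// Hypothetical URLs, used only for illustration.
$simple = 'https://example.com/images/photo.jpg?imwidth=680';
$double = 'https://example.com/images/photo.jpg?a=1&next=/x?b=2';

// The pattern from the configuration above: greedy capture up to the LAST "?".
preg_match('/^(.*)\?/', $simple, $m);
$greedy_simple = $m[1];

preg_match('/^(.*)\?/', $double, $m);
$greedy_double = $m[1];

// An alternative pattern (an assumption, not from the module) that stops at
// the FIRST "?".
preg_match('/^([^?]*)\?/', $double, $m);
$first_double = $m[1];

echo $greedy_simple, PHP_EOL; // https://example.com/images/photo.jpg
echo $greedy_double, PHP_EOL; // https://example.com/images/photo.jpg?a=1&next=/x
echo $first_double, PHP_EOL;  // https://example.com/images/photo.jpg
```

For URLs with a single `?` the two patterns give the same result, so the choice only matters when a query string itself contains a literal question mark.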

Under the Hood

This module uses the PHP preg_match() function.

In particular, it returns $matches[1], which holds the text that matched the first captured parenthesized subpattern.

https://www.php.net/manual/en/function.preg-match.php
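
The behaviour described above (return the first captured group, otherwise fall back to the input) can be sketched in plain PHP as follows. This is a minimal approximation, not the module's actual source: the function name `extract_first_group` and the example URLs are hypothetical.

```php
<?php

/**
 * Hypothetical helper mirroring the behaviour described above.
 * This is an illustrative sketch, not the module's real code.
 */
function extract_first_group(string $regex, string $haystack): string {
  // preg_match() returns 1 on a match, 0 on no match and FALSE on error.
  if (preg_match($regex, $haystack, $matches) === 1 && isset($matches[1])) {
    // $matches[0] is the full match; $matches[1] is the first capture group.
    return $matches[1];
  }
  // No match, or the pattern has no capture group: return the input unchanged.
  return $haystack;
}

$with_query = extract_first_group('/^(.*)\?/', 'https://example.com/a.jpg?w=100');
$without_query = extract_first_group('/^(.*)\?/', 'https://example.com/a.jpg');

echo $with_query, PHP_EOL;    // https://example.com/a.jpg
echo $without_query, PHP_EOL; // https://example.com/a.jpg
```

The fallback is what makes the plugin safe to apply to every row: URLs without a query string pass through untouched.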

Installation

Download the module and enable it.

Example Usage

      process:
        'body/value':
          -
            plugin: migrate_process_html
            source: link
          -
            plugin: dom
            method: import
          -
            plugin: dom_select
            selector: //meta[@property="og:image"]/@content
          -
            plugin: skip_on_empty
            method: row
            message: 'Field image is missing'
          -
            plugin: extract
            index:
              - 0
          -
            plugin: skip_on_condition
            method: row
            condition:
              plugin: not:matches
              regex: /^(https?:\/\/)[\w\d]/i
            message: 'We only want a string if it starts with http(s)://[\w\d]'
          -
            plugin: migrate_process_extract_regex
            regex: /^(.*)\?/
          -
            plugin: file_remote_url

Please note that using `skip_on_condition` with 'matches' requires the excellent
migrate_conditions module.
https://www.drupal.org/project/migrate_conditions
