Problem/Motivation

We have a persistent issue where content editors will link to other pages on their site by copying and pasting entire URLs into the LinkIt dialog. For example, they will copy and paste https://sample.com/some/page and link to that, rather than typing in the title "Some Page" and waiting for the matcher to return a result.

As a result, we get lots of absolute URLs in links throughout the site, which is especially problematic if the editors are using a temporary domain (like https://sample-new.princeton.edu), since all those links will break once the temporary domain goes away and the site they're working on goes live.

Proposed resolution

1. Strip the base URL from pasted URLs into the Linkit autocomplete.
2. Check if the URL is a path alias to some other path, and then use that system path instead.

Then the LinkIt filter would convert the system path back to alias on render.

Remaining tasks

  • Support stripping base urls in frontend (when using the CKEditor 5 link widget or the field widget)
    • Support unmatched urls (No content suggestions found. This URL will be used as is.)
  • Support stripping base urls in backend (when bypassing the frontend widget)
    • Support urls matched as entities (in EntityMatcher)
    • Support external links missing a protocol (in ExternalMatcher)
    • Support urls to the front page (in FrontPageMatcher)
  • Support base urls other than the current url (e.g. in case of headless systems or pre-production environments). Ideally this would be configurable, like the Pathologic module does

Implications

This is pretty complicated to do because we cannot just swap out the system path, we also need to add the matcher attributes.

CommentFileSizeAuthor
#103 3078075-disable_insensitive-102.patch636 bytesjoseph.olstad
#102 Screenshot 2024-07-02 at 10.11.40.png97.23 KBmatthiaso
#98 strip-baseurl-onkeydown.3078075.98.patch14.52 KBmark_fullmer
#98 linkit-remove-baseurl.gif314.63 KBmark_fullmer
#83 CleanShot 2023-11-08 at 18.53.05@2x.png60.56 KBdieterholvoet
#77 linkit6x-3078075-77.patch16.67 KBsmulvih2
#76 linkit6x-3078075-76.patch16.37 KBsmulvih2
#68 linkit6x-3078075-68.patch17.4 KBsylus
#62 linkit6x-3078075-62.patch16.43 KBmark_fullmer
#62 linkit6x-3078075-59_62.interdiff.txt2.61 KBmark_fullmer
#59 linkit6x-3078075-59.patch18.04 KBjoseph.olstad
#56 z_linkit_example-fr_instead.png137.86 KBjoseph.olstad
#56 z_LINKIT_EXAMPLE_francais_prefix.png134.21 KBjoseph.olstad
#53 base_url_problem#1.png7.42 KBjamesyao
#47 Screen Shot 2022-01-28 at 1.19.26 PM.png38.63 KBrajeshreeputra
#43 interdiff_41_to_43-3078075.txt3.8 KBjoseph.olstad
#43 3078075-43.patch16.65 KBjoseph.olstad
#41 interdiff_40_to_41-3078075.txt1.25 KBjoseph.olstad
#41 3078075-41.patch18.89 KBjoseph.olstad
#40 interdiff_26_to_40-3078075.txt5.21 KBjoseph.olstad
#40 3078075-40.patch18.61 KBjoseph.olstad
#37 3078075-37.patch15.08 KBjoseph.olstad
#37 interdiff_26_to_37-3078075.txt1.68 KBjoseph.olstad
#26 3078075-26.patch14.17 KBjoseph.olstad
#26 interdiff-3078075_24_to_26.txt1.77 KBjoseph.olstad
#24 3078075-24.patch13.78 KBjoseph.olstad
#24 interdiff-3078075_22_to_24.txt1.58 KBjoseph.olstad
#22 interdiff-3078075-20_to_22.txt1.25 KBjoseph.olstad
#22 3078075-22.patch12.81 KBjoseph.olstad
#21 interdiff_20_to_pending.txt1.25 KBjoseph.olstad
#19 3078075-20.patch12.81 KBjoseph.olstad
#18 3078075-19.patch11.64 KBjoseph.olstad
#17 Wxt__00054.mp41.37 MBjoseph.olstad
#17 3078075-17.patch11.08 KBjoseph.olstad
#13 3078075-13.patch9.03 KBidebr
#13 3078075-13-test-only.patch3.1 KBidebr
#13 3078075-13-COMPATIBLE-do-not-test.patch8.49 KBidebr
#13 interdiff-11-13.txt776 bytesidebr
#11 3078075-11.patch9.02 KBidebr
#11 3078075-11-test-only.patch3.1 KBidebr
#11 3078075-11-COMPATIBLE-do-not-test.patch8.48 KBidebr
#2 3078075-2-test-only.patch950 bytesidebr
#2 3078075-2.patch2.09 KBidebr
#4 interdiff-3078075-2-4.txt2.46 KBlendude
#4 3078075-4.patch4.54 KBlendude
#5 interdiff-3078075-4-5.txt7.04 KBlendude
#5 3078075-5.patch8.23 KBlendude
#6 3078075-5-link-widget-compatible.patch7.7 KBlendude
#7 interdiff-3078075-5-7.txt3.71 KBlendude
#7 3078075-7.patch8.89 KBlendude
#8 3078075-9-COMPATIBLE.patch8.42 KBlendude
#8 interdiff-3078075-7-9.txt508 byteslendude
#8 3078075-9.patch8.95 KBlendude
#9 interdiff-8-9.txt598 bytesidebr
#9 3078075-9-test-only.patch3.1 KBidebr
#9 3078075-actually-9.patch8.99 KBidebr

Issue fork linkit-3078075

Command icon Show commands

Start within a Git clone of the project using the version control instructions.

Or, if you do not have SSH keys set up on git.drupalcode.org:

Comments

bkosborne created an issue. See original summary.

idebr’s picture

Title: Detect and strip base URL if pasted in before calling matchers » Detect and strip base URL from pasted URLs to increase matching hits
Status: Active » Needs review
FileSize
950 bytes
2.09 KB

The base entity matcher class already tries to convert relative URLs to an entity in \Drupal\linkit\Plugin\Linkit\Matcher\EntityMatcher::findEntityIdByUrl(). When the host is stripped from the pasted URL, the entity is then correctly matched in \Drupal\Core\Url::fromUserInput().

Attached patch converts the absolute URL to a relative URL, and adds test coverage for this new scenario.

lendude’s picture

Looks great, one nit:

+++ b/src/Plugin/Linkit/Matcher/EntityMatcher.php
@@ -516,6 +517,16 @@ class EntityMatcher extends ConfigurableMatcherBase {
+      if (UrlHelper::externalIsLocal($user_input, \Drupal::request()->getSchemeAndHttpHost())) {

Ideally this would use dependency injection instead of \Drupal::

lendude’s picture

This addresses #3

This also provides hardening to the \Drupal\linkit\Plugin\Linkit\Matcher\EntityMatcher::findEntityIdByUrl method for when there are multiple route parameters, the key() / end() logic breaks when the entity type paramater isn't the first and only parameter. Might need its own issue, but needed it here ¯\_(ツ)_/¯

lendude’s picture

This now makes this work for multilingual sites when using a URL from a different language than the current one. (basically the D8 version of #2848429: Pasting an internal url gives no result if the url has a language prefix). Sorry no tests for this yet

lendude’s picture

This is the patch from #5 but to make it compatible with #2712951: Linkit for Link field for those that need that.

lendude’s picture

This reverts the unwanted changes that somehow snuck into #5 and which caused the test to fail.

This also adds test coverage for the multilingual scenario.

lendude’s picture

Cleaned up too much when I removed an unused variable, sorry 'bout the noise

idebr’s picture

Attached patch fixes the last coding standard issue in the patch. Also included a test-only patch.

johan den hollander’s picture

Status: Needs review » Needs work

Tested this today on the 5.x-dev. Patch applied but I could not notice a difference. Links were pasted into a text paragraph with the full url.
The full url was saved to database.

idebr’s picture

Status: Needs work » Needs review
FileSize
8.48 KB
3.1 KB
9.02 KB

Attached patch fixes the following error in the log messages when entering incomplete URLs:

invalidArgumentException: A path was passed when a fully qualified domain was expected. in Drupal\Component\Utility\UrlHelper::externalIsLocal()

#10 The patch already contains automated tests to confirm the behaviour is applied correctly. Can you verify your use case matches what the code is doing?

The last submitted patch, 11: 3078075-11-test-only.patch, failed testing. View results

idebr’s picture

Note: should only check for external URLs.

The last submitted patch, 13: 3078075-13-test-only.patch, failed testing. View results

anneke_vde’s picture

Status: Needs review » Reviewed & tested by the community

The above patch fixes the invalidArgumentException error.

joseph.olstad’s picture

+1, please commit this patch

joseph.olstad’s picture

FileSize
11.08 KB
1.37 MB

Hi, I made some UI improvements to this patch, see demo below.

Click to watch video

joseph.olstad’s picture

FileSize
11.64 KB
joseph.olstad’s picture

FileSize
12.81 KB

New patch, if you're dealing with token get params for access unpublished tokens on draft content
OR if your get params are case sensitive, prior to this patch, linkit would unceremoniously switch everything to lower case (including case-sensitive parameters)

Highly recommend this patch here:

joseph.olstad’s picture

I would like to add another test to this and also handle a possible multisite use case where the path prefix is part of the vhost, so

Three use cases need to be taken into consideration if we want this to work for everyone:

http(s)://domainname/prefix/langcode
http(s)://domainname/prefix/langcode/path/to/content
http(s)://domainname/prefix/
http(s)://domainname/prefix/langcode/
http(s)://domainname/prefix/langcode/path/to/content
http(s)://domainname/prefix/langcode/langcode/path/to/content
http(s)://domainname/langcode (normal)
http(s)://domainname/langcode/path/to/content
http(s)://domainname/langcode/ (normal)
http(s)://domainname/langcode/path/to/content/
http(s)://domainname/
http(s)://domainname/path/to/content/
http(s)://domainname
http(s)://domainname/path/to/content

suspect maybe need a bit of a tweak to handle all these possibilities

joseph.olstad’s picture

Assigned: Unassigned » joseph.olstad
Status: Reviewed & tested by the community » Active
FileSize
1.25 KB

ALL or MOST of the above previous patches are NOT COMPATIBLE WITH DRUPAL 9!!!

Fix to follow shortly

joseph.olstad’s picture

Assigned: joseph.olstad » Unassigned
Status: Active » Needs review
FileSize
12.81 KB
1.25 KB

@TODO:
These use cases need to be taken into consideration if we want this to work for everyone:

http(s)://domainname/prefix/langcode
http(s)://domainname/prefix/langcode/path/to/content
http(s)://domainname/prefix/
http(s)://domainname/prefix/langcode/
http(s)://domainname/prefix/langcode/path/to/content
http(s)://domainname/prefix/langcode/langcode/path/to/content
http(s)://domainname/langcode (normal)
http(s)://domainname/langcode/path/to/content
http(s)://domainname/langcode/ (normal)
http(s)://domainname/langcode/path/to/content/
http(s)://domainname/
http(s)://domainname/path/to/content/
http(s)://domainname
http(s)://domainname/path/to/content
joseph.olstad’s picture

HEAD tests for D9 have an issue, not related to this patch
#3109544: Update LinkitUpdateTest for Drupal 9

joseph.olstad’s picture

FileSize
1.58 KB
13.78 KB

Ok this patch is perfection, well, you can criticize but it will work everywhere which is what we want.


see patch 26, it is the good one

joseph.olstad’s picture

Assigned: Unassigned » joseph.olstad

Ok, so I spoke too soon, new patch and interdiff

joseph.olstad’s picture

Interdiff and new patch

joseph.olstad’s picture

Assigned: joseph.olstad » Unassigned

unassigning myself

sylus’s picture

Status: Needs review » Fixed

Committed and attributed!

Thanks for all the hard work!

sylus’s picture

Status: Fixed » Needs review

Whoops wrong queue my apologies

joseph.olstad’s picture

Just a FYI note about patch 26, the auHash logic is to support access_unpublished, shouldn't cause an issue elsewhere, I forgot to add a comment to the js code about this.

@TODO improve access_unpublished support:
could be improved by passing the 'key' name as a setting into the javascript as I did for the baseurl, since the access_unpublished key is configurable (but the default is auHash)
for now just supporting the 'default' access_unpublished key.

If you're not using access_unpublished everything else still works either way.

joseph.olstad’s picture

This patch has test coverage, is a significant improvement and we've been using it in a number of sites.

joseph.olstad’s picture

Title: Detect and strip base URL from pasted URLs to increase matching hits » Detect and strip base URL from pasted URLs to increase matching hits and support multilingual
joseph.olstad’s picture

patch 26 is still clean vs HEAD

Status: Needs review » Needs work

The last submitted patch, 26: 3078075-26.patch, failed testing. View results
- codesniffer_fixes.patch Interdiff of automated coding standards fixes only.

joseph.olstad’s picture

Assigned: Unassigned » joseph.olstad
joseph.olstad’s picture

Hmm, test failure appears to be perhaps due to a change in the test bot or to the testing framework it'self.

I am running the patch with Drupal 9.2.7 no issues, no errors. No reported issues either from anyone I know using this. Drupal 9.1.13 running linkit with patch 26 also had no issues that were brought to my attention.

Investigating still, perhaps a change is needed for the automated tests

joseph.olstad’s picture

FileSize
1.68 KB
15.08 KB

Reroll, see interdiff for changes, only changed automated tests.

joseph.olstad’s picture

Status: Needs work » Needs review

not sure if I needed this line or not, or if it might cause an issue:

+ $this->installEntitySchema('path_alias');

Status: Needs review » Needs work

The last submitted patch, 37: 3078075-37.patch, failed testing. View results
- codesniffer_fixes.patch Interdiff of automated coding standards fixes only.

joseph.olstad’s picture

New patch, new interdiff, this is only for tests , no other change, only tests.

joseph.olstad’s picture

FileSize
18.89 KB
1.25 KB

Attempt to fix these D8 tests now so they work with D9

joseph.olstad’s picture

Assigned: joseph.olstad » Unassigned
Category: Feature request » Bug report
Priority: Normal » Major
Status: Needs work » Needs review
joseph.olstad’s picture

w00t!

joseph.olstad’s picture

Category: Bug report » Feature request
jamesyao’s picture

Thanks joseph.olstad for quickly providing the patch in Drupal 9.

liam morland’s picture

rajeshreeputra’s picture

Status: Needs review » Reviewed & tested by the community
FileSize
38.63 KB

It works as expected.

Pasted http://localhost/drupal/node/1 and it auto converted to /node/1 in link popup.

stijnhau’s picture

Patches aren't working anymore

martijn de wit’s picture

Tests are failing with php 8.1: https://www.drupal.org/pift-ci-job/2485665
Probable by #3221026: [PHP8] Tests are failing in PHP 8 / Drupal 9.2.x ? .

But I don't know if we have to change the patch

martijn de wit’s picture

what problem do you encounter @stijnhau.

stefanos.petrakis@gmail.com’s picture

Status: Reviewed & tested by the community » Needs work

This would be great to figure out eventually, lots of work behind this issue.
I am not sure if all the cases reported in #20 are covered at the moment, didn't have time to look at this in detail; #47 that set this to RTBC only reported on one of those cases.

Additionally, the previous three comments report possible PHP8.1 hiccups.

Setting this back to "Needs work" and hope I can find some time to work on this soon.

joseph.olstad’s picture

jamesyao’s picture

FileSize
7.42 KB

Hi @joseph.olstad
I applied the patch #20, #26, and #43 in my local DEV environment, respectively. The patches do not strip the base URL from pasted URLS on Drupal Core 9.4.x and PHP 8.0.

The base URL is not removed

jamesyao’s picture

Hi @joseph.olstad the patches do not work with any French pages when i copied/pasted the absolute path of the French pages. It leaves a relative path for French page instead of /node/nid.

joseph.olstad’s picture

@jamesyao, your use case is with entity_translation_unified_form which edits french/english content at the same time using the english interface the javascript and linkit with this patch according to your screenshot above is resolving to the correct data-entity-uuid, the patch is still doing it's work. It's a huge improvement compared to without the patch. Remove the patch and you'll see how things behave in a much inferior way.

Very few people use entity_translation_unified_form , far more people do not use it than those that use it. With that said, this patch still does improve things even when using entity_translation_unified_form compared to using linkit without the patch.

joseph.olstad’s picture

@jamesyao

I tested the patch using a similar setup as yours and it's working.

My apache is using /alias for /french and /english prefix and I have a symlink in the root of my Drupal environment that points to "."

both urls have the same linkit link, the same page

Linkit automatically generates the relative link in the href attribute

see screenshots, one is https://example.ca/fr
 /fr

and the other is https://example.ca/francais/fr
 /francais/fr

jamesyao’s picture

Thanks @joseph.olstad for the clarification. The French relative path issues in my DEV environment is related to a bug in entity_translation_unified_form module.
Have you got a chance to test stripping base URL off case? Is it related to a bug in entity_translation_unified_form module?

joseph.olstad’s picture

patch appears to need a reroll for PHP 8.1 compatibility

joseph.olstad’s picture

FileSize
18.04 KB

reroll for 6.0.x

joseph.olstad’s picture

Version: 8.x-5.x-dev » 6.0.x-dev
Status: Needs work » Needs review

Status: Needs review » Needs work

The last submitted patch, 59: linkit6x-3078075-59.patch, failed testing. View results
- codesniffer_fixes.patch Interdiff of automated coding standards fixes only.

mark_fullmer’s picture

Status: Needs work » Needs review
FileSize
2.61 KB
16.43 KB

It appears that the NodeMatcherTest failed in #59 because Kernel tests still need to be declared with 'public' visibility (in contrast to Functional/FunctionalJavascript tests)?

Here's a re-roll that changes the visibility back from 'protected' to 'public'.

joseph.olstad’s picture

Thanks @mark_fullmer , 👍

mark_fullmer’s picture

Issue summary: View changes
mark_fullmer’s picture

Re-reading the original stated motivation for this issue -- namely, that pasted absolute URLs are *rendered* and if the base URL changes, those URLs will be wrong -- I want to ask whether that part of the motivation would more universally be solved in the context of text format fields by using the pathologic module, which specifically supports stripping matching base URLs from content entered in text editors.

That said, there still are two reasons that Linkit should maybe also do this:
1. The Linkit field widget/formatter would not be able to make use of Pathologic to fix this.
2. There is utility in having the base URL stripping happen during the autocomplete process: content editors will get confirmation that the URL they are pasting is, in fact the entity they are trying to link to, since it will show in the autocomplete suggestions.

So: I still think I'm in support of doing this, but others coming to this issue should also consider the pathologic module for solving the similar goal in other content in text format fields.

joseph.olstad’s picture

Hi Mark , in our case the patch you rolled works perfectly and is desired. we have one Drupal build that is used by 100+ installations, each with the same domain, there is a different base path for all 100 of these. With all due respect, pathologic is an inferior solution that would require us to have 100 additional configurations for our build.

mmarler’s picture

Assigned: Unassigned » mmarler
sylus’s picture

FileSize
17.4 KB

Just a quick re-roll of #65

Status: Needs review » Needs work

The last submitted patch, 68: linkit6x-3078075-68.patch, failed testing. View results
- codesniffer_fixes.patch Interdiff of automated coding standards fixes only.

joseph.olstad’s picture

possible solution to failed test

Line 41 of
tests/src/Functional/LinkitBrowserTestBase.php
change this from:

  protected static $modules = ['linkit', 'linkit_test', 'block'];

To this:

  protected static $modules = ['linkit', 'linkit_test', 'block', 'pathauto'];
lendude’s picture

Only recently updated to Linkit 6 so we were still on patch #13, client wasn't happy with the UI changes that got added so made a local patch without them.

Won't derail the thread here, so not uploading, just something to take into consideration before this would be merged. I'm sure the warning is useful for some users but not for ours, so I think it should be made optional at the very least.

idebr’s picture

@joseph.olstad Can you elaborate why this issue needs user interface changes in #16? The patch in #13 converts absolute URLs without bothering the user.

joseph.olstad’s picture

@idebr

Patch #68 includes enhancements over #13

  1. Include support for the base path
  2. include support for access unpublished token
  3. Notice message

I created a video demo here:
download and view the video for a demo.
the demo is from comment #17 above
#3078075-17: Detect and strip base URL from pasted URLs to increase matching hits and support multilingual

I agree with what was mentioned, it would be nice to have some of this as optional. The token processing doesn't cause issues if it's not present.

mark_fullmer’s picture

Assigned: mmarler » Unassigned
moshe weitzman’s picture

One thing that this patch doesn't handle (I think) is that editing is sometimes done on a different domain than viewing (for security). For example, the author is editing on edit.mass.gov but pastes in a link to www.mass.gov/node/n. Ideally this code would recognize www.mass.gov as an additional base_url and this recognize the pasted link as internal. So, give sites a way to provide a list of domains, not just the one that the author is using.

smulvih2’s picture

FileSize
16.37 KB

Re-roll of #68 after 3212820 was merged into the 6.0.x branch.

smulvih2’s picture

FileSize
16.67 KB

Found an issue with my last patch in #76, which seems to be an issue with some of the previous patches as well. This issue was in AutcompleteController.php, where $this->linkitProfile is not defined.

I tested this new patch against the proposed resolution in the ticket description and can confirm that using an absolute URL gets handled correctly, and the content then uses the system path instead of the alias.

DieterHolvoet made their first commit to this issue’s fork.

dieterholvoet’s picture

@joseph.olstad Can you elaborate why this issue needs user interface changes in #16? The patch in #13 converts absolute URLs without bothering the user.

I agree, the bright red message is unnecessary. I did notice that the message and the disabling of the submit button doesn't even work in CKEditor 5 because CSS selectors changed, so that's another reason to leave it out. I'll start a MR based on the last part and remove that part of the code.

dieterholvoet’s picture

One oddity I discovered is that when using the Linkit widget, absolute node URLs are resolved to their node/xxx paths. After saving the form and reloading the page, that internal path is again converted to the path alias including the language prefix. Can't we change it so we resolve absolute urls to the aliased path instead of the internal path? Or is that a separate issue?

joseph.olstad’s picture

@DieterHolvoet

Unless I misunderstood something, I believe that we do not want to use the alias path in the storage.

The alias can be changed if say someone regenerates aliases with a new pattern for example, or edits the alias manually. The alias can also be deleted with the node still in-tact. The node id does not change. and is a reliable value until the content is deleted. We've built a whole referential integrity system that relies on linkit with the safedelete module and it's far easier doing referential integrity with a reliable and simple value.

Rebuilding cache and loading the rendered page brings the current alias value ,

The end result is that people see the up-to-date alias as the intended functionality is that the alias is brought in from the alias when the page is rendered or cache rebuilt, the alias is then cached in the rendered output of the page.

Unless I misunderstood something I believe you should undo todays related commits in the MR.

dieterholvoet’s picture

No, I do not mean we need to store the alias in storage. I mean that in the Linkit field widget, the alias is usually displayed to the user instead of the internal path. After submitting the form it's converted to the internal path. The inconsistency is that now, when resolving an absolute url, the internal path is displayed to the user instead of the alias.

Screenshot

The changes in the MR are unrelated to this, they are related to my comment in #79.

joseph.olstad’s picture

@DieterHolvoet,
Your change also removed access_unpublished support for the hashkey processing and it also dissolves support for base path , so if your site is using a prefix path the absolute url won't be recognized using your merge request.

With that said, this issue is now several years old, any small improvement would be good, can always re-patch on top of whatever the maintainer decides to do.

Quite a lot of work went into this, you basically undid three different capabilities, two of which are actually pretty important.

joseph.olstad’s picture

var baseUrl = drupalSettings.base_url;
This base url is actually pretty useful and important to many implementations of Drupal. The whole section of logic handling this and the access_unpublished support was blown away in your MR.

Significant reduction in capability.

dieterholvoet’s picture

Your change also removed access_unpublished support for the hashkey processing

Any logic added in linkit.autocomplete.js has to be written separately for CKEditor 5, which isn't done yet. So either we also add support for that to the CKEditor 5 Linkit implementation, or we remove it here and deal with it in a follow up. Like you said, this issue has been open for years so it would be nice to be able to move this forward.

and it also dissolves support for base path , so if your site is using a prefix path the absolute url won't be recognized using your merge request.

By prefix path, do you mean language prefixes? Because those certainly still work, I tested it with the Linkit field widget and the CKEditor 5 implementation.

joseph.olstad’s picture

@DieterHolvoet

No, this is not refering to the language prefix although that is also included in the base url so deals with it. Multiple drupal sites can be run under the same domain or drupal sites could be under different domains.

if we want to support all of Drupal possibilities we should include the base_url as was done previously above before the knockout.

Example
https://example.localhost/multisite-a/node/1
if using language prefix would look like:
https://example.localhost/multisite-a/en/node/1
https://example.localhost/multisite-a/fr/node/1

Multisite-b
https://example.localhost/multisite-b/node/1
if using language prefix would look like:
https://example.localhost/multisite-b/en/node/1
https://example.localhost/multisite-b/fr/node/1

In your vhost you'd have an alias entry
in your base project you'd have a symlink to match the alias entry

Some do a lot more and make a unique vanity prefix per language, that is covered also by the base_url

Without the base_url logic in, the absolute url is not recognized and the utility of this solution is diminished.

For example, one of my clients has over 170 drupal instances using the same domain, each drupal site is on a different server and they're distinguished not by domains, but by the prefix it'self.

In addition they add a vanity prefix for more than just to support the english language so that each vhost has actually two alias entries because the prefix also has to be localised

We're writing enterprise solutions and by using base_url it ensures that the solution works for EVERYONE!

We're not just wanting this to work for your site that's managed by domains only, but every possible way to use drupal.

dieterholvoet’s picture

Well, the code you wrote is only being used for CKEditor 4 and it's deprecated, Drupal support to be removed by the end of 2023, so I don't think it makes sense to keep this in the MR. If you want that base_url logic it has to be rewritten in a way that's compatible with CKEditor 5 and the field widget. Maybe the find-replace can happen in a backend controller instead of in the frontend?

Anyway, if you insist on keeping that logic in the MR, I guess this issue will stay on Needs work until someone has time to re-implement it.

dieterholvoet’s picture

Did some more testing and all code in the if ($this->moduleHandler->moduleExists('language')) block isn't necessary either. You're converting an aliased path to an internal path, but that's not necessary: Url::fromUserInput() is able to parse aliased paths just fine. Tested using CKEditor 5 and the field widget.

dieterholvoet’s picture

Just came across a potential issue: the current state of this patch makes it impossible to link to a specific entity translation which isn't the current language. It's a niche feature, but something that shouldn't be made impossible IMO. Maybe it would make more sense that if an editor pastes an URL containing a language prefix, the field will link to the entity translation in that specific language instead of the current language?

joseph.olstad’s picture

Our distribution (wxt) is using patch #77. For my review of MR #35 see comment #87 and #85, same review applies to the head of the MR #35

bkosborne’s picture

I initially created this issue over 4 years ago. I think the current MR, while an improvement, does not go far enough to address the concerns I had when I created the issue:

  1. It only deals with the EntityMatcher. What if someone pastes in a link to a webform, file, a views page, or a path that's handled by a custom route?
  2. Even if every matcher was accounted for, someone can simply ignore the matcher results. In fact, I think most people that are pasting in an absolute URL will ignore the matcher results. If you have a URL you're pasting in, you already know the page you're linking to. You don't need a matcher to find it for you based on some partial page title or something you typed/pasted in. These people will just proceed without clicking any results from the autocomplete list

Again, I think the MR here is good and should be kept and merged in as it's a UX improvement, but I think we need a different approach here to really solve the problem. I don't want content editors pasting in absolute links to their own site, full stop. I help maintain a platform that hosts >1,000 sites, and those sites are often given temporary production URLs to work in until the site goes "live" with the real domain. We need a way to prevent those temporary domains from spreading into content.

With CKEditor 4, I solved the problem using this code:


/**
 * Implements hook_form_FORM_ID_alter().
 *
 * Alter the CKEditor link dialog.
 */
function MY_MODULE_form_editor_link_dialog_alter(&$form, FormStateInterface $formState) {
  $form['#validate'][] = '_editor_link_dialog_url_validate';
}

/**
 * Custom validate for linkit dialog.
 *
 * Remove base URL from a link.
 */
function _editor_link_dialog_url_validate(&$form, FormStateInterface $formState) {
  // Simply strip base URL if found in the link path to force the path to
  // be relative. Ideally we'd do more and check if the link is a path alias
  // and swap it with the system path, but it's complicated.
  // See https://www.drupal.org/project/linkit/issues/3078075.
  $attrs = $formState->getValue('attributes');
  if (isset($attrs['href'])) {
    $convertedLink = _change_url_to_relative($attrs['href']);
    if ($convertedLink !== $attrs['href']) {
      $attrs['href'] = $convertedLink;
      $formState->setValue('attributes', $attrs);
    }
  }
}

This worked completely outside of the LinkIt context. Unfortunately, this doesn't work with CKEditor 5, which doesn't use Drupal's form API for building the link widget interfaces. This feels like something that should be solved in the CKEditor 5 plugin directly, where it strips out the base URL when the link dialog is closed. For the field widget, we can likewise have a form validation callback that does it there.

Anyone have thoughts on that?

bkosborne’s picture

Actually, the more I think of it, the more I realize that my problem should indeed be solved like this:

  1. A custom CKEditor 5 plugin that detects when a link is inserted via the Link dialog (which the LinkIt module simply extends) and strips the base URL from there
  2. A modification to the LinkIt field widget to add the validation function to remove the base URL

The first point I realized after seeing that the LinkIt CKEditor 5 integration is just extending the main Link plugin from CKEditor.

joseph.olstad’s picture

@bkosborne patch #68 #77 cleans up the FQDN for an absolute if it matches the current host, this gives us matcher results afaik. You should try patch #68 #77 . The Merge Request is some rogue changes that were made by someone ignoring what I was talking about.

dieterholvoet’s picture

@joseph.olstad if you actually read my comments, you would know that those changes you did and I reverted only worked for CKEditor 4, not 5. So they won't help @bkosborne. I didn't ignore what you talked about, like I said I removed these CKEditor 4 related changes for now, until someone is able to re-implement them in a way that's compatible with both 4 and 5 (or just 5, because 4 is not even supported anymore by core).

joseph.olstad’s picture

ck4 is still being used with D10.2+ in large numbers , in fact I published this module fork during Drupalcon Pittsburgh June 2023 and it went from 0 installs to now over 7600 installs
https://www.drupal.org/project/ckeditor4_codemirror

But ya, ck5 obviously not ready yet which is probably why I was not responding to your comments.

mark_fullmer’s picture

ck4 is still being used with D10.2+ in large numbers

Weighing in as the maintainer of the Linkit module, I would prefer to see work for this initially to be scoped to CKEditor 5 coverage, and not dependent on CKEditor 4 coverage. I'm not going to put any roadblocks into continued support for CKEditor 4 from the community, but like to treat that as a "backport" of sorts.

mark_fullmer’s picture

Version: 6.0.x-dev » 6.1.x-dev
FileSize
314.63 KB
14.52 KB

We need a way to prevent those temporary domains from spreading into content. (from comment #92

The Pathologic module supports this, albeit at the page rendering stage. I am inferring that the ask here is that the base URL of the existing site be stripped from the data saved to the database, right? If so, I think the big question is *when* this should be triggered. On keyup (i.e., immediately)? On click (i.e., when the user moves to save the link)?

The attached patch and screencast demonstrate the most aggressive approach -- on keyup (this GIF demonstrates that the keyup will trigger both on manual keyboard entry and copy-paste from clipboard):

screencast of automatic url removal

If we were to go this route...
1. I think it would need to be an opt-in option, since some sites, such as decoupled architecture, might not want this behavior.
2. Maybe the baseUrl needs to be configurable, rather than automatically detected, along the lines of how Pathologic provides multiple baseUrl matchers?
3. For the Linkit *suggestion* to work in conjunction with this without requiring a user selecting the matching autocomplete option, #3222939: Do not require selection of autocomplete if only one match (pressing 'Submit' is sufficient) would need to land.

joseph.olstad’s picture

@mark_fullmer there's a reason why drupalSettings.base_url is used. A lot of work went into this to ensure that it works with all the ways Drupal can be installed, configured and used, including with vhosts based on a path prefix.

I've commented many times above explaining the various use cases.

mark_fullmer’s picture

there's a reason why drupalSettings.base_url is used

Thanks for this! I'm on board with using drupalSettings.base_url, and had seen this discussed in the comments. The patch in #98 ("If we were to go this route...") is only meant to be a proof of concept for the alternate implementation proposed by bkosbone (part 1, #93).

If we end up going that route, there are a number of things that will need to be modified about #98.

mark_fullmer’s picture

Version: 6.1.x-dev » 7.x-dev
matthiaso’s picture

This patch (tested with 3078075-77) converts all characters to lowercase, which breaks some links, e.g., Google Drive documents.

Lowercase convert

joseph.olstad’s picture

@MatthiasO here's a patch for the patch you can try.

looks like to increase hits by using case insensitive we need to check first if the path is matching the current site.

This patch disables the insensitive optimisation completely. It should be improved but in a pinch should handle drive.google.com links by disabling the insensitive optimisation completely.

So, this patch should be applied AFTER patch 77 is applied.

so apply patch 77 then this patch.

dieterholvoet’s picture

I rebased the MR against the 7.x branch.

I also looked into the lowercasing of paths. I get the intention, and correct me if I'm wrong, but I don't think it's necessary to do it manually. Linkit's entity matcher uses an entity query to search for entities and both entity queries and database queries are case insensitive by default in Drupal (source). I removed the code related to lowercasing incoming paths from the MR.

dieterholvoet’s picture

Issue summary: View changes
dieterholvoet’s picture

I added a 'Remaining tasks' section to the issue summary in an attempt to create an overview what still needs to be done here. I think the biggest thing would be to create a (configurable) list of base urls and use that both in backend and in frontend to strip entered urls, instead of only using the current base url (global $base_url, which is the same as \Drupal::request()->getSchemeAndHttpHost()).

stewest’s picture

The change in #104 https://www.drupal.org/project/linkit/issues/3078075#comment-15665863 which was undoing the removal of the lowercase function means that urls like youtube playlists when created as an external link (not media) will have their url lowercased which break the external link.

I'm not sure if this should be another issue created specifically for this, but that code change here, would affect this outcome.

So instead of
https://www.youtube.com/playlist?list=PLpeDXSh4nHjThszrEoBfJ8sY2NYif0k6b
we end up with
https://www.youtube.com/playlist?list=plpedxsh4nhjthszreobfj8sy2nyif0k6b

dieterholvoet’s picture

You're right, I think adding back mb_strtolower was accidental. I'll remove it again.