Problem/Motivation

Longer than the allowed 255 chars may be attempted to be inserted as translatable string location, throwing PDO errors.

Proposed resolution

Truncate the string inserted. There are only two significant types of location:

  • JS file names, which are used to look up strings from JS files for translation JS file generation
  • config file names, which are used to look up translations for strings appearing in config files for config file translation generation

If a JS file name or config file name exceeds 255 chars, than we'll be unable to associate the strings with the file. A 255+ long JS or config file name is not realistic and would not warrant a field length expansion.

We also store URLs on where we found strings as we go as locations, that is informational and is not used for back-referencing (in core). l10n_client in contrib uses it for back-referencing on a *best effort basis* to include strings on its panel that are not displayed currently on the page but used to be. I don't think we should bend the core schema for that best effort use case.

Original report by @eule

Hello,
i´m not a coder just a tester!
this time i try toconfig a custom_block

i setup a new one called "Handytarife Block Typ" and add a Picture Field called "ein test" Maschine Name is field_ein_test after all i try to save it ...this brings me the error up

Fehlermeldung
Drupal\Core\Database\DatabaseExceptionWrapper: SQLSTATE[22001]: String data, right truncated: 1406 Data too long for column 'name' at row 1: INSERT INTO {locales_location} (sid, type, name, version) VALUES (:db_insert_placeholder_0, :db_insert_placeholder_1, :db_insert_placeholder_2, :db_insert_placeholder_3); Array ( [:db_insert_placeholder_0] => 7267 [:db_insert_placeholder_1] => path [:db_insert_placeholder_2] => /admin/structure/custom-blocks/manage/handytarife_block_typ/fields/custom_block.handytarife_block_typ.field_ein_test/field?destinations[0]=admin/structure/custom-blocks/manage/handytarife_block_typ/fields/custom_block.handytarife_block_typ.field_ein_test&destinations[1]=admin/structure/custom-blocks/manage/handytarife_block_typ/fields [:db_insert_placeholder_3] => 8.0-dev ) in Drupal\locale\StringDatabaseStorage->updateLocation() (Zeile 158 von ./core/modules/locale/lib/Drupal/locale/StringDatabaseStorage.php).
The website has encountered an error. Please try again later.

Hope it helps..if it is known..dupe it

Comments

larowlan’s picture

Component: block.module » locale.module

That's a locale table

berdir’s picture

Status: Active » Needs review
Issue tags: +Needs tests, +Needs backport to D7, +D8MI
StatusFileSize
new786 bytes

Nice find! The problem is that the URL is insanely long on those pages, too long for locale.module, which tracks on which page a translation was first requested.

Note: To be able to reproduce this, you need to install locale, add a second language and add the field (any entity type/bundle will probably work). You need to be visiting that page the first time in that language.

This should fix it, needs tests.

The issue probably exists in 7.x too, although such long URL's are less likely there.

berdir’s picture

Title: Data too long for column custom_block » Very long URL (>255) leads to PDO exception when translation source locations are updated
Status: Needs review » Needs work

Setting to needs work for tests. I think we can test this directly, by passing in data to that function that contains a too long name.

Also updating issue title.

berdir’s picture

Status: Needs work » Needs review
Issue tags: -Needs tests
StatusFileSize
new1.7 KB
new2.59 KB

Ok, here are tests.

A bit confused, it looks like we don't have an upgrade path for the locales_location table?

And even more confused by the mess with lid and sid in the locales_location and locales_source tables..

berdir’s picture

Yeah, so confused that I forgot that I wasn't finished and created and uploaded the patches already..

Status: Needs review » Needs work

The last submitted patch, locale-location-name-2046097-5-test-only.patch, failed testing.

berdir’s picture

Status: Needs work » Needs review

Seems like the test-only patch first trick seems to fail more frequently?

gábor hojtsy’s picture

@Berdir: the upgrade path for locale locations as with other locale related things that are required to bootstrap Drupal is in https://api.drupal.org/api/drupal/core%21includes%21update.inc/function/.... There is no data migration for the location data as that is not possible. The new data is much more detailed. Drupal 8 sites will pick up that data as they go + as new translations are imported.

As for the lid confusion, that is true, the location table has lid for location id, while the source/target tables have lid for locale string id (which is sid in the location table). At the time locations were introduced we planned to clean up the schema (eg. the table names are horrendous :) and clean up the column names along the way. We also planned to rename the module to interface translation. I'm not sure such cleanups fit into the current phase even though they are DX enhancements, they are also backwards compatibility breaks for things that were already in Drupal 7.

berdir’s picture

upgrade path: Oh, missed that, ignore that then, sorry :)

Yeah, just noticed the sid/lid thing because I spent 30min debugging because I filtered/misread that test query incorrectly and that's why I mentioned it. It might be possible to still change it with the argument that it's now a pluggable service with a proper API and anyone still querying tables directly is doing it wrong ;) But it has nothing to do with this issue.

Any feedback about the issue/patch? This is complete from my point of view, except we want to limit other columns as well, but they seem less likely to break.

gábor hojtsy’s picture

Status: Needs review » Reviewed & tested by the community
Issue tags: +sprint, +language-ui

Looks good to me!

webchick’s picture

Status: Reviewed & tested by the community » Needs review

Sorry, what's the impact of blindly truncating this data with no visible error/warning? I can't make it out from the issue summary.

webchick’s picture

Status: Needs review » Needs work

Also...

+++ b/core/modules/locale/lib/Drupal/locale/Tests/LocaleStringTest.php
@@ -105,6 +105,18 @@ function testStringCRUDAPI() {
+    // Test various locations

(minor) Needs a period.

(normal) Why? What exactly are you testing for here?

berdir’s picture

There's no point in a visible warning, as there's nothing anyone could do about it. There is only one alternative to blindly cutting it of, and that would be to increase the length of that column, not sure if that's a good idea.

We have similar patterns elsewhere, we truncate_utf8() aggregator feed titles and url's for example an we substr() various things in dblog_watchdog(). It's either that or PDO exception.

This used to be displayed in the UI, but I don't think we still do that, can't find any references. No idea what contrib does with it.

gábor hojtsy’s picture

Yeah, the reason we need this code is because earlier default MySQL setups blindly truncated without warning. We are just reproducing that behaviour. I don't think we can do much better.

webchick’s picture

Hm. I don't remember us truncating URLs. Titles, yes, because you can find the full title at the source.

A truncated URL is a broken URL. What are these URLs used for?

berdir’s picture

dblog_watchdog does this:

'link' => substr($log_entry['link'], 0, 255),
   * @param string $type
   *   Location type that may be any arbitrary string. Types used in Drupal
   *   core are: 'javascript', 'path', 'code', 'configuration'.
   * @param string $name
   *   Location name. Drupal path in case of online discovered translations,
   *   file path in case of imported strings, configuration name for strings
   *   that come from configuration, etc...

Nothing can assume it is a valid URL, there are many other possible options.

As Gabor said, we're using the MySQL strict mode now (already do in 7.x), without that, MySQL truncates automatically, everything. Now we have to.

webchick’s picture

Right, I'm familiar with that tactic being used elsewhere, but it was correct in that context. I don't have the context for what we do with this variable in order to understand the impact of truncating it. That's all I'm asking for. What uses this? Do we ever make links to this? Is it just a disambiguation from one "foo" and another "foo"? etc.

Tagging.

gábor hojtsy’s picture

There are only two significant types of location:

- JS file names, which are used to look up strings from JS files for translation JS file generation
- config file names, which are used to look up translations for strings appearing in config files for config file translation generation

If a JS file name or config file name exceeds 255 chars, than we'll be unable to associate the strings with the file. I don't think a 255+ long JS or config file name is realistic and would warrant a field length expansion.

We also store URLs on where we found strings as we go as locations, that is informational and is not used for back-referencing (in core). l10n_client in contrib uses it for back-referencing on a *best effort basis* to include strings on its panel that are not displayed currently on the page but used to be. I don't think we should bend the core schema for that best effort use case.

berdir’s picture

Status: Needs work » Needs review
StatusFileSize
new2.06 KB
new2.52 KB
new1.75 KB

Ok, updated the test to avoid relying on autoincrement, improved the comment, better testing of the types. I still think this is OK like this, as pointed out in #16, is *is* common practice to cut off links.

@Gabor: that l10n_client use case sounds interesting, wouldn't that be more useful if we would log the path without GET arguments? Then the chance of running into this would be much smaller and l10n_client might have more useful suggestions (e.g. show translatable strings that only show up with ?page=1 and so on. Separate issue of course.

gábor hojtsy’s picture

Yeah, we can do that, but this issue is not about that :D

gábor hojtsy’s picture

Status: Needs review » Reviewed & tested by the community
Issue tags: -Needs issue summary update

Updated issue summary.

Status: Reviewed & tested by the community » Needs work

The last submitted patch, locale-location-name-2046097-19.patch, failed testing.

berdir’s picture

Status: Needs work » Needs review
StatusFileSize
new1.08 KB
new2.52 KB

Stupid mistake in the test, forgot that I actually had a long string on configuration, switched that now to reflect the actual case here.

gábor hojtsy’s picture

Status: Needs review » Reviewed & tested by the community

Better with the updated issue summary? :)

webchick’s picture

Version: 8.x-dev » 7.x-dev
Status: Reviewed & tested by the community » Patch (to be ported)

Awesome, thanks.

Committed and pushed to 8.x!

Moving to 7.x.

berdir’s picture

Version: 7.x-dev » 8.x-dev
Status: Patch (to be ported) » Fixed
Issue tags: -Needs backport to D7

Checked 7.x, location is a column in the locales source table, and it is a text of size big, which means you can write whole books into it. Doesn't need to be fixed then, there.

webchick’s picture

Status: Fixed » Needs review

Er. Then how/why did this get changed in D8?? Seems like we should roll this back in favour of doing the same thing.

berdir’s picture

Assigned: Unassigned » gábor hojtsy

The location is now a completely separate table, that allows different types of locations. I think location was mis-used by i18n before, so maybe that's why it had to be longer. I like the new schema better, I don't see a reason that it has to be a long text.

Gabor?

gábor hojtsy’s picture

It is not possible to upgrade from the old location system to the new location system (the new one has different types of location information while the old one did not have that), so D8 will start collecting locations fresh. Therefore we can define our new rules the way we see it would be better. I can cite at least two reasons form the top of my head:

- I think for the JS and config lookups, using a shorter text field will result in quicker SQL resultsets, but I cannot prove that ATM.
- The path based location information was/is always collected on a best effort basis, it is not guaranteed to be stored 100% accurate.

webchick’s picture

How much in the critical path is that quicker SQL? If it's not (for example, only compiled at the occasional "find new strings to be translated" time or whatever), choosing a storage method that doesn't truncate data seems preferable to me. Since this is new, we have an opportunity to capture this in full which won't be possible again if 8.0 ships with truncation. Since we are not sure how contrib might want to use this data, seems better to be safe than sorry unless the speed difference is significant.

gábor hojtsy’s picture

As said, the location is used for config keys and JS file names in core (as lookup). Paths are stored but not looked up. Config keys are only added for shipped config, when a language is added or a module is installed, the new strings are added. JS files are discovered on the fly, so once a site has all JS files discovered and no JS changes are happening, there are theoretically no lookups to the location at all on the site (until a module is enabled or a JS file is updated). I don't have numbers and don't think this is a critical area of focus in Drupal 8 to spend time on producing numbers on performance. I'm happy if those interested do tests :)

gábor hojtsy’s picture

Status: Needs review » Fixed

Sounds like that would be a different issue, so closing.

gábor hojtsy’s picture

Issue tags: -sprint

Yay thanks!

eule’s picture

look at my issue ...some new stuff ;-)

Automatically closed -- issue fixed for 2 weeks with no activity.

Anonymous’s picture

Issue summary: View changes

Updated with template