The changes in #3136510: Incorrect notice language specific field types missing broke the Portuguese language files again. They had been fixed previously in #3091574: Dynamic solr field type check encodes special characters.

So, we need to replace dashes with underscores in the case of zh-hans and zh-hant, but not in the case of pt-pt and pt-br. Not sure what the best way to solve this is. Maybe change the config like was done for Portuguese in #3087744: Update the Portuguese config to use the langcodes used by Drupal, or to change the line now committed from

$solr_field_type_name = 'text' . '_' . str_replace('-', '_', $language->getId());

to

$solr_field_type_name = 'text' . '_' . str_replace('zh-han', 'zh_han', $language->getId());
CommentFileSizeAuthor
#14 3154019_complete.patch167.31 KBmkalkbrenner
#13 3154019.patch58.58 KBmkalkbrenner
Support from Acquia helps fund testing for Drupal Acquia logo

Comments

jcnventura created an issue. See original summary.

mkalkbrenner’s picture

I'm not sure about this.
Our tests use "de-at" which will be converted to "de_at"

See https://lucene.apache.org/solr/guide/8_5/defining-fields.html

Field names should consist of alphanumeric or underscore characters only and not start with a digit. This is not currently strictly enforced, but other field names will not have first class support from all components and back compatibility is not guaranteed.

So it is correct that we convert all language identifiers. One issue is that the comment is misleading and should not just mention "zh-*" as example.

So if there's an issue with "pt-*" the bug must be something else.

jcnventura’s picture

jcnventura’s picture

Running grep -R "name: text_" . | grep -v unstemmed | grep -v phonetic | grep -v spell | grep -v "text_..$" on the config/optional directory results in the following lines that are the only ones affected by this 'bug':

./search_api_solr.solr_field_type.text_pt-pt_7_0_0.yml:  name: text_pt-pt
./search_api_solr.solr_field_type.text_pt-br_7_0_0.yml:  name: text_pt-br
./search_api_solr.solr_field_type.text_zh-hant_7_0_0.yml:  name: text_zh_hant
./search_api_solr.solr_field_type.text_zh-hans_7_0_0.yml:  name: text_zh_hans
jcnventura’s picture

Issue summary: View changes
mkalkbrenner’s picture

The config names are correct!
The conversion has to happen for the resulting field names in the Solr index.

Can you post your schema_extra_fields.xml?

jcnventura’s picture

grep text_ schema_extra_types.xml:

<fieldType name="text_edgenstring" class="solr.TextField" positionIncrementGap="100">
<fieldType name="text_edge" class="solr.TextField" positionIncrementGap="100">
<fieldType name="text_en" class="solr.TextField" positionIncrementGap="100">
<fieldType name="text_unstemmed_en" class="solr.TextField" positionIncrementGap="100">
<fieldType name="text_phonetic_und" class="solr.TextField" positionIncrementGap="100">
<fieldType name="text_phonetic_en" class="solr.TextField" positionIncrementGap="100">
<fieldType name="text_und" class="solr.TextField" positionIncrementGap="100">
<fieldType name="text_spell_und" class="solr.TextField" positionIncrementGap="100">
<fieldType name="text_ngramstring" class="solr.TextField" positionIncrementGap="100">
<fieldType name="text_ngram" class="solr.TextField" positionIncrementGap="100">
<fieldType name="text_pt-pt" class="solr.TextField" positionIncrementGap="100">
<fieldType name="text_unstemmed_pt-pt" class="solr.TextField" positionIncrementGap="100">
jcnventura’s picture

And this is grepping the output of drush --pipe solr-gsc localhost > lala.zip && unzip -p lala.zip | grep "text_pt"

<fieldType name="text_pt-pt" class="solr.TextField" positionIncrementGap="100">
<dynamicField name="ts_X3b_pt_X2d_pt_*" type="text_pt-pt" stored="true" indexed="true" multiValued="false" termVectors="true" omitNorms="false" />
<dynamicField name="tm_X3b_pt_X2d_pt_*" type="text_pt-pt" stored="true" indexed="true" multiValued="true" termVectors="true" omitNorms="false" />
<dynamicField name="tos_X3b_pt_X2d_pt_*" type="text_pt-pt" stored="true" indexed="true" multiValued="false" termVectors="true" omitNorms="true" />
<dynamicField name="tom_X3b_pt_X2d_pt_*" type="text_pt-pt" stored="true" indexed="true" multiValued="true" termVectors="true" omitNorms="true" />
      <str name="suggestAnalyzerFieldType">text_pt-pt</str>
jcnventura’s picture

If the config names are correct, we need the code fix in the issue summary, as we can't have 'text_pt-pt' be replaced with 'text_pt_pt' in that case.

But if "Field names should consist of alphanumeric or underscore characters only", then those config names can't be right.

mkalkbrenner’s picture

No, the field names are correctly encoded. Everything you posted here of the the XML files is correct.

mkalkbrenner’s picture

Priority: Normal » Minor

I think I got it. Working on a patch and a test ...

mkalkbrenner’s picture

Priority: Minor » Normal

First of all it isn't a critical issue as search basically works.
But while reading the code, I noticed that a lot of edge cases are involved that could all lead to have that warning persisting. You'll see it in the patch, hopefully tomorrow.

mkalkbrenner’s picture

Priority: Normal » Major
Status: Active » Needs review
FileSize
58.58 KB

It turned out that a bigger patch is required. There might be different reasons why the warning persists. I hope that I captured all of them.

The patch is not final. The status report page still requires more testing and the update hook needs to take care about the new "pt" configs.
I think we should simply entirely overwrite the exists ones instead of updating them. What do you think?

mkalkbrenner’s picture

FileSize
167.31 KB

I completed the update hook and updated the jump-start config-sets.

@jcnventura could you test the patch?

  • mkalkbrenner committed 260f6c4 on 4.x
    Issue #3154019 by jcnventura, mkalkbrenner: There are some language-...
  • mkalkbrenner committed 371a0c6 on 4.x
    Issue #3154019 by jcnventura, mkalkbrenner: There are some language-...
  • mkalkbrenner committed 3fb2cba on 4.x
    Issue #3154019 by jcnventura, mkalkbrenner: There are some language-...
  • mkalkbrenner committed 561cbc4 on 4.x
    Issue #3154019 by jcnventura, mkalkbrenner: There are some language-...
  • mkalkbrenner committed 6aa8241 on 4.x
    Issue #3154019 by jcnventura, mkalkbrenner: There are some language-...
  • mkalkbrenner committed 78f904c on 4.x
    Issue #3154019 by jcnventura, mkalkbrenner: There are some language-...
  • mkalkbrenner committed bc986c4 on 4.x
    Issue #3154019 by jcnventura, mkalkbrenner: There are some language-...
  • mkalkbrenner committed cd0eae1 on 4.x
    Issue #3154019 by jcnventura, mkalkbrenner: There are some language-...
  • mkalkbrenner committed cfdb060 on 4.x
    Issue #3154019 by jcnventura, mkalkbrenner: There are some language-...
mkalkbrenner’s picture

Status: Needs review » Fixed

I mark it as fixed. But feedback is welcome!

jcnventura’s picture

Yes, works perfectly fine. Thanks for fixing this. Even while it started looking like a minor issue, it clearly led to some major refactoring!

Status: Fixed » Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.