Support for Drupal 7 is ending on 5 January 2025—it’s time to migrate to Drupal 10! Learn about the many benefits of Drupal 10 and find migration tools in our resource center.
unicode.inc contains global functions that cannot be lazy-loaded. Convert these to a unicode utility class.
Followups:
Comment | File | Size | Author |
---|---|---|---|
#50 | drupal-unicode_component-1938670-50.patch | 72.73 KB | ParisLiakos |
#50 | interdiff.txt | 1.34 KB | ParisLiakos |
#49 | drupal-unicode_component-1938670-49.patch | 72.73 KB | ParisLiakos |
#49 | interdiff.txt | 6.38 KB | ParisLiakos |
#45 | 1938670-45.patch | 70.89 KB | ParisLiakos |
Comments
Comment #1
XanoComment #2
XanoComment #4
XanoRe-roll.
Comment #5
XanoAdded a missing file deletion.
Comment #6
XanoThere is also a file ./core/includes/unicode.entities.inc, but I couldn't find any usage of it.
Comment #8
XanoThis patch fixes all issues, but it requires #1938228: Convert TypedDataTest to make use of DrupalUnitTestBase to be committed first.
Comment #9
pp CreditAttribution: pp commentedThe patch doesn't work. I re-rolled it.
And what about core/includes/unicode.entities.inc? Is it necessary?
Comment #10
Damien Tournoud CreditAttribution: Damien Tournoud commentedGiven that this is a strategy pattern to begin with, we should split the different implementations to different classes and register the proper one in the container.
Comment #11
pp CreditAttribution: pp commented@Damien Tournoud You are right but this issue only resolves "unicode functions will be a class" nothing else. I think "X module/class will use "lazy loaded" Unicode class" is a different issue. I think your suggestion is too big step (or I don't understand you, please clarify the issue, e.g. give me a link which helps me to understand).
Comment #12
Xano@amateescu disagreed that this should be registered as a service.
@msonnabaum was working on a String class. We still have to decide whether to move string manipulation functions there or keep them in Unicode, but I vote for moving them to String.
Comment #13
RobLoachCross-posting that issue: #1938972: Start moving string functions into a utility class
Comment #14
Xano#9: drupal_1938670_04.patch queued for re-testing.
Comment #16
ParisLiakos CreditAttribution: ParisLiakos commentedthis patch is unreviewable..i think we should keep procedural functions as component wrappers initially..and mark them deprecated
Also, unicode_requirements should move to system.install (not here of course, after the conversion)
Comment #17
ParisLiakos CreditAttribution: ParisLiakos commentedi am gonna give this a shot
Comment #18
XanoI don't agree. It's fairly trivial to convert all usages and it's easy to review (we did the same with #1969540: Convert token.inc to a service.
Agreed. That's what I realized when writing the patch as well.
Thanks! I don't really have time to help with this anytime soon anyway.
Comment #19
ParisLiakos CreditAttribution: ParisLiakos commented@Xano: The token patch was 3 times smaller than the one above:) Which also mean it will break more patches
Ok, so here it is:
Ah. also...
_mimeHeaderDecode
method needs a name:PComment #21
RobLoachVery well done! I enjoy this patch a lot. The following are a couple notes I had. Some quick fixes, some questions, mostly praise ;-) ...
Is the ClassLoader not available at this point? We couldn't move it to after the loader is initialized, could we?
Could we also add @deprecated here? @deprecated as of Drupal 8.0. Use Drupal\Component\Utility\Unicode::truncateBytes() instead? Something along those lines.
Was working on this patch a bit last week and had second thoughts on it due to the global use of $multibyte.
In this instance, global $multibyte could be removed completely, but in other functions where it's actually used, it feels rather messy. Should String become an actual object with a multibyte getter/setter? Really unsure about that. This might just be an interim solution... We'll have to think about that. Perhaps push it to a follow up....
Unicode::$status... Should we switch that to Unicode::getStatus() and Unicode::setStatus()?... Global and static states on classes seem quite messy. Maybe the $status becomes a parameter that's passed in with a default state? Maybe we just leave it like this for now.
The problem with this is that during testing, the $status would stay the same state between tests, unless we manually wipe it, which is dirty. We'll have to figure something out.
Definitely like the move from global
define()
functions to a constant.This is what I was talking about... I understand the reason for it, but it might be better solved with Polymorphism. We could figure that out in a follow up issue, I suppose.
Although this handles unicode.inc, what are your thoughts on unicode.entities.inc? Should that be done in a follow up issue?
Comment #22
ParisLiakos CreditAttribution: ParisLiakos commentedhi:) thanks!
Yeah unfortunately its not available..i ll try to see if this can be moved after loader init.
Agreed about the deprecated note, will be in next patch.
The multibyte global is now a Unicode property ($status) :) i think i forgot to remove this line.
+1 agreed, did not think it much, i just stuck as public temporarily to see if ti works.
Already opened: #1981190: Remove unicode.entities.inc
Comment #23
ParisLiakos CreditAttribution: ParisLiakos commentedI think this is a lot better now:) also installation is fixed
Comment #24
ParisLiakos CreditAttribution: ParisLiakos commentedAlso the unicode DUTB should be converted to PHPunit and new ones should be added for new methos in the String class
Comment #25
ParisLiakos CreditAttribution: ParisLiakos commenteddecodeEntities()
toString
fromUnicode
, since it is more or less,checkPlain
's reversei messed up with interdiff..sorry
Comment #26
RobLoachTests pass. Looks great!
Nicely done.
Love the use of @dataProvider here. Much nicer to read that the old tests.
Comment #27
RobLoachSwitched to
setExpectedException
:Comment #28
ParisLiakos CreditAttribution: ParisLiakos commentedwoot had no idea you can do this..thanks!!
Comment #29
alexpottI get a local failure... with one of the singlebyte lowercase tests...
My PHP version is 5.3.15...
And actually the above output reminds me...
In PHPUnit assertions the expected value comes first...
Comment #30
ParisLiakos CreditAttribution: ParisLiakos commentedhmm this is weird, but since simpletest does not catch it, my guess is an enviroment specific php bug, since it happens on single byte mode.
Unfortunately i dont have any available 5.3 installation to test it:(
I fixed PHPUnit assertions! i did not know that. i guess i should pay more attention to docs:) thanks!
Comment #31
alexpottWell something has changed in the tests as the old Drupal\system\Tests\System\UnicodeUnitTest passes for me...
Comment #32
RobLoachPass on PHP 5.4 as well...
Comment #33
alexpottWell I'm still getting local failures and I recently fixed the phpunit timer test failing on windows machines... It's important we don't have these as this prevents people from using phpunit properly.
Comment #34
alexpottWell that was fun tracking down... we need to setlocale in Drupal's bootstrap.php for PHPUnit. So we have a known environment... as the comment in drupal_environment_initialize() says:
+1 for sanity :)
Comment #35
ParisLiakos CreditAttribution: ParisLiakos commentedgasp! thank you for tracking this down:D
Comment #36
ParisLiakos CreditAttribution: ParisLiakos commentedSo, alexpott asked me to do some profiling aroung String::lowercase because of caseFlip
And indeed he was right.
Singlebyte
8.x HEAD:
Total Incl. MemUse (bytes): 8,480 bytes
Total Incl. PeakMemUse (bytes): 2,536 bytes
Number of Function Calls: 8
patch #34:
Total Incl. MemUse (bytes): 11,072 bytes
Total Incl. PeakMemUse (bytes): 3,392 bytes
Number of Function Calls: 10
patch below:
Total Incl. MemUse (bytes): 9,880 bytes
Total Incl. PeakMemUse (bytes): 2,720 bytes
Number of Function Calls: 9
Multibyte
8.x HEAD:
Total Incl. MemUse (bytes): 4,160 bytes
Total Incl. PeakMemUse (bytes): 1,064 bytes
Number of Function Calls: 4
patch #34 and patch below (same code):
Total Incl. MemUse (bytes): 5,392 bytes
Total Incl. PeakMemUse (bytes): 1,448 bytes
Number of Function Calls: 5
General Note:
Here we are slower also because of the additional Unicode::getStatus() call..before we just had a global. Also generally multibyte is a lot faster (and most systems will use multibyte) because we rely on extensions
Comment #37
alexpottI think we should follow PHP here... and call this String::substr()
Comment #38
ParisLiakos CreditAttribution: ParisLiakos commentedlets hear other opinions, i would really hate String::substr
msonnabaum proposed String::slice in IRC, which i find really good
Comment #39
alexpottAfter polling on IRC there was a strong feeling from @webchick, @timplunkett, @larowlan than we not use
String::semantic_func_name()
... ieString::length()
but convertdrupal_strlen
toString::strlen()
...to quote @webchick:
...and now we've agreed to put these function in Unicode so
drupal_strlen
becomesUnicode:strlen()
... and also agreed to remove the @deprecated from the procedural functions
Comment #40
alexpottRemoving tag added back unnecessarily...
Comment #41
ParisLiakos CreditAttribution: ParisLiakos commentedok here is the patch with what we decided on IRC
at least it is visible now on why we wrap standard php functions..for unicode support...Unicode::
sorry interdiff does not make any sense here
Comment #42
RobLoachShould be strLen to go by lowerCamel function naming.
strToUpper()
strToLower
ucFirst
subStr()
Comment #43
ParisLiakos CreditAttribution: ParisLiakos commentedwell, i specifically avoided that, (no matter how much i agree and as much as i liked original String::length), after a long irc conversation yesterday night, where the bottom line was for php function wrappers to be
Class::<php_func_name>
, egUnicode::strlen
because that is the PHP name and makes more sense to other PHP devs.lets move on with #41 to avoid any more bikeshed here and make possible for more PHPunit tests to come:)
see also #39
Paris
Comment #44
RobLoachDeal. Here it is with this docfix:
Put a follow up to discuss the function names at #1985104: Clean up Unicode function names.
Comment #45
ParisLiakos CreditAttribution: ParisLiakos commentedgreat, thanks! added it as followup.
also found out a couple more leftovers:)
Comment #46
RobLoachAny thoughts on:
It feels rather weird to have too much stuff in bootstrap.php. If we were separate the Unicode component from bootstrap.php, tests would fail. Keeping setlocale() in setUp() ensures the setlocale() function is still called since it would be in check().
Comment #47
alexpottre #46 the setlocale affects far more than just the functions used in the Unicode... I think if we want to move it we should explore this in a followup. And we need to move it from drupal_environment_initialize() as well.
Comment #48
dawehnerI agree with alex and the comment, that this is needed in other places then unicode handling.
Nitpick-alarm: Missing @return
Just in general I prefer to use static instead of self, as you might extend the class ... you never know.
Missing dots.
I really like all this unit tests. These methods needs @return documentation.
Comment #49
ParisLiakos CreditAttribution: ParisLiakos commentedthanks for the review!
Comment #50
ParisLiakos CreditAttribution: ParisLiakos commentedmeh, fixing one more issue with docs
Comment #51
RobLoachComment #52
RobLoach#50: drupal-unicode_component-1938670-50.patch queued for re-testing.
Comment #53
alexpottCommitted 686c269 and pushed to 8.x. Thanks!
Comment #54
ParisLiakos CreditAttribution: ParisLiakos commentedChange notice: http://drupal.org/node/1992584
Comment #56
tim.plunkettTagging for posterity
Comment #56.0
tim.plunkettadded naming followup
Comment #57
jhedstrom