This is a feature request to reduce memory usage on page load for large taxonomies. It would improve scalability.

To reproduce this issue, use a 20MB taxonomy (3 levels deep) and monitor memory usage.

On page load, Hierarchical Select 6.x-3.x-dev (2009-Jul-29) uses 287MB of memory (RAM). And this is for only one visitor.

Most shared hosting providers have a 32MB limit. The most popular VPS hosting plans have limits between 360MB and 720MB.

To get the above memory usage result, I typed the following command in an Ubuntu terminal. It outputs memory usage every second for 50 iterations (50 seconds), displayed in megabytes (M).
vmstat -S M 1 50

Another option for monitoring a module's memory usage is the Devel module (http://drupal.org/project/devel): under 'Performance Logging', see the maximum memory usage.

To quickly and easily create a 20MB taxonomy, I use the Devel module and its 'Generate content' feature. Hope this helps.


Comments

Wim Leers’s picture

Component: Code - Content Taxonomy » Code - Taxonomy
Status: Active » Postponed

This is because I use the Taxonomy API, which doesn't scale. This can be circumvented by writing our own queries.

Will be for HS 4 probably. Patches are welcome, of course. If you provide a patch, I'll commit it as soon as it's ready instead of waiting for version 4.

Francewhoa’s picture

@Wim: Thanks, I understand now. I'm not a coder, but I will be happy to contribute patch testing.

@All: Any volunteers for creating a patch with custom queries to work around the unscalable Taxonomy API?

Francewhoa’s picture

@all: If someone else wants to contribute memory usage testing, here are two large taxonomy files, plus a small one for quick tests. To download, right-click on a link below, then select the "Save Link As" option.

- vocabulary-large-utf8-1.5mb.csv (http://ccsbcy.ca/_drupal_org_forums/large-taxonomy/vocabulary-large-utf8...)
- vocabulary-large-utf8-20mb.csv (http://ccsbcy.ca/_drupal_org_forums/large-taxonomy/vocabulary-large-utf8...)
- vocabulary-small-utf8-2kb.csv (http://ccsbcy.ca/_drupal_org_forums/large-taxonomy/vocabulary-small-utf8...)

File format is UTF-8
3 levels deep

The easiest way I know of to import large taxonomy files is the batch feature of the 'taxonomy_csv' module, version 4.1 or more recent, at http://drupal.org/project/taxonomy_csv
See the attached screenshot for the module settings.

Note that those links aren't permanent, so if they don't work, that's expected.

cpliakas’s picture

Status: Postponed » Needs review
FileSize
1.06 KB

Hi Wim.

Thanks for a great module. I have a taxonomy hierarchy of about 150,000 nodes, which obviously blows up the core Taxonomy API. Specifically, it chokes after line 831 of the modules/taxonomy/taxonomy.module file, because that SQL query loads the entire tree. The Hierarchical Select Taxonomy module actually has the same query on line 691 of the hs_taxonomy.module file. The attached patch adds a condition to this query so that it only loads terms with the given $parent; the result is that only the data required for the level of the hierarchy you are on is loaded. Believe it or not, my taxonomy tree now scales thanks to your module, and things are very snappy and memory efficient. Although this patch works perfectly in my instance, I haven't tested multiple use cases and am not sure what the wider implications of the code change are. I would love to get some more feedback from the community.
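For readers without the patch file at hand, here is a minimal sketch of the idea, paraphrasing the Drupal 6 taxonomy_get_tree() query (the attached patch is the authoritative diff):

// Before: the query loads every term in the vocabulary, no matter which
// level of the hierarchy is being rendered.
$result = db_query(db_rewrite_sql('SELECT t.tid, t.*, parent
  FROM {term_data} t
  INNER JOIN {term_hierarchy} h ON t.tid = h.tid
  WHERE t.vid = %d
  ORDER BY weight, name', 't', 'tid'), $vid);

// After: restrict the result to direct children of $parent, so only the
// level currently being displayed is loaded.
$result = db_query(db_rewrite_sql('SELECT t.tid, t.*, parent
  FROM {term_data} t
  INNER JOIN {term_hierarchy} h ON t.tid = h.tid
  WHERE t.vid = %d AND h.parent = %d
  ORDER BY weight, name', 't', 'tid'), $vid, $parent);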

Thanks,
Chris

klonos’s picture

Hi Chris,

Thank you for your hard work. I am willing to test this, but my node numbers are rather small compared to your 150k. Also, my taxonomy tree has only 7-8 parents with 3-4 levels of children.

My question is: how would I know the difference after applying your patch (given that I have only a few nodes and a small taxonomy)?

Anyway, I am applying your patch now (manually, against the latest dev, which has the related code on line 741) and will let you know how it goes.

BTW... patrida? (Greek: a fellow countryman?)

klonos’s picture

I did test it (that was fast, wasn't it?), but I have no way of measuring any memory or speed improvements. Is there any test I can run?

I find the logic behind it legit and believe it should have been done like that in the first place, but I cannot comment on the code (Wim?). The good news is the patch didn't cause any mess ;)

Thank you once again.

Wim Leers’s picture

Status: Needs review » Needs work

HAHA! Brilliant! :)

There's one problem with this approach: it will fail when there are multiple Hierarchical Select instances on the same page that use the same vocabulary …

Unfortunately, that's a big enough negative side to not be able to commit this otherwise very elegant patch :(

cpliakas’s picture

Ah-ha! There is always a catch, and great pick up on your part :-). For some ungodly reason I seem to end up on projects that have large vocabularies, so I will probably have to take another crack at this one sometime soon, possibly even for the current project I am working on. My feeling is that there has to be some workable solution to this problem.

@klonos - Check out the Devel module. After enabling, visit admin/settings/devel and select the Display page timer and Display memory usage checkboxes. All users with the access devel information permission will see the performance metrics at the bottom of every page. Although it isn't perfect, it's a very easy way to get a basic overview of a page load's efficiency.

Thanks for everyone's feedback and testing,
Chris

Wim Leers’s picture

That's AWESOME, cpliakas! Looking forward to your next stab at this issue :)

klonos’s picture

Just a note on Wim's catch on #7 it doesn't only fail when there are multiple HS instances on the same page that use the same vocabulary...

@Wim: I don't know exactly what you mean by 'it will fail'. I mean the actual result an end user will see, but this is what I've had...

I have a content type set that uses two instances of two different vocabularies. One of them is a simple one-level drop down selector, while the other has 3+ levels. They both show as usual in the form, but the multilevel one shows only 1st level terms and none of their children.

Removing Chris's one-liner restored missing child items from the multi level HS drop down.

Is that what you meant or is this another side effect?

cpliakas’s picture

Just an update: I moved on from my previous employer and do not need this functionality for any active projects I am working on, but I still think it is important and still plan on attacking it in my free time. It just may not happen as quickly as I would like, since my new job is keeping me quite busy :-).

Wim Leers’s picture

Okay, thanks for the status update cpliakas, good luck at your new job, and thanks in advance for your efforts! :)

crea’s picture

I think a proper patch should also fix the various places where _hs_taxonomy_hierarchical_select_get_tree() is called (or not called). Without that, this "optimization" will actually SLOW us down: if you blindly call _hs_taxonomy_hierarchical_select_get_tree() with a NULL maxdepth, you are going to load the whole tree into memory anyway, and breaking the process into steps will only add overhead.
I am working on a patch which uses a single query for every $depth step and combines the branches to load in the "IN (..)" fragment of the query. Atm it runs slower when the function is called with a NULL (i.e. unlimited) maxdepth.
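To illustrate the batching idea, a hypothetical sketch (the function name and details are illustrative, not the actual patch): one query loads a whole level by combining all of that level's parents into a single IN (...) clause.

function _hs_taxonomy_load_level($vid, array $parents) {
  $placeholders = db_placeholders($parents, 'int');
  $result = db_query(db_rewrite_sql("SELECT t.tid, t.*, h.parent
    FROM {term_data} t
    INNER JOIN {term_hierarchy} h ON t.tid = h.tid
    WHERE t.vid = %d AND h.parent IN ($placeholders)
    ORDER BY weight, name", 't', 'tid'),
    array_merge(array($vid), $parents));
  // Group the loaded children by their parent tid.
  $children = array();
  while ($term = db_fetch_object($result)) {
    $children[$term->parent][] = $term;
  }
  return $children;
}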

I suspect HS Content Taxonomy may also need this patch, cause it's not efficient enough with big vocabs.

This is a very complicated task, and a one-line patch like the one in #4 won't do it.

crea’s picture

Taxonomy depth is a pain with this optimization. With the previous approach, the whole tree was loaded into memory and depth was easy to count. Now we optimize it to load only part of the tree, starting from $parent, and thus the real depth is unknown. We need to evaluate the consequences of this; maybe we don't need the real depth at all?

HS Content Taxonomy's depth setting is a pain too, especially combined with the "parent term" feature. The problem is that it's the _real_depth_, measured from the root of the vocabulary, not the _current_depth_, measured from the "parent term". So given an arbitrary "parent term" and depth > 0, we need to figure out the "real depth", and the only way to do that is to load the whole vocabulary, starting from the root, step by step, until we find the "parent term".

crea’s picture

There's an additional problem with this optimization: while memory usage drops significantly, the number of queries grows significantly at the same time, because HS runs its hook to fetch children for every term in the selection. Thus I'm not sure this is a good approach for most sites.

Maybe HS needs a new hook to optimize this: something like hook_hierarchical_select_children(), but one that processes multiple parents in a batch. That way it would be possible to keep the number of queries at a sane level.

crea’s picture

I wonder how people who work with big vocabularies (like 20MB) handle Content Taxonomy configuration? There's a "Parent term" select element which loads all terms from all vocabularies into a single $options array. LOL.

crea’s picture

Title: Improve multiple item processing to reduce memory usage, page generation times and to improve scalability » Reduce memory usage to improve scalability
FileSize
10.8 KB

_hierarchical_select_hierarchy_add_childinfo() is a very evil function. It's a natural performance killer, because it fetches the children of every item in the hierarchy (!) only to count them. To optimize it, I introduced an additional hook, hook_hierarchical_select_children_count(), that takes an arbitrary number of parents and returns an array of children counts keyed by parent. Thus I was able to greatly optimize HS Content Taxonomy by making it calculate the counts in one big fat SQL query.
Some rough numbers: I have a Content Taxonomy field with a 3-level World Regions vocabulary containing about 12,000 items. On the node edit page, memory usage dropped from 40MB to 15MB. Page execution time dropped from ~1 sec to ~400ms. All numbers are reported by Devel.
Note that I have the Content Taxonomy field "depth" set to 0 (unlimited depth). This is a very important detail (read below).
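A hypothetical sketch of what that one query could look like for the taxonomy case (the hook name comes from the description above; the exact signature in the patch may differ):

function hs_taxonomy_hierarchical_select_children_count($parents, $vid) {
  // Default every parent to zero children.
  $counts = array();
  foreach ($parents as $parent) {
    $counts[$parent] = 0;
  }
  $placeholders = db_placeholders($parents, 'int');
  // One GROUP BY query returns the child count of every parent at once.
  $result = db_query(db_rewrite_sql("SELECT h.parent, COUNT(t.tid) AS num_children
    FROM {term_data} t
    INNER JOIN {term_hierarchy} h ON t.tid = h.tid
    WHERE t.vid = %d AND h.parent IN ($placeholders)
    GROUP BY h.parent", 't', 'tid'),
    array_merge(array($vid), $parents));
  while ($row = db_fetch_object($result)) {
    $counts[$row->parent] = (int) $row->num_children;
  }
  return $counts;
}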

Attaching the first version of the patch. Summary:

  • New HS hooks:
    - hook_hierarchical_select_children_count() accepts multiple $parents and counts the children of every parent.
    - hook_hierarchical_select_children_multiple() accepts multiple $parents and returns all children together. I've only found one place to put its invocation (the _hierarchical_select_hierarchy_generate() function). This needs more testing: I haven't noticed any performance difference, because in my setup that code doesn't seem to run at all. Anyway, I think having such an optional hook is a good idea.
    - These hooks are entirely optional and meant as high-performance replacements where needed. An HS API implementation can simply ignore them if performance is not an issue.
  • Rewritten _hs_taxonomy_hierarchical_select_get_tree() function:
    - It accepts both single and multiple $parents. Multiple parents are very handy, because one can load a whole vocabulary using only one SQL query per depth level (for example, to find a term's depth).
    - It loads and caches "branches" starting from a parent, instead of the whole tree (an idea half-implemented in the patch in #4).
  • Implemented hook_hierarchical_select_children_count() and hook_hierarchical_select_children_multiple() for the HS Content Taxonomy submodule.
    - Note that the "depth" feature of Content Taxonomy prevents major performance improvements: it is the real depth, i.e. the number of levels from the root. To work with terms we need to know their "real depth", so we are stuck loading the whole vocabulary. That's why the major performance gains in this patch only apply with depth = 0 (unlimited depth).

I'm sure there are lots of bugs introduced with this patch, so we need as much testing as possible. I especially worry because there are so many configurations HS supports, and there could also be compatibility issues with other modules and db_rewrite_sql().

crea’s picture

Title: Reduce memory usage to improve scalability » Improve multiple item processing to reduce memory usage, page generation times and to improve scalability
Status: Needs work » Needs review
crea’s picture

On the depth problem: I think we need a general way of getting a term's depth without loading the whole vocabulary tree. The only idea I have now is to load the tree step by step, starting from the root level, using our shiny new _hs_taxonomy_hierarchical_select_get_tree(), and check for the term's existence on each level: load the root level; if the target term is not found, go to the next level, etc.
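A hypothetical sketch of that loop (it assumes the patched _hs_taxonomy_hierarchical_select_get_tree() accepts an array of parents and a max depth of 1; the real signature may differ):

function _hs_taxonomy_term_real_depth($vid, $target_tid) {
  $parents = array(0); // tid 0 stands for the vocabulary root.
  $depth = 0;
  while (!empty($parents)) {
    // Load only the next level below the current set of parents.
    $terms = _hs_taxonomy_hierarchical_select_get_tree($vid, $parents, -1, 1);
    $next = array();
    foreach ($terms as $term) {
      if ($term->tid == $target_tid) {
        return $depth;
      }
      $next[] = $term->tid;
    }
    $parents = $next;
    $depth++;
  }
  return FALSE; // Term not found in this vocabulary.
}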

klonos’s picture

I am willing to test and report back, but first I need to know if the issue I reported in #10 was addressed. Since my vocabularies aren't big enough to provide meaningful performance measurements, I can only report if it breaks anything along the way.

crea’s picture

@klonos
My patch has nothing to do with the one in #4: it's completely different and should (I hope) work in all cases. So you are welcome to try it.

crea’s picture

FileSize
10.8 KB

Fixed a bug with wrong counting of terms for HS Content Taxonomy, and a little bug inside the _hierarchical_select_hierarchy_add_childinfo() function.

crea’s picture

I haven't touched HS Taxonomy yet, but it needs a lot of love too. For example, no wonder memory usage is high: hs_taxonomy_hierarchical_select_root_level() calls _hs_taxonomy_hierarchical_select_get_tree(), which (without my patch) loads the whole tree and statically caches it. Later, hs_taxonomy_hierarchical_select_children() uses taxonomy_get_children() instead of _hs_taxonomy_hierarchical_select_get_tree(), which means the static cache goes almost unused, only eating memory ;)
My patch improves this slightly, because the whole tree is no longer loaded, but it still needs fixing to use _hs_taxonomy_hierarchical_select_get_tree() in both cases. taxonomy_get_children() sucks anyway, cause it doesn't use a static cache.

@klonos:
Do you use HS Taxonomy or HS Content Taxonomy? I'm asking because I haven't touched HS Taxonomy yet and you may need a better patch to test.

klonos’s picture

I use Content Taxonomy and I select HS as a widget for my fields.

HS Taxonomy / HS Content Taxonomy?? Are they actual modules, or am I missing something here? Links please?

klonos’s picture

...stupid me! You're talking about the sub-modules /hits newbie head against wall/ ...

I have them both enabled, but as I said, I use HS as a widget for fields. So, judging by the sub-module descriptions, I guess it's HS Content Taxonomy I am using.

crea’s picture

Title: Reduce memory usage to improve scalability » Improve multiple item processing to reduce memory usage, page generation times and to improve scalability

On the depth problem: for every child term we know the "relative depth" from the $parent passed to the _hs_taxonomy_hierarchical_select_get_tree() function. So to calculate the "real depth" of every child term, we must calculate the "real depth" of each $parent (*not* by loading the whole vocab, but in separate queries) and add the "relative depth". Remaining issue: $term->depth can contain a *wrong* "relative depth", because terms are cached and $term->depth could contain a depth calculated inside previous function calls (i.e. a depth calculated starting from different $parents).

UPDATE: Actually, we don't cache $term->depth and it always contains the "current depth", i.e. the depth starting from the $parents originally passed to _hs_taxonomy_hierarchical_select_get_tree(). So this is not an issue.
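To make the arithmetic concrete, an illustrative line (the variable names are mine; whether the off-by-one correction applies depends on the depth convention, since core's taxonomy_get_tree() reports direct children with a depth of 0):

// "Real depth" of a child = real depth of its $parent + the relative
// depth reported for the child (+ 1 under core's zero-based convention).
$real_depth = $parent_real_depth + $term->depth + 1;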

crea’s picture

Status: Needs review » Needs work

I've found more errors both in the patch and in HS itself.

crea’s picture

Title: Improve multiple item processing to reduce memory usage, page generation times and to improve scalability » Performance: Improve multiple item processing in big hierarchy setups.
Status: Needs work » Needs review
Issue tags: +Performance
FileSize
14.16 KB

Attaching the next version of the patch: fixed a couple of bugs, and implemented the high-performance hooks for HS Taxonomy too. Note that the patch ignores the exclude_tid parameter of HS Taxonomy (I don't know if that breaks anything).
UPD: This parameter is only used on the taxonomy_form_term form and doesn't prevent overall testing of the patch. We can deal with it later.

Please test with big vocabularies!

TODO: Evaluate the possibility of additional performance improvements:
1) HS Taxonomy with the Term Permissions module installed (and the exclude_tid parameter?).
2) HS Content Taxonomy with depth > 0.

Wim Leers’s picture

@crea: thank you for your continued efforts in making HS better :) I'll reply to all your follow-ups below.

#14: you're absolutely right of course. It's by no means a complete solution. I was so enthusiastic because it was a simple way to get a big performance boost.

#15: exactly, when I first wrote hs_taxonomy, I also wanted to drop the depth stuff. But it's really necessary. What we could do is … devise a more efficient way to store taxonomy and keep this mirror in sync by using hook_taxonomy().

#16: yep. And I tend to agree (but am not entirely sure yet): I also think we need a new hook to be able to achieve the best possible performance and lowest memory usage.

#17: heh, yeah :) They just don't use Content Taxonomy :) That, or they're crazy.

#18: It's evil on its own, but it has allowed far fewer callbacks from the client to the server: when an item has no children, there's no need to fetch the next level at all…

#20: Yep, definitely. See my reply to #15, I think that's better?

#24: Yep, all implementations need serious revising for better performance. Most performance improvements are gained from having less *output*, not less *processing*.

#29: I'll review this ASAP.

All of this is very reminiscent of #448316: New hooks hook_hierarchical_select_offspring_count() and hook_hierarchical_select_children_count().

Bilmar’s picture

subscribing

crea’s picture

Any updates?

3dloco’s picture

Hello Crea,

Thanks for the patch. I've applied it on a dev site that has about 200,000 terms. I'm not sure if I am supposed to see improvements with my setup: I did not get any error messages, but I also did not see a significant improvement in memory consumption. Below are my setup and results; hope this helps...

When I measured node/add via Devel I got:

--devel_init()=5 MB with or without the patch.

--devel_shutdown()=117 MB with the patch and ~120 MB without the patch.

My setup for the node/add/content-type:

--The form has several vocabularies, one of them (categories) has about 1000 terms, starts at level 1 and has 4 levels. Two of them are flat vocabularies (only 1 level), one with about 190,000 and the other with 10 terms.

--The flat vocabularies use autocomplete/free tagging (the ~190,000-term one) and a select list (the ~10-term one) as widgets, and are set for a depth of 'blank'.

--The categories use hierarchical select widget starting at level 1 and the user can select up to three levels.

Also, I did not use the Term Permissions module.

Appreciate your help,

KH

crea’s picture

@3dloco
Sounds like you have a non-zero "depth" (maximum taxonomy tree level) parameter set on your Content Taxonomy field. With depth > 0, this patch currently works almost the same as before.
I recommend looking into ways of using zero depth (all levels selectable). Depth > 0 can get better performance with additional code, but it will always be slower than zero depth.

robby.smith’s picture

subscribing

greenc’s picture

+1

Francewhoa’s picture

Thanks crea & cpliakas for the patches :) I'll test on a fresh Drupal install. Then post my results here.

YK85’s picture

subscribing

phantomvish’s picture

+1 subscribing

I did try patch #29. Thanks to crea.

I had many small vocabs and one large vocab with 29,272 terms (which uses the HS widget for a Content Taxonomy CCK field), 3 levels deep, with a 96MB PHP memory limit.

Before the patch, any page with HS was not even loading! It just hung with the following PHP fatal error: "Allowed memory size of 100663296 bytes exhausted (tried to allocate 71 bytes) in /home/thirayco/public_html/sites/all/modules/hierarchical_select/modules/hs_taxonomy.module on line 792"

After the patch, the page loads and HS selection works as expected.

Waiting for the final patch.

crea’s picture

@phantomvish:
Do you mean the "as links" mode, where terms are displayed in a list ignoring hierarchy? That shouldn't be affected by HS at all; it's displayed by CT only. HS implements hierarchical formatters only.

phantomvish’s picture

Sorry, you are right. On retest I figured out it was not related to HS.

jonvk’s picture

I'm testing this patch on a site with a vocab of ~26k terms, with translations into many languages. So far so good: it reduced peak memory from ~120M to ~40M on node/add pages (using xhprof for profiling).

klonos’s picture

As I understand from #14 and other posts, this feature does solve performance/memory issues in cases where really large vocabularies are used, but it does have drawbacks. So I wanted to propose having this feature enabled only when large vocabularies are detected, in order to avoid any quirks or overhead it might cause. But I was thinking about the following matters...

- What makes a vocabulary 'large' enough?
- If only a single vocabulary is large, but there are several other small(er) ones, do we enable it globally, or only for the 'large' one and not the others?
- To have a better chance of success, we need people testing this feature. If only people with large vocabularies test it, then we have only a few beta-testers, and thus more chances of hitting bugs from not having a broader audience.
- On the other hand, some people might consider this bloat that simply causes extra (as in 'unnecessary') processing, and thus CPU/traffic load, either client- or server-side.

So, I propose the following...

1. Keep this feature enabled by default in the dev release until we are convinced enough testing has been done and it is production-ready.
2. In the final stage of implementing this in a stable release, make it a per-vocabulary option (an 'Optimize this vocabulary' checkbox with some help text explaining in which cases it might be required and in which ways it might help).
3. If people need it to be more 'automated', we can add a configurable global threshold on the number of terms in a vocabulary. Once this limit is reached, the feature is auto-enabled for the vocabularies that have 'Optimize this vocabulary' enabled.
4. Another option, setting the default state of this setting for newly created vocabularies, would solve the issue of people arguing whether it should be on or off by default.

I believe these suggestions address all the matters mentioned above. Thoughts?

Wim Leers’s picture

Thanks for the excellent overview, klonos! I should be able to review this soon.

Once again, great work, crea!

colan’s picture

Subscribing.

crea’s picture

@klonos
As far as I can tell, there are no drawbacks in the latest version of the patch. The information in #14 is outdated; it was only correct at the time that post was written.

There's only one small problem with the patch, described in #29 (the patch ignores the exclude_tid parameter used on the taxonomy_form_term form). So anyone willing to work on the patch might want to fix that, and to add the performance boost for depth > 0 described in #20 and #27.

Francewhoa’s picture

@All: A Drupal 7 patch might be back-ported to Drupal 6. This patch allows the Taxonomy API to scale :) This could be interesting for the Hierarchical Select module. Any volunteers to contribute to a back-port? Read more at:
http://drupal.org/node/556842#comment-3589206
http://drupal.org/node/556842#comment-3589488

gianfrasoft’s picture

Subscribing.

dgastudio’s picture

subscribe

SophieG’s picture

Hi everyone,

I have applied the patch made by crea, but it does not seem to work. Actually, I don't know if what I get is normal or not.
I have a vocabulary with 3 levels (and 36,000 terms); I can manage it using Taxonomy Manager, but I can't using the "core" taxonomy management.
I am using Content Taxonomy and I can't edit the field in my content type; I always run into memory issues.

Could anyone help me?

Thanks

klonos’s picture

I am guessing that this has to do with your PHP memory limit, but please explain in more detail what you mean by this:

"i am always having memory issues."

crea’s picture

@SophieG
It looks like you're describing a different problem.
IIRC, the Content Taxonomy module loads _all_ terms from _all_ vocabs into a single select element on the field configuration form. See #17.
It can't be fixed inside HS, because by the time we alter the form, the options array is already loaded. It should be fixed inside Content Taxonomy.

SophieG’s picture

Do you suggest I stop using Content Taxonomy?
I am still in time to do so, so perhaps I should give it a try...
But the problem is also that when I use core taxonomy and try to edit my vocabulary (admin/content/taxonomy/edit/vocabulary), I can't edit it when my PHP memory limit is under 600MB (which is not a realistic memory limit...).

crea’s picture

No, I don't. Please read my last sentence.

SophieG’s picture

OK, I have found out that not using Content Taxonomy is problematic anyway when I try to use the node_import module with a large taxonomy...
Anyway, it seems to work with a 300MB memory limit; I'll fix it later!
Thanks for your help!

crea’s picture

Another improvement idea for the patch. Judging by Wim's words, I suspect HS only needs to count children as a way to avoid invoking the AJAX callback for an item's children when the item doesn't have any. So we actually don't need to count children: we only need to know whether there are any. So we can modify the children count hook to return TRUE/FALSE. Counting is a costly operation in InnoDB, so we can instead use something like "SELECT .. LIMIT 1" and it will be blazing fast.
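A hypothetical sketch of that existence check (the helper name is illustrative): fetch at most one row instead of counting them all.

function _hs_taxonomy_term_has_children($tid) {
  // db_query_range() appends the LIMIT clause in a database-portable way.
  return (bool) db_result(db_query_range(
    'SELECT 1 FROM {term_hierarchy} WHERE parent = %d', $tid, 0, 1));
}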

lelizondo’s picture

subscribing

chuckbar77’s picture

subscribing

chrisolof’s picture

FileSize
14.16 KB

Just a slight adjustment to patch 3 (comment #29):
On line 320 of the patch, there's a comment: "We may or may not need this:"
In my testing, the commented-out code on line 321 *is* needed if you want your terms ordered by weight and then by name.
Attached is patch 3 with line 321 uncommented.

Other than that, the patch is working well for taxonomy vocabularies in the 20,000-30,000 term range.

puddyglum’s picture

Does this have an impact on Hierarchical Menu, or is this only for taxonomy-based Hierarchical Select drop-downs?

gandhiano’s picture

I have developed an HS implementation based on hs_taxonomy and, when applying the patch from #59, no terms appear.

I have no idea why this happens, but it possibly has to do with the way the hierarchy is built, which was based on the HS API tutorial (http://drupal.org/node/532724).

gianfrasoft’s picture

Oops.

Applying the patch from comment #59, I receive an "invalid argument supplied for foreach" error on the second foreach in this code:

// Update hierarchy.
foreach ($hierarchy->levels as $depth => $level) {
  foreach ($depth_items[$depth] as $item) {                // <-- right here !!!
    $hierarchy->childinfo[$depth][$item] = $counts[$item];   
  }
}

while updating a user via the user_save() API.
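A hypothetical guard for this error (not a reviewed fix, just the obvious defensive rewrite of the loop above):

// Skip depths that have no items instead of handing a missing entry
// to foreach.
foreach ($hierarchy->levels as $depth => $level) {
  if (empty($depth_items[$depth]) || !is_array($depth_items[$depth])) {
    continue;
  }
  foreach ($depth_items[$depth] as $item) {
    $hierarchy->childinfo[$depth][$item] = isset($counts[$item]) ? $counts[$item] : 0;
  }
}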

ajayg’s picture

Status: Needs review » Needs work

Wim Leers’s picture

I committed a lot of changes to 6.x-3.x-dev. After they've been tested and a release has been made, we might be able to commit this, if at least a few people with >10K terms confirm that it adds improvements.

Wim Leers’s picture

Status: Needs work » Closed (fixed)

Closing due to lack of response.

klonos’s picture

Status: Closed (fixed) » Closed (won't fix)

Wim, just a question to make sure: this (the patch in #59/#29) did not actually get in with the rest of the changes/improvements you mention in #63, did it? I don't think so, so this is more like a 'won't fix'. I am changing the status accordingly, so that people don't misinterpret 'fixed' as a feature that was actually implemented.

PS: In my post/summary in #43 I was kinda against this feature (or, to be more precise, reluctant/concerned), so I proposed making it optional and, if possible, a per-vocabulary setting that could be turned on/off, or auto-enabled only if a vocabulary's size exceeds a certain (configurable or hard-coded) threshold. IMO these suggestions should be taken into consideration if this is re-thought and we decide to implement it in the future (think: implemented as a new project or sub-module).

Wim Leers’s picture

Status: Closed (won't fix) » Active

You're right. "Works as designed" is even more accurate. This is just a possible improvement, but if nobody works on it, it won't happen for HS 3. HS 4 will definitely get improved performance, though. The concerns/ideas in this issue will be considered there as well.

Actually, let's just keep this open. There's too much interesting stuff in here to close it.

twooten’s picture

I just wanted to confirm that the patch in #59 (and #29) does seem to speed things up. I have a taxonomy with over 11,000 terms.

Thanks!

awdlewis’s picture

Could somebody please post the patched module from #59?
Thanks

scatteredbrainV’s picture

Is this patch also available for the Drupal 7 version of the Hierarchical Select module?
Thanks!

hibersh’s picture

7.x branch patch

marcelomg’s picture

The above patch didn't do anything in my case.

kenorb’s picture

Issue summary: View changes
Status: Active » Needs review

stefan.r’s picture

Version: 6.x-3.x-dev » 7.x-3.x-dev

gigabates’s picture

Rerolled for 3.x-dev.

stefan.r’s picture

OK, this will need a few reviews before it gets in. @gigabates, did it speed things up for you?

joseph.olstad’s picture

Status: Needs review » Reviewed & tested by the community

Patch #75 shaved 9 seconds off our "new draft" click: from 12 seconds down to 3 seconds.

Thanks to gigabates, Wim Leers, Francewhoa and everyone else above!

stefan.r’s picture

Status: Reviewed & tested by the community » Needs review

Good to know this helps with speed. However, before this can be RTBC'ed it will need a proper code review.

joseph.olstad’s picture

xhprof 0.9.4 on PHP 5.3.x with MySQL 5.6 and Apache 2.2 says:

Before patch:

Overall Summary	
Total Incl. Wall Time (microsec):	11,819,566 microsecs
Total Incl. CPU (microsecs):	10,374,423 microsecs
Total Incl. MemUse (bytes):	81,081,768 bytes
Total Incl. PeakMemUse (bytes):	80,924,288 bytes
Number of Function Calls:	667,545

After patch:

Overall Summary	
Total Incl. Wall Time (microsec):	3,141,281 microsecs
Total Incl. CPU (microsecs):	2,393,635 microsecs
Total Incl. MemUse (bytes):	49,345,240 bytes
Total Incl. PeakMemUse (bytes):	49,839,544 bytes
Number of Function Calls:	341,572

This is with a site with a LOT of content/taxonomies, languages, nodes, complexity, etc. So far no regressions. We were so desperate for a performance fix that we put this patch into production right away. If you don't hear back from me it's because everything works.

joseph.olstad’s picture

OK, I found the block of code that was costing us about 9 seconds. It's from this commit in hierarchical_select:
http://cgit.drupalcode.org/hierarchical_select/commit/?id=c4579a7589bad0...

+  // Provide support for Title module. If Title module is enabled and this
+  // vocabulary uses translated term names we want output those terms with their
+  // translated version. Therefore a full taxonomy term entity load is required,
+  // similar to taxonomy_get_tree().
+  if (module_exists('title')) {
+    $vocabulary = taxonomy_vocabulary_load($vid);
+    if (title_field_replacement_enabled('taxonomy_term', $vocabulary->machine_name, 'name')) {
+      $term_entities = taxonomy_term_load_multiple(array_keys($terms[$vid]));
+    }
+  }
+

This code provides support for the Title module, which we ARE using in our case, so it's in all likelihood necessary and we should keep it. Thanks to the caching capabilities added by patch #75, the performance hit of Title module support drops from 8.5 seconds to about 300 milliseconds (a third of a second).
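For readers skimming the thread, a hypothetical illustration (not the patch itself) of the kind of per-request caching that makes the Title support cheap: remember the loaded term entities per vocabulary so the full entity load runs only once per request. It assumes the caller always passes the same $tids for a given $vid.

function _hs_title_term_entities($vid, array $tids) {
  $cache = &drupal_static(__FUNCTION__, array());
  if (!isset($cache[$vid])) {
    // The full entity load is the expensive step identified above.
    $cache[$vid] = taxonomy_term_load_multiple($tids);
  }
  return $cache[$vid];
}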

SO, the patch is good: it adds caching capability to hs_taxonomy.module and works properly against the latest HEAD in git (I just tested it).

joseph.olstad’s picture

@stefan.r, I did a pretty thorough code review, both directly and empirically: I performance-profiled the past 25 commits of hierarchical_select to identify the expensive code, found it, and tested it against this patch. Looking through the patch #75 code, we can tell that it adds branch caching of terms, and all the logic for this is in hs_taxonomy.module. I did some heavy testing and profiling with it and found it compatible with all the commits I tested it against (I tested several of them).

I read through this issue thoroughly, and what people have been reporting is that this caching really helps performance when lots of terms are being loaded; others didn't notice a difference. This makes sense, because some people might not be using the Title module, for example, in which case this caching likely won't make a difference for them. The logic in commit c4579a7, when run without the patch, costs about 8 seconds against our schema, but with patch #75 that cost drops to less than a third of a second. Since we're using the Title module and have lots of terms, there's a huge performance improvement, as you can see from the profiling data above.

I know the stats in the above paragraph because I ran a test where I commented out the code block (added in commit c4579a7) that I found in comment #81, and tested with and without patch #75. In that case there was NO difference in performance, because I had commented out the expensive code. However, when I applied patch #75 and restored the code block that provides Title module support and translated terms, I saw a huge performance gain of about 8,000 milliseconds, and the PDO calls dropped to almost half of what they were before patch #75.

Seeing as there have been no reported problems with this patch, I'm inclined to say we should commit it. In our case we noticed the speed increase on "edit draft" for content types using taxonomies, as well as on "create draft", and also with breadcrumbs built from taxonomies containing fields that use the hierarchical_select widget.

Oh yes, and thanks to this patch, memory consumption drops by almost half, from a peak of 81MB down to a peak of 49MB (against our fully populated schema).

Thanks.

RTBC+

stefan.r’s picture

Title: Performance: Improve multiple item processing in big hierarchy setups. » Performance: Improve multiple item processing in big taxonomy hierarchy setups.
Category: Feature request » Bug report
Priority: Normal » Major
Status: Needs review » Reviewed & tested by the community

This code looks a bit fragile, but these are not huge changes in logic, and I'll trust that it has undergone extensive review and testing on your side.

A >10 second waiting time should probably be considered a bug, so if this shaves off 8 seconds, let's take a chance and commit it to dev, with the caveat that it may be reverted if people report problems. A new beta will have to wait at least a week or two.

  • stefan.r committed a08d9e1 on 7.x-3.x authored by gigabates
    Issue #544324 by crea, cpliakas, chrisolof, hibersh, gigabates, joseph....

stefan.r’s picture

Status: Reviewed & tested by the community » Fixed

Status: Fixed » Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.