Closed (fixed)
Project:
Drupal.org customizations
Version:
7.x-3.x-dev
Component:
Code
Priority:
Major
Category:
Bug report
Assigned:
Reporter:
Created:
14 Nov 2013 at 21:41 UTC
Updated:
2 Jan 2015 at 00:14 UTC
Comments
Comment #1
tvn commented
The problem is that text search on issue queues can bring D.o down, and it did cause some outages last week. Fixing that will not be quick. So for the time being, to keep the website up, there is a script in place that kills any query that has been executing for longer than a set amount of time.
Updating the issue summary to focus this issue on actually fixing the text search problem. Downgrading to Major since the search is not bringing the site down at the moment, and we do need to deal with the release packaging first.
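For readers unfamiliar with this kind of stopgap, the selection logic of a query killer can be sketched roughly as follows. This is only an illustration in Python, not the script running on Drupal.org: the 30-second threshold, the ProcessRow type, and the queries_to_kill helper are all hypothetical, and the rows are assumed to mirror a few columns of MySQL's SHOW PROCESSLIST output.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical threshold; the real script's limit is not stated in this issue.
MAX_SECONDS = 30


@dataclass
class ProcessRow:
    """One row shaped like MySQL's SHOW PROCESSLIST output (subset of columns)."""
    id: int
    command: str          # e.g. "Query" or "Sleep"
    time: int             # seconds spent in the current state
    info: Optional[str]   # the statement text, or None when idle


def queries_to_kill(rows: List[ProcessRow], max_seconds: int = MAX_SECONDS) -> List[int]:
    """Return ids of SELECT statements that have run longer than max_seconds."""
    victims = []
    for row in rows:
        statement = (row.info or "").lstrip().upper()
        is_select = row.command == "Query" and statement.startswith("SELECT")
        if is_select and row.time > max_seconds:
            victims.append(row.id)
    return victims


rows = [
    ProcessRow(11, "Query", 45, "SELECT item_id FROM some_index_table"),
    ProcessRow(12, "Query", 2, "SELECT 1"),
    ProcessRow(13, "Sleep", 900, None),
    ProcessRow(14, "Query", 120, "UPDATE some_table SET x = 1"),
]
for thread_id in queries_to_kill(rows):
    # A real script would send this statement to the database server.
    print(f"KILL QUERY {thread_id};")
```

Only long-running reads are selected in this sketch; writes and idle connections are left alone, which is one plausible way to keep such a script from interrupting updates.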
Comment #2
tvn commented
Comment #3
tr commented
I don't know if it's bringing d.o. down, but I do know that whenever I have searched the "Drupal.org D7 upgrade QA" issue queue over the past day or so it *always* resulted in a long wait (> 30 seconds) followed by an empty white page with no results (not a wsod, I get a 200 response but the page source is empty).
Comment #4
Anonymous (not verified) commented
Actually, I have found that searching on any issue queue does not work for me since d.o's D7 upgrade. The modules whose issue queues I have tried to search are Public download count, Views, and Wysiwyg. It seems to me that all issue queue searches are affected by this issue.
For example: https://drupal.org/project/issues/pubdlcnt?text=not+counted&status=All
This is with Firefox with a few extensions and a stock latest version of Chrome, at home and at work.
Comment #5
liquidcms commented
Is D.org running on Solr? Is it hosted on an Acquia cloud server? Can we not just throw more resources at this issue? It is starting to impact my business when I can no longer search the issue queue. :(
Maybe we could have left a D6 version of the issue queue up and running until the D7 upgrade was tested a bit more?
Comment #6
betz commented
Is there another way of searching the issue queue for the moment?
Open data? :)
Comment #7
xano
Let's focus on fixing the problem for now by exploring possible solutions and short-term workarounds.
Try using a search engine, such as Google or Bing. They cannot filter on issue status, but if you have an idea of what you are looking for, you may be able to find it.
Comment #8
achton
Try Google in the meantime:
http://www.google.com/#filter=0&q=site:http://drupal.org/project/issues/...
Comment #9
webchick
This is also actively destroying my workflow and causing duplicate issues which is dragging down our velocity...
Any further details that could be provided for e.g. the Search API module maintainers and/or people with query optimization experience? Like what's the slow query/ies you need to keep killing?
Comment #10
brad.bulger commented
I'm noticing that even when the search "works", it is not correctly searching for some terms. The example I just saw was searching for "hook_load" in an issue queue. It seems to be matching "hook%" or something like it.
Not sure if this is a separate issue, but this seemed like a good place to start.
Comment #11
klokie commented
I think comments #3, #5, & #9 hit the nail on the head - having a non-functional issue search is counterproductive to fixing bugs and improving Drupal. I don't know how things are set up, but from the looks of it I'm guessing that there are one or more giant Solr indices that are getting hammered by these queries. If that's the case, then maybe using a more recent Solr version which supports sharding the indices would be the way to go - e.g. one shard per project maybe? Even if the more recent version would need testing, I can't imagine it could be worse than this ;)
Is there anything that we mortals can do to help?
Comment #12
klokie commented
By the way, it took me several days to find this issue (nudge, nudge...) even after searching Twitter, asking around on IRC, etc. I think it would be very helpful to put up a notice on the front page of d.o. if possible, until this issue is resolved.
Comment #13
kenorb commented
Another example:
https://drupal.org/project/issues/views?text=%22Style+RSS+Feed+requires+...
Comment #14
drumm
I started work on this. Search API DB makes separate full text index tables per field. The resulting query uses a UNION to reach each of them, which I expect limits what MySQL can optimize. I have a first draft of this and need a few hours of work to test.
Comment #15
drumm
(Oh and the actual fix is merging the indexes into one table.)
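To make the difference concrete, here is a rough sketch of the two query shapes, using Python's built-in sqlite3 module rather than MySQL. The table names (text_title, text_body, text_comments, text_all), the columns, and the sample scores are all hypothetical and much simpler than what Search API DB generates; the point is only that a merged table lets one indexed lookup replace a UNION over several tables while returning the same rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical per-field index tables, one per full text field.
for field in ("title", "body", "comments"):
    cur.execute(f"CREATE TABLE text_{field} (item_id INTEGER, word TEXT, score INTEGER)")

# Hypothetical merged table: the same rows, with the field kept as a column.
cur.execute("CREATE TABLE text_all (item_id INTEGER, field TEXT, word TEXT, score INTEGER)")
cur.execute("CREATE INDEX text_all_word ON text_all (word)")

sample = [
    ("title", 1, "cache", 4),
    ("body", 1, "cache", 2),
    ("comments", 2, "cache", 1),
    ("body", 3, "theme", 2),
]
for field, item_id, word, score in sample:
    cur.execute(f"INSERT INTO text_{field} VALUES (?, ?, ?)", (item_id, word, score))
    cur.execute("INSERT INTO text_all VALUES (?, ?, ?, ?)", (item_id, field, word, score))

# Before: one SELECT per field table, glued together with UNION ALL.
union_sql = """
    SELECT item_id, SUM(score) AS total FROM (
        SELECT item_id, score FROM text_title WHERE word = :w
        UNION ALL
        SELECT item_id, score FROM text_body WHERE word = :w
        UNION ALL
        SELECT item_id, score FROM text_comments WHERE word = :w
    ) AS hits
    GROUP BY item_id ORDER BY total DESC, item_id
"""

# After: a single lookup against the merged table.
single_sql = """
    SELECT item_id, SUM(score) AS total FROM text_all
    WHERE word = :w
    GROUP BY item_id ORDER BY total DESC, item_id
"""

union_rows = cur.execute(union_sql, {"w": "cache"}).fetchall()
single_rows = cur.execute(single_sql, {"w": "cache"}).fetchall()
print(union_rows == single_rows, single_rows)
```

SQLite's planner is not MySQL's, so this says nothing about the actual speed-up; it only shows that the two shapes are interchangeable in terms of results.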
Comment #16
drumm
I added a couple child issues, including a significant improvement to SearchAPI DB: #2155767: Improve the speed of full text searches by using a single index table.
#2007874: Performance issues with multivalued (list) fields and multiple OR filter is also worth looking into.
Comment #17
drumm
#2007874: Performance issues with multivalued (list) fields and multiple OR filter isn't specifically text search, but the preliminary difference in EXPLAIN output does look good, so I added it as a child issue. It currently needs more work.
Comment #18
drumm
I determined I was wrong about #2007874: Performance issues with multivalued (list) fields and multiple OR filter; I don't think that will help. It would still be a good change for Search API DB.
#2155767: Improve the speed of full text searches by using a single index table still needs any reviews it can get. I think it will help a lot.
Comment #19
jhodgdon
I just tested this on http://search_api-drupal.redesign.devdrupal.org on all 5 issue search pages:
project/issues and its advanced search page
project/issues/(specific project) and its advanced search page
project/issues/user
I did not experience any problems. Searches seemed to be returning the correct results.
Comment #20
helmo commented
Great, huge improvement :)
Comment #21
klonos
If this works as expected, then please please deploy!
Comment #22
drumm
It will need a downtime window of an hour or two.
It may be possible to shut off only the issue queue. We would have to pay a lot more attention to how the DB servers are handling the load. Taking the whole site down would be easier and safer.
Comment #23
klonos
1-2 hours might be a lot for some, but I say we better do it now than later because no time will ever be proper. This issue is tormenting us and it's already set to critical.
Comment #24
drumm
Adding deployment instructions.
Comment #25
drumm
The deployment announcement draft is at https://drupal.org/node/2168925
Comment #26
drumm
Comment #27
drumm
#2155767: Improve the speed of full text searches by using a single index table is deployed. Now we see how it does in production.
Comment #28
drumm
This has not made a huge impact, so the query killer is back on.
Next we can:
Comment #29
drumm
Removing deployment instructions which were done.
Comment #30
drumm
Attached are the top 100 words appearing in issue queues. Blacklisting a few of them would help, as long as they are not useful to search for.
Comment #31
jhodgdon
I scanned the list and I think blacklisting all of them would help. I see some that you might think would be useful to search for, but they're also words that appear in things like "Issue automatically closed for 2 weeks without activity" so they become useless because every closed (fixed) issue has them. Blacklisting words that appear that frequently sounds like a reasonable plan if you think it would help the search query efficiency.
But ... I often find myself searching for an exact error message, which could very well contain one of those words. If it was blacklisted would the search still work correctly (assuming I had a few non-blacklisted words in there too)?
Comment #32
klonos
Do you think that we could find a way to exclude system messages (nodechange) text from the search? If we could do that, then terms like "activity" would still return useful results and at the same time we'd cut down a bit on the index db.
I'm all for excluding terms like "the", "a", "for" etc, but disagree with blocking *all* words in that list - especially "module", "node", "page" and the likes.
Comment #33
klonos
All this time all I was getting when searching was a WSOD. Just now I got this:
Fatal error: Class 'Database' not found in /var/www/drupal.org/htdocs/includes/bootstrap.inc on line 3146
Comment #34
arnoldbird commented
I'm not sure it has any bearing on how to resolve the issue, but I don't ever use drupal.org to search for an exact error message. Without exception I paste the error message into my browser's location window and hit return. I don't see a need to use the drupal.org issue search for this, because it's almost never clear from an error message what module is at fault, so there is no real need to narrow the search to a particular project. In some cases a search result outside of drupal.org reveals that the problem is not caused by a drupal module at all -- even if the search result happens to pertain to some drupal site where another developer encountered the same non-drupal problem. What I'm suggesting is that initially it seems like an odd workflow to search for an error message using the search engine at drupal.org.
I find that I only use the drupal.org issue search when I want to know about issues reported in a particular project that pertain to some general concept. It's often something that can be expressed in a word or two, e.g. cache, javascript, theme, permission, etc.
Like I said, I don't know if these observations are helpful for devising a solution here. I suppose even if almost everyone is like me in these ways, there still might be some case for occasionally using drupal.org to search for an error message. It's not a case that I've encountered, though.
Comment #35
jhodgdon
RE #34 - I use that type of search to figure out if someone has already filed an issue, to avoid posting duplicates.
The two main reasons I use text-based search on the issue queue are:
a) To figure out if an issue I'm looking at or just identified and want to file is a duplicate of an existing issue
b) To locate an issue I remember seeing before that I need to look at again or reference for some reason
And to these two use cases, we should add #34's main use case:
c) To find issues pertaining to a general concept, such as cache, javascript, theme, permissions, etc.
So we should make sure that this blacklist idea doesn't remove the functionality of any of these use cases.
Comment #36
drumm
The words are still used in score calculation. Words in the body and title are scored at 2× and 4× the weight of comment words, respectively.
With blacklisting, searches with "the" won't consider every issue with that word, and the DB will have fewer rows to examine. But "the" is only a little over 1% of the index, so I'm not sure just how much it will help. I don't think we should be too aggressive up front.
Search API DB doesn't support phrases. (I believe the Solr 4 backend would.) It could be added to the logic of SearchApiDbService::createKeysQuery(), although I see some optimizations we can do in that function first.
Comment #37
drumm
I think so; an extension implementing SearchApiProcessorInterface::preprocessIndexItems() looks like a way to do it.
Comment #38
drumm
#2170681: Remove a temporary and a filesort from text queries will potentially help quite a bit. The text queries currently are two temporary tables and two filesorts. Both cost a lot, especially when the temporary tables get large enough to land on disk.
Between my testing on search_api-drupal.redesign.devdrupal.org and the testbot accepting it, I feel okay deploying it later today. That patch could still use people testing other code paths that reach the code changed there, such as facets and autocomplete. We don't use those features of SearchAPI on Drupal.org, so I have not tested them.
Comment #39
drumm
#2170681: Remove a temporary and a filesort from text queries is now deployed.
Comment #40
drumm
The issue summary needs more examples of searches that WSOD.
This has improved, but I don't think we are done yet. The one example I see in the issue summary does work every time I've tried. I have not looked into getting quantitative stats on how many failures there are; that would be a good idea.
Switching the project to Drupal, https://drupal.org/project/issues/drupal?text=drupal_static&status=All&p..., does WSOD. I think the fix for that will be configuring the tokenizer to not split on underscores. Results for "drupal_static" will be a whole lot more useful than "drupal static", and there will be fewer rows to sort.
I'm also adding "Change the subquery to a JOIN" to the issue summary. As a followup to #2170681: Remove a temporary and a filesort from text queries, it will reduce the number of rows that need to be examined for searches that are also limited by another factor, such as project or status. It is a larger change to the logic in Search API, also made possible by #2155767: Improve the speed of full text searches by using a single index table. After that, we may be running out of query improvements, or close to it.
Comment #41
askibinski commented
Here is another query which gives me a timeout every time:
https://drupal.org/project/issues/drupal?text=hide+upload+button&status=...
Comment #42
klonos
@askibinski: I don't know what can be wrong in your case, but that link took only a few seconds to bring up results for me on the first run.
Comment #43
askibinski commented
@klonos, indeed it does now, but it timed out around 15 times before. Must have been high load on the db server.
Comment #44
tr commented
@askibinski's query fails every time for me right now ...
@drumm: Pretty much every query I try times out, how many examples do you want?
Comment #45
drumm
10-20 examples would be good.
Comment #46
tr commented
OK, I can't tell if you're serious or not, but here are 20 real-world searches. As you can see, 15 out of the 20 always fail, and only 5 out of the 20 always work. So, I'm serious when I say the core issue queue search still does not work for me since the upgrade.
These searches are all things I know should return responses. Most, but not all, are one-word searches. I included @askibinski's search as the first example.
The first 15 all time out with a WSOD every time I try them. The WSOD appears approximately 40 seconds after I submit the search in all cases:
https://drupal.org/project/issues/drupal?text=hide+upload+button&status=...
https://drupal.org/project/issues/drupal?text=hook_update_N&status=Open&...
https://drupal.org/project/issues/drupal?text=config+validate&status=Ope...
https://drupal.org/project/issues/drupal?text=change+notice&status=Open&...
https://drupal.org/project/issues/drupal?text=hook_help&status=Open&prio...
https://drupal.org/project/issues/drupal?text=hook_date_formats&status=A...
https://drupal.org/project/issues/drupal?text=translation&status=All&pri...
https://drupal.org/project/issues/drupal?text=format_interval&status=Ope...
https://drupal.org/project/issues/drupal?text=simpletest&status=Open&pri...
https://drupal.org/project/issues/drupal?text=pdoexception&status=Open&p...
https://drupal.org/project/issues/drupal?text=pager&status=Open&prioriti...
https://drupal.org/project/issues/drupal?text=mobile+theme&status=All&pr...
https://drupal.org/project/issues/drupal?text=migrate&status=All&priorit...
https://drupal.org/project/issues/drupal?text=password&status=All&priori...
https://drupal.org/project/issues/drupal?text=permission&status=All&prio...
All of the above searches WORK on my personal issue queue (https://drupal.org/project/issues/user/202830) and on the Ubercart issue queue (https://drupal.org/project/issues/ubercart), both of which are smaller than the Drupal core issue queue, but are by no means small compared to the issue queues for other projects/users.
These other 5 searches worked first time and every time I tried:
https://drupal.org/project/issues/drupal?text=&status=All&priorities=All...
https://drupal.org/project/issues/drupal?text=doxygen&status=Open&priori...
https://drupal.org/project/issues/drupal?text=phpunit&status=Open&priori...
https://drupal.org/project/issues/drupal?text=doctrine&status=All&priori...
https://drupal.org/project/issues/drupal?text=responsive&status=All&prio...
Hope this helps ...
Comment #47
webchick
#2173087: Regression: "Project" column on "Your issues" has lost its link was introduced sometime in the past few days. I'm not sure if it's related to work on this issue or not, since I'm not sure what all has been deployed.
Comment #48
drumm
Thanks, that list is indeed helpful.
It looks like we need to adjust the tokenizer, which treats [^[:alnum:]] as word boundaries. _ should be considered to be part of a word. Then searches for hook_whatever will not get all the results for "hook", which is time-consuming to sort, and not useful for us anyway.
This requires a reindex, so we want to get them all done at once if possible. What else should we add?
_ used in identifiers
\namespace
$variable
This is run on both English and embedded code, so we can't include common punctuation. We can use anything that fits in http://www.php.net/manual/en/regexp.reference.character-classes.php, basically any single character.
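The effect of the word-boundary class can be illustrated with a small Python sketch. This is not the Search API tokenizer: the tokenize helper is hypothetical, Python's re module has no POSIX [[:alnum:]] class so ASCII letters and digits stand in for it, and the second pattern simply assumes that the three candidates listed above (_, \ and $) are all kept as word characters.

```python
import re

# Approximation of a boundary class that splits on anything non-alphanumeric.
# ASCII letters and digits stand in for POSIX [[:alnum:]] here.
CURRENT_BOUNDARY = re.compile(r"[^0-9A-Za-z]+")

# Assumed variant: "_", "\" and "$" are also treated as part of a word.
PROPOSED_BOUNDARY = re.compile(r"[^0-9A-Za-z_\\$]+")


def tokenize(text, boundary):
    """Split text wherever the boundary pattern matches, dropping empty tokens."""
    return [token for token in boundary.split(text) if token]


sample = "hook_help() uses $form"
print(tokenize(sample, CURRENT_BOUNDARY))   # "hook_help" is split in two
print(tokenize(sample, PROPOSED_BOUNDARY))  # "hook_help" and "$form" stay whole
```

With the first pattern a search for hook_help has to match every issue containing "hook", which is the expensive case described above; with the second it matches only the identifier itself.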
Comment #49
webchick
:: is the way to denote OO functions (#2175017: FieldDefinition::create() doesn't populate default 'settings' for the field type as an example title)
foo.bar for file names (#1996238: Replace hook_library_info() by *.libraries.yml file as an example title)
* is often used to denote "wildcard" (#2172235: Upgrade Twig to 1.15.* from 1.12.* as an example title)
# for form API/render properties (#2048637: Add #type => 'attributes' and wrap in Attribute class with drupal_pre_render_attributes as an example title)
These are all based on the first page of https://drupal.org/project/issues/drupal btw, feel free to check there for others.
Really, I would go the opposite way and only word break on whitespace characters, if it were me. I'm not sure if it's that simple, though.
Comment #50
jhodgdon
Breaking on whitespace is more or less OK, but you also want to strip normal punctuation.
For instance, if you were indexing that previous sentence I just typed here, you would want to index the word "punctuation", not "punctuation.".
And then there are some punctuation things you want to strip out of the middle of words, like hyphens and dashes -- those probably should be word breaks.
It's a fairly complicated problem... The core Search module does a semi-OK job of it; not sure with Search API.
Comment #51
drumm
http://drupalcode.org/project/search_api.git/blob/refs/heads/7.x-1.x:/in... is the tokenizer code. It ignores single quotes by default, and that is configurable too.
I thought about ::, but that is 2 characters, and we may only be able to match one at a time. That will need testing; I'm not sure how much we can do. A single : is common enough in English.
Maybe it is best to ignore ., so "punctuation. For" is indexed as "punctuation for", and "*.library.yml" is indexed as "*libraryyml", if we also consider * to be part of a word.
These should get us closer in this issue. I'm sure more improvements in Search API's tokenizer would be welcome too.
Comment #52
drumm
Actually, I bet we can't ignore the . character; 1.2 would be the same as 12.
Comment #53
helmo commented
There might be some related discussion around http://codesearch.debian.net/about I have not read through it but guess they have had similar issues.
Comment #54
drumm
On http://search_api-drupal.redesign.devdrupal.org/, I'm testing out these stopwords
Which removes approximately 11% of the rows in our text index.
I am also trying out [^[:alnum:]_\$#*] for whitespace characters.
I am leaving ['] as an ignorable character.
Comment #55
drumm
Thanks to #2178437: Add a DB user for cron, the query killer added to mitigate this issue no longer blocks other issues, like #2129811: Reported install usage statistics count on the project page is not being updated. While we are not there yet, searches have improved. I think it is okay to bump this down a level.
http://search_api-drupal.redesign.devdrupal.org/project/issues has been re-indexed, so you can search for "drupal_static" and see results that will be faster and more relevant. Please note that dev sites have ice cold caches, so the first page load will be nowhere near representative. The second page load will be better, but still not a direct speed comparison; the site has a subset of issues, and the hardware is at least a couple of orders of magnitude worse than production. Do review the results you get, keeping in mind that only issues from a 60-day window are on the site.
Comment #56
drumm
I deployed this round of changes. You will see some missing results when searching for words with underscores and other newly-non-word chars. For example, https://drupal.org/project/issues?text=comment_count_unpublished&project... just has the one issue I reindexed now, at this time.
More results will come in as reindexing everything happens. Issues are always reindexed on comment, so active issues will be reindexed faster.
Comment #57
drumm
Done reindexing in 2591m13.076s. I tried all the searches containing an underscore in #46 and the issue summary, and they all worked. It would be worth testing during Monday to Friday daytime before declaring those fixed, since Sunday is our lowest-traffic day.
Next, we need to organize the searches mentioned in comments into working & non-working sections in the issue summary.
Comment #58
drumm
At least 2 followups are needed. I'll file proper follow-up issues when I revisit this issue.
Some issues did not get re-indexed and logged
A way to find these is looking for remaining stopwords in the index:
SELECT i.item_id, i.title FROM search_api_db_project_issues i INNER JOIN search_api_db_project_issues_text t ON t.item_id = i.item_id AND t.word = 'the' GROUP BY i.item_id ORDER BY NULL;
Shows 50 rows. We can keep these rows for now and make sure re-indexing happens once the underlying bug(s) are fixed.
Second, I had to hack SearchApiDbService::splitKeys() because it re-tokenizes search terms that had already been tokenized along the way from Views.
Comment #59
webchick
Another WSOD on this query:
Undefined index: distribution_name in drupal_install_profile_distribution_name()
in https://drupal.org/project/issues/drupal
Comment #60
mgifford
Or for that matter https://drupal.org/project/issues/views?text=release
Comment #61
drumm
The new top words are attached.
Blacklisting more words will have diminishing returns now. Since it is set up, adding words to the blacklist is easier: add them to sites/default/stopwords.txt, DELETE from the index. No reindexing required. So if a word is totally useless in searching, we can go ahead and blacklist it.
Looking into the blank line, #1678068: Problem with whitespace and db index, would be worthwhile, but also would not have much of an impact for this issue.
Next we need to take a look at the SQL for the failing queries and look for improvements.
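As an illustration of why blacklisting a word needs no reindex, the cleanup step can be sketched like this. It is a simplified stand-in using Python's sqlite3 module, not the commands run on Drupal.org: the text_index table, its two columns, and the in-memory stopword list are hypothetical, with the list playing the role of a stopwords file that holds one word per line.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical, simplified stand-in for the text index table.
conn.execute("CREATE TABLE text_index (item_id INTEGER, word TEXT)")
conn.executemany(
    "INSERT INTO text_index VALUES (?, ?)",
    [(1, "the"), (1, "cache"), (2, "the"), (2, "for"), (2, "theme")],
)

# Stands in for the contents of a stopwords file, one word per line.
stopwords_file = "the\nfor\n"
stopwords = [line.strip() for line in stopwords_file.splitlines() if line.strip()]

# Remove rows that were indexed before the words were blacklisted.
placeholders = ", ".join("?" for _ in stopwords)
conn.execute(f"DELETE FROM text_index WHERE word IN ({placeholders})", stopwords)

remaining = conn.execute(
    "SELECT item_id, word FROM text_index ORDER BY item_id, word"
).fetchall()
print(remaining)
```

The delete matches whole words only, so "theme" survives even though it starts with a blacklisted word; rows for every other word are untouched, which is why nothing else has to be rebuilt.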
Comment #62
sun
IMHO this should be bumped back to critical. Whenever I'm trying to search the Drupal core queue, I either get a WSOD or a "Fatal error: Class 'Database' not found." I wasn't aware that this is a long-standing issue until I asked in IRC.
Regarding the last attached stopwords list, most entries are common language and thus OK to blacklist, but I'd ask for the following to be removed, since those are actual names of concepts, components, subsystems, classes, and facilities in Drupal core → their high counts only mean that these terms are substantial in Drupal:
I did not include "drupal" in that list, but only "org" instead, since that's a unique term → Searches for "drupal.org" should still yield issues that are explicitly mentioning "drupal.org", despite "drupal" being blacklisted.
Comment #63
ianthomas_uk
Has Solr been reconsidered recently, or has this issue been discussed with the maintainers of the solr modules? drunken monkey, pwolanin and nick_vh were all involved in earlier discussions.
It looks like the main arguments against Solr were that Solr 3 would mean changes would take several minutes to appear in the search results, but Drupal wasn't really ready for Solr 4 yet. Those discussions were over 8 months ago, search_api_solr officially supports Solr 4 and I'm not aware of it causing any serious difficulties. Initially this could be used for the advanced search only, to iron out any teething problems.
Tip for anyone who's getting white screens while trying to search: The more of the other filters you can complete, the faster your search is likely to run. I always try to filter by version for Drupal core issues.
Comment #64
nick_vh
Even with Solr 4 you still need to have replication in place. If drupal.org works with two masters, where one is the hot spare, this could probably work out just fine?
In the other case, we need to consider solrcloud as updates need to be submitted directly. If you work with a master-slave setup and you don't fix the replication lag, you're still 2 minutes behind...
Comment #65
Bevan commented
This is currently the single most painful thing about Drupal.org for me, by a large margin. I am having to use Google search instead, but Google does not understand facets like version number and issue status, so it is not very useful. I consider myself a power-user of Drupal.org (in that I know and use most of its features). I search or use issue queues a couple of times per week and other features several times a day (usually project pages).
Comment #66
mgifford
Here's another string that's giving me a wsod consistently:
https://drupal.org/project/issues/search?text=profile&projects=&issue_ta...
Whereas, with just an 's' added, it failed the first time and then loaded the second:
https://drupal.org/project/issues/search?text=profiles&projects=&issue_t...
It's like it's running out of memory processing the request. What are the server logs telling us?
We should have a tonne of fatal errors in the logs related to search requests.
Comment #67
drumm
Sorry for being quiet for a few days here. Other issues have also been priorities, and not everything can be solved at once. I'll be concentrating on this again in the next few days, starting with:
Comment #68
drumm
The #58 cleanups are at #2201041: Do not re-tokenize text. Only the last chunk of the patch has been deployed on Drupal.org, since #58 was written. The new patch would be a tiny improvement. It is allowing stopwords to be indexed when they are part of words over 50 characters that get tokenized again.
#1678068: Problem with whitespace and db index would also be a tiny improvement. Whitespace is indexed instead of the number 0.
The issue summary now has a list of examples I could find in the comments here. Currently: 12 are working, 6 are slowly working, 9 WSOD.
The search_api dev site is now rebuilt.
The next step is getting the queries for WSOD pages from dev, and seeing what can be done to rewrite them at the console in production. (Dev doesn't have a full data set, so MySQL will execute queries differently. Staging is underpowered and likely inaccurate too.)
More reviews of the Search API patches would be appreciated.
Comment #69
drumm
Comment #70
drumm
Comment #71
webchick
Added a search for an error message I was getting to the WSOD part of the issue summary: "PHP Fatal error: Class Drupal\Core\TypedData\Plugin\DataType\LanguageReference contains 1 abstract method and must therefore be declared abstract or implement the remaining methods"
Comment #72
drumm
(Fixing link since the link filter thinks colons are for punctuation.)
Comment #73
drumm
(Re-ordering the lists.)
Comment #74
drumm
I added a covering index on project and issue status, which was a common combination in the searches reported in this issue:
ALTER TABLE search_api_db_project_issues ADD INDEX (field_project, field_issue_status);
This has had excellent results overall. I've been able to load every sample search except https://drupal.org/project/issues/drupal?text=config+validate.
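The idea behind a two-column index like this can be checked on a toy table. The sketch below uses Python's sqlite3 module, so the planner is SQLite's rather than MySQL's, and the issues table, the project_status index name, and the filter values are hypothetical; only the two column names are taken from the statement above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical, cut-down version of the issue index table.
conn.execute(
    "CREATE TABLE issues ("
    "item_id INTEGER PRIMARY KEY, field_project INTEGER, field_issue_status INTEGER)"
)

# Composite index over the two columns that the reported searches filter on.
conn.execute("CREATE INDEX project_status ON issues (field_project, field_issue_status)")

# Ask the planner how it would run a search limited by project and status.
plan = conn.execute(
    "EXPLAIN QUERY PLAN "
    "SELECT item_id FROM issues WHERE field_project = ? AND field_issue_status = ?",
    (1, 1),
).fetchall()

# The last column of each plan row is the human-readable description.
details = " ".join(row[-1] for row in plan)
uses_index = "project_status" in details
print(details)
print("uses composite index:", uses_index)
```

When both filters are present, the planner can satisfy them from the index alone instead of scanning the table, which fits the observation that searches combining project and status benefited most.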
Comment #75
drumm
(Re-add URL that got lost.)
Comment #76
drumm
https://drupal.org/project/issues/drupal?text=config+validate actually now loads too.
I added #2207205: Change score from float to int as a child issue, which should be a good speed boost.
Comment #77
ianthomas_uk
I've just added https://drupal.org/project/issues/drupal?text=update+process&version=8.x to the examples. It worked when I set status to 'Active', but not '- Open issues -'.
Comment #78
Bevan commented
Awesome! Thanks drumm! :)
Comment #79
ofry commented
I added https://drupal.org/project/issues/views?text=Undefined+offset%3A+0&statu... to WSOD list.
Comment #80
mustanggb commented
Added https://drupal.org/project/issues/drupal?text=roles+field to WSOD list.
Also https://drupal.org/project/issues/drupal?text=config+validate is still WSOD for me.
What about adding options to search:
And if we are going to have a timeout on search, wouldn't it be better to show the search page with an error message, something like "Search timed out, please narrow your search parameters", instead of a WSOD?
Comment #81
mgifford
Sometimes I can do a search, get a WSOD, then just re-load the search and get the results. Can't do that with this though:
https://drupal.org/project/issues/drupal?text=skip+to+main&status=All
Comment #82
ofry commented
Added https://drupal.org/project/issues/drupal?text=Missing+configuration+file... string.
Comment #83
tim.plunkett
Added this to the issue summary, thanks #63:
Tip for anyone who's getting white screens while trying to search: The more of the other filters you can complete, the faster your search is likely to run. I always try to filter by version for Drupal core issues.
#82, this worked quickly:
https://drupal.org/project/issues/search/drupal?text=Missing+configurati...
Comment #84
webchick
Added another one, but looking at the list, it seems like it's fair to say that basically every single one of the WSOD queries is around unfiltered text searches on queues with a large number of issues (drupal, views, "all"). So if further optimization efforts could focus there, that'd be great. The BIG BOLD WARNING at the top is nice and all, but it requires finding this bug report in the first place, and adding superfluous filters is never anything we had to do in D6's version of Drupal.org, so unfortunately this still makes it extremely difficult to find duplicate issues in the core queue.
Comment #85
webchick
Oh nice. The 8.x-filtered version WSODs too. :P
Comment #86
drumm
#2207205: Change score from float to int will improve sort speed for all text searches, since DBs tend to be better at ints. (Unless you have some sort of GPU-backed DB, which we don't.) Once reviewed, we'll need to schedule a downtime for deployment.
Comment #87
tvn commented
#2207205: Change score from float to int committed. We're good to go!
Comment #88
ianthomas_uk
Almost all my text searches seem to be WSODing at the moment (I've added one more example to the issue summary).
It doesn't sound like we're going to be able to fix all searches with SQL tweaks, and the issue database will only get bigger. Can we reconsider a Solr-based solution, even if only as an alternative search? I believe the technical issues have been resolved, although there would obviously be a non-trivial amount of work to switch to Solr.
Comment #89
heyyo commented
A lot of examples are now working:
https://drupal.org/project/issues/drupal?text=responsive
https://drupal.org/project/issues/drupal?text=pdoexception&status=Open
https://drupal.org/project/issues/drupal?text=hook_update_N&status=Open
https://drupal.org/project/issues/drupal?text=hook_help&status=Open
https://drupal.org/project/issues/drupal?text=doxygen&status=Open
https://drupal.org/project/issues/views?text=release
https://drupal.org/project/issues/drupal?text=migrate
https://drupal.org/project/issues/api?text=drupal_static
https://drupal.org/project/issues/drupal?text=doctrine
https://drupal.org/project/issues/drupal?text=drupal_static
https://drupal.org/project/issues/drupal?text=format_interval&status=Open
https://drupal.org/project/issues/drupal?text=hook_date_formats
https://drupal.org/project/issues/drupal?text=hook_date_formats
https://drupal.org/project/issues/drupal?text=phpunit&status=Open
https://drupal.org/project/issues/pubdlcnt?text=not+counted
https://drupal.org/project/issues/views?text=Style+RSS+Feed+requires+a+r...
https://drupal.org/project/issues?text=comment_count_unpublished&status=...
https://drupal.org/project/issues/drupal?text=update+process&version=8.x
https://drupal.org/project/issues/drupal?text=roles+field
https://drupal.org/project/issues/drupal?text=cache+tags
https://drupal.org/project/issues/drupal?text=cache+tags&version=8.x
Comment #90
tim.plunkett@heyyo, read the BIG TEXT at the top of the page. Using the exposed filters will help a lot.
Comment #91
drummDeploying #2207205: Change score from float to int will need 1-2 hours of downtime, which we don't want during Developer Days Szeged, so next week.
The pages that white screen will fluctuate a bit with the field cache, and other caches within MySQL. If a deployment, or something else, has cleared the field cache, each page is up to 50 cold node loads. When profiling slow queries, I routinely see the second run go quite a bit faster; there are InnoDB or other caches helping. (We don't have the basic query cache turned on; nnewton determined it was more harm than help with our workload.)
We still want to think about ways to make the queries from Search API DB faster. The big thing we need to avoid is temporary tables on disk:
On fewer rows - all three full text fields (title, body, comment bodies) are indexed separately. We would have 40% fewer rows if that was denormalized, either by
Comment #92
heyyo commented@tim.plunkett I DON'T have WSOD on all those pages, which are listed in the first post.
Comment #93
drumm#2207205: Change score from float to int has finally been deployed. I expect the searches to improve, although we will be slow for a few minutes because caches were cleared.
Comment #94
helmo commentedIt's been a while for me but I have a new example: https://drupal.org/project/issues/webmasters?text=git+tag+release&status...
Filtering on open issues helped.
Comment #95
Bevan commentedAnother example that WSODs: https://drupal.org/project/issues/search/drupal?text=autocomplete.js+%23...
Comment #96
dddave commentedAdded another issue to the list, and reporting personally that this happens to me from time to time.
Comment #97
webchickI do not at all understand why there are no logs that inform the d.o admins when this happens, and why human beings need to 1) manage to find this issue (which is hilariously hard since ... search WSODs :)) and 2) manually report errors themselves. But nevertheless, this is what we were told to do in the D.o session in Austin today, sooooo...
https://drupal.org/project/issues/drupal?text=current+password
Context: Noticed while prepping a D8 demo that the "Current password" field is at the top of the list in user/1/edit as opposed to down by "New Password." Was trying to discover if there's already an issue for it. Still have no idea, so off to create a duplicate. :(
Comment #98
webchickOh, and just in case you're wondering, https://drupal.org/project/issues/drupal?text=current+password&version=8.x doesn't help.
Comment #99
tvn commentedMoving to the queue we watch.
Comment #100
webchickhttps://drupal.org/project/issues/user?text=preview&projects= Figured I'd try searching "my issues" instead of all of Drupal in the hopes that a smaller subset might result in not a WSOD. No such luck.
It then took me 7 minutes to actually find this issue in order to report it, because it is not easy to search for "white screen of death" on Google and find this specific issue in the myriad of results. The query that eventually got me here was "search white screen drupal.org site:drupal.org" (result 6 or so), which is a pretty sophisticated search for someone having this issue to have to run.
So I *really* hope that we're not taking lack of responses here as an indication the problem is fixed. It isn't. Logs should show that this happens nearly every time people search any reasonably large set of issues. I urge the d.o tech team to use logs to diagnose and troubleshoot this rather than rely on human reports. I just burned 10 minutes on this one, which makes me far less likely to report again in the future.
Comment #101
joshuamiIt is still very much on the list, but it is a big problem to solve. Search on Drupal.org in general is not as tuned as it should be. The issue queues are just particularly a problem because it is where we do our work. I will not let this fall off the radar, but I also cannot promise a quick fix—we have run out of quick fixes in this thread. We are likely going to need to consider changing the architecture a bit to get the performance we expect.
Comment #102
webchickI'm totally fine with "This is on the radar, but it's not a quick fix." I'm a lot less fine with "Unless we see reports from humans who managed to find their way to this issue and comment, we assume it's fixed." (Which is what it sounded like last week at DrupalCon.) If that's not the case, then carry on. :)
Comment #103
steven jones commentedAt the risk of just sounding like I'm saying that we should use whatever is 'hot' in tech at the moment, have we considered using Elasticsearch to power these views?
Elasticsearch's notion of real-time is an order of magnitude better than Solr's, since its default configuration is for indexing changes to take 1 second before they are committed to the searchable index.
It seems like we're only going to get so far with MySQL and we're having to do lots of work to get full-text search when that's essentially a solved problem with these external tools.
Does anyone know of an issue that discusses using Solr or Elasticsearch for the issue queue search? I want to do some testing, like seeing how easy it would be to squirt all issues into Elasticsearch. @ianthomas_uk are you aware of such an issue kicking around?
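For anyone wanting to try that kind of experiment, here is a minimal sketch of building a request body in the newline-delimited JSON format that Elasticsearch's bulk API accepts. The index name, field names, and sample issues are hypothetical, and nothing is actually sent to a server.

```python
import json

# Hypothetical index name, for illustration only.
INDEX = "issues"

# Made-up sample documents; real issue data would come from the site database.
issues = [
    {"nid": 1, "title": "Search WSODs", "body": "Text search times out."},
    {"nid": 2, "title": "Pager regression", "body": "Pager lacks page count."},
]


def bulk_body(docs, index):
    """Build an NDJSON bulk body: one action line followed by one document line."""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index, "_id": str(doc["nid"])}}))
        lines.append(json.dumps(doc))
    # The bulk format requires the body to end with a newline.
    return "\n".join(lines) + "\n"


payload = bulk_body(issues, INDEX)
```

The resulting string could then be posted to a test instance's bulk endpoint; batching the roughly 750,000 records into chunks would be a separate decision.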
Comment #104
drummEither Elasticsearch or Solr 4 would be a bit of a research project. It would be good to know there are others using them with Search API, with a number of records in our order of magnitude, 750133.
We are in a better situation than ever before for infrastructure to try these out. Spinning up a VM to try out either would be doable. And we do have some infra experience with Elasticsearch with Kibana for log hosting.
Where we can improve all around, including the search_api_db backend we use now, is BDD testing.
Comment #105
tr commentedA fundamental point is that while text search worked on drupal.org prior to the D7 upgrade, it hasn't worked very well since. So it seems to me this is a problem with our code, not with our infrastructure or database. Bailing and implementing an external tool as a solution may be pragmatic, but it ignores the underlying problem with DB/Search API performance in D7. If Drupal search can't be made to work on drupal.org, then maybe we should reconsider bundling the search module with core. After all, external search engines have worked better than our internal search for finding content on Drupal sites for as long as I've been using Drupal (6+ years), and I don't see that changing anytime soon.
Comment #106
helmo commented@TR: you're saying two opposing things... Fix our code because d.o is an example site and don't use our own code because it has not worked for you in the recent past.
As much as I like us eating our own dog food... The > 700.000 items might be just over the top for plain db search. We don't have to expect our code to be the best for every site (e.g. 10.000 items has other considerations)
Therefore I'd be in favor of using an industry standard that fits this case. Solr or Elastic I don't care.
Comment #107
tr commented@helmo: I'm saying just one thing: Make a choice. Choose as a community either to support our core code or concede there are better solutions. Don't say it's good enough for everyone but us. Put the effort into either fixing Search for everyone or into better integrating with external tools so that implementing Solr on a Drupal site, for instance, isn't a major task.
Drupal doesn't have to re-invent the wheel for everything, and we've already acknowledged that there are things better left to other open source projects (Symfony, Guzzle, etc.). I'm suggesting that perhaps Search is also one of those things. Alternatively, if we choose to keep it in core, then it should be able to handle a use case like drupal.org. I would be happy either way.
Drupal.org is not just another site and we shouldn't be making decisions purely on whether it is the most expedient solution. Using drupal.org as a testbed can help us improve our product and make it more valuable to everyone in the Drupal community. So if we *do* choose to support our core code, we should be using drupal.org to identify problems and develop solutions to make Drupal a better product.
It's not just about number of records - even with a small number of records, Drupal search is not great at finding content. Search has for a long time been widely recognized as one of Drupal's weaknesses. If it can't be made to work well enough on drupal.org, then we ought to get rid of it and find a better solution for *all* Drupal users, not just put a band-aid on our own site.
Comment #108
steven jones commentedRight...sorry for starting discussion that is taking this issue off-topic.
Let's try and:
Use #2286501: [meta] 'Fix' issue queue listing/searching for discussing using an external search tool for D.o issue queue search.
Use this issue (for now) for improving the MySQL performance of the full-text search (@drumm fabulous work so far btw!)
@TR if you want to discuss ripping MySQL search out of core, then please find an appropriate issue to discuss it in.
We can discuss if Drupal.org should use MySQL search in #2286501: [meta] 'Fix' issue queue listing/searching.
Comment #109
steven jones commented
Comment #110
WorldFallz commentedHere's one that will become important when d8 is released:
https://www.drupal.org/project/issues/drupal?text=comment+settings&statu...
I'm ashamed to admit, after taking d8 for a test drive today, I had to poke around the interface for at least 10 minutes and still couldn't find where the comment settings were moved to, lol, thus that search.
Also, the following (adding quotes around the text):
https://www.drupal.org/project/issues/drupal?text=%22comment+settings%22...
Comes up quickly with an empty result even though clearly, #731724: Convert comment settings into a field to make them work with CMI and non-node entities meets the search criteria (which I ended up having to use google to locate).
Comment #111
ofry commentedNew WSODing search:
https://www.drupal.org/project/issues/views?text=Grouped+filter+OR&statu...
Comment #120
ofry commentedNew WSOD'ing query:
https://www.drupal.org/project/issues/views?text=duplicate+when+sorting+...
Comment #123
Bevan commented
Comment #124
joshuamiBetween the improvements that @drumm has deployed over the past few months and the powerful new database servers that were deployed yesterday, I'm not able to make any of these queries WSOD anymore.
Could those following this issue run some tests? I'm hopeful we might be able to close this one out.
Comment #125
drummLet's try all the examples after a cache-clearing deployment for a stress test. Even before yesterday, I only saw maybe a couple a day of these timing out in New Relic.
#2135385: Regression: the pager no longer lets me jump to pages, nor shows total number of pages has the potential to push this back over the edge. It effectively doubles the initial query. (Not including loading up to 50 nodes, that's not doubled.)
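To make the "doubles the initial query" point concrete, here is a generic sketch (not Drupal's or Views' actual pager code) using an in-memory SQLite table. A full pager needs a count query on top of the page query to show the total number of pages, while a lighter pager can fetch one extra row just to learn whether a next page exists. The table and function names are invented for the example.

```python
import sqlite3

PAGE_SIZE = 50  # the thread mentions up to 50 nodes per page

# Made-up stand-in for the issue table, with 120 rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE issue (nid INTEGER PRIMARY KEY, title TEXT)")
conn.executemany(
    "INSERT INTO issue (title) VALUES (?)",
    [(f"issue {i}",) for i in range(120)],
)


def full_pager(page):
    """Two queries: a count for the page total, plus the page itself."""
    total = conn.execute("SELECT COUNT(*) FROM issue").fetchone()[0]
    rows = conn.execute(
        "SELECT nid FROM issue ORDER BY nid LIMIT ? OFFSET ?",
        (PAGE_SIZE, page * PAGE_SIZE),
    ).fetchall()
    total_pages = -(-total // PAGE_SIZE)  # ceiling division
    return rows, total_pages


def light_pager(page):
    """One query: fetch one extra row to detect whether a next page exists."""
    rows = conn.execute(
        "SELECT nid FROM issue ORDER BY nid LIMIT ? OFFSET ?",
        (PAGE_SIZE + 1, page * PAGE_SIZE),
    ).fetchall()
    return rows[:PAGE_SIZE], len(rows) > PAGE_SIZE
```

With an expensive full-text condition in the WHERE clause, the count query repeats most of the matching work, which is why restoring page totals has a real cost.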
Comment #126
webchickActually, yeah. This does seem to be much faster than before! That's great. Must be the new db servers because even as of a couple weeks ago I was still getting WSODs on various queries.
#2135385: Regression: the pager no longer lets me jump to pages, nor shows total number of pages has been fixed enough for me personally, for months. The main problem was not being able to have an easy estimate/count of, say, "D8 critical issues with the tag D8 upgrade path" but that was solved back in Feb or so. I also see no one has been complaining about it since that commit, so maybe safe to close it out?
Comment #127
drummFor deploying #2394453: Test Mollom on Drupal.org, I did a cache clear all. Shortly afterward, most of the examples are quick. Some hang a little, but I think less than 10-15 seconds. (I just clicked them all starting at the bottom.) None white screened.
I think we're done here.
Comment #128
webchickYEAH!!! :D :D Awesome work, all. Thanks so much.