This is a first, cautious patch that builds on a soon-to-be-committed patch in apachesolr. It should reduce the work that DS needs to do for Apachesolr customizations.

This patch includes

  • A new indexing type, xs_*. After analysis, this is more demanding during the index phase but less demanding during the request phase. The catch is that the only non-indexed type in the current Solr schema is binary, and binary values need to be base64-encoded.
  • A new hook, hook_apachesolr_search_page_alter(), to get the build array and process the results so that they include all the DS data that is needed.
  • The function apachesolr_search_ds_search_execute is decoupled so that it can process the results separately.
  • Attention: for now you need the following patch to make this work: #1314406: De-duplication of the apachesolr_search_execute and apachesolr_search_user_defined_search_page (patch was submitted).
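
The index-time versus request-time trade-off in the list above can be sketched roughly as follows. This is an illustration in Python rather than the module's PHP: `json` stands in for PHP's `serialize()`, and the field name `xs_entity` and all function names are hypothetical, not taken from the patch.

```python
import base64
import json


def encode_for_binary_field(entity: dict) -> str:
    """Index time: serialize the entity and base64-encode it for a binary field."""
    raw = json.dumps(entity, sort_keys=True).encode("utf-8")
    return base64.b64encode(raw).decode("ascii")


def decode_from_binary_field(value: str) -> dict:
    """Request time: reverse the encoding to rebuild the entity without a database load."""
    return json.loads(base64.b64decode(value).decode("utf-8"))


def entity_for_result(doc: dict, load_from_db) -> dict:
    """Use the stored copy when present; otherwise fall back to a (hypothetical) loader."""
    stored = doc.get("xs_entity")  # hypothetical field name matching the xs_* pattern
    if stored is not None:
        return decode_from_binary_field(stored)
    return load_from_db(doc["entity_id"])


node = {"nid": 12, "title": "Hello", "body": "Lightweight node"}
solr_doc = {"entity_id": 12, "xs_entity": encode_for_binary_field(node)}
restored = entity_for_result(solr_doc, load_from_db=lambda nid: {"nid": nid})
```

The extra work happens once per document at index time; at request time each result needs only a decode, and the loader is called only for documents that were indexed before the field existed.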

    Comments

    nick_vh’s picture

    Attachment (status: new, size: 4.31 KB)
    nick_vh’s picture

    Updated issue summary.

    swentel’s picture

    Attachment (status: new, size: 7.4 KB)

    Updated patch with apachesolr entities support. You need the apachesolr-multientity branch to test it.

    nick_vh’s picture

    Status: Active » Needs work
    nick_vh’s picture

    Status: Needs work » Needs review

    All the needed functionality is in Beta14 or 15 of apachesolr, so I suppose this should not be a blocker anymore.

    sandergo90’s picture

    The patch works excellently. Thanks for this ;)

    nick_vh’s picture

    Status: Needs review » Reviewed & tested by the community

    Let's push this in! :)

    milesw’s picture

    This worked well for me too.

    To give a bit more summary for newcomers, this patch stores complete serialized nodes as a field in the Solr index so that node_load() can be avoided when DS is used for search results. Note: things will be pretty broken until you reindex all your content.

    It looks like a new field zs_* was added to the apachesolr schema a few beta versions ago, so we can do away with the base64 encoding. Updated patch is attached.

    I just tested this with a set of 50,000 lightweight nodes (not much text content). My Solr index grew from 52 MB to 194 MB. I didn't check the time, but indexing was maybe twice as slow.
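
    As a rough back-of-the-envelope reading of those numbers (assuming decimal megabytes, and assuming all of the growth comes from the stored node field, which was not verified):

```python
nodes = 50_000
index_before_mb = 52
index_after_mb = 194

growth_mb = index_after_mb - index_before_mb
growth_factor = index_after_mb / index_before_mb
per_node_kb = growth_mb * 1000 / nodes  # assumes 1 MB = 1000 KB

print(f"growth: {growth_mb} MB, factor: {growth_factor:.2f}x, per node: {per_node_kb:.2f} KB")
# prints "growth: 142 MB, factor: 3.73x, per node: 2.84 KB"
```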

    swentel’s picture

    Status: Reviewed & tested by the community » Needs work

    Interesting re: that new field.

    Other than that, this patch needs work. If Entity API is not enabled, it is not going to work at all. I need to think about this a bit more this weekend.

    swentel’s picture

    Assigned: Unassigned » swentel
    swentel’s picture

    Version: 7.x-1.x-dev » 7.x-2.x-dev

    Moving to the 7.x-2.x branch for now.

    nick_vh’s picture

    zs_* does not exist; see the summary of this issue. It is meant to use xs_*:

       <!-- Binary fields can be populated using base64 encoded data. Useful e.g. for embedding
            a small image in a search result using the data URI scheme -->
       <dynamicField name="xs_*"  type="binary"  indexed="false" stored="true" multiValued="false"/>
       <dynamicField name="xm_*"  type="binary"  indexed="false" stored="true" multiValued="true"/>
    milesw’s picture

    It exists in my copy :)

     <!-- Unindexed string fields that can be used to store values that won't be searchable -->
     <dynamicField name="zs_*" type="string"   indexed="false"  stored="true" multiValued="false"/>
     <dynamicField name="zm_*" type="string"   indexed="false"  stored="true" multiValued="true"/>
    
    swentel’s picture

    Status: Needs work » Fixed
    Attachment (status: new, size: 17.76 KB)

    Updated patch, which went into 7.x-2.x (along with a small typo in field_ui.inc, heh).
    I'm currently *NOT* going to commit this to the 7.x-1.x branch, as it involves some changes that could be confusing for users who are upgrading. Maybe in the future, we'll see. :)

    nick_vh’s picture

    Status: Fixed » Needs work

    @milesw, you are actually right. I must have been confused or have looked at another schema. I've added this zs_ field myself :-)

    so

    xs_* = binary fields, base64 encoded data, stored not indexed
    zs_* = string fields, stored not indexed.

    We can choose one or the other, but we need to fully understand why we choose it. Would there be a big difference between serializing the whole node and base64-encoding the whole object?
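
    One part of that question can be checked in isolation: how much larger the base64 form is than the plain serialized string. A minimal sketch, again in Python with `json` standing in for PHP's `serialize()` and a made-up node; it only measures the encoded payload and does not model how Solr stores either field type:

```python
import base64
import json

# Hypothetical node; real nodes carry many more fields.
node = {"nid": 12, "title": "Hello", "body": "x" * 1000}

serialized = json.dumps(node).encode("utf-8")  # what a zs_* string field would carry
encoded = base64.b64encode(serialized)         # what an xs_* binary field would be sent as

# Base64 emits 4 output bytes for every 3 input bytes (rounded up).
overhead = len(encoded) / len(serialized)
print(f"serialized: {len(serialized)} bytes, base64: {len(encoded)} bytes, ratio: {overhead:.3f}")
```

    The ratio comes out at roughly 4/3, so base64 mainly adds payload size and an encode/decode step on each side.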

    swentel’s picture

    This also needs a fix for the files fallback when the entity module doesn't exist, as files can now be indexed as well and will be rendered in the node search results without the entity module.

    swentel’s picture

    OK, I did some tests as well to compare, and also tracked the memory usage of the two on a results page. Indexed 5,000 nodes:

    With xs_
    Index size: 49M
    Search page: 1.04 MB, devel_shutdown()=5.72 MB, PHP peak=6.25 MB

    With zs_
    Index size: 49M
    Search page: 1.04 MB, devel_shutdown()=5.68 MB, PHP peak=6.25 MB

    I see no difference in index size, and memory use is more or less the same (a little less at shutdown with zs_, but still). The goal (serializing the node) is still met.

    (fallback mentioned in #16 has been fixed in the meantime).

    swentel’s picture

    Status: Needs work » Fixed

    OK, I went ahead and dropped the base64 functions, and now use the zs_ field.
    Let me know if this is completely wrong and should be reverted :)

    Status: Fixed » Closed (fixed)

    Automatically closed -- issue fixed for 2 weeks with no activity.

    Anonymous’s picture

    Updated issue summary.

    nils.destoop’s picture

    Version: 7.x-2.x-dev » 7.x-1.x-dev
    Status: Closed (fixed) » Patch (to be ported)

    As Apache Solr support has been broken for 7.x-1.x, this patch should be ported.
    If this is not possible (it looks like a huge patch), #1516764: Has Apache Solr update 7.x-1.0-beta19 broken ds search should be fixed another way.

    attiks’s picture

    Status: Patch (to be ported) » Needs review
    Attachment (status: new, size: 1.35 KB)

    Quick fix needed for one of my clients who couldn't update to DS 2.x.

    • swentel committed 04c9102 on 8.x-3.x
      - #1337610 followup by milesw: use zs_* field so we can drop...
    • swentel committed 1beebf6 on 8.x-3.x
      - #1337610 by nick_vh and swentel: New apachesolr search support for...