agile-ajax

Speed Up Ferret/acts_as_ferret Bulk Indexing

Those of you using ferret 0.11.6 (the latest released gem) and acts_as_ferret 0.4.3 (the latest stable version) may have noticed that rebuilding an index is painfully slow for a large number of documents, even when each document contains relatively little text. The cause is how bulk update works: despite the name, "bulk indexing" processes a single document at a time! Fortunately, there is a simple patch that provides a significant speed boost.

There is a fairly old Trac ticket in which Francois Lagunas posted a clever patch that makes bulk indexing process documents as a group. Here is a monkey patch based on his submission (in Rails, just drop it into a file under config/initializers).

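class Ferret::Index::Index

  # Replace a whole group of documents while holding the directory lock once.
  # docs maps each document id to its new document.
  def update_batch(docs)
    @dir.synchrolock do
      ensure_writer_open()
      commit = false
      # First pass: delete the old copy of every document in the batch.
      docs.each do |id, value|
        delete(id)
        commit = true if id.is_a?(String) || id.is_a?(Symbol)
      end
      # Commit once for the whole batch, and only if a String or Symbol id was deleted.
      if commit
        @writer.commit
      end
      ensure_writer_open()
      # Second pass: add all of the new documents.
      docs.each do |id, new_doc|
        @writer << new_doc
      end
      flush() if @auto_flush
    end
  end

end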

class ActsAsFerret::BulkIndexer

  def index_records(records, offset)
    docs = {}
    batch_time = measure_time {
      # Collect the documents first, then hand them to the index as one batch.
      records.each { |rec| docs[rec.id] = rec.to_doc if rec.ferret_enabled?(true) }
      @index.update_batch(docs)
    }.to_f
    @work_done = offset.to_f / @model_count * 100.0 if @model_count > 0
    # Estimate the time left from the per-record time of this batch.
    remaining_time = ( batch_time / @batch_size ) * ( @model_count - offset + @batch_size )
    @logger.info "#{@reindex ? 're' : 'bulk '}index model #{@model.name} : #{'%.2f' % @work_done}% complete : #{'%.2f' % remaining_time} secs to finish"
  end

end
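
To get a feel for why grouping helps, here is a toy sketch you can run without ferret installed. ToyIndex is a made-up stand-in, not Ferret's API, and it assumes that the expensive step is the commit, which it simply counts. It compares updating documents one at a time with handing them over as a single batch.

```ruby
# Toy stand-in for an index. This is NOT Ferret's API; it only counts
# commits, which we assume (for illustration) to be the expensive step.
class ToyIndex
  attr_reader :docs, :commits

  def initialize
    @docs = {}
    @commits = 0
  end

  # One document at a time: every update pays for its own commit.
  def update(id, doc)
    @docs.delete(id)
    @docs[id] = doc
    @commits += 1
  end

  # Whole batch: delete all stale entries, add all new ones, commit once.
  def update_batch(batch)
    batch.each_key { |id| @docs.delete(id) }
    batch.each { |id, doc| @docs[id] = doc }
    @commits += 1
  end
end

batch = { 1 => "first", 2 => "second", 3 => "third" }

one_by_one = ToyIndex.new
batch.each { |id, doc| one_by_one.update(id, doc) }

grouped = ToyIndex.new
grouped.update_batch(batch)

puts "one at a time: #{one_by_one.commits} commits"
puts "as a batch:    #{grouped.commits} commits"
```

Both toy indexes end up with the same contents, but the batch version commits once instead of once per document. The real savings depend on what ferret does internally, so treat this only as an illustration of the idea behind the patch.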

If you are running a newer version of ferret that you built as a gem yourself, the ferret side of this patch is already included, although you still need to make a slight change on the acts_as_ferret side. Stay tuned for another post about how to do this.

Comments: 1 so far

  1. great work!

    Comment by Thomas, Wednesday, November 5, 2008 @ 10:25 pm
