Configure ImageUpdateFactory

The class de.pixolution.solr.update.processor.ImageUpdateFactory is responsible for loading images from various sources via the HTTP, HTTPS or FILE protocol and storing the calculated image descriptor in the index. This is triggered when sending <add> requests to the /update handler. The ImageUpdateFactory is also used for visual search queries that take URL-referenced images as search input. The processor.loader.* options therefore also apply to direct user input and help to safeguard against malicious queries such as DoS attacks.

Declare the processor chain imageloader, which contains the ImageUpdateFactory, and register it with the UpdateRequestHandler:

<updateRequestProcessorChain name="imageloader" default="true">
  <processor class="solr.DistributedUpdateProcessorFactory"/>
  <processor class="de.pixolution.solr.update.processor.ImageUpdateFactory">
    <str name="pixolution.fieldname.prefix">pxl_</str>
    <int name="processor.loader.kBytesPerSecPerCon">1000</int>
    <int name="processor.loader.maxConnections">100</int>
    <int name="processor.loader.fileLimitkBytes">5000</int>
  </processor>
  <processor class="solr.RunUpdateProcessorFactory"/>
  <processor class="solr.LogUpdateProcessorFactory"/>
</updateRequestProcessorChain>

<requestHandler name="/update" class="solr.UpdateRequestHandler">
  <lst name="defaults">
    <str name="update.chain">imageloader</str>
  </lst>
</requestHandler>

pixolution.fieldname.prefix

This parameter is optional. Set it only if you want the pixolution flow related fieldnames to carry a prefix. The example above sets pxl_ as the prefix. pixolution flow then expects every required fieldname to carry this prefix (fieldname=[prefix][prescribed_fieldname]). For example, the prescribed fieldname imagedescriptor must then be named pxl_imagedescriptor in the schema.
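
As a small sketch of the naming rule, the fragment below uses img_ as the prefix. This value is a hypothetical choice for illustration only and does not come from the pixolution documentation. The comment spells out the schema fieldname that results from fieldname=[prefix][prescribed_fieldname].

```xml
<processor class="de.pixolution.solr.update.processor.ImageUpdateFactory">
  <!-- hypothetical prefix, chosen for illustration only -->
  <str name="pixolution.fieldname.prefix">img_</str>
  <!-- the prescribed fieldname imagedescriptor must then be declared
       in the schema as img_imagedescriptor -->
</processor>
```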

processor.loader.kBytesPerSecPerCon

This property restricts the bandwidth used for image loading. The value sets the maximum number of kBytes per second that may be used to download an image.

For unlimited bandwidth usage (as fast as possible) set this value to zero or a negative number.

Note: This value applies per connection. If you add n documents in parallel requests, the maximum bandwidth usage for one core will be n * processor.loader.kBytesPerSecPerCon.

Default: 500

processor.loader.maxConnections

This parameter restricts the maximum number of HTTP connections that may be open at a time. If more connections are requested than allowed, the imageloader waits until other loaders have finished and released their connections.

In conjunction with processor.loader.kBytesPerSecPerCon, this parameter lets you cap the total bandwidth usage per second for one core at processor.loader.maxConnections * processor.loader.kBytesPerSecPerCon.

Default: 50
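
As a worked example of the formula above: with the defaults, the cap is 50 * 500 = 25000 kBytes per second for one core, and the configuration shown at the top of this page allows 100 * 1000 = 100000 kBytes per second. The fragment below is a sketch of a tighter cap. The values 20 and 250 are assumptions chosen for illustration, not recommendations.

```xml
<processor class="de.pixolution.solr.update.processor.ImageUpdateFactory">
  <!-- at most 250 kBytes per second for each single connection -->
  <int name="processor.loader.kBytesPerSecPerCon">250</int>
  <!-- at most 20 connections at a time -->
  <int name="processor.loader.maxConnections">20</int>
  <!-- resulting cap for one core: 20 * 250 = 5000 kBytes per second -->
</processor>
```

Note that the cap only holds while both values are positive. If processor.loader.kBytesPerSecPerCon is zero or negative, each connection is unlimited and the product no longer bounds the total.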

processor.loader.fileLimitkBytes

To safeguard against misuse, you may limit the allowed file size of the image to download, in kBytes. If the file is bigger than allowed, an exception is thrown. If you do not want to limit the file size, set this value to zero or a negative number.

Default: 5000
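
The fragment below is a sketch of a stricter limit than the default. The value 2000 is an assumption chosen for illustration, not a recommendation.

```xml
<processor class="de.pixolution.solr.update.processor.ImageUpdateFactory">
  <!-- images bigger than 2000 kBytes cause an exception (illustrative value) -->
  <int name="processor.loader.fileLimitkBytes">2000</int>
  <!-- to disable the limit, use zero or a negative number instead, e.g. 0 -->
</processor>
```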

Fault tolerant update behaviour

Indexing images is more error-prone than indexing plain text because it depends on external resources: a server may be unreachable, a connection may be lost while loading, and so on. Adding a document may succeed once but fail the next time due to those uncertainties. The TolerantUpdateProcessorFactory helps with bulk operations (adding several documents in one request) when you do not want the whole request to abort because one or a few images cannot be processed properly.

For more in-depth documentation, see the official Solr API documentation. The following example shows how to configure tolerant update behaviour:

<updateRequestProcessorChain name="imageloader" default="true">
  <processor class="solr.TolerantUpdateProcessorFactory">
    <int name="maxErrors">10</int>
  </processor>
  <processor class="solr.DistributedUpdateProcessorFactory"/>
  <processor class="de.pixolution.solr.update.processor.ImageUpdateFactory">
    [...]
  </processor>
  <processor class="solr.RunUpdateProcessorFactory"/>
  <processor class="solr.LogUpdateProcessorFactory"/>
</updateRequestProcessorChain>
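
The maxErrors value in the example above tolerates up to 10 failed documents per request. As far as the Solr Javadoc for TolerantUpdateProcessorFactory describes it, an unset maxErrors defaults to Integer.MAX_VALUE and an explicit -1 is shorthand for the same. Assuming this holds for your Solr version, the following sketch tolerates any number of failures. Treat it as an illustration and verify it against the documentation of the Solr release you run.

```xml
<processor class="solr.TolerantUpdateProcessorFactory">
  <!-- a value of -1 is assumed to mean: no upper bound on tolerated errors -->
  <int name="maxErrors">-1</int>
</processor>
```

As in the example above, this processor stays first in the chain, ahead of the DistributedUpdateProcessorFactory.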