Release 3.0.0

This major release ships with many new features, improvements, and changes.

Upgrading from 2.1.x

Upgrading to this release requires configuration changes and a full reindex.

  1. If present, remove all occurrences of the searchComponent VisualArrangementComponent in solrconfig.xml.
  2. Delete the following deprecated field definitions from schema.xml:
<fieldType name="featureType" class="de.pixolution.solr.schema.FeatureFieldType" />
<field name="feature" type="featureType" />
<dynamicField name="ts_*" type="boolean" indexed="true" stored="true" multiValued="false"/>
  3. Add the following new field definitions to schema.xml:
<fieldType name="docValuesBinary" class="de.pixolution.solr.schema.ImageDescriptorFieldType" />
<field name="pxl_imagedescriptor" type="string" indexed="false" stored="true" multiValued="false" />
<dynamicField name="pxl_descriptorfilter_*" type="trieInt" indexed="true" stored="false" multiValued="true" />
<dynamicField name="pxl_descriptorparts_*" type="docValuesBinary" indexed="false" stored="false" multiValued="false" docValues="true" />
<dynamicField name="pxl_textspace_*" type="boolean" indexed="true" stored="false" multiValued="false"/>

Note: adjust the types trieInt and boolean to match the field type names actually defined in your schema.

  4. Add the following new param to both the queryParser PixolutionParserPlugin and the updateRequestProcessorChain that uses ImageUpdateFactory in your solrconfig.xml:
<str name="pixolution.fieldname.prefix">pxl_</str>
  5. Remove the following deprecated param from the queryParser PixolutionParserPlugin in your solrconfig.xml:
<str name="pixolution.feature.fieldname">feature</str>
  6. Remove the following deprecated params from the updateRequestProcessorChain that uses ImageUpdateFactory in your solrconfig.xml:
<str name="processor.feature.fieldname">feature</str>
<str name="processor.textSpace.fieldname">ts_*</str>
  7. Adjust your query params for searching and indexing where fieldnames have changed (e.g. the fieldname feature is now pxl_imagedescriptor, and ts_* is now pxl_textspace_*).
  8. Creating image descriptors is much more complex than in previous versions and results in slower indexing. Make sure to use the jblas math backend (see the section "Native math backends for faster indexing" in the reference guide). You may also consider precalculating the image descriptors outside of Solr and indexing only the precalculated descriptors (see the section "Image descriptor generation outside of Solr" in the reference guide).
  9. Reindex the complete collection.
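
The schema changes in the steps above can be verified before reindexing. The following is a minimal sketch, not part of pixolution flow: it assumes the default prefix pxl_ and the field names listed above, and the function name check_schema is hypothetical. It parses the content of a schema.xml and reports deprecated definitions that are still present and new definitions that are still missing.

```python
import xml.etree.ElementTree as ET

# Definitions that the upgrade steps say must be deleted (element tag, name).
DEPRECATED = {
    ("fieldType", "featureType"),
    ("field", "feature"),
    ("dynamicField", "ts_*"),
}

# Definitions that the upgrade steps say must be added (assumes prefix "pxl_").
REQUIRED = {
    ("fieldType", "docValuesBinary"),
    ("field", "pxl_imagedescriptor"),
    ("dynamicField", "pxl_descriptorfilter_*"),
    ("dynamicField", "pxl_descriptorparts_*"),
    ("dynamicField", "pxl_textspace_*"),
}


def check_schema(schema_xml):
    """Return (leftover, missing) for the given schema.xml content.

    leftover: deprecated definitions that are still present.
    missing:  required definitions that are not present yet.
    """
    root = ET.fromstring(schema_xml)
    defined = {(el.tag, el.get("name")) for el in root.iter() if el.get("name")}
    leftover = sorted(DEPRECATED & defined)
    missing = sorted(REQUIRED - defined)
    return leftover, missing
```

For a real installation, read the file first, for example check_schema(open("schema.xml").read()), and proceed with reindexing only when both returned lists are empty.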

New Features

New image descriptor and format

Thanks to the integration of the new pixolution lib 4.0.1, pixolution flow introduces a new image descriptor, which is a container format for several smaller descriptors. Each of them describes a different aspect of what’s important in an image. We call this container the pixolution descriptor (or simply descriptor). From a user's perspective, a pixolution descriptor is just a long string like the one below:

gCCB7vjKBy8NxtwNNRT7/+waDhTqCBIh7voW+PobGf3y6vYfJAkCCBPj/O4CFPDfAAYXBvH18QMA8gnvD/3+A4AAzfiOzciRhauG2YOZmIeDypi/gBD2AgUR6/om2fcL6Pj1BPuPqs/OAJCvnaLam5OjiZi2ioAxAACAQCYQaQYyRQJ3UERjVVkMXA8=

However, it is not necessary to understand the internals. Each descriptor part is stored in its own field in the index so that I/O stays efficient while searching and filtering. This is why more fields are needed than in previous versions.

A pixolution descriptor represents an image in just 140 bytes. pixolution flow stores those 140 bytes plus the string representation (about 190 additional bytes). In total, storing the descriptor in Solr consumes about 330 bytes per image. However, because the descriptor parts are stored separately, only the bytes relevant to the current search are read.
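
To get a feeling for these numbers, the following sketch estimates the descriptor storage for a whole collection. It only multiplies out the approximate per-image sizes stated above; the function name is hypothetical, and the result ignores any overhead Solr adds on top.

```python
BINARY_BYTES = 140  # binary descriptor size per image (from the text above)
STRING_BYTES = 190  # approximate size of the string representation per image


def descriptor_storage_bytes(num_images):
    """Approximate number of bytes used to store descriptors for num_images."""
    return num_images * (BINARY_BYTES + STRING_BYTES)


# One million images need roughly 330 MB (decimal) of descriptor data.
print(descriptor_storage_bytes(1_000_000) / 1_000_000, "MB")
```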

New param rank.mode

With the powerful new param rank.mode you can control the way images are scored. This lets you easily optimize scoring for your use case. pixolution flow ships with four presets, each with its own advantages and disadvantages depending on the image set you are using: semantic, visual, balanced and colors.

If these presets are not enough, there is an expert API that lets you set the importance of each descriptor part per query. See the reference guide for details.
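
On the client side, selecting a preset amounts to adding one request param. The sketch below is illustrative only: the helper rank_params is hypothetical, and the query q=*:* is an assumption. Any further params needed to identify a reference image or color are not covered in these notes and are left out.

```python
from urllib.parse import urlencode

# The four presets named in this release.
RANK_MODES = ("semantic", "visual", "balanced", "colors")


def rank_params(rank_by, rank_mode="balanced"):
    """Build the ranking params for a request, rejecting unknown presets."""
    if rank_mode not in RANK_MODES:
        raise ValueError("unknown rank.mode: %r" % (rank_mode,))
    return {"rank.by": rank_by, "rank.mode": rank_mode}


# Query string to append to the select URL of your collection (URL not shown).
query = urlencode({"q": "*:*", **rank_params("descriptor", "semantic")})
```

Validating the preset on the client catches typos early instead of sending an unsupported value to Solr.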

Smart Filter

The new smart filter rank.smartfilter drastically boosts search performance when searching for similar images. A search in Solr is a two-step process: collecting the relevant documents (the number of found documents) and then ranking/scoring them.

In visual similarity search, the scoring step is expensive when millions of images have to be scored. The smart filter drastically reduces the number of documents that have to be scored, which improves performance considerably while keeping quality loss to a minimum.

The smart filter does not rely on any metadata. Relevant images are determined via a precalculated cluster descriptor. You can control the intensity of filtering and thus the trade-off between performance gain and quality loss. The param rank.smartfilter accepts the filter levels normal, high, veryhigh and extreme.
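
A filter level can be attached to an existing set of request params as sketched below. The helper with_smartfilter is hypothetical and only checks the value against the four levels listed above before setting it.

```python
# The filter levels named in this release.
SMARTFILTER_LEVELS = ("normal", "high", "veryhigh", "extreme")


def with_smartfilter(params, level):
    """Return a copy of params with rank.smartfilter set to a valid level."""
    if level not in SMARTFILTER_LEVELS:
        raise ValueError("unknown rank.smartfilter level: %r" % (level,))
    result = dict(params)  # copy, so the caller's dict is not modified
    result["rank.smartfilter"] = level
    return result


params = with_smartfilter({"rank.by": "descriptor"}, "high")
```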

Command Line Interface: Bulk processing and multithreading support

Since starting a Java Virtual Machine instance is quite expensive when it is used for just one image, bulk processing and multithreading support make the command line interface (CLI) much more powerful. You can now calculate the descriptors of many images in bulk by providing a file of image URLs/paths. The processing can be done in parallel by specifying the number of threads to use.

Optimizations

When searching for similar images (rank.by=id or rank.by=url), pixolution flow uses rank.mode=balanced by default. Under the hood, pixolution flow uses the new image descriptor and also takes the visual appearance of an image into account. This leads to more accurate results and considerably reduces the dependency on metadata (e.g. keywords).

To restore the old scoring behaviour, set rank.mode=visual.

Color search (rank.by=color) is now faster because less data must be read from the index. We reduced the number of bytes per image by 65%. Only 20 bytes per image must be read from the index to score the images on color information.

Better and faster Keyword Suggester

By default the Keyword Suggester uses rank.mode=semantic and rank.smartfilter=normal. This leads to faster suggestions (smart filter) and better results (semantic rank mode). You can disable or modify these defaults by setting the params accordingly.

Bug Fixes

Configuration of Auto Context not optional

The configuration of Auto Context was not optional. A logic bug forced users to configure the pixolution.autoContext.fieldname.readFrom and pixolution.autoContext.fieldname.searchIn fields. This bug is fixed, and Auto Context configuration is now optional.

Other Changes

Removed Visual Arrangement

Due to low demand for the visual arrangement functionality, we have removed it as of pixolution flow 3.0.0. If configured, the VisualArrangementComponent must be removed from solrconfig.xml.

pixolution lib 4.0.1 integration

This release integrates the new pixolution lib version 4.0.1, which provides a completely new image descriptor and enables many new features in pixolution flow.

The highlight is our new machine-learning-based image descriptor, which can be calculated on the CPU. It improves image search quality. Image descriptors created with earlier pixolution flow versions are no longer compatible with this version.

Easier field configuration

Fieldnames that are used for storing descriptor parts are now predefined. You only need to add the required fields with their corresponding names; linking to these fields in solrconfig.xml is no longer required. pixolution flow finds all required fields automatically.

However, you can set a user-defined prefix for all pixolution flow related fields in order to group them logically or to resolve conflicts with fieldnames that are already in use. See pixolution.fieldname.prefix in the reference guide.

New wording: feature is now descriptor

Starting with this release, the term feature (describing a pixolution representation of an image) is replaced by descriptor. This avoids confusion with feature in the sense of functionality and at the same time gives us a more descriptive name.

New param name rank.by=descriptor

To be consistent with the new naming scheme, we changed rank.by=feature to rank.by=descriptor.
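
Client code written for 2.1.x can be adapted with a small translation layer. The sketch below is a hypothetical helper, not part of pixolution flow: it rewrites rank.by=feature and maps the old fieldnames to the new ones mentioned in the upgrade steps, assuming the default prefix pxl_ and the old fieldnames feature and ts_*.

```python
def migrate_fieldname(name):
    """Map a 2.1.x fieldname to its 3.0.0 counterpart (default prefix pxl_)."""
    if name == "feature":
        return "pxl_imagedescriptor"
    if name.startswith("ts_"):
        return "pxl_textspace_" + name[len("ts_"):]
    return name  # all other fieldnames are unchanged


def migrate_params(params):
    """Return a copy of the request params with rank.by=feature renamed."""
    migrated = dict(params)
    if migrated.get("rank.by") == "feature":
        migrated["rank.by"] = "descriptor"
    return migrated


print(migrate_params({"q": "*:*", "rank.by": "feature"}))
print(migrate_fieldname("ts_example"))
```

Other rank.by values such as color, id or url pass through unchanged.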