This major release ships with many new features, improvements, and changes.
Upgrading from 2.1.x
Upgrading to this release requires configuration changes and a complete reindex.
- If present, remove all occurrences of the searchComponent
- Delete the following deprecated field definitions from
<fieldType name="featureType" class="de.pixolution.solr.schema.FeatureFieldType" />
<field name="feature" type="featureType" />
<dynamicField name="ts_*" type="boolean" indexed="true" stored="true" multiValued="false"/>
- Add the following new field definitions to schema.xml:
<fieldType name="docValuesBinary" class="de.pixolution.solr.schema.ImageDescriptorFieldType" />
<field name="pxl_imagedescriptor" type="string" indexed="false" stored="true" multiValued="false" />
<dynamicField name="pxl_descriptorfilter_*" type="trieInt" indexed="true" stored="false" multiValued="true" />
<dynamicField name="pxl_descriptorparts_*" type="docValuesBinary" indexed="false" stored="false" multiValued="false" docValues="true" />
<dynamicField name="pxl_textspace_*" type="boolean" indexed="true" stored="false" multiValued="false"/>
Note: the field types referenced above, such as boolean, may need to be adjusted to match your actual fieldType names.
- Add the new param to your queryParser
- Remove the deprecated param from your queryParser
- Remove the deprecated params from your
<str name="processor.feature.fieldname">feature</str>
<str name="processor.textSpace.fieldname">ts_*</str>
- Adjust query params if necessary due to changed fieldnames when searching and indexing (e.g. the fieldname
- Creating image descriptors is computationally more expensive than in previous versions, which slows down indexing. Make sure to use the jblasmath backend (see section Native math backends for faster indexing in the reference guide). You may also consider precalculating the image descriptors outside of Solr and indexing only the precalculated descriptors (see section Image descriptor generation outside of Solr in the reference guide).
- Reindex the complete collection.
New image descriptor and format
Thanks to the integration of the new pixolution lib 4.0.1, pixolution flow introduces a new image descriptor, which is actually a container format for several smaller descriptors. Each of them describes a different aspect of what’s important in an image. We call this container the pixolution descriptor (or just descriptor for short). From a user perspective a pixolution descriptor is just a long string like the one below:
However, it is not necessary to understand the internals. Each descriptor part is stored in a separate field in the index for efficient I/O while searching and filtering. Therefore, more fields are needed to store the parts.
A pixolution descriptor represents an image in just 140 bytes. pixolution flow stores those 140 bytes and the string representation (which is about 190 additional bytes). In total, storing the descriptor in Solr consumes about 330 bytes per image. However, because the descriptor parts are stored separately, only the bytes that matter for the current search are read.
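To get a feeling for what these figures mean at collection scale, the following minimal sketch multiplies them out. The collection size of one million images is an assumption chosen for illustration, and the estimate ignores any additional overhead of the Solr index itself.

```python
# Per-image figures quoted above.
BINARY_DESCRIPTOR_BYTES = 140      # binary descriptor
STRING_REPRESENTATION_BYTES = 190  # approximate size of the string form


def descriptor_storage_bytes(image_count: int) -> int:
    """Approximate descriptor storage for image_count images, in bytes."""
    per_image = BINARY_DESCRIPTOR_BYTES + STRING_REPRESENTATION_BYTES
    return image_count * per_image


print(descriptor_storage_bytes(1))          # 330
print(descriptor_storage_bytes(1_000_000))  # 330000000, i.e. about 330 MB
```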
New param rank.mode
With the powerful new param rank.mode you can control the way images are scored.
This lets you easily optimize scoring for your use case.
pixolution flow ships with four presets, each with its own advantages and disadvantages depending on the image set you are using:
If these presets are not enough, there is an expert API where you can set the importance of each descriptor part per query. See the reference guide for details.
The new smart filter
rank.smartfilter is a feature that drastically boosts search performance when searching for similar images.
Scoring in Solr is a two-step process: collecting relevant documents (the number of found documents) and ranking/scoring them.
In visual similarity search, scoring becomes expensive when millions of images have to be scored. The smart filter drastically reduces the number of documents that have to be scored, which improves performance considerably while keeping quality loss to a minimum.
The smart filter does not rely on any metadata.
Relevant images are determined via a precalculated cluster descriptor.
You can control the intensity of filtering and thus the trade-off between performance gain and quality loss.
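The idea of filtering before scoring can be illustrated with a toy example. This is only a sketch of the general two-step pattern, not pixolution flow's actual algorithm: the in-memory index, the integer cluster ids, the two-dimensional vectors, and the Euclidean distance are all assumptions made for illustration.

```python
import math

# Toy index: image id -> (precalculated cluster id, descriptor vector).
# All values are made up for illustration.
INDEX = {
    "a": (0, (0.1, 0.0)),
    "b": (0, (0.0, 0.3)),
    "c": (1, (0.9, 0.0)),
    "d": (1, (0.0, 1.2)),
}


def search(query_vec, query_cluster, use_filter):
    """Return (ranked image ids, number of documents that had to be scored)."""
    # Step 1: collect candidates. With the filter on, only images that
    # share the query's precalculated cluster are kept.
    candidates = [
        (image_id, vec)
        for image_id, (cluster, vec) in INDEX.items()
        if not use_filter or cluster == query_cluster
    ]
    # Step 2: score the candidates (smaller distance = more similar).
    ranked = sorted(candidates, key=lambda item: math.dist(query_vec, item[1]))
    return [image_id for image_id, _ in ranked], len(candidates)


print(search((0.0, 0.0), 0, use_filter=False))  # (['a', 'b', 'c', 'd'], 4)
print(search((0.0, 0.0), 0, use_filter=True))   # (['a', 'b'], 2)
```

In this toy setup the filtered search scores half as many documents and still returns the closest images first; images outside the query's cluster are never scored, which is where a quality loss can come from.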
rank.smartfilter accepts different filter levels:
Command Line Interface: Bulk processing and multithreading support
Since starting a Java Virtual Machine instance is quite expensive when used for just one image, support for bulk processing and multithreading makes the command line interface (CLI) much more powerful. You can now calculate descriptors for images in bulk by supplying a file of image URLs/paths. The processing can be done in parallel by specifying the number of threads to use.
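The pattern behind this, reading a list file once and distributing its entries over a fixed number of worker threads, can be sketched as follows. This is not the CLI itself: the one-entry-per-line file format is an assumption, and compute_descriptor is a hypothetical stand-in that merely hashes the path instead of analysing an image.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor


def read_image_list(text: str) -> list[str]:
    """Parse a list file: one image URL/path per line, blank lines ignored."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def compute_descriptor(path: str) -> str:
    """Hypothetical stand-in for the real descriptor calculation."""
    return hashlib.sha1(path.encode("utf-8")).hexdigest()


def process_bulk(paths: list[str], threads: int) -> dict[str, str]:
    """Process all entries with a fixed-size worker pool, keeping input order."""
    with ThreadPoolExecutor(max_workers=threads) as pool:
        return dict(zip(paths, pool.map(compute_descriptor, paths)))


listing = "images/a.jpg\nimages/b.jpg\n\nhttp://example.com/c.png\n"
results = process_bulk(read_image_list(listing), threads=4)
print(len(results))  # 3
```

The start-up cost is paid once for the whole list rather than once per image, which is the same reasoning that motivates bulk mode in the CLI.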
Better image search
When searching for similar images (rank.by=url), pixolution flow uses rank.mode=balanced by default.
Under the hood, pixolution flow uses the new image descriptor, which takes the visual appearance of an image into account.
This leads to more accurate results and considerably reduces the dependency on metadata (e.g. keywords).
To restore the old scoring behaviour set
Faster color search
Color search (rank.by=color) is now faster since less data must be read from the index.
We reduced the number of bytes per image by 65%.
Only 20 bytes per image must be read from the index to score the images on color information.
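As a quick plausibility check, the two figures can be related to each other. The sketch below assumes that the 65% reduction and the 20 bytes refer to the same per-image quantity; under that assumption the previous size works out to roughly 57 bytes per image.

```python
def bytes_before_reduction(bytes_after: float, reduction: float) -> float:
    """Size before a relative reduction, given the size after it."""
    return bytes_after / (1.0 - reduction)


previous = bytes_before_reduction(20, 0.65)
print(round(previous))  # 57
```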
Better and faster Keyword Suggester
By default the Keyword Suggester uses
This leads to faster suggestions (smart filter) and better results (semantic rank mode).
You can disable or modify these defaults by setting the params accordingly.
Configuration of Auto Context not optional
Previously, the configuration of Auto Context was not optional.
A logic bug forced users to configure the
This bug is fixed and Auto Context configuration is optional now.
Removed Visual Arrangement
Due to too little demand for the visual arrangement functionality, we have removed it as of pixolution flow version 3.0.0.
If configured, the VisualArrangementComponent must be removed from
pixolution lib 4.0.1 integration
This release integrates the new pixolution lib version 4.0.1, which provides a completely new image descriptor and enables a number of new features in pixolution flow.
The highlight is our new machine-learning-based image descriptor, which can be calculated on the CPU. It improves image search quality. Image descriptors created with earlier pixolution flow versions are no longer compatible with this version.
Easier field configuration
Fieldnames that are used for storing descriptor parts are now predefined.
You only need to add the required fields with their corresponding names; linking to these fields in solrconfig.xml is no longer required.
pixolution flow will find all required fields automatically.
However, you can set a user-defined prefix for all pixolution flow related fields in order to group them logically or to resolve conflicts with already assigned fieldnames.
See pixolution.fieldname.prefix in the reference guide.
New wording: feature is now descriptor
Starting with this release, the term feature (describing a pixolution representation of an image) is replaced by descriptor. This avoids confusion with feature in the sense of functionality and, at the same time, gives us a more descriptive name.
New param name rank.by=descriptor
To be consistent with the new naming scheme we changed the param name