Installation on Linux

Installation on Linux is only possible for users with a commercial Flow license.

If you want to go pro, check out our license plans. Otherwise, use our free Docker version.

A Linux installation offers more freedom than the pre-configured Docker image, but setup is a little more complex.

Requirements

Flow requires third-party software to run. Note that this software is licensed separately from Flow. To use Flow, you must accept the license of each required third-party component.

List of dependencies

Flow 5.0.2 depends on the following third-party software.

Software    | Version        | Artifacts                                     | License
Solr        | 9.4.1          | solr-core                                     | Apache-2.0
TensorFlow  | 2.10.1         | tensorflow-core-api, tensorflow-core-platform | Apache-2.0
Fork of DJL | 0.22-ndarray-1 | api, tensorflow-engine                        | Apache-2.0
Protobuf    | 3.19.4         | protobuf-java                                 | BSD-3-Clause
JavaCPP     | 1.5.8          | javacpp                                       | Apache-2.0
NDArray     | 0.4.0          | ndarray                                       | Apache-2.0
GSON        | 2.8.9          | gson                                          | Apache-2.0

Architecture

Flow uses highly optimized native code for some operations which require x86_64 CPUs (also named amd64). Other architectures like ARM, PowerPC etc. are not supported.
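
As a quick pre-installation check, the architecture requirement can be verified from the shell. The following is a minimal sketch; the helper name is_supported_arch is made up for this example and is not part of Flow.

```shell
# Succeeds if the given machine name denotes a 64-bit x86 CPU.
is_supported_arch() {
  case "$1" in
    x86_64|amd64) return 0 ;;
    *) return 1 ;;
  esac
}

# uname -m prints the machine hardware name, e.g. x86_64 or aarch64.
arch="$(uname -m)"
if is_supported_arch "$arch"; then
  echo "Architecture $arch is supported"
else
  echo "Architecture $arch is not supported by Flow"
fi
```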

Linux

Use a modern and stable GNU/Linux server operating system such as Debian 11 or Ubuntu Server 22.04 LTS.

Java

You need the Java Runtime Environment (JRE) version 11 or higher. Check your Java version like this (your output may differ):

java -version
openjdk version "11.0.19" 2023-04-18
OpenJDK Runtime Environment Temurin-11.0.19+7 (build 11.0.19+7)
OpenJDK 64-Bit Server VM Temurin-11.0.19+7 (build 11.0.19+7, mixed mode, sharing)

Due to a bug in Java affecting Apache Lucene that can cause JVM crashes it is recommended to use Java versions 11.0.19, 17.0.7 or 19.0.2 (or later).
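
The version recommendation can be checked with a small script. This is a sketch under a few assumptions: the helper names are hypothetical, version comparison relies on GNU sort -V, and release lines other than 11 and 17 are treated as acceptable only from 19.0.2 onwards, which is one possible reading of the recommendation above.

```shell
# Extracts the quoted version string from the first line of `java -version` output.
java_version_from_banner() {
  sed -n '1s/.*version "\([^"]*\)".*/\1/p'
}

# Succeeds if version $1 is greater than or equal to version $2 (uses GNU sort -V).
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Succeeds if the version is at or above the recommended patch level of its release line.
is_recommended_java() {
  local v="$1" major="${1%%.*}"
  case "$major" in
    11) version_ge "$v" 11.0.19 ;;
    17) version_ge "$v" 17.0.7 ;;
    *)  version_ge "$v" 19.0.2 ;;
  esac
}

# Demo with the sample banner line shown above.
banner='openjdk version "11.0.19" 2023-04-18'
version="$(printf '%s\n' "$banner" | java_version_from_banner)"
if is_recommended_java "$version"; then
  echo "Java $version meets the recommendation"
else
  echo "Java $version should be updated"
fi
```

On a real system, feed the actual banner into the parser with java -version 2>&1 | java_version_from_banner, since java -version writes to standard error.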

Machine learning framework

Flow uses the Deep Java Library (DJL) in conjunction with TensorFlow as the engine for model inference and efficient matrix operations.

Apache Solr

Flow not only provides an advanced search API, but also extends the inner search process of Apache Solr, including data prefetching and scoring. This leads to deep integration and the use of many internal Solr APIs that may change from one version to another. For this reason, the provided Flow binaries are tested and compiled against a specific Solr version and can only be used with that version.

We provide Flow binaries for the following Solr versions:

Version Solr 6.6 Solr 7.7 Solr 8.11 Solr 9.0 Solr 9.2 Solr 9.3 Solr 9.4
Flow 3.4.X
Flow 4.0.0 - 4.0.2
Flow 4.0.4 - 4.0.5
Flow 5.0.X

You can check which Solr version (SOLR_VERSION) the Flow binaries (FLOW_VERSION) are compatible with in two places.

  1. The Flow JAR filename pixolution-flow-[FLOW_VERSION]-solr-[SOLR_VERSION].jar or
  2. the META-INF/MANIFEST.MF file in this JAR file:
    Solr-Version: [SOLR_VERSION]
    Specification-Version: [FLOW_VERSION]
    

We only declare the SOLR_VERSION as major and minor version (e.g. 9.4). You can safely upgrade your Solr instance as long as only the subminor version changes (9.4.X).
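
The filename convention and the upgrade rule can be combined into a small compatibility check. The sketch below assumes the JAR follows the naming pattern described above; the function names solr_version_from_jar and solr_matches are hypothetical.

```shell
# Prints the Solr version (major.minor) encoded in a Flow JAR filename.
solr_version_from_jar() {
  basename "$1" | sed -n 's/^pixolution-flow-.*-solr-\([0-9][0-9]*\.[0-9][0-9]*\)\.jar$/\1/p'
}

# Succeeds if the installed Solr version (e.g. 9.4.1) belongs to the
# given major.minor line (e.g. 9.4).
solr_matches() {
  case "$1" in
    "$2"|"$2".*) return 0 ;;
    *) return 1 ;;
  esac
}

jar="pixolution-flow-5.0.2-solr-9.4.jar"
required="$(solr_version_from_jar "$jar")"
if solr_matches "9.4.1" "$required"; then
  echo "Solr 9.4.1 can be used with $jar"
else
  echo "Solr 9.4.1 does not match the required version $required"
fi
```

To read the same information from the manifest instead, unzip -p "$jar" META-INF/MANIFEST.MF | grep '^Solr-Version:' prints the relevant line.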

Installation

We assume you already have the Flow package that is compatible with your Solr version. Unpack the zip archive:

unzip pixolution-flow-5.0.2-solr-9.4.zip

Copy and paste the installation steps described in QUICKSTART.txt to get up and running:

cat QUICKSTART.txt

Java memory limits

Because Flow loads AI models into memory at startup, you need to increase the memory that Solr can allocate. The limits are set in the configuration file [SOLR_INSTALL_FOLDER]/bin/solr.in.sh and default to 512 MB if not set. We recommend setting the limit to at least 2 GB.

  1. Open [SOLR_INSTALL_FOLDER]/bin/solr.in.sh
  2. Append the following rule:
    SOLR_JAVA_MEM="-Xmx2g"
    

Instead of editing the file manually, you can run the following bash command.

  1. Your working directory must contain the solr-9.4.1 folder.
  2. Execute the following command:
    sed -i 's|.*SOLR_JAVA_MEM.*|SOLR_JAVA_MEM="-Xmx2g"|g' solr-9.4.1/bin/solr.in.sh
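
Note that the sed command only has an effect if solr.in.sh already contains a line mentioning SOLR_JAVA_MEM (a commented-out one is enough). A slightly more defensive variant, sketched below with the hypothetical helper set_solr_java_mem, appends the setting when no such line exists. It assumes GNU sed.

```shell
# Sets SOLR_JAVA_MEM in the given file: rewrites every line that mentions
# the variable, or appends the assignment if no such line exists.
set_solr_java_mem() {
  local file="$1" value="$2"
  if grep -q 'SOLR_JAVA_MEM' "$file"; then
    sed -i "s|.*SOLR_JAVA_MEM.*|SOLR_JAVA_MEM=\"$value\"|" "$file"
  else
    printf 'SOLR_JAVA_MEM="%s"\n' "$value" >> "$file"
  fi
}

# Demo on a temporary file that stands in for solr.in.sh.
demo="$(mktemp)"
printf '#SOLR_JAVA_MEM="-Xms512m -Xmx512m"\n' > "$demo"
set_solr_java_mem "$demo" "-Xmx2g"
cat "$demo"
rm -f "$demo"
```

For a real installation, the call would be set_solr_java_mem solr-9.4.1/bin/solr.in.sh "-Xmx2g".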
    

Java security policy

Starting with Solr 9 the Java security policies are active by default. Flow requires the permission to load libraries at runtime. Therefore you have to add permission rules to the security.policy file shipped with Solr:

  1. Open [SOLR_INSTALL_FOLDER]/server/etc/security.policy
  2. Add the following rules and replace [USERNAME] with the user that runs the Solr process.
    grant { permission java.lang.RuntimePermission "loadLibrary.*"; };
    grant { permission java.lang.RuntimePermission "createSecurityManager"; };
    grant { permission java.io.FilePermission "/tmp/.javacpp-[USERNAME]/cache", "read,execute,write"; };
    grant { permission java.io.FilePermission "/tmp/.javacpp-[USERNAME]/cache/-", "read,execute,write"; };
    grant { permission java.io.FilePermission "/home/[USERNAME]/.javacpp/cache", "read,write,execute"; };
    grant { permission java.io.FilePermission "/home/[USERNAME]/.javacpp/cache/-", "read,write,execute"; };
    

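Instead of manually adding the permission rules, you can run the following bash commands. Note that ${USER} expands to the user running the shell, so run them as the user that runs the Solr process.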

  1. Your working directory must contain the solr-9.4.1 folder.
  2. Execute the following commands:
    echo 'grant { permission java.lang.RuntimePermission "loadLibrary.*"; };' >> solr-9.4.1/server/etc/security.policy
    echo 'grant { permission java.lang.RuntimePermission "createSecurityManager"; };' >> solr-9.4.1/server/etc/security.policy
    echo "grant { permission java.io.FilePermission \"/tmp/.javacpp-${USER}/cache\", \"read,execute,write\"; };" >> solr-9.4.1/server/etc/security.policy
    echo "grant { permission java.io.FilePermission \"/tmp/.javacpp-${USER}/cache/-\", \"read,execute,write\"; };" >> solr-9.4.1/server/etc/security.policy
    echo "grant { permission java.io.FilePermission \"/home/${USER}/.javacpp/cache\", \"read,execute,write\"; };" >> solr-9.4.1/server/etc/security.policy
    echo "grant { permission java.io.FilePermission \"/home/${USER}/.javacpp/cache/-\", \"read,execute,write\"; };" >> solr-9.4.1/server/etc/security.policy
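
Running the echo commands twice appends every rule twice. If the setup is scripted, an idempotent variant may be preferable. The following sketch uses two hypothetical helpers, append_once and add_flow_grants, and takes the Solr user as an explicit argument instead of relying on ${USER}; the user name solr in the demo is only an example.

```shell
# Appends a line to a file unless an identical line is already present.
append_once() {
  local line="$1" file="$2"
  grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# Appends the Flow grant rules for the given user to a security.policy file.
add_flow_grants() {
  local policy="$1" user="$2" dir
  append_once 'grant { permission java.lang.RuntimePermission "loadLibrary.*"; };' "$policy"
  append_once 'grant { permission java.lang.RuntimePermission "createSecurityManager"; };' "$policy"
  for dir in "/tmp/.javacpp-$user/cache" "/home/$user/.javacpp/cache"; do
    append_once "grant { permission java.io.FilePermission \"$dir\", \"read,execute,write\"; };" "$policy"
    append_once "grant { permission java.io.FilePermission \"$dir/-\", \"read,execute,write\"; };" "$policy"
  done
}

# Demo on a temporary file that stands in for security.policy.
policy="$(mktemp)"
add_flow_grants "$policy" solr
add_flow_grants "$policy" solr   # the second run adds nothing
cat "$policy"
rm -f "$policy"
```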
    

Configuration

Flow is configured using the solrconfig.xml and the schema.xml or managed-schema file.

solrconfig.xml

The solrconfig.xml contains all operational configuration like request handlers, search components, update processors and such.

Components

<!-- Pixolution components -->
<queryParser name="pxlParser" class="de.pixolution.solr.search.PixolutionParserPlugin" />
<searchComponent name="pxlFlow" class="de.pixolution.solr.handler.component.PixolutionComponent" />
<queryResponseWriter name="html" default="false" class="de.pixolution.solr.response.HtmlResponseWriter" />
<!-- Required by tagging handler -->
<searchComponent name="taggingComponentPre" class="de.pixolution.solr.handler.component.TaggingComponentPre"/>
<searchComponent name="taggingComponentPost" class="de.pixolution.solr.handler.component.TaggingComponentPost"/>

Special request handler

<requestHandler name="/pixolution" class="de.pixolution.solr.handler.component.PixolutionHandler">
  <!-- special defaults: only Flow global params can be set here -->
  <lst name="defaults">
    <str name="fieldname.prefix"></str>
    <str name="fieldname.default.image">image</str>
    <str name="fieldname.default.text">labels</str>
    <str name="parser.name">pxlParser</str>
  </lst>
</requestHandler>

<requestHandler name="/analyze" class="de.pixolution.solr.handler.component.AnalyzeHandler">
  <lst name="defaults">
    <str name="echoParams">none</str>
  </lst>
</requestHandler>

<requestHandler name="/tag" class="de.pixolution.solr.handler.component.TaggingHandler">
  <lst name="defaults">
    <str name="echoParams">none</str>
    <str name="q">*:*</str>
    <str name="rank.threshold">0.6</str>
    <str name="tagging.field">labels</str>
    <!-- Optimized for suggesting a single tag (like category) -->
    <str name="tagging.max">1</str>
    <str name="tagging.inspect">3</str>
    <str name="tagging.inspect.minterms">1</str>
  </lst>
</requestHandler>

All global Flow parameters can only be set in solrconfig.xml within the defaults list of the /pixolution request handler.

fieldname.default.image

This parameter defines the field from which to obtain the image URLs when indexing documents or returning HTML responses. The default is image.

<str name="fieldname.default.image">image</str>

fieldname.prefix

This parameter adds the given prefix to all fields, fieldtypes and pseudo-fields which are created by Flow. The default is without prefix.

Changing fieldname.prefix requires reindexing.

<str name="fieldname.prefix"></str>

Using a prefix avoids field name collisions with existing fields and makes it easier to identify which fields belong to Flow. For example, setting fieldname.prefix=pxl_ changes the field color_names to pxl_color_names.

Setting fieldname.prefix also changes the import field name

If fieldname.prefix=example_, the import pseudo-field becomes example_import.

analysis.threads

This parameter defines the number of threads available to analyze images when using the /analyze and /update endpoints. The default is the number of available CPUs.

<int name="analysis.threads">[number of available CPUs]</int>

downloader.fileLimitkBytes

This parameter limits the maximum file size allowed when downloading images. The default is 20480 kB (20 MB).

<int name="downloader.fileLimitkBytes">20480</int>

To safeguard against misuse you may limit the allowed file size of the image to download. If the file is bigger than allowed, an exception is thrown. If you do not want to limit the file size set this value to zero or a negative number.
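
The limit semantics can be mirrored in a small pre-check, for example to filter out oversized files before indexing them. This is only an illustration of the rule described above, not Flow's implementation: the helper within_limit is hypothetical, and the sketch assumes 1 kB = 1024 bytes, which matches 20480 kB = 20 MB.

```shell
# Succeeds if a file of size_bytes passes a limit given in kilobytes.
# A limit of zero or below means "no limit".
within_limit() {
  local size_bytes="$1" limit_kbytes="$2"
  [ "$limit_kbytes" -le 0 ] || [ "$size_bytes" -le $((limit_kbytes * 1024)) ]
}

limit=20480
for size in 1048576 31457280; do
  if within_limit "$size" "$limit"; then
    echo "$size bytes: allowed"
  else
    echo "$size bytes: rejected"
  fi
done
```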

downloader.userAgent

This parameter sets the given user agent in the HTTP request when downloading images. This can be useful when an image server allows access based on the user agent. The default is Flowbot.

<str name="downloader.userAgent">Flowbot</str>

Update processor chain

<!-- Pixolution update processor configuration default="true" is mandatory.-->
<updateRequestProcessorChain name="imageloader" default="true">      
  <processor class="solr.DistributedUpdateProcessorFactory" />
  <!-- Placed after distributed to distribute image analysis load across the cloud, if any. -->
  <processor class="de.pixolution.solr.update.processor.PixolutionUpdateProcessorFactory" />
  <processor class="solr.RunUpdateProcessorFactory" />
  <processor class="solr.LogUpdateProcessorFactory" />
</updateRequestProcessorChain>

Alternatively, the following variant of the chain additionally enables field type guessing for unknown fields. Configure only one of the two chains, since both use the name imageloader and default="true".

<!-- Pixolution update processor configuration default="true" is mandatory.
    Enabled field type guessing for unknown fields.-->
<updateRequestProcessorChain name="imageloader" default="true">
  <processor class="solr.RemoveBlankFieldUpdateProcessorFactory" />
  <processor class="solr.ParseDateFieldUpdateProcessorFactory">
    <arr name="format">
      <str>yyyy-MM-dd['T'[HH:mm[:ss[.SSS]][z</str>
      <str>yyyy-MM-dd['T'[HH:mm[:ss[,SSS]][z</str>
      <str>yyyy-MM-dd HH:mm[:ss[.SSS]][z</str>
      <str>yyyy-MM-dd HH:mm[:ss[,SSS]][z</str>
      <str>[EEE, ]dd MMM yyyy HH:mm[:ss] z</str>
      <str>EEEE, dd-MMM-yy HH:mm:ss z</str>
      <str>EEE MMM ppd HH:mm:ss [z ]yyyy</str>
    </arr>
  </processor>
  <processor class="solr.AddSchemaFieldsUpdateProcessorFactory">
    <lst name="typeMapping">
      <str name="valueClass">java.lang.String</str>
      <str name="fieldType">string</str>
      <lst name="copyField">
        <!-- copy all incoming string fields into the default search field -->
        <str name="dest">text</str>
      </lst>
      <!-- Use as default mapping instead of defaultFieldType -->
      <bool name="default">true</bool>
    </lst>
    <lst name="typeMapping">
      <str name="valueClass">java.lang.Boolean</str>
      <str name="fieldType">boolean</str>
    </lst>
    <lst name="typeMapping">
      <str name="valueClass">java.util.Date</str>
      <str name="fieldType">date</str>
    </lst>
    <lst name="typeMapping">
      <str name="valueClass">java.lang.Long</str>
      <str name="valueClass">java.lang.Integer</str>
      <str name="fieldType">long</str>
    </lst>
    <lst name="typeMapping">
      <str name="valueClass">java.lang.Number</str>
      <str name="fieldType">double</str>
    </lst>
  </processor>
  <processor class="solr.DistributedUpdateProcessorFactory" />
  <!-- Placed after distributed to distribute image analysis load across the cloud, if any. -->
  <processor class="de.pixolution.solr.update.processor.PixolutionUpdateProcessorFactory" />
  <processor class="solr.RunUpdateProcessorFactory" />
  <processor class="solr.LogUpdateProcessorFactory" />
</updateRequestProcessorChain>

The search request handlers below share their defaults through the initParams section named flow.

<requestHandler name="/select" class="solr.SearchHandler" default="true" initParams="flow">
  <arr name="first-components">
    <str>pxlFlow</str>
  </arr>
</requestHandler>

<requestHandler name="/duplicate" class="solr.SearchHandler" initParams="flow">
  <lst name="defaults">
    <str name="rank.mode">duplicate</str>
    <str name="rank.threshold">0.7</str>
  </lst>
  <arr name="first-components">
    <str>pxlFlow</str>
  </arr>
</requestHandler>

<requestHandler name="/image" class="solr.SearchHandler" initParams="flow">
  <lst name="defaults">
    <str name="rank.mode">content</str>
  </lst>
  <arr name="first-components">
    <str>pxlFlow</str>
  </arr>
</requestHandler>

<requestHandler name="/color" class="solr.SearchHandler" initParams="flow">
  <lst name="defaults">
    <str name="rank.mode">color</str>
  </lst>
  <arr name="first-components">
    <str>pxlFlow</str>
  </arr>
</requestHandler>

<initParams name="flow">
  <lst name="defaults">
    <str name="echoParams">none</str>
    <int name="rows">10</int>
    <str name="q">*:*</str>
    <str name="fl">id,image,score,color_*</str>
    <str name="rank.threshold">0.5</str>
    <!-- automatically filled with created string fields via copy directive in update chain -->
    <str name="df">text</str>
    <str name="facet.mincount">1</str>
  </lst>
</initParams>

Schema

The schema.xml or managed-schema file represents the storage structure by defining known fields and their fieldtypes.

<?xml version="1.0" ?>
<schema name="pixolution-example-schema" version="1.6">

    <uniqueKey>id</uniqueKey>

    <field name="id" type="string" indexed="true" stored="true" required="true" multiValued="false" />
    <field name="image" type="string" indexed="true" stored="true" multiValued="false" />
    <field name="labels" type="string" indexed="true" stored="true" multiValued="true" />
    <!-- default search field with all text copied into, filled by copyField directive in field adding updatechain -->
    <field name="text" type="general_text" indexed="true" stored="false" multiValued="true" />
    <field name="location" type="location" indexed="true" stored="true" multiValued="false" />
    <field name="_version_" type="long" indexed="false" stored="false" docValues="true"/>
    <dynamicField name="random_*" type="random" />


    <fieldType name="random" class="solr.RandomSortField" indexed="true" />
    <fieldType name="string" class="solr.StrField" indexed="true" stored="true" />
    <fieldType name="long" class="solr.LongPointField" indexed="true" stored="true" />
    <fieldType name="double" class="solr.DoublePointField" indexed="true" stored="true" />
    <fieldType name="boolean" class="solr.BoolField" indexed="true" stored="true" />
    <fieldType name="date" class="solr.DatePointField" indexed="true" stored="true" />
    <fieldType name="location" class="solr.LatLonPointSpatialField" indexed="true" stored="true" />
    <fieldType name="general_text" class="solr.TextField">
        <analyzer>
            <tokenizer class="solr.StandardTokenizerFactory" />
            <filter class="solr.LowerCaseFilterFactory" />
        </analyzer>
    </fieldType>

    <!-- preconfigured copy fields -->
    <copyField source="color_names" dest="text" />
    <copyField source="labels" dest="text" />


    <!-- Flow modules preconfigured fields to remove the need for calling pixolution handler on startup -->
    <fieldtype name="bin" class="de.pixolution.solr.schema.DescriptorFieldType" />
    <fieldtype name="int" class="org.apache.solr.schema.IntPointField" />
    <fieldtype name="float" class="org.apache.solr.schema.FloatPointField" />

    <field name="color_descriptor" type="bin" docValues="true" multiValued="false" indexed="false" />
    <field name="color_cluster" type="int" stored="false" docValues="false" multiValued="false" indexed="true" />
    <field name="color_nclusters" type="bin" docValues="true" multiValued="false" indexed="false" />
    <field name="color_lopq" type="bin" docValues="true" multiValued="false" indexed="false" />
    <field name="color_rerank" type="long" docValues="true" multiValued="false" indexed="false" />
    <field name="color_names" type="string" stored="true" docValues="false" multiValued="true" indexed="true" />
    <field name="color_isolated" type="boolean" stored="true" docValues="false" multiValued="false" indexed="true" />
    <field name="color_palette_hex" type="string" stored="true" docValues="false" multiValued="true" indexed="false" />
    <field name="color_palette_freq" type="float" stored="true" docValues="false" multiValued="true" indexed="false" />
    <field name="copyspace" type="int" stored="true" docValues="false" multiValued="true" indexed="true" />
    <field name="duplicate_descriptor" type="bin" docValues="true" multiValued="false" indexed="false" />
    <field name="duplicate_cluster" type="int" stored="false" docValues="false" multiValued="false" indexed="true" />
    <field name="duplicate_nclusters" type="bin" docValues="true" multiValued="false" indexed="false" />
    <field name="duplicate_lopq" type="bin" docValues="true" multiValued="false" indexed="false" />
    <field name="duplicate_rerank" type="long" docValues="true" multiValued="false" indexed="false" />
    <field name="content_descriptor" type="bin" docValues="true" multiValued="false" indexed="false" />
    <field name="content_cluster" type="int" stored="false" docValues="false" multiValued="false" indexed="true" />
    <field name="content_nclusters" type="bin" docValues="true" multiValued="false" indexed="false" />
    <field name="content_lopq" type="bin" docValues="true" multiValued="false" indexed="false" />
    <field name="content_rerank" type="long" docValues="true" multiValued="false" indexed="false" />
</schema>

Enable image loading in HTML responses

To display images from search results directly in your browser when requesting HTML responses via wt=html, you must relax the Jetty server's content security policy (CSP) to allow your browser to load images from arbitrary domains.

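In the jetty.xml config file, change the Content-Security-Policy directive for image resources from img-src 'self' data: to img-src * data:. The following command assumes that Solr is installed in /opt/solr: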

sed -i "s/img-src 'self'/img-src */" /opt/solr/server/etc/jetty.xml
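
To confirm that the substitution took effect, the edit can be wrapped in a function that checks the result. The sketch below runs against a temporary file with a shortened, made-up policy string; the real jetty.xml contains more directives. The helper name relax_img_src is hypothetical, and GNU sed is assumed.

```shell
# Replaces "img-src 'self'" with "img-src *" and succeeds only if the
# file afterwards allows images from all hosts.
relax_img_src() {
  local file="$1"
  sed -i "s/img-src 'self'/img-src */" "$file"
  grep -q 'img-src \*' "$file"
}

# Demo on a temporary file that stands in for jetty.xml.
sample="$(mktemp)"
printf '%s\n' "default-src 'none'; img-src 'self' data:; script-src 'self';" > "$sample"
if relax_img_src "$sample"; then
  cat "$sample"
else
  echo "img-src directive not found"
fi
rm -f "$sample"
```

Restart Solr after changing jetty.xml so that the new header is served.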

Next Steps

The query examples and explanations in this documentation refer to the above configuration. If your configuration is different, this may also affect the examples documented here (e.g. field names, available request handlers, etc.).