Keep the Solr index in sync with your search system, so that every image ID exists in both systems and refers to the same image (ID 1 in your system references the same image as ID 1 in the Solr index). Whenever you add or delete images in your search system, add or delete the same images in the Solr index.
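One way to check that both systems are aligned is to compare their ID sets and derive the pending additions and deletions. The following is a minimal sketch, assuming you can export the image IDs of both systems as plain collections; the function name `plan_sync` is hypothetical and not part of the Universal Connector.

```python
def plan_sync(system_ids, solr_ids):
    """Return (to_add, to_delete) so that the Solr index matches the search system.

    Hypothetical helper for illustration; both arguments are iterables of image IDs.
    """
    system_ids = set(system_ids)
    solr_ids = set(solr_ids)
    # IDs known to the search system but missing in Solr must be indexed.
    to_add = sorted(system_ids - solr_ids)
    # IDs still in Solr but gone from the search system must be deleted.
    to_delete = sorted(solr_ids - system_ids)
    return to_add, to_delete


if __name__ == "__main__":
    to_add, to_delete = plan_sync([1, 2, 3, 5], [1, 2, 4])
    print("add to Solr:", to_add)
    print("delete from Solr:", to_delete)
```

If both returned lists are empty, the two ID sets match.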
No distributed support
The Universal Connector does not support a distributed Solr environment and works only with a single-shard index.
The Universal Connector consists of the following parts:
- UniversalConnectorHandler is the endpoint and parses the provided image IDs from your existing search system.
- UniversalConnectorComponent converts the given image ID list to an internal Solr filter, causing Solr to return only images that are part of the given list.
- Optional cache for fast internal docID lookup. There is a DocIdCacheWarmer for auto-warming the cache at Solr startup and a DocIdCacheRegenerator for regenerating the cache when commits to the index are made.
UniversalConnectorHandler supports content streams in queries: it parses the IDs from the POST request body as a stream and creates an internal data structure used by the UniversalConnectorComponent.
The example below shows the configuration of the UniversalConnectorHandler as a request handler. The UniversalConnectorComponent is referenced by its name to enable processing of the parsed IDs.
Configure the Universal Connector endpoint with the name /subset:
<requestHandler name="/subset" class="de.pixolution.solr.handler.component.UniversalConnectorHandler">
  <arr name="first-components">
    <str>pixolutionComponent</str>
    <str>universalConnectorComponent</str>
  </arr>
</requestHandler>
UniversalConnectorComponent creates a filter from the parsed image IDs that restricts the response to images whose IDs are part of the given ID set.
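Conceptually, the filter behaves like an intersection between the ranked search result and the supplied ID set, with the ranking preserved. The sketch below only illustrates that idea in plain Python; it is not the component's implementation, which works on internal Solr docIDs, and the function name is hypothetical.

```python
def restrict_to_subset(ranked_hits, allowed_ids):
    """Keep only hits whose ID is in allowed_ids, preserving the ranking order.

    Illustration only: ranked_hits is a list of image IDs ordered by relevance.
    """
    allowed = set(allowed_ids)  # set membership keeps each check O(1)
    return [hit for hit in ranked_hits if hit in allowed]


if __name__ == "__main__":
    # Hits 7 and 9 are dropped because they are not in the given ID set.
    print(restrict_to_subset([7, 3, 9, 1], [1, 3, 5]))
```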
The next example shows a configuration of the UniversalConnectorComponent within the query section of solrconfig.xml.
UniversalConnectorComponent with an associated cache:
<searchComponent name="universalConnectorComponent" class="de.pixolution.solr.handler.component.UniversalConnectorComponent">
  <str name="cache.name">universalConnectorCache</str>
</searchComponent>
cache.name
Optional parameter. Name of the cache used to store and look up image ID → docID mappings. See the next section for how to configure a cache. Using a cache is highly recommended.
The configured cache is used by the UniversalConnectorComponent.
Solr works internally with docIDs rather than with the image IDs that were sent in a request, so the cache is used to look up the docID for each image ID.
The cache is therefore filled with mappings of image IDs to internal docIDs.
Since looking up docIDs in the index is fairly slow, the cache is vital to speed up this process.
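The lookup pattern described here (answer from the cache when possible, fall back to the slow index lookup otherwise) can be sketched as follows. This is a conceptual illustration with a hypothetical `DocIdCache` class and a least-recently-used eviction policy, not the code of the Universal Connector or of Solr's cache classes.

```python
from collections import OrderedDict


class DocIdCache:
    """Illustrative LRU cache mapping image IDs to internal docIDs."""

    def __init__(self, size, slow_lookup):
        self.size = size                # maximum number of cached mappings
        self.slow_lookup = slow_lookup  # stand-in for the slow index lookup
        self.entries = OrderedDict()
        self.misses = 0                 # counts how often the slow path was taken

    def get(self, image_id):
        if image_id in self.entries:
            # Cache hit: mark the entry as most recently used.
            self.entries.move_to_end(image_id)
            return self.entries[image_id]
        # Cache miss: do the slow lookup and remember the result.
        self.misses += 1
        doc_id = self.slow_lookup(image_id)
        self.entries[image_id] = doc_id
        if len(self.entries) > self.size:
            # Cache is full: drop the least recently used mapping.
            self.entries.popitem(last=False)
        return doc_id


if __name__ == "__main__":
    index = {101: 0, 102: 1, 103: 2}  # image ID -> docID, stand-in for the index
    cache = DocIdCache(size=2, slow_lookup=index.__getitem__)
    for image_id in (101, 101, 102, 103):
        cache.get(image_id)
    print("slow lookups:", cache.misses)
```

A cache that is too small keeps falling back to the slow path, which is why the sizing advice in this section matters.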
Cache configuration with DocIdCacheRegenerator as cache regenerator:
<cache name="universalConnectorCache" class="solr.LRUCache" size="100000" initialSize="100000" autowarmCount="100%" regenerator="de.pixolution.solr.search.DocIdCacheRegenerator"/>
All cache parameters are Solr-specific. The descriptions below explain them with regard to Universal Connector usage.
name
Name of the cache, which is used by the UniversalConnectorComponent and the DocIdCacheWarmer to identify the correct cache.
class
The actual type of cache. Any Solr cache implementation may be used.
size
Maximum number of cached image ID → docID mappings. The higher this value, the better the performance, at the expense of RAM consumption. For best performance, cache the docIDs of your complete collection. If you have 100 000 images in your collection, set the size to 100 000, or higher if your collection will grow over time.
Boost search performance
We recommend caching all image ID → docID mappings. This consumes a lot of RAM but speeds up search tremendously. Looking up internal docIDs can account for up to 95% of the complete query processing time, so caching all image ID → docID mappings can speed up queries by a factor of 20. To do so, set the size parameter to the number of documents in your index, or higher if you will add documents to the index in the future.
The RAM consumed by the cache depends on the number of cached elements and the number of concurrently open searchers (see Performance hints when using Universal Connector). As a rough guide, reserve the following RAM: 1 million elements = 500 MB, 5 million elements = 2.5 GB, 10 million elements = 5 GB, and so on. When setting the JRE -Xmx parameter, add space for the remaining configuration on top of these figures.
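The sizing guidance above works out to roughly 500 bytes per cached element. The sketch below turns that into a small estimator; the function name is hypothetical, and scaling linearly with the number of open searchers is an assumption based on the statement that RAM use depends on both factors. Treat the result as a rough indication, not a measurement.

```python
# Derived from the guidance "1 million elements = 500 MB" (decimal megabytes).
BYTES_PER_ELEMENT = 500


def estimate_cache_ram_mb(elements, open_searchers=1):
    """Rough RAM estimate in MB for the image ID -> docID cache.

    Assumes each concurrently open searcher holds its own cache of the
    same size (an assumption, not a documented formula).
    """
    return elements * BYTES_PER_ELEMENT * open_searchers / 1_000_000


if __name__ == "__main__":
    for n in (1_000_000, 5_000_000, 10_000_000):
        print(n, "elements:", estimate_cache_ram_mb(n), "MB")
```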
initialSize
The initial cache size. The cache can grow to a maximum of size elements.
This value does not mean that the cache is initially filled.
If you fill the complete cache when auto-warming, set initialSize to the same value as size.
autowarmCount
When an index change is made visible (commit or optimize), the old cache is invalidated and a new one is created.
With autowarmCount you define how many items of the old cache are regenerated into the new cache.
If set to 100%, all elements of the old cache are regenerated.
This parameter accepts absolute numbers as well as percentages.
The higher this value, the longer the regeneration takes and the longer it takes for index changes to become visible.
Changes to the index only become visible after regeneration has finished. During regeneration two caches exist briefly and consume twice as much RAM: the old cache, which still serves requests, and the new cache, which is currently being filled.
Keep in mind that every commit or optimize invalidates the old cache and triggers the regeneration of a new one. See Performance hints when using Universal Connector for a suitable commit strategy.
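To make the two accepted forms of autowarmCount concrete, the sketch below resolves a setting to a number of entries to regenerate. The helper name and the rounding behaviour are assumptions for illustration; this is not Solr's own parsing code.

```python
def resolve_autowarm_count(setting, old_cache_entries):
    """Return how many entries of the old cache would be regenerated.

    setting is either an absolute number (e.g. 1000 or "1000") or a
    percentage string (e.g. "100%"). Percentages refer to the number of
    entries in the old cache. Rounding down is an assumption.
    """
    setting = str(setting).strip()
    if setting.endswith("%"):
        percent = int(setting[:-1])
        count = old_cache_entries * percent // 100
    else:
        count = int(setting)
    # There cannot be more regenerated entries than the old cache holds.
    return min(count, old_cache_entries)


if __name__ == "__main__":
    print(resolve_autowarm_count("100%", 80_000))
    print(resolve_autowarm_count("50%", 80_000))
    print(resolve_autowarm_count(1000, 80_000))
```

A lower value shortens regeneration, and with it the delay before index changes become visible, at the cost of more cache misses right after a commit.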
regenerator
DocIdCacheRegenerator must be used in order to fill the new cache with the data expected by the UniversalConnectorComponent.
DocIdCacheRegenerator looks up the new internal docIDs, which might have changed due to an index change, and renews the image ID → docID mappings up to the number set by autowarmCount.
DocIdCacheRegenerator is called internally by the cache implementation.
To auto-warm the cache when Solr starts, you can configure a DocIdCacheWarmer that fills the associated cache with current image ID → docID mappings without user interaction.
In the example below, the DocIdCacheWarmer is configured as an event listener that is triggered when the firstSearcher event is fired.
This event is fired once, at Solr startup.
DocIdCacheWarmer as event listener for the firstSearcher event:
<listener event="firstSearcher" class="de.pixolution.solr.schema.DocIdCacheWarmer">
  <str name="cache.name">universalConnectorCache</str>
  <int name="cache.size">100000</int>
</listener>
DocIdCacheRegenerator warms caches after index changes; DocIdCacheWarmer warms a cache initially at startup.
Avoid double warming
If the event type is set to newSearcher, DocIdCacheWarmer will warm the cache after every index change. If you have also configured a DocIdCacheRegenerator, the cache will then be warmed/regenerated twice, causing more RAM consumption and longer warming time. Avoid this behaviour by using the DocIdCacheWarmer only as a startup cache warmer with the event firstSearcher.
cache.name
Mandatory parameter. The name of the associated cache that should be warmed. This must be the cache name configured above.
cache.size
Sets how many elements are warmed.
If not set, the complete index is iterated and all IDs are cached.
If the cache is not big enough, newly warmed elements overwrite elements that are already cached.
This value should therefore not be greater than the size of the cache.
Filling the complete cache is recommended. Although this slows down Solr startup, even the first queries then benefit from cache lookups.
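The relationship between the warmer's cache.size, the cache's size and the index size can be checked with a few lines of arithmetic. The sketch below uses a hypothetical helper and only models the behaviour described above: it reports how many warmed mappings would be overwritten because the cache is too small.

```python
def warm_outcome(index_docs, cache_capacity, warm_size=None):
    """Model the effect of warming a cache of limited capacity.

    index_docs:     number of documents in the index
    cache_capacity: the size attribute of the cache
    warm_size:      the warmer's cache.size, or None if it is not set
    """
    # Without cache.size the complete index is iterated.
    warmed = index_docs if warm_size is None else min(warm_size, index_docs)
    # The cache cannot retain more mappings than its capacity.
    retained = min(warmed, cache_capacity)
    return {"warmed": warmed, "retained": retained, "overwritten": warmed - retained}


if __name__ == "__main__":
    # Index larger than the cache and no cache.size set: work is wasted.
    print(warm_outcome(index_docs=150_000, cache_capacity=100_000))
    # cache.size capped at the cache capacity: nothing is overwritten.
    print(warm_outcome(index_docs=150_000, cache_capacity=100_000, warm_size=100_000))
```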