Release 1.6.1

Optimizations

URL parsing more robust and uniform

When documents are added or rank.by URL queries are run, the URLs involved are now processed more reliably. File and folder names in valid URLs may consist of reserved, unreserved and other characters (for further information read Percent-encoding and RFC 3986).

Older versions may fail if the file or folder names of a URL contain reserved or other characters (only unreserved characters were safe). We have now implemented a much more robust and uniform parsing module that can cope with many kinds of awkward URLs. For example, a URL can look like this:

http://localhost:1234/a folder/=ä@>d0?! (pk.

as long as the characters are properly percent-encoded (note that the / is not encoded, since it indicates the folder hierarchy and therefore has special meaning):

http://localhost:1234/a%20folder/%3D%C3%A4%40%3Ed0%3F%21%20%28pk.

When percent-encoding characters of URLs, use UTF-8 since pixolution flow decodes the URL with UTF-8 internally. Please refer to the API documentation for further information on URL handling.
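
The encoding of the example URL above can be reproduced on the client side. The following is a minimal sketch using Python's standard library, not part of pixolution flow; the helper name encode_url_path is hypothetical and chosen only for illustration. It encodes the path as UTF-8 and leaves the / separators untouched.

```python
from urllib.parse import quote

def encode_url_path(base, path):
    """Percent-encode a URL path as UTF-8 while keeping '/' as the folder separator.

    Hypothetical helper for illustration; 'base' is expected to hold
    the protocol, host and port, which need no encoding here.
    """
    # safe="/" leaves the folder hierarchy intact; every other character
    # outside the unreserved set is encoded from its UTF-8 bytes.
    return base + quote(path, safe="/", encoding="utf-8")

raw_path = "/a folder/=ä@>d0?! (pk."
encoded = encode_url_path("http://localhost:1234", raw_path)
print(encoded)
# prints http://localhost:1234/a%20folder/%3D%C3%A4%40%3Ed0%3F%21%20%28pk.
```

Only the path is passed through quote() in this sketch, so the colons and slashes of the protocol and host part are never touched.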

Exceptions more meaningful

If the PixolutionParser or ImageUpdateFactory configuration references fields that don’t exist, pixolution flow now returns more meaningful error messages. Earlier versions answered feature.fieldname configuration errors of PixolutionParser or ImageUpdateFactory with misleading messages like

Error adding field 'feature'='http://example.org/img/13495ebb6906a02336a4cx.jpg' msg=String length must be a multiple of four.

The new version answers with a proper error message like

Field with name "feature " does not exists. Please check if the property "processor.feature.fieldname" of "ImageUpdateFactory" in your updateRequestProcessorChain is properly set in your solrconfig.xml.

We also improved the error messages for images that could not be loaded; they now include the HTTP status code. Additionally, exceptions are more meaningful when invalid URLs are processed.

Other Changes

Changed URL handling

URLs given in rank.by queries or when indexing new documents must contain a valid protocol (HTTP or FILE). URLs that reference a resource via FILE must use absolute paths.
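
A client can check these two rules before sending a URL. The sketch below is a rough client-side pre-check written with Python's standard library; it is not the validation pixolution flow performs internally, and the function name looks_acceptable and the ALLOWED_SCHEMES set are assumptions made for illustration.

```python
from urllib.parse import urlsplit

# Assumption: mirrors the "HTTP or FILE" rule stated above.
ALLOWED_SCHEMES = {"http", "file"}

def looks_acceptable(url):
    """Return True if the URL names an allowed protocol and, for file URLs,
    an absolute path. Hypothetical helper, not pixolution flow's own check."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    if scheme not in ALLOWED_SCHEMES:
        return False
    if scheme == "file":
        # An absolute path starts with '/', e.g. file:///var/images/a.jpg
        return parts.path.startswith("/")
    # http URLs need a host part
    return bool(parts.netloc)

for candidate in ("http://example.org/img/a.jpg",
                  "file:///var/images/a.jpg",
                  "file:images/a.jpg",
                  "/var/images/a.jpg"):
    print(candidate, looks_acceptable(candidate))
```

A URL that passes this check may still be rejected by the server, for example if the resource does not exist.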

Note that there is a potential security risk due to information exposure through an error message. A user may test whether local files exist by sending rank.by queries with FILE as the protocol and analyzing the returned error message. This may reveal sensitive information that could be used in a later attack, or private information stored on the server. Make sure Solr has restricted access to the filesystem or cannot be queried with user-defined requests over the Internet.