Solr Dimension Token Filter
After reading this article, you will understand the context around the dimension logic in the eCommerce domain, know the value it provides for search, and have enough technical detail to modify, apply, or remove the filter from a Solr instance.
What is a Dimension?
Some products include their size in their name, in the format width x length x height. For example, a product name may contain "66 x18 mm 5.4m". Such a size is known as the product dimensions. Product dimensions can have two or three parts, and the "x" separator is optional.
On the search page, customers might search for these products by dimension using other measuring units, or by providing only some of the dimension parts, such as:
- 6.6x1.8x540cm
- 0.066x5.4m
- 0.018m
Matching such queries can be achieved with a custom token filter, which is explained in the following sections.
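To make the equivalence between these queries concrete, here is a minimal sketch that parses a dimension string into millimetres so that differently written sizes can be compared. It is not the code of the actual filter: the class name DimensionParserSketch, the supported units (mm, cm, m), and the rule that a number without a unit takes the unit of the next part that has one are all assumptions made for illustration.

```java
import java.math.BigDecimal;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the real filter.
final class DimensionParserSketch {
    // A number, optionally followed by a unit. "mm" is listed before "m" so it wins.
    private static final Pattern PART =
            Pattern.compile("(\\d+(?:\\.\\d+)?)\\s*(mm|cm|m)?");

    /** Parses e.g. "66 x18 mm 5.4m" into its parts, expressed in millimetres. */
    static List<BigDecimal> toMillimetres(String text) {
        List<BigDecimal> values = new ArrayList<>();
        List<String> units = new ArrayList<>();
        Matcher matcher = PART.matcher(text.toLowerCase());
        while (matcher.find()) {
            values.add(new BigDecimal(matcher.group(1)));
            units.add(matcher.group(2)); // null when the part has no unit
        }
        // Walk right to left so a unitless part inherits the unit that follows it
        // (assumption: "66 x18 mm" means 66 mm x 18 mm).
        String unit = null;
        for (int i = values.size() - 1; i >= 0; i--) {
            if (units.get(i) != null) {
                unit = units.get(i);
            }
            if (unit == null) {
                throw new IllegalArgumentException("No unit found for part " + i);
            }
            int shift = unit.equals("m") ? 3 : unit.equals("cm") ? 1 : 0;
            // Normalise so that equal sizes compare equal regardless of notation.
            values.set(i, values.get(i).movePointRight(shift).stripTrailingZeros());
        }
        return values;
    }

    public static void main(String[] args) {
        List<BigDecimal> product = toMillimetres("66 x18 mm 5.4m");
        List<BigDecimal> query = toMillimetres("6.6x1.8x540cm");
        System.out.println("Same size: " + product.equals(query));
    }
}
```

Normalising every part to one base unit is what makes the query "6.6x1.8x540cm" comparable with the product text "66 x18 mm 5.4m". The real filter takes the opposite route and generates the equivalent tokens up front, so that plain token matching works at query time.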
A brief overview of Solr analyzers, tokenizers, and token filters
Analyzer
An Analyzer examines the text of fields and generates a token stream.
Analyzers are used both during ingestion, when a document is indexed, and at query time.
An analyzer may be a single class, or it may be composed of a series of tokenizer and filter classes.
Tokenizer
A tokenizer breaks field data into lexical units, or tokens.
For example, a tokenizer breaks "Bosch 650W Impact Drill" into the following tokens:
"Bosch", "650W", "Impact", "Drill"
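The idea can be sketched in a few lines of plain Java. This is only an illustration of splitting on whitespace; it does not use the Lucene tokenizer classes that Solr actually runs, and the class name WhitespaceTokenizerSketch is made up for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustration only: a stand-in for a whitespace tokenizer.
final class WhitespaceTokenizerSketch {
    /** Splits the field value on runs of whitespace. */
    static List<String> tokenize(String fieldValue) {
        String trimmed = fieldValue.trim();
        if (trimmed.isEmpty()) {
            return new ArrayList<>(); // no tokens for blank input
        }
        return Arrays.asList(trimmed.split("\\s+"));
    }

    public static void main(String[] args) {
        // prints [Bosch, 650W, Impact, Drill]
        System.out.println(tokenize("Bosch 650W Impact Drill"));
    }
}
```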
Token Filter
Filters examine a stream of tokens and keep, transform, or discard them, or create new ones.
Tokenizers and filters may be combined to form pipelines, or chains, where the output of one is input to the next.
Such a sequence of tokenizers and filters is called an analyzer and the resulting output of an analyzer is used to match query results or build indices.
For example, the lowercase token filter converts "Bosch" to "bosch".
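The chaining of a tokenizer and filters can be sketched in plain Java as follows. This is a simplified model rather than the Lucene API: the class AnalyzerChainSketch and its methods are hypothetical, and real Solr filters operate on a token stream, not on lists.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.function.Function;
import java.util.stream.Collectors;

// Simplified model of an analyzer: one tokenizer followed by a chain of filters.
final class AnalyzerChainSketch {
    private final Function<String, List<String>> tokenizer;
    private final List<Function<List<String>, List<String>>> filters;

    AnalyzerChainSketch(Function<String, List<String>> tokenizer,
                        List<Function<List<String>, List<String>>> filters) {
        this.tokenizer = tokenizer;
        this.filters = filters;
    }

    /** Runs the tokenizer, then feeds each filter the output of the previous step. */
    List<String> analyze(String text) {
        List<String> tokens = tokenizer.apply(text);
        for (Function<List<String>, List<String>> filter : filters) {
            tokens = filter.apply(tokens);
        }
        return tokens;
    }

    static List<String> whitespace(String text) {
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            return new ArrayList<>();
        }
        return Arrays.asList(trimmed.split("\\s+"));
    }

    static List<String> lowerCase(List<String> tokens) {
        return tokens.stream()
                .map(token -> token.toLowerCase(Locale.ROOT))
                .collect(Collectors.toList());
    }

    /** Whitespace tokenizer followed by a lowercase filter. */
    static AnalyzerChainSketch lowerCaseAnalyzer() {
        List<Function<List<String>, List<String>>> filters = new ArrayList<>();
        filters.add(AnalyzerChainSketch::lowerCase);
        return new AnalyzerChainSketch(AnalyzerChainSketch::whitespace, filters);
    }

    public static void main(String[] args) {
        // prints [bosch, 650w, impact, drill]
        System.out.println(lowerCaseAnalyzer().analyze("Bosch 650W Impact Drill"));
    }
}
```

A filter that creates new tokens, such as the dimension filter, fits the same shape: it receives the tokens produced so far and returns a longer list.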
Analyzers are applied to fields and are defined in the schema.xml file in the config set.
An example of an analyzer in the project:
Implementation of dimension filter
All of the tokenizers and filters above are implemented in Java, and their source code is available in the Apache GitHub repository (apache/lucene-solr).
See, for instance, the SynonymGraphFilter code.
Our implementation of the dimension filter is available in the Solr config set repository.
The implementation tries to decouple the dimension logic from Solr, so that the logic of dimension generation can be modified even without any Solr knowledge.
The DimensionGenerator class in the code accumulates dimension parts into one size, for instance "1cm x 2cm x 3cm", and then generates all of the equivalent sizes in different units.
For example, if the field value contains "1cm x 2mm" this class will generate
the following 36 tokens to be available for searching:
For three-dimensional sizes, the class applies the same logic and creates tokens for all of the permutations.
For example, 10cm x 25mm x 1.5m => 234 tokens, including 10x2.5x150cm and so on.
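The generation logic can be illustrated with a minimal sketch. It is not the DimensionGenerator from the repository: the class name DimensionGeneratorSketch, the unit set (mm, cm, m), and the single "AxBxCunit" token format are assumptions. Because the real class evidently emits more token variants than this, the sketch produces fewer tokens than the 36 and 234 quoted above (12 for two parts, 45 for three).

```java
import java.math.BigDecimal;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.StringJoiner;

// Hypothetical sketch; the token format and unit list are assumptions.
final class DimensionGeneratorSketch {
    private static final String[] UNITS = {"mm", "cm", "m"};
    // Decimal places to move left when converting from millimetres.
    private static final int[] SHIFT = {0, 1, 3};

    /** Generates tokens for every ordered selection of parts, in every unit. */
    static Set<String> generate(List<BigDecimal> partsInMm) {
        List<List<BigDecimal>> arrangements = new ArrayList<>();
        arrange(partsInMm, new boolean[partsInMm.size()], new ArrayList<>(), arrangements);

        Set<String> tokens = new LinkedHashSet<>();
        for (List<BigDecimal> arrangement : arrangements) {
            for (int u = 0; u < UNITS.length; u++) {
                // Parts joined by "x", with the shared unit as suffix.
                StringJoiner joiner = new StringJoiner("x", "", UNITS[u]);
                for (BigDecimal mm : arrangement) {
                    joiner.add(mm.movePointLeft(SHIFT[u]).stripTrailingZeros().toPlainString());
                }
                tokens.add(joiner.toString());
            }
        }
        return tokens;
    }

    // Backtracking: collects every ordered selection of one or more parts.
    private static void arrange(List<BigDecimal> parts, boolean[] used,
                                List<BigDecimal> current, List<List<BigDecimal>> out) {
        if (!current.isEmpty()) {
            out.add(new ArrayList<>(current));
        }
        for (int i = 0; i < parts.size(); i++) {
            if (used[i]) {
                continue;
            }
            used[i] = true;
            current.add(parts.get(i));
            arrange(parts, used, current, out);
            current.remove(current.size() - 1);
            used[i] = false;
        }
    }

    public static void main(String[] args) {
        // 10cm x 25mm x 1.5m, already converted to millimetres.
        List<BigDecimal> parts = Arrays.asList(
                new BigDecimal("100"), new BigDecimal("25"), new BigDecimal("1500"));
        Set<String> tokens = generate(parts);
        System.out.println(tokens.size() + " tokens, e.g. 10x2.5x150cm present: "
                + tokens.contains("10x2.5x150cm"));
    }
}
```

Generating the variants at index time keeps the query side simple: a query such as 10x2.5x150cm only has to match one of the stored tokens, at the cost of a larger index.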
After implementation, the code is built into a jar file, which is stored in the repository alongside the code.
How to deploy the dimension filter into a Solr instance
Deploying the dimension filter comprises three main steps:
1- Deploying the jar file
Copying the jar file to the Solr instances is a manual process at the time of writing (October 2018). The plan is to use SolrCloud facilities for this purpose (see Adding Custom Plugins in SolrCloud Mode).
For now, the process is to take the jar file from the XXX-solr-configset repository and copy it to each Solr node. This is a one-time task per Solr installation, repeated whenever a new version of the file becomes available.
Source
\XXX-solr-config-sets\solr-analysis\jar\solr-analysis-6.6.2
Target
\solr\solr-6.6.0\dist
2- Loading the jar file
This step is done via configuration in the solrconfig.xml file in the solr-configset repository. It is also a one-time task, and it has already been done in the repository.
3- Adding the custom filter to the analyzer
Now that the jar file is copied and loaded, the filters inside it can be used in any analyzer. The dimension filter is used in the text_with_keyword_analyzer.
How to disable the dimension filter
If the jar file has not been copied to the servers, or if the dimension filter should be disabled for any other reason, commenting out the filter in the analyzer is all that needs to be done.
Of course, a deployment of Solr config set is required for the change to take effect.
Modifying the Dimension filter
The Java project is part of the config set repository.
To build the project, take the following steps:
1- IDE installation
Install IntelliJ IDEA, an IDE for Java projects. During the installation, it will guide you through the JDK installation.
2- Open the project
Using the IDE, open the solr-analysis folder under the XXX-solr-configset repository.
It should be immediately ready to run, because a test console app is included in the project to execute the dimension filter and show the results on the screen.
3- Execute
Once the project is open in your IDE, hit Ctrl+F9 to build it.
The project contains a Java console app, used only for executing and testing the code, in the src/Program/Main class.
Select the file and choose Run in the menu, or press Shift+F10, to execute the console app and see the output of the application: