Saturday, November 16, 2019

Solr - Basic algorithm for TFIDF, LTR and common functions.

It's really interesting to understand how Solr, by default, returns results in a particular order.



Let's say you search for the keyword BbQ (capital B, lowercase b, capital Q). How are the results retrieved, why do a few results appear at the top, and what options are available to change the order of the results?

So if you are curious to understand the whole flow, THIS BLOG IS FOR YOU :)


First, we should understand the Solr query flow.



Here is a high-level view of the existing Solr algorithm: it is based mainly on term frequency and inverse document frequency, with BM25 as the scoring function.




Solr uses Lucene at its core, and the classic Lucene ranking model is known as the tf.idf model (BM25, the default similarity since Solr 6, is built from the same tf and idf ingredients).


First, let's understand what this model is in general.


 tf–idf or TFIDF, short for term frequency–inverse document frequency, is a numerical statistic that is intended to reflect how important a word is to a document in a collection or corpus. It is often used as a weighting factor in searches of information retrieval, text mining, and user modeling. The tf–idf value increases proportionally to the number of times a word appears in the document and is offset by the number of documents in the corpus that contain the word, which helps to adjust for the fact that some words appear more frequently in general. tf–idf is one of the most popular term-weighting schemes today; 83% of text-based recommender systems in digital libraries use tf–idf.

Term Frequency - The weight of a term that occurs in a document is simply proportional to the term frequency.


Inverse Document Frequency - The specificity of a term can be quantified as an inverse function of the number of documents in which it occurs.



Here is a list of all available Solr functions -


A few useful functions are -



  1. docfreq(field,term) returns the number of documents that contain the term in the field.
  2. termfreq(field,term) returns the number of times the term appears in the field for that document.
  3. idf(field,term) returns the inverse document frequency for the given term, using the Similarity for the field.
  4. tf(field,term) returns the term frequency factor for the given term, using the Similarity for the field.
  5. norm(field) returns the “norm” stored in the index, the product of the index time boost and the length normalization factor.
  6. maxdoc() returns the number of documents in the index, including those that are marked as deleted but have not yet been purged.
  7. numdocs() returns the number of documents in the index, not including those that are marked as deleted but have not yet been purged.
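
To look at these values for a real document, the functions can be requested as pseudo-fields in the fl parameter. The sketch below only builds such a URL; the field name title and the aliases (term_freq, doc_freq and so on) are assumptions for illustration, and the term is passed in lowercase on the assumption that the field's analyzer lowercases tokens, since these functions look up the term as it is stored in the index.

```python
from urllib.parse import urlencode


def build_term_stats_url(base_url, field, term, query="*:*"):
    """Build a Solr select URL that returns term statistics next to each document."""
    fields = [
        "id",
        "score",
        # alias:function(...) gives each function value a readable name in the response
        f"term_freq:termfreq({field},'{term}')",
        f"doc_freq:docfreq({field},'{term}')",
        f"idf:idf({field},'{term}')",
        "num_docs:numdocs()",
        "max_doc:maxdoc()",
    ]
    params = {"q": query, "fl": ",".join(fields)}
    return f"{base_url}/select?{urlencode(params)}"


# Hypothetical collection and field; "bbq" is the assumed indexed form of "BbQ".
url = build_term_stats_url(
    "http://localhost:8983/solr/collectionname", "title", "bbq", query="title:bbq"
)
print(url)
```

Comparing term_freq and doc_freq across the top results is a quick way to see why one document outranks another.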


For further reference -



  1. https://lucene.apache.org/solr/guide/7_7/function-queries.html
  2. https://lucidworks.com/post/solr-relevancy-function-queries/


Ranking of query results is one of the fundamental problems in information retrieval (IR), the scientific/engineering discipline behind search engines. Given a query q and a collection D of documents that match the query, the problem is to rank, that is, sort, the documents in D according to some criterion so that the "best" results appear early in the result list displayed to the user.

Query Re-Ranking - Query Re-Ranking allows you to run a simple query (A) for matching documents and then re-rank the top N documents using the scores from a more complex query (B).
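
As a rough illustration of query (A) and query (B), the sketch below assembles the request parameters for Solr's rerank query parser. The two example queries, the helper name and the default values are assumptions; only the rq syntax ({!rerank reRankQuery=... reRankDocs=... reRankWeight=...}) comes from Solr itself.

```python
from urllib.parse import urlencode


def build_rerank_params(simple_query, complex_query, top_n=100, weight=3.0):
    """Return Solr parameters that re-rank the top_n hits of simple_query."""
    return {
        "q": simple_query,  # query (A): cheap, decides which documents match
        # query (B) is referenced through the $rqq parameter
        "rq": f"{{!rerank reRankQuery=$rqq reRankDocs={top_n} reRankWeight={weight}}}",
        "rqq": complex_query,
        "fl": "id,score",
    }


# Hypothetical queries: match on "jacket", then favour documents tagged as unique.
params = build_rerank_params("jacket", "tags:unique")
print("http://localhost:8983/solr/collectionname/select?" + urlencode(params))
```

Only the top reRankDocs documents are rescored, so the more expensive query (B) never runs against the whole result set.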

Here is the high level flow diagram.


LIBSVM and LIBLINEAR are two popular open source machine learning libraries. We can start with a simple setup based on LIBLINEAR.

Before going further, we need to define the model, the features and the common feature store used by the ML ranking.
An example follows:

Steps to define the features, stores and models
After plugging in the above libraries, we can define a model and use it in search queries like this:
http://localhost:8983/solr/collectionname/query?q=test&rq={!ltr model=currentModel reRankDocs=100}&fl=id,score,[features store=nextFeatureStore]
Here model=currentModel names the model used for re-ranking, store=nextFeatureStore names the feature store to read feature values from, and [features] is the document transformer that returns those values.
Sample store / common feature -
{
  "store": "commonFeatureStore",
  "name": "documentRecency",
  "class": "org.apache.solr.ltr.feature.SolrFeature",
  "params": {
    "q": "{!func}recip( ms(NOW,last_modified), 3.16e-11, 1, 1)"
  }
}
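
To get a feel for what this feature returns, here is a small sketch of the same arithmetic outside Solr. It relies on recip(x,m,a,b) being defined as a / (m*x + b) and on ms(NOW,last_modified) being the document age in milliseconds; the fixed "now" date and the sample ages are assumptions chosen for illustration.

```python
from datetime import datetime, timedelta, timezone


def recip(x, m, a, b):
    """Same shape as Solr's recip function: a / (m*x + b)."""
    return a / (m * x + b)


def document_recency(last_modified, now):
    """Mirror recip(ms(NOW,last_modified), 3.16e-11, 1, 1) from the feature above."""
    age_ms = (now - last_modified).total_seconds() * 1000.0
    return recip(age_ms, 3.16e-11, 1, 1)


# Fixed reference time so the example is repeatable (assumption).
now = datetime(2019, 11, 16, tzinfo=timezone.utc)
for age_days in (0, 365, 3650):
    value = document_recency(now - timedelta(days=age_days), now)
    print(age_days, round(value, 3))
```

Since 3.16e-11 is roughly one divided by the number of milliseconds in a year, a brand-new document scores 1.0, a one-year-old document scores about 0.5, and the value keeps falling as the document ages.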

Sample model (the names listed under features must match the keys under weights) -
{
  "store": "commonFeatureStore",
  "name": "ModelA",
  "class": "org.apache.solr.ltr.model.LinearModel",
  "features": [
    {"name": "documentRecency"},
    {"name": "isBook"},
    {"name": "originalScore"}
  ],
  "params": {
    "weights": {
      "documentRecency": 1,
      "isBook": 0.1,
      "originalScore": 0.5
    }
  }
}
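
A linear model of this kind scores a document as the weighted sum of its feature values. The sketch below shows that calculation with the weights from the sample model; the feature values for the document are made up for illustration and the function name is hypothetical.

```python
def linear_model_score(weights, features):
    """Weighted sum of feature values, one weight per feature name."""
    return sum(weight * features[name] for name, weight in weights.items())


# Weights copied from the sample model above.
weights = {"documentRecency": 1.0, "isBook": 0.1, "originalScore": 0.5}

# Hypothetical feature values for one document.
features = {"documentRecency": 0.5, "isBook": 1.0, "originalScore": 2.0}

print(linear_model_score(weights, features))
```

Raising a weight makes that feature count for more in the final order, which is how the re-ranking can be tuned without touching the original query.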

For more details -
How are documents scored
By default, a "TF-IDF" based Scoring Model is used. The basic scoring factors:
  • tf stands for term frequency - the more times a search term appears in a document, the higher the score
  • idf stands for inverse document frequency - matches on rarer terms count more than matches on common terms
  • coord is the coordination factor - if there are multiple terms in a query, the more terms that match, the higher the score
  • lengthNorm - matches on a smaller field score higher than matches on a larger field
  • query clause boost - a user may explicitly boost the contribution of one part of a query over another.
Details can be found here - Solr Wiki for Ranking

A simple example -

TF - Term Frequency -

TF(w) = (number of times word w appears in a document) / (total number of words in the document)

IDF - Inverse Document Frequency -

IDF(w) = log(total number of documents / number of documents containing word w)

TF-IDF is the product of the term frequency and the inverse document frequency: TF-IDF(w) = TF(w) * IDF(w).

sentence 1– earth is the third planet from the sun
sentence 2– earth is the largest planet
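
The two sentences can be run through the formulas above with a few lines of Python. This is a minimal sketch, assuming whitespace tokenization and a base-10 logarithm (the formula above does not fix the base); it is not how Lucene computes its scores internally.

```python
import math


def tf(word, doc_tokens):
    """Occurrences of word divided by the number of words in the document."""
    return doc_tokens.count(word) / len(doc_tokens)


def idf(word, corpus):
    """log10(total documents / documents containing word); word must occur somewhere."""
    containing = sum(1 for doc in corpus if word in doc)
    return math.log10(len(corpus) / containing)


def tf_idf(word, doc_tokens, corpus):
    return tf(word, doc_tokens) * idf(word, corpus)


sentences = [
    "earth is the third planet from the sun",
    "earth is the largest planet",
]
corpus = [sentence.split() for sentence in sentences]

for word in ("earth", "third", "largest"):
    scores = [round(tf_idf(word, doc, corpus), 4) for doc in corpus]
    print(word, scores)
# earth [0.0, 0.0]
# third [0.0376, 0.0]
# largest [0.0, 0.0602]
```

Words that occur in both sentences (earth, is, the, planet) get an IDF of log(2/2) = 0, so only the distinguishing words (third, largest) contribute to the score.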




TF-IDF is zero for stop words, and the stop word list is configured here -




There is an open source library available to implement TF-IDF - https://code.google.com/archive/p/tfidf/


There is a drawback to this algorithm, as discussed here. When an index holds a large number of documents, it is recommended to split them across multiple shards; there are a few examples here of when you would decide to create multiple shards -

For this example, let's say you search for the keywords - unique jacket

Because document frequencies are computed per shard by default, these two terms may have different TF-IDF values on each shard, which may affect the final ranking when the data set is huge.
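
The effect can be sketched with a few numbers. The document counts below are invented for illustration, and the sketch assumes each shard computes IDF from its own documents only, using the simple log formula from earlier in this post.

```python
import math


def idf(total_docs, docs_with_term):
    """Simple IDF: log10(total documents / documents containing the term)."""
    return math.log10(total_docs / docs_with_term)


# Hypothetical per-shard statistics for the query "unique jacket".
shards = {
    "shard1": {"total": 1000, "unique": 10, "jacket": 500},
    "shard2": {"total": 1000, "unique": 400, "jacket": 20},
}

for name, stats in shards.items():
    local = {term: round(idf(stats["total"], stats[term]), 3) for term in ("unique", "jacket")}
    print(name, local)

# The same terms scored against the combined statistics of both shards.
total = sum(stats["total"] for stats in shards.values())
combined = {
    term: round(idf(total, sum(stats[term] for stats in shards.values())), 3)
    for term in ("unique", "jacket")
}
print("combined", combined)
```

On shard1 the rare term is "unique", on shard2 it is "jacket", so the same document could score differently depending on which shard holds it. The combined figures show what a single, unsharded index would have used.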

Let me know if you have any questions :)

Reference - 


Don't forget to check this blog for a quick Python example of TF-IDF.

I hope you have enjoyed these details. Please let me know if you have any questions.


Sitecore XC 9.2 - Postman setup and API walk through.


The Sitecore Commerce SDK provides an API collection that we can easily import into Postman to see all the request and response details. Here are the few steps to configure it, along with a high-level view.

References -


  1.  https://doc.sitecore.com/developers/90/sitecore-experience-commerce/en/execute-sample-api-calls-in-postman.html
  2. https://doc.sitecore.com/developers/90/sitecore-experience-commerce/en/list-of-sitecore-xc-postman-collections.html


Step 1-  Check the Sitecore installation folder


Step 2 - Setup postman - You can download it from here

Turn off the SSL setting for the local environment.





This is just for the test/local environment.



There are two folders for the import.


Step 3 -  Import environment settings.





Step 4 - AntiForgeryEnabled setting - it is true by default.


Disabling it is only for checking API calls in the local environment; we need to keep it on in the live environment.

Step 5 - API walkthrough and flow.

Please make sure to update the environment variables to match your local environment settings, such as the username, password and local authoring URL.


Generate token -



Step 6 - Place an order through the API -





Conclusion - Sitecore provides a well-structured API setup out of the box. We can call these APIs directly to perform operations or to debug, and we can wrap them and expose them to the front end for development.

Sunday, November 10, 2019

Sitecore XC 9.2 - Merchandising products and how they are mapped with CatalogItemsScope Solr Index.

Sitecore XC manages all Merchandising product data in Solr CatalogItemsScope Index.

Here are the details.


Solr Index -


If you look at the fields in the Solr index, fields such as variantid, variantdisplayname, displayname and name all come from the CatalogItemsScope index.

Sitecore XC 9.2 - Add a custom view in order detail page.

Sitecore XC provides an extended feature where the author can define a custom view template with a few fields to provide more details about the product.

Here are the simple steps

Step 1 - Add a new view section


Step 2 - Define some new fields for the view.



Step 3 - Add the comments and instructions, and they will be attached to the order. You can also define this change as a template.


It's very easy to add a new view and useful during order processing.

Sitecore XC 9.2 - Place a new order and data flow.


Step 1 - Search for a product.




Step 2 - Select a product.



Step 3 - Add to cart and view cart.



Step 4 - Add to cart and view cart.


Step 5 - Billing Information.



Step 6 - Validate card. You can use the sample values from Braintree Payments here. Sample values -
Card Number 4217651111111119 
Expire Date 01/2022 

Payment review - 



Step 7 - Order confirmation.




Step 8 - Verify order in Order Dashboard -



By default, the order is in the Pending state -



Let's check what is inside the DB and the Solr index at this point.



Order Summary -



Line Item details- 





Saturday, November 9, 2019

Sitecore XC 9.2 - Customer data flow.

It's very important to understand how and where Sitecore XC stores the customer data.

Here is the high-level flow -

Step 1 -

Register a new customer in Sitecore XC.



Step 2

We can see the data in Sitecore authoring --> Business tools --> customer data section


Step 3

As per the Sitecore documentation for customer registration data here, newly registered customer data is stored in the core database.


You can run the script below to search all tables in the core and web databases for references to the customer data.

-- Search every character column of every user table for the given string.
DECLARE @SearchStr nvarchar(100)

SET @SearchStr = 'jitusonijk@gmail.com'
BEGIN

CREATE TABLE #Results3 (ColumnName nvarchar(370), ColumnValue nvarchar(3630))

SET NOCOUNT ON

DECLARE @TableName nvarchar(256), @ColumnName nvarchar(128), @SearchStr2 nvarchar(110)
SET @TableName = ''
SET @SearchStr2 = QUOTENAME('%' + @SearchStr + '%', '''')

WHILE @TableName IS NOT NULL    
BEGIN       
  SET @ColumnName = ''      
  SET @TableName =  (
    SELECT MIN(QUOTENAME(TABLE_SCHEMA) + '.' +
    QUOTENAME(TABLE_NAME)) FROM INFORMATION_SCHEMA.TABLES 
    WHERE
    TABLE_TYPE = 'BASE TABLE'
    AND QUOTENAME(TABLE_SCHEMA) + '.' + QUOTENAME(TABLE_NAME) > @TableName
    AND OBJECTPROPERTY(
      OBJECT_ID(QUOTENAME(TABLE_SCHEMA) + '.' + QUOTENAME(TABLE_NAME)),
        'IsMSShipped') = 0)

  WHILE (@TableName IS NOT NULL) AND (@ColumnName IS NOT NULL)      
  BEGIN
    SET @ColumnName = (
      SELECT MIN(QUOTENAME(COLUMN_NAME))
      FROM INFORMATION_SCHEMA.COLUMNS
      WHERE TABLE_SCHEMA = PARSENAME(@TableName, 2)
        AND TABLE_NAME = PARSENAME(@TableName, 1)
      AND DATA_TYPE IN ('char', 'varchar', 'nchar', 'nvarchar')
      AND QUOTENAME(COLUMN_NAME) > @ColumnName)
      IF @ColumnName IS NOT NULL            
      BEGIN
      INSERT INTO #Results3
      EXEC
      (
        'SELECT ''' + @TableName + '.' + @ColumnName + ''', LEFT(' + @ColumnName + 
          ', 3630) FROM ' + @TableName + ' (NOLOCK) ' +
        ' WHERE ' + @ColumnName + ' LIKE ' + @SearchStr2
      )             
      END       
    END     
  END


  SELECT ColumnName, ColumnValue FROM #Results3 END



Core database - Tables-

Table - [aspnet_Membership]

  ApplicationId: D1A11AC5-63B0-40A7-9320-3C88981A590C
  UserId: 004A77C7-7751-423C-A21D-1A472015F4DE
  Password: 7Ai3EHC83gQbeuVZQdnr6tOTBlk=
  PasswordFormat: 1
  PasswordSalt: gFFL5GuWSj8B8nvZHRtu3g==
  MobilePIN: NULL
  Email: jitusonijk@gmail.com
  LoweredEmail: jitusonijk@gmail.com
  PasswordQuestion: NULL
  PasswordAnswer: NULL
  IsApproved: 1
  IsLockedOut: 0
  CreateDate: 2019-10-29 07:00:57.000
  LastLoginDate: 2019-10-29 07:00:58.910
  LastPasswordChangedDate: 2019-10-29 07:00:57.000
  LastLockoutDate: 1754-01-01 00:00:00.000
  FailedPasswordAttemptCount: 0
  FailedPasswordAttemptWindowStart: 1754-01-01 00:00:00.000
  FailedPasswordAnswerAttemptCount: 0
  FailedPasswordAnswerAttemptWindowStart: 1754-01-01 00:00:00.000
  Comment: (empty)

Table [dbo].[aspnet_Users]

  ApplicationId: D1A11AC5-63B0-40A7-9320-3C88981A590C
  UserId: 004A77C7-7751-423C-A21D-1A472015F4DE
  UserName: Storefront\jitusonijk@gmail.com
  LoweredUserName: storefront\jitusonijk@gmail.com
  MobileAlias: NULL
  IsAnonymous: 0
  LastActivityDate: 2019-11-10 06:51:30.047

Table [dbo].[EventQueue]

  Id: 648A35AF-19C0-4C76-8FED-72D3EEE35CBD
  EventType: Sitecore.Eventing.Remote.UserUpdatedRemoteEvent, Sitecore.Kernel, Version=13.0.0.0, Culture=neutral, PublicKeyToken=null
  InstanceType: Sitecore.Eventing.Remote.UserUpdatedRemoteEvent, Sitecore.Kernel, Version=13.0.0.0, Culture=neutral, PublicKeyToken=null
  InstanceData: {"UserName":"Storefront\\jitusonijk@gmail.com"}
  InstanceName: sitecore-SC921sc.dev.local
  RaiseLocally: 0
  RaiseGlobally: 1
  UserName: Storefront\jitusonijk@gmail.com
  Stamp: 0x0000000000047CA8
  Created: 2019-11-10 06:51:28.600


I didn't see any reference in the Web database.

The Minions service should pick up the update from the database and update this data in the Solr CustomersScope index.


For the initial registration, you won't see the data in the CustomersScope index.


And finally, how the customer manages orders: when a newly registered customer logs in, the CD role authenticates the customer against the ASP.NET membership tables in the core database.

Sitecore flow -




 I'm excited to see the data in Solr, what OOTB options there are to customize the data, and the integration with third-party customer data management systems like Gigya.



Monday, November 4, 2019

Sitecore Experience Commerce 9.2 how to protect the Solr admin page





The latest Sitecore Commerce release, 9.2, supports the authentication plugin for Solr. It's very interesting, I'm very excited to see this option there, and I would like to thank Sitecore for including it :)

Here is the full context.

If you set up Solr for the first time, standalone or SolrCloud, no security option is enabled by default; you have to turn this feature on explicitly. To do that, there are a few options -

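Available Authentication Plugins within Solr

Solr has the following implementations of authentication plugins:
  • Kerberos Authentication Plugin
  • Basic Authentication Plugin
  • Hadoop Authentication Plugin

Available Authorization Plugins within Solr

Solr has one implementation of an authorization plugin:
  • Rule-Based Authorization Plugin
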
Now, how can we use this option in Sitecore Commerce 9.2? I have been using and extending SolrNet for more than 4 years and am happy to see that you can extend this within Sitecore. Basically, we can control how Sitecore communicates with Solr by specifying which implementation of SolrNet's IHttpWebRequestFactory interface Sitecore uses.
We can specify the implementation by patching the Sitecore.ContentSearch.DefaultSolrConfiguration.config file.

The factory class for Solr HTTP requests is in the <solrHttpWebRequestFactory> node.
For example, if we want to support basic authentication for accessing Solr, we must specify the solrHttpWebRequestFactory value in the following way:


<solrHttpWebRequestFactory type="HttpWebAdapters.BasicAuthHttpWebRequestFactory, SolrNet">
  <param hint="username">USERNAME</param>
  <param hint="password">PASSWORD</param>
</solrHttpWebRequestFactory>
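
To check from outside Sitecore that Solr now asks for credentials, a request carrying a Basic Authorization header can be prepared as in the sketch below. This is a Python illustration rather than SolrNet code; the base URL and helper names are assumptions, USERNAME and PASSWORD are the same placeholders as in the config above, and the request is only built here, not sent.

```python
import base64
from urllib.request import Request


def basic_auth_header(username, password):
    """Encode credentials the way HTTP Basic authentication expects."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return f"Basic {token}"


def build_solr_info_request(base_url, username, password):
    """Prepare (but do not send) a request to Solr's system info endpoint."""
    request = Request(f"{base_url}/admin/info/system")
    request.add_header("Authorization", basic_auth_header(username, password))
    return request


# Hypothetical local Solr URL; replace the placeholders with real credentials.
request = build_solr_info_request("https://localhost:8983/solr", "USERNAME", "PASSWORD")
print(request.full_url)
print(request.get_header("Authorization"))
```

Passing the request to urllib.request.urlopen would send it; without the header, a Solr instance that blocks unauthenticated access should answer with HTTP 401.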




That's easy, right? :)