Wednesday, October 16, 2024

Indexing Shared Data Items from Search Result Pages in Sitecore

We ran into an issue while trying to index shared data items—like individual cards from a listing page—as separate items in Sitecore Search. Despite the extractor logic appearing correct, some of these items simply weren’t getting indexed.

Scenario

  • The search result page lists shared data items (not individual pages, so they don’t have distinct URLs).
  • The goal is to index each card on the page as an independent item in Sitecore Search with additional tags and logic for extra data values.
  • A flag was added to bypass the "Load More" functionality (?showAll=true) to make all cards visible on a single page for indexing.
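
As a small illustration of the flag in the last bullet, the helper below appends showAll=true to a listing URL without clobbering any query parameters that are already present. The function name withShowAll and the example.com URL are made up for this sketch; only the showAll=true parameter comes from the setup described above.

```javascript
// Hypothetical helper: add the showAll=true flag to a listing URL.
function withShowAll(listingUrl) {
  const url = new URL(listingUrl);          // WHATWG URL, a global in Node.js and browsers
  url.searchParams.set('showAll', 'true');  // set() replaces an existing value instead of duplicating it
  return url.toString();
}

// Placeholder domain, existing query string preserved:
console.log(withShowAll('https://www.example.com/image-gallery?page=2'));
// prints "https://www.example.com/image-gallery?page=2&showAll=true"
```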

Extractor Code

Here’s the extractor being used:


function extract(request, response) {
  const $ = response.body; // parsed page, queried with jQuery-style selectors
  const results = [];

  // Only process the gallery listing page
  if (request.url.includes('image-gallery')) {
    $('div.row.layout').each((i, row) => {
      $(row).find('div.col-12.column.col-lg-4[gallery-layout="true"]').each((j, col) => {
        const dataAnchorTitle = $(col).attr('data-anchor-title');
        const imgSrc = $(col).find('div.gallery figure img').attr('src');

        // Skip cards that lack a title or an image
        if (dataAnchorTitle && imgSrc) {
          results.push({ title: dataAnchorTitle, image: imgSrc, type: 'gallery' });
        }
      });
    });
  }

  return results;
}

Issues Observed

  1. Only One Item Indexed: Although the page contains multiple cards, only one item was being indexed.
  2. Mandatory Fields: Items that lack a value for a field marked as mandatory in the index configuration can fail to index.
  3. Order of Extractors: The sequence of extractors (JavaScript extractor vs. XPath extractor) may have been causing conflicts.
  4. ID Generation: IDs were not being generated consistently for the items, leading to indexing failures.
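
Issues 1 and 4 may well be the same problem seen from two sides. If every card extracted from one page ends up under the same ID, later documents would overwrite earlier ones and only one would survive. The sketch below is a quick offline check that can be run against the array the extractor returns. It assumes, without confirmation from the product documentation, that a document without an id falls back to an ID derived from the page URL. The function name groupById, the fallbackId parameter, and the sample data are all hypothetical.

```javascript
// Hypothetical diagnostic: group extracted documents by the ID they would be indexed under.
// fallbackId stands in for whatever ID the indexer assigns when a document has none
// (assumed here to be derived from the page URL).
function groupById(docs, fallbackId) {
  const groups = new Map();
  for (const doc of docs) {
    const id = doc.id || fallbackId;
    if (!groups.has(id)) groups.set(id, []);
    groups.get(id).push(doc.title);
  }
  return groups;
}

// Sample documents shaped like the extractor output above (no id field).
const sample = [
  { title: 'Card A', image: '/a.jpg', type: 'gallery' },
  { title: 'Card B', image: '/b.jpg', type: 'gallery' },
  { title: 'Card C', image: '/c.jpg', type: 'gallery' },
];

const groups = groupById(sample, 'https://www.example.com/image-gallery');
console.log(`${sample.length} documents, ${groups.size} distinct ID(s)`);
// prints "3 documents, 1 distinct ID(s)"
```

If the number of distinct IDs is smaller than the number of documents, the collision is worth ruling out before looking at anything else.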

Resolution Steps

  1. Validate Extractor Logic:

    • Ensure the extractor captures all items on the page.
    • Hardcode mandatory fields temporarily to check if items get indexed correctly.
  2. Adjust Extractor Sequence:

    • Move the JavaScript Document Extractor so that it runs before the other extractors (e.g., the XPath extractor), or remove extractors that are not needed.
  3. Generate IDs Programmatically:

    • Add logic to generate unique IDs for each item in the JavaScript extractor:

      // i and j are the row and column indices from the .each() callbacks
      const id = `${dataAnchorTitle}-${i}-${j}`;
      results.push({ id, title: dataAnchorTitle, image: imgSrc, type: 'gallery' });
  4. Re-Index and Validate:

    • Re-index with only the relevant extractor and verify results.
  5. Address Errors:

    • If errors like "heartbeat error" occur, retry indexing after resolving underlying connectivity or configuration issues.
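
Steps 1 and 3 can be sketched as two small helpers that run on each document before it is pushed to results. This is only an illustration under a few assumptions: REQUIRED_FIELDS is a placeholder for whatever fields are marked mandatory in your index configuration, the helper names buildId and missingFields are invented here, and the slug-style ID is one possible format rather than anything Sitecore Search is known to require.

```javascript
// Placeholder list: replace with the fields marked as mandatory in your index configuration.
const REQUIRED_FIELDS = ['id', 'title', 'type', 'image'];

// Build an ID from the card title plus its row/column position.
// Lowercasing and collapsing non-alphanumeric runs keeps spaces and punctuation out of the ID.
// A title with no ASCII letters or digits collapses to an empty slug, but the indices
// still keep the ID unique within the page.
function buildId(title, rowIndex, colIndex) {
  const slug = String(title)
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, '-')
    .replace(/^-+|-+$/g, '');
  return `${slug}-${rowIndex}-${colIndex}`;
}

// Return the names of required fields that are missing or empty on a document.
function missingFields(doc, required = REQUIRED_FIELDS) {
  return required.filter(
    (name) => doc[name] === undefined || doc[name] === null || doc[name] === ''
  );
}

// Example document with no image, to show what the check reports.
const doc = {
  id: buildId('Summer Gala 2024!', 0, 2),
  title: 'Summer Gala 2024!',
  type: 'gallery',
};
console.log(doc.id);              // prints "summer-gala-2024-0-2"
console.log(missingFields(doc));  // prints [ 'image' ]
```

Logging the result of missingFields for each card while testing makes it easier to tell a field-mapping problem apart from an ID problem.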

Outcome

After implementing these changes:

  • All cards on the page were indexed as independent items.
  • Proper field mappings and unique ID generation ensured data integrity.
  • The reordering of extractors resolved conflicts during the indexing process.

These steps should help streamline indexing shared data items from listing pages in Sitecore Search.