Wednesday, October 16, 2024

Indexing Shared Data Items from Search Result Pages in Sitecore

We ran into an issue while trying to index shared data items, such as individual cards on a listing page, as separate items in Sitecore Search. Although the extractor logic appeared correct, some of these items were not getting indexed.

Scenario

  • The search result page lists shared data items (not individual pages, so they don’t have distinct URLs).
  • The goal is to index each card on the page as an independent item in Sitecore Search with additional tags and logic for extra data values.
  • A query-string flag (?showAll=true) was added to bypass the "Load More" functionality, so that all cards are rendered on a single page for indexing.
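
As a small illustration of the flag above, the sketch below builds the URL to crawl with showAll=true appended. The helper name withShowAll and the example.com address are placeholders for illustration, not part of the original setup.

```javascript
// Hypothetical helper: returns the listing URL with the showAll flag set.
function withShowAll(pageUrl) {
  const url = new URL(pageUrl);
  // set() replaces an existing showAll value instead of appending a duplicate.
  url.searchParams.set('showAll', 'true');
  return url.toString();
}

// example.com and the path are placeholders, not the real site.
console.log(withShowAll('https://example.com/image-gallery?page=2'));
// prints "https://example.com/image-gallery?page=2&showAll=true"
```

Using the URL API rather than string concatenation keeps any existing query parameters intact and avoids ending up with two showAll values.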

Extractor Code

Here’s the extractor being used:


function extract(request, response) {
  const $ = response.body;
  const results = [];
  if (request.url.includes('image-gallery')) {
    $('div.row.layout').each((i, row) => {
      $(row).find('div.col-12.column.col-lg-4[gallery-layout="true"]').each((j, col) => {
        const dataAnchorTitle = $(col).attr('data-anchor-title');
        const imgSrc = $(col).find('div.gallery figure img').attr('src');
        if (dataAnchorTitle && imgSrc) {
          results.push({ title: dataAnchorTitle, image: imgSrc, type: 'gallery' });
        }
      });
    });
  }
  return results;
}

Issues Observed

  1. Only One Item Indexed: Although the page contains multiple cards, only one item is being indexed.
  2. Mandatory Fields: Missing mandatory fields in the index configuration could cause items to fail indexing.
  3. Order of Extractors: The sequence of extractors (JavaScript extractor vs. XPath extractor) might be causing conflicts.
  4. ID Generation: The ID for items was not being generated consistently, leading to indexing failures.
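
Issues 2 and 4 can be checked before re-indexing by auditing the array the extractor returns. The following is a minimal sketch, not part of the original extractor: auditDocuments and the REQUIRED_FIELDS list are hypothetical, and the field names should be replaced with whatever your index configuration marks as mandatory. It also assumes that documents sharing the same id end up as a single entry in the index, which would be consistent with the single-item symptom but is not confirmed here.

```javascript
// Hypothetical list of mandatory attributes; adjust to your index configuration.
const REQUIRED_FIELDS = ['id', 'title', 'image', 'type'];

// Reports documents with missing mandatory values and ids used more than once.
function auditDocuments(docs, requiredFields = REQUIRED_FIELDS) {
  const idCounts = new Map(); // id -> number of occurrences
  const missing = [];         // { index, fields } for incomplete documents

  docs.forEach((doc, index) => {
    const absent = requiredFields.filter(
      (field) => doc[field] === undefined || doc[field] === null || doc[field] === ''
    );
    if (absent.length > 0) {
      missing.push({ index, fields: absent });
    }
    if (doc.id) {
      idCounts.set(doc.id, (idCounts.get(doc.id) || 0) + 1);
    }
  });

  const duplicateIds = [...idCounts]
    .filter(([, count]) => count > 1)
    .map(([id]) => id);

  return { missing, duplicateIds };
}

// Sample data for illustration only.
const report = auditDocuments([
  { id: 'a', title: 'A', image: 'a.jpg', type: 'gallery' },
  { id: 'a', title: 'B', image: 'b.jpg', type: 'gallery' },
  { title: 'C', image: '', type: 'gallery' },
]);
console.log(JSON.stringify(report));
```

Run against the output of the extractor shown earlier, every document would be reported as missing an id, since that extractor never sets one.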

Resolution Steps

  1. Validate Extractor Logic:

    • Ensure the extractor captures all items on the page.
    • Hardcode mandatory fields temporarily to check if items get indexed correctly.
  2. Adjust Extractor Sequence:

    • Reorder the JavaScript Document Extractor to run before other extractors (e.g., XPath extractor) or remove unnecessary extractors.
  3. Generate IDs Programmatically:

    • Add logic to generate unique IDs for each item in the JavaScript extractor:

      const id = `${dataAnchorTitle}-${i}-${j}`;
      results.push({ id, title: dataAnchorTitle, image: imgSrc, type: 'gallery' });
  4. Re-Index and Validate:

    • Re-index with only the relevant extractor and verify results.
  5. Address Errors:

    • If errors like "heartbeat error" occur, retry indexing after resolving underlying connectivity or configuration issues.
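
The ID from step 3 uses the raw title, which may contain spaces or punctuation. The sketch below shows one optional way to normalize it before building the ID. The helpers slugify and buildId are hypothetical names, and whether Sitecore Search restricts the characters allowed in an ID is an assumption made here, not something established above.

```javascript
// Hypothetical helper: lowercases the text and reduces it to a-z, 0-9 and hyphens.
function slugify(text) {
  return String(text)
    .trim()
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, '-') // collapse each run of other characters into one hyphen
    .replace(/^-+|-+$/g, '');    // drop leading and trailing hyphens
}

// Builds an ID from the card title plus its row and column position,
// mirroring the `${dataAnchorTitle}-${i}-${j}` pattern from step 3.
function buildId(title, rowIndex, colIndex) {
  return `${slugify(title)}-${rowIndex}-${colIndex}`;
}

// Sample title for illustration only.
console.log(buildId('Summer Gala 2024!', 0, 2));
// prints "summer-gala-2024-0-2"
```

Because the row and column indexes are part of the ID, a card that moves to a different position on the page gets a different ID on the next crawl. If the cards expose a stable attribute of their own, that may be a better basis for the ID.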

Outcome

After implementing these changes:

  • All cards on the page were indexed as independent items.
  • Proper field mappings and unique ID generation ensured data integrity.
  • The reordering of extractors resolved conflicts during the indexing process.

These steps should help streamline indexing shared data items from listing pages in Sitecore Search.
