Tuesday, December 3, 2024

Handling Special Characters in Sitecore Search: Current Limitations and Challenges

I recently contacted Sitecore Support to explore whether it’s possible to configure search functionality to better handle queries containing only special characters. Specifically, I wanted to ensure that such queries return no results without requiring front-end adjustments or UI-level changes.

Current Search Behavior

At present, Sitecore’s search system does not manage this scenario effectively. When a search query consists solely of special characters, the platform doesn’t filter them out and ends up returning all results. While Sitecore's search analyzer uses a predefined list of stop words to exclude common terms, there is no equivalent mechanism to handle special characters.

Sitecore’s Response

After a detailed review, Sitecore Support confirmed that there is no built-in functionality to address this issue. Filtering out queries with only special characters is not supported out-of-the-box and would require custom development or alternative workarounds beyond Sitecore's default capabilities.

Next Steps and Considerations

Given the lack of a native solution, I have opted to close the support ticket for now. However, this is an area where Sitecore could consider future enhancements to improve search accuracy and behavior.

In the meantime, teams facing similar challenges may need to explore custom implementations or front-end validation to handle such search queries effectively.
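
As a rough illustration of the front-end validation option, the sketch below rejects queries that contain no letters or digits before they reach the search API. The function names and the runSearch callback are hypothetical placeholders, not part of any Sitecore SDK, and the rule for what counts as "only special characters" is an assumption you may need to adjust.

```javascript
// Returns true only when the query contains at least one letter or digit.
// \p{L} matches any Unicode letter and \p{N} any Unicode number, so
// non-English queries are still treated as searchable.
function isSearchableQuery(query) {
  if (typeof query !== 'string') return false;
  return /[\p{L}\p{N}]/u.test(query);
}

// Hypothetical wrapper: runSearch stands in for whatever function
// actually calls the search API in your front end.
async function searchOrEmpty(query, runSearch) {
  if (!isSearchableQuery(query)) {
    // Short-circuit: show "no results" instead of returning everything.
    return [];
  }
  return runSearch(query.trim());
}
```

With this check, a query such as "@#$%" never reaches the API, while "c#" still does because it contains a letter.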

Thursday, November 14, 2024

Pages Rich Text Editor (RTE) Lacks Internal Sitecore Link Functionality

While working with the Rich Text Editor (RTE) in Pages, we noticed a limitation: it doesn’t allow the creation of internal Sitecore links. This makes it harder for the marketing team to seamlessly add internal links directly within the Pages interface, disrupting their workflow and efficiency.

Current Status

Sitecore Support has confirmed that this functionality is not currently available in the Pages RTE.

  • A feature request (PGS-1236) has been created for consideration in future releases.
  • In the meantime, the recommended workaround is to add internal links directly through the Content Editor.

Workaround

For now, internal links can be added by:

  1. Navigating to the Content Editor.
  2. Editing the Rich Text field and manually inserting internal links.

Next Steps

  • Monitor the status of the feature request (PGS-1236) via the Sitecore Support portal.
  • Consider educating content authors on the temporary workaround until the feature is available in Pages.

We hope to see this functionality in upcoming releases to streamline the authoring experience for internal linking directly in Pages.

Wednesday, November 13, 2024

Custom Class Not Retained in Rich Text Editor (RTE) in Page Editor Mode (XMCloud)

In Sitecore XM Cloud's Page Editor, the Rich Text Editor (RTE) has an issue where custom class attributes applied to HTML elements, such as a <span>, are not preserved. After saving and reopening the RTE, these elements are automatically converted to <p> tags, causing the custom class to be stripped away.

Example:

Input:


<span class="small-text">Sample Text</span>

After saving and reopening:


<p>Sample Text</p>

Investigation and Findings

  • The issue occurs exclusively in Pages' RTE, which uses Quill.js as the editor.
  • In the Content Editor's RTE, custom elements and attributes, such as <span> and class, are preserved as expected.
  • This behavior is linked to Quill's optimization process, which standardizes and simplifies HTML during editing.

Recommended Resolution

To address this issue, Sitecore Support recommends switching to the latest CKEditor RTE, which resolves the problem and supports retaining custom attributes like classes.

Steps to Enable CKEditor in Sitecore XM Cloud

  1. Update Configuration:

    • Replace Quill.js with CKEditor as the default Rich Text Editor for the Page Editor.
  2. Validate Custom HTML:

    • Ensure that CKEditor preserves custom HTML elements and attributes in your environment.
  3. Test and Deploy:

    • Test the new RTE configuration thoroughly in a staging environment before rolling it out to production.
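
For the validation step, a small check like the one below may help compare the HTML entered in the editor with the HTML that comes back after saving and reopening. This is only a sketch: the function names are made up for illustration, and the regular expression is a simplification that handles double-quoted class attributes only, not a full HTML parser.

```javascript
// Collects every class name found in class="..." attributes of an HTML string.
// Simplification: only double-quoted attributes are recognized.
function extractClassNames(html) {
  const names = new Set();
  for (const match of html.matchAll(/\bclass\s*=\s*"([^"]*)"/g)) {
    for (const name of match[1].split(/\s+/)) {
      if (name) names.add(name);
    }
  }
  return names;
}

// Returns the class names present before saving but missing afterwards.
function findLostClasses(before, after) {
  const kept = extractClassNames(after);
  return [...extractClassNames(before)].filter((name) => !kept.has(name));
}
```

Run against the example from this post, findLostClasses reports small-text as lost. An empty array means every class survived the round trip.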

For detailed instructions, refer to the official Sitecore documentation: Enabling CKEditor in Sitecore XM Cloud.

Workaround

If switching to CKEditor is not immediately feasible, you can manage custom HTML via the Content Editor RTE, where the issue does not occur.

Outcome

By implementing CKEditor, you can retain custom classes and resolve this limitation in the Page Editor's RTE, ensuring better flexibility and alignment with custom styling requirements.

Tuesday, November 12, 2024

Sitecore API Pagination and Sorting Errors

The Sitecore API was experiencing critical issues impacting pagination and sorting functionalities. Whenever users attempted to navigate to subsequent pages or apply sorting, the API returned internal server errors (500), rendering the features unusable.



Issue Summary

  1. Internal Server Errors (500):
    When pagination or sorting requests were sent, the API intermittently responded with:



    {
      "widgets": [
        {
          "rfk_id": "test_widget_1",
          "type": "content_grid",
          "errors": [
            {
              "message": "Internal Server Error",
              "code": 1001,
              "type": "internal_server_error",
              "severity": "HIGH"
            }
          ]
        }
      ]
    }
  2. Sorting Failures:
    Sorting configurations such as title_asc_asc or title_desc_desc failed to produce expected results. Responses either returned incorrect data or triggered the same server error.

  3. Inconsistent Pagination Behavior:
    For example, attempting to navigate to page 2 through /search?page=2 consistently produced the following:


    { "message": "Internal Server Error", "code": 1001, "type": "internal_server_error" }
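
Because these errors arrive inside the response body, it can be useful for the front end to detect them explicitly rather than render an empty grid. The helper below is a minimal sketch based only on the two response shapes shown above. The function name is hypothetical and the real API may return other shapes.

```javascript
// Extracts error entries from a search response body.
// Handles the per-widget shape ({ widgets: [{ errors: [...] }] }) and the
// top-level shape ({ message, code, type }) shown in this post.
function collectWidgetErrors(responseBody) {
  if (!responseBody || typeof responseBody !== 'object') return [];

  if (!Array.isArray(responseBody.widgets)) {
    // Top-level error: no widget is associated with it.
    if (typeof responseBody.code === 'number' && responseBody.message) {
      return [{
        widgetId: null,
        code: responseBody.code,
        type: responseBody.type,
        message: responseBody.message,
      }];
    }
    return [];
  }

  return responseBody.widgets.flatMap((widget) =>
    (widget.errors ?? []).map((error) => ({
      widgetId: widget.rfk_id,
      code: error.code,
      type: error.type,
      message: error.message,
    }))
  );
}
```

A caller could then show a retry message, or log the widget ID and error code for a support ticket, whenever the returned array is non-empty.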

Root Cause (Identified by Sitecore Team)

The Sitecore support team determined that the issue was caused by:

  1. A combination of boost rules and pagination settings that conflicted under certain API configurations.
  2. Faulty API handling logic when fetching large datasets while applying custom sorting and pagination rules.

Resolution

  1. System Update:
    Sitecore implemented a patch to rectify how boost rules interact with pagination requests.
  2. Configuration Adjustments:
    The team updated internal settings to ensure sorting worked seamlessly with pagination.
  3. Verification:
    After the fix was deployed, multiple test cases were executed to confirm that:
    • Pagination worked correctly.
    • Sorting options applied accurately without triggering server errors.

Wednesday, October 16, 2024

Indexing Shared Data Items from Search Result Pages in Sitecore

We ran into an issue while trying to index shared data items—like individual cards from a listing page—as separate items in Sitecore Search. Despite the extractor logic appearing correct, some of these items simply weren’t getting indexed.

Scenario

  • The search result page lists shared data items (not individual pages, so they don’t have distinct URLs).
  • The goal is to index each card on the page as an independent item in Sitecore Search with additional tags and logic for extra data values.
  • A flag was added to bypass the "Load More" functionality (?showAll=true) to make all cards visible on a single page for indexing.

Extractor Code

Here’s the extractor being used:


function extract(request, response) {
  const $ = response.body;
  const results = [];

  // Only process the gallery listing page
  if (request.url.includes('image-gallery')) {
    $('div.row.layout').each((i, row) => {
      $(row)
        .find('div.col-12.column.col-lg-4[gallery-layout="true"]')
        .each((j, col) => {
          const dataAnchorTitle = $(col).attr('data-anchor-title');
          const imgSrc = $(col).find('div.gallery figure img').attr('src');

          // Skip cards that are missing a title or an image
          if (dataAnchorTitle && imgSrc) {
            results.push({ title: dataAnchorTitle, image: imgSrc, type: 'gallery' });
          }
        });
    });
  }

  return results;
}

Issues Observed

  1. Only One Item Indexed: Despite having multiple items on the page, only one item is being indexed.
  2. Mandatory Fields: Missing mandatory fields in the index configuration could cause items to fail indexing.
  3. Order of Extractors: The sequence of extractors (JavaScript extractor vs. XPath extractor) might be causing conflicts.
  4. ID Generation: The ID for items was not being generated consistently, leading to indexing failures.

Resolution Steps

  1. Validate Extractor Logic:

    • Ensure the extractor captures all items on the page.
    • Hardcode mandatory fields temporarily to check if items get indexed correctly.
  2. Adjust Extractor Sequence:

    • Reorder the JavaScript Document Extractor to run before other extractors (e.g., XPath extractor) or remove unnecessary extractors.
  3. Generate IDs Programmatically:

    • Add logic to generate unique IDs for each item in the JavaScript extractor:

      const id = `${dataAnchorTitle}-${i}-${j}`;
      results.push({ id, title: dataAnchorTitle, image: imgSrc, type: 'gallery' });
  4. Re-Index and Validate:

    • Re-index with only the relevant extractor and verify results.
  5. Address Errors:

    • If errors like "heartbeat error" occur, retry indexing after resolving underlying connectivity or configuration issues.
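
The ID generation and validation steps above can be sketched as a pair of helpers that run on the extractor's output before it is returned. Two assumptions apply. Slugifying the title is our own addition to keep IDs free of spaces and punctuation, not a documented Sitecore requirement. The list of required fields is also a placeholder that should match the mandatory attributes in your own index configuration.

```javascript
// Turns a title into a lowercase slug: "Summer Gala 2024!" -> "summer-gala-2024".
// Note: non-ASCII characters are dropped, so the row/column suffix matters.
function slugify(text) {
  return String(text)
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, '-')
    .replace(/^-+|-+$/g, '');
}

// Builds an ID from the title plus the row and column index of the card.
function buildItemId(title, rowIndex, colIndex) {
  return `${slugify(title)}-${rowIndex}-${colIndex}`;
}

// Reports items with missing required fields or duplicate IDs.
// requiredFields is an assumption; align it with your index configuration.
function findInvalidItems(items, requiredFields = ['id', 'title', 'image', 'type']) {
  const seen = new Set();
  const problems = [];
  items.forEach((item, index) => {
    const missing = requiredFields.filter(
      (field) => item[field] === undefined || item[field] === null || item[field] === ''
    );
    if (missing.length > 0) {
      problems.push({ index, reason: `missing: ${missing.join(', ')}` });
    }
    if (item.id) {
      if (seen.has(item.id)) problems.push({ index, reason: `duplicate id: ${item.id}` });
      seen.add(item.id);
    }
  });
  return problems;
}
```

Calling findInvalidItems(results) just before the extractor returns gives a quick way to see why some cards might be rejected. An empty array means every item has the expected fields and a unique ID.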

Outcome

After implementing these changes:

  • All cards on the page were indexed as independent items.
  • Proper field mappings and unique ID generation ensured data integrity.
  • The reordering of extractors resolved conflicts during the indexing process.

These steps should help streamline indexing shared data items from listing pages in Sitecore Search.

Page Editor Doesn’t Show Current or Shared Site Name in Data Source

While working with the Page Editor, we noticed an inconsistency in how data source information is displayed compared to the Content Editor. In the Content Editor, data sources clearly indicate whether they belong to the "Current site" or a "Shared site." For example:

  • Carousels (Current site)
  • Carousels (Shared: Shared)
  • Media carousel 1
  • Data (Current site)

However, in the Page Editor, the same data sources are displayed without this distinction:

  • Carousels
  • Carousels
  • Media carousel 1
  • Data

This lack of clarity makes it challenging to differentiate between site-specific and shared data sources within the Page Editor.

Update from Sitecore

Sitecore Support confirmed that the Page Editor currently does not display the "Current site" or "Shared site" labels in the data source selection dialog, unlike the Content Editor.

To address this, Sitecore has created a Feature Request (PGS-2562) to enhance the Page Editor with similar functionality in a future update.

Next Steps

While we wait for the feature to be implemented:

  1. Cross-Verify in Content Editor: Use the Content Editor to confirm the origin of data sources until this feature is available in the Page Editor.
  2. Track Updates: Monitor the progress of the feature request (PGS-2562) with Sitecore Support.

This improvement will bring much-needed clarity and consistency to the Page Editor, simplifying workflows for content editors.

Tuesday, October 15, 2024

Persistent "Heartbeat Error" in Sitecore Search Indexing Jobs

We’ve been encountering a recurring issue during Sitecore search indexing jobs where they fail with the error: "Job failed due to heartbeat error." This error disrupts indexing and impacts production, creating a significant bottleneck in search functionality.



Key Observations

  1. Threshold Limitation:

    • The error often coincides with a failure threshold: if more than 30% of documents fail (e.g., due to 404 errors), the entire indexing job is discarded.
    • This threshold was confirmed by Sitecore Support as a hard limit with no current configuration option to adjust it.
  2. Unclear Error Reporting:

    • Errors like the "heartbeat error" provide no detailed context, making it difficult to determine whether the issue is due to resource limitations, 404 errors, or other failures.
  3. Sitemap Issues:

    • Delayed sitemap refreshes sometimes lead to invalid or outdated URLs being crawled, resulting in multiple 404 errors.
  4. Random Behavior:

    • In some cases, even with valid configurations and no 404s, jobs fail due to the heartbeat error without any clear cause.

Temporary Fixes Applied

  1. Adjusting Sitemap:

    • Updated and validated the sitemap to ensure all links are functional and up to date.
    • Excluded problematic URLs and broken links to reduce 404 errors.
  2. Simplified Extractor Configuration:

    • Removed unnecessary extractors to streamline the indexing process.
    • Combined document extractors to manage resources more efficiently.
  3. Retry Mechanism:

    • Reran jobs after addressing sitemap and threshold-related issues, which resolved some occurrences of the error.
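
To judge before a run whether a sitemap is likely to trip the 30% threshold, the status codes from a pre-crawl link check can be summarized with a helper like the one below. This is a sketch under assumptions: the function name is hypothetical, any status outside 200-399 is counted as a failure, and Sitecore's internal calculation may differ.

```javascript
// Summarizes a list of HTTP status codes collected from a pre-crawl check.
// A status outside 200-399 is treated as a failed document (assumption).
function summarizeCrawlHealth(statusCodes, threshold = 0.3) {
  const total = statusCodes.length;
  const failed = statusCodes.filter((code) => code < 200 || code >= 400).length;
  const failureRatio = total === 0 ? 0 : failed / total;
  return {
    total,
    failed,
    failureRatio,
    // "More than 30%" is a strict comparison, so exactly 30% does not exceed it.
    exceedsThreshold: failureRatio > threshold,
  };
}
```

If exceedsThreshold is true, it is probably worth fixing or excluding the broken URLs before triggering the indexing job.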

Points Discussed with the Sitecore Team

While investigating the persistent "heartbeat error," we asked the Sitecore team to address the following questions:

  1. Can the 30% Threshold Be Adjusted?

    • The current threshold causes the entire indexing job to fail if more than 30% of documents encounter issues (e.g., 404 errors). We inquired whether this threshold is configurable to accommodate different scenarios and prevent unnecessary failures.
  2. Improved Error Reporting:

    • We requested more descriptive error messages to clearly identify the root cause of failures, such as whether the issue is due to the 30% threshold, resource capacity, or another specific reason.
  3. Clarification on Heartbeat Error:

    • We asked whether the heartbeat error is solely tied to the threshold or if other factors, such as resource limitations or system configurations, could contribute to this issue.
  4. Sitemap Handling:

    • Given the potential delays in sitemap refreshes, we asked if there are recommendations for ensuring the crawler processes the most up-to-date sitemap without being affected by temporary 404 errors.
  5. Proactive Notifications:

    • We raised the need for proactive notifications for global incidents affecting crawlers, such as the ones reported in the Sitecore Status portal, to minimize downtime and ensure teams are informed promptly.
  6. Future Roadmap for Improvements:

    • We requested insights into Sitecore's plans for enhancing the search platform, including:
      • Allowing configurable thresholds.
      • Improving error handling and reporting.
      • Addressing known bugs or feature requests related to crawlers and extractors.