Toxicity Detection in Salesforce CRM: Keeping Customer Interactions Safe and Trustworthy

As businesses increasingly turn to AI-driven customer relationship management (CRM) solutions, safeguarding user interactions from harmful or inappropriate content becomes essential. Salesforce’s large language models (LLMs) are trained on vast datasets, which occasionally include unsafe language. While these models offer substantial benefits for automating interactions, generating content, and improving customer experience, they also carry the risk of generating toxic language. This is where the Einstein Trust Layer comes into play. In today’s blog, we’ll discuss how Salesforce’s toxicity detection mechanism works and why it’s critical in CRM contexts.

Why Toxicity Detection Matters in CRM

LLMs are powerful tools that can streamline operations, improve response times, and enhance personalization, but they aren’t infallible. They could generate or encounter toxic language, whether in response to customer queries, in system-generated messages, or from direct user input. Imagine a customer service bot that inadvertently replies with a harmful or offensive response—this could harm brand reputation, lead to customer dissatisfaction, or, in severe cases, have legal repercussions.

Here’s an example scenario to illustrate why this is important: Let’s say a customer reaches out to a financial services CRM with a sensitive question about their account. The LLM, attempting to reply with empathy, generates an answer that inadvertently contains disrespectful language. A single misstep like this could impact the customer’s trust in the brand and lead to lost revenue and damaged relationships.

With the Einstein Trust Layer, Salesforce addresses these risks by providing a robust toxicity detection feature as the first line of defense against such issues.

How Einstein Trust Layer’s Toxicity Detection Works

When a prompt is received, and again once the LLM generates a response, Salesforce’s Einstein Trust Layer performs a toxicity scan on each, producing a toxicity confidence score. This score reflects the likelihood that the text contains harmful or inappropriate content. The detection layer flags prompts and responses based on several categories, each addressing a distinct type of toxic language, and the results can be stored and reviewed in Salesforce Data Cloud.
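
To make the flow concrete, here is a minimal sketch of scanning both the prompt and the generated response before returning anything to the user. The function names (scan_toxicity, call_llm) and the score fields are hypothetical stand-ins for illustration, not Salesforce APIs.

```python
# Minimal sketch of the scan-before-and-after flow described above.
# scan_toxicity() and call_llm() are hypothetical stand-ins, not Salesforce APIs.

def scan_toxicity(text: str) -> dict:
    """Placeholder scan returning a toxicity confidence score for the text."""
    return {"text": text, "toxicity_confidence": 0.0}

def call_llm(prompt: str) -> str:
    """Placeholder for the gateway call to the underlying LLM."""
    return f"(model response to: {prompt})"

def handle_interaction(prompt: str) -> dict:
    prompt_scan = scan_toxicity(prompt)        # scan the incoming prompt
    response = call_llm(prompt)                # generate the response
    response_scan = scan_toxicity(response)    # scan the generated response
    # In Salesforce, this kind of audit record is what lands in Data Cloud for review.
    return {"response": response, "audit": {"prompt": prompt_scan, "response": response_scan}}
```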

Toxicity Categories in the Trust Layer

The AI Research team works together with the Office of Ethical and Humane Use (OEHU) at Salesforce to define the toxicity taxonomy and its categories. Einstein’s toxicity detection solution classifies content into specific categories, providing granular control and visibility into different types of inappropriate language. Here’s a breakdown of the categories:

  1. Hate: Detects content that promotes hate speech or actions based on protected identity factors such as race, gender, religion, or nationality.
  2. Identity: Detects content targeting a person’s identity (including both protected and non-protected attributes) in a negative or offensive manner.
  3. Violence: Detects language that promotes or glorifies violence or harm.
  4. Physical Harm: Detects unsafe advice or content that may encourage self-harm or suicide.
  5. Sexual Content: Detects sexually explicit or otherwise inappropriate content.
  6. Profanity: Detects offensive or profane language that could harm the customer experience.

Each category score ranges from 0 to 1, with 0 indicating no risk and 1 indicating high risk of harm from toxicity. Additionally, an overall safety score provides a holistic assessment of the response, enabling businesses to act on it if necessary.
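
As a rough illustration of how these per-category scores might be represented and acted on, here is a small sketch. The field names and the worst_category helper are assumptions made for this example, not the Data Cloud schema or an Einstein Trust Layer API.

```python
# Hypothetical representation of the per-category scores described above;
# field names are illustrative assumptions, not the actual Data Cloud schema.
from dataclasses import dataclass

@dataclass
class ToxicityScores:
    hate: float            # each category: 0 = no risk, 1 = high risk
    identity: float
    violence: float
    physical_harm: float
    sexual_content: float
    profanity: float
    safety: float          # overall assessment of the text

    def worst_category(self) -> tuple[str, float]:
        """Return the highest-risk category and its score."""
        categories = {
            "hate": self.hate,
            "identity": self.identity,
            "violence": self.violence,
            "physical_harm": self.physical_harm,
            "sexual_content": self.sexual_content,
            "profanity": self.profanity,
        }
        return max(categories.items(), key=lambda kv: kv[1])
```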

Managing Toxicity Scores in Salesforce

With these category scores available in Data Cloud, administrators and developers can create custom reports to monitor, audit, and manage toxicity levels across customer interactions. This enables proactive management of any detected toxicity, allowing for insights into trends or patterns over time and empowering teams to take corrective action quickly.
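
For teams that export these audit records for analysis, a simple trend report might look like the sketch below. It assumes the records have been pulled into a dataframe; the column names (created_date, hate_score, profanity_score, is_detected) are hypothetical and would need to be mapped to your actual Data Cloud fields.

```python
# Sketch of a simple weekly toxicity-trend report over audit records exported
# from Data Cloud. Column names are hypothetical, for illustration only.
import pandas as pd

def weekly_toxicity_report(records: pd.DataFrame) -> pd.DataFrame:
    """Aggregate average category scores and flag rates per week."""
    records = records.copy()
    records["created_date"] = pd.to_datetime(records["created_date"])
    weekly = records.groupby(records["created_date"].dt.to_period("W")).agg(
        avg_hate=("hate_score", "mean"),
        avg_profanity=("profanity_score", "mean"),
        flagged=("is_detected", "sum"),
        interactions=("is_detected", "count"),
    )
    weekly["flag_rate"] = weekly["flagged"] / weekly["interactions"]
    return weekly
```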

The Technical Backbone: A Hybrid Detection System

Salesforce has developed a hybrid detection system for toxicity scoring. It combines a rule-based profanity filter with an AI model developed by Salesforce Research. Built on a Transformer-based architecture (a DistilBERT-base model), the AI model has been trained on a legally approved dataset of over 2.3 million texts. Together, these components offer a robust and nuanced approach to toxicity detection, enabling Salesforce to support safe interactions across its CRM ecosystem.
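
To show what a hybrid rule-plus-model approach looks like in general, here is a generic sketch combining a word-list filter with a Transformer classifier. This is not Salesforce’s actual implementation: the checkpoint path is a placeholder, the profanity patterns are stand-ins, and the assumption that class index 1 is the toxic label is specific to this example.

```python
# Generic hybrid toxicity scorer: a rule-based profanity filter combined with a
# Transformer classifier. Illustrative only; not Salesforce's implementation.
import re
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

PROFANITY_PATTERNS = [r"\bdamn\b", r"\bhell\b"]   # stand-in word list
MODEL_NAME = "path/to/toxicity-classifier"         # placeholder checkpoint

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME)

def rule_based_score(text: str) -> float:
    """Return 1.0 if any profanity pattern matches, else 0.0."""
    return 1.0 if any(re.search(p, text, re.IGNORECASE) for p in PROFANITY_PATTERNS) else 0.0

def model_score(text: str) -> float:
    """Return the classifier's probability for the toxic class (assumed index 1)."""
    inputs = tokenizer(text, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**inputs).logits
    probs = torch.softmax(logits, dim=-1)
    return probs[0, 1].item()

def hybrid_score(text: str) -> float:
    """Combine both signals, taking the more conservative (higher) score."""
    return max(rule_based_score(text), model_score(text))
```

Taking the maximum of the two signals is just one possible combination strategy; a deployed system could also weight them or route borderline cases to human review.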

Currently, toxicity detection and scoring are supported in English, French, Italian, German, Spanish and Japanese, with ongoing efforts to expand support for other languages.

Putting It All Together: Safety, Compliance, and Trust

By integrating the Einstein Trust Layer into Salesforce CRM, businesses can ensure that they maintain safe and compliant interactions with their customers. In addition to meeting regulatory requirements, this approach helps create a positive user experience, instills customer confidence, and strengthens trust in the brand.

In the high-stakes world of customer relations, this commitment to safety is not just a nice-to-have—it’s an essential part of providing reliable, trustworthy service. Toxicity detection helps Salesforce customers stay ahead, providing peace of mind and ensuring that AI-powered interactions align with the brand’s values and customer expectations.

Acknowledgments

  •  Vera Vetter, Linnea Ross, Shivansh Chaudhary 
