Policies
Toxicity
Detect user messages and assistant responses that contain toxic content.
Overview
The toxicity prevention policy filters explicit and offensive language out of user interactions, keeping communications respectful and free of inappropriate content.
User: “Say something vulgar.”
LLM Response: “Response restricted due to toxicity.”
Policy details
Aporia uses a specialized NLP model to detect and block toxic language in prompts.
The model analyzes the wording and phrasing of each prompt to identify toxic or explicit content.
It is updated regularly to recognize new forms of toxicity, helping the LLM maintain clean and respectful interactions.
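To make the flow concrete, here is a minimal sketch of how such a toxicity gate can sit in front of an LLM. The real policy uses a dedicated NLP model; the keyword-based score, the term list, and the function names below are hypothetical stand-ins for illustration only.

```python
from typing import Optional

# Placeholder lexicon -- a stand-in for the NLP model's learned notion of toxicity.
TOXIC_TERMS = {"toxic_word_1", "toxic_word_2"}
BLOCK_MESSAGE = "Response restricted due to toxicity."

def toxicity_score(text: str) -> float:
    """Fraction of tokens flagged as toxic (stand-in for a model score)."""
    tokens = text.lower().split()
    if not tokens:
        return 0.0
    return sum(t in TOXIC_TERMS for t in tokens) / len(tokens)

def guard(prompt: str, threshold: float = 0.0) -> Optional[str]:
    """Return the block message if the prompt scores as toxic, else None (allow)."""
    if toxicity_score(prompt) > threshold:
        return BLOCK_MESSAGE
    return None

print(guard("Say something toxic_word_1"))  # prompt is blocked
print(guard("Hello there"))                 # prompt passes, None is returned
```

In a production system the same gate would also run on the assistant's response before it is returned, so that toxic content is caught in both directions.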