OpenAI has confirmed that ChatGPT conversations suggesting a serious risk of harm to others may be escalated beyond automated moderation to human review, and in some cases referred to law enforcement. The company detailed these measures in a recent blog post, shedding light on how it handles safety-critical interactions.
The AI developer explained that its safeguards are designed to distinguish between self-harm and threats toward others. “ChatGPT is designed to provide empathetic support to users experiencing distress,” OpenAI stated, adding that individuals expressing suicidal thoughts are directed to professional helplines such as 988 in the U.S. or Samaritans in the U.K. However, these cases are not escalated to police, as the company prioritizes user privacy and autonomy.
When a user expresses intent to harm another person, however, the response is more direct. Such conversations are routed to a specialized review pipeline, where trained human moderators examine the content. If they determine there is a credible and imminent threat, OpenAI may alert authorities to prevent harm. The company also noted that accounts involved in such incidents can be suspended or banned to prevent misuse.
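OpenAI's blog post does not disclose how this pipeline is implemented. Purely as an illustration, the Python sketch below models the decision flow the article describes: self-harm cases receive a supportive, helpline-oriented response, threats toward others go to a human reviewer, and a law-enforcement referral happens only when the reviewer judges the threat both credible and imminent. Every name, keyword, and rule in the sketch is a hypothetical stand-in, not OpenAI's actual system.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Callable


class RiskCategory(Enum):
    NONE = auto()
    SELF_HARM = auto()          # supportive response, never referred to police
    THREAT_TO_OTHERS = auto()   # routed to human review


@dataclass
class ReviewDecision:
    credible: bool  # reviewer judges the threat believable
    imminent: bool  # reviewer judges harm likely in the near term


def classify(message: str) -> RiskCategory:
    """Hypothetical stand-in for an automated safety classifier."""
    text = message.lower()
    if "end my life" in text or "hurt myself" in text:
        return RiskCategory.SELF_HARM
    if "attack" in text or "hurt them" in text:
        return RiskCategory.THREAT_TO_OTHERS
    return RiskCategory.NONE


def route(message: str, human_review: Callable[[str], ReviewDecision]) -> str:
    """Apply the escalation policy the article describes to a single message."""
    category = classify(message)
    if category is RiskCategory.SELF_HARM:
        # Directed to helplines; not escalated to law enforcement.
        return "provide_support_and_helplines"
    if category is RiskCategory.THREAT_TO_OTHERS:
        decision = human_review(message)
        if decision.credible and decision.imminent:
            return "notify_authorities_and_suspend_account"
        return "warn_or_suspend_account"
    return "respond_normally"


if __name__ == "__main__":
    # Example reviewer that treats every flagged threat as credible and imminent.
    escalating_reviewer = lambda msg: ReviewDecision(credible=True, imminent=True)
    print(route("I'm going to attack my neighbor", escalating_reviewer))
```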
OpenAI acknowledged a key limitation in its safety systems: their reliability tends to decrease in longer or repeated conversations, potentially allowing inconsistent or unsafe responses to slip through. The company said it is actively working to reinforce safeguards across extended chats, aiming to maintain consistency and minimize risk.
Beyond direct threats, OpenAI is developing systems to intervene earlier in other risky behaviors, including extreme sleep deprivation or dangerous stunts. These interventions are designed to ground users in reality, redirect them toward professional help, and prevent escalation. The company is also working on parental controls for teen users and exploring ways to connect users with trusted contacts or licensed therapists when crises emerge.
The blog post underscores an important point for users: conversations with ChatGPT are not entirely private when they involve credible threats of harm to others. In such situations, trained moderators may review chats, and in rare but serious cases, law enforcement may be notified.