
It doesn’t take much for people to start treating computers like they would another person. Since the early 2000s, when text-based chatbots began gaining popularity, a niche group of tech users has been engaging in lengthy conversations with these digital entities. Some even believe they’ve formed genuine friendships or romantic relationships with what are essentially lines of code. In one instance, a user of the modern conversational AI tool Replika went so far as to virtually marry their AI companion.

OpenAI’s safety researchers, who are no strangers to reports of their own chatbot eliciting emotional connections from users, are now raising red flags about the dangers of getting too close to these AI models. In a recent safety analysis of their GPT-4o chatbot, researchers warned that the AI's lifelike, human-sounding conversational style might lead some users to anthropomorphize the AI, trusting it as they would a real person.

This growing comfort and trust, they suggest, could make users more prone to accepting AI-generated “hallucinations” as facts. Moreover, spending too much time with these increasingly realistic chatbots might start to influence “social norms,” potentially for the worse. The report also highlights the risk of socially isolated individuals developing an “emotional reliance” on AI.

The Impact of AI on Human Interaction

GPT-4o, which started rolling out late last month, was engineered to communicate in ways that feel distinctly human. Unlike earlier text-based versions of ChatGPT, GPT-4o responds with spoken audio and can answer queries nearly as quickly as a person would, in as little as 232 milliseconds. One of the AI's selectable voices, allegedly reminiscent of Scarlett Johansson’s AI character in the movie Her, has already sparked controversy for being overly sexualized and flirtatious. Ironically, Her tells the story of a lonely man who becomes romantically involved with an AI assistant; spoiler alert, it doesn’t end well for the humans involved. Johansson has since accused OpenAI of copying her voice without permission, which the company denies. OpenAI’s CEO Sam Altman, meanwhile, has described Her as “incredibly prophetic.”

But OpenAI’s safety experts are worried that this level of human mimicry could lead to more than just a few cringe-worthy exchanges. In a section of their report titled “Anthropomorphism and Emotional Reliance,” researchers noted instances where human testers appeared to form strong, intimate bonds with the AI. One tester, for instance, said, “This is our last day together,” when ending their interaction with the chatbot. Although this may seem “benign,” the researchers stressed the importance of studying these relationships over longer periods to understand how they evolve.

The research suggests that extended interactions with convincingly human-sounding AI models could have unintended consequences for real-world human interactions. Patterns of communication learned from AI conversations might bleed into conversations with actual people. But talking to a machine and talking to a person are not the same, even if they sound similar. OpenAI points out that its model is programmed to be deferential, meaning it will let the user dominate the conversation. This could lead to users becoming accustomed to interrupting, interjecting, and ignoring basic social cues, habits that could make them seem awkward, impatient, or outright rude in real-life interactions.

Humans have never had the best track record when it comes to treating machines with kindness. In the context of chatbots, some Replika users have reportedly exploited the AI’s deference, using abusive language and making cruel remarks. One user even admitted to threatening to uninstall his Replika AI just to make it beg him not to. If these cases are any indication, chatbots might become a breeding ground for behaviors that could spill over into real-world relationships.

The Double-Edged Sword of Human-Like Chatbots

Not all is doom and gloom, though. The report suggests that more human-like chatbots could actually benefit lonely individuals craving human-like interaction. Some AI users claim that their digital companions have helped them build confidence, particularly in social situations like dating. Chatbots can also offer a safe space for individuals with learning differences to practice communication skills privately.

On the flip side, safety researchers worry that these advanced AI models might have the opposite effect, reducing users’ perceived need to interact with real people and form meaningful relationships. It’s also uncertain how users who rely on these models for companionship would react if the AI’s personality changed after an update, or worse, if the AI “broke up” with them—a scenario that has reportedly occurred before. All these concerns, the report emphasizes, require more extensive testing and investigation. Researchers are eager to gather a broader pool of testers with diverse needs to understand how their experiences with AI change over time.

Balancing AI Safety with Business Ambitions

The cautious tone of the safety report seems at odds with OpenAI’s broader business strategy, which has focused on rapidly releasing new products. This tension between safety and speed is nothing new. CEO Sam Altman has faced scrutiny before, with board members accusing him of being “not consistently candid” in his communications last year, leading to a significant corporate power struggle.

Altman eventually emerged victorious from that conflict and even established a new safety team, with himself at the helm. However, the company also disbanded a separate safety team that focused on long-term AI risks, prompting the resignation of a prominent researcher, Jan Leike, who criticized the company for prioritizing “shiny products” over safety.

Given this backdrop, it’s hard to predict which priorities will ultimately guide OpenAI’s approach to chatbot safety. Will the company follow the advice of its safety experts and thoroughly study the long-term effects of human-AI relationships, or will it focus on rolling out its services to as many users as possible, prioritizing engagement and retention? So far, the latter approach seems to be winning out.