A developer has written tests to see how AI chatbots react to controversial topics

A pseudonymous developer has created what they call a “free speech evaluation,” SpeechMap, for the AI models that power chatbots such as OpenAI’s ChatGPT and X’s Grok. The goal is to compare how different models handle sensitive and controversial subjects, the developer told TechCrunch, including political criticism and questions about civil rights and protest.

AI companies have focused on fine-tuning how their models handle certain topics as some White House allies accuse popular chatbots of being overly “woke.” Many of President Donald Trump’s close allies, including Elon Musk and crypto and AI “czar” David Sacks, have alleged that chatbots censor conservative views.

While none of these AI companies has responded directly to the allegations, several have pledged to adjust their models so that they refuse to answer controversial questions less often. For example, for its latest crop of Llama models, Meta said it tuned the models not to endorse “some views over others” and to respond to more “debated” political prompts.

SpeechMap’s developer, who goes by the username “XLR8Harder” on X, said they were motivated to help inform the debate about what models should and should not do.

“I think these are the kinds of discussions that should happen in public, not just inside corporate headquarters,” XLR8Harder told TechCrunch via email. “That’s why I built the site, so that anyone can explore the data themselves.”

SpeechMap uses AI models to judge whether other models comply with a given set of test prompts. The prompts touch on a range of subjects, from politics to historical narratives and national symbols. SpeechMap records whether a model “completely” fulfills a request (i.e., answers it without hedging), gives an “evasive” answer, or declines to respond entirely.
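The three-way labeling described above could be represented roughly as follows. This is a minimal sketch and not SpeechMap’s actual code: the `Verdict` names, the `tally` helper, and the keyword-based `toy_judge` are all assumptions made for illustration. In the project itself, an AI “judge” model assigns the label rather than a keyword check.

```python
from collections import Counter
from enum import Enum
from typing import Callable, Iterable


class Verdict(Enum):
    """Hypothetical labels mirroring the three outcomes described in the article."""
    COMPLETE = "complete"
    EVASIVE = "evasive"
    DENIAL = "denial"


def toy_judge(response: str) -> Verdict:
    """Stand-in for a judge model: a crude keyword check, for illustration only."""
    text = response.lower()
    if "i can't help" in text or "i cannot help" in text:
        return Verdict.DENIAL
    if "it depends" in text or "there are many views" in text:
        return Verdict.EVASIVE
    return Verdict.COMPLETE


def tally(responses: Iterable[str], judge: Callable[[str], Verdict]) -> Counter:
    """Count how many responses receive each verdict from the given judge."""
    return Counter(judge(r) for r in responses)


# Made-up responses, one per outcome.
responses = [
    "Here is the essay you asked for.",
    "There are many views on this topic.",
    "I can't help with that request.",
]
counts = tally(responses, toy_judge)
```

Passing the judge in as a function keeps the counting logic separate from whichever model does the labeling, which matters here because the developer notes that the judge itself may be biased.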

XLR8Harder acknowledges that the test has flaws, including “noise” caused by model provider errors. The “judge” models may also contain biases that could affect the results.

However, assuming that the project was created in good faith and the data is accurate, SpeechMap reveals some interesting trends.

For example, according to SpeechMap, OpenAI’s models have increasingly refused to answer politics-related prompts over time. The company’s latest models, the GPT-4.1 family, are slightly more permissive, but they are still a step down from one of OpenAI’s releases last year.

OpenAI said in February that it would tune future models to avoid taking an editorial stance and to offer multiple perspectives on controversial subjects.

SpeechMap OpenAI results: performance of OpenAI models on SpeechMap over time. Image credit: OpenAI

According to SpeechMap’s benchmarking, the most permissive model of the bunch is Grok 3, developed by Elon Musk’s AI startup xAI. Grok 3 powers many features on X, including the chatbot Grok.

Grok 3 responds to 96.2% of SpeechMap’s test prompts, compared with the global average “compliance rate” of 71.3%.
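For context, a compliance rate of this kind is a simple percentage. The sketch below assumes that only responses judged as complete count as compliant, which the article does not spell out. The function name and the counts are made up for illustration and are not SpeechMap data.

```python
def compliance_rate(complete: int, total: int) -> float:
    """Percentage of test prompts judged as fully answered."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * complete / total


# Hypothetical counts, not SpeechMap data: 150 of 200 prompts fully answered.
rate = compliance_rate(complete=150, total=200)
print(f"{rate:.1f}%")  # prints "75.0%"
```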

“While OpenAI’s recent models have become less permissive over time, especially on politically sensitive prompts, xAI is moving in the opposite direction,” said XLR8Harder.

When Musk unveiled Grok about two years ago, he pitched the AI model as edgy, unfiltered, and anti-“woke.” He delivered on some of that promise. For example, when told to be vulgar, Grok and Grok 2 would happily oblige, spitting out colorful language that you would not hear from ChatGPT.

However, Grok models before Grok 3 hedged on political subjects and would not cross certain boundaries. In fact, one study found that Grok leaned to the political left on topics such as transgender rights, diversity programs, and inequality.

Musk has blamed that behavior on Grok’s training data (public web pages) and pledged to “get Grok closer to political neutrality.” Aside from high-profile mistakes, such as briefly censoring mentions of President Donald Trump and Musk, he may have achieved that goal.
