FYMOUS News

The new AI coding challenge has revealed the first results – and they are not pretty

July 24, 2025

A new AI coding challenge has revealed its first winner and set a new bar for AI-powered software engineers.

At 5 p.m. on Wednesday, the nonprofit Laude Institute announced the first winner of the K Prize, a multi-round AI coding challenge launched by Databricks and Perplexity co-founder Andy Konwinski. The winner was a Brazilian prompt engineer named Eduardo Rocha de Andrade, who was awarded $50,000. But more surprising than the win was his final score: he won with correct answers to just 7.5% of the questions on the test.

“We’re glad we built a benchmark that is actually hard,” says Konwinski. “Benchmarks should be hard if they’re going to matter,” he continues. “Scores would be different if the big labs had entered with their biggest models. But that’s kind of the point. The K Prize runs offline with limited compute.”

Konwinski has pledged $1 million to the first open-source model that can score higher than 90% on the test.

Like the well-known SWE-Bench system, the K Prize tests models against flagged issues from GitHub to measure how well they handle real-world programming problems. But while SWE-Bench is based on a fixed set of problems that models can train against, the K Prize is designed as a “contamination-free version of SWE-Bench,” using a timed entry system to guard against benchmark-specific training. For round one, models were due by March 12th. The K Prize organizers then built the test using only GitHub issues flagged after that date.
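The timed entry idea can be sketched in a few lines of Python. This is only an illustration of the filtering step, not the organizers' actual pipeline: the `Issue` record, the `contamination_free` helper, and the sample data are all hypothetical, and the year and time zone attached to the March 12th deadline are assumptions.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Issue:
    """Hypothetical record for a flagged GitHub issue (not the K Prize schema)."""
    repo: str
    number: int
    flagged_at: datetime


def contamination_free(issues, submission_deadline):
    """Keep only issues flagged strictly after the model submission deadline."""
    return [issue for issue in issues if issue.flagged_at > submission_deadline]


# Assumed deadline: March 12th comes from the article; year and time zone are guesses.
DEADLINE = datetime(2025, 3, 12, 23, 59, 59, tzinfo=timezone.utc)

# Made-up sample issues, for illustration only.
SAMPLE = [
    Issue("example/repo", 101, datetime(2025, 2, 1, tzinfo=timezone.utc)),
    Issue("example/repo", 102, datetime(2025, 3, 12, 12, 0, tzinfo=timezone.utc)),
    Issue("example/repo", 103, datetime(2025, 4, 20, tzinfo=timezone.utc)),
]

eligible = contamination_free(SAMPLE, DEADLINE)
print([issue.number for issue in eligible])
```

Because every test issue postdates the deadline, a submitted model cannot have seen it during training, which is the property a fixed problem set cannot guarantee.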

The 7.5% top score stands in marked contrast to SWE-Bench itself, which currently shows a top score of 75% on its easier “Verified” test and 34% on its harder “Full” test. Konwinski still isn’t sure whether the disparity is due to contamination on SWE-Bench or simply the challenge of collecting new issues from GitHub, but he expects the K Prize project to answer the question soon.
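To put the gap in concrete terms, the short Python sketch below compares the reported top scores. The figures are the ones quoted in the article; the `gap` helper is a hypothetical name introduced here for illustration.

```python
# Top scores as reported in the article, in percent.
K_PRIZE_TOP = 7.5
SWE_BENCH_TOP = {"Verified": 75.0, "Full": 34.0}


def gap(reference, score):
    """Return (percentage-point difference, ratio) between two scores."""
    return reference - score, reference / score


for name, top in SWE_BENCH_TOP.items():
    points, ratio = gap(top, K_PRIZE_TOP)
    print(f"{name}: {points:.1f} points higher, {ratio:.1f}x the K Prize top score")
```

The comparison is only indicative, since the two benchmarks use different issue sets and the K Prize entrants ran under compute limits.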

“As we get more runs of the thing, we’ll have a better sense,” he told TechCrunch.

It might seem like an odd place to fall short, given the wide range of AI coding tools already publicly available. But with benchmarks becoming too easy, many critics see projects like the K Prize as a necessary step toward solving AI’s growing evaluation problem.

“I’m quite bullish about building new tests for existing benchmarks,” says Princeton researcher Sayash Kapoor. “Without such experiments, we can’t actually tell if the issue is contamination, or even just targeting the SWE-Bench leaderboard with a human in the loop.”

For Konwinski, it’s not just a better benchmark but an open challenge to the rest of the industry. “If you listen to the hype, it’s like we should be seeing AI doctors and AI lawyers and AI software engineers, and that’s just not true,” he says. “If we can’t even get more than 10% on a contamination-free SWE-Bench, that’s the reality check for me.”

