Fyself News
Startups

A new AI coding challenge has revealed its first results, and they are not pretty

By user, July 24, 2025, 3 min read

A new AI coding challenge has revealed its first winner and set a new bar for AI-powered software engineers.

At 5 p.m. on Wednesday, the nonprofit Laude Institute announced the first winner of the K Prize, a multi-round AI coding challenge launched by Databricks and Perplexity co-founder Andy Konwinski. The winner was a Brazilian prompt engineer named Eduardo Rocha de Andrade, who will receive $50,000 for the prize. But more surprising than the win was his final score: he won with correct answers to just 7.5% of the questions on the test.

“We’re glad we built a benchmark that is actually hard,” said Konwinski. “Benchmarks should be hard if they’re going to matter,” he continued. “Scores would be different if the big labs had entered with their biggest models. But that’s kind of the point. The K Prize runs offline with limited compute.”

Konwinski has pledged $1 million to the first open-source model that can score higher than 90% on the test.

Like the well-known SWE-Bench system, the K Prize tests models against flagged issues from GitHub to measure how well they handle real-world programming problems. But while SWE-Bench is based on a fixed set of problems that models can train against, the K Prize is designed as a “contamination-free version of SWE-Bench,” using a timed entry system to guard against benchmark-specific training. For round one, models were due by March 12. The K Prize organizers then built the test using only GitHub issues flagged after that date.
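The timed-entry idea can be illustrated in a few lines of Python. This is a minimal sketch under stated assumptions, not the K Prize's actual harness: the `Issue` record and its fields are hypothetical, the 2025 year on the March 12 cutoff is assumed, and "score" here simply means the percentage of test issues a model resolved.

```python
from dataclasses import dataclass
from datetime import date


# Hypothetical record for a flagged GitHub issue; field names are illustrative only.
@dataclass(frozen=True)
class Issue:
    repo: str
    number: int
    flagged_on: date


# Assumed round-one submission deadline (the article says March 12; the year is an assumption).
SUBMISSION_CUTOFF = date(2025, 3, 12)


def contamination_free_test_set(issues: list[Issue], cutoff: date = SUBMISSION_CUTOFF) -> list[Issue]:
    """Keep only issues flagged strictly after the cutoff.

    Models are frozen at the deadline, so none of them can have seen these issues in training.
    """
    return [issue for issue in issues if issue.flagged_on > cutoff]


def score(resolved: int, total: int) -> float:
    """Return the percentage of test issues a model resolved correctly."""
    if total <= 0:
        raise ValueError("test set must contain at least one issue")
    return 100.0 * resolved / total


if __name__ == "__main__":
    issues = [
        Issue("example/repo", 1, date(2025, 3, 1)),   # before the cutoff: excluded
        Issue("example/repo", 2, date(2025, 4, 2)),   # after the cutoff: kept
    ]
    print([i.number for i in contamination_free_test_set(issues)])
    print(score(3, 40))
```

The key design choice is that the test set is assembled only after submissions close, so the filter runs on data that did not exist when the models were frozen; a fixed benchmark cannot offer that guarantee.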

The 7.5% top score stands in marked contrast to SWE-Bench itself, which currently shows a top score of 75% on its easier “Verified” test and 34% on its harder “Full” test. Konwinski is not yet sure whether the disparity is due to contamination on SWE-Bench or simply the challenge of collecting new issues from GitHub, but he expects the K Prize project to answer the question soon.

“As we get more runs of the thing, we’ll have a better sense,” he told TechCrunch.


It might seem like an odd place to fall short, given the wide range of AI coding tools already publicly available. But with benchmarks becoming too easy, many critics see projects like the K Prize as a necessary step toward solving AI’s growing evaluation problem.

“I’m quite bullish about building new tests for existing benchmarks,” says Princeton researcher Sayash Kapoor. “Without such experiments, we can’t actually tell if the issue is contamination, or even just targeting the SWE-Bench leaderboard with a human in the loop.”

For Konwinski, it’s not just a better benchmark but an open challenge to the rest of the industry. “If you listen to the hype, it’s like we should be seeing AI doctors and AI lawyers and AI software engineers, and that’s just not true,” he says. “If we can’t even get more than 10% on a contamination-free SWE-Bench, that’s the reality check for me.”

© 2025 news.fyself. Designed by fyself.