Fyself News

A new AI coding challenge has published its first results, and they are not pretty

By user | July 24, 2025 | 3 min read

A new AI coding challenge has revealed its first winner and set a new bar for AI-powered software engineers.

At 5 p.m. on Wednesday, the nonprofit Laude Institute announced the first winner of the K Prize, a multi-round AI coding challenge launched by Databricks and Perplexity co-founder Andy Konwinski. The winner was a Brazilian prompt engineer named Eduardo Rocha de Andrade, who received $50,000 for the prize. But more surprising than the win was his final score: he won with correct answers to just 7.5% of the questions on the test.

“We’re glad we built a benchmark that is actually hard,” said Konwinski. “Benchmarks should be hard if they’re going to matter,” he continued. “Scores would be different if the big labs had entered with their biggest models. But that’s kind of the point. K Prize runs offline with limited compute.”

Konwinski has pledged $1 million to the first open source model that can score higher than 90% on the test.

Like the well-known SWE-Bench system, the K Prize tests models against flagged issues from GitHub to measure how well they can handle real-world programming problems. But while SWE-Bench is based on a fixed set of problems that models can train against, the K Prize is designed as a “contamination-free version of SWE-Bench,” using a timed entry system to guard against benchmark-specific training. For round one, models were due by March 12. The K Prize organizers then built the test using only GitHub issues flagged after that date.
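The timed entry idea can be illustrated with a few lines of Python. This is only a minimal sketch, not the organizers' actual pipeline: the `Issue` record, the `select_fresh_issues` helper, the sample repositories, and the year of the deadline are all assumptions made for illustration. The only detail taken from the article is that test issues must have been flagged after the March 12 submission date.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Issue:
    """A GitHub issue reduced to the fields this sketch needs (hypothetical shape)."""
    repo: str
    number: int
    flagged_on: date


def select_fresh_issues(issues, submission_deadline):
    """Keep only issues flagged strictly after the model submission deadline."""
    return [issue for issue in issues if issue.flagged_on > submission_deadline]


# Round-one deadline from the article; the year 2025 is an assumption.
DEADLINE = date(2025, 3, 12)

# Made-up sample data, not real issues.
candidates = [
    Issue("example/repo-a", 101, date(2025, 2, 27)),  # flagged before the deadline
    Issue("example/repo-b", 202, date(2025, 3, 12)),  # flagged on the deadline itself
    Issue("example/repo-c", 303, date(2025, 4, 2)),   # flagged after the deadline
]

fresh = select_fresh_issues(candidates, DEADLINE)
print([issue.number for issue in fresh])  # [303]
```

Because submitted models are frozen before any test issue exists, nothing in the test set can have leaked into their training data, which is the property a fixed problem set cannot guarantee.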

The 7.5% top score stands in marked contrast to SWE-Bench itself, which currently shows a 75% top score on its easier “Verified” test and 34% on its harder “Full” test. Konwinski is still not sure whether the disparity is due to contamination on SWE-Bench or simply the challenge of collecting new issues from GitHub, but he expects the K Prize project to answer the question soon.

“As we get more runs of the thing, we’ll have a better sense,” he told TechCrunch.

It might seem like an odd place to fall short, given the wide range of AI coding tools already publicly available. But with benchmarks becoming too easy, many critics see projects like the K Prize as a necessary step toward solving AI’s growing evaluation problem.

“I’m quite bullish about building new tests for existing benchmarks,” says Princeton researcher Sayash Kapoor. “Without such experiments, we can’t actually tell if the issue is contamination, or even just targeting the SWE-Bench leaderboard with a human in the loop.”

For Konwinski, it’s not just a better benchmark but an open challenge to the rest of the industry. “If you listen to the hype, it’s like we should be seeing AI doctors and AI lawyers and AI software engineers, and that’s just not true,” he says. “If we can’t even get more than 10% on a contamination-free SWE-Bench, that’s the reality check for me.”

