Fyself News

Mistral adds a new API that turns PDF documents into AI-enabled Markdown files

By user, March 6, 2025

On Thursday, French large language model (LLM) developer Mistral launched a new API for developers who work with complex PDF documents. Mistral OCR is an optical character recognition (OCR) API that converts any PDF into a text file, making it easier for AI models to ingest.

LLMs, which power popular generative AI tools such as OpenAI's ChatGPT, work particularly well with raw text. As a result, companies that want to build their own AI workflows know that it is critical to store and index their data in a clean format so that it can be reused for AI processing.

Unlike most OCR APIs, Mistral OCR is a multimodal API, meaning that it can detect when diagrams or photos are intertwined with blocks of text. The OCR API creates bounding boxes around these graphical elements and includes them in the output.
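To make the idea of bounding boxes concrete, here is a minimal Python sketch of how detected text blocks and graphics could be merged back into a single Markdown stream in reading order. The `Element` data model, its coordinate convention and the `img-0.jpeg` placeholder name are hypothetical and are not Mistral OCR's actual response format; the naive top-to-bottom, left-to-right sort would not handle multi-column layouts.

```python
from dataclasses import dataclass
from typing import List, Tuple


# Hypothetical page-element model (NOT Mistral OCR's real schema).
# bbox = (x0, y0, x1, y1) in page coordinates, with y growing downward.
@dataclass
class Element:
    kind: str                                # "text" or "image"
    bbox: Tuple[float, float, float, float]
    content: str                             # text for "text", an image id for "image"


def to_markdown(elements: List[Element]) -> str:
    """Interleave text blocks and image placeholders in reading order."""
    # Sort by the top edge first, then the left edge (single-column heuristic).
    ordered = sorted(elements, key=lambda e: (e.bbox[1], e.bbox[0]))
    parts = []
    for e in ordered:
        if e.kind == "image":
            # Keep the graphic's position in the flow as a Markdown image reference.
            parts.append(f"![{e.content}]({e.content})")
        else:
            parts.append(e.content)
    return "\n\n".join(parts)


page = [
    Element("text", (50, 300, 550, 360), "The results are summarized in the chart."),
    Element("image", (50, 120, 550, 280), "img-0.jpeg"),
    Element("text", (50, 40, 550, 100), "# Quarterly report"),
]
print(to_markdown(page))
```

The heading ends up first, the chart placeholder second and the caption text last, because ordering follows each element's position on the page rather than the order in which it was detected.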

Mistral OCR doesn't just output a big wall of text. The output is formatted in Markdown, a formatting syntax that developers use to add links, headers and other formatting elements to plain text files.
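Because the output is Markdown, it can be split along its headers before being stored and indexed. The following Python sketch, built around a hypothetical `split_by_headings` helper, cuts a Markdown string into (heading, body) sections. It assumes ATX-style `#` headers and does not special-case fenced code blocks, where a line starting with `#` would be misread as a header.

```python
import re
from typing import List, Tuple

# An ATX heading: 1 to 6 '#' characters, whitespace, then the title text.
HEADING = re.compile(r"^(#{1,6})\s+(.*)$")


def split_by_headings(markdown: str) -> List[Tuple[str, str]]:
    """Split Markdown into (heading, body) sections for separate indexing.

    Text that appears before the first heading is filed under an empty heading.
    """
    sections: List[Tuple[str, str]] = []
    heading, body = "", []
    for line in markdown.splitlines():
        match = HEADING.match(line)
        if match:
            # Close the previous section unless it is completely empty.
            if heading or any(l.strip() for l in body):
                sections.append((heading, "\n".join(body).strip()))
            heading, body = match.group(2).strip(), []
        else:
            body.append(line)
    if heading or any(l.strip() for l in body):
        sections.append((heading, "\n".join(body).strip()))
    return sections


doc = "# Invoice\nTotal due: 1,200 EUR\n\n## Terms\nPayment within 30 days.\n- Late fee: 2%\n"
for title, text in split_by_headings(doc):
    print(repr(title), "->", repr(text))
```

Keeping the heading next to each body gives every stored chunk a short label that can later be shown alongside search results.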

LLMs rely heavily on Markdown in their training datasets. Similarly, AI assistants such as Mistral's Le Chat or OpenAI's ChatGPT often generate Markdown to create bullet points, add links or put some elements in bold. The assistant apps then seamlessly render this Markdown output as rich text. That is why raw text (and Markdown) has become more important in recent years as generative AI has boomed.
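The rendering step described above can be sketched in a few lines. This hypothetical `render` function is a toy that only handles paragraphs, `-`/`*` bullet lists, `**bold**` and inline links; real assistant apps rely on full Markdown parsers, so treat it as an illustration of the idea rather than how Le Chat or ChatGPT actually do it.

```python
import html
import re

BOLD = re.compile(r"\*\*(.+?)\*\*")
LINK = re.compile(r"\[([^\]]+)\]\(([^)\s]+)\)")


def render_inline(text: str) -> str:
    # Escape HTML special characters first so the input cannot inject markup.
    text = html.escape(text)
    text = BOLD.sub(r"<strong>\1</strong>", text)
    text = LINK.sub(r'<a href="\2">\1</a>', text)
    return text


def render(markdown: str) -> str:
    """Convert a tiny Markdown subset (paragraphs, bullets, bold, links) to HTML."""
    out, in_list = [], False
    for line in markdown.splitlines():
        if line.startswith(("- ", "* ")):
            if not in_list:
                out.append("<ul>")
                in_list = True
            out.append(f"<li>{render_inline(line[2:])}</li>")
            continue
        if in_list:
            # Any non-bullet line closes the open list.
            out.append("</ul>")
            in_list = False
        if line.strip():
            out.append(f"<p>{render_inline(line)}</p>")
    if in_list:
        out.append("</ul>")
    return "\n".join(out)


sample = "Key points:\n- **Fast** OCR\n- See [docs](https://example.com)"
print(render(sample))
```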

"Over the years, organizations have accumulated a large number of documents, often in PDF or slide formats, that are inaccessible to LLMs, particularly RAG systems," said Guillaume Lample, co-founder and chief science officer of Mistral.

"This is an important step toward the widespread adoption of AI assistants in companies that need to simplify access to their vast amount of internal documents," he added.

Mistral OCR is available on Mistral's own API platform or through its cloud partners (AWS, Azure and Google Cloud Vertex AI). For companies working with classified or sensitive data, Mistral also offers on-premises deployment.

According to the Paris-based AI company, Mistral OCR performs better than the APIs from Google, Microsoft and OpenAI. The company tested its OCR model on complex documents that include mathematical expressions (in LaTeX format), advanced layouts or tables. It is also supposed to perform better on non-English documents.

Image credit: Mistral

Because Mistral OCR does one thing and one thing only, the company says it is also faster than the alternatives. That isn't surprising when it is compared with a multimodal LLM like GPT-4o, which offers OCR among many other features.

Mistral also uses Mistral OCR in its own AI assistant, Le Chat. When a user uploads a PDF file, the company runs Mistral OCR in the background to understand what is in the document before processing the text.

Companies and developers will most likely use Mistral OCR with a RAG (retrieval-augmented generation) system to feed multimodal documents into an LLM. There are many potential use cases. For example, a law firm could use it to go through a large number of documents quickly.

RAG is a technique that retrieves relevant data and uses it as context for a generative AI model.
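As a rough illustration, the retrieval half of RAG can be sketched in Python as follows, assuming the OCR output has already been split into text chunks. The helper names are hypothetical, and the bag-of-words cosine similarity is a deliberately simple stand-in for the embedding models that production RAG systems typically use; the final call to an LLM is left out.

```python
import math
import re
from collections import Counter
from typing import List


def tokenize(text: str) -> List[str]:
    # Lowercase alphanumeric tokens; punctuation is dropped.
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(query: str, chunk: str) -> float:
    """Cosine similarity between the term-count vectors of two strings."""
    q, c = Counter(tokenize(query)), Counter(tokenize(chunk))
    dot = sum(n * c[t] for t, n in q.items())
    norm = math.sqrt(sum(n * n for n in q.values())) * math.sqrt(sum(n * n for n in c.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, chunks: List[str], k: int = 2) -> List[str]:
    # Rank every chunk by similarity to the query and keep the top k.
    return sorted(chunks, key=lambda ch: cosine(query, ch), reverse=True)[:k]


def build_prompt(query: str, chunks: List[str], k: int = 2) -> str:
    # The retrieved chunks become the context passed to the generative model.
    context = "\n\n".join(retrieve(query, chunks, k))
    return f"Answer using only the context below.\n\nContext:\n{context}\n\nQuestion: {query}"


chunks = [
    "Payment is due within 30 days of the invoice date.",
    "The contract may be terminated with 90 days notice.",
    "Our office is closed on public holidays.",
]
print(build_prompt("When is payment due?", chunks, k=1))
```

Only the chunk about payment terms reaches the prompt here, which is the point of retrieval: the model sees a small, relevant slice of a large document collection instead of all of it.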
