On Thursday, French large language model (LLM) developer Mistral launched a new API for developers who work with complex PDF documents. Mistral OCR is an optical character recognition (OCR) API that converts any PDF into a text file, making it easier for AI models to ingest.
LLMs, which underpin popular GenAI tools like OpenAI’s ChatGPT, work particularly well with raw text. So companies that want to create their own AI workflows know that it has become extremely important to store and index data in a clean format so that this data can be reused for AI processing.
Unlike most OCR APIs, Mistral OCR is a multimodal API. This means it can detect when diagrams or photos are intertwined with blocks of text. The OCR API creates bounding boxes around these graphical elements and includes them in the output.
Mistral OCR doesn’t just output a large wall of text. The output is formatted with Markdown, a formatting syntax that developers use to add links, headers and other formatting elements to plain text files.
LLMs rely heavily on Markdown in their training datasets. Similarly, AI assistants like Mistral’s Le Chat or OpenAI’s ChatGPT often generate Markdown to create bullet points, add links and put some elements in bold. Assistant apps seamlessly format this Markdown output into rich text. That is why raw text (and Markdown) has become more important in recent years as GenAI has boomed.
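To illustrate why Markdown output is convenient for indexing, here is a minimal Python sketch that splits a Markdown string into sections at its headings. It does not call Mistral's API. The function name `split_markdown_sections` and the sample text are hypothetical, and the sketch assumes the OCR output uses `#`-style headings.

```python
import re

def split_markdown_sections(markdown_text):
    """Split Markdown text into (heading, body) pairs at '#'-style headings.

    Simplification: headings inside fenced code blocks are not treated
    specially, so this is a sketch rather than a full Markdown parser.
    """
    sections = []
    heading = ""      # text before the first heading gets an empty title
    body_lines = []
    for line in markdown_text.splitlines():
        match = re.match(r"^(#{1,6})\s+(.*)$", line)
        if match:
            # Close the previous section if it has a title or any content.
            if heading or any(l.strip() for l in body_lines):
                sections.append((heading, "\n".join(body_lines).strip()))
            heading = match.group(2).strip()
            body_lines = []
        else:
            body_lines.append(line)
    # Close the final section.
    if heading or any(l.strip() for l in body_lines):
        sections.append((heading, "\n".join(body_lines).strip()))
    return sections

# Hypothetical sample standing in for OCR output.
sample = """# Contract

This agreement covers services.

## Payment terms

- Invoices are due in 30 days.
- Late fees apply.
"""

for title, body in split_markdown_sections(sample):
    print(repr(title), "->", len(body), "characters")
```

Each section can then be stored and indexed separately, which is harder to do with an unstructured wall of text.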
“Over the years, organizations have accumulated a large number of documents, often in PDF or slide formats, which are inaccessible to LLMs, particularly RAG systems,” said Guillaume Lample, co-founder and chief science officer of Mistral.
“This is an important step towards the widespread adoption of AI assistants in companies that need to simplify access to their vast internal documentation,” he added.
Mistral OCR is available on Mistral’s own API platform or through its cloud partners (such as AWS, Azure and Google Cloud Vertex). Additionally, for businesses working with classified or sensitive data, Mistral offers on-premises deployment.
According to the Paris-based AI company, Mistral OCR performs better than the APIs of Google, Microsoft and OpenAI. The company tested its OCR model with complex documents that include mathematical expressions (LaTeX formatting), advanced layouts, or tables. It is also expected to perform better with non-English documents.

Given that Mistral OCR does one thing and one thing only, the company also believes it is faster than the alternatives. That is not surprising when compared with multimodal LLMs like GPT-4o, which offer OCR capabilities among many other features.
Mistral also uses Mistral OCR for its own AI assistant, Le Chat. When a user uploads a PDF file, the company uses Mistral OCR in the background to understand what is in the document before processing the text.
Companies and developers will most likely use Mistral OCR with a RAG (retrieval-augmented generation) system so that multimodal documents can serve as input to an LLM. There are many potential use cases. For example, a law firm could use it to work through a large volume of documents quickly.
RAG is a technique used to retrieve data and use it as context for a generative AI model.
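The retrieve-then-prompt idea can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not how Mistral or any particular product implements RAG: it ranks text chunks by simple word overlap with the question, whereas real systems typically use embeddings and a vector index. The names `tokenize`, `retrieve` and `build_prompt`, and the sample chunks, are hypothetical, and no LLM is called.

```python
import re

def tokenize(text):
    """Lowercase a string and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(question, chunks, top_k=2):
    """Return up to top_k chunks that share the most words with the question.

    Word overlap is a stand-in for the embedding similarity a real system
    would use. Chunks with no overlap are dropped.
    """
    question_words = tokenize(question)
    scored = [(len(question_words & tokenize(chunk)), chunk) for chunk in chunks]
    # Highest score first; sorted() is stable, so ties keep their input order.
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    return [chunk for score, chunk in ranked[:top_k] if score > 0]

def build_prompt(question, context_chunks):
    """Assemble the retrieved chunks and the question into one prompt string."""
    context = "\n\n".join(context_chunks)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )

# Hypothetical chunks, e.g. sections of a document converted to Markdown.
chunks = [
    "## Payment terms\nInvoices are due within 30 days of receipt.",
    "## Termination\nEither party may terminate with 60 days notice.",
    "## Confidentiality\nBoth parties keep the terms confidential.",
]
question = "When are invoices due?"
prompt = build_prompt(question, retrieve(question, chunks, top_k=1))
print(prompt)
```

The resulting prompt string is what would be sent to a generative model, so the model answers from the retrieved passage rather than from memory alone.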