
Mistral adds a new API that turns PDF documents into AI-ready Markdown files

By user, March 6, 2025

On Thursday, French large language model (LLM) developer Mistral launched a new API for developers who work with complex PDF documents. Mistral OCR is an optical character recognition (OCR) API that converts any PDF into a text file, making it easier for AI models to ingest.

LLMs, which power popular generative AI tools like OpenAI's ChatGPT, work particularly well with raw text. Companies that want to build their own AI workflows therefore know that it is extremely important to store and index their data in a clean format, so that the data can be reused for AI processing.

Unlike most OCR APIs, Mistral OCR is a multimodal API. This means it can detect when diagrams or photos are intertwined with blocks of text. The OCR API creates bounding boxes around these graphical elements and includes them in the output.

Mistral OCR doesn’t just output a large wall of text. The output is formatted with Markdown, a formatting syntax that developers use to add links, headers and other formatting elements to plain text files.
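Because the output is plain Markdown, it can be post-processed with ordinary text tools before it is indexed. The sketch below splits a Markdown string into one chunk per heading and collects the image links found in each chunk, a common preparation step for retrieval pipelines. It assumes nothing about Mistral's actual response format: the sample document, the image filename, and the `split_by_heading` helper are hypothetical and only illustrate the idea.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical sample Markdown; NOT real Mistral OCR output.
SAMPLE = """# Quarterly report

Revenue grew in every region.

## Regional breakdown

![chart](img-0.jpeg)
"""

# ATX headings such as "# Title" or "## Section" (1 to 6 '#' plus whitespace).
HEADING_RE = re.compile(r"^(#{1,6})\s+(.+?)\s*$")
# Inline image links such as ![alt](path); captures the path.
IMAGE_RE = re.compile(r"!\[[^\]]*\]\(([^)\s]+)\)")


@dataclass
class Chunk:
    heading: Optional[str]
    lines: List[str] = field(default_factory=list)

    @property
    def text(self) -> str:
        # Body text under the heading, with surrounding blank lines trimmed.
        return "\n".join(self.lines).strip()

    @property
    def images(self) -> List[str]:
        # Paths of any images referenced inside this chunk.
        return IMAGE_RE.findall(self.text)


def split_by_heading(markdown: str) -> List[Chunk]:
    """Hypothetical helper: one chunk per heading, ready for indexing."""
    chunks = [Chunk(heading=None)]  # holds any text before the first heading
    for line in markdown.splitlines():
        match = HEADING_RE.match(line)
        if match:
            chunks.append(Chunk(heading=match.group(2)))
        else:
            chunks[-1].lines.append(line)
    # Drop the leading chunk if nothing appeared before the first heading.
    return [c for c in chunks if c.heading is not None or c.text]


if __name__ == "__main__":
    for chunk in split_by_heading(SAMPLE):
        print(chunk.heading, "|", chunk.text, "|", chunk.images)
```

This naive line-based parser would misread `#` lines inside fenced code blocks as headings; a full CommonMark parser avoids that at the cost of an extra dependency.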

