A Short Introduction to Text Recognition
The text recognition capabilities of AI models have improved significantly in the last few years, particularly since the widespread introduction of LLMs of the GPT type. These advances apply both to printed texts in obsolete fonts and to handwritten text recognition (HTR, or HWR, i.e., handwriting recognition), thus allowing for the automated transcription of large source corpora. As a result, a steadily growing number of historical sources are made accessible to researchers and the interested public. (Aside from researchers from fields such as history, philology, or literary studies, many users are researching their family history with the help of Transkribus: https://www.transkribus.org/genealogy.) That said, the recent success, and the marketing hype with its exaggerated claims about the potential of consumer AI models, have raised unrealistic expectations, leaving new users disillusioned by underwhelming first results. As beginners soon find out, the quality of the transcription can vary wildly depending on document properties such as language, script, or period. Despite all the improvements, accurate text recognition still requires a considerable investment of work and time. However, a basic understanding of the mode of operation and key functionalities of AI-based text recognition helps achieve significantly better results. By following a few ground rules, users can develop a workflow specific to the source material they are working on.
There is no such thing as the "best" text recognition model—the accuracy of AI text recognition, measured in character error rate (CER) and word error rate (WER), depends on the extent to which a model has been trained on source corpora from a specific language, script, style of penmanship, and time period.
Model card for the Transkribus model DiJeSt3.0, trained on sources in Hebrew, Ladino, and Yiddish (https://app.transkribus.org/models/public/text/dijest-30).
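To make the two metrics concrete, here is a minimal Python sketch that computes CER and WER from a reference transcription and a model output via the Levenshtein (edit) distance; the sample strings are invented for illustration.

```python
def levenshtein(a: list, b: list) -> int:
    """Minimum number of insertions, deletions, and substitutions
    needed to turn sequence a into sequence b."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        curr = [i]
        for j, y in enumerate(b, 1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (x != y)))  # substitution
        prev = curr
    return prev[-1]

def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: edit distance divided by reference length."""
    return levenshtein(list(reference), list(hypothesis)) / len(reference)

def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: the same computation on word tokens."""
    ref_words = reference.split()
    return levenshtein(ref_words, hypothesis.split()) / len(ref_words)

# Invented example: a ground-truth line vs. a hypothetical model output.
truth = "the quality of the transcription can vary wildly"
model = "the qality of the transcription can vary wildy"
print(f"CER: {cer(truth, model):.2%}, WER: {wer(truth, model):.2%}")
```

Note the asymmetry: two dropped characters here produce a CER of about 4% but a WER of 25%, which is why both figures are usually reported together.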
Think of AI text recognition as automation—the AI model compares and identifies patterns based on a) the training data and b), if you use a customized model, your own transcription work. This (mostly human) groundwork determines the quality of the results at least as much as the inherent predictive power of the AI system.
Know your tools—all text recognition systems have their specific limitations and challenges (see the sections on Transkribus and Google's Gemini below).
Two Approaches
As of January 2026, there are two methodological approaches to AI text recognition and transcription. Transkribus, the pioneer in AI transcription for researchers, relies on specialist models with a narrow scope: "A text model is an AI algorithm that has been trained on a specific set of data, including images and transcriptions. Its purpose is to accurately determine the most likely sequence of characters for each section of handwritten text. There isn't a universal model that applies to all types of handwriting. Therefore, it is essential to choose the most suitable model for the script, language and time period of your documents." (https://help.transkribus.org/automatically-transcribing-your-documents)
The second approach is the one used by the competition: general-purpose, multimodal AI models with household names such as ChatGPT or Google Gemini. Trained on vast amounts of data (sometimes of legally grey origin), this model type promises to master a whole range of tasks. The question of which model type will prevail—the custom-made model, designed for a specific purpose, or the off-the-shelf all-rounder—has been the subject of a long-standing debate in machine learning. In an often-cited essay (The Bitter Lesson, 2019), the computer scientist Richard Sutton argues in favour of the generalists:
The biggest lesson that can be read from 70 years of AI research is that general methods that leverage computation are ultimately the most effective, and by a large margin. The ultimate reason for this is Moore's law, or rather its generalization of continued exponentially falling cost per unit of computation. [...] One thing that should be learned from the bitter lesson is the great power of general purpose methods, of methods that continue to scale with increased computation even as the available computation becomes very great. (Sutton 2019)
The scalability of AI applications, Sutton argues, applies particularly to search and learning. The latter, of course, includes text recognition: according to Sutton, general-purpose models trained on extremely large datasets will be more capable of performing complex tasks than specialized models supplied with instructions derived from previous human experience:
[We] should stop trying to find simple ways to think about the contents of minds, such as simple ways to think about space, objects, multiple agents, or symmetries. All these are part of the arbitrary, intrinsically-complex, outside world. They are not what should be built in, as their complexity is endless; instead we should build in only the meta-methods that can find and capture this arbitrary complexity. Essential to these methods is that they can find good approximations, but the search for them should be by our methods, not by us. We want AI agents that can discover like we can, not which contain what we have discovered. Building in our discoveries only makes it harder to see how the discovering process can be done. (Sutton 2019)
Sutton can point to the past two decades of AI development as evidence for his argument: the ImageNet project (2006-2020) achieved a breakthrough in computer vision through the creation of a visual dataset of unprecedented size (see the project website: ImageNet). The introduction of OpenAI's ChatGPT in 2022 was the first large-scale deployment of a chatbot built on the transformer architecture, a concept introduced in 2017 (Vaswani et al. 2017). The impressive public success, though, was only made possible by the massive deployment of computational power, enabling ChatGPT to devour the better part of the information available on the web.
So what does that mean for the (near) future of text recognition? The historian Mark Humphries agrees with Sutton's principle, based on his transcriptions of 18th-century handwritten English documents with the help of Google's Gemini model: "If we look at how Gemini has improved on English language transcription over time, scaling suggests that you should expect to see similar progress on the documents from your own field over the next months and years. Consider that eighteen months ago, Gemini 1.5 was still getting about 1/5 words wrong, producing what was basically nonsense. Today it is nearly perfect" (Humphries 2025).
In my experience, the specialized AI (Transkribus) is still ahead of the competition. Then again, I have worked with different source material, mostly early modern documents in Spanish, (Neo-)Latin, and German. The customized Transkribus models, trained on data from the same time and region as my sources, clearly performed better than general-purpose models such as Claude or Qwen (see below). I got similar results when I tested the capability to handle historical alphabets by submitting samples in pre-1918 Russian and in 17th-19th-century Ladino: Transkribus maintained its pole position with this material as well. (The 1918 reform of the Cyrillic alphabet brought not only the usual spelling differences, as one can find in any language over time, but also typographic changes, most notably the abolition of four letters; AI systems that are mainly trained on modern Russian texts struggle with pre-reform documents. Ladino, also known as Judaeo-Spanish or Djudezmo, is a Romance language spoken by the Sephardic Jews of the Iberian Peninsula and written in a slightly modified version of the Hebrew alphabet. After the expulsion from Spain, ordered by the Catholic Monarchs in the Alhambra Decree (Decreto de la Alhambra or Edicto de Granada) of 1492, Sephardic refugees founded communities in the Ottoman Empire, Northern Africa, some European countries, and the Americas. A small Ladino-speaking community also flourished in Vienna until it was erased in the Shoah.) But, just as in Humphries' observation regarding English source material, I have noticed that some general-purpose LLMs, Gemini in particular, appear to be closing the gap. And with some of the Spanish sources, I got the best results by experimenting with a workflow that combined Transkribus for the first step and Gemini 3 Pro for the corrections (though this might be partly due to the faster response of Google's servers); a sketch of such a correction pass follows below.
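As an illustration of this two-step workflow, here is a minimal Python sketch of the correction pass, using Google's google-genai SDK. It is a sketch under stated assumptions, not a definitive implementation: the model name, the prompt wording, and the file paths are placeholders to adapt, and the first-pass transcript is assumed to have been exported from Transkribus as plain text.

```python
# pip install google-genai
from google import genai
from google.genai import types

client = genai.Client(api_key="YOUR_API_KEY")  # placeholder key

# First-pass transcript exported from Transkribus (hypothetical paths).
draft = open("transkribus_export/page_001.txt", encoding="utf-8").read()
image = open("scans/page_001.jpg", "rb").read()

prompt = (
    "You are correcting an automatic transcription of an early modern "
    "Spanish manuscript. Compare the draft below against the page image, "
    "fix recognition errors only, keep the original spelling and line "
    "breaks, and do not modernize the orthography.\n\nDraft:\n" + draft
)

response = client.models.generate_content(
    model="gemini-3-pro-preview",  # placeholder; check current model names
    contents=[
        types.Part.from_bytes(data=image, mime_type="image/jpeg"),
        prompt,
    ],
)
print(response.text)
```

Keeping the page image in the request is the crucial design choice here: asked to "fix" a transcript without the image, an LLM tends to normalize the spelling rather than correct actual recognition errors.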
Transkribus
Transkribus excels at tasks general LLMs struggle with, e.g., German Kurrentschrift and its multiple variations. The specific training on historical source corpora definitely pays off: Transkribus models trained on "niche" datasets from, say, the 17th century beat general-purpose LLMs that are mainly trained on modern vocabulary and orthography.
Transkribus started out as an EU-funded project of several European universities (2013); with public funding eventually running out, the Transkribus team decided to found a cooperative, the community-owned company READ-COOP (Terras et al. 2025). This very distinct business model means a) that the purpose of the company is not to make a profit but to develop and maintain the services offered by Transkribus (in addition to the main AI platform, their activities also include Transkribus Sites for publishing and sharing text collections, and a scholarship programme), and b) that the community members are the ones who decide on the further development of the platform. As of 2025, the READ-COOP website lists over 200 members and co-owners in more than 30 countries, see https://readcoop.org/members. Both institutions and individuals can join: https://readcoop.org/join.
The mission statement outlines the core principles and underscores the importance of the community as the project's main driver:
We come from academia and historical documents are our absolute passion. Think of the wealth of data, knowledge, and stories they contain, slumbering untapped on millions and millions of pages tucked away in archives, libraries, and museums, and in attics, cellars, and shoeboxes all around the world. [...] Transkribus was built not just as a tool, but as a community. By 2019 more than 20,000 people from around the world, from countless fields and with innumerable backgrounds, had signed up, all with the same goal: to make the contents of historical documents accessible. The idea that evolved from this over time was to build the most important platform in historical technology, not as a means of earning profits for shareholders, but to enable stakeholders to collaboratively make every single historical document on the planet accessible. [...] You can train AI models for exactly your materials—and for more than a handful of languages. Everyone can train the AI to recognise exactly their materials in exactly their language. You don’t have to live with the models and results some company decides to give you. [...] We want everyone to have a say in how at least this instance of artificial intelligence is developed and shaped. (https://readcoop.org/why-coop)
The following is just a brief summary of the workflow with the key features (for detailed information, see the guide provided by Transkribus at https://help.transkribus.org/; see also their instruction videos in several languages: https://www.youtube.com/@transkribus/videos):
- Source Documents
- Transkribus' text recognition processes files in the formats JPEG, PNG, PDF, and IIIF manifest. While image files have a size limit of 10 MB each, PDF files of up to 200 MB can be uploaded, which, according to Transkribus, equates to 3,000 pages (https://help.transkribus.org/uploading-files-to-transkribus-overview). The platform recommends a resolution of 300 dpi and notes that a higher resolution will not entail better results (from my personal experience, processing files in low resolution < 50 MB will likely render flawed transcriptions). READ-COOP, the company that runs Transkribus, sells a special device for producing quality scans with a smartphone: https://www.transkribus.org/scantent & https://www.youtube.com/watch?v=_N7mLfnNf8U. A small pre-flight check script is sketched after this list.
- Text Recognition vs. Layout Recognition
- For sources with complex layouts such as multicolumn newspapers with many images or books with marginalia, users can run a separate layout recognition.
- Model Selection
- Users can either choose a fitting model on the Public Model Hub or deploy their own custom model.
- Model Training
- The platform also offers the possibility of training a custom model: "Depending on the type of material and the number of hands, between 5,000 and 15,000 words (around 25-75 pages) of transcribed material are required to start. In general, the neural networks of the Text Recognition engine learn quickly: the more training data they have, the better the results will be. If you are working on printed material, 5,000 words should be sufficient to achieve a good Character Error Rate." (https://help.transkribus.org/data-preparation)
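As announced above, here is a minimal pre-flight check, written in Python with PyMuPDF (listed under Software below), that tests scans against the upload constraints from this list before sending them to Transkribus. The 10 MB/200 MB limits and the 300 dpi recommendation come from the Transkribus documentation cited above; the file paths are placeholders, and the dpi estimate assumes the embedded image spans the full page width.

```python
# pip install pymupdf
import os
import pymupdf  # PyMuPDF (older versions use "import fitz" instead)

IMAGE_LIMIT_MB = 10   # per-image upload limit (Transkribus docs)
PDF_LIMIT_MB = 200    # per-PDF upload limit (Transkribus docs)
TARGET_DPI = 300      # resolution recommended by Transkribus

def check_upload(path: str) -> None:
    size_mb = os.path.getsize(path) / (1024 * 1024)
    if path.lower().endswith(".pdf"):
        if size_mb > PDF_LIMIT_MB:
            print(f"{path}: {size_mb:.1f} MB exceeds the {PDF_LIMIT_MB} MB PDF limit")
        # Estimate the effective dpi of the images embedded in the first page.
        doc = pymupdf.open(path)
        page = doc[0]
        for img in page.get_images(full=True):
            pix = pymupdf.Pixmap(doc, img[0])  # img[0] is the image xref
            # Effective dpi = pixel width / page width in inches (72 pt = 1 in).
            dpi = pix.width / (page.rect.width / 72)
            if dpi < TARGET_DPI:
                print(f"{path}: ~{dpi:.0f} dpi on page 1, below the "
                      f"recommended {TARGET_DPI} dpi")
        doc.close()
    else:  # a single image file (JPEG/PNG)
        if size_mb > IMAGE_LIMIT_MB:
            print(f"{path}: {size_mb:.1f} MB exceeds the {IMAGE_LIMIT_MB} MB image limit")

check_upload("scans/page_001.jpg")   # hypothetical paths
check_upload("scans/volume_01.pdf")
```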
Interview with Melissa Terras, Professor of Digital Cultural Heritage at the University of Edinburgh and Transkribus co-founder (2025)
Gemini (Google AI Studio)
To get the best out of Transkribus, users have to pick the model best trained on data similar to their sources, and/or they have the option to train a model on their own data. A general-purpose LLM is a different animal: users cannot modify the model itself, so in order to improve the results, they have to resort to methods such as retrieval-augmented generation (RAG), adjusting the model settings, and drafting an effective prompt.
Settings of Gemini 3 Pro (Google AI Studio).
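The settings shown in AI Studio can also be set programmatically. The following sketch, again using the google-genai SDK, applies a low temperature (a reasonable assumption for transcription, where creative variation is unwanted) and a system instruction; the model name is a placeholder to be checked against the current documentation.

```python
# pip install google-genai
from google import genai
from google.genai import types

client = genai.Client(api_key="YOUR_API_KEY")  # placeholder key

config = types.GenerateContentConfig(
    temperature=0.1,         # low randomness: transcription, not creative writing
    max_output_tokens=4096,  # enough headroom for a full page of text
    system_instruction=(
        "You are a careful paleographer. Transcribe exactly what is written, "
        "preserving original spelling, abbreviations, and line breaks."
    ),
)

image = open("scans/page_002.jpg", "rb").read()  # hypothetical path
response = client.models.generate_content(
    model="gemini-3-pro-preview",  # placeholder; check current model names
    contents=[
        types.Part.from_bytes(data=image, mime_type="image/jpeg"),
        "Transcribe this page.",
    ],
    config=config,
)
print(response.text)
```

The same parameters appear as sliders in the AI Studio sidebar, so settings that work well interactively can be carried over one-to-one into a batch script.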
References:
- Al-Homoud, Haneen, Asma Ibrahim, Murtadha Al-Jubran, Fahad Al-Otaibi, Yazeed Al-Harbi, Daulet Toibazar, Kesen Wang, and Pedro J. Moreno (2025): Cross-Lingual SynthDocs: A Large-Scale Synthetic Corpus for Any to Arabic OCR and Document Understanding. Preprint. https://doi.org/10.48550/arXiv.2511.04699 [retrieved Jan 11, 2026]
- Humphries, Mark (2025): Gemini 3 Solves Handwriting Recognition and it’s a Bitter Lesson; Testing shows that Gemini 3 has effectively solved handwriting on English texts, one of the oldest problems in AI, achieving expert human levels of performance. Nov 25, 2025. https://generativehistory.substack.com/p/gemini-3-solves-handwriting-recognition [retrieved Dec 14, 2025]
- Yousefi, Mojtaba and Jack Collins (2024): Learning the Bitter Lesson: Empirical Evidence from 20 Years of CVPR Proceedings. In: Proceedings of the 1st Workshop on NLP for Science (NLP4Science), pp. 175-187. Association for Computational Linguistics. https://aclanthology.org/2024.nlp4science-1.15/ [CVPR = Conference on Computer Vision and Pattern Recognition]
- Sutton, Richard (2019): The Bitter Lesson. March 13, 2019. http://www.incompleteideas.net/IncIdeas/BitterLesson.html [retrieved Nov 19, 2025]
- Terras, Melissa, B. Anzinger, P. Gooding et al. (2025): The artificial intelligence cooperative: READ-COOP, Transkribus, and the benefits of shared community infrastructure for automated text recognition. Open Research Europe 5:16. https://doi.org/10.12688/openreseurope.18747.2
- Vaswani, Ashish, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin (2017): Attention Is All You Need. https://doi.org/10.48550/arXiv.1706.03762
Software, AI Models, and Repositories:
- Transkribus Public AI Model Hub https://app.transkribus.org/models/public
- Gemini API (Google AI for Developers): https://ai.google.dev/gemini-api/
- Hugging Face (model repository): https://huggingface.co/
- Transcription Pearl: https://github.com/mhumphries2323/Transcription_Pearl
- PyMuPDF: https://pymupdf.io/; documentation: https://pymupdf.readthedocs.io; RAG guide: https://pymupdf.readthedocs.io/en/latest/rag.html
- Pandas: https://pandas.pydata.org/; the pandas library also comes in handy for processing data further, e.g., for visualizations (https://pandas.pydata.org/docs/user_guide/visualization.html). Pandas works well with Jupyter Notebook: https://jupyter.org/
(Jürgen Stowasser)