Sudelbuch


Transcribing a 1710 Case History with AI Tools

The transcription of manuscripts has always required certain skills: familiarity with historical handwriting scripts, domain knowledge, and experience with the particularities of a source corpus. This is still the case, but recent advances in AI text recognition have the potential to automate parts of the workflow and save hours of tedious labour. In this post, I will discuss the capabilities and challenges of AI-supported transcription, using the example of a medical case history from early modern Spain (for a general overview of the topic, see A Short Introduction to Text Recognition).

The Source

The samples below are part of Francisco Fernández Navarrete’s manuscripts in the archive of the Real Academia Nacional de Medicina (Manuscript M-10-07-M-01). Fernández Navarrete held the prime Chair of Medicine (cátedra de Prima) at the University of Granada and was a court physician, i.e., a member of the medical team in charge of the care of King Philip V and the royal family (for more on Fernández Navarrete, see To Read Like an Inquisitor and Measuring Public Health with the Barometer).

The papers offer meaningful insights into the theory and practice of Spanish medicine in the early modern period: prescriptions and formulations, case histories, excerpts and summaries of medical literature. The diagnostic methods reflect the early interpretation of Hippocratic and Galenic ideas, with a focus on the core principles of their humoral theory: the predisposition for certain diseases according to body type and character, the reading of the color and texture of the tongue, or the examination of the composition and coloration of bodily excretions. The treatments administered by Fernández Navarrete followed accordingly: bloodletting (sangría) as the go-to remedy, and medications to stimulate the excretion of pathogenic matter, i.e., purgatives, diuretics, and diaphoretics.

From a codicological perspective, the papers, comprising four fascicles, appear to be in good condition except for minor (water?) damage. The cataloging, however, was presumably done hastily: an examination of the content indicates that the binding does not always follow a continuous chronological order; the dating of the "1703" volume is clearly incorrect, given that passages in several folios refer to later dates; and someone began an index but did not get beyond the first few letters of the alphabet. The index is written in a different hand, likely by Tomás Francisco Monleón y Ramiro, who is listed in the metadata of the manuscript and was an academy member at the time of Fernández Navarrete’s death.

Transcribing the Source Text (the Traditional Way)

Transcription tasks are an exercise in historical semantics: the meaning and connotations of words, especially scientific terms, change over time, sometimes fundamentally and in unexpected ways. In the early modern period, authors often had to coin neologisms to find a language for new scientific concepts, phenomena, and technologies. However, not all novel terms became part of the modern scientific vocabulary; quite a few were in use only for a time and subsequently disappeared. Historians therefore rely on a number of resources to identify the specific usage of a term in different domains, regions, and time periods. Being familiar with the particular vocabulary of, say, physicians in early 18th-century Spain is of great help to the researcher trying to decipher handwritten texts (especially when confronted with illegible penmanship). For Fernández Navarrete's papers, I have listed some of the relevant resources in the references below: semantic databases, historical dictionaries, and studies of the medical and scientific vocabulary of the time (Gómez de Enterría 2015; García de Cortázar 2023). Here is my transcription of the first page of a case history documented by Fernández Navarrete in 1710:

D.D.=Don Josseph Morales de Edad de 34. Años, temperamento
Colerico, pulso fuerte y veloz, Sugeto gracil,
cabeza pequeña, pecho grande, pelo rubio, promptitud
de ingenio, ligereza de cuerpo, iracundo; por el mes de
Julio deste Año de 1710 aviendo precedido una no ligera
passion de animo y con una leve sospecha de crudeza,
alas nueve de la noche tubo un frio que le duro por
media hora con algun fastidio en el estomago y nauceas;
entrole Calentura no mui fuerte, que duro nueve
horas y se termino sin Sudor. Lengua humeda amarga,
orina flava, y cruda. Es muy facil de Vientre, y
despues de quitada la calentura tomó en una babaza
de Zargatona Jarave de Carthamo ℥ijß. con que en
tres horas, hizo ocho cursos con gran benignidad.
estubo alibiado, y al dia siguiente alas onze del
dia Siguiente dia le entró otro frio algo menor con
dos Vomitos de colera flava, y dos cursos de lo mismo.
Siguiesse la Calentura mucho maior, con grandes
inquietudes, pulso velosissimo y crebro, orinas crudas.
Francisco Fernández de Navarrete, Consultatio (1703: 145r)
[Figure: Fernández de Navarrete, Consultatio, 1703: 145r (PDM 1.0)]


AI Transcription (Gemini CLI, Transkribus)

I then submitted the same page as a transcription task to two different AI systems: Google's multimodal model Gemini, run through Gemini CLI (https://geminicli.com/), and the latest release of the text recognition model Text Titan by Transkribus (https://www.transkribus.org/). Here are the outputs generated by the two models, plus my own transcription for comparison:

The three blocks below appear in this order: Transkribus, Gemini, human.
Don Josieph Morales de Erdad de 34 An.os, tempe-
ramento Colerico, pulso fuerte y veloz, Ingeto gracil,
cabeza pequiuna, pecho grande, pelo rubio, promptitud
de ingenio, ligereza de cuerpo, iracundo, por el mes de
Julio desti Anno de 1710. aviendo precedido una no ligera
passión de animo, y con una leue sospecha de crudero,
a las nueve de la noche tubo un frío que le duro por
media hora con algún fastidio en el estomago y nances
entrole Calentura no mui fuerte, que duro nueve
horas y se termino sin sudar. Lenqua humeda amorga
Orina Flava, y cruda. Es muy fácil de vientre,
después de quitada la calentura, tomo en una babase
de Zargatona Iarane de Carthamo huß. Con que en
tres horas, hizo Ocho, cursos con gran benignidad.
estubo aliviado, y al dia siguiente alas onze del
dia siguiente dia le entro otro frío algo menor con
dos Vomitos de Colera Flava, y dos cursos de lo mismo.
Siguiosse la calentura mucho maior, con grandes in
quietudes, pulsos velocissimos y crebros orinas crudas

D. Josseph Morales de Edad de 34. Años tempe-
ramento Colerico, pulso fuerte y Veloz, Sugeto grad.
cabeza pequeña, pecho grande,pelo rubio, promptitud
de ingenio, ligereza de cuerpo, iracundo; por el mes de
Julio deste Año de 1710 aviendo precedido una no ligera
passion de animo, y con una leve Sospecha de crudeza
alas nueve de la noche tubo un frio que le duro por
media hora con algun fastidio en el estomago y nauceas;
entrole Calentura no mui fuerte, que duro nueve
horas, y se termino Sin Sudor. Lengua humeda amarga,
orina flava, y cruda. Es muy facil de Vientre, y
despues de quitada la calentura tomó en Una babaza
de Zargatona Jarave de Carthamo ℥ij s. conque en
tres horas, hizo ocho cursos con gran benignidad.
estubo alibiado, y al dia Siguiente alas onze del
dia Siguiente dia le entró Otro frio algo menor con
dos Vomitos de colera flava, y dos cursos de lo mismo.
Siguiesse la Calentura mucho maior, con grandes in-
quietudes, pulso velosissimo y crebras orinas crudas.

Josseph Morales de Edad de 34. Años, tempe-
ramento Colerico, pulso fuerte y veloz, Sugeto gracil,
cabeza pequeña, pecho grande, pelo rubio, promptitud
de ingenio, ligereza de cuerpo, iracundo; por el mes de
Julio deste Año de 1710 aviendo precedido una no ligera
passion de animo y con una leve sospecha de crudeza,
alas nueve de la noche tubo un frio que le duro por
media hora con algun fastidio en el estomago y nauceas;
entrole Calentura no mui fuerte, que duro nueve
horas y se termino sin Sudor. Lengua humeda amarga,
orina flava, y cruda. Es muy facil de Vientre, y
despues de quitada la calentura tomó en una babaza
de Zargatona Jarave de Carthamo ℥ijß. con que en
tres horas, hizo ocho cursos con gran benignidad.
estubo alibiado, y al dia siguiente alas onze del
dia Siguiente dia le entró otro frio algo menor con
dos Vomitos de colera flava, y dos cursos de lo mismo.
Siguiesse la Calentura mucho maior, con grandes
inquietudes, pulso velosissimo y crebro, orinas crudas.
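
Differences between the three versions are easy to miss when reading them side by side. As a minimal sketch, and not the procedure used for this post, the deviations from the human transcription can be listed word by word with Python's standard difflib module. The function name token_diffs is hypothetical; the sample is the second line of the page as it appears in the three transcriptions above.

```python
from difflib import SequenceMatcher


def token_diffs(reference: str, candidate: str) -> list[tuple[str, str]]:
    """Return (reference, candidate) pairs for every stretch where the word sequences differ."""
    ref_tokens = reference.split()
    cand_tokens = candidate.split()
    matcher = SequenceMatcher(a=ref_tokens, b=cand_tokens, autojunk=False)
    diffs = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            # Join the tokens so that insertions and deletions show up as empty strings.
            diffs.append((" ".join(ref_tokens[i1:i2]), " ".join(cand_tokens[j1:j2])))
    return diffs


# Second line of the page, copied from the three transcriptions above.
human = "ramento Colerico, pulso fuerte y veloz, Sugeto gracil,"
transkribus = "ramento Colerico, pulso fuerte y veloz, Ingeto gracil,"
gemini = "ramento Colerico, pulso fuerte y Veloz, Sugeto grad."

print("Transkribus:", token_diffs(human, transkribus))
print("Gemini:", token_diffs(human, gemini))
```

For this line the sketch flags "Sugeto" read as "Ingeto" by Transkribus, and "veloz," read as "Veloz," as well as "gracil," read as "grad." by Gemini. Whether a difference in capitalization should count as an error is an editorial decision that the code does not make.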

A brief performance comparison of Gemini CLI and Transkribus (Text Titan I ter):

- Error rate: Gemini produces slightly but consistently better results, probably because of its significantly larger volume of training data (it's Google after all, with access to the biggest database of web content).
- Domain-specific terminology: Likely for the same reason, Gemini is better at correctly identifying early modern medical terminology.
- Special symbols: Unlike Transkribus, Gemini recognized the ounce symbol ℥ and the corresponding apothecary numerals (with a minor error, though). Neither model reproduced the strikethrough ("dia Siguiente").
- Improvability: Users can influence the output quality of both systems by different means: Transkribus models can be improved by further training, while Gemini CLI can be given additional instructions stored in Markdown files (e.g., read this sequence of characters as xyz).
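
An error rate of this kind is commonly expressed as a character error rate (CER): the edit distance between a model output and a reference transcription, divided by the length of the reference. The following is a minimal sketch under the assumption that the human transcription serves as the reference; the function names are hypothetical, and the sample is a single line of the page (the one ending in "nauceas;"), so the figures it prints are not representative of the whole page.

```python
import unicodedata


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and substitutions turning a into b."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution or match
        previous = current
    return previous[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: edit distance divided by the length of the reference."""
    # NFC normalisation ensures that an accented letter counts as one character.
    reference = unicodedata.normalize("NFC", reference)
    hypothesis = unicodedata.normalize("NFC", hypothesis)
    if not reference:
        raise ValueError("reference must not be empty")
    return levenshtein(reference, hypothesis) / len(reference)


# One line of the page, copied from the three transcriptions above.
human = "media hora con algun fastidio en el estomago y nauceas;"
transkribus = "media hora con algún fastidio en el estomago y nances"
gemini = "media hora con algun fastidio en el estomago y nauceas;"

print(f"Transkribus CER: {cer(human, transkribus):.3f}")
print(f"Gemini CER:      {cer(human, gemini):.3f}")
```

Note that this measure counts a modernized accent ("algún" for "algun") as an error just like a misread word; a stricter or more lenient normalization would change the numbers.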

In short, Gemini CLI performs better (likely thanks to its larger training data volume). Both systems struggle with domain-specific symbols such as historical apothecary units or alchemical symbols and require additional user input. For the tested source corpora (early modern Spanish and Latin), Gemini and Transkribus are both capable of producing results of sufficient quality to be deployed as automation tools in research projects. However, additional input or training and revision by knowledgeable users remain indispensable. To put it bluntly: use AI as a transcription tool only if you are able to detect the mistakes the system will inevitably produce.
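
Part of this checking can be supported by simple rules. The sketch below flags dose expressions that do not parse as an apothecary quantity, so that a human can review them. It rests on assumptions that are mine and not taken from the manuscript: a dose consists of a unit sign (℥ ounce, ʒ dram, ℈ scruple), a lower-case Roman numeral whose final i may be written j, and an optional ß or ss read as semis (one half). The names DOSE, roman_to_int and parse_dose are hypothetical, and the pattern is a deliberate simplification.

```python
import re

# Apothecary unit signs (assumed set; extend as the corpus requires).
UNITS = {"℥": "ounce", "ʒ": "dram", "℈": "scruple"}
ROMAN = {"i": 1, "v": 5, "x": 10, "l": 50, "c": 100}

# Unit sign, optional numeral with optional terminal j, optional "half" mark, optional full stop.
DOSE = re.compile(r"^(?P<unit>[℥ʒ℈])(?P<numeral>[ivxlc]*j?)(?P<half>ß|ss)?\.?$")


def roman_to_int(numeral: str) -> int:
    """Convert a lower-case Roman numeral; a final 'j' is read as 'i'."""
    digits = [ROMAN[ch] for ch in numeral.replace("j", "i")]
    total = 0
    for pos, value in enumerate(digits):
        # A smaller value before a larger one is subtracted (iv = 4).
        if pos + 1 < len(digits) and value < digits[pos + 1]:
            total -= value
        else:
            total += value
    return total


def parse_dose(token: str):
    """Return (unit, quantity) for a well-formed dose, or None if the token needs review."""
    match = DOSE.match(token.strip())
    if match is None or not (match["numeral"] or match["half"]):
        return None
    quantity = roman_to_int(match["numeral"]) + (0.5 if match["half"] else 0)
    return UNITS[match["unit"]], quantity


# The dose of "Jarave de Carthamo" as rendered in the three transcriptions above.
for source, token in [("human", "℥ijß."), ("Gemini", "℥ij s."), ("Transkribus", "huß.")]:
    print(source, "->", parse_dose(token))
```

Under these assumptions the human reading parses as two and a half ounces, while both model outputs return None and would be flagged for review. Such a rule only catches malformed expressions; a well-formed but wrong numeral still has to be caught by the reader.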

Finally, the main features of Gemini CLI and Transkribus in comparison:

Feature | Gemini CLI | Transkribus
AI | Multimodal | Text recognition
User Interface | Terminal (CLI) | Website
Interaction | Prompts | Upload, Feedback
Extras | Agentic | "Lightweight"
Customization | Prompts, md-file | Model selection, training
Hardware & software requirements | 16GB+ RAM for large projects; Linux, macOS, Windows; Node.js 20+ | (not specified)
Privacy/Safety | Free tier: data may be used (depending on the respective legal regulations); secure path, docker sandbox | EU law
Pricing | Fixed price subscription, Pay-as-you-go | 99 EUR per year
Free tier | 1000 model requests / user / day | 50 tokens per month (1 token = 1-2 pages)
Docs | https://geminicli.com/docs/get-started/installation/ | https://help.transkribus.org/


References:


- Fernández de Navarrete, Francisco José (1703?): Varii medici ac phylosophici labores quos in laudem & honorem nusquam pro merito Catolicis laudibus esstati. Real Academia Nacional de Medicina, M-10-07-M-01 (Manuscript). http://bibliotecavirtual.ranm.es/ranm/es/consulta/registro.do?id=1240.


- Chinchilla, Anastasio (1841): Anales históricos de la medicina en general, y biográfico-bibliográficos de la Española en particular. Vol. 5. Valencia: Lopez y Compañia. https://archive.org/details/b2933858x_0005/.

- Corpus del Diccionario histórico de la lengua española (CDH) https://www.rae.es/banco-de-datos/cdh.

- Diccionario de la lengua española (DRAE). https://dle.rae.es/.

- García de Cortázar Nebreda, Margarita (2023): La construcción de la física moderna en la sociedad española del siglo XVIII: obras, autores y públicos. PhD thesis. Valencia: Universitat de València. https://hdl.handle.net/10550/87887.

- Gómez de Enterría, Josefa (2015): El vocabulario de la medicina en el español del siglo XVIII. In: Actas del IX Congreso Internacional de Historia de la Lengua: Cádiz, 2012. Ed. by José García Martín and Teresa Bastardín Candón. Frankfurt a. M., Madrid: Vervuert Verlagsgesellschaft, pp. 361–392.

- Hernández Morejón, Antonio (1850): Historia bibliográfica de la medicina española, tomo VI. Madrid: Viuda de Jordán e Hijos.

- Hernández Morejón, Antonio (1852): Historia bibliográfica de la medicina española, tomo VII. Madrid: Viuda de Jordán e Hijos.

- Humphries, Mark (2025): Gemini 3 Solves Handwriting Recognition and it’s a Bitter Lesson. Nov 25, 2025. https://generativehistory.substack.com/p/gemini-3-solves-handwriting-recognition [retrieved Dec 14, 2025]

- López Terrada, María Luz, José Luis Fresquet Febrer, and Carla Pilar Aguirre Marco (2008): Hernández Morejón, Anastasio Chinchilla y la Historia de la Medicina Española. Cuadernos Valencianos de Historia de la Medicina y de la Ciencia, Vol. LVII. Valencia: Instituto de Historia de la Medicina y de la Ciencia, Universitat de València. http://hdl.handle.net/10261/83439.

- Mackenzie, David (1997): A Manual of Manuscript Transcription for the Dictionary of the Old Spanish Language. Fifth Edition Revised and Expanded by Ray Harris-Northall. Madison: Hispanic Seminary of Medieval Studies. https://hispanicseminary.org/manual-en.htm.

- Nomdedeu Rull, Antoni and Sandra Iglesia Martín (2013): Diccionario Histórico del Español moderno de aparatos de física experimental: documentación de los términos del siglo XVIII. In: Asclepio 65.2. doi:10.3989/asclepio.2013.20.

- Perseus Digital Library, Corpus of Greek and Latin texts. Tufts University. https://www.perseus.tufts.edu/hopper/.

- Sutton, Richard (2019): The Bitter Lesson. March 13, 2019. http://www.incompleteideas.net/IncIdeas/BitterLesson.html [retrieved Nov 19, 2025]

- Terras, Melissa, B. Anzinger, P. Gooding et al. (2025): The artificial intelligence cooperative: READ-COOP, Transkribus, and the benefits of shared community infrastructure for automated text recognition. Open Res Europe 2025, 5:16. https://doi.org/10.12688/openreseurope.18747.2.


AI Models


- Transkribus Public AI Model Hub https://app.transkribus.org/models/public


- HTR Model Spanish Gothic Incunabula (HSMS) (Zenodo Repository; CC BY-NC 4.0) https://zenodo.org/records/14171448

- Gemini CLI https://geminicli.com/ // https://google-gemini.github.io/gemini-cli/

- Google AI Lab: Gemini API https://ai.google.dev/gemini-api/

(Jürgen Stowasser)