Transcribing a 1710 Case History with AI Tools
The transcription of manuscripts has always required certain skills: familiarity with historical handwriting scripts, domain knowledge, and experience with the particularities of a source corpus. This is still the case, but recent advances in AI text recognition have the potential to automate parts of the workflow and save hours of tedious labour. In this post, I will discuss the capabilities and challenges of AI-supported transcription, using the example of a medical case history from early modern Spain (for a general overview of the topic, see A Short Introduction to Text Recognition).
The Source
The samples below are part of Francisco Fernández Navarrete’s manuscripts in the archive of the Real Academia Nacional de Medicina (Manuscript M-10-07-M-01). Fernández Navarrete held the Prime Chair of Medicine (cátedra de Prima) at the University of Granada and was a court physician, i.e., a member of the medical team in charge of taking care of King Philip V and the royal family (for more on Fernández Navarrete, see To Read Like an Inquisitor and Measuring Public Health with the Barometer).
The papers offer meaningful insights into the theory and practice of Spanish medicine in the early modern period: prescriptions and formulations, case histories, excerpts and summaries of medical literature. The diagnostic methods reflect the early interpretation of Hippocratic and Galenic ideas, with a focus on the core principles of their humoral theory: the predisposition for certain diseases according to body type and character, the reading of the color and texture of the tongue, or the examination of the composition and coloration of bodily excretions. The treatments administered by Fernández Navarrete followed accordingly: bloodletting (sangría) as the go-to remedy, and medications to stimulate the excretion of pathogenic matter, i.e., purgatives, diuretics, and diaphoretics.
From a codicological perspective, the papers, which comprise four fascicles, appear to be in good condition except for minor (water?) damage. The cataloging, however, was presumably done hastily: an examination of the content indicates that the binding does not always follow a continuous chronological order; the dating of the "1703" volume is clearly incorrect, given that passages in several folios refer to later dates; and someone began an index but did not get beyond the first few letters of the alphabet. The index is written in a different hand, likely that of Tomás Francisco Monleón y Ramiro, who is listed in the metadata of the manuscript and was an academy member at the time of Fernández Navarrete’s death.
Transcribing the Source Text (the Traditional Way)
Transcription tasks are an exercise in historical semantics: the meaning and connotations of words, especially scientific terms, change over time, sometimes fundamentally and in unexpected ways. In the early modern period, authors often had to coin neologisms to describe new scientific concepts, phenomena, and technologies. However, not all novel terms became part of the modern scientific vocabulary; quite a few were in use only for some time and subsequently disappeared. Historians therefore rely on a number of resources to identify the specific usage of a term in different domains, regions, and time periods. Being familiar with the particular vocabulary of, say, physicians in early 18th-century Spain is, of course, of great help to the researcher trying to decipher handwritten texts (especially when confronted with illegible penmanship). As for Fernández Navarrete's papers, I have listed some of the relevant resources in the references below: semantic databases, historical dictionaries, and studies of the medical and scientific vocabulary of the time (Gómez de Enterría 2015; García de Cortázar 2023). Here, first, is my transcription of the first page of a case history documented by Fernández Navarrete in 1710:
D.D.=Don Josseph Morales de Edad de 34. Años, temperamento
Colerico, pulso fuerte y veloz, Sugeto gracil,
cabeza pequeña, pecho grande, pelo rubio, promptitud
de ingenio, ligereza de cuerpo, iracundo; por el mes de
Julio deste Año de 1710 aviendo precedido una no ligera
passion de animo y con una leve sospecha de crudeza,
alas nueve de la noche tubo un frio que le duro por
media hora con algun fastidio en el estomago y nauceas;
entrole Calentura no mui fuerte, que duro nueve
horas y se termino sin Sudor. Lengua humeda amarga,
orina flava, y cruda. Es muy facil de Vientre, y
despues de quitada la calentura tomó en una babaza
de Zargatona Jarave de Carthamo ℥ijß. con que en
tres horas, hizo ocho cursos con gran benignidad.
estubo alibiado, y al dia siguiente alas onze del
dia Siguiente dia le entró otro frio algo menor con
dos Vomitos de colera flava, y dos cursos de lo mismo.
Siguiesse la Calentura mucho maior, con grandes
inquietudes, pulso velosissimo y crebro, orinas crudas.
Francisco Fernández de Navarrete, Consultatio (1703: 145r)
AI Transcription (Gemini CLI, Transkribus)
I then submitted the same page as a transcription task to two different AI systems: Google's multimodal Gemini model, accessed through Gemini CLI (https://geminicli.com/), and the latest release of the text recognition model Text Titan by Transkribus (https://www.transkribus.org/). Here are the outputs generated by the two systems, alongside my own transcription:
| Transkribus | Gemini | Human |
|---|---|---|
| Don Josieph Morales de Erdad de 34 An.os, tempe- ramento Colerico, pulso fuerte y veloz, Ingeto gracil, cabeza pequiuna, pecho grande, pelo rubio, promptitud de ingenio, ligereza de cuerpo, iracundo, por el mes de Julio desti Anno de 1710. aviendo precedido una no ligera passión de animo, y con una leue sospecha de crudero, a las nueve de la noche tubo un frío que le duro por media hora con algún fastidio en el estomago y nances entrole Calentura no mui fuerte, que duro nueve horas y se termino sin sudar. Lenqua humeda amorga Orina Flava, y cruda. Es muy fácil de vientre, después de quitada la calentura, tomo en una babase de Zargatona Iarane de Carthamo huß. Con que en tres horas, hizo Ocho, cursos con gran benignidad. estubo aliviado, y al dia siguiente alas onze del dia siguiente dia le entro otro frío algo menor con dos Vomitos de Colera Flava, y dos cursos de lo mismo. Siguiosse la calentura mucho maior, con grandes in quietudes, pulsos velocissimos y crebros orinas crudas | D. Josseph Morales de Edad de 34. Años tempe- ramento Colerico, pulso fuerte y Veloz, Sugeto grad. cabeza pequeña, pecho grande,pelo rubio, promptitud de ingenio, ligereza de cuerpo, iracundo; por el mes de Julio deste Año de 1710 aviendo precedido una no ligera passion de animo, y con una leve Sospecha de crudeza alas nueve de la noche tubo un frio que le duro por media hora con algun fastidio en el estomago y nauceas; entrole Calentura no mui fuerte, que duro nueve horas, y se termino Sin Sudor. Lengua humeda amarga, orina flava, y cruda. Es muy facil de Vientre, y despues de quitada la calentura tomó en Una babaza de Zargatona Jarave de Carthamo ℥ij s. conque en tres horas, hizo ocho cursos con gran benignidad. estubo alibiado, y al dia Siguiente alas onze del dia Siguiente dia le entró Otro frio algo menor con dos Vomitos de colera flava, y dos cursos de lo mismo. Siguiesse la Calentura mucho maior, con grandes in- quietudes, pulso velosissimo y crebras orinas crudas. | Josseph Morales de Edad de 34. Años, tempe- ramento Colerico, pulso fuerte y veloz, Sugeto gracil, cabeza pequeña, pecho grande, pelo rubio, promptitud de ingenio, ligereza de cuerpo, iracundo; por el mes de Julio deste Año de 1710 aviendo precedido una no ligera passion de animo y con una leve sospecha de crudeza, alas nueve de la noche tubo un frio que le duro por media hora con algun fastidio en el estomago y nauceas; entrole Calentura no mui fuerte, que duro nueve horas y se termino sin Sudor. Lengua humeda amarga, orina flava, y cruda. Es muy facil de Vientre, y despues de quitada la calentura tomó en una babaza de Zargatona Jarave de Carthamo ℥ijß. con que en tres horas, hizo ocho cursos con gran benignidad. estubo alibiado, y al dia siguiente alas onze del dos Vomitos de colera flava, y dos cursos de lo mismo. Siguiesse la Calentura mucho maior, con grandes inquietudes, pulso velosissimo y crebro, orinas crudas. |
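The differences between the columns are easier to spot when a machine output is aligned word by word with the reference transcription. The following is a minimal sketch using Python's standard `difflib` module; the function name `word_diff` is my own choice for illustration, and the two sample strings are a single sentence taken from the human and Transkribus columns above. It is not part of either tool, just one possible way to review an output.

```python
import difflib


def word_diff(reference: str, hypothesis: str) -> list[tuple[str, str]]:
    """Return (reference words, hypothesis words) pairs where the texts disagree."""
    ref_words = reference.split()
    hyp_words = hypothesis.split()
    # autojunk=False keeps the matcher from ignoring very frequent words
    matcher = difflib.SequenceMatcher(a=ref_words, b=hyp_words, autojunk=False)
    differences = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            differences.append(
                (" ".join(ref_words[i1:i2]), " ".join(hyp_words[j1:j2]))
            )
    return differences


# One sentence from the table above (human vs. Transkribus column)
human = "Lengua humeda amarga, orina flava, y cruda."
transkribus = "Lenqua humeda amorga Orina Flava, y cruda."

for ref, hyp in word_diff(human, transkribus):
    print(f"{ref!r} -> {hyp!r}")
```

Because the comparison is case- and punctuation-sensitive, it also reports differences such as "orina" versus "Orina"; whether these count as errors depends on the transcription conventions of the project.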
A brief performance comparison: Gemini CLI vs Transkribus (Text Titan I ter):
- Error rate
  - Gemini produces slightly but consistently better results, probably because of its significantly larger training data (it's Google after all, with access to the biggest database of web content).
- Domain-specific terminology
  - Likely for the same reason, Gemini is better at correctly identifying early modern medical terminology.
- Special symbols
  - Unlike Transkribus, Gemini recognized the ounce symbol ℥ and the corresponding apothecary numerals (with a minor error, though). Neither model reproduced the strikethrough ("dia Siguiente").
- Improvability
  - Users can influence the output quality of both systems in different ways: Transkribus models can be improved by further training, while Gemini CLI can be given additional instructions stored in Markdown files (e.g., read this sequence of characters as xyz).
In short, Gemini CLI performs better (likely thanks to its larger volume of training data). Both systems struggle with domain-specific symbols such as historical apothecary units or alchemical symbols and require additional user input. For the tested source corpora (early modern Spanish and Latin), both Gemini and Transkribus produce results of sufficient quality to be deployed as automation tools in research projects. However, additional input or training and revision by knowledgeable users are indispensable. To put it bluntly: use AI as a transcription tool only if you are able to detect the mistakes the system will inevitably produce.
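The comparison above is qualitative. One common way to put a number on it is the character error rate (CER): the edit distance between a model output and the reference transcription, divided by the length of the reference. The sketch below is an illustration under my own assumptions, not the procedure used for this post; the function names are hypothetical, and the sample is a single sentence from the table, far too short to be representative.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, substitutions."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(
                min(
                    previous[j] + 1,         # deletion
                    current[j - 1] + 1,      # insertion
                    previous[j - 1] + cost,  # substitution or match
                )
            )
        previous = current
    return previous[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """Edit distance normalised by the length of the reference text."""
    if not reference:
        raise ValueError("reference must not be empty")
    return levenshtein(reference, hypothesis) / len(reference)


# One sentence from the comparison table; the human transcription is the reference
reference = "Lengua humeda amarga, orina flava, y cruda."
outputs = {
    "Transkribus": "Lenqua humeda amorga Orina Flava, y cruda.",
    "Gemini": "Lengua humeda amarga, orina flava, y cruda.",
}

for system, text in outputs.items():
    print(f"{system}: CER = {character_error_rate(reference, text):.3f}")
```

A meaningful figure would require whole pages and an explicit decision on whether capitalisation, accents, and line-break hyphens are normalised before the comparison.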
Finally, the main features of Gemini CLI and Transkribus in comparison:
| Feature | Gemini CLI | Transkribus |
|---|---|---|
| AI | Multimodal | Text recognition |
| User Interface | Terminal (CLI) | Website |
| Interaction | Prompts | Upload, Feedback |
| Extras | Agentic | "Lightweight" |
| Customization | Prompts, md-file | Model selection, training |
| Hardware & software requirements | 16GB+ RAM for large projects; Linux, macOS, Windows; Node.js 20+ | — |
| Privacy/Safety | Free tier: data may be used (depending on the respective legal regulations); secure path, docker sandbox | EU law |
| Pricing | Fixed price subscription, Pay-as-you-go | 99 EUR per year |
| Free tier | 1000 model requests / user / day | 50 tokens per month (1 token = 1-2 pages) |
| Docs | https://geminicli.com/docs/get-started/installation/ | https://help.transkribus.org/ |
References:
- Fernández de Navarrete, Francisco José (1703?): Varii medici ac phylosophici labores quos in laudem & honorem nusquam pro merito Catolicis laudibus esstati. Real Academia Nacional de Medicina, M-10-07-M-01 (Manuscript). http://bibliotecavirtual.ranm.es/ranm/es/consulta/registro.do?id=1240.
- Chinchilla, Anastasio (1841): Anales históricos de la medicina en general, y biográfico-bibliográficos de la Española en particular. Vol. 5. Valencia: Lopez y Compañia. https://archive.org/details/b2933858x_0005/.
- Corpus del Diccionario histórico de la lengua española (CDH) https://www.rae.es/banco-de-datos/cdh.
- Diccionario de la lengua española (DRAE). https://dle.rae.es/.
- García de Cortázar Nebreda, Margarita (2023): La construcción de la física moderna en la sociedad española del siglo XVIII: obras, autores y públicos. PhD thesis. Valencia: Universitat de València. https://hdl.handle.net/10550/87887.
- Gómez de Enterría, Josefa (2015): El vocabulario de la medicina en el español del siglo XVIII. In: Actas del IX Congreso Internacional de Historia de la Lengua: Cádiz, 2012. Ed. by José García Martín and Teresa Bastardín Candón. Frankfurt a. M., Madrid: Vervuert Verlagsgesellschaft, pp. 361–392.
- Hernández Morejón, Antonio (1850): Historia bibliográfica de la medicina española, tomo VI. Madrid: Viuda de Jordán e Hijos.
- Hernández Morejón, Antonio (1852): Historia bibliográfica de la medicina española, tomo VII. Madrid: Viuda de Jordán e Hijos.
- Humphries, Mark (2025): Gemini 3 Solves Handwriting Recognition and it’s a Bitter Lesson. Nov 25, 2025. https://generativehistory.substack.com/p/gemini-3-solves-handwriting-recognition [retrieved Dec 14, 2025]
- Mackenzie, David (1997): A Manual of Manuscript Transcription for the Dictionary of the Old Spanish Language. Fifth Edition Revised and Expanded by Ray Harris-Northall. Madison: Hispanic Seminary of Medieval Studies. https://hispanicseminary.org/manual-en.htm.
- López Terrada, María Luz, José Luis Fresquet Febrer, and Carla Pilar Aguirre Marco (2008): Hernández Morejón, Anastasio Chinchilla y la Historia de la Medicina Española. Cuadernos Valencianos de Historia de la Medicina y de la Ciencia, Vol. LVII. Valencia: Instituto de Historia de la Medicina y de la Ciencia, Universitat de València. http://hdl.handle.net/10261/83439.
- Nomdedeu Rull, Antoni and Sandra Iglesia Martín (2013): Diccionario Histórico del Español moderno de aparatos de física experimental: documentación de los términos del siglo XVIII. In: Asclepio 65.2. doi:10.3989/asclepio.2013.20.
- Perseus Digital Library, Corpus of Greek and Latin texts. Tufts University. https://www.perseus.tufts.edu/hopper/.
- Sutton, Richard (2019): The Bitter Lesson. March 13, 2019. http://www.incompleteideas.net/IncIdeas/BitterLesson.html [retrieved Nov 19, 2025]
- Terras, Melissa, B. Anzinger, P. Gooding et al. (2025): The artificial intelligence cooperative: READ-COOP, Transkribus, and the benefits of shared community infrastructure for automated text recognition. In: Open Research Europe 5:16. https://doi.org/10.12688/openreseurope.18747.2.
AI Models
- Transkribus Public AI Model Hub https://app.transkribus.org/models/public
- HTR Model Spanish Gothic Incunabula (HSMS) (Zenodo Repository; CC BY-NC 4.0) https://zenodo.org/records/14171448
- Gemini CLI https://geminicli.com/ // https://google-gemini.github.io/gemini-cli/
- Google AI Lab: Gemini API https://ai.google.dev/gemini-api/
(Jürgen Stowasser)