Analysing Historical Data with the Help of Digital Methods
The lemmatization of the documents improves the accuracy of queries put to the data. For instance, in a lemmatized text, search tools can differentiate between Lat. populus (the people) and Lat. pōpulus (the poplar). It is also possible to take into account different cases (e.g. Lat. popul-orum), graphical differences (e.g. Lat. popvlvs instead of Lat. populus) and syntactic relations (e.g. concord in case, number and gender).
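The effect of lemmatization on a query can be illustrated with a minimal Python sketch. It is not the project's actual tooling: the `Token` record, the hand-annotated sample tokens and the `find_by_lemma` helper are assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    form: str    # surface form as written in the text
    lemma: str   # headword, disambiguated here by an English gloss
    case: str    # grammatical case of the form
    number: str  # grammatical number of the form


# Hand-annotated sample tokens (illustrative, not taken from the corpus).
TOKENS = [
    Token("populorum", "populus (people)", "gen", "pl"),
    Token("popvlvs", "populus (people)", "nom", "sg"),
    Token("populus", "pōpulus (poplar)", "nom", "sg"),
    Token("Romanus", "Romanus", "nom", "sg"),
]


def find_by_lemma(tokens, lemma):
    """Return every surface form annotated with the given lemma."""
    return [t.form for t in tokens if t.lemma == lemma]


# The query matches inflected and graphically variant forms of "the people"
# and skips the homograph annotated as "the poplar".
hits = find_by_lemma(TOKENS, "populus (people)")
print(hits)
```

A plain string search for "populus" would have returned only the poplar token in this sample, which is the kind of error the lemma layer is meant to prevent.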
These features allow for the precise corpus-based analysis of linguistic phenomena. For instance, Cassiodorus announces in the preface to his work that he will use three different registers throughout the Variae, in accordance with the status of the respective letter’s addressee. These different styles can be investigated in terms of various formal parameters (e.g. hapax legomena, type-token ratio, lexical density). In a second step, the results of linguistic analyses can be examined with regard to the letters’ content, as specified in the sets of metadata in question (e.g. the addressees’ level of education and/or the presence of digressions or other markers of learnedness).
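The formal parameters named above have simple operational definitions, sketched here in Python. The sketch assumes a letter is already available as a list of lemmas; the `FUNCTION_WORDS` stop-list is a tiny illustrative placeholder, and `style_metrics` is a hypothetical helper rather than part of any tool used in the project.

```python
from collections import Counter

# Illustrative stop-list only; a real study would use a curated list.
FUNCTION_WORDS = {"et", "in", "ad", "non", "qui", "sum"}


def style_metrics(lemmas):
    """Compute three simple style indicators for a lemmatized text."""
    if not lemmas:
        raise ValueError("cannot compute metrics for an empty text")
    counts = Counter(lemmas)
    total = len(lemmas)
    # Hapax legomena: lemmas that occur exactly once in the text.
    hapaxes = sorted(w for w, n in counts.items() if n == 1)
    # Lexical density: share of tokens that are not function words.
    content = sum(1 for w in lemmas if w not in FUNCTION_WORDS)
    return {
        "type_token_ratio": len(counts) / total,
        "hapax_legomena": hapaxes,
        "lexical_density": content / total,
    }


sample = ["populus", "et", "senatus", "et", "populus", "Romanus", "in", "urbs"]
metrics = style_metrics(sample)
print(metrics)
```

Because the type-token ratio depends on text length, values for letters of very different sizes are not directly comparable without some form of normalization.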
Moreover, the lemmatization of the Variae allows for the statistical evaluation of the data, e.g. regarding the distribution of specific words across all 470 letters, which may also be visualized and complemented by analyses of co-occurrence networks. The latter is a corpus-based method designed to detect and statistically evaluate words (e.g. populus and Romanus) which occur in the same context. On a different note, a comparison of the Variae with other relevant text corpora may bring to the fore intertextual references and help trace the development of the Latin language from Late Antiquity to the Middle Ages.
Analysis of co-occurrence, example: Lat. populus (EHuDesktop)
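The counting step behind a co-occurrence analysis can be sketched as follows. This is a simplified illustration, not the procedure used by the tool named in the caption: the letter identifiers and lemma lists are invented, the "context" is assumed to be a whole letter, and a real analysis would add a significance measure on top of the raw counts.

```python
from collections import Counter
from itertools import combinations

# Invented sample data: letter identifier -> lemmas occurring in that letter.
letters = {
    "letter_a": ["populus", "Romanus", "senatus"],
    "letter_b": ["populus", "Romanus", "tributum"],
    "letter_c": ["senatus", "tributum"],
}


def cooccurrence_counts(docs):
    """Count in how many documents each pair of lemmas appears together."""
    pairs = Counter()
    for lemmas in docs.values():
        # Use a sorted set so each unordered pair is counted once per letter.
        for a, b in combinations(sorted(set(lemmas)), 2):
            pairs[(a, b)] += 1
    return pairs


counts = cooccurrence_counts(letters)
for pair, n in counts.most_common(3):
    print(pair, n)
```

The resulting pair counts can serve as edge weights when the co-occurrences are drawn as a network.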
Tagging letters with metadata allows scholars to filter large amounts of information according to the demands of their specific research questions. For instance, you may easily find all letters on matters of taxation issued by Theoderic the Great and addressed to state officials in Liguria between 507 and 513.
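A metadata query of this kind amounts to filtering records on several fields at once. The following Python sketch assumes a flat record per letter; the field names, the `select` helper and all catalogue entries are invented placeholders and do not reproduce the project's data model or real letters.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LetterMeta:
    letter_id: str
    issuer: str
    topic: str
    addressee_role: str
    region: str
    year: int


# Invented placeholder records, not actual letters of the Variae.
CATALOGUE = [
    LetterMeta("demo-1", "Theoderic", "taxation", "state official", "Liguria", 509),
    LetterMeta("demo-2", "Theoderic", "taxation", "bishop", "Liguria", 510),
    LetterMeta("demo-3", "Athalaric", "taxation", "state official", "Liguria", 527),
    LetterMeta("demo-4", "Theoderic", "building", "state official", "Liguria", 508),
]


def select(catalogue, *, first_year, last_year, **criteria):
    """Return ids of letters within the year range that match all criteria."""
    return [
        m.letter_id
        for m in catalogue
        if first_year <= m.year <= last_year
        and all(getattr(m, field) == value for field, value in criteria.items())
    ]


matches = select(
    CATALOGUE,
    first_year=507,
    last_year=513,
    issuer="Theoderic",
    topic="taxation",
    addressee_role="state official",
    region="Liguria",
)
print(matches)
```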
Annotating and commenting on, for example, the persons mentioned in the Variae makes it possible to analyse their relationships in terms of both quantity and quality (e.g. kinship, political affiliations, friendships). On the basis of the data thus amassed, digital tools may create relationship networks pointing scholars to insightful overlaps and deviations.
Commentary on person “Alaric II” (QAnnotate) and annotated letter to Alaric II (QAnnotate)
What is more, tagging the Variae with information allows for the visualization of relevant data. For this purpose, the annotation data is transformed into a graph model, which contains the respective relationships between letters, persons, formal and social classifications as well as space- and time-related parameters. In this way, spatial and social relationships between persons and letters can be visualized in the form of geographical and social networks. In addition, filtering and visualization tools can be used to isolate subnetworks and to analyse correlations between various classifications.
Nodegoat-based visualization of social and geographic networks
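The idea of a graph model with typed relationships and filterable subnetworks can be sketched in plain Python. This is only an illustration of the concept and not the data model of the visualization platform: the node identifiers, relation names and the helpers `build_graph` and `subnetwork` are assumptions.

```python
from collections import defaultdict

# Invented sample edges: (source node, relation type, target node).
EDGES = [
    ("letter:demo-1", "sent_by", "person:Theoderic"),
    ("letter:demo-1", "addressed_to", "person:Alaric II"),
    ("letter:demo-1", "located_in", "place:demo-place"),
    ("letter:demo-2", "sent_by", "person:Theoderic"),
]


def build_graph(edges):
    """Build an undirected adjacency map that keeps the relation type."""
    graph = defaultdict(set)
    for source, relation, target in edges:
        graph[source].add((relation, target))
        graph[target].add((relation, source))
    return graph


def subnetwork(edges, relations):
    """Keep only the edges whose relation type is in the given set."""
    return [edge for edge in edges if edge[1] in relations]


graph = build_graph(EDGES)

# All letters connected to one person through the "sent_by" relation.
sent_letters = sorted(
    node for relation, node in graph["person:Theoderic"] if relation == "sent_by"
)
print(sent_letters)

# Isolate the spatial layer of the network.
spatial = subnetwork(EDGES, {"located_in"})
print(spatial)
```

Keeping the relation type on every edge is what makes it possible to switch between a social view (persons and letters) and a geographic view (letters and places) of the same data.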