Early written Latvian texts are important sources not only for the humanities but also for cultural and social studies. Unfortunately, being scattered across libraries and archives in different countries, they have been little investigated; they are largely treated in isolation and in many cases are used for quite narrow purposes. There has been a serious lack of general overviews introducing the sources and the studies on them and, more importantly, even now only a few interdisciplinary studies have been carried out. Fortunately, the last two decades have seen growth in the popularization and dissemination of the early written sources. The 21st c. has brought new opportunities for lesser-used and lesser-studied languages: the era of digitization has resulted in the development of various general and special corpora.
The diachronic Corpus of early written Latvian was launched in 2003 and is intended to cover the history of written Latvian of the 16th–18th cc. (Andronova 2007). The aim of the corpus is to facilitate studies of early Latvian in general and to serve as the basis for the Historical dictionary of the Latvian language (this is a good example of successful co-operation between linguists and software engineers in creating a new kind of dictionary in Latvian lexicography; 1200 pilot entries are now available on the web: www.korpuss.lv/lvvv).
The development of the corpus has gone through several phases. Early written Latvian texts have been acquired thanks to close co-operation with Latvian and Lithuanian libraries, as well as with researchers across Europe interested in the history of early Latvian texts. Undergraduate students at the University of Latvia and St. Petersburg State University (Russia) have also been involved in transliterating some texts during the compilation of the corpus. This has helped to raise interest in the history of the Latvian language, and subsequently some bachelor’s theses have been defended on the basis of these texts.
The first digitized text copies were handed over to the National Library of Latvia in 2002. Some new sources have been discovered since then: for example, a unique copy of Agenda Parva (1622), previously reported as unknown, has recently been published on the website of the Warmia-Mazury Digital Library (http://wmbc.olsztyn.pl/dlibra/doccontent?id=926). We are presently processing the Latvian fragments in this Agenda, which will be added to the corpus.
One of the challenges in this work is the need to compare different editions of the same source and to analyse how different parts of religious texts circulated from one source to another.
One of the advantages of this corpus is that it provides the exact location of a word-form (usually the abbreviation of the source together with the page and line number of the text, or the Bible book, chapter and verse). This makes it easy to cite the corpus data accurately. Facsimiles of the sources can also be viewed, which adds further value to this resource.
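The locator described above can be pictured as a small record. The following is a minimal sketch only: the class name `Locator`, its field names, the citation format and the abbreviation `SRC` are illustrative assumptions, not the actual data model or citation style of the corpus.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Locator:
    """Position of a word-form in a source (hypothetical structure)."""
    source: str                     # abbreviation of the source
    page: Optional[int] = None      # page number, for paginated texts
    line: Optional[int] = None      # line number on that page
    book: Optional[str] = None      # Bible book, for biblical texts
    chapter: Optional[int] = None   # chapter within the Bible book
    verse: Optional[int] = None     # verse within the chapter

    def cite(self) -> str:
        """Render the locator as a short citation string (assumed format)."""
        if self.book is not None:
            return f"{self.source} {self.book} {self.chapter}:{self.verse}"
        return f"{self.source} {self.page}.{self.line}"


# "SRC" and "Gen" are placeholder abbreviations, not real corpus sigla.
print(Locator("SRC", page=12, line=5).cite())
print(Locator("SRC", book="Gen", chapter=1, verse=3).cite())
```

Keeping the page/line and book/chapter/verse schemes in one record means that every word-form carries a citable address, whichever kind of source it comes from.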
All sources in the corpus are included in toto; no samples are selected. Quite a wide range of short texts has either been added to the corpus recently or is presently in the process of being included; these texts can be divided into three groups:
1) individual short texts, e.g., occasional poetry, oath texts;
2) Latvian texts found in sources written in other languages, e.g., the prayer Pater Noster published in the 16th c.; sentences in Latvian in several editions of ‘Stratagema oeconomicum oder Akker-Student’, written in German by S. Gubert in the 17th c.; or the Latvian text in Agenda Parva (1622 and later editions);
3) shorter texts in Latvian appended to some individual Latvian sources.
The description of these three groups and the methodology of their inclusion in the corpus is the topic of the present study.
1. Individual short texts
These include both poetry and certain legal texts (various oaths, military court laws). The bulk of the sources in this group is occasional poetry written in the 17th and 18th c.
The beginnings of Latvian occasional poetry have recently been the object of in-depth studies. A broad survey of 16th and 17th c. poetry in its cultural context has been carried out by Māra Grudule (2017). The book traces the long evolution of this type of text: the poems were at first profoundly influenced by German culture but gradually turned into Latvian poetry. Three early dedication poems were added to the corpus in 2016. In 2019 around 70 poems from 15 sources were collected in different libraries and are now in the process of being included in the corpus. One of these new poems is a unicum kept at the Russian National Library – ‘Mūsu visu upurs tai priecas dienā’ (1791). The new poems cover a wide thematic range and different occasions: birthday congratulations, wedding songs, popular New Year’s wishes (which could be printed on cards or written in letters), funeral songs and others.
These songs may be of interest not only for literary and linguistic studies but also for examining the culture, history and ethnography of Livonia at that time. The New Year’s dedication poems in ‘Jaunā Gada vēlēšanas pēc ikkatra gribēšanas’ (1781) and ‘Jaunā Gada vēlēšanas’ (1793), for instance, can be studied not only for literary analysis but also to understand the soul, psychology and manners of the people. We would therefore like to encourage not only linguistic but all other kinds of studies by means of the corpus. These texts will be included in the Corpus as individual sources.
2. Latvian inscriptions in texts written in other languages
This group covers single words, phrases, sentences and longer passages in Latvian in books printed in other languages. Latvian proper names (personal names and place names) have been found in several sources dated to the 15th century (e.g. chronicles). The lists of craftsmen’s guilds from the 16th c. should be examined and excerpted for the purposes of the corpus. The history of written Latvian begins with the Reformation and Martin Luther’s call to use the native language. There are already a number of Pater Noster prayers from the 16th c. in the corpus; before they were included, a linguistic analysis was performed in order to decide which of the prayers to include (see Vanags 2014).
At the moment 2 new sources are being processed for inclusion in the corpus:
(1) Agenda Parva (1622) with its texts written in Polish, German, Estonian and Latvian. For the needs of the corpus only the Latvian sentences are excerpted and processed, and a Latvian word-list will be created on the basis of this material.
(2) The popular 17th c. book by S. Gubert, ‘Stratagema oeconomicum oder Akker-Student’ (1st ed. 1645 and later editions in the 17th c.), is a good example of so-called Hausväterliteratur and is a valuable source for ethnographical studies, among others (e.g. for its description of the tools and crops known in Livonia at that time; its ‘Bauer=Prognosticon’ for weather forecasting is often mentioned and was later included in the volumes of Latvian beliefs compiled by P. Šmits (1940-1941)). The book contains Latvian phrases in the running text and hymns at the end (the last edition, printed in 1757, omits the hymns). Single words and phrases occur within the German sentences, commonly introduced by the phrase ‘die Bauern nennen’, e.g. names of insects (circiņš ‘cricket’), names of plants (vavieriņi ‘marsh tea’), and phrases like dvēsel laiks, lit. ‘time of souls’, meaning the time span between Michael’s Day (29 September) and Martin’s Day (10 November). In such cases the whole sentence will be copied and marked as German, but only the Latvian phrase will be included in the word list. The hymns at the end of the book are given in both German and Latvian (most probably they were translated by S. Gubert himself). All of them will be included in the corpus in order to facilitate the analysis of the German source text and its translation into Latvian.
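The treatment of embedded Latvian material described above (the whole sentence kept and marked as German, only the Latvian phrase entering the word list) can be sketched as follows. This is an illustration under stated assumptions: the `Sentence` record, the language codes `de` and `lv`, and the function `word_list` are hypothetical and do not reproduce the corpus software; the German example sentences are invented placeholders, not quotations from Gubert.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Sentence:
    """A sentence copied from a source (hypothetical structure)."""
    text: str   # full sentence as copied from the source
    lang: str   # language of the sentence as a whole, e.g. "de" or "lv"
    latvian: List[str] = field(default_factory=list)  # embedded Latvian phrases


def word_list(sentences: List[Sentence]) -> List[str]:
    """Collect word-forms from Latvian material only, deduplicated and sorted."""
    words = set()
    for s in sentences:
        if s.lang == "lv":
            # A fully Latvian sentence contributes all of its tokens.
            words.update(s.text.split())
        else:
            # A foreign-language sentence contributes only its marked phrases.
            for phrase in s.latvian:
                words.update(phrase.split())
    return sorted(words)


# Invented placeholder sentences; only the Latvian forms come from the text above.
sample = [
    Sentence("die Bauern nennen es circiņš", lang="de", latvian=["circiņš"]),
    Sentence("die Bauern nennen es dvēsel laiks", lang="de", latvian=["dvēsel laiks"]),
]
print(word_list(sample))
```

Whitespace tokenization is used here only for brevity; real early texts would need handling of punctuation and historical orthography.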
3. Texts in Latvian added (later) to some individual Latvian sources
At the moment we have only one such source – a letter written by the peasant Anšs to the priest Loder dated June 1771 and added to the transcript of the ‘Lettisches und Teutsches Wörterbuch’ by Ch. Fürecker. This letter has already been included in the Corpus (http://senie.korpuss.lv/static/V1771_SZA.html) as a separate item.
The development of the Corpus of early written Latvian texts ‘SENIE’ is an on-going activity within other research projects; in 2018–2020 it is funded by the State Research program ‘The Latvian Language’ (No. VPP-IZM-2018/2-0002).
Andronova, E. The Corpus of Early Written Latvian: current state and future tasks. In: Proceedings of Corpus Linguistics 2007, Birmingham, UK (2007). Available at: http://ucrel.lancs.ac.uk/publications/CL2007/paper/245_Paper.pdf
Grudule, M. Latviešu dzejas sākotne 16. un 17. gadsimtā kultūrvēsturiskos kontekstos. Rīga (2017).
Vanags, P. Latviešu valodas vēsturiskās vārdnīcas projekts. In: Valodas prakse: Vērojumi un ieteikumi. Rīga (2014), pp. 97–109.