CORPUSLAD
A COLLECTION OF TEXTS FOR STUDYING THE LANGUAGE
As part of the TALES project on the automated processing of the Ladin language, launched in 1999 in collaboration with the Institute for Scientific and Technological Research (ISTR) in Trento, organized collections of Ladin texts were created, both in the standard language and in the individual local varieties. This collection, called CORPUSLAD, was later integrated into the TALL platform to optimize interaction between the various language tools available. The corpora collected here, which cover all the Ladin Dolomitan variants, contain a total of 15,500,000 words. The selected texts span the period from 1800 to the present day, with a prevalence of texts from the second half of the 20th century. To guarantee a certain balance between the various text types, both literary texts (prose, poetry, theatre, memoirs, texts on folklore and traditions, prayer books) and non-literary texts (legal and administrative texts, forms, journalistic and practical-information texts, popular science and cultural texts, and educational texts) were included.
The Fassa text corpus is currently at the most advanced stage of development. Its structure, which records key information for every text (date, place of origin, type of text, author), allows you to refine your search according to a series of predetermined criteria.
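As a rough illustration of how per-text metadata can narrow a search, the following Python sketch filters a small list of records by date, place, type and author. It is not the CORPUSLAD implementation: the `TextRecord` class, the `filter_texts` function, the field names and the sample records are all hypothetical placeholders chosen for this example.

```python
from dataclasses import dataclass


@dataclass
class TextRecord:
    # Hypothetical record layout mirroring the four metadata fields
    # named above (date, place of origin, type of text, author).
    title: str
    year: int
    place: str
    text_type: str
    author: str


def filter_texts(records, *, year_from=None, year_to=None,
                 place=None, text_type=None, author=None):
    """Return the records that satisfy every criterion that was given.

    A criterion left as None is ignored.
    """
    selected = []
    for rec in records:
        if year_from is not None and rec.year < year_from:
            continue
        if year_to is not None and rec.year > year_to:
            continue
        if place is not None and rec.place != place:
            continue
        if text_type is not None and rec.text_type != text_type:
            continue
        if author is not None and rec.author != author:
            continue
        selected.append(rec)
    return selected


# Placeholder records, invented purely for the example.
records = [
    TextRecord("Text A", 1950, "Place X", "poetry", "Author 1"),
    TextRecord("Text B", 1985, "Place X", "legal", "Author 2"),
    TextRecord("Text C", 1990, "Place Y", "poetry", "Author 1"),
]

for rec in filter_texts(records, year_from=1951, text_type="poetry"):
    print(rec.title, rec.year, rec.place)
```

Combining criteria with a logical AND, as here, is only one possible design; a real search interface might also offer alternatives within a single field.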
The corpora can be viewed using the concordancer, a purpose-built tool aimed above all at linguists and scholars of the Ladin language. It allows you to analyse the texts by searching for concordances, classifications and frequencies based on the KWIC (Keyword In Context) method, in which the word being sought is displayed together with its surrounding context.
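The KWIC idea can be sketched in a few lines of Python. This is a minimal illustration under simple assumptions (tokens are plain word characters, matching is case-insensitive, context is a fixed number of words on each side) and does not reflect how the actual concordancer is built; the function name `kwic`, the `width` parameter and the sample sentence are introduced here only for the example.

```python
import re


def kwic(text, keyword, width=3):
    """Return one (left context, keyword, right context) tuple per hit.

    `width` is the number of words kept on each side of the keyword.
    """
    tokens = re.findall(r"\w+", text)  # naive tokenisation (assumption)
    target = keyword.lower()
    rows = []
    for i, token in enumerate(tokens):
        if token.lower() == target:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            rows.append((left, token, right))
    return rows


# Illustrative English sentence, not taken from the corpus.
sample = ("the word being sought is displayed with its context "
          "and the word is centred")

for left, hit, right in kwic(sample, "word", width=2):
    # Right-align the left context so the keywords line up in a column.
    print(f"{left:>12} | {hit} | {right}")
```

Aligning the keyword in a central column is what makes recurring patterns to its left and right easy to scan. The number of rows returned also gives the raw frequency of the keyword in the text.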