The FEW is a huge dictionary, both in the sheer mass of its data (25 volumes, 16,000 pages) and in its exhaustive aims. Its purpose is to register and etymologize the whole lexicon, not only of French but also of earlier stages of the language and of Occitan; of every Gallo-Romance dialect; of every technical or professional genre; and of every language register, including slang. In sum, the FEW aims to include and describe every single lexical unit that exists or has existed in the territory of ancient Gaul. The size of this undertaking has two consequences that directly influence the digitalisation of the dictionary: firstly, there is a huge amount of data; secondly, the presentation and organization of the data are exceedingly complex. The reasons for digitalising the FEW are to make lexical units easy to find and to allow searches by criteria that cannot be used with the printed version. However, pursuing these goals carries some risks, including the possibility of cutting corners and, above all, the temptation to give up reading.
Numerous high-quality primary text sources (in the context of the curation project described here, full-text transcriptions and corresponding image scans of German works from the 15th to the 19th centuries) are scattered across the web or stored remotely. For example, transcriptions of historical sources are stored locally on degrading recording media and cannot be found, let alone accessed, by third parties. Additionally, idiosyncratic, project-specific markup conventions and uncommon, out-of-date or inflexible storage formats often hinder further use and analysis of the data. Textual resources are also often accompanied by scarce, insufficient or inaccurate bibliographic information, which is one further reason why valuable resources, even if available on the web, remain undiscovered by the wider research community and are of little use to it. Integrating these dispersed primary text sources into the sustainable, web- and centre-based research infrastructure of CLARIN-D will be an important step towards solving this problem. The full paper illustrates an exemplary approach taken by the »Deutsches Textarchiv« (DTA; www.deutschestextarchiv.de) at the Berlin-Brandenburg Academy of Sciences and Humanities (BBAW) to integrate dispersed textual resources and corresponding image scans from various sources into a large historical text corpus of its own and to insert these into the infrastructure of CLARIN-D.