The Swedish Academy Dictionary (SAOB) is one of the major national dictionary projects begun in the 19th century. SAOB is still in production: two of its 38 volumes remain to be printed before 2018. The internal structure of the volumes naturally varies. Ten chief editors and five generations of editors have been involved in the project. In the 1980s the SAOB was OCR-scanned, and the result has been used for a web version available since 1997. The web version is very frequently used but has many shortcomings, due among other things to the great typographic complexity of the dictionary and the scanning technology of the time. The editorial board is now discussing the future: redigitization (in China), updating the web version with new search tools, updating the dictionary itself, and some form of editing tool.
Norsk Ordbok is a 12-volume academic dictionary covering Norwegian Nynorsk literature and all Norwegian dialects from 1600 to the present. The dictionary is to be completed in 2014, the bicentenary of the Norwegian constitution. Data collection started in 1930 and editing started in 1946. In the 1990s the Norwegian language collections were digitized, and since 2002 Norsk Ordbok has been edited on a digital platform that communicates with a system of relational databases for manuscript storage. These databases include digitized slip archives, a draft manuscript from 1940, glossaries from the period between 1600 and 1850, canonical dictionaries from the period 1870-1910, a bibliography, local dictionaries, a text corpus (90 million words) and more. The source material is linked together in a Meta dictionary (MD). The MD is an electronic index of headwords in standard spelling; it is the hub of the language collections, where source material from the databases is linked to headword nodes. The MD in turn communicates with the editing system and the dictionary database. Linking the source material electronically to the dictionary entries ensures that the interpretation of the data and the results of the research can easily be reproduced, which is important for a scholarly dictionary. Further, the MD index system enables us to set a relative dimension for each dictionary entry and to draw up a master plan of alphabet dimensions for the whole dictionary, which is important for any modern dictionary project with limited resources. The digitized source material, the digital editing platform and the digital dictionary product also point forward to new ways of presenting the data and to future lexicographical research.
The paper will present the digital resources of the Norsk Ordbok 2014 project, developed in close cooperation with the scientific programmers at the Unit of Digital Documentation at the University of Oslo. It will focus on the Norsk Ordbok 2014 experience with working on a fully digitized editing platform for the last 10 years, and it will also comment briefly on how the developed tools and resources point forward into Norwegian lexicography in the future.
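The role of the Meta dictionary as a hub can be illustrated with a minimal Python sketch. It is not the Norsk Ordbok implementation: the class names, the field names and the way a "relative dimension" is computed (here simply a headword's share of all linked source records) are assumptions made for illustration only.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class SourceRecord:
    """One piece of source material (hypothetical shape, not the project's schema)."""
    database: str   # e.g. "slip_archive" or "corpus"
    record_id: str
    spelling: str   # spelling as attested in the source


class MetaDictionary:
    """Index from a headword in standard spelling to its linked source records."""

    def __init__(self):
        self._nodes = defaultdict(list)

    def link(self, headword, record):
        # Attach a source record to the headword node.
        self._nodes[headword].append(record)

    def sources_for(self, headword):
        # Return every record behind an entry, so its evidence can be re-examined.
        return list(self._nodes.get(headword, []))

    def relative_size(self, headword):
        # Assumption: an entry's share of the dictionary is estimated from
        # its share of all linked source records.
        total = sum(len(records) for records in self._nodes.values())
        if total == 0:
            return 0.0
        return len(self._nodes.get(headword, [])) / total


md = MetaDictionary()
md.link("hest", SourceRecord("slip_archive", "s-001", "hest"))
md.link("hest", SourceRecord("corpus", "c-417", "hesten"))
md.link("hus", SourceRecord("slip_archive", "s-002", "hus"))
md.link("hus", SourceRecord("glossary_1600_1850", "g-033", "huus"))
```

In this sketch, `md.sources_for("hest")` returns the two records behind the entry, and `md.relative_size("hest")` gives the fraction of all linked material that belongs to it, which is one conceivable basis for planning how much space an entry receives.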
In the last decade, interaction between scholarly lexicography and the public has grown enormously. In the past, lexicographers, and scholarly lexicographers in particular, tended to describe the lexicon from an ivory tower, in a way that was rather inaccessible to the general public; this has been changing for some time now. Interaction with the general public is increasingly appreciated and is even being stimulated within the lexicographic community. This also holds for the Algemeen Nederlands Woordenboek (ANW), a project of the Institute for Dutch Lexicology in Leiden. The ANW is an online scholarly dictionary of contemporary Dutch. In its periodization it is the successor of the Woordenboek der Nederlandsche Taal (WNT), which was completed in 2001 and covers the vocabulary of the Netherlands and Flanders up to around 1976. The editorial staff of the ANW would like to create a dictionary that is suitable for different audiences, ranging from language professionals and other academics to pupils, students and language enthusiasts in general. Consequently, interaction with the public is very important to the ANW editorial staff, and it takes several forms. First, each dictionary article offers users the option to give feedback. Second, the editorial staff uses questions and comments gathered on internet forums such as Meldpunt Taal (launched in June 2010) and Neo-term. The ANW staff also approaches the public directly through Twitter, with items such as ‘neologism of the week’, facts about spelling and answers to language questions that have been received. A relatively new initiative is to call upon the public in the search for information for the dictionary, such as synonyms, pictures and the earliest use of words. Language games and word polls are other ways to increase the interest and involvement of the general public in the ANW.
The FEW is a huge dictionary, both in the sheer mass of its data (25 volumes, 16,000 pages) and in its exhaustive aims. Its purpose is to register and etymologize the whole lexicon, not only of French but also of earlier stages of the language and of Occitan, of every Gallo-Romance dialect, of every technical or professional genre, and of every language register, including slang. In sum, the FEW aims to include and describe every single lexical unit that exists or has existed in the territory of ancient Gaul. The sheer size of this undertaking has two consequences that directly influence the digitization of the dictionary: first, there is a huge amount of data; second, the presentation and organization of the data are exceedingly complex. The reasons for digitizing the FEW are to make it easy to search for units and to allow searches by criteria that cannot be used with the printed version. However, pursuing these aims carries some risks and may invite cutting corners, above all the temptation to give up reading.
Even a reductionist attempt to define scholarship is clearly fraught with difficulty, but an idealised historical lexicographer-cum-scholar must obviously have, inter alia and at the very least, a profound linguistic and textual knowledge of the language being documented, an ability to understand texts in their historical context and to analyse the meaning or function of lexical items as used in context, and an ability to synthesise the results through generalisation and abstraction and to formulate them in a way that is both accurate, i.e. reflects actual usage, and user- or reader-friendly, i.e. comprehensible to the user/reader. S/he must have encyclopedic or world knowledge and literary skills in order to understand general content words and explain their meaning and their semantic shifts, perhaps over many centuries, and technical expertise to understand specialist terms and define their use in specific contexts, again perhaps over time. In respect of etymology s/he must not only have knowledge of older stages of the language and an ability to reconstruct unattested forms, but also knowledge of the other languages that have impacted on the language being documented, or at least familiarity with the scholarly historical dictionaries of those languages. That is a tall order indeed, impossibly tall for any one person given today's demands on and expectations of lexicographers. Teams which include specialists in different areas, or at least have access to consultants in such areas alongside generalists, are needed if scholarly standards are to be met. The standard of scholarship is primarily a factor of the number and range as well as the knowledge and experience of the lexicographers, as is in large measure the pace of production. In this regard, it cannot be emphasised enough that scholarly historical lexicography of high quality is and will remain very time-consuming.
Ediarum is an editing environment designed and implemented by TELOTA at the Berlin-Brandenburg Academy of Sciences and Humanities (BBAW, Germany). It is based on two main components: an open-source native XML database (eXistDB) and a widely used commercial XML editor (Oxygen XML Editor).
The aim of ediarum is to facilitate the task of encoding texts in TEI format, to store the resulting XML files in eXistDB and to enable collaboration and sharing among the members of a team. The central framework of this environment, known as ediarum.BASE.edit, allows the editor to hide the XML tags and to use a number of functions through a toolbar and a menu. In other words, the ediarum.BASE.edit interface increases the usability of the XML editor, speeds up the encoding process and can be adapted to each project's needs. However, this framework is only available in German: both the code and the interface language are accessible only to German-speaking users.
While the original goal of TELOTA was to “bridge the gap” between the markup and the editor (Dumont and Fechner, 2015), the interface language creates a barrier for encoders who do not work in German and impedes potential collaborations with other institutions. To break down this usability and accessibility barrier, in 2020 Proyecto Humboldt Digital (ProHD), a cooperation project between the BBAW and the Oficina del Historiador de la Ciudad de la Habana (Cuba), undertook an adaptation process involving the internationalization of the software (developing features and code that are independent of language or locale) and its localization for the Spanish locale (creating resource files containing translations). As a result of this process, the project has developed a localization of ediarum.BASE.edit called ediarum.PROHD.edit, which can be downloaded from GitHub.
This paper aims to present ediarum.PROHD.edit and to reflect on the most important challenges encountered during the software localization. After reviewing what “localization” means in Translation Studies (Pym, 2016; Jiménez Crespo, 2016), I will discuss the process of internationalization of the software (mostly variables written in ediarum's default functions), the localization itself (the translation of terms and descriptions displayed in the interface) and some testing undertaken with the Cuban team of Proyecto Humboldt Digital.
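The distinction between internationalization and localization drawn above can be illustrated with a minimal Python sketch. It does not reproduce ediarum's actual code or file formats: the message keys, the in-memory resource tables and the `translate` function are hypothetical, and the fallback to German is an assumption chosen only because German is the framework's original language.

```python
# Localization: one resource table per locale (hypothetical keys and strings).
RESOURCES = {
    "de": {
        "menu.insert_person": "Person einfügen",
        "menu.save": "Speichern",
    },
    "es": {
        "menu.insert_person": "Insertar persona",
        # "menu.save" is deliberately left untranslated to show the fallback.
    },
}

DEFAULT_LOCALE = "de"  # assumption: fall back to the original interface language


def translate(key, locale):
    """Internationalized lookup: the code refers to keys, never to literal labels."""
    for table in (RESOURCES.get(locale, {}), RESOURCES[DEFAULT_LOCALE]):
        if key in table:
            return table[key]
    # Unknown key: return the key itself so the gap is visible during testing.
    return key


label = translate("menu.insert_person", "es")
fallback = translate("menu.save", "es")
```

Once interface strings are looked up by key in this way, adding a further locale only requires a new resource table, not changes to the functions themselves.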
The computer has come to play a transformative role in the ways we model, store, process and study text. Nevertheless, we cannot yet claim to have realised the promises of the digital medium: the organisation and dissemination of scholarly knowledge through the exchange, reuse and enrichment of data sets. Despite the acclaimed interdisciplinary nature of digital humanities, current digital research takes place in a closed environment and rarely surpasses the traditional boundaries of a field. Furthermore, it is worthwhile to continue questioning the models we use and whether they are actually suitable for our scholarly needs. There’s a risk that the affordances and limitations of a prevailing model may blind us to aspects it doesn’t support.
In her talk, Elli Bleeker discusses different technologies for modelling data with respect to their expressive power and their potential to address the needs of the scholarly community. Within this framework, she introduces a new data model for text, Text-As-Graph (TAG), and its reference implementation Alexandria, a text repository system. The TAG model allows researchers to store, query, and analyse text that is encoded from different perspectives. Alexandria thus stimulates new ways of looking at textual objects, facilitates the exchange of information across disciplines, and secures textual knowledge for future endeavours. From a philosophical perspective, the TAG model and the workflow of Alexandria raise compelling questions about our notions of textuality, and prompt us to reconsider how we can best model the variety of textual dimensions.