This paper is an updated presentation of the Ramses project currently being developed at the University of Liège. The first section sets out the main objectives and gives a technical description of the general architecture of the Ramses software. The second part describes the encoding procedures and reviews the current state of the annotation. In the third section, some changes brought about by the use of large-scale corpora are discussed from an epistemological viewpoint. The paper ends by presenting some new avenues for research that will ensue from the use of a complex multilevel corpus.
quoteSalute strives to make data of digital scholarly editions of letters (DSELs) accessible in a playful fashion by enabling users to integrate salutations from DSELs in their own email correspondence. The foundation of quoteSalute is a curated TEI-XML text corpus which has been created by extracting <salute>-tags from TEI-XML-encoded DSELs. To provide users with fitting salutations, we annotated the data with regard to language, level of politeness and intended gender of sender and receiver.
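The extraction step described above can be sketched as follows. This is a minimal illustration using only the Python standard library, not the quoteSalute pipeline itself; the sample letter, the function name `extract_salutes` and the whitespace normalisation are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# TEI documents use this namespace for all standard elements.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Hypothetical sample letter, invented for illustration only.
SAMPLE_LETTER = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text><body>
    <div type="letter">
      <opener><salute>Hochgeehrter Herr Professor!</salute></opener>
      <p>...</p>
      <closer><salute>Mit   vorzüglicher
        Hochachtung</salute></closer>
    </div>
  </body></text>
</TEI>"""


def extract_salutes(tei_xml: str) -> list[str]:
    """Return the text of every <salute> element, whitespace-normalised."""
    root = ET.fromstring(tei_xml)
    salutes = []
    for element in root.iterfind(".//tei:salute", TEI_NS):
        # itertext() also collects text nested inside child elements.
        text = " ".join("".join(element.itertext()).split())
        if text:
            salutes.append(text)
    return salutes


if __name__ == "__main__":
    for salute in extract_salutes(SAMPLE_LETTER):
        print(salute)
```

Annotations such as language, politeness level or gender would then be attached to each extracted string in a separate curation step.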
When managing large quantities of data, it is a common solution to utilize a centralized data management software to forge a connection between metadata and the data objects themselves. In the case of text-based objects without any attached metadata, it is easy for humans to contextualize these objects by recognizing patterns such as filenames, titles, authors etc. This task becomes a challenge when dealing with non-text-based objects like images in the cultural heritage domain. Without metadata or expert knowledge, it becomes difficult to estimate the creation date of a painting or tell the name of its painter. Thus, the ability to contextualize data depends on whether there is a working connection between the metadata store and the data object itself. This connection fails as soon as the file is moved on the file system without these changes also being applied in the corresponding database, or when the file is shared without a reference to its original location. This paper presents an approach to overcome that type of co-dependency by utilizing XMP to embed cultural heritage metadata directly into image files to ensure their location-independent long-term preservation. The “Corpus Vitrearum Medii Aevi” Germany (CVMA) project serves as an example use-case.
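To make the idea of embedded metadata more concrete, the sketch below builds a small XMP packet carrying two Dublin Core fields. It only produces the packet as a string; writing it into an image file requires a dedicated tool or library and is not shown. The function name `build_xmp_packet`, the choice of fields and the example values are assumptions for illustration and do not reflect the CVMA project's actual metadata schema.

```python
import xml.etree.ElementTree as ET

# Namespaces used by XMP, RDF and Dublin Core.
NS = {
    "x": "adobe:ns:meta/",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dc": "http://purl.org/dc/elements/1.1/",
}
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

for prefix, uri in NS.items():
    ET.register_namespace(prefix, uri)


def q(prefix: str, tag: str) -> str:
    """Build a namespace-qualified name in ElementTree's {uri}tag form."""
    return f"{{{NS[prefix]}}}{tag}"


def build_xmp_packet(title: str, creator: str) -> str:
    """Return an XMP packet with dc:title and dc:creator (illustrative only)."""
    xmpmeta = ET.Element(q("x", "xmpmeta"))
    rdf = ET.SubElement(xmpmeta, q("rdf", "RDF"))
    desc = ET.SubElement(rdf, q("rdf", "Description"), {q("rdf", "about"): ""})

    # dc:title is a language alternative: rdf:Alt with an x-default entry.
    alt = ET.SubElement(ET.SubElement(desc, q("dc", "title")), q("rdf", "Alt"))
    li = ET.SubElement(alt, q("rdf", "li"), {XML_LANG: "x-default"})
    li.text = title

    # dc:creator is an ordered list: rdf:Seq.
    seq = ET.SubElement(ET.SubElement(desc, q("dc", "creator")), q("rdf", "Seq"))
    ET.SubElement(seq, q("rdf", "li")).text = creator

    body = ET.tostring(xmpmeta, encoding="unicode")
    # The xpacket processing instructions mark the packet boundaries.
    return (
        '<?xpacket begin="\ufeff" id="W5M0MpCehiHzreSzNTczkc9d"?>\n'
        + body
        + '\n<?xpacket end="w"?>'
    )


if __name__ == "__main__":
    # Hypothetical values, not taken from any real record.
    print(build_xmp_packet("Example stained-glass panel", "Example photographer"))
```

Because the packet travels inside the file, the descriptive fields remain available even if the image is moved or shared without its database record.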
Virtually all conventional text-based natural language processing techniques - from traditional information retrieval systems to full-fledged parsers - require reference to a fixed lexicon accessed by surface form, typically trained from or constructed for synchronic input text adhering strictly to contemporary orthographic conventions. Unconventional input such as historical text which violates these conventions therefore presents difficulties for any such system due to lexical variants present in the input but missing from the application lexicon. To facilitate the extension of synchronically-oriented natural language processing techniques to historical text while minimizing the need for specialized lexical resources, one may first attempt an automatic canonicalization of the input text. This paper provides an informal overview of the various canonicalization techniques currently employed by the Deutsches Textarchiv project at the Berlin-Brandenburg Academy of Sciences and Humanities to prepare a corpus of historical German text for part-of-speech tagging, lemmatization, and integration into a robust online information retrieval system.
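As a toy illustration of what canonicalization means here, the sketch below maps historical spellings onto modern forms with a handful of rewrite rules, and accepts a rewritten form only if a modern lexicon contains it. The rules, the tiny lexicon and the function name `canonicalize` are assumptions invented for this example; they are far simpler than the techniques the paper surveys and do not represent the Deutsches Textarchiv implementation.

```python
import re

# Hypothetical modern lexicon; a real system would use a full word list.
LEXICON = {"teil", "tun", "sein", "und", "jahr"}

# Illustrative rewrite rules from historical to modern German spelling.
REWRITE_RULES = [
    (re.compile(r"th"), "t"),        # Theil -> Teil, thun -> tun
    (re.compile(r"ey"), "ei"),       # seyn -> sein
    (re.compile(r"^v(?=n)"), "u"),   # vnd -> und
]


def canonicalize(token: str, lexicon: set[str] = LEXICON) -> str:
    """Return a modern lexicon form for token, or the lowercased token itself."""
    form = token.lower()
    if form in lexicon:
        return form  # already a known modern form
    candidate = form
    for pattern, replacement in REWRITE_RULES:
        candidate = pattern.sub(replacement, candidate)
    # Accept the rewritten form only if the lexicon confirms it.
    return candidate if candidate in lexicon else form


if __name__ == "__main__":
    for word in ["Theil", "thun", "seyn", "vnd", "Jahr"]:
        print(word, "->", canonicalize(word))
```

Checking candidates against the lexicon keeps the rules from rewriting words that merely happen to contain a matching letter sequence.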
The article summarizes the contents and the structural premises of the “Thesaurus Indogermanischer Text- und Sprachmaterialien” (TITUS), focussing on search functions and facilities and questions of the encoding of ancient languages written in various scripts. Examples are taken from Tocharian, Greek, Vedic Sanskrit, and other ancient Indo-European languages covered by TITUS.
Physical principles underlying biological pattern formation are discussed. In particular, the combination of local self-enhancement and long-range (“lateral”) inhibition (Gierer and Meinhardt, 1972) accounts for de-novo pattern formation, and for striking features of developmental regulation such as induction, spacing and proportion regulation of centers of activation in tissues and cells. Part I explains physical principles of spatial organisation in biological development. Part II demonstrates mathematically that, and how, short-range activation and long-range inhibition are conditions for the generation of spatial concentration patterns. The conditions can be expressed in terms of ranges, rates and orders of reactions. These conditions, in turn, can also be derived by analysis of dynamic instabilities by means of Fourier waves, which reveals the neither obvious nor trivial relation between the latter approach and the theory based primarily on autocatalysis and lateral inhibition.
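For orientation, one commonly cited form of the activator-inhibitor equations associated with Gierer and Meinhardt (1972) is given below. The abstract itself states no equations, so the notation is an assumption: a is the activator concentration, h the inhibitor concentration, D_a and D_h their diffusion constants, μ_a and μ_h their decay rates, ρ a production coefficient, and ρ_a, ρ_h small basal production terms.

```latex
\begin{align}
  \frac{\partial a}{\partial t} &= \rho \frac{a^{2}}{h} - \mu_a a
      + D_a \nabla^{2} a + \rho_a , \\
  \frac{\partial h}{\partial t} &= \rho a^{2} - \mu_h h
      + D_h \nabla^{2} h + \rho_h .
\end{align}
```

In this form the term a²/h expresses self-enhancement damped by the inhibitor. Stable patterns are typically expected when the inhibitor spreads much further than the activator (D_h much larger than D_a) and equilibrates faster (μ_h larger than μ_a), which corresponds to the short-range activation and long-range inhibition named in the abstract.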
This is the invited evening lecture of the biannual workshop on hydroid development of 1999. Its topic is the role of hydra as a rather puristic model for the de-novo generation of spatial patterns in development, and our work in this field. Emphasis is placed not only on experimental studies, but also on theoretical analysis, because the understanding of spatial order requires a systems approach involving the combination of knowledge on molecules, cells and tissues with mathematical analysis, laws and facts.
For a fistful of blogs: Discovery and comparative benchmarking of republishable German content
(2014)
We introduce two corpora gathered on the web and related to computer-mediated communication: blog posts and blog comments. In order to build such corpora, we addressed the following issues: website discovery and crawling, content extraction constraints, and text quality assessment. The blogs were manually classified as to their license and content type. Our results show that it is possible to find blogs in German under Creative Commons license, and that it is possible to perform text extraction and linguistic annotation efficiently enough to allow for a comparison with more traditional text types such as newspaper corpora and subtitles. The comparison gives insights into distributional properties of the processed web texts on token and type level. For example, quantitative analysis reveals that blog posts are close to written language, while comments are slightly closer to spoken language.
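A comparison on token and type level can start from very simple counts. The sketch below computes the number of tokens, the number of distinct types and their ratio for a text; the regular-expression tokenizer and the function name `type_token_stats` are simplifying assumptions and not the measures or tools used in the paper.

```python
import re
from collections import Counter


def type_token_stats(text: str) -> dict:
    """Count tokens and types in text and return their ratio."""
    # Naive tokenizer: lowercase runs of word characters.
    tokens = re.findall(r"\w+", text.lower())
    counts = Counter(tokens)
    n_tokens = len(tokens)
    n_types = len(counts)
    ratio = n_types / n_tokens if n_tokens else 0.0
    return {"tokens": n_tokens, "types": n_types, "ttr": ratio}


if __name__ == "__main__":
    # Hypothetical snippets standing in for a blog post and a comment.
    print(type_token_stats("Das ist ein Blog. Das ist ein Kommentar!"))
```

Since the type-token ratio depends strongly on text length, corpora of different sizes are usually compared on equally sized samples.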
In 20 articles, experts from research, politics and research management discuss current challenges and future advancements of European research infrastructures for the social sciences and humanities (SSH), particularly in view of the funding scheme Horizon 2020 and the ESFRI Roadmap update. Starting with an overview of SSH infrastructures, the volume elaborates on four specific areas that increasingly demand a pan-European approach. Drawing on the experience of the SSH infrastructure projects, it then (re-)defines the requirements and potential for next-generation infrastructure projects. The authors highlight the developments and problems they anticipate, focussing in particular on advancing digitalisation in the SSH. The book draws together the insights gained at a conference of the same name, “Facing the Future”, held in Berlin in November 2013. The conference was attended by 70 experts from 19 European countries who met to discuss the new challenges posed by the increasing necessity of integrating digital research tools into everyday working life. It was organised by the European Strategy Forum on Research Infrastructures (ESFRI), the federation of All European Academies (ALLEA), the Union of the German Academies of Sciences and Humanities, and the German Data Forum. It took place as part of a project financed by the German Federal Ministry of Education and Research (BMBF) entitled Survey and Analysis of Basic Research in the Social Sciences and Humanities in Europe (SASSH).
The African European Mediterranean Academies for Science Education (AEMASE) initiative is committed to promoting science outreach to society and to improving the quality and accessibility of science education in schools throughout the eponymous North-South region. To achieve these aims, one of AEMASE’s key activities is implementing inquiry-based science education (IBSE) in more schools and supporting the continued professional development of science educators in IBSE methodology and practice. In the long term, the AEMASE partner institutions, which come from all three geographical areas, seek to contribute to the steady development of quality science and innovation systems by focussing on stimulating and supporting the future generations of researchers and innovators. In this context, key AEMASE partner institutions held an international conference on science education in Rome in May 2014, hosted by the venerable Accademia Nazionale dei Lincei. Participants from six continents shared their professional experiences with IBSE and discussed best practices, challenges and future collaboration opportunities. The conference brought together representatives from three crucial areas of expertise: science, education, and policy. The outcomes of this conference are condensed in the report, which serves as a testament to the relevance and importance of quality science education for modern societies.