Issue 9: Text Collections

Edited by Ulrike Henny-Krahmer and Frederike Neuber. November 2018. DOI: 10.18716/ride.a.9.

EDITORIAL: Digital Text Collections, the third

We are happy to present the third volume on the topic of Digital Text Collections (DTC), which is also the ninth volume in the entire RIDE series. With this issue, we publish the last reviews that we received in response to the first call for reviews on DTCs, together with several additional contributions. The current issue also represents a milestone for RIDE as a whole: with the five reviews of the current issue, a total of 50 reviews are now available in RIDE, including 20 on DTC and 30 on Digital Scholarly Editions.

As in the previous volumes, the rationale of the reviews is based on the guidelines for evaluating text collections. In addition, each article is accompanied by a factsheet...

InterCorp – Ein mehrsprachiges Parallelkorpus des Tschechischen Nationalkorpus (Český národní korpus)

InterCorp, Alexandr Rosen, Martin Vavřín, Adrian Zasina (ed.), 2008-2017. http://www.korpus.cz/ (Last Accessed: 09.03.2018). Reviewed by Agnes Kim (Institut für Slawistik, Universität Wien), agnes.kim (at) univie.ac.at. || Abstract This review describes and evaluates InterCorp, a multilingual parallel corpus with referential character, developed by the Institute of the Czech National Corpus and the Institute of Theoretical and Computer Linguistics at Charles University (Prague). In its current version 10, which was published in 2017, it comprises 2 108 703 589 tokens of language data in 40 different languages. It is developed according to the translation -principle wit...

Rezension von Europarl

Europarl, Philipp Koehn (ed.), 2001-2012. http://www.statmt.org/europarl/ (Last Accessed: 19.01.2018). Reviewed by Claes Neuefeind (Institute for Digital Humanities, University of Cologne), c.neuefeind (at) uni-koeln.de. || Abstract The Europarl corpus, short for “European Parliament Proceedings Parallel Corpus 1996-2011”, is provided by the School of Informatics, University of Edinburgh. Europarl was first published in 2001; the latest release (May 15th, 2012) includes the parliament proceedings from 1996-2011. There is no dedicated user interface, since Europarl is primarily meant to support research on machine translation, which has to rely on parallel texts. Beyond that, it ...

Review of “ShakespearePlaysPlus Text Corpus”

Shakespeare Corpus: ShakespearePlaysPlus, Mike Scott (ed.), 2006. http://lexically.net/wordsmith/support/shakespeare.html (Last Accessed: 03.07.2018). Reviewed by Katharina Mahler (CCeH, University of Cologne), english.textworks (at) gmail.com. || Abstract ShakespearePlaysPlus is a freely available digital text corpus of William Shakespeare’s plays. The 37 plays were compiled from the Oxford University Press 1916 Edition of “The Complete Works of William Shakespeare” and annotated by Mike Scott for his own research in 2006. The plays are organized in three categories according to their type, i.e., comedies, historical plays and tragedies. The speeches of all characters have been extra...

Review of Papyri.info

Papyri.info, Joshua Sosin (ed.), 2010. http://papyri.info (Last Accessed: 23.10.2018). Reviewed by Lucia Vannini (Institute of Classical Studies), lucia.vannini (at) postgrad.sas.ac.uk. || Abstract Papyri.info, made available by Duke University, is a text collection of over 50,000 documentary papyri, i.e., Greek and Latin documents, dating from the IV century BC to the VIII century AD, which constitute a fundamental body of evidence for everyday life in classical antiquity. The collection consists of transcriptions encoded in EpiDoc (a subset of TEI for the representation of ancient documents preserved in inscriptions and in papyri), metadata, links to related resources, a...

Rezension des „Corpus Oral de Referencia de la Lengua Española Contemporánea“

CORLEC, Universidad Autónoma de Madrid (ed.), 1992. http://www.lllf.uam.es/ESP/Corlec.html (Last Accessed: 30.04.2018). Reviewed by Katrin Betz (Universität Bamberg), katrin.betz (at) uni-bamberg.de. || Abstract In this paper we review the “Corpus Oral de Referencia de la Lengua Española Contemporánea”. This corpus is a carefully compiled text collection of orthographically transcribed recordings of oral conversations. As it was compiled as a reference corpus, it provides texts of different styles and registers and has a considerable size of about 1,100,000 words. CORLEC is therefore an important resource for researchers interested in the field of spoken language. The Corpus is free...