NorthEuraLex - Lexicostatistical Database of Northern Eurasia

Welcome to NorthEuraLex 0.9

NorthEuraLex is a large-scale lexicostatistical database that is being compiled within the EVOLAEMP project. It is unique in providing lexical data from more than twenty language families in a unified IPA encoding. This encoding is generated automatically from the orthographies or standard transcriptions, and it will continue to be improved in future releases. The database is intended to serve as a basis for new benchmarks in computational historical linguistics, with the aim of improving computational models of language relationship and language contact.

The current release, version 0.9, covers 1,016 concepts across 107 languages of Northern Eurasia, with a focus on Uralic and Indo-European. It also includes all of the language families conveniently summarized as Altaic/Transeurasian and Paleosiberian, a selection of Caucasian languages, some major contact languages from adjacent families, and the best-known isolates of Northern Eurasia.

IMPORTANT: The current versions of the wordlists were compiled by non-experts based on available resources, and are certain to contain many errors and inaccuracies. They are therefore not suitable as a primary reference or data source for any of the languages concerned, and should only be used in computational frameworks that can tolerate some noise. The next major version (planned for autumn 2020) will contain at least 80 additional languages, a first batch of etymological annotations for the larger families, and many updates and corrections based on feedback from experts and native speakers.

How to cite the dataset

Dellert, J., Daneyko, T., Münch, A. et al. Lang Resources & Evaluation (2019). https://doi.org/10.1007/s10579-019-09480-6 (version 0.9).

The data for this initial release were compiled by: Johannes Dellert, Thora Daneyko, Alla Münch, Alina Ladygina, Armin Buch, Natalie Clarius, Ilja Grigorjew, Mohamed Balabel, Isabella Boga, Zalina Baysarova, Roland Mühlenbernd, Gerhard Jäger, and Johannes Wahle.

Language-specific IPA converters were developed by: Thora Daneyko, Johannes Dellert, and Armin Buch.

The source code of the web application is based on the clld framework, developed by the CLLD project.