The Universal Wordnet (UWN) is a large knowledge graph that aims to describe words, entities, and concepts in over 200 languages within a single network structure.
For over 1,500,000 words in over 200 languages, UWN provides a corresponding list of meanings and shows how such meanings are semantically related. Additionally, the MENTA extension adds a large-scale hierarchical taxonomy of named entities and their classes, drawing on over 200 different language editions of Wikipedia. This leads to a knowledge base with over 15 million words and names in different languages.
For example, the English word "board" can refer to a wooden panel, a committee, or a blackboard, and as a verb it can mean getting on a vehicle (e.g. "to board a plane"), among other senses. The Universal Wordnet provides a list of these meanings, and for each meaning one can obtain the corresponding words in different languages: the committee sense of "board" corresponds to "комитет" in Russian and "委員会" in Japanese. Meanings are also connected to related meanings. The committee meaning, for instance, is linked to its generalizations (administrative unit, social group, etc.), and for each of these one can again obtain the corresponding words in different languages.
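To make this structure concrete, here is a toy sketch in Python of the "board" example. The meaning identifiers (such as `m:committee`), the language codes, the dictionary layout, and the function names are all invented for illustration and are not UWN's actual identifiers or API. Only the words and relations themselves come from the paragraph above.

```python
# Toy data: every identifier below is hypothetical, not taken from UWN itself.
MEANINGS_OF = {
    ("board", "eng"): [
        "m:wooden_panel",
        "m:committee",
        "m:blackboard",
        "m:get_on_vehicle",
    ],
}

# Words that express a given meaning, keyed by (assumed) language code.
WORDS_FOR = {
    "m:committee": {"eng": ["board"], "rus": ["комитет"], "jpn": ["委員会"]},
}

# More general meanings that a given meaning is linked to.
GENERALIZATIONS = {
    "m:committee": ["m:administrative_unit", "m:social_group"],
}


def meanings(word, lang):
    """Return the meaning identifiers listed for a word in a language."""
    return MEANINGS_OF.get((word, lang), [])


def translations(meaning, lang):
    """Return the words that express a meaning in the given language."""
    return WORDS_FOR.get(meaning, {}).get(lang, [])


def generalizations(meaning):
    """Return the more general meanings linked to a meaning."""
    return GENERALIZATIONS.get(meaning, [])
```

With this toy data, `meanings("board", "eng")` lists the four senses, and `translations("m:committee", "rus")` returns the Russian word for the committee sense.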
The English language represents a constantly decreasing fraction of the Web. China and the EU each have greatly surpassed the U.S. in the number of Internet users, and other regions are expected to follow. Multilingual knowledge graphs address this development by covering many different languages and making the semantic connections between words and names in different languages explicit.
This does not mean that language-specific differences are ignored. Rather, we can explicitly model the differences between languages. For example, when a language has a word for a concept that is not lexicalized in other languages, we can give it its own entry and describe how it relates to concepts that are available in other languages.
While multilingual knowledge graphs have become more popular in recent years, the Universal Wordnet project was the first one to extend WordNet to a large multilingual knowledge graph covering over 100 languages.
We provide a small library for the JVM (Java, Scala, etc.) that can be used with one or more large plugins, which provide the complete data for offline use (i.e., no need to connect to our servers).
Take a look at the example code.
Alternatively, you can work with a raw dump of the UWN Core. We provide a gzip-compressed TSV file. For best performance, decompress it on the fly while reading rather than unpacking it first. Each line contains four tab-separated fields: subject, predicate, object, and weight.
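As a rough sketch of reading the dump on the fly, the Python snippet below streams a gzip-compressed file line by line without unpacking it to disk. Only the four tab-separated columns (subject, predicate, object, weight) come from the description above. The names `Statement` and `read_statements` are made up for this example, and the snippet assumes that the file is UTF-8 encoded and that the weight column parses as a number.

```python
import gzip
from typing import Iterator, NamedTuple


class Statement(NamedTuple):
    """One line of the dump: subject, predicate, object, weight."""
    subject: str
    predicate: str
    obj: str
    weight: float


def read_statements(path: str) -> Iterator[Statement]:
    """Yield statements from a gzip-compressed TSV file, decompressing on the fly."""
    # Assumption: the dump is UTF-8 encoded text.
    with gzip.open(path, mode="rt", encoding="utf-8") as handle:
        for line in handle:
            line = line.rstrip("\r\n")
            if not line:
                continue  # skip empty lines
            fields = line.split("\t")
            if len(fields) != 4:
                continue  # skip lines that do not have exactly four columns
            subject, predicate, obj, weight = fields
            # Assumption: the weight column is numeric.
            yield Statement(subject, predicate, obj, float(weight))
```

Because `read_statements` is a generator, iterating over it handles one statement at a time, so memory use does not grow with the size of the dump. The path passed in is whatever local file name the dump was saved under.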
The Universal Wordnet builds on the seminal work on WordNet by Princeton cognitive scientists George Miller and Christiane Fellbaum, extending it to cover over 100 different languages in a single connected network.
Please refer to the following academic papers for more details.
Towards a Universal Wordnet by Learning from Combined Evidence
Gerard de Melo, Gerhard Weikum (2009)
In: Proc. 18th ACM Conference on Information and Knowledge Management (CIKM 2009). ACM.
Acceptance rate: 14.5%
MENTA: Inducing Multilingual Taxonomies from Wikipedia
Gerard de Melo, Gerhard Weikum (2010)
In: Proc. 19th ACM Conference on Information and Knowledge Management (CIKM 2010). ACM.
🏆 Best Interdisciplinary Paper Award (out of 945 full paper submissions)
Acceptance rate: 13.3%
Constructing and Utilizing Wordnets using Statistical Methods
Gerard de Melo, Gerhard Weikum (2012)
Language Resources and Evaluation 46:2, 2012, pp. 287–311. Springer Verlag.