extJWNL MCR 3.0 2016 Spanish Wordnet Data

Dictionary data for Spanish from MCR 3.0 2016 Unix version

License: Princeton WordNet License
Categories: Data Net
GroupId: net.sf.extjwnl.mcr
ArtifactId: extjwnl-data-spa-mcr30
Last Version: 1.0.5
Type: jar

How to add to project

Maven:

<!-- https://jarcasting.com/artifacts/net.sf.extjwnl.mcr/extjwnl-data-spa-mcr30/ -->
<dependency>
    <groupId>net.sf.extjwnl.mcr</groupId>
    <artifactId>extjwnl-data-spa-mcr30</artifactId>
    <version>1.0.5</version>
</dependency>

Gradle (Groovy):

// https://jarcasting.com/artifacts/net.sf.extjwnl.mcr/extjwnl-data-spa-mcr30/
implementation 'net.sf.extjwnl.mcr:extjwnl-data-spa-mcr30:1.0.5'

Gradle (Kotlin):

// https://jarcasting.com/artifacts/net.sf.extjwnl.mcr/extjwnl-data-spa-mcr30/
implementation ("net.sf.extjwnl.mcr:extjwnl-data-spa-mcr30:1.0.5")

Buildr:

'net.sf.extjwnl.mcr:extjwnl-data-spa-mcr30:jar:1.0.5'

Ivy:

<dependency org="net.sf.extjwnl.mcr" name="extjwnl-data-spa-mcr30" rev="1.0.5">
  <artifact name="extjwnl-data-spa-mcr30" type="jar" />
</dependency>

Grape:

@Grapes(
@Grab(group='net.sf.extjwnl.mcr', module='extjwnl-data-spa-mcr30', version='1.0.5')
)

SBT:

libraryDependencies += "net.sf.extjwnl.mcr" % "extjwnl-data-spa-mcr30" % "1.0.5"

Leiningen:

[net.sf.extjwnl.mcr/extjwnl-data-spa-mcr30 "1.0.5"]

Dependencies

test (2)

Group / Artifact           Type   Version
junit : junit              jar    4.13.1
net.sf.extjwnl : extjwnl   jar    2.0.3

Project Modules

There are no modules declared in this project.

About

extjwnl-data-mcr30 prepackages jars with wordnet data from the Multilingual Central Repository 3.0 (2016 release; currently only the Spanish portion).

Each data jar includes a configuration file, so you can load the dictionary without writing any configuration yourself.

Getting started

In your pom.xml:

<dependency>
    <groupId>net.sf.extjwnl</groupId>
    <artifactId>extjwnl</artifactId>
    <version>2.0.3</version>
</dependency>
<dependency>
    <groupId>net.sf.extjwnl.mcr</groupId>
    <artifactId>extjwnl-data-spa-mcr30</artifactId>
    <version>1.0.5</version>
</dependency>

In your code:

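import net.sf.extjwnl.JWNLException;
import net.sf.extjwnl.dictionary.*;

// Uses the configuration file bundled in the data jar; may throw JWNLException.
Dictionary d = Dictionary.getDefaultResourceInstance();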

Mapping Between Dictionaries

extjwnl-data-mcr30 also contains an alignment module which supports loading multiple dictionaries and mapping word senses between them. To use it, you first need the following additional dependency in your pom.xml:

<dependency>
    <groupId>net.sf.extjwnl.mcr</groupId>
    <artifactId>extjwnl-data-alignment-mcr30</artifactId>
    <version>1.0.5</version>
</dependency>

Then you can load the MCR 3.0 Spanish wordnet together with two versions (3.0 and 3.1) of Princeton WordNet:

import net.sf.extjwnl.dictionary.*;
import net.sf.extjwnl.data.mcr30.alignment.*;

Dictionary spa = InterLingualIndex.getDictionary("mcr30", "spa");
Dictionary wn31 = InterLingualIndex.getDictionary("wn31", "eng");
Dictionary wn30 = InterLingualIndex.getDictionary("wn30", "eng");

After that, if you have a Spanish synset, you can find the corresponding English synset (if a mapping exists):

Synset englishSynset = InterLingualIndex.mapSynset(spanishSynset, wn31);

If you need to map many synsets, use the SynsetMapper interface instead for better performance:

SynsetMapper mapper = InterLingualIndex.loadMapper(spa, wn31);
Synset englishSynset1 = mapper.mapSynset(spanishSynset1);
Synset englishSynset2 = mapper.mapSynset(spanishSynset2);
...

For more information, see the javadoc.

Acknowledgements

The data for this package comes from the Multilingual Central Repository (MCR):

Aitor Gonzalez-Agirre, Egoitz Laparra and German Rigau (2012) Multilingual Central Repository version 3.0: upgrading a very large lexical knowledge base. In Proceedings of the 6th Global WordNet Conference (GWC 2012) Matsue, Japan.

@InProceedings{Gonzalez-Agirre:Laparra:Rigau:2012,
  author = "Aitor Gonzalez-Agirre and Egoitz Laparra and German Rigau",
  title = "Multilingual Central Repository version 3.0: upgrading a very large lexical knowledge base",
  booktitle = "Proceedings of the 6th Global WordNet Conference (GWC 2012)",
  year = 2012,
  address = "Matsue",
}

This package is designed for use with extjwnl. The resource bundling is based on the pattern set by extjwnl-data-wn31 for the English-language Princeton WordNet 3.1.

Princeton University "About WordNet." WordNet. Princeton University. 2010.

MCR data is converted into extjwnl format via a modified version of the wn-mcr-transform script. You can find the modified version here.

The MCR is aligned with Princeton WordNet 3.0, so to realign it to Princeton WordNet 3.1 we use the 3.0->3.1 mapping (mapping_wordnet.json) from:

@misc{ZendelWordNetConv19,
  author = {Zendel, Oliver},
  title = {WordNet v3.0 vs. v3.1 mapping},
  year = {2019},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/ozendelait/wordnet-to-json}},
  commit = {7521b70937355e826ea7e028a615108cdb18d0ee}
}
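
The realignment described above is, in essence, a table lookup from a WordNet 3.0 synset key to a WordNet 3.1 synset key, and some keys may have no counterpart. The following is a minimal sketch of that lookup, not the project's actual code: the Realigner class, the string key format, and the sample entry are all invented for illustration, and the real layout of mapping_wordnet.json is not reproduced here.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

public class Realigner {
    // Hypothetical in-memory copy of a 3.0 -> 3.1 mapping table.
    private final Map<String, String> wn30ToWn31;

    public Realigner(Map<String, String> wn30ToWn31) {
        this.wn30ToWn31 = new HashMap<>(wn30ToWn31);
    }

    // Returns the 3.1 key for a 3.0 key, or empty when the table has no entry.
    public Optional<String> toWn31(String wn30Key) {
        return Optional.ofNullable(wn30ToWn31.get(wn30Key));
    }

    public static void main(String[] args) {
        Map<String, String> table = new HashMap<>();
        table.put("wn30-00000001-n", "wn31-00000002-n"); // invented keys, not real offsets

        Realigner realigner = new Realigner(table);
        System.out.println(realigner.toWn31("wn30-00000001-n").orElse("no mapping"));
        System.out.println(realigner.toWn31("wn30-99999999-n").orElse("no mapping"));
    }
}
```

Returning an Optional makes the "no mapping exists" case explicit for callers, which mirrors the caveat that a Spanish synset only maps to an English one if a mapping exists.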

Stemming

Language-specific stemming rules are packaged in each data module; for example, here are the Spanish-specific stemming rules.

Exceptional Forms

For Spanish, exceptional forms (irregular verb conjugations, noun pluralizations, and adjective pluralizations) are enumerated using the morphala project. All lemmas from the MCR dictionary are run through morphala's conjugation/pluralization routines. For each resulting derived form, we attempt to reverse-derive the lemma as a base form via the standard DetachSuffixesOperation. When this fails, we treat the derived form as an exception and add it to supplemental_spa.txt.
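
The round-trip check described above can be sketched in plain Java. This is an illustration under stated assumptions, not the project's build code: the two suffix rules, the detach helper, and the sample inflections are hypothetical stand-ins for extjwnl's DetachSuffixesOperation and for morphala's output.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

public class ExceptionFinder {
    // Hypothetical regular suffix rules: {inflected ending, base ending}.
    static final String[][] RULES = { {"es", ""}, {"s", ""} };

    // Candidate base forms obtained by replacing each matching ending.
    static Set<String> detach(String form) {
        Set<String> candidates = new LinkedHashSet<>();
        for (String[] rule : RULES) {
            if (form.endsWith(rule[0]) && form.length() > rule[0].length()) {
                String stem = form.substring(0, form.length() - rule[0].length());
                candidates.add(stem + rule[1]);
            }
        }
        return candidates;
    }

    // Keeps the derived forms whose lemma the regular rules cannot recover.
    static Map<String, String> findExceptions(Map<String, String> derivedToLemma) {
        Map<String, String> exceptions = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : derivedToLemma.entrySet()) {
            if (!detach(e.getKey()).contains(e.getValue())) {
                exceptions.put(e.getKey(), e.getValue());
            }
        }
        return exceptions;
    }

    public static void main(String[] args) {
        // Sample derived form -> lemma pairs, standing in for generated inflections.
        Map<String, String> derived = new LinkedHashMap<>();
        derived.put("perros", "perro"); // recovered by the "s" rule
        derived.put("flores", "flor");  // recovered by the "es" rule
        derived.put("luces", "luz");    // not recoverable with these toy rules
        System.out.println(findExceptions(derived)); // prints {luces=luz}
    }
}
```

With a richer rule set, fewer forms would fall through to the exception list; the two toy rules here are only meant to show the check itself.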

Future Work

If you are interested in adding support for languages beyond Spanish (such as Portuguese), please open an issue on this project. The bare minimum for a language would be to bring in the language-specific dataset from MCR and add stemming rules for regular inflections; a bonus would be to enhance morphala with the necessary support for generating exceptional forms for that language.
