Francisco Viveros-Jiménez

   
 

Overview

CICWN is a Java WordNet API that allows Java applications to retrieve data from WordNet. CICWN is compatible with WordNet 3.0. Its main features are:

  • Implementation of the WordNet wn command.
  • Glosses parsed using a WordNet lemmatizer and the Stanford POS tagger.
  • WordNet lemmatizer.
  • Loading of SemCor samples.
  • IDF calculation.

The API is licensed under the GNU General Public License (v2 or later). The source code is included. SemCor and WordNet are bundled with the connector.

Please cite the following paper in your work:

  • Viveros-Jiménez, F., Gelbukh, A., Sidorov, G.: Improving Simplified Lesk Algorithm by using simple window selection practices. Submitted.

Download

The current version, 1.0, can be downloaded here.

Using CICWN in your application

The main class in CICWN is the WordNet class. A minimal usage example:

WordNet.setPath("CICWN/"); // Set the path to the CICWN directory.
WordNet.loadDataBase("WNGlosses"); // Load the samples that will form the bag of words.
// Any function of the WordNet class can be used from here on.

In CICWN, glosses are treated as if they were samples. WordNet.loadDataBase("WNGlosses"); loads samples from previously parsed resources. CICWN is bundled with three resources:

  • WNGlosses: WordNet glosses.
  • WNSamples: WordNet samples.
  • SemCor: SemCor corpus.

You can use more than one resource, e.g. WordNet.loadDataBase("WNSamples;SemCor");.
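Since the loaded samples form the bag of words, statistics such as IDF depend on which resources are loaded. The following is a minimal, self-contained sketch of a textbook IDF computation, log(N / df), over a handful of made-up samples. It does not use CICWN and is not taken from its source: the class IdfSketch, its methods, the toy samples, and the choice of returning 0.0 for unseen lemmas are all assumptions made for illustration, and the exact formula used by WordNet.getIDF may differ.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Illustrative only: a textbook IDF over toy samples, not CICWN's implementation.
class IdfSketch {

    // Hypothetical samples; each inner list is one sample, already lemmatized as "lemma_P".
    static List<List<String>> demoSamples() {
        return Arrays.asList(
            Arrays.asList("dog_N", "bark_V"),
            Arrays.asList("dog_N", "run_V"),
            Arrays.asList("cat_N", "run_V"),
            Arrays.asList("dog_N", "dog_N", "sleep_V"));
    }

    // Counts, for every lemma, the number of samples that contain it at least once.
    static Map<String, Integer> documentFrequencies(List<List<String>> samples) {
        Map<String, Integer> df = new HashMap<>();
        for (List<String> sample : samples) {
            Set<String> unique = new HashSet<>(sample); // a lemma counts once per sample
            for (String lemma : unique) {
                df.merge(lemma, 1, Integer::sum);
            }
        }
        return df;
    }

    // Textbook IDF: log(N / df). Unseen lemmas get 0.0 here (an assumption of this sketch).
    static double idf(String lemma, Map<String, Integer> df, int totalSamples) {
        Integer count = df.get(lemma);
        if (count == null || count == 0) {
            return 0.0;
        }
        return Math.log((double) totalSamples / count);
    }

    public static void main(String[] args) {
        List<List<String>> samples = demoSamples();
        Map<String, Integer> df = documentFrequencies(samples);
        System.out.println("idf(dog_N) = " + idf("dog_N", df, samples.size()));
        System.out.println("idf(cat_N) = " + idf("cat_N", df, samples.size()));
    }
}
```

In this toy data "dog_N" occurs in three of the four samples and "cat_N" in only one, so "cat_N" receives the higher weight. Loading a different set of resources changes both N and the per-lemma counts, and therefore the resulting values.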

You can add your own samples in SemCor format by using:

WordNet.loadWordNet();
WordNet.parseSamplesFromSemCor("PathOfYourSemCorFiles");

To load your own samples, use the same path ("PathOfYourSemCorFiles") as the resource name.

The most useful functions of CICWN are:

  • WordNet.getLemma(String lemma): This method is similar to the wn command of WordNet. getLemma takes a lemma in the format "lemma_P". First, base forms are retrieved with Morphy. Then, senses are retrieved for the corresponding base forms.
  • WordNet.lemmatize(String line, MaxentTagger tagger): Lemmatizer that uses Morphy as its morphological processor together with the Stanford Log-linear Part-Of-Speech Tagger.
  • WordNet.getIDF(String lemma): Retrieves the IDF for a lemma.
  • WordNet.Morphy(String morph, String postag): Implementation of the WordNet morphological processor.
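The two-step lookup described for getLemma (find base forms with Morphy, then look them up) can be sketched in a few lines. The code below is a simplified, self-contained illustration and not CICWN's implementation: the class MorphySketch, the method baseForms, and the seven-entry toy lexicon are hypothetical, and the POS letters "N" and "V" are an assumption about what "P" stands for in "lemma_P". The suffix rules are the noun and verb detachment rules documented for WordNet's Morphy; the exception lists that Morphy consults first are omitted here.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.Set;

// Illustrative only: Morphy-style suffix detachment against a toy lexicon.
class MorphySketch {

    // {suffix, replacement} pairs for nouns and verbs.
    static final String[][] NOUN_RULES = {
        {"s", ""}, {"ses", "s"}, {"xes", "x"}, {"zes", "z"},
        {"ches", "ch"}, {"shes", "sh"}, {"men", "man"}, {"ies", "y"}};
    static final String[][] VERB_RULES = {
        {"s", ""}, {"ies", "y"}, {"es", "e"}, {"es", ""},
        {"ed", "e"}, {"ed", ""}, {"ing", "e"}, {"ing", ""}};

    // Toy stand-in for the WordNet index; keys use the "lemma_P" format.
    static final Set<String> LEXICON = new HashSet<>(Arrays.asList(
        "dog_N", "church_N", "man_N", "berry_N", "run_V", "make_V", "study_V"));

    // Returns the base forms of a "lemma_P" key that exist in the toy lexicon.
    static Set<String> baseForms(String key) {
        int sep = key.lastIndexOf('_');
        if (sep <= 0 || sep == key.length() - 1) {
            throw new IllegalArgumentException("expected lemma_P, got: " + key);
        }
        String word = key.substring(0, sep).toLowerCase();
        String pos = key.substring(sep + 1);

        String[][] rules;
        if (pos.equals("N")) {
            rules = NOUN_RULES;
        } else if (pos.equals("V")) {
            rules = VERB_RULES;
        } else {
            rules = new String[0][]; // other parts of speech are not covered by this sketch
        }

        Set<String> found = new LinkedHashSet<>();
        if (LEXICON.contains(word + "_" + pos)) {
            found.add(word); // the word is already a base form
        }
        for (String[] rule : rules) {
            if (word.endsWith(rule[0])) {
                // Detach the suffix, append the replacement, keep the candidate only if it is known.
                String candidate = word.substring(0, word.length() - rule[0].length()) + rule[1];
                if (LEXICON.contains(candidate + "_" + pos)) {
                    found.add(candidate);
                }
            }
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(baseForms("churches_N"));
        System.out.println(baseForms("making_V"));
        System.out.println(baseForms("studies_V"));
    }
}
```

Every rule is tried and each candidate is checked against the lexicon, so an inflected form can yield more than one base form. In a full implementation the senses of each surviving base form would then be retrieved, which corresponds to the second step described for getLemma.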