java.lang.Object
  cic.wordnet.WordNet
public final class WordNet
A simple connector to WordNet 3.0. Before using it, you must set the path of CICWN with WordNet.setPath(String). After that, WordNet.loadDataBase(String) is the first method that should be executed. This connector is somewhat slow to load because it builds the counts used for IDF calculation.
Field Summary | |
---|---|
(package private) static java.util.ArrayList<KeyString> | exceptions - Contains the memory map for irregular morphs. |
(package private) static int | glossCount - Number of loaded synsets. |
(package private) static int | maxCollocationSize - Maximum word size from collocations stored in WordNet. |
(package private) static java.lang.String | path - The path of CICWN. |
(package private) static java.util.ArrayList<java.lang.String> | prepositions - Contains the list of prepositions stored in the preposition file. |
(package private) static java.util.ArrayList<KeyArray> | synMaps - Contains the mapping between a lemma and its possible synsets. |
(package private) static java.util.ArrayList<ParsedSynset> | synsets - Contains WordNet synsets with lemmatized glosses. |
(package private) static java.util.ArrayList<KeyString> | wordCounts - Contains the frequency of each lemma in the loaded samples. |
Constructor Summary | |
---|---|
WordNet() | |
Method Summary | |
---|---|
static java.util.ArrayList<java.io.File> | getAllFiles(java.io.File source) - Simple utility method for getting all the files nested inside a folder and its subfolders. |
static java.util.ArrayList<KeyString> | getExceptions() - Returns the list of irregular morphs. |
static int | getGlossCount() - Returns the number of synsets in WordNet. |
static java.util.ArrayList<ParsedSynset> | getGlosses() - Returns the list of glosses. |
static double | getIDF(java.lang.String lemma) - Retrieves the IDF for a lemma. |
static java.util.ArrayList<ParsedSynset> | getLemma(java.lang.String lemma) - Similar to the wn command of WordNet. Takes a lemma in the format "lemma_P". |
static int | getMaxCollocationSize() - Returns the maximum word size from collocations stored in WordNet. |
static java.lang.String | getPOS(int pos) - Returns the POS tag corresponding to the given integer code. |
static int | getPOS(java.lang.String pos) - Returns the integer code corresponding to the given POS tag. |
static java.util.ArrayList<java.lang.String> | getPrepositions() - Returns a list with the prepositions. |
static ParsedSynset | getSynset(java.lang.String sid) - Retrieves a synset by its synsetId using binary search over the glosses mapping. |
static java.util.ArrayList<KeyArray> | getSynsets() - Returns the lemma/synsets memory mapping. |
static boolean | hasPrepositions(java.lang.String morph) - Detects whether a collocation has a preposition in it. |
static java.util.ArrayList<java.util.ArrayList<java.lang.String>> | lemmatize(java.lang.String line, edu.stanford.nlp.tagger.maxent.MaxentTagger tagger) - Lemmatizer that uses Morphy as morphological processor and the Stanford Log-linear Part-Of-Speech Tagger. |
private static void | loadCountsFromFile(java.io.FileReader input) - Loads a count file. |
static void | loadDataBase(java.lang.String sampleSources) - Reads files in the Resources/wordnet folder and creates the memory mapping for all the terms in WordNet. |
private static void | loadSamplesFromSource(java.io.FileReader input) - Loads the samples from a parsed source. |
static void | loadWordNet() - Loads synset information, mappings and relations. It exists as a separate method to allow parsing new SemCor files. |
static void | main(java.lang.String[] args) |
static java.util.ArrayList<java.lang.String> | Morphy(java.lang.String morph, int pos) - Implementation of the WordNet morphological processor. |
static java.util.ArrayList<java.lang.String> | Morphy(java.lang.String morph, java.lang.String postag) - Implementation of the WordNet morphological processor. |
private static ParsedSynset | parseGloss(java.lang.String line, int pos) - Extracts a synset from a line of WordNet data. |
static void | parseSamplesFromSemCor(java.lang.String source) - Parses the samples and counts for a source in valid SemCor format. |
static void | parseSamplesFromWordNet() - Utility method for parsing WordNet glosses and samples. |
private static java.util.ArrayList<java.lang.String> | parseSenseMap(java.lang.String[] tokens, java.lang.String pos) - Extracts the possible synsets of a lemma. |
static void | setPath(java.lang.String path) |
static java.util.ArrayList<java.util.ArrayList<java.lang.String>> | softLemmatize(java.lang.String line, edu.stanford.nlp.tagger.maxent.MaxentTagger tagger) - Extracts open-class words using the Stanford Log-linear Part-Of-Speech Tagger. |
static java.util.ArrayList<java.lang.String> | Transform(java.lang.String morph, int pos) - Implementation of WordNet's Morphy rules of detachment. |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Field Detail |
---|
static java.lang.String path
static java.util.ArrayList<KeyString> wordCounts
static java.util.ArrayList<KeyArray> synMaps
static java.util.ArrayList<ParsedSynset> synsets
static java.util.ArrayList<KeyString> exceptions
static java.util.ArrayList<java.lang.String> prepositions
static int glossCount
static int maxCollocationSize
Constructor Detail |
---|
public WordNet()
Method Detail |
---|
public static java.util.ArrayList<KeyArray> getSynsets()
public static java.util.ArrayList<ParsedSynset> getGlosses()
public static java.util.ArrayList<KeyString> getExceptions()
public static java.util.ArrayList<java.lang.String> getPrepositions()
public static int getGlossCount()
public static int getMaxCollocationSize()
public static ParsedSynset getSynset(java.lang.String sid)
sid
- The synsetId in format "Number_P".
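The binary search mentioned for getSynset can be pictured with the sketch below. It is not the connector's code: SynsetEntry is a hypothetical stand-in for ParsedSynset (whose fields are not documented on this page), the ids and glosses are made-up sample data, and it assumes the list is sorted by plain string comparison of the "Number_P" id.

```java
import java.util.ArrayList;
import java.util.List;

final class SynsetLookupSketch {
    /** Hypothetical stand-in for ParsedSynset; only the id matters for the search. */
    static final class SynsetEntry {
        final String id;
        final String gloss;

        SynsetEntry(String id, String gloss) {
            this.id = id;
            this.gloss = gloss;
        }
    }

    /** Returns the entry whose id equals sid, or null if it is absent. */
    static SynsetEntry find(List<SynsetEntry> sortedById, String sid) {
        int low = 0;
        int high = sortedById.size() - 1;
        while (low <= high) {
            int mid = (low + high) >>> 1;          // overflow-safe midpoint
            SynsetEntry candidate = sortedById.get(mid);
            int cmp = candidate.id.compareTo(sid);
            if (cmp == 0) {
                return candidate;
            }
            if (cmp < 0) {
                low = mid + 1;                     // target is in the upper half
            } else {
                high = mid - 1;                    // target is in the lower half
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Made-up entries, already sorted by id.
        List<SynsetEntry> entries = new ArrayList<>();
        entries.add(new SynsetEntry("00000001_N", "made-up gloss one"));
        entries.add(new SynsetEntry("00000001_V", "made-up gloss two"));
        entries.add(new SynsetEntry("00000002_A", "made-up gloss three"));
        SynsetEntry hit = find(entries, "00000001_V");
        System.out.println(hit == null ? "not found" : hit.gloss);
    }
}
```

Returning null for a missing id is a choice made for this sketch; the page does not say what getSynset does in that case.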
public static double getIDF(java.lang.String lemma)
lemma
- The lemma to look for.
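This page does not state which IDF formula getIDF uses. As a rough illustration only, the sketch below applies the common definition idf = ln(N / df), where N is the number of loaded documents (for example glosses) and df is the number of documents containing the lemma. The map of counts, the method name and the handling of unseen lemmas are all assumptions of this sketch.

```java
import java.util.HashMap;
import java.util.Map;

final class IdfSketch {
    /**
     * Computes ln(totalDocuments / df) for a lemma.
     * Assumption: a lemma with no recorded count is treated as df = 1.
     */
    static double idf(Map<String, Integer> documentCounts, int totalDocuments, String lemma) {
        int df = documentCounts.getOrDefault(lemma, 0);
        if (df <= 0) {
            df = 1;
        }
        return Math.log((double) totalDocuments / df);
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = new HashMap<>();
        counts.put("dog_N", 10);   // made-up count
        System.out.println(idf(counts, 1000, "dog_N"));
        System.out.println(idf(counts, 1000, "unseen_N"));
    }
}
```

A rarer lemma gets a higher value, which is why the counts have to be loaded before getIDF can answer.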
public static void loadWordNet() throws java.lang.Exception
java.lang.Exception
public static void loadDataBase(java.lang.String sampleSources) throws java.lang.Exception
sampleSources
- A string with the sources that will form the bag of words.
Some valid examples are: "WNGlosses", "WNGlosses;WNSamples", "WNGlosses;WNSamples;SemCor;yoursamplesource"
java.lang.Exception
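Judging from the examples given for sampleSources, the value looks like a semicolon-separated list of source names. The sketch below shows one way such a string could be split; how loadDataBase actually interprets it is not documented here, so the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

final class SampleSourcesSketch {
    /** Splits a value such as "WNGlosses;WNSamples;SemCor" into its source names. */
    static List<String> parse(String sampleSources) {
        List<String> names = new ArrayList<>();
        for (String part : sampleSources.split(";")) {
            String name = part.trim();
            if (!name.isEmpty()) {       // ignore empty segments such as a trailing ';'
                names.add(name);
            }
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(parse("WNGlosses;WNSamples;SemCor"));
    }
}
```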
private static void loadCountsFromFile(java.io.FileReader input) throws java.lang.Exception
input
- The count file.
java.lang.Exception
public static void parseSamplesFromSemCor(java.lang.String source) throws java.lang.Exception
source
- The name of the file or the folder that contains SemCor valid format files.
An error is raised if a non-SemCor file is mixed into the source folder.
java.lang.Exception
public static java.util.ArrayList<java.io.File> getAllFiles(java.io.File source) throws java.lang.Exception
source
- The analyzed folder.
java.lang.Exception
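A utility like getAllFiles is usually a short recursive walk. The sketch below is an independent illustration using java.io.File, not the connector's implementation; throwing an IOException for an unreadable or missing folder is a choice made for this sketch.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

final class FileWalkSketch {
    /** Collects every regular file under source, descending into subfolders. */
    static List<File> allFiles(File source) throws IOException {
        List<File> found = new ArrayList<>();
        collect(source, found);
        return found;
    }

    private static void collect(File node, List<File> found) throws IOException {
        if (node.isFile()) {
            found.add(node);
            return;
        }
        File[] children = node.listFiles();   // null if node is missing or unreadable
        if (children == null) {
            throw new IOException("Cannot list " + node);
        }
        for (File child : children) {
            collect(child, found);
        }
    }

    public static void main(String[] args) throws IOException {
        File start = new File(args.length > 0 ? args[0] : ".");
        System.out.println(allFiles(start).size() + " files under " + start);
    }
}
```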
public static void parseSamplesFromWordNet() throws java.lang.Exception
java.lang.Exception
private static void loadSamplesFromSource(java.io.FileReader input) throws java.lang.Exception
input
- The loaded source.
java.lang.Exception
public static java.util.ArrayList<ParsedSynset> getLemma(java.lang.String lemma)
lemma
- The lemma to look for.
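To make the "lemma_P" format concrete, the sketch below builds and splits such keys. It assumes that the POS letter follows the last underscore and that spaces inside a collocation are written as underscores (as required for hasPrepositions); both helper methods are hypothetical and not part of WordNet.

```java
final class LemmaKeySketch {
    /** Builds a key such as "hot_dog_N" from a lemma and a POS letter. */
    static String join(String lemma, String pos) {
        return lemma.replace(' ', '_') + "_" + pos;
    }

    /** Splits a key at its last underscore into { lemma, pos }. */
    static String[] split(String key) {
        int cut = key.lastIndexOf('_');
        if (cut <= 0 || cut == key.length() - 1) {
            throw new IllegalArgumentException("Not in lemma_P format: " + key);
        }
        return new String[] { key.substring(0, cut), key.substring(cut + 1) };
    }

    public static void main(String[] args) {
        String key = join("hot dog", "N");
        String[] parts = split(key);
        System.out.println(key + " -> lemma=" + parts[0] + ", pos=" + parts[1]);
    }
}
```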
public static java.util.ArrayList<java.lang.String> Morphy(java.lang.String morph, java.lang.String postag)
morph
- The word to be processed.
postag
- The POS tag of the word ("N","V","A","R").
public static java.util.ArrayList<java.lang.String> Morphy(java.lang.String morph, int pos)
morph
- The word to be processed.
pos
- The POS tag of the word ("N=0","V=1","A=2","R=3").
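WordNet's morphological processor consults its exception lists for irregular forms before applying any suffix rules. The sketch below shows only that ordering. The map of exceptions and the pluggable detach function are assumptions of this sketch; the connector itself keeps its irregular morphs in the exceptions field, whose exact layout is not documented on this page.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;

final class MorphyOrderSketch {
    /**
     * Returns base forms for morph: irregular forms win, otherwise the
     * supplied detachment function is applied.
     */
    static List<String> baseForms(String morph,
                                  Map<String, List<String>> exceptions,
                                  Function<String, List<String>> detach) {
        String key = morph.toLowerCase(Locale.ROOT);
        List<String> irregular = exceptions.get(key);
        if (irregular != null) {
            return new ArrayList<>(irregular);   // copy so callers cannot alter the map
        }
        return detach.apply(key);
    }

    public static void main(String[] args) {
        Map<String, List<String>> exceptions = new HashMap<>();
        exceptions.put("geese", Collections.singletonList("goose"));
        // Toy fallback that strips a final "s"; real rules depend on the POS.
        Function<String, List<String>> stripS = w -> Collections.singletonList(
                w.endsWith("s") ? w.substring(0, w.length() - 1) : w);
        System.out.println(baseForms("geese", exceptions, stripS));
        System.out.println(baseForms("dogs", exceptions, stripS));
    }
}
```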
public static java.util.ArrayList<java.lang.String> Transform(java.lang.String morph, int pos)
morph
- The word to be processed.
pos
- The POS tag of the word ("N=0","V=1","A=2","R=3").
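For orientation, the sketch below applies the suffix detachment rules listed in the WordNet morphy documentation, indexed by the same integer codes (N=0, V=1, A=2, R=3). It is an independent illustration, not the body of Transform: the lexicon parameter is a hypothetical stand-in for the loaded WordNet lemmas, and only candidates found in it are kept.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

final class DetachmentSketch {
    // { suffix, replacement } pairs per POS code; adverbs (3) have no rules.
    private static final String[][][] RULES = {
        { {"s", ""}, {"ses", "s"}, {"xes", "x"}, {"zes", "z"},
          {"ches", "ch"}, {"shes", "sh"}, {"men", "man"}, {"ies", "y"} },
        { {"s", ""}, {"ies", "y"}, {"es", "e"}, {"es", ""},
          {"ed", "e"}, {"ed", ""}, {"ing", "e"}, {"ing", ""} },
        { {"er", ""}, {"est", ""}, {"er", "e"}, {"est", "e"} },
        { }
    };

    /** Returns the distinct candidates for morph that exist in lexicon. */
    static List<String> detach(String morph, int pos, Set<String> lexicon) {
        Set<String> out = new LinkedHashSet<>();
        if (lexicon.contains(morph)) {
            out.add(morph);                       // the word may already be a base form
        }
        if (pos < 0 || pos >= RULES.length) {
            return new ArrayList<>(out);
        }
        for (String[] rule : RULES[pos]) {
            String suffix = rule[0];
            if (morph.length() > suffix.length() && morph.endsWith(suffix)) {
                String stem = morph.substring(0, morph.length() - suffix.length());
                String candidate = stem + rule[1];
                if (lexicon.contains(candidate)) {
                    out.add(candidate);
                }
            }
        }
        return new ArrayList<>(out);
    }

    public static void main(String[] args) {
        Set<String> lexicon = new HashSet<>(Arrays.asList("church", "fly", "make"));
        System.out.println(detach("churches", 0, lexicon));
        System.out.println(detach("flies", 1, lexicon));
        System.out.println(detach("making", 1, lexicon));
    }
}
```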
public static boolean hasPrepositions(java.lang.String morph)
morph
- The collocation to process. White spaces must be replaced with "_".
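One plausible reading of this check is sketched below: split the collocation on "_" and test each token against the preposition list. Whether hasPrepositions matches whole tokens or ignores case is not stated on this page, so both behaviours, along with the class and parameter names, are assumptions.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

final class PrepositionCheckSketch {
    /** Returns true if any underscore-separated token is a known preposition. */
    static boolean hasPreposition(String collocation, Set<String> prepositions) {
        for (String token : collocation.split("_")) {
            if (prepositions.contains(token.toLowerCase(Locale.ROOT))) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Small made-up preposition list; the connector reads its own from a file.
        Set<String> prepositions = new HashSet<>(Arrays.asList("of", "in", "on"));
        System.out.println(hasPreposition("point_of_view", prepositions));
        System.out.println(hasPreposition("hot_dog", prepositions));
    }
}
```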
private static java.util.ArrayList<java.lang.String> parseSenseMap(java.lang.String[] tokens, java.lang.String pos)
tokens
- Array containing the result of the line.split(" ") operation on a line of a WordNet index file for one POS.
pos
- The POS tag of the current file.
private static ParsedSynset parseGloss(java.lang.String line, int pos)
line
- The line to be processed.
pos
- The POS tag of the WordNet file.
public static java.lang.String getPOS(int pos)
pos
- The POS tag of the word ("N=0","V=1","A=2","R=3").
public static java.util.ArrayList<java.util.ArrayList<java.lang.String>> lemmatize(java.lang.String line, edu.stanford.nlp.tagger.maxent.MaxentTagger tagger)
line
- The text to be processed.
tagger
- An instance of the Stanford MaxentTagger.
public static java.util.ArrayList<java.util.ArrayList<java.lang.String>> softLemmatize(java.lang.String line, edu.stanford.nlp.tagger.maxent.MaxentTagger tagger)
line
- The text to be processed.
tagger
- An instance of the Stanford MaxentTagger.
public static int getPOS(java.lang.String pos)
pos
- The POS tag of the word ("N=0","V=1","A=2","R=3","W=3").
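The two getPOS overloads convert between POS letters and integer codes. The sketch below mirrors the mapping given in the parameter notes (N=0, V=1, A=2, R=3, with W also mapped to 3). What the real methods return for an unknown input is not documented; returning null and -1 here is an assumption of the sketch.

```java
final class PosCodeSketch {
    private static final String[] TAGS = {"N", "V", "A", "R"};

    /** Integer code to POS letter, or null for a code outside 0..3. */
    static String tagOf(int pos) {
        return (pos >= 0 && pos < TAGS.length) ? TAGS[pos] : null;
    }

    /** POS letter to integer code, or -1 for an unknown letter. */
    static int codeOf(String tag) {
        switch (tag) {
            case "N": return 0;
            case "V": return 1;
            case "A": return 2;
            case "R":
            case "W": return 3;   // the parameter note lists both R and W as 3
            default:  return -1;
        }
    }

    public static void main(String[] args) {
        System.out.println(tagOf(1) + " " + codeOf("W"));
    }
}
```

Because R and W share code 3, the conversion is not reversible for "W": going back from 3 yields "R" in this sketch.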
public static void main(java.lang.String[] args) throws java.lang.Exception
java.lang.Exception
public static void setPath(java.lang.String path)