public class WikipediaDictionary extends java.lang.Object implements Dictionary
A Dictionary that takes its entries from Wikipedia. Article redirections are properly handled.
Several constructors are provided to either load the whole dictionary in memory or store it in a memory-mapped file, which can be either temporary or persistent (and reusable by later instances).
The provided implementation is immutable and, therefore, instances can be freely shared.
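The description above states that redirections are handled but not how. The following is a minimal sketch of redirect resolution, assuming the redirects are available as a plain map from redirect title to target title; that map, the class name RedirectResolver and its resolve method are hypothetical and are not part of this API. It follows a redirect chain to the final article and guards against redirect loops.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical helper: follows redirect links until it reaches a title that is not a redirect. */
final class RedirectResolver {
    private RedirectResolver() {}

    /**
     * Returns the article that title ultimately points to, or title itself if it is
     * not a redirect. Returns null if the redirect chain loops back on itself.
     */
    static String resolve(Map<String, String> redirects, String title) {
        Set<String> visited = new HashSet<>();
        String current = title;
        while (redirects.containsKey(current)) {
            if (!visited.add(current)) {
                return null; // cycle such as A -> B -> A: no real article at the end
            }
            current = redirects.get(current);
        }
        return current;
    }
}
```

For example, with the redirects "Big Apple" -> "NYC" -> "New York City", resolving "Big Apple" yields "New York City", while a title that is not a redirect is returned unchanged.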
Fields inherited from interface Dictionary: DUMMY_DICTIONARY
Constructor and Description |
---|
WikipediaDictionary(java.io.File file): Creates a new Wikipedia dictionary from a previously created file for memory-mapping. |
WikipediaDictionary(java.io.Reader srcPageReader, java.io.Reader srcRedirectReader, java.io.Reader srcTranslationReader, java.io.Reader trgPageReader, java.io.Reader trgRedirectReader, java.lang.String trgLanguageCode): Creates a new in-memory dictionary reading its entries from some CSV files dumped from Wikipedia. |
WikipediaDictionary(java.io.Reader srcPageReader, java.io.Reader srcRedirectReader, java.io.Reader srcTranslationReader, java.io.Reader trgPageReader, java.io.Reader trgRedirectReader, java.lang.String trgLanguageCode, boolean mmap): Creates a new dictionary reading its entries from some CSV files dumped from Wikipedia. |
WikipediaDictionary(java.io.Reader srcPageReader, java.io.Reader srcRedirectReader, java.io.Reader srcTranslationReader, java.io.Reader trgPageReader, java.io.Reader trgRedirectReader, java.lang.String trgLanguageCode, java.io.File file): Creates a new memory-mapped dictionary reading its entries from some CSV files dumped from Wikipedia. |
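The constructors above trade heap usage against file access: the in-memory variant keeps everything on the heap, while the memory-mapped variants let the operating system page data in on demand, and a persistent file can later be reopened through WikipediaDictionary(java.io.File file). The sketch below illustrates that general idea with java.nio. It is not the on-disk format used by WikipediaDictionary, which this documentation does not describe; the class MappedTable, its write, open and get methods, and the sorted tab-separated layout are all hypothetical.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch: a sorted key/value table stored in a file and read through memory-mapping. */
final class MappedTable {
    private final MappedByteBuffer buf;
    private final int[] lineStarts; // byte offset of each "key\tvalue\n" line

    private MappedTable(MappedByteBuffer buf, int[] lineStarts) {
        this.buf = buf;
        this.lineStarts = lineStarts;
    }

    /** Writes entries sorted by key, one "key\tvalue" line each (no tabs or newlines allowed in keys or values). */
    static void write(Path file, Map<String, String> entries) throws IOException {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : new TreeMap<>(entries).entrySet()) {
            sb.append(e.getKey()).append('\t').append(e.getValue()).append('\n');
        }
        Files.write(file, sb.toString().getBytes(StandardCharsets.UTF_8));
    }

    /** Maps a previously written file; only the line offsets are kept on the heap. */
    static MappedTable open(Path file) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
            List<Integer> starts = new ArrayList<>();
            for (int i = 0; i < buf.limit(); i++) {
                if (i == 0 || buf.get(i - 1) == '\n') {
                    starts.add(i); // a new line begins right after each newline byte
                }
            }
            return new MappedTable(buf, starts.stream().mapToInt(Integer::intValue).toArray());
        }
    }

    /** Returns the value for key, or null if absent, by binary search over the sorted mapped lines. */
    String get(String key) {
        int lo = 0;
        int hi = lineStarts.length - 1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            String[] kv = readLine(lineStarts[mid]);
            int cmp = kv[0].compareTo(key);
            if (cmp == 0) {
                return kv[1];
            } else if (cmp < 0) {
                lo = mid + 1;
            } else {
                hi = mid - 1;
            }
        }
        return null;
    }

    /** Decodes the line starting at the given offset into {key, value}. */
    private String[] readLine(int start) {
        int end = start;
        while (buf.get(end) != '\n') {
            end++;
        }
        byte[] bytes = new byte[end - start];
        for (int i = 0; i < bytes.length; i++) {
            bytes[i] = buf.get(start + i);
        }
        String line = new String(bytes, StandardCharsets.UTF_8);
        int tab = line.indexOf('\t');
        return new String[] {line.substring(0, tab), line.substring(tab + 1)};
    }
}
```

Sorting the entries when writing is what makes binary search over the mapped bytes possible, so only one int offset per entry stays on the heap. A mapping also remains valid after the FileChannel that created it is closed, which is why open can close the channel right away.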
Modifier and Type | Method and Description |
---|---|
boolean | containsEntry(Entity src, Entity trg): Checks whether this bilingual dictionary contains the specified entry. |
boolean | equals(java.lang.Object obj) |
int | hashCode() |
java.lang.String | translate(Entity entity): Translates the specified Entity. |
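The documentation does not spell out how translate and containsEntry interact with redirects. As a hedged illustration only, the toy class below models one plausible reading: resolve source-side redirects, follow the interlanguage link, then resolve target-side redirects. Plain String titles stand in for the library's Entity type, and ToyDictionary, its three constructor maps and the sample titles are hypothetical. Its defensive immutable copies and value-based equals/hashCode echo the statement that instances are immutable and can be freely shared; they are not a description of how WikipediaDictionary implements equals.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Objects;
import java.util.Set;

/** Hypothetical toy model of a Wikipedia-derived bilingual dictionary (titles must be non-null). */
final class ToyDictionary {
    private final Map<String, String> srcRedirects; // source redirect title -> source article title
    private final Map<String, String> translations; // source article title -> target-language title
    private final Map<String, String> trgRedirects; // target redirect title -> target article title

    ToyDictionary(Map<String, String> srcRedirects, Map<String, String> translations,
                  Map<String, String> trgRedirects) {
        // Immutable copies: the instance can be shared between threads without synchronization.
        this.srcRedirects = Map.copyOf(srcRedirects);
        this.translations = Map.copyOf(translations);
        this.trgRedirects = Map.copyOf(trgRedirects);
    }

    /** Returns the target-language article for src, or null if there is no entry for it. */
    String translate(String src) {
        String article = resolve(srcRedirects, src);
        if (article == null) {
            return null;
        }
        String trg = translations.get(article);
        return trg == null ? null : resolve(trgRedirects, trg);
    }

    /** Checks whether trg, after resolving target-side redirects, is the translation of src. */
    boolean containsEntry(String src, String trg) {
        String expected = translate(src);
        return expected != null && expected.equals(resolve(trgRedirects, trg));
    }

    /** Follows redirects to the final article; returns null on a redirect cycle. */
    private static String resolve(Map<String, String> redirects, String title) {
        Set<String> visited = new HashSet<>();
        String current = title;
        while (redirects.containsKey(current)) {
            if (!visited.add(current)) {
                return null;
            }
            current = redirects.get(current);
        }
        return current;
    }

    @Override
    public boolean equals(Object obj) {
        if (this == obj) {
            return true;
        }
        if (!(obj instanceof ToyDictionary)) {
            return false;
        }
        ToyDictionary other = (ToyDictionary) obj;
        return srcRedirects.equals(other.srcRedirects)
                && translations.equals(other.translations)
                && trgRedirects.equals(other.trgRedirects);
    }

    @Override
    public int hashCode() {
        return Objects.hash(srcRedirects, translations, trgRedirects);
    }
}
```

With illustrative data where "NYC" redirects to "New York City", which links to "New York", and the target-side title "New York hiria" redirects to "New York", translate("NYC") returns "New York" and containsEntry("NYC", "New York hiria") is true because both sides resolve to the same article.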
public WikipediaDictionary(java.io.File file)
The file is expected to have been created beforehand with the WikipediaDictionary(srcPageReader, srcRedirectReader, srcTranslationReader, trgPageReader, trgRedirectReader, trgLanguageCode, file) constructor.
Parameters:
file - the previously created file for memory-mapping.

public WikipediaDictionary(java.io.Reader srcPageReader, java.io.Reader srcRedirectReader, java.io.Reader srcTranslationReader, java.io.Reader trgPageReader, java.io.Reader trgRedirectReader, java.lang.String trgLanguageCode)
This is equivalent to calling WikipediaDictionary(srcPageReader, srcRedirectReader, srcTranslationReader, trgPageReader, trgRedirectReader, trgLanguageCode, true).
Parameters:
srcPageReader - a Reader for page.SRC.csv, where SRC is the code of the source language.
srcRedirectReader - a Reader for redirect.SRC.csv, where SRC is the code of the source language.
srcTranslationReader - a Reader for translation.SRC.csv, where SRC is the code of the source language.
trgPageReader - a Reader for page.TRG.csv, where TRG is the code of the target language.
trgRedirectReader - a Reader for redirect.TRG.csv, where TRG is the code of the target language.
trgLanguageCode - the code of the target language (such as "eu" for Basque or "en" for English).

public WikipediaDictionary(java.io.Reader srcPageReader, java.io.Reader srcRedirectReader, java.io.Reader srcTranslationReader, java.io.Reader trgPageReader, java.io.Reader trgRedirectReader, java.lang.String trgLanguageCode, boolean mmap)
Parameters:
srcPageReader - a Reader for page.SRC.csv, where SRC is the code of the source language.
srcRedirectReader - a Reader for redirect.SRC.csv, where SRC is the code of the source language.
srcTranslationReader - a Reader for translation.SRC.csv, where SRC is the code of the source language.
trgPageReader - a Reader for page.TRG.csv, where TRG is the code of the target language.
trgRedirectReader - a Reader for redirect.TRG.csv, where TRG is the code of the target language.
trgLanguageCode - the code of the target language (such as "eu" for Basque or "en" for English).
mmap - if true, use a temporary memory-mapped file for memory efficiency; if false, load the whole dictionary in memory.

public WikipediaDictionary(java.io.Reader srcPageReader, java.io.Reader srcRedirectReader, java.io.Reader srcTranslationReader, java.io.Reader trgPageReader, java.io.Reader trgRedirectReader, java.lang.String trgLanguageCode, java.io.File file)
The WikipediaDictionary(file) constructor can then be used to create subsequent instances that reuse the same file for memory-mapping.
Parameters:
srcPageReader - a Reader for page.SRC.csv, where SRC is the code of the source language.
srcRedirectReader - a Reader for redirect.SRC.csv, where SRC is the code of the source language.
srcTranslationReader - a Reader for translation.SRC.csv, where SRC is the code of the source language.
trgPageReader - a Reader for page.TRG.csv, where TRG is the code of the target language.
trgRedirectReader - a Reader for redirect.TRG.csv, where TRG is the code of the target language.
trgLanguageCode - the code of the target language (such as "eu" for Basque or "en" for English).
file - the file in which to store the dictionary through memory-mapping.

public java.lang.String translate(Entity entity)
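Description copied from interface: Dictionary
Translates the specified Entity.
Specified by: translate in interface Dictionary
Parameters:
entity - the Entity to translate.
Returns: the translation of the specified Entity, or null if the dictionary doesn't contain any entry for it.

public boolean containsEntry(Entity src, Entity trg)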
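Description copied from interface: Dictionary
Checks whether this bilingual dictionary contains the specified entry.
Specified by: containsEntry in interface Dictionary
Parameters:
src - the Entity in the source language.
trg - the Entity in the target language.
Returns: true if trg is a valid translation of src according to this dictionary.

public boolean equals(java.lang.Object obj)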
Overrides: equals in class java.lang.Object
public int hashCode()
Overrides: hashCode in class java.lang.Object