public class EustaggerParser extends java.lang.Object implements MonolingualCorpusParser
A MonolingualCorpusParser to parse a monolingual corpus that has already been analyzed by Eustagger.

Constructor and Description |
---|
EustaggerParser() Constructs a new parser. |

Modifier and Type | Method and Description |
---|---|
void | parseMonolingualCorpus(MonolingualCorpusBuilder builder, java.io.Reader... in) Builds a MonolingualCorpus by parsing some input. |
void | parseMonolingualCorpus(MonolingualCorpusBuilder builder, java.util.Set<java.lang.Integer> removedIndexes, java.io.Reader... in) Builds a MonolingualCorpus by parsing some input and removing the sentences at the specified indexes. |

public void parseMonolingualCorpus(MonolingualCorpusBuilder builder, java.io.Reader... in) throws ParseException
Builds a MonolingualCorpus by parsing some input.
Specified by: parseMonolingualCorpus in interface MonolingualCorpusParser
Parameters:
builder - the MonolingualCorpusBuilder with which to build the corpus.
in - Readers for INPUT, INPUT.zatiak and INPUT.w.xml, in this exact order, where INPUT is the input file given to Eustagger and the other two are the output files produced by it. More input files may be provided in multiples of three, each group of three following this specification, in which case the produced output is the concatenation of all of them in the same order.
Throws:
ParseException - if a parsing error occurs or the input does not follow the specified format.

public void parseMonolingualCorpus(MonolingualCorpusBuilder builder, java.util.Set<java.lang.Integer> removedIndexes, java.io.Reader... in) throws ParseException
Builds a MonolingualCorpus by parsing some input and removing the sentences at the specified indexes.
Specified by: parseMonolingualCorpus in interface MonolingualCorpusParser
Parameters:
builder - the MonolingualCorpusBuilder with which to build the corpus.
removedIndexes - the set of indexes of the sentences to remove, starting from 1.
in - Readers for INPUT, INPUT.zatiak and INPUT.w.xml, in this exact order, where INPUT is the input file given to Eustagger and the other two are the output files produced by it. More input files may be provided in multiples of three, each group of three following this specification, in which case the produced output is the concatenation of all of them in the same order.
Throws:
ParseException - if a parsing error occurs or the input does not follow the specified format.
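
Both methods expect the Readers in strict groups of three (INPUT, INPUT.zatiak, INPUT.w.xml), and removedIndexes counts sentences from 1. The following is a minimal sketch of how a caller might prepare those arguments. The class EustaggerInputs and its methods triple, openAll and toOneBased are hypothetical helpers, not part of the documented API, and UTF-8 is only an assumed encoding for the Eustagger files.

```java
import java.io.IOException;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical helper for preparing parser arguments; not part of the library. */
final class EustaggerInputs {

    private EustaggerInputs() {}

    /** Returns the paths for one group, in the documented order: INPUT, INPUT.zatiak, INPUT.w.xml. */
    static List<Path> triple(Path input) {
        String base = input.toString();
        List<Path> paths = new ArrayList<>();
        paths.add(input);
        paths.add(Paths.get(base + ".zatiak"));
        paths.add(Paths.get(base + ".w.xml"));
        return paths;
    }

    /** Opens three readers per INPUT file, keeping the inputs in the given order. */
    static Reader[] openAll(List<Path> inputs) throws IOException {
        List<Reader> readers = new ArrayList<>();
        try {
            for (Path input : inputs) {
                for (Path path : triple(input)) {
                    // Assumption: the files are UTF-8 encoded.
                    readers.add(Files.newBufferedReader(path, StandardCharsets.UTF_8));
                }
            }
        } catch (IOException e) {
            // Close whatever was already opened before reporting the failure.
            for (Reader reader : readers) {
                try {
                    reader.close();
                } catch (IOException ignored) {
                    // Nothing more can be done for this reader.
                }
            }
            throw e;
        }
        return readers.toArray(new Reader[0]);
    }

    /** Converts 0-based sentence positions to the 1-based indexes used by removedIndexes. */
    static Set<Integer> toOneBased(int... zeroBased) {
        Set<Integer> result = new LinkedHashSet<>();
        for (int position : zeroBased) {
            if (position < 0) {
                throw new IllegalArgumentException("negative position: " + position);
            }
            result.add(position + 1);
        }
        return result;
    }
}
```

Under these assumptions, the array returned by openAll could be passed as the in argument of either overload, and the set returned by toOneBased as removedIndexes. How a MonolingualCorpusBuilder is obtained, and how indexes are counted across several concatenated groups, is not covered on this page, so the sketch leaves both out. The caller remains responsible for closing the Readers after parsing.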