public class AnalyzerParser extends java.lang.Object implements MonolingualCorpusParser
A MonolingualCorpusParser that builds MonolingualCorpus instances from plain text by analyzing it with a specific Analyzer.
Constructor and Description |
---|
AnalyzerParser(Analyzer analyzer) Constructs a new parser with the specified Analyzer, assuming that the input text has a single sentence per line. |
AnalyzerParser(Analyzer analyzer, boolean sentencePerLine) Constructs a new parser with the specified Analyzer. |
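The practical difference between the two constructors is whether each input line is taken as one sentence or may be split further. The following is only a minimal sketch of that distinction, not AnalyzerParser's implementation: the class SentencePerLineSketch and its method sentencesOf are hypothetical, and java.text.BreakIterator merely stands in for whatever sentence splitting the configured Analyzer performs when sentencePerLine is false.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

final class SentencePerLineSketch {

    // Hypothetical helper: returns the sentences found in one line of input.
    static List<String> sentencesOf(String line, boolean sentencePerLine) {
        List<String> sentences = new ArrayList<>();
        if (sentencePerLine) {
            // The whole line is taken as exactly one sentence.
            sentences.add(line.trim());
            return sentences;
        }
        // Assumption: BreakIterator stands in for the Analyzer's own splitting.
        BreakIterator it = BreakIterator.getSentenceInstance(Locale.ENGLISH);
        it.setText(line);
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            String sentence = line.substring(start, end).trim();
            if (!sentence.isEmpty()) {
                sentences.add(sentence);
            }
        }
        return sentences;
    }

    public static void main(String[] args) {
        String line = "Hello world. How are you?";
        System.out.println(sentencesOf(line, true));
        System.out.println(sentencesOf(line, false));
    }
}
```

With splitting enabled, a line that contains no sentence boundary still yields a single sentence, which mirrors the documented behavior that the analyzer may decide a line consists of only one.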
Modifier and Type | Method and Description |
---|---|
void | parseMonolingualCorpus(MonolingualCorpusBuilder builder, java.io.Reader... in) Builds a MonolingualCorpus by parsing some input. |
void | parseMonolingualCorpus(MonolingualCorpusBuilder builder, java.util.Set<java.lang.Integer> removedIndexes, java.io.Reader... in) Builds a MonolingualCorpus by parsing some input. |
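Both overloads accept a variable number of java.io.Reader arguments, and the parameter documentation states that several inputs produce the concatenation of all of them in the same order. A minimal sketch of that ordering, using only the standard library, might look as follows. The class ReaderOrderSketch and its method readAllLines are hypothetical names for illustration and are not part of this API.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

final class ReaderOrderSketch {

    // Hypothetical helper: collects the lines of every reader, in argument order.
    static List<String> readAllLines(Reader... in) {
        List<String> lines = new ArrayList<>();
        try {
            for (Reader reader : in) {
                BufferedReader buffered = new BufferedReader(reader);
                String line;
                while ((line = buffered.readLine()) != null) {
                    lines.add(line);
                }
            }
        } catch (IOException e) {
            // Rethrown unchecked to keep the sketch short.
            throw new UncheckedIOException(e);
        }
        return lines;
    }

    public static void main(String[] args) {
        List<String> lines = readAllLines(
                new StringReader("first input, line 1\nfirst input, line 2\n"),
                new StringReader("second input, line 1\n"));
        // The lines of the first reader come before those of the second.
        System.out.println(lines);
    }
}
```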
public AnalyzerParser(Analyzer analyzer)
Constructs a new parser with the specified Analyzer, assuming that the input text has a single sentence per line.
Parameters:
analyzer - the Analyzer to apply to the input text in order to build a MonolingualCorpus.
public AnalyzerParser(Analyzer analyzer, boolean sentencePerLine)
Constructs a new parser with the specified Analyzer.
Parameters:
analyzer - the Analyzer to apply to the input text in order to build a MonolingualCorpus.
sentencePerLine - whether the input text has a single sentence per line. If false, the analyzer tries to split each line into sentences itself, although it may still decide that a line consists of a single sentence.
public void parseMonolingualCorpus(MonolingualCorpusBuilder builder, java.io.Reader... in) throws ParseException
Builds a MonolingualCorpus by parsing some input.
Specified by:
parseMonolingualCorpus in interface MonolingualCorpusParser
Parameters:
builder - the MonolingualCorpusBuilder with which to build the corpus.
in - the plain-text input(s) to read from. If more than one is given, the output is the concatenation of all of them, in the same order.
Throws:
ParseException - if a parsing error occurs.
public void parseMonolingualCorpus(MonolingualCorpusBuilder builder, java.util.Set<java.lang.Integer> removedIndexes, java.io.Reader... in) throws ParseException
Builds a MonolingualCorpus by parsing some input.
Specified by:
parseMonolingualCorpus in interface MonolingualCorpusParser
Parameters:
builder - the MonolingualCorpusBuilder with which to build the corpus.
removedIndexes - the set of indexes of the sentences to remove. Indexes start from 1.
in - the plain-text input(s) to read from. If more than one is given, the output is the concatenation of all of them, in the same order.
Throws:
ParseException - if a parsing error occurs.
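Because removedIndexes counts sentences from 1 rather than 0, an off-by-one error is easy to make when preparing the set. The following minimal sketch illustrates the 1-based convention on a list of sentences that is already in memory. The class RemovedIndexesSketch and its method keepSentences are hypothetical and do not reflect how AnalyzerParser is implemented. The documentation above also does not say how indexes are counted when several inputs are given, so the sketch makes no assumption about that.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class RemovedIndexesSketch {

    // Hypothetical helper: drops the sentences whose 1-based position is in removedIndexes.
    static List<String> keepSentences(List<String> sentences, Set<Integer> removedIndexes) {
        List<String> kept = new ArrayList<>();
        for (int i = 0; i < sentences.size(); i++) {
            int index = i + 1; // indexes start from 1, not 0
            if (!removedIndexes.contains(index)) {
                kept.add(sentences.get(i));
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> sentences = Arrays.asList("first", "second", "third", "fourth");
        Set<Integer> removed = new HashSet<>(Arrays.asList(1, 3));
        System.out.println(keepSentences(sentences, removed)); // prints [second, fourth]
    }
}
```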