public class TokenizedParser extends java.lang.Object implements MonolingualCorpusWriter
A MonolingualCorpusWriter that writes a MonolingualCorpus as tokenized text.
The produced output is plain text, with one sentence per line. Each entity is
represented either by its lemma preceded by "ENTI_" or by its corresponding
text, as if it were a plain token. The process is therefore irreversible: a
MonolingualCorpus cannot be rebuilt from this output, because some basic
information is missing from it.

Constructor and Description |
---|
TokenizedParser(boolean writeEntityLemmas) Constructs a new parser. |

Modifier and Type | Method and Description |
---|---|
MonolingualCorpusBuilder | getWriterCorpusBuilder(java.io.Writer out) Returns a wrapper MonolingualCorpusBuilder that writes sentences by this MonolingualCorpusWriter as they are added to it. |
void | writeMonolingualCorpus(MonolingualCorpus corpus, java.io.Writer out) Writes a MonolingualCorpus as tokenized text. |
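
The output format described above (one sentence per line, with each entity written either as "ENTI_" plus its lemma or as its plain text) can be illustrated with a minimal, self-contained sketch. It does not use the library's own types: `SketchToken`, `TokenizedOutputSketch`, and `formatSentence` are hypothetical stand-ins, and separating tokens with a single space is an assumption, since this page does not state the separator.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for one element of a sentence; NOT a type from the library.
final class SketchToken {
    final String text;     // surface text as it appears in the sentence
    final String lemma;    // lemma, only meaningful when the element is an entity
    final boolean entity;  // true if this element is an entity

    SketchToken(String text, String lemma, boolean entity) {
        this.text = text;
        this.lemma = lemma;
        this.entity = entity;
    }

    static SketchToken plain(String text) {
        return new SketchToken(text, null, false);
    }

    static SketchToken entity(String text, String lemma) {
        return new SketchToken(text, lemma, true);
    }
}

// Hypothetical illustration of the documented output format.
final class TokenizedOutputSketch {

    // Renders one sentence as a single line of output (without the line break).
    // Assumption: tokens are separated by a single space.
    static String formatSentence(List<SketchToken> sentence, boolean writeEntityLemmas) {
        List<String> parts = new ArrayList<>();
        for (SketchToken token : sentence) {
            if (token.entity && writeEntityLemmas) {
                parts.add("ENTI_" + token.lemma); // entity written as its prefixed lemma
            } else {
                parts.add(token.text);            // plain token, or entity treated as one
            }
        }
        return String.join(" ", parts);
    }

    public static void main(String[] args) {
        List<SketchToken> sentence = Arrays.asList(
                SketchToken.plain("I"),
                SketchToken.plain("visited"),
                SketchToken.entity("Paris", "paris"),
                SketchToken.plain("."));
        System.out.println(formatSentence(sentence, true));  // I visited ENTI_paris .
        System.out.println(formatSentence(sentence, false)); // I visited Paris .
    }
}
```

In both modes the line keeps no record of which tokens were entities beyond the "ENTI_" prefix, and the entity's other form (its text or its lemma) is dropped, which is consistent with the class description's note that the output cannot be turned back into a corpus.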
public TokenizedParser(boolean writeEntityLemmas)
Constructs a new parser.
Parameters:
writeEntityLemmas - if true, entities are represented by their lemma preceded by "ENTI_" in the output; if false, entities are treated as plain tokens and their corresponding text appears in the output.

public void writeMonolingualCorpus(MonolingualCorpus corpus, java.io.Writer out) throws ParseException
Writes a MonolingualCorpus as tokenized text. The produced output is plain
text, with one sentence per line. Each entity is represented either by its
lemma preceded by "ENTI_" or by its corresponding text, as if it were a plain
token.
Specified by:
writeMonolingualCorpus in interface MonolingualCorpusWriter
Parameters:
corpus - the MonolingualCorpus to write.
out - the Writer to write the output to.
Throws:
ParseException - if a writing error occurs.

public MonolingualCorpusBuilder getWriterCorpusBuilder(java.io.Writer out)
Description copied from interface: MonolingualCorpusWriter
Returns a wrapper MonolingualCorpusBuilder that writes sentences through this
MonolingualCorpusWriter as they are added to it.
Specified by:
getWriterCorpusBuilder in interface MonolingualCorpusWriter
Parameters:
out - the Writer to write the output to.
Returns:
a wrapper MonolingualCorpusBuilder that writes sentences through this
MonolingualCorpusWriter as they are added to it.
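
The idea behind getWriterCorpusBuilder, a builder that writes each sentence to the Writer as soon as it is added instead of collecting a whole corpus first, can be sketched as follows. This is only an illustration of the pattern under stated assumptions: the methods of the real MonolingualCorpusBuilder are not shown on this page, so `SketchSentenceSink`, `WritingSinkSketch`, and `addSentence` are hypothetical names, and sentences are passed in already formatted as one line of tokenized text.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;

// Hypothetical stand-in for a corpus builder; NOT the library's MonolingualCorpusBuilder.
interface SketchSentenceSink {
    void addSentence(String tokenizedSentence);
}

// Hypothetical wrapper that writes every sentence to the Writer as it is added.
final class WritingSinkSketch implements SketchSentenceSink {
    private final Writer out;

    WritingSinkSketch(Writer out) {
        this.out = out;
    }

    @Override
    public void addSentence(String tokenizedSentence) {
        try {
            out.write(tokenizedSentence);
            out.write('\n'); // one sentence per line
        } catch (IOException e) {
            // Assumption of this sketch: surface I/O failures as unchecked exceptions.
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        StringWriter buffer = new StringWriter();
        SketchSentenceSink sink = new WritingSinkSketch(buffer);
        sink.addSentence("I visited ENTI_paris .");
        sink.addSentence("It rained .");
        System.out.print(buffer); // prints the two sentences, one per line
    }
}
```

Writing on each addition keeps memory use independent of corpus size, which is the usual reason to prefer this kind of wrapper over building the corpus and then calling a bulk write method.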