public class FileBackedCorpusAlignment extends java.lang.Object implements CorpusAlignment
A CorpusAlignment backed by a file created by an external word aligner such as GIZA++.
The file should contain the alignment of one sentence pair per line, in the same order as the bilingual corpus. Each line has the form A-B C-D E-F ..., meaning that the token at index A in the source sentence is aligned with the token at index B in the target sentence, the token at index C in the source sentence is aligned with the token at index D in the target sentence, and so on. Indexes are integers starting at 0.
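To make the line format concrete, here is a minimal sketch of how one such line could be parsed into index pairs. The class AlignmentLineParser and its parse method are hypothetical and are not part of this API. The sketch assumes tokens are separated by whitespace and that an empty line means a sentence pair with no alignment links.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical parser for one alignment line such as "0-0 1-2 2-1". */
public class AlignmentLineParser {

    /** Returns the aligned index pairs as {sourceIndex, targetIndex} arrays. */
    public static List<int[]> parse(String line) {
        List<int[]> pairs = new ArrayList<>();
        String trimmed = line.trim();
        if (trimmed.isEmpty()) {
            return pairs; // assumed: a sentence pair with no alignment links
        }
        for (String token : trimmed.split("\\s+")) {
            int dash = token.indexOf('-');
            if (dash <= 0 || dash == token.length() - 1) {
                throw new IllegalArgumentException("Malformed alignment token: " + token);
            }
            // Integer.parseInt rejects non-integer values such as "1.5"
            int source = Integer.parseInt(token.substring(0, dash));
            int target = Integer.parseInt(token.substring(dash + 1));
            if (source < 0 || target < 0) {
                throw new IllegalArgumentException("Indexes start at 0: " + token);
            }
            pairs.add(new int[] {source, target});
        }
        return pairs;
    }

    public static void main(String[] args) {
        for (int[] p : parse("0-0 1-2 2-1")) {
            System.out.println("source " + p[0] + " -> target " + p[1]);
        }
    }
}
```

For the line 0-0 1-2 2-1, the second pair says that source token 1 is aligned with target token 2.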
This implementation is not thread-safe. If an instance is shared between threads, access to it must be synchronized externally.
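One way to synchronize externally is to route every call on the shared instance through a single lock. The following is a minimal sketch. The wrapper class Guarded is hypothetical and not part of this API, and the methods you would call inside the lambda depend on the CorpusAlignment interface, which this section does not list.

```java
import java.util.function.Function;

/**
 * Hypothetical wrapper that serializes all access to a non-thread-safe object
 * by holding one lock for the duration of each operation.
 */
public final class Guarded<T> {
    private final T target;
    private final Object lock = new Object();

    public Guarded(T target) {
        this.target = target;
    }

    /** Runs the given operation on the wrapped object while holding the lock. */
    public <R> R with(Function<? super T, ? extends R> operation) {
        synchronized (lock) {
            return operation.apply(target);
        }
    }
}
```

Each thread would then call shared.with(alignment -> ...) on a Guarded<CorpusAlignment> instead of using the alignment directly. A single coarse lock is the simplest correct choice here, at the cost of serializing all reads.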
Constructor | Description |
---|---|
FileBackedCorpusAlignment(java.io.File file) | Creates a new corpus alignment backed by the given file created by an external aligner. |
public FileBackedCorpusAlignment(java.io.File file) throws java.io.FileNotFoundException
Parameters:
file - the file that backs the corpus alignment to create.
Throws:
java.io.FileNotFoundException - if the given file is not found.
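As an illustration of the input this constructor expects, the following sketch writes an alignment file with one line per sentence pair and then reopens it. The class AlignmentFileDemo and its methods are hypothetical helpers, not part of this API. Opening a missing file with java.util.Scanner throws java.io.FileNotFoundException, the same checked exception this constructor declares.

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.PrintWriter;
import java.util.Scanner;

/** Hypothetical helpers for preparing and checking an alignment file. */
public class AlignmentFileDemo {

    /** Writes one alignment line per sentence pair, in corpus order. */
    static File writeAlignmentFile(String... lines) throws IOException {
        File file = File.createTempFile("corpus", ".align");
        file.deleteOnExit();
        try (PrintWriter out = new PrintWriter(file, "UTF-8")) {
            for (String line : lines) {
                out.println(line);
            }
        }
        return file;
    }

    /** Counts sentence pairs (lines); throws FileNotFoundException if the file is missing. */
    static int countSentencePairs(File file) throws FileNotFoundException {
        int count = 0;
        try (Scanner in = new Scanner(file, "UTF-8")) {
            while (in.hasNextLine()) {
                in.nextLine();
                count++;
            }
        }
        return count;
    }
}
```

A file produced this way could then be passed to new FileBackedCorpusAlignment(file), with the caller either catching or propagating java.io.FileNotFoundException.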