Class | Description |
---|---|
FileBackedBilingualCorpus | File-backed implementation of the BilingualCorpus interface, formed by two aligned FileBackedMonolingualCorpus instances. |
FileBackedCorpusAlignment | A CorpusAlignment backed by a file created by an external word aligner like GIZA++. |
FileBackedIdManager<T> | A file-backed id manager that maps every distinct value to a unique long id. |
FileBackedIdManager.Builder<T> | A builder to create FileBackedIdManager instances by adding values that it associates with unique ids. |
FileBackedMonolingualCorpus | File-backed implementation of the MonolingualCorpus interface. |
FileBackedMonolingualCorpus.Builder | A MonolingualCorpusBuilder to create FileBackedMonolingualCorpus instances. |

The CorpusAlignment, MonolingualCorpus and BilingualCorpus interfaces are
implemented in this package so that they are backed by files and load their
content into main memory on demand. This makes them more memory efficient,
but also more I/O intensive. In general, these implementations should be
preferred over their in-memory counterparts because they offer better
performance, but they can become problematic when used over slow I/O media
(such as network-mounted storage).

Because the provided implementations are file-backed, they are also
persistent: once the files have been created, instances can generally be
initialized from them instead of being built from scratch. Generating the
files is usually by far the most expensive operation, so it is recommended
to create them once and reuse the same files in subsequent runs.
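
The "create once, reuse afterwards" recommendation can be illustrated with a minimal sketch. It deliberately does not use this package's classes, whose constructors and builder methods are not shown here: `BuildOnceReuse`, `buildCorpusFile` and `openOrBuild` are hypothetical names, and a plain text file written with `java.nio.file.Files` stands in for the real backing files.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

class BuildOnceReuse {

    // Counts how many times the (hypothetical) expensive build step ran.
    static int builds = 0;

    // Hypothetical stand-in for the expensive file generation step.
    // In real code this is where a builder from this package would be used.
    static void buildCorpusFile(Path target, List<String> sentences) throws IOException {
        Files.write(target, sentences, StandardCharsets.UTF_8);
        builds++;
    }

    // Returns the backing file, generating it only if it does not exist yet.
    static Path openOrBuild(Path target, List<String> sentences) throws IOException {
        if (!Files.exists(target)) {
            buildCorpusFile(target, sentences);
        }
        return target;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("corpus");
        Path file = dir.resolve("corpus.txt");
        List<String> sentences = List.of("the first sentence", "the second sentence");

        openOrBuild(file, sentences); // first run: the file is generated
        openOrBuild(file, sentences); // later runs: the existing file is reused

        System.out.println("builds = " + builds);
    }
}
```

The existence check is the only design decision in the sketch: it keeps the expensive step out of every run after the first. Real code would also need to decide when an existing file is stale and must be regenerated.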