public class FileBackedMonolingualCorpus extends java.lang.Object implements MonolingualCorpus

A file-backed implementation of the MonolingualCorpus interface.

Modifier and Type | Class and Description |
---|---|
static class | FileBackedMonolingualCorpus.Builder - A MonolingualCorpusBuilder to create FileBackedMonolingualCorpus instances. |

Modifier and Type | Field and Description |
---|---|
protected FileBackedIdManager<es.ehu.si.ixa.prebmt.model.filebacked.FileBackedCorpora.Element> | idManager - The id manager that maps every element in the corpus to its corresponding id. |
protected long | size - The number of elements (phrase begin, phrase end, space, token and entities) in this corpus. |

Constructor and Description |
---|
FileBackedMonolingualCorpus(java.io.File dir, java.lang.String id, boolean readonly) - Creates a monolingual corpus with the given id, backed by the given directory. |

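As a rough illustration of what a file-backed store with a read-only flag can look like, the sketch below keeps an array of long values in a single file under a directory. Everything in it is an assumption made for illustration: the class FileBackedLongArraySketch, the ".longs" file name and the on-disk layout are hypothetical and say nothing about how FileBackedMonolingualCorpus actually stores its data. The only behavior taken from this page is that setters on a read-only corpus throw java.lang.UnsupportedOperationException.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.file.Files;

/** Hypothetical stand-in: an array of longs stored in one file. Not the library's format. */
final class FileBackedLongArraySketch implements AutoCloseable {
    private final RandomAccessFile file;
    private final boolean readonly;

    FileBackedLongArraySketch(File dir, String id, boolean readonly) throws IOException {
        // Assumed naming scheme: one file per id inside the backing directory.
        File backing = new File(dir, id + ".longs");
        // "r" opens an existing file for reading only; "rw" creates it if missing.
        this.file = new RandomAccessFile(backing, readonly ? "r" : "rw");
        this.readonly = readonly;
    }

    /** Reads the 8-byte value stored at the given index. */
    long get(long index) throws IOException {
        file.seek(index * Long.BYTES);
        return file.readLong();
    }

    /** Writes the value at the given index, refusing to do so on a read-only store. */
    void set(long index, long value) throws IOException {
        if (readonly) {
            throw new UnsupportedOperationException("this store is read-only");
        }
        file.seek(index * Long.BYTES);
        file.writeLong(value);
    }

    @Override
    public void close() throws IOException {
        file.close();
    }

    public static void main(String[] args) throws IOException {
        File dir = Files.createTempDirectory("corpus-sketch").toFile();
        try (FileBackedLongArraySketch rw = new FileBackedLongArraySketch(dir, "demo", false)) {
            rw.set(2, 42L);
        }
        try (FileBackedLongArraySketch ro = new FileBackedLongArraySketch(dir, "demo", true)) {
            System.out.println(ro.get(2)); // prints 42
            try {
                ro.set(2, 7L);
            } catch (UnsupportedOperationException e) {
                System.out.println("write rejected: " + e.getMessage());
            }
        }
    }
}
```

Checking the flag before touching the file keeps the failure mode uniform: a write on a read-only store fails with the same exception type regardless of what the underlying I/O layer would have done.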
Modifier and Type | Method and Description |
---|---|
protected int | getAlignmentLength(long index) - Returns the number of elements between the leftmost and the rightmost elements in the target language with which the element at the given index is aligned (including the leftmost and the rightmost ones). |
protected long | getAlignmentStart(long index) - Returns the index of the leftmost element in the target language with which the element at the given index is aligned. |
protected int | getEntityId(long index) - Returns the id of the aligned entity at the given index, or FileBackedCorpora.UNALIGNED_ENTITY_ID if the element at the given index is not an aligned entity. |
protected long | getId(long index) - Returns the id of the element at the given index. |
protected long | getSA(long index) - Returns the value of this corpus' suffix array at the given index. |
protected double | getWeight(long index) - Returns the lexical weight of the element at the given index. |
java.util.Iterator<Sentence> | iterator() |
protected void | putId(long index, long value) - Sets the id of the element at the given index. |
protected void | setAlignmentLength(long index, int value) - Sets the number of elements between the leftmost and the rightmost elements in the target language with which the element at the given index is aligned (including the leftmost and the rightmost ones). |
protected void | setAlignmentStart(long index, long value) - Sets the index of the leftmost element in the target language with which the element at the given index is aligned. |
protected void | setEntityId(long index, int value) - Sets the id of the aligned entity at the given index. |
protected void | setSA(long index, long value) - Sets the value of this corpus' suffix array at the given index. |
protected void | setWeight(long index, double value) - Sets the lexical weight of the element at the given index. |

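The alignment start and alignment length together describe an inclusive span of target-language elements, and the method details on this page note that a length of 0 marks an element as not aligned. The sketch below works through that arithmetic. AlignmentSpanSketch and its alignmentEnd method are hypothetical, in-memory stand-ins for illustration only; they are not part of FileBackedMonolingualCorpus.

```java
import java.util.OptionalLong;

/** Hypothetical in-memory stand-in for the alignment start/length accessors. */
final class AlignmentSpanSketch {
    private final long[] alignmentStart;  // leftmost aligned target index per element
    private final int[] alignmentLength;  // inclusive span length; 0 means "not aligned"

    AlignmentSpanSketch(long[] alignmentStart, int[] alignmentLength) {
        if (alignmentStart.length != alignmentLength.length) {
            throw new IllegalArgumentException("arrays must have the same length");
        }
        this.alignmentStart = alignmentStart.clone();
        this.alignmentLength = alignmentLength.clone();
    }

    /** An element is aligned when its alignment length is greater than 0. */
    boolean isAligned(int index) {
        return alignmentLength[index] > 0;
    }

    /** Returns the rightmost aligned target index, or empty if the element is not aligned. */
    OptionalLong alignmentEnd(int index) {
        if (!isAligned(index)) {
            return OptionalLong.empty();
        }
        // The length counts both endpoints, so the end is start + length - 1.
        return OptionalLong.of(alignmentStart[index] + alignmentLength[index] - 1);
    }

    public static void main(String[] args) {
        AlignmentSpanSketch spans =
                new AlignmentSpanSketch(new long[] {4, 0, 7}, new int[] {3, 0, 1});
        System.out.println(spans.alignmentEnd(0)); // prints OptionalLong[6]: span 4..6
        System.out.println(spans.alignmentEnd(1)); // prints OptionalLong.empty: not aligned
        System.out.println(spans.alignmentEnd(2)); // prints OptionalLong[7]: single element
    }
}
```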
protected final FileBackedIdManager<es.ehu.si.ixa.prebmt.model.filebacked.FileBackedCorpora.Element> idManager

protected final long size

public FileBackedMonolingualCorpus(java.io.File dir, java.lang.String id, boolean readonly) throws java.io.IOException
dir - the directory in which the corpus to create is backed.
id - the unique identifier of the monolingual corpus in the given directory.
readonly - whether the monolingual corpus to create should be read-only or not.
java.io.IOException - if some sort of I/O error occurs.

public java.util.Iterator<Sentence> iterator()
Specified by: iterator in interface java.lang.Iterable<Sentence>

protected long getId(long index)
index - the index of the element whose id is to be returned.

protected long getAlignmentStart(long index)
index - the index of the element whose alignment start is to be returned.

protected int getAlignmentLength(long index)
If the element at the given index is not aligned, 0 is returned.
index - the index of the element whose alignment length is to be returned.

protected int getEntityId(long index)
Returns FileBackedCorpora.UNALIGNED_ENTITY_ID if the element at the given index is not an aligned entity.
index - the index of the element whose entity id is to be returned.

protected double getWeight(long index)
index - the index of the element whose lexical weight is to be returned.

protected long getSA(long index)
index - the index in the suffix array whose value is to be returned.

protected void putId(long index, long value)
index - the index of the element whose id is to be set.
value - the id to set.
java.lang.UnsupportedOperationException - if this corpus is read-only.

protected void setAlignmentStart(long index, long value)
index - the index of the element whose alignment start is to be set.
value - the value to set.
java.lang.UnsupportedOperationException - if this corpus is read-only.

protected void setAlignmentLength(long index, int value)
0 should be set to indicate that the element is not aligned.
index - the index of the element whose alignment length is to be set.
value - the alignment length to set.
java.lang.UnsupportedOperationException - if this corpus is read-only.

protected void setEntityId(long index, int value)
FileBackedCorpora.UNALIGNED_ENTITY_ID should be set to indicate that the element at the given index is not an aligned entity.
index - the index of the element whose entity id is to be set.
value - the id to set.
java.lang.UnsupportedOperationException - if this corpus is read-only.

protected void setWeight(long index, double value)
index - the index of the element whose lexical weight is to be set.
value - the weight to set.
java.lang.UnsupportedOperationException - if this corpus is read-only.

protected void setSA(long index, long value)
index - the index in the suffix array whose value is to be set.
value - the value to set.
java.lang.UnsupportedOperationException - if this corpus is read-only.
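
This page does not define the contents of the suffix array, so the following is only a minimal sketch under the assumption that it is a conventional one: the start positions of all suffixes of the element id sequence, sorted lexicographically by id. SuffixArraySketch and its methods are hypothetical and use a simple comparison sort, which is adequate for illustration but not for large corpora.

```java
import java.util.Arrays;

/** Hypothetical sketch of a conventional suffix array over a sequence of element ids. */
final class SuffixArraySketch {

    /** Compares the suffixes of ids that start at positions a and b, id by id. */
    static int compareSuffixes(long[] ids, int a, int b) {
        while (a < ids.length && b < ids.length) {
            int c = Long.compare(ids[a], ids[b]);
            if (c != 0) {
                return c;
            }
            a++;
            b++;
        }
        // One suffix ran out first: it is a prefix of the other and sorts before it.
        // The exhausted suffix is the one whose position has advanced further.
        return Integer.compare(b, a);
    }

    /** Returns the suffix start positions in lexicographic order of their suffixes. */
    static long[] build(long[] ids) {
        Integer[] order = new Integer[ids.length];
        for (int i = 0; i < order.length; i++) {
            order[i] = i;
        }
        Arrays.sort(order, (a, b) -> compareSuffixes(ids, a, b));
        long[] sa = new long[order.length];
        for (int i = 0; i < sa.length; i++) {
            sa[i] = order[i];
        }
        return sa;
    }

    public static void main(String[] args) {
        // Suffixes of {2, 1, 2, 1}: [1] at 3, [1, 2, 1] at 1, [2, 1] at 2, [2, 1, 2, 1] at 0.
        long[] sa = build(new long[] {2, 1, 2, 1});
        System.out.println(Arrays.toString(sa)); // prints [3, 1, 2, 0]
    }
}
```

Under this assumption, a call such as getSA(0) would correspond to sa[0] in the sketch, that is, the position at which the lexicographically smallest suffix starts.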