Interface | Description |
---|---|
AlignedEntity | An Entity with an integer id that marks its alignment with another entity. |
Alignment | Represents the alignment between the tokens in a source sentence and the tokens in a target sentence. |
BilingualCorpus | Two aligned MonolingualCorpus instances that form a bilingual corpus from which partial translations can be generated. |
CorpusAlignment | A list of Alignments that represents the word alignment of a BilingualCorpus. |
Entity | A Token of a special nature such as a proper noun or a number that is characterized by its lemma, type and a list of morphological tags apart from the text itself. |
LeafElement | The interface that all the elements that can be leaves in the parse tree of a Sentence must implement. |
LexicalWeighting | Calculates a lexical weight for any given token pair (one in the source language, and the other one in the target language). |
MonolingualCorpus | A list of Sentences that represents a monolingual corpus. |
MonolingualCorpusBuilder | A builder to create MonolingualCorpus instances. |
Phrase | An inner node in the parse tree of a Sentence. |
PhraseElement | The interface that all the elements that can be a child of a Phrase must implement. |
Sentence | A parse tree that represents a sentence together with its syntactical structure. |
Space | |
Text | |
TextElement | The interface that all the elements that can form a Text must implement. |
Token | |

The domain model mostly consists of a hierarchical representation of text data. Text, MonolingualCorpus and BilingualCorpus form the highest level of abstraction in this hierarchy:

- A Text is formed by a list of TextElements, which can be either Sentences or the Spaces separating them.
- A MonolingualCorpus is formed by a list of Sentences. MonolingualCorpusBuilders are used to create them.
- A BilingualCorpus is formed by two aligned MonolingualCorpus instances from which partial translations can be generated.

Sentence-level alignment is given by the order of the Sentences in each of the MonolingualCorpus instances that form the BilingualCorpus, and the word alignment of each aligned sentence pair is given by an Alignment instance. A CorpusAlignment is a list of Alignments that follows the order of the aligned sentences in the corpus.
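
The relationship between sentence order and word alignment can be illustrated with a minimal sketch. The class names `WordAlignmentSketch` and `BilingualCorpusSketch`, their methods, and the use of plain strings for tokens are all assumptions made for illustration; they are not the actual signatures of Alignment, CorpusAlignment or BilingualCorpus.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for an Alignment: a set of links between
// source token indices and target token indices of one sentence pair.
final class WordAlignmentSketch {
    private final List<int[]> links = new ArrayList<>();

    // Adds a link and returns this object so calls can be chained.
    WordAlignmentSketch link(int sourceIndex, int targetIndex) {
        links.add(new int[] {sourceIndex, targetIndex});
        return this;
    }

    // Returns the target token indices linked to the given source token index.
    List<Integer> targetsOf(int sourceIndex) {
        List<Integer> targets = new ArrayList<>();
        for (int[] link : links) {
            if (link[0] == sourceIndex) {
                targets.add(link[1]);
            }
        }
        return targets;
    }
}

// Hypothetical stand-in for a BilingualCorpus plus its CorpusAlignment.
// The i-th source sentence, the i-th target sentence and the i-th
// alignment all describe the same sentence pair.
final class BilingualCorpusSketch {
    private final List<List<String>> source;
    private final List<List<String>> target;
    private final List<WordAlignmentSketch> alignments;

    BilingualCorpusSketch(List<List<String>> source,
                          List<List<String>> target,
                          List<WordAlignmentSketch> alignments) {
        if (source.size() != target.size() || source.size() != alignments.size()) {
            throw new IllegalArgumentException("corpora and alignments must have equal length");
        }
        this.source = source;
        this.target = target;
        this.alignments = alignments;
    }

    // Returns the target words aligned with one source word of one sentence pair.
    List<String> alignedTargetWords(int sentenceIndex, int sourceTokenIndex) {
        List<String> words = new ArrayList<>();
        for (int targetIndex : alignments.get(sentenceIndex).targetsOf(sourceTokenIndex)) {
            words.add(target.get(sentenceIndex).get(targetIndex));
        }
        return words;
    }
}
```

In this sketch no explicit sentence-pair object is needed: the shared list index is the sentence-level alignment, which mirrors how the description above ties it to sentence order.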

At the next level of the hierarchy, a Sentence is represented by its syntactic tree and is formed by a single Phrase that corresponds to the root of this tree. Phrases thus constitute the inner nodes of the tree; each is formed by a label that describes its syntactic function and a list of PhraseElements that correspond to its children. A PhraseElement can be either a Phrase, which corresponds to another inner node (in other words, the root of a new subtree), or a LeafElement, which corresponds to a leaf of the tree. A LeafElement, in turn, can be either a Token or a Space.
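
The tree structure described above follows the composite pattern, and a minimal sketch may make the type relationships easier to see. All names ending in `Sketch`, the `collectTokens` method and the string-based token text are assumptions for illustration only; the real interfaces may expose different members.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical counterpart of PhraseElement: anything that can be a child of a phrase.
interface PhraseElementSketch {
    // Appends the text of every token under this element, left to right.
    void collectTokens(List<String> out);
}

// Hypothetical counterpart of LeafElement: anything that can be a leaf of the tree.
interface LeafElementSketch extends PhraseElementSketch {
}

// Hypothetical counterpart of Token: an atomic leaf that typically holds a word.
final class TokenSketch implements LeafElementSketch {
    private final String text;

    TokenSketch(String text) {
        this.text = text;
    }

    @Override
    public void collectTokens(List<String> out) {
        out.add(text);
    }
}

// Hypothetical counterpart of Space: a leaf that separates tokens.
final class SpaceSketch implements LeafElementSketch {
    @Override
    public void collectTokens(List<String> out) {
        // A space contributes no token.
    }
}

// Hypothetical counterpart of Phrase: an inner node with a label and children.
final class PhraseSketch implements PhraseElementSketch {
    private final String label;
    private final List<PhraseElementSketch> children;

    PhraseSketch(String label, PhraseElementSketch... children) {
        this.label = label;
        this.children = Arrays.asList(children);
    }

    String label() {
        return label;
    }

    @Override
    public void collectTokens(List<String> out) {
        for (PhraseElementSketch child : children) {
            child.collectTokens(out);
        }
    }
}

// Hypothetical counterpart of Sentence: a tree identified by its root phrase.
final class SentenceSketch {
    private final PhraseSketch root;

    SentenceSketch(PhraseSketch root) {
        this.root = root;
    }

    // Returns the token texts of the sentence in reading order.
    List<String> tokens() {
        List<String> out = new ArrayList<>();
        root.collectTokens(out);
        return out;
    }
}
```

Because a phrase only knows its children as `PhraseElementSketch`, a traversal such as `tokens()` needs no type checks: nested phrases recurse, tokens add themselves, and spaces do nothing.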

Tokens are the atomic elements in the parse tree that typically correspond to words, and they can be separated by Spaces. An Entity is a Token of a special nature, such as a proper noun or a number, that is characterized by its lemma, type and a list of morphological tags in addition to the text itself. An AlignedEntity is an Entity that has been aligned with another AlignedEntity in the same aligned sentence pair (one in the source sentence and the other in the target sentence); both share the same integer id, which marks this relationship.
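
As an illustration of how the shared id could be used, the sketch below pairs the aligned entities of a source sentence with those of a target sentence. `AlignedEntitySketch`, its fields and the `EntityPairing.pairById` helper are hypothetical names introduced here; the real AlignedEntity interface may look different.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical counterpart of AlignedEntity, reduced to an id, a text and a type.
final class AlignedEntitySketch {
    final int id;
    final String text;
    final String type;

    AlignedEntitySketch(int id, String text, String type) {
        this.id = id;
        this.text = text;
        this.type = type;
    }
}

final class EntityPairing {
    private EntityPairing() {
    }

    // Pairs each source entity with the target entity that has the same id.
    // Source entities without a counterpart are skipped.
    static List<AlignedEntitySketch[]> pairById(List<AlignedEntitySketch> source,
                                                List<AlignedEntitySketch> target) {
        Map<Integer, AlignedEntitySketch> targetById = new HashMap<>();
        for (AlignedEntitySketch entity : target) {
            targetById.put(entity.id, entity);
        }
        List<AlignedEntitySketch[]> pairs = new ArrayList<>();
        for (AlignedEntitySketch entity : source) {
            AlignedEntitySketch match = targetById.get(entity.id);
            if (match != null) {
                pairs.add(new AlignedEntitySketch[] {entity, match});
            }
        }
        return pairs;
    }
}
```

The lookup works regardless of where each entity appears in its sentence, which is the point of marking the relationship with an id instead of a position.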

Apart from these interfaces for representing text data hierarchically, a LexicalWeighting calculates a numerical value for any given token pair (one in the source language and the other in the target language) that reflects how strongly aligned they are.
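
The description does not say how the weight is computed. As one possible reading, the sketch below scores a pair by the relative frequency with which the target token was observed aligned to the source token. The interface `LexicalWeightingSketch`, its `weight` method, the `RelativeFrequencyWeighting` class and the use of strings for tokens are all assumptions, not the library's actual API or algorithm.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical counterpart of LexicalWeighting: a score for a token pair.
interface LexicalWeightingSketch {
    double weight(String sourceToken, String targetToken);
}

// Assumed scoring scheme: count(source, target) / count(source),
// estimated from aligned token pairs that were observed beforehand.
final class RelativeFrequencyWeighting implements LexicalWeightingSketch {
    private final Map<String, Integer> pairCounts = new HashMap<>();
    private final Map<String, Integer> sourceCounts = new HashMap<>();

    // Builds a map key for a token pair; a tab is assumed not to occur inside tokens.
    private static String key(String sourceToken, String targetToken) {
        return sourceToken + "\t" + targetToken;
    }

    // Records one observed alignment link between the two tokens.
    void observe(String sourceToken, String targetToken) {
        pairCounts.merge(key(sourceToken, targetToken), 1, Integer::sum);
        sourceCounts.merge(sourceToken, 1, Integer::sum);
    }

    @Override
    public double weight(String sourceToken, String targetToken) {
        int total = sourceCounts.getOrDefault(sourceToken, 0);
        if (total == 0) {
            return 0.0; // unseen source token: no evidence of alignment
        }
        int together = pairCounts.getOrDefault(key(sourceToken, targetToken), 0);
        return together / (double) total;
    }
}
```

With this scheme, a source token that was observed aligned to one target token three times and to another once would give weights of 0.75 and 0.25 respectively, and any pair involving an unseen source token scores 0.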