Package es.ehu.si.ixa.prebmt.model

Contains the interfaces that represent the domain model of the application.

Package es.ehu.si.ixa.prebmt.model Description

The domain model consists mostly of a hierarchical representation of text data. Text, MonolingualCorpus and BilingualCorpus form the highest level of abstraction in this hierarchy.

At the next level of the hierarchy, a Sentence is represented by its syntactic tree and consists of a single Phrase that corresponds to the root of this tree. Phrases thus constitute the inner nodes of the tree: each one has a label that describes its syntactic function and a list of PhraseElements that correspond to its children. A PhraseElement is either a Phrase, which is another inner node (that is, the root of a new subtree), or a LeafElement, which is a leaf of the tree. A LeafElement, in turn, is either a Token or a Space.
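
The tree structure described above can be sketched roughly as follows. Only the interface names (Sentence, Phrase, PhraseElement, LeafElement, Token, Space) come from this package description. Every method signature, the Simple* record types and the ParseTreeSketch helper are assumptions made for illustration and may differ from the actual interfaces. The sketch requires Java 16 or later because it uses records.

```java
import java.util.List;

// Hypothetical sketch: interface names follow the package description,
// but all methods and the Simple* records are assumptions.
interface PhraseElement {
    String getText(); // assumed accessor for the surface text
}

interface LeafElement extends PhraseElement {}

interface Token extends LeafElement {}

interface Space extends LeafElement {}

interface Phrase extends PhraseElement {
    String getLabel();                 // syntactic function, e.g. "NP"
    List<PhraseElement> getChildren(); // sub-phrases and leaves, in order

    // The text of an inner node is the concatenation of its children's text.
    @Override
    default String getText() {
        StringBuilder sb = new StringBuilder();
        for (PhraseElement child : getChildren()) {
            sb.append(child.getText());
        }
        return sb.toString();
    }
}

interface Sentence {
    Phrase getRoot(); // root of the syntactic tree
}

record SimpleToken(String text) implements Token {
    public String getText() { return text; }
}

record SimpleSpace(String text) implements Space {
    public String getText() { return text; }
}

record SimplePhrase(String label, List<PhraseElement> children) implements Phrase {
    public String getLabel() { return label; }
    public List<PhraseElement> getChildren() { return children; }
}

record SimpleSentence(Phrase root) implements Sentence {
    public Phrase getRoot() { return root; }
}

final class ParseTreeSketch {
    // Builds the tree (S (NP The cat) (VP sleeps)) with explicit spaces.
    static Sentence example() {
        Phrase np = new SimplePhrase("NP", List.of(
                new SimpleToken("The"), new SimpleSpace(" "), new SimpleToken("cat")));
        Phrase vp = new SimplePhrase("VP", List.of(new SimpleToken("sleeps")));
        Phrase root = new SimplePhrase("S", List.of(np, new SimpleSpace(" "), vp));
        return new SimpleSentence(root);
    }

    // Counts the Token leaves below an element; Spaces are not counted.
    static int countTokens(PhraseElement element) {
        if (element instanceof Token) {
            return 1;
        }
        if (element instanceof Phrase) {
            int total = 0;
            for (PhraseElement child : ((Phrase) element).getChildren()) {
                total += countTokens(child);
            }
            return total;
        }
        return 0;
    }
}
```

In this sketch, ParseTreeSketch.example().getRoot().getText() reassembles the sentence text from the leaves, and countTokens walks the same tree recursively, which is the usual way to process a composite structure of this kind.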

Tokens are the atomic elements of the parse tree. They typically correspond to words and can be separated by Spaces. An Entity is a special kind of Token, such as a proper noun or a number, that is characterized by its lemma, its type and a list of morphological tags in addition to the text itself. An AlignedEntity is an Entity that has been aligned with another AlignedEntity in the same aligned sentence pair (one in the source sentence and the other in the target sentence); both carry the same integer id, which marks this relationship.
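
As an illustration of how the shared id could tie two aligned entities together, here is a minimal sketch. The interface names Token, Entity and AlignedEntity come from this description. The method names, the use of String for the type and the morphological tags, the SimpleAlignedEntity record and the EntityAlignmentSketch helper are all assumptions, not the package's actual API. It requires Java 16 or later.

```java
import java.util.List;

// Hypothetical sketch: only the interface names are taken from the
// package description; every method signature here is an assumption.
interface Token {
    String getText();
}

interface Entity extends Token {
    String getLemma();
    String getType();                     // assumed to be a plain string
    List<String> getMorphologicalTags();  // assumed to be plain strings
}

interface AlignedEntity extends Entity {
    int getId(); // shared by the source-side and target-side entity
}

record SimpleAlignedEntity(String text, String lemma, String type,
                           List<String> tags, int id) implements AlignedEntity {
    public String getText() { return text; }
    public String getLemma() { return lemma; }
    public String getType() { return type; }
    public List<String> getMorphologicalTags() { return tags; }
    public int getId() { return id; }
}

final class EntityAlignmentSketch {
    // Two entities from the same aligned sentence pair are considered
    // aligned with each other when they carry the same integer id.
    static boolean areAligned(AlignedEntity source, AlignedEntity target) {
        return source.getId() == target.getId();
    }
}
```

For example, a source-side entity with text "London" and a target-side entity with text "Londres" that both carry id 1 would be reported as aligned, while an entity with id 2 in the same sentence pair would not match either of them. The example values are invented.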

In addition to these interfaces for representing text data hierarchically, a LexicalWeighting computes a numerical value for any given token pair (one in the source language and the other in the target language) that reflects how strongly the two tokens are aligned.
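
The following sketch shows one way such a weighting could look. The name LexicalWeighting comes from this description. The getWeight method, the table-backed TableLexicalWeighting class, the choice of 0.0 for unknown pairs and the example scores are assumptions made for illustration. How the real implementations compute their values is not stated here. It requires Java 16 or later.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: the method name and signature are assumptions.
interface Token {
    String getText();
}

interface LexicalWeighting {
    // Returns a score for a (source token, target token) pair;
    // a higher score is taken to mean a stronger alignment.
    double getWeight(Token source, Token target);
}

record SimpleToken(String text) implements Token {
    public String getText() { return text; }
}

// Toy implementation backed by a lookup table of made-up scores.
final class TableLexicalWeighting implements LexicalWeighting {
    private final Map<String, Double> table = new HashMap<>();

    // Tokens are assumed not to contain tabs, so a tab is a safe separator.
    private static String key(String source, String target) {
        return source + "\t" + target;
    }

    void put(String source, String target, double weight) {
        table.put(key(source, target), weight);
    }

    @Override
    public double getWeight(Token source, Token target) {
        // Pairs that are not in the table get 0.0 (an assumption of this sketch).
        return table.getOrDefault(key(source.getText(), target.getText()), 0.0);
    }
}
```

A caller would fill the table, for instance with put("cat", "gato", 0.9), and then query getWeight with the two tokens of a candidate pair; a pair that was never registered falls back to 0.0 in this sketch.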