Computational models for semantic textual similarity
González Aguirre, Aitor
The overarching goal of this thesis is to advance computational models of meaning and their evaluation. To achieve this goal, we define two tasks, Semantic Textual Similarity (STS) and Typed Similarity, and develop state-of-the-art systems that tackle both.

STS aims to measure the degree of semantic equivalence between two sentences by assigning graded similarity values that capture the intermediate shades of similarity. We have collected pairs of sentences to construct datasets for STS: 15,436 pairs in total, by far the largest collection of data for STS.

We have designed, built, and evaluated a new approach that combines knowledge-based and corpus-based methods using a cube. Without using any Machine Learning (ML), this new STS system is on par with state-of-the-art approaches that rely on ML; ML can also be applied on top of the system, which improves the results.

Typed Similarity tries to identify the type of relation that holds between a pair of similar items in a digital library. Providing a reason why items are similar has applications in recommendation, personalization, and search. A range of types of similarity in this collection were identified, and a set of 1,500 pairs of items from the collection was annotated using crowdsourcing.

Finally, we present systems capable of resolving the Typed Similarity task. The best system resulted in a real-world application to recommend similar items to users in an online digital library.