Reproducibility dataset for a large experimental survey on word embeddings and ontology-based methods for word similarity

Lastra Díaz, Juan José; Goikoetxea Salutregi, Josu; Taieb, Mohamed Ali Hadj; García Serrano, Ana; Ben Aouicha, Mohamed; Agirre Bengoa, Eneko

dc.contributor.author	Lastra Díaz, Juan José
dc.contributor.author	Goikoetxea Salutregi, Josu
dc.contributor.author	Taieb, Mohamed Ali Hadj
dc.contributor.author	García Serrano, Ana
dc.contributor.author	Ben Aouicha, Mohamed
dc.contributor.author	Agirre Bengoa, Eneko
dc.date.accessioned	2020-01-17T13:05:54Z
dc.date.available	2020-01-17T13:05:54Z
dc.date.issued	2019-10-26
dc.identifier.citation	Data In Brief 26 : (2019) // Article ID UNSP 104432	es_ES
dc.identifier.issn	2352-3409
dc.identifier.uri	http://hdl.handle.net/10810/38598
dc.description.abstract	This data article introduces a reproducibility dataset that allows the exact replication of all experiments, results and data tables introduced in our companion paper (Lastra-Diaz et al., 2019), which presents the largest experimental survey on ontology-based semantic similarity methods and Word Embeddings (WE) for word similarity reported in the literature. All our experiments, and the gathering of all raw data derived from them, were based on the software implementation and evaluation of all methods in the HESML library (Lastra-Diaz et al., 2017), and their subsequent recording with Reprozip (Chirigati et al., 2016). The raw data consists of a collection of data files containing the raw word-similarity values returned by each method for each word pair evaluated in each benchmark. The raw data files were processed by running an R-language script to compute all evaluation metrics reported in (Lastra-Diaz et al., 2019), such as Pearson and Spearman correlation, harmonic score and statistical significance p-values, and to automatically generate all data tables shown in our companion paper. Our dataset provides all input data files, resources and complementary software tools needed to reproduce from scratch all our experimental data, statistical analysis and reported data. Finally, our reproducibility dataset provides a self-contained experimentation platform for running new word similarity experiments that include methods or benchmarks not considered in our survey. (c) 2019 The Authors. Published by Elsevier Inc.	es_ES
dc.description.sponsorship	This work has been partially supported by the Spanish Ministry of Economy and Competitiveness VEMODALEN project (TIN2015-71785-R), the UPV/EHU (excellence research group) and the Spanish Research Agency LIHLITH project (PCIN-2017-118/AEI) in the framework of EU ERA-Net CHIST-ERA.	es_ES
dc.language.iso	eng	es_ES
dc.publisher	Elsevier	es_ES
dc.relation	info:eu-repo/grantAgreement/MINECO/TIN2015-71785-R	es_ES
dc.rights	info:eu-repo/semantics/openAccess	es_ES
dc.rights.uri	http://creativecommons.org/licenses/by/3.0/es/	*
dc.subject	ontology-based semantic similarity measures	es_ES
dc.subject	word embedding models	es_ES
dc.subject	information content models	es_ES
dc.subject	wordnet	es_ES
dc.subject	experimental survey	es_ES
dc.subject	HESML	es_ES
dc.subject	reprozip	es_ES
dc.title	Reproducibility dataset for a large experimental survey on word embeddings and ontology-based methods for word similarity	es_ES
dc.type	info:eu-repo/semantics/article	es_ES
dc.rights.holder	This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).Data in brief 26 (2019) 104432	es_ES
dc.rights.holder	Attribution 3.0 Spain	*
dc.relation.publisherversion	https://www.sciencedirect.com/science/article/pii/S2352340919307875?via%3Dihub#sec1	es_ES
dc.identifier.doi	10.1016/j.dib.2019.104432
dc.departamentoes	Lenguajes y sistemas informáticos	es_ES
dc.departamentoeu	Hizkuntza eta sistema informatikoak	es_ES
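
The abstract above mentions that evaluation metrics such as Pearson and Spearman correlation and a harmonic score are computed from the raw word-similarity values. The following is a minimal Python sketch of that kind of computation, not the authors' R script. It assumes the harmonic score is the harmonic mean of the Pearson and Spearman values, which this record does not itself define; all function names are hypothetical and the sample values are made up for illustration only.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equally long lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


def harmonic_score(r, rho):
    """Assumed definition: harmonic mean of Pearson (r) and Spearman (rho)."""
    return 2 * r * rho / (r + rho)


# Made-up example values, NOT taken from the dataset.
human_judgements = [1.0, 2.0, 3.0, 4.0]   # hypothetical benchmark scores
method_scores = [0.1, 0.2, 0.3, 0.9]      # hypothetical method output

r = pearson(human_judgements, method_scores)
rho = spearman(human_judgements, method_scores)
print(r, rho, harmonic_score(r, rho))
```

In practice each benchmark would supply one human score and one method score per word pair, and the three metrics would be computed once per method and benchmark.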

Files in this item

Name: license_rdf
Size: 914 bytes
Format: application/rdf+xml

Name: 1-s2.0-S2352340919307875-main ...
Size: 735.4 KB
Format: PDF
Description: Main article

This item appears in the following Collection(s)

Articles