Two Datasets of Defect Reports Labeled By a Crowd of Annotators of Unknown Reliability

Hernández González, Jerónimo; Rodríguez, Daniel; Inza Cano, Iñaki; Harrison, Rachel; Lozano Alonso, José Antonio

dc.contributor.author	Hernández González, Jerónimo
dc.contributor.author	Rodríguez, Daniel
dc.contributor.author	Inza Cano, Iñaki
dc.contributor.author	Harrison, Rachel
dc.contributor.author	Lozano Alonso, José Antonio
dc.date.accessioned	2018-11-15T11:16:59Z
dc.date.available	2018-11-15T11:16:59Z
dc.date.issued	2018-06
dc.identifier.citation	Data in Brief 18: 840-845 (2018)	es_ES
dc.identifier.issn	2352-3409
dc.identifier.uri	http://hdl.handle.net/10810/29671
dc.description.abstract	Classifying software defects according to a defined taxonomy is not straightforward. To support the automatic classification of software defects, two sets of defect reports were collected from the public issue tracking systems of two different real-world domains. Because no domain expert was available, the collected defects were categorized by a set of annotators of unknown reliability according to their impact, as defined in IBM's Orthogonal Defect Classification taxonomy. Both datasets are prepared for solving the defect classification problem with techniques from the learning from crowds paradigm (Hernández-González et al. [1]). Two versions of each dataset are publicly shared. In the first version, the raw data are given: the text description of each defect together with the category assigned by each annotator. In the second version, the text of each defect has been transformed into a descriptive vector using text-mining techniques.	es_ES
dc.description.sponsorship	This work has been partially supported by the Basque Government (IT609-13, Elkartek BID3A), the Spanish Ministry of Economy and Competitiveness (TIN2016-78365-R) and the University-Society Project 15/19 (Basque Government and University of the Basque Country UPV/EHU). Jose A. Lozano is also supported by the BERC Program 2014–2017 (Basque Government) and the Severo Ochoa Program SEV-2013-0323 (Spanish Ministry of Economy and Competitiveness). Daniel Rodriguez carried out this work while visiting Oxford Brookes University. He is partly supported by the project Badge People TIN2016-76956-C3-3-R. We would like to thank Varsha Veerappa and the anonymous annotators for their help with data collection.	es_ES
dc.language.iso	eng	es_ES
dc.publisher	Elsevier	es_ES
dc.relation	info:eu-repo/grantAgreement/MINECO/TIN2016-78365-R	es_ES
dc.relation	info:eu-repo/grantAgreement/MINECO/SEV-2013-0323
dc.rights	info:eu-repo/semantics/openAccess	es_ES
dc.rights.uri	http://creativecommons.org/licenses/by/3.0/es/	*
dc.title	Two Datasets of Defect Reports Labeled By a Crowd of Annotators of Unknown Reliability	es_ES
dc.type	info:eu-repo/semantics/article	es_ES
dc.rights.holder	2018 The Authors. Published by Elsevier Inc. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).	es_ES
dc.rights.holder	Attribution 3.0 Spain	*
dc.relation.publisherversion	https://www.sciencedirect.com/science/article/pii/S2352340918303226?via%3Dihub	es_ES
dc.identifier.doi	10.1016/j.dib.2018.03.109
dc.departamentoes	Computer Science and Artificial Intelligence	es_ES
dc.departamentoeu	Computer Science and Artificial Intelligence	es_ES

Files in this item

Name:: 1-s2.0-S2352340918303226-main.pdf
Size:: 196.4Kb
Format:: PDF
Description:: Main article


Name:: license_rdf
Size:: 914 bytes
Format:: application/rdf+xml


This item appears in the following Collection(s)

Articles



Except where otherwise noted, this item's license is described as: 2018 The Authors. Published by Elsevier Inc. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).