dc.contributor.author | Hernández González, Jerónimo | |
dc.contributor.author | Rodríguez, Daniel | |
dc.contributor.author | Inza Cano, Iñaki | |
dc.contributor.author | Harrison, Rachel | |
dc.contributor.author | Lozano Alonso, José Antonio | |
dc.date.accessioned | 2018-11-15T11:16:59Z | |
dc.date.available | 2018-11-15T11:16:59Z | |
dc.date.issued | 2018-06 | |
dc.identifier.citation | Data In Brief 18 : 840-845 (2018) | es_ES |
dc.identifier.issn | 2352-3409 | |
dc.identifier.uri | http://hdl.handle.net/10810/29671 | |
dc.description.abstract | Classifying software defects according to any defined taxonomy is not straightforward. To support the automatic classification of software defects, two sets of defect reports were collected from the public issue tracking systems of two different real-world domains. Because no domain expert was available, the collected defects were categorized by a set of annotators of unknown reliability according to their impact, as defined in IBM's Orthogonal Defect Classification taxonomy. Both datasets are prepared for solving the defect classification problem with techniques from the learning from crowds paradigm (Hernández-González et al. [1]).
Two versions of each dataset are publicly shared. The first version contains the raw data: the text description of each defect together with the category assigned by each annotator. In the second version, the text of each defect has been transformed into a descriptive vector using text-mining techniques. | es_ES |
dc.description.sponsorship | This work has been partially supported by the Basque Government (IT609-13, Elkartek BID3A), the Spanish Ministry of Economy and Competitiveness (TIN2016-78365-R) and the University-Society Project 15/19 (Basque Government and University of the Basque Country UPV/EHU). Jose A. Lozano is also supported by the BERC Program 2014–2017 (Basque Government) and the Severo Ochoa Program SEV-2013-0323 (Spanish Ministry of Economy and Competitiveness). Daniel Rodriguez carried out this work while visiting Oxford Brookes University. He is partly supported by projects Badge People TIN2016-76956-C3-3-R. We would like to thank Varsha Veerappa and the anonymous annotators for their help with data collection. | es_ES |
dc.language.iso | eng | es_ES |
dc.publisher | Elsevier | es_ES |
dc.relation | info:eu-repo/grantAgreement/MINECO/TIN2016-78365-R | es_ES |
dc.relation | info:eu-repo/grantAgreement/MINECO/SEV-2013-0323 | |
dc.rights | info:eu-repo/semantics/openAccess | es_ES |
dc.rights.uri | http://creativecommons.org/licenses/by/3.0/es/ | * |
dc.title | Two Datasets of Defect Reports Labeled by a Crowd of Annotators of Unknown Reliability | es_ES |
dc.type | info:eu-repo/semantics/article | es_ES |
dc.rights.holder | 2018 The Authors. Published by Elsevier Inc. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/). | es_ES |
dc.rights.holder | Atribución 3.0 España | * |
dc.relation.publisherversion | https://www.sciencedirect.com/science/article/pii/S2352340918303226?via%3Dihub | es_ES |
dc.identifier.doi | 10.1016/j.dib.2018.03.109 | |
dc.departamentoes | Ciencia de la computación e inteligencia artificial | es_ES |
dc.departamentoeu | Konputazio zientziak eta adimen artifiziala | es_ES |