Two Datasets of Defect Reports Labeled By a Crowd of Annotators of Unknown Reliability
Date: 2018-06
Authors: Rodríguez, Daniel; Harrison, Rachel; Lozano Alonso, José Antonio
Data In Brief 18 : 840-845 (2018)
Abstract
Classifying software defects according to a defined taxonomy is not straightforward. To support automating the classification of software defects, two sets of defect reports were collected from public issue tracking systems in two different real domains. In the absence of a domain expert, the collected defects were categorized by a set of annotators of unknown reliability according to their impact, following IBM's orthogonal defect classification (ODC) taxonomy. Both datasets are prepared for solving the defect classification problem with techniques from the learning from crowds paradigm (Hernández-González et al. [1]).
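As a minimal illustration of working with labels from annotators of unknown reliability, the sketch below aggregates each defect's multiple labels by plurality vote. This is only a common baseline, not the learning-from-crowds models of Hernández-González et al.; the defect identifiers and category names are hypothetical examples.

```python
from collections import Counter

def majority_vote(annotations):
    """Aggregate per-defect labels from several annotators into a single
    consensus label by plurality vote. `annotations` maps a defect id to
    the list of labels its annotators assigned; an annotator who skipped
    a defect simply contributes no entry."""
    consensus = {}
    for defect_id, labels in annotations.items():
        # most_common(1) returns the most frequent label; ties are broken
        # by insertion order of the Counter.
        consensus[defect_id] = Counter(labels).most_common(1)[0][0]
    return consensus

# Hypothetical votes using ODC-style impact categories
votes = {
    "defect-1": ["Reliability", "Reliability", "Usability"],
    "defect-2": ["Capability", "Capability"],
}
print(majority_vote(votes))
# → {'defect-1': 'Reliability', 'defect-2': 'Capability'}
```

Full learning-from-crowds methods go beyond this baseline by also estimating each annotator's reliability, which matters when some annotators are systematically noisy.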
Two versions of both datasets are publicly shared. In the first version, the raw data are provided: the text description of each defect together with the category assigned by each annotator. In the second version, the text of each defect has been transformed into a descriptive vector using text-mining techniques.
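To make the second version's representation concrete, here is a minimal sketch of turning short defect descriptions into term-frequency vectors over a shared vocabulary. This is only an assumed, simplified stand-in for whatever text-mining pipeline the authors actually used; the example descriptions are invented.

```python
import re
from collections import Counter

def vectorise(texts):
    """Convert raw text descriptions into term-frequency vectors over a
    vocabulary shared by all documents (a bag-of-words representation)."""
    # Lowercase and keep alphabetic tokens only
    tokenised = [re.findall(r"[a-z]+", t.lower()) for t in texts]
    # Shared, sorted vocabulary across all documents
    vocab = sorted({w for doc in tokenised for w in doc})
    vectors = []
    for doc in tokenised:
        counts = Counter(doc)
        vectors.append([counts[w] for w in vocab])
    return vocab, vectors

# Hypothetical defect descriptions
docs = ["crash on startup", "crash when saving file"]
vocab, vecs = vectorise(docs)
print(vocab)  # ['crash', 'file', 'on', 'saving', 'startup', 'when']
print(vecs)   # [[1, 0, 1, 0, 1, 0], [1, 1, 0, 1, 0, 1]]
```

In practice, weighting schemes such as TF-IDF and stop-word removal are typically applied on top of raw counts before feeding the vectors to a classifier.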