On the optimal usage of labelled examples in semi-supervised multi-class classification problems

Ortigosa Hernández, Jonathan; Inza Cano, Iñaki; Lozano Alonso, José Antonio

Date

2015-04-23

URI

http://hdl.handle.net/10810/15004

Abstract

In recent years, the performance of semi-supervised learning has been investigated theoretically. However, most of this theoretical development has focussed on binary classification problems. In this paper, we take it a step further by extending the work of Castelli and Cover [1] [2] to the multi-class paradigm. In particular, we consider a key problem in semi-supervised learning: classifying an unseen instance x into one of K different classes, using a training dataset sampled from a mixture density distribution and composed of l labelled examples and u unlabelled examples. Even when the mixture is identifiable and infinitely many unlabelled examples are available, labelled examples are still needed to determine the K decision regions. Therefore, we first investigate the minimum number of labelled examples needed to accomplish that task. Then, we propose an optimal multi-class learning algorithm that generalises the optimal procedure proposed in the literature for binary problems. Finally, we use this generalisation to study the probability of error when the binary class constraint is relaxed.
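
To build intuition for why the number of labelled examples matters even when the mixture components are known, the following minimal Python sketch may help. It is not taken from the report. It assumes, purely for illustration, that labels are drawn i.i.d. from the class priors and that every one of the K classes must appear at least once among the labelled examples before the K decision regions can be named. Under that assumption the question reduces to the classical coupon collector problem. The function names `expected_labels_equal_priors` and `simulate_labels_needed` are hypothetical.

```python
import random
from fractions import Fraction


def expected_labels_equal_priors(k: int) -> Fraction:
    """Expected number of i.i.d. labelled draws until all k equiprobable
    classes have appeared at least once (coupon collector: k * H_k).

    Assumption of this sketch, not a result from the report: every class
    must be observed at least once.
    """
    return k * sum(Fraction(1, i) for i in range(1, k + 1))


def simulate_labels_needed(priors, rng: random.Random) -> int:
    """Draw class labels according to `priors` until every class has been
    seen once, and return how many draws that took."""
    classes = list(range(len(priors)))
    seen = set()
    draws = 0
    while len(seen) < len(classes):
        seen.add(rng.choices(classes, weights=priors)[0])
        draws += 1
    return draws


if __name__ == "__main__":
    # Exact expectation for equal priors.
    for k in (2, 3, 5):
        print(k, float(expected_labels_equal_priors(k)))

    # Monte Carlo estimate for unequal (hypothetical) priors.
    rng = random.Random(0)
    priors = [0.6, 0.3, 0.1]
    runs = [simulate_labels_needed(priors, rng) for _ in range(10_000)]
    print(sum(runs) / len(runs))
```

With unequal priors the rarest class dominates the waiting time, which is one reason the multi-class setting can demand noticeably more labelled examples than the binary one under this simplified model.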

Collections

Informes técnicos y Documentos de trabajo