Inferring pictures layout from given text descriptions
Date: 2021-10-08
Author: Domínguez Becerril, Carlos

Abstract
The contributions of this project consist of two systems that use a structured version of the text based on graphs, processed by a GCNN. The first one, called SG2BB, obtains the bounding boxes of the objects directly from the nodes of the graph using hand-engineered heuristics. The second one, called GCN2LY, uses a seq2seq architecture with the same GCNN-based encoder as the first approach, but adds a decoder that learns to generate the layout sequentially. Moreover, this project also contributes new metrics to better measure the quality of the generated layouts with respect to the spatial composition of the scene and the sizes of the drawn objects. The code developed to implement the architectures and perform the experiments can be found in the GitHub repository.
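
To make the SG2BB idea concrete, below is a minimal sketch (not the thesis code) of a GCN encoder over scene-graph nodes whose output embeddings are regressed to one bounding box per object, in the spirit described above. The class names, layer sizes, and adjacency normalisation are all illustrative assumptions; the actual architecture in the project may differ.

import torch
import torch.nn as nn

class GCNLayer(nn.Module):
    """One graph-convolution step: aggregate neighbour features through a
    row-normalised adjacency matrix, then apply a linear transform."""
    def __init__(self, in_dim: int, out_dim: int):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim)

    def forward(self, x: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        # x: (num_nodes, in_dim); adj: (num_nodes, num_nodes)
        return torch.relu(self.linear(adj @ x))

class SG2BBSketch(nn.Module):
    """Hypothetical SG2BB-style model: encode scene-graph nodes with stacked
    GCN layers and regress a box (x, y, w, h) per object node."""
    def __init__(self, node_dim: int = 128, hidden: int = 256):
        super().__init__()
        self.gcn1 = GCNLayer(node_dim, hidden)
        self.gcn2 = GCNLayer(hidden, hidden)
        self.bbox_head = nn.Linear(hidden, 4)  # (x, y, w, h) in [0, 1]

    def forward(self, nodes: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        h = self.gcn2(self.gcn1(nodes, adj), adj)
        return torch.sigmoid(self.bbox_head(h))

# Toy usage: a 3-node scene graph with random node embeddings.
nodes = torch.randn(3, 128)
adj = torch.tensor([[0.50, 0.50, 0.00],
                    [0.50, 0.25, 0.25],
                    [0.00, 0.50, 0.50]])
boxes = SG2BBSketch()(nodes, adj)
print(boxes.shape)  # torch.Size([3, 4])

GCN2LY would reuse such an encoder but feed its node embeddings to a sequential decoder (e.g. an RNN) that emits one box per step instead of the direct regression head shown here.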
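
The metric definitions themselves are not reproduced in this abstract; the sketch below only illustrates the kind of measures described, one for spatial composition (mean IoU over matched boxes) and one for object sizes (mean relative area error). Both functions and their formulations are assumptions for illustration, not the project's actual metrics.

import numpy as np

def iou(a: np.ndarray, b: np.ndarray) -> float:
    """Intersection-over-union of two (x, y, w, h) boxes."""
    ax2, ay2 = a[0] + a[2], a[1] + a[3]
    bx2, by2 = b[0] + b[2], b[1] + b[3]
    iw = max(0.0, min(ax2, bx2) - max(a[0], b[0]))
    ih = max(0.0, min(ay2, by2) - max(a[1], b[1]))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0

def mean_iou(pred: np.ndarray, ref: np.ndarray) -> float:
    """Spatial-composition score: average IoU over matched object pairs."""
    return float(np.mean([iou(p, r) for p, r in zip(pred, ref)]))

def size_error(pred: np.ndarray, ref: np.ndarray) -> float:
    """Size score: mean relative area error between matched objects."""
    pa, ra = pred[:, 2] * pred[:, 3], ref[:, 2] * ref[:, 3]
    return float(np.mean(np.abs(pa - ra) / ra))

pred = np.array([[0.10, 0.10, 0.30, 0.30], [0.50, 0.50, 0.20, 0.40]])
ref  = np.array([[0.12, 0.10, 0.28, 0.30], [0.50, 0.55, 0.25, 0.35]])
print(mean_iou(pred, ref), size_error(pred, ref))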
This project also explores state-of-the-art systems in computer vision, natural language processing, and deep learning with the aim of improving them; such systems have immediate practical applications in various fields related to image creation and editing.