Inferring pictures layout from given text descriptions
Date: 2021-10-08
Author: Domínguez Becerril, Carlos

Abstract
The contributions of this project consist of two systems that use a structured version of the text based on graphs, processed by a GCNN. The first one, called SG2BB, obtains the bounding boxes of the objects directly from the nodes of the graph using hand-engineered heuristics. The second one, called GCN2LY, uses a seq2seq architecture with the same GCNN-based encoder as the first approach, but adds a decoder that learns to generate the layout sequentially. Moreover, this project also contributes new metrics to better measure the quality of the generated layouts with respect to the spatial composition of the scene and the sizes of the drawn objects. The code developed to implement the architectures and perform the experiments can be found in the GitHub repository.
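
To make the SG2BB idea concrete, below is a minimal sketch (not the thesis code) of a GCN encoder over scene-graph nodes whose output embeddings are regressed to one bounding box per object, in the spirit described above. The class names, layer sizes, and adjacency normalisation are all illustrative assumptions; the actual architecture in the project may differ.

import torch
import torch.nn as nn

class GCNLayer(nn.Module):
    """One graph-convolution step: aggregate neighbour features through a
    row-normalised adjacency matrix, then apply a linear transform."""
    def __init__(self, in_dim: int, out_dim: int):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim)

    def forward(self, x: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        # x: (num_nodes, in_dim); adj: (num_nodes, num_nodes)
        return torch.relu(self.linear(adj @ x))

class SG2BBSketch(nn.Module):
    """Hypothetical SG2BB-style model: encode scene-graph nodes with stacked
    GCN layers and regress a box (x, y, w, h) per object node."""
    def __init__(self, node_dim: int = 128, hidden: int = 256):
        super().__init__()
        self.gcn1 = GCNLayer(node_dim, hidden)
        self.gcn2 = GCNLayer(hidden, hidden)
        self.bbox_head = nn.Linear(hidden, 4)  # (x, y, w, h) in [0, 1]

    def forward(self, nodes: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        h = self.gcn2(self.gcn1(nodes, adj), adj)
        return torch.sigmoid(self.bbox_head(h))

# Toy usage: a 3-node scene graph with random node embeddings.
nodes = torch.randn(3, 128)
adj = torch.tensor([[0.50, 0.50, 0.00],
                    [0.50, 0.25, 0.25],
                    [0.00, 0.50, 0.50]])
boxes = SG2BBSketch()(nodes, adj)
print(boxes.shape)  # torch.Size([3, 4])

GCN2LY would reuse such an encoder but feed its node embeddings to a sequential decoder (e.g. an RNN) that emits one box per step instead of the direct regression head shown here.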
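
The metric definitions themselves are not reproduced in this abstract; the sketch below only illustrates the kind of measures described, one for spatial composition (mean IoU over matched boxes) and one for object sizes (mean relative area error). Both functions and their formulations are assumptions for illustration, not the project's actual metrics.

import numpy as np

def iou(a: np.ndarray, b: np.ndarray) -> float:
    """Intersection-over-union of two (x, y, w, h) boxes."""
    ax2, ay2 = a[0] + a[2], a[1] + a[3]
    bx2, by2 = b[0] + b[2], b[1] + b[3]
    iw = max(0.0, min(ax2, bx2) - max(a[0], b[0]))
    ih = max(0.0, min(ay2, by2) - max(a[1], b[1]))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0

def mean_iou(pred: np.ndarray, ref: np.ndarray) -> float:
    """Spatial-composition score: average IoU over matched object pairs."""
    return float(np.mean([iou(p, r) for p, r in zip(pred, ref)]))

def size_error(pred: np.ndarray, ref: np.ndarray) -> float:
    """Size score: mean relative area error between matched objects."""
    pa, ra = pred[:, 2] * pred[:, 3], ref[:, 2] * ref[:, 3]
    return float(np.mean(np.abs(pa - ra) / ra))

pred = np.array([[0.10, 0.10, 0.30, 0.30], [0.50, 0.50, 0.20, 0.40]])
ref  = np.array([[0.12, 0.10, 0.28, 0.30], [0.50, 0.55, 0.25, 0.35]])
print(mean_iou(pred, ref), size_error(pred, ref))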
This project also explores state-of-the-art systems in computer vision, natural language processing, and deep learning with the aim of improving them; such systems have immediate practical applications in various fields related to image creation and editing.