Please use this identifier to cite or link to this item: http://artemis.cslab.ece.ntua.gr:8080/jspui/handle/123456789/19670
Full metadata record
DC Field | Value | Language
dc.contributor.author | Konstantaropoulos, Orestis | -
dc.date.accessioned | 2025-07-07T07:25:13Z | -
dc.date.available | 2025-07-07T07:25:13Z | -
dc.date.issued | 2025-06-18 | -
dc.identifier.uri | http://artemis.cslab.ece.ntua.gr:8080/jspui/handle/123456789/19670 | -
dc.description.abstract | Unlike conventional vision systems that rely on passive observation, biological agents can learn through physical interaction. Human infants, for example, spend hours interacting with toys in seemingly random ways, exploring their environment and engaging in non-goal-directed behaviors. It is believed that such agents construct internal transition models that allow them to predict the future states of their environment, which they later use to efficiently acquire new skills. This process typically unfolds in the absence of explicit supervision. Instead, biological learning is driven by intrinsic incentives and shaped by structural inductive biases that help the agent make sense of its surroundings. This raises a fundamental question: can a robot similarly develop an understanding of its environment purely through interaction, without any prior knowledge or external supervision? In this thesis, we investigate how artificial agents can autonomously explore and learn about their environment through intrinsic motivation, much like children engaged in curious free play. To this end, we propose a novel, fully self-supervised, object-centric learning framework. Our system first segments visual input into discrete entities using Slot Attention, a self-supervised object-centric vision model trained entirely on data collected from random actions of a robotic arm. A graph-based world model is then trained to predict object-centric dynamics. However, due to the limited diversity of interactions in the initial dataset, the model struggles to capture object motion. To overcome this, we introduce an intrinsically motivated reward signal based on the world model's prediction error (a minimal code sketch of this idea follows the metadata record below). This reward guides a policy that actively collects informative trajectories by proposing actions that are more likely to challenge the current model's predictions. Empirically, this policy proposes actions that result in up to three times more object displacement than random actions, yielding significantly richer training data. We then fine-tune both the vision model and the world model on these data, which improves prediction and reconstruction performance. We validate our method in a simulated robotic environment with diverse objects, demonstrating that meaningful visual and physical representations can emerge entirely from self-supervised interaction. The findings of this thesis contribute to the growing body of cognitively inspired algorithms designed to enhance artificial learning systems. In particular, it highlights the potential of intrinsically motivated, object-centric learning for autonomous world perception and modeling, paving the way for the design of systems that can develop incrementally in novel, open-ended environments without human supervision. Part of this work was accepted at the 2025 IEEE International Conference on Development and Learning (ICDL) in Prague, under the title "Push, See, Predict: Emergent Perception Through Intrinsically Motivated Play", by Orestis Konstantaropoulos, Mehdi Khamassi, Petros Maragos, and George Retsinas. | en_US
dc.language | en | en_US
dc.subject | Computer Vision, Reinforcement Learning, Active Perception, World Models, Deep Learning | en_US
dc.title | Emergent Object-Centric Perception Through Intrinsically Motivated Play | en_US
dc.description.pages | 120 | en_US
dc.contributor.supervisor | Maragos, Petros | en_US
dc.department | Division of Signals, Control and Robotics | en_US
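
The central mechanism described in the abstract, using the world model's prediction error as an intrinsically motivated reward, can be illustrated with a short sketch. Everything below is assumed for illustration: the names intrinsic_reward and DummyWorldModel, the per-slot linear predictor, and all tensor shapes are hypothetical stand-ins, not the thesis implementation, which uses a graph-based dynamics model over Slot Attention slots.

import torch

class DummyWorldModel(torch.nn.Module):
    # Hypothetical stand-in for the thesis' graph-based dynamics model:
    # a single linear layer predicting a per-slot state delta.
    def __init__(self, slot_dim=64, action_dim=4):
        super().__init__()
        self.net = torch.nn.Linear(slot_dim + action_dim, slot_dim)

    def forward(self, slots, action):
        # Broadcast the action to every slot and predict a per-slot delta.
        a = action.expand(slots.shape[0], -1)
        return slots + self.net(torch.cat([slots, a], dim=-1))

def intrinsic_reward(world_model, slots_t, action, slots_t1):
    # Curiosity signal: the model's error in predicting the next
    # object-centric state; high error marks an informative transition.
    with torch.no_grad():
        pred = world_model(slots_t, action)
        return torch.mean((pred - slots_t1) ** 2).item()

# Toy usage: 7 slots with 64-dim features, a 4-dim robot-arm action.
model = DummyWorldModel()
slots_t, slots_t1 = torch.randn(7, 64), torch.randn(7, 64)
action = torch.randn(1, 4)
print(intrinsic_reward(model, slots_t, action, slots_t1))

In a data-collection loop, the exploration policy would favor actions whose transitions score high under such a reward, since those are exactly the transitions the current model has not yet learned to predict.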
Appears in Collections: Diploma Theses (Διπλωματικές Εργασίες) - Theses

Files in This Item:
File | Description | Size | Format
KONSTANTAROPOULOS_ORESTIS_THESIS.pdf | | 9.16 MB | Adobe PDF


Items in Artemis are protected by copyright, with all rights reserved, unless otherwise indicated.