Please use this identifier to cite or link to this item: http://artemis.cslab.ece.ntua.gr:8080/jspui/handle/123456789/19670
Full metadata record
DC Field | Value | Language
dc.contributor.author | Konstantaropoulos, Orestis | -
dc.date.accessioned | 2025-07-07T07:25:13Z | -
dc.date.available | 2025-07-07T07:25:13Z | -
dc.date.issued | 2025-06-18 | -
dc.identifier.uri | http://artemis.cslab.ece.ntua.gr:8080/jspui/handle/123456789/19670 | -
dc.description.abstract | Unlike conventional vision systems that rely on passive observation, biological agents can learn through physical interaction. Human infants, for example, spend hours interacting with toys in seemingly random ways, exploring their environment and engaging in non-goal-directed behaviors. It is believed that such agents construct internal transition models that allow them to predict the future states of their environment, which they later use to efficiently acquire new skills. This process typically unfolds in the absence of explicit supervision. Instead, biological learning is driven by intrinsic incentives and shaped by structural inductive biases that help the agent make sense of its surroundings. This raises a fundamental question: can a robot similarly develop an understanding of its environment purely through interaction, without any prior knowledge or external supervision? In this thesis, we investigate how artificial agents can autonomously explore and learn about their environment through intrinsic motivation, much like children engaged in curious free play. To this end, we propose a novel, fully self-supervised, object-centric learning framework. Our system first segments visual input into discrete entities using Slot Attention, a self-supervised object-centric vision model trained entirely on data collected from random actions of a robotic arm. A graph-based world model is then trained to predict object-centric dynamics. However, due to the limited diversity of interactions in the initial dataset, the model struggles to capture object motion. To overcome this, we introduce an intrinsically motivated reward signal based on the world model's prediction error (a minimal code sketch of this idea follows the metadata record below). This reward guides a policy that actively collects informative trajectories by proposing actions that are more likely to challenge the current model's predictions. Empirically, this policy proposes actions that result in up to three times more object displacement than random actions, yielding significantly richer training data. We then fine-tune both the vision model and the world model on these data, which improves prediction and reconstruction performance. We validate our method in a simulated robotic environment with diverse objects, demonstrating that meaningful visual and physical representations can emerge entirely from self-supervised interaction. The findings of this thesis contribute to the growing body of cognitively inspired algorithms designed to enhance artificial learning systems. In particular, it highlights the potential of intrinsically motivated, object-centric learning for autonomous world perception and modeling, paving the way for the design of systems that can develop incrementally in novel, open-ended environments without human supervision. Part of this work was accepted at the 2025 IEEE International Conference on Development and Learning (ICDL) in Prague, under the title "Push, See, Predict: Emergent Perception Through Intrinsically Motivated Play", by Orestis Konstantaropoulos, Mehdi Khamassi, Petros Maragos, and George Retsinas. | en_US
dc.language | en | en_US
dc.subject | Computer Vision, Reinforcement Learning, Active Perception, World Models, Deep Learning | en_US
dc.title | Emergent Object-Centric Perception Through Intrinsically Motivated Play | en_US
dc.description.pages | 120 | en_US
dc.contributor.supervisor | Maragos, Petros | en_US
dc.department | Division of Signals, Control and Robotics | en_US
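
The central mechanism described in the abstract, using the world model's prediction error as an intrinsically motivated reward, can be illustrated with a short sketch. Everything below is assumed for illustration: the names intrinsic_reward and DummyWorldModel, the per-slot linear predictor, and all tensor shapes are hypothetical stand-ins, not the thesis implementation, which uses a graph-based dynamics model over Slot Attention slots.

import torch

class DummyWorldModel(torch.nn.Module):
    # Hypothetical stand-in for the thesis' graph-based dynamics model:
    # a single linear layer predicting a per-slot state delta.
    def __init__(self, slot_dim=64, action_dim=4):
        super().__init__()
        self.net = torch.nn.Linear(slot_dim + action_dim, slot_dim)

    def forward(self, slots, action):
        # Broadcast the action to every slot and predict a per-slot delta.
        a = action.expand(slots.shape[0], -1)
        return slots + self.net(torch.cat([slots, a], dim=-1))

def intrinsic_reward(world_model, slots_t, action, slots_t1):
    # Curiosity signal: the model's error in predicting the next
    # object-centric state; high error marks an informative transition.
    with torch.no_grad():
        pred = world_model(slots_t, action)
        return torch.mean((pred - slots_t1) ** 2).item()

# Toy usage: 7 slots with 64-dim features, a 4-dim robot-arm action.
model = DummyWorldModel()
slots_t, slots_t1 = torch.randn(7, 64), torch.randn(7, 64)
action = torch.randn(1, 4)
print(intrinsic_reward(model, slots_t, action, slots_t1))

In a data-collection loop, the exploration policy would favor actions whose transitions score high under such a reward, since those are exactly the transitions the current model has not yet learned to predict.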
Appears in Collections: Diploma Theses (Διπλωματικές Εργασίες) - Theses

Files in This Item:
File | Description | Size | Format
KONSTANTAROPOULOS_ORESTIS_THESIS.pdf | | 9.16 MB | Adobe PDF


Items in Artemis are protected by copyright, with all rights reserved, unless otherwise indicated.