Research Lines

Our research focuses on visual intelligence for Human-AI collaboration. We study how AI systems can perceive, understand, and anticipate what people are doing from first-person and multi-view observations, with the goal of supporting human activity through memory, skill-aware analysis, and timely assistance.

We focus on the following themes:

Procedural Understanding


We study how complex activities unfold over time. Our work develops models for anticipating future actions, representing procedural structure, and reasoning over long-horizon tasks in egocentric and instructional video.

This line of research moves from early action anticipation toward structured procedural reasoning, including task graphs, action segmentation, and planning-aware representations that capture not only what happens next but also how actions are organized into coherent processes.

Linked Publications

conference 2026 🏆 Highlight Top 14%
ViterbiPlanNet: Injecting Procedural Knowledge via Differentiable Viterbi for Planning in Instructional Videos

Luigi Seminara, Davide Moltisanti, Antonino Furnari

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

BibTeX Citation

@inproceedings{Seminara2026ViterbiPlanNet,
  year = { 2026 },
  booktitle = { IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) },
  title = { ViterbiPlanNet: Injecting Procedural Knowledge via Differentiable Viterbi for Planning in Instructional Videos },
  author = { Luigi Seminara and Davide Moltisanti and Antonino Furnari },
  pdf = {https://arxiv.org/pdf/2603.04265},
  url = {https://arxiv.org/abs/2603.04265}
}

                                    
journal 2026 🏆 1st Place Ego-Exo4D Procedure Understanding Challenge 2025
Task Graph Maximum Likelihood Estimation for Procedural Activity Understanding in Egocentric Videos

Luigi Seminara, Giovanni Maria Farinella, Antonino Furnari

IEEE Transactions on Pattern Analysis and Machine Intelligence

BibTeX Citation

@article{seminara2026task,
  author={Seminara, Luigi and Farinella, Giovanni Maria and Furnari, Antonino},
  journal={IEEE Transactions on Pattern Analysis and Machine Intelligence},
  title={Task Graph Maximum Likelihood Estimation for Procedural Activity Understanding in Egocentric Videos},
  year={2026},
  pages={1-18},
  doi={10.1109/TPAMI.2026.3689721}
}

                                    
journal 2026 🏆 2nd Place Ego-Exo4D Procedure Understanding Challenge 2025
Integrating Affordances and Attention models for Short-Term Object Interaction Anticipation

Lorenzo Mur-Labadia, Ruben Martinez-Cantin, Jose J. Guerrero, Giovanni Maria Farinella, Antonino Furnari

IEEE Transactions on Pattern Analysis and Machine Intelligence

BibTeX Citation

@article{MurLabadia2026-Integrating,
  pdf = { publications/Mur_Labadia2026Integrating.pdf },
  url = { https://ieeexplore.ieee.org/document/11344783 },
  pages = { 1-17 },
  year = { 2026 },
  doi = { 10.1109/TPAMI.2026.3652831 },
  title = { Integrating Affordances and Attention models for Short-Term Object Interaction Anticipation },
  journal = { IEEE Transactions on Pattern Analysis and Machine Intelligence },
  author = { Lorenzo Mur-Labadia and Ruben Martinez-Cantin and Jose J. Guerrero and Giovanni Maria Farinella and Antonino Furnari },
}

                                    
journal 2025
Exocentric-to-Egocentric Adaptation for Temporal Action Segmentation with Unlabeled Synchronized Video Pairs

Camillo Quattrocchi, Antonino Furnari, Daniele Di Mauro, Mario Valerio Giuffrida, Giovanni Maria Farinella

International Journal of Computer Vision (IJCV)

BibTeX Citation

@article{quattrocchi2024synchronization,
  year = { 2025 },
  journal = { International Journal of Computer Vision (IJCV) },
  title = { Exocentric-to-Egocentric Adaptation for Temporal Action Segmentation with Unlabeled Synchronized Video Pairs },
  author = { Camillo Quattrocchi and Antonino Furnari and Daniele Di Mauro and Mario Valerio Giuffrida and Giovanni Maria Farinella },
  url = { https://github.com/fpv-iplab/synchronization-is-all-you-need },
}

                                    

Skill, Errors, and Assistance


We develop methods that go beyond recognizing actions to evaluate how well they are performed. Our research in this area focuses on mistake detection, skill assessment, and assistive feedback for procedural activities, especially in egocentric settings where understanding the user’s intent and execution is crucial.

The long-term goal is to enable AI systems that can support people during real tasks by identifying deviations, anticipating difficulties, and providing timely, actionable guidance that respects human autonomy.

Linked Publications

conference 2026
Ego-EXTRA: video-language Egocentric Dataset for EXpert-TRAinee assistance

Francesco Ragusa, Michele Mazzamuto, Rosario Forte, Irene D'Ambra, James Fort, Jakob Engel, Antonino Furnari, Giovanni Maria Farinella

IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)

BibTeX Citation

@inproceedings{Ragusa2026Ego-EXTRA,
  year = { 2026 },
  booktitle = { IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) },
  title = { Ego-EXTRA: video-language Egocentric Dataset for EXpert-TRAinee assistance },
  author = { Francesco Ragusa and Michele Mazzamuto and Rosario Forte and Irene D'Ambra and James Fort and Jakob Engel and Antonino Furnari and Giovanni Maria Farinella },
  pdf = { https://arxiv.org/pdf/2512.13238 },
}

                                    
preprint 2026
RECIPE: Procedural Planning via Grounding in Instructional Video

Luigi Seminara, Antonino Furnari, Lorenzo Torresani

arXiv preprint arXiv:2605.19976

BibTeX Citation

@article{seminara2026recipe,
  title={RECIPE: Procedural Planning via Grounding in Instructional Video},
  author={Seminara, Luigi and Furnari, Antonino and Torresani, Lorenzo},
  journal={arXiv preprint arXiv:2605.19976},
  year={2026}
}

                                    
conference 2024
PREGO: online mistake detection in PRocedural EGOcentric videos

Alessandro Flaborea, Guido D'Amely, Leonardo Plini, Luca Scofano, Edoardo De Matteis, Antonino Furnari, Giovanni Maria Farinella, Fabio Galasso

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

BibTeX Citation

@inproceedings{flaborea2024PREGO,
  year = {2024},
  booktitle = { IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) },
  title = { PREGO: online mistake detection in PRocedural EGOcentric videos },
  author = { Alessandro Flaborea and Guido D'Amely and Leonardo Plini and Luca Scofano and Edoardo De Matteis and Antonino Furnari and Giovanni Maria Farinella and Fabio Galasso },
  pdf={https://arxiv.org/pdf/2404.01933},
  url={https://github.com/aleflabo/PREGO}
}

                                    

Memory and Streaming Intelligence


We investigate how AI systems can observe continuously, remember relevant past events, and reason over long streams of egocentric experience. This includes streaming perception, episodic memory, and multimodal question answering over events that unfold over time.

Our aim is to build systems that do not simply process isolated frames or clips, but maintain a compact and useful representation of ongoing experience, enabling context-aware reasoning and support in always-on wearable scenarios.

Linked Publications

conference 2026
Online Episodic Memory Visual Query Localization with Egocentric Streaming Object Memory

Zaira Manigrasso, Matteo Dunnhofer, Antonino Furnari, Moritz Nottebaum, Antonio Finocchiaro, Davide Marana, Rosario Forte, Giovanni Maria Farinella, Christian Micheloni

IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)

BibTeX Citation

@inproceedings{Manigrasso2026Online,
  year = { 2026 },
  booktitle = { IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) },
  title = { Online Episodic Memory Visual Query Localization with Egocentric Streaming Object Memory },
  author = { Zaira Manigrasso and Matteo Dunnhofer and Antonino Furnari and Moritz Nottebaum and Antonio Finocchiaro and Davide Marana and Rosario Forte and Giovanni Maria Farinella and Christian Micheloni },
}

                                    
conference 2026
Exploring Multimodal LMMs for Online Episodic Memory Question Answering on the Edge

Giuseppe Lando, Rosario Forte, Antonino Furnari

International Conference on Computer Vision Theory and Applications (VISAPP)

BibTeX Citation

@inproceedings{forte2026exploring,
  title={Exploring Multimodal LMMs for Online Episodic Memory Question Answering on the Edge},
  author={Giuseppe Lando and Rosario Forte and Antonino Furnari},
  booktitle={International Conference on Computer Vision Theory and Applications (VISAPP)},
  year={2026},
  url={https://arxiv.org/abs/2602.22455},
  pdf={https://arxiv.org/pdf/2602.22455}
}

                                    
preprint 2026
Ego-METAS: an Egocentric online Multimodal Energy-efficient Temporal Action Segmentation benchmark

Maria Santos-Villafranca, Jesus Bermudez-Cameo, Alejandro Perez-Yus, Giovanni Maria Farinella, Antonino Furnari

arXiv preprint arXiv:2606.02246

BibTeX Citation

@article{santosvillafranca2026egometas,
  title={Ego-METAS: an Egocentric online Multimodal Energy-efficient Temporal Action Segmentation benchmark},
  author={Santos-Villafranca, Maria and Bermudez-Cameo, Jesus and Perez-Yus, Alejandro and Farinella, Giovanni Maria and Furnari, Antonino},
  journal={arXiv preprint arXiv:2606.02246},
  year={2026}
}

                                    
preprint 2026
EGOSTREAM: A Diagnostic Benchmark for Streaming Episodic Memory in Egocentric Vision

Rosario Forte, Giuseppe Lando, Antonino Furnari

arXiv preprint arXiv:2605.31557

BibTeX Citation

@article{forte2026egostream,
  title={EGOSTREAM: A Diagnostic Benchmark for Streaming Episodic Memory in Egocentric Vision},
  author={Forte, Rosario and Lando, Giuseppe and Furnari, Antonino},
  journal={arXiv preprint arXiv:2605.31557},
  year={2026}
}

                                    
conference 2025 🏆 Best Student Paper Award
How Far Can Off-the-Shelf Multimodal Large Language Models Go in Online Episodic Memory Question Answering?

Giuseppe Lando, Rosario Forte, Giovanni Maria Farinella, Antonino Furnari

Proceedings of the 23rd International Conference on Image Analysis and Processing (ICIAP)

BibTeX Citation

@inproceedings{Lando2025HowFar,
  author    = {Giuseppe Lando and Rosario Forte and Giovanni Maria Farinella and Antonino Furnari},
  title     = {How Far Can Off-the-Shelf Multimodal Large Language Models Go in Online Episodic Memory Question Answering?},
  booktitle = {Proceedings of the 23rd International Conference on Image Analysis and Processing (ICIAP)},
  year      = {2025}
}

                                    

Datasets and Benchmarks


We contribute datasets, benchmarks, and evaluation protocols that help shape research in egocentric and procedural video understanding. These resources provide the community with challenging real-world scenarios for studying action recognition, anticipation, skill understanding, memory, and assistance.

By building shared benchmarks, we aim to support reproducible progress and enable new research directions at the intersection of first-person vision, multimodal learning, and human-centred AI.

Linked Publications

conference 2026
Ego-EXTRA: video-language Egocentric Dataset for EXpert-TRAinee assistance

Francesco Ragusa, Michele Mazzamuto, Rosario Forte, Irene D'Ambra, James Fort, Jakob Engel, Antonino Furnari, Giovanni Maria Farinella

IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)

BibTeX Citation

@inproceedings{Ragusa2026Ego-EXTRA,
  year = { 2026 },
  booktitle = { IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) },
  title = { Ego-EXTRA: video-language Egocentric Dataset for EXpert-TRAinee assistance },
  author = { Francesco Ragusa and Michele Mazzamuto and Rosario Forte and Irene D'Ambra and James Fort and Jakob Engel and Antonino Furnari and Giovanni Maria Farinella },
  pdf = { https://arxiv.org/pdf/2512.13238 },
}

                                    
preprint 2026
Ego-METAS: an Egocentric online Multimodal Energy-efficient Temporal Action Segmentation benchmark

Maria Santos-Villafranca, Jesus Bermudez-Cameo, Alejandro Perez-Yus, Giovanni Maria Farinella, Antonino Furnari

arXiv preprint arXiv:2606.02246

BibTeX Citation

@article{santosvillafranca2026egometas,
  title={Ego-METAS: an Egocentric online Multimodal Energy-efficient Temporal Action Segmentation benchmark},
  author={Santos-Villafranca, Maria and Bermudez-Cameo, Jesus and Perez-Yus, Alejandro and Farinella, Giovanni Maria and Furnari, Antonino},
  journal={arXiv preprint arXiv:2606.02246},
  year={2026}
}

                                    
journal 2025
Ego-Exo4D: Understanding Skilled Human Activity from First- and Third-Person Perspectives

Kristen Grauman, Andrew Westbury, Lorenzo Torresani, Kris Kitani, Jitendra Malik, Triantafyllos Afouras, Kumar Ashutosh, Vijay Baiyya, Siddhant Bansal, Bikram Boote, Eugene Byrne, Zach Chavis, Joya Chen, Feng Cheng, Fu-Jen Chu, Sean Crane, Avijit Dasgupta, Jing Dong, Maria Escobar, Cristhian Forigua, Abrham Gebreselasie, Sanjay Haresh, Jing Huang, Md Mohaiminul Islam, Suyog Jain, Rawal Khirodkar, Devansh Kukreja, Kevin J. Liang, Jia-Wei Liu, Sagnik Majumder, Yongsen Mao, Miguel Martin, Effrosyni Mavroudi, Tushar Nagarajan, Francesco Ragusa, Santhosh Kumar Ramakrishnan, Luigi Seminara, Arjun Somayazulu, Yale Song, Shan Su, Zihui Xue, Edward Zhang, Jinxu Zhang, Angela Castillo, Changan Chen, Xinzhu Fu, Ryosuke Furuta, Cristina González, Prince Gupta, Jiabo Hu, Yifei Huang, Yiming Huang, Weslie Khoo, Anush Kumar, Robert Kuo, Sach Lakhavani, Miao Liu, Mi Luo, Zhengyi Luo, Brighid Meredith, Austin Miller, Oluwatumininu Oguntola, Xiaqing Pan, Penny Peng, Shraman Pramanick, Merey Ramazanova, Fiona Ryan, Wei Shan, Kiran Somasundaram, Chenan Song, Audrey Southerland, Masatoshi Tateno, Huiyu Wang, Yuchen Wang, Takuma Yagi, Mingfei Yan, Xitong Yang, Zecheng Yu, Shengxin Cindy Zha, Chen Zhao, Ziwei Zhao, Zhifan Zhu, Jeff Zhuo, Pablo Arbeláez, Gedas Bertasius, David Crandall, Dima Damen, Jakob Engel, Giovanni Maria Farinella, Antonino Furnari, Bernard Ghanem, Judy Hoffman, C. V. Jawahar, Richard Newcombe, Hyun Soo Park, James M. Rehg, Yoichi Sato, Manolis Savva, Jianbo Shi, Mike Zheng Shou, Michael Wray

International Journal of Computer Vision

BibTeX Citation

@article{Grauman2025,
  author    = {Kristen Grauman and Andrew Westbury and Lorenzo Torresani and Kris Kitani and Jitendra Malik and Triantafyllos Afouras and Kumar Ashutosh and Vijay Baiyya and Siddhant Bansal and Bikram Boote and Eugene Byrne and Zach Chavis and Joya Chen and Feng Cheng and Fu-Jen Chu and Sean Crane and Avijit Dasgupta and Jing Dong and Maria Escobar and Cristhian Forigua and Abrham Gebreselasie and Sanjay Haresh and Jing Huang and Md Mohaiminul Islam and Suyog Jain and Rawal Khirodkar and Devansh Kukreja and Kevin J. Liang and Jia-Wei Liu and Sagnik Majumder and Yongsen Mao and Miguel Martin and Effrosyni Mavroudi and Tushar Nagarajan and Francesco Ragusa and Santhosh Kumar Ramakrishnan and Luigi Seminara and Arjun Somayazulu and Yale Song and Shan Su and Zihui Xue and Edward Zhang and Jinxu Zhang and Angela Castillo and Changan Chen and Xinzhu Fu and Ryosuke Furuta and Cristina González and Prince Gupta and Jiabo Hu and Yifei Huang and Yiming Huang and Weslie Khoo and Anush Kumar and Robert Kuo and Sach Lakhavani and Miao Liu and Mi Luo and Zhengyi Luo and Brighid Meredith and Austin Miller and Oluwatumininu Oguntola and Xiaqing Pan and Penny Peng and Shraman Pramanick and Merey Ramazanova and Fiona Ryan and Wei Shan and Kiran Somasundaram and Chenan Song and Audrey Southerland and Masatoshi Tateno and Huiyu Wang and Yuchen Wang and Takuma Yagi and Mingfei Yan and Xitong Yang and Zecheng Yu and Shengxin Cindy Zha and Chen Zhao and Ziwei Zhao and Zhifan Zhu and Jeff Zhuo and Pablo Arbeláez and Gedas Bertasius and David Crandall and Dima Damen and Jakob Engel and Giovanni Maria Farinella and Antonino Furnari and Bernard Ghanem and Judy Hoffman and C. V. Jawahar and Richard Newcombe and Hyun Soo Park and James M. Rehg and Yoichi Sato and Manolis Savva and Jianbo Shi and Mike Zheng Shou and Michael Wray},
  title     = {Ego-Exo4D: Understanding Skilled Human Activity from First- and Third-Person Perspectives},
  journal   = {International Journal of Computer Vision},
  year      = {2025},
  month     = nov,
  day       = {24},
  doi       = {10.1007/s11263-025-02557-6},
  url       = {https://doi.org/10.1007/s11263-025-02557-6},
  issn      = {1573-1405},
  pdf = {https://link.springer.com/content/pdf/10.1007/s11263-025-02557-6.pdf}
}

                                    
conference 2025
EASG-Bench: Video Q&A Benchmark with Egocentric Action Scene Graphs

Ivan Rodin, Tz-Ying Wu, Kyle Min, Sharath Nittur Sridhar, Antonino Furnari, Subarna Tripathi, Giovanni Maria Farinella

IEEE/CVF International Conference on Computer Vision Workshops (ICCVW)

BibTeX Citation

@inproceedings{Rodin2025EASG-Bench,
  year = { 2025 },
  booktitle = { IEEE/CVF International Conference on Computer Vision Workshops (ICCVW) },
  title = { EASG-Bench: Video Q&A Benchmark with Egocentric Action Scene Graphs },
  author = { Ivan Rodin and Tz-Ying Wu and Kyle Min and Sharath Nittur Sridhar and Antonino Furnari and Subarna Tripathi and Giovanni Maria Farinella },
  url = { https://arxiv.org/abs/2506.05787 },
  pdf = { https://arxiv.org/pdf/2506.05787.pdf },
}

                                    
conference 2024
Action Scene Graphs for Long-Form Understanding of Egocentric Videos

Ivan Rodin, Antonino Furnari, Kyle Min, Subarna Tripathi, Giovanni Maria Farinella

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

BibTeX Citation

@inproceedings{rodin2023action,
  primaryclass = { cs.CV },
  archiveprefix = { arXiv },
  eprint = { 2312.03391 },
  pdf = {https://arxiv.org/pdf/2312.03391.pdf},
  year = {2024},
  booktitle = { IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) },
  title = {Action Scene Graphs for Long-Form Understanding of Egocentric Videos},
  author = {Ivan Rodin and Antonino Furnari and Kyle Min and Subarna Tripathi and Giovanni Maria Farinella}
}

                                    

Looking for the full publications list?

Explore our complete catalog of journal articles, conference papers, patents, and datasets.
