Research Lines

Our research focuses on visual intelligence for Human-AI collaboration. We study how AI systems can perceive, understand, and anticipate what people are doing from first-person and multi-view observations, with the goal of supporting human activity through memory, skill-aware analysis, and timely assistance.

We focus on the following themes:

Procedural Understanding; Skill, Errors, and Assistance; Memory and Streaming Intelligence; Datasets and Benchmarks

Procedural Understanding

Featured Work Gallery

ViterbiPlanNet: Injecting Procedural Knowledge via Differentiable Viterbi for Planning in Instructional Videos

Task Graph Maximum Likelihood Estimation for Procedural Activity Understanding in Egocentric Videos

Integrating Affordances and Attention models for Short-Term Object Interaction Anticipation

Exocentric-to-Egocentric Adaptation for Temporal Action Segmentation with Unlabeled Synchronized Video Pairs

We study how complex activities unfold over time. Our work develops models for anticipating future actions, representing procedural structure, and reasoning over long-horizon tasks in egocentric and instructional video.

This line of research moves from early action anticipation toward structured procedural reasoning, including task graphs, action segmentation, and planning-aware representations that capture not only what happens next, but how actions are organized into coherent processes.

Linked Publications

conference 2026 🏆 Highlight Top 14% 🏆 CVPR 2026 Efficient Badge

ViterbiPlanNet: Injecting Procedural Knowledge via Differentiable Viterbi for Planning in Instructional Videos

Luigi Seminara , Davide Moltisanti , Antonino Furnari

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

BibTeX Citation


@inproceedings{Seminara2026ViterbiPlanNet,
  year = {2026},
  booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  title = {ViterbiPlanNet: Injecting Procedural Knowledge via Differentiable Viterbi for Planning in Instructional Videos},
  author = {Luigi Seminara and Davide Moltisanti and Antonino Furnari},
  pdf = {https://arxiv.org/pdf/2603.04265},
  url = {https://arxiv.org/abs/2603.04265}
}

journal 2026 🏆 1st Place Ego-Exo4D Procedure Understanding Challenge 2025

Task Graph Maximum Likelihood Estimation for Procedural Activity Understanding in Egocentric Videos

Luigi Seminara , Giovanni Maria Farinella , Antonino Furnari

IEEE Transactions on Pattern Analysis and Machine Intelligence

BibTeX Citation


@article{seminara2026task,
  author={Seminara, Luigi and Farinella, Giovanni Maria and Furnari, Antonino},
  journal={IEEE Transactions on Pattern Analysis and Machine Intelligence},
  title={Task Graph Maximum Likelihood Estimation for Procedural Activity Understanding in Egocentric Videos},
  year={2026},
  pages={1--18},
  doi={10.1109/TPAMI.2026.3689721}
}

journal 2026 🏆 2nd Place Ego-Exo4D Procedure Understanding Challenge 2025

Integrating Affordances and Attention models for Short-Term Object Interaction Anticipation

Lorenzo Mur-Labadia , Ruben Martinez-Cantin , Jose J. Guerrero , Giovanni Maria Farinella , Antonino Furnari

IEEE Transactions on Pattern Analysis and Machine Intelligence

BibTeX Citation


@article{MurLabadia2026-Integrating,
  pdf = {publications/Mur_Labadia2026Integrating.pdf},
  url = {https://ieeexplore.ieee.org/document/11344783},
  pages = {1--17},
  year = {2026},
  doi = {10.1109/TPAMI.2026.3652831},
  title = {Integrating Affordances and Attention models for Short-Term Object Interaction Anticipation},
  journal = {IEEE Transactions on Pattern Analysis and Machine Intelligence},
  author = {Lorenzo Mur-Labadia and Ruben Martinez-Cantin and Jose J. Guerrero and Giovanni Maria Farinella and Antonino Furnari},
}

journal 2025

Exocentric-to-Egocentric Adaptation for Temporal Action Segmentation with Unlabeled Synchronized Video Pairs

Camillo Quattrocchi , Antonino Furnari , Daniele Di Mauro , Mario Valerio Giuffrida , Giovanni Maria Farinella

International Journal of Computer Vision (IJCV)

BibTeX Citation


@article{quattrocchi2024synchronization,
  year = {2025},
  journal = {International Journal of Computer Vision (IJCV)},
  title = {Exocentric-to-Egocentric Adaptation for Temporal Action Segmentation with Unlabeled Synchronized Video Pairs},
  author = {Camillo Quattrocchi and Antonino Furnari and Daniele Di Mauro and Mario Valerio Giuffrida and Giovanni Maria Farinella},
  url = {https://github.com/fpv-iplab/synchronization-is-all-you-need},
}

Skill, Errors, and Assistance

Featured Work Gallery

Ego-EXTRA: video-language Egocentric Dataset for EXpert-TRAinee assistance

RECIPE: Procedural Planning via Grounding in Instructional Video

PREGO: online mistake detection in PRocedural EGOcentric videos

We develop methods that go beyond recognizing actions to evaluate how well they are performed. Our research in this area focuses on mistake detection, skill assessment, and assistive feedback for procedural activities, especially in egocentric settings where understanding the user’s intent and execution is crucial.

The long-term goal is to enable AI systems that can support people during real tasks by identifying deviations, anticipating difficulties, and providing timely, actionable guidance aligned with human autonomy.

Linked Publications

conference 2026

Ego-EXTRA: video-language Egocentric Dataset for EXpert-TRAinee assistance

Francesco Ragusa , Michele Mazzamuto , Rosario Forte , Irene D'Ambra , James Fort , Jakob Engel , Antonino Furnari , Giovanni Maria Farinella

IEEE Winter Conference on Applications of Computer Vision (WACV)

BibTeX Citation


@inproceedings{Ragusa2026Ego-EXTRA,
  year = {2026},
  booktitle = {IEEE Winter Conference on Applications of Computer Vision (WACV)},
  title = {Ego-EXTRA: video-language Egocentric Dataset for EXpert-TRAinee assistance},
  author = {Francesco Ragusa and Michele Mazzamuto and Rosario Forte and Irene D'Ambra and James Fort and Jakob Engel and Antonino Furnari and Giovanni Maria Farinella},
  pdf = {https://arxiv.org/pdf/2512.13238},
}

preprint 2026

RECIPE: Procedural Planning via Grounding in Instructional Video

Luigi Seminara , Antonino Furnari , Lorenzo Torresani

arXiv preprint arXiv:2605.19976

BibTeX Citation


@article{seminara2026recipe,
  title={RECIPE: Procedural Planning via Grounding in Instructional Video},
  author={Seminara, Luigi and Furnari, Antonino and Torresani, Lorenzo},
  journal={arXiv preprint arXiv:2605.19976},
  year={2026}
}

conference 2024

PREGO: online mistake detection in PRocedural EGOcentric videos

Alessandro Flaborea , Guido D'Amely , Leonardo Plini , Luca Scofano , Edoardo De Matteis , Antonino Furnari , Giovanni Maria Farinella , Fabio Galasso

Conference on Computer Vision and Pattern Recognition (CVPR)

BibTeX Citation


@inproceedings{flaborea2024PREGO,
  year = {2024},
  booktitle = {Conference on Computer Vision and Pattern Recognition (CVPR)},
  title = {PREGO: online mistake detection in PRocedural EGOcentric videos},
  author = {Alessandro Flaborea and Guido D'Amely and Leonardo Plini and Luca Scofano and Edoardo De Matteis and Antonino Furnari and Giovanni Maria Farinella and Fabio Galasso},
  pdf = {https://arxiv.org/pdf/2404.01933},
  url = {https://github.com/aleflabo/PREGO?tab=readme-ov-file}
}

Memory and Streaming Intelligence

Featured Work Gallery

Online Episodic Memory Visual Query Localization with Egocentric Streaming Object Memory

Exploring Multimodal LMMs for Online Episodic Memory Question Answering on the Edge

Ego-METAS: an Egocentric online Multimodal Energy-efficient Temporal Action Segmentation benchmark

EGOSTREAM: A Diagnostic Benchmark for Streaming Episodic Memory in Egocentric Vision

How Far Can Off-the-Shelf Multimodal Large Language Models Go in Online Episodic Memory Question Answering?

We investigate how AI systems can observe continuously, remember relevant past events, and reason over long streams of egocentric experience. This includes streaming perception, episodic memory, and multimodal question answering over events that unfold over time.

Our aim is to build systems that do not simply process isolated frames or clips, but maintain a compact and useful representation of ongoing experience, enabling context-aware reasoning and support in always-on wearable scenarios.

Linked Publications

conference 2026

Online Episodic Memory Visual Query Localization with Egocentric Streaming Object Memory

Zaira Manigrasso , Matteo Dunnhofer , Antonino Furnari , Moritz Nottebaum , Antonio Finocchiaro , Davide Marana , Rosario Forte , Giovanni Maria Farinella , Christian Micheloni

IEEE Winter Conference on Applications of Computer Vision (WACV)

BibTeX Citation


@inproceedings{Manigrasso2026Online,
  year = {2026},
  booktitle = {IEEE Winter Conference on Applications of Computer Vision (WACV)},
  title = {Online Episodic Memory Visual Query Localization with Egocentric Streaming Object Memory},
  author = {Zaira Manigrasso and Matteo Dunnhofer and Antonino Furnari and Moritz Nottebaum and Antonio Finocchiaro and Davide Marana and Rosario Forte and Giovanni Maria Farinella and Christian Micheloni},
}

conference 2026

Exploring Multimodal LMMs for Online Episodic Memory Question Answering on the Edge

Giuseppe Lando , Rosario Forte , Antonino Furnari

International Conference on Computer Vision Theory and Applications (VISAPP)

BibTeX Citation


@inproceedings{forte2026exploring,
  title={Exploring Multimodal LMMs for Online Episodic Memory Question Answering on the Edge},
  author={Giuseppe Lando and Rosario Forte and Antonino Furnari},
  booktitle={International Conference on Computer Vision Theory and Applications (VISAPP)},
  year={2026},
  url={https://arxiv.org/abs/2602.22455},
  pdf={https://arxiv.org/pdf/2602.22455}
}

preprint 2026

Ego-METAS: an Egocentric online Multimodal Energy-efficient Temporal Action Segmentation benchmark

Maria Santos-Villafranca , Jesus Bermudez-Cameo , Alejandro Perez-Yus , Giovanni Maria Farinella , Antonino Furnari

arXiv preprint arXiv:2606.02246

BibTeX Citation


@article{santosvillafranca2026egometas,
  title={Ego-METAS: an Egocentric online Multimodal Energy-efficient Temporal Action Segmentation benchmark},
  author={Santos-Villafranca, Maria and Bermudez-Cameo, Jesus and Perez-Yus, Alejandro and Farinella, Giovanni Maria and Furnari, Antonino},
  journal={arXiv preprint arXiv:2606.02246},
  year={2026}
}

preprint 2026

EGOSTREAM: A Diagnostic Benchmark for Streaming Episodic Memory in Egocentric Vision

Rosario Forte , Giuseppe Lando , Antonino Furnari

arXiv preprint arXiv:2605.31557

BibTeX Citation


@article{forte2026egostream,
  title={EGOSTREAM: A Diagnostic Benchmark for Streaming Episodic Memory in Egocentric Vision},
  author={Forte, Rosario and Lando, Giuseppe and Furnari, Antonino},
  journal={arXiv preprint arXiv:2605.31557},
  year={2026}
}

conference 2026

Retrieval-Augmented Online Textual Memory for Episodic Memory Video Question Answering

Raffaele Calì , Giuseppe Lando , Rosario Forte , Antonino Furnari

International Conference on Content-Based Multimedia Indexing (CBMI)

BibTeX Citation


@inproceedings{Cali2026Retrieval,
  author    = {Raffaele Cal{\`i} and Giuseppe Lando and Rosario Forte and Antonino Furnari},
  title     = {Retrieval-Augmented Online Textual Memory for Episodic Memory Video Question Answering},
  booktitle = {International Conference on Content-Based Multimedia Indexing (CBMI)},
  year      = {2026}
}

conference 2025 🏆 Best Student Paper Award

How Far Can Off-the-Shelf Multimodal Large Language Models Go in Online Episodic Memory Question Answering?

Giuseppe Lando , Rosario Forte , Giovanni Maria Farinella , Antonino Furnari

Proceedings of the 23rd International Conference on Image Analysis and Processing (ICIAP)

BibTeX Citation


@inproceedings{Lando2025HowFar,
  author    = {Giuseppe Lando and Rosario Forte and Giovanni Maria Farinella and Antonino Furnari},
  title     = {How Far Can Off-the-Shelf Multimodal Large Language Models Go in Online Episodic Memory Question Answering?},
  booktitle = {Proceedings of the 23rd International Conference on Image Analysis and Processing (ICIAP)},
  year      = {2025}
}

Datasets and Benchmarks

Featured Work Gallery

Ego-Exo4D: Understanding Skilled Human Activity from First- and Third-Person Perspectives

EASG-Bench: Video Q&A Benchmark with Egocentric Action Scene Graphs

Ego4D: Around the World in 3,000 Hours of Egocentric Video

Action Scene Graphs for Long-Form Understanding of Egocentric Videos

We contribute datasets, benchmarks, and evaluation protocols that help shape research in egocentric and procedural video understanding. These resources provide the community with challenging real-world scenarios for studying action recognition, anticipation, skill understanding, memory, and assistance.

By building shared benchmarks, we aim to support reproducible progress and enable new research directions at the intersection of first-person vision, multimodal learning, and human-centred AI.

Linked Publications

conference 2026

Ego-EXTRA: video-language Egocentric Dataset for EXpert-TRAinee assistance

Francesco Ragusa , Michele Mazzamuto , Rosario Forte , Irene D'Ambra , James Fort , Jakob Engel , Antonino Furnari , Giovanni Maria Farinella

IEEE Winter Conference on Applications of Computer Vision (WACV)

BibTeX Citation


@inproceedings{Ragusa2026Ego-EXTRA,
  year = {2026},
  booktitle = {IEEE Winter Conference on Applications of Computer Vision (WACV)},
  title = {Ego-EXTRA: video-language Egocentric Dataset for EXpert-TRAinee assistance},
  author = {Francesco Ragusa and Michele Mazzamuto and Rosario Forte and Irene D'Ambra and James Fort and Jakob Engel and Antonino Furnari and Giovanni Maria Farinella},
  pdf = {https://arxiv.org/pdf/2512.13238},
}

preprint 2026

Ego-METAS: an Egocentric online Multimodal Energy-efficient Temporal Action Segmentation benchmark

Maria Santos-Villafranca , Jesus Bermudez-Cameo , Alejandro Perez-Yus , Giovanni Maria Farinella , Antonino Furnari

arXiv preprint arXiv:2606.02246

BibTeX Citation


@article{santosvillafranca2026egometas,
  title={Ego-METAS: an Egocentric online Multimodal Energy-efficient Temporal Action Segmentation benchmark},
  author={Santos-Villafranca, Maria and Bermudez-Cameo, Jesus and Perez-Yus, Alejandro and Farinella, Giovanni Maria and Furnari, Antonino},
  journal={arXiv preprint arXiv:2606.02246},
  year={2026}
}

journal 2025

Ego-Exo4D: Understanding Skilled Human Activity from First- and Third-Person Perspectives

Kristen Grauman , Andrew Westbury , Lorenzo Torresani , Kris Kitani , Jitendra Malik , Triantafyllos Afouras , Kumar Ashutosh , Vijay Baiyya , Siddhant Bansal , Bikram Boote , Eugene Byrne , Zach Chavis , Joya Chen , Feng Cheng , Fu-Jen Chu , Sean Crane , Avijit Dasgupta , Jing Dong , Maria Escobar , Cristhian Forigua , Abrham Gebreselasie , Sanjay Haresh , Jing Huang , Md Mohaiminul Islam , Suyog Jain , Rawal Khirodkar , Devansh Kukreja , Kevin J. Liang , Jia-Wei Liu , Sagnik Majumder , Yongsen Mao , Miguel Martin , Effrosyni Mavroudi , Tushar Nagarajan , Francesco Ragusa , Santhosh Kumar Ramakrishnan , Luigi Seminara , Arjun Somayazulu , Yale Song , Shan Su , Zihui Xue , Edward Zhang , Jinxu Zhang , Angela Castillo , Changan Chen , Xinzhu Fu , Ryosuke Furuta , Cristina González , Prince Gupta , Jiabo Hu , Yifei Huang , Yiming Huang , Weslie Khoo , Anush Kumar , Robert Kuo , Sach Lakhavani , Miao Liu , Mi Luo , Zhengyi Luo , Brighid Meredith , Austin Miller , Oluwatumininu Oguntola , Xiaqing Pan , Penny Peng , Shraman Pramanick , Merey Ramazanova , Fiona Ryan , Wei Shan , Kiran Somasundaram , Chenan Song , Audrey Southerland , Masatoshi Tateno , Huiyu Wang , Yuchen Wang , Takuma Yagi , Mingfei Yan , Xitong Yang , Zecheng Yu , Shengxin Cindy Zha , Chen Zhao , Ziwei Zhao , Zhifan Zhu , Jeff Zhuo , Pablo Arbeláez , Gedas Bertasius , David Crandall , Dima Damen , Jakob Engel , Giovanni Maria Farinella , Antonino Furnari , Bernard Ghanem , Judy Hoffman , C. V. Jawahar , Richard Newcombe , Hyun Soo Park , James M. Rehg , Yoichi Sato , Manolis Savva , Jianbo Shi , Mike Zheng Shou , Michael Wray

International Journal of Computer Vision

BibTeX Citation


@article{Grauman2025,
  author    = {Kristen Grauman and Andrew Westbury and Lorenzo Torresani and Kris Kitani and Jitendra Malik and Triantafyllos Afouras and Kumar Ashutosh and Vijay Baiyya and Siddhant Bansal and Bikram Boote and Eugene Byrne and Zach Chavis and Joya Chen and Feng Cheng and Fu-Jen Chu and Sean Crane and Avijit Dasgupta and Jing Dong and Maria Escobar and Cristhian Forigua and Abrham Gebreselasie and Sanjay Haresh and Jing Huang and Md Mohaiminul Islam and Suyog Jain and Rawal Khirodkar and Devansh Kukreja and Kevin J. Liang and Jia-Wei Liu and Sagnik Majumder and Yongsen Mao and Miguel Martin and Effrosyni Mavroudi and Tushar Nagarajan and Francesco Ragusa and Santhosh Kumar Ramakrishnan and Luigi Seminara and Arjun Somayazulu and Yale Song and Shan Su and Zihui Xue and Edward Zhang and Jinxu Zhang and Angela Castillo and Changan Chen and Xinzhu Fu and Ryosuke Furuta and Cristina González and Prince Gupta and Jiabo Hu and Yifei Huang and Yiming Huang and Weslie Khoo and Anush Kumar and Robert Kuo and Sach Lakhavani and Miao Liu and Mi Luo and Zhengyi Luo and Brighid Meredith and Austin Miller and Oluwatumininu Oguntola and Xiaqing Pan and Penny Peng and Shraman Pramanick and Merey Ramazanova and Fiona Ryan and Wei Shan and Kiran Somasundaram and Chenan Song and Audrey Southerland and Masatoshi Tateno and Huiyu Wang and Yuchen Wang and Takuma Yagi and Mingfei Yan and Xitong Yang and Zecheng Yu and Shengxin Cindy Zha and Chen Zhao and Ziwei Zhao and Zhifan Zhu and Jeff Zhuo and Pablo Arbeláez and Gedas Bertasius and David Crandall and Dima Damen and Jakob Engel and Giovanni Maria Farinella and Antonino Furnari and Bernard Ghanem and Judy Hoffman and C. V. Jawahar and Richard Newcombe and Hyun Soo Park and James M. Rehg and Yoichi Sato and Manolis Savva and Jianbo Shi and Mike Zheng Shou and Michael Wray},
  title     = {Ego-Exo4D: Understanding Skilled Human Activity from First- and Third-Person Perspectives},
  journal   = {International Journal of Computer Vision},
  year      = {2025},
  month     = nov,
  day       = {24},
  volume    = {},
  number    = {},
  pages     = {},
  doi       = {10.1007/s11263-025-02557-6},
  url       = {https://doi.org/10.1007/s11263-025-02557-6},
  issn      = {1573-1405},
  pdf = {https://link.springer.com/content/pdf/10.1007/s11263-025-02557-6.pdf}
}

conference 2025

EASG-Bench: Video Q&A Benchmark with Egocentric Action Scene Graphs

Ivan Rodin , Tz-Ying Wu , Kyle Min , Sharath Nittur Sridhar , Antonino Furnari , Subarna Tripathi , Giovanni Maria Farinella

IEEE/CVF International Conference on Computer Vision Workshops (ICCVW)

BibTeX Citation


@inproceedings{Rodin2025EASG-Bench,
  year = {2025},
  booktitle = {IEEE/CVF International Conference on Computer Vision Workshops (ICCVW)},
  title = {EASG-Bench: Video Q&A Benchmark with Egocentric Action Scene Graphs},
  author = {Ivan Rodin and Tz-Ying Wu and Kyle Min and Sharath Nittur Sridhar and Antonino Furnari and Subarna Tripathi and Giovanni Maria Farinella},
  url = {https://arxiv.org/abs/2506.05787},
  pdf = {https://arxiv.org/pdf/2506.05787.pdf},
}

journal 2024 🏆 EgoVis 2022/2023 distinguished paper award

Ego4D: Around the World in 3,000 Hours of Egocentric Video

Kristen Grauman , Andrew Westbury , Eugene Byrne , Vincent Cartillier , Zachary Chavis , Antonino Furnari , Rohit Girdhar , Jackson Hamburger , Hao Jiang , Devansh Kukreja , Miao Liu , Xingyu Liu , Miguel Martin , Tushar Nagarajan , Ilija Radosavovic , Santhosh Kumar Ramakrishnan , Fiona Ryan , Jayant Sharma , Michael Wray , Mengmeng Xu , Eric Zhongcong Xu , Chen Zhao , Siddhant Bansal , Dhruv Batra , Sean Crane , Tien Do , Morrie Doulaty , Akshay Erapalli , Christoph Feichtenhofer , Adriano Fragomeni , Qichen Fu , Abrham Gebreselasie , Cristina Gonzalez , James Hillis , Xuhua Huang , Yifei Huang , Wenqi Jia , Weslie Khoo , Jachym Kolar , Satwik Kottur , Anurag Kumar , Federico Landini , Chao Li , Yanghao Li , Zhenqiang Li , Karttikeya Mangalam , Raghava Modhugu , Jonathan Munro , Tullie Murrell , Takumi Nishiyasu , Will Price , Paola Ruiz Puentes , Merey Ramazanova , Leda Sari , Kiran Somasundaram , Audrey Southerland , Yusuke Sugano , Ruijie Tao , Minh Vo , Yuchen Wang , Xindi Wu , Takuma Yagi , Ziwei Zhao , Yunyi Zhu , Pablo Arbelaez , David Crandall , Dima Damen , Giovanni Maria Farinella , Christian Fuegen , Bernard Ghanem , Vamsi Krishna Ithapu , C.V. Jawahar , Hanbyul Joo , Kris Kitani , Haizhou Li , Richard Newcombe , Aude Oliva , Hyun Soo Park , James M. Rehg , Yoichi Sato , Jianbo Shi , Mike Zheng Shou , Antonio Torralba , Lorenzo Torresani , Mingfei Yan , Jitendra Malik

IEEE Transactions on Pattern Analysis and Machine Intelligence

BibTeX Citation


@ARTICLE{Grauman20241,
	author = {Grauman, Kristen and Westbury, Andrew and Byrne, Eugene and Cartillier, Vincent and Chavis, Zachary and Furnari, Antonino and Girdhar, Rohit and Hamburger, Jackson and Jiang, Hao and Kukreja, Devansh and Liu, Miao and Liu, Xingyu and Martin, Miguel and Nagarajan, Tushar and Radosavovic, Ilija and Ramakrishnan, Santhosh Kumar and Ryan, Fiona and Sharma, Jayant and Wray, Michael and Xu, Mengmeng and Xu, Eric Zhongcong and Zhao, Chen and Bansal, Siddhant and Batra, Dhruv and Crane, Sean and Do, Tien and Doulaty, Morrie and Erapalli, Akshay and Feichtenhofer, Christoph and Fragomeni, Adriano and Fu, Qichen and Gebreselasie, Abrham and Gonzalez, Cristina and Hillis, James and Huang, Xuhua and Huang, Yifei and Jia, Wenqi and Khoo, Weslie and Kolar, Jachym and Kottur, Satwik and Kumar, Anurag and Landini, Federico and Li, Chao and Li, Yanghao and Li, Zhenqiang and Mangalam, Karttikeya and Modhugu, Raghava and Munro, Jonathan and Murrell, Tullie and Nishiyasu, Takumi and Price, Will and Puentes, Paola Ruiz and Ramazanova, Merey and Sari, Leda and Somasundaram, Kiran and Southerland, Audrey and Sugano, Yusuke and Tao, Ruijie and Vo, Minh and Wang, Yuchen and Wu, Xindi and Yagi, Takuma and Zhao, Ziwei and Zhu, Yunyi and Arbelaez, Pablo and Crandall, David and Damen, Dima and Farinella, Giovanni Maria and Fuegen, Christian and Ghanem, Bernard and Ithapu, Vamsi Krishna and Jawahar, C.V. and Joo, Hanbyul and Kitani, Kris and Li, Haizhou and Newcombe, Richard and Oliva, Aude and Park, Hyun Soo and Rehg, James M. and Sato, Yoichi and Shi, Jianbo and Shou, Mike Zheng and Torralba, Antonio and Torresani, Lorenzo and Yan, Mingfei and Malik, Jitendra},
	title = {Ego4D: Around the World in 3,000 Hours of Egocentric Video},
	year = {2024},
  pdf = {https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10611736},
	journal = {IEEE Transactions on Pattern Analysis and Machine Intelligence},
	pages = {1--32},
	doi = {10.1109/TPAMI.2024.3381075},
}

conference 2024

Action Scene Graphs for Long-Form Understanding of Egocentric Videos

Ivan Rodin , Antonino Furnari , Kyle Min , Subarna Tripathi , Giovanni Maria Farinella

Conference on Computer Vision and Pattern Recognition (CVPR)

BibTeX Citation


@inproceedings{rodin2023action,
  primaryclass = {cs.CV},
  archiveprefix = {arXiv},
  eprint = {2312.03391},
  pdf = {https://arxiv.org/pdf/2312.03391.pdf},
  year = {2024},
  booktitle = {Conference on Computer Vision and Pattern Recognition (CVPR)},
  title = {Action Scene Graphs for Long-Form Understanding of Egocentric Videos},
  author = {Ivan Rodin and Antonino Furnari and Kyle Min and Subarna Tripathi and Giovanni Maria Farinella}
}
