Towards Always-On Wearable AI That Perceives, Understands, and Assists
June 3, 2026 · 17:00–17:30 · Room 108 · Antonino Furnari
This presentation outlines a framework for Collaborative AI, focusing on three pillars: sensing the world under physical constraints (Cognitive Economy), grounding perception into external structures (Knowledge Grounding), and providing empowering user feedback (Optimal Intervention).
Cognitive Economy
Sensing under energy & compute constraints
Knowledge Grounding
Grounding perception in external structures
Optimal Intervention
Empowering feedback at the right moment
Featured Research
Works presented during the talk.
Cognitive Economy
Ego-METAS: Multimodal Energy-Efficient Temporal Action Segmentation
Focus: Multimodal efficient sensing and energy-aware policies for wearable devices.
Overview: This work introduces a benchmark that addresses the "Infinite Scaling Assumption" by evaluating how online egocentric models perform under strict energy budgets (e.g., 20 mW).
Key Finding: Exhaustive processing heavily drains batteries, whereas greedy and learned policies for framerate reduction provide a more sustainable approach to continuous sensing.
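To make the idea of a greedy frame-skipping policy concrete, here is a minimal sketch. It is not the policy evaluated in Ego-METAS. The function name, the energy model (a fixed cost per processed frame), and every number except the 20 mW budget are assumptions chosen for illustration.

```python
def greedy_frame_schedule(num_frames, fps, budget_mw, energy_per_frame_mj):
    """Return indices of frames processed under an average power budget.

    Hypothetical model: the budget accrues as banked energy during each
    frame interval (mW * s = mJ). A frame is processed as soon as the bank
    can pay for it; otherwise it is skipped.
    """
    interval_s = 1.0 / fps
    banked_mj = 0.0
    processed = []
    for i in range(num_frames):
        banked_mj += budget_mw * interval_s
        if banked_mj >= energy_per_frame_mj:
            banked_mj -= energy_per_frame_mj
            processed.append(i)
    return processed


# Illustrative numbers only: 4 fps camera, 20 mW budget, 15 mJ per processed frame.
schedule = greedy_frame_schedule(num_frames=9, fps=4, budget_mw=20.0,
                                 energy_per_frame_mj=15.0)
effective_fps = 4 * len(schedule) / 9
```

Under these assumed numbers the policy processes every third frame, so the effective framerate settles at the budget divided by the per-frame cost. A learned policy would replace the fixed rule with a decision that depends on the content of the stream.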
Cognitive Economy
EgoStream: Streaming Episodic Memory in Egocentric Vision
Focus: A diagnostic benchmark for assessing episodic memory in AI across various temporal recall regimes, from instant (0 s) to ultra-long (>8 h).
Overview: We explore memory management strategies such as pruning, merging, and offloading that keep a continuous video stream within a fixed memory budget.
Key Finding: Pruning-based memory management generally outperforms merging and offloading, though offloading recovers performance in ultra-long recall scenarios.
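As a rough illustration of pruning under a fixed memory budget, the sketch below keeps at most `capacity` entries and evicts the lowest-scoring one whenever the budget is exceeded. The `Entry` and `PrunedMemory` names and the scalar importance score are assumptions for this example, not the strategies benchmarked in EgoStream.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    timestamp: float   # seconds since the start of the stream
    payload: str       # stand-in for a frame feature or caption
    score: float       # hypothetical importance score


class PrunedMemory:
    """Fixed-capacity memory that prunes the least important entry."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = []

    def add(self, entry):
        self.entries.append(entry)
        if len(self.entries) > self.capacity:
            # Evict the entry with the lowest score (ties: the oldest one).
            victim = min(range(len(self.entries)),
                         key=lambda i: self.entries[i].score)
            self.entries.pop(victim)


memory = PrunedMemory(capacity=3)
for t, s in enumerate([0.5, 0.1, 0.9, 0.3, 0.7]):
    memory.add(Entry(timestamp=float(t), payload=f"frame-{t}", score=s))
kept = [e.timestamp for e in memory.entries]
```

A merging strategy would instead combine two neighbouring entries into one, and an offloading strategy would move the evicted entry to external storage rather than discard it.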
Knowledge Grounding
CVPR 2026 Highlight · Top 14%
ViterbiPlanNet: Injecting Procedural Knowledge for Video Planning
Focus: Using procedural graphs as cognitive support to inject knowledge into the learning process via a Differentiable Viterbi Layer.
Overview: This method enables cognitive offloading: base models rely on structured graphs rather than memorizing full procedures.
Key Finding: Achieves state-of-the-art success rates with an order of magnitude fewer parameters (e.g., 5.58M) and demonstrates strong sample efficiency and cross-horizon generalization.
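For readers unfamiliar with Viterbi decoding, the sketch below shows the classical, non-differentiable algorithm constrained by a procedural graph. It is not the Differentiable Viterbi Layer used in ViterbiPlanNet. The three-step procedure, the per-step scores, and the `viterbi_decode` helper are assumptions made up for the example.

```python
import math

NEG_INF = float("-inf")


def viterbi_decode(log_emissions, log_transitions, log_initial):
    """Return the highest-scoring state path (classical Viterbi, log domain)."""
    num_states = len(log_initial)
    score = [log_initial[s] + log_emissions[0][s] for s in range(num_states)]
    backpointers = []
    for t in range(1, len(log_emissions)):
        new_score, ptr = [], []
        for s in range(num_states):
            best_prev = max(range(num_states),
                            key=lambda p: score[p] + log_transitions[p][s])
            new_score.append(score[best_prev] + log_transitions[best_prev][s]
                             + log_emissions[t][s])
            ptr.append(best_prev)
        score = new_score
        backpointers.append(ptr)
    path = [max(range(num_states), key=lambda s: score[s])]
    for ptr in reversed(backpointers):
        path.append(ptr[path[-1]])
    path.reverse()
    return path


# Hypothetical procedure graph: step 0 -> step 1 -> step 2, nothing else allowed.
edges = {(0, 1), (1, 2)}
log_transitions = [[0.0 if (p, s) in edges else NEG_INF for s in range(3)]
                   for p in range(3)]
log_initial = [0.0, NEG_INF, NEG_INF]

# Made-up per-step model scores whose frame-wise argmax is the invalid order [0, 2, 1].
probs = [[0.8, 0.1, 0.1], [0.1, 0.3, 0.6], [0.1, 0.6, 0.3]]
log_emissions = [[math.log(p) for p in row] for row in probs]

plan = viterbi_decode(log_emissions, log_transitions, log_initial)
```

In this toy example the graph overrides the model's noisy per-step preference and yields a plan that follows the allowed transitions. That is the sense in which the procedure is looked up in the graph rather than memorized by the model.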
Knowledge Grounding
RECIPE: Procedural Planning via Grounding in Instructional Video
Focus: A dual-input configuration utilizing specialized procedural models to plan tasks from visual and textual histories.
Overview: The framework explores grounding as verification, utilizing noisy video corpora to build resilience against weak supervision and procedural diversity.
Key Finding: Meaningfully improves macro accuracy over supervised fine-tuning, especially in zero-shot environments.