<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/"
         xmlns:dcq="http://purl.org/dc/terms/">
  <records count="1" morepages="false" start="1" end="1">
    <record rownumber="1">
      <dc:product_type>Conference Proceeding</dc:product_type>
      <dc:title>Which Viewpoint Shows it Best? Language for Weakly Supervising View Selection in Multi-view Instructional Videos</dc:title>
      <dc:creator>Majumder, S; Nagarajan, T; Al-Halah, Z; Pradhan, R; Grauman, K</dc:creator>
      <dc:corporate_author/>
      <dc:editor/>
      <dc:description>Given a multi-view video, which viewpoint is most informative for a human observer? Existing methods rely on heuristics or expensive "best-view" supervision to answer this question, limiting their applicability. We propose a weakly supervised approach that leverages language accompanying an instructional multi-view video as a means to recover its most informative viewpoint(s). Our key hypothesis is that the more accurately an individual view can predict a view-agnostic text summary, the more informative it is. To put this into action, we propose LangView, a framework that uses the relative accuracy of view-dependent caption predictions as a proxy for best view pseudo-labels. Then, those pseudo-labels are used to train a view selector, together with an auxiliary camera pose predictor that enhances view-sensitivity. During inference, our model takes as input only a multi-view video--no language or camera poses--and returns the best viewpoint to watch at each timestep. On two challenging datasets comprised of diverse multi-camera setups and how-to activities, our model consistently outperforms state-of-the-art baselines, both with quantitative metrics and human evaluation.</dc:description>
      <dc:publisher>https://doi.org/10.48550/arXiv.2411.08753</dc:publisher>
      <dc:date>2025-04-10</dc:date>
      <dc:nsf_par_id>10631525</dc:nsf_par_id>
      <dc:journal_name/>
      <dc:journal_volume/>
      <dc:journal_issue/>
      <dc:page_range_or_elocation/>
      <dc:issn>2411.08753</dc:issn>
      <dc:isbn/>
      <dc:doi>https://doi.org/</dc:doi>
      <dcq:identifierAwardId>2505865</dcq:identifierAwardId>
      <dc:subject/>
      <dc:version_number/>
      <dc:location/>
      <dc:rights/>
      <dc:institution/>
      <dc:sponsoring_org>National Science Foundation</dc:sponsoring_org>
    </record>
  </records>
</rdf:RDF>