Search for: All records

Award ID contains: 2013451



CGBA: Curvature-aware Geometric Black-box Attack

Reza, Md Farhamdur; Rahmati, Ali; Wu, Tianfu; Dai, Huaiyu (October 2023, International Conference on Computer Vision (ICCV))

Decision-based black-box attacks often necessitate a large number of queries to craft an adversarial example. Moreover, decision-based attacks based on querying boundary points in the estimated normal vector direction often suffer from inefficiency and convergence issues. In this paper, we propose a novel query-efficient curvature-aware geometric decision-based black-box attack (CGBA) that conducts boundary search along a semicircular path on a restricted 2D plane to ensure finding a boundary point successfully irrespective of the boundary curvature. While the proposed CGBA attack can work effectively for an arbitrary decision boundary, it is particularly efficient in exploiting the low curvature to craft high-quality adversarial examples, which is widely seen and experimentally verified in commonly used classifiers under non-targeted attacks. In contrast, the decision boundaries often exhibit higher curvature under targeted attacks. Thus, we develop a new query-efficient variant, CGBA-H, that is adapted for the targeted attack. In addition, we further design an algorithm to obtain a better initial boundary point at the expense of some extra queries, which considerably enhances the performance of the targeted attack. Extensive experiments are conducted to evaluate the performance of our proposed methods against some well-known classifiers on the ImageNet and CIFAR10 datasets, demonstrating the superiority of CGBA and CGBA-H over state-of-the-art non-targeted and targeted attacks, respectively.
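The semicircular boundary search mentioned in this abstract can be illustrated with a toy sketch. Everything below is an assumption made for illustration, not the authors' implementation: the helper name `semicircle_boundary_search`, the choice of a semicircle whose diameter joins the source point and a known boundary point, the binary search over the arc angle (which presumes a single boundary crossing along the arc), and the toy linear classifier.

```python
import math


def semicircle_boundary_search(x_src, x_bnd, normal, is_adversarial, steps=30):
    """Binary-search a semicircular arc for a boundary point (hypothetical helper).

    The arc has the segment x_src -> x_bnd as its diameter and lies in the 2D
    plane spanned by (x_bnd - x_src) and the estimated boundary normal.
    Assumes the decision flips exactly once along the arc.
    """
    diff = [b - s for s, b in zip(x_src, x_bnd)]
    length = math.sqrt(sum(d * d for d in diff))
    u = [d / length for d in diff]
    # Gram-Schmidt: keep only the part of the normal orthogonal to u.
    proj = sum(n * ui for n, ui in zip(normal, u))
    v = [n - proj * ui for n, ui in zip(normal, u)]
    v_norm = math.sqrt(sum(c * c for c in v))
    if v_norm == 0:
        raise ValueError("normal is parallel to x_bnd - x_src")
    v = [c / v_norm for c in v]
    center = [(s + b) / 2 for s, b in zip(x_src, x_bnd)]
    radius = length / 2

    def point(theta):
        # theta = 0 gives x_bnd, theta = pi gives x_src.
        return [c + radius * (math.cos(theta) * ui + math.sin(theta) * vi)
                for c, ui, vi in zip(center, u, v)]

    lo, hi = 0.0, math.pi  # lo stays adversarial, hi stays non-adversarial
    queries = 0
    for _ in range(steps):
        mid = (lo + hi) / 2
        queries += 1  # one decision query per step
        if is_adversarial(point(mid)):
            lo = mid
        else:
            hi = mid
    return point(lo), queries


# Toy classifier (assumption): a point is adversarial when x + y >= 2.
adv_point, n_queries = semicircle_boundary_search(
    [0.0, 0.0], [2.0, 0.0], [1.0, 1.0], lambda p: p[0] + p[1] >= 2.0
)
print(adv_point, n_queries)
```

In this flat-boundary toy case the search ends near (1, 1), which is closer to the source than the starting boundary point (2, 0), consistent with the abstract's remark that low curvature is where the approach is most efficient.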
Full Text Available
Monocular 3D Object Detection with Bounding Box Denoising in 3D by Perceiver

Liu, Xianpeng; Zheng, Ce; Cheng, Kelvin; Xue, Nan; Qi, Guo-Jun; Wu, Tianfu (October 2023, The Computer Vision Foundation (CVF))

The main challenge of monocular 3D object detection is the accurate localization of the 3D center. Motivated by a new and strong observation that this challenge can be remedied by a 3D-space local-grid search scheme in an ideal case, we propose a stage-wise approach, which combines the information flow from 2D-to-3D (3D bounding box proposal generation with a single 2D image) and 3D-to-2D (proposal verification by denoising with 3D-to-2D contexts) in a top-down manner. Specifically, we first obtain initial proposals from off-the-shelf backbone monocular 3D detectors. Then, we generate a 3D anchor space by local-grid sampling from the initial proposals. Finally, we perform 3D bounding box denoising at the 3D-to-2D proposal verification stage. To effectively learn discriminative features for denoising highly overlapped proposals, this paper presents a method of using the Perceiver I/O model [20] to fuse the 3D-to-2D geometric information and the 2D appearance information. With the encoded latent representation of a proposal, the verification head is implemented with a self-attention module. Our method, named MonoXiver, is generic and can be easily adapted to any backbone monocular 3D detector. Experimental results on the well-established KITTI dataset and the challenging large-scale Waymo dataset show that MonoXiver consistently achieves improvement with limited computation overhead.
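The local-grid sampling step in this abstract can be pictured with a short sketch. The function name `local_grid_anchors`, the step size, and the grid radius are illustrative assumptions; the abstract does not state the actual grid parameters.

```python
from itertools import product


def local_grid_anchors(center, step=0.5, radius=1):
    """Return candidate 3D centers on a cubic grid around an initial proposal.

    `step` (grid spacing) and `radius` (cells per side of the center) are
    hypothetical values chosen for illustration only.
    """
    cx, cy, cz = center
    offsets = range(-radius, radius + 1)
    return [
        (cx + i * step, cy + j * step, cz + k * step)
        for i, j, k in product(offsets, repeat=3)
    ]


# One initial proposal center (x, y, z), expanded into a 3 x 3 x 3 anchor grid.
anchors = local_grid_anchors((1.0, 1.5, 20.0))
print(len(anchors))
```

Each anchor would then be scored by a verification stage; that stage is not sketched here.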
Full Text Available
Learning data science methods through a mobile device and full body motion data

https://doi.org/10.1177/21695067231200871

Jung, SeHee; Wang, Hanwen; Su, Bingyi; Lu, Lu; Qing, Liwei; Xu, Xu (September 2023, Proceedings of the Human Factors and Ergonomics Society Annual Meeting)

This study presents a mobile app that helps undergraduate students learn data science through their own full body motions. Leveraging the built-in camera of a mobile device, the proposed app captures the user and feeds their images into an open-source computer-vision algorithm that localizes the key joint points of the human body. As students can participate in the entire data collection process, the obtained motion data is context-rich and personally relevant to them. The app utilizes the collected motion data to explain various concepts and methods in data science in the context of human movements. The app also visualizes the geometric interpretation of data through various visual aids, such as interactive graphs and figures. In this study, we use principal component analysis, a commonly used dimensionality reduction method, as an example to demonstrate the proposed learning framework. Strategies to encompass other learning modules are also discussed for further improvement.
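As a rough illustration of principal component analysis on motion data, the sketch below computes the first principal axis of a 2D joint trajectory in closed form. The function name `pca_2d` and the sample wrist positions are assumptions for illustration; the app's actual data covers many joints and is not reproduced here.

```python
import math


def pca_2d(points):
    """First principal axis of 2D points via the 2x2 sample covariance matrix."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points) / (n - 1)
    syy = sum((y - mean_y) ** 2 for _, y in points) / (n - 1)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points) / (n - 1)
    # Eigenvalues of [[sxx, sxy], [sxy, syy]] in closed form.
    half_trace = (sxx + syy) / 2
    spread = math.hypot((sxx - syy) / 2, sxy)
    lam1, lam2 = half_trace + spread, half_trace - spread
    # Direction of the eigenvector that belongs to the larger eigenvalue.
    theta = 0.5 * math.atan2(2 * sxy, sxx - syy)
    axis = (math.cos(theta), math.sin(theta))
    total = lam1 + lam2
    explained = lam1 / total if total > 0 else 0.0
    # Project the centered points onto the principal axis (1D scores).
    scores = [(x - mean_x) * axis[0] + (y - mean_y) * axis[1] for x, y in points]
    return axis, explained, scores


# Hypothetical wrist positions (x, y) over four video frames.
wrist = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0), (3.0, 3.0)]
axis, explained, scores = pca_2d(wrist)
print(axis, explained)
```

Because these sample points lie on a straight diagonal line, a single component captures all of the variance, which is the geometric picture PCA is meant to convey.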
Full Text Available
PaCa-ViT: Learning Patch-to-Cluster Attention in Vision Transformers

https://doi.org/10.1109/CVPR52729.2023.01781

Grainger, Ryan; Paniagua, Thomas; Song, Xi; Cuntoor, Naresh; Lee, Mun Wai; Wu, Tianfu (June 2023, IEEE)

Vision Transformers (ViTs) are built on the assumption of treating image patches as “visual tokens” and learn patch-to-patch attention. The patch embedding based tokenizer has a semantic gap with respect to its counterpart, the textual tokenizer. The patch-to-patch attention suffers from the quadratic complexity issue, and also makes it non-trivial to explain learned ViTs. To address these issues in ViT, this paper proposes to learn Patch-to-Cluster attention (PaCa) in ViT. Queries in our PaCa-ViT start with patches, while keys and values are directly based on clustering (with a predefined small number of clusters). The clusters are learned end-to-end, leading to better tokenizers and inducing joint clustering-for-attention and attention-for-clustering for better and interpretable models. The quadratic complexity is relaxed to linear complexity. The proposed PaCa module is used in designing efficient and interpretable ViT backbones and semantic segmentation head networks. In experiments, the proposed methods are tested on ImageNet-1k image classification, MS-COCO object detection and instance segmentation, and MIT-ADE20k semantic segmentation. Compared with the prior art, it obtains better performance on all three benchmarks than Swin [32] and the PVTs [47], [48], by significant margins on ImageNet-1k and MIT-ADE20k. It is also significantly more efficient than PVT models on MS-COCO and MIT-ADE20k due to the linear complexity. The learned clusters are semantically meaningful. Code and model checkpoints are available at https://github.com/iVMCL/PaCaViT.
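The patch-to-cluster idea can be sketched in a few lines of plain Python. This is a simplified illustration under stated assumptions, not the released PaCa-ViT code: learned query, key, and value projections are omitted, `cluster_logits` stands in for the output of a learned clustering module, and the function names are hypothetical. The point is only that the attention matrix has N x M entries (N patches, M clusters) rather than N x N.

```python
import math


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def softmax_rows(m):
    """Numerically stable softmax applied to each row."""
    out = []
    for row in m:
        top = max(row)
        exps = [math.exp(v - top) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def patch_to_cluster_attention(patches, cluster_logits):
    """patches: N x D features; cluster_logits: N x M scores (assumed given)."""
    dim = len(patches[0])
    # Soft assignment: each cluster is a weighted average of all patches.
    assign = softmax_rows(transpose(cluster_logits))  # M x N
    clusters = matmul(assign, patches)  # M x D, used as keys and values
    # Queries are the patches themselves; scaled dot-product scores are N x M.
    scores = [[v / math.sqrt(dim) for v in row]
              for row in matmul(patches, transpose(clusters))]
    attn = softmax_rows(scores)  # N x M
    return matmul(attn, clusters), attn  # output is N x D


patches = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]  # N = 4, D = 2
logits = [[2.0, 0.0], [0.0, 2.0], [1.0, 1.0], [0.5, 0.5]]  # M = 2 clusters
out, attn = patch_to_cluster_attention(patches, logits)
print(len(attn), len(attn[0]))
```

With M fixed and small, the cost of the attention step grows linearly in the number of patches, which is the complexity argument made in the abstract.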
Full Text Available
Level-S²fM: Structure from Motion on Neural Level Set of Implicit Surfaces

https://doi.org/10.1109/CVPR52729.2023.01650

Xiao, Yuxi; Xue, Nan; Wu, Tianfu; Xia, Gui-Song (June 2023, IEEE)

This paper presents a neural incremental Structure-from-Motion (SfM) approach, Level-S²fM, which estimates the camera poses and scene geometry from a set of uncalibrated images by learning coordinate MLPs for the implicit surfaces and the radiance fields from the established key-point correspondences. Our novel formulation poses some new challenges due to inevitable two-view and few-view configurations in the incremental SfM pipeline, which complicate the optimization of coordinate MLPs for volumetric neural rendering with unknown camera poses. Nevertheless, we demonstrate that the strong inductive basis conveyed in the 2D correspondences is promising for tackling those challenges by exploiting the relationship between the ray sampling schemes. Based on this, we revisit the pipeline of incremental SfM and renew the key components, including two-view geometry initialization, camera pose registration, 3D point triangulation, and Bundle Adjustment, with a fresh perspective based on neural implicit surfaces. By unifying the scene geometry in small MLP networks through coordinate MLPs, our Level-S²fM treats the zero-level set of the implicit surface as an informative top-down regularization to manage the reconstructed 3D points, reject the outliers in correspondences via querying SDF, and refine the estimated geometries by NBA (Neural BA). Not only does our Level-S²fM lead to promising results on camera pose estimation and scene geometry reconstruction, but it also shows a promising way for neural implicit rendering without knowing camera extrinsics beforehand.
Full Text Available
Holistically-Attracted Wireframe Parsing: From Supervised to Self-Supervised Learning

https://doi.org/10.1109/TPAMI.2023.3312749

Xue, Nan; Wu, Tianfu; Bai, Song; Wang, Fu-Dong; Xia, Gui-Song; Zhang, Liangpei; Torr, Philip H.S. (January 2023, IEEE Transactions on Pattern Analysis and Machine Intelligence)

This article presents Holistically-Attracted Wireframe Parsing (HAWP), a method for geometric analysis of 2D images containing wireframes formed by line segments and junctions. HAWP utilizes a parsimonious Holistic Attraction (HAT) field representation that encodes line segments using a closed-form 4D geometric vector field. The proposed HAWP consists of three sequential components empowered by end-to-end and HAT-driven designs: (1) generating a dense set of line segments from HAT fields and endpoint proposals from heatmaps, (2) binding the dense line segments to sparse endpoint proposals to produce initial wireframes, and (3) filtering false positive proposals through a novel endpoint-decoupled line-of-interest aligning (EPD LOIAlign) module that captures the co-occurrence between endpoint proposals and HAT fields for better verification. Thanks to our novel designs, HAWPv2 shows strong performance in fully supervised learning, while HAWPv3 excels in self-supervised learning, achieving superior repeatability scores and efficient training (24 GPU hours on a single GPU). Furthermore, HAWPv3 exhibits a promising potential for wireframe parsing in out-of-distribution images without providing ground truth labels of wireframes.
Full Text Available
NOPE-SAC: Neural One-Plane RANSAC for Sparse-View Planar 3D Reconstruction

https://doi.org/10.1109/TPAMI.2023.3314745

Tan, Bin; Xue, Nan; Wu, Tianfu; Xia, Gui-Song (January 2023, IEEE Transactions on Pattern Analysis and Machine Intelligence)

This paper studies the challenging two-view 3D reconstruction problem in a rigorous sparse-view configuration, which suffers from insufficient correspondences in the input image pairs for camera pose estimation. We present a novel Neural One-PlanE RANSAC framework (NOPE-SAC for short) that exploits the capability of neural networks to learn one-plane pose hypotheses from 3D plane correspondences. Building on top of a Siamese network for plane detection, our NOPE-SAC first generates putative plane correspondences with a coarse initial pose. It then feeds the learned 3D plane correspondences into shared MLPs to estimate the one-plane camera pose hypotheses, which are subsequently reweighted in a RANSAC manner to obtain the final camera pose. Because the neural one-plane pose minimizes the number of plane correspondences needed for adaptive pose hypothesis generation, it enables stable pose voting and reliable pose refinement with only a few plane correspondences for the sparse-view inputs. In the experiments, we demonstrate that our NOPE-SAC significantly improves the camera pose estimation for two-view inputs with severe viewpoint changes, setting several new state-of-the-art performances on two challenging benchmarks, i.e., MatterPort3D and ScanNet, for sparse-view 3D reconstruction. The source code is released at https://github.com/IceTTTb/NopeSAC for reproducible research.
Full Text Available
Revisiting Non-Parametric Matching Cost Volumes for Robust and Generalizable Stereo Matching

Cheng, Kelvin; Wu, Tianfu; Healey, Christopher G. (December 2022, Advances in Neural Information Processing Systems)

The integration of DNN-contextualized binary-pattern-driven non-parametric cost volume and DNN cost aggregation leads to more robust and more generalizable stereo matching.
Abstract: Stereo matching is a classic challenging problem in computer vision, which has recently witnessed remarkable progress by Deep Neural Networks (DNNs). This paradigm shift leads to two interesting and entangled questions that have not been addressed well. First, it is unclear whether stereo matching DNNs that are trained from scratch really learn to perform matching well. This paper studies this problem through the lens of white-box adversarial attacks. It presents a method of learning stereo-constrained photometrically-consistent attacks, which by design are weaker adversarial attacks, and yet can cause catastrophic performance drop for those DNNs. This observation suggests that they may not actually learn to perform matching well, in the sense that they should otherwise achieve potentially even better performance after stereo-constrained perturbations are introduced. Second, stereo matching DNNs are typically trained under the simulation-to-real (Sim2Real) pipeline due to the data hungriness of DNNs. Thus, alleviating the impacts of the Sim2Real photometric gap in stereo matching DNNs becomes a pressing need. Towards joint adversarially robust and domain generalizable stereo matching, this paper proposes to learn DNN-contextualized binary-pattern-driven non-parametric cost-volumes. It leverages the perspective of learning the cost aggregation via DNNs, and presents a simple yet expressive design that is fully end-to-end trainable, without resorting to specific aggregation inductive biases. In experiments, the proposed method is tested on the SceneFlow dataset, the KITTI2015 dataset, and the Middlebury dataset. It significantly improves the adversarial robustness, while retaining accuracy performance comparable to state-of-the-art methods. It also shows better Sim2Real generalizability.
Our code and pretrained models are released at https://github.com/kelkelcheng/AdversariallyRobustStereo.
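To give a feel for what a binary-pattern-driven non-parametric cost volume can look like, the sketch below uses the classic census transform with Hamming distance. This is a stand-in chosen for illustration: the abstract does not say which binary pattern the paper uses, and the DNN contextualization and learned aggregation are not modeled. The function names and the tiny synthetic images are assumptions.

```python
def census(img, y, x):
    """8-bit census code of the 3x3 window at (y, x); borders are clamped."""
    h, w = len(img), len(img[0])
    center = img[y][x]
    bits = 0
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            if dy == 0 and dx == 0:
                continue
            ny = min(max(y + dy, 0), h - 1)
            nx = min(max(x + dx, 0), w - 1)
            # Bit is 1 when the neighbor is darker than the center pixel.
            bits = (bits << 1) | (1 if img[ny][nx] < center else 0)
    return bits


def cost_volume(left, right, max_disp):
    """Hamming-distance cost for each disparity d, indexed as [d][y][x]."""
    h, w = len(left), len(left[0])
    cl = [[census(left, y, x) for x in range(w)] for y in range(h)]
    cr = [[census(right, y, x) for x in range(w)] for y in range(h)]
    volume = []
    for d in range(max_disp + 1):
        plane = []
        for y in range(h):
            row = []
            for x in range(w):
                if x - d < 0:
                    row.append(8)  # no valid match: assign the maximum cost
                else:
                    row.append(bin(cl[y][x] ^ cr[y][x - d]).count("1"))
            plane.append(row)
        volume.append(plane)
    return volume


# Synthetic pair: the right view is the left view shifted by one pixel.
left = [[(7 * x + 13 * y + 3 * x * y) % 31 for x in range(8)] for y in range(5)]
right = [[left[y][min(x + 1, 7)] for x in range(8)] for y in range(5)]
volume = cost_volume(left, right, max_disp=2)
print(volume[1][2][4])
```

Because the census code depends only on intensity ordering within a window, adding a constant brightness offset to one view leaves the costs unchanged, which hints at why binary patterns can help with photometric gaps.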
Full Text Available
A mobile platform-based app to assist undergraduate learning of human kinematics in biomechanics courses

https://doi.org/10.1016/j.jbiomech.2022.111243

Wang, Hanwen; Xie, Ziyang; Lu, Lu; Su, Bingyi; Jung, Sehee; Xu, Xu (September 2022, Journal of Biomechanics)

Full Text Available
A mobile platform app to assist learning human kinematics in undergraduate biomechanics courses

https://doi.org/10.1177/1071181322661058

Wang, Hanwen; Lu, Lu; Xie, Ziyang; Su, Bingyi; Xu, Xu (September 2022, Proceedings of the Human Factors and Ergonomics Society Annual Meeting)

Biomechanics examines different physical characteristics of human body movement by applying principles of Newtonian mechanics to physical activities. Therefore, undergraduate biomechanics courses are highly demanding in mathematics and physics. While the inclusion of laboratory experiences can augment student comprehension of biomechanics concepts, the cost and the required expertise associated with motion tracking systems can be a barrier to offering laboratory sessions. In this study, we developed a mobile platform app to facilitate learning human kinematics in biomechanics courses. An optimized computer-vision model based on the convolutional pose machine (CPM), MobileNet V2, and the TensorFlow Lite framework is first adopted to reconstruct human pose. A real-time human kinematics analysis then allows students to conduct human motion experiments. The proposed app can serve as a potential instructional tool in biomechanics courses.
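A kinematics analysis on reconstructed pose keypoints could, for example, compute joint angles and their rate of change. The sketch below is a minimal illustration under assumptions: the function names, the 2D keypoint coordinates, and the frame rate are hypothetical and are not taken from the app.

```python
import math


def joint_angle(a, b, c):
    """Angle in degrees at keypoint b formed by the segments b->a and b->c."""
    v1 = (a[0] - b[0], a[1] - b[1])
    v2 = (c[0] - b[0], c[1] - b[1])
    n1 = math.hypot(*v1)
    n2 = math.hypot(*v2)
    if n1 == 0 or n2 == 0:
        raise ValueError("keypoints must be distinct")
    cosine = (v1[0] * v2[0] + v1[1] * v2[1]) / (n1 * n2)
    # Clamp to guard against rounding slightly outside [-1, 1].
    cosine = max(-1.0, min(1.0, cosine))
    return math.degrees(math.acos(cosine))


def angular_velocity(angles_deg, fps):
    """Finite-difference angular velocity in degrees per second."""
    return [(later - earlier) * fps for earlier, later in zip(angles_deg, angles_deg[1:])]


# Hypothetical shoulder, elbow, and wrist keypoints from one frame.
elbow_angle = joint_angle((0.0, 1.0), (0.0, 0.0), (1.0, 0.0))
print(elbow_angle)
print(angular_velocity([90.0, 100.0, 120.0], fps=30))
```

The same two helpers extend to other joints by swapping in the relevant triplet of keypoints.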
Full Text Available
