NSF PAR Search | NSF Public Access Repository


Self-supervised Feature Learning by Cross-modality and Cross-view Correspondences

Jing, L; Zhang, L; Tian, Y (June 2021, The 4th Multimodal Learning and Applications (MULA) Workshop in conjunction with CVPR 2021)
The success of supervised learning depends on large-scale ground-truth labels, which are expensive and time-consuming to produce and may require special skills to annotate. To address this issue, many self-supervised and unsupervised methods have been developed. Unlike most existing self-supervised methods, which learn only 2D image features or only 3D point cloud features, this paper presents a novel and effective self-supervised learning approach that jointly learns both 2D image features and 3D point cloud features by exploiting cross-modality and cross-view correspondences, without using any human-annotated labels. Specifically, 2D image features of images rendered from different views are extracted by a 2D convolutional neural network, and 3D point cloud features are extracted by a graph convolution neural network. The two types of features are fed into a two-layer fully connected neural network that estimates the cross-modality correspondence. The three networks are jointly trained by verifying whether two data samples of different modalities belong to the same object (the cross-modality task). Meanwhile, the 2D convolutional neural network is additionally optimized by minimizing the intra-object distance and maximizing the inter-object distance of images rendered from different views (the cross-view task). The effectiveness of the learned 2D and 3D features is evaluated by transferring them to five different tasks: multi-view 2D shape recognition, 3D shape recognition, multi-view 2D shape retrieval, 3D shape retrieval, and 3D part segmentation. Extensive evaluations on all five tasks across different datasets demonstrate the strong generalization and effectiveness of the 2D and 3D features learned by the proposed self-supervised method.
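The two training signals described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the hinge (triplet) form of the cross-view objective, the `margin` value, and the function names `cross_view_triplet_loss` and `sample_cross_modality_pairs` are assumptions chosen for illustration, and plain Python lists stand in for network features.

```python
import math
import random

def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cross_view_triplet_loss(anchor, positive, negative, margin=1.0):
    """Assumed hinge form of the cross-view objective.

    anchor and positive are features of two views of the same object;
    negative is a view of a different object. The loss is zero once the
    inter-object distance exceeds the intra-object distance by `margin`.
    """
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)

def sample_cross_modality_pairs(num_objects, rng):
    """Build (image_object, point_cloud_object, label) training pairs.

    label is 1 when both samples come from the same object and 0 otherwise.
    One matching and one mismatching pair per object is an assumption.
    """
    pairs = []
    for obj in range(num_objects):
        pairs.append((obj, obj, 1))
        other = rng.choice([o for o in range(num_objects) if o != obj])
        pairs.append((obj, other, 0))
    return pairs

pairs = sample_cross_modality_pairs(4, random.Random(0))
```

In a real pipeline the pair labels would supervise a binary classifier on top of the concatenated 2D and 3D features, which is the role the abstract assigns to the two-layer fully connected network.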
Full Text Available
Cross-Modal Center Loss for 3D Cross-Modal Retrieval

Jing, L; Vahdani, E; Tan, J; Tian, Y. (June 2021, IEEE Conference on Computer Vision and Pattern Recognition (CVPR))
Cross-modal retrieval aims to learn discriminative and modal-invariant features for data from different modalities. Unlike existing methods, which usually learn from features extracted by offline networks, in this paper we propose an approach that jointly trains the components of the cross-modal retrieval framework with metadata and enables the network to find optimal features. The proposed end-to-end framework is trained with three loss functions: 1) a novel cross-modal center loss to eliminate cross-modal discrepancy, 2) a cross-entropy loss to maximize inter-class variations, and 3) a mean-square-error loss to reduce modality variations. In particular, our proposed cross-modal center loss minimizes the distances between features of objects belonging to the same class across all modalities. Extensive experiments have been conducted on retrieval tasks across multiple modalities, including 2D images, 3D point clouds, and mesh data. The proposed framework significantly outperforms state-of-the-art methods for both cross-modal and in-domain retrieval of 3D objects on the ModelNet10 and ModelNet40 datasets.
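The idea behind the cross-modal center loss can be sketched as follows. This is an illustrative reading of the abstract, not the paper's code: one center per class shared by all modalities, the squared Euclidean distance, the 1/2 scaling, and the name `cross_modal_center_loss` are all assumptions.

```python
def cross_modal_center_loss(features_by_modality, labels, centers):
    """Mean squared distance of every feature to its class center.

    features_by_modality: dict mapping a modality name to a list of feature
        vectors, one per object, in the same object order for each modality.
    labels: class label of each object.
    centers: dict mapping a class label to a single center vector that is
        shared by all modalities (assumed design).
    """
    total = 0.0
    count = 0
    for feats in features_by_modality.values():
        for feat, label in zip(feats, labels):
            center = centers[label]
            total += sum((f - c) ** 2 for f, c in zip(feat, center))
            count += 1
    return 0.5 * total / count

# Toy example: two objects, two modalities, 2-D features (made-up values).
features = {
    "image": [[1.0, 0.0], [0.0, 1.0]],
    "point_cloud": [[1.0, 0.0], [0.0, 3.0]],
}
labels = [0, 1]
centers = {0: [1.0, 0.0], 1: [0.0, 1.0]}
loss = cross_modal_center_loss(features, labels, centers)
```

Because the center of a class is shared across modalities, lowering this value pulls image, point cloud, and mesh features of the same class toward a common point, which matches the stated goal of removing cross-modal discrepancy.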
Full Text Available
Medical Image Tampering Detection: a New Dataset and Baseline

Reichman, B; Jing, L; Akin, O; Tian, Y. (December 2020, International Workshop on Artificial Intelligence for Healthcare Applications (AIHA), 2020.)
Recent advances in algorithmic photo-editing and the vulnerability of hospitals to cyberattacks raise concerns about the tampering of medical images. This paper introduces a new large-scale dataset of tampered Computed Tomography (CT) scans generated by different methods, the LuNoTim-CT dataset, which can serve as the most comprehensive testbed for comparative studies of data security in healthcare. We further propose a deep learning-based framework, ConnectionNet, to automatically detect whether a medical image has been tampered with. The proposed ConnectionNet handles small tampered regions, achieves promising results, and can serve as a baseline for studies of medical image tampering detection.
Full Text Available
VideoSSL: Semi-Supervised Learning for Video Classification

Jing, L; Parag, T; Wu, Z; Tian, Y; Wang, H. (January 2021, Winter Conference on Applications of Computer Vision (WACV), 2021.)
We propose a semi-supervised learning approach for video classification, VideoSSL, using convolutional neural networks (CNNs). As in other computer vision tasks, existing supervised video classification methods demand a large amount of labeled data to attain good performance. However, annotating a large dataset is expensive and time-consuming. To minimize the dependence on a large annotated dataset, our proposed semi-supervised method trains on a small number of labeled examples and exploits two regulatory signals from unlabeled data. The first signal is the pseudo-labels of unlabeled examples, computed from the confidences of the CNN being trained. The other is the normalized probabilities predicted by an image classifier CNN, which capture information about the appearance of the objects of interest in the video. We show that, under the supervision of these guiding signals from unlabeled examples, a video classification CNN can achieve impressive performance using a small fraction of the annotated examples on three publicly available datasets: UCF101, HMDB51, and Kinetics.
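The two signals from unlabeled data can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the VideoSSL implementation: the confidence `threshold`, the use of a soft-target cross-entropy for the image classifier signal, and the function names are hypothetical.

```python
import math

def softmax(logits):
    """Turn raw scores into normalized probabilities."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

def pseudo_label(probs, threshold=0.9):
    """Return the predicted class if the model is confident enough, else None.

    The fixed threshold is an assumption; the abstract only says that
    pseudo-labels are computed from the confidences of the CNN being trained.
    """
    best = max(range(len(probs)), key=lambda i: probs[i])
    return best if probs[best] >= threshold else None

def soft_cross_entropy(target_probs, predicted_probs, eps=1e-12):
    """Cross-entropy between soft targets (for example, image classifier
    probabilities) and the video model's predicted probabilities."""
    return -sum(t * math.log(p + eps) for t, p in zip(target_probs, predicted_probs))

confident = pseudo_label([0.05, 0.92, 0.03])   # confident prediction
uncertain = pseudo_label([0.40, 0.35, 0.25])   # below the threshold
```

Unlabeled clips whose pseudo-label is `None` would simply be skipped by the pseudo-label term in this sketch, while the soft-target term can still use the image classifier's probabilities for every clip.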
Full Text Available
