Data augmentation by incorporating cheap unlabeled data from multiple domains is a powerful way to improve prediction, especially when labeled data is limited. In this work, we investigate how adversarial robustness can be enhanced by leveraging out-of-domain unlabeled data. We demonstrate that for broad classes of distributions and classifiers, there exists a sample complexity gap between standard and robust classification. We quantify the extent to which this gap can be bridged by leveraging unlabeled samples from a shifted domain by providing both upper and lower bounds. Moreover, we show settings where we achieve better adversarial robustness when the unlabeled data come from a shifted domain rather than the same domain as the labeled data. We also investigate how to leverage out-of-domain data when some structural information, such as sparsity, is shared between labeled and unlabeled domains. Experimentally, we augment object recognition datasets (CIFAR-10, CINIC-10, and SVHN) with easy-to-obtain, unlabeled out-of-domain data and demonstrate substantial improvement in the model’s robustness against ℓ∞ adversarial attacks on the original domain.
Unresolved Issues: Prevalence, Persistence, and Perils of Lame Delegations
The modern Internet relies on the Domain Name System (DNS) to convert between human-readable domain names and IP addresses. However, the correct and efficient implementation of this function is jeopardized when the configuration data binding domains, nameservers, and glue records is faulty. In particular, lame delegations, which occur when a nameserver responsible for a domain is unable to provide authoritative information about it, introduce both performance and security risks. We perform a broad-based measurement study of lame delegations, using both longitudinal zone data and active querying. We show that lame delegations of various kinds are common (affecting roughly 14% of domains we queried), that they can significantly degrade lookup latency (when they do not lead to outright failure), and that they expose hundreds of thousands of domains to adversarial takeover. We also explore circumstances that give rise to this surprising prevalence of lame delegations, including unforeseen interactions between the operational procedures of registrars and registries.
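The definition of a lame delegation given above can be illustrated with a minimal sketch. It is not the measurement pipeline used in the study: the `NsProbe` record, its fields, and the labels "ok", "partly lame", and "fully lame" are hypothetical names introduced here, and the probe results are assumed to have been collected already (no DNS queries are sent).

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class NsProbe:
    """Hypothetical record of one query sent to one delegated nameserver."""
    address: Optional[str]  # resolved IP address, or None if the NS name did not resolve
    responded: bool         # whether the server answered the query at all
    authoritative: bool     # whether the answer carried the AA (authoritative answer) flag


def is_lame(probe: NsProbe) -> bool:
    """A nameserver is treated as lame if it cannot give an authoritative answer."""
    return probe.address is None or not probe.responded or not probe.authoritative


def classify_delegation(probes: Dict[str, NsProbe]) -> str:
    """Label a domain by how many of its delegated nameservers are lame.

    The three labels are illustrative, not terminology taken from the paper.
    """
    if not probes:
        raise ValueError("a delegation needs at least one nameserver")
    lame = [name for name, probe in probes.items() if is_lame(probe)]
    if not lame:
        return "ok"
    if len(lame) == len(probes):
        return "fully lame"
    return "partly lame"


# Example: one healthy nameserver and one whose name no longer resolves.
example = {
    "ns1.example.": NsProbe("192.0.2.1", True, True),
    "ns2.example.": NsProbe(None, False, False),
}
label = classify_delegation(example)
```

In this sketch a domain with at least one working nameserver still resolves, which matches the abstract's point that lame delegations can degrade latency without causing outright failure.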
- Publication Date:
- NSF-PAR ID:
- Journal Name: IMC '20: Proceedings of the ACM Internet Measurement Conference
- Page Range or eLocation-ID: 281 to 294
- Sponsoring Org: National Science Foundation
More Like this
We present >500 zircon δ18O and Lu-Hf isotope analyses on previously dated zircons to explore the interplay between spatial and temporal magmatic signals in Zealandia Cordillera. Our data cover ~8500 km² of middle and lower crust in the Median Batholith (Fiordland segment of Zealandia Cordillera), where Mesozoic arc magmatism along the paleo-Pacific margin of Gondwana was focused along an ~100 km wide, arc-parallel zone. Our data reveal three spatially distinct isotope domains that we term the eastern, central, and western isotope domains. These domains parallel the Mesozoic arc axis, and their boundaries are defined by major crustal-scale faults that were reactivated as ductile shear zones during the Early Cretaceous. The western isotope domain has homogeneous, mantle-like δ18O (Zrn) values of 5.8 ± 0.3‰ (2 SD) and initial εHf (Zrn) values of +4.2 ± 1.0 (2 SD). The eastern isotope domain is defined by isotopically low and homogeneous δ18O (Zrn) values of 3.9 ± 0.2‰ and initial εHf values of +7.8 ± 0.6. The central isotope domain is characterized by transitional isotope values that display a strong E-W gradient, with δ18O (Zrn) values rising from 4.6 to 5.9‰ and initial εHf values decreasing from +5.5 to +3.7. We find that …
Ultrasound B-mode images are created from data obtained from each element in the transducer array in a process called beamforming. The goal of beamforming is to enhance signals from specified spatial locations while reducing signal from all other locations. On clinical systems, beamforming is accomplished with the delay-and-sum (DAS) algorithm. DAS is efficient but fails in patients with high noise levels, so various adaptive beamformers have been proposed. Recently, deep learning methods have been developed for this task. With deep learning methods, beamforming is typically framed as a regression problem in which clean, ground-truth data is known and usually simulated. For in vivo data, however, it is extremely difficult to collect ground-truth information, and deep networks trained on simulated data underperform when applied to in vivo data because of the domain shift between simulated and in vivo data. In this work, we show how to correct for domain shift by learning deep network beamformers that leverage both simulated data and unlabeled in vivo data via a novel domain adaptation scheme. A challenge in our scenario is that domain shift exists for both the noisy input and the clean output. We address this challenge by extending cycle-consistent generative adversarial networks, where we leverage maps between …
Domain adaptation aims to correct classifiers when faced with distribution shift between source (training) and target (test) domains. State-of-the-art domain adaptation methods make use of deep networks to extract domain-invariant representations. However, existing methods assume that all instances in the source domain are correctly labeled, while in reality it is unsurprising that we may obtain a source domain with noisy labels. In this paper, we are the first to comprehensively investigate how label noise can adversely affect existing domain adaptation methods in various scenarios. Further, we theoretically prove that there exists a method that can essentially reduce the side effect of noisy source labels in domain adaptation. Specifically, focusing on the generalized target shift scenario, where both the label distribution 𝑃𝑌 and the class-conditional distribution 𝑃𝑋|𝑌 can change, we discover that the denoising Conditional Invariant Component (DCIC) framework can provably ensure (1) extracting invariant representations given examples with noisy labels in the source domain and unlabeled examples in the target domain and (2) estimating the label distribution in the target domain with no bias. Experimental results on both synthetic and real-world data verify the effectiveness of the proposed method.
Covariate shift is a prevalent setting for supervised learning in the wild when the training and test data are drawn from different time periods, different but related domains, or via different sampling strategies. This paper addresses a transfer learning setting with covariate shift between source and target domains. Most existing methods for correcting covariate shift exploit density ratios of the features to reweight the source-domain data. When the features are high-dimensional, the estimated density ratios may suffer from large estimation variance, leading to poor prediction performance under covariate shift. In this work, we investigate how the performance of covariate shift correction depends on the dimensionality of the features, and propose a correction method that finds a low-dimensional representation of the features, which takes into account the features relevant to the target Y, and exploits the density ratio of this representation for importance reweighting. We discuss the factors that affect the performance of our method, and demonstrate its capabilities on both pseudo-real data and real-world applications.
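The density-ratio reweighting that the abstract attributes to most existing methods can be sketched as follows. This is only an illustration of plain importance reweighting, not the low-dimensional method the paper proposes. It assumes one-dimensional Gaussian source and target feature distributions with known parameters, so the ratio is computed exactly rather than estimated; all function names are hypothetical.

```python
import math
import random


def gaussian_pdf(x, mean, std):
    """Density of a one-dimensional Gaussian at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def density_ratio(x, src=(0.0, 1.0), tgt=(1.0, 1.0)):
    """Importance weight w(x) = p_target(x) / p_source(x).

    Assumption: both densities are known Gaussians given as (mean, std).
    In practice the ratio has to be estimated from samples.
    """
    return gaussian_pdf(x, *tgt) / gaussian_pdf(x, *src)


def reweighted_mean(values, weights):
    """Self-normalized importance-weighted average of source-domain values."""
    total = sum(weights)
    return sum(w * v for w, v in zip(weights, values)) / total


# Draw features from the source domain, then reweight them so that their
# average approximates the mean of the target feature distribution.
rng = random.Random(0)
xs = [rng.gauss(0.0, 1.0) for _ in range(20000)]
ws = [density_ratio(x) for x in xs]
estimate = reweighted_mean(xs, ws)
```

The same weights would multiply per-example losses when training a predictor on source data. The abstract's concern is that, in high dimensions, estimated weights of this kind become highly variable.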