This paper presents an innovative solution to the challenge of part obsolescence in microelectronics, focusing on the semantic segmentation of PCB X-ray images using deep learning. Addressing the scarcity of annotated datasets, we developed a novel method to synthesize X-ray images of PCBs, employing virtual images with predefined geometries and inherent labeling to eliminate the need for manual annotation. Our approach involves creating realistic synthetic images that mimic actual X-ray projections, enhanced by incorporating noise profiles derived from real X-ray images. Two deep learning networks, based on the U-Net architecture with a VGG-16 backbone, were trained exclusively on these synthetic datasets to segment PCB junctions and traces. The results demonstrate the effectiveness of this synthetic data-driven approach, with the networks achieving high Jaccard indices on real PCB X-ray images. This study not only offers a scalable and cost-effective alternative for dataset generation in microelectronics but also highlights the potential of synthetic data in training models for complex image analysis tasks, suggesting broad applications in various domains where data scarcity is a concern.
more »
« less
SenseGen: A deep learning architecture for synthetic sensor data generation
Our ability to synthesize sensory data that preserves specific statistical properties of the real data has had tremendous implications on data privacy and big data analytics. The synthetic data can be used as a substitute for selective real data segments,that are sensitive to the user, thus protecting privacy and resulting in improved analytics.However, increasingly adversarial roles taken by data recipients such as mobile apps, or other cloud-based analytics services, mandate that the synthetic data, in addition to preserving statistical properties, should also be difficult to distinguish from the real data. Typically, visual inspection has been used as a test to distinguish between datasets. But more recently, sophisticated classifier models (discriminators), corresponding to a set of events, have also been employed to distinguish between synthesized and real data. The model operates on both datasets and the respective event outputs are compared for consistency. In this paper, we take a step towards generating sensory data that can pass a deep learning based discriminator model test, and make two specific contributions: first, we present a deep learning based architecture for synthesizing sensory data. This architecture comprises of a generator model, which is a stack of multiple Long-Short-Term-Memory (LSTM) networks and a Mixture Density Network. second, we use another LSTM network based discriminator model for distinguishing between the true and the synthesized data. Using a dataset of accelerometer traces, collected using smartphones of users doing their daily activities, we show that the deep learning based discriminator model can only distinguish between the real and synthesized traces with an accuracy in the neighborhood of 50%.
more »
« less
- PAR ID:
- 10038862
- Date Published:
- Journal Name:
- IEEE International Conference on Pervasive Computing and Communications Workshops (PerCom Workshops)
- Page Range / eLocation ID:
- 188 to 193
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
-
Deep generative models have enabled the automated synthesis of high-quality data for diverse applications. However, the most effective generative models are specialized to data from a single domain (e.g., images or text). Real-world applications such as healthcare require multi-modal data from multiple domains (e.g., both images and corresponding text), which are difficult to acquire due to limited availability and privacy concerns and are much harder to synthesize. To tackle this joint synthesis challenge, we propose an End-to-end MultImodal X-ray genERative model (EMIXER) for jointly synthesizing x-ray images and corresponding free-text reports, all conditional on diagnosis labels. EMIXER is an conditional generative adversarial model by 1) generating an image based on a label, 2) encoding the image to a hidden embedding, 3) producing the corresponding text via a hierarchical decoder from the image embedding, and 4) a joint discriminator for assessing both the image and the corresponding text. EMIXER also enables self-supervision to leverage vast amount of unlabeled data. Extensive experiments with real X-ray reports data illustrate how data augmentation using synthesized multimodal samples can improve the performance of a variety of supervised tasks including COVID-19 X-ray classification with very limited samples. The quality of generated images and reports are also confirmed by radiologists. We quantitatively show that EMIXER generated synthetic datasets can augment X-ray image classification, report generation models to achieve 5.94% and 6.9% improvement on models trained only on real data samples. Taken together, our results highlight the promise of state of generative models to advance clinical machine learning.more » « less
-
Internet of Things (IoT) devices have been increasingly deployed in smart homes to automatically monitor and control their environments. Unfortunately, extensive recent research has shown that on-path external adversaries can infer and further fingerprint people’s sensitive private information by analyzing IoT network traffic traces. In addition, most recent approaches that aim to defend against these malicious IoT traffic analytics cannot adequately protect user privacy with reasonable traffic overhead. In particular, these approaches often did not consider practical traffic reshaping limitations, user daily routine permitting, and user privacy protection preference in their design. To address these issues, we design a new low-cost, open source user-centric defense system—PrivacyGuard—that enables people to regain the privacy leakage control of their IoT devices while still permitting sophisticated IoT data analytics that is necessary for smart home automation. In essence, our approach employs intelligent deep convolutional generative adversarial network assisted IoT device traffic signature learning, long short-term memory based artificial traffic signature injection, and partial traffic reshaping to obfuscate private information that can be observed in IoT device traffic traces. We evaluate PrivacyGuard using IoT network traffic traces of 31 IoT devices from five smart homes and buildings. We find that PrivacyGuard can effectively prevent a wide range of state-of-the-art adversarial machine learning and deep learning based user in-home activity inference and fingerprinting attacks and help users achieve the balance between their IoT data utility and privacy preserving.more » « less
-
Deep learning models rely heavily on extensive training data, but obtaining sufficient real-world data remains a major challenge in clinical fields. To address this, we explore methods for generating realistic synthetic multivariate fall data to supplement limited real-world samples collected from three fall-related datasets: SmartFallMM, UniMib, and K-Fall. We apply three conventional time-series augmentation techniques, a Diffusion-based generative AI method, and a novel approach that extracts fall segments from public video footage of older adults. A key innovation of our work is the exploration of two distinct approaches: video-based pose estimation to extract fall segments from public footage, and Diffusion models to generate synthetic fall signals. Both methods independently enable the creation of highly realistic and diverse synthetic data tailored to specific sensor placements. To our knowledge, these approaches and especially their application in fall detection represent rarely explored directions in this research area. To assess the quality of the synthetic data, we use quantitative metrics, including the Fréchet Inception Distance (FID), Discriminative Score, Predictive Score, Jensen–Shannon Divergence (JSD), and Kolmogorov–Smirnov (KS) test, and visually inspect temporal patterns for structural realism. We observe that Diffusion-based synthesis produces the most realistic and distributionally aligned fall data. To further evaluate the impact of synthetic data, we train a long short-term memory (LSTM) model offline and test it in real time using the SmartFall App. Incorporating Diffusion-based synthetic data improves the offline F1-score by 7–10% and boosts real-time fall detection performance by 24%, confirming its value in enhancing model robustness and applicability in real-world settings.more » « less
-
In collaborative learning, multiple parties contribute their datasets to jointly deduce global machine learning models for numerous predictive tasks. Despite its efficacy, this learning paradigm fails to encompass critical application domains that involve highly sensitive data, such as healthcare and security analytics, where privacy risks limit entities to individually train models using only their own datasets. In this work, we target privacy-preserving collaborative hierarchical clustering. We introduce a formal security definition that aims to achieve balance between utility and privacy and present a two-party protocol that provably satisfies it. We then extend our protocol with: (i) an optimized version for single-linkage clustering, and (ii) scalable approximation variants. We implement all our schemes and experimentally evaluate their performance and accuracy on synthetic and real datasets, obtaining very encouraging results. For example, end-to-end execution of our secure approximate protocol for over 1M 10-dimensional data samples requires 35sec of computation and achieves 97.09% accuracy.more » « less
An official website of the United States government

