Title: Coarse Graining of Data via Inhomogeneous Diffusion Condensation
Big data often has emergent structure that exists at multiple levels of abstraction, which are useful for characterizing complex interactions and dynamics of the observations. Here, we consider multiple levels of abstraction via a multiresolution geometry of data points at different granularities. To construct this geometry we define a time-inhomogeneous diffusion process that effectively condenses data points together to uncover nested groupings at larger and larger granularities. This inhomogeneous process creates a deep cascade of intrinsic low pass filters on the data affinity graph that are applied in sequence to gradually eliminate local variability while adjusting the learned data geometry to increasingly coarse resolutions. We provide visualizations to exhibit our method as a “continuously-hierarchical” clustering with directions of eliminated variation highlighted at each step. The utility of our algorithm is demonstrated via neuronal data condensation, where the constructed multiresolution data geometry uncovers the organization, grouping, and connectivity between neurons.
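The condensation idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a Gaussian affinity kernel with a fixed bandwidth `epsilon`, and the function names `diffusion_condensation` and `count_groups`, along with all parameter names, are hypothetical. The process is time-inhomogeneous because the diffusion operator is rebuilt from the current point positions at every step.

```python
import numpy as np


def diffusion_condensation(X, epsilon=1.0, n_steps=20):
    """Return the list of point positions after each condensation step.

    Assumption: Gaussian kernel with a fixed bandwidth `epsilon`.
    """
    X = np.asarray(X, dtype=float).copy()
    history = [X.copy()]
    for _ in range(n_steps):
        # Pairwise squared Euclidean distances between current positions.
        diff = X[:, None, :] - X[None, :, :]
        d2 = (diff ** 2).sum(axis=-1)
        # Affinity matrix, then row-normalize to a Markov (diffusion) operator.
        K = np.exp(-d2 / epsilon)
        P = K / K.sum(axis=1, keepdims=True)
        # Apply the operator as a low-pass filter: each point moves to a
        # weighted average of its neighbors, so nearby points condense.
        X = P @ X
        history.append(X.copy())
    return history


def count_groups(X, tol=1e-3):
    """Greedily count groups of points lying within `tol` of a seed point."""
    X = np.asarray(X, dtype=float)
    labels = -np.ones(len(X), dtype=int)
    n_groups = 0
    for i in range(len(X)):
        if labels[i] >= 0:
            continue  # already assigned to an earlier group
        close = np.linalg.norm(X - X[i], axis=1) <= tol
        labels[close & (labels < 0)] = n_groups
        n_groups += 1
    return n_groups
```

Calling `count_groups` on successive entries of the returned history gives a rough view of how groupings coarsen over time. A fuller treatment would likely also adapt the bandwidth as points merge, which this sketch omits.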
Award ID(s):
1845856
NSF-PAR ID:
10165725
Journal Name:
2019 IEEE International Conference on Big Data (Big Data)
Page Range / eLocation ID:
2624 to 2633
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. The problem of efficiently feeding processing elements and finding ways to reduce data movement is pervasive in computing. Efficient modeling of both temporal and spatial locality of memory references is invaluable in identifying superfluous data movement in a given application. To this end, we present a new way to infer both spatial and temporal locality using reuse distance analysis. This is accomplished by performing reuse distance analysis at different data block granularities: specifically, 64B, 4KiB, and 2MiB sizes. This process of simultaneously observing reuse distance with multiple granularities is called multi-spectral reuse distance. This approach allows for a qualitative analysis of spatial locality, through observing the shifting of mass in an application's reuse signature at different granularities. Furthermore, the shift of mass is empirically measured by calculating the Earth Mover's Distance between reuse signatures of an application. From the characterization, it is possible to determine how spatially dense the memory references of an application are based on the degree to which the mass has shifted (or not shifted) and how close (or far) the Earth Mover's Distance is to zero as the data block granularity is increased. It is also possible to determine an appropriate page size from this information, and whether or not a given page is being fully utilized. From the applications profiled, it is observed that not all applications will benefit from having a larger page size. Additionally, larger data block granularities subsuming smaller ones suggest that larger pages will allow for more spatial locality exploitation, but examining the memory footprint will show whether those larger pages are fully utilized or not. 
  2. Automated optical inspection (AOI) is increasingly advocated for in situ quality monitoring of additive manufacturing (AM) processes. The availability of layerwise imaging data improves the information visibility during fabrication processes and is thus conducive to performing online certification. However, few, if any, have investigated the high-speed contact image sensors (CIS) (i.e., originally developed for document scanners and multifunction printers) for AM quality monitoring. In addition, layerwise images show complex patterns and often contain hidden information that cannot be revealed in a single scale. A new and alternative approach will be to analyze these intrinsic patterns with multiscale lenses. Therefore, the objective of this article is to design and develop an AOI system with contact image sensors for multiresolution quality inspection of layerwise builds in additive manufacturing. First, we retrofit the AOI system with contact image sensors at an industrially relevant 95 mm/s scanning speed to a laser-powder-bed-fusion (LPBF) machine. Then, we design the experiments to fabricate nine parts under a variety of factor levels (e.g., gas flow blockage, re-coater damage, laser power changes). In each layer, the AOI system collects imaging data of both recoating powder beds before the laser fusion and surface finishes after the laser fusion. Second, layerwise images are preprocessed for alignment, registration, and identification of regions of interest (ROIs) of these nine parts. Then, we leverage the wavelet transformation to analyze ROI images in multiple scales and further extract salient features that are sensitive to process variations, instead of extraneous noise. Third, we perform the paired comparison analysis to investigate how different levels of factors influence the distribution of wavelet features. Finally, these features are shown to be effective in predicting the extent of defects in the computed tomography (CT) data of layerwise AM builds. The proposed framework of multiresolution quality inspection is evaluated and validated using real-world AM imaging data. Experimental results demonstrate the effectiveness of the proposed AOI system with contact image sensors for online quality inspection of layerwise builds in AM processes.
  3. We present multiresolution tree-structured networks to process point clouds for 3D shape understanding and generation tasks. Our network represents a 3D shape as a set of locality-preserving 1D ordered list of points at multiple resolutions. This allows efficient feed-forward processing through 1D convolutions, coarse-to-fine analysis through a multi-grid architecture, and it leads to faster convergence and small memory footprint during training. The proposed tree-structured encoders can be used to classify shapes and outperform existing point-based architectures on shape classification benchmarks, while tree-structured decoders can be used for generating point clouds directly and they outperform existing approaches for image-to-shape inference tasks learned using the ShapeNet dataset. Our model also allows unsupervised learning of point-cloud based shapes by using a variational autoencoder, leading to higher-quality generated shapes. 
  4. As network, I/O, accelerator, and NVM devices capable of a million operations per second make their way into data centers, the software stack managing such devices has been shifting from implementations within the operating system kernel to more specialized kernel-bypass approaches. While the in-kernel approach guarantees safety and provides resource multiplexing, it imposes too much overhead on microsecond-scale tasks. Kernel-bypass approaches improve throughput substantially but sacrifice safety and complicate resource management: if applications are mutually distrusting, then either each application must have exclusive access to its own device or else the device itself must implement resource management. This paper shows how to attain both safety and performance via intra-process isolation for data plane libraries. We propose protected libraries as a new OS abstraction which provides separate user-level protection domains for different services (e.g., network and in-memory database), with performance approaching that of unprotected kernel bypass. We also show how this new feature can be utilized to enable sharing of data plane libraries across distrusting applications. Our proposed solution uses Intel's memory protection keys (PKU) in a safe way to change the permissions associated with subsets of a single address space. In addition, it uses hardware watch-points to delay asynchronous event delivery and to guarantee independent failure of applications sharing a protected library. We show that our approach can efficiently protect high-throughput in-memory databases and user-space network stacks. Our implementation allows up to 2.3 million library entrances per second per core, outperforming both kernel-level protection and two alternative implementations that use system calls and Intel's VMFUNC switching of user-level address spaces, respectively.
  5. As computer-focused policies and trends become more popular in schools, more students access math curriculum online. While computer-based programs may be responsive to some student input, their algorithmic basis can make it more difficult for them to be prepared for divergent student thinking, especially in comparison to a teacher. Consider programs that assess student work by judging how well it matches pre-set answers. Unless designed and enacted in classrooms with care, computer-based curriculum materials might encourage students to think about mathematics in pre-determined ways. How do students approach the process of mathematics while using online materials, especially in terms of engaging in original thought? Drawing on Pickering’s (1995) dance of agency and Sinclair’s (2001) conception of students as path-finders or track-takers, I define two modes of mathematical behavior: trail-taking and bushwhacking. While trail-taking, students follow an established approach, often relying on Pickering’s (1995) disciplinary agency, wherein the mathematics “leads [them] through a series of manipulations” (p. 115). The series of manipulations can be seen as a trail that a student may choose to follow. Bushwhacking, on the other hand, refers to actions a student takes of their own invention. It is possible that, unknown to the student, these actions have been taken before by others. In bushwhacking, the student possesses agency, which Pickering (1995) describes as active (rather than passive) and as hallmarked by “choice and discretion” (p. 117). In this study, students worked in several dynamic geometric environments (DGEs) during a geometry lesson about the midline theorem. The lesson was originally recorded as part of a larger study designing mathematically captivating lessons. Students accessed both problems and online addresses for corresponding DGEs via a printed packet. Students interacted with the DGEs on individual laptops, but were seated in groups of three or four. Passages of group conversations in which students transitioned between trail-taking and bushwhacking were selected for closer analysis, which involved identifying evidence of each mode and highlighting the curricular or social forces that may have contributed to shifts between modes. Of particular interest were episodes in which students asked one another to share results, which led to students reconsidering previously set approaches, and episodes in which students interacted with DGEs containing a relatively high proportion of drag-able components, which corresponded to some students working in bushwhacking mode, spontaneously suggesting and revising approaches for manipulating the DGE (e.g., “unless you make this parallel to the bottom, but I don’t think you... yes you can.”). Both types of episodes were found in multiple groups’ conversations. Further analysis of student interactions with tasks, especially with varying levels of student control and sharing, could serve to inform future computer-based task design aimed to encourage students to productively engage in bushwhacking while problem-solving.