
Title: LDA v. LSA: A Comparison of Two Computational Text Analysis Tools for the Functional Categorization of Patents
One means of supporting design-by-analogy (DbA) in practice is giving designers efficient access to source analogies as inspiration for solving problems. The patent database has been used in many DbA support efforts, as it is a preexisting repository of catalogued technology. Latent Semantic Analysis (LSA) has been shown to be an effective computational text processing method for extracting meaningful similarities between patents for useful functional exploration during DbA, but only at a small scale (100 patents). Given the vastness of the patent database and the demands of realistic large-scale exploration, it is important to consider how these computational analyses change with orders of magnitude more data. We present an analysis of 1,000 random mechanical patents, comparing the ability of LSA and Latent Dirichlet Allocation (LDA) to categorize patents into meaningful groups. The resulting implications for large(r)-scale data mining of patents for DbA support are detailed.
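The two models compared in the abstract can be sketched with standard tooling. This is a minimal illustration, not the paper's pipeline: LSA as TF-IDF followed by a truncated SVD, and LDA as a topic mixture over raw term counts, both assumed to use scikit-learn; the toy patent texts are invented.

```python
# Minimal LSA vs. LDA sketch on a toy "patent" corpus (invented examples).
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.decomposition import TruncatedSVD, LatentDirichletAllocation

patents = [
    "rotary gear transmission torque shaft",
    "gear shaft coupling torque drive",
    "heat exchanger coolant fluid pipe",
    "fluid pipe valve coolant flow",
]

# LSA: TF-IDF weighting, then a low-rank SVD projection of the term space.
tfidf = TfidfVectorizer().fit_transform(patents)
lsa_vecs = TruncatedSVD(n_components=2, random_state=0).fit_transform(tfidf)

# LDA: raw term counts; each document becomes a mixture over latent topics.
counts = CountVectorizer().fit_transform(patents)
lda = LatentDirichletAllocation(n_components=2, random_state=0)
lda_vecs = lda.fit_transform(counts)  # rows are per-document topic weights

print(lsa_vecs.shape, lda_vecs.shape)
```

Either representation can then feed a clustering step to group patents; the paper's comparison concerns how meaningful those groups are at scale.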
Authors:
Award ID(s):
1663204
Publication Date:
NSF-PAR ID:
10055536
Journal Name:
International Conference on Case-Based Reasoning
Sponsoring Org:
National Science Foundation
More Like this
  1. Design-by-analogy (DbA) is an important method for innovation that has gained much attention due to its history of leading to successful and novel design solutions. The method uses a repository of existing design solutions in which designers can recognize and retrieve analogical inspirations. Yet, exploring for analogical inspiration has been a laborious task for designers. This work presents a computational methodology driven by a topic modeling technique called non-negative matrix factorization (NMF). NMF is widely used in the text mining field for its ability to discover topics within documents based on their semantic content. In the proposed methodology, NMF is performed iteratively to build hierarchical repositories of design solutions, with which designers can explore clusters of analogical stimuli. This methodology has been applied to a repository of mechanical design-related patents, processed to contain only component-, behavior-, or material-based content, to test whether unique and valuable attribute-based analogical inspiration can be discovered from the different representations of patent data. The hierarchical repositories have been visualized, and a case study has been conducted to test the effectiveness of the analogical retrieval process of the proposed methodology. Overall, this paper demonstrates that the exploration-based computational methodology may provide designers enhanced control over design repositories to retrieve analogical inspiration for DbA practice.
  2. This paper presents an exploration-based computational methodology to aid the analogical retrieval process in design-by-analogy practice. The computational methodology, driven by Nonnegative Matrix Factorization (NMF), iteratively builds hierarchical repositories of design solutions within which clusters of design analogies can be explored by designers. In this work, the methodology has been applied to a large repository of mechanical design-related patents, processed to contain only component-, behavior-, or material-based content, to demonstrate that unique and valuable attribute-based analogical inspiration can be discovered from different representations of patent data. For explorative purposes, the hierarchical repositories have been visualized with a three-dimensional hierarchical structure and a two-dimensional bar graph structure, which can be used interchangeably for retrieving analogies. This paper demonstrates that the exploration-based computational methodology provides designers enhanced control over design repositories, empowering them to retrieve analogical inspiration for design-by-analogy practice.
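The NMF step at the core of these two abstracts can be sketched briefly. This is an illustrative toy, not the authors' implementation: documents are factored into topic weights, and each document is assigned to its strongest topic; the hierarchical variant described above would repeat this step within each resulting cluster. The documents and scikit-learn usage here are assumptions for demonstration.

```python
# NMF topic discovery on a toy document set (invented examples).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import NMF

docs = [
    "spring damper vibration absorber",
    "damper spring shock mount",
    "battery cell electrode charge",
    "electrode battery lithium charge",
]

X = TfidfVectorizer().fit_transform(docs)
nmf = NMF(n_components=2, init="nndsvd", random_state=0)
W = nmf.fit_transform(X)        # document-topic weight matrix
clusters = W.argmax(axis=1)     # hard assignment to the dominant topic

print(clusters)
```

Iterating this factorization inside each cluster yields the hierarchical repository the methodology describes, with deeper levels giving finer-grained analogical groupings.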
  3. Learning global features by aggregating information over multiple views has been shown to be effective for 3D shape analysis. For view aggregation in deep learning models, pooling has been applied extensively. However, pooling leads to a loss of the content within views and of the spatial relationships among views, which limits the discriminability of learned features. We propose 3DViewGraph to resolve this issue; it learns 3D global features by more effectively aggregating unordered views with attention. Specifically, unordered views taken around a shape are regarded as view nodes on a view graph. 3DViewGraph first learns a novel latent semantic mapping to project low-level view features into meaningful latent semantic embeddings in a lower-dimensional space, which is spanned by latent semantic patterns. Then, the content and spatial information of each pair of view nodes are encoded by a novel spatial pattern correlation, where the correlation is computed among latent semantic patterns. Finally, all spatial pattern correlations are integrated with attention weights learned by a novel attention mechanism. This further increases the discriminability of learned features by highlighting the unordered view nodes with distinctive characteristics and depressing the ones with appearance ambiguity. We show that 3DViewGraph outperforms state-of-the-art methods on three large-scale benchmarks.
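The core contrast the abstract draws, attention-weighted aggregation versus pooling, can be shown in a few lines. This is a numpy toy, not the 3DViewGraph model: the learned scoring network is replaced by a fixed random projection, and the feature sizes are invented.

```python
# Attention-weighted aggregation of unordered view features (toy sketch).
import numpy as np

rng = np.random.default_rng(0)
views = rng.normal(size=(12, 64))   # 12 unordered view features, 64-d each

# Stand-in for a learned scoring function: a fixed projection vector.
w = rng.normal(size=64)
scores = views @ w                  # one relevance score per view
attn = np.exp(scores - scores.max())
attn /= attn.sum()                  # softmax attention weights over views

global_feat = attn @ views          # weighted sum replaces max/avg pooling
print(global_feat.shape)
```

Unlike max or average pooling, the weighted sum lets distinctive views dominate the global descriptor while ambiguous views are suppressed, which is the discriminability argument made above.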
  4. Drawing, as a skill, is closely tied to many creative fields, and it is a unique practice for every individual. Drawing has been shown to improve cognitive and communicative abilities, such as visual communication, problem-solving skills, students' academic achievement, awareness of and attention to surrounding details, and sharpened analytical skills. Drawing also stimulates both sides of the brain and improves peripheral skills of writing, 3-D spatial recognition, critical thinking, and brainstorming. People are often exposed to drawing as children, drawing their families, their houses, animals, and, most notably, their imaginative ideas. These skills develop naturally over time to some extent; however, while drawing is a basic skill, its mastery requires extensive practice and can be significantly affected by an individual's self-efficacy. Sketchtivity is an AI tool developed by Texas A&M University to facilitate the growth of drawing skills and track their performance. Sketching skill development depends in part on students' self-efficacy associated with their drawing abilities. Gauging the drawing self-efficacy of individuals is critical to understanding the impact this drawing practice has had with this novel instrument, especially in contrast to traditional practice methods. It may also be very useful for other researchers, educators, and technologists. This study reports the development and initial validation of a new 13-item measure that assesses perceived drawing self-efficacy. The 13 items were developed based on Bandura's guide for constructing self-efficacy scales. The participants in the study consisted of 222 high school students from engineering, art, and pre-calculus classes. Internal consistency of the 13 observed items was found to be very high (Cronbach's alpha: 0.943), indicating high reliability of the scale.
Exploratory Factor Analysis was performed to further investigate the variance among the 13 observed items, to find the underlying latent factors that influenced them, and to see whether the items needed revision. We found that a three-factor model was the best fit for our data, given the fit statistics and model interpretability. The factors are: Factor 1: self-efficacy with respect to drawing specific objects; Factor 2: self-efficacy with respect to drawing practically to solve problems, communicate with others, and brainstorm ideas; Factor 3: self-efficacy with respect to drawing to create, express ideas, and use one's imagination. An alternative four-factor model is also discussed. The purpose of our study is to inform interventions that increase self-efficacy. We believe that this assessment will be valuable especially for education researchers who implement AI-based tools to measure drawing skills. This initial validity study shows promising results for a new measure of drawing self-efficacy. Further validation with new populations and drawing classes, along with further psychometric testing of item-level performance, is needed to support its use. In the future, this self-efficacy assessment could be used by teachers and researchers to guide instructional interventions meant to increase drawing self-efficacy.
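The internal-consistency statistic reported above can be computed directly from item scores. This is a small illustration with invented data, not the study's dataset: Cronbach's alpha for a k-item scale from the item variances and the variance of the total score.

```python
# Cronbach's alpha for a k-item scale (toy data, invented for illustration).
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """items: (n_respondents, k_items) matrix of item scores."""
    k = items.shape[1]
    sum_item_vars = items.var(axis=0, ddof=1).sum()
    total_var = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - sum_item_vars / total_var)

# Four items driven by one latent trait plus small noise: alpha is near 1,
# mirroring the high reliability (0.943) reported for the 13-item scale.
trait = np.random.default_rng(0).normal(size=(50, 1))
noise = np.random.default_rng(1).normal(scale=0.1, size=(50, 4))
scores = trait + noise

print(round(cronbach_alpha(scores), 3))
```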
  5. Obeid, Iyad ; Selesnick, Ivan ; Picone, Joseph (Ed.)
    The goal of this work was to design a low-cost computing facility that can support the development of an open source digital pathology corpus containing 1M images [1]. A single image from a clinical-grade digital pathology scanner can range in size from hundreds of megabytes to five gigabytes. A 1M image database requires over a petabyte (PB) of disk space. To do meaningful work in this problem space requires a significant allocation of computing resources. The improvements and expansions to our HPC (high-performance computing) cluster, known as Neuronix [2], required to support working with digital pathology fall into two broad categories: computation and storage. To handle the increased computational burden and increase job throughput, we are using Slurm [3] as our scheduler and resource manager. For storage, we have designed and implemented a multi-layer filesystem architecture to distribute a filesystem across multiple machines. These enhancements, which are entirely based on open source software, have extended the capabilities of our cluster and increased its cost-effectiveness. Slurm has numerous features that allow it to generalize to a number of different scenarios. Among the most notable is its support for GPU (graphics processing unit) scheduling. GPUs can offer a tremendous performance increase in machine learning applications [4], and Slurm's built-in mechanisms for handling them were a key factor in this choice. Slurm has a general resource (GRES) mechanism that can be used to configure and enable support for resources beyond the ones provided by the traditional HPC scheduler (e.g. memory, wall-clock time), and GPUs are among the GRES types that Slurm supports [5]. In addition to tracking resources, Slurm strictly enforces resource allocation.
This becomes very important as the computational demands of jobs increase, ensuring that each job has all the resources it needs and does not take resources from other jobs. It is a common practice among GPU-enabled frameworks to query the CUDA runtime library/drivers and iterate over the list of GPUs, attempting to establish a context on all of them. Slurm is able to affect the hardware discovery process of these jobs, which enables a number of them to run alongside each other, even if the GPUs are in exclusive-process mode. To store large quantities of digital pathology slides, we developed a robust, extensible distributed storage solution. We utilized a number of open source tools to create a single filesystem, which can be mounted by any machine on the network. At the lowest layer of abstraction are the hard drives, which were split into four 60-disk chassis, using 8TB drives. To support these disks, we have two server units, each equipped with Intel Xeon CPUs and 128GB of RAM. At the filesystem level, we have implemented a multi-layer solution that: (1) connects the disks together into a single filesystem/mountpoint using the ZFS (Zettabyte File System) [6], and (2) connects filesystems on multiple machines together to form a single mountpoint using Gluster [7]. ZFS, initially developed by Sun Microsystems, provides disk-level awareness and a filesystem which takes advantage of that awareness to provide fault tolerance. At the filesystem level, ZFS protects against data corruption and the infamous RAID write-hole bug by implementing a journaling scheme (the ZFS intent log, or ZIL) and copy-on-write functionality. Each machine (1 controller + 2 disk chassis) has its own separate ZFS filesystem. Gluster, essentially a meta-filesystem, takes each of these and provides the means to connect them together over the network, using distributed (similar to RAID 0, but without striping individual files) and mirrored (similar to RAID 1) configurations [8].
By implementing these improvements, it has been possible to expand the storage and computational power of the Neuronix cluster arbitrarily, by scaling horizontally, to support the most computationally intensive endeavors. We have greatly improved the scalability of the cluster while maintaining its excellent price/performance ratio [1].
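The pieces this abstract describes fit together roughly as follows. This is a hypothetical configuration sketch, not the Neuronix cluster's actual setup: the node name, device paths, disk names, and resource counts are illustrative.

```
# gres.conf -- declare a node's GPUs as schedulable generic resources
NodeName=compute01 Name=gpu File=/dev/nvidia[0-3]

# slurm.conf -- enable the gpu GRES type and attach it to the node
GresTypes=gpu
NodeName=compute01 Gres=gpu:4 CPUs=32 RealMemory=128000

# Job script -- request one GPU; Slurm constrains device discovery so the
# job sees only its allocation, even alongside other GPU jobs on the node.
#!/bin/bash
#SBATCH --gres=gpu:1
#SBATCH --mem=16G
#SBATCH --time=04:00:00
python train.py

# Storage layers -- one ZFS pool per machine for disk-level fault tolerance,
# stitched into a single network mountpoint with Gluster.
zpool create tank raidz2 /dev/sd[b-m]
gluster volume create pathology replica 2 node1:/tank/brick node2:/tank/brick
gluster volume start pathology
```

The distributed and mirrored Gluster configurations mentioned above correspond to spreading bricks across nodes versus replicating them, analogous to RAID 0 (without per-file striping) and RAID 1 respectively.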