skip to main content


Search for: All records

Creators/Authors contains: "Setlur, Srirangaraj"

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

  1. Abstract Systemic inequity in biometrics systems based on racial and gender disparities has received a lot of attention recently. These disparities have been explored in existing biometrics systems such as facial biometrics (identifying individuals based on facial attributes). However, such ethical issues remain largely unexplored in voice biometric systems that are very popular and extensively used globally. Using a corpus of non-speech voice records featuring a diverse group of 300 speakers by race (75 each from White, Black, Asian, and Latinx subgroups) and gender (150 each from female and male subgroups), we explore and reveal that racial subgroup has a similar voice characteristic and gender subgroup has a significant different voice characteristic. Moreover, non-negligible racial and gender disparities exist in speaker identification accuracy by analyzing the performance of one commercial product and five research products. The average accuracy for Latinxs can be 12% lower than Whites (p < 0.05, 95% CI 1.58%, 14.15%) and can be significantly higher for female speakers than males (3.67% higher, p < 0.05, 95% CI 1.23%, 11.57%). We further discover that racial disparities primarily result from the neural network-based feature extraction within the voice biometric product and gender disparities primarily due to both voice inherent characteristic difference and neural network-based feature extraction. Finally, we point out strategies (e.g., feature extraction optimization) to incorporate fairness and inclusive consideration in biometrics technology. 
    more » « less
  2. In this paper we present AMPNet, an acoustic abnormality detection model deployed at ACV Auctions to automatically identify engine faults of vehicles listed on the ACV Auctions platform. We investigate the problem of engine fault detection and discuss our approach of deep-learning based audio classification on a large-scale automobile dataset collected at ACV Auctions. Specifically, we discuss our data collection pipeline and its challenges, dataset preprocessing and training procedures, and deployment of our trained models into a production setting. We perform empirical evaluations of AMPNet and demonstrate that our framework is able to successfully capture various engine anomalies agnostic of vehicle type. Finally we demonstrate the effectiveness and impact of AMPNet in the real world, specifically showing a 20.85% reduction in vehicle arbitrations on ACV Auctions' live auction platform. 
    more » « less
  3. Traditional fingerprint authentication requires the acquisition of data through touch-based specialized sensors. However, due to many hygienic concerns including the global spread of the COVID virus through contact with a surface has led to an increased interest in contactless fingerprint image acquisition methods. Matching fingerprints acquired using contactless imaging against contact-based images brings up the problem of performing cross modal fingerprint matching for identity verification. In this paper, we propose a cost-effective, highly accurate and secure end-to-end contactless fingerprint recognition solution. The proposed framework first segments the finger region from an image scan of the hand using a mobile phone camera. For this purpose, we developed a cross-platform mobile application for fingerprint enrollment, verification, and authentication keeping security, robustness, and accessibility in mind. The segmented finger images go through fingerprint enhancement to highlight discriminative ridge-based features. A novel deep convolutional network is proposed to learn a representation from the enhanced images based on the optimization of various losses. The proposed algorithms for each stage are evaluated on multiple publicly available contactless databases. Our matching accuracy and the associated security employed in the system establishes the strength of the proposed solution framework. 
    more » « less
  4. null (Ed.)
    Representational Learning in the form of high dimensional embeddings have been used for multiple pattern recognition applications. There has been a significant interest in building embedding based systems for learning representations in the mathematical domain. At the same time, retrieval of structured information such as mathematical expressions is an important need for modern IR systems. In this work, our motivation is to introduce a robust framework for learning representations for similarity based retrieval of mathematical expressions. Given a query by example, the embedding can find the closest matching expression as a function of euclidean distance between them. We leverage recent advancements in image-based and graph-based deep learning algorithms to learn our similarity embeddings. We do this first, by using unimodal encoders in graph space and image space and then, a multi-modal combination of the same. To overcome the lack of training data, we force the networks to learn a deep metric using triplets generated with a heuristic scoring function. We also adopt a custom strategy for mining hard samples to train our neural networks. Our system produces rankings similar to those generated by the original scoring function, but using only a fraction of the time. Our results establish the viability of using such a multi-modal embedding for this task. 
    more » « less
  5. Multistage, or serial, fusion refers to the algorithms sequentially fusing an increased number of matching results at each step and making decisions about accepting or rejecting the match hypothesis, or going to the next step. Such fusion methods are beneficial in the situations where running additional matching algorithms needed for later stages is time consuming or expensive. The construction of multistage fusion methods is challenging, since it requires both learning fusion functions and finding optimal decision thresholds for each stage. In this paper, we propose the use of single neural network for learning the multistage fusion. In addition we discuss the choices for the performance measurements of the trained algorithms and for the selection of network training optimization criteria. We perform the experiments using three face matching algorithms and IJB-A and IJB-C databases. 
    more » « less
  6. null (Ed.)
    Lecture videos are rapidly becoming an invaluable source of information for students across the globe. Given the large number of online courses currently available, it is important to condense the information within these videos into a compact yet representative summary that can be used for search-based applications. We propose a framework to summarize whiteboard lecture videos by finding feature representations of detected handwritten content regions to determine unique content. We investigate multi-scale histogram of gradients and embeddings from deep metric learning for feature representation. We explicitly handle occluded, growing and disappearing handwritten content. Our method is capable of producing two kinds of lecture video summaries - the unique regions themselves or so-called key content and keyframes (which contain all unique content in a video segment). We use weighted spatio-temporal conflict minimization to segment the lecture and produce keyframes from detected regions and features. We evaluate both types of summaries and find that we obtain state-of-the-art peformance in terms of number of summary keyframes while our unique content recall and precision are comparable to state-of-the-art. 
    more » « less
  7. Del Bimbo, Alberto ; Cucchiara, Rita ; Sclaroff, Stan ; Farinella, Giovanni M ; Mei, Tao ; Bertini, Marco ; Escalante, Hugo J ; Vezzani, Roberto. (Ed.)
    The volume of online lecture videos is growing at a frenetic pace. This has led to an increased focus on methods for automated lecture video analysis to make these resources more accessible. These methods consider multiple information channels including the actions of the lecture speaker. In this work, we analyze two methods that use spatio-temporal features of the speaker skeleton for action classification in lecture videos. The first method is the AM Pose model which is based on Random Forests with motion-based features. The second is a state-of-the-art action classifier based on a two-stream adaptive graph convolutional network (2S-AGCN) that uses features of both joints and bones of the speaker skeleton. Each video is divided into fixed-length temporal segments. Then, the speaker skeleton is estimated on every frame in order to build a representation for each segment for further classification. Our experiments used the AccessMath dataset and a novel extension which will be publicly released. We compared four state-of-the-art pose estimators: OpenPose, Deep High Resolution, AlphaPose and Detectron2. We found that AlphaPose is the most robust to the encoding noise found in online videos. We also observed that 2S-AGCN outperforms the AM Pose model by using the right domain adaptations. 
    more » « less
  8. null (Ed.)
  9. Del Bimbo, Alberto ; Cucchiara, Rita ; Sclaroff, Stan ; Farinella, Giovanni M ; Mei, Tao ; Bertini, Marc ; Escalante, Hugo J ; Vezzani, Roberto. (Ed.)
    This work summarizes the results of the second Competition on Harvesting Raw Tables from Infographics (ICPR 2020 CHART-Infographics). Chart Recognition is difficult and multifaceted, so for this competition we divide the process into the following tasks: Chart Image Classification (Task 1), Text Detection and Recognition (Task 2), Text Role Classification (Task 3), Axis Analysis (Task 4), Legend Analysis (Task 5), Plot Element Detection and Classification (Task 6.a), Data Extraction (Task 6.b), and End-to-End Data Extraction (Task 7). We provided two sets of datasets for training and evaluation of the participant submissions. The first set is based on synthetic charts (Adobe Synth) generated from real data sources using matplotlib. The second one is based on manually annotated charts extracted from the Open Access section of the PubMed Central (UB PMC). More than 25 teams registered out of which 7 submitted results for different tasks of the competition. While results on synthetic data are near perfect at times, the same models still have room to improve when it comes to data extraction from real charts. The data, annotation tools, and evaluation scripts have been publicly released for academic use. 
    more » « less
  10. null (Ed.)
    Charts are useful communication tools for the presentation of data in a visually appealing format that facilitates comprehension. There have been many studies dedicated to chart mining, which refers to the process of automatic detection, extraction and analysis of charts to reproduce the tabular data that was originally used to create them. By allowing access to data which might not be available in other formats, chart mining facilitates the creation of many downstream applications. This paper presents a comprehensive survey of approaches across all components of the automated chart mining pipeline such as (i) automated extraction of charts from documents; (ii) processing of multi-panel charts; (iii) automatic image classifiers to collect chart images at scale; (iv) automated extraction of data from each chart image, for popular chart types as well as selected specialized classes; (v) applications of chart mining; and (vi) datasets for training and evaluation, and the methods that were used to build them. Finally, we summarize the main trends found in the literature and provide pointers to areas for further research in chart mining. 
    more » « less