Search for: All records

Creators/Authors contains: "Roy, Nirmalya"


  1. Language-guided smart systems can help design next-generation human-machine interactive applications. Dense text description is one such research area, in which systems learn the semantic knowledge and visual features of each video frame and map them to descriptions of the video's most relevant subjects and events. In this paper, we consider untrimmed sports videos as our case study. Generating dense descriptions in the sports domain to supplement journalistic work, without relying on commentators and experts, requires further investigation. Motivated by this, we propose an end-to-end automated text generator, SpecTextor, that learns semantic features from untrimmed videos of sports games and generates associated descriptive text. The proposed approach treats the video as a sequence of frames and generates words sequentially. After splitting videos into frames, we use a pre-trained VGG-16 model to extract features and encode the video frames. With these encoded frames, we posit a Long Short-Term Memory (LSTM) based attention-decoder pipeline that leverages a soft-attention mechanism to map the semantic features to relevant textual descriptions and generate an explanation of the game (a code sketch of this pipeline appears after the list). Because developing a comprehensive description of the game requires training on dense, time-stamped captions, we leverage two public datasets: ActivityNet Captions and Microsoft Video Description. In addition, we use two decoding algorithms, beam search and greedy search, and compute two evaluation metrics, BLEU and METEOR scores.
    Free, publicly-accessible full text available June 1, 2023
  2. Free, publicly-accessible full text available March 1, 2023
  3. Stay-at-home orders during the COVID-19 pandemic help flatten the curve but, ironically, instigate mental health problems among people who have Substance Use Disorders. Measuring electrical activity signals in the brain with off-the-shelf consumer wearable devices, such as a smart wristwatch, and mapping them in real time to underlying mood, behavioral, and emotional changes plays a striking role in postulating mental health anomalies. In this work, we propose a wearable, On-device Mental Anomaly Detection (OMAD) system that detects anomalous behaviors and activities leading to mental health problems and helps clinicians design effective intervention strategies. We propose an intrinsic artifact removal model on the Electroencephalogram (EEG) signal to better correlate the fine-grained behavioral changes. We design a model compression technique for the artifact removal and activity recognition (main) modules. We implement a magnitude-based weight pruning technique on both the convolutional neural network and the Multilayer Perceptron to run the inference phase on the Nvidia Jetson Nano, one of the most resource-constrained devices for wearables (a sketch of the pruning step appears after the list). We experiment with three different combinations of feature extraction and artifact removal approaches. We evaluate the performance of OMAD in terms of accuracy, F1 score, memory usage, and running time for both unpruned and compressed models, using EEG data from both control and treatment (alcoholic) groups for different object recognition tasks. Our artifact removal model and main activity detection model achieve about 93% and 90% accuracy, respectively, with a significant reduction in model size (70%) and inference time (31%).
  4. Human activity recognition (HAR) from wearable sensor data has recently gained widespread adoption in a number of fields. However, recognizing complex human activities and postural and rhythmic body movements (e.g., dance, sports) is challenging due to the lack of domain-specific labeling information and the perpetual variability in human movement kinematics profiles arising from age, sex, dexterity, and level of professional training. In this paper, we propose a deep activity recognition model that works with limited labeled data, for both simple and complex human activities. To mitigate the intra- and inter-user spatio-temporal variability of movements, we posit novel data augmentation and domain normalization techniques. We present a semi-supervised technique that learns noise- and transformation-invariant feature representations from sparsely labeled data to accommodate intra-personal and inter-user variations of human movement kinematics. We also postulate a transfer learning approach that learns domain-invariant feature representations by minimizing the feature distribution distance between the source and target domains (a sketch of this idea appears after the list). We showcase the improved performance of our proposed framework, AugToAct, using a public HAR dataset. We also design our own data collection, annotation, and experimental setup for complex dance activity recognition steps and kinematic movements, where we achieve higher performance metrics with limited labeled data compared to simple activity recognition tasks.
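
The sketch below illustrates the kind of frame-encoder plus soft-attention LSTM decoder pipeline described in item 1. It assumes PyTorch and torchvision; the class names, dimensions, and decoding details are illustrative assumptions, not taken from the SpecTextor paper.

```python
# Minimal sketch: VGG-16 frame encoder + soft-attention LSTM decoder
# (hypothetical names; PyTorch/torchvision >= 0.13 assumed).
import torch
import torch.nn as nn
from torchvision import models

class FrameEncoder(nn.Module):
    """Encode each video frame with a pre-trained VGG-16 backbone."""
    def __init__(self):
        super().__init__()
        vgg = models.vgg16(weights=models.VGG16_Weights.DEFAULT)
        self.features = vgg.features              # convolutional feature maps
        self.pool = nn.AdaptiveAvgPool2d((1, 1))  # one 512-d vector per frame

    def forward(self, frames):                    # frames: (T, 3, 224, 224)
        x = self.pool(self.features(frames))      # (T, 512, 1, 1)
        return x.flatten(1)                       # (T, 512)

class SoftAttentionDecoder(nn.Module):
    """LSTM decoder that attends over the encoded frames at every word step."""
    def __init__(self, vocab_size, feat_dim=512, hid_dim=512, emb_dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.attn = nn.Linear(feat_dim + hid_dim, 1)
        self.lstm = nn.LSTMCell(emb_dim + feat_dim, hid_dim)
        self.out = nn.Linear(hid_dim, vocab_size)

    def forward(self, frame_feats, captions):     # frame_feats: (T, 512), captions: (L,)
        h = frame_feats.new_zeros(1, self.lstm.hidden_size)
        c = torch.zeros_like(h)
        logits = []
        for tok in captions:                      # teacher forcing over caption tokens
            # Soft attention: score every frame against the current hidden state.
            scores = self.attn(torch.cat(
                [frame_feats, h.expand(frame_feats.size(0), -1)], dim=1))
            weights = torch.softmax(scores, dim=0)                       # (T, 1)
            context = (weights * frame_feats).sum(dim=0, keepdim=True)   # (1, 512)
            emb = self.embed(tok.view(1))                                # (1, emb_dim)
            h, c = self.lstm(torch.cat([emb, context], dim=1), (h, c))
            logits.append(self.out(h))
        return torch.stack(logits)                # (L, 1, vocab_size)
```

At inference time, the per-step logits would be fed to a greedy or beam-search loop instead of teacher forcing, matching the two decoding strategies mentioned in the abstract.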
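
The following sketch shows magnitude-based weight pruning of the kind described in item 3, applied to a toy CNN + MLP classifier before on-device inference. PyTorch's pruning utilities are assumed; the model definition and sparsity level are illustrative stand-ins for the OMAD modules, not the paper's actual architecture.

```python
# Minimal sketch: magnitude-based (L1) weight pruning of a toy EEG classifier.
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

class EEGActivityNet(nn.Module):
    """Toy 1-D CNN + MLP classifier standing in for the OMAD main module."""
    def __init__(self, n_channels=64, n_classes=2):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(n_channels, 32, kernel_size=5, padding=2), nn.ReLU(),
            nn.AdaptiveAvgPool1d(16))
        self.mlp = nn.Sequential(
            nn.Flatten(), nn.Linear(32 * 16, 64), nn.ReLU(),
            nn.Linear(64, n_classes))

    def forward(self, x):          # x: (batch, n_channels, time)
        return self.mlp(self.conv(x))

def prune_by_magnitude(model, amount=0.7):
    """Zero out the smallest-magnitude weights in every conv/linear layer."""
    for module in model.modules():
        if isinstance(module, (nn.Conv1d, nn.Linear)):
            prune.l1_unstructured(module, name="weight", amount=amount)
            prune.remove(module, "weight")   # make the sparsity permanent
    return model

model = prune_by_magnitude(EEGActivityNet(), amount=0.7)
zeros = sum((p == 0).sum().item() for p in model.parameters())
total = sum(p.numel() for p in model.parameters())
print(f"sparsity after pruning: {zeros / total:.2%}")
```

The pruned weights are stored as zeros; the memory and latency savings reported in the abstract additionally depend on sparse storage or structured pruning when deploying on a device such as the Jetson Nano.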
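
Finally, a minimal sketch of the domain-alignment idea in item 4: augment sensor windows, then add a feature-distribution distance term (a simple linear-kernel MMD here) to the supervised loss. All function names and hyperparameters below are assumptions for illustration, not AugToAct's actual implementation.

```python
# Minimal sketch: augmentation plus source/target feature alignment (PyTorch assumed).
import torch

def jitter_and_scale(x, sigma=0.05, scale_range=0.1):
    """Simple augmentation for wearable-sensor windows: additive noise + random scaling."""
    noise = torch.randn_like(x) * sigma
    scale = 1.0 + (torch.rand(x.size(0), 1, 1) * 2 - 1) * scale_range
    return (x + noise) * scale

def mmd_distance(source_feats, target_feats):
    """Linear-kernel maximum mean discrepancy between two feature batches."""
    return (source_feats.mean(dim=0) - target_feats.mean(dim=0)).pow(2).sum()

def domain_adaptation_loss(encoder, classifier, src_x, src_y, tgt_x, lam=1.0):
    """Supervised loss on the labeled source domain plus a feature-alignment penalty."""
    src_f = encoder(jitter_and_scale(src_x))
    tgt_f = encoder(jitter_and_scale(tgt_x))
    cls_loss = torch.nn.functional.cross_entropy(classifier(src_f), src_y)
    return cls_loss + lam * mmd_distance(src_f, tgt_f)
```

Minimizing the alignment term pushes the encoder toward domain-invariant features, so a classifier trained on the labeled source user transfers better to unlabeled target users.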