skip to main content

Title: Online Neural Cell Tracking using Blob-Seed Segmentation and Optical Flow
Existing neural cell tracking methods generally use the morphology cell features for data association. However, these features are limited to the quality of cell segmentation and are prone to errors for mitosis determination. To over- come these issues, in this work we propose an online multi- object tracking method that leverages both cell appearance and motion features for data association. In particular, we propose a supervised blob-seed network (BSNet) to predict the cell appearance features and an unsupervised optical flow network (UnFlowNet) for capturing the cell motions. The data association is then solved using the Hungarian al- gorithm. Experimental evaluation shows that our approach achieves better performance than existing neural cell track- ing methods.
; ; ; ; ;
Award ID(s):
Publication Date:
Journal Name:
CVPR 2019 Workshop
Sponsoring Org:
National Science Foundation
More Like this
  1. We present a novel approach to multi-person multi-camera tracking based on learning the space-time continuum of a camera network. Some challenges involved in tracking multiple people in real scenarios include a) ensuring reliable continuous association of all persons, and b) accounting for presence of blind-spots or entry/exit points. Most of the existing methods design sophisticated models that require heavy tuning of parameters and it is a nontrivial task for deep learning approaches as they cannot be applied directly to address the above challenges. Here, we deal with the above points in a coherent way by proposing a discriminative spatio-temporal learningmore »approach for tracking based on person re-identification using LSTM networks. This approach is more robust when no a-priori information about the aspect of an individual or the number of individuals is known. The idea is to identify detections as belonging to the same individual by continuous association and recovering from past errors in associating different individuals to a particular trajectory. We exploit LSTM's ability to infuse temporal information to predict the likelihood that new detections belong to the same tracked entity by jointly incorporating visual appearance features and location information. The proposed approach gives a 50% improvement in the error rate compared to the previous state-of-the-art method on the CamNeT dataset and 18% improvement as compared to the baseline approach on DukeMTMC dataset.« less
  2. Abstract Background Analyzing single-cell RNA sequencing (scRNAseq) data plays an important role in understanding the intrinsic and extrinsic cellular processes in biological and biomedical research. One significant effort in this area is the identification of cell types. With the availability of a huge amount of single cell sequencing data and discovering more and more cell types, classifying cells into known cell types has become a priority nowadays. Several methods have been introduced to classify cells utilizing gene expression data. However, incorporating biological gene interaction networks has been proved valuable in cell classification procedures. Results In this study, we propose amore »multimodal end-to-end deep learning model, named sigGCN, for cell classification that combines a graph convolutional network (GCN) and a neural network to exploit gene interaction networks. We used standard classification metrics to evaluate the performance of the proposed method on the within-dataset classification and the cross-dataset classification. We compared the performance of the proposed method with those of the existing cell classification tools and traditional machine learning classification methods. Conclusions Results indicate that the proposed method outperforms other commonly used methods in terms of classification accuracy and F1 scores. This study shows that the integration of prior knowledge about gene interactions with gene expressions using GCN methodologies can extract effective features improving the performance of cell classification.« less
  3. Web tracking and advertising (WTA) nowadays are ubiquitously performed on the web, continuously compromising users' privacy. Existing defense solutions, such as widely deployed blocking tools based on filter lists and alternative machine learning based solutions proposed in prior research, have limitations in terms of accuracy and effectiveness. In this work, we propose WtaGraph, a web tracking and advertising detection framework based on Graph Neural Networks (GNNs). We first construct an attributed homogenous multi-graph (AHMG) that represents HTTP network traffic, and formulate web tracking and advertising detection as a task of GNN-based edge representation learning and classification in AHMG. We thenmore »design four components in WtaGraph so that it can (1) collect HTTP network traffic, DOM, and JavaScript data, (2) construct AHMG and extract corresponding edge and node features, (3) build a GNN model for edge representation learning and WTA detection in the transductive learning setting, and (4) use a pre-trained GNN model for WTA detection in the inductive learning setting. We evaluate WtaGraph on a dataset collected from Alexa Top 10K websites, and show that WtaGraph can effectively detect WTA requests in both transductive and inductive learning settings. Manual verification results indicate that WtaGraph can detect new WTA requests that are missed by filter lists and recognize non-WTA requests that are mistakenly labeled by filter lists. Our ablation analysis, evasion evaluation, and real-time evaluation show that WtaGraph can have a competitive performance with flexible deployment options in practice.« less
  4. Camera-based systems are increasingly used for collecting information on intersections and arterials. Unlike loop controllers that can generally be only used for detection and movement of vehicles, cameras can provide rich information about the traffic behavior. Vision-based frameworks for multiple-object detection, object tracking, and near-miss detection have been developed to derive this information. However, much of this work currently addresses processing videos offline. In this article, we propose an integrated two-stream convolutional networks architecture that performs real-time detection, tracking, and near-accident detection of road users in traffic video data. The two-stream model consists of a spatial stream network for objectmore »detection and a temporal stream network to leverage motion features for multiple-object tracking. We detect near-accidents by incorporating appearance features and motion features from these two networks. Further, we demonstrate that our approaches can be executed in real-time and at a frame rate that is higher than the video frame rate on a variety of videos collected from fisheye and overhead cameras.« less
  5. In computer vision, tracking humans across camera views remains challenging, especially for complex scenarios with frequent occlusions, significant lighting changes and other difficulties. Under such conditions, most existing appearance and geometric cues are not reliable enough to distinguish humans across camera views. To address these challenges, this paper presents a stochastic attribute grammar model for leveraging complementary and discriminative human attributes for enhancing cross-view tracking. The key idea of our method is to introduce a hierarchical representation, parse graph, to describe a subject and its movement trajectory in both space and time domains. This results in a hierarchical compositional representation,more »comprising trajectory entities of varying level, including human boxes, 3D human boxes, tracklets and trajectories. We use a set of grammar rules to decompose a graph node (e.g. tracklet) into a set of children nodes (e.g. 3D human boxes), and augment each node with a set of attributes, including geometry (e.g., moving speed, direction), accessories (e.g., bags), and/or activities (e.g., walking, running). These attributes serve as valuable cues, in addition to appearance features (e.g., colors), in determining the associations of human detection boxes across cameras. In particular, the attributes of a parent node are inherited by its children nodes, resulting in consistency constraints over the feasible parse graph. Thus, we cast cross-view human tracking as finding the most discriminative parse graph for each subject in videos. We develop a learning method to train this attribute grammar model from weakly supervised training data. To infer the optimal parse graph and its attributes, we develop an alternative parsing method that employs both top-down and bottom-up computations to search the optimal solution. We also explicitly reason the occlusion status of each entity in order to deal with significant changes of camera viewpoints. We evaluate the proposed method over public video benchmarks and demonstrate with extensive experiments that our method clearly outperforms state-of-theart tracking methods.« less