Title: When Will Breakfast be Ready: Temporal Prediction of Food Readiness Using Deep Convolutional Neural Networks on Thermal Videos
In this paper, we predict food readiness during cooking by applying deep convolutional neural networks to thermal video data. Our work treats readiness prediction as ultra-fine recognition of cooking progression at the per-frame level. We analyze readiness prediction for eggs, pancakes, and bacon strips using two types of neural networks: a classifier network that bins each frame into one of five classes according to how far cooking has progressed at that frame, and a regressor network that predicts the percentage of cooking time elapsed at each frame. The classifier achieves accuracies of 98% and higher within one step of the ground-truth class, and the regressor's predicted elapsed time has an average error within 20 seconds of the ground truth.
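The two per-frame targets and the two reported metrics can be illustrated with a minimal sketch. The abstract does not say how the five class boundaries are chosen, so equal-width bins over elapsed cooking time are an assumption here, and all function names are hypothetical rather than taken from the paper.

```python
from typing import Sequence

NUM_CLASSES = 5  # the abstract states five progress classes


def progress_percent(frame_index: int, total_frames: int) -> float:
    """Regression target: percent of cooking time elapsed at this frame."""
    if total_frames < 2:
        raise ValueError("need at least two frames")
    return 100.0 * frame_index / (total_frames - 1)


def progress_class(percent: float, num_classes: int = NUM_CLASSES) -> int:
    """Classification target. ASSUMPTION: equal-width bins over 0-100%."""
    bin_width = 100.0 / num_classes
    # Clamp so that exactly 100% falls into the last class.
    return min(int(percent // bin_width), num_classes - 1)


def within_one_step_accuracy(predicted: Sequence[int],
                             truth: Sequence[int]) -> float:
    """Fraction of frames whose predicted class is at most one bin away."""
    hits = sum(abs(p - t) <= 1 for p, t in zip(predicted, truth))
    return hits / len(truth)


def mean_abs_time_error_seconds(pred_percent: Sequence[float],
                                true_percent: Sequence[float],
                                total_seconds: float) -> float:
    """Average elapsed-time error, converting percent error to seconds."""
    errors = [abs(p - t) / 100.0 * total_seconds
              for p, t in zip(pred_percent, true_percent)]
    return sum(errors) / len(errors)
```

For example, under these assumptions a frame halfway through a video gets the regression target 50.0 and falls into class 2 of 0 to 4, and a classifier prediction of class 1 or 3 for that frame would still count as correct under the within-one-step measure.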
Award ID(s):
1730183
PAR ID:
10094311
Journal Name:
2018 IEEE International Conference on Multimedia & Expo Workshops (ICMEW)
Page Range / eLocation ID:
1 to 6
Sponsoring Org:
National Science Foundation
More Like This
  1. Ground truth depth information is necessary for many computer vision tasks. Collecting this information is challenging, especially for outdoor scenes. In this work, we propose utilizing single-view depth prediction neural networks pre-trained on synthetic scenes to generate relative depth, which we call pseudo-depth. This approach is a less expensive option as the pre-trained neural network obtains accurate depth information from synthetic scenes, which does not require any expensive sensor equipment and takes less time. We measure the usefulness of pseudo-depth from pre-trained neural networks by training indoor/outdoor binary classifiers with and without it. We also compare the difference in accuracy between using pseudo-depth and ground truth depth. We experimentally show that adding pseudo-depth to training achieves a 4.4% performance boost over the non-depth baseline model on DIODE, a large standard test dataset, retaining 63.8% of the performance boost achieved from training a classifier on RGB and ground truth depth. It also boosts performance by 1.3% on another dataset, SUN397, for which ground truth depth is not available. Our result shows that it is possible to take information obtained from a model pre-trained on synthetic scenes and successfully apply it beyond the synthetic domain to real-world data.
  2. We introduce a novel method for summarization of whiteboard lecture videos using key handwritten content regions. A deep neural network is used for detecting bounding boxes that contain semantically meaningful groups of handwritten content. A neural network embedding is learnt, under triplet loss, from the detected regions in order to discriminate between unique handwritten content. The detected regions, along with embeddings at every frame of the lecture video, are used to extract unique handwritten content across the video, which is presented as the video summary. Additionally, a spatiotemporal index is constructed from the video; it records the time and location of each individual summary region and can potentially be used for content-based search and navigation. We train and test our methods on the publicly available AccessMath dataset. We use the DetEval scheme to benchmark our summarization by recall of unique ground truth objects (92.09%) and average number of summary regions (128) compared to the ground truth (88).
  3. Graph representation learning is a fundamental technique for machine learning (ML) on complex networks. Given an input network, these methods represent the vertices by low-dimensional real-valued vectors. These vectors can be used for a multitude of downstream ML tasks. We study one of the most important such tasks, link prediction. Much of the recent literature on graph representation learning has shown remarkable success in link prediction. On closer investigation, we observe that the performance is measured by the AUC (area under the curve), which suffers from biases. Since the ground truth in link prediction is sparse, we design a vertex-centric measure of performance, called the VCMPR@k plots. Under this measure, we show that link predictors using graph representations show poor scores. Despite having extremely high AUC scores, the predictors miss much of the ground truth. We identify a mathematical connection between this performance, the sparsity of the ground truth, and the low-dimensional geometry of the node embeddings. Under a formal theoretical framework, we prove that low-dimensional vectors cannot capture sparse ground truth using dot product similarities (the standard practice in the literature). Our results call into question existing results on link prediction and pose a significant scientific challenge for graph representation learning. The VCMPR plots identify specific scientific challenges for link prediction using low-dimensional node embeddings.
  4. We give a new algorithm for learning a two-layer neural network under a general class of input distributions. Assuming there is a ground-truth two-layer network y = Aσ(Wx) + ξ, where A,W are weight matrices, ξ represents noise, and the number of neurons in the hidden layer is no larger than the input or output, our algorithm is guaranteed to recover the parameters A,W of the ground-truth network. The only requirement on the input x is that it is symmetric, which still allows highly complicated and structured input. Our algorithm is based on the method-of-moments framework and extends several results in tensor decompositions. We use spectral algorithms to avoid the complicated non-convex optimization in learning neural networks. Experiments show that our algorithm can robustly learn the ground-truth neural network with a small number of samples for many symmetric input distributions. 
  5. This paper proposes a machine learning method to predict the solutions of related nonlinear optimal control problems given some parametric input, such as the initial state. The map from problem parameters to optimal solutions is called the problem-optimum map, and is often discontinuous due to nonconvexity, discrete homotopy classes, and control switching. This causes difficulties for traditional function approximators such as neural networks, which assume continuity of the underlying function. This paper proposes a mixture of experts (MoE) model composed of a classifier and several regressors, where each regressor is tuned to a particular continuous region. A novel training approach is proposed that trains the classifier and regressors independently. MoE greatly outperforms standard neural networks, and achieves highly reliable trajectory prediction (over 99.5% accuracy) in several dynamic vehicle control problems.