skip to main content


Title: Rank Position Forecasting in Car Racing
Rank position forecasting in car racing is a challenging problem when using a Deep Learning-based model over timeseries data. It is featured with highly complex global dependency among the racing cars, with uncertainty resulted from existing and external factors; and it is also a problem with data scarcity. Existing methods, including statistical models, machine learning regression models, and several state-of-the-art deep forecasting models all perform not well on this problem. By an elaborate analysis of pit stop events, we find it critical to decompose the cause-and-effect relationship and model the rank position and pit stop events separately. In choosing a sub-model from different neural network models, we find the model with weak assumptions on the global dependency structure performs the best. Based on these observations, we propose RankNet, a combination of the encoder-decoder network and a separate Multilayer Perception network that is capable of delivering probabilistic forecasting to model the pit stop events and rank position in car racing. Further with the help of feature optimizations, RankNet demonstrates a significant performance improvement, where MAE improves 19% in two laps forecasting task and 7% in the stint forecasting task over the best baseline and is also more stable when adapting to unseen new data. Details of the model optimizations and performance profiling are presented. It is promising to provide useful interactions of neural networks in forecasting racing cars and shine a light on solutions to similar challenging issues in general forecasting problems.  more » « less
Award ID(s):
2151597
NSF-PAR ID:
10320706
Author(s) / Creator(s):
; ; ; ; ;
Date Published:
Journal Name:
2021 IEEE International Parallel and Distributed Processing Symposium (IPDPS)
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Reliable probability estimation is of crucial importance in many real-world applications where there is inherent (aleatoric) uncertainty. Probability-estimation models are trained on observed outcomes (e.g. whether it has rained or not, or whether a patient has died or not), because the ground-truth probabilities of the events of interest are typically unknown. The problem is therefore analogous to binary classification, with the difference that the objective is to estimate probabilities rather than predicting the specific outcome. This work investigates probability estimation from high-dimensional data using deep neural networks. There exist several methods to improve the probabilities generated by these models but they mostly focus on model (epistemic) uncertainty. For problems with inherent uncertainty, it is challenging to evaluate performance without access to ground-truth probabilities. To address this, we build a synthetic dataset to study and compare different computable metrics. We evaluate existing methods on the synthetic data as well as on three real-world probability estimation tasks, all of which involve inherent uncertainty: precipitation forecasting from radar images, predicting cancer patient survival from histopathology images, and predicting car crashes from dashcam videos. We also give a theoretical analysis of a model for high-dimensional probability estimation which reproduces several of the phenomena evinced in our experiments. Finally, we propose a new method for probability estimation using neural networks, which modifies the training process to promote output probabilities that are consistent with empirical probabilities computed from the data. The method outperforms existing approaches on most metrics on the simulated as well as real-world data. 
    more » « less
  2. Obeid, I. ; Selesnik, I. ; Picone, J. (Ed.)
    The Neuronix high-performance computing cluster allows us to conduct extensive machine learning experiments on big data [1]. This heterogeneous cluster uses innovative scheduling technology, Slurm [2], that manages a network of CPUs and graphics processing units (GPUs). The GPU farm consists of a variety of processors ranging from low-end consumer grade devices such as the Nvidia GTX 970 to higher-end devices such as the GeForce RTX 2080. These GPUs are essential to our research since they allow extremely compute-intensive deep learning tasks to be executed on massive data resources such as the TUH EEG Corpus [2]. We use TensorFlow [3] as the core machine learning library for our deep learning systems, and routinely employ multiple GPUs to accelerate the training process. Reproducible results are essential to machine learning research. Reproducibility in this context means the ability to replicate an existing experiment – performance metrics such as error rates should be identical and floating-point calculations should match closely. Three examples of ways we typically expect an experiment to be replicable are: (1) The same job run on the same processor should produce the same results each time it is run. (2) A job run on a CPU and GPU should produce identical results. (3) A job should produce comparable results if the data is presented in a different order. System optimization requires an ability to directly compare error rates for algorithms evaluated under comparable operating conditions. However, it is a difficult task to exactly reproduce the results for large, complex deep learning systems that often require more than a trillion calculations per experiment [5]. This is a fairly well-known issue and one we will explore in this poster. Researchers must be able to replicate results on a specific data set to establish the integrity of an implementation. They can then use that implementation as a baseline for comparison purposes. A lack of reproducibility makes it very difficult to debug algorithms and validate changes to the system. Equally important, since many results in deep learning research are dependent on the order in which the system is exposed to the data, the specific processors used, and even the order in which those processors are accessed, it becomes a challenging problem to compare two algorithms since each system must be individually optimized for a specific data set or processor. This is extremely time-consuming for algorithm research in which a single run often taxes a computing environment to its limits. Well-known techniques such as cross-validation [5,6] can be used to mitigate these effects, but this is also computationally expensive. These issues are further compounded by the fact that most deep learning algorithms are susceptible to the way computational noise propagates through the system. GPUs are particularly notorious for this because, in a clustered environment, it becomes more difficult to control which processors are used at various points in time. Another equally frustrating issue is that upgrades to the deep learning package, such as the transition from TensorFlow v1.9 to v1.13, can also result in large fluctuations in error rates when re-running the same experiment. Since TensorFlow is constantly updating functions to support GPU use, maintaining an historical archive of experimental results that can be used to calibrate algorithm research is quite a challenge. This makes it very difficult to optimize the system or select the best configurations. The overall impact of all of these issues described above is significant as error rates can fluctuate by as much as 25% due to these types of computational issues. Cross-validation is one technique used to mitigate this, but that is expensive since you need to do multiple runs over the data, which further taxes a computing infrastructure already running at max capacity. GPUs are preferred when training a large network since these systems train at least two orders of magnitude faster than CPUs [7]. Large-scale experiments are simply not feasible without using GPUs. However, there is a tradeoff to gain this performance. Since all our GPUs use the NVIDIA CUDA® Deep Neural Network library (cuDNN) [8], a GPU-accelerated library of primitives for deep neural networks, it adds an element of randomness into the experiment. When a GPU is used to train a network in TensorFlow, it automatically searches for a cuDNN implementation. NVIDIA’s cuDNN implementation provides algorithms that increase the performance and help the model train quicker, but they are non-deterministic algorithms [9,10]. Since our networks have many complex layers, there is no easy way to avoid this randomness. Instead of comparing each epoch, we compare the average performance of the experiment because it gives us a hint of how our model is performing per experiment, and if the changes we make are efficient. In this poster, we will discuss a variety of issues related to reproducibility and introduce ways we mitigate these effects. For example, TensorFlow uses a random number generator (RNG) which is not seeded by default. TensorFlow determines the initialization point and how certain functions execute using the RNG. The solution for this is seeding all the necessary components before training the model. This forces TensorFlow to use the same initialization point and sets how certain layers work (e.g., dropout layers). However, seeding all the RNGs will not guarantee a controlled experiment. Other variables can affect the outcome of the experiment such as training using GPUs, allowing multi-threading on CPUs, using certain layers, etc. To mitigate our problems with reproducibility, we first make sure that the data is processed in the same order during training. Therefore, we save the data from the last experiment and to make sure the newer experiment follows the same order. If we allow the data to be shuffled, it can affect the performance due to how the model was exposed to the data. We also specify the float data type to be 32-bit since Python defaults to 64-bit. We try to avoid using 64-bit precision because the numbers produced by a GPU can vary significantly depending on the GPU architecture [11-13]. Controlling precision somewhat reduces differences due to computational noise even though technically it increases the amount of computational noise. We are currently developing more advanced techniques for preserving the efficiency of our training process while also maintaining the ability to reproduce models. In our poster presentation we will demonstrate these issues using some novel visualization tools, present several examples of the extent to which these issues influence research results on electroencephalography (EEG) and digital pathology experiments and introduce new ways to manage such computational issues. 
    more » « less
  3. According to the National Academies, a week long forecast of velocity, vertical structure, and duration of the Loop Current (LC) and its eddies at a given location is a critical step toward understanding their effects on the gulf ecosystems as well as toward anticipating and mitigating the outcomes of anthropogenic and natural disasters in the Gulf of Mexico (GoM). However, creating such a forecast has remained a challenging problem since LC behavior is dominated by dynamic processes across multiple time and spatial scales not resolved at once by conventional numerical models. In this paper, building on the foundation of spatiotemporal predictive learning in video prediction, we develop a physics informed deep learning based prediction model called—Physics-informed Tensor-train ConvLSTM (PITT-ConvLSTM)—for forecasting 3D geo-spatiotemporal sequences. Specifically, we propose (1) a novel 4D higher-order recurrent neural network with empirical orthogonal function analysis to capture the hidden uncorrelated patterns of each hierarchy, (2) a convolutional tensor-train decomposition to capture higher-order space-time correlations, and (3) a mechanism that incorporates prior physics from domain experts by informing the learning in latent space. The advantage of our proposed approach is clear: constrained by the law of physics, the prediction model simultaneously learns good representations for frame dependencies (both short-term and long-term high-level dependency) and inter-hierarchical relations within each time frame. Experiments on geo-spatiotemporal data collected from the GoM demonstrate that the PITT-ConvLSTM model can successfully forecast the volumetric velocity of the LC and its eddies for a period greater than 1 week. 
    more » « less
  4. Tasks across diverse application domains can be posed as large-scale optimization problems, these include graphics, vision, machine learning, imaging, health, scheduling, planning, and energy system forecasting. Independently of the application domain, proximal algorithms have emerged as a formal optimization method that successfully solves a wide array of existing problems, often exploiting problem-specific structures in the optimization. Although model-based formal optimization provides a principled approach to problem modeling with convergence guarantees, at first glance, this seems to be at odds with black-box deep learning methods. A recent line of work shows that, when combined with learning-based ingredients, model-based optimization methods are effective, interpretable, and allow for generalization to a wide spectrum of applications with little or no extra training data. However, experimenting with such hybrid approaches for different tasks by hand requires domain expertise in both proximal optimization and deep learning, which is often error-prone and time-consuming. Moreover, naively unrolling these iterative methods produces lengthy compute graphs, which when differentiated via autograd techniques results in exploding memory consumption, making batch-based training challenging. In this work, we introduce ∇-Prox, a domain-specific modeling language and compiler for large-scale optimization problems using differentiable proximal algorithms. ∇-Prox allows users to specify optimization objective functions of unknowns concisely at a high level, and intelligently compiles the problem into compute and memory-efficient differentiable solvers. One of the core features of ∇-Prox is its full differentiability, which supports hybrid model- and learning-based solvers integrating proximal optimization with neural network pipelines. Example applications of this methodology include learning-based priors and/or sample-dependent inner-loop optimization schedulers, learned with deep equilibrium learning or deep reinforcement learning. With a few lines of code, we show ∇-Prox can generate performant solvers for a range of image optimization problems, including end-to-end computational optics, image deraining, and compressive magnetic resonance imaging. We also demonstrate ∇-Prox can be used in a completely orthogonal application domain of energy system planning, an essential task in the energy crisis and the clean energy transition, where it outperforms state-of-the-art CVXPY and commercial Gurobi solvers. 
    more » « less
  5. Time-evolution of partial differential equations is the key to model several dynamical processes, events forecasting but the operators associated with such problems are non-linear. We propose a Padé approximation based exponential neural operator scheme for efficiently learning the map between a given initial condition and activities at a later time. The multiwavelets bases are used for space discretization. By explicitly embedding the exponential operators in the model, we reduce the training parameters and make it more data-efficient which is essential in dealing with scarce real-world datasets. The Padé exponential operator uses a to model the non-linearity compared to recent neural operators that rely on using multiple linear operator layers in succession. We show theoretically that the gradients associated with the recurrent Padé network are bounded across the recurrent horizon. We perform experiments on non-linear systems such as Korteweg-de Vries (KdV) and Kuramoto–Sivashinsky (KS) equations to show that the proposed approach achieves the best performance and at the same time is data-efficient. We also show that urgent real-world problems like Epidemic forecasting (for example, COVID-19) can be formulated as a 2D time-varying operator problem. The proposed Padé exponential operators yield better prediction results ( better MAE than best neural operator (non-neural operator deep learning model)) compared to state-of-the-art forecasting models. 
    more » « less