- PAR ID:
- 10054141
- Date Published:
- Journal Name:
- Cluster Computing
- ISSN:
- 1386-7857
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
Effective public transit operations are one of the fundamental requirements for a modern community. Recently, a number of transit agencies have started integrating automated vehicle locators in their fleets, which provide real-time estimates of the time of arrival. In this paper, we use the data collected over several months from one such transit system and show how this data can potentially be used to learn long-term patterns of travel time. More specifically, we study the effect of weather and other factors such as traffic on the transit system delay. These models can later be used to understand the seasonal variations and to design adaptive and transient transit schedules. Towards this goal, we also propose an online architecture called DelayRadar. The novelty of DelayRadar lies in three aspects: (1) a data store that collects and integrates real-time and static data from multiple data sources, (2) a predictive statistical model that analyzes the data to make predictions on transit travel time, and (3) a decision-making framework to develop an optimal transit schedule based on variable forecasts related to traffic, weather, and other impactful factors. This paper focuses on identifying the model with the best predictive accuracy to be used in DelayRadar. According to the preliminary study results, we are able to explain more than 70% of the variance in the bus travel time, and we can make future travel predictions with an out-of-sample error of 4.8 minutes given information on the bus schedule, traffic, and weather.
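The two figures quoted in this abstract (share of variance explained and out-of-sample error) can be illustrated with a minimal sketch. This is not the DelayRadar model: it assumes a single-feature least-squares fit, and it assumes the error is a mean absolute error, which the abstract does not specify. All function names and the toy numbers are hypothetical and are not taken from the paper's data.

```python
from statistics import mean


def fit_line(x, y):
    """Ordinary least squares for one feature; returns (slope, intercept)."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def predict(x, slope, intercept):
    """Apply the fitted line to a list of feature values."""
    return [slope * xi + intercept for xi in x]


def r_squared(y, y_hat):
    """Fraction of the variance in y explained by the predictions y_hat."""
    my = mean(y)
    ss_res = sum((yi - hi) ** 2 for yi, hi in zip(y, y_hat))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


def mean_abs_error(y, y_hat):
    """Average absolute prediction error, in the units of y (e.g. minutes)."""
    return mean(abs(yi - hi) for yi, hi in zip(y, y_hat))


# Toy, made-up numbers: a hypothetical traffic index vs. travel time in minutes.
train_x, train_y = [0, 1, 2, 3], [1, 3, 5, 7]
test_x, test_y = [4, 5], [9.5, 10.5]  # held out from fitting

slope, intercept = fit_line(train_x, train_y)
train_r2 = r_squared(train_y, predict(train_x, slope, intercept))
test_mae = mean_abs_error(test_y, predict(test_x, slope, intercept))
```

The key design point is that the error is computed on `test_x`/`test_y`, which play no part in fitting, so it estimates how the model behaves on future trips rather than on the data it was tuned to.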
-
With the popularity of the Internet, traditional offline resource allocation has evolved into a new form, called online resource allocation. It features the online arrival of agents in the system and requires a real-time decision upon the arrival of each online agent. Both offline and online resource allocation have wide applications in various real-world matching markets, ranging from ridesharing to crowdsourcing. Some emerging applications, such as rebalancing in bike sharing and trip-vehicle dispatching in ridesharing, involve a two-stage resource allocation process. The process consists of an offline phase followed by a sequential online phase, and both phases compete for the same set of resources. In this paper, we propose a unified model which incorporates both offline and online resource allocation into a single framework. Our model assumes non-uniform and known arrival distributions for online agents in the second online phase, which can be learned from historical data. We propose a parameterized linear programming (LP)-based algorithm, which is shown to be at most a constant factor of 1/4 from the optimal. Experimental results on a real dataset show that our LP-based approaches outperform the LP-agnostic heuristics in terms of robustness and effectiveness.
-
Real-time decision making has attracted increasing interest as a means of efficiently operating complex systems. The main challenge in achieving real-time decision making is to understand how to develop next-generation optimization procedures that can work efficiently using (i) real data coming from a large complex dynamical system and (ii) available simulation models that reproduce the system dynamics. While this paper focuses on a different problem from the one addressed in the RL literature, the methods proposed here can also be used as a support in a sequential setting. The result of this work is the new Generalized Ordinal Learning Framework (GOLF), which utilizes simulated data, interpreting them as low-accuracy information to be intelligently collected offline and utilized online once the scenario is revealed to the user. GOLF supports real-time decision making on complex dynamical systems once a specific scenario is realized. We show preliminary results for the proposed techniques that motivate further pursuit of the presented ideas.
-
Summary: Data‐driven applications are essential to handle the ever‐increasing volume, velocity, and veracity of data generated by sources such as the Web and Internet of Things (IoT) devices. Simultaneously, an event‐driven computational paradigm is emerging as the core of modern systems designed for database queries, data analytics, and on‐demand applications. Modern big data processing runtimes and asynchronous many‐task (AMT) systems from the high performance computing (HPC) community have adopted a dataflow event‐driven model. Services are also increasingly moving to an event‐driven model, using Function as a Service (FaaS) to compose services. An event‐driven runtime designed for data processing consists of well‐understood components such as communication, scheduling, and fault tolerance. The design choices adopted by these components determine the types of applications a system can support efficiently. We find that modern systems are limited to specific sets of applications because they have been designed with fixed choices that cannot be changed easily. In this paper, we present a loosely coupled component‐based design of a big data toolkit where each component can have different implementations to support various applications. Such a polymorphic design would allow services and data analytics to be integrated seamlessly and expand from edge to cloud to HPC environments.
-
A large variety of sound sources in the ocean, including biological, geophysical, and man-made, can be simultaneously monitored over instantaneous continental-shelf scale regions via the passive ocean acoustic waveguide remote sensing (POAWRS) technique by employing a large-aperture densely-populated coherent hydrophone array system. Millions of acoustic signals received on the POAWRS system per day can make it challenging to identify individual sound sources. An automated classification system is necessary to enable sound sources to be recognized. Here, the objectives are to (i) gather a large training and test data set of fin whale vocalization and other acoustic signal detections; (ii) build multiple fin whale vocalization classifiers, including a logistic regression, support vector machine (SVM), decision tree, convolutional neural network (CNN), and long short-term memory (LSTM) network; (iii) evaluate and compare performance of these classifiers using multiple metrics including accuracy, precision, recall and F1-score; and (iv) integrate one of the classifiers into the existing POAWRS array and signal processing software. The findings presented here will (1) provide an automatic classifier for near real-time fin whale vocalization detection and recognition, useful in marine mammal monitoring applications; and (2) lay the foundation for building an automatic classifier applied for near real-time detection and recognition of a wide variety of biological, geophysical, and man-made sound sources typically detected by the POAWRS system in the ocean.
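The four metrics named in objective (iii) can be sketched for a binary detector as follows. This is only an illustration, assuming labels coded as 1 (fin whale vocalization) and 0 (other signal). The function name and the toy labels are hypothetical and are not taken from the POAWRS software or data.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, precision, recall and F1 for labels coded as 0 or 1."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one labelled example is required")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    accuracy = (tp + tn) / len(pairs)
    # Convention chosen here: report 0.0 when a ratio's denominator is zero.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Toy, made-up labels: 1 = fin whale vocalization, 0 = other acoustic signal.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
metrics = binary_metrics(y_true, y_pred)
```

Reporting precision and recall alongside accuracy matters when one class dominates the detections, since a classifier that labels everything as "other" can still score a high accuracy while missing every vocalization.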