Title: Deep Learning for Climate Models of the Atlantic Ocean
A deep neural network is trained to predict sea surface temperature variations at two important regions of the Atlantic Ocean, using 800 years of simulated climate dynamics based on first-principles physics models. The model is then tested against 60 years of historical data. Our statistical model learns to approximate the physical laws governing the simulation, providing a significant improvement over simple statistical forecasts and performance comparable to most state-of-the-art dynamical/conventional forecast models at a fraction of the computational cost.
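The paper itself is only summarized above; purely as an illustration of the kind of setup it describes (a network trained on a long simulated anomaly series, then applied to shorter historical records), here is a minimal PyTorch sketch. The lookback window, lead time, architecture, and synthetic data are all assumptions, not the authors' configuration.

```python
# Hypothetical sketch: train a small fully connected network on simulated
# SST-anomaly windows, to be evaluated later on held-out historical data.
# Shapes, lead time, and the synthetic series are illustrative assumptions.
import numpy as np
import torch
import torch.nn as nn

LOOKBACK, LEAD = 24, 12            # months of input, months ahead to predict

def make_windows(series, lookback=LOOKBACK, lead=LEAD):
    """Slice a 1-D anomaly series into (input window, target) pairs."""
    X, y = [], []
    for t in range(len(series) - lookback - lead):
        X.append(series[t:t + lookback])
        y.append(series[t + lookback + lead - 1])
    return (torch.tensor(np.array(X), dtype=torch.float32),
            torch.tensor(np.array(y), dtype=torch.float32).unsqueeze(1))

# Stand-in for centuries of simulated monthly SST anomalies.
sim = np.sin(np.linspace(0, 300, 9600)) + 0.3 * np.random.randn(9600)
X, y = make_windows(sim)

model = nn.Sequential(
    nn.Linear(LOOKBACK, 64), nn.ReLU(),
    nn.Linear(64, 64), nn.ReLU(),
    nn.Linear(64, 1),
)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

for epoch in range(200):           # short full-batch demonstration run
    opt.zero_grad()
    loss = loss_fn(model(X), y)
    loss.backward()
    opt.step()
```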
Award ID(s): 1920304
NSF-PAR ID: 10273992
Journal Name: AAAI Spring Symposium: MLPS, 2020
Sponsoring Org: National Science Foundation
More Like this
  1. Goal 1 of the 2030 Agenda for Sustainable Development, adopted by all United Nations member States in 2015, is to end poverty in all forms everywhere. The major indicator used to monitor the goal is the so-called headcount ratio or poverty rate, i.e., the proportion or percentage of people living in poverty. In India, where nearly a quarter of the population still lives below the poverty line, monitoring of poverty needs greater attention, at shorter intervals (e.g., every year), to evaluate the effectiveness of the planning, programs, and actions taken by governments to eradicate poverty. Poverty rate computation for India depends on two basic ingredients: rural and urban poverty lines for the different states and union territories, and the average Monthly Per-capita Consumer Expenditure (MPCE). While the MPCE can be obtained every year, usually from the Consumer Expenditure Survey on shorter schedules (with a few exceptions where the information is obtained from another survey), determination of poverty lines is a highly complex, costly, and time-consuming process. Poverty lines are essentially determined by a panel of experts who draw their conclusions partly from their subjective opinions and partly from data from multiple sources. The main data source the panel uses is the Consumer Expenditure Survey data with a detailed schedule, which are usually available only every five years or so. In this paper, we undertake a feasibility study to explore whether estimates of headcount ratios (poverty ratios) for intervening years can be provided, in the absence of poverty lines, by relating poverty ratios to the average MPCE through a statistical model; the fitted model can then be used to predict poverty rates for intervening years from the average MPCE. We explore a few such models in this work using Bayesian methodology. The reason for calling this 'synthetic prediction' rests on the synthetic assumption of model invariance over years, often used in the small area literature. While the data-based assessment of our Bayesian synthetic prediction procedure is encouraging, there is great potential for improving the models presented in this paper, e.g., by incorporating more auxiliary data as they become available. In any case, we expect this preliminary work in an important area will encourage researchers to consider statistical modeling as a possible way to at least partially solve a problem for which no objective solution is currently available.
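As a hedged illustration of the synthetic-prediction idea (not the paper's actual models), the following PyMC sketch regresses the poverty rate on average MPCE and predicts an intervening year under the model-invariance assumption. All numbers, priors, and variable names are invented.

```python
# Hypothetical sketch: relate poverty rate to average MPCE with a simple
# Bayesian regression, then predict the rate for an intervening year from
# its MPCE alone. The data points below are made up for illustration.
import numpy as np
import pymc as pm

mpce = np.array([1200., 1350., 1500., 1700., 1900.])   # survey-year MPCE
rate = np.array([0.32, 0.29, 0.26, 0.24, 0.21])        # matching poverty rates

with pm.Model() as model:
    alpha = pm.Normal("alpha", 0.0, 1.0)
    beta = pm.Normal("beta", 0.0, 1.0)
    sigma = pm.HalfNormal("sigma", 0.05)
    # Synthetic assumption: the rate/MPCE relationship is invariant over years.
    mu = alpha + beta * (mpce / 1000.0)
    pm.Normal("obs", mu=mu, sigma=sigma, observed=rate)
    idata = pm.sample(1000, tune=1000, chains=2, progressbar=False)

# Predict the poverty rate for an intervening year with MPCE = 1600.
post = idata.posterior
pred = post["alpha"] + post["beta"] * 1.6
print(float(pred.mean()), float(pred.std()))
```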
  2. Abstract

    We study the statistical properties of tidal weather (variability with periods <30 days) of the DW1 amplitude using the extended Canadian Middle Atmospheric Model (eCMAM) and the Sounding of the Atmosphere using Broadband Emission Radiometry (SABER) instrument. A hierarchy of statistical models, namely the autoregressive (AR), vector AR, and parsimonious vector AR models, is built to predict tidal weather. The quasi 23‐day oscillation found in the tidal weather is a key parameter in the statistical models. Compared with the more complex vector AR and parsimonious vector AR models, which account for the spatial correlations of tidal weather, the simplest AR model can predict one‐day tidal weather with an accuracy of 89% (R², the squared correlation coefficient). In the AR model, 23 coefficients at each latitude and height are obtained from seven years of eCMAM data. Tidal weather is predicted via a linear combination of the 23 days of tidal weather prior to the prediction day. Different sensitivity tests are performed to demonstrate the robustness of these coefficients. The coefficients obtained from eCMAM are in very good agreement with those from SABER. SABER tidal weather is predicted with an accuracy of 86% and 87% at one day by the AR models with coefficients from eCMAM and SABER, respectively. The five‐day forecast accuracy is between 60% and 65%.
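The AR model described above has a particularly simple form: the prediction is a linear combination of the previous 23 days. A minimal sketch of that idea follows, with a synthetic series standing in for the eCMAM data and coefficients fit by least squares (the real model fits separate coefficients at each latitude and height).

```python
# Minimal AR(23) sketch: predict today's tidal "weather" from a linear
# combination of the previous 23 days. Data and fit are illustrative.
import numpy as np

ORDER = 23                                  # days of history, as in the paper

def fit_ar(series, order=ORDER):
    """Least-squares fit of AR coefficients to a 1-D time series."""
    X = np.column_stack([series[i:len(series) - order + i] for i in range(order)])
    y = series[order:]
    coefs, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coefs

def predict_next(history, coefs):
    """One-day-ahead prediction from the most recent `order` days."""
    return history[-len(coefs):] @ coefs

rng = np.random.default_rng(0)
t = np.arange(7 * 365)                      # stand-in for 7 years of daily data
tide = np.sin(2 * np.pi * t / 23) + 0.2 * rng.standard_normal(t.size)

coefs = fit_ar(tide)
print(predict_next(tide, coefs))            # tomorrow's predicted value
```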

     
  3. Abstract

    Fluctuations in the path of the Gulf Stream (GS) have previously been studied primarily by connecting them to either the wind‐driven subtropical gyre circulation or buoyancy forcing via the subpolar gyre. Here we present a statistical model for 1-year predictions of the GS path (represented by the GS northern wall, GSNW) between °W and °W, incorporating both mechanisms in a combined framework. An existing model with multiple parameters, including the previous year's GSNW index, the center location and amplitude of the Icelandic Low, and the Southern Oscillation Index, was augmented with basin‐wide Ekman drift over the Azores High. The addition of the wind is supported by a validation of the simpler two‐layer Parsons‐Veronis model of GS separation over the last 40 years. A multivariate analysis was carried out to compare 1‐year‐in‐advance forecast correlations from four different models. The optimal predictors of the best performing model are: (a) the GSNW index from the previous year, (b) gyre‐scale integrated Ekman drift over the past 2 years, and (c) the longitude of the Icelandic Low center lagged by 3 years. The forecast correlation over the 27 years (1994–2020) is 0.65, an improvement on the previous multi‐parameter model's forecast correlation of 0.52. The improvement is attributed to the addition of the wind‐drift component. The sensitivity of forecasting the GS path after extreme atmospheric years is quantified. The results indicate the possibility of better understanding and enhanced predictability of the dominant wind‐driven variability of the Atlantic Meridional Overturning Circulation, and of improving fisheries management models that use the GS path as a metric.
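As a rough sketch of the regression framework (not the authors' code), the following fits the three optimal predictors listed above by ordinary least squares. The series are random placeholders, so only the lag bookkeeping is meaningful here.

```python
# Hypothetical sketch: forecast next year's GSNW index from (a) this year's
# index, (b) Ekman drift integrated over the past two years, and (c) the
# Icelandic Low longitude lagged three years. All series are synthetic.
import numpy as np

rng = np.random.default_rng(1)
n = 30                                     # ~three decades of annual values
gsnw = rng.standard_normal(n)              # GSNW index
ekman = rng.standard_normal(n)             # annual gyre-scale Ekman drift
low_lon = rng.standard_normal(n)           # Icelandic Low center longitude

# Design matrix for forecasting gsnw[t] from lagged predictors.
X = np.column_stack([
    gsnw[3:-1],                            # previous year's index (lag 1)
    ekman[2:-2] + ekman[3:-1],             # drift integrated over past 2 years
    low_lon[1:-3],                         # longitude lagged by 3 years
    np.ones(n - 4),                        # intercept
])
y = gsnw[4:]

beta, *_ = np.linalg.lstsq(X, y, rcond=None)
forecast = X @ beta
print(np.corrcoef(forecast, y)[0, 1])      # in-sample forecast correlation
```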

     
  4. Abstract This project is funded by the US National Science Foundation (NSF) through its RAPID program under the title "Modeling Corona Spread Using Big Data Analytics." The project is a joint effort between the Department of Computer & Electrical Engineering and Computer Science at FAU and a research group from LexisNexis Risk Solutions. The novel coronavirus Covid-19 originated in China in early December 2019 and has rapidly spread to many countries around the globe, with the number of confirmed cases increasing every day. Covid-19 is officially a pandemic. It is a novel infection with serious clinical manifestations, including death, and it has reached at least 124 countries and territories. Although the ultimate course and impact of Covid-19 are uncertain, it is not merely possible but likely that the disease will produce enough severe illness to overwhelm the worldwide health care infrastructure. Emerging viral pandemics can place extraordinary and sustained demands on public health and health systems and on providers of essential community services. Modeling the spread of the Covid-19 pandemic is challenging, but there are data that can be used to project resource demands. Estimates of the reproductive number (R) of SARS-CoV-2 show that at the beginning of the epidemic, each infected person spread the virus to at least two others, on average (Emanuel et al. in N Engl J Med. 2020, Livingston and Bucher in JAMA 323(14):1335, 2020). A conservatively low estimate is that 5% of the population could become infected within 3 months. Preliminary data from China and Italy regarding the distribution of case severity and fatality vary widely (Wu and McGoogan in JAMA 323(13):1239–42, 2020). A recent large-scale analysis from China suggests that 80% of those infected either are asymptomatic or have mild symptoms, a finding that implies that demand for advanced medical services might apply to only 20% of the total infected. Of patients infected with Covid-19, about 15% have severe illness and 5% have critical illness (Emanuel et al. in N Engl J Med. 2020). Overall, mortality ranges from 0.25% to as high as 3.0% (Emanuel et al. in N Engl J Med. 2020, Wilson et al. in Emerg Infect Dis 26(6):1339, 2020). Case fatality rates are much higher for vulnerable populations, such as persons over the age of 80 years (>14%) and those with coexisting conditions (10% for those with cardiovascular disease and 7% for those with diabetes) (Emanuel et al. in N Engl J Med. 2020). Overall, Covid-19 is substantially deadlier than seasonal influenza, which has a mortality of roughly 0.1%. Public health efforts depend heavily on predicting how diseases such as Covid-19 spread across the globe. During the early days of a new outbreak, when reliable data are still scarce, researchers turn to mathematical models that can predict where people who could be infected are going and how likely they are to bring the disease with them. These computational methods use known statistical equations that calculate the probability of individuals transmitting the illness. Modern computational power allows these models to quickly incorporate multiple inputs, such as a given disease's ability to pass from person to person and the movement patterns of potentially infected people traveling by air and land. This process sometimes involves making assumptions about unknown factors, such as an individual's exact travel pattern.
By plugging in different possible versions of each input, however, researchers can update the models as new information becomes available and compare their results to observed patterns for the illness. In this paper we describe the development of a model of Corona spread using innovative big data analytics techniques and tools. We leveraged our experience from research in modeling Ebola spread (Shaw et al. Modeling Ebola Spread and Using HPCC/KEL System. In: Big Data Technologies and Applications 2016 (pp. 347-385). Springer, Cham) to model Corona spread, obtain new results, and help reduce the number of Corona patients. We closely collaborated with LexisNexis, a leading US data analytics company and a member of our NSF I/UCRC for Advanced Knowledge Enablement. The lack of a comprehensive view and informative analysis of the status of the pandemic can also cause panic and instability within society. Our work proposes the HPCC Systems Covid-19 tracker, which provides a multi-level view of the pandemic with informative virus-spreading indicators in a timely manner. The system embeds a classical epidemiological model known as SIR and spreading indicators based on a causal model. The data solution of the tracker is built on top of the Big Data processing platform HPCC Systems, from the ingestion and tracking of various data sources to fast delivery of the data to the public. The HPCC Systems Covid-19 tracker presents the Covid-19 data on a daily, weekly, and cumulative basis, from the global level down to the county level. It also provides statistical analyses for each level, such as new cases per 100,000 population. The primary analyses, such as Contagion Risk and Infection State, are based on a causal model with a seven-day sliding window. Our work has been released as a publicly available website and has attracted a great volume of traffic. The project is open-sourced and available on GitHub. The system was developed on the LexisNexis HPCC Systems platform, which is briefly described in the paper.
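The tracker's epidemiological core is the classical SIR model mentioned above. The production system runs on HPCC Systems; the standalone sketch below only illustrates that component, with illustrative parameters.

```python
# Minimal sketch of the classical SIR compartmental model, integrated with
# a simple daily Euler step. S, I, R are population fractions; the rates
# beta and gamma below are illustrative, not fitted values.
import numpy as np

def sir(s0, i0, r0, beta, gamma, days):
    """Integrate dS/dt = -beta*S*I, dI/dt = beta*S*I - gamma*I, dR/dt = gamma*I."""
    S, I, R = [s0], [i0], [r0]
    for _ in range(days):
        new_inf = beta * S[-1] * I[-1]      # new infections this day
        new_rec = gamma * I[-1]             # new recoveries this day
        S.append(S[-1] - new_inf)
        I.append(I[-1] + new_inf - new_rec)
        R.append(R[-1] + new_rec)
    return np.array(S), np.array(I), np.array(R)

# beta/gamma = 2.5, in line with the early reproductive-number estimates above.
S, I, R = sir(s0=0.999, i0=0.001, r0=0.0, beta=0.25, gamma=0.1, days=180)
print(f"peak infected fraction: {I.max():.3f} on day {I.argmax()}")
```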
  5. Abstract

    Predicting infectious disease dynamics is a central challenge in disease ecology. Models that can assess which individuals are most at risk of being exposed to a pathogen not only provide valuable insights into disease transmission and dynamics but can also guide management interventions. Constructing such models for wild animal populations, however, is particularly challenging; often only serological data are available on a subset of individuals and nonlinear relationships between variables are common.

    Here we provide a guide to the latest advances in statistical machine learning for constructing pathogen‐risk models that automatically incorporate complex nonlinear relationships, with minimal statistical assumptions, from ecological data with missing values. Our approach compares multiple machine learning algorithms in a unified environment to find the model with the best predictive performance, and uses game theory to better interpret the results. We apply this framework to two major pathogens that infect African lions: canine distemper virus (CDV) and feline parvovirus.

    Our modelling approach provided enhanced predictive performance compared to more traditional approaches, as well as new insights into disease risks in a wild population. We were able to efficiently capture and visualize strong nonlinear patterns, as well as model complex interactions between variables in shaping exposure risk from CDV and feline parvovirus. For example, we found that lions were more likely to be exposed to CDV at a young age but only in low rainfall years.

    When combined with our data calibration approach, our framework helped us to answer questions about risk of pathogen exposure that are difficult to address with previous methods. Our framework not only has the potential to aid in predicting disease risk in animal populations, but also can be used to build robust predictive models suitable for other ecological applications such as modelling species distribution or diversity patterns.
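As a hedged sketch of the two-step framework described above (compare several learners, then interpret the best one with game-theoretic Shapley values), the following uses scikit-learn and the shap package on invented data that echoes the age-by-rainfall CDV finding. The features, thresholds, and model choices are illustrative assumptions, not the authors' pipeline.

```python
# Hypothetical sketch: compare several classifiers on the same exposure
# data, then attribute the best model's predictions with Shapley values.
import numpy as np
import shap
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(2)
n = 300
X = np.column_stack([rng.uniform(0, 15, n),       # age (years), invented
                     rng.uniform(200, 1200, n)])  # annual rainfall (mm), invented
# Toy exposure rule echoing the CDV finding: young lions in low-rainfall years.
y = ((X[:, 0] < 4) & (X[:, 1] < 500)).astype(int)

models = {
    "logistic": LogisticRegression(max_iter=1000),
    "random forest": RandomForestClassifier(random_state=0),
    "boosted trees": GradientBoostingClassifier(random_state=0),
}
for name, m in models.items():
    auc = cross_val_score(m, X, y, cv=5, scoring="roc_auc").mean()
    print(f"{name}: AUC {auc:.2f}")

best = models["boosted trees"].fit(X, y)
shap_values = shap.TreeExplainer(best).shap_values(X)  # per-feature attributions
print(np.abs(shap_values).mean(axis=0))                # global importance
```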

     