Sales forecasts are critical to businesses of all sizes, enabling teams to project revenue, prioritize marketing, plan distribution, and scale inventory levels. To date, however, sales forecasts of new products have been shown to be highly inaccurate, due in large part to the lack of data about each new product and the subjective judgments required to compensate for this lack of data. The present study explores product sales forecasting performed by human groups and compares the accuracy of group forecasts generated by traditional polls to those made using Artificial Swarm Intelligence (ASI), a technique that has been shown to amplify the forecasting accuracy of groups in a wide range of fields. In collaboration with a major fashion retailer and a major fashion publisher, groups of fashion-conscious millennial women predicted the relative sales volumes of eight sweaters promoted during the 2018 holiday season, first by ranking each sweater's sales in an online poll, and then using Swarm software to form an ASI system. The Swarm-based forecast was significantly more accurate than the poll: the top four sweaters ranked by the swarm sold 23.7% more units, or $600,000 worth of sweaters, during the target period than the top four sweaters ranked by the survey (p = 0.0497), indicating that swarms of small consumer groups can forecast sales with significantly higher accuracy than a traditional poll.
ETL and ML Forecasting Modeling Process Automation System
Given the importance of online retailers in the market, forecasting sales has become one of the essential strategic considerations. Modern machine learning tools help many online retailers forecast sales, but these models need refinement and automation to increase efficiency and productivity. If an automated function can capture historical data and execute forecasting models without manual intervention, it reduces the time and human resources the company needs to manage the forecasting system, and it offers the marketing department more flexible market sales forecasting. Proposed here is an automated weekly sales forecasting system that integrates an Extract-Transform-Load (ETL) data processing pipeline with a machine learning forecasting model and sends the outcomes as messages. For this study, the data were obtained for an online women's shoe retailer from three data sources (AWS Redshift, AWS S3, and Google Sheets). The system collects 120 weeks of sales data, passes them through the ETL process, and runs the machine learning forecasting model to forecast sales of the retailer's products in the following week. The forecasting model is built using a random forest regressor. The 25 products forecast to be most popular are selected and sent to the owner's email for further market evaluation. The system is built as a Directed Acyclic Graph (DAG) using a Python script on Apache Airflow, and to facilitate management of the system, the authors set up Apache Airflow in a Docker container. The whole process requires no human monitoring or management; if an error occurs while the project is executed on Airflow, the system notifies the project owner to inspect its cause.
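The DAG source itself is not reproduced here, but the sketch below illustrates, under stated assumptions, how such a weekly pipeline could be wired together in Apache Airflow with a random forest forecaster. The file paths, helper names (`etl_weekly_sales`, `forecast_next_week`), the lag-based feature set, and the notification address are all hypothetical stand-ins for the ETL details and feature engineering that the abstract does not specify.

```python
# Minimal sketch (hypothetical names and paths) of a weekly Airflow DAG that
# runs an ETL step, fits a random forest forecaster, and emails the top 25
# products for the coming week.
from datetime import datetime

import pandas as pd
from airflow import DAG
from airflow.operators.python import PythonOperator
from airflow.operators.email import EmailOperator
from sklearn.ensemble import RandomForestRegressor


def etl_weekly_sales(**context):
    """Hypothetical ETL step: consolidate 120 weeks of sales into one tidy table."""
    # The real system reads from AWS Redshift, AWS S3, and Google Sheets;
    # a single CSV export stands in for those sources here.
    raw = pd.read_csv("/tmp/raw_sales.csv")  # expected columns: product_id, week, units
    tidy = (raw.groupby(["product_id", "week"], as_index=False)["units"].sum()
               .sort_values(["product_id", "week"]))
    tidy.to_csv("/tmp/tidy_sales.csv", index=False)


def forecast_next_week(**context):
    """Fit a random forest that maps recent weekly sales to next week's sales."""
    df = pd.read_csv("/tmp/tidy_sales.csv").sort_values(["product_id", "week"])
    # Features: the current week and the three weeks before it; target: next week.
    for lag in (0, 1, 2, 3):
        df[f"units_t-{lag}"] = df.groupby("product_id")["units"].shift(lag)
    df["target"] = df.groupby("product_id")["units"].shift(-1)
    features = [f"units_t-{lag}" for lag in (0, 1, 2, 3)]
    train = df.dropna(subset=features + ["target"])
    model = RandomForestRegressor(n_estimators=200, random_state=0)
    model.fit(train[features], train["target"])
    # Predict next week from each product's most recent complete feature window.
    latest = df.dropna(subset=features).groupby("product_id").tail(1).copy()
    latest["forecast"] = model.predict(latest[features])
    top25 = latest.nlargest(25, "forecast")[["product_id", "forecast"]]
    top25.to_csv("/tmp/top25_forecast.csv", index=False)


with DAG(
    dag_id="weekly_sales_forecast",
    start_date=datetime(2023, 1, 1),
    schedule_interval="@weekly",
    catchup=False,
) as dag:
    etl = PythonOperator(task_id="etl", python_callable=etl_weekly_sales)
    forecast = PythonOperator(task_id="forecast", python_callable=forecast_next_week)
    notify = EmailOperator(
        task_id="notify_owner",
        to="owner@example.com",  # placeholder address
        subject="Top 25 forecast products for next week",
        files=["/tmp/top25_forecast.csv"],
        html_content="Weekly forecast attached.",
    )
    etl >> forecast >> notify
```

In a Dockerized Airflow deployment like the one described, a DAG file of this kind would simply be placed in the scheduler's `dags/` folder and picked up on its weekly schedule.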
- Award ID(s): 1832536
- PAR ID: 10487924
- Editor(s): Waldemar Karwowski
- Publisher / Repository: AHFE Open Access
- Date Published:
- Journal Name: Applied Human Factors and Ergonomics International
- ISSN: 2771-0718
- Subject(s) / Keyword(s): Extract-transform-load (ETL) process, Machine learning, Random forest regressor, Forecasting model, Online retailer, Apache Airflow, Docker, AWS
- Format(s): Medium: X
- Location: New York, USA
- Sponsoring Org: National Science Foundation
More Like this
-
Data-intensive applications are becoming commonplace in all science disciplines. They comprise a rich set of sub-domains such as data engineering, deep learning, and machine learning, and are built around efficient data abstractions and operators suited to the applications of different domains. The lack of a clear definition of data structures and operators in the field has often led to implementations that do not work well together. The HPTMT architecture that we proposed recently identifies a set of data structures, operators, and an execution model for creating rich data applications that link all aspects of data engineering and data science together efficiently. This paper elaborates and illustrates this architecture using an end-to-end application in which deep learning and data engineering parts work together. Our analysis shows that the proposed system architecture is better suited to high-performance computing environments than current big data processing systems. Furthermore, our proposed system emphasizes the importance of efficient, compact data structures such as the Apache Arrow tabular data representation, which is designed for high performance. Thus, the proposed system integration scales a sequential computation to a distributed computation while retaining optimal performance and a highly usable application programming interface.
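As a point of reference for the compact columnar representation mentioned above, the following minimal sketch shows an Apache Arrow table being built and handed to pandas; the column names and data are illustrative only and are not taken from the HPTMT implementation.

```python
# Minimal illustration of an Apache Arrow tabular representation; the columns
# and values are hypothetical.
import pyarrow as pa

# Build an Arrow table directly from columnar data (a compact, contiguous layout).
table = pa.table({
    "sample_id": pa.array([1, 2, 3], type=pa.int64()),
    "score": pa.array([0.12, 0.97, 0.55], type=pa.float64()),
})

# The same columnar buffers can be handed cheaply to analysis tools such as
# pandas, or exchanged with other engines without row-by-row conversion.
df = table.to_pandas()
print(table.schema)
print(df.describe())
```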
-
Large scientific facilities provide researchers with instrumentation, data, and data products that can accelerate scientific discovery. However, increasing data volumes coupled with limited local computational power prevent researchers from taking full advantage of what these facilities can offer. Many researchers have looked into using commercial and academic cyberinfrastructure (CI) to process these data. Nevertheless, there remains a disconnect between large facilities and CI that requires researchers to be an active part of the data processing cycle. The increasing complexity of CI and data scale necessitates new data delivery models that can autonomously integrate large-scale scientific facilities and CI to deliver real-time data and insights. In this paper, we present our initial efforts using the Ocean Observatories Initiative project as a use case. In particular, we present a subscription-based data streaming service for data delivery that leverages the Apache Kafka data streaming platform. We also show how our solution can automatically integrate large-scale facilities with CI services for automated data processing.
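For readers unfamiliar with the subscription pattern such a service is built on, a minimal consumer sketch using the kafka-python client is shown below; the topic name, broker address, and consumer group are hypothetical and are not drawn from the OOI deployment.

```python
# Minimal sketch of a subscription-based consumer for a data-delivery topic,
# using the kafka-python client; broker address and topic name are hypothetical.
import json
from kafka import KafkaConsumer

consumer = KafkaConsumer(
    "ooi-instrument-data",                 # hypothetical topic
    bootstrap_servers=["localhost:9092"],  # hypothetical broker
    group_id="downstream-processing",
    auto_offset_reset="latest",
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
)

for message in consumer:
    record = message.value
    # Hand each record to the CI-side processing pipeline as it arrives.
    print(record)
```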
-
Food retailers are stores that stock staple perishable foods such as vegetables, fruits, dairy, bread, cereal, meat, poultry, or fish on a continuous basis and sell these items to the public. Store types include supercenters, grocery stores, convenience stores, combination stores, dollar stores, butcher shops, bakeries, and other specialty food stores. This mission focused on understanding how critical infrastructure failures affect the functioning of food retailers and how changes in functioning alter food access. The research focused on five infrastructure systems -- transportation, electric power, communications, water, and the buildings food retailers use to carry out their normal activities. The functioning of food retailers was broken down into three branches, or domains, that are critical to a food retailer's operation: food retailers need 1) people to help run the operation, 2) property or, more generally, a physical structure or structures to house and conduct operations, and 3) products, or foodstuffs, to sell. This mission includes four social science collections related to the in-person survey of food retailers. These collections include the sample frame (a list of all food retailers within the study area with a chance of being randomly selected for the survey), the primary (raw) data collected from the Harris County and Southeast Texas surveys, and an example of a secondary (curated) dataset that focuses on critical infrastructure failures and changes in food retailer functioning.
Food insecurity is a chronic problem in the United States that annually affects over 40 million people under normal conditions. This difficult reality can dramatically worsen after disasters. Such events can disrupt both the supply and demand sides of food systems, restricting food distribution and access precisely when households have a heightened need for food assistance. Often, retailers and food banks must react quickly to meet local needs under difficult post-disaster circumstances. Residents of Harris County and Southeast Texas experienced this problem after Hurricane Harvey made landfall on the Texas Gulf Coast in August 2017. The primary data collected by this project relate specifically to the supply side. The data attempt to identify factors that affected suppliers' ability to help ensure access to food, with a focus on fresh food access. Factors included impacts to people, property, and products due to hurricane-related damage to infrastructure. Two types of food suppliers were the foci of this research: food aid agencies and food retailers. The research team examined food aid agencies in Southeast Texas with data collection methods that included secondary data analysis, a focus group, and an online survey. The second population studied was food retailers, through in-person surveys with store managers. Food retailers were randomly sampled in three Texas counties: Jefferson, Orange, and Harris. The data collection methods resulted in 32 food aid agency online survey responses and 210 completed in-person food retail surveys. Data were collected five to eight months after the event, which helped to increase the reliability and validity of the data. The time-sensitive nature of post-disaster data requires research teams to quickly organize their efforts before entering the field.
The purpose of this project archive is to share the primary data collected, document the methods used, and help future research teams reduce the time needed for project development and reporting. This archive does not contain Personally or Business Identifiable Information.
-
Access after disasters to resources such as food poses planning problems that affect millions of people each year. Understanding how disasters disrupt and alter food access during the initial steps of the recovery process provides new evidence to inform both food system and disaster planning. This research takes a supply-side focus and explores the results from a survey of food retailers after Hurricane Harvey in three Texas counties. The survey collected information on how the disaster affected a store's property, people, and products, and the length of time a store was closed, had reduced hours, and stopped selling fresh food items. We find that a focus only on store closures and property damage would underestimate the number of days residents have limited fresh food access by nearly two weeks. Further, stores in lower-income communities with chronically low access to supermarkets (food deserts) were closed longer than other stores, potentially compounding pre-existing inequalities. We conclude that to plan for a more equitable post-disaster food supply, planners should embrace more dimensions of access, encourage retailer mitigation, and assess the types of retailers and their distribution within their communities. This mission includes social science collections related to models for predicting days to restoration of food access after a disaster. These collections include the programmed workflow to replicate results for the journal article: Rosenheim, Nathanael, Maria Watson, John Cassels Connors, Mastura Safayet, and Walter Gillis Peacock, "Food Access After Disasters: A Multidimensional View of Restoration After Hurricane Harvey," Journal of the American Planning Association, doi.org/10.1080/01944363.2023.2284160.
Food insecurity is a chronic problem in the United States that annually affects over 40 million people under normal conditions. This difficult reality can dramatically worsen after disasters. Such events can disrupt both the supply and demand sides of food systems, restricting food distribution and access precisely when households have a heightened need for food assistance. Often, retailers and food banks must react quickly to meet local needs under difficult post-disaster circumstances. Residents of Harris County and Southeast Texas experienced this problem after Hurricane Harvey made landfall on the Texas Gulf Coast in August 2017. The primary data collected by this project relate specifically to the supply side. The data attempt to identify factors that affected suppliers' ability to help ensure access to food, with a focus on fresh food access. Factors included impacts to people, property, and products due to hurricane-related damage to infrastructure. Two types of food suppliers were the foci of this research: food aid agencies and food retailers. The research team examined food aid agencies in Southeast Texas with data collection methods that included secondary data analysis, a focus group, and an online survey. The second population studied was food retailers, through in-person surveys with store managers. Food retailers were randomly sampled in three Texas counties: Jefferson, Orange, and Harris. The data collection methods resulted in 32 food aid agency online survey responses and 210 completed in-person food retail surveys. Data were collected five to eight months after the event, which helped to increase the reliability and validity of the data.
The time-sensitive nature of post-disaster data requires research teams to quickly organize their efforts before entering the field. The purpose of this project archive is to share the primary data collected, document the methods used, and help future research teams reduce the time needed for project development and reporting. This archive does not contain Personally or Business Identifiable Information.