Title: ETL and ML Forecasting Modeling Process Automation System

Given the importance of online retailers in the market, sales forecasting has become an essential strategic consideration. Modern machine learning tools help many online retailers forecast sales, but these models need refinement and automation to increase efficiency and productivity. If an automated function can capture historical data and execute the forecasting models, it reduces the time and human resources the company needs to manage the forecasting system. An automated data processing and forecasting system also gives the marketing department more flexible sales forecasting. Proposed here is an automated weekly sales forecasting system that integrates an Extract-Transform-Load (ETL) data processing step with a machine learning forecasting model and sends the outcomes as messages. For this study, the data are obtained for an online women's shoe retailer from three data sources (AWS Redshift, AWS S3, and Google Sheets). The system collects 120 weeks of sales data, passes it through the ETL process, and runs the machine learning forecasting model to forecast sales of the retailer's products for the next week. The model is built using a random forest regressor. The top 25 products according to the forecasting results are selected and sent to the owner’s email for further market evaluation. The system is built as a Directed Acyclic Graph (DAG) using a Python script on Apache Airflow. To facilitate management of the system, the authors set up Apache Airflow in a Docker container. The whole process does not require human monitoring and management. If an error occurs while the project is executing on Airflow, the system notifies the project owner to inspect the cause.
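The stages described in the abstract (extract, transform, forecast, select the top products, send a message) can be sketched as tasks executed in dependency order. This is only an illustration under stated assumptions, not the authors' implementation: it uses the standard-library `graphlib.TopologicalSorter` as a stand-in for Apache Airflow, a three-week mean as a stand-in for the random forest regressor, and made-up in-memory sales figures in place of AWS Redshift, AWS S3, and Google Sheets. All function names, task names, and data values are hypothetical.

```python
from graphlib import TopologicalSorter
from statistics import mean


def extract(ctx):
    # Hypothetical data: product -> weekly unit sales, oldest first.
    # A real pipeline would read from the three sources named in the abstract.
    ctx["raw"] = {
        "sandal": [10, 12, 11, 13],
        "boot": [30, 28, None, 35],  # None marks a missing week
        "sneaker": [20, 22, 21, 25],
    }


def transform(ctx):
    # Assumed cleaning rule for illustration: a missing week counts as zero sales.
    ctx["clean"] = {
        product: [0 if value is None else value for value in series]
        for product, series in ctx["raw"].items()
    }


def forecast(ctx):
    # Placeholder model (NOT the random forest from the paper):
    # next week's forecast is the mean of the last three weeks.
    ctx["forecast"] = {
        product: mean(series[-3:]) for product, series in ctx["clean"].items()
    }


def select_top(ctx, n=2):
    # The paper selects the top 25; n=2 keeps the toy example readable.
    ranked = sorted(ctx["forecast"].items(), key=lambda kv: kv[1], reverse=True)
    ctx["top"] = ranked[:n]


def notify(ctx):
    # Build the message body; sending email is left out of this sketch.
    lines = [f"{product}: {value:.1f}" for product, value in ctx["top"]]
    ctx["message"] = "Next-week forecast\n" + "\n".join(lines)


TASKS = {
    "extract": extract,
    "transform": transform,
    "forecast": forecast,
    "select_top": select_top,
    "notify": notify,
}

# Each task maps to the set of tasks it depends on (the edges of the DAG).
DEPENDENCIES = {
    "extract": set(),
    "transform": {"extract"},
    "forecast": {"transform"},
    "select_top": {"forecast"},
    "notify": {"select_top"},
}


def run_pipeline():
    """Run every task in dependency order; record an alert if one fails."""
    ctx = {}
    order = list(TopologicalSorter(DEPENDENCIES).static_order())
    for name in order:
        try:
            TASKS[name](ctx)
        except Exception as exc:
            # Stand-in for the owner notification described in the abstract.
            ctx["alert"] = f"task {name} failed: {exc!r}"
            break
    return order, ctx


if __name__ == "__main__":
    order, ctx = run_pipeline()
    print(" -> ".join(order))
    print(ctx.get("alert") or ctx["message"])
```

Declaring the dependencies separately from the task functions mirrors how a DAG scheduler separates what runs from when it runs, so the placeholder forecast step could be swapped for a trained regressor without touching the ordering.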

 
Award ID(s):
1832536
NSF-PAR ID:
10487924
Author(s) / Creator(s):
; ; ;
Editor(s):
Waldemar Karwowski 
Publisher / Repository:
AHFE Open Access
Date Published:
Journal Name:
Applied Human Factors and Ergonomics International
ISSN:
2771-0718
Subject(s) / Keyword(s):
Extract-transform-load (ETL) process, Machine learning, Random forest regressor, Forecasting model, Online retailer, Apache Airflow, Docker, AWS
Format(s):
Medium: X
Location:
New York, USA
Sponsoring Org:
National Science Foundation
More Like this
  1. Sales forecasts are critical to businesses of all sizes, enabling teams to project revenue, prioritize marketing, plan distribution, and scale inventory levels. To date, however, sales forecasts of new products have been shown to be highly inaccurate, due in large part to the lack of data about each new product and the subjective judgements required to compensate for this lack of data. The present study explores product sales forecasting performed by human groups and compares the accuracy of group forecasts generated by traditional polls to those made using Artificial Swarm Intelligence (ASI), a technique which has been shown to amplify the forecasting accuracy of groups in a wide range of fields. In collaboration with a major fashion retailer and a major fashion publisher, groups of fashion-conscious millennial women predicted the relative sales volumes of eight sweaters promoted during the 2018 holiday season, first by ranking each sweater’s sales in an online poll, and then using Swarm software to form an ASI system. The Swarm-based forecast was significantly more accurate than the poll. In fact, the top four sweaters ranked by swarm sold 23.7% more units, or $600,000 worth of sweaters during the target period, as compared to the top four sweaters as ranked by survey, (p = 0.0497), indicating that swarms of small consumer groups can be used to forecast sales with significantly higher accuracy than a traditional poll. 
  2. Food retailers are stores that stock staple perishable foods such as vegetables, fruits, dairy, bread, cereal, meat, poultry, or fish on a continuous basis and sell these items to the public. Store types include supercenters, grocery stores, convenience stores, combination stores, dollar stores, butcher shops, bakeries, and other specialty food stores. This mission focused on understanding how critical infrastructure failures impact the function of food retailers and how the change in functioning changes food access. This research focused on five infrastructure systems: transportation, electric power, communications, water, and the buildings utilized by food retailers to carry out their normal activities. The functioning of food retailers was broken down into three branches or domains that are critical for the operation of a food retailer. Specifically, food retailers need 1) people to help run the operation; 2) property or, more generally, a physical structure or structures, to house and conduct operations; and 3) products or foodstuffs to sell. This mission includes four social science collections related to the in-person survey of food retailers. These collections include the sample frame (a list of all food retailers within the study area with a chance of being randomly selected for the survey), the primary (raw) data collected from the Harris County and Southeast Texas surveys, and an example of a secondary (curated) dataset that focuses on critical infrastructure failures and changes in food retailer functioning. Food insecurity is a chronic problem in the United States that annually affects over 40 million people under normal conditions. This difficult reality can dramatically worsen after disasters. Such events can disrupt both the supply and demand sides of food systems, restricting food distribution and access precisely when households are in a heightened need for food assistance. Often, retailers and food banks must react quickly to meet local needs under difficult post-disaster circumstances. Residents of Harris County and Southeast Texas experienced this problem after Hurricane Harvey made landfall on the Texas Gulf Coast in August 2017. The primary data collected by this project relate specifically to the supply side. The data attempt to identify factors that impacted the ability of suppliers to help ensure access to food, with a focus on fresh food access. Factors included impacts to people, property and products due to hurricane-related damage to infrastructure. Two types of food suppliers were the foci of this research: food aid agencies and food retailers. The research team examined food aid agencies in Southeast Texas with data collection methods that included secondary data analysis, a focus group and an online survey. The second population studied was food retailers with in-person surveys with store managers. Food retailers were randomly sampled in three Texas counties: Jefferson, Orange, and Harris. The data collection methods resulted in 32 food aid agency online survey responses and 210 completed food retail in-person surveys. Data were collected five to eight months after the event, which helped to increase the reliability and validity of the data. The time-sensitive nature of post-disaster data requires research teams to quickly organize their efforts before entering the field. The purpose of this project archive is to share the primary data collected, document methods, and to help future research teams reduce the amount of time needed for project development and reporting. This archive does not contain Personally or Business Identifiable Information.
  3. With rapid innovation in the electronics industry, product obsolescence forecasting has become increasingly important. More accurate obsolescence forecasting would reduce costs in product design and part procurement over a product’s lifetime. Currently, many obsolescence forecasting methods require manual input or perform market analysis on a part-by-part basis, practices that are not feasible for a large bill of materials. In response, this paper introduces an obsolescence forecasting framework that can be scaled to meet industry needs while remaining highly accurate. The framework utilizes machine learning to classify parts as either active (in production) or obsolete (discontinued). This classification and labeling of parts can be useful in the design stage for part selection and during inventory management for evaluating the chance that suppliers might stop production. A case study utilizing the proposed framework is presented to demonstrate and validate the improved accuracy of obsolescence risk forecasting. As shown, the framework correctly identified active and obsolete products with an accuracy as high as 98.3%.

     
  4. Access after disasters to resources such as food poses planning problems that affect millions of people each year. Understanding how disasters disrupt and alter food access during the initial steps of the recovery process provides new evidence to inform both food system and disaster planning. This research takes a supply-side focus and explores the results from a survey of food retailers after Hurricane Harvey in three Texas counties. The survey collected information on how the disaster affected a store’s property, people, and products and the length of time a store was closed, had reduced hours, and stopped selling fresh food items. We find that a focus only on store closures and property damage would underestimate the number of days residents have limited fresh food access by nearly two weeks. Further, stores in lower-income communities with chronic low access to supermarkets (food deserts) were closed longer than other stores, potentially compounding pre-existing inequalities. We conclude that to plan for a more equitable food supply post-disaster, planners should embrace more dimensions of access, encourage retailer mitigation, and assess the types of retailers and their distribution within their communities. This mission includes social science collections related to models for predicting days to restoration of food access after a disaster. These collections include the programmed workflow to replicate the results of the journal article: Rosenheim, Nathanael, Maria Watson, John Cassels Connors, Mastura Safayet, Walter Gillis Peacock. “Food Access After Disasters: A Multidimensional View of Restoration After Hurricane Harvey”. Journal of the American Planning Association. doi.org/0.1080/01944363.2023.2284160
  5. This paper studies an inventory management problem faced by an upstream supplier that is in a collaborative agreement, such as vendor-managed inventory (VMI), with a retailer. A VMI partnership provides the supplier an opportunity to manage inventory for the supply chain in exchange for point-of-sales (POS)- and inventory-level information from the retailer. However, retailers typically possess superior local market information and, as has been the case in recent years, are able to capture and analyze customer purchasing behavior beyond the traditional POS data. Such analyses provide the retailer access to market signals that are otherwise hard to capture using POS information. We show and quantify the implication of the financial obligations of each party in VMI that renders communication of such important market signals as noncredible. To help institute a sound VMI collaboration, we propose learn and screen—a dynamic inventory mechanism—for the supplier to effectively manage inventory and information in the supply chain. The proposed mechanism combines the ability of the supplier to learn about market conditions from POS data (over multiple selling periods) and dynamically determine when to screen the retailer and acquire his private demand information. Inventory decisions in the proposed mechanism serve a strategic purpose in addition to their classic role of satisfying customer demand. We show that our proposed dynamic mechanism significantly improves the supplier’s expected profit and increases the efficiency of the overall supply chain operations under a VMI agreement. In addition, we determine the market conditions in which a strategic approach to VMI results in significant profit improvements for both firms, particularly when the retailer has high market power (i.e., when the supplier highly depends on the retailer) and when the supplier has relatively less knowledge about the end customer/market compared with the retailer.