Sales forecasts are critical to businesses of all sizes, enabling teams to project revenue, prioritize marketing, plan distribution, and scale inventory levels. To date, however, sales forecasts of new products have been shown to be highly inaccurate, due in large part to the lack of data about each new product and the subjective judgments required to compensate for this lack of data. The present study explores product sales forecasting performed by human groups and compares the accuracy of group forecasts generated by traditional polls to those made using Artificial Swarm Intelligence (ASI), a technique that has been shown to amplify the forecasting accuracy of groups in a wide range of fields. In collaboration with a major fashion retailer and a major fashion publisher, groups of fashion-conscious millennial women predicted the relative sales volumes of eight sweaters promoted during the 2018 holiday season, first by ranking each sweater's sales in an online poll, and then using Swarm software to form an ASI system. The Swarm-based forecast was significantly more accurate than the poll: the top four sweaters ranked by the swarm sold 23.7% more units, or $600,000 worth of sweaters, during the target period than the top four sweaters ranked by the survey (p = 0.0497), indicating that swarms of small consumer groups can forecast sales with significantly higher accuracy than a traditional poll.
ETL and ML Forecasting Modeling Process Automation System
Given the importance of online retailers in the market, forecasting sales has become one of the essential strategic considerations. Modern machine learning tools help many online retailers forecast sales, but these models need refinement and automation to increase efficiency and productivity. If an automated function can capture historical data and execute forecasting models without manual intervention, it reduces the time and human resources the company needs to manage the forecasting system, and it offers the marketing department more flexible market sales forecasting. Proposed here is an automated weekly sales forecasting system that integrates an Extract-Transform-Load (ETL) data processing pipeline with a machine learning forecasting model and sends the outcomes as messages. For this study, the data were obtained for an online women's shoe retailer from three data sources (AWS Redshift, AWS S3, and Google Sheets). The system collects 120 weeks of sales data, passes them through the ETL process, and runs the machine learning forecasting model to forecast sales of the retailer's products in the following week. The forecasting model is built using a random forest regressor. The 25 products forecast to be most popular are selected and sent to the owner's email for further market evaluation. The system is built as a Directed Acyclic Graph (DAG) using a Python script on Apache Airflow, and to facilitate management of the system, the authors set up Apache Airflow in a Docker container. The whole process requires no human monitoring or management; if an error occurs while the project is executed on Airflow, the system notifies the project owner to inspect its cause.
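The DAG source itself is not reproduced here, but the sketch below illustrates, under stated assumptions, how such a weekly pipeline could be wired together in Apache Airflow with a random forest forecaster. The file paths, helper names (`etl_weekly_sales`, `forecast_next_week`), the lag-based feature set, and the notification address are all hypothetical stand-ins for the ETL details and feature engineering that the abstract does not specify.

```python
# Minimal sketch (hypothetical names and paths) of a weekly Airflow DAG that
# runs an ETL step, fits a random forest forecaster, and emails the top 25
# products for the coming week.
from datetime import datetime

import pandas as pd
from airflow import DAG
from airflow.operators.python import PythonOperator
from airflow.operators.email import EmailOperator
from sklearn.ensemble import RandomForestRegressor


def etl_weekly_sales(**context):
    """Hypothetical ETL step: consolidate 120 weeks of sales into one tidy table."""
    # The real system reads from AWS Redshift, AWS S3, and Google Sheets;
    # a single CSV export stands in for those sources here.
    raw = pd.read_csv("/tmp/raw_sales.csv")  # expected columns: product_id, week, units
    tidy = (raw.groupby(["product_id", "week"], as_index=False)["units"].sum()
               .sort_values(["product_id", "week"]))
    tidy.to_csv("/tmp/tidy_sales.csv", index=False)


def forecast_next_week(**context):
    """Fit a random forest that maps recent weekly sales to next week's sales."""
    df = pd.read_csv("/tmp/tidy_sales.csv").sort_values(["product_id", "week"])
    # Features: the current week and the three weeks before it; target: next week.
    for lag in (0, 1, 2, 3):
        df[f"units_t-{lag}"] = df.groupby("product_id")["units"].shift(lag)
    df["target"] = df.groupby("product_id")["units"].shift(-1)
    features = [f"units_t-{lag}" for lag in (0, 1, 2, 3)]
    train = df.dropna(subset=features + ["target"])
    model = RandomForestRegressor(n_estimators=200, random_state=0)
    model.fit(train[features], train["target"])
    # Predict next week from each product's most recent complete feature window.
    latest = df.dropna(subset=features).groupby("product_id").tail(1).copy()
    latest["forecast"] = model.predict(latest[features])
    top25 = latest.nlargest(25, "forecast")[["product_id", "forecast"]]
    top25.to_csv("/tmp/top25_forecast.csv", index=False)


with DAG(
    dag_id="weekly_sales_forecast",
    start_date=datetime(2023, 1, 1),
    schedule_interval="@weekly",
    catchup=False,
) as dag:
    etl = PythonOperator(task_id="etl", python_callable=etl_weekly_sales)
    forecast = PythonOperator(task_id="forecast", python_callable=forecast_next_week)
    notify = EmailOperator(
        task_id="notify_owner",
        to="owner@example.com",  # placeholder address
        subject="Top 25 forecast products for next week",
        files=["/tmp/top25_forecast.csv"],
        html_content="Weekly forecast attached.",
    )
    etl >> forecast >> notify
```

In a Dockerized Airflow deployment like the one described, a DAG file of this kind would simply be placed in the scheduler's `dags/` folder and picked up on its weekly schedule.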
- Award ID(s): 1832536
- PAR ID: 10487924
- Editor(s): Waldemar Karwowski
- Publisher / Repository: AHFE Open Access
- Date Published:
- Journal Name: Applied Human Factors and Ergonomics International
- ISSN: 2771-0718
- Subject(s) / Keyword(s): Extract-transform-load (ETL) process, Machine learning, Random forest regressor, Forecasting model, Online retailer, Apache Airflow, Docker, AWS
- Format(s): Medium: X
- Location: New York, USA
- Sponsoring Org: National Science Foundation
More Like this
-
Data-intensive applications are becoming commonplace in all science disciplines. They comprise a rich set of sub-domains such as data engineering, deep learning, and machine learning, and are built around efficient data abstractions and operators suited to the applications of different domains. The lack of a clear definition of data structures and operators in the field has often led to implementations that do not work well together. The HPTMT architecture that we proposed recently identifies a set of data structures, operators, and an execution model for creating rich data applications that link all aspects of data engineering and data science together efficiently. This paper elaborates and illustrates this architecture using an end-to-end application in which deep learning and data engineering parts work together. Our analysis shows that the proposed system architecture is better suited to high-performance computing environments than current big data processing systems. Furthermore, our proposed system emphasizes the importance of efficient, compact data structures such as the Apache Arrow tabular data representation, which is designed for high performance. Thus, the proposed system integration scales a sequential computation to a distributed computation while retaining optimal performance and a highly usable application programming interface.
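As a point of reference for the compact columnar representation mentioned above, the following minimal sketch shows an Apache Arrow table being built and handed to pandas; the column names and data are illustrative only and are not taken from the HPTMT implementation.

```python
# Minimal illustration of an Apache Arrow tabular representation; the columns
# and values are hypothetical.
import pyarrow as pa

# Build an Arrow table directly from columnar data (a compact, contiguous layout).
table = pa.table({
    "sample_id": pa.array([1, 2, 3], type=pa.int64()),
    "score": pa.array([0.12, 0.97, 0.55], type=pa.float64()),
})

# The same columnar buffers can be handed cheaply to analysis tools such as
# pandas, or exchanged with other engines without row-by-row conversion.
df = table.to_pandas()
print(table.schema)
print(df.describe())
```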
-
Large scientific facilities provide researchers with instrumentation, data, and data products that can accelerate scientific discovery. However, increasing data volumes coupled with limited local computational power prevent researchers from taking full advantage of what these facilities can offer. Many researchers have looked into using commercial and academic cyberinfrastructure (CI) to process these data. Nevertheless, there remains a disconnect between large facilities and CI that requires researchers to be an active part of the data processing cycle. The increasing complexity of CI and data scale necessitates new data delivery models that can autonomously integrate large-scale scientific facilities and CI to deliver real-time data and insights. In this paper, we present our initial efforts using the Ocean Observatories Initiative project as a use case. In particular, we present a subscription-based data streaming service for data delivery that leverages the Apache Kafka data streaming platform. We also show how our solution can automatically integrate large-scale facilities with CI services for automated data processing.
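For readers unfamiliar with the subscription pattern such a service is built on, a minimal consumer sketch using the kafka-python client is shown below; the topic name, broker address, and consumer group are hypothetical and are not drawn from the OOI deployment.

```python
# Minimal sketch of a subscription-based consumer for a data-delivery topic,
# using the kafka-python client; broker address and topic name are hypothetical.
import json
from kafka import KafkaConsumer

consumer = KafkaConsumer(
    "ooi-instrument-data",                 # hypothetical topic
    bootstrap_servers=["localhost:9092"],  # hypothetical broker
    group_id="downstream-processing",
    auto_offset_reset="latest",
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
)

for message in consumer:
    record = message.value
    # Hand each record to the CI-side processing pipeline as it arrives.
    print(record)
```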
-
Food retailers are stores that stock staple perishable foods such as vegetables, fruits, dairy, bread, cereal, meat, poultry, or fish on a continuous basis and sell these items to the public. Store types include supercenters, grocery stores, convenience stores, combination stores, dollar stores, butcher shops, bakeries, and other specialty food stores. This mission focused on understanding how critical infrastructure failures affect the functioning of food retailers and how changes in functioning alter food access. The research focused on five infrastructure systems -- transportation, electric power, communications, water, and the buildings food retailers use to carry out their normal activities. The functioning of food retailers was broken down into three branches, or domains, that are critical to a food retailer's operation: food retailers need 1) people to help run the operation, 2) property or, more generally, a physical structure or structures to house and conduct operations, and 3) products, or foodstuffs, to sell. This mission includes four social science collections related to the in-person survey of food retailers. These collections include the sample frame (a list of all food retailers within the study area with a chance of being randomly selected for the survey), the primary (raw) data collected from the Harris County and Southeast Texas surveys, and an example of a secondary (curated) dataset that focuses on critical infrastructure failures and changes in food retailer functioning.
Food insecurity is a chronic problem in the United States that annually affects over 40 million people under normal conditions. This difficult reality can dramatically worsen after disasters. Such events can disrupt both the supply and demand sides of food systems, restricting food distribution and access precisely when households have a heightened need for food assistance. Often, retailers and food banks must react quickly to meet local needs under difficult post-disaster circumstances. Residents of Harris County and Southeast Texas experienced this problem after Hurricane Harvey made landfall on the Texas Gulf Coast in August 2017. The primary data collected by this project relate specifically to the supply side. The data attempt to identify factors that affected suppliers' ability to help ensure access to food, with a focus on fresh food access. Factors included impacts to people, property, and products due to hurricane-related damage to infrastructure. Two types of food suppliers were the foci of this research: food aid agencies and food retailers. The research team examined food aid agencies in Southeast Texas with data collection methods that included secondary data analysis, a focus group, and an online survey. The second population studied was food retailers, through in-person surveys with store managers. Food retailers were randomly sampled in three Texas counties: Jefferson, Orange, and Harris. The data collection methods resulted in 32 food aid agency online survey responses and 210 completed in-person food retail surveys. Data were collected five to eight months after the event, which helped to increase the reliability and validity of the data. The time-sensitive nature of post-disaster data requires research teams to quickly organize their efforts before entering the field.
The purpose of this project archive is to share the primary data collected, document the methods used, and help future research teams reduce the time needed for project development and reporting. This archive does not contain Personally or Business Identifiable Information.
-
Access after disasters to resources such as food poses planning problems that affect millions of people each year. Understanding how disasters disrupt and alter food access during the initial steps of the recovery process provides new evidence to inform both food system and disaster planning. This research takes a supply-side focus and explores the results from a survey of food retailers after Hurricane Harvey in three Texas counties. The survey collected information on how the disaster affected a store's property, people, and products, and the length of time a store was closed, had reduced hours, and stopped selling fresh food items. We find that a focus only on store closures and property damage would underestimate the number of days residents have limited fresh food access by nearly two weeks. Further, stores in lower-income communities with chronically low access to supermarkets (food deserts) were closed longer than other stores, potentially compounding pre-existing inequalities. We conclude that to plan for a more equitable post-disaster food supply, planners should embrace more dimensions of access, encourage retailer mitigation, and assess the types of retailers and their distribution within their communities. This mission includes social science collections related to models for predicting days to restoration of food access after a disaster. These collections include the programmed workflow to replicate results for the journal article: Rosenheim, Nathanael, Maria Watson, John Cassels Connors, Mastura Safayet, and Walter Gillis Peacock, "Food Access After Disasters: A Multidimensional View of Restoration After Hurricane Harvey," Journal of the American Planning Association, doi.org/10.1080/01944363.2023.2284160.
Food insecurity is a chronic problem in the United States that annually affects over 40 million people under normal conditions. This difficult reality can dramatically worsen after disasters. Such events can disrupt both the supply and demand sides of food systems, restricting food distribution and access precisely when households have a heightened need for food assistance. Often, retailers and food banks must react quickly to meet local needs under difficult post-disaster circumstances. Residents of Harris County and Southeast Texas experienced this problem after Hurricane Harvey made landfall on the Texas Gulf Coast in August 2017. The primary data collected by this project relate specifically to the supply side. The data attempt to identify factors that affected suppliers' ability to help ensure access to food, with a focus on fresh food access. Factors included impacts to people, property, and products due to hurricane-related damage to infrastructure. Two types of food suppliers were the foci of this research: food aid agencies and food retailers. The research team examined food aid agencies in Southeast Texas with data collection methods that included secondary data analysis, a focus group, and an online survey. The second population studied was food retailers, through in-person surveys with store managers. Food retailers were randomly sampled in three Texas counties: Jefferson, Orange, and Harris. The data collection methods resulted in 32 food aid agency online survey responses and 210 completed in-person food retail surveys. Data were collected five to eight months after the event, which helped to increase the reliability and validity of the data.
The time-sensitive nature of post-disaster data requires research teams to quickly organize their efforts before entering the field. The purpose of this project archive is to share the primary data collected, document the methods used, and help future research teams reduce the time needed for project development and reporting. This archive does not contain Personally or Business Identifiable Information.