

Creators/Authors contains: "Bein, Doina"


  1. Waldemar Karwowski (Ed.)

    Given the importance of online retailers in the market, sales forecasting has become one of the essential strategic considerations in marketing. Modern machine learning tools help many online retailers forecast sales, but these models need refinement and automation to increase efficiency and productivity. If an automated function can capture historical data and execute forecasting models automatically, it will reduce the time and human resources the company needs to manage the forecasting system. An automated data-processing and forecasting system also gives the marketing department more flexibility in forecasting market sales. Proposed here is an automated weekly sales forecasting system that integrates an Extract-Transform-Load (ETL) data-processing pipeline with a machine learning forecasting model and sends the outcomes as messages. For this study, data were obtained for an online women's shoe retailer from three sources (AWS Redshift, AWS S3, and Google Sheets). The system collects 120 weeks of sales data, passes it through the ETL process, and runs the machine learning model to forecast sales of the retailer's products for the following week. The model is built with a random forest regressor. The top 25 products by forecast result are selected and sent to the owner's email for further market evaluation. The system is built as a Directed Acyclic Graph (DAG) in a Python script on Apache Airflow, and to simplify management, the authors run Apache Airflow in a Docker container. The whole process requires no human monitoring or management; if an error occurs while the project runs on Airflow, the system notifies the project owner to inspect the cause.
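
The weekly forecasting and top-25 selection step can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function names are hypothetical, the input is assumed to be a dict mapping each product ID to its weekly sales history (for example, the 120 weeks the system collects), and, so that the sketch runs with only the standard library, a moving-average forecaster stands in for the random forest regressor. `make_lag_dataset` shows how a weekly history is typically turned into the feature/target rows that a regressor such as scikit-learn's `RandomForestRegressor` could be fit on.

```python
import heapq
from statistics import mean


def make_lag_dataset(series, n_lags):
    """Turn one product's weekly sales into supervised (features, target) rows.

    Each feature row holds n_lags consecutive weeks; its target is the week
    that follows. A regressor (e.g. a random forest) could be fit on these rows.
    """
    features, targets = [], []
    for t in range(n_lags, len(series)):
        features.append(series[t - n_lags:t])
        targets.append(series[t])
    return features, targets


def forecast_next_week(series, n_lags=4):
    """Stand-in forecaster (assumption, not the paper's model): mean of the last n_lags weeks."""
    return mean(series[-n_lags:])


def top_n_products(sales_by_product, n=25, n_lags=4):
    """Forecast next week for every product and keep the n highest forecasts."""
    forecasts = {
        product: forecast_next_week(history, n_lags)
        for product, history in sales_by_product.items()
    }
    # Returned from highest to lowest forecast.
    return heapq.nlargest(n, forecasts.items(), key=lambda item: item[1])


if __name__ == "__main__":
    weekly_sales = {  # hypothetical SKUs and weekly sales counts
        "sku-a": [10, 12, 11, 13, 15],
        "sku-b": [3, 2, 4, 3, 5],
        "sku-c": [20, 18, 22, 21, 19],
    }
    print(make_lag_dataset(weekly_sales["sku-a"], n_lags=2))
    print(top_n_products(weekly_sales, n=2))
```

With this toy data, `sku-c` and `sku-a` have the two highest four-week averages and are the ones returned; in the described system, a list like this is what would be emailed to the owner.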

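
Airflow runs each task of a DAG only after its upstream tasks succeed, and the system described above alerts the owner when something fails. The sketch below mimics that ordering-plus-notification behavior with the standard library's `graphlib.TopologicalSorter` (Python 3.9+). It is not Airflow's API: the task names (three extract tasks for Redshift, S3, and Google Sheets, then transform, load, forecast, and email) and the `notify` callback are hypothetical stand-ins for the components the abstract names.

```python
from graphlib import TopologicalSorter


def run_pipeline(tasks, upstream, notify):
    """Run tasks in dependency order, stopping at the first failure.

    tasks: task name -> zero-argument callable
    upstream: task name -> set of task names that must finish first
    notify: called with a message when a task raises (e.g. send an email)
    Returns (names of completed tasks, name of the failed task or None).
    """
    completed = []
    for name in TopologicalSorter(upstream).static_order():
        try:
            tasks[name]()
        except Exception as exc:
            notify(f"task {name!r} failed: {exc}")
            return completed, name
        completed.append(name)
    return completed, None


if __name__ == "__main__":
    def step(name):
        return lambda: print("running", name)

    names = ["extract_redshift", "extract_s3", "extract_sheets",
             "transform", "load", "forecast", "email_top_25"]
    tasks = {n: step(n) for n in names}
    upstream = {
        "transform": {"extract_redshift", "extract_s3", "extract_sheets"},
        "load": {"transform"},
        "forecast": {"load"},
        "email_top_25": {"forecast"},
    }
    print(run_pipeline(tasks, upstream, notify=print))
```

Stopping at the first failure is a simplification; Airflow itself tracks per-task state, supports retries, and leaves the downstream tasks of a failed task unrun.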
  2. Waldemar Karwowski (Ed.)

    Online advertising is a billion-dollar industry, with many companies using websites and social media platforms to promote their products. The primary concerns in online marketing are optimizing the performance of a digital ad, reaching the right audience, and maximizing revenue, which can be achieved by accurately predicting the probability that a given ad will be clicked, known as the Click-Through Rate (CTR). A high CTR is assumed to indicate that the ad is reaching its target customers, while a low CTR suggests that it is not reaching its desired audience, which may result in a low return on investment (ROI). We propose a data-science-driven approach to help businesses improve their online advertising campaigns: building various machine learning models to predict CTR accurately and selecting the best-performing model. To build our classification models, we use the Avazu dataset, publicly available on the Kaggle website. Insight into this metric allows companies to compete in real-time bidding, gauge how relevant their keywords are in search engine queries, and mitigate unexpected losses in their spending budget. In this paper, the authors use modern machine learning tools and techniques to improve CTR prediction for online advertisements, with the aim of bringing change to the industry.
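
The abstract does not say which classifiers were compared, so the following is only an illustrative baseline, assumed rather than taken from the paper: logistic regression trained by stochastic gradient descent on hashed categorical features (the "hashing trick"), a common starting point for click data such as Avazu, whose columns are mostly categorical. Log loss, a usual metric for comparing CTR models, is included for model selection. The field names and rows in the example are hypothetical.

```python
import math
import zlib


def hash_features(row, n_buckets):
    """Hashing trick: map each 'field=value' pair to a weight index.

    zlib.crc32 is used because Python's built-in hash() of str is randomized
    between runs.
    """
    return [zlib.crc32(f"{field}={value}".encode()) % n_buckets
            for field, value in row.items()]


def sigmoid(z):
    """Logistic function written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class HashedLogisticRegression:
    def __init__(self, n_buckets=2 ** 18, learning_rate=0.1):
        self.n_buckets = n_buckets
        self.learning_rate = learning_rate
        self.weights = [0.0] * n_buckets
        self.bias = 0.0

    def predict_proba(self, row):
        """Estimated probability that the impression described by row is clicked."""
        idx = hash_features(row, self.n_buckets)
        return sigmoid(self.bias + sum(self.weights[i] for i in idx))

    def update(self, row, clicked):
        """One SGD step on the log loss for a single impression (clicked is 0 or 1)."""
        idx = hash_features(row, self.n_buckets)
        p = sigmoid(self.bias + sum(self.weights[i] for i in idx))
        grad = p - clicked  # derivative of the log loss w.r.t. the logit
        self.bias -= self.learning_rate * grad
        for i in idx:
            self.weights[i] -= self.learning_rate * grad


def log_loss(labels, probs, eps=1e-15):
    """Mean binary cross-entropy; lower is better when comparing CTR models."""
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1.0 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(labels)


if __name__ == "__main__":
    data = [  # hypothetical impressions: (features, clicked)
        ({"site_category": "news", "device_type": "1"}, 1),
        ({"site_category": "sports", "device_type": "1"}, 0),
    ]
    model = HashedLogisticRegression()
    for _ in range(50):
        for row, clicked in data:
            model.update(row, clicked)
    probs = [model.predict_proba(row) for row, _ in data]
    print(probs, log_loss([c for _, c in data], probs))
```

Comparing candidate models by their log loss on the same held-out impressions is one straightforward way to implement the "select the best-performing model" step.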

  3. The United Nations has recognized access to safe drinking water as a human right, yet many countries in the developing world lack access to potable water. Recurrent water-borne illnesses have a devastating effect on the morale and well-being of many people living in developing countries, undermining the UN's objective. Qualitative and semi-quantitative approaches to risk assessment are often ineffective and time-consuming, and they cannot discern the risk from ingesting unsafe drinking water at a global scale. This research uses a global dataset of drinking water facilities to evaluate these risks with a clustering approach. The data analysis relies on predetermined risk thresholds, the exceedance of which indicates potential adverse risk. These thresholds are based on the JMP Service Ladder, and the analysis applies density-based spatial clustering of applications with noise (DBSCAN). Risk analysis of 132 datasets was conducted to assign each to a risk category: low, medium, or high. Of the datasets analyzed, 90 areas were designated low-risk and 42 medium-risk. Overall, the clustering approach is an excellent tool for risk assessment on large datasets and can help stakeholders, including water utility managers, quickly assess the potential risk from declining water quality. The clustering approach can also be harnessed for better data visualization, long-term performance evaluation of water utilities, and real-time drinking water quality monitoring.
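
DBSCAN groups points that have at least `min_pts` neighbors (counting the point itself) within distance `eps`, grows clusters outward from those core points, and labels isolated points as noise. The sketch below is a plain, quadratic-time version of the standard algorithm using only the standard library. The two-dimensional points (for example, hypothetical per-area shares of population at two JMP service levels) and the `eps`/`min_pts` values are illustrative assumptions; the study's actual features and risk thresholds are not given in the abstract.

```python
import math

NOISE = -1


def region_query(points, i, eps):
    """Indices of all points within eps of points[i], including i itself."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Return one label per point: a cluster id (0, 1, ...) or NOISE."""
    labels = [None] * len(points)  # None means not yet visited
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: join cluster, do not expand
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:  # j is a core point: keep expanding
                queue.extend(j_neighbors)
    return labels


if __name__ == "__main__":
    # Hypothetical feature pairs for seven areas.
    areas = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
    print(dbscan(areas, eps=1.5, min_pts=3))
```

Because DBSCAN does not need the number of clusters in advance and flags outliers explicitly, it suits exploratory screening of a large, uneven dataset like the one described.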
  4. Despite national efforts to increase the representation of minority students in STEM disciplines, disparities persist. Hispanics account for 17.4% of the U.S. population, and nearly 20% of the U.S. youth population (age 21 and below) is Hispanic, yet Hispanics make up just 7% of the STEM workforce. To tackle these challenges, the National Science Foundation (NSF) has funded a five-year project, ASSURE-US, that seeks to improve undergraduate education in Engineering and Computer Science (ECS) at California State University, Fullerton. The project aims to advance student success during the first two years of college for ECS students. Toward that goal, it incorporates a diverse set of approaches, including socio-cultural and academic interventions. These strategies include developing early interventions in gateway STEM courses, creating a nurturing faculty-student interaction and collaborative learning environment, providing relevant, context-based learning experiences, integrating project-based learning with engineering design in lower-division courses, exposing lower-division students to research to sustain their interest, and helping students develop career-readiness skills. The project also seeks to understand the personal, social, cognitive, and contextual factors that contribute to student persistence in STEM learning, which STEM faculty can use to improve their pedagogical and student-interaction approaches. This paper summarizes the major approaches the ASSURE-US project plans to implement to reduce the achievement gap and motivate ECS students to remain in the program.
    For the preliminary first-year findings, pre- and post-survey data were collected and analyzed from about one hundred freshman and sophomore ECS students regarding their academic experience in lower-division classes and their feedback on the social support events held by the ASSURE-US project during the 2018-19 academic year. The preliminary results suggest that, among the ASSURE-US activities implemented in the first year, both the informal faculty-student interactions and the summer research experiences helped students commit more to their major during their lower-division years. The pre-post surveys also show improvements in ASSURE-US students' awareness of academic support services, career options and pathways, and personal counseling services.