Title: Real-Time Twitter Data Mining Approach to Infer User Perception Toward Active Mobility
This study evaluates the level of service of shared transportation facilities by mining geotagged social media data and analyzing the perceptions of road users. An algorithm is developed that adopts a text classification approach with contextual understanding to filter out relevant information related to users’ perceptions of active mobility. A heuristic-based keyword matching approach produces about 75% out-of-context tweets, so that approach is deemed unsuitable for information extraction from Twitter. This study implements six text classification models and compares their performance for tweet classification. To select the best model, the models are compared on precision, recall, F1 score (the harmonic mean of precision and recall), and accuracy. The term frequency-inverse document frequency (TF-IDF) vectorizer-based logistic regression model performed best at classifying the tweets. The selected model is applied to real-world data to filter out relevant information, and content analysis is performed to check the distribution of keywords within the filtered data. The findings show that the proposed method can produce more relevant information on walking and biking facilities as well as safety concerns. By analyzing the sentiments of the filtered data, the existing condition of biking and walking facilities in the DC area can be inferred. This method can be a critical part of a decision support system for understanding the qualitative level of service of existing transportation facilities.
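The winning model pairs a TF-IDF vectorizer with logistic regression. Below is a minimal, dependency-free sketch of the TF-IDF vectorization step only, using made-up toy tweets and a common smoothed IDF; the study's exact weighting scheme may differ.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Compute TF-IDF weights for a list of tokenized documents.

    A minimal sketch of the vectorization step; the study feeds
    such vectors into a logistic regression classifier."""
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))  # document frequency: count each term once per doc
    idf = {t: math.log(n / df[t]) + 1.0 for t in df}  # common +1 smoothing
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        total = len(doc)
        vectors.append({t: (c / total) * idf[t] for t, c in tf.items()})
    return vectors

# toy tweets (hypothetical, not from the study's corpus)
docs = [
    "bike lane blocked on main street".split(),
    "new bike lane feels much safer".split(),
    "traffic jam on the highway again".split(),
]
vecs = tfidf_vectors(docs)
```

Terms that appear in fewer documents (e.g., "blocked") receive a higher weight than terms shared across documents (e.g., "bike"), which is what lets the downstream classifier key on discriminative vocabulary.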
Award ID(s):
1917019
NSF-PAR ID:
10291507
Author(s) / Creator(s):
Date Published:
Journal Name:
Transportation Research Record: Journal of the Transportation Research Board
ISSN:
0361-1981
Page Range / eLocation ID:
036119812110049
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Social media platforms are playing increasingly critical roles in disaster response and rescue operations. During emergencies, users can post rescue requests along with their addresses on social media, while volunteers can search for those messages and send help. However, efficiently leveraging social media in rescue operations remains challenging because of the lack of tools to identify rescue request messages on social media automatically and rapidly. Analyzing social media data, such as Twitter data, relies heavily on Natural Language Processing (NLP) algorithms to extract information from texts. The introduction of bidirectional transformer models, such as the Bidirectional Encoder Representations from Transformers (BERT) model, has significantly outperformed previous NLP models in numerous text analysis tasks, providing new opportunities to precisely understand and classify social media data for diverse applications. This study developed and compared ten VictimFinder models for identifying rescue request tweets, three based on milestone NLP algorithms and seven BERT-based. A total of 3191 manually labeled disaster-related tweets posted during the 2017 Hurricane Harvey were used as the training and testing datasets. We evaluated the performance of each model by classification accuracy, computation cost, and model stability. Experiment results show that all BERT-based models have significantly increased the accuracy of categorizing rescue-related tweets. The best model for identifying rescue request tweets is a customized BERT-based model with a Convolutional Neural Network (CNN) classifier. Its F1-score is 0.919, which outperforms the baseline model by 10.6%. The developed models can promote the use of social media for rescue operations in future disaster events.
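Model comparisons like the one above rest on standard classification metrics. A minimal, self-contained sketch of how precision, recall, and F1 are computed for a binary task, using toy labels rather than the study's data:

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Binary precision, recall, and F1 from parallel label lists."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# toy ground truth vs. predictions (1 = rescue request, 0 = other)
y_true = [1, 1, 1, 0, 0]
y_pred = [1, 1, 0, 1, 0]
p, r, f1 = precision_recall_f1(y_true, y_pred)
```

With one false positive and one false negative on three true positives, precision, recall, and F1 all come out to 2/3 here; on real comparisons the three metrics diverge and F1 summarizes the trade-off.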
  2. Information Retrieval (IR) plays a pivotal role in diverse Software Engineering (SE) tasks, e.g., bug localization and triaging, bug report routing, code retrieval, requirements analysis, etc. SE tasks operate on diverse types of documents including code, text, stack traces, and structured, semi-structured, and unstructured metadata that often contain specialized vocabularies. As the performance of any IR-based tool depends critically on the underlying document types, and given the diversity of SE corpora, it is essential to understand which models work best for which types of SE documents and tasks. We empirically investigate the interaction between IR models and document types for two representative SE tasks (bug localization and relevant project search), carefully chosen because they require a diverse set of SE artifacts (mixtures of code and text), and confirm that the models’ performance varies significantly with the mix of document types. Leveraging this insight, we propose a generalized framework, SRCH, to automatically select the most favorable IR model(s) for a given SE task. We evaluate SRCH with respect to these two tasks and confirm its effectiveness. Our preliminary user study shows that SRCH’s intelligent adaptation of the IR model(s) to the task at hand not only improves precision and recall for SE tasks but may also improve users’ satisfaction.
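The core selection idea can be illustrated with a toy sketch: given per-task validation scores for candidate IR models, pick the top scorer for the task at hand. The model names and scores below are hypothetical placeholders, not results from the SRCH paper, whose actual framework is more elaborate.

```python
def select_ir_model(validation_scores, task):
    """Return the best-scoring IR model name for a given SE task.

    A generic sketch of score-driven model selection; SRCH itself
    uses a richer procedure than a single argmax."""
    scores = validation_scores[task]
    return max(scores, key=scores.get)

# hypothetical validation scores per task (e.g., mean average precision)
validation_scores = {
    "bug_localization": {"VSM": 0.41, "BM25": 0.48, "LSI": 0.39},
    "project_search": {"VSM": 0.55, "BM25": 0.52, "LSI": 0.61},
}
```

The point the sketch makes is the paper's finding in miniature: no single model wins everywhere, so the favorable choice differs by task and document mix.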
  3. Problem definition: Approximately 11,000 alleged illicit massage businesses (IMBs) exist across the United States hidden in plain sight among legitimate businesses. These illicit businesses frequently exploit workers, many of whom are victims of human trafficking, forced or coerced to provide commercial sex. Academic/practical relevance: Although IMB review boards like Rubmaps.ch can provide first-hand information to identify IMBs, these sites are likely to be closed by law enforcement. Open websites like Yelp.com provide more accessible and detailed information about a larger set of massage businesses. Reviews from these sites can be screened for risk factors of trafficking. Methodology: We develop a natural language processing approach to detect online customer reviews that indicate a massage business is likely engaged in human trafficking. We label data sets of Yelp reviews using knowledge of known IMBs. We develop a lexicon of key words/phrases related to human trafficking and commercial sex acts. We then build two classification models based on this lexicon. We also train two classification models using embeddings from the bidirectional encoder representations from transformers (BERT) model and the Doc2Vec model. Results: We evaluate the performance of these classification models and various ensemble models. The lexicon-based models achieve high precision, whereas the embedding-based models have relatively high recall. The ensemble models provide a compromise and achieve the best performance on the out-of-sample test. Our results verify the usefulness of ensemble methods for building robust models to detect risk factors of human trafficking in reviews on open websites like Yelp. 
Managerial implications: The proposed models can save countless hours in IMB investigations by automatically sorting through large quantities of data to flag potential illicit activity, eliminating the need for manual screening of these reviews by law enforcement and other stakeholders.

    Funding: This work was supported by the National Science Foundation [Grant 1936331].

    Supplemental Material: The online appendix is available at https://doi.org/10.1287/msom.2023.1196 .
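The lexicon/embedding trade-off described above can be sketched in a few lines: a lexicon match tends toward precision, an embedding score toward recall, and an ensemble can combine them by union (recall-oriented) or intersection (precision-oriented). The lexicon phrase, scores, and threshold below are illustrative placeholders, not the study's actual lexicon or models.

```python
def lexicon_flag(review, lexicon):
    """Flag a review if it contains any lexicon phrase (case-insensitive)."""
    text = review.lower()
    return any(phrase in text for phrase in lexicon)

def ensemble_flag(lex_hit, emb_score, threshold=0.5, mode="union"):
    """Combine a lexicon hit with an embedding-model score.

    'union' flags if either signal fires (higher recall);
    'intersection' requires both (higher precision)."""
    emb_hit = emb_score >= threshold
    return (lex_hit or emb_hit) if mode == "union" else (lex_hit and emb_hit)

# hypothetical single-phrase lexicon for illustration only
lexicon = {"red flag phrase"}
hit = lexicon_flag("This review has a RED FLAG PHRASE inside", lexicon)
```

A union ensemble flags a review that only the embedding model scores highly, while an intersection ensemble would not, which mirrors the precision/recall compromise the abstract reports.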

     
  4. Background: Many snakes are low-energy predators that use crypsis to ambush their prey. Most of these species feed very infrequently, are sensitive to the presence of larger vertebrates, such as humans, and spend large portions of their lifetime hidden. This makes direct observation of feeding behaviour challenging, and previous methodologies developed for documenting predation behaviours of free-ranging snakes have critical limitations. Animal-borne accelerometers have been increasingly used by ecologists to quantify activity and moment-to-moment behaviour of free-ranging animals, but their application in snakes has been limited to documenting basic behavioural states (e.g., active vs. non-active). High-frequency accelerometry can provide new insight into the behaviour of this important group of predators, and here we propose a new method to quantify key aspects of the feeding behaviour of three species of viperid snakes (Crotalus spp.) and assess the transferability of classification models across those species. Results: We used open-source software to create species-specific models that classified locomotion, stillness, predatory striking, and prey swallowing with high precision, accuracy, and recall. In addition, we identified a low-cost, reliable, non-invasive attachment method for accelerometry devices to be placed anteriorly on snakes, as is likely necessary for accurately classifying distinct behaviours in these species. However, species-specific models had low transferability in our cross-species comparison. Conclusions: Overall, our study demonstrates the strong potential for using accelerometry to document critical feeding behaviours in snakes that are difficult to observe directly. Furthermore, we provide an ‘end-to-end’ template for identifying important behaviours involved in the foraging ecology of viperids using high-frequency accelerometry.
We highlight a method of attachment of accelerometers, a technique to simulate feeding events in captivity, and a model selection procedure using biologically relevant window sizes in open-access software for analyzing acceleration data (AcceleRater). Although we were unable to obtain a generalized model across species, if more data are incorporated from snakes across different body sizes and different contexts (i.e., moving through natural habitat), general models could potentially be developed that have higher transferability.
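Classifying accelerometer traces typically starts by summarizing fixed windows of the raw signal into simple statistics, the kind of windowed features tools such as AcceleRater accept. A minimal sketch with made-up one-axis signals (a flat "stillness" trace and one containing a strike-like spike); the window size and features are illustrative, not the study's choices.

```python
import statistics

def window_features(signal, window, step):
    """Slide a fixed-size window over one accelerometer axis and
    emit summary features (mean, stdev, range) per window."""
    feats = []
    for start in range(0, len(signal) - window + 1, step):
        w = signal[start:start + window]
        feats.append({
            "mean": statistics.fmean(w),
            "stdev": statistics.pstdev(w),
            "range": max(w) - min(w),
        })
    return feats

# toy traces: stillness is flat; a strike shows a sharp acceleration spike
still = [0.0] * 8
strike = [0.0, 0.0, 3.0, -2.5, 0.0, 0.0, 0.0, 0.0]
f_still = window_features(still, window=4, step=4)
f_strike = window_features(strike, window=4, step=4)
```

A classifier then separates behaviours on such features: stillness windows have near-zero range and variance, while strike windows do not, which is why window size must be chosen to match the duration of the behaviour of interest.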
  5.
    Bridge inspection is an important step in preserving and rehabilitating transportation infrastructure to extend its service life. The advancement of mobile robotic technology allows the rapid collection of a large amount of inspection video data. However, the data are mainly images of complex scenes, wherein a bridge's various structural elements mix with a cluttered background. Assisting bridge inspectors in extracting structural elements of bridges from the large, complex video data, and sorting them by class, will prepare inspectors for the element-wise inspection to determine the condition of bridges. This article is motivated to develop an assistive intelligence model for segmenting multiclass bridge elements from inspection videos captured by an aerial inspection platform. With a small initial training dataset labeled by inspectors, a Mask Region-based Convolutional Neural Network pre-trained on a large public dataset was transferred to the new task of multiclass bridge element segmentation. In addition, temporal coherence analysis attempts to recover false negatives and identify weaknesses that the neural network can learn to improve. Furthermore, a semi-supervised self-training method was developed to engage experienced inspectors in refining the network iteratively. Quantitative and qualitative results from evaluating the developed deep neural network demonstrate that the proposed method can utilize a small amount of time and guidance from experienced inspectors (3.58 h for labeling 66 images) to build a network of excellent performance (91.8% precision, 93.6% recall, and 92.7% F1-score). Importantly, the article illustrates an approach to leveraging the domain knowledge and experience of bridge professionals in computational intelligence models to efficiently adapt the models to varied bridges in the National Bridge Inventory.
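The semi-supervised self-training loop described above (the network labels new samples, confident ones are folded back into the training pool, inspectors verify) can be sketched generically. The scorer and threshold here are illustrative placeholders, not the paper's Mask R-CNN pipeline.

```python
def self_training_round(model_score, unlabeled, labeled, threshold=0.9):
    """One round of self-training: move confidently scored unlabeled
    samples into the labeled pool (in practice, inspectors would
    verify these pseudo-labels before retraining).

    `model_score` is any callable returning (label, confidence)."""
    still_unlabeled = []
    for sample in unlabeled:
        label, conf = model_score(sample)
        if conf >= threshold:
            labeled.append((sample, label))
        else:
            still_unlabeled.append(sample)
    return labeled, still_unlabeled

# toy scorer: "confidence" is how far a score sits from the 0.5 boundary
def toy_score(x):
    return (1 if x > 0.5 else 0), abs(x - 0.5) * 2

labeled, rest = self_training_round(toy_score, [0.99, 0.52, 0.03], [])
```

Only the confidently scored samples (0.99 and 0.03) enter the labeled pool; the ambiguous 0.52 stays unlabeled, which is where expert guidance is cheapest to spend in an iterative refinement scheme like the article's.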