The unprecedented rise of social media platforms, combined with location-aware technologies, continuously produces large volumes of geo-social data that flow as a user-generated data stream. This data has been exploited in important use cases across various application domains. This article supports geo-social personalized queries in streaming data environments. We define temporal geo-social queries that provide users with real-time personalized answers based on their social graph. The new queries can incorporate keyword search to return personalized results relevant to specific topics. To support these queries efficiently, we propose an indexing framework that provides lightweight and effective indexing to digest geo-social data in real time. The framework distinguishes highly dynamic data from relatively stable data and uses an appropriate data structure and storage tier for each. Based on this framework, we propose a novel geo-social index and adopt two baseline indexes to support the addressed queries. The query processor then employs different types of pruning to access the index content efficiently and provide real-time query responses. An extensive experimental evaluation on real datasets shows that our proposed techniques outperform existing competitors in indexing real-time data and in answering queries with low latency.
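To make the split between dynamic and stable data concrete, here is a minimal Python sketch built around a hypothetical `TwoTierGeoSocialIndex` class. It assumes the relatively stable social graph lives in an adjacency map while the highly dynamic posts sit in per-user, time-ordered buffers. A query applies social pruning (only the querying user's friends), temporal pruning (stop scanning once posts fall outside the time window), and then a keyword and Euclidean-distance filter. The class, its parameters, and the distance filter are illustrative assumptions, not the article's index.

```python
import math
from collections import defaultdict, deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Post:
    user: str
    x: float
    y: float
    t: float             # timestamp in seconds
    keywords: frozenset


class TwoTierGeoSocialIndex:
    """Hypothetical two-tier index: a stable social graph plus dynamic post buffers."""

    def __init__(self, window: float):
        self.window = window                  # retention window in seconds
        self.friends = defaultdict(set)       # stable tier: undirected social graph
        self.recent = defaultdict(deque)      # dynamic tier: per-user posts, oldest first

    def add_friendship(self, a: str, b: str) -> None:
        self.friends[a].add(b)
        self.friends[b].add(a)

    def insert(self, post: Post) -> None:
        buf = self.recent[post.user]
        buf.append(post)                      # assumes posts arrive in timestamp order
        while buf and buf[0].t < post.t - self.window:
            buf.popleft()                     # expire posts that left the window

    def query(self, user, x, y, radius, keyword, now):
        """Friends' posts near (x, y) that contain `keyword` and fall inside the window."""
        hits = []
        for friend in self.friends.get(user, ()):            # social pruning
            for p in reversed(self.recent.get(friend, ())):  # newest first
                if p.t < now - self.window:
                    break                                     # temporal pruning: rest are older
                if keyword in p.keywords and math.hypot(p.x - x, p.y - y) <= radius:
                    hits.append(p)
        return sorted(hits, key=lambda p: p.t, reverse=True)


idx = TwoTierGeoSocialIndex(window=60)
idx.add_friendship("alice", "bob")
idx.add_friendship("alice", "carol")
idx.insert(Post("bob", 0.0, 0.0, 10.0, frozenset({"coffee"})))
idx.insert(Post("bob", 5.0, 5.0, 20.0, frozenset({"coffee"})))    # too far away
idx.insert(Post("carol", 1.0, 0.0, 30.0, frozenset({"coffee"})))
idx.insert(Post("dave", 0.0, 0.0, 30.0, frozenset({"coffee"})))   # not alice's friend
print([p.user for p in idx.query("alice", 0.0, 0.0, 2.0, "coffee", now=40.0)])  # ['carol', 'bob']
```

Keeping the graph and the post buffers in separate structures lets each be tuned independently: the graph changes rarely, while the buffers absorb every incoming post and shed expired ones cheaply from the front.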
                            Up-and-Coming or Down-and-Out? Social Media Popularity as an Indicator of Neighborhood Change
                        
                    
    
By quantifying Twitter activity and sentiment for each of 274 neighborhood areas in New York City, this study introduces the Neighborhood Popularity Index and correlates changes in the index with real estate prices, a common measure of neighborhood change. Results show that social media provide both a near-real-time indicator of shifting attitudes toward neighborhoods and an early-warning measure of future changes in neighborhood composition and demand. Although social media data are an important complement to traditional data sources, their use in neighborhood studies raises concerns about data accessibility and about equity, given problems of representativeness and bias in the data.
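The abstract does not give the formula for the Neighborhood Popularity Index. As a purely illustrative sketch, assume each neighborhood's index is the equal-weight sum of its standardized tweet count and its standardized mean sentiment; the change in the index between two periods can then be correlated with the change in real estate prices. The functions `zscores`, `popularity_index`, and `pearson`, the weighting, and all input numbers below are hypothetical.

```python
from statistics import mean, pstdev


def zscores(values):
    """Standardize values across neighborhoods within one period."""
    mu, sd = mean(values), pstdev(values)
    return [(v - mu) / sd if sd else 0.0 for v in values]


def popularity_index(tweet_counts, mean_sentiments):
    """Hypothetical index: equal-weight sum of standardized activity and sentiment."""
    return [a + s for a, s in zip(zscores(tweet_counts), zscores(mean_sentiments))]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


# Made-up data for four neighborhoods observed in two periods.
index_before = popularity_index([100, 200, 150, 50], [0.1, 0.2, 0.0, -0.1])
index_after = popularity_index([150, 210, 140, 40], [0.3, 0.2, -0.1, -0.2])
index_change = [b - a for a, b in zip(index_before, index_after)]
price_change = [0.12, 0.03, -0.02, -0.05]   # fractional change in prices
r = pearson(index_change, price_change)
print(f"correlation between index change and price change: {r:.2f}")
```

Standardizing within each period makes this index relative: a neighborhood's score rises only when it gains activity or sentiment faster than the other neighborhoods in the sample.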
- Award ID(s): 1653772
- PAR ID: 10323407
- Journal Name: Journal of Planning Education and Research
- ISSN: 0739-456X
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like This
- Research shows that certain external factors can affect the mental health of many people in a community. Moreover, mental health has become significantly more important in recent years due to the COVID-19 pandemic. Many people communicate and express their emotions through social media platforms, which give researchers opportunities to gain insight into their opinions and mental state. While social sensing studies using social media data have flourished in the last decade, many studies that use social media data to detect and predict mental health status have focused on the individual level. In this study, we aim to generate a social sensing index for mental health to monitor emotional well-being, which is closely related to mental health, and to identify daily trends in negative emotions at the city level. We conduct sentiment analysis on Twitter data and compute the entropy of the degree of sentiment change to develop the index. We observe that sentiment trends fluctuate significantly in response to unusual events. We find that the social sensing index for mental health reflects both city-wide and local events that trigger negative emotions, as well as areas where negative emotions persist. The study contributes to the growing body of research that uses social media data to examine mental health at the city level. Focusing on the city level rather than the individual level provides a broader perspective on the mental health of a population. The social sensing index for mental health allows public health professionals to monitor persistent negative sentiment and identify potential areas where mental health issues may emerge.
- Research and experimentation using big data sets, specifically large sets of electronic health records (EHR) and social media data, are demonstrating the potential to understand the spread of diseases and a variety of other issues. Applications of advanced algorithms, machine learning, and artificial intelligence indicate the potential for rapid improvements in public health. For example, several reports indicate that social media data can be used to predict disease outbreaks and their spread (Brown, 2015). Because real-world EHR data raise complicated security and privacy issues that prevent their wide use by researchers, there is a real need to generate synthetic EHR data that are realistic and representative. Current EHR generators, such as Synthea™ (Walonoski et al., 2018), simulate and generate only medical data. However, combining patients' social media data with their simulated EHR data would make the combined data more comprehensive and realistic for healthcare research. This paper presents a patient social media data generator that extends an EHR data generator. By adding coherent social media data to EHR data, a variety of emerging questions can be examined, such as where a contagious patient may have been and with whom they may have been in contact. Social media data, specifically Twitter data, are generated with phrases indicating the onset of symptoms that correspond to the synthetically generated EHR reports of simulated patients. This enables the creation of an open data set that scales up to big-data size and is not subject to the security and privacy concerns and restrictions of real healthcare data sets. This capability is important to the modeling and simulation community, such as scientists and epidemiologists who develop algorithms to analyze the spread of diseases, because it enables testing a variety of analytics without revealing real-world private patient information.
- In the modern era, infectious diseases such as H1N1, SARS, and Ebola spread much faster than at any time in history. Efficient approaches are therefore needed to monitor and track the diffusion of these deadly epidemics. Traditional computational epidemiology models can capture disease-spreading trends through contact networks; however, they cannot provide timely updates from real-world data. In contrast, techniques focusing on emerging social media platforms can collect and monitor real-time disease data but do not explain the underlying dynamics of disease propagation. To achieve efficient and accurate real-time disease prediction, the framework proposed in this paper combines the strengths of social media mining and computational epidemiology. Specifically, individual health status is first learned from users' online posts through Bayesian inference; disease parameters are then extracted for the population-level computational models; and the outputs of the computational epidemiology model are fed back into the social-media-based models to further improve performance. In various experiments, our proposed model outperforms current disease forecasting approaches with better accuracy and greater stability.
- We introduce the Reverse Spatial Top-k Keyword (RSK) query, defined as follows: given a query term q, an integer k, and a neighborhood size, find all neighborhoods of that size in which q is among the top-k most frequent terms in the social posts inside the neighborhood. An obvious approach would be to partition the dataset with a uniform grid of a given cell size and identify the cells where the term is among the top-k most frequent keywords. However, this answer would be incomplete, since it only checks neighborhoods that are perfectly aligned with the grid. Furthermore, for every neighborhood (square) that is an answer, we can define infinitely many more result neighborhoods by minimally shifting the square without including more posts in it. To address this, we need to identify contiguous regions in which any point can be the center of a neighborhood that satisfies the query. We propose an algorithm that answers RSK queries efficiently using an index structure consisting of a uniform grid augmented by materialized lists of term frequencies. We apply various optimizations that drastically improve query latency compared with baseline approaches. We also provide a theoretical model for choosing the cell size that minimizes query latency. We further examine a restricted version of the problem (RSKR) that limits the scope of the answer and propose efficient approximate algorithms. Finally, we examine how parallelism can improve performance by balancing the workload with a smart load slicing technique. An extensive experimental performance evaluation of the proposed methods using real Twitter and crime report datasets shows the efficiency of our optimizations and the accuracy of the proposed theoretical model.
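The mental health abstract in the list above builds its index from the entropy of the degree of sentiment change. A minimal sketch of one way to compute such an entropy, assuming signed changes between consecutive sentiment scores are grouped into fixed-width bins; the function name, the binning scheme, and `bin_width` are illustrative assumptions, not the study's method.

```python
import math
from collections import Counter


def sentiment_change_entropy(scores, bin_width=0.1):
    """Shannon entropy (bits) of binned changes between consecutive sentiment scores.

    Assumption: signed changes are grouped into fixed-width bins; the study's
    exact binning is not stated in its abstract.
    """
    changes = [b - a for a, b in zip(scores, scores[1:])]
    if not changes:
        return 0.0
    bins = Counter(math.floor(c / bin_width) for c in changes)
    n = len(changes)
    # H = sum over bins of p * log2(1 / p), with p = count / n
    return sum((m / n) * math.log2(n / m) for m in bins.values())


calm = [0.2, 0.2, 0.2, 0.2, 0.2]          # no change at all
volatile = [0.0, 0.5, 0.0, 0.5, 0.0]      # alternating swings
print(sentiment_change_entropy(calm), sentiment_change_entropy(volatile))  # 0.0 1.0
```

In this sketch a steady series scores zero, while a series whose changes spread over more bins scores higher.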
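In the disease-prediction framework described in the list above, the first step learns individual health status from users' posts through Bayesian inference. The sketch below uses naive Bayes over a few symptom keywords as a stand-in for that step; the prior, the keyword likelihoods, and the helper name `posterior_sick` are hypothetical. Averaging the per-user posteriors gives a population-level infected fraction, the kind of quantity that could parameterize a computational epidemiology model (not shown).

```python
import math


def posterior_sick(tokens, prior, p_word_sick, p_word_healthy):
    """Naive Bayes posterior P(sick | which symptom words appear in a user's posts)."""
    log_sick, log_healthy = math.log(prior), math.log(1 - prior)
    for word, ps in p_word_sick.items():
        ph = p_word_healthy[word]
        if word in tokens:
            log_sick += math.log(ps)
            log_healthy += math.log(ph)
        else:
            log_sick += math.log(1 - ps)
            log_healthy += math.log(1 - ph)
    # Normalize in log space to avoid underflow when there are many words.
    top = max(log_sick, log_healthy)
    s, h = math.exp(log_sick - top), math.exp(log_healthy - top)
    return s / (s + h)


# Hypothetical likelihoods of each symptom word for sick vs. healthy users.
P_SICK = {"fever": 0.8, "cough": 0.6}
P_HEALTHY = {"fever": 0.1, "cough": 0.2}

users = {
    "u1": {"fever", "cough", "tired"},
    "u2": {"lunch", "game"},
}
posteriors = {u: posterior_sick(t, 0.05, P_SICK, P_HEALTHY) for u, t in users.items()}
infected_fraction = sum(posteriors.values()) / len(posteriors)   # population-level estimate
print(posteriors, infected_fraction)
```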
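The RSK abstract describes a grid-aligned baseline and notes that it is incomplete. A minimal sketch of that baseline, assuming posts are `(x, y, terms)` tuples and defining "q is in the top-k" as q occurring with fewer than k terms strictly more frequent (a tie-handling choice assumed here). The helper names are hypothetical; the per-cell `Counter` plays the role of the materialized term-frequency lists.

```python
import math
from collections import Counter, defaultdict


def build_grid(posts, cell_size):
    """Materialize a term-frequency Counter for every non-empty grid cell."""
    grid = defaultdict(Counter)
    for x, y, terms in posts:
        cell = (math.floor(x / cell_size), math.floor(y / cell_size))
        grid[cell].update(terms)
    return grid


def in_top_k(counts, q, k):
    """q is in the top-k if it occurs and fewer than k terms are strictly more frequent."""
    fq = counts.get(q, 0)
    return fq > 0 and sum(1 for f in counts.values() if f > fq) < k


def grid_aligned_rsk(grid, q, k):
    """Baseline answer: grid cells (not arbitrary neighborhoods) where q is in the top-k."""
    return sorted(cell for cell, counts in grid.items() if in_top_k(counts, q, k))


posts = [
    (0.5, 0.5, ["pizza", "pizza", "music"]),
    (0.7, 0.2, ["music"]),
    (1.5, 0.5, ["traffic", "traffic", "pizza"]),
    (2.5, 2.5, ["music"]),
]
grid = build_grid(posts, cell_size=1.0)
print(grid_aligned_rsk(grid, "pizza", k=1))   # [(0, 0)]
print(grid_aligned_rsk(grid, "pizza", k=2))   # [(0, 0), (1, 0)]
```

As the abstract points out, this baseline misses squares that straddle cell boundaries; the proposed algorithm instead searches for contiguous regions of valid neighborhood centers.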
 An official website of the United States government