NSF PAR Search | NSF Public Access Repository

EcoRank: Budget-Constrained Text Re-ranking Using Large Language Models

Rashid, Muhammad Shihab; Meem, Jannat Ara; Dong, Yue; Hristidis, Vagelis (May 2024, Findings of the Association for Computational Linguistics (ACL))

Full Text Available
PAT-Questions: A Self-Updating Benchmark for Present-Anchored Temporal Question-Answering

Meem, Jannat Ara; Rashid, Muhammad Shihab; Dong, Yue; Hristidis, Vagelis (May 2024, Findings of the Association for Computational Linguistics (ACL))

Full Text Available
Increase Merge Efficiency in LSM Trees Through Coordinated Partitioning of Sorted Runs

Mao, Qizhong; Hristidis, Vagelis (December 2023, IEEE International Conference on Big Data)

Full Text Available
NORMY: Non-Uniform History Modeling for Open Retrieval Conversational Question Answering

https://doi.org/10.1109/ICSC59802.2024.00022

Rashid, Muhammad Shihab; Meem, Jannat Ara; Hristidis, Vagelis (February 2024, IEEE)

Full Text Available
PAT-Questions: A Self-Updating Benchmark for Present-Anchored Temporal Question-Answering

https://doi.org/10.18653/v1/2024.findings-acl.777

Meem, Jannat; Rashid, Muhammad; Dong, Yue; Hristidis, Vagelis (January 2024, Association for Computational Linguistics)

Full Text Available
EcoRank: Budget-Constrained Text Re-ranking Using Large Language Models

https://doi.org/10.18653/v1/2024.findings-acl.773

Rashid, Muhammad; Meem, Jannat; Dong, Yue; Hristidis, Vagelis (January 2024, Association for Computational Linguistics)

Full Text Available
Comparison of LSM indexing techniques for storing spatial data

https://doi.org/10.1186/s40537-023-00734-3

Mao, Qizhong; Qader, Mohiuddin Abdul; Hristidis, Vagelis (April 2023, Journal of Big Data)

Abstract: In the pre-big data era, many traditional databases supported spatial queries via spatial indexes. However, modern applications are seeing a rapid increase in the volume and ingestion rate of spatial data. Many big data systems use the Log-Structured Merge (LSM) tree as their storage structure in order to support write-intensive, large-volume workloads, but these systems are usually optimized only for single-dimensional data. Prior research has studied how spatial indexes can be supported on LSM systems, but has focused mainly on the local index organization, that is, how data is organized inside a single LSM component. This paper studies various aspects of LSM spatial indexing, including spatial merge policies, which determine when and how spatial components are merged. Three stack-based merge policies and one leveled merge policy are studied, all implemented on a common big data system, Apache AsterixDB. The write and read performance on various workloads is evaluated, and our findings and recommendations are discussed. A key finding is that leveled policies underperform stack-based merge policies for most types of spatial workloads.
Task-agnostic representation learning of multimodal twitter data for downstream applications

https://doi.org/10.1186/s40537-022-00570-x

Rivas, Ryan; Paul, Sudipta; Hristidis, Vagelis; Papalexakis, Evangelos E.; Roy-Chowdhury, Amit K. (December 2022, Journal of Big Data)

Abstract: Twitter is a frequent target for machine learning research and applications. Many problems, such as sentiment analysis, image tagging, and location prediction, have been studied on Twitter data. Much of the prior work that addresses these problems within the context of Twitter focuses on a subset of the types of data available, e.g., only text, or text and images. However, a tweet can have several additional components, such as the location and the author, that can also provide useful information for machine learning tasks. In this work, we explore the problem of jointly modeling several tweet components in a common embedding space via task-agnostic representation learning, which can then be used to tackle various machine learning applications. To address this problem, we propose a deep neural network framework that combines text, image, and graph representations to learn joint embeddings for five tweet components: body, hashtags, images, user, and location. In our experiments, we use a large dataset of tweets to learn a joint embedding model and use it in multiple tasks to evaluate its performance against state-of-the-art baselines specific to each task. Our results show that our proposed generic method has similar or superior performance to specialized application-specific approaches, including an accuracy of 52.43% vs. 48.88% for location prediction and a recall of up to 15.93% vs. 12.12% for hashtag recommendation.
Full Text Available
Bi-directional Log-Structured Merge Tree

https://doi.org/10.1145/3538712.3538730

Zhang, Xin; Mao, Qizhong; Eldawy, Ahmed; Hristidis, Vagelis; Sun, Yihan (July 2022, International Conference on Scientific and Statistical Database Management)

Full Text Available
Incremental Partitioning for Efficient Spatial Data Analytics

https://doi.org/10.14778/3494124.349415

Vu, Tin; Eldawy, Ahmed; Hristidis, Vagelis; Tsotras, Vassilis (January 2022, PVLDB)

Full Text Available
