Numerous important applications rely on detailed trajectory data. Unfortunately, trajectory datasets are typically sparse, with large spatial and temporal gaps between consecutive points, which is a major hurdle for the accuracy of applications built on them. This paper presents Kamel, a scalable trajectory imputation system that inserts additional realistic trajectory points, boosting the accuracy of trajectory applications. Kamel maps the trajectory imputation problem to the missing-word problem, a classical problem in the natural language processing (NLP) community. This allows employing the widely used BERT model for trajectory imputation. However, BERT, as is, does not lend itself to the special characteristics of trajectories. Hence, Kamel starts from BERT, but then adds spatial awareness to its operations, adjusts trajectory data to be closer to the nature of language data, and adds multipoint imputation ability; all encapsulated in one system. Experimental results based on real datasets show that Kamel significantly outperforms its competitors and is applicable to city-scale trajectories, large gaps, and tight accuracy thresholds.
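A minimal sketch of the missing-word framing described above, assuming a simple grid-based tokenization; the cell size, function names, and masking interface are illustrative and not Kamel's actual implementation:

```python
# Illustrative sketch only: grid size, names, and masking interface are assumptions.
GRID = 0.001  # roughly 100 m cells at mid latitudes

def to_token(lat: float, lng: float) -> str:
    """Map a GPS point to a grid-cell 'word', analogous to a token in a sentence."""
    return f"cell_{int(lat / GRID)}_{int(lng / GRID)}"

def build_masked_sentence(points, gap_index, gap_size):
    """Insert [MASK] placeholders where points are missing, mirroring BERT's
    masked-language-model objective; gap_size > 1 corresponds to multipoint imputation."""
    tokens = [to_token(lat, lng) for lat, lng in points]
    return tokens[:gap_index] + ["[MASK]"] * gap_size + tokens[gap_index:]

# Example: a sparse trip with a two-point gap before its last recorded fix.
trip = [(41.1579, -8.6291), (41.1602, -8.6250), (41.1625, -8.6213), (41.1711, -8.6110)]
print(build_masked_sentence(trip, gap_index=3, gap_size=2))
```

Mapping predicted cell tokens back to cell-center coordinates would then yield the imputed points; Kamel's spatial-awareness and multipoint extensions go beyond this plain masked-token view.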
Let's Speak Trajectories: A Vision to Use NLP Models for Trajectory Analysis Tasks
The availability of trajectory data combined with various real-life practical applications has sparked the interest of the research community to design a plethora of algorithms for various trajectory analysis techniques. However, there is an apparent lack of full-fledged systems that provide infrastructure support for trajectory analysis techniques, which hinders the applicability of most of the designed algorithms. Inspired by the tremendous success of the Bidirectional Encoder Representations from Transformers (BERT) deep learning model in solving various Natural Language Processing tasks, our vision is to have a BERT-like system for trajectory analysis tasks. We envision that in a few years, we will have such a system, where no one needs to worry again about each specific trajectory analysis operation. Whether it is trajectory imputation, similarity, clustering, or anything else, it would be one system that researchers, developers, and practitioners can deploy to get high accuracy for their trajectory operations. Our vision stands on the solid ground that trajectories in a space are highly analogous to statements in a language. We outline the challenges and the road to our vision. Exploratory results confirm the promise and possibility of our vision.
- Award ID(s): 2203553
- PAR ID: 10538596
- Publisher / Repository: ACM TSAS
- Date Published:
- Journal Name: ACM Transactions on Spatial Algorithms and Systems
- Volume: 10
- Issue: 2
- ISSN: 2374-0353
- Page Range / eLocation ID: 1 to 25
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
DB-BERT is a database tuning tool that exploits information gained via natural language analysis of manuals and other relevant text documents. It uses text to identify database system parameters to tune as well as recommended parameter values. DB-BERT applies large, pre-trained language models (specifically, the BERT model) for text analysis. During an initial training phase, it fine-tunes model weights in order to translate natural language hints into recommended settings. At run time, DB-BERT learns to aggregate, adapt, and prioritize hints to achieve optimal performance for a specific database system and benchmark. Both phases are iterative and use reinforcement learning to guide the selection of tuning settings to evaluate (penalizing settings that the database system rejects while rewarding settings that improve performance). In our experiments, we leverage hundreds of text documents about database tuning as input for DB-BERT. We compare DB-BERT against various baselines, considering different benchmarks (TPC-C and TPC-H), metrics (throughput and run time), as well as database systems (PostgreSQL and MySQL). The experiments demonstrate clearly that DB-BERT benefits from combining general information about database tuning, mined from text documents, with scenario-specific insights gained via trial runs. The full source code of DB-BERT is available online at https://itrummer.github.io/dbbert/.
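To make the hint-to-setting translation step concrete, here is a toy sketch; DB-BERT uses a fine-tuned BERT model for this mapping, so the regular expression and the example hint below are only stand-ins for the learned extraction:

```python
# Sketch of the hint-to-setting translation step described above.
# The regex is a stand-in for DB-BERT's learned extraction; names are illustrative.
import re

HINT = "For OLTP workloads, set shared_buffers to 25% of available RAM."

def naive_hint_parser(text: str, ram_gb: int = 16):
    """Extract a (parameter, value) recommendation from a tuning hint."""
    match = re.search(r"set (\w+) to (\d+)%", text)
    if not match:
        return None
    param, pct = match.group(1), int(match.group(2))
    return param, f"{ram_gb * pct // 100}GB"

print(naive_hint_parser(HINT))  # ('shared_buffers', '4GB')
# DB-BERT then evaluates candidate settings in trial runs, with reinforcement
# learning rewarding settings that improve benchmark performance and penalizing
# settings the database system rejects.
```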
Recent Transformer-based contextual word representations, including BERT and XLNet, have shown state-of-the-art performance in multiple disciplines within NLP. Fine-tuning the trained contextual models on task-specific datasets has been the key to achieving superior performance downstream. While fine-tuning these pre-trained models is straightforward for lexical applications (applications with only the language modality), it is not trivial for multimodal language (a growing area in NLP focused on modeling face-to-face communication), because the pre-trained models do not have the necessary components to accept the two extra modalities of vision and acoustics. In this paper, we propose an attachment to BERT and XLNet called the Multimodal Adaptation Gate (MAG). MAG allows BERT and XLNet to accept multimodal nonverbal data during fine-tuning. It does so by generating a shift to the internal representations of BERT and XLNet, conditioned on the visual and acoustic modalities. In our experiments, we study the commonly used CMU-MOSI and CMU-MOSEI datasets for multimodal sentiment analysis. Fine-tuning MAG-BERT and MAG-XLNet significantly boosts sentiment analysis performance over previous baselines as well as language-only fine-tuning of BERT and XLNet. On the CMU-MOSI dataset, MAG-XLNet achieves human-level multimodal sentiment analysis performance for the first time in the NLP community.
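A simplified PyTorch sketch of the gating-and-shift idea behind MAG; the dimensions, layer placement, and scaling details here are assumptions rather than the exact published formulation:

```python
# Simplified sketch of a gated multimodal shift applied to a word representation.
# Feature sizes and the scaling rule are assumptions for illustration.
import torch
import torch.nn as nn

class MultimodalAdaptationGate(nn.Module):
    def __init__(self, text_dim=768, visual_dim=47, acoustic_dim=74, beta=1.0):
        super().__init__()
        self.gate_v = nn.Linear(text_dim + visual_dim, text_dim)
        self.gate_a = nn.Linear(text_dim + acoustic_dim, text_dim)
        self.proj_v = nn.Linear(visual_dim, text_dim)
        self.proj_a = nn.Linear(acoustic_dim, text_dim)
        self.norm = nn.LayerNorm(text_dim)
        self.beta = beta

    def forward(self, text, visual, acoustic):
        # Gates decide how much each nonverbal modality should influence the word.
        g_v = torch.relu(self.gate_v(torch.cat([text, visual], dim=-1)))
        g_a = torch.relu(self.gate_a(torch.cat([text, acoustic], dim=-1)))
        # Displacement vector conditioned on the visual and acoustic inputs.
        shift = g_v * self.proj_v(visual) + g_a * self.proj_a(acoustic)
        # Scale the shift so it cannot overwhelm the original text embedding.
        alpha = torch.clamp(
            self.beta * text.norm(dim=-1, keepdim=True)
            / (shift.norm(dim=-1, keepdim=True) + 1e-6),
            max=1.0,
        )
        return self.norm(text + alpha * shift)
```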
Pretrained contextualized language models such as BERT have achieved impressive results on various natural language processing benchmarks. Benefiting from multiple pretraining tasks and large-scale training corpora, pretrained models can capture complex syntactic word relations. In this paper, we use the deep contextualized language model BERT for the task of ad hoc table retrieval. We investigate how to encode table content considering the table structure and the input length limit of BERT. We also propose an approach that incorporates features from prior literature on table retrieval and jointly trains them with BERT. In experiments on public datasets, we show that our best approach can outperform the previous state-of-the-art method and BERT baselines by a large margin under different evaluation metrics.
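A toy sketch of one way table content could be linearized for a BERT cross-encoder under an input length limit; the field order, separators, and truncation policy are illustrative assumptions, not the exact encodings studied in the paper:

```python
# Illustrative linearization of a table plus query into a single BERT input string.
from typing import List

def linearize_table(caption: str, headers: List[str], rows: List[List[str]],
                    query: str, max_tokens: int = 512) -> str:
    """Flatten the table structure into one text sequence for joint query-table scoring."""
    parts = [query, "[SEP]", caption, "[SEP]", " | ".join(headers)]
    for row in rows:
        parts.append(" | ".join(row))
    text = " ".join(parts)
    # Crude whitespace-level truncation as a stand-in for subword-aware truncation.
    return " ".join(text.split()[:max_tokens])

example = linearize_table(
    caption="Tallest buildings",
    headers=["Building", "City", "Height (m)"],
    rows=[["Burj Khalifa", "Dubai", "828"], ["Merdeka 118", "Kuala Lumpur", "679"]],
    query="tallest building in the world",
)
print(example)
```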
Language model (LM) pretraining can learn various knowledge from text corpora, helping downstream tasks. However, existing methods such as BERT model a single document, and do not capture dependencies or knowledge that span across documents. In this work, we propose LinkBERT, an LM pretraining method that leverages links between documents, e.g., hyperlinks. Given a text corpus, we view it as a graph of documents and create LM inputs by placing linked documents in the same context. We then pretrain the LM with two joint self-supervised objectives: masked language modeling and our new proposal, document relation prediction. We show that LinkBERT outperforms BERT on various downstream tasks across two domains: the general domain (pretrained on Wikipedia with hyperlinks) and the biomedical domain (pretrained on PubMed with citation links). LinkBERT is especially effective for multi-hop reasoning and few-shot QA (+5% absolute improvement on HotpotQA and TriviaQA), and our biomedical LinkBERT sets new states of the art on various BioNLP tasks (+7% on BioASQ and USMLE).
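A toy sketch of how linked documents could be placed in the same context to form pretraining inputs; the corpus layout and the two-way relation labels are simplifications (LinkBERT's document relation prediction also distinguishes contiguous segments), and all names here are illustrative:

```python
# Illustrative construction of LinkBERT-style inputs from a tiny linked corpus.
import random

corpus = {
    "doc_a": {"text": "Aspirin is used to reduce fever and inflammation.",
              "links": ["doc_b"]},
    "doc_b": {"text": "Inflammation is part of the body's immune response.",
              "links": []},
}

def make_pretraining_pair(doc_id: str):
    """Pair a document with a linked document (label 'linked') or a random one
    (label 'random'), supporting a document relation prediction objective."""
    anchor = corpus[doc_id]
    if anchor["links"] and random.random() < 0.5:
        other_id, label = random.choice(anchor["links"]), "linked"
    else:
        other_id, label = random.choice(list(corpus)), "random"
    text = f"[CLS] {anchor['text']} [SEP] {corpus[other_id]['text']} [SEP]"
    return text, label  # masked language modeling is then applied to `text`

print(make_pretraining_pair("doc_a"))
```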