skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: SQLite: past, present, and future
In the two decades following its initial release, SQLite has become the most widely deployed database engine in existence. Today, SQLite is found in nearly every smartphone, computer, web browser, television, and automobile. Several factors are likely responsible for its ubiquity, including its in-process design, standalone codebase, extensive test suite, and cross-platform file format. While it supports complex analytical queries, SQLite is primarily designed for fast online transaction processing (OLTP), employing row-oriented execution and a B-tree storage format. However, fueled by the rise of edge computing and data science, there is a growing need for efficient in-process online analytical processing (OLAP). DuckDB, a database engine nicknamed "the SQLite for analytics", has recently emerged to meet this demand. While DuckDB has shown strong performance on OLAP benchmarks, it is unclear how SQLite compares. Furthermore, we are aware of no work that attempts to identify root causes for SQLite's performance behavior on OLAP workloads. In this paper, we discuss SQLite in the context of this changing workload landscape. We describe how SQLite evolved from its humble beginnings to the full-featured database engine it is today. We evaluate the performance of modern SQLite on three benchmarks, each representing a different flavor of in-process data management, including transactional, analytical, and blob processing. We delve into analytical data processing on SQLite, identifying key bottlenecks and weighing potential solutions. As a result of our optimizations, SQLite is now up to 4.2X faster on SSB. Finally, we discuss the future of SQLite, envisioning how it will evolve to meet new demands and challenges.  more » « less
Award ID(s):
1835446
PAR ID:
10390276
Author(s) / Creator(s):
; ; ; ; ;
Date Published:
Journal Name:
Proceedings of the VLDB Endowment
Volume:
15
Issue:
12
ISSN:
2150-8097
Page Range / eLocation ID:
3535 to 3547
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. As large scale data processing becomes an ever more prominent component of modern computing tasks, databases now exist as a fundamental necessity of most computational platforms. However, in many cases there exists a disparity between the specializations of database management systems and the needs of the applications that run on them. The distinction between transactional and analyt- ical workloads for databases has been well established, but not fully addressed within the space of the most widely used embedded data- base system, namely SQLite3. To overcome this shortcoming, we implement SQLite3/HE, an analytical database engine implemented as an alternative execution path for SQLite. Through the utilization of an additional, complementary storage layer, SQLite3/HE trans- forms SQLite into a hybrid database system, able to fully leverage the benefits of both row and columnar storage layouts. SQLite3/HE improves the performance of analytical queries in the 100x-1000x speedup range, at no cost to the existing transactional query perfor- mance. These results validate the decision to implement SQLite3/HE as an alternative execution path, enabling it to serve as a drop-in replacement for SQLite3 in existing systems. 
    more » « less
  2. The proliferation of modern data processing tools has given rise to open-source columnar data formats. These formats help organizations avoid repeated conversion of data to a new format for each application. However, these formats are read-only, and organizations must use a heavy-weight transformation process to load data from on-line transactional processing (OLTP) systems. As a result, DBMSs often fail to take advantage of full network bandwidth when transferring data. We aim to reduce or even eliminate this overhead by developing a storage architecture for in-memory database management systems (DBMSs) that is aware of the eventual usage of its data and emits columnar storage blocks in a universal open-source format. We introduce relaxations to common analytical data formats to efficiently update records and rely on a lightweight transformation process to convert blocks to a read-optimized layout when they are cold. We also describe how to access data from third-party analytical tools with minimal serialization overhead. We implemented our storage engine based on the Apache Arrow format and integrated it into the NoisePage DBMS to evaluate our work. Our experiments show that our approach achieves comparable performance with dedicated OLTP DBMSs while enabling orders-of-magnitude faster data exports to external data science and machine learning tools than existing methods. 
    more » « less
  3. Applications use SQLite databases for storing data such as fitness/health information, contacts, text messages, calendar among others. Such information is sensitive and worth protecting. SQLite engine do not have built-in security to protect databases, rather, it relies on its environment such as the operating system to provide security for database content. While Android provides security mechanisms for SQLite databases, it has been shown to be inadequate. Proposed solutions are not able to protect sensitive database content when the OS is compromised. Also, some existing solutions fall short in protecting sensitive data if the SQLite database file is relocated to an environment that do not have any security restrictions. We propose a hardware isolation solution, leveraging ARM’s TrustZone to protect sensitive content of databases. We design and implement a prototype system on Hikey development board to demonstrate that TrustZone can be integrated with Android to protect SQLite data. Evaluation results shows our system is practical in and do not break the design patterns in Android application development. 
    more » « less
  4. We introduce indexed streams, a formal operational model and intermediate representation that describes the fused execution of a contraction language that encompasses both sparse tensor algebra and relational algebra. We prove that the indexed stream model is correct with respect to a functional semantics. We also develop a compiler for contraction expressions that uses indexed streams as an intermediate representation. The compiler is only 540 lines of code, but we show that its performance can match both the TACO compiler for sparse tensor algebra and the SQLite and DuckDB query processing libraries for relational algebra. 
    more » « less
  5. The majority of sensitive and personal user data is stored in different Database Management Systems (DBMS). For example, Oracle is frequently used to store corporate data, MySQL serves as the back-end storage for most webstores, and SQLite stores personal data such as SMS messages on a phone or browser bookmarks. Each DBMS manages its own storage (within the operating system), thus databases require their own set of forensic tools. While database carving solutions have been built by multiple research groups, forensic investigators today still lack the tools necessary to analyze DBMS forensic artifacts. The unique nature of database storage and the resulting forensic artifacts require established standards for artifact storage and viewing mechanisms in order for such advanced analysis tools to be developed. In this paper, we present 1) a standard storage format, Database Forensic File Format (DB3F), for database forensic tools output that follows the guidelines established by other (file system) forensic tools, and 2) a view and search toolkit, Database Forensic Toolkit (DF-Toolkit), that enables the analysis of data stored in our database forensic format. Using our prototype implementation, we demonstrate that our toolkit follows the state-of-the-art design used by current forensic tools and offers easy-to-interpret database artifact search capabilities. 
    more » « less