In the two decades following its initial release, SQLite has become the most widely deployed database engine in existence. Today, SQLite is found in nearly every smartphone, computer, web browser, television, and automobile. Several factors are likely responsible for its ubiquity, including its in-process design, standalone codebase, extensive test suite, and cross-platform file format. While it supports complex analytical queries, SQLite is primarily designed for fast online transaction processing (OLTP), employing row-oriented execution and a B-tree storage format. However, fueled by the rise of edge computing and data science, there is a growing need for efficient in-process online analytical processing (OLAP). DuckDB, a database engine nicknamed "the SQLite for analytics", has recently emerged to meet this demand. While DuckDB has shown strong performance on OLAP benchmarks, it is unclear how SQLite compares. Furthermore, we are aware of no work that attempts to identify root causes for SQLite's performance behavior on OLAP workloads. In this paper, we discuss SQLite in the context of this changing workload landscape. We describe how SQLite evolved from its humble beginnings to the full-featured database engine it is today. We evaluate the performance of modern SQLite on three benchmarks, each representing a different flavor of in-process data management, including transactional, analytical, and blob processing. We delve into analytical data processing on SQLite, identifying key bottlenecks and weighing potential solutions. As a result of our optimizations, SQLite is now up to 4.2X faster on SSB. Finally, we discuss the future of SQLite, envisioning how it will evolve to meet new demands and challenges.
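As a concrete illustration of the in-process model the abstract describes, the following minimal sketch runs an SSB-flavored aggregate through Python's built-in sqlite3 bindings. The schema and data are invented for illustration; this is not the paper's benchmark setup.

```python
# Minimal sketch of SQLite's in-process model using Python's built-in
# sqlite3 bindings: the engine runs inside the application process, so a
# query is just a library call -- no server, no network round trip.
import sqlite3

conn = sqlite3.connect(":memory:")  # in-memory DB; a file path works too
conn.executescript("""
    CREATE TABLE lineorder (lo_quantity INT, lo_extendedprice INT, lo_discount INT);
    INSERT INTO lineorder VALUES (10, 100, 2), (30, 250, 5), (20, 400, 1);
""")

# An SSB-flavored aggregate (illustrative only, not the paper's workload).
(revenue,) = conn.execute("""
    SELECT SUM(lo_extendedprice * lo_discount)
    FROM lineorder
    WHERE lo_discount BETWEEN 1 AND 3 AND lo_quantity < 25
""").fetchone()
print(revenue)
```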
Introducing a Query Acceleration Path for Analytics in SQLite3
As large-scale data processing becomes an ever more prominent component of modern computing tasks, databases are now a fundamental necessity of most computational platforms. However, in many cases there exists a disparity between the specializations of database management systems and the needs of the applications that run on them. The distinction between transactional and analytical workloads for databases has been well established, but not fully addressed within the space of the most widely used embedded database system, namely SQLite3. To overcome this shortcoming, we implement SQLite3/HE, an analytical database engine implemented as an alternative execution path for SQLite. Through the utilization of an additional, complementary storage layer, SQLite3/HE transforms SQLite into a hybrid database system, able to fully leverage the benefits of both row and columnar storage layouts. SQLite3/HE improves the performance of analytical queries by 100x-1000x, at no cost to the existing transactional query performance. These results validate the decision to implement SQLite3/HE as an alternative execution path, enabling it to serve as a drop-in replacement for SQLite3 in existing systems.
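The following toy sketch illustrates the row-versus-columnar trade-off that motivates a complementary columnar storage layer; it is a conceptual illustration in plain Python, not SQLite3/HE's actual storage or execution path.

```python
# Toy illustration of row vs. columnar layouts for an analytical scan.
# This is a conceptual sketch, not SQLite3/HE's actual storage layer.

# Row-oriented: each record stored together (good for OLTP point access).
rows = [(1, "widget", 9.99), (2, "gadget", 4.50), (3, "gizmo", 7.25)]
total_row = sum(r[2] for r in rows)          # touches every full record

# Column-oriented: each attribute stored contiguously (good for OLAP scans).
ids    = [1, 2, 3]
names  = ["widget", "gadget", "gizmo"]
prices = [9.99, 4.50, 7.25]
total_col = sum(prices)                      # touches only the scanned column

assert total_row == total_col
```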
- Award ID(s): 1835446
- PAR ID: 10390275
- Date Published:
- Journal Name: Conference on Innovative Data Systems Research (CIDR)
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
- Applications use SQLite databases to store data such as fitness/health information, contacts, text messages, and calendar entries, among others. Such information is sensitive and worth protecting. The SQLite engine has no built-in security to protect databases; rather, it relies on its environment, such as the operating system, to secure database content. While Android provides security mechanisms for SQLite databases, these have been shown to be inadequate. Proposed solutions are unable to protect sensitive database content when the OS is compromised, and some fall short in protecting sensitive data if the SQLite database file is relocated to an environment without security restrictions. We propose a hardware isolation solution that leverages ARM's TrustZone to protect sensitive database content. We design and implement a prototype system on a HiKey development board to demonstrate that TrustZone can be integrated with Android to protect SQLite data. Evaluation results show that our system is practical and does not break the design patterns of Android application development.
- Transaction isolation is conventionally achieved by restricting access to the physical items in a database. To maximize performance, isolation functionality is often packaged with recovery, I/O, and data access methods in a monolithic transactional storage manager. While this design has historically afforded high performance in online transaction processing systems, industry trends indicate a growing need for a new approach in which intertwined components of the transactional storage manager are disaggregated into modular services. This paper presents a new method to modularize the isolation component. Our work builds on predicate locking, an isolation mechanism that enables this modularization by locking logical rather than physical items in a database. Predicate locking is rarely used as the core isolation mechanism because of its high theoretical complexity and perceived overhead. However, we show that this overhead can be substantially reduced in practice by optimizing for common predicate structures. We present DIBS, a transaction scheduler that employs our predicate locking optimizations to guarantee isolation as a modular service. We evaluate the performance of DIBS as the sole isolation mechanism in a data processing system. In this setting, DIBS scales up to 10.5 million transactions per second on a TATP workload. We also explore how DIBS can be applied to existing database systems to increase transaction throughput. DIBS reduces per-transaction file system writes by 90% on TATP in SQLite, resulting in a 3X improvement in throughput. Finally, DIBS reduces row contention on YCSB in MySQL, providing serializable isolation with a 1.4X improvement in throughput.
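A minimal sketch of the idea behind predicate locking follows: two transactions conflict only if their predicates could select overlapping rows. This sketch handles single-column equality and range predicates (one example of the common predicate structures the abstract mentions); it illustrates the concept, not the DIBS implementation.

```python
# Simplified predicate-overlap test in the spirit of predicate locking:
# two transactions conflict only if their predicates could match the same
# rows. Handles single-column equality and range predicates; an
# illustration of the idea, not the DIBS implementation.
from dataclasses import dataclass

@dataclass
class Predicate:
    column: str
    lo: float          # inclusive lower bound
    hi: float          # inclusive upper bound (equality => lo == hi)

def overlaps(p: Predicate, q: Predicate) -> bool:
    if p.column != q.column:
        return True    # different columns: conservatively assume conflict
    return p.lo <= q.hi and q.lo <= p.hi   # interval intersection test

# 'id = 7' vs 'id BETWEEN 1 AND 5' touch disjoint rows: no lock conflict.
print(overlaps(Predicate("id", 7, 7), Predicate("id", 1, 5)))   # False
print(overlaps(Predicate("id", 3, 3), Predicate("id", 1, 5)))   # True
```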
- The majority of sensitive and personal user data is stored in different Database Management Systems (DBMS). For example, Oracle is frequently used to store corporate data, MySQL serves as the back-end storage for most webstores, and SQLite stores personal data such as SMS messages on a phone or browser bookmarks. Each DBMS manages its own storage (within the operating system), thus databases require their own set of forensic tools. While database carving solutions have been built by multiple research groups, forensic investigators today still lack the tools necessary to analyze DBMS forensic artifacts. The unique nature of database storage and the resulting forensic artifacts require established standards for artifact storage and viewing mechanisms in order for such advanced analysis tools to be developed. In this paper, we present 1) a standard storage format, Database Forensic File Format (DB3F), for database forensic tools output that follows the guidelines established by other (file system) forensic tools, and 2) a view and search toolkit, Database Forensic Toolkit (DF-Toolkit), that enables the analysis of data stored in our database forensic format. Using our prototype implementation, we demonstrate that our toolkit follows the state-of-the-art design used by current forensic tools and offers easy-to-interpret database artifact search capabilities.
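To make the idea of a standard artifact format concrete, the following sketch shows what a DB3F-style record might look like when serialized as JSON. Every field name here is an illustrative assumption, not the actual DB3F schema; consult the DB3F specification for the real layout.

```python
# Hypothetical sketch of a DB3F-style forensic artifact record as JSON.
# All field names below are illustrative assumptions, not the DB3F schema.
import json

artifact = {
    "source_file": "evidence/phone_image.db",
    "dbms": "SQLite",
    "page_number": 42,
    "object_type": "table_row",
    "recovered_values": ["+1-555-0100", "2021-03-14T09:26:53"],
    "deleted": True,                 # carved from freed page space
}
print(json.dumps(artifact, indent=2))
```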
- Serving deep learning models from relational databases brings significant benefits. First, features extracted from databases do not need to be transferred to any decoupled deep learning systems for inferences, and thus the system management overhead can be significantly reduced. Second, in a relational database, data management along the storage hierarchy is fully integrated with query processing, and thus it can continue model serving even if the working set size exceeds the available memory. Applying model deduplication can greatly reduce the storage space, memory footprint, cache misses, and inference latency. However, existing data deduplication techniques are not applicable to deep learning model serving applications in relational databases. They do not consider the impacts on model inference accuracy, nor the inconsistency between tensor blocks and database pages. This work proposes synergistic storage optimization techniques for duplication detection, page packing, and caching, to enhance database systems for model serving. Evaluation results show that our proposed techniques significantly improved storage efficiency and model inference latency, and outperformed existing deep learning frameworks in the targeted scenarios.
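The following minimal sketch shows content-hash-based duplicate detection over fixed-size tensor blocks, one ingredient of the storage optimizations described above. Unlike the paper's approach, it matches only byte-identical blocks and ignores inference-accuracy impact and database page alignment.

```python
# Minimal sketch of duplicate detection over fixed-size tensor blocks via
# content hashing. Real model deduplication (as in the paper) also weighs
# inference-accuracy impact and page alignment; this finds only
# byte-identical blocks.
import hashlib

def dedup_blocks(weights: bytes, block_size: int = 4096):
    store, layout = {}, []           # hash -> block bytes, and block order
    for off in range(0, len(weights), block_size):
        block = weights[off:off + block_size]
        digest = hashlib.sha256(block).hexdigest()
        store.setdefault(digest, block)
        layout.append(digest)        # model references shared blocks by hash
    return store, layout

store, layout = dedup_blocks(b"\x00" * 16384)   # four identical zero blocks
print(len(layout), "blocks referenced,", len(store), "stored")  # 4, 1
```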