Small. Fast. Reliable.
Choose any three.
SQLite B-Tree Module
Table Of Contents
id>
1.1 Scope and Purpose
This document provides a description of the functionality of and public
interface offered by the SQLite b-tree module. It also, to a certain extent,
describes the algorithm and implementation techniques used internally by
the module.
-
To make it easier to maintain, test and improve this critical
sub-system of the SQLite software library. -
To facilitate development of compatible backend modules that can
be used with other SQLite sub-systems, either for experimental or
production purposes.
It is important to note that, even given the second bullet point above,
the interfaces and sub-systems described in this document are not stable.
They may be changed in any way with each new SQLite release. Any
external software development that uses these interfaces must be prepared
to adapt to interface refactoring without notice.
1.2 Document and Requirements Organization
Change the following so that those requirements that describe the API
are “low-level” requirements.
Requirement ids | Contents |
---|---|
H50**** | Requirement statements specifying the functionality required of the B-Tree module. |
H51**** | Requirement statements specifying the API provided by the B-Tree module. |
L****** | Requirement statements specifying some details of the internal workings of the B-Tree module. |
1.3 Glossary
The SQLite B-Tree module, the software module described by this document,
is designed to query and modify a database stored using the database image
format described in [1]. Database images may
exist only in volatile main-memory (in-memory databases), or may be stored
persistently within the file-system (also described in
[1]). Or, a database image may be stored primarily
in main-memory with the file-system used as secondary storage if the
database image grows too large. Database images stored only in main-memory,
and those stored primarily in main-memory with the file-system used only to
provide secondary storage space are known collectively as temporary
databases. Database images stored persistently in the file-system are termed
persistent databases.
This module implements an in-memory page cache to manage database image
content. The size of the pages managed by the cache are the same as the
page-size of the database image. When operating on a persistent database,
the cache operates as a read-through, lazy-write cache. When committing a
database transaction, the user explicitly directs the cache to flush all
dirty pages through to persistent storage. A single in-memory page cache
used to access the content of a persistent database may support multiple
logical client connections. Some brief explanation of what
this means. And maybe a pointer to the “Multi-User Database Requirements”
section.
When operating on a temporary database, there may only be one client for
each page cache. Depending on the SQLite configuration, either the database
or journal file, or both, may be omitted from the system.
Figure 1 – Role of Page Cache
Figure 1 depicts…
Roughly what is encapsulated by the module.
2.1 Functional Requirements
This section contains requirements describing the functionality required
from the B-Tree module.
Figure out where integrity-check goes.
2.1.1 Opening and Closing Connections
The B-Tree module provides an interface to open new b-tree database connections.
The B-Tree module shall provide an interface to open a connection
to either a named persistent database file, or an anonymous temporary
database. (C: * H51001)
When opening a persistent database, the B-Tree module shall allow the user
to specify that the connection be opened for read-only access.
When opening a persistent database, the B-Tree module shall allow the user
to specify that the connection only be opened if the specified file exists.
If SQLite is configured to run in shared-cache mode, and a connection is opened
to a persistent database file for which there exists already a page-cache within
the current processes address space, then the connection opened shall be a
connection to the existing page-cache.
If a new B-Tree database connection is opened and requirement H50040 does not apply,
then a new page-cache shall be created within the processes address space. The
opened connection shall be a connection to the new page-cache.
The B-Tree module also provides an interface to close existing b-tree database
connections.
The B-Tree module shall provide an interface to close a B-Tree database connection.
If a B-Tree database connection is closed and this causes the associated
page-cache to have zero connections to it, then the page-cache shall be closed
and all associated resources released.
2.1.2 New Database Image Configuration
The following requirements describe database configuration options that
are only applicable to new database images. For the purposes of the
following requirements, a “new database image” is defined as one that is
zero pages in size.
The B-Tree module shall provide an interface to configure the page-size of a
new database image.
The B-Tree module shall provide an interface to configure whether or not a new
database image is auto-vacuum capable.
2.1.3 Transaction and Savepoint Functions
This needs a lot of work…
All read and write operations performed on a database image via the
B-Tree module interfaces occur within the context of a read or write
transaction. Something about the ACID nature of
transactions and how this applies to read and write transactions)
The B-Tree module shall provide an interface to open (start) a read-only transaction.
The B-Tree module shall provide an interface to close (finish) a read-only transaction.
Read/write:
The B-Tree module shall provide an interface to open a read/write transaction
or to upgrade from a read-only transaction to a read/write transaction.
The B-Tree module shall provide an interface to commit a read/write transaction.
The B-Tree module shall provide an interface to rollback a read/write transaction.
Multi-file transaction support.
Transaction state query:
The B-Tree module shall provide an interface to query a B-Tree database
connection to determine if there is an open transaction, and if so if the open
transaction is read-only or read/write.
Savepoints:
Define “savepoint transactions” and fix the following requirements.
The B-Tree module shall provide an interface to open savepoint transactions.
The B-Tree module shall provide an interface to commit savepoint transactions.
The B-Tree module shall provide an interface to rollback savepoint transactions.
2.1.4 Reading From the Database Image
The B-Tree module allows the user to read a subset of the fields from the
database image header. Each such field is stored in the header as a 4-byte
unsigned big-endian integer. A complete description of each field and its
interpretation may be found in [1].
The B-Tree module shall provide an interface to read the value of any of the
4-byte unsigned big-endian integer fields beginning at byte offset 36 of the
database image. (C: H51015 H51016 H51017)
In other words, the database image header fields that may be read via
this module are:
- The number of free pages in the database image,
- The database image schema version (schema cookie).
- The database image schema layer file-format.
- The default page-cache size.
- The “auto-vacuum last root-page” field.
- The database image text-encoding field.
- The database image user-cookie value.
- The database image incremental-vacuum flag.
With the exception of the database image header fields described above,
all data is read from the database image using B-Tree cursors. A B-Tree
cursor is a control structure for traversing the contents of a single
table or index b-tree structure within a database image. As well as
“forward” and “back” operations, a B-Tree cursor supports fast seeking to
a table entry identified by key value, or to the first or last entry in
the table.
When a B-Tree cursor is created, the specific table or index b-tree that
it is used to traverse is identified by the database image page number
of its root page. Since the root-page of the schema table is always page
1, and the contents of the schema table includes the root page numbers of all
other index and table b-tree structures in the database image, it is
possible for the application to determine the set of valid root-page
numbers by first traversing the schema table.
The B-Tree module shall provide an interface to open a B-Tree cursor on any table or
index b-tree within the database image, given its root page number.
The B-Tree module shall provide an interface to close a B-Tree cursor.
The B-Tree module shall provide an interface to move an open B-Tree cursor to
the entry associated with the largest key in the open b-tree structure.
The B-Tree module shall provide an interface to move an open B-Tree cursor to
the entry associated with the smallest key in the open b-tree structure.
The B-Tree module shall provide an interface to move an open B-Tree cursor that
currently points at a valid b-tree entry to the next entry in the b-tree
structure, sorted in order of key value, if any.
The B-Tree module shall provide an interface to move an open B-Tree cursor that
currently points at a valid b-tree entry to the previous entry in the b-tree
structure, sorted in order of key value, if any.
The B-Tree module shall provide an interface to retrieve the key value
associated with the b-tree structure entry that a B-Tree cursor is pointing to,
if any.
The B-Tree module shall provide an interface to retrieve the blob of data (the
database record) associated with the b-tree structure entry that a B-Tree
cursor open on a table b-tree is pointing to, if any.
The B-Tree module shall provide an interface to return the number of entries
currently stored in the b-tree structure that a B-Tree cursor is open on.
As well as traversing a b-tree structure using the operations enumerated
by the above requirements, it is also possible to use a cursor to search
a b-tree structure for a specified key value. If the key value can be
found, the cursor is left pointing at the entry with the specified key
value. Otherwise, the cursor is left pointing at either the entry with the
largest key that is smaller than the specified key, or to the entry with
the smallest key that is larger than the specified key. For table b-tree
structures, where the key values are 64-bit integers, the definition of
smaller, larger and equal to is straightforward. For index b-tree
structures, where the key values are database records, the manner in
which key values must be compared is more complicated. Refer to
[1] for a full explanation.
There is a specific section in [1] devoted to
record sort order in index b-tree structures. There needs to be some way to
point to it. Or, better, to the requirement or range of requirements.
Maybe a system that automatically links text like H30100 to the
corresponding requirement. Within a document if it can find it, or a
summary page (hlreq.html for example).
Given a key value, the B-Tree module shall provide an interface to move a
B-Tree cursor open on a b-tree structure to the B-Tree entry with the matching
key value, if such an entry exists.
If the interface required by H50119 is used to search for a key value that is
not present in the b-tree structure and the b-tree is not empty, the cursor shall
be moved to an existing entry that would be adjacent to a hypothetical
entry with the specified key value.
The interface required by H50119 shall provide an indication to the caller as
to whether the cursor is left pointing at an entry with a key value that is
smaller, larger or equal to the requested value, or if it is pointing to no
entry at all (because the b-tree structure is empty).
Does it depend on the structure of the tree whether the cursor is left
pointing to a smaller or larger entry after a failed search? Or is it
possible to determine which it will be based only on the set of keys
stored in the tree?
As well as the standard search operation described by the above
requirements, cursors open on index b-tree structures are required to
support several variants, as follows:
- Ignore rowid search mode. The final value in a database
record used as an index-btree key is always an integer “rowid”
field. A search in this mode proceeds as if each key in the b-tree
was missing this field. - Increment key mode.
- Prefix match mode.
- Prefix search mode.
Finish the bullet points above and add HLR for each search mode.
More than one cursor can be open on a single b-tree structure at one time.
It is also possible for a write-cursor to modify the contents of a b-tree
structure while other cursors are open on it. The b-tree module does not
include any type of row-locking mechanism. It is possible for a write-cursor
to be used to delete an entry from a b-tree structure even if there are
one or more other cursors currently pointing to the entry being deleted.
Requirements to do with how the above is handled. Traceability to
sqlite3BtreeCursorHasMoved is required.
2.1.5 Writing to the Database Image
The B-Tree module allows the user to write values to a subset of the
fields from the database image header. The set of writable fields is
the same as the set of fields enumerated in section
2.1.4 that the B-Tree module is required to
provide read access to by requirement H50109.
The B-Tree module shall provide an interface to write a value to any of the
4-byte unsigned big-endian integer fields beginning at byte offset 36 of the
database image.
The B-Tree module also supports operations to create new b-tree
structures within the database image. Existing b-tree structures may be
deleted from the database image entirely, or their entire contents may be
deleted, leaving an empty b-tree structure.
The B-Tree module shall provide an interface to create a new index or table
b-tree structures within the database image. The interface shall automatically
assign a root-page to the new b-tree structure.
The B-Tree module shall provide an interface to remove an existing index or
table b-tree structure from the database image, given the root page number of
the b-tree to remove.
The B-Tree module shall provide an interface to remove all entries from (delete
the contents of) an index or table b-tree, given the root page number of the
b-tree to empty.
As one would expect, the B-Tree module also provides an interface to
insert and delete entries from b-tree structures. These operations are
performed using a B-Tree write cursor, a special type of B-Tree cursor
(see section 2.1.4).
When opening a B-Tree cursor using the interface required by H50110, it shall
be possible to specify that the new cursor be a write cursor, or an ordinary
read-only cursor.
The B-Tree module shall provide an interface that allows the user to delete the
b-tree entry that a write cursor points to, if any. (C: L50013)
The B-Tree module shall provide an interface to insert new entries into a table
or index B-Tree, given a write cursor open on the table or index b-tree the new
entry is to be inserted into. (C: L50001 L50002 L50003 L50004 L50012)
Incremental vacuum step.
2.1.6 Page-Cache Configuration Requirements
A page-cache has a number of operational parameters that may be configured
at run-time via an open b-tree database connection. Note that even though the
interfaces provided by this module allow these parameters to be set via a
b-tree database connection, they are properties of the page-cache, not
the b-tree database connection. In situations where more than one b-tree
database connection is connected to a single page-cache, writes made via
one b-tree database connection may overwrite the values set by another.
The following table summarizes the available configuration parameters.
Parameter | Description | Requirements |
---|---|---|
Locking-mode | This! | H50138, H50139, H50140 |
Journal-mode | This! | H50141, H50142, H50143, H50144, H50145, H50146 |
Journal-file size limit | The journal-file size limit parameter may be set to any integer value within the range of a 64-bit signed integer. Any negative values is interpreted as “no limit”. Otherwise, if the journal-file size limit is set to zero or a positive number, it represents an upper limit on the size of the journal file in bytes. If the application executes a database write operation that would normally cause the journal file to grow larger than this configured limit, the operation fails and an error is returned to the user. The default value of this parameter is -1 (no limit). |
H50147, H50148, H50149 |
Database-file size limit | The database-image size limit parameter may be set to any integer value greater than zero within the range of a 32-bit signed integer. The configured value represents an upper limit on the size of the database image in pages. If the application executes a database write operation that would normally cause the database image to grow larger than this configured limit, the operation fails and an error is returned to the user. |
H50150, H50151, H50152 |
Cache size | The cache-size parameter may be set to any integer value. How it affects operation depends on the specific P-Cache implementation used by the page-cache. Refer to details for the behaviour of the built-in default P-Cache. |
H50153 |
Safety level | The safety-level parameter may be set to “none”, “normal” or “full”. Where will the effect of this defined/required? |
H50154, H50155 |
The B-Tree module shall provide an interface allowing the application to set
the locking-mode of a page-cache to either “normal” or “exclusive”, given an
open b-tree database connection to that page-cache.
If the locking-mode of a page-cache is set to “normal” when a read/write
or read-only transaction is ended, any locks held on the database file-system
representation by the page-cache shall be relinquished.
If the locking-mode of a page-cache is set to “exclusive” when a read/write
or read-only transaction is ended, any locks held on the database file-system
representation by the page-cache shall be retained.
And if a read/write transaction is downgraded to a read-only transaction?
This scenario should also be dealt with in section 2.1.3.
The B-Tree module shall provide an interface allowing the application to set
the journal-mode of a page-cache to one of “off”, “memory”, “delete”,
“persist”, or “truncate”, given an open b-tree database connection to that
page-cache.
If the journal-mode of a page-cache is set to “off” when a read/write
transaction is opened, then the transaction shall use no journal file.
If the journal-mode of a page-cache is set to “memory” when a read/write
transaction is opened, then instead of using the journal file located in the
file-system, journal-file data shall be stored in main-memory.
If the journal-mode of a page-cache is set to “delete” when a read/write
transaction is opened, then any journal file used by the transaction shall
be deleted at the conclusion of the transaction.
If the journal-mode of a page-cache is set to “truncate” when a read/write
transaction is opened, then any journal file used by the transaction shall
be truncated to zero bytes in size at the conclusion of the transaction.
If the journal-mode of a page-cache is set to “persist” when a read/write
transaction is opened, then any journal file used by the transaction shall
remain in the file-system at the conclusion of the transaction.
The difference in functionality provided by “off”, “memory” and the 3
modes that use a real journal file should also feature in
2.1.3.
The B-Tree module shall provide an interface to set the value of the
journal-file size limit configuration parameter of a page-cache, given
an open b-tree database connection to that page-cache.
The default value assigned to the journal-file size limit configuration of a
page-cache shall be -1.
If the journal-file size limit parameter is set to a non-negative value, and
the user executes a write operation that would otherwise require the journal
file to be extended to a size greater than the configured value in bytes, then
the operation shall fail and an error be returned to the user.
The B-Tree module shall provide an interface to set the value of the
database-image size limit configuration parameter of a page-cache, given
an open b-tree database connection to that page-cache.
The default value assigned to the database-image size limit configuration of a
page-cache shall be the value of the compile time symbol SQLITE_MAX_PAGE_COUNT
(1073741823 by default).
If the database-image size limit parameter is set to a non-negative value, and
the user executes a write operation that would otherwise require the journal
file to be extended to a size greater than the configured value in bytes, then
the operation shall fail and an error be returned to the user.
The B-Tree module shall provide an interface to set the value of the
cache-size configuration parameter of a page-cache, given an open b-tree
database connection to that page-cache.
See section 2.2.1 for a description of and requirements
specifying how the value of the cache-size parameter affects the
operation of a page-cache. Check this reference is
relevant after it is written. Refer to a specific requirement if possible
too.
The B-Tree module shall provide an interface allowing the application to set
the safety-level of a page-cache to one of “off”, “normal” or “full”,
given an open b-tree database connection to that page-cache.
The default value assigned to the safety-level configuration parameter of a
page-cache shall be “full”.
Description of what the safety-level actually does. Or pointer to where a
description and requirements can be found (transactions section?).
Interface to set the codec function (encryption).
The busy-handler. Where exactly does this come in? Transactions and
savepoints section?
The six page-cache operational parameters listed above may also be
queried. The following requirements specify the required query
interfaces.
The B-Tree module shall provide an interface to query the current locking-mode
of a page-cache, given an open b-tree database connection to that page-cache.
The B-Tree module shall provide an interface to query the current journal-mode
of a page-cache, given an open b-tree database connection to that page-cache.
The B-Tree module shall provide an interface to query the current journal file
size-limit of a page-cache, given an open b-tree database connection to that
page-cache.
The B-Tree module shall provide an interface to query the current database file
size-limit of a page-cache, given an open b-tree database connection to that
page-cache.
The B-Tree module shall provide an interface to query the current cache-size
of a page-cache, given an open b-tree database connection to that page-cache.
The B-Tree module shall provide an interface to query the current safety-level
of a page-cache, given an open b-tree database connection to that page-cache.
It is also possible to interrogate a b-tree database handle to determine
if it was opened on a temporary or persistent database. An b-tree
database handle opened on a persistent database may be queried for the
name of (full-path to) either the database or journal file asso