nostrdb

an unfairly fast embedded nostr database backed by lmdb
git clone git://jb55.com/nostrdb

intro.doc (8503B)


      1 /*
      2  * Copyright 2015-2021 Howard Chu, Symas Corp.
      3  * All rights reserved.
      4  *
      5  * Redistribution and use in source and binary forms, with or without
      6  * modification, are permitted only as authorized by the OpenLDAP
      7  * Public License.
      8  *
      9  * A copy of this license is available in the file LICENSE in the
     10  * top-level directory of the distribution or, alternatively, at
     11  * <http://www.OpenLDAP.org/license.html>.
     12  */
     13 /** @page starting Getting Started
     14 
     15 LMDB is compact, fast, powerful, and robust and implements a simplified
     16 variant of the BerkeleyDB (BDB) API. (BDB is also very powerful, and verbosely
     17 documented in its own right.) After reading this page, the main
     18 \ref mdb documentation should make sense. Thanks to Bert Hubert
     19 for creating the
     20 <a href="https://github.com/ahupowerdns/ahutils/blob/master/lmdb-semantics.md">
     21 initial version</a> of this writeup.
     22 
     23 Everything starts with an environment, created by #mdb_env_create().
     24 Once created, this environment must also be opened with #mdb_env_open().
     25 
     26 #mdb_env_open() gets passed a name which is interpreted as a directory
     27 path. Note that this directory must exist already; it is not created
     28 for you. Within that directory, a lock file and a storage file will be
     29 generated. If you don't want to use a directory, you can pass the
     30 #MDB_NOSUBDIR option, in which case the path you provided is used
     31 directly as the data file, and another file with a "-lock" suffix
     32 added will be used for the lock file.
     33 
     34 Once the environment is open, a transaction can be created within it
     35 using #mdb_txn_begin(). Transactions may be read-write or read-only,
     36 and read-write transactions may be nested. A transaction must only
     37 be used by one thread at a time. Transactions are always required,
     38 even for read-only access. The transaction provides a consistent
     39 view of the data.
     40 
     41 Once a transaction has been created, a database can be opened within it
     42 using #mdb_dbi_open(). If only one database will ever be used in the
     43 environment, a NULL can be passed as the database name. For named
     44 databases, the #MDB_CREATE flag must be used to create the database
     45 if it doesn't already exist. Also, #mdb_env_set_maxdbs() must be
     46 called after #mdb_env_create() and before #mdb_env_open() to set the
     47 maximum number of named databases you want to support.
     48 
     49 Note: a single transaction can open multiple databases. Generally
     50 databases should only be opened once, by the first transaction in
     51 the process. After the first transaction completes, the database
     52 handles can freely be used by all subsequent transactions.
     53 
     54 Within a transaction, #mdb_get() and #mdb_put() can retrieve and store
     55 single key/value pairs if that is all you need to do (but see \ref Cursors
     56 below if you want to do more).
     57 
     58 A key/value pair is expressed as two #MDB_val structures. This struct
     59 has two fields, \c mv_size and \c mv_data. \c mv_data is a \c void pointer
     60 to an array of \c mv_size bytes.
     61 
     62 Because LMDB is very efficient (and usually zero-copy), the data returned
     63 in an #MDB_val structure may be memory-mapped straight from disk. In
     64 other words <b>look but do not touch</b> (or free() for that matter).
     65 Once a transaction is closed, the values can no longer be used, so
     66 make a copy if you need to keep them after that.
     67 
     68 @section Cursors Cursors
     69 
     70 To do more powerful things, we must use a cursor.
     71 
     72 Within the transaction, a cursor can be created with #mdb_cursor_open().
     73 With this cursor we can store/retrieve/delete (multiple) values using
     74 #mdb_cursor_get(), #mdb_cursor_put(), and #mdb_cursor_del().
     75 
     76 #mdb_cursor_get() positions itself depending on the cursor operation
     77 requested, and for some operations, on the supplied key. For example,
     78 to list all key/value pairs in a database, use operation #MDB_FIRST for
     79 the first call to #mdb_cursor_get(), and #MDB_NEXT on subsequent calls,
     80 until the end is hit.
     81 
     82 To retrieve all keys starting from a specified key value, use #MDB_SET
     83 for the first call (or #MDB_SET_RANGE if the exact key may be absent), then #MDB_NEXT. For more cursor operations, see the \ref mdb docs.
     84 
     85 When using #mdb_cursor_put(), either the function will position the
     86 cursor for you based on the \b key, or you can use operation
     87 #MDB_CURRENT to use the current position of the cursor. Note that
     88 \b key must then match the current position's key.
     89 
     90 @subsection summary Summarizing the Opening
     91 
     92 So we have a cursor in a transaction which opened a database in an
     93 environment which is opened from a filesystem after it was
     94 separately created.
     95 
     96 Or, we create an environment, open it from a filesystem, create a
     97 transaction within it, open a database within that transaction,
     98 and create a cursor within all of the above.
     99 
    100 Got it?
    101 
    102 @section thrproc Threads and Processes
    103 
    104 LMDB uses POSIX locks on files, and these locks have issues if one
    105 process opens a file multiple times. Because of this, do not
    106 #mdb_env_open() a file multiple times from a single process. Instead,
    107 share the LMDB environment that has opened the file across all threads.
    108 Otherwise, if a single process opens the same environment multiple times,
    109 closing it once will remove all the locks held on it, and the other
    110 instances will be vulnerable to corruption from other processes.
    111 
    112 Also note that a transaction is tied to one thread by default using
    113 Thread Local Storage. If you want to pass read-only transactions across
    114 threads, you can use the #MDB_NOTLS option on the environment.
    115 
    116 @section txns Transactions, Rollbacks, etc.
    117 
    118 To actually get anything done, a transaction must be committed using
    119 #mdb_txn_commit(). Alternatively, all of a transaction's operations
    120 can be discarded using #mdb_txn_abort(). In a read-only transaction,
    121 any cursors will \b not automatically be freed. In a read-write
    122 transaction, all cursors will be freed and must not be used again.
    123 
    124 For read-only transactions, obviously there is nothing to commit to
    125 storage. The transaction still must eventually be aborted to close
    126 any database handle(s) opened in it, or committed to keep the
    127 database handles around for reuse in new transactions.
    128 
    129 In addition, as long as a transaction is open, a consistent view of
    130 the database is kept alive, which requires storage. A read-only
    131 transaction should be terminated (committed or aborted) as soon as
    132 this consistent view is no longer needed (but see below for an
    133 optimization).
    134 
    135 There can be multiple simultaneously active read-only transactions
    136 but only one that can write. Once a single read-write transaction
    137 is opened, all further attempts to begin one will block until the
    138 first one is committed or aborted. This has no effect on read-only
    139 transactions, however, and they may continue to be opened at any time.
    140 
    141 @section dupkeys Duplicate Keys
    142 
    143 #mdb_get() has no support for multiple key/value pairs with identical
    144 keys, and #mdb_put() has only limited support. If there are multiple
    145 values for a key, #mdb_get() will only return the first value.
    146 
    147 When multiple values for one key are required, pass the #MDB_DUPSORT
    148 flag to #mdb_dbi_open(). In an #MDB_DUPSORT database, by default
    149 #mdb_put() will not replace the value for a key if the key existed
    150 already. Instead it will add the new value to the key. In addition,
    151 #mdb_del() will pay attention to the value field too, allowing for
    152 specific values of a key to be deleted.
    153 
    154 Finally, additional cursor operations become available for
    155 traversing through and retrieving duplicate values.
    156 
    157 @section optim Some Optimization
    158 
    159 If you frequently begin and abort read-only transactions, as an
    160 optimization, it is possible to reset and renew a transaction instead.
    161 
    162 #mdb_txn_reset() releases any old copies of data kept around for
    163 a read-only transaction. To reuse this reset transaction, call
    164 #mdb_txn_renew() on it. Any cursors in this transaction must also
    165 be renewed using #mdb_cursor_renew().
    166 
    167 Note that #mdb_txn_reset() is similar to #mdb_txn_abort() and will
    168 close any databases you opened within the transaction.
    169 
    170 To permanently free a transaction, reset or not, use #mdb_txn_abort().
    171 
    172 @section cleanup Cleaning Up
    173 
    174 For read-only transactions, any cursors created within them must
    175 be closed using #mdb_cursor_close().
    176 
    177 It is very rarely necessary to close a database handle, and in
    178 general they should just be left open.
    179 
    180 @section onward The Full API
    181 
    182 The full \ref mdb documentation lists further details, like how to:
    183 
    184   \li size a database (the default limits are intentionally small)
    185   \li drop and clean a database
    186   \li detect and report errors
    187   \li optimize (bulk) loading speed
    188   \li (temporarily) reduce robustness to gain even more speed
    189   \li gather statistics about the database
    190   \li define custom sort orders
    191 
    192 */