security.md - nostrdb - an unfairly fast embedded nostr database backed by lmdb

# Security Considerations

This section covers questions such as when it is safe to access a
buffer without risking access violations, buffer overruns, or denial of
service attacks, but it cannot possibly cover all security aspects.
      6 
## Reading Buffers

When reading a buffer you have to know the schema type before
reading, and it is preferable to know the size of the buffer although
this is not strictly required. If the type is not known, the
`file_identifier`, aka buffer type, can be checked. This is no guarantee
due to collisions with other data formats and because the identifier
field may be absent or misleading. The identifier therefore works best
on buffers that can be trusted.
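As a minimal sketch of such an identifier check, assuming the standard
FlatBuffers layout where an optional 4-byte identifier follows the
4-byte offset to the root table: the helper name
`buffer_has_identifier` is hypothetical and is not part of the FlatCC
API, which offers its own identifier checks in the generated reader
code.

```c
#include <stddef.h>
#include <string.h>

/* Assumed layout: bytes 0..3 hold the offset to the root table and
 * bytes 4..7 hold the optional file identifier. */
#define FID_OFFSET 4
#define FID_SIZE 4

/* Returns 1 if the buffer is large enough and carries the expected
 * identifier, 0 otherwise. A match is only a hint, not proof of type. */
static int buffer_has_identifier(const void *buf, size_t size, const char *fid)
{
    if (buf == NULL || fid == NULL || strlen(fid) != FID_SIZE) return 0;
    if (size < FID_OFFSET + FID_SIZE) return 0;
    return memcmp((const unsigned char *)buf + FID_OFFSET, fid, FID_SIZE) == 0;
}
```

A match on an untrusted buffer should still be followed by verification
before any field is read.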
     16 
If a buffer cannot be trusted, such as when receiving it over a public
network, the buffer type may be known, but it is not known whether
someone used an incorrect implementation of FlatBuffers, whether the
buffer was corrupted in transit, or whether someone intentionally
tampered with it. In this case the buffer can be verified. Verification
does not prove that the buffer has the correct type, but it does prove
that it is safe to read (not write) from the buffer. The buffer size
must be known in order to verify the buffer. If the buffer has the wrong
type but still (unlikely but possible) passes verification, then
unexpected data may be read from the buffer, but it will not cause any
crashes when the API is used correctly.
     28 
It is preferable to know the required alignment of a buffer, which isn't
trivially available unless retrieved from the builder when the buffer is
created. The buffer alignment can also be deduced from the schema.
     32 
On many systems there is no real problem in accessing a buffer
unaligned, but for systems where it matters, care must be taken because
unaligned access can result in slow performance or access violations.
Even on systems where alignment matters, a standard malloc operation is
often sufficient because it normally aligns to the largest word that
could cause access violations when unaligned. For special use cases such
as GPU memory access, more alignment may be needed, and FlatBuffers
supports higher alignments in the schema. Portable `aligned_alloc` and
`aligned_free` support methods are available to help allocate memory with
sufficient alignment. Because compile time flags can change between
compilation of the runtime library and the application,
`flatcc_builder_aligned_free` ensures a consistent deallocation method
for aligned buffers allocated by the runtime library.
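The following sketch shows one way to place a buffer in sufficiently
aligned memory. It assumes a C11 toolchain that provides the standard
`aligned_alloc`, and it does not use FlatCC's portable wrappers. The
helper names `round_up`, `alloc_aligned_buffer` and `is_aligned` are
made up for illustration.

```c
#include <stdint.h>
#include <stdlib.h>

/* Rounds size up to a multiple of align, which C11 aligned_alloc
 * expects. align must be a power of two. */
static size_t round_up(size_t size, size_t align)
{
    return (size + align - 1) & ~(align - 1);
}

/* Allocates at least size bytes aligned to align (a power of two).
 * Release with free(). Returns NULL on failure or bad alignment. */
static void *alloc_aligned_buffer(size_t size, size_t align)
{
    if (align == 0 || (align & (align - 1)) != 0) return NULL;
    return aligned_alloc(align, round_up(size, align));
}

/* Returns 1 if p is aligned to align (a power of two). */
static int is_aligned(const void *p, size_t align)
{
    return ((uintptr_t)p & (align - 1)) == 0;
}
```

Memory obtained this way is released with plain `free`. Buffers
allocated by the FlatCC runtime should instead be released with the
matching runtime function, as described above.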
     46 
A verifier for C requires the buffer to be placed in aligned memory, and
it will fail if the buffer content is not properly aligned relative to
the buffer or to an absolute memory address, regardless of whether the
current system requires alignment or not. Therefore a buffer verified
on one system is safe to use on all systems. One could use this fact to
sign a buffer, but this is beyond the scope of FlatBuffers itself, and
verifying a signature is likely much slower than re-verifying a buffer
when a verifier is available.
     55 
Note: it would be helpful if the verifier allowed verification only
relative to the buffer start instead of requiring the absolute addresses
to be aligned. This would allow verification of buffers before copying
them out of unaligned locations in network buffers and also allow
subsequent reading of such buffers without copying iff the system
supports unaligned access. However, the verifier does not currently
support this.
     63 
It is not always safe to verify a buffer. A buffer can be constructed to
trigger deep nesting. The FlatCC verifier has a hard coded non-exact
limit of about 100 levels. This protects the stack against excessive
recursion. If the limit is exceeded, the verifier will safely fail. The
limit can be changed with a compile time flag. If the limit is too
permissive a system may run into stack overflow, but this is unlikely on
most systems today. Typical application code may have similar recursive
access functions. Therefore it is likely that recursion is safe if the
verifier succeeds, but it depends on the application.
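Application code that walks nested data recursively can apply the same
idea. The sketch below uses a hypothetical `struct node` in place of
real tables and a made-up `MAX_DEPTH` constant; it is not the FlatCC
verifier implementation.

```c
#include <stddef.h>

/* Hypothetical nested structure standing in for tables that reference
 * child tables; not a FlatCC type. */
struct node {
    const struct node *child;
};

#define MAX_DEPTH 100

/* Returns 0 if the nesting depth stays within MAX_DEPTH, -1 otherwise.
 * The recursion itself is bounded by the same limit. */
static int check_depth(const struct node *n, int depth)
{
    if (n == NULL) return 0;
    if (depth >= MAX_DEPTH) return -1;
    return check_depth(n->child, depth + 1);
}
```

Failing with an error code once the limit is reached keeps stack usage
bounded no matter how the input was constructed.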
     73 
A buffer can point to the same data from multiple places. This is known
as a DAG. The verifier rejects cycles that could lead to infinite loops
during application traversal but does permit DAGs. For normal use DAGs
are safe, but it is possible to maliciously construct a buffer with a
long vector where all elements point to a table that also has a vector
of a similar nature. After a few levels, this can lead to a finite but
exponentially large number of places to visit. The current FlatCC
verifier does not protect against this, but Google's flatc compiler has
a verifier that limits the number of visited tables.
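To illustrate the blow-up and one possible defence, the sketch below
walks a hypothetical DAG where every table holds `count` references to
the same shared child, and it stops once a visit budget is used up. The
types and names are assumptions for illustration and mirror neither the
FlatCC nor the flatc verifier.

```c
#include <stddef.h>

/* Hypothetical table: a vector of `count` references that all point to
 * the same `shared` child, which forms a DAG. */
struct table {
    size_t count;
    const struct table *shared;
};

/* Follows every reference and spends one unit of *budget per table
 * visited. Returns 0 on success, -1 once the budget is exhausted. */
static int visit(const struct table *t, size_t *budget)
{
    if (t == NULL) return 0;
    if (*budget == 0) return -1;
    --*budget;
    for (size_t i = 0; i < t->count; i++) {
        if (visit(t->shared, budget) != 0) return -1;
    }
    return 0;
}
```

With three levels of 10 references each above a leaf table, four stored
tables already cost 1 + 10 + 100 + 1000 = 1111 visits, so a budget
turns an exponential walk into a bounded failure.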
     83 
When reading a buffer in C no memory allocation takes place after the
buffer has initially been placed in memory. For example, strings can be
read directly as C strings and strings are 0-terminated. A string might
contain embedded 0 bytes, which is not strictly correct but permitted.
This will result in a shorter string if used naively as a C string
without reading the string length via the API, but it will still be
safe. Other languages might have to allocate memory for objects such as
strings though.
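The difference between the stored length and the C string length can be
sketched as follows. The `struct fb_string` type is a hypothetical
stand-in for a string read from a buffer; the real reader API exposes
the length through its own accessor.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a string read from a buffer: a stored
 * length plus bytes that are followed by a 0 terminator. */
struct fb_string {
    size_t len;
    const char *data;
};

/* Returns 1 if the string contains an embedded 0 byte, meaning the
 * naive C string view is shorter than the stored length. */
static int has_embedded_zero(struct fb_string s)
{
    return memchr(s.data, 0, s.len) != NULL;
}
```

Code that must preserve the full content should copy `len` bytes rather
than rely on `strlen`.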
     92 
A field can generally be absent. Scalar fields are always safe to
access, even if they are absent, because they have a default value that
will be returned. Tables, Vectors, Strings, and Structs may return null
when a field is absent. This is perfectly valid, but if the application
does not check for null, this can lead to an access violation.
     98 
A field can be marked 'required' in the schema. If it is required, it
will never return null unless the buffer is invalid. A verifier will
detect this. On a practical note some FlatBuffer builders might not
enforce the required field and readers do not always verify buffers
before access (nor should they have to) - therefore an application is
advised to check return values even on required fields unless the buffer
is entirely trusted.
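A defensive access pattern might look like the sketch below. The
`struct monster` type and its accessors are hypothetical stand-ins for
generated reader functions, whose real names and types are derived from
the schema.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a table; not generated FlatCC code. */
struct monster {
    const char *name;   /* NULL when the field is absent */
    int has_hp;         /* nonzero when hp is stored in the buffer */
    int hp;
};

enum { HP_DEFAULT = 100 };

/* Scalar accessor: always safe, falls back to the default value. */
static int monster_hp(const struct monster *m)
{
    return m->has_hp ? m->hp : HP_DEFAULT;
}

/* String accessor: may return NULL, so callers must check. */
static const char *monster_name(const struct monster *m)
{
    return m->name;
}

/* Checks for NULL before use, even if the schema marks name required. */
static size_t monster_name_length(const struct monster *m)
{
    const char *name = monster_name(m);
    return name ? strlen(name) : 0;
}
```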
    106 
If a buffer is verified, it is safe to access all vector elements up to
the vector's size. Element access via API calls does not necessarily
check for out of bounds access, but some calls might.
    110 
A buffer may also be encoded in big endian format. This is not standard,
but FlatCC supports it for systems that are primarily big endian. The
buffer identifier will usually reveal the difference because the
identifier will be byte swapped. A reader therefore needs to be aware of
this possibility, but most often this is not a concern since standard
FlatBuffers are always little endian. The verifier will likely reject a
buffer with an unexpected endian encoding, and if the buffer does pass,
it is at least safe to access.
    118 
    119 
## Thread Safety

There is no thread safety in the FlatBuffers API, but read access does
not mutate any state. Every read location is a temporary variable, so as
long as the application code is otherwise sane, it is safe to read a
buffer from multiple threads, and if the buffer is placed on cache line
alignment (typically 64 or 128 bytes) it is also efficient without false
sharing.
    128 
A verifier is also safe to use because it only reads from a buffer.
    130 
A builder is inherently NOT safe for multithreaded access. However, with
proper synchronization there is nothing preventing one thread from doing
the grunt work and another putting the high level pieces together, as
long as only one thread at a time accesses the builder object or the
associated allocator and emitter objects. From a performance perspective
this doesn't make much sense, but it might from an architectural
perspective.
    138 
A builder object can be cleared and reused after a buffer is constructed
or abandoned. The clear operation can optionally reduce the amount of
memory or keep all the memory from the previous operation. In either
case it is safe for a new thread to use the builder after it is cleared,
but two threads cannot use the builder at the same time.
    144 
It is fairly cheap to create a new builder object, but of course cheaper
to reuse existing memory. Often the best option is for each thread to
have its own builder and own memory and defer any sharing to the point
where the buffer is finished and readable.
    149 
    150 
## Schema Evolution

Accessing a buffer that was created by a more recent version of a
FlatBuffers schema is safe iff the new version of the schema was created
according to the guidelines for schema evolution - notably no change of
field identifiers or existing enum values and no removal or deprecation
of required fields. Google's flatc tool can check if a new schema
version is safe.
    159 
Fields that are not required but deprecated in a new version will still
be safe to access by old versions, but readers will likely see default
or null values and should be prepared for this.
    163 
    164 
## Copying or Printing Buffer Content

Even if it is safe to read a buffer, it is not safe to copy or even
print a buffer because a DAG can unfold to consume much more output
space than the given input. In data compression this is known as a Zip
Bomb, but even without malicious intent, users need to be aware of the
potential expansion. This is also a concern when printing JSON.
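One way to contain such expansion is to write through an output sink
with a hard byte limit. The sketch below is a generic illustration with
a hypothetical `struct sink`; it is not the interface of the FlatCC
JSON printer.

```c
#include <stddef.h>
#include <string.h>

/* Output sink with a hard byte limit; once exceeded it stays failed. */
struct sink {
    char *buf;
    size_t cap;
    size_t len;
    int failed;
};

/* Appends s if it fits within the limit; otherwise marks the sink as
 * failed. Returns 0 on success, -1 on failure. */
static int sink_append(struct sink *o, const char *s)
{
    size_t n = strlen(s);

    if (o->failed || n > o->cap - o->len) {
        o->failed = 1;
        return -1;
    }
    memcpy(o->buf + o->len, s, n);
    o->len += n;
    return 0;
}
```

A printer or copier that emits all output through such a sink cannot
produce more than `cap` bytes, however far the input unfolds.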
    172 
A table also cannot be trivially copied based on memory content because
it has offsets to other content. This is not an issue when using any
official API, but it may be if someone attempts to construct a new
buffer Frankenstein-style from parts of existing buffers.
    177 
Nested buffers are not allowed to share any content with their parent,
sibling, or child buffers for similar reasons.
    180 
The verifier should complain if a buffer goes out of bounds.
    182 
    183 
## Modifying Buffers

It is not safe to modify buffers in-place unless the buffer is
absolutely trusted. Verifying a buffer is not enough. FlatCC does not
provide any means to modify a buffer in-place, but it is not hard to
achieve this if so desired. It is especially easy to do this with
structs, so if this is needed this is the way to do it.
    191 
Modifying a buffer is unsafe because it is possible to place one table
inside another table, as an example, even if this is not valid. Such
overlaps are too expensive to verify by a standard verifier. As long as
the buffer is not modified this does not pose any real problem, but if a
field is modified in one table, it might cause a field of another table
to point out of bounds. This is so obvious an attack vector that anyone
wanting to hack a system is likely to use this approach. Therefore
in-place modification should be avoided unless on a trusted platform.
For example, a trusted network might bump a time-to-live counter when
passing buffers around.
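The effect of such an overlap can be shown with a deliberately
simplified layout. This is not the real FlatBuffers encoding: the word
positions and helper names below are assumptions chosen so that a
counter field and a vector length share the same storage.

```c
#include <stdint.h>
#include <stddef.h>

#define WORDS 8

/* Hypothetical layout: word 2 is both the counter field of one "table"
 * and the length of a "vector" whose elements start at word 3. */
enum { COUNTER_WORD = 2, VEC_LEN_WORD = 2, VEC_DATA_WORD = 3 };

/* Returns 1 if the vector's elements fit inside the buffer. */
static int vector_in_bounds(const uint32_t *buf)
{
    return buf[VEC_LEN_WORD] <= WORDS - VEC_DATA_WORD;
}

/* In-place update of the counter field; the writer is unaware that the
 * same word doubles as a vector length. */
static void bump_counter(uint32_t *buf)
{
    buf[COUNTER_WORD] += 1000;
}
```

Before the update the buffer passes the bounds check; afterwards the
vector claims 1004 elements in an 8-word buffer, although the writer
only touched what looked like a harmless scalar.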
    202 
Even if it is safe to modify buffers, this will not work on platforms
that require endian conversion unless extra precautions are taken. These
are usually big endian platforms, but it is possible to compile
flatbuffers with native big endian format as well. FlatCC has a lot of
low-level `to/from_pe` calls that perform the proper endian conversion.
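As a sketch of what such a precaution involves, the helpers below read
and write a 32-bit little-endian field byte by byte, which works on any
host byte order. They are illustrative only, with made-up names, and
are not FlatCC's own `to/from_pe` calls.

```c
#include <stdint.h>

/* Reads a 32-bit little-endian value regardless of host byte order. */
static uint32_t load_le32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Writes a 32-bit value in little-endian byte order. */
static void store_le32(unsigned char *p, uint32_t v)
{
    p[0] = (unsigned char)(v & 0xff);
    p[1] = (unsigned char)((v >> 8) & 0xff);
    p[2] = (unsigned char)((v >> 16) & 0xff);
    p[3] = (unsigned char)((v >> 24) & 0xff);
}

/* Increments a little-endian counter in place, e.g. a time-to-live. */
static void bump_le32(unsigned char *field)
{
    store_le32(field, load_le32(field) + 1);
}
```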
    209 
    210 
## Building Buffers

A buffer can be constructed incorrectly in a large number of ways that
are not efficient to detect at runtime.
    215 
When a buffer is constructed with a debug library, assertions will
sometimes find the most obvious problems such as closing a table after
opening a vector. FlatCC is quite permissive in the order of object
creation but the nesting order must be respected, and it cannot type
check all data. Other FlatBuffer builders typically require that child
objects are completely created before a parent object is started. FlatCC
does not require this but will internally organize objects in a
compatible way. This removes a number of potential mistakes, but not
all.
    224 
Notably a table from a parent object or any other external reference
should not be used in a nested buffer.
    227 
It is a good idea to run a verifier on a constructed buffer at least
until some confidence has been gained in the code building buffers.
    230 
If a buffer needs to be constructed with sorted keys, this cannot be
done during construction, unlike with the C++ API, because the builder
allocates as little memory as possible. Instead the reader interface
supports a mutable cast for use with a sort operation. This sort
operation must only be used on absolutely trusted buffers, and
verification is not sufficient if malicious overlaps can be expected.
    237 
The builder will normally consume very little memory. It operates a few
stacks and a small hash table in addition to a circular buffer to
consume temporary buffer output. It is not possible to access
constructed buffer objects before the buffer is complete because data
may span multiple buffers. Once a buffer is complete the content can be
copied out, or a support function can be used to allocate new memory
with the final buffer as content.
    245 
The internal memory can grow large when the buffer grows large,
naturally. In addition the temporary stacks may grow large if there are
large tables or notably large vectors that cannot be copied directly to
the output buffers. This creates a potential for memory allocation
errors, especially on constrained systems.
    251 
The builder returns error codes, but it is tedious to check these. It is
not necessary to check return codes if the API is used correctly and if
there are no allocation errors. It is possible to provide a custom
allocator and a custom emitter. These can detect memory failures early,
making it potentially safe to use the builder API without any per
operation checks.
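The idea of catching failures in one place can be sketched with an
allocator wrapper that enforces a memory limit and latches the first
failure. The `struct capped_alloc` type and `capped_malloc` function
are hypothetical and do not follow the FlatCC custom allocator
interface; they only show the pattern.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical allocator state, not the FlatCC allocator interface:
 * it enforces a memory limit and remembers the first failure. */
struct capped_alloc {
    size_t used;
    size_t limit;
    int failed;
};

/* Allocates n bytes unless that would exceed the limit. After any
 * failure all later calls fail too, so one final check of `failed`
 * covers every earlier operation. */
static void *capped_malloc(struct capped_alloc *a, size_t n)
{
    void *p;

    if (a->failed || n > a->limit - a->used || (p = malloc(n)) == NULL) {
        a->failed = 1;
        return NULL;
    }
    a->used += n;
    return p;
}
```

The caller inspects `failed` once after a batch of operations instead of
checking every return value.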
    258 
The generated JSON parser checks all return codes and can be used to
construct a buffer safely, especially since the buffer is naturally
bounded by the size of the JSON input. JSON printing, on the other hand,
can potentially explode, as discussed earlier.
    263 
FlatCC generated create calls such as `MyGame_Example_Monster_create()`
will not be compatible across versions if there are deprecated fields,
even if the schema change otherwise respects schema evolution rules.
This is mostly a concern if new fields are added because compilation
will otherwise break on argument count mismatch. Prior to flatcc-0.5.3
argument order could change if the field `(id: x)` attribute was used,
which could lead to buffers with unexpected content. JSON parsers that
support constructors (objects given as an array of create arguments)
have similar concerns, but here trailing arguments can be optional.