nostrdb

an unfairly fast embedded nostr database backed by lmdb
git clone git://jb55.com/nostrdb
Log | Files | Refs | Submodules | README | LICENSE

binary-format.md (59658B)


      1 # FlatBuffers Binary Format
      2 
      3 
      4 <!-- vim-markdown-toc GFM -->
      5 
      6 * [Overview](#overview)
      7 * [Memory Blocks](#memory-blocks)
      8 * [Example](#example)
      9 * [Primitives](#primitives)
     10   * [Numerics](#numerics)
     11   * [Boolean](#boolean)
     12   * [Format Internal Types](#format-internal-types)
     13   * [Scalars](#scalars)
     14   * [Structs](#structs)
     15 * [Internals](#internals)
     16 * [Type Hashes](#type-hashes)
     17   * [Conflicts](#conflicts)
     18   * [Type Hash Variants](#type-hash-variants)
     19 * [Unions](#unions)
     20 * [Optional Fields](#optional-fields)
     21 * [Alignment](#alignment)
     22 * [Default Values and Deprecated Values](#default-values-and-deprecated-values)
     23 * [Schema Evolution](#schema-evolution)
     24 * [Keys and Sorting](#keys-and-sorting)
     25 * [Size Limits](#size-limits)
     26 * [Verification](#verification)
     27 * [Risks](#risks)
     28 * [Nested FlatBuffers](#nested-flatbuffers)
     29 * [Fixed Length Arrays](#fixed-length-arrays)
     30 * [Big Endian FlatBuffers](#big-endian-flatbuffers)
     31 * [StructBuffers](#structbuffers)
     32 * [StreamBuffers](#streambuffers)
     33 * [Bidirectional Buffers](#bidirectional-buffers)
     34 * [Possible Future Features](#possible-future-features)
     35   * [Force Align](#force-align)
     36   * [Mixins](#mixins)
     37 
     38 <!-- vim-markdown-toc -->
     39 
     40 
     41 ## Overview
     42 
     43 ## Memory Blocks
     44 
     45 A FlatBuffers layout consists of the following types of memory blocks:
     46 
     47 - header
     48 - table
     49 - vector
     50 - string
     51 - vtable
     52 - (structs)
     53 
     54 Each of these have a contigous memory layout. The header references the
     55 root table. Every table references a vtable and stores field in a single
     56 block of memory. Some of these fields can be offsets to vectors, strings
     57 and vectors. Vectors are one-dimensional blocks where each element is
     58 self-contained or stores reference to table or a string based on schema
     59 type information. A vtable decides which fields are present in table and
     60 where they are stored. Vtables are the key to support schema evolution.
     61 A table has no special rules about field ordering except fields must be
     62 properly aligned, and if they are ordered by size the will pack more
     63 densely but possibly more complicated to construct and the schema may
     64 provide guidelines on the preferred order. Two vtables might mean
     65 different things but accidentally have the same structure. Such vtables
     66 can be shared between different tables. vtable sharing is important when
     67 vectors store many similar tables. Structs a dense memory regions of
     68 scalar fields and smaller structs. They are mostly found embedded in
     69 tables but they are independent blocks when referenced from a union or
     70 union vector, or when used as a buffer root. Structs hold no references.
     71 
     72 Space between the above blocks are zero padded and present in order to
     73 ensure proper alignment. Structs must be packed as densely as possible
     74 according the alignment rules that apply - this ensures that all
     75 implementations will agree on the layout. The blocks must not overlap in
     76 memory but two blocks may be shared if they represent the same data such
     77 as sharing a string.
     78 
     79 FlatBuffers are constructed back to front such that lower level objects
     80 such as sub-tables and strings are created first in stored last and such
     81 that the root object is stored last and early in the buffer. See also
     82 [Stream Buffers](#stream-buffers) for a proposed variation over this
     83 theme.
     84 
     85 All addressing in FlatBuffers are relative. The reason for this? When
     86 navigating the buffer you don't need to store the buffer start or the
     87 buffer length, you can also find a new reference relative the reference
     88 you already have. vtables require the table location to find a field,
     89 but a table field referencing a string field only needs the table field
     90 location, not the table start in order to find the referenced string.
     91 This results in efficient navigation.
     92 
     93 ## Example
     94 
     95 Uoffsets add their content to their address and are positive while a
     96 tables offset to its vtable is signed and is substracted. A vtables
     97 element is added to the start of the table referring to the vtable.
     98 
     99 
    100 Schema ([eclectic.fbs]) :
    101 
    102         namespace Eclectic;
    103 
    104         enum Fruit : byte { Banana = -1, Orange = 42 }
    105         table FooBar {
    106             meal      : Fruit = Banana;
    107             density   : long (deprecated);
    108             say       : string;
    109             height    : short;
    110         }
    111         file_identifier "NOOB";
    112         root_type FooBar;
    113 
    114 JSON :
    115 
    116         { "meal": "Orange", "say": "hello", "height": -8000 }
    117 
    118 Buffer :
    119 
    120         header:
    121 
    122             +0x0000 00 01 00 00 ; find root table at offset +0x00000100.
    123             +0x0004 'N', 'O', 'O', 'B' ; possibly our file identifier
    124 
    125             ...
    126 
    127         table:
    128 
    129             +0x0100 e0 ff ff ff ; 32-bit soffset to vtable location
    130                                 ; two's complement: 2^32 - 0xffffffe0 = -0x20
    131                                 ; effective address: +0x0100 - (-0x20) = +0x0120
    132             +0x0104 00 01 00 00 ; 32-bit uoffset string field (FooBar.say)
    133                                 ; find string +0x100 = 256 bytes _from_ here
    134                                 ; = +0x0104 + 0x100 = +0x0204.
    135             +0x0108 42d         ; 8-bit (FooBar.meal)
    136             +0x0109 0           ; 8-bit padding
    137             +0x010a -8000d      ; 16-bit (FooBar.height)
    138             +0x010c  ...        ; (first byte after table end)
    139 
    140             ...
    141 
    142         vtable:
    143 
    144             +0x0120 0c 00       ; vtable length = 12 bytes
    145             +0x0122 0c 00       ; table length = 12 bytes
    146             +0x0124 08 00       ; field id 0: +0x08 (meal)
    147             +0x0126 00 00       ; field id 1: <missing> (density)
    148             +0x0128 04 00       ; field id 2: +0004 (say)
    149             +0x012a 0a 00       ; field id 3: +0x0a (height)
    150 
    151             ...
    152 
    153         string:
    154 
    155             +0x0204 05 00 00 00 ; vector element count (5 ubyte elements)
    156             +0x0208 'h' 'e'     ; vector data
    157             +0x020a 'l' 'l'     ; vector data
    158             +0x020c 'o'         ; vector data
    159             +0x020d  00         ; zero termination
    160                                 ; special case for string vectors
    161 
    162             ...
    163 
    164 
    165 Actual FlatCC generated buffer :
    166 
    167     Eclectic.FooBar:
    168       0000  08 00 00 00 4e 4f 4f 42  e8 ff ff ff 08 00 00 00  ....NOOB........
    169       0010  2a 00 c0 e0 05 00 00 00  68 65 6c 6c 6f 00 00 00  *.......hello...
    170       0020  0c 00 0c 00 08 00 00 00  04 00 0a 00              ............
    171 
    172 
    173 Note that FlatCC often places vtables last resulting in `e0 ff ff ff`
    174 style vtable offsets, while Googles flatc builder typically places them
    175 before the table resulting in `20 00 00 00` style vtable offsets which
    176 might help understand why the soffset is subtracted from and not added
    177 to the table start. Both forms are equally valid.
    178 
    179 
    180 ## Primitives
    181 
    182 Primitives are data used to build more complex structures such as
    183 strings, vectors, vtables and tables. Although structs are not strictly
    184 a primitive, it helps to view them as self-contained primitives.
    185 
    186 
    187 ### Numerics
    188 
    189 FlatBuffers are based on the following primitives that are 8, 16, 32 and
    190 64 bits in size respectively (IEEE-754, two's complement, little endian):
    191 
    192     uint8, uint16, uint32, uint64 (unsigned)
    193     int8, int16, int32, int64     (two's complement)
    194     float32, float64              (IEEE-754)
    195 
    196 
    197 ### Boolean
    198 
    199     flatbuffers_bool (uint8)
    200     flatbuffers_true (flatbuffers_bool assign as 1, read as != 0)
    201     flatbuffers_false (flatbuffers_bool = 0)
    202 
    203 Note that a C99 `bool` type has no defined size or sign so it is not an
    204 exact representation of a flatbuffers boolean encoding.
    205 
    206 When a stored value is interpreted as boolean it should not be assumed
    207 to be either 1 or 0 but rather as `not equal to 0`. When storing a
    208 boolean value or when converting a boolean value to integer before
    209 storing, the value should be 1 for true and 0 for false, In C this can
    210 be done using `!!x`.
    211 
    212 ### Format Internal Types
    213 
    214     flatbuffers_union_type_t (uint8, NONE = 0, 0 <= type <= 255)
    215     flatbuffers_identifier_t (uint8[4])
    216     flatbuffers_uoffset_t (uin32)
    217     flatbuffers_soffset_t (int32),
    218     flatbuffers_voffset_t (uint16)
    219 
    220 These types can change in FlatBuffer derived formats, but when we say
    221 FlatBuffers without further annotations, we mean the above sizes in
    222 little endian encoding. We always refer to specific field type and not a
    223 specific field size because this allows us to derive new formats easily.
    224 
    225 
    226 ### Scalars
    227 
    228 To simpify discussion we use the term scalars to mean integers, boolean,
    229 floating point and enumerations. Enumerations always have an underlying
    230 signed or unsigned integer type used for its representation.
    231 
    232 Scalars are primitives that has a size between 1 and 8 bytes. Scalars
    233 are stored aligned to the own size.
    234 
    235 The format generally supports vectors and structs of any scalar type but
    236 some language bindings might not have support for all combinations such
    237 as arrays of booleans.
    238 
    239 An special case is the encoding of a unions type code which internally
    240 is an enumeration but it is not normally permitted in places where we
    241 otherwise allow for scalars.
    242 
    243 Another special case is enumerations of type boolean which may not be
    244 widely supported, but possible. The binary format is not concerned with
    245 this distinction because a boolean is just an integer at this level.
    246 
    247 ### Structs
    248 
    249 A struct is a fixed lengthd block of a fixed number of fields in a specific order
    250 defined by a schema. A field is either a scalar, another struct or a fixed length
    251 array of these, or a fixed length char array. A struct cannot contain fields that
    252 contain itself directly or indirectly. A struct is self-contained and has no
    253 references. A struct cannot be empty.
    254 
    255 A schema cannot change the layout of a struct without breaking binary
    256 compatibility, unlike tables.
    257 
    258 When used as a table field, a struct is embedded within the table block
    259 unless it is union value. A vector of structs are placed in a separate
    260 memory block, similar to vectors of scalars. A vector of unions that has
    261 a struct as member will reference the struct as an offset, and the
    262 struct is then an independent memory block like a table.
    263 
    264 FlatCC supports that a struct can be the root object of a FlatBuffer, but
    265 most implementations likely won't support this. Structs as root are very
    266 resource efficient.
    267 
    268 Structs cannot have vectors but they can have fixed length array fields. which
    269 are equivalent to stacking multiple non-array fields of the same type after each
    270 other in a compact notation with similar alignment rules. Additionally arrays
    271 can be of char type to have a kind of fixed length string. The char type is not
    272 used outside of char arrays. A fixed length array can contain a struct that
    273 contains one more fixed length arrays. If the char array type is not support, it
    274 can be assumed to be a byte array.
    275 
    276 
    277 ## Internals
    278 
    279 All content is little endian and offsets are 4 bytes (`uoffset_t`).
    280 A new buffer location is found by adding a uoffset to the location where
    281 the offset is stored. The first location (offset 0) points to the root
    282 table. `uoffset_t` based references are to tables, vectors, and strings.
    283 References to vtables and to table fields within a table have a
    284 different calculations as discussed below.
    285 
    286 A late addition to the format allow for adding a size prefix before the
    287 standard header. When this is done, the builder must know about so it
    288 can align content according to the changed starting position. Receivers
    289 must also know about the size field just as they must know about the
    290 excpected buffer type.
    291 
    292 The next 4 bytes (`sizeof(uoffset_t)`) might represent a 4 byte buffer
    293 identifier, or it might be absent but there is no obvious way to know
    294 which.  The file identifier is typically ASCII characters from the
    295 schemas `file_identifier` field padded with 0 bytes but may also contain
    296 any custom binary identifier in little endian encoding. See
    297 [Type-Hashes](#type-hashes). The 0 identifier should be avoided because
    298 the buffer might accidentally contain zero padding when an identifier is
    299 absent and because 0 can be used by API's to speficy that no identifier
    300 should be stored.
    301 
    302 When reading a buffer, it should be checked that the length is at least
    303 8 bytes (2 * `sizeof(uoffset_t)`) Otherwise it is not safe to check the
    304 file identifier.
    305 
    306 The root table starts with a 4 byte vtable offset (`soffset_t`). The
    307 `soffset_t` has the same size as `uoffset_t` but is signed.
    308 
    309 The vtable is found by *subtracting* the signed 4 byte offset to the
    310 location where the vtable offset is stored. Note that the `FlatCC`
    311 builder typically stores vtables at the end of the buffer (clustered
    312 vtables) and therefore the vtable offset is normally negative. Other
    313 builders often store the vtable before the table unless reusing an
    314 existing vtable and this makes the soffset positive.
    315 (Nested FlatBuffers will not store vtables at the end because it would
    316 break compartmentalization).
    317 
    318 The vtable is a table of 2 byte offsets (`voffset_t`). The first two
    319 entreis are the vtable size in bytes and the table size in bytes. The
    320 next offset is the vtable entry for table field 0. A vtable will always
    321 have as many entries as the largest stored field in the table, but it
    322 might skip entries that are not needed or known (version independence) -
    323 therefore the vtable length must be checked. The table length is only
    324 needed by verifiers. A vtable lookup of an out of range table id is the
    325 same as a vtable entry that has a zero entry and results in a default
    326 value for the expected type. Multiple tables may shared the same vtable,
    327 even if they have different types. This is done via deduplication during
    328 buffer construction. A table field id maps to a vtable offset using the
    329 formula `vtable-start + sizeof(voffset_t) * (field-id + 2)`, iff the
    330 result is within the vtable size.
    331 
    332 The vtable entry stores a byte offset relative to the location where the
    333 `soffset_t` where stored at the start of the table. A table field is
    334 stored immediately after the `soffset_t` or may contain 0 padding. A
    335 table field is always aligned to at least its own size, or more if the
    336 schema demands it. Strings, sub-tables and vectors are stored as a
    337 4-byte `uoffset_t`. Such sub-elements are always found later in the
    338 buffer because `uoffset_t` is unsigned. A struct is stored in-place.
    339 Conceptually a struct is not different from an integer in this respect,
    340 just larger.
    341 
    342 If a sub-table or vector is absent and not just without fields, 0
    343 length, the containing table field pointing the sub-table or vector
    344 should be absent. Note that structs cannot contain sub-tables or
    345 vectors.
    346 
    347 A struct is always 0 padded up to its alignment. A structs alignment
    348 is given as the largest size of any non-struct member, or the alignment
    349 of a struct member (but not necessarily the size of a struct member), or
    350 the schema specified aligment.
    351 
    352 A structs members are stored in-place so the struct fields are known
    353 via the the schema, not via information in the binary format
    354 
    355 An enum is represent as its underlying integer type and can be a member
    356 of structs, fields and vectors just like integer and floating point
    357 scalars.
    358 
    359 A table is always aligned to `sizeof(uoffset_t)`, but may contain
    360 internal fields with larger alignment. That is, the table start or end
    361 are not affected by alignment requirements of field members, unlike
    362 structs.
    363 
    364 A string is a special case of a vector with the extra requirement that a
    365 0 byte must follow the counted content, so the following also applies to
    366 strings.
    367 
    368 A vector starts with a `uoffset_t` field the gives the length in element
    369 counts, not in bytes. Table fields points to the length field of a
    370 vector, not the first element. A vector may have 0 length. Note that the
    371 FlatCC C binding will return a pointer to a vectors first element which
    372 is different from the buffer internal reference.
    373 
    374 All elements of a vector has the same type which is either a scalar type
    375 including enums, a struct, or a uoffset reference where all the
    376 reference type is to tables all of the same type, all strings, or,
    377 for union vectors, references to members of the same union. Because a
    378 union member can be a struct, it is possible to have vectors that
    379 reference structs instead of embeddding them, but only via unions. It is
    380 not possible to have vectors of vectors other than string vectors,
    381 except indirectly via vectors containing tables.
    382 
    383 A vectors first element is aligned to the size of `uoffset_t` or the
    384 alignment of its element type, or the alignment required by the schema,
    385 whichever is larger. Note that the vector itself might be aligned to 4
    386 bytes while the first element is aligned to 8 bytes.
    387 
    388 (Currently the schema semantics do no support aligning vectors beyond
    389 their element size, but that might change and can be forced when
    390 building buffers via dedicated api calls).
    391 
    392 Strings are stored as vectors of type `ubyte_t`, i.e. 8-bit elements. In
    393 addition, a trailing zero is always stored. The trailing zero is not
    394 counted in the length field of the vector. Technically a string is
    395 supposed to be valid UTF-8, but in praxis it is permitted that strings
    396 contain 0 bytes or other invalid sequences. It is recommended to
    397 explicitly check strings for UTF-8 conformance when this is required
    398 rather than to expect this to alwways be true. However, the ubyte vector
    399 should be preferred for binary data.
    400 
    401 A string, vector or table may be referenced by other tables and vectors.
    402 This is known as a directed acyclic graph (DAG). Because uoffsets are
    403 unsigned and because uoffsets are never stored with a zero value (except
    404 for null entries in union vectors), it is not possible for a buffer to
    405 contain cycles which makes them safe to traverse with too much fear of
    406 excessive recursion. This makes it possible to efficiently verify that a
    407 buffer does not contain references or content outside of its expected
    408 boundaries.
    409 
    410 A vector can also hold unions, but it is not supported by all
    411 implementations. A union vector is in reality two separate vectors: a
    412 type vector and an offset vector in place of a single unions type and
    413 value fields in table. See unions.
    414 
    415 
    416 ## Type Hashes
    417 
    418 A type hash is a 32-bit value defined as the fnv1a32 hash of a tables or
    419 a structs fully qualified name. If the fnv1a32 hash returns 0 it should
    420 instead hash the empty string. 0 is used to indicate that a buffer
    421 should not store an identifier.
    422 
    423 Every table and struct has a name and optionally also a namespace of one
    424 or more levels. The fully qualified name is optional namespace followed
    425 by the type name using '.' as a separator. For example
    426 "MyGame.Sample.Monster", or "Eclectic.FooBar".
    427 
    428 The type hash can be used as the buffer identifier instead of the schema
    429 provided `file_identifier`. The type hash makes it possible to
    430 distinguish between different root types from the same schema, and even
    431 across schema as long as the namespace is unique.
    432 
    433 Type hashes introduce no changes to the binary format but the application
    434 interface must choose to support user defined identifiers or explicitly
    435 support type hashes. Alternatively an application can peak directly into
    436 the buffer at offset 4 (when `uoffset_t` is 4 bytes long).
    437 
    438 FlatCC generates the following identifier for the "MyGame.Sample.Monster"
    439 table:
    440 
    441     #define MyGame_Sample_Monster_type_hash ((flatbuffers_thash_t)0xd5be61b)
    442     #define MyGame_Sample_Monster_type_identifier "\x1b\xe6\x5b\x0d"
    443 
    444 But we can also
    445 [compute one online](https://www.tools4noobs.com/online_tools/hash/)
    446 for our example buffer:
    447 
    448         fnv1a32("Eclectic.FooBar") = 0a604f58
    449 
    450 Thus we can open a hex editor and locate
    451 
    452             +0x0000 00 01 00 00 ; find root table at offset +0x00000100.
    453             +0x0004 'N', 'O', 'O', 'B' ; possibly our file identifier
    454 
    455 and replace it with
    456 
    457             +0x0000 00 01 00 00 ; find root table at offset +0x00000100.
    458             +0x0004 58 4f 60 0a ; very likely our file identifier identifier
    459 
    460 or generate it with `flatcc`:
    461 
    462         $ bin/flatcc --stdout doc/eclectic.fbs | grep FooBar_type
    463         #define Eclectic_FooBar_type_hash ((flatbuffers_thash_t)0xa604f58)
    464         #define Eclectic_FooBar_type_identifier "\x58\x4f\x60\x0a"
    465 
    466 
    467 The following snippet implements fnv1a32, and returns the empty string
    468 hash if the hash accidentially should return 0:
    469 
    470 
    471     static inline flatbuffers_thash_t flatbuffers_type_hash_from_name(const char *name)
    472     {
    473         uint32_t hash = 2166136261UL;
    474         while (*name) {
    475             hash ^= (uint32_t)*name;
    476             hash = hash * 16777619UL;
    477             ++name;
    478         }
    479         if (hash == 0) {
    480             hash = 2166136261UL;
    481         }
    482         return hash;
    483     }
    484 
    485 ### Conflicts
    486 
    487 It is possible to have conficts between two type hashes although the
    488 risk is small. Conflicts are not important as long as an application can
    489 distinguish between all the types it may encouter in actual use. A
    490 switch statement in C will error at compile time for two cases that have
    491 the same value, so the problem is easily detectable and fixable by
    492 modifying a name or a namespace.
    493 
    494 For global conflict resolution, a type should be identified by its fully
    495 qualified name using adequate namespaces. This obviously requires it to
    496 be stored separate from the buffer identifier due to size constraints.
    497 
    498 
    499 ### Type Hash Variants
    500 
    501 If an alternative buffer format is used, the type hash should be
    502 modified. For example, if `uoffset_t` is defined as a 64-bit value, the
    503 fnv1a64 hash should be used instead. For big endian variants the hash
    504 remains unchanged but is byteswapped. The application will use the same
    505 id while the acces layer will handle the translation.
    506 
    507 For buffers using structs as roots, the hash remains unchanged because
    508 the struct is a unique type in schema. In this way a receiver that does
    509 not handle struct roots can avoid trying to read the root as a table.
    510 
    511 For futher variations of the format, a format identifier can be inserted
    512 in front of the namespace when generating the hash. There is no formal
    513 approach to this, but as an example, lets say we want to use only 1 byte
    514 per vtable entry and identify these buffers with type hash using the
    515 prefix "ebt:" for example buffer type. We then have the type hash:
    516 
    517     #define type_hash_prefix "ebt:"
    518 
    519     hash = fnv1a32(type_hash_prefix "Eclectic.FooBar");
    520     hash = hash ? hash : fnv1a32(type_hash_prefix);
    521 
    522 If the hash returns 0 we hash the prefix.
    523 
    524 
    525 ## Unions
    526 
    527 A union is a contruction on top of the above primitives. It consists of
    528 a type and a value.
    529 
    530 In the schema a union type is a set of table types with each table name
    531 assigned a type enumeration starting from 1. 0 is the type NONE meaning
    532 the union has not value assigned. The union type is represented as a
    533 ubyte enum type, or in the binary format as a value of type
    534 `union_type_t` which for standard FlatBuffers is an 8-bit unsigned code
    535 with 0 indicating the union stores not value and a non-zero value
    536 indicating the type of the stored union.
    537 
    538 A union is stored in a table as a normal sub-table reference with the
    539 only difference being that the offset does not always point to a table
    540 of the same type. The 8-bit union type is stored as a separate table
    541 field conventially named the same as the union value field except for a
    542 `_type` suffix. The value (storing the table offset) MUST have a field
    543 ID that is exactly one larger than the type field. If value field is
    544 present the type field MUST also be present. If the type is NONE the
    545 value field MUST be absent and the type field MAY be absent because a
    546 union type always defaults to the value NONE.
    547 
    548 Vectors of unions is a late addition (mid 2017) to the FlatBuffers
    549 format. FlatCC supports union vectors as of v0.5.0.
    550 
    551 Vectors of unions have the same two fields as normal unions but they
    552 both store a vector and both vectors MUST have the same length or both
    553 be absent from the table. The type vector is a vector of 8-bit enums and
    554 the value vector is a vector of table offsets. Obviously each type
    555 vector element represents the type of the table in the corresponding
    556 value element. If an element is of type NONE the value offset must be
    557 stored as 0 which is a circular reference. This is the only offset that
    558 can have the value 0.
    559 
    560 A later addition (mid 2017) to the format allows for structs and strings
    561 to also be member of a union. A union value is always an offset to an
    562 independent memory block. For strings this is just the offset to the
    563 string. For tables it is the offset to the table, naturally, and for
    564 structs, it is an offset to an separate aligned memory block that holds
    565 a struct and not an offset to memory inside any other table or struct.
    566 FlatCC supports mixed type unions and vectors of these as of v0.5.0.
    567 
    568 ## Optional Fields
    569 
    570 As of mid 2020 the FlatBuffers format introduced optional scalar table fields.
    571 There is no change to the binary schema, but the semantics have changed slightly
    572 compared to ordinary scalar fields (which remain supported as is): If an
    573 optional field is not stored in a table, it is considered to be a null value. An
    574 optinal scalar field will have null as its default value, so any representable
    575 scalar value will always be stored in the buffer, unlike other scalar fields
    576 which by default do not store the field if the value matches the default numeric
    577 value. This was already possible before by using `force_add` semantics to force
    578 a value to be written even if it was matching the default value, and by
    579 providing an `is_present` test when reading a value so that it would be possible
    580 to distinguish between a value that happened to be a default value, and a value
    581 that was actually absent. However, such techniques were ad-hoc. Optional
    582 fields formalize the semantics of null values for scalars. Other field types
    583 already have meaningful null values. Only table fields can be optional so struct
    584 fields must always assign a value to all members.
    585 
    586 ## Alignment
    587 
    588 All alignments are powers of two between 1 and 256. Large alignments are
    589 only possible via schema specified alignments. The format naturally has
    590 a maximum alignment of the largest scalar it stores, which is a 64-bit
    591 integer or floating point value. Because C malloc typically returns
    592 buffers aligned to a least 8 bytes, it is often safe to place buffers in
    593 heap allocated memory, but if the target system does not permit
    594 unaligned access, or is slow on unaligned access, a buffer should be
    595 placed in sufficiently aligned memory. Typically it is a good idea to
    596 place buffer in cacheline aligned boundary anyway.
    597 
    598 A buffers alignment is the same as the largest alignment of any
    599 object or struct it contains.
    600 
    601 The buffer size is not guaranteed to be aligned to its own alignment
    602 unlike struct. Googles `flatc` builder does this, at least when size
    603 prefixed. The `FlatCC` tool currently does not, but it might later add
    604 an option to pad up to alignment. This would make it simpler to stack
    605 similar typed buffers in file - but users can retrieve the buffers
    606 alignment and do this manually. Thus, when stacking size prefixed
    607 buffers, each buffer should start aligned to its own size starting at
    608 the size field, and should also be zero padded up to its own alignment.
    609 
    610 
    611 ## Default Values and Deprecated Values
    612 
    613 A table can can omit storing any field that is not a required field. For
    614 strings, vectors and tables this result in returning a null value
    615 different from an empty value when reading the buffer. Struct fields not
    616 present in table are also returned as null.
    617 
    618 All fields that can return null do not have a default value. Other
    619 values, which are integers, floats and enumerations, can all have
    620 default values. The default value is returned if not found in the table.
    621 
    622 If a default value is not specified the default defaults to zero for the
    623 corresponding type. If an enumeration does not have a 0 value and no
    624 explicit default value, this is a schema error.
    625 
    626 When building a buffer the builder will compare a stored field to the
    627 known default value. If it matches the field will simple be skipped.
    628 Some builder API's makes it possible to force a default value to be
    629 stored and to check if a field is missing when reading the buffer. This
    630 can be used to handle NULL values differently from default or missing
    631 values.
    632 
    633 A deprecated field is a schema construct. The binary format either stores
    634 a table field, or it does not.
    635 
    636 A deprecated field should be treated as not available, as in no way to
    637 read the value as opposed to returning a default value regardless of
    638 whether the field is present or not. If they for some reason are made
    639 accessible the verifier must also understand and verify these fields.
    640 
    641 A deprecated field also cannot be written to a new buffer, although if
    642 it against guidelines remains possible to do so, it should be done as if
    643 the field was not deprecated.
    644 
    645 Structs cannot have default values and cannot have deprecated fields in
    646 stadard FlatBuffers. FlatCC supports marking a struct field as
    647 deprecated. This implies the field will always be zeroed and with no
    648 trivial accessors. A struct can never change size without breaking
    649 support for schema evolution.
    650 
    651 FlatCC JSON parsers allow structs to only set some values. Remaining
    652 values will be implicitly zeroed. The C API for writing buffers do not
    653 have this concern because a struct can just be written as a C struct
    654 so there is no control over which fields a user choose to set or zero.
    655 However, structs should be zeroed and padded where data is not otherwise
    656 set. This makes it possible to hash and integrity check structs (though
    657 this is not an integral part of the format).
    658 
    659 
    660 ## Schema Evolution
    661 
    662 A table has a known set of low-valued field identifiers. Any unused
    663 field id can be used in a future version. If a field (as is normal) is
    664 implicitly assigned an id, new fields can only be added at the end of
    665 the table. Internally this translates into new versions getting ever
    666 larger vtables. Note that vtables are only stored as large as needed for
    667 the actual content of table, so a rarely used new field will not cause
    668 vtables to grew when unused.
    669 
    670 Enumarations may not change values in future versions. Unions may only
    671 added new table names to the end of the union list.
    672 
    673 Structs cannot change size nor content. They cannot evolve. FlatCC
    674 permits deprecating fields which means old fields will be zeroed.
    675 
    676 Names can be changed to the extend it make sense to the applications already
    677 written around the schema, but it may still break applications relying
    678 on some reflective information. For example, a reader may provide the
    679 string representation of a numeric enum value.
    680 
    681 New types can be added. For example adding a new table is always safe
    682 as long as it does not conflict with any existing schemas using the same
    683 namespace.
    684 
    685 Required fields cannot stop being required and they cannot be deprecated.
    686 
    687 Various attributes and other changes form a gray area that will not make
    688 the binary format unsafe but may still break due to changes in code
    689 generation, serialization to JSON, or similar. For example, a generated
    690 constructor that creates a table from a list of positional arguments
    691 might break if the field order changes or grows or have fields
    692 deprecated. JSON parsers could cease to work on old formats if base64
    693 serialization is added subsequently, and so on.
    694 
    695 ## Keys and Sorting
    696 
    697 Keys and sorting is a meta construct driven by the schema. The binary
    698 format has no special concept of keys and sorting and a vector can be
    699 sorted by one of several keys so it makes no sense to enforce a specific
    700 order.
    701 
    702 The basic FlatBuffers format only permit at most one key and generally
    703 sorts vectors by that key during buffer construction. FlatCC does not do
    704 this both because sorting is not practical while building the buffer and
    705 because FlatCC supports sorting by one of several keys. Thus, in general
    706 it is not safe to assume that a vector is sorted, but it can be sorted
    707 if needed.
    708 
    709 ## Size Limits
    710 
    711 A buffer should never be larger than `2^(sizeof(soffset_t) * 8 - 1) - 1`
    712 or `2^31 - 1` i.e. 2GB for standard FlatBuffers. Beyond this safe it is
    713 not safe to represent vtable offsets and implementations can no longer
    714 use signed types to store `uoffset_t` values. This limit also ensures
    715 that all vectors can be represented safely with a signed 32-bit length
    716 type.
    717 
    718 The application interface is likely to use a native type for
    719 representing sizes and vector indices. If this type is smaller that
    720 `sizeof(soffset_t)` or equivalently `sizeof(uoffset_t)` there is a risk
    721 of overflow. The simplest way to avoid any issues is to limit the
    722 accepted buffer size of the native size type. For example, on some
    723 embedded microprocessor systems a C compiler might have a 16-bit int and
    724 `size_t` type, even if it supports `uint32_t` as well. In this the safe
    725 assumption is to limit buffers to `2^15 - 1` which is very likely more
    726 than sufficient on such systems.
    727 
    728 A builder API might also want to prevent vectors from being created when
    729 they cannot stay within the size type of the platform when the element
    730 size is multipled by the element count. This is deceiving because the
    731 element count might easily be within range. Such issues will be rare in
    732 praxis but they can be the source of magnificent vulnerabilites if not
    733 handled appropriately.
    734 
    735 
    736 ## Verification
    737 
    738 Verification as discussed here is not just about implementing a
    739 verifier. It is as much requirements that any builder must fulfill.
    740 
    741 The objective of verification is to make it safe to read from an
    742 untrusted buffer, or a trusted buffer that accidentally has an
    743 unexpected type. The verifier does not guarantee that the type is indeed
    744 the expeced type exact it can read the buffer identifier if present
    745 which is still no guarantee. The verifier cannot make it safe to modify
    746 buffers in-place because the cost of doing such a check would be
    747 prohitive in the average case.
    748 
    749 A buffer verifier is expected to verify that all objects (strings,
    750 vectors and tables) do not have an end position beyond the externally
    751 specified buffer size and that all offset are aligned relative to offset
    752 zero, and sometimes also relative to actually specified buffer (but
    753 sometimes it is desireable to verify buffers that are not stored
    754 aligned, such as in network buffers).
    755 
    756 A verifier primarily checks that:
    757 
    758 - the buffer is at least `2 * sizeof(uoffset_t)` such that the root
    759   root table and buffer identifier can be checked safely.
    760 - any offset being chased is inside the buffer that any data accessed
    761   from that resuling location is entirely inside the buffer and aligned
    762   notably the vtables first entry must be valid before the vtable can
    763   safely validate itself, but this also applies to table, string and
    764   vector fields.
    765 - any uoffset has size of at least `sizeof(uoffse_t)` to aviod
    766   self-reference and is no larger than the largest positive soffset_t
    767   value of same size - this ensures that implementations can safely add
    768   uoffset_t even if converting to signed first. It also, incidentally,
    769   ensures compatibility with StreamBuffers - see below.
    770 - vtable size is aligned and does not end outside buffer.
    771 - vtable size is at least the two header fields
    772   (`2 * `sizeof(voffset_t)`).
    773 - required table fields are present.
    774 - recursively verify all known fields and ignore other fields. Unknown
    775   fields are vtable entries after the largest known field ID of a table.
    776   These should be ignored in order to support forward versioning.
    777 - deprecated fields are valid if accessors are available to do so, or
    778   ignore if the there is no way to access the field by application code.
    779 - vectors end within the buffer.
    780 - strings end within the buffer and has a zero byte after the end which
    781   is also within the buffer.
    782 - table fields are aligned relative to buffer start - both structs,
    783   scalars, and offset types.
    784 - table field size is aligned relative to field start.
    785 - any table field does not end outside the tables size as given by the
    786   vtable.
    787 - table end (without chasing offsets) is not outside buffer.
    788 - all data referenced by offsets are also valid within the buffer
    789   according the type given by the schema.
    790 - verify that recursion depth is limited to a configurable acceptable
    791   level for the target system both to protect itself and such that
    792   general recursive buffer operations need not be concerned with stack
    793   overflow checks (a depth of 100 or so would normally do).
    794 - verify that if the union type is NONE the value (offset) field is absent and
    795   if it is not NONE that the value field is present. If the union type
    796   is known, the table should be verified. If the type is not known
    797   the table field should be ignored. A reader using the same schema would
    798   see the union as NONE. An unknown union is not an error in order to
    799   support forward versioning.
    800 - verifies the union value according to the type just like any other
    801   field or element.
    802 - verify that a union vector always has type vector if the offset vector
    803   is present and vice versa.
    804 
    805 A verifier may choose to reject unknown fields and union types, but this
    806 should only be an user selectable option, otherwise schema evolution
    807 will not be possible.
    808 
    809 A verifier needs to be very careful in how it deals with overflow and
    810 signs. Vector elements multiplied by element size can overflow. Adding
    811 an invalid offset might become valid due to overflow. In C math on
    812 unsigned types yield predictable two's complement overflow while signed
    813 overflow is undefined behavior and can and will result in unpredictable
    814 values with modern optimizing compilers. The upside is that if the
    815 verifier handles all this correctly, the application reader logic can be
    816 much simpler while staying safe.
    817 
    818 
    819 A verifier does __not__ enforce that:
    820 
    821 - structs and other table fields are aligned relative to table start because
    822   tables are only aligned to their soffset field. This means a table cannot be
    823   copied naively into a new buffer even if it has no offset fields.
    824 - the order of individual fields within a table. Even if a schema says
    825   something about ordering this should be considered advisory. A
    826   verifier may additionally check for ordering for specific
    827   applications. Table order does not affect safety nor general buffer
    828   expectations. Ordering might affect size and performance.
    829 - sorting as specified by schema attributes. A table might be sorted in
    830   different ways and an implementation might avoid sorting for
    831   performance reasons and other practical reasons. A verifier may,
    832   however, offer additional verification to ensure specific vectors or
    833   all vectors are sorted according to schema or other guidelines.  Lack
    834   of sorting might affect expected behavior but will not make the buffer
    835   unsafe to access.
    836 - that structures do not overlap. Overlap can result in vulnerabilities
    837   if a buffer is modified, and no sane builder should create overlaps
    838   other than proper DAGs except via a separate compression/decopression
    839   stage of no interest to the verifier.
    840 - strings are UTF-8 compliant. Lack of compliance may affect expections
    841   or may make strings appear shorter or garbled. Worst case a naive
    842   UTF-8 reader might reach beyond the end when observing a lead byte
    843   suggest data after buffer end, but such a read should discover the
    844   terminal 0 before things get out of hand. Being relaxed permits
    845   specific use cases where UTF-8 is not suitable without giving up the
    846   option to verify buffers. Implementations can add additional
    847   verification for specific usecases for example by providing a
    848   strict-UTF8 flag to a verifier or by verifying at the application
    849   level. This also avoids unnecessary duplicate validation, for example
    850   when an API first verifies the buffer then converts strings to an
    851   internal heap representation where UTF-8 is validated anyway.
    852 - default values are not stored. It is common to force default values to
    853   be stored. This may be used to implement NULL values as missing
    854   values different from default values.
    855 - enum values are only among the enum values of its type. There are many
    856   use cases where it is convenient to add enum values for example flags
    857   or enums as units e.g. `2 * kilo + 3 * ounce. More generally ordinary
    858   integers may have value range restrictions which is also out of scope
    859   for verifier. An application may provide additional verification when
    860   requested.
    861 
    862 More generally we can say that a verifier is a basic fast assurance that
    863 the buffer is safe to access. Any additional verification is application
    864 specific. The verifier makes it safe to apply secondary validation.
    865 Seconary validation could be automated via schema attributes and may be
    866 very useful as such, but it is a separate problem and out of scope for a
    867 core binary format verifier.
    868 
    869 A buffer identifier is optional so the verifier should be informed
    870 whether an identifier must match a given id. It should check both ASCII
    871 text and zero padding not just a string compare. It is non-trivial to
    872 decide if the second buffer field is an identifier, or some other data,
    873 but if the field does not match the expected identifier, it certainly
    874 isn't what is expected.
    875 
    876 Note that it is not entirely trivial to check vector lengths because the
    877 element size must be mulplied by the stored element count. For large
    878 elements this can lead to overflows.
    879 
    880 
    881 ## Risks
    882 
    883 Because a buffer can contain DAGs constructed to explode exponentiall,
    884 it can be dangerous to print JSON or otherwise copy content blindly
    885 if there is no upper limit on the export size.
    886 
    887 In-place modification cannot be trusted because a standard buffer
    888 verifier will detect safe read, but will not detect if two objects are
    889 overlapping. For example, a table could be stored inside another table.
    890 Modifing one table might cause access to the contained table to go out
    891 of bounds, for example by directing the vtable elsewhere.
    892 
    893 The platform native integer and size type might not be able to handle
    894 large FlatBuffers - see [Size Limits](#size-limits).
    895 
    896 Becaue FlatCC requires buffers to be sorted after builiding, there is
    897 risk due to buffer modifications. It is not sufficient to verify buffers
    898 after sorting because sorting is done inline. Therefore buffers must be
    899 trusted or rewritten before sorting.
    900 
    901 
    902 ## Nested FlatBuffers
    903 
    904 FlatBuffers can be nested inside other FlatBuffers. In concept this is
    905 very simple: a nested buffer is just a chunk of binary data stored in a
    906 `ubyte` vector, typically with some convenience methods generated to
    907 access a stored buffer. In praxis it adds a lot of complexity. Either
    908 a nested buffer must be created strictly separately and copied in as
    909 binary data, but often users mix the two builder contexts accidentally
    910 storing strings from one buffer inside another. And when done right, the
    911 containing ubyte vector might not be aligned appropriately making it
    912 invalid to access the buffer without first copying out of the containing
    913 buffer except where unaligned access is permitted. Further, a nested
    914 FlatBuffer necessarily has a length prefix because any ubyte vector has
    915 a length field at its start. Therefore, size prefixed flatbuffers should
    916 not normally be stored as nested buffers, but sometimes it is necessary
    917 in order have the buffer properly aligned after extraction.
    918 
    919 The FlatCC builder makes it possible to build nested FlatBuffers while
    920 the containing table of the parent buffer is still open. It is very
    921 careful to ensure alignment and to ensure that vtables are not shared
    922 between the two (or more) buffers, otherwise a buffer could not safely
    923 be copied out. Users can still make mistakes by storing references from
    924 one buffer in another.
    925 
    926 Still, this area is so complex that several bugs have been found.
    927 Thus, it is advice to use nested FlatBuffers with some care.
    928 
    929 On the other hand, nested FlatBuffers make it possible to trivially
    930 extract parts of FlatBuffer data. Extracting a table would require
    931 chasing pointers and could potentially explode due to shared sub-graphs,
    932 if not handled carefully.
    933 
    934 ## Fixed Length Arrays
    935 
    936 This feature can be seen as equivalent to repeating a field of the same type
    937 multiple times in struct.
    938 
    939 Fixed length array struct fields has been introduced mid 2019.
    940 
    941 A fixed length array is somewhat like a vector of fixed length containing inline
    942 fixed length elements with no stored size header. The element type can be
    943 scalars, enums and structs but not other fixed length errors (without wrapping
    944 them in a struct).
    945 
    946 An array should not be mistaken for vector as vectors are independent objects
    947 while arrays are not. Vectors cannot be fixed length. An array can store fixed
    948 size arrays inline by wrapping them in a struct and the same applies to unions.
    949 
    950 The binary format of a fixed length vector of length `n` and type `t` can
    951 be precisely emulated by created a struct that holds exactly `n` fields
    952 of type `t`, `n > 0`. This means that a fixed length array does not
    953 store any length information in a header and that it is stored inline within
    954 a struct. Alignment follows the structs alignment rules with arrays having the
    955 same alignment as their elements and not their entire size.
    956 
    957 The maximum length is limited by the maximum struct size and / or an
    958 implementation imposed length limit. Flatcc accepts any array that will fit in
    959 struct with a maximum size of 2^16-1 by default but can be compiled with a
    960 different setting. Googles flatc implementation currently enforces a maximum
    961 element count of 2^16-1.
    962 
    963 Assuming the schema compiler computes the correct alignment for the overall
    964 struct, there is no additonal work in verifying a buffer containing a fixed
    965 length array because structs are verified based on the outermost structs size
    966 and alignment without having to inspect its content.
    967 
    968 Fixed lenght arrays also support char arrays. The `char` type is similar to the
    969 `ubyte` or `uint8` type but a char can only exist as a char array like
    970 `x:[char:10]`. Chars cannot exist as a standalone struct or table field, and
    971 also not as a vector element. Char arrays are like strings, but they contain no
    972 length information and no zero terminator. They are expected to be endian
    973 neutral and to contain ASCII or UTF-8 encoded text zero padded up the array
    974 size. Text can contain embedded nulls and other control characters. In JSON form
    975 the text is printed with embedded null characters but stripped from trailing
    976 null characters and a parser will padd the missing null characters.
    977 
    978 
    979 The following example uses fixed length arrays. The example is followed by the
    980 equivalent representation without such arrays.
    981 
    982     struct Basics {
    983         int a;
    984         int b;
    985     }
    986 
    987     struct MyStruct {
    988         x: int;
    989         z: [short:3];
    990         y: float;
    991         w: [Basics:2];
    992         name: [char:4];
    993     }
    994 
    995     // NOT VALID - use a struct wrapper:
    996     table MyTable {
    997         t: [ubyte:2];
    998         m: [MyStruct:2];
    999     }
   1000 
   1001 Equivalent representation:
   1002 
   1003     struct MyStructEquivalent {
   1004         x: int;
   1005         z1: short;
   1006         z2: short;
   1007         z3: short;
   1008         y: float;
   1009         wa1: int;
   1010         wa2: int;
   1011         name1: ubyte;
   1012         name2: ubyte;
   1013         name3: ubyte;
   1014         name4: ubyte;
   1015     }
   1016 
   1017     struct MyStructArrayEquivalent {
   1018         s1: MyStructEquivalent;
   1019         s2: MyStructEquivalent;
   1020     }
   1021 
   1022     struct tEquivalent {
   1023         t1: ubyte;
   1024         t2: ubyte;
   1025     }
   1026 
   1027     table MyTableEquivalent {
   1028         t: tEquivalent;
   1029         m: MyStructArrayEquivalent;
   1030     }
   1031 
   1032 
   1033 Note that forced zero-termination can be obtained by adding a trailing ubyte
   1034 field since uninitialized struct fields should be zeroed:
   1035 
   1036     struct Text {
   1037         str: [char:255];
   1038         zterm: ubyte;
   1039     }
   1040 
   1041 Likewise a length prefix field could be added if the applications involved know
   1042 how to interpret such a field:
   1043 
   1044     struct Text {
   1045         len: ubyte;
   1046         str: [char:254];
   1047         zterm: ubyte;
   1048     }
   1049 
   1050 The above is just an example and not part of the format as such.
   1051 
   1052 
   1053 ## Big Endian FlatBuffers
   1054 
   1055 FlatBuffers are formally always little endian and even on big-endian
   1056 platforms they are reasonably efficient to access.
   1057 
   1058 However it is possible to compile FlatBuffers with native big endian
   1059 suppport using the FlatCC tool. The procedure is out of scope for this
   1060 text, but the binary format is discussed here:
   1061 
   1062 All fields have exactly the same type and size as the little endian
   1063 format but all scalars including floating point values are stored
   1064 byteswapped within their field size. Offset types are also byteswapped.
   1065 
   1066 The buffer identifier is stored byte swapped if present. For example
   1067 the 4-byte "MONS" identifier becomes "SNOM" in big endian format. It is
   1068 therefore reasonably easy to avoid accidentially mixing little- and
   1069 big-endian buffers. However, it is not trivial to handle identifers that
   1070 are not exactly 4-bytes. "HI\0\0" could be "IH\0\0" or "\0\0IH". It is
   1071 recommended to always use 4-byte identifies to avoid this problem. See
   1072 the FlatCC release 0.4.0 big endian release for details.
   1073 
   1074 When using type hash identifiers the identifier is stored as a big
   1075 endian encoded hash value. The user application will the hash in its
   1076 native form and accessor code will do the conversion as for other
   1077 values.
   1078 
   1079 
   1080 ## StructBuffers
   1081 
   1082 _NOTE: the Google FlatBuffer project originally documented structs as
   1083 valid root objects, but never actually implemented it, and has as of mid
   1084 2020 changed the specification to disallow root structs as covered in
   1085 this section. FlatCC for C has been supporting root structs for a long
   1086 time, and they can provide significant speed advantages, so FlatCC will
   1087 continue to support these._
   1088 
   1089 Unlike tables, structs are are usually embedded in in a fixed memory
   1090 block representing a table, in a vector, or embedded inline in other
   1091 structs, but can also be independent when used in a union.
   1092 
   1093 The root object in FlatBuffers is conventionally expected to be a table,
   1094 but it can also be struct. FlatCC supports StructBuffers. Since structs
   1095 do not contain references, such buffers are truly flat. Most
   1096 implementations are not likely to support structs are root but even so
   1097 they are very useful:
   1098 
   1099 It is possible to create very compact and very fast buffers this way.
   1100 They can be used where one would otherwise consider using manual structs
   1101 or memory blocks but with the advantage of a system and language
   1102 independent schema.
   1103 
   1104 StructBuffers may be particularly interesting for the Big Endian
   1105 variant of FlatBuffers for two reasons: the first being that performance
   1106 likely matters when using such buffers and thus Big Endian platforms
   1107 might want them. The second reason is the network byte order is
   1108 traditionally big endian, and this has proven very difficult to change,
   1109 even in new evolving IETF standards. StructBuffers can be used to manage
   1110 non-trival big endian encoded structs, especially structs containing
   1111 other structs, even when the receiver does not understand FlatBuffers as
   1112 concept since the header can just be dropped or trivially documented.
   1113 
   1114 
   1115 ## StreamBuffers
   1116 
   1117 StreamBuffers are so far only a concept although some implementations
   1118 may already be able to read them. The format is documented to aid
   1119 possible future implementations.
   1120 
   1121 StreamBuffers makes it possible to store partially completed buffers
   1122 for example by writing directly to disk or by sending partial buffer
   1123 data over a network. Traditional FlatBuffers require an extra copying
   1124 step to make this possible, and if the writes are partial, each section
   1125 written must also store the segment length to support reassembly.
   1126 StreamBuffers avoid this problem.
   1127 
   1128 StreamBuffers treat `uoffset_t` the same as `soffset_t` when the special
   1129 limitation that `uoffset_t` is always negative when viewed as two's
   1130 complement values.
   1131 
   1132 The implication is that everthing a table or vector references must be
   1133 stored earlier in the buffer, except vtables that can be stored freely.
   1134 Existing reader implementations that treat `uoffset_t` as signed, such as
   1135 Javascript, will be able to read StreamBuffers with no modification.
   1136 Other readers can easily be modified by casting the uoffset value to
   1137 signed before it.
   1138 
   1139 Verifiers must ensure that any buffer verified always stores all
   1140 `uoffset_t` with the same sign. This ensures the DAG structure is
   1141 preserved without cycles.
   1142 
   1143 The root table offset remains positive as an exception and points to the
   1144 root buffer. A stream buffer root table will be the last table in the
   1145 buffer.
   1146 
   1147 The root table offset may be replaced with a root table offset at the
   1148 end of the buffer instead of the start. An implementation may also
   1149 zero the initial offset and update it later. In either case the buffer
   1150 should be aligned accordingly.
   1151 
   1152 Some may prefer the traditional FlatBuffer approach because the root
   1153 table is stored and it is somehow easier or faster to access, but modern
   1154 CPU cache systems do not care much about the order of access as long as
   1155 as their is some aspect of locality of reference and the same applies to
   1156 disk access while network access likely will have the end of the buffer
   1157 in hot cache as this is the last sent. The average distance between
   1158 objects will be the same for both FlatBuffers and StreamBuffers.
   1159 
   1160 
   1161 ## Bidirectional Buffers
   1162 
   1163 Bidirectional Buffers is a generalization of StreamBuffers and FlatBuffers
   1164 and so far only exists as an idea.
   1165 
   1166 FlatBuffers and StreamBuffers only allow one direction because it
   1167 guarantees easy cycle rejection. Cycles are unwanted because it is
   1168 expensive to verify buffers with cycles and because recursive navigation
   1169 might never terminate.
   1170 
   1171 As it happens, but we can also easily reject cycles in bidirectional
   1172 buffers if we apply certain constraints that are fully backwards
   1173 compatible with existing FlatBuffers that have a size below 2GB.
   1174 
   1175 The rule is that when an offset changes direction relative to a parent
   1176 objects direction, the object creating the change of direction becomes a
   1177 new start or end of buffer for anything reachable via that offset.
   1178 
   1179 The root table R is reached via forward reference from the buffer
   1180 header.  Its boundaries are itself and the buffer end. If the this table
   1181 has an offset pointing back, for example to new table X, then table X
   1182 must see the buffer start and R as two boundaries. X must not directly
   1183 or indirectly reach outside this region. Likewise, if the table R points
   1184 to new table Y in the forward direction, then Y is bounded by itself and
   1185 the buffer end.
   1186 
   1187 A lower bound is the end of a table or a vector althoug we just say the
   1188 table or vector is a boundary. An upper bound is until the start of the
   1189 table or vector, or the end of the buffer.
   1190 
   1191 When a buffer is verified, the boundaries are initially the buffer start
   1192 and buffer end. When the references from the root table are followed,
   1193 there new boundaries are either buffer start to root table, or root
   1194 table to end, depending on the offsets direction. And so forth
   1195 recursively.
   1196 
   1197 We can relax the requirement that simple vectors and vtables cannot
   1198 cross these boundaries, except for the hard buffer limits, but it does
   1199 compicate things and is likely not necessary.
   1200 
   1201 Nested FlatBuffers already follow similar boundary rules.
   1202 
   1203 The point of Bidirectional buffers is not that it would be interesting
   1204 to store things in both directions, but to provide a single coherent
   1205 buffer model for both ways of storing buffers without making a special
   1206 case. This will allow StreamBuffers to co-exist with FlatBuffers without
   1207 any change, and it will allow procedures to build larger buffers to make
   1208 their own changes on how to build their subsection of the buffer.
   1209 
   1210 We still need to make allowance for keeping the buffer headers root
   1211 pointer implicit by context, at least as an option. Otherwise it is not,
   1212 in general, possible to start streaming buffer content before the entire
   1213 buffer is written.
   1214 
   1215 
   1216 ## Possible Future Features
   1217 
   1218 This is highly speculative, but documents some ideas that have been
   1219 floating in order to avoid incompatible custom extensions on the same
   1220 theme. Still these might never be implemented or end up being
   1221 implemented differently. Be warned.
   1222 
   1223 ### Force Align
   1224 
   1225 `force_align` attribute supported on fields of structs, scalars and
   1226 and vectors of fixed length elements.
   1227 
   1228 ### Mixins
   1229 
   1230 A mixin is a table or a struct that is apparently expanded into the table
   1231 or struct that contains it.
   1232 
   1233 Mixins add new accessors to make it appear as if the fields of the mixed
   1234 in type is copied into the containing type although physically this
   1235 isn't the case. Mixins can include types that themselves have mixins
   1236 and these mixed in fields are also expanded.
   1237 
   1238 A `mixin` is an attribute applied to table or a struct field when that
   1239 field has a struct or a table type. The binary format is unchanged and
   1240 the table or struct will continue to work as if it wasn't a mixin and
   1241 is therefore fully backwards compatible.
   1242 
   1243 Example:
   1244 
   1245     struct Position {
   1246         spawned: bool(required);
   1247         x: int;
   1248         y: int;
   1249     }
   1250 
   1251     table Motion {
   1252         place: Position(mixin);
   1253         vx: int = 0;
   1254         vy: int = 0;
   1255     }
   1256 
   1257     table Status {
   1258         title: string;
   1259         energy: int;
   1260         sleep: int;
   1261     }
   1262 
   1263     table NPC {
   1264         npcid: int;
   1265         motion: Motion(mixin);
   1266         stat: Status(mixin);
   1267     }
   1268 
   1269     table Rock {
   1270         here: Position(mixin);
   1271         color: uint32 = 0xa0a0a000;
   1272     }
   1273 
   1274     table Main {
   1275         npc1: NPC;
   1276         rock1: Rock;
   1277     }
   1278 
   1279     root_type Main;
   1280 
   1281 
   1282 Here the table NPC and Rock will appear with read accessors is if they have the fields:
   1283 
   1284     table NPC {
   1285         npcid: int;
   1286         spawned: bool(required);
   1287         x: int;
   1288         y: int;
   1289         vx: int = 0;
   1290         vy: int = 0;
   1291         title: string;
   1292         energy: int;
   1293         sleep: int;
   1294         place: Position;
   1295         motion: Motion(required);
   1296         stat: Status;
   1297     }
   1298 
   1299     table Rock {
   1300         spawned: bool(required);
   1301         x: int;
   1302         y: int;
   1303         here: Position(required);
   1304         color: uint32 = 0xa0a0a000;
   1305     }
   1306 
   1307     table Main {
   1308         npc1: NPC;
   1309         rock1: Rock;
   1310     }
   1311 
   1312     root_type Main;
   1313 
   1314 or in JSON:
   1315 
   1316     {
   1317         "npc1": {
   1318             "npcid": 1,
   1319             "spawned": true;
   1320             "x": 2,
   1321             "y": 3,
   1322             "vx": -4,
   1323             "vy": 5,
   1324             "title": "Monster",
   1325             "energy": 6,
   1326             "sleep": 0
   1327         },
   1328         "rock1": {
   1329             "spawned": false;
   1330             "x": 0,
   1331             "y": 0,
   1332             "color": 0xa0a0a000
   1333         }
   1334     }
   1335 
   1336 
   1337 
   1338 Note that there is some redundancy here which makes it possible to
   1339 access the mixed in fields in different ways and to perform type casts
   1340 to a mixed in type.
   1341 
   1342 A cast can happen through a generated function along the lines of
   1343 `npc.castToPosition()`, or via field a accessor `npc.getPlace()`.
   1344 
   1345 A JSON serializer excludes the intermediate container fields such as
   1346 `place` and `motion` in the example.
   1347 
   1348 A builder may choose to only support the basic interface and required
   1349 each mixed in table or struct be created separately. A more advanced
   1350 builder would alternatively accept adding fields directly, but would
   1351 error if a field is set twice by mixing up the approaches.
   1352 
   1353 If a mixed type has a required field, the required status propagates to
   1354 the parent, but non-required siblings are not required as can be seen in
   1355 the example above.
   1356 
   1357 Mixins also places some constraints on the involved types. It is not
   1358 possible to mix in the same type twice because names would conflict and
   1359 it would no longer be possible to do trivially cast a table or struct
   1360 to one of its kinds. An empty table could be mixed in to
   1361 provide type information but such a type can also only be added once.
   1362 
   1363 Mixing in types introduces the risk of name conflicts. It is not valid
   1364 to mix in a type directly or indirectly in a way that would lead to
   1365 conflicting field names in a containing type.
   1366 
   1367 Note that in the example it is not possible to mix in the pos struct
   1368 twice, otherwise we could have mixed in a Coord class twice to have
   1369 position and velocity, but in that case it would be natural to
   1370 have two fields of Coord type which are not mixed in.
   1371 
   1372 Schema evolution is fully supported because the vtable and field id's
   1373 are kept separate. It is possible to add new fields to any type that
   1374 is mixed in. However, adding fields could lead to name conficts
   1375 which are then reported by the schema compiler.
   1376 
   1377 
   1378 [eclectic.fbs]: https://github.com/dvidelabs/flatcc/blob/master/doc/eclectic.fbs