binary-format.md (59658B)
1 # FlatBuffers Binary Format 2 3 4 <!-- vim-markdown-toc GFM --> 5 6 * [Overview](#overview) 7 * [Memory Blocks](#memory-blocks) 8 * [Example](#example) 9 * [Primitives](#primitives) 10 * [Numerics](#numerics) 11 * [Boolean](#boolean) 12 * [Format Internal Types](#format-internal-types) 13 * [Scalars](#scalars) 14 * [Structs](#structs) 15 * [Internals](#internals) 16 * [Type Hashes](#type-hashes) 17 * [Conflicts](#conflicts) 18 * [Type Hash Variants](#type-hash-variants) 19 * [Unions](#unions) 20 * [Optional Fields](#optional-fields) 21 * [Alignment](#alignment) 22 * [Default Values and Deprecated Values](#default-values-and-deprecated-values) 23 * [Schema Evolution](#schema-evolution) 24 * [Keys and Sorting](#keys-and-sorting) 25 * [Size Limits](#size-limits) 26 * [Verification](#verification) 27 * [Risks](#risks) 28 * [Nested FlatBuffers](#nested-flatbuffers) 29 * [Fixed Length Arrays](#fixed-length-arrays) 30 * [Big Endian FlatBuffers](#big-endian-flatbuffers) 31 * [StructBuffers](#structbuffers) 32 * [StreamBuffers](#streambuffers) 33 * [Bidirectional Buffers](#bidirectional-buffers) 34 * [Possible Future Features](#possible-future-features) 35 * [Force Align](#force-align) 36 * [Mixins](#mixins) 37 38 <!-- vim-markdown-toc --> 39 40 41 ## Overview 42 43 ## Memory Blocks 44 45 A FlatBuffers layout consists of the following types of memory blocks: 46 47 - header 48 - table 49 - vector 50 - string 51 - vtable 52 - (structs) 53 54 Each of these have a contigous memory layout. The header references the 55 root table. Every table references a vtable and stores field in a single 56 block of memory. Some of these fields can be offsets to vectors, strings 57 and vectors. Vectors are one-dimensional blocks where each element is 58 self-contained or stores reference to table or a string based on schema 59 type information. A vtable decides which fields are present in table and 60 where they are stored. Vtables are the key to support schema evolution. 61 A table has no special rules about field ordering except fields must be 62 properly aligned, and if they are ordered by size the will pack more 63 densely but possibly more complicated to construct and the schema may 64 provide guidelines on the preferred order. Two vtables might mean 65 different things but accidentally have the same structure. Such vtables 66 can be shared between different tables. vtable sharing is important when 67 vectors store many similar tables. Structs a dense memory regions of 68 scalar fields and smaller structs. They are mostly found embedded in 69 tables but they are independent blocks when referenced from a union or 70 union vector, or when used as a buffer root. Structs hold no references. 71 72 Space between the above blocks are zero padded and present in order to 73 ensure proper alignment. Structs must be packed as densely as possible 74 according the alignment rules that apply - this ensures that all 75 implementations will agree on the layout. The blocks must not overlap in 76 memory but two blocks may be shared if they represent the same data such 77 as sharing a string. 78 79 FlatBuffers are constructed back to front such that lower level objects 80 such as sub-tables and strings are created first in stored last and such 81 that the root object is stored last and early in the buffer. See also 82 [Stream Buffers](#stream-buffers) for a proposed variation over this 83 theme. 84 85 All addressing in FlatBuffers are relative. The reason for this? When 86 navigating the buffer you don't need to store the buffer start or the 87 buffer length, you can also find a new reference relative the reference 88 you already have. vtables require the table location to find a field, 89 but a table field referencing a string field only needs the table field 90 location, not the table start in order to find the referenced string. 91 This results in efficient navigation. 92 93 ## Example 94 95 Uoffsets add their content to their address and are positive while a 96 tables offset to its vtable is signed and is substracted. A vtables 97 element is added to the start of the table referring to the vtable. 98 99 100 Schema ([eclectic.fbs]) : 101 102 namespace Eclectic; 103 104 enum Fruit : byte { Banana = -1, Orange = 42 } 105 table FooBar { 106 meal : Fruit = Banana; 107 density : long (deprecated); 108 say : string; 109 height : short; 110 } 111 file_identifier "NOOB"; 112 root_type FooBar; 113 114 JSON : 115 116 { "meal": "Orange", "say": "hello", "height": -8000 } 117 118 Buffer : 119 120 header: 121 122 +0x0000 00 01 00 00 ; find root table at offset +0x00000100. 123 +0x0004 'N', 'O', 'O', 'B' ; possibly our file identifier 124 125 ... 126 127 table: 128 129 +0x0100 e0 ff ff ff ; 32-bit soffset to vtable location 130 ; two's complement: 2^32 - 0xffffffe0 = -0x20 131 ; effective address: +0x0100 - (-0x20) = +0x0120 132 +0x0104 00 01 00 00 ; 32-bit uoffset string field (FooBar.say) 133 ; find string +0x100 = 256 bytes _from_ here 134 ; = +0x0104 + 0x100 = +0x0204. 135 +0x0108 42d ; 8-bit (FooBar.meal) 136 +0x0109 0 ; 8-bit padding 137 +0x010a -8000d ; 16-bit (FooBar.height) 138 +0x010c ... ; (first byte after table end) 139 140 ... 141 142 vtable: 143 144 +0x0120 0c 00 ; vtable length = 12 bytes 145 +0x0122 0c 00 ; table length = 12 bytes 146 +0x0124 08 00 ; field id 0: +0x08 (meal) 147 +0x0126 00 00 ; field id 1: <missing> (density) 148 +0x0128 04 00 ; field id 2: +0004 (say) 149 +0x012a 0a 00 ; field id 3: +0x0a (height) 150 151 ... 152 153 string: 154 155 +0x0204 05 00 00 00 ; vector element count (5 ubyte elements) 156 +0x0208 'h' 'e' ; vector data 157 +0x020a 'l' 'l' ; vector data 158 +0x020c 'o' ; vector data 159 +0x020d 00 ; zero termination 160 ; special case for string vectors 161 162 ... 163 164 165 Actual FlatCC generated buffer : 166 167 Eclectic.FooBar: 168 0000 08 00 00 00 4e 4f 4f 42 e8 ff ff ff 08 00 00 00 ....NOOB........ 169 0010 2a 00 c0 e0 05 00 00 00 68 65 6c 6c 6f 00 00 00 *.......hello... 170 0020 0c 00 0c 00 08 00 00 00 04 00 0a 00 ............ 171 172 173 Note that FlatCC often places vtables last resulting in `e0 ff ff ff` 174 style vtable offsets, while Googles flatc builder typically places them 175 before the table resulting in `20 00 00 00` style vtable offsets which 176 might help understand why the soffset is subtracted from and not added 177 to the table start. Both forms are equally valid. 178 179 180 ## Primitives 181 182 Primitives are data used to build more complex structures such as 183 strings, vectors, vtables and tables. Although structs are not strictly 184 a primitive, it helps to view them as self-contained primitives. 185 186 187 ### Numerics 188 189 FlatBuffers are based on the following primitives that are 8, 16, 32 and 190 64 bits in size respectively (IEEE-754, two's complement, little endian): 191 192 uint8, uint16, uint32, uint64 (unsigned) 193 int8, int16, int32, int64 (two's complement) 194 float32, float64 (IEEE-754) 195 196 197 ### Boolean 198 199 flatbuffers_bool (uint8) 200 flatbuffers_true (flatbuffers_bool assign as 1, read as != 0) 201 flatbuffers_false (flatbuffers_bool = 0) 202 203 Note that a C99 `bool` type has no defined size or sign so it is not an 204 exact representation of a flatbuffers boolean encoding. 205 206 When a stored value is interpreted as boolean it should not be assumed 207 to be either 1 or 0 but rather as `not equal to 0`. When storing a 208 boolean value or when converting a boolean value to integer before 209 storing, the value should be 1 for true and 0 for false, In C this can 210 be done using `!!x`. 211 212 ### Format Internal Types 213 214 flatbuffers_union_type_t (uint8, NONE = 0, 0 <= type <= 255) 215 flatbuffers_identifier_t (uint8[4]) 216 flatbuffers_uoffset_t (uin32) 217 flatbuffers_soffset_t (int32), 218 flatbuffers_voffset_t (uint16) 219 220 These types can change in FlatBuffer derived formats, but when we say 221 FlatBuffers without further annotations, we mean the above sizes in 222 little endian encoding. We always refer to specific field type and not a 223 specific field size because this allows us to derive new formats easily. 224 225 226 ### Scalars 227 228 To simpify discussion we use the term scalars to mean integers, boolean, 229 floating point and enumerations. Enumerations always have an underlying 230 signed or unsigned integer type used for its representation. 231 232 Scalars are primitives that has a size between 1 and 8 bytes. Scalars 233 are stored aligned to the own size. 234 235 The format generally supports vectors and structs of any scalar type but 236 some language bindings might not have support for all combinations such 237 as arrays of booleans. 238 239 An special case is the encoding of a unions type code which internally 240 is an enumeration but it is not normally permitted in places where we 241 otherwise allow for scalars. 242 243 Another special case is enumerations of type boolean which may not be 244 widely supported, but possible. The binary format is not concerned with 245 this distinction because a boolean is just an integer at this level. 246 247 ### Structs 248 249 A struct is a fixed lengthd block of a fixed number of fields in a specific order 250 defined by a schema. A field is either a scalar, another struct or a fixed length 251 array of these, or a fixed length char array. A struct cannot contain fields that 252 contain itself directly or indirectly. A struct is self-contained and has no 253 references. A struct cannot be empty. 254 255 A schema cannot change the layout of a struct without breaking binary 256 compatibility, unlike tables. 257 258 When used as a table field, a struct is embedded within the table block 259 unless it is union value. A vector of structs are placed in a separate 260 memory block, similar to vectors of scalars. A vector of unions that has 261 a struct as member will reference the struct as an offset, and the 262 struct is then an independent memory block like a table. 263 264 FlatCC supports that a struct can be the root object of a FlatBuffer, but 265 most implementations likely won't support this. Structs as root are very 266 resource efficient. 267 268 Structs cannot have vectors but they can have fixed length array fields. which 269 are equivalent to stacking multiple non-array fields of the same type after each 270 other in a compact notation with similar alignment rules. Additionally arrays 271 can be of char type to have a kind of fixed length string. The char type is not 272 used outside of char arrays. A fixed length array can contain a struct that 273 contains one more fixed length arrays. If the char array type is not support, it 274 can be assumed to be a byte array. 275 276 277 ## Internals 278 279 All content is little endian and offsets are 4 bytes (`uoffset_t`). 280 A new buffer location is found by adding a uoffset to the location where 281 the offset is stored. The first location (offset 0) points to the root 282 table. `uoffset_t` based references are to tables, vectors, and strings. 283 References to vtables and to table fields within a table have a 284 different calculations as discussed below. 285 286 A late addition to the format allow for adding a size prefix before the 287 standard header. When this is done, the builder must know about so it 288 can align content according to the changed starting position. Receivers 289 must also know about the size field just as they must know about the 290 excpected buffer type. 291 292 The next 4 bytes (`sizeof(uoffset_t)`) might represent a 4 byte buffer 293 identifier, or it might be absent but there is no obvious way to know 294 which. The file identifier is typically ASCII characters from the 295 schemas `file_identifier` field padded with 0 bytes but may also contain 296 any custom binary identifier in little endian encoding. See 297 [Type-Hashes](#type-hashes). The 0 identifier should be avoided because 298 the buffer might accidentally contain zero padding when an identifier is 299 absent and because 0 can be used by API's to speficy that no identifier 300 should be stored. 301 302 When reading a buffer, it should be checked that the length is at least 303 8 bytes (2 * `sizeof(uoffset_t)`) Otherwise it is not safe to check the 304 file identifier. 305 306 The root table starts with a 4 byte vtable offset (`soffset_t`). The 307 `soffset_t` has the same size as `uoffset_t` but is signed. 308 309 The vtable is found by *subtracting* the signed 4 byte offset to the 310 location where the vtable offset is stored. Note that the `FlatCC` 311 builder typically stores vtables at the end of the buffer (clustered 312 vtables) and therefore the vtable offset is normally negative. Other 313 builders often store the vtable before the table unless reusing an 314 existing vtable and this makes the soffset positive. 315 (Nested FlatBuffers will not store vtables at the end because it would 316 break compartmentalization). 317 318 The vtable is a table of 2 byte offsets (`voffset_t`). The first two 319 entreis are the vtable size in bytes and the table size in bytes. The 320 next offset is the vtable entry for table field 0. A vtable will always 321 have as many entries as the largest stored field in the table, but it 322 might skip entries that are not needed or known (version independence) - 323 therefore the vtable length must be checked. The table length is only 324 needed by verifiers. A vtable lookup of an out of range table id is the 325 same as a vtable entry that has a zero entry and results in a default 326 value for the expected type. Multiple tables may shared the same vtable, 327 even if they have different types. This is done via deduplication during 328 buffer construction. A table field id maps to a vtable offset using the 329 formula `vtable-start + sizeof(voffset_t) * (field-id + 2)`, iff the 330 result is within the vtable size. 331 332 The vtable entry stores a byte offset relative to the location where the 333 `soffset_t` where stored at the start of the table. A table field is 334 stored immediately after the `soffset_t` or may contain 0 padding. A 335 table field is always aligned to at least its own size, or more if the 336 schema demands it. Strings, sub-tables and vectors are stored as a 337 4-byte `uoffset_t`. Such sub-elements are always found later in the 338 buffer because `uoffset_t` is unsigned. A struct is stored in-place. 339 Conceptually a struct is not different from an integer in this respect, 340 just larger. 341 342 If a sub-table or vector is absent and not just without fields, 0 343 length, the containing table field pointing the sub-table or vector 344 should be absent. Note that structs cannot contain sub-tables or 345 vectors. 346 347 A struct is always 0 padded up to its alignment. A structs alignment 348 is given as the largest size of any non-struct member, or the alignment 349 of a struct member (but not necessarily the size of a struct member), or 350 the schema specified aligment. 351 352 A structs members are stored in-place so the struct fields are known 353 via the the schema, not via information in the binary format 354 355 An enum is represent as its underlying integer type and can be a member 356 of structs, fields and vectors just like integer and floating point 357 scalars. 358 359 A table is always aligned to `sizeof(uoffset_t)`, but may contain 360 internal fields with larger alignment. That is, the table start or end 361 are not affected by alignment requirements of field members, unlike 362 structs. 363 364 A string is a special case of a vector with the extra requirement that a 365 0 byte must follow the counted content, so the following also applies to 366 strings. 367 368 A vector starts with a `uoffset_t` field the gives the length in element 369 counts, not in bytes. Table fields points to the length field of a 370 vector, not the first element. A vector may have 0 length. Note that the 371 FlatCC C binding will return a pointer to a vectors first element which 372 is different from the buffer internal reference. 373 374 All elements of a vector has the same type which is either a scalar type 375 including enums, a struct, or a uoffset reference where all the 376 reference type is to tables all of the same type, all strings, or, 377 for union vectors, references to members of the same union. Because a 378 union member can be a struct, it is possible to have vectors that 379 reference structs instead of embeddding them, but only via unions. It is 380 not possible to have vectors of vectors other than string vectors, 381 except indirectly via vectors containing tables. 382 383 A vectors first element is aligned to the size of `uoffset_t` or the 384 alignment of its element type, or the alignment required by the schema, 385 whichever is larger. Note that the vector itself might be aligned to 4 386 bytes while the first element is aligned to 8 bytes. 387 388 (Currently the schema semantics do no support aligning vectors beyond 389 their element size, but that might change and can be forced when 390 building buffers via dedicated api calls). 391 392 Strings are stored as vectors of type `ubyte_t`, i.e. 8-bit elements. In 393 addition, a trailing zero is always stored. The trailing zero is not 394 counted in the length field of the vector. Technically a string is 395 supposed to be valid UTF-8, but in praxis it is permitted that strings 396 contain 0 bytes or other invalid sequences. It is recommended to 397 explicitly check strings for UTF-8 conformance when this is required 398 rather than to expect this to alwways be true. However, the ubyte vector 399 should be preferred for binary data. 400 401 A string, vector or table may be referenced by other tables and vectors. 402 This is known as a directed acyclic graph (DAG). Because uoffsets are 403 unsigned and because uoffsets are never stored with a zero value (except 404 for null entries in union vectors), it is not possible for a buffer to 405 contain cycles which makes them safe to traverse with too much fear of 406 excessive recursion. This makes it possible to efficiently verify that a 407 buffer does not contain references or content outside of its expected 408 boundaries. 409 410 A vector can also hold unions, but it is not supported by all 411 implementations. A union vector is in reality two separate vectors: a 412 type vector and an offset vector in place of a single unions type and 413 value fields in table. See unions. 414 415 416 ## Type Hashes 417 418 A type hash is a 32-bit value defined as the fnv1a32 hash of a tables or 419 a structs fully qualified name. If the fnv1a32 hash returns 0 it should 420 instead hash the empty string. 0 is used to indicate that a buffer 421 should not store an identifier. 422 423 Every table and struct has a name and optionally also a namespace of one 424 or more levels. The fully qualified name is optional namespace followed 425 by the type name using '.' as a separator. For example 426 "MyGame.Sample.Monster", or "Eclectic.FooBar". 427 428 The type hash can be used as the buffer identifier instead of the schema 429 provided `file_identifier`. The type hash makes it possible to 430 distinguish between different root types from the same schema, and even 431 across schema as long as the namespace is unique. 432 433 Type hashes introduce no changes to the binary format but the application 434 interface must choose to support user defined identifiers or explicitly 435 support type hashes. Alternatively an application can peak directly into 436 the buffer at offset 4 (when `uoffset_t` is 4 bytes long). 437 438 FlatCC generates the following identifier for the "MyGame.Sample.Monster" 439 table: 440 441 #define MyGame_Sample_Monster_type_hash ((flatbuffers_thash_t)0xd5be61b) 442 #define MyGame_Sample_Monster_type_identifier "\x1b\xe6\x5b\x0d" 443 444 But we can also 445 [compute one online](https://www.tools4noobs.com/online_tools/hash/) 446 for our example buffer: 447 448 fnv1a32("Eclectic.FooBar") = 0a604f58 449 450 Thus we can open a hex editor and locate 451 452 +0x0000 00 01 00 00 ; find root table at offset +0x00000100. 453 +0x0004 'N', 'O', 'O', 'B' ; possibly our file identifier 454 455 and replace it with 456 457 +0x0000 00 01 00 00 ; find root table at offset +0x00000100. 458 +0x0004 58 4f 60 0a ; very likely our file identifier identifier 459 460 or generate it with `flatcc`: 461 462 $ bin/flatcc --stdout doc/eclectic.fbs | grep FooBar_type 463 #define Eclectic_FooBar_type_hash ((flatbuffers_thash_t)0xa604f58) 464 #define Eclectic_FooBar_type_identifier "\x58\x4f\x60\x0a" 465 466 467 The following snippet implements fnv1a32, and returns the empty string 468 hash if the hash accidentially should return 0: 469 470 471 static inline flatbuffers_thash_t flatbuffers_type_hash_from_name(const char *name) 472 { 473 uint32_t hash = 2166136261UL; 474 while (*name) { 475 hash ^= (uint32_t)*name; 476 hash = hash * 16777619UL; 477 ++name; 478 } 479 if (hash == 0) { 480 hash = 2166136261UL; 481 } 482 return hash; 483 } 484 485 ### Conflicts 486 487 It is possible to have conficts between two type hashes although the 488 risk is small. Conflicts are not important as long as an application can 489 distinguish between all the types it may encouter in actual use. A 490 switch statement in C will error at compile time for two cases that have 491 the same value, so the problem is easily detectable and fixable by 492 modifying a name or a namespace. 493 494 For global conflict resolution, a type should be identified by its fully 495 qualified name using adequate namespaces. This obviously requires it to 496 be stored separate from the buffer identifier due to size constraints. 497 498 499 ### Type Hash Variants 500 501 If an alternative buffer format is used, the type hash should be 502 modified. For example, if `uoffset_t` is defined as a 64-bit value, the 503 fnv1a64 hash should be used instead. For big endian variants the hash 504 remains unchanged but is byteswapped. The application will use the same 505 id while the acces layer will handle the translation. 506 507 For buffers using structs as roots, the hash remains unchanged because 508 the struct is a unique type in schema. In this way a receiver that does 509 not handle struct roots can avoid trying to read the root as a table. 510 511 For futher variations of the format, a format identifier can be inserted 512 in front of the namespace when generating the hash. There is no formal 513 approach to this, but as an example, lets say we want to use only 1 byte 514 per vtable entry and identify these buffers with type hash using the 515 prefix "ebt:" for example buffer type. We then have the type hash: 516 517 #define type_hash_prefix "ebt:" 518 519 hash = fnv1a32(type_hash_prefix "Eclectic.FooBar"); 520 hash = hash ? hash : fnv1a32(type_hash_prefix); 521 522 If the hash returns 0 we hash the prefix. 523 524 525 ## Unions 526 527 A union is a contruction on top of the above primitives. It consists of 528 a type and a value. 529 530 In the schema a union type is a set of table types with each table name 531 assigned a type enumeration starting from 1. 0 is the type NONE meaning 532 the union has not value assigned. The union type is represented as a 533 ubyte enum type, or in the binary format as a value of type 534 `union_type_t` which for standard FlatBuffers is an 8-bit unsigned code 535 with 0 indicating the union stores not value and a non-zero value 536 indicating the type of the stored union. 537 538 A union is stored in a table as a normal sub-table reference with the 539 only difference being that the offset does not always point to a table 540 of the same type. The 8-bit union type is stored as a separate table 541 field conventially named the same as the union value field except for a 542 `_type` suffix. The value (storing the table offset) MUST have a field 543 ID that is exactly one larger than the type field. If value field is 544 present the type field MUST also be present. If the type is NONE the 545 value field MUST be absent and the type field MAY be absent because a 546 union type always defaults to the value NONE. 547 548 Vectors of unions is a late addition (mid 2017) to the FlatBuffers 549 format. FlatCC supports union vectors as of v0.5.0. 550 551 Vectors of unions have the same two fields as normal unions but they 552 both store a vector and both vectors MUST have the same length or both 553 be absent from the table. The type vector is a vector of 8-bit enums and 554 the value vector is a vector of table offsets. Obviously each type 555 vector element represents the type of the table in the corresponding 556 value element. If an element is of type NONE the value offset must be 557 stored as 0 which is a circular reference. This is the only offset that 558 can have the value 0. 559 560 A later addition (mid 2017) to the format allows for structs and strings 561 to also be member of a union. A union value is always an offset to an 562 independent memory block. For strings this is just the offset to the 563 string. For tables it is the offset to the table, naturally, and for 564 structs, it is an offset to an separate aligned memory block that holds 565 a struct and not an offset to memory inside any other table or struct. 566 FlatCC supports mixed type unions and vectors of these as of v0.5.0. 567 568 ## Optional Fields 569 570 As of mid 2020 the FlatBuffers format introduced optional scalar table fields. 571 There is no change to the binary schema, but the semantics have changed slightly 572 compared to ordinary scalar fields (which remain supported as is): If an 573 optional field is not stored in a table, it is considered to be a null value. An 574 optinal scalar field will have null as its default value, so any representable 575 scalar value will always be stored in the buffer, unlike other scalar fields 576 which by default do not store the field if the value matches the default numeric 577 value. This was already possible before by using `force_add` semantics to force 578 a value to be written even if it was matching the default value, and by 579 providing an `is_present` test when reading a value so that it would be possible 580 to distinguish between a value that happened to be a default value, and a value 581 that was actually absent. However, such techniques were ad-hoc. Optional 582 fields formalize the semantics of null values for scalars. Other field types 583 already have meaningful null values. Only table fields can be optional so struct 584 fields must always assign a value to all members. 585 586 ## Alignment 587 588 All alignments are powers of two between 1 and 256. Large alignments are 589 only possible via schema specified alignments. The format naturally has 590 a maximum alignment of the largest scalar it stores, which is a 64-bit 591 integer or floating point value. Because C malloc typically returns 592 buffers aligned to a least 8 bytes, it is often safe to place buffers in 593 heap allocated memory, but if the target system does not permit 594 unaligned access, or is slow on unaligned access, a buffer should be 595 placed in sufficiently aligned memory. Typically it is a good idea to 596 place buffer in cacheline aligned boundary anyway. 597 598 A buffers alignment is the same as the largest alignment of any 599 object or struct it contains. 600 601 The buffer size is not guaranteed to be aligned to its own alignment 602 unlike struct. Googles `flatc` builder does this, at least when size 603 prefixed. The `FlatCC` tool currently does not, but it might later add 604 an option to pad up to alignment. This would make it simpler to stack 605 similar typed buffers in file - but users can retrieve the buffers 606 alignment and do this manually. Thus, when stacking size prefixed 607 buffers, each buffer should start aligned to its own size starting at 608 the size field, and should also be zero padded up to its own alignment. 609 610 611 ## Default Values and Deprecated Values 612 613 A table can can omit storing any field that is not a required field. For 614 strings, vectors and tables this result in returning a null value 615 different from an empty value when reading the buffer. Struct fields not 616 present in table are also returned as null. 617 618 All fields that can return null do not have a default value. Other 619 values, which are integers, floats and enumerations, can all have 620 default values. The default value is returned if not found in the table. 621 622 If a default value is not specified the default defaults to zero for the 623 corresponding type. If an enumeration does not have a 0 value and no 624 explicit default value, this is a schema error. 625 626 When building a buffer the builder will compare a stored field to the 627 known default value. If it matches the field will simple be skipped. 628 Some builder API's makes it possible to force a default value to be 629 stored and to check if a field is missing when reading the buffer. This 630 can be used to handle NULL values differently from default or missing 631 values. 632 633 A deprecated field is a schema construct. The binary format either stores 634 a table field, or it does not. 635 636 A deprecated field should be treated as not available, as in no way to 637 read the value as opposed to returning a default value regardless of 638 whether the field is present or not. If they for some reason are made 639 accessible the verifier must also understand and verify these fields. 640 641 A deprecated field also cannot be written to a new buffer, although if 642 it against guidelines remains possible to do so, it should be done as if 643 the field was not deprecated. 644 645 Structs cannot have default values and cannot have deprecated fields in 646 stadard FlatBuffers. FlatCC supports marking a struct field as 647 deprecated. This implies the field will always be zeroed and with no 648 trivial accessors. A struct can never change size without breaking 649 support for schema evolution. 650 651 FlatCC JSON parsers allow structs to only set some values. Remaining 652 values will be implicitly zeroed. The C API for writing buffers do not 653 have this concern because a struct can just be written as a C struct 654 so there is no control over which fields a user choose to set or zero. 655 However, structs should be zeroed and padded where data is not otherwise 656 set. This makes it possible to hash and integrity check structs (though 657 this is not an integral part of the format). 658 659 660 ## Schema Evolution 661 662 A table has a known set of low-valued field identifiers. Any unused 663 field id can be used in a future version. If a field (as is normal) is 664 implicitly assigned an id, new fields can only be added at the end of 665 the table. Internally this translates into new versions getting ever 666 larger vtables. Note that vtables are only stored as large as needed for 667 the actual content of table, so a rarely used new field will not cause 668 vtables to grew when unused. 669 670 Enumarations may not change values in future versions. Unions may only 671 added new table names to the end of the union list. 672 673 Structs cannot change size nor content. They cannot evolve. FlatCC 674 permits deprecating fields which means old fields will be zeroed. 675 676 Names can be changed to the extend it make sense to the applications already 677 written around the schema, but it may still break applications relying 678 on some reflective information. For example, a reader may provide the 679 string representation of a numeric enum value. 680 681 New types can be added. For example adding a new table is always safe 682 as long as it does not conflict with any existing schemas using the same 683 namespace. 684 685 Required fields cannot stop being required and they cannot be deprecated. 686 687 Various attributes and other changes form a gray area that will not make 688 the binary format unsafe but may still break due to changes in code 689 generation, serialization to JSON, or similar. For example, a generated 690 constructor that creates a table from a list of positional arguments 691 might break if the field order changes or grows or have fields 692 deprecated. JSON parsers could cease to work on old formats if base64 693 serialization is added subsequently, and so on. 694 695 ## Keys and Sorting 696 697 Keys and sorting is a meta construct driven by the schema. The binary 698 format has no special concept of keys and sorting and a vector can be 699 sorted by one of several keys so it makes no sense to enforce a specific 700 order. 701 702 The basic FlatBuffers format only permit at most one key and generally 703 sorts vectors by that key during buffer construction. FlatCC does not do 704 this both because sorting is not practical while building the buffer and 705 because FlatCC supports sorting by one of several keys. Thus, in general 706 it is not safe to assume that a vector is sorted, but it can be sorted 707 if needed. 708 709 ## Size Limits 710 711 A buffer should never be larger than `2^(sizeof(soffset_t) * 8 - 1) - 1` 712 or `2^31 - 1` i.e. 2GB for standard FlatBuffers. Beyond this safe it is 713 not safe to represent vtable offsets and implementations can no longer 714 use signed types to store `uoffset_t` values. This limit also ensures 715 that all vectors can be represented safely with a signed 32-bit length 716 type. 717 718 The application interface is likely to use a native type for 719 representing sizes and vector indices. If this type is smaller that 720 `sizeof(soffset_t)` or equivalently `sizeof(uoffset_t)` there is a risk 721 of overflow. The simplest way to avoid any issues is to limit the 722 accepted buffer size of the native size type. For example, on some 723 embedded microprocessor systems a C compiler might have a 16-bit int and 724 `size_t` type, even if it supports `uint32_t` as well. In this the safe 725 assumption is to limit buffers to `2^15 - 1` which is very likely more 726 than sufficient on such systems. 727 728 A builder API might also want to prevent vectors from being created when 729 they cannot stay within the size type of the platform when the element 730 size is multipled by the element count. This is deceiving because the 731 element count might easily be within range. Such issues will be rare in 732 praxis but they can be the source of magnificent vulnerabilites if not 733 handled appropriately. 734 735 736 ## Verification 737 738 Verification as discussed here is not just about implementing a 739 verifier. It is as much requirements that any builder must fulfill. 740 741 The objective of verification is to make it safe to read from an 742 untrusted buffer, or a trusted buffer that accidentally has an 743 unexpected type. The verifier does not guarantee that the type is indeed 744 the expeced type exact it can read the buffer identifier if present 745 which is still no guarantee. The verifier cannot make it safe to modify 746 buffers in-place because the cost of doing such a check would be 747 prohitive in the average case. 748 749 A buffer verifier is expected to verify that all objects (strings, 750 vectors and tables) do not have an end position beyond the externally 751 specified buffer size and that all offset are aligned relative to offset 752 zero, and sometimes also relative to actually specified buffer (but 753 sometimes it is desireable to verify buffers that are not stored 754 aligned, such as in network buffers). 755 756 A verifier primarily checks that: 757 758 - the buffer is at least `2 * sizeof(uoffset_t)` such that the root 759 root table and buffer identifier can be checked safely. 760 - any offset being chased is inside the buffer that any data accessed 761 from that resuling location is entirely inside the buffer and aligned 762 notably the vtables first entry must be valid before the vtable can 763 safely validate itself, but this also applies to table, string and 764 vector fields. 765 - any uoffset has size of at least `sizeof(uoffse_t)` to aviod 766 self-reference and is no larger than the largest positive soffset_t 767 value of same size - this ensures that implementations can safely add 768 uoffset_t even if converting to signed first. It also, incidentally, 769 ensures compatibility with StreamBuffers - see below. 770 - vtable size is aligned and does not end outside buffer. 771 - vtable size is at least the two header fields 772 (`2 * `sizeof(voffset_t)`). 773 - required table fields are present. 774 - recursively verify all known fields and ignore other fields. Unknown 775 fields are vtable entries after the largest known field ID of a table. 776 These should be ignored in order to support forward versioning. 777 - deprecated fields are valid if accessors are available to do so, or 778 ignore if the there is no way to access the field by application code. 779 - vectors end within the buffer. 780 - strings end within the buffer and has a zero byte after the end which 781 is also within the buffer. 782 - table fields are aligned relative to buffer start - both structs, 783 scalars, and offset types. 784 - table field size is aligned relative to field start. 785 - any table field does not end outside the tables size as given by the 786 vtable. 787 - table end (without chasing offsets) is not outside buffer. 788 - all data referenced by offsets are also valid within the buffer 789 according the type given by the schema. 790 - verify that recursion depth is limited to a configurable acceptable 791 level for the target system both to protect itself and such that 792 general recursive buffer operations need not be concerned with stack 793 overflow checks (a depth of 100 or so would normally do). 794 - verify that if the union type is NONE the value (offset) field is absent and 795 if it is not NONE that the value field is present. If the union type 796 is known, the table should be verified. If the type is not known 797 the table field should be ignored. A reader using the same schema would 798 see the union as NONE. An unknown union is not an error in order to 799 support forward versioning. 800 - verifies the union value according to the type just like any other 801 field or element. 802 - verify that a union vector always has type vector if the offset vector 803 is present and vice versa. 804 805 A verifier may choose to reject unknown fields and union types, but this 806 should only be an user selectable option, otherwise schema evolution 807 will not be possible. 808 809 A verifier needs to be very careful in how it deals with overflow and 810 signs. Vector elements multiplied by element size can overflow. Adding 811 an invalid offset might become valid due to overflow. In C math on 812 unsigned types yield predictable two's complement overflow while signed 813 overflow is undefined behavior and can and will result in unpredictable 814 values with modern optimizing compilers. The upside is that if the 815 verifier handles all this correctly, the application reader logic can be 816 much simpler while staying safe. 817 818 819 A verifier does __not__ enforce that: 820 821 - structs and other table fields are aligned relative to table start because 822 tables are only aligned to their soffset field. This means a table cannot be 823 copied naively into a new buffer even if it has no offset fields. 824 - the order of individual fields within a table. Even if a schema says 825 something about ordering this should be considered advisory. A 826 verifier may additionally check for ordering for specific 827 applications. Table order does not affect safety nor general buffer 828 expectations. Ordering might affect size and performance. 829 - sorting as specified by schema attributes. A table might be sorted in 830 different ways and an implementation might avoid sorting for 831 performance reasons and other practical reasons. A verifier may, 832 however, offer additional verification to ensure specific vectors or 833 all vectors are sorted according to schema or other guidelines. Lack 834 of sorting might affect expected behavior but will not make the buffer 835 unsafe to access. 836 - that structures do not overlap. Overlap can result in vulnerabilities 837 if a buffer is modified, and no sane builder should create overlaps 838 other than proper DAGs except via a separate compression/decopression 839 stage of no interest to the verifier. 840 - strings are UTF-8 compliant. Lack of compliance may affect expections 841 or may make strings appear shorter or garbled. Worst case a naive 842 UTF-8 reader might reach beyond the end when observing a lead byte 843 suggest data after buffer end, but such a read should discover the 844 terminal 0 before things get out of hand. Being relaxed permits 845 specific use cases where UTF-8 is not suitable without giving up the 846 option to verify buffers. Implementations can add additional 847 verification for specific usecases for example by providing a 848 strict-UTF8 flag to a verifier or by verifying at the application 849 level. This also avoids unnecessary duplicate validation, for example 850 when an API first verifies the buffer then converts strings to an 851 internal heap representation where UTF-8 is validated anyway. 852 - default values are not stored. It is common to force default values to 853 be stored. This may be used to implement NULL values as missing 854 values different from default values. 855 - enum values are only among the enum values of its type. There are many 856 use cases where it is convenient to add enum values for example flags 857 or enums as units e.g. `2 * kilo + 3 * ounce. More generally ordinary 858 integers may have value range restrictions which is also out of scope 859 for verifier. An application may provide additional verification when 860 requested. 861 862 More generally we can say that a verifier is a basic fast assurance that 863 the buffer is safe to access. Any additional verification is application 864 specific. The verifier makes it safe to apply secondary validation. 865 Seconary validation could be automated via schema attributes and may be 866 very useful as such, but it is a separate problem and out of scope for a 867 core binary format verifier. 868 869 A buffer identifier is optional so the verifier should be informed 870 whether an identifier must match a given id. It should check both ASCII 871 text and zero padding not just a string compare. It is non-trivial to 872 decide if the second buffer field is an identifier, or some other data, 873 but if the field does not match the expected identifier, it certainly 874 isn't what is expected. 875 876 Note that it is not entirely trivial to check vector lengths because the 877 element size must be mulplied by the stored element count. For large 878 elements this can lead to overflows. 879 880 881 ## Risks 882 883 Because a buffer can contain DAGs constructed to explode exponentiall, 884 it can be dangerous to print JSON or otherwise copy content blindly 885 if there is no upper limit on the export size. 886 887 In-place modification cannot be trusted because a standard buffer 888 verifier will detect safe read, but will not detect if two objects are 889 overlapping. For example, a table could be stored inside another table. 890 Modifing one table might cause access to the contained table to go out 891 of bounds, for example by directing the vtable elsewhere. 892 893 The platform native integer and size type might not be able to handle 894 large FlatBuffers - see [Size Limits](#size-limits). 895 896 Becaue FlatCC requires buffers to be sorted after builiding, there is 897 risk due to buffer modifications. It is not sufficient to verify buffers 898 after sorting because sorting is done inline. Therefore buffers must be 899 trusted or rewritten before sorting. 900 901 902 ## Nested FlatBuffers 903 904 FlatBuffers can be nested inside other FlatBuffers. In concept this is 905 very simple: a nested buffer is just a chunk of binary data stored in a 906 `ubyte` vector, typically with some convenience methods generated to 907 access a stored buffer. In praxis it adds a lot of complexity. Either 908 a nested buffer must be created strictly separately and copied in as 909 binary data, but often users mix the two builder contexts accidentally 910 storing strings from one buffer inside another. And when done right, the 911 containing ubyte vector might not be aligned appropriately making it 912 invalid to access the buffer without first copying out of the containing 913 buffer except where unaligned access is permitted. Further, a nested 914 FlatBuffer necessarily has a length prefix because any ubyte vector has 915 a length field at its start. Therefore, size prefixed flatbuffers should 916 not normally be stored as nested buffers, but sometimes it is necessary 917 in order have the buffer properly aligned after extraction. 918 919 The FlatCC builder makes it possible to build nested FlatBuffers while 920 the containing table of the parent buffer is still open. It is very 921 careful to ensure alignment and to ensure that vtables are not shared 922 between the two (or more) buffers, otherwise a buffer could not safely 923 be copied out. Users can still make mistakes by storing references from 924 one buffer in another. 925 926 Still, this area is so complex that several bugs have been found. 927 Thus, it is advice to use nested FlatBuffers with some care. 928 929 On the other hand, nested FlatBuffers make it possible to trivially 930 extract parts of FlatBuffer data. Extracting a table would require 931 chasing pointers and could potentially explode due to shared sub-graphs, 932 if not handled carefully. 933 934 ## Fixed Length Arrays 935 936 This feature can be seen as equivalent to repeating a field of the same type 937 multiple times in struct. 938 939 Fixed length array struct fields has been introduced mid 2019. 940 941 A fixed length array is somewhat like a vector of fixed length containing inline 942 fixed length elements with no stored size header. The element type can be 943 scalars, enums and structs but not other fixed length errors (without wrapping 944 them in a struct). 945 946 An array should not be mistaken for vector as vectors are independent objects 947 while arrays are not. Vectors cannot be fixed length. An array can store fixed 948 size arrays inline by wrapping them in a struct and the same applies to unions. 949 950 The binary format of a fixed length vector of length `n` and type `t` can 951 be precisely emulated by created a struct that holds exactly `n` fields 952 of type `t`, `n > 0`. This means that a fixed length array does not 953 store any length information in a header and that it is stored inline within 954 a struct. Alignment follows the structs alignment rules with arrays having the 955 same alignment as their elements and not their entire size. 956 957 The maximum length is limited by the maximum struct size and / or an 958 implementation imposed length limit. Flatcc accepts any array that will fit in 959 struct with a maximum size of 2^16-1 by default but can be compiled with a 960 different setting. Googles flatc implementation currently enforces a maximum 961 element count of 2^16-1. 962 963 Assuming the schema compiler computes the correct alignment for the overall 964 struct, there is no additonal work in verifying a buffer containing a fixed 965 length array because structs are verified based on the outermost structs size 966 and alignment without having to inspect its content. 967 968 Fixed lenght arrays also support char arrays. The `char` type is similar to the 969 `ubyte` or `uint8` type but a char can only exist as a char array like 970 `x:[char:10]`. Chars cannot exist as a standalone struct or table field, and 971 also not as a vector element. Char arrays are like strings, but they contain no 972 length information and no zero terminator. They are expected to be endian 973 neutral and to contain ASCII or UTF-8 encoded text zero padded up the array 974 size. Text can contain embedded nulls and other control characters. In JSON form 975 the text is printed with embedded null characters but stripped from trailing 976 null characters and a parser will padd the missing null characters. 977 978 979 The following example uses fixed length arrays. The example is followed by the 980 equivalent representation without such arrays. 981 982 struct Basics { 983 int a; 984 int b; 985 } 986 987 struct MyStruct { 988 x: int; 989 z: [short:3]; 990 y: float; 991 w: [Basics:2]; 992 name: [char:4]; 993 } 994 995 // NOT VALID - use a struct wrapper: 996 table MyTable { 997 t: [ubyte:2]; 998 m: [MyStruct:2]; 999 } 1000 1001 Equivalent representation: 1002 1003 struct MyStructEquivalent { 1004 x: int; 1005 z1: short; 1006 z2: short; 1007 z3: short; 1008 y: float; 1009 wa1: int; 1010 wa2: int; 1011 name1: ubyte; 1012 name2: ubyte; 1013 name3: ubyte; 1014 name4: ubyte; 1015 } 1016 1017 struct MyStructArrayEquivalent { 1018 s1: MyStructEquivalent; 1019 s2: MyStructEquivalent; 1020 } 1021 1022 struct tEquivalent { 1023 t1: ubyte; 1024 t2: ubyte; 1025 } 1026 1027 table MyTableEquivalent { 1028 t: tEquivalent; 1029 m: MyStructArrayEquivalent; 1030 } 1031 1032 1033 Note that forced zero-termination can be obtained by adding a trailing ubyte 1034 field since uninitialized struct fields should be zeroed: 1035 1036 struct Text { 1037 str: [char:255]; 1038 zterm: ubyte; 1039 } 1040 1041 Likewise a length prefix field could be added if the applications involved know 1042 how to interpret such a field: 1043 1044 struct Text { 1045 len: ubyte; 1046 str: [char:254]; 1047 zterm: ubyte; 1048 } 1049 1050 The above is just an example and not part of the format as such. 1051 1052 1053 ## Big Endian FlatBuffers 1054 1055 FlatBuffers are formally always little endian and even on big-endian 1056 platforms they are reasonably efficient to access. 1057 1058 However it is possible to compile FlatBuffers with native big endian 1059 suppport using the FlatCC tool. The procedure is out of scope for this 1060 text, but the binary format is discussed here: 1061 1062 All fields have exactly the same type and size as the little endian 1063 format but all scalars including floating point values are stored 1064 byteswapped within their field size. Offset types are also byteswapped. 1065 1066 The buffer identifier is stored byte swapped if present. For example 1067 the 4-byte "MONS" identifier becomes "SNOM" in big endian format. It is 1068 therefore reasonably easy to avoid accidentially mixing little- and 1069 big-endian buffers. However, it is not trivial to handle identifers that 1070 are not exactly 4-bytes. "HI\0\0" could be "IH\0\0" or "\0\0IH". It is 1071 recommended to always use 4-byte identifies to avoid this problem. See 1072 the FlatCC release 0.4.0 big endian release for details. 1073 1074 When using type hash identifiers the identifier is stored as a big 1075 endian encoded hash value. The user application will the hash in its 1076 native form and accessor code will do the conversion as for other 1077 values. 1078 1079 1080 ## StructBuffers 1081 1082 _NOTE: the Google FlatBuffer project originally documented structs as 1083 valid root objects, but never actually implemented it, and has as of mid 1084 2020 changed the specification to disallow root structs as covered in 1085 this section. FlatCC for C has been supporting root structs for a long 1086 time, and they can provide significant speed advantages, so FlatCC will 1087 continue to support these._ 1088 1089 Unlike tables, structs are are usually embedded in in a fixed memory 1090 block representing a table, in a vector, or embedded inline in other 1091 structs, but can also be independent when used in a union. 1092 1093 The root object in FlatBuffers is conventionally expected to be a table, 1094 but it can also be struct. FlatCC supports StructBuffers. Since structs 1095 do not contain references, such buffers are truly flat. Most 1096 implementations are not likely to support structs are root but even so 1097 they are very useful: 1098 1099 It is possible to create very compact and very fast buffers this way. 1100 They can be used where one would otherwise consider using manual structs 1101 or memory blocks but with the advantage of a system and language 1102 independent schema. 1103 1104 StructBuffers may be particularly interesting for the Big Endian 1105 variant of FlatBuffers for two reasons: the first being that performance 1106 likely matters when using such buffers and thus Big Endian platforms 1107 might want them. The second reason is the network byte order is 1108 traditionally big endian, and this has proven very difficult to change, 1109 even in new evolving IETF standards. StructBuffers can be used to manage 1110 non-trival big endian encoded structs, especially structs containing 1111 other structs, even when the receiver does not understand FlatBuffers as 1112 concept since the header can just be dropped or trivially documented. 1113 1114 1115 ## StreamBuffers 1116 1117 StreamBuffers are so far only a concept although some implementations 1118 may already be able to read them. The format is documented to aid 1119 possible future implementations. 1120 1121 StreamBuffers makes it possible to store partially completed buffers 1122 for example by writing directly to disk or by sending partial buffer 1123 data over a network. Traditional FlatBuffers require an extra copying 1124 step to make this possible, and if the writes are partial, each section 1125 written must also store the segment length to support reassembly. 1126 StreamBuffers avoid this problem. 1127 1128 StreamBuffers treat `uoffset_t` the same as `soffset_t` when the special 1129 limitation that `uoffset_t` is always negative when viewed as two's 1130 complement values. 1131 1132 The implication is that everthing a table or vector references must be 1133 stored earlier in the buffer, except vtables that can be stored freely. 1134 Existing reader implementations that treat `uoffset_t` as signed, such as 1135 Javascript, will be able to read StreamBuffers with no modification. 1136 Other readers can easily be modified by casting the uoffset value to 1137 signed before it. 1138 1139 Verifiers must ensure that any buffer verified always stores all 1140 `uoffset_t` with the same sign. This ensures the DAG structure is 1141 preserved without cycles. 1142 1143 The root table offset remains positive as an exception and points to the 1144 root buffer. A stream buffer root table will be the last table in the 1145 buffer. 1146 1147 The root table offset may be replaced with a root table offset at the 1148 end of the buffer instead of the start. An implementation may also 1149 zero the initial offset and update it later. In either case the buffer 1150 should be aligned accordingly. 1151 1152 Some may prefer the traditional FlatBuffer approach because the root 1153 table is stored and it is somehow easier or faster to access, but modern 1154 CPU cache systems do not care much about the order of access as long as 1155 as their is some aspect of locality of reference and the same applies to 1156 disk access while network access likely will have the end of the buffer 1157 in hot cache as this is the last sent. The average distance between 1158 objects will be the same for both FlatBuffers and StreamBuffers. 1159 1160 1161 ## Bidirectional Buffers 1162 1163 Bidirectional Buffers is a generalization of StreamBuffers and FlatBuffers 1164 and so far only exists as an idea. 1165 1166 FlatBuffers and StreamBuffers only allow one direction because it 1167 guarantees easy cycle rejection. Cycles are unwanted because it is 1168 expensive to verify buffers with cycles and because recursive navigation 1169 might never terminate. 1170 1171 As it happens, but we can also easily reject cycles in bidirectional 1172 buffers if we apply certain constraints that are fully backwards 1173 compatible with existing FlatBuffers that have a size below 2GB. 1174 1175 The rule is that when an offset changes direction relative to a parent 1176 objects direction, the object creating the change of direction becomes a 1177 new start or end of buffer for anything reachable via that offset. 1178 1179 The root table R is reached via forward reference from the buffer 1180 header. Its boundaries are itself and the buffer end. If the this table 1181 has an offset pointing back, for example to new table X, then table X 1182 must see the buffer start and R as two boundaries. X must not directly 1183 or indirectly reach outside this region. Likewise, if the table R points 1184 to new table Y in the forward direction, then Y is bounded by itself and 1185 the buffer end. 1186 1187 A lower bound is the end of a table or a vector althoug we just say the 1188 table or vector is a boundary. An upper bound is until the start of the 1189 table or vector, or the end of the buffer. 1190 1191 When a buffer is verified, the boundaries are initially the buffer start 1192 and buffer end. When the references from the root table are followed, 1193 there new boundaries are either buffer start to root table, or root 1194 table to end, depending on the offsets direction. And so forth 1195 recursively. 1196 1197 We can relax the requirement that simple vectors and vtables cannot 1198 cross these boundaries, except for the hard buffer limits, but it does 1199 compicate things and is likely not necessary. 1200 1201 Nested FlatBuffers already follow similar boundary rules. 1202 1203 The point of Bidirectional buffers is not that it would be interesting 1204 to store things in both directions, but to provide a single coherent 1205 buffer model for both ways of storing buffers without making a special 1206 case. This will allow StreamBuffers to co-exist with FlatBuffers without 1207 any change, and it will allow procedures to build larger buffers to make 1208 their own changes on how to build their subsection of the buffer. 1209 1210 We still need to make allowance for keeping the buffer headers root 1211 pointer implicit by context, at least as an option. Otherwise it is not, 1212 in general, possible to start streaming buffer content before the entire 1213 buffer is written. 1214 1215 1216 ## Possible Future Features 1217 1218 This is highly speculative, but documents some ideas that have been 1219 floating in order to avoid incompatible custom extensions on the same 1220 theme. Still these might never be implemented or end up being 1221 implemented differently. Be warned. 1222 1223 ### Force Align 1224 1225 `force_align` attribute supported on fields of structs, scalars and 1226 and vectors of fixed length elements. 1227 1228 ### Mixins 1229 1230 A mixin is a table or a struct that is apparently expanded into the table 1231 or struct that contains it. 1232 1233 Mixins add new accessors to make it appear as if the fields of the mixed 1234 in type is copied into the containing type although physically this 1235 isn't the case. Mixins can include types that themselves have mixins 1236 and these mixed in fields are also expanded. 1237 1238 A `mixin` is an attribute applied to table or a struct field when that 1239 field has a struct or a table type. The binary format is unchanged and 1240 the table or struct will continue to work as if it wasn't a mixin and 1241 is therefore fully backwards compatible. 1242 1243 Example: 1244 1245 struct Position { 1246 spawned: bool(required); 1247 x: int; 1248 y: int; 1249 } 1250 1251 table Motion { 1252 place: Position(mixin); 1253 vx: int = 0; 1254 vy: int = 0; 1255 } 1256 1257 table Status { 1258 title: string; 1259 energy: int; 1260 sleep: int; 1261 } 1262 1263 table NPC { 1264 npcid: int; 1265 motion: Motion(mixin); 1266 stat: Status(mixin); 1267 } 1268 1269 table Rock { 1270 here: Position(mixin); 1271 color: uint32 = 0xa0a0a000; 1272 } 1273 1274 table Main { 1275 npc1: NPC; 1276 rock1: Rock; 1277 } 1278 1279 root_type Main; 1280 1281 1282 Here the table NPC and Rock will appear with read accessors is if they have the fields: 1283 1284 table NPC { 1285 npcid: int; 1286 spawned: bool(required); 1287 x: int; 1288 y: int; 1289 vx: int = 0; 1290 vy: int = 0; 1291 title: string; 1292 energy: int; 1293 sleep: int; 1294 place: Position; 1295 motion: Motion(required); 1296 stat: Status; 1297 } 1298 1299 table Rock { 1300 spawned: bool(required); 1301 x: int; 1302 y: int; 1303 here: Position(required); 1304 color: uint32 = 0xa0a0a000; 1305 } 1306 1307 table Main { 1308 npc1: NPC; 1309 rock1: Rock; 1310 } 1311 1312 root_type Main; 1313 1314 or in JSON: 1315 1316 { 1317 "npc1": { 1318 "npcid": 1, 1319 "spawned": true; 1320 "x": 2, 1321 "y": 3, 1322 "vx": -4, 1323 "vy": 5, 1324 "title": "Monster", 1325 "energy": 6, 1326 "sleep": 0 1327 }, 1328 "rock1": { 1329 "spawned": false; 1330 "x": 0, 1331 "y": 0, 1332 "color": 0xa0a0a000 1333 } 1334 } 1335 1336 1337 1338 Note that there is some redundancy here which makes it possible to 1339 access the mixed in fields in different ways and to perform type casts 1340 to a mixed in type. 1341 1342 A cast can happen through a generated function along the lines of 1343 `npc.castToPosition()`, or via field a accessor `npc.getPlace()`. 1344 1345 A JSON serializer excludes the intermediate container fields such as 1346 `place` and `motion` in the example. 1347 1348 A builder may choose to only support the basic interface and required 1349 each mixed in table or struct be created separately. A more advanced 1350 builder would alternatively accept adding fields directly, but would 1351 error if a field is set twice by mixing up the approaches. 1352 1353 If a mixed type has a required field, the required status propagates to 1354 the parent, but non-required siblings are not required as can be seen in 1355 the example above. 1356 1357 Mixins also places some constraints on the involved types. It is not 1358 possible to mix in the same type twice because names would conflict and 1359 it would no longer be possible to do trivially cast a table or struct 1360 to one of its kinds. An empty table could be mixed in to 1361 provide type information but such a type can also only be added once. 1362 1363 Mixing in types introduces the risk of name conflicts. It is not valid 1364 to mix in a type directly or indirectly in a way that would lead to 1365 conflicting field names in a containing type. 1366 1367 Note that in the example it is not possible to mix in the pos struct 1368 twice, otherwise we could have mixed in a Coord class twice to have 1369 position and velocity, but in that case it would be natural to 1370 have two fields of Coord type which are not mixed in. 1371 1372 Schema evolution is fully supported because the vtable and field id's 1373 are kept separate. It is possible to add new fields to any type that 1374 is mixed in. However, adding fields could lead to name conficts 1375 which are then reported by the schema compiler. 1376 1377 1378 [eclectic.fbs]: https://github.com/dvidelabs/flatcc/blob/master/doc/eclectic.fbs