# Security Considerations

This section covers questions such as when it is safe to access a
buffer without risking access violations, buffer overruns, or denial of
service attacks, but it cannot possibly cover all security aspects.

## Reading Buffers

When reading a buffer you have to know the schema type before
reading, and it is preferable to know the size of the buffer although
that is not strictly required. If the type is not known, the `file_type`,
aka buffer type, can be checked. This is no guarantee due to collisions
with other data formats and because the identifier field may be absent
or misleading. The identifier therefore works best on buffers that can
be trusted.

If a buffer cannot be trusted, such as when receiving it over a public
network, it may be the case that the buffer type is known, but it is not
known if someone uses an incorrect implementation of FlatBuffers, or if
the buffer has somehow been corrupted in transit, or if someone has
intentionally tampered with the buffer. In this case the buffer can be
verified. A verification does not prove that the buffer has the correct
type, but it does prove that it is safe to read (not write) from the
buffer. The buffer size must be known in order to verify the buffer. If
the buffer has a wrong type, but still (unlikely but possible) passes
verification, then unexpected data may be read from the buffer, but it
will not cause any crashes when using the API correctly.

It is preferable to know the required alignment of a buffer, which isn't
trivially available unless retrieved from the builder when the buffer is
created. The buffer alignment can be deduced from the schema.

On many systems there is no real problem in accessing a buffer
unaligned, but for systems where it matters, care must be taken because
unaligned access can result in slow performance or access violations.
Even on systems where alignment matters, a standard malloc operation is
often sufficient because it normally aligns to the largest word that
could cause access violations when unaligned. For special use cases such
as GPU memory access, more alignment may be needed, and FlatBuffers
supports higher alignments in the schema. Portable `aligned_alloc` and
`aligned_free` support methods are available to help allocate memory with
sufficient alignment. Because compile time flags can change between
compilation of the runtime library and the application,
`flatcc_builder_aligned_free` ensures a consistent deallocation method
for aligned buffers allocated by the runtime library.

A verifier for C requires the buffer to be placed in aligned memory, and
it will fail if the buffer content is not properly aligned relative to
the buffer or to an absolute memory address, regardless of whether the
current system requires alignment or not. Therefore a buffer verified
on one system is safe to use on all systems. One could use this fact to
sign a buffer, but this is beyond the scope of FlatBuffers itself, and
verifying a signature is likely much slower than re-verifying a buffer
when a verifier is available.

Note: it would be helpful if the verifier allowed verification only
relative to the buffer start instead of requiring the absolute addresses
to be aligned. This would allow verification of buffers before copying
them out of unaligned locations in network buffers and also allow
subsequent reading of such buffers without copying iff the system
supports unaligned access. However, the verifier does not currently
support this.

It is not always safe to verify a buffer. A buffer can be constructed to
trigger deep nesting. The FlatCC verifier has a hard coded non-exact
limit of about 100 levels. This is to protect the stack against deep
recursion. If the limit is exceeded, the verifier will safely fail.
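
As a minimal sketch of how a nesting limit of this kind can be enforced,
consider a simple chain of nested objects rather than real FlatBuffers
tables. The names `struct node`, `check_depth`, `verify_nesting`, and
`MAX_DEPTH` are hypothetical and not part of FlatCC, and the actual
FlatCC verifier may implement its limit differently. The point is only
that the check fails on the way down, before the stack can grow beyond
a fixed bound.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical nested object for illustration; not a FlatCC type. */
struct node {
    const struct node *child;   /* single nested child, or NULL */
};

/* Illustrative value mirroring the "about 100 levels" mentioned above. */
#define MAX_DEPTH 100

/*
 * Walks down the chain with a shrinking budget. Returns 0 if the chain
 * ends before the budget is used up, and -1 as soon as a node is found
 * with no budget left, so recursion never exceeds MAX_DEPTH + 1 frames.
 */
static int check_depth(const struct node *n, int budget)
{
    assert(budget >= 0);
    if (n == NULL) return 0;      /* end of nesting: accept */
    if (budget == 0) return -1;   /* too deep: fail safely */
    return check_depth(n->child, budget - 1);
}

/* Accepts at most MAX_DEPTH nested levels starting at root. */
int verify_nesting(const struct node *root)
{
    return check_depth(root, MAX_DEPTH);
}
```
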
The verifier's limit can be changed with a compile time flag. If the
limit is too permissive, a system may run into stack overflow, but this
is unlikely on most systems today. Typical application code may have
similar recursive access functions. Therefore it is likely that
recursion is safe if the verifier succeeds, but it depends on the
application.

A buffer can point to the same data from multiple places. This is known
as a DAG. The verifier rejects cycles that could lead to infinite loops
during application traversal but does permit DAGs. For normal use DAGs
are safe, but it is possible to maliciously construct a buffer with a
long vector where all elements point to a table that also has a vector
of a similar nature. After a few levels, this can lead to a finite but
exponentially large number of places to visit. The current FlatCC
verifier does not protect against this, but Google's flatc compiler has
a verifier that limits the number of visited tables.

When reading a buffer in C no memory allocation takes place after the
buffer has initially been placed in memory. For example, strings can be
read directly as C strings, and strings are 0-terminated. A string might
contain embedded 0 bytes, which is not strictly correct but permitted.
This will result in a shorter string if used naively as a C string
without reading the string length via the API, but it will still be safe.
Other languages might have to allocate memory for objects such as
strings, though.

A field can generally be absent. Scalar fields are always safe to
access, even if they are absent, because they have a default value that
will be returned. Tables, Vectors, Strings, and Structs may return null
when a field is absent. This is perfectly valid, but if the application
does not check for null, this can lead to an access violation.

A field can be marked 'required' in the schema.
If it is required, it will never return null unless the buffer is
invalid. A verifier will detect this. On a practical note, some
FlatBuffers builders might not enforce the required field, and readers
do not always verify buffers before access (nor should they have to).
Therefore an application is advised to check return values even on
required fields unless the buffer is entirely trusted.

If a buffer is verified, it is safe to access all elements of a vector
up to the vector's size. Access of elements via API calls does not
necessarily check for out of bounds, but some calls might.

A buffer may also be encoded in big endian format. This is not standard,
but FlatCC supports it for systems that are primarily big endian. The
buffer identifier will usually detect the difference because the
identifier will be byte swapped. A reader therefore needs to be aware of
this possibility, but most often this is not a concern since standard
FlatBuffers are always little endian. The verifier will likely fail on
an unexpected endian encoding, but a buffer that passes is at least safe
to access.


## Thread Safety

There is no thread safety on the FlatBuffers API, but read access does
not mutate any state. Every read location is a temporary variable, so as
long as the application code is otherwise sane, it is safe to read a
buffer from multiple threads, and if the buffer is placed on cache line
alignment (typically 64 or 128 bytes) it is also efficient without false
sharing.

A verifier is also safe to use because it only reads from a buffer.

A builder is inherently NOT safe for multithreaded access. However, with
proper synchronization there is nothing preventing one thread from doing
the grunt work and another putting the high level pieces together as
long as only one thread at a time accesses the builder object, or the
associated allocator and emitter objects.
From a performance perspective this doesn't make much sense, but it
might from an architectural perspective.

A builder object can be cleared and reused after a buffer is constructed
or abandoned. The clear operation can optionally reduce the amount of
memory or keep all the memory from the previous operation. In either
case it is safe for a new thread to use the builder after it is cleared,
but two threads cannot use the builder at the same time.

It is fairly cheap to create a new builder object, but of course cheaper
to reuse existing memory. Often the best option is for each thread to
have its own builder and own memory and defer any sharing to the point
where the buffer is finished and readable.


## Schema Evolution

Accessing a buffer that was created by a more recent version of a
FlatBuffers schema is safe iff the new version of the schema was created
according to the guidelines for schema evolution, notably no change of
field identifiers or existing enum values and no removal or deprecation
of required fields. Google's flatc tool can check if a new schema
version is safe.

Fields that are not required but deprecated in a new version will still
be safe to access by readers built against the old version, but such
readers will likely see default or null values and should be prepared
for this.


## Copying or Printing Buffer Content

Even if it is safe to read a buffer, it is not safe to copy or even
print a buffer because a DAG can unfold to consume much more output
space than the given input. In data compression this is known as a Zip
Bomb, but even without malicious intent, users need to be aware of the
potential expansion. This is also a concern when printing JSON.

A table also cannot be trivially copied based on memory content because
it has offsets to other content.
This is not an issue when using any official API, but it may be if a new
buffer is stitched together Frankenstein-style from parts of existing
buffers.

Nested buffers are not allowed to share any content with their parents,
siblings or child buffers for similar reasons.

The verifier should complain if a buffer goes out of bounds.


## Modifying Buffers

It is not safe to modify buffers in-place unless the buffer is
absolutely trusted. Verifying a buffer is not enough. FlatCC does not
provide any means to modify a buffer in-place, but it is not hard to
achieve this if so desired. It is especially easy to do this with
structs, so if in-place modification is needed, this is the way to do it.

Modifying a buffer is unsafe because it is possible to place one table
inside another table, as an example, even if this is not valid. Such
overlaps are too expensive to verify by a standard verifier. As long as
the buffer is not modified this does not pose any real problem, but if a
field is modified in one table, it might cause a field of another table
to point out of bounds. This is so obvious an attack vector that anyone
wanting to hack a system is likely to use this approach. Therefore
in-place modification should be avoided unless on a trusted platform.
For example, a trusted network might bump a time-to-live counter when
passing buffers around.

Even if it is safe to modify buffers, this will not work on platforms
that require endian conversion unless extra precautions are taken. These
are usually big endian platforms, but it is possible to compile
FlatBuffers with native big endian format as well. FlatCC has a lot of
low-level `to/from_pe` calls that perform the proper conversion.


## Building Buffers

A buffer can be constructed incorrectly in a large number of ways that
are not efficient to detect at runtime.

When a buffer is constructed with a debug library, assertions will
sometimes find the most obvious problems such as closing a table after
opening a vector. FlatCC is quite permissive in the order of object
creation, but the nesting order must be respected, and it cannot type
check all data. Other FlatBuffers builders typically require that child
objects are completely created before a parent object is started. FlatCC
does not require this but will internally organize objects in a
compatible way. This removes a number of potential mistakes, but not all.

Notably a table from a parent object or any other external reference
should not be used in a nested buffer.

It is a good idea to run a verifier on a constructed buffer at least
until some confidence has been gained in the code building buffers.

If a buffer needs to be constructed with sorted keys, this cannot be done
during construction, unlike with the C++ API, because the builder
allocates as little memory as possible. Instead the reader interface
supports a mutable cast for use with a sort operation. This sort
operation must only be used on absolutely trusted buffers, and
verification is not sufficient if malicious overlaps can be expected.

The builder will normally consume very little memory. It operates a few
stacks and a small hash table in addition to a circular buffer to
consume temporary buffer output. It is not possible to access
constructed buffer objects before the buffer is complete because data
may span multiple buffers. Once a buffer is complete the content can be
copied out, or a support function can be used to allocate new memory
with the final buffer as content.

The internal memory can grow large when the buffer grows large,
naturally.
In addition the temporary stacks may grow large if there are
large tables or notably large vectors that cannot be copied
directly to the output buffers. This creates a potential for memory
allocation errors, especially on constrained systems.

The builder returns error codes, but it is tedious to check these. It is
not necessary to check return codes if the API is used correctly and if
there are no allocation errors. It is possible to provide a custom
allocator and a custom emitter. These can detect memory failures early,
making it potentially safe to use the builder API without any per
operation checks.

The generated JSON parser checks all return codes and can be used to
construct a buffer safely, especially since the buffer is naturally
bounded by the size of the JSON input. JSON printing, on the other hand,
can potentially explode, as discussed earlier.

FlatCC-generated create calls such as `MyGame_Example_Monster_create()`
will not be compatible across versions if there are deprecated fields,
even if the schema change otherwise respects schema evolution rules.
This is mostly a concern if new fields are added because compilation
will otherwise break on argument count mismatch. Prior to flatcc-0.5.3
argument order could change if the field (id: x) attribute was used,
which could lead to buffers with unexpected content. JSON parsers that
support constructors (objects given as an array of create arguments)
have similar concerns, but here trailing arguments can be optional.
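
Returning to the expansion risk noted for copying and for JSON printing,
the following is a rough back-of-the-envelope sketch of how fast a DAG
can unfold. It assumes the worst-case shape described earlier: every
table holds a vector of `fanout` references that all point to one shared
table at the next level. The function `dag_unfolded_visits` is a
hypothetical helper for illustration only and is not part of FlatCC.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Counts the table visits made by a naive traversal (copy or print) of
 * a DAG with `depth` levels below the root, where each table refers
 * `fanout` times to the single table one level down. The buffer itself
 * stores only depth + 1 tables. Returns UINT64_MAX when the count does
 * not fit in 64 bits.
 */
uint64_t dag_unfolded_visits(uint64_t fanout, unsigned depth)
{
    uint64_t level = 1;   /* visits at the current level; the root is 1 */
    uint64_t total = 1;   /* visits summed over all levels so far */
    unsigned i;

    for (i = 0; i < depth; ++i) {
        /* Saturate instead of wrapping around on overflow. */
        if (fanout != 0 && level > UINT64_MAX / fanout) return UINT64_MAX;
        level *= fanout;
        if (total > UINT64_MAX - level) return UINT64_MAX;
        total += level;
        assert(total >= level);
    }
    return total;
}
```

With a fan-out of 1000 and a depth of 3, a buffer holding just 4 tables
and three 1000-element vectors unfolds to 1,001,001,001 visits, which is
why a printer or copier working on untrusted input may want its own
limit on visited objects or on output size.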