The official C implementation of BLAKE3.

# Example

An example program that hashes bytes from standard input and prints the
result:

```c
#include "blake3.h"
#include <stdio.h>
#include <unistd.h>

int main() {
  // Initialize the hasher.
  blake3_hasher hasher;
  blake3_hasher_init(&hasher);

  // Read input bytes from stdin.
  unsigned char buf[65536];
  ssize_t n;
  while ((n = read(STDIN_FILENO, buf, sizeof(buf))) > 0) {
    blake3_hasher_update(&hasher, buf, n);
  }

  // Finalize the hash. BLAKE3_OUT_LEN is the default output length, 32 bytes.
  uint8_t output[BLAKE3_OUT_LEN];
  blake3_hasher_finalize(&hasher, output, BLAKE3_OUT_LEN);

  // Print the hash as hexadecimal.
  for (size_t i = 0; i < BLAKE3_OUT_LEN; i++) {
    printf("%02x", output[i]);
  }
  printf("\n");
  return 0;
}
```

The code above is included in this directory as `example.c`. If you're
on x86\_64 with a Unix-like OS, you can compile a working binary like
this:

```bash
gcc -O3 -o example example.c blake3.c blake3_dispatch.c blake3_portable.c \
    blake3_sse2_x86-64_unix.S blake3_sse41_x86-64_unix.S blake3_avx2_x86-64_unix.S \
    blake3_avx512_x86-64_unix.S
```

# API

## The Struct

```c
typedef struct {
  // private fields
} blake3_hasher;
```

An incremental BLAKE3 hashing state, which can accept any number of
updates. This implementation doesn't allocate any heap memory, but
`sizeof(blake3_hasher)` itself is relatively large, currently 1912 bytes
on x86-64. This size can be reduced by restricting the maximum input
length, as described in Section 5.4 of [the BLAKE3
spec](https://github.com/BLAKE3-team/BLAKE3-specs/blob/master/blake3.pdf),
but this implementation doesn't currently support that strategy.

## Common API Functions

```c
void blake3_hasher_init(
  blake3_hasher *self);
```

Initialize a `blake3_hasher` in the default hashing mode.


---

```c
void blake3_hasher_update(
  blake3_hasher *self,
  const void *input,
  size_t input_len);
```

Add input to the hasher. This can be called any number of times.

---

```c
void blake3_hasher_finalize(
  const blake3_hasher *self,
  uint8_t *out,
  size_t out_len);
```

Finalize the hasher and emit an output of any length. This doesn't
modify the hasher itself, and it's possible to finalize again after
adding more input. The constant `BLAKE3_OUT_LEN` provides the default
output length, 32 bytes.

## Less Common API Functions

```c
void blake3_hasher_init_keyed(
  blake3_hasher *self,
  const uint8_t key[BLAKE3_KEY_LEN]);
```

Initialize a `blake3_hasher` in the keyed hashing mode. The key must be
exactly 32 bytes.

---

```c
void blake3_hasher_init_derive_key(
  blake3_hasher *self,
  const char *context);
```

Initialize a `blake3_hasher` in the key derivation mode. The context
string is given as an initialization parameter, and afterwards input key
material should be given with `blake3_hasher_update`. The context string
is a null-terminated C string which should be **hardcoded, globally
unique, and application-specific**. The context string should not
include any dynamic input like salts, nonces, or identifiers read from a
database at runtime. A good default format for the context string is
`"[application] [commit timestamp] [purpose]"`, e.g., `"example.com
2019-12-25 16:18:03 session tokens v1"`.

This function is intended for application code written in C. For
language bindings, see `blake3_hasher_init_derive_key_raw` below.


---

```c
void blake3_hasher_init_derive_key_raw(
  blake3_hasher *self,
  const void *context,
  size_t context_len);
```

As `blake3_hasher_init_derive_key` above, except that the context string
is given as a pointer to an array of arbitrary bytes with a provided
length. This is intended for writing language bindings, where C string
conversion would add unnecessary overhead and new error cases. Unicode
strings should be encoded as UTF-8.

Application code in C should prefer `blake3_hasher_init_derive_key`,
which takes the context as a C string. If you need to use arbitrary
bytes as a context string in application code, consider whether you're
violating the requirement that context strings should be hardcoded.

---

```c
void blake3_hasher_finalize_seek(
  const blake3_hasher *self,
  uint64_t seek,
  uint8_t *out,
  size_t out_len);
```

The same as `blake3_hasher_finalize`, but with an additional `seek`
parameter for the starting byte position in the output stream. To
efficiently stream a large output without allocating memory, call this
function in a loop, incrementing `seek` by the output length each time.

# Building

This implementation is just C and assembly files. It doesn't include a
public-facing build system. (The `Makefile` in this directory is only
for testing.) Instead, the intention is that you can include these files
in whatever build system you're already using. This section describes
the commands your build system should execute, or which you can execute
by hand. Note that these steps may change in future versions.

## x86

Dynamic dispatch is enabled by default on x86. The implementation will
query the CPU at runtime to detect SIMD support, and it will use the
widest instruction set available.
By default, `blake3_dispatch.c`
expects to be linked with code for five different instruction sets:
portable C, SSE2, SSE4.1, AVX2, and AVX-512.

For each of the x86 SIMD instruction sets, two versions are available,
one in assembly (which is further divided into three flavors: Unix,
Windows MSVC, and Windows GNU) and one using C intrinsics. The assembly
versions are generally preferred: they perform better, they perform more
consistently across different compilers, and they build more quickly. On
the other hand, the assembly versions are x86\_64-only, and you need to
select the right flavor for your target platform.

Here's an example of building a shared library on x86\_64 Linux using
the assembly implementations:

```bash
gcc -shared -O3 -o libblake3.so blake3.c blake3_dispatch.c blake3_portable.c \
    blake3_sse2_x86-64_unix.S blake3_sse41_x86-64_unix.S blake3_avx2_x86-64_unix.S \
    blake3_avx512_x86-64_unix.S
```

When building the intrinsics-based implementations, you need to build
each implementation separately, with the corresponding instruction set
explicitly enabled in the compiler. Here's the same shared library using
the intrinsics-based implementations:

```bash
gcc -c -fPIC -O3 -msse2 blake3_sse2.c -o blake3_sse2.o
gcc -c -fPIC -O3 -msse4.1 blake3_sse41.c -o blake3_sse41.o
gcc -c -fPIC -O3 -mavx2 blake3_avx2.c -o blake3_avx2.o
gcc -c -fPIC -O3 -mavx512f -mavx512vl blake3_avx512.c -o blake3_avx512.o
gcc -shared -O3 -o libblake3.so blake3.c blake3_dispatch.c blake3_portable.c \
    blake3_avx2.o blake3_avx512.o blake3_sse41.o blake3_sse2.o
```

Note above that building `blake3_avx512.c` requires both `-mavx512f` and
`-mavx512vl` under GCC and Clang. Under MSVC, the single `/arch:AVX512`
flag is sufficient. The MSVC equivalent of `-mavx2` is `/arch:AVX2`.
MSVC enables SSE2 and SSE4.1 by default, and it doesn't have a
corresponding flag.

If you want to omit SIMD code entirely, you need to explicitly disable
each instruction set. Here's an example of building a shared library on
x86 with only portable code:

```bash
gcc -shared -O3 -o libblake3.so -DBLAKE3_NO_SSE2 -DBLAKE3_NO_SSE41 -DBLAKE3_NO_AVX2 \
    -DBLAKE3_NO_AVX512 blake3.c blake3_dispatch.c blake3_portable.c
```

## ARM NEON

The NEON implementation is not enabled by default on ARM, since not all
ARM targets support it. To enable it, set `BLAKE3_USE_NEON=1`. Here's an
example of building a shared library on ARM Linux with NEON support:

```bash
gcc -shared -O3 -o libblake3.so -DBLAKE3_USE_NEON blake3.c blake3_dispatch.c \
    blake3_portable.c blake3_neon.c
```

Note that on some targets (ARMv7 in particular), extra flags may be
required to activate NEON support in the compiler. If you see an error
like...

```
/usr/lib/gcc/armv7l-unknown-linux-gnueabihf/9.2.0/include/arm_neon.h:635:1: error: inlining failed
in call to always_inline ‘vaddq_u32’: target specific option mismatch
```

...then you may need to add something like `-mfpu=neon-vfpv4
-mfloat-abi=hard`.

## Other Platforms

The portable implementation should work on most other architectures. For
example:

```bash
gcc -shared -O3 -o libblake3.so blake3.c blake3_dispatch.c blake3_portable.c
```

# Differences from the Rust Implementation

The single-threaded Rust and C implementations use the same algorithms,
and their performance is the same if you use the assembly
implementations or if you compile the intrinsics-based implementations
with Clang. (Both Clang and rustc are LLVM-based.)

The C implementation doesn't currently include any multithreading
optimizations.
OpenMP support or similar might be added in the future.