The official C implementation of BLAKE3.

# Example

An example program that hashes bytes from standard input and prints the
result:

```c
#include "blake3.h"
#include <stdio.h>
#include <unistd.h>

int main(void) {
  // Initialize the hasher.
  blake3_hasher hasher;
  blake3_hasher_init(&hasher);

  // Read input bytes from stdin.
  unsigned char buf[65536];
  ssize_t n;
  while ((n = read(STDIN_FILENO, buf, sizeof(buf))) > 0) {
    blake3_hasher_update(&hasher, buf, (size_t)n);
  }
  if (n < 0) {
    // read() failed; don't print a hash of truncated input.
    perror("read");
    return 1;
  }

  // Finalize the hash. BLAKE3_OUT_LEN is the default output length, 32 bytes.
  uint8_t output[BLAKE3_OUT_LEN];
  blake3_hasher_finalize(&hasher, output, BLAKE3_OUT_LEN);

  // Print the hash as hexadecimal.
  for (size_t i = 0; i < BLAKE3_OUT_LEN; i++) {
    printf("%02x", output[i]);
  }
  printf("\n");
  return 0;
}
```

The code above is included in this directory as `example.c`. If you're
on x86\_64 with a Unix-like OS, you can compile a working binary like
this:

```bash
gcc -O3 -o example example.c blake3.c blake3_dispatch.c blake3_portable.c \
    blake3_sse2_x86-64_unix.S blake3_sse41_x86-64_unix.S blake3_avx2_x86-64_unix.S \
    blake3_avx512_x86-64_unix.S
```

# API

## The Struct

```c
typedef struct {
  // private fields
} blake3_hasher;
```

An incremental BLAKE3 hashing state, which can accept any number of
updates. This implementation doesn't allocate any heap memory, but
`sizeof(blake3_hasher)` itself is relatively large, currently 1912 bytes
on x86-64. This size can be reduced by restricting the maximum input
length, as described in Section 5.4 of [the BLAKE3
spec](https://github.com/BLAKE3-team/BLAKE3-specs/blob/master/blake3.pdf),
but this implementation doesn't currently support that strategy.

## Common API Functions

```c
void blake3_hasher_init(
  blake3_hasher *self);
```

Initialize a `blake3_hasher` in the default hashing mode.

---

```c
void blake3_hasher_update(
  blake3_hasher *self,
  const void *input,
  size_t input_len);
```

Add input to the hasher. This can be called any number of times.

---

```c
void blake3_hasher_finalize(
  const blake3_hasher *self,
  uint8_t *out,
  size_t out_len);
```

Finalize the hasher and emit an output of any length. This doesn't
modify the hasher itself, and it's possible to finalize again after
adding more input. The constant `BLAKE3_OUT_LEN` provides the default
output length, 32 bytes.

## Less Common API Functions

```c
void blake3_hasher_init_keyed(
  blake3_hasher *self,
  const uint8_t key[BLAKE3_KEY_LEN]);
```

Initialize a `blake3_hasher` in the keyed hashing mode. The key must be
exactly 32 bytes.
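
Because the key length is fixed, it can help to validate key material
before it reaches `blake3_hasher_init_keyed`. The sketch below shows one
possible way to do that for a hex-encoded key. `parse_key_hex`,
`hex_digit`, and `KEY_LEN` are hypothetical names, not part of the BLAKE3
API; `KEY_LEN` stands in for `BLAKE3_KEY_LEN` so that the snippet
compiles without `blake3.h`. On success, the resulting buffer could be
passed as the `key` argument.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for BLAKE3_KEY_LEN so this sketch compiles without blake3.h. */
#define KEY_LEN 32

/* Convert one hex digit to its value, or return -1 if it is not a hex digit. */
static int hex_digit(char c) {
  if (c >= '0' && c <= '9') return c - '0';
  if (c >= 'a' && c <= 'f') return c - 'a' + 10;
  if (c >= 'A' && c <= 'F') return c - 'A' + 10;
  return -1;
}

/* Hypothetical helper: decode exactly 64 hex characters into a 32-byte key.
 * Returns 0 on success, or -1 if the string has the wrong length or
 * contains a non-hex character. On failure the key buffer is left zeroed. */
int parse_key_hex(const char *hex, uint8_t key[KEY_LEN]) {
  memset(key, 0, KEY_LEN);
  if (strlen(hex) != 2 * KEY_LEN) return -1;
  for (size_t i = 0; i < KEY_LEN; i++) {
    int hi = hex_digit(hex[2 * i]);
    int lo = hex_digit(hex[2 * i + 1]);
    if (hi < 0 || lo < 0) {
      memset(key, 0, KEY_LEN);
      return -1;
    }
    key[i] = (uint8_t)((hi << 4) | lo);
  }
  return 0;
}
```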

---

```c
void blake3_hasher_init_derive_key(
  blake3_hasher *self,
  const char *context);
```

Initialize a `blake3_hasher` in the key derivation mode. The context
string is given as an initialization parameter, and afterwards input key
material should be given with `blake3_hasher_update`. The context string
is a null-terminated C string which should be **hardcoded, globally
unique, and application-specific**. The context string should not
include any dynamic input like salts, nonces, or identifiers read from a
database at runtime. A good default format for the context string is
`"[application] [commit timestamp] [purpose]"`, e.g., `"example.com
2019-12-25 16:18:03 session tokens v1"`.

This function is intended for application code written in C. For
language bindings, see `blake3_hasher_init_derive_key_raw` below.

---

```c
void blake3_hasher_init_derive_key_raw(
  blake3_hasher *self,
  const void *context,
  size_t context_len);
```

As `blake3_hasher_init_derive_key` above, except that the context string
is given as a pointer to an array of arbitrary bytes with a provided
length. This is intended for writing language bindings, where C string
conversion would add unnecessary overhead and new error cases. Unicode
strings should be encoded as UTF-8.

Application code in C should prefer `blake3_hasher_init_derive_key`,
which takes the context as a C string. If you need to use arbitrary
bytes as a context string in application code, consider whether you're
violating the requirement that context strings should be hardcoded.
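
To make the "overhead and new error cases" concrete, here is a sketch of
the conversion a language binding would otherwise need before it could
call `blake3_hasher_init_derive_key`. The helper name `to_c_string` is
hypothetical and not part of the BLAKE3 API. It has to allocate and
copy, and it can fail in ways that passing a pointer and a length
avoids: the allocation can fail, and the bytes can contain an interior
NUL that a C string cannot represent.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not part of the BLAKE3 API: copy a length-delimited
 * byte string into a freshly allocated, null-terminated C string.
 * Returns NULL if the bytes contain an interior NUL, if the length cannot
 * be extended by one, or if malloc fails. The caller must free() the
 * result. */
char *to_c_string(const void *bytes, size_t len) {
  if (len == SIZE_MAX) {
    return NULL; /* len + 1 would overflow */
  }
  if (len > 0 && memchr(bytes, '\0', len) != NULL) {
    return NULL; /* a C string would be cut short at the interior NUL */
  }
  char *copy = malloc(len + 1);
  if (copy == NULL) {
    return NULL; /* allocation failure */
  }
  if (len > 0) {
    memcpy(copy, bytes, len);
  }
  copy[len] = '\0';
  return copy;
}
```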

---

```c
void blake3_hasher_finalize_seek(
  const blake3_hasher *self,
  uint64_t seek,
  uint8_t *out,
  size_t out_len);
```

The same as `blake3_hasher_finalize`, but with an additional `seek`
parameter for the starting byte position in the output stream. To
efficiently stream a large output without allocating memory, call this
function in a loop, incrementing `seek` by the output length each time.
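
The loop described above can be sketched as follows. To keep the snippet
compilable without the BLAKE3 sources, the output function is passed as
a callback whose parameters mirror `blake3_hasher_finalize_seek`. All
names here (`xof_seek_fn`, `sink_fn`, `stream_output`, and the toy
`toy_xof` and `check_sink` used to exercise it) are hypothetical and not
part of the BLAKE3 API.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical callback types. xof_seek_fn has the same shape as
 * blake3_hasher_finalize_seek, with the hasher passed as an opaque pointer. */
typedef void (*xof_seek_fn)(const void *state, uint64_t seek,
                            uint8_t *out, size_t out_len);
typedef void (*sink_fn)(void *ctx, const uint8_t *chunk, size_t chunk_len);

/* Stream `total` output bytes through a fixed-size stack buffer. Each
 * iteration requests the next chunk at position `seek`, hands it to the
 * sink, and advances `seek` by the number of bytes just produced. */
void stream_output(xof_seek_fn xof, const void *state, uint64_t total,
                   sink_fn sink, void *sink_ctx) {
  uint8_t buf[4096];
  uint64_t seek = 0;
  while (seek < total) {
    uint64_t remaining = total - seek;
    size_t take = remaining < sizeof(buf) ? (size_t)remaining : sizeof(buf);
    xof(state, seek, buf, take);
    sink(sink_ctx, buf, take);
    seek += take;
  }
}

/* Toy stand-in for a real hasher: the byte at stream position p depends
 * only on p, so any mistake in the seek arithmetic changes the output. */
static uint8_t toy_byte(uint64_t p) {
  return (uint8_t)((p * 31 + (p >> 8)) & 0xff);
}

void toy_xof(const void *state, uint64_t seek, uint8_t *out, size_t out_len) {
  (void)state; /* the toy generator keeps no state */
  for (size_t i = 0; i < out_len; i++) out[i] = toy_byte(seek + i);
}

/* Toy sink: counts bytes and checks that they arrive in stream order. */
typedef struct {
  uint64_t received;
  int in_order;
} check_sink;

void check_sink_write(void *ctx, const uint8_t *chunk, size_t chunk_len) {
  check_sink *s = ctx;
  for (size_t i = 0; i < chunk_len; i++) {
    if (chunk[i] != toy_byte(s->received + i)) s->in_order = 0;
  }
  s->received += chunk_len;
}
```

With the real library, the callback would be a small adapter that casts
`state` back to `const blake3_hasher *` and forwards the remaining
arguments to `blake3_hasher_finalize_seek`.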

# Building

This implementation is just C and assembly files. It doesn't include a
public-facing build system. (The `Makefile` in this directory is only
for testing.) Instead, the intention is that you can include these files
in whatever build system you're already using. This section describes
the commands your build system should execute, or which you can execute
by hand. Note that these steps may change in future versions.

## x86

Dynamic dispatch is enabled by default on x86. The implementation will
query the CPU at runtime to detect SIMD support, and it will use the
widest instruction set available. By default, `blake3_dispatch.c`
expects to be linked with code for five different instruction sets:
portable C, SSE2, SSE4.1, AVX2, and AVX-512.
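
As a rough illustration of what "query the CPU at runtime" means, the
sketch below picks the widest of those instruction sets using the
GCC/Clang builtins `__builtin_cpu_init` and `__builtin_cpu_supports`.
This is a swapped-in technique for illustration only: it makes no claim
about how `blake3_dispatch.c` performs its detection, and the names
`simd_level`, `detect_simd`, and `simd_name` are hypothetical. On other
compilers or non-x86 targets it simply reports the portable fallback.

```c
/* Instruction sets named above, ordered from narrowest to widest. */
typedef enum {
  SIMD_PORTABLE = 0,
  SIMD_SSE2,
  SIMD_SSE41,
  SIMD_AVX2,
  SIMD_AVX512
} simd_level;

/* Return the widest instruction set the running CPU reports. */
simd_level detect_simd(void) {
#if (defined(__x86_64__) || defined(__i386__)) && defined(__GNUC__)
  __builtin_cpu_init();
  /* Mirror the two AVX-512 subsets this README's build flags enable. */
  if (__builtin_cpu_supports("avx512f") && __builtin_cpu_supports("avx512vl")) {
    return SIMD_AVX512;
  }
  if (__builtin_cpu_supports("avx2")) return SIMD_AVX2;
  if (__builtin_cpu_supports("sse4.1")) return SIMD_SSE41;
  if (__builtin_cpu_supports("sse2")) return SIMD_SSE2;
#endif
  return SIMD_PORTABLE;
}

/* Human-readable name for a detected level. */
const char *simd_name(simd_level level) {
  switch (level) {
    case SIMD_SSE2: return "SSE2";
    case SIMD_SSE41: return "SSE4.1";
    case SIMD_AVX2: return "AVX2";
    case SIMD_AVX512: return "AVX-512";
    default: return "portable C";
  }
}
```

A caller could print `simd_name(detect_simd())` at startup to see which
code path a dispatcher of this style would choose on the current machine.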

For each of the x86 SIMD instruction sets, two versions are available,
one in assembly (which is further divided into three flavors: Unix,
Windows MSVC, and Windows GNU) and one using C intrinsics. The assembly
versions are generally preferred: they perform better, they perform more
consistently across different compilers, and they build more quickly. On
the other hand, the assembly versions are x86\_64-only, and you need to
select the right flavor for your target platform.

Here's an example of building a shared library on x86\_64 Linux using
the assembly implementations:

```bash
gcc -shared -O3 -o libblake3.so blake3.c blake3_dispatch.c blake3_portable.c \
    blake3_sse2_x86-64_unix.S blake3_sse41_x86-64_unix.S blake3_avx2_x86-64_unix.S \
    blake3_avx512_x86-64_unix.S
```

When building the intrinsics-based implementations, you need to build
each implementation separately, with the corresponding instruction set
explicitly enabled in the compiler. Here's the same shared library using
the intrinsics-based implementations:

```bash
gcc -c -fPIC -O3 -msse2 blake3_sse2.c -o blake3_sse2.o
gcc -c -fPIC -O3 -msse4.1 blake3_sse41.c -o blake3_sse41.o
gcc -c -fPIC -O3 -mavx2 blake3_avx2.c -o blake3_avx2.o
gcc -c -fPIC -O3 -mavx512f -mavx512vl blake3_avx512.c -o blake3_avx512.o
gcc -shared -O3 -o libblake3.so blake3.c blake3_dispatch.c blake3_portable.c \
    blake3_avx2.o blake3_avx512.o blake3_sse41.o blake3_sse2.o
```

Note above that building `blake3_avx512.c` requires both `-mavx512f` and
`-mavx512vl` under GCC and Clang. Under MSVC, the single `/arch:AVX512`
flag is sufficient. The MSVC equivalent of `-mavx2` is `/arch:AVX2`.
MSVC enables SSE2 and SSE4.1 by default, and it doesn't have a
corresponding flag for either.

If you want to omit SIMD code entirely, you need to explicitly disable
each instruction set. Here's an example of building a shared library on
x86 with only portable code:

```bash
gcc -shared -O3 -o libblake3.so -DBLAKE3_NO_SSE2 -DBLAKE3_NO_SSE41 -DBLAKE3_NO_AVX2 \
    -DBLAKE3_NO_AVX512 blake3.c blake3_dispatch.c blake3_portable.c
```

## ARM NEON

The NEON implementation is not enabled by default on ARM, since not all
ARM targets support it. To enable it, define `BLAKE3_USE_NEON=1` at
compile time (with GCC or Clang, passing `-DBLAKE3_USE_NEON` defines it
to 1). Here's an example of building a shared library on ARM Linux with
NEON support:

```bash
gcc -shared -O3 -o libblake3.so -DBLAKE3_USE_NEON blake3.c blake3_dispatch.c \
    blake3_portable.c blake3_neon.c
```

Note that on some targets (ARMv7 in particular), extra flags may be
required to activate NEON support in the compiler. If you see an error
like...

```
/usr/lib/gcc/armv7l-unknown-linux-gnueabihf/9.2.0/include/arm_neon.h:635:1: error: inlining failed
in call to always_inline ‘vaddq_u32’: target specific option mismatch
```

...then you may need to add something like `-mfpu=neon-vfpv4
-mfloat-abi=hard`.
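
One way to check whether a given set of flags actually turns NEON on in
the compiler is to test the predefined ACLE macro `__ARM_NEON` (some
older compilers define `__ARM_NEON__` instead). The sketch below is only
a diagnostic aid; the function name `neon_enabled_at_compile_time` is
hypothetical and not part of this implementation.

```c
/* Returns 1 if this translation unit was compiled with NEON enabled, as
 * signalled by the compiler's predefined macros, and 0 otherwise. */
int neon_enabled_at_compile_time(void) {
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
  return 1;
#else
  return 0;
#endif
}
```

If this returns 0 when built with the same flags as `blake3_neon.c`, the
compiler is not targeting NEON, which matches the situation behind the
"target specific option mismatch" error.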

## Other Platforms

The portable implementation should work on most other architectures. For
example:

```bash
gcc -shared -O3 -o libblake3.so blake3.c blake3_dispatch.c blake3_portable.c
```

# Differences from the Rust Implementation

The single-threaded Rust and C implementations use the same algorithms,
and their performance is the same if you use the assembly
implementations or if you compile the intrinsics-based implementations
with Clang. (Both Clang and rustc are LLVM-based.)

The C implementation doesn't currently include any multithreading
optimizations. OpenMP support or similar might be added in the future.