lnvis

nanovg lightning network visualizer
git clone git://jb55.com/lnvis
Log | Files | Refs | README | LICENSE

stb_image.h (227413B)


      1 /* stb_image - v2.10 - public domain image loader - http://nothings.org/stb_image.h
      2                                      no warranty implied; use at your own risk
      3 
      4    Do this:
      5       #define STB_IMAGE_IMPLEMENTATION
      6    before you include this file in *one* C or C++ file to create the implementation.
      7 
      8    // i.e. it should look like this:
      9    #include ...
     10    #include ...
     11    #include ...
     12    #define STB_IMAGE_IMPLEMENTATION
     13    #include "stb_image.h"
     14 
     15    You can #define STBI_ASSERT(x) before the #include to avoid using assert.h.
     16    And #define STBI_MALLOC, STBI_REALLOC, and STBI_FREE to avoid using malloc,realloc,free
     17 
     18 
     19    QUICK NOTES:
     20       Primarily of interest to game developers and other people who can
     21           avoid problematic images and only need the trivial interface
     22 
     23       JPEG baseline & progressive (12 bpc/arithmetic not supported, same as stock IJG lib)
     24       PNG 1/2/4/8-bit-per-channel (16 bpc not supported)
     25 
     26       TGA (not sure what subset, if a subset)
     27       BMP non-1bpp, non-RLE
     28       PSD (composited view only, no extra channels, 8/16 bit-per-channel)
     29 
     30       GIF (*comp always reports as 4-channel)
     31       HDR (radiance rgbE format)
     32       PIC (Softimage PIC)
     33       PNM (PPM and PGM binary only)
     34 
     35       Animated GIF still needs a proper API, but here's one way to do it:
     36           http://gist.github.com/urraka/685d9a6340b26b830d49
     37 
     38       - decode from memory or through FILE (define STBI_NO_STDIO to remove code)
     39       - decode from arbitrary I/O callbacks
     40       - SIMD acceleration on x86/x64 (SSE2) and ARM (NEON)
     41 
     42    Full documentation under "DOCUMENTATION" below.
     43 
     44 
     45    Revision 2.00 release notes:
     46 
     47       - Progressive JPEG is now supported.
     48 
     49       - PPM and PGM binary formats are now supported, thanks to Ken Miller.
     50 
     51       - x86 platforms now make use of SSE2 SIMD instructions for
     52         JPEG decoding, and ARM platforms can use NEON SIMD if requested.
     53         This work was done by Fabian "ryg" Giesen. SSE2 is used by
     54         default, but NEON must be enabled explicitly; see docs.
     55 
     56         With other JPEG optimizations included in this version, we see
     57         2x speedup on a JPEG on an x86 machine, and a 1.5x speedup
     58         on a JPEG on an ARM machine, relative to previous versions of this
     59         library. The same results will not obtain for all JPGs and for all
     60         x86/ARM machines. (Note that progressive JPEGs are significantly
     61         slower to decode than regular JPEGs.) This doesn't mean that this
     62         is the fastest JPEG decoder in the land; rather, it brings it
     63         closer to parity with standard libraries. If you want the fastest
     64         decode, look elsewhere. (See "Philosophy" section of docs below.)
     65 
     66         See final bullet items below for more info on SIMD.
     67 
     68       - Added STBI_MALLOC, STBI_REALLOC, and STBI_FREE macros for replacing
     69         the memory allocator. Unlike other STBI libraries, these macros don't
     70         support a context parameter, so if you need to pass a context in to
     71         the allocator, you'll have to store it in a global or a thread-local
     72         variable.
     73 
     74       - Split existing STBI_NO_HDR flag into two flags, STBI_NO_HDR and
     75         STBI_NO_LINEAR.
     76             STBI_NO_HDR:     suppress implementation of .hdr reader format
     77             STBI_NO_LINEAR:  suppress high-dynamic-range light-linear float API
     78 
     79       - You can suppress implementation of any of the decoders to reduce
     80         your code footprint by #defining one or more of the following
     81         symbols before creating the implementation.
     82 
     83             STBI_NO_JPEG
     84             STBI_NO_PNG
     85             STBI_NO_BMP
     86             STBI_NO_PSD
     87             STBI_NO_TGA
     88             STBI_NO_GIF
     89             STBI_NO_HDR
     90             STBI_NO_PIC
     91             STBI_NO_PNM   (.ppm and .pgm)
     92 
     93       - You can request *only* certain decoders and suppress all other ones
     94         (this will be more forward-compatible, as addition of new decoders
     95         doesn't require you to disable them explicitly):
     96 
     97             STBI_ONLY_JPEG
     98             STBI_ONLY_PNG
     99             STBI_ONLY_BMP
    100             STBI_ONLY_PSD
    101             STBI_ONLY_TGA
    102             STBI_ONLY_GIF
    103             STBI_ONLY_HDR
    104             STBI_ONLY_PIC
    105             STBI_ONLY_PNM   (.ppm and .pgm)
    106 
    107          Note that you can define multiples of these, and you will get all
    108          of them ("only x" and "only y" is interpreted to mean "only x&y").
    109 
    110        - If you use STBI_NO_PNG (or _ONLY_ without PNG), and you still
    111          want the zlib decoder to be available, #define STBI_SUPPORT_ZLIB
    112 
    113       - Compilation of all SIMD code can be suppressed with
    114             #define STBI_NO_SIMD
    115         It should not be necessary to disable SIMD unless you have issues
    116         compiling (e.g. using an x86 compiler which doesn't support SSE
    117         intrinsics or that doesn't support the method used to detect
    118         SSE2 support at run-time), and even those can be reported as
    119         bugs so I can refine the built-in compile-time checking to be
    120         smarter.
    121 
    122       - The old STBI_SIMD system which allowed installing a user-defined
    123         IDCT etc. has been removed. If you need this, don't upgrade. My
    124         assumption is that almost nobody was doing this, and those who
    125         were will find the built-in SIMD more satisfactory anyway.
    126 
    127       - RGB values computed for JPEG images are slightly different from
    128         previous versions of stb_image. (This is due to using less
    129         integer precision in SIMD.) The C code has been adjusted so
    130         that the same RGB values will be computed regardless of whether
    131         SIMD support is available, so your app should always produce
    132         consistent results. But these results are slightly different from
    133         previous versions. (Specifically, about 3% of available YCbCr values
    134         will compute different RGB results from pre-1.49 versions by +-1;
    135         most of the deviating values are one smaller in the G channel.)
    136 
    137       - If you must produce consistent results with previous versions of
    138         stb_image, #define STBI_JPEG_OLD and you will get the same results
    139         you used to; however, you will not get the SIMD speedups for
    140         the YCbCr-to-RGB conversion step (although you should still see
    141         significant JPEG speedup from the other changes).
    142 
    143         Please note that STBI_JPEG_OLD is a temporary feature; it will be
    144         removed in future versions of the library. It is only intended for
    145         near-term back-compatibility use.
    146 
    147 
    148    Latest revision history:
    149       2.10  (2016-01-22) avoid warning introduced in 2.09
    150       2.09  (2016-01-16) 16-bit TGA; comments in PNM files; STBI_REALLOC_SIZED
    151       2.08  (2015-09-13) fix to 2.07 cleanup, reading RGB PSD as RGBA
    152       2.07  (2015-09-13) partial animated GIF support
    153                          limited 16-bit PSD support
    154                          minor bugs, code cleanup, and compiler warnings
    155       2.06  (2015-04-19) fix bug where PSD returns wrong '*comp' value
    156       2.05  (2015-04-19) fix bug in progressive JPEG handling, fix warning
    157       2.04  (2015-04-15) try to re-enable SIMD on MinGW 64-bit
    158       2.03  (2015-04-12) additional corruption checking
    159                          stbi_set_flip_vertically_on_load
    160                          fix NEON support; fix mingw support
    161       2.02  (2015-01-19) fix incorrect assert, fix warning
    162       2.01  (2015-01-17) fix various warnings
    163       2.00b (2014-12-25) fix STBI_MALLOC in progressive JPEG
    164       2.00  (2014-12-25) optimize JPEG, including x86 SSE2 & ARM NEON SIMD
    165                          progressive JPEG
    166                          PGM/PPM support
    167                          STBI_MALLOC,STBI_REALLOC,STBI_FREE
    168                          STBI_NO_*, STBI_ONLY_*
    169                          GIF bugfix
    170       1.48  (2014-12-14) fix incorrectly-named assert()
    171       1.47  (2014-12-14) 1/2/4-bit PNG support (both grayscale and paletted)
    172                          optimize PNG
    173                          fix bug in interlaced PNG with user-specified channel count
    174 
    175    See end of file for full revision history.
    176 
    177 
    178  ============================    Contributors    =========================
    179 
    180  Image formats                          Extensions, features
    181     Sean Barrett (jpeg, png, bmp)          Jetro Lauha (stbi_info)
    182     Nicolas Schulz (hdr, psd)              Martin "SpartanJ" Golini (stbi_info)
    183     Jonathan Dummer (tga)                  James "moose2000" Brown (iPhone PNG)
    184     Jean-Marc Lienher (gif)                Ben "Disch" Wenger (io callbacks)
    185     Tom Seddon (pic)                       Omar Cornut (1/2/4-bit PNG)
    186     Thatcher Ulrich (psd)                  Nicolas Guillemot (vertical flip)
    187     Ken Miller (pgm, ppm)                  Richard Mitton (16-bit PSD)
    188     urraka@github (animated gif)           Junggon Kim (PNM comments)
    189                                            Daniel Gibson (16-bit TGA)
    190 
    191  Optimizations & bugfixes
    192     Fabian "ryg" Giesen
    193     Arseny Kapoulkine
    194 
    195  Bug & warning fixes
    196     Marc LeBlanc            David Woo          Guillaume George   Martins Mozeiko
    197     Christpher Lloyd        Martin Golini      Jerry Jansson      Joseph Thomson
    198     Dave Moore              Roy Eltham         Hayaki Saito       Phil Jordan
    199     Won Chun                Luke Graham        Johan Duparc       Nathan Reed
    200     the Horde3D community   Thomas Ruf         Ronny Chevalier    Nick Verigakis
    201     Janez Zemva             John Bartholomew   Michal Cichon      svdijk@github
    202     Jonathan Blow           Ken Hamada         Tero Hanninen      Baldur Karlsson
    203     Laurent Gomila          Cort Stratton      Sergio Gonzalez    romigrou@github
    204     Aruelien Pocheville     Thibault Reuille   Cass Everitt
    205     Ryamond Barbiero        Paul Du Bois       Engin Manap
    206     Blazej Dariusz Roszkowski
    207     Michaelangel007@github
    208 
    209 
    210 LICENSE
    211 
    212 This software is in the public domain. Where that dedication is not
    213 recognized, you are granted a perpetual, irrevocable license to copy,
    214 distribute, and modify this file as you see fit.
    215 
    216 */
    217 
    218 #ifndef STBI_INCLUDE_STB_IMAGE_H
    219 #define STBI_INCLUDE_STB_IMAGE_H
    220 
    221 // DOCUMENTATION
    222 //
    223 // Limitations:
    224 //    - no 16-bit-per-channel PNG
    225 //    - no 12-bit-per-channel JPEG
    226 //    - no JPEGs with arithmetic coding
    227 //    - no 1-bit BMP
    228 //    - GIF always returns *comp=4
    229 //
    230 // Basic usage (see HDR discussion below for HDR usage):
    231 //    int x,y,n;
    232 //    unsigned char *data = stbi_load(filename, &x, &y, &n, 0);
    233 //    // ... process data if not NULL ...
    234 //    // ... x = width, y = height, n = # 8-bit components per pixel ...
    235 //    // ... replace '0' with '1'..'4' to force that many components per pixel
    236 //    // ... but 'n' will always be the number that it would have been if you said 0
    237 //    stbi_image_free(data)
    238 //
    239 // Standard parameters:
    240 //    int *x       -- outputs image width in pixels
    241 //    int *y       -- outputs image height in pixels
    242 //    int *comp    -- outputs # of image components in image file
    243 //    int req_comp -- if non-zero, # of image components requested in result
    244 //
    245 // The return value from an image loader is an 'unsigned char *' which points
    246 // to the pixel data, or NULL on an allocation failure or if the image is
    247 // corrupt or invalid. The pixel data consists of *y scanlines of *x pixels,
    248 // with each pixel consisting of N interleaved 8-bit components; the first
    249 // pixel pointed to is top-left-most in the image. There is no padding between
    250 // image scanlines or between pixels, regardless of format. The number of
    251 // components N is 'req_comp' if req_comp is non-zero, or *comp otherwise.
    252 // If req_comp is non-zero, *comp has the number of components that _would_
    253 // have been output otherwise. E.g. if you set req_comp to 4, you will always
    254 // get RGBA output, but you can check *comp to see if it's trivially opaque
    255 // because e.g. there were only 3 channels in the source image.
    256 //
    257 // An output image with N components has the following components interleaved
    258 // in this order in each pixel:
    259 //
    260 //     N=#comp     components
    261 //       1           grey
    262 //       2           grey, alpha
    263 //       3           red, green, blue
    264 //       4           red, green, blue, alpha
    265 //
    266 // If image loading fails for any reason, the return value will be NULL,
    267 // and *x, *y, *comp will be unchanged. The function stbi_failure_reason()
    268 // can be queried for an extremely brief, end-user unfriendly explanation
    269 // of why the load failed. Define STBI_NO_FAILURE_STRINGS to avoid
    270 // compiling these strings at all, and STBI_FAILURE_USERMSG to get slightly
    271 // more user-friendly ones.
    272 //
    273 // Paletted PNG, BMP, GIF, and PIC images are automatically depalettized.
    274 //
    275 // ===========================================================================
    276 //
    277 // Philosophy
    278 //
    279 // stb libraries are designed with the following priorities:
    280 //
    281 //    1. easy to use
    282 //    2. easy to maintain
    283 //    3. good performance
    284 //
    285 // Sometimes I let "good performance" creep up in priority over "easy to maintain",
    286 // and for best performance I may provide less-easy-to-use APIs that give higher
    287 // performance, in addition to the easy to use ones. Nevertheless, it's important
    288 // to keep in mind that from the standpoint of you, a client of this library,
    289 // all you care about is #1 and #3, and stb libraries do not emphasize #3 above all.
    290 //
    291 // Some secondary priorities arise directly from the first two, some of which
    292 // make more explicit reasons why performance can't be emphasized.
    293 //
    294 //    - Portable ("ease of use")
    295 //    - Small footprint ("easy to maintain")
    296 //    - No dependencies ("ease of use")
    297 //
    298 // ===========================================================================
    299 //
    300 // I/O callbacks
    301 //
    302 // I/O callbacks allow you to read from arbitrary sources, like packaged
    303 // files or some other source. Data read from callbacks are processed
    304 // through a small internal buffer (currently 128 bytes) to try to reduce
    305 // overhead.
    306 //
    307 // The three functions you must define are "read" (reads some bytes of data),
    308 // "skip" (skips some bytes of data), "eof" (reports if the stream is at the end).
    309 //
    310 // ===========================================================================
    311 //
    312 // SIMD support
    313 //
    314 // The JPEG decoder will try to automatically use SIMD kernels on x86 when
    315 // supported by the compiler. For ARM Neon support, you must explicitly
    316 // request it.
    317 //
    318 // (The old do-it-yourself SIMD API is no longer supported in the current
    319 // code.)
    320 //
    321 // On x86, SSE2 will automatically be used when available based on a run-time
    322 // test; if not, the generic C versions are used as a fall-back. On ARM targets,
    323 // the typical path is to have separate builds for NEON and non-NEON devices
    324 // (at least this is true for iOS and Android). Therefore, the NEON support is
    325 // toggled by a build flag: define STBI_NEON to get NEON loops.
    326 //
    327 // The output of the JPEG decoder is slightly different from versions where
    328 // SIMD support was introduced (that is, for versions before 1.49). The
    329 // difference is only +-1 in the 8-bit RGB channels, and only on a small
    330 // fraction of pixels. You can force the pre-1.49 behavior by defining
    331 // STBI_JPEG_OLD, but this will disable some of the SIMD decoding path
    332 // and hence cost some performance.
    333 //
    334 // If for some reason you do not want to use any of SIMD code, or if
    335 // you have issues compiling it, you can disable it entirely by
    336 // defining STBI_NO_SIMD.
    337 //
    338 // ===========================================================================
    339 //
    340 // HDR image support   (disable by defining STBI_NO_HDR)
    341 //
    342 // stb_image now supports loading HDR images in general, and currently
    343 // the Radiance .HDR file format, although the support is provided
    344 // generically. You can still load any file through the existing interface;
    345 // if you attempt to load an HDR file, it will be automatically remapped to
    346 // LDR, assuming gamma 2.2 and an arbitrary scale factor defaulting to 1;
    347 // both of these constants can be reconfigured through this interface:
    348 //
    349 //     stbi_hdr_to_ldr_gamma(2.2f);
    350 //     stbi_hdr_to_ldr_scale(1.0f);
    351 //
    352 // (note, do not use _inverse_ constants; stbi_image will invert them
    353 // appropriately).
    354 //
    355 // Additionally, there is a new, parallel interface for loading files as
    356 // (linear) floats to preserve the full dynamic range:
    357 //
    358 //    float *data = stbi_loadf(filename, &x, &y, &n, 0);
    359 //
    360 // If you load LDR images through this interface, those images will
    361 // be promoted to floating point values, run through the inverse of
    362 // constants corresponding to the above:
    363 //
    364 //     stbi_ldr_to_hdr_scale(1.0f);
    365 //     stbi_ldr_to_hdr_gamma(2.2f);
    366 //
    367 // Finally, given a filename (or an open file or memory block--see header
    368 // file for details) containing image data, you can query for the "most
    369 // appropriate" interface to use (that is, whether the image is HDR or
    370 // not), using:
    371 //
    372 //     stbi_is_hdr(char *filename);
    373 //
    374 // ===========================================================================
    375 //
    376 // iPhone PNG support:
    377 //
    378 // By default we convert iphone-formatted PNGs back to RGB, even though
    379 // they are internally encoded differently. You can disable this conversion
    380 // by by calling stbi_convert_iphone_png_to_rgb(0), in which case
    381 // you will always just get the native iphone "format" through (which
    382 // is BGR stored in RGB).
    383 //
    384 // Call stbi_set_unpremultiply_on_load(1) as well to force a divide per
    385 // pixel to remove any premultiplied alpha *only* if the image file explicitly
    386 // says there's premultiplied data (currently only happens in iPhone images,
    387 // and only if iPhone convert-to-rgb processing is on).
    388 //
    389 
    390 
    391 #ifndef STBI_NO_STDIO
    392 #include <stdio.h>
    393 #endif // STBI_NO_STDIO
    394 
    395 #define STBI_VERSION 1
    396 
    397 enum
    398 {
    399    STBI_default = 0, // only used for req_comp
    400 
    401    STBI_grey       = 1,
    402    STBI_grey_alpha = 2,
    403    STBI_rgb        = 3,
    404    STBI_rgb_alpha  = 4
    405 };
    406 
    407 typedef unsigned char stbi_uc;
    408 
    409 #ifdef __cplusplus
    410 extern "C" {
    411 #endif
    412 
    413 #ifdef STB_IMAGE_STATIC
    414 #define STBIDEF static
    415 #else
    416 #define STBIDEF extern
    417 #endif
    418 
    419 //////////////////////////////////////////////////////////////////////////////
    420 //
    421 // PRIMARY API - works on images of any type
    422 //
    423 
    424 //
    425 // load image by filename, open file, or memory buffer
    426 //
    427 
    428 typedef struct
    429 {
    430    int      (*read)  (void *user,char *data,int size);   // fill 'data' with 'size' bytes.  return number of bytes actually read
    431    void     (*skip)  (void *user,int n);                 // skip the next 'n' bytes, or 'unget' the last -n bytes if negative
    432    int      (*eof)   (void *user);                       // returns nonzero if we are at end of file/data
    433 } stbi_io_callbacks;
    434 
    435 STBIDEF stbi_uc *stbi_load               (char              const *filename,           int *x, int *y, int *comp, int req_comp);
    436 STBIDEF stbi_uc *stbi_load_from_memory   (stbi_uc           const *buffer, int len   , int *x, int *y, int *comp, int req_comp);
    437 STBIDEF stbi_uc *stbi_load_from_callbacks(stbi_io_callbacks const *clbk  , void *user, int *x, int *y, int *comp, int req_comp);
    438 
    439 #ifndef STBI_NO_STDIO
    440 STBIDEF stbi_uc *stbi_load_from_file  (FILE *f,                  int *x, int *y, int *comp, int req_comp);
    441 // for stbi_load_from_file, file pointer is left pointing immediately after image
    442 #endif
    443 
    444 #ifndef STBI_NO_LINEAR
    445    STBIDEF float *stbi_loadf                 (char const *filename,           int *x, int *y, int *comp, int req_comp);
    446    STBIDEF float *stbi_loadf_from_memory     (stbi_uc const *buffer, int len, int *x, int *y, int *comp, int req_comp);
    447    STBIDEF float *stbi_loadf_from_callbacks  (stbi_io_callbacks const *clbk, void *user, int *x, int *y, int *comp, int req_comp);
    448 
    449    #ifndef STBI_NO_STDIO
    450    STBIDEF float *stbi_loadf_from_file  (FILE *f,                int *x, int *y, int *comp, int req_comp);
    451    #endif
    452 #endif
    453 
    454 #ifndef STBI_NO_HDR
    455    STBIDEF void   stbi_hdr_to_ldr_gamma(float gamma);
    456    STBIDEF void   stbi_hdr_to_ldr_scale(float scale);
    457 #endif // STBI_NO_HDR
    458 
    459 #ifndef STBI_NO_LINEAR
    460    STBIDEF void   stbi_ldr_to_hdr_gamma(float gamma);
    461    STBIDEF void   stbi_ldr_to_hdr_scale(float scale);
    462 #endif // STBI_NO_LINEAR
    463 
    464 // stbi_is_hdr is always defined, but always returns false if STBI_NO_HDR
    465 STBIDEF int    stbi_is_hdr_from_callbacks(stbi_io_callbacks const *clbk, void *user);
    466 STBIDEF int    stbi_is_hdr_from_memory(stbi_uc const *buffer, int len);
    467 #ifndef STBI_NO_STDIO
    468 STBIDEF int      stbi_is_hdr          (char const *filename);
    469 STBIDEF int      stbi_is_hdr_from_file(FILE *f);
    470 #endif // STBI_NO_STDIO
    471 
    472 
    473 // get a VERY brief reason for failure
    474 // NOT THREADSAFE
    475 STBIDEF const char *stbi_failure_reason  (void);
    476 
    477 // free the loaded image -- this is just free()
    478 STBIDEF void     stbi_image_free      (void *retval_from_stbi_load);
    479 
    480 // get image dimensions & components without fully decoding
    481 STBIDEF int      stbi_info_from_memory(stbi_uc const *buffer, int len, int *x, int *y, int *comp);
    482 STBIDEF int      stbi_info_from_callbacks(stbi_io_callbacks const *clbk, void *user, int *x, int *y, int *comp);
    483 
    484 #ifndef STBI_NO_STDIO
    485 STBIDEF int      stbi_info            (char const *filename,     int *x, int *y, int *comp);
    486 STBIDEF int      stbi_info_from_file  (FILE *f,                  int *x, int *y, int *comp);
    487 
    488 #endif
    489 
    490 
    491 
    492 // for image formats that explicitly notate that they have premultiplied alpha,
    493 // we just return the colors as stored in the file. set this flag to force
    494 // unpremultiplication. results are undefined if the unpremultiply overflow.
    495 STBIDEF void stbi_set_unpremultiply_on_load(int flag_true_if_should_unpremultiply);
    496 
    497 // indicate whether we should process iphone images back to canonical format,
    498 // or just pass them through "as-is"
    499 STBIDEF void stbi_convert_iphone_png_to_rgb(int flag_true_if_should_convert);
    500 
    501 // flip the image vertically, so the first pixel in the output array is the bottom left
    502 STBIDEF void stbi_set_flip_vertically_on_load(int flag_true_if_should_flip);
    503 
    504 // ZLIB client - used by PNG, available for other purposes
    505 
    506 STBIDEF char *stbi_zlib_decode_malloc_guesssize(const char *buffer, int len, int initial_size, int *outlen);
    507 STBIDEF char *stbi_zlib_decode_malloc_guesssize_headerflag(const char *buffer, int len, int initial_size, int *outlen, int parse_header);
    508 STBIDEF char *stbi_zlib_decode_malloc(const char *buffer, int len, int *outlen);
    509 STBIDEF int   stbi_zlib_decode_buffer(char *obuffer, int olen, const char *ibuffer, int ilen);
    510 
    511 STBIDEF char *stbi_zlib_decode_noheader_malloc(const char *buffer, int len, int *outlen);
    512 STBIDEF int   stbi_zlib_decode_noheader_buffer(char *obuffer, int olen, const char *ibuffer, int ilen);
    513 
    514 
    515 #ifdef __cplusplus
    516 }
    517 #endif
    518 
    519 //
    520 //
    521 ////   end header file   /////////////////////////////////////////////////////
    522 #endif // STBI_INCLUDE_STB_IMAGE_H
    523 
    524 #ifdef STB_IMAGE_IMPLEMENTATION
    525 
    526 #if defined(STBI_ONLY_JPEG) || defined(STBI_ONLY_PNG) || defined(STBI_ONLY_BMP) \
    527   || defined(STBI_ONLY_TGA) || defined(STBI_ONLY_GIF) || defined(STBI_ONLY_PSD) \
    528   || defined(STBI_ONLY_HDR) || defined(STBI_ONLY_PIC) || defined(STBI_ONLY_PNM) \
    529   || defined(STBI_ONLY_ZLIB)
    530    #ifndef STBI_ONLY_JPEG
    531    #define STBI_NO_JPEG
    532    #endif
    533    #ifndef STBI_ONLY_PNG
    534    #define STBI_NO_PNG
    535    #endif
    536    #ifndef STBI_ONLY_BMP
    537    #define STBI_NO_BMP
    538    #endif
    539    #ifndef STBI_ONLY_PSD
    540    #define STBI_NO_PSD
    541    #endif
    542    #ifndef STBI_ONLY_TGA
    543    #define STBI_NO_TGA
    544    #endif
    545    #ifndef STBI_ONLY_GIF
    546    #define STBI_NO_GIF
    547    #endif
    548    #ifndef STBI_ONLY_HDR
    549    #define STBI_NO_HDR
    550    #endif
    551    #ifndef STBI_ONLY_PIC
    552    #define STBI_NO_PIC
    553    #endif
    554    #ifndef STBI_ONLY_PNM
    555    #define STBI_NO_PNM
    556    #endif
    557 #endif
    558 
    559 #if defined(STBI_NO_PNG) && !defined(STBI_SUPPORT_ZLIB) && !defined(STBI_NO_ZLIB)
    560 #define STBI_NO_ZLIB
    561 #endif
    562 
    563 
    564 #include <stdarg.h>
    565 #include <stddef.h> // ptrdiff_t on osx
    566 #include <stdlib.h>
    567 #include <string.h>
    568 
    569 #if !defined(STBI_NO_LINEAR) || !defined(STBI_NO_HDR)
    570 #include <math.h>  // ldexp
    571 #endif
    572 
    573 #ifndef STBI_NO_STDIO
    574 #include <stdio.h>
    575 #endif
    576 
    577 #ifndef STBI_ASSERT
    578 #include <assert.h>
    579 #define STBI_ASSERT(x) assert(x)
    580 #endif
    581 
    582 
    583 #ifndef _MSC_VER
    584    #ifdef __cplusplus
    585    #define stbi_inline inline
    586    #else
    587    #define stbi_inline
    588    #endif
    589 #else
    590    #define stbi_inline __forceinline
    591 #endif
    592 
    593 
    594 #ifdef _MSC_VER
    595 typedef unsigned short stbi__uint16;
    596 typedef   signed short stbi__int16;
    597 typedef unsigned int   stbi__uint32;
    598 typedef   signed int   stbi__int32;
    599 #else
    600 #include <stdint.h>
    601 typedef uint16_t stbi__uint16;
    602 typedef int16_t  stbi__int16;
    603 typedef uint32_t stbi__uint32;
    604 typedef int32_t  stbi__int32;
    605 #endif
    606 
    607 // should produce compiler error if size is wrong
    608 typedef unsigned char validate_uint32[sizeof(stbi__uint32)==4 ? 1 : -1];
    609 
    610 #ifdef _MSC_VER
    611 #define STBI_NOTUSED(v)  (void)(v)
    612 #else
    613 #define STBI_NOTUSED(v)  (void)sizeof(v)
    614 #endif
    615 
    616 #ifdef _MSC_VER
    617 #define STBI_HAS_LROTL
    618 #endif
    619 
    620 #ifdef STBI_HAS_LROTL
    621    #define stbi_lrot(x,y)  _lrotl(x,y)
    622 #else
    623    #define stbi_lrot(x,y)  (((x) << (y)) | ((x) >> (32 - (y))))
    624 #endif
    625 
    626 #if defined(STBI_MALLOC) && defined(STBI_FREE) && (defined(STBI_REALLOC) || defined(STBI_REALLOC_SIZED))
    627 // ok
    628 #elif !defined(STBI_MALLOC) && !defined(STBI_FREE) && !defined(STBI_REALLOC) && !defined(STBI_REALLOC_SIZED)
    629 // ok
    630 #else
    631 #error "Must define all or none of STBI_MALLOC, STBI_FREE, and STBI_REALLOC (or STBI_REALLOC_SIZED)."
    632 #endif
    633 
    634 #ifndef STBI_MALLOC
    635 #define STBI_MALLOC(sz)           malloc(sz)
    636 #define STBI_REALLOC(p,newsz)     realloc(p,newsz)
    637 #define STBI_FREE(p)              free(p)
    638 #endif
    639 
    640 #ifndef STBI_REALLOC_SIZED
    641 #define STBI_REALLOC_SIZED(p,oldsz,newsz) STBI_REALLOC(p,newsz)
    642 #endif
    643 
    644 // x86/x64 detection
    645 #if defined(__x86_64__) || defined(_M_X64)
    646 #define STBI__X64_TARGET
    647 #elif defined(__i386) || defined(_M_IX86)
    648 #define STBI__X86_TARGET
    649 #endif
    650 
    651 #if defined(__GNUC__) && (defined(STBI__X86_TARGET) || defined(STBI__X64_TARGET)) && !defined(__SSE2__) && !defined(STBI_NO_SIMD)
    652 // NOTE: not clear do we actually need this for the 64-bit path?
    653 // gcc doesn't support sse2 intrinsics unless you compile with -msse2,
    654 // (but compiling with -msse2 allows the compiler to use SSE2 everywhere;
    655 // this is just broken and gcc are jerks for not fixing it properly
    656 // http://www.virtualdub.org/blog/pivot/entry.php?id=363 )
    657 #define STBI_NO_SIMD
    658 #endif
    659 
    660 #if defined(__MINGW32__) && defined(STBI__X86_TARGET) && !defined(STBI_MINGW_ENABLE_SSE2) && !defined(STBI_NO_SIMD)
    661 // Note that __MINGW32__ doesn't actually mean 32-bit, so we have to avoid STBI__X64_TARGET
    662 //
    663 // 32-bit MinGW wants ESP to be 16-byte aligned, but this is not in the
    664 // Windows ABI and VC++ as well as Windows DLLs don't maintain that invariant.
    665 // As a result, enabling SSE2 on 32-bit MinGW is dangerous when not
    666 // simultaneously enabling "-mstackrealign".
    667 //
    668 // See https://github.com/nothings/stb/issues/81 for more information.
    669 //
    670 // So default to no SSE2 on 32-bit MinGW. If you've read this far and added
    671 // -mstackrealign to your build settings, feel free to #define STBI_MINGW_ENABLE_SSE2.
    672 #define STBI_NO_SIMD
    673 #endif
    674 
    675 #if !defined(STBI_NO_SIMD) && defined(STBI__X86_TARGET)
    676 #define STBI_SSE2
    677 #include <emmintrin.h>
    678 
    679 #ifdef _MSC_VER
    680 
    681 #if _MSC_VER >= 1400  // not VC6
    682 #include <intrin.h> // __cpuid
    683 static int stbi__cpuid3(void)
    684 {
    685    int info[4];
    686    __cpuid(info,1);
    687    return info[3];
    688 }
    689 #else
    690 static int stbi__cpuid3(void)
    691 {
    692    int res;
    693    __asm {
    694       mov  eax,1
    695       cpuid
    696       mov  res,edx
    697    }
    698    return res;
    699 }
    700 #endif
    701 
    702 #define STBI_SIMD_ALIGN(type, name) __declspec(align(16)) type name
    703 
    704 static int stbi__sse2_available()
    705 {
    706    int info3 = stbi__cpuid3();
    707    return ((info3 >> 26) & 1) != 0;
    708 }
    709 #else // assume GCC-style if not VC++
    710 #define STBI_SIMD_ALIGN(type, name) type name __attribute__((aligned(16)))
    711 
    712 static int stbi__sse2_available()
    713 {
    714 #if defined(__GNUC__) && (__GNUC__ * 100 + __GNUC_MINOR__) >= 408 // GCC 4.8 or later
    715    // GCC 4.8+ has a nice way to do this
    716    return __builtin_cpu_supports("sse2");
    717 #else
    718    // portable way to do this, preferably without using GCC inline ASM?
    719    // just bail for now.
    720    return 0;
    721 #endif
    722 }
    723 #endif
    724 #endif
    725 
    726 // ARM NEON
    727 #if defined(STBI_NO_SIMD) && defined(STBI_NEON)
    728 #undef STBI_NEON
    729 #endif
    730 
    731 #ifdef STBI_NEON
    732 #include <arm_neon.h>
    733 // assume GCC or Clang on ARM targets
    734 #define STBI_SIMD_ALIGN(type, name) type name __attribute__((aligned(16)))
    735 #endif
    736 
    737 #ifndef STBI_SIMD_ALIGN
    738 #define STBI_SIMD_ALIGN(type, name) type name
    739 #endif
    740 
    741 ///////////////////////////////////////////////
    742 //
    743 //  stbi__context struct and start_xxx functions
    744 
    745 // stbi__context structure is our basic context used by all images, so it
    746 // contains all the IO context, plus some basic image information
    747 typedef struct
    748 {
    749    stbi__uint32 img_x, img_y;
    750    int img_n, img_out_n;
    751 
    752    stbi_io_callbacks io;
    753    void *io_user_data;
    754 
    755    int read_from_callbacks;
    756    int buflen;
    757    stbi_uc buffer_start[128];
    758 
    759    stbi_uc *img_buffer, *img_buffer_end;
    760    stbi_uc *img_buffer_original, *img_buffer_original_end;
    761 } stbi__context;
    762 
    763 
    764 static void stbi__refill_buffer(stbi__context *s);
    765 
    766 // initialize a memory-decode context
    767 static void stbi__start_mem(stbi__context *s, stbi_uc const *buffer, int len)
    768 {
    769    s->io.read = NULL;
    770    s->read_from_callbacks = 0;
    771    s->img_buffer = s->img_buffer_original = (stbi_uc *) buffer;
    772    s->img_buffer_end = s->img_buffer_original_end = (stbi_uc *) buffer+len;
    773 }
    774 
    775 // initialize a callback-based context
    776 static void stbi__start_callbacks(stbi__context *s, stbi_io_callbacks *c, void *user)
    777 {
    778    s->io = *c;
    779    s->io_user_data = user;
    780    s->buflen = sizeof(s->buffer_start);
    781    s->read_from_callbacks = 1;
    782    s->img_buffer_original = s->buffer_start;
    783    stbi__refill_buffer(s);
    784    s->img_buffer_original_end = s->img_buffer_end;
    785 }
    786 
    787 #ifndef STBI_NO_STDIO
    788 
    789 static int stbi__stdio_read(void *user, char *data, int size)
    790 {
    791    return (int) fread(data,1,size,(FILE*) user);
    792 }
    793 
    794 static void stbi__stdio_skip(void *user, int n)
    795 {
    796    fseek((FILE*) user, n, SEEK_CUR);
    797 }
    798 
    799 static int stbi__stdio_eof(void *user)
    800 {
    801    return feof((FILE*) user);
    802 }
    803 
    804 static stbi_io_callbacks stbi__stdio_callbacks =
    805 {
    806    stbi__stdio_read,
    807    stbi__stdio_skip,
    808    stbi__stdio_eof,
    809 };
    810 
    811 static void stbi__start_file(stbi__context *s, FILE *f)
    812 {
    813    stbi__start_callbacks(s, &stbi__stdio_callbacks, (void *) f);
    814 }
    815 
    816 //static void stop_file(stbi__context *s) { }
    817 
    818 #endif // !STBI_NO_STDIO
    819 
    820 static void stbi__rewind(stbi__context *s)
    821 {
    822    // conceptually rewind SHOULD rewind to the beginning of the stream,
    823    // but we just rewind to the beginning of the initial buffer, because
    824    // we only use it after doing 'test', which only ever looks at at most 92 bytes
    825    s->img_buffer = s->img_buffer_original;
    826    s->img_buffer_end = s->img_buffer_original_end;
    827 }
    828 
    829 #ifndef STBI_NO_JPEG
    830 static int      stbi__jpeg_test(stbi__context *s);
    831 static stbi_uc *stbi__jpeg_load(stbi__context *s, int *x, int *y, int *comp, int req_comp);
    832 static int      stbi__jpeg_info(stbi__context *s, int *x, int *y, int *comp);
    833 #endif
    834 
    835 #ifndef STBI_NO_PNG
    836 static int      stbi__png_test(stbi__context *s);
    837 static stbi_uc *stbi__png_load(stbi__context *s, int *x, int *y, int *comp, int req_comp);
    838 static int      stbi__png_info(stbi__context *s, int *x, int *y, int *comp);
    839 #endif
    840 
    841 #ifndef STBI_NO_BMP
    842 static int      stbi__bmp_test(stbi__context *s);
    843 static stbi_uc *stbi__bmp_load(stbi__context *s, int *x, int *y, int *comp, int req_comp);
    844 static int      stbi__bmp_info(stbi__context *s, int *x, int *y, int *comp);
    845 #endif
    846 
    847 #ifndef STBI_NO_TGA
    848 static int      stbi__tga_test(stbi__context *s);
    849 static stbi_uc *stbi__tga_load(stbi__context *s, int *x, int *y, int *comp, int req_comp);
    850 static int      stbi__tga_info(stbi__context *s, int *x, int *y, int *comp);
    851 #endif
    852 
    853 #ifndef STBI_NO_PSD
    854 static int      stbi__psd_test(stbi__context *s);
    855 static stbi_uc *stbi__psd_load(stbi__context *s, int *x, int *y, int *comp, int req_comp);
    856 static int      stbi__psd_info(stbi__context *s, int *x, int *y, int *comp);
    857 #endif
    858 
    859 #ifndef STBI_NO_HDR
    860 static int      stbi__hdr_test(stbi__context *s);
    861 static float   *stbi__hdr_load(stbi__context *s, int *x, int *y, int *comp, int req_comp);
    862 static int      stbi__hdr_info(stbi__context *s, int *x, int *y, int *comp);
    863 #endif
    864 
    865 #ifndef STBI_NO_PIC
    866 static int      stbi__pic_test(stbi__context *s);
    867 static stbi_uc *stbi__pic_load(stbi__context *s, int *x, int *y, int *comp, int req_comp);
    868 static int      stbi__pic_info(stbi__context *s, int *x, int *y, int *comp);
    869 #endif
    870 
    871 #ifndef STBI_NO_GIF
    872 static int      stbi__gif_test(stbi__context *s);
    873 static stbi_uc *stbi__gif_load(stbi__context *s, int *x, int *y, int *comp, int req_comp);
    874 static int      stbi__gif_info(stbi__context *s, int *x, int *y, int *comp);
    875 #endif
    876 
    877 #ifndef STBI_NO_PNM
    878 static int      stbi__pnm_test(stbi__context *s);
    879 static stbi_uc *stbi__pnm_load(stbi__context *s, int *x, int *y, int *comp, int req_comp);
    880 static int      stbi__pnm_info(stbi__context *s, int *x, int *y, int *comp);
    881 #endif
    882 
    883 // this is not threadsafe
    884 static const char *stbi__g_failure_reason;
    885 
    886 STBIDEF const char *stbi_failure_reason(void)
    887 {
    888    return stbi__g_failure_reason;
    889 }
    890 
    891 static int stbi__err(const char *str)
    892 {
    893    stbi__g_failure_reason = str;
    894    return 0;
    895 }
    896 
    897 static void *stbi__malloc(size_t size)
    898 {
    899     return STBI_MALLOC(size);
    900 }
    901 
    902 // stbi__err - error
    903 // stbi__errpf - error returning pointer to float
    904 // stbi__errpuc - error returning pointer to unsigned char
    905 
    906 #ifdef STBI_NO_FAILURE_STRINGS
    907    #define stbi__err(x,y)  0
    908 #elif defined(STBI_FAILURE_USERMSG)
    909    #define stbi__err(x,y)  stbi__err(y)
    910 #else
    911    #define stbi__err(x,y)  stbi__err(x)
    912 #endif
    913 
    914 #define stbi__errpf(x,y)   ((float *)(size_t) (stbi__err(x,y)?NULL:NULL))
    915 #define stbi__errpuc(x,y)  ((unsigned char *)(size_t) (stbi__err(x,y)?NULL:NULL))
    916 
    917 STBIDEF void stbi_image_free(void *retval_from_stbi_load)
    918 {
    919    STBI_FREE(retval_from_stbi_load);
    920 }
    921 
    922 #ifndef STBI_NO_LINEAR
    923 static float   *stbi__ldr_to_hdr(stbi_uc *data, int x, int y, int comp);
    924 #endif
    925 
    926 #ifndef STBI_NO_HDR
    927 static stbi_uc *stbi__hdr_to_ldr(float   *data, int x, int y, int comp);
    928 #endif
    929 
    930 static int stbi__vertically_flip_on_load = 0;
    931 
    932 STBIDEF void stbi_set_flip_vertically_on_load(int flag_true_if_should_flip)
    933 {
    934     stbi__vertically_flip_on_load = flag_true_if_should_flip;
    935 }
    936 
    937 static unsigned char *stbi__load_main(stbi__context *s, int *x, int *y, int *comp, int req_comp)
    938 {
    939    #ifndef STBI_NO_JPEG
    940    if (stbi__jpeg_test(s)) return stbi__jpeg_load(s,x,y,comp,req_comp);
    941    #endif
    942    #ifndef STBI_NO_PNG
    943    if (stbi__png_test(s))  return stbi__png_load(s,x,y,comp,req_comp);
    944    #endif
    945    #ifndef STBI_NO_BMP
    946    if (stbi__bmp_test(s))  return stbi__bmp_load(s,x,y,comp,req_comp);
    947    #endif
    948    #ifndef STBI_NO_GIF
    949    if (stbi__gif_test(s))  return stbi__gif_load(s,x,y,comp,req_comp);
    950    #endif
    951    #ifndef STBI_NO_PSD
    952    if (stbi__psd_test(s))  return stbi__psd_load(s,x,y,comp,req_comp);
    953    #endif
    954    #ifndef STBI_NO_PIC
    955    if (stbi__pic_test(s))  return stbi__pic_load(s,x,y,comp,req_comp);
    956    #endif
    957    #ifndef STBI_NO_PNM
    958    if (stbi__pnm_test(s))  return stbi__pnm_load(s,x,y,comp,req_comp);
    959    #endif
    960 
    961    #ifndef STBI_NO_HDR
    962    if (stbi__hdr_test(s)) {
    963       float *hdr = stbi__hdr_load(s, x,y,comp,req_comp);
    964       return stbi__hdr_to_ldr(hdr, *x, *y, req_comp ? req_comp : *comp);
    965    }
    966    #endif
    967 
    968    #ifndef STBI_NO_TGA
    969    // test tga last because it's a crappy test!
    970    if (stbi__tga_test(s))
    971       return stbi__tga_load(s,x,y,comp,req_comp);
    972    #endif
    973 
    974    return stbi__errpuc("unknown image type", "Image not of any known type, or corrupt");
    975 }
    976 
    977 static unsigned char *stbi__load_flip(stbi__context *s, int *x, int *y, int *comp, int req_comp)
    978 {
    979    unsigned char *result = stbi__load_main(s, x, y, comp, req_comp);
    980 
    981    if (stbi__vertically_flip_on_load && result != NULL) {
    982       int w = *x, h = *y;
    983       int depth = req_comp ? req_comp : *comp;
    984       int row,col,z;
    985       stbi_uc temp;
    986 
    987       // @OPTIMIZE: use a bigger temp buffer and memcpy multiple pixels at once
    988       for (row = 0; row < (h>>1); row++) {
    989          for (col = 0; col < w; col++) {
    990             for (z = 0; z < depth; z++) {
    991                temp = result[(row * w + col) * depth + z];
    992                result[(row * w + col) * depth + z] = result[((h - row - 1) * w + col) * depth + z];
    993                result[((h - row - 1) * w + col) * depth + z] = temp;
    994             }
    995          }
    996       }
    997    }
    998 
    999    return result;
   1000 }
   1001 
   1002 #ifndef STBI_NO_HDR
   1003 static void stbi__float_postprocess(float *result, int *x, int *y, int *comp, int req_comp)
   1004 {
   1005    if (stbi__vertically_flip_on_load && result != NULL) {
   1006       int w = *x, h = *y;
   1007       int depth = req_comp ? req_comp : *comp;
   1008       int row,col,z;
   1009       float temp;
   1010 
   1011       // @OPTIMIZE: use a bigger temp buffer and memcpy multiple pixels at once
   1012       for (row = 0; row < (h>>1); row++) {
   1013          for (col = 0; col < w; col++) {
   1014             for (z = 0; z < depth; z++) {
   1015                temp = result[(row * w + col) * depth + z];
   1016                result[(row * w + col) * depth + z] = result[((h - row - 1) * w + col) * depth + z];
   1017                result[((h - row - 1) * w + col) * depth + z] = temp;
   1018             }
   1019          }
   1020       }
   1021    }
   1022 }
   1023 #endif
   1024 
   1025 #ifndef STBI_NO_STDIO
   1026 
   1027 static FILE *stbi__fopen(char const *filename, char const *mode)
   1028 {
   1029    FILE *f;
   1030 #if defined(_MSC_VER) && _MSC_VER >= 1400
   1031    if (0 != fopen_s(&f, filename, mode))
   1032       f=0;
   1033 #else
   1034    f = fopen(filename, mode);
   1035 #endif
   1036    return f;
   1037 }
   1038 
   1039 
   1040 STBIDEF stbi_uc *stbi_load(char const *filename, int *x, int *y, int *comp, int req_comp)
   1041 {
   1042    FILE *f = stbi__fopen(filename, "rb");
   1043    unsigned char *result;
   1044    if (!f) return stbi__errpuc("can't fopen", "Unable to open file");
   1045    result = stbi_load_from_file(f,x,y,comp,req_comp);
   1046    fclose(f);
   1047    return result;
   1048 }
   1049 
   1050 STBIDEF stbi_uc *stbi_load_from_file(FILE *f, int *x, int *y, int *comp, int req_comp)
   1051 {
   1052    unsigned char *result;
   1053    stbi__context s;
   1054    stbi__start_file(&s,f);
   1055    result = stbi__load_flip(&s,x,y,comp,req_comp);
   1056    if (result) {
   1057       // need to 'unget' all the characters in the IO buffer
   1058       fseek(f, - (int) (s.img_buffer_end - s.img_buffer), SEEK_CUR);
   1059    }
   1060    return result;
   1061 }
   1062 #endif //!STBI_NO_STDIO
   1063 
   1064 STBIDEF stbi_uc *stbi_load_from_memory(stbi_uc const *buffer, int len, int *x, int *y, int *comp, int req_comp)
   1065 {
   1066    stbi__context s;
   1067    stbi__start_mem(&s,buffer,len);
   1068    return stbi__load_flip(&s,x,y,comp,req_comp);
   1069 }
   1070 
   1071 STBIDEF stbi_uc *stbi_load_from_callbacks(stbi_io_callbacks const *clbk, void *user, int *x, int *y, int *comp, int req_comp)
   1072 {
   1073    stbi__context s;
   1074    stbi__start_callbacks(&s, (stbi_io_callbacks *) clbk, user);
   1075    return stbi__load_flip(&s,x,y,comp,req_comp);
   1076 }
   1077 
   1078 #ifndef STBI_NO_LINEAR
   1079 static float *stbi__loadf_main(stbi__context *s, int *x, int *y, int *comp, int req_comp)
   1080 {
   1081    unsigned char *data;
   1082    #ifndef STBI_NO_HDR
   1083    if (stbi__hdr_test(s)) {
   1084       float *hdr_data = stbi__hdr_load(s,x,y,comp,req_comp);
   1085       if (hdr_data)
   1086          stbi__float_postprocess(hdr_data,x,y,comp,req_comp);
   1087       return hdr_data;
   1088    }
   1089    #endif
   1090    data = stbi__load_flip(s, x, y, comp, req_comp);
   1091    if (data)
   1092       return stbi__ldr_to_hdr(data, *x, *y, req_comp ? req_comp : *comp);
   1093    return stbi__errpf("unknown image type", "Image not of any known type, or corrupt");
   1094 }
   1095 
   1096 STBIDEF float *stbi_loadf_from_memory(stbi_uc const *buffer, int len, int *x, int *y, int *comp, int req_comp)
   1097 {
   1098    stbi__context s;
   1099    stbi__start_mem(&s,buffer,len);
   1100    return stbi__loadf_main(&s,x,y,comp,req_comp);
   1101 }
   1102 
   1103 STBIDEF float *stbi_loadf_from_callbacks(stbi_io_callbacks const *clbk, void *user, int *x, int *y, int *comp, int req_comp)
   1104 {
   1105    stbi__context s;
   1106    stbi__start_callbacks(&s, (stbi_io_callbacks *) clbk, user);
   1107    return stbi__loadf_main(&s,x,y,comp,req_comp);
   1108 }
   1109 
   1110 #ifndef STBI_NO_STDIO
   1111 STBIDEF float *stbi_loadf(char const *filename, int *x, int *y, int *comp, int req_comp)
   1112 {
   1113    float *result;
   1114    FILE *f = stbi__fopen(filename, "rb");
   1115    if (!f) return stbi__errpf("can't fopen", "Unable to open file");
   1116    result = stbi_loadf_from_file(f,x,y,comp,req_comp);
   1117    fclose(f);
   1118    return result;
   1119 }
   1120 
   1121 STBIDEF float *stbi_loadf_from_file(FILE *f, int *x, int *y, int *comp, int req_comp)
   1122 {
   1123    stbi__context s;
   1124    stbi__start_file(&s,f);
   1125    return stbi__loadf_main(&s,x,y,comp,req_comp);
   1126 }
   1127 #endif // !STBI_NO_STDIO
   1128 
   1129 #endif // !STBI_NO_LINEAR
   1130 
   1131 // these is-hdr-or-not is defined independent of whether STBI_NO_LINEAR is
   1132 // defined, for API simplicity; if STBI_NO_LINEAR is defined, it always
   1133 // reports false!
   1134 
   1135 STBIDEF int stbi_is_hdr_from_memory(stbi_uc const *buffer, int len)
   1136 {
   1137    #ifndef STBI_NO_HDR
   1138    stbi__context s;
   1139    stbi__start_mem(&s,buffer,len);
   1140    return stbi__hdr_test(&s);
   1141    #else
   1142    STBI_NOTUSED(buffer);
   1143    STBI_NOTUSED(len);
   1144    return 0;
   1145    #endif
   1146 }
   1147 
   1148 #ifndef STBI_NO_STDIO
   1149 STBIDEF int      stbi_is_hdr          (char const *filename)
   1150 {
   1151    FILE *f = stbi__fopen(filename, "rb");
   1152    int result=0;
   1153    if (f) {
   1154       result = stbi_is_hdr_from_file(f);
   1155       fclose(f);
   1156    }
   1157    return result;
   1158 }
   1159 
   1160 STBIDEF int      stbi_is_hdr_from_file(FILE *f)
   1161 {
   1162    #ifndef STBI_NO_HDR
   1163    stbi__context s;
   1164    stbi__start_file(&s,f);
   1165    return stbi__hdr_test(&s);
   1166    #else
   1167    STBI_NOTUSED(f);
   1168    return 0;
   1169    #endif
   1170 }
   1171 #endif // !STBI_NO_STDIO
   1172 
   1173 STBIDEF int      stbi_is_hdr_from_callbacks(stbi_io_callbacks const *clbk, void *user)
   1174 {
   1175    #ifndef STBI_NO_HDR
   1176    stbi__context s;
   1177    stbi__start_callbacks(&s, (stbi_io_callbacks *) clbk, user);
   1178    return stbi__hdr_test(&s);
   1179    #else
   1180    STBI_NOTUSED(clbk);
   1181    STBI_NOTUSED(user);
   1182    return 0;
   1183    #endif
   1184 }
   1185 
   1186 #ifndef STBI_NO_LINEAR
   1187 static float stbi__l2h_gamma=2.2f, stbi__l2h_scale=1.0f;
   1188 
   1189 STBIDEF void   stbi_ldr_to_hdr_gamma(float gamma) { stbi__l2h_gamma = gamma; }
   1190 STBIDEF void   stbi_ldr_to_hdr_scale(float scale) { stbi__l2h_scale = scale; }
   1191 #endif
   1192 
   1193 static float stbi__h2l_gamma_i=1.0f/2.2f, stbi__h2l_scale_i=1.0f;
   1194 
   1195 STBIDEF void   stbi_hdr_to_ldr_gamma(float gamma) { stbi__h2l_gamma_i = 1/gamma; }
   1196 STBIDEF void   stbi_hdr_to_ldr_scale(float scale) { stbi__h2l_scale_i = 1/scale; }
   1197 
   1198 
   1199 //////////////////////////////////////////////////////////////////////////////
   1200 //
   1201 // Common code used by all image loaders
   1202 //
   1203 
   1204 enum
   1205 {
   1206    STBI__SCAN_load=0,
   1207    STBI__SCAN_type,
   1208    STBI__SCAN_header
   1209 };
   1210 
   1211 static void stbi__refill_buffer(stbi__context *s)
   1212 {
   1213    int n = (s->io.read)(s->io_user_data,(char*)s->buffer_start,s->buflen);
   1214    if (n == 0) {
   1215       // at end of file, treat same as if from memory, but need to handle case
   1216       // where s->img_buffer isn't pointing to safe memory, e.g. 0-byte file
   1217       s->read_from_callbacks = 0;
   1218       s->img_buffer = s->buffer_start;
   1219       s->img_buffer_end = s->buffer_start+1;
   1220       *s->img_buffer = 0;
   1221    } else {
   1222       s->img_buffer = s->buffer_start;
   1223       s->img_buffer_end = s->buffer_start + n;
   1224    }
   1225 }
   1226 
   1227 stbi_inline static stbi_uc stbi__get8(stbi__context *s)
   1228 {
   1229    if (s->img_buffer < s->img_buffer_end)
   1230       return *s->img_buffer++;
   1231    if (s->read_from_callbacks) {
   1232       stbi__refill_buffer(s);
   1233       return *s->img_buffer++;
   1234    }
   1235    return 0;
   1236 }
   1237 
   1238 stbi_inline static int stbi__at_eof(stbi__context *s)
   1239 {
   1240    if (s->io.read) {
   1241       if (!(s->io.eof)(s->io_user_data)) return 0;
   1242       // if feof() is true, check if buffer = end
   1243       // special case: we've only got the special 0 character at the end
   1244       if (s->read_from_callbacks == 0) return 1;
   1245    }
   1246 
   1247    return s->img_buffer >= s->img_buffer_end;
   1248 }
   1249 
   1250 static void stbi__skip(stbi__context *s, int n)
   1251 {
   1252    if (n < 0) {
   1253       s->img_buffer = s->img_buffer_end;
   1254       return;
   1255    }
   1256    if (s->io.read) {
   1257       int blen = (int) (s->img_buffer_end - s->img_buffer);
   1258       if (blen < n) {
   1259          s->img_buffer = s->img_buffer_end;
   1260          (s->io.skip)(s->io_user_data, n - blen);
   1261          return;
   1262       }
   1263    }
   1264    s->img_buffer += n;
   1265 }
   1266 
   1267 static int stbi__getn(stbi__context *s, stbi_uc *buffer, int n)
   1268 {
   1269    if (s->io.read) {
   1270       int blen = (int) (s->img_buffer_end - s->img_buffer);
   1271       if (blen < n) {
   1272          int res, count;
   1273 
   1274          memcpy(buffer, s->img_buffer, blen);
   1275 
   1276          count = (s->io.read)(s->io_user_data, (char*) buffer + blen, n - blen);
   1277          res = (count == (n-blen));
   1278          s->img_buffer = s->img_buffer_end;
   1279          return res;
   1280       }
   1281    }
   1282 
   1283    if (s->img_buffer+n <= s->img_buffer_end) {
   1284       memcpy(buffer, s->img_buffer, n);
   1285       s->img_buffer += n;
   1286       return 1;
   1287    } else
   1288       return 0;
   1289 }
   1290 
   1291 static int stbi__get16be(stbi__context *s)
   1292 {
   1293    int z = stbi__get8(s);
   1294    return (z << 8) + stbi__get8(s);
   1295 }
   1296 
   1297 static stbi__uint32 stbi__get32be(stbi__context *s)
   1298 {
   1299    stbi__uint32 z = stbi__get16be(s);
   1300    return (z << 16) + stbi__get16be(s);
   1301 }
   1302 
   1303 #if defined(STBI_NO_BMP) && defined(STBI_NO_TGA) && defined(STBI_NO_GIF)
   1304 // nothing
   1305 #else
   1306 static int stbi__get16le(stbi__context *s)
   1307 {
   1308    int z = stbi__get8(s);
   1309    return z + (stbi__get8(s) << 8);
   1310 }
   1311 #endif
   1312 
   1313 #ifndef STBI_NO_BMP
   1314 static stbi__uint32 stbi__get32le(stbi__context *s)
   1315 {
   1316    stbi__uint32 z = stbi__get16le(s);
   1317    return z + (stbi__get16le(s) << 16);
   1318 }
   1319 #endif
   1320 
   1321 #define STBI__BYTECAST(x)  ((stbi_uc) ((x) & 255))  // truncate int to byte without warnings
   1322 
   1323 
   1324 //////////////////////////////////////////////////////////////////////////////
   1325 //
   1326 //  generic converter from built-in img_n to req_comp
   1327 //    individual types do this automatically as much as possible (e.g. jpeg
   1328 //    does all cases internally since it needs to colorspace convert anyway,
   1329 //    and it never has alpha, so very few cases ). png can automatically
   1330 //    interleave an alpha=255 channel, but falls back to this for other cases
   1331 //
   1332 //  assume data buffer is malloced, so malloc a new one and free that one
   1333 //  only failure mode is malloc failing
   1334 
   1335 static stbi_uc stbi__compute_y(int r, int g, int b)
   1336 {
   1337    return (stbi_uc) (((r*77) + (g*150) +  (29*b)) >> 8);
   1338 }
   1339 
   1340 static unsigned char *stbi__convert_format(unsigned char *data, int img_n, int req_comp, unsigned int x, unsigned int y)
   1341 {
   1342    int i,j;
   1343    unsigned char *good;
   1344 
   1345    if (req_comp == img_n) return data;
   1346    STBI_ASSERT(req_comp >= 1 && req_comp <= 4);
   1347 
   1348    good = (unsigned char *) stbi__malloc(req_comp * x * y);
   1349    if (good == NULL) {
   1350       STBI_FREE(data);
   1351       return stbi__errpuc("outofmem", "Out of memory");
   1352    }
   1353 
   1354    for (j=0; j < (int) y; ++j) {
   1355       unsigned char *src  = data + j * x * img_n   ;
   1356       unsigned char *dest = good + j * x * req_comp;
   1357 
   1358       #define COMBO(a,b)  ((a)*8+(b))
   1359       #define CASE(a,b)   case COMBO(a,b): for(i=x-1; i >= 0; --i, src += a, dest += b)
   1360       // convert source image with img_n components to one with req_comp components;
   1361       // avoid switch per pixel, so use switch per scanline and massive macros
   1362       switch (COMBO(img_n, req_comp)) {
   1363          CASE(1,2) dest[0]=src[0], dest[1]=255; break;
   1364          CASE(1,3) dest[0]=dest[1]=dest[2]=src[0]; break;
   1365          CASE(1,4) dest[0]=dest[1]=dest[2]=src[0], dest[3]=255; break;
   1366          CASE(2,1) dest[0]=src[0]; break;
   1367          CASE(2,3) dest[0]=dest[1]=dest[2]=src[0]; break;
   1368          CASE(2,4) dest[0]=dest[1]=dest[2]=src[0], dest[3]=src[1]; break;
   1369          CASE(3,4) dest[0]=src[0],dest[1]=src[1],dest[2]=src[2],dest[3]=255; break;
   1370          CASE(3,1) dest[0]=stbi__compute_y(src[0],src[1],src[2]); break;
   1371          CASE(3,2) dest[0]=stbi__compute_y(src[0],src[1],src[2]), dest[1] = 255; break;
   1372          CASE(4,1) dest[0]=stbi__compute_y(src[0],src[1],src[2]); break;
   1373          CASE(4,2) dest[0]=stbi__compute_y(src[0],src[1],src[2]), dest[1] = src[3]; break;
   1374          CASE(4,3) dest[0]=src[0],dest[1]=src[1],dest[2]=src[2]; break;
   1375          default: STBI_ASSERT(0);
   1376       }
   1377       #undef CASE
   1378    }
   1379 
   1380    STBI_FREE(data);
   1381    return good;
   1382 }
   1383 
   1384 #ifndef STBI_NO_LINEAR
   1385 static float   *stbi__ldr_to_hdr(stbi_uc *data, int x, int y, int comp)
   1386 {
   1387    int i,k,n;
   1388    float *output = (float *) stbi__malloc(x * y * comp * sizeof(float));
   1389    if (output == NULL) { STBI_FREE(data); return stbi__errpf("outofmem", "Out of memory"); }
   1390    // compute number of non-alpha components
   1391    if (comp & 1) n = comp; else n = comp-1;
   1392    for (i=0; i < x*y; ++i) {
   1393       for (k=0; k < n; ++k) {
   1394          output[i*comp + k] = (float) (pow(data[i*comp+k]/255.0f, stbi__l2h_gamma) * stbi__l2h_scale);
   1395       }
   1396       if (k < comp) output[i*comp + k] = data[i*comp+k]/255.0f;
   1397    }
   1398    STBI_FREE(data);
   1399    return output;
   1400 }
   1401 #endif
   1402 
   1403 #ifndef STBI_NO_HDR
   1404 #define stbi__float2int(x)   ((int) (x))
   1405 static stbi_uc *stbi__hdr_to_ldr(float   *data, int x, int y, int comp)
   1406 {
   1407    int i,k,n;
   1408    stbi_uc *output = (stbi_uc *) stbi__malloc(x * y * comp);
   1409    if (output == NULL) { STBI_FREE(data); return stbi__errpuc("outofmem", "Out of memory"); }
   1410    // compute number of non-alpha components
   1411    if (comp & 1) n = comp; else n = comp-1;
   1412    for (i=0; i < x*y; ++i) {
   1413       for (k=0; k < n; ++k) {
   1414          float z = (float) pow(data[i*comp+k]*stbi__h2l_scale_i, stbi__h2l_gamma_i) * 255 + 0.5f;
   1415          if (z < 0) z = 0;
   1416          if (z > 255) z = 255;
   1417          output[i*comp + k] = (stbi_uc) stbi__float2int(z);
   1418       }
   1419       if (k < comp) {
   1420          float z = data[i*comp+k] * 255 + 0.5f;
   1421          if (z < 0) z = 0;
   1422          if (z > 255) z = 255;
   1423          output[i*comp + k] = (stbi_uc) stbi__float2int(z);
   1424       }
   1425    }
   1426    STBI_FREE(data);
   1427    return output;
   1428 }
   1429 #endif
   1430 
   1431 //////////////////////////////////////////////////////////////////////////////
   1432 //
   1433 //  "baseline" JPEG/JFIF decoder
   1434 //
   1435 //    simple implementation
   1436 //      - doesn't support delayed output of y-dimension
   1437 //      - simple interface (only one output format: 8-bit interleaved RGB)
   1438 //      - doesn't try to recover corrupt jpegs
   1439 //      - doesn't allow partial loading, loading multiple at once
   1440 //      - still fast on x86 (copying globals into locals doesn't help x86)
   1441 //      - allocates lots of intermediate memory (full size of all components)
   1442 //        - non-interleaved case requires this anyway
   1443 //        - allows good upsampling (see next)
   1444 //    high-quality
   1445 //      - upsampled channels are bilinearly interpolated, even across blocks
   1446 //      - quality integer IDCT derived from IJG's 'slow'
   1447 //    performance
   1448 //      - fast huffman; reasonable integer IDCT
   1449 //      - some SIMD kernels for common paths on targets with SSE2/NEON
   1450 //      - uses a lot of intermediate memory, could cache poorly
   1451 
   1452 #ifndef STBI_NO_JPEG
   1453 
   1454 // huffman decoding acceleration
   1455 #define FAST_BITS   9  // larger handles more cases; smaller stomps less cache
   1456 
   1457 typedef struct
   1458 {
   1459    stbi_uc  fast[1 << FAST_BITS];
   1460    // weirdly, repacking this into AoS is a 10% speed loss, instead of a win
   1461    stbi__uint16 code[256];
   1462    stbi_uc  values[256];
   1463    stbi_uc  size[257];
   1464    unsigned int maxcode[18];
   1465    int    delta[17];   // old 'firstsymbol' - old 'firstcode'
   1466 } stbi__huffman;
   1467 
   1468 typedef struct
   1469 {
   1470    stbi__context *s;
   1471    stbi__huffman huff_dc[4];
   1472    stbi__huffman huff_ac[4];
   1473    stbi_uc dequant[4][64];
   1474    stbi__int16 fast_ac[4][1 << FAST_BITS];
   1475 
   1476 // sizes for components, interleaved MCUs
   1477    int img_h_max, img_v_max;
   1478    int img_mcu_x, img_mcu_y;
   1479    int img_mcu_w, img_mcu_h;
   1480 
   1481 // definition of jpeg image component
   1482    struct
   1483    {
   1484       int id;
   1485       int h,v;
   1486       int tq;
   1487       int hd,ha;
   1488       int dc_pred;
   1489 
   1490       int x,y,w2,h2;
   1491       stbi_uc *data;
   1492       void *raw_data, *raw_coeff;
   1493       stbi_uc *linebuf;
   1494       short   *coeff;   // progressive only
   1495       int      coeff_w, coeff_h; // number of 8x8 coefficient blocks
   1496    } img_comp[4];
   1497 
   1498    stbi__uint32   code_buffer; // jpeg entropy-coded buffer
   1499    int            code_bits;   // number of valid bits
   1500    unsigned char  marker;      // marker seen while filling entropy buffer
   1501    int            nomore;      // flag if we saw a marker so must stop
   1502 
   1503    int            progressive;
   1504    int            spec_start;
   1505    int            spec_end;
   1506    int            succ_high;
   1507    int            succ_low;
   1508    int            eob_run;
   1509 
   1510    int scan_n, order[4];
   1511    int restart_interval, todo;
   1512 
   1513 // kernels
   1514    void (*idct_block_kernel)(stbi_uc *out, int out_stride, short data[64]);
   1515    void (*YCbCr_to_RGB_kernel)(stbi_uc *out, const stbi_uc *y, const stbi_uc *pcb, const stbi_uc *pcr, int count, int step);
   1516    stbi_uc *(*resample_row_hv_2_kernel)(stbi_uc *out, stbi_uc *in_near, stbi_uc *in_far, int w, int hs);
   1517 } stbi__jpeg;
   1518 
   1519 static int stbi__build_huffman(stbi__huffman *h, int *count)
   1520 {
   1521    int i,j,k=0,code;
   1522    // build size list for each symbol (from JPEG spec)
   1523    for (i=0; i < 16; ++i)
   1524       for (j=0; j < count[i]; ++j)
   1525          h->size[k++] = (stbi_uc) (i+1);
   1526    h->size[k] = 0;
   1527 
   1528    // compute actual symbols (from jpeg spec)
   1529    code = 0;
   1530    k = 0;
   1531    for(j=1; j <= 16; ++j) {
   1532       // compute delta to add to code to compute symbol id
   1533       h->delta[j] = k - code;
   1534       if (h->size[k] == j) {
   1535          while (h->size[k] == j)
   1536             h->code[k++] = (stbi__uint16) (code++);
   1537          if (code-1 >= (1 << j)) return stbi__err("bad code lengths","Corrupt JPEG");
   1538       }
   1539       // compute largest code + 1 for this size, preshifted as needed later
   1540       h->maxcode[j] = code << (16-j);
   1541       code <<= 1;
   1542    }
   1543    h->maxcode[j] = 0xffffffff;
   1544 
   1545    // build non-spec acceleration table; 255 is flag for not-accelerated
   1546    memset(h->fast, 255, 1 << FAST_BITS);
   1547    for (i=0; i < k; ++i) {
   1548       int s = h->size[i];
   1549       if (s <= FAST_BITS) {
   1550          int c = h->code[i] << (FAST_BITS-s);
   1551          int m = 1 << (FAST_BITS-s);
   1552          for (j=0; j < m; ++j) {
   1553             h->fast[c+j] = (stbi_uc) i;
   1554          }
   1555       }
   1556    }
   1557    return 1;
   1558 }
   1559 
   1560 // build a table that decodes both magnitude and value of small ACs in
   1561 // one go.
   1562 static void stbi__build_fast_ac(stbi__int16 *fast_ac, stbi__huffman *h)
   1563 {
   1564    int i;
   1565    for (i=0; i < (1 << FAST_BITS); ++i) {
   1566       stbi_uc fast = h->fast[i];
   1567       fast_ac[i] = 0;
   1568       if (fast < 255) {
   1569          int rs = h->values[fast];
   1570          int run = (rs >> 4) & 15;
   1571          int magbits = rs & 15;
   1572          int len = h->size[fast];
   1573 
   1574          if (magbits && len + magbits <= FAST_BITS) {
   1575             // magnitude code followed by receive_extend code
   1576             int k = ((i << len) & ((1 << FAST_BITS) - 1)) >> (FAST_BITS - magbits);
   1577             int m = 1 << (magbits - 1);
   1578             if (k < m) k += (-1 << magbits) + 1;
   1579             // if the result is small enough, we can fit it in fast_ac table
   1580             if (k >= -128 && k <= 127)
   1581                fast_ac[i] = (stbi__int16) ((k << 8) + (run << 4) + (len + magbits));
   1582          }
   1583       }
   1584    }
   1585 }
   1586 
   1587 static void stbi__grow_buffer_unsafe(stbi__jpeg *j)
   1588 {
   1589    do {
   1590       int b = j->nomore ? 0 : stbi__get8(j->s);
   1591       if (b == 0xff) {
   1592          int c = stbi__get8(j->s);
   1593          if (c != 0) {
   1594             j->marker = (unsigned char) c;
   1595             j->nomore = 1;
   1596             return;
   1597          }
   1598       }
   1599       j->code_buffer |= b << (24 - j->code_bits);
   1600       j->code_bits += 8;
   1601    } while (j->code_bits <= 24);
   1602 }
   1603 
   1604 // (1 << n) - 1
   1605 static stbi__uint32 stbi__bmask[17]={0,1,3,7,15,31,63,127,255,511,1023,2047,4095,8191,16383,32767,65535};
   1606 
   1607 // decode a jpeg huffman value from the bitstream
   1608 stbi_inline static int stbi__jpeg_huff_decode(stbi__jpeg *j, stbi__huffman *h)
   1609 {
   1610    unsigned int temp;
   1611    int c,k;
   1612 
   1613    if (j->code_bits < 16) stbi__grow_buffer_unsafe(j);
   1614 
   1615    // look at the top FAST_BITS and determine what symbol ID it is,
   1616    // if the code is <= FAST_BITS
   1617    c = (j->code_buffer >> (32 - FAST_BITS)) & ((1 << FAST_BITS)-1);
   1618    k = h->fast[c];
   1619    if (k < 255) {
   1620       int s = h->size[k];
   1621       if (s > j->code_bits)
   1622          return -1;
   1623       j->code_buffer <<= s;
   1624       j->code_bits -= s;
   1625       return h->values[k];
   1626    }
   1627 
   1628    // naive test is to shift the code_buffer down so k bits are
   1629    // valid, then test against maxcode. To speed this up, we've
   1630    // preshifted maxcode left so that it has (16-k) 0s at the
   1631    // end; in other words, regardless of the number of bits, it
   1632    // wants to be compared against something shifted to have 16;
   1633    // that way we don't need to shift inside the loop.
   1634    temp = j->code_buffer >> 16;
   1635    for (k=FAST_BITS+1 ; ; ++k)
   1636       if (temp < h->maxcode[k])
   1637          break;
   1638    if (k == 17) {
   1639       // error! code not found
   1640       j->code_bits -= 16;
   1641       return -1;
   1642    }
   1643 
   1644    if (k > j->code_bits)
   1645       return -1;
   1646 
   1647    // convert the huffman code to the symbol id
   1648    c = ((j->code_buffer >> (32 - k)) & stbi__bmask[k]) + h->delta[k];
   1649    STBI_ASSERT((((j->code_buffer) >> (32 - h->size[c])) & stbi__bmask[h->size[c]]) == h->code[c]);
   1650 
   1651    // convert the id to a symbol
   1652    j->code_bits -= k;
   1653    j->code_buffer <<= k;
   1654    return h->values[c];
   1655 }
   1656 
   1657 // bias[n] = (-1<<n) + 1
   1658 static int const stbi__jbias[16] = {0,-1,-3,-7,-15,-31,-63,-127,-255,-511,-1023,-2047,-4095,-8191,-16383,-32767};
   1659 
   1660 // combined JPEG 'receive' and JPEG 'extend', since baseline
   1661 // always extends everything it receives.
   1662 stbi_inline static int stbi__extend_receive(stbi__jpeg *j, int n)
   1663 {
   1664    unsigned int k;
   1665    int sgn;
   1666    if (j->code_bits < n) stbi__grow_buffer_unsafe(j);
   1667 
   1668    sgn = (stbi__int32)j->code_buffer >> 31; // sign bit is always in MSB
   1669    k = stbi_lrot(j->code_buffer, n);
   1670    STBI_ASSERT(n >= 0 && n < (int) (sizeof(stbi__bmask)/sizeof(*stbi__bmask)));
   1671    j->code_buffer = k & ~stbi__bmask[n];
   1672    k &= stbi__bmask[n];
   1673    j->code_bits -= n;
   1674    return k + (stbi__jbias[n] & ~sgn);
   1675 }
   1676 
   1677 // get some unsigned bits
   1678 stbi_inline static int stbi__jpeg_get_bits(stbi__jpeg *j, int n)
   1679 {
   1680    unsigned int k;
   1681    if (j->code_bits < n) stbi__grow_buffer_unsafe(j);
   1682    k = stbi_lrot(j->code_buffer, n);
   1683    j->code_buffer = k & ~stbi__bmask[n];
   1684    k &= stbi__bmask[n];
   1685    j->code_bits -= n;
   1686    return k;
   1687 }
   1688 
   1689 stbi_inline static int stbi__jpeg_get_bit(stbi__jpeg *j)
   1690 {
   1691    unsigned int k;
   1692    if (j->code_bits < 1) stbi__grow_buffer_unsafe(j);
   1693    k = j->code_buffer;
   1694    j->code_buffer <<= 1;
   1695    --j->code_bits;
   1696    return k & 0x80000000;
   1697 }
   1698 
   1699 // given a value that's at position X in the zigzag stream,
   1700 // where does it appear in the 8x8 matrix coded as row-major?
   1701 static stbi_uc stbi__jpeg_dezigzag[64+15] =
   1702 {
   1703     0,  1,  8, 16,  9,  2,  3, 10,
   1704    17, 24, 32, 25, 18, 11,  4,  5,
   1705    12, 19, 26, 33, 40, 48, 41, 34,
   1706    27, 20, 13,  6,  7, 14, 21, 28,
   1707    35, 42, 49, 56, 57, 50, 43, 36,
   1708    29, 22, 15, 23, 30, 37, 44, 51,
   1709    58, 59, 52, 45, 38, 31, 39, 46,
   1710    53, 60, 61, 54, 47, 55, 62, 63,
   1711    // let corrupt input sample past end
   1712    63, 63, 63, 63, 63, 63, 63, 63,
   1713    63, 63, 63, 63, 63, 63, 63
   1714 };
   1715 
   1716 // decode one 64-entry block--
   1717 static int stbi__jpeg_decode_block(stbi__jpeg *j, short data[64], stbi__huffman *hdc, stbi__huffman *hac, stbi__int16 *fac, int b, stbi_uc *dequant)
   1718 {
   1719    int diff,dc,k;
   1720    int t;
   1721 
   1722    if (j->code_bits < 16) stbi__grow_buffer_unsafe(j);
   1723    t = stbi__jpeg_huff_decode(j, hdc);
   1724    if (t < 0) return stbi__err("bad huffman code","Corrupt JPEG");
   1725 
   1726    // 0 all the ac values now so we can do it 32-bits at a time
   1727    memset(data,0,64*sizeof(data[0]));
   1728 
   1729    diff = t ? stbi__extend_receive(j, t) : 0;
   1730    dc = j->img_comp[b].dc_pred + diff;
   1731    j->img_comp[b].dc_pred = dc;
   1732    data[0] = (short) (dc * dequant[0]);
   1733 
   1734    // decode AC components, see JPEG spec
   1735    k = 1;
   1736    do {
   1737       unsigned int zig;
   1738       int c,r,s;
   1739       if (j->code_bits < 16) stbi__grow_buffer_unsafe(j);
   1740       c = (j->code_buffer >> (32 - FAST_BITS)) & ((1 << FAST_BITS)-1);
   1741       r = fac[c];
   1742       if (r) { // fast-AC path
   1743          k += (r >> 4) & 15; // run
   1744          s = r & 15; // combined length
   1745          j->code_buffer <<= s;
   1746          j->code_bits -= s;
   1747          // decode into unzigzag'd location
   1748          zig = stbi__jpeg_dezigzag[k++];
   1749          data[zig] = (short) ((r >> 8) * dequant[zig]);
   1750       } else {
   1751          int rs = stbi__jpeg_huff_decode(j, hac);
   1752          if (rs < 0) return stbi__err("bad huffman code","Corrupt JPEG");
   1753          s = rs & 15;
   1754          r = rs >> 4;
   1755          if (s == 0) {
   1756             if (rs != 0xf0) break; // end block
   1757             k += 16;
   1758          } else {
   1759             k += r;
   1760             // decode into unzigzag'd location
   1761             zig = stbi__jpeg_dezigzag[k++];
   1762             data[zig] = (short) (stbi__extend_receive(j,s) * dequant[zig]);
   1763          }
   1764       }
   1765    } while (k < 64);
   1766    return 1;
   1767 }
   1768 
   1769 static int stbi__jpeg_decode_block_prog_dc(stbi__jpeg *j, short data[64], stbi__huffman *hdc, int b)
   1770 {
   1771    int diff,dc;
   1772    int t;
   1773    if (j->spec_end != 0) return stbi__err("can't merge dc and ac", "Corrupt JPEG");
   1774 
   1775    if (j->code_bits < 16) stbi__grow_buffer_unsafe(j);
   1776 
   1777    if (j->succ_high == 0) {
   1778       // first scan for DC coefficient, must be first
   1779       memset(data,0,64*sizeof(data[0])); // 0 all the ac values now
   1780       t = stbi__jpeg_huff_decode(j, hdc);
   1781       diff = t ? stbi__extend_receive(j, t) : 0;
   1782 
   1783       dc = j->img_comp[b].dc_pred + diff;
   1784       j->img_comp[b].dc_pred = dc;
   1785       data[0] = (short) (dc << j->succ_low);
   1786    } else {
   1787       // refinement scan for DC coefficient
   1788       if (stbi__jpeg_get_bit(j))
   1789          data[0] += (short) (1 << j->succ_low);
   1790    }
   1791    return 1;
   1792 }
   1793 
   1794 // @OPTIMIZE: store non-zigzagged during the decode passes,
   1795 // and only de-zigzag when dequantizing
   1796 static int stbi__jpeg_decode_block_prog_ac(stbi__jpeg *j, short data[64], stbi__huffman *hac, stbi__int16 *fac)
   1797 {
   1798    int k;
   1799    if (j->spec_start == 0) return stbi__err("can't merge dc and ac", "Corrupt JPEG");
   1800 
   1801    if (j->succ_high == 0) {
   1802       int shift = j->succ_low;
   1803 
   1804       if (j->eob_run) {
   1805          --j->eob_run;
   1806          return 1;
   1807       }
   1808 
   1809       k = j->spec_start;
   1810       do {
   1811          unsigned int zig;
   1812          int c,r,s;
   1813          if (j->code_bits < 16) stbi__grow_buffer_unsafe(j);
   1814          c = (j->code_buffer >> (32 - FAST_BITS)) & ((1 << FAST_BITS)-1);
   1815          r = fac[c];
   1816          if (r) { // fast-AC path
   1817             k += (r >> 4) & 15; // run
   1818             s = r & 15; // combined length
   1819             j->code_buffer <<= s;
   1820             j->code_bits -= s;
   1821             zig = stbi__jpeg_dezigzag[k++];
   1822             data[zig] = (short) ((r >> 8) << shift);
   1823          } else {
   1824             int rs = stbi__jpeg_huff_decode(j, hac);
   1825             if (rs < 0) return stbi__err("bad huffman code","Corrupt JPEG");
   1826             s = rs & 15;
   1827             r = rs >> 4;
   1828             if (s == 0) {
   1829                if (r < 15) {
   1830                   j->eob_run = (1 << r);
   1831                   if (r)
   1832                      j->eob_run += stbi__jpeg_get_bits(j, r);
   1833                   --j->eob_run;
   1834                   break;
   1835                }
   1836                k += 16;
   1837             } else {
   1838                k += r;
   1839                zig = stbi__jpeg_dezigzag[k++];
   1840                data[zig] = (short) (stbi__extend_receive(j,s) << shift);
   1841             }
   1842          }
   1843       } while (k <= j->spec_end);
   1844    } else {
   1845       // refinement scan for these AC coefficients
   1846 
   1847       short bit = (short) (1 << j->succ_low);
   1848 
   1849       if (j->eob_run) {
   1850          --j->eob_run;
   1851          for (k = j->spec_start; k <= j->spec_end; ++k) {
   1852             short *p = &data[stbi__jpeg_dezigzag[k]];
   1853             if (*p != 0)
   1854                if (stbi__jpeg_get_bit(j))
   1855                   if ((*p & bit)==0) {
   1856                      if (*p > 0)
   1857                         *p += bit;
   1858                      else
   1859                         *p -= bit;
   1860                   }
   1861          }
   1862       } else {
   1863          k = j->spec_start;
   1864          do {
   1865             int r,s;
   1866             int rs = stbi__jpeg_huff_decode(j, hac); // @OPTIMIZE see if we can use the fast path here, advance-by-r is so slow, eh
   1867             if (rs < 0) return stbi__err("bad huffman code","Corrupt JPEG");
   1868             s = rs & 15;
   1869             r = rs >> 4;
   1870             if (s == 0) {
   1871                if (r < 15) {
   1872                   j->eob_run = (1 << r) - 1;
   1873                   if (r)
   1874                      j->eob_run += stbi__jpeg_get_bits(j, r);
   1875                   r = 64; // force end of block
   1876                } else {
   1877                   // r=15 s=0 should write 16 0s, so we just do
   1878                   // a run of 15 0s and then write s (which is 0),
   1879                   // so we don't have to do anything special here
   1880                }
   1881             } else {
   1882                if (s != 1) return stbi__err("bad huffman code", "Corrupt JPEG");
   1883                // sign bit
   1884                if (stbi__jpeg_get_bit(j))
   1885                   s = bit;
   1886                else
   1887                   s = -bit;
   1888             }
   1889 
   1890             // advance by r
   1891             while (k <= j->spec_end) {
   1892                short *p = &data[stbi__jpeg_dezigzag[k++]];
   1893                if (*p != 0) {
   1894                   if (stbi__jpeg_get_bit(j))
   1895                      if ((*p & bit)==0) {
   1896                         if (*p > 0)
   1897                            *p += bit;
   1898                         else
   1899                            *p -= bit;
   1900                      }
   1901                } else {
   1902                   if (r == 0) {
   1903                      *p = (short) s;
   1904                      break;
   1905                   }
   1906                   --r;
   1907                }
   1908             }
   1909          } while (k <= j->spec_end);
   1910       }
   1911    }
   1912    return 1;
   1913 }
   1914 
   1915 // take a -128..127 value and stbi__clamp it and convert to 0..255
   1916 stbi_inline static stbi_uc stbi__clamp(int x)
   1917 {
   1918    // trick to use a single test to catch both cases
   1919    if ((unsigned int) x > 255) {
   1920       if (x < 0) return 0;
   1921       if (x > 255) return 255;
   1922    }
   1923    return (stbi_uc) x;
   1924 }
   1925 
   1926 #define stbi__f2f(x)  ((int) (((x) * 4096 + 0.5)))
   1927 #define stbi__fsh(x)  ((x) << 12)
   1928 
   1929 // derived from jidctint -- DCT_ISLOW
   1930 #define STBI__IDCT_1D(s0,s1,s2,s3,s4,s5,s6,s7) \
   1931    int t0,t1,t2,t3,p1,p2,p3,p4,p5,x0,x1,x2,x3; \
   1932    p2 = s2;                                    \
   1933    p3 = s6;                                    \
   1934    p1 = (p2+p3) * stbi__f2f(0.5411961f);       \
   1935    t2 = p1 + p3*stbi__f2f(-1.847759065f);      \
   1936    t3 = p1 + p2*stbi__f2f( 0.765366865f);      \
   1937    p2 = s0;                                    \
   1938    p3 = s4;                                    \
   1939    t0 = stbi__fsh(p2+p3);                      \
   1940    t1 = stbi__fsh(p2-p3);                      \
   1941    x0 = t0+t3;                                 \
   1942    x3 = t0-t3;                                 \
   1943    x1 = t1+t2;                                 \
   1944    x2 = t1-t2;                                 \
   1945    t0 = s7;                                    \
   1946    t1 = s5;                                    \
   1947    t2 = s3;                                    \
   1948    t3 = s1;                                    \
   1949    p3 = t0+t2;                                 \
   1950    p4 = t1+t3;                                 \
   1951    p1 = t0+t3;                                 \
   1952    p2 = t1+t2;                                 \
   1953    p5 = (p3+p4)*stbi__f2f( 1.175875602f);      \
   1954    t0 = t0*stbi__f2f( 0.298631336f);           \
   1955    t1 = t1*stbi__f2f( 2.053119869f);           \
   1956    t2 = t2*stbi__f2f( 3.072711026f);           \
   1957    t3 = t3*stbi__f2f( 1.501321110f);           \
   1958    p1 = p5 + p1*stbi__f2f(-0.899976223f);      \
   1959    p2 = p5 + p2*stbi__f2f(-2.562915447f);      \
   1960    p3 = p3*stbi__f2f(-1.961570560f);           \
   1961    p4 = p4*stbi__f2f(-0.390180644f);           \
   1962    t3 += p1+p4;                                \
   1963    t2 += p2+p3;                                \
   1964    t1 += p2+p4;                                \
   1965    t0 += p1+p3;
   1966 
   1967 static void stbi__idct_block(stbi_uc *out, int out_stride, short data[64])
   1968 {
   1969    int i,val[64],*v=val;
   1970    stbi_uc *o;
   1971    short *d = data;
   1972 
   1973    // columns
   1974    for (i=0; i < 8; ++i,++d, ++v) {
   1975       // if all zeroes, shortcut -- this avoids dequantizing 0s and IDCTing
   1976       if (d[ 8]==0 && d[16]==0 && d[24]==0 && d[32]==0
   1977            && d[40]==0 && d[48]==0 && d[56]==0) {
   1978          //    no shortcut                 0     seconds
   1979          //    (1|2|3|4|5|6|7)==0          0     seconds
   1980          //    all separate               -0.047 seconds
   1981          //    1 && 2|3 && 4|5 && 6|7:    -0.047 seconds
   1982          int dcterm = d[0] << 2;
   1983          v[0] = v[8] = v[16] = v[24] = v[32] = v[40] = v[48] = v[56] = dcterm;
   1984       } else {
   1985          STBI__IDCT_1D(d[ 0],d[ 8],d[16],d[24],d[32],d[40],d[48],d[56])
   1986          // constants scaled things up by 1<<12; let's bring them back
   1987          // down, but keep 2 extra bits of precision
   1988          x0 += 512; x1 += 512; x2 += 512; x3 += 512;
   1989          v[ 0] = (x0+t3) >> 10;
   1990          v[56] = (x0-t3) >> 10;
   1991          v[ 8] = (x1+t2) >> 10;
   1992          v[48] = (x1-t2) >> 10;
   1993          v[16] = (x2+t1) >> 10;
   1994          v[40] = (x2-t1) >> 10;
   1995          v[24] = (x3+t0) >> 10;
   1996          v[32] = (x3-t0) >> 10;
   1997       }
   1998    }
   1999 
   2000    for (i=0, v=val, o=out; i < 8; ++i,v+=8,o+=out_stride) {
   2001       // no fast case since the first 1D IDCT spread components out
   2002       STBI__IDCT_1D(v[0],v[1],v[2],v[3],v[4],v[5],v[6],v[7])
   2003       // constants scaled things up by 1<<12, plus we had 1<<2 from first
   2004       // loop, plus horizontal and vertical each scale by sqrt(8) so together
   2005       // we've got an extra 1<<3, so 1<<17 total we need to remove.
   2006       // so we want to round that, which means adding 0.5 * 1<<17,
   2007       // aka 65536. Also, we'll end up with -128 to 127 that we want
   2008       // to encode as 0..255 by adding 128, so we'll add that before the shift
   2009       x0 += 65536 + (128<<17);
   2010       x1 += 65536 + (128<<17);
   2011       x2 += 65536 + (128<<17);
   2012       x3 += 65536 + (128<<17);
   2013       // tried computing the shifts into temps, or'ing the temps to see
   2014       // if any were out of range, but that was slower
   2015       o[0] = stbi__clamp((x0+t3) >> 17);
   2016       o[7] = stbi__clamp((x0-t3) >> 17);
   2017       o[1] = stbi__clamp((x1+t2) >> 17);
   2018       o[6] = stbi__clamp((x1-t2) >> 17);
   2019       o[2] = stbi__clamp((x2+t1) >> 17);
   2020       o[5] = stbi__clamp((x2-t1) >> 17);
   2021       o[3] = stbi__clamp((x3+t0) >> 17);
   2022       o[4] = stbi__clamp((x3-t0) >> 17);
   2023    }
   2024 }
   2025 
   2026 #ifdef STBI_SSE2
   2027 // sse2 integer IDCT. not the fastest possible implementation but it
   2028 // produces bit-identical results to the generic C version so it's
   2029 // fully "transparent".
   2030 static void stbi__idct_simd(stbi_uc *out, int out_stride, short data[64])
   2031 {
   2032    // This is constructed to match our regular (generic) integer IDCT exactly.
   2033    __m128i row0, row1, row2, row3, row4, row5, row6, row7;
   2034    __m128i tmp;
   2035 
   2036    // dot product constant: even elems=x, odd elems=y
   2037    #define dct_const(x,y)  _mm_setr_epi16((x),(y),(x),(y),(x),(y),(x),(y))
   2038 
   2039    // out(0) = c0[even]*x + c0[odd]*y   (c0, x, y 16-bit, out 32-bit)
   2040    // out(1) = c1[even]*x + c1[odd]*y
   2041    #define dct_rot(out0,out1, x,y,c0,c1) \
   2042       __m128i c0##lo = _mm_unpacklo_epi16((x),(y)); \
   2043       __m128i c0##hi = _mm_unpackhi_epi16((x),(y)); \
   2044       __m128i out0##_l = _mm_madd_epi16(c0##lo, c0); \
   2045       __m128i out0##_h = _mm_madd_epi16(c0##hi, c0); \
   2046       __m128i out1##_l = _mm_madd_epi16(c0##lo, c1); \
   2047       __m128i out1##_h = _mm_madd_epi16(c0##hi, c1)
   2048 
   2049    // out = in << 12  (in 16-bit, out 32-bit)
   2050    #define dct_widen(out, in) \
   2051       __m128i out##_l = _mm_srai_epi32(_mm_unpacklo_epi16(_mm_setzero_si128(), (in)), 4); \
   2052       __m128i out##_h = _mm_srai_epi32(_mm_unpackhi_epi16(_mm_setzero_si128(), (in)), 4)
   2053 
   2054    // wide add
   2055    #define dct_wadd(out, a, b) \
   2056       __m128i out##_l = _mm_add_epi32(a##_l, b##_l); \
   2057       __m128i out##_h = _mm_add_epi32(a##_h, b##_h)
   2058 
   2059    // wide sub
   2060    #define dct_wsub(out, a, b) \
   2061       __m128i out##_l = _mm_sub_epi32(a##_l, b##_l); \
   2062       __m128i out##_h = _mm_sub_epi32(a##_h, b##_h)
   2063 
   2064    // butterfly a/b, add bias, then shift by "s" and pack
   2065    #define dct_bfly32o(out0, out1, a,b,bias,s) \
   2066       { \
   2067          __m128i abiased_l = _mm_add_epi32(a##_l, bias); \
   2068          __m128i abiased_h = _mm_add_epi32(a##_h, bias); \
   2069          dct_wadd(sum, abiased, b); \
   2070          dct_wsub(dif, abiased, b); \
   2071          out0 = _mm_packs_epi32(_mm_srai_epi32(sum_l, s), _mm_srai_epi32(sum_h, s)); \
   2072          out1 = _mm_packs_epi32(_mm_srai_epi32(dif_l, s), _mm_srai_epi32(dif_h, s)); \
   2073       }
   2074 
   2075    // 8-bit interleave step (for transposes)
   2076    #define dct_interleave8(a, b) \
   2077       tmp = a; \
   2078       a = _mm_unpacklo_epi8(a, b); \
   2079       b = _mm_unpackhi_epi8(tmp, b)
   2080 
   2081    // 16-bit interleave step (for transposes)
   2082    #define dct_interleave16(a, b) \
   2083       tmp = a; \
   2084       a = _mm_unpacklo_epi16(a, b); \
   2085       b = _mm_unpackhi_epi16(tmp, b)
   2086 
   2087    #define dct_pass(bias,shift) \
   2088       { \
   2089          /* even part */ \
   2090          dct_rot(t2e,t3e, row2,row6, rot0_0,rot0_1); \
   2091          __m128i sum04 = _mm_add_epi16(row0, row4); \
   2092          __m128i dif04 = _mm_sub_epi16(row0, row4); \
   2093          dct_widen(t0e, sum04); \
   2094          dct_widen(t1e, dif04); \
   2095          dct_wadd(x0, t0e, t3e); \
   2096          dct_wsub(x3, t0e, t3e); \
   2097          dct_wadd(x1, t1e, t2e); \
   2098          dct_wsub(x2, t1e, t2e); \
   2099          /* odd part */ \
   2100          dct_rot(y0o,y2o, row7,row3, rot2_0,rot2_1); \
   2101          dct_rot(y1o,y3o, row5,row1, rot3_0,rot3_1); \
   2102          __m128i sum17 = _mm_add_epi16(row1, row7); \
   2103          __m128i sum35 = _mm_add_epi16(row3, row5); \
   2104          dct_rot(y4o,y5o, sum17,sum35, rot1_0,rot1_1); \
   2105          dct_wadd(x4, y0o, y4o); \
   2106          dct_wadd(x5, y1o, y5o); \
   2107          dct_wadd(x6, y2o, y5o); \
   2108          dct_wadd(x7, y3o, y4o); \
   2109          dct_bfly32o(row0,row7, x0,x7,bias,shift); \
   2110          dct_bfly32o(row1,row6, x1,x6,bias,shift); \
   2111          dct_bfly32o(row2,row5, x2,x5,bias,shift); \
   2112          dct_bfly32o(row3,row4, x3,x4,bias,shift); \
   2113       }
   2114 
   2115    __m128i rot0_0 = dct_const(stbi__f2f(0.5411961f), stbi__f2f(0.5411961f) + stbi__f2f(-1.847759065f));
   2116    __m128i rot0_1 = dct_const(stbi__f2f(0.5411961f) + stbi__f2f( 0.765366865f), stbi__f2f(0.5411961f));
   2117    __m128i rot1_0 = dct_const(stbi__f2f(1.175875602f) + stbi__f2f(-0.899976223f), stbi__f2f(1.175875602f));
   2118    __m128i rot1_1 = dct_const(stbi__f2f(1.175875602f), stbi__f2f(1.175875602f) + stbi__f2f(-2.562915447f));
   2119    __m128i rot2_0 = dct_const(stbi__f2f(-1.961570560f) + stbi__f2f( 0.298631336f), stbi__f2f(-1.961570560f));
   2120    __m128i rot2_1 = dct_const(stbi__f2f(-1.961570560f), stbi__f2f(-1.961570560f) + stbi__f2f( 3.072711026f));
   2121    __m128i rot3_0 = dct_const(stbi__f2f(-0.390180644f) + stbi__f2f( 2.053119869f), stbi__f2f(-0.390180644f));
   2122    __m128i rot3_1 = dct_const(stbi__f2f(-0.390180644f), stbi__f2f(-0.390180644f) + stbi__f2f( 1.501321110f));
   2123 
   2124    // rounding biases in column/row passes, see stbi__idct_block for explanation.
   2125    __m128i bias_0 = _mm_set1_epi32(512);
   2126    __m128i bias_1 = _mm_set1_epi32(65536 + (128<<17));
   2127 
   2128    // load
   2129    row0 = _mm_load_si128((const __m128i *) (data + 0*8));
   2130    row1 = _mm_load_si128((const __m128i *) (data + 1*8));
   2131    row2 = _mm_load_si128((const __m128i *) (data + 2*8));
   2132    row3 = _mm_load_si128((const __m128i *) (data + 3*8));
   2133    row4 = _mm_load_si128((const __m128i *) (data + 4*8));
   2134    row5 = _mm_load_si128((const __m128i *) (data + 5*8));
   2135    row6 = _mm_load_si128((const __m128i *) (data + 6*8));
   2136    row7 = _mm_load_si128((const __m128i *) (data + 7*8));
   2137 
   2138    // column pass
   2139    dct_pass(bias_0, 10);
   2140 
   2141    {
   2142       // 16bit 8x8 transpose pass 1
   2143       dct_interleave16(row0, row4);
   2144       dct_interleave16(row1, row5);
   2145       dct_interleave16(row2, row6);
   2146       dct_interleave16(row3, row7);
   2147 
   2148       // transpose pass 2
   2149       dct_interleave16(row0, row2);
   2150       dct_interleave16(row1, row3);
   2151       dct_interleave16(row4, row6);
   2152       dct_interleave16(row5, row7);
   2153 
   2154       // transpose pass 3
   2155       dct_interleave16(row0, row1);
   2156       dct_interleave16(row2, row3);
   2157       dct_interleave16(row4, row5);
   2158       dct_interleave16(row6, row7);
   2159    }
   2160 
   2161    // row pass
   2162    dct_pass(bias_1, 17);
   2163 
   2164    {
   2165       // pack
   2166       __m128i p0 = _mm_packus_epi16(row0, row1); // a0a1a2a3...a7b0b1b2b3...b7
   2167       __m128i p1 = _mm_packus_epi16(row2, row3);
   2168       __m128i p2 = _mm_packus_epi16(row4, row5);
   2169       __m128i p3 = _mm_packus_epi16(row6, row7);
   2170 
   2171       // 8bit 8x8 transpose pass 1
   2172       dct_interleave8(p0, p2); // a0e0a1e1...
   2173       dct_interleave8(p1, p3); // c0g0c1g1...
   2174 
   2175       // transpose pass 2
   2176       dct_interleave8(p0, p1); // a0c0e0g0...
   2177       dct_interleave8(p2, p3); // b0d0f0h0...
   2178 
   2179       // transpose pass 3
   2180       dct_interleave8(p0, p2); // a0b0c0d0...
   2181       dct_interleave8(p1, p3); // a4b4c4d4...
   2182 
   2183       // store
   2184       _mm_storel_epi64((__m128i *) out, p0); out += out_stride;
   2185       _mm_storel_epi64((__m128i *) out, _mm_shuffle_epi32(p0, 0x4e)); out += out_stride;
   2186       _mm_storel_epi64((__m128i *) out, p2); out += out_stride;
   2187       _mm_storel_epi64((__m128i *) out, _mm_shuffle_epi32(p2, 0x4e)); out += out_stride;
   2188       _mm_storel_epi64((__m128i *) out, p1); out += out_stride;
   2189       _mm_storel_epi64((__m128i *) out, _mm_shuffle_epi32(p1, 0x4e)); out += out_stride;
   2190       _mm_storel_epi64((__m128i *) out, p3); out += out_stride;
   2191       _mm_storel_epi64((__m128i *) out, _mm_shuffle_epi32(p3, 0x4e));
   2192    }
   2193 
   2194 #undef dct_const
   2195 #undef dct_rot
   2196 #undef dct_widen
   2197 #undef dct_wadd
   2198 #undef dct_wsub
   2199 #undef dct_bfly32o
   2200 #undef dct_interleave8
   2201 #undef dct_interleave16
   2202 #undef dct_pass
   2203 }
   2204 
   2205 #endif // STBI_SSE2
   2206 
   2207 #ifdef STBI_NEON
   2208 
   2209 // NEON integer IDCT. should produce bit-identical
   2210 // results to the generic C version.
   2211 static void stbi__idct_simd(stbi_uc *out, int out_stride, short data[64])
   2212 {
   2213    int16x8_t row0, row1, row2, row3, row4, row5, row6, row7;
   2214 
   2215    int16x4_t rot0_0 = vdup_n_s16(stbi__f2f(0.5411961f));
   2216    int16x4_t rot0_1 = vdup_n_s16(stbi__f2f(-1.847759065f));
   2217    int16x4_t rot0_2 = vdup_n_s16(stbi__f2f( 0.765366865f));
   2218    int16x4_t rot1_0 = vdup_n_s16(stbi__f2f( 1.175875602f));
   2219    int16x4_t rot1_1 = vdup_n_s16(stbi__f2f(-0.899976223f));
   2220    int16x4_t rot1_2 = vdup_n_s16(stbi__f2f(-2.562915447f));
   2221    int16x4_t rot2_0 = vdup_n_s16(stbi__f2f(-1.961570560f));
   2222    int16x4_t rot2_1 = vdup_n_s16(stbi__f2f(-0.390180644f));
   2223    int16x4_t rot3_0 = vdup_n_s16(stbi__f2f( 0.298631336f));
   2224    int16x4_t rot3_1 = vdup_n_s16(stbi__f2f( 2.053119869f));
   2225    int16x4_t rot3_2 = vdup_n_s16(stbi__f2f( 3.072711026f));
   2226    int16x4_t rot3_3 = vdup_n_s16(stbi__f2f( 1.501321110f));
   2227 
   2228 #define dct_long_mul(out, inq, coeff) \
   2229    int32x4_t out##_l = vmull_s16(vget_low_s16(inq), coeff); \
   2230    int32x4_t out##_h = vmull_s16(vget_high_s16(inq), coeff)
   2231 
   2232 #define dct_long_mac(out, acc, inq, coeff) \
   2233    int32x4_t out##_l = vmlal_s16(acc##_l, vget_low_s16(inq), coeff); \
   2234    int32x4_t out##_h = vmlal_s16(acc##_h, vget_high_s16(inq), coeff)
   2235 
   2236 #define dct_widen(out, inq) \
   2237    int32x4_t out##_l = vshll_n_s16(vget_low_s16(inq), 12); \
   2238    int32x4_t out##_h = vshll_n_s16(vget_high_s16(inq), 12)
   2239 
   2240 // wide add
   2241 #define dct_wadd(out, a, b) \
   2242    int32x4_t out##_l = vaddq_s32(a##_l, b##_l); \
   2243    int32x4_t out##_h = vaddq_s32(a##_h, b##_h)
   2244 
   2245 // wide sub
   2246 #define dct_wsub(out, a, b) \
   2247    int32x4_t out##_l = vsubq_s32(a##_l, b##_l); \
   2248    int32x4_t out##_h = vsubq_s32(a##_h, b##_h)
   2249 
   2250 // butterfly a/b, then shift using "shiftop" by "s" and pack
   2251 #define dct_bfly32o(out0,out1, a,b,shiftop,s) \
   2252    { \
   2253       dct_wadd(sum, a, b); \
   2254       dct_wsub(dif, a, b); \
   2255       out0 = vcombine_s16(shiftop(sum_l, s), shiftop(sum_h, s)); \
   2256       out1 = vcombine_s16(shiftop(dif_l, s), shiftop(dif_h, s)); \
   2257    }
   2258 
   2259 #define dct_pass(shiftop, shift) \
   2260    { \
   2261       /* even part */ \
   2262       int16x8_t sum26 = vaddq_s16(row2, row6); \
   2263       dct_long_mul(p1e, sum26, rot0_0); \
   2264       dct_long_mac(t2e, p1e, row6, rot0_1); \
   2265       dct_long_mac(t3e, p1e, row2, rot0_2); \
   2266       int16x8_t sum04 = vaddq_s16(row0, row4); \
   2267       int16x8_t dif04 = vsubq_s16(row0, row4); \
   2268       dct_widen(t0e, sum04); \
   2269       dct_widen(t1e, dif04); \
   2270       dct_wadd(x0, t0e, t3e); \
   2271       dct_wsub(x3, t0e, t3e); \
   2272       dct_wadd(x1, t1e, t2e); \
   2273       dct_wsub(x2, t1e, t2e); \
   2274       /* odd part */ \
   2275       int16x8_t sum15 = vaddq_s16(row1, row5); \
   2276       int16x8_t sum17 = vaddq_s16(row1, row7); \
   2277       int16x8_t sum35 = vaddq_s16(row3, row5); \
   2278       int16x8_t sum37 = vaddq_s16(row3, row7); \
   2279       int16x8_t sumodd = vaddq_s16(sum17, sum35); \
   2280       dct_long_mul(p5o, sumodd, rot1_0); \
   2281       dct_long_mac(p1o, p5o, sum17, rot1_1); \
   2282       dct_long_mac(p2o, p5o, sum35, rot1_2); \
   2283       dct_long_mul(p3o, sum37, rot2_0); \
   2284       dct_long_mul(p4o, sum15, rot2_1); \
   2285       dct_wadd(sump13o, p1o, p3o); \
   2286       dct_wadd(sump24o, p2o, p4o); \
   2287       dct_wadd(sump23o, p2o, p3o); \
   2288       dct_wadd(sump14o, p1o, p4o); \
   2289       dct_long_mac(x4, sump13o, row7, rot3_0); \
   2290       dct_long_mac(x5, sump24o, row5, rot3_1); \
   2291       dct_long_mac(x6, sump23o, row3, rot3_2); \
   2292       dct_long_mac(x7, sump14o, row1, rot3_3); \
   2293       dct_bfly32o(row0,row7, x0,x7,shiftop,shift); \
   2294       dct_bfly32o(row1,row6, x1,x6,shiftop,shift); \
   2295       dct_bfly32o(row2,row5, x2,x5,shiftop,shift); \
   2296       dct_bfly32o(row3,row4, x3,x4,shiftop,shift); \
   2297    }
   2298 
   2299    // load
   2300    row0 = vld1q_s16(data + 0*8);
   2301    row1 = vld1q_s16(data + 1*8);
   2302    row2 = vld1q_s16(data + 2*8);
   2303    row3 = vld1q_s16(data + 3*8);
   2304    row4 = vld1q_s16(data + 4*8);
   2305    row5 = vld1q_s16(data + 5*8);
   2306    row6 = vld1q_s16(data + 6*8);
   2307    row7 = vld1q_s16(data + 7*8);
   2308 
   2309    // add DC bias
   2310    row0 = vaddq_s16(row0, vsetq_lane_s16(1024, vdupq_n_s16(0), 0));
   2311 
   2312    // column pass
   2313    dct_pass(vrshrn_n_s32, 10);
   2314 
   2315    // 16bit 8x8 transpose
   2316    {
   2317 // these three map to a single VTRN.16, VTRN.32, and VSWP, respectively.
   2318 // whether compilers actually get this is another story, sadly.
   2319 #define dct_trn16(x, y) { int16x8x2_t t = vtrnq_s16(x, y); x = t.val[0]; y = t.val[1]; }
   2320 #define dct_trn32(x, y) { int32x4x2_t t = vtrnq_s32(vreinterpretq_s32_s16(x), vreinterpretq_s32_s16(y)); x = vreinterpretq_s16_s32(t.val[0]); y = vreinterpretq_s16_s32(t.val[1]); }
   2321 #define dct_trn64(x, y) { int16x8_t x0 = x; int16x8_t y0 = y; x = vcombine_s16(vget_low_s16(x0), vget_low_s16(y0)); y = vcombine_s16(vget_high_s16(x0), vget_high_s16(y0)); }
   2322 
   2323       // pass 1
   2324       dct_trn16(row0, row1); // a0b0a2b2a4b4a6b6
   2325       dct_trn16(row2, row3);
   2326       dct_trn16(row4, row5);
   2327       dct_trn16(row6, row7);
   2328 
   2329       // pass 2
   2330       dct_trn32(row0, row2); // a0b0c0d0a4b4c4d4
   2331       dct_trn32(row1, row3);
   2332       dct_trn32(row4, row6);
   2333       dct_trn32(row5, row7);
   2334 
   2335       // pass 3
   2336       dct_trn64(row0, row4); // a0b0c0d0e0f0g0h0
   2337       dct_trn64(row1, row5);
   2338       dct_trn64(row2, row6);
   2339       dct_trn64(row3, row7);
   2340 
   2341 #undef dct_trn16
   2342 #undef dct_trn32
   2343 #undef dct_trn64
   2344    }
   2345 
   2346    // row pass
   2347    // vrshrn_n_s32 only supports shifts up to 16, we need
   2348    // 17. so do a non-rounding shift of 16 first then follow
   2349    // up with a rounding shift by 1.
   2350    dct_pass(vshrn_n_s32, 16);
   2351 
   2352    {
   2353       // pack and round
   2354       uint8x8_t p0 = vqrshrun_n_s16(row0, 1);
   2355       uint8x8_t p1 = vqrshrun_n_s16(row1, 1);
   2356       uint8x8_t p2 = vqrshrun_n_s16(row2, 1);
   2357       uint8x8_t p3 = vqrshrun_n_s16(row3, 1);
   2358       uint8x8_t p4 = vqrshrun_n_s16(row4, 1);
   2359       uint8x8_t p5 = vqrshrun_n_s16(row5, 1);
   2360       uint8x8_t p6 = vqrshrun_n_s16(row6, 1);
   2361       uint8x8_t p7 = vqrshrun_n_s16(row7, 1);
   2362 
   2363       // again, these can translate into one instruction, but often don't.
   2364 #define dct_trn8_8(x, y) { uint8x8x2_t t = vtrn_u8(x, y); x = t.val[0]; y = t.val[1]; }
   2365 #define dct_trn8_16(x, y) { uint16x4x2_t t = vtrn_u16(vreinterpret_u16_u8(x), vreinterpret_u16_u8(y)); x = vreinterpret_u8_u16(t.val[0]); y = vreinterpret_u8_u16(t.val[1]); }
   2366 #define dct_trn8_32(x, y) { uint32x2x2_t t = vtrn_u32(vreinterpret_u32_u8(x), vreinterpret_u32_u8(y)); x = vreinterpret_u8_u32(t.val[0]); y = vreinterpret_u8_u32(t.val[1]); }
   2367 
   2368       // sadly can't use interleaved stores here since we only write
   2369       // 8 bytes to each scan line!
   2370 
   2371       // 8x8 8-bit transpose pass 1
   2372       dct_trn8_8(p0, p1);
   2373       dct_trn8_8(p2, p3);
   2374       dct_trn8_8(p4, p5);
   2375       dct_trn8_8(p6, p7);
   2376 
   2377       // pass 2
   2378       dct_trn8_16(p0, p2);
   2379       dct_trn8_16(p1, p3);
   2380       dct_trn8_16(p4, p6);
   2381       dct_trn8_16(p5, p7);
   2382 
   2383       // pass 3
   2384       dct_trn8_32(p0, p4);
   2385       dct_trn8_32(p1, p5);
   2386       dct_trn8_32(p2, p6);
   2387       dct_trn8_32(p3, p7);
   2388 
   2389       // store
   2390       vst1_u8(out, p0); out += out_stride;
   2391       vst1_u8(out, p1); out += out_stride;
   2392       vst1_u8(out, p2); out += out_stride;
   2393       vst1_u8(out, p3); out += out_stride;
   2394       vst1_u8(out, p4); out += out_stride;
   2395       vst1_u8(out, p5); out += out_stride;
   2396       vst1_u8(out, p6); out += out_stride;
   2397       vst1_u8(out, p7);
   2398 
   2399 #undef dct_trn8_8
   2400 #undef dct_trn8_16
   2401 #undef dct_trn8_32
   2402    }
   2403 
   2404 #undef dct_long_mul
   2405 #undef dct_long_mac
   2406 #undef dct_widen
   2407 #undef dct_wadd
   2408 #undef dct_wsub
   2409 #undef dct_bfly32o
   2410 #undef dct_pass
   2411 }
   2412 
   2413 #endif // STBI_NEON
   2414 
   2415 #define STBI__MARKER_none  0xff
   2416 // if there's a pending marker from the entropy stream, return that
   2417 // otherwise, fetch from the stream and get a marker. if there's no
   2418 // marker, return 0xff, which is never a valid marker value
   2419 static stbi_uc stbi__get_marker(stbi__jpeg *j)
   2420 {
   2421    stbi_uc x;
   2422    if (j->marker != STBI__MARKER_none) { x = j->marker; j->marker = STBI__MARKER_none; return x; }
   2423    x = stbi__get8(j->s);
   2424    if (x != 0xff) return STBI__MARKER_none;
   2425    while (x == 0xff)
   2426       x = stbi__get8(j->s);
   2427    return x;
   2428 }
   2429 
   2430 // in each scan, we'll have scan_n components, and the order
   2431 // of the components is specified by order[]
   2432 #define STBI__RESTART(x)     ((x) >= 0xd0 && (x) <= 0xd7)
   2433 
   2434 // after a restart interval, stbi__jpeg_reset the entropy decoder and
   2435 // the dc prediction
   2436 static void stbi__jpeg_reset(stbi__jpeg *j)
   2437 {
   2438    j->code_bits = 0;
   2439    j->code_buffer = 0;
   2440    j->nomore = 0;
   2441    j->img_comp[0].dc_pred = j->img_comp[1].dc_pred = j->img_comp[2].dc_pred = 0;
   2442    j->marker = STBI__MARKER_none;
   2443    j->todo = j->restart_interval ? j->restart_interval : 0x7fffffff;
   2444    j->eob_run = 0;
   2445    // no more than 1<<31 MCUs if no restart_interal? that's plenty safe,
   2446    // since we don't even allow 1<<30 pixels
   2447 }
   2448 
   2449 static int stbi__parse_entropy_coded_data(stbi__jpeg *z)
   2450 {
   2451    stbi__jpeg_reset(z);
   2452    if (!z->progressive) {
   2453       if (z->scan_n == 1) {
   2454          int i,j;
   2455          STBI_SIMD_ALIGN(short, data[64]);
   2456          int n = z->order[0];
   2457          // non-interleaved data, we just need to process one block at a time,
   2458          // in trivial scanline order
   2459          // number of blocks to do just depends on how many actual "pixels" this
   2460          // component has, independent of interleaved MCU blocking and such
   2461          int w = (z->img_comp[n].x+7) >> 3;
   2462          int h = (z->img_comp[n].y+7) >> 3;
   2463          for (j=0; j < h; ++j) {
   2464             for (i=0; i < w; ++i) {
   2465                int ha = z->img_comp[n].ha;
   2466                if (!stbi__jpeg_decode_block(z, data, z->huff_dc+z->img_comp[n].hd, z->huff_ac+ha, z->fast_ac[ha], n, z->dequant[z->img_comp[n].tq])) return 0;
   2467                z->idct_block_kernel(z->img_comp[n].data+z->img_comp[n].w2*j*8+i*8, z->img_comp[n].w2, data);
   2468                // every data block is an MCU, so countdown the restart interval
   2469                if (--z->todo <= 0) {
   2470                   if (z->code_bits < 24) stbi__grow_buffer_unsafe(z);
   2471                   // if it's NOT a restart, then just bail, so we get corrupt data
   2472                   // rather than no data
   2473                   if (!STBI__RESTART(z->marker)) return 1;
   2474                   stbi__jpeg_reset(z);
   2475                }
   2476             }
   2477          }
   2478          return 1;
   2479       } else { // interleaved
   2480          int i,j,k,x,y;
   2481          STBI_SIMD_ALIGN(short, data[64]);
   2482          for (j=0; j < z->img_mcu_y; ++j) {
   2483             for (i=0; i < z->img_mcu_x; ++i) {
   2484                // scan an interleaved mcu... process scan_n components in order
   2485                for (k=0; k < z->scan_n; ++k) {
   2486                   int n = z->order[k];
   2487                   // scan out an mcu's worth of this component; that's just determined
   2488                   // by the basic H and V specified for the component
   2489                   for (y=0; y < z->img_comp[n].v; ++y) {
   2490                      for (x=0; x < z->img_comp[n].h; ++x) {
   2491                         int x2 = (i*z->img_comp[n].h + x)*8;
   2492                         int y2 = (j*z->img_comp[n].v + y)*8;
   2493                         int ha = z->img_comp[n].ha;
   2494                         if (!stbi__jpeg_decode_block(z, data, z->huff_dc+z->img_comp[n].hd, z->huff_ac+ha, z->fast_ac[ha], n, z->dequant[z->img_comp[n].tq])) return 0;
   2495                         z->idct_block_kernel(z->img_comp[n].data+z->img_comp[n].w2*y2+x2, z->img_comp[n].w2, data);
   2496                      }
   2497                   }
   2498                }
   2499                // after all interleaved components, that's an interleaved MCU,
   2500                // so now count down the restart interval
   2501                if (--z->todo <= 0) {
   2502                   if (z->code_bits < 24) stbi__grow_buffer_unsafe(z);
   2503                   if (!STBI__RESTART(z->marker)) return 1;
   2504                   stbi__jpeg_reset(z);
   2505                }
   2506             }
   2507          }
   2508          return 1;
   2509       }
   2510    } else {
   2511       if (z->scan_n == 1) {
   2512          int i,j;
   2513          int n = z->order[0];
   2514          // non-interleaved data, we just need to process one block at a time,
   2515          // in trivial scanline order
   2516          // number of blocks to do just depends on how many actual "pixels" this
   2517          // component has, independent of interleaved MCU blocking and such
   2518          int w = (z->img_comp[n].x+7) >> 3;
   2519          int h = (z->img_comp[n].y+7) >> 3;
   2520          for (j=0; j < h; ++j) {
   2521             for (i=0; i < w; ++i) {
   2522                short *data = z->img_comp[n].coeff + 64 * (i + j * z->img_comp[n].coeff_w);
   2523                if (z->spec_start == 0) {
   2524                   if (!stbi__jpeg_decode_block_prog_dc(z, data, &z->huff_dc[z->img_comp[n].hd], n))
   2525                      return 0;
   2526                } else {
   2527                   int ha = z->img_comp[n].ha;
   2528                   if (!stbi__jpeg_decode_block_prog_ac(z, data, &z->huff_ac[ha], z->fast_ac[ha]))
   2529                      return 0;
   2530                }
   2531                // every data block is an MCU, so countdown the restart interval
   2532                if (--z->todo <= 0) {
   2533                   if (z->code_bits < 24) stbi__grow_buffer_unsafe(z);
   2534                   if (!STBI__RESTART(z->marker)) return 1;
   2535                   stbi__jpeg_reset(z);
   2536                }
   2537             }
   2538          }
   2539          return 1;
   2540       } else { // interleaved
   2541          int i,j,k,x,y;
   2542          for (j=0; j < z->img_mcu_y; ++j) {
   2543             for (i=0; i < z->img_mcu_x; ++i) {
   2544                // scan an interleaved mcu... process scan_n components in order
   2545                for (k=0; k < z->scan_n; ++k) {
   2546                   int n = z->order[k];
   2547                   // scan out an mcu's worth of this component; that's just determined
   2548                   // by the basic H and V specified for the component
   2549                   for (y=0; y < z->img_comp[n].v; ++y) {
   2550                      for (x=0; x < z->img_comp[n].h; ++x) {
   2551                         int x2 = (i*z->img_comp[n].h + x);
   2552                         int y2 = (j*z->img_comp[n].v + y);
   2553                         short *data = z->img_comp[n].coeff + 64 * (x2 + y2 * z->img_comp[n].coeff_w);
   2554                         if (!stbi__jpeg_decode_block_prog_dc(z, data, &z->huff_dc[z->img_comp[n].hd], n))
   2555                            return 0;
   2556                      }
   2557                   }
   2558                }
   2559                // after all interleaved components, that's an interleaved MCU,
   2560                // so now count down the restart interval
   2561                if (--z->todo <= 0) {
   2562                   if (z->code_bits < 24) stbi__grow_buffer_unsafe(z);
   2563                   if (!STBI__RESTART(z->marker)) return 1;
   2564                   stbi__jpeg_reset(z);
   2565                }
   2566             }
   2567          }
   2568          return 1;
   2569       }
   2570    }
   2571 }
   2572 
   2573 static void stbi__jpeg_dequantize(short *data, stbi_uc *dequant)
   2574 {
   2575    int i;
   2576    for (i=0; i < 64; ++i)
   2577       data[i] *= dequant[i];
   2578 }
   2579 
   2580 static void stbi__jpeg_finish(stbi__jpeg *z)
   2581 {
   2582    if (z->progressive) {
   2583       // dequantize and idct the data
   2584       int i,j,n;
   2585       for (n=0; n < z->s->img_n; ++n) {
   2586          int w = (z->img_comp[n].x+7) >> 3;
   2587          int h = (z->img_comp[n].y+7) >> 3;
   2588          for (j=0; j < h; ++j) {
   2589             for (i=0; i < w; ++i) {
   2590                short *data = z->img_comp[n].coeff + 64 * (i + j * z->img_comp[n].coeff_w);
   2591                stbi__jpeg_dequantize(data, z->dequant[z->img_comp[n].tq]);
   2592                z->idct_block_kernel(z->img_comp[n].data+z->img_comp[n].w2*j*8+i*8, z->img_comp[n].w2, data);
   2593             }
   2594          }
   2595       }
   2596    }
   2597 }
   2598 
   2599 static int stbi__process_marker(stbi__jpeg *z, int m)
   2600 {
   2601    int L;
   2602    switch (m) {
   2603       case STBI__MARKER_none: // no marker found
   2604          return stbi__err("expected marker","Corrupt JPEG");
   2605 
   2606       case 0xDD: // DRI - specify restart interval
   2607          if (stbi__get16be(z->s) != 4) return stbi__err("bad DRI len","Corrupt JPEG");
   2608          z->restart_interval = stbi__get16be(z->s);
   2609          return 1;
   2610 
   2611       case 0xDB: // DQT - define quantization table
   2612          L = stbi__get16be(z->s)-2;
   2613          while (L > 0) {
   2614             int q = stbi__get8(z->s);
   2615             int p = q >> 4;
   2616             int t = q & 15,i;
   2617             if (p != 0) return stbi__err("bad DQT type","Corrupt JPEG");
   2618             if (t > 3) return stbi__err("bad DQT table","Corrupt JPEG");
   2619             for (i=0; i < 64; ++i)
   2620                z->dequant[t][stbi__jpeg_dezigzag[i]] = stbi__get8(z->s);
   2621             L -= 65;
   2622          }
   2623          return L==0;
   2624 
   2625       case 0xC4: // DHT - define huffman table
   2626          L = stbi__get16be(z->s)-2;
   2627          while (L > 0) {
   2628             stbi_uc *v;
   2629             int sizes[16],i,n=0;
   2630             int q = stbi__get8(z->s);
   2631             int tc = q >> 4;
   2632             int th = q & 15;
   2633             if (tc > 1 || th > 3) return stbi__err("bad DHT header","Corrupt JPEG");
   2634             for (i=0; i < 16; ++i) {
   2635                sizes[i] = stbi__get8(z->s);
   2636                n += sizes[i];
   2637             }
   2638             L -= 17;
   2639             if (tc == 0) {
   2640                if (!stbi__build_huffman(z->huff_dc+th, sizes)) return 0;
   2641                v = z->huff_dc[th].values;
   2642             } else {
   2643                if (!stbi__build_huffman(z->huff_ac+th, sizes)) return 0;
   2644                v = z->huff_ac[th].values;
   2645             }
   2646             for (i=0; i < n; ++i)
   2647                v[i] = stbi__get8(z->s);
   2648             if (tc != 0)
   2649                stbi__build_fast_ac(z->fast_ac[th], z->huff_ac + th);
   2650             L -= n;
   2651          }
   2652          return L==0;
   2653    }
   2654    // check for comment block or APP blocks
   2655    if ((m >= 0xE0 && m <= 0xEF) || m == 0xFE) {
   2656       stbi__skip(z->s, stbi__get16be(z->s)-2);
   2657       return 1;
   2658    }
   2659    return 0;
   2660 }
   2661 
   2662 // after we see SOS
   2663 static int stbi__process_scan_header(stbi__jpeg *z)
   2664 {
   2665    int i;
   2666    int Ls = stbi__get16be(z->s);
   2667    z->scan_n = stbi__get8(z->s);
   2668    if (z->scan_n < 1 || z->scan_n > 4 || z->scan_n > (int) z->s->img_n) return stbi__err("bad SOS component count","Corrupt JPEG");
   2669    if (Ls != 6+2*z->scan_n) return stbi__err("bad SOS len","Corrupt JPEG");
   2670    for (i=0; i < z->scan_n; ++i) {
   2671       int id = stbi__get8(z->s), which;
   2672       int q = stbi__get8(z->s);
   2673       for (which = 0; which < z->s->img_n; ++which)
   2674          if (z->img_comp[which].id == id)
   2675             break;
   2676       if (which == z->s->img_n) return 0; // no match
   2677       z->img_comp[which].hd = q >> 4;   if (z->img_comp[which].hd > 3) return stbi__err("bad DC huff","Corrupt JPEG");
   2678       z->img_comp[which].ha = q & 15;   if (z->img_comp[which].ha > 3) return stbi__err("bad AC huff","Corrupt JPEG");
   2679       z->order[i] = which;
   2680    }
   2681 
   2682    {
   2683       int aa;
   2684       z->spec_start = stbi__get8(z->s);
   2685       z->spec_end   = stbi__get8(z->s); // should be 63, but might be 0
   2686       aa = stbi__get8(z->s);
   2687       z->succ_high = (aa >> 4);
   2688       z->succ_low  = (aa & 15);
   2689       if (z->progressive) {
   2690          if (z->spec_start > 63 || z->spec_end > 63  || z->spec_start > z->spec_end || z->succ_high > 13 || z->succ_low > 13)
   2691             return stbi__err("bad SOS", "Corrupt JPEG");
   2692       } else {
   2693          if (z->spec_start != 0) return stbi__err("bad SOS","Corrupt JPEG");
   2694          if (z->succ_high != 0 || z->succ_low != 0) return stbi__err("bad SOS","Corrupt JPEG");
   2695          z->spec_end = 63;
   2696       }
   2697    }
   2698 
   2699    return 1;
   2700 }
   2701 
   2702 static int stbi__process_frame_header(stbi__jpeg *z, int scan)
   2703 {
   2704    stbi__context *s = z->s;
   2705    int Lf,p,i,q, h_max=1,v_max=1,c;
   2706    Lf = stbi__get16be(s);         if (Lf < 11) return stbi__err("bad SOF len","Corrupt JPEG"); // JPEG
   2707    p  = stbi__get8(s);            if (p != 8) return stbi__err("only 8-bit","JPEG format not supported: 8-bit only"); // JPEG baseline
   2708    s->img_y = stbi__get16be(s);   if (s->img_y == 0) return stbi__err("no header height", "JPEG format not supported: delayed height"); // Legal, but we don't handle it--but neither does IJG
   2709    s->img_x = stbi__get16be(s);   if (s->img_x == 0) return stbi__err("0 width","Corrupt JPEG"); // JPEG requires
   2710    c = stbi__get8(s);
   2711    if (c != 3 && c != 1) return stbi__err("bad component count","Corrupt JPEG");    // JFIF requires
   2712    s->img_n = c;
   2713    for (i=0; i < c; ++i) {
   2714       z->img_comp[i].data = NULL;
   2715       z->img_comp[i].linebuf = NULL;
   2716    }
   2717 
   2718    if (Lf != 8+3*s->img_n) return stbi__err("bad SOF len","Corrupt JPEG");
   2719 
   2720    for (i=0; i < s->img_n; ++i) {
   2721       z->img_comp[i].id = stbi__get8(s);
   2722       if (z->img_comp[i].id != i+1)   // JFIF requires
   2723          if (z->img_comp[i].id != i)  // some version of jpegtran outputs non-JFIF-compliant files!
   2724             return stbi__err("bad component ID","Corrupt JPEG");
   2725       q = stbi__get8(s);
   2726       z->img_comp[i].h = (q >> 4);  if (!z->img_comp[i].h || z->img_comp[i].h > 4) return stbi__err("bad H","Corrupt JPEG");
   2727       z->img_comp[i].v = q & 15;    if (!z->img_comp[i].v || z->img_comp[i].v > 4) return stbi__err("bad V","Corrupt JPEG");
   2728       z->img_comp[i].tq = stbi__get8(s);  if (z->img_comp[i].tq > 3) return stbi__err("bad TQ","Corrupt JPEG");
   2729    }
   2730 
   2731    if (scan != STBI__SCAN_load) return 1;
   2732 
   2733    if ((1 << 30) / s->img_x / s->img_n < s->img_y) return stbi__err("too large", "Image too large to decode");
   2734 
   2735    for (i=0; i < s->img_n; ++i) {
   2736       if (z->img_comp[i].h > h_max) h_max = z->img_comp[i].h;
   2737       if (z->img_comp[i].v > v_max) v_max = z->img_comp[i].v;
   2738    }
   2739 
   2740    // compute interleaved mcu info
   2741    z->img_h_max = h_max;
   2742    z->img_v_max = v_max;
   2743    z->img_mcu_w = h_max * 8;
   2744    z->img_mcu_h = v_max * 8;
   2745    z->img_mcu_x = (s->img_x + z->img_mcu_w-1) / z->img_mcu_w;
   2746    z->img_mcu_y = (s->img_y + z->img_mcu_h-1) / z->img_mcu_h;
   2747 
   2748    for (i=0; i < s->img_n; ++i) {
   2749       // number of effective pixels (e.g. for non-interleaved MCU)
   2750       z->img_comp[i].x = (s->img_x * z->img_comp[i].h + h_max-1) / h_max;
   2751       z->img_comp[i].y = (s->img_y * z->img_comp[i].v + v_max-1) / v_max;
   2752       // to simplify generation, we'll allocate enough memory to decode
   2753       // the bogus oversized data from using interleaved MCUs and their
   2754       // big blocks (e.g. a 16x16 iMCU on an image of width 33); we won't
   2755       // discard the extra data until colorspace conversion
   2756       z->img_comp[i].w2 = z->img_mcu_x * z->img_comp[i].h * 8;
   2757       z->img_comp[i].h2 = z->img_mcu_y * z->img_comp[i].v * 8;
   2758       z->img_comp[i].raw_data = stbi__malloc(z->img_comp[i].w2 * z->img_comp[i].h2+15);
   2759 
   2760       if (z->img_comp[i].raw_data == NULL) {
   2761          for(--i; i >= 0; --i) {
   2762             STBI_FREE(z->img_comp[i].raw_data);
   2763             z->img_comp[i].raw_data = NULL;
   2764          }
   2765          return stbi__err("outofmem", "Out of memory");
   2766       }
   2767       // align blocks for idct using mmx/sse
   2768       z->img_comp[i].data = (stbi_uc*) (((size_t) z->img_comp[i].raw_data + 15) & ~15);
   2769       z->img_comp[i].linebuf = NULL;
   2770       if (z->progressive) {
   2771          z->img_comp[i].coeff_w = (z->img_comp[i].w2 + 7) >> 3;
   2772          z->img_comp[i].coeff_h = (z->img_comp[i].h2 + 7) >> 3;
   2773          z->img_comp[i].raw_coeff = STBI_MALLOC(z->img_comp[i].coeff_w * z->img_comp[i].coeff_h * 64 * sizeof(short) + 15);
   2774          z->img_comp[i].coeff = (short*) (((size_t) z->img_comp[i].raw_coeff + 15) & ~15);
   2775       } else {
   2776          z->img_comp[i].coeff = 0;
   2777          z->img_comp[i].raw_coeff = 0;
   2778       }
   2779    }
   2780 
   2781    return 1;
   2782 }
   2783 
   2784 // use comparisons since in some cases we handle more than one case (e.g. SOF)
   2785 #define stbi__DNL(x)         ((x) == 0xdc)
   2786 #define stbi__SOI(x)         ((x) == 0xd8)
   2787 #define stbi__EOI(x)         ((x) == 0xd9)
   2788 #define stbi__SOF(x)         ((x) == 0xc0 || (x) == 0xc1 || (x) == 0xc2)
   2789 #define stbi__SOS(x)         ((x) == 0xda)
   2790 
   2791 #define stbi__SOF_progressive(x)   ((x) == 0xc2)
   2792 
   2793 static int stbi__decode_jpeg_header(stbi__jpeg *z, int scan)
   2794 {
   2795    int m;
   2796    z->marker = STBI__MARKER_none; // initialize cached marker to empty
   2797    m = stbi__get_marker(z);
   2798    if (!stbi__SOI(m)) return stbi__err("no SOI","Corrupt JPEG");
   2799    if (scan == STBI__SCAN_type) return 1;
   2800    m = stbi__get_marker(z);
   2801    while (!stbi__SOF(m)) {
   2802       if (!stbi__process_marker(z,m)) return 0;
   2803       m = stbi__get_marker(z);
   2804       while (m == STBI__MARKER_none) {
   2805          // some files have extra padding after their blocks, so ok, we'll scan
   2806          if (stbi__at_eof(z->s)) return stbi__err("no SOF", "Corrupt JPEG");
   2807          m = stbi__get_marker(z);
   2808       }
   2809    }
   2810    z->progressive = stbi__SOF_progressive(m);
   2811    if (!stbi__process_frame_header(z, scan)) return 0;
   2812    return 1;
   2813 }
   2814 
   2815 // decode image to YCbCr format
   2816 static int stbi__decode_jpeg_image(stbi__jpeg *j)
   2817 {
   2818    int m;
   2819    for (m = 0; m < 4; m++) {
   2820       j->img_comp[m].raw_data = NULL;
   2821       j->img_comp[m].raw_coeff = NULL;
   2822    }
   2823    j->restart_interval = 0;
   2824    if (!stbi__decode_jpeg_header(j, STBI__SCAN_load)) return 0;
   2825    m = stbi__get_marker(j);
   2826    while (!stbi__EOI(m)) {
   2827       if (stbi__SOS(m)) {
   2828          if (!stbi__process_scan_header(j)) return 0;
   2829          if (!stbi__parse_entropy_coded_data(j)) return 0;
   2830          if (j->marker == STBI__MARKER_none ) {
   2831             // handle 0s at the end of image data from IP Kamera 9060
   2832             while (!stbi__at_eof(j->s)) {
   2833                int x = stbi__get8(j->s);
   2834                if (x == 255) {
   2835                   j->marker = stbi__get8(j->s);
   2836                   break;
   2837                } else if (x != 0) {
   2838                   return stbi__err("junk before marker", "Corrupt JPEG");
   2839                }
   2840             }
   2841             // if we reach eof without hitting a marker, stbi__get_marker() below will fail and we'll eventually return 0
   2842          }
   2843       } else {
   2844          if (!stbi__process_marker(j, m)) return 0;
   2845       }
   2846       m = stbi__get_marker(j);
   2847    }
   2848    if (j->progressive)
   2849       stbi__jpeg_finish(j);
   2850    return 1;
   2851 }
   2852 
   2853 // static jfif-centered resampling (across block boundaries)
   2854 
   2855 typedef stbi_uc *(*resample_row_func)(stbi_uc *out, stbi_uc *in0, stbi_uc *in1,
   2856                                     int w, int hs);
   2857 
   2858 #define stbi__div4(x) ((stbi_uc) ((x) >> 2))
   2859 
   2860 static stbi_uc *resample_row_1(stbi_uc *out, stbi_uc *in_near, stbi_uc *in_far, int w, int hs)
   2861 {
   2862    STBI_NOTUSED(out);
   2863    STBI_NOTUSED(in_far);
   2864    STBI_NOTUSED(w);
   2865    STBI_NOTUSED(hs);
   2866    return in_near;
   2867 }
   2868 
   2869 static stbi_uc* stbi__resample_row_v_2(stbi_uc *out, stbi_uc *in_near, stbi_uc *in_far, int w, int hs)
   2870 {
   2871    // need to generate two samples vertically for every one in input
   2872    int i;
   2873    STBI_NOTUSED(hs);
   2874    for (i=0; i < w; ++i)
   2875       out[i] = stbi__div4(3*in_near[i] + in_far[i] + 2);
   2876    return out;
   2877 }
   2878 
   2879 static stbi_uc*  stbi__resample_row_h_2(stbi_uc *out, stbi_uc *in_near, stbi_uc *in_far, int w, int hs)
   2880 {
   2881    // need to generate two samples horizontally for every one in input
   2882    int i;
   2883    stbi_uc *input = in_near;
   2884 
   2885    if (w == 1) {
   2886       // if only one sample, can't do any interpolation
   2887       out[0] = out[1] = input[0];
   2888       return out;
   2889    }
   2890 
   2891    out[0] = input[0];
   2892    out[1] = stbi__div4(input[0]*3 + input[1] + 2);
   2893    for (i=1; i < w-1; ++i) {
   2894       int n = 3*input[i]+2;
   2895       out[i*2+0] = stbi__div4(n+input[i-1]);
   2896       out[i*2+1] = stbi__div4(n+input[i+1]);
   2897    }
   2898    out[i*2+0] = stbi__div4(input[w-2]*3 + input[w-1] + 2);
   2899    out[i*2+1] = input[w-1];
   2900 
   2901    STBI_NOTUSED(in_far);
   2902    STBI_NOTUSED(hs);
   2903 
   2904    return out;
   2905 }
   2906 
   2907 #define stbi__div16(x) ((stbi_uc) ((x) >> 4))
   2908 
   2909 static stbi_uc *stbi__resample_row_hv_2(stbi_uc *out, stbi_uc *in_near, stbi_uc *in_far, int w, int hs)
   2910 {
   2911    // need to generate 2x2 samples for every one in input
   2912    int i,t0,t1;
   2913    if (w == 1) {
   2914       out[0] = out[1] = stbi__div4(3*in_near[0] + in_far[0] + 2);
   2915       return out;
   2916    }
   2917 
   2918    t1 = 3*in_near[0] + in_far[0];
   2919    out[0] = stbi__div4(t1+2);
   2920    for (i=1; i < w; ++i) {
   2921       t0 = t1;
   2922       t1 = 3*in_near[i]+in_far[i];
   2923       out[i*2-1] = stbi__div16(3*t0 + t1 + 8);
   2924       out[i*2  ] = stbi__div16(3*t1 + t0 + 8);
   2925    }
   2926    out[w*2-1] = stbi__div4(t1+2);
   2927 
   2928    STBI_NOTUSED(hs);
   2929 
   2930    return out;
   2931 }
   2932 
   2933 #if defined(STBI_SSE2) || defined(STBI_NEON)
   2934 static stbi_uc *stbi__resample_row_hv_2_simd(stbi_uc *out, stbi_uc *in_near, stbi_uc *in_far, int w, int hs)
   2935 {
   2936    // need to generate 2x2 samples for every one in input
   2937    int i=0,t0,t1;
   2938 
   2939    if (w == 1) {
   2940       out[0] = out[1] = stbi__div4(3*in_near[0] + in_far[0] + 2);
   2941       return out;
   2942    }
   2943 
   2944    t1 = 3*in_near[0] + in_far[0];
   2945    // process groups of 8 pixels for as long as we can.
   2946    // note we can't handle the last pixel in a row in this loop
   2947    // because we need to handle the filter boundary conditions.
   2948    for (; i < ((w-1) & ~7); i += 8) {
   2949 #if defined(STBI_SSE2)
   2950       // load and perform the vertical filtering pass
   2951       // this uses 3*x + y = 4*x + (y - x)
   2952       __m128i zero  = _mm_setzero_si128();
   2953       __m128i farb  = _mm_loadl_epi64((__m128i *) (in_far + i));
   2954       __m128i nearb = _mm_loadl_epi64((__m128i *) (in_near + i));
   2955       __m128i farw  = _mm_unpacklo_epi8(farb, zero);
   2956       __m128i nearw = _mm_unpacklo_epi8(nearb, zero);
   2957       __m128i diff  = _mm_sub_epi16(farw, nearw);
   2958       __m128i nears = _mm_slli_epi16(nearw, 2);
   2959       __m128i curr  = _mm_add_epi16(nears, diff); // current row
   2960 
   2961       // horizontal filter works the same based on shifted vers of current
   2962       // row. "prev" is current row shifted right by 1 pixel; we need to
   2963       // insert the previous pixel value (from t1).
   2964       // "next" is current row shifted left by 1 pixel, with first pixel
   2965       // of next block of 8 pixels added in.
   2966       __m128i prv0 = _mm_slli_si128(curr, 2);
   2967       __m128i nxt0 = _mm_srli_si128(curr, 2);
   2968       __m128i prev = _mm_insert_epi16(prv0, t1, 0);
   2969       __m128i next = _mm_insert_epi16(nxt0, 3*in_near[i+8] + in_far[i+8], 7);
   2970 
   2971       // horizontal filter, polyphase implementation since it's convenient:
   2972       // even pixels = 3*cur + prev = cur*4 + (prev - cur)
   2973       // odd  pixels = 3*cur + next = cur*4 + (next - cur)
   2974       // note the shared term.
   2975       __m128i bias  = _mm_set1_epi16(8);
   2976       __m128i curs = _mm_slli_epi16(curr, 2);
   2977       __m128i prvd = _mm_sub_epi16(prev, curr);
   2978       __m128i nxtd = _mm_sub_epi16(next, curr);
   2979       __m128i curb = _mm_add_epi16(curs, bias);
   2980       __m128i even = _mm_add_epi16(prvd, curb);
   2981       __m128i odd  = _mm_add_epi16(nxtd, curb);
   2982 
   2983       // interleave even and odd pixels, then undo scaling.
   2984       __m128i int0 = _mm_unpacklo_epi16(even, odd);
   2985       __m128i int1 = _mm_unpackhi_epi16(even, odd);
   2986       __m128i de0  = _mm_srli_epi16(int0, 4);
   2987       __m128i de1  = _mm_srli_epi16(int1, 4);
   2988 
   2989       // pack and write output
   2990       __m128i outv = _mm_packus_epi16(de0, de1);
   2991       _mm_storeu_si128((__m128i *) (out + i*2), outv);
   2992 #elif defined(STBI_NEON)
   2993       // load and perform the vertical filtering pass
   2994       // this uses 3*x + y = 4*x + (y - x)
   2995       uint8x8_t farb  = vld1_u8(in_far + i);
   2996       uint8x8_t nearb = vld1_u8(in_near + i);
   2997       int16x8_t diff  = vreinterpretq_s16_u16(vsubl_u8(farb, nearb));
   2998       int16x8_t nears = vreinterpretq_s16_u16(vshll_n_u8(nearb, 2));
   2999       int16x8_t curr  = vaddq_s16(nears, diff); // current row
   3000 
   3001       // horizontal filter works the same based on shifted vers of current
   3002       // row. "prev" is current row shifted right by 1 pixel; we need to
   3003       // insert the previous pixel value (from t1).
   3004       // "next" is current row shifted left by 1 pixel, with first pixel
   3005       // of next block of 8 pixels added in.
   3006       int16x8_t prv0 = vextq_s16(curr, curr, 7);
   3007       int16x8_t nxt0 = vextq_s16(curr, curr, 1);
   3008       int16x8_t prev = vsetq_lane_s16(t1, prv0, 0);
   3009       int16x8_t next = vsetq_lane_s16(3*in_near[i+8] + in_far[i+8], nxt0, 7);
   3010 
   3011       // horizontal filter, polyphase implementation since it's convenient:
   3012       // even pixels = 3*cur + prev = cur*4 + (prev - cur)
   3013       // odd  pixels = 3*cur + next = cur*4 + (next - cur)
   3014       // note the shared term.
   3015       int16x8_t curs = vshlq_n_s16(curr, 2);
   3016       int16x8_t prvd = vsubq_s16(prev, curr);
   3017       int16x8_t nxtd = vsubq_s16(next, curr);
   3018       int16x8_t even = vaddq_s16(curs, prvd);
   3019       int16x8_t odd  = vaddq_s16(curs, nxtd);
   3020 
   3021       // undo scaling and round, then store with even/odd phases interleaved
   3022       uint8x8x2_t o;
   3023       o.val[0] = vqrshrun_n_s16(even, 4);
   3024       o.val[1] = vqrshrun_n_s16(odd,  4);
   3025       vst2_u8(out + i*2, o);
   3026 #endif
   3027 
   3028       // "previous" value for next iter
   3029       t1 = 3*in_near[i+7] + in_far[i+7];
   3030    }
   3031 
   3032    t0 = t1;
   3033    t1 = 3*in_near[i] + in_far[i];
   3034    out[i*2] = stbi__div16(3*t1 + t0 + 8);
   3035 
   3036    for (++i; i < w; ++i) {
   3037       t0 = t1;
   3038       t1 = 3*in_near[i]+in_far[i];
   3039       out[i*2-1] = stbi__div16(3*t0 + t1 + 8);
   3040       out[i*2  ] = stbi__div16(3*t1 + t0 + 8);
   3041    }
   3042    out[w*2-1] = stbi__div4(t1+2);
   3043 
   3044    STBI_NOTUSED(hs);
   3045 
   3046    return out;
   3047 }
   3048 #endif
   3049 
   3050 static stbi_uc *stbi__resample_row_generic(stbi_uc *out, stbi_uc *in_near, stbi_uc *in_far, int w, int hs)
   3051 {
   3052    // resample with nearest-neighbor
   3053    int i,j;
   3054    STBI_NOTUSED(in_far);
   3055    for (i=0; i < w; ++i)
   3056       for (j=0; j < hs; ++j)
   3057          out[i*hs+j] = in_near[i];
   3058    return out;
   3059 }
   3060 
   3061 #ifdef STBI_JPEG_OLD
   3062 // this is the same YCbCr-to-RGB calculation that stb_image has used
   3063 // historically before the algorithm changes in 1.49
   3064 #define float2fixed(x)  ((int) ((x) * 65536 + 0.5))
   3065 static void stbi__YCbCr_to_RGB_row(stbi_uc *out, const stbi_uc *y, const stbi_uc *pcb, const stbi_uc *pcr, int count, int step)
   3066 {
   3067    int i;
   3068    for (i=0; i < count; ++i) {
   3069       int y_fixed = (y[i] << 16) + 32768; // rounding
   3070       int r,g,b;
   3071       int cr = pcr[i] - 128;
   3072       int cb = pcb[i] - 128;
   3073       r = y_fixed + cr*float2fixed(1.40200f);
   3074       g = y_fixed - cr*float2fixed(0.71414f) - cb*float2fixed(0.34414f);
   3075       b = y_fixed                            + cb*float2fixed(1.77200f);
   3076       r >>= 16;
   3077       g >>= 16;
   3078       b >>= 16;
   3079       if ((unsigned) r > 255) { if (r < 0) r = 0; else r = 255; }
   3080       if ((unsigned) g > 255) { if (g < 0) g = 0; else g = 255; }
   3081       if ((unsigned) b > 255) { if (b < 0) b = 0; else b = 255; }
   3082       out[0] = (stbi_uc)r;
   3083       out[1] = (stbi_uc)g;
   3084       out[2] = (stbi_uc)b;
   3085       out[3] = 255;
   3086       out += step;
   3087    }
   3088 }
   3089 #else
   3090 // this is a reduced-precision calculation of YCbCr-to-RGB introduced
   3091 // to make sure the code produces the same results in both SIMD and scalar
   3092 #define float2fixed(x)  (((int) ((x) * 4096.0f + 0.5f)) << 8)
   3093 static void stbi__YCbCr_to_RGB_row(stbi_uc *out, const stbi_uc *y, const stbi_uc *pcb, const stbi_uc *pcr, int count, int step)
   3094 {
   3095    int i;
   3096    for (i=0; i < count; ++i) {
   3097       int y_fixed = (y[i] << 20) + (1<<19); // rounding
   3098       int r,g,b;
   3099       int cr = pcr[i] - 128;
   3100       int cb = pcb[i] - 128;
   3101       r = y_fixed +  cr* float2fixed(1.40200f);
   3102       g = y_fixed + (cr*-float2fixed(0.71414f)) + ((cb*-float2fixed(0.34414f)) & 0xffff0000);
   3103       b = y_fixed                               +   cb* float2fixed(1.77200f);
   3104       r >>= 20;
   3105       g >>= 20;
   3106       b >>= 20;
   3107       if ((unsigned) r > 255) { if (r < 0) r = 0; else r = 255; }
   3108       if ((unsigned) g > 255) { if (g < 0) g = 0; else g = 255; }
   3109       if ((unsigned) b > 255) { if (b < 0) b = 0; else b = 255; }
   3110       out[0] = (stbi_uc)r;
   3111       out[1] = (stbi_uc)g;
   3112       out[2] = (stbi_uc)b;
   3113       out[3] = 255;
   3114       out += step;
   3115    }
   3116 }
   3117 #endif
   3118 
   3119 #if defined(STBI_SSE2) || defined(STBI_NEON)
   3120 static void stbi__YCbCr_to_RGB_simd(stbi_uc *out, stbi_uc const *y, stbi_uc const *pcb, stbi_uc const *pcr, int count, int step)
   3121 {
   3122    int i = 0;
   3123 
   3124 #ifdef STBI_SSE2
   3125    // step == 3 is pretty ugly on the final interleave, and i'm not convinced
   3126    // it's useful in practice (you wouldn't use it for textures, for example).
   3127    // so just accelerate step == 4 case.
   3128    if (step == 4) {
   3129       // this is a fairly straightforward implementation and not super-optimized.
   3130       __m128i signflip  = _mm_set1_epi8(-0x80);
   3131       __m128i cr_const0 = _mm_set1_epi16(   (short) ( 1.40200f*4096.0f+0.5f));
   3132       __m128i cr_const1 = _mm_set1_epi16( - (short) ( 0.71414f*4096.0f+0.5f));
   3133       __m128i cb_const0 = _mm_set1_epi16( - (short) ( 0.34414f*4096.0f+0.5f));
   3134       __m128i cb_const1 = _mm_set1_epi16(   (short) ( 1.77200f*4096.0f+0.5f));
   3135       __m128i y_bias = _mm_set1_epi8((char) (unsigned char) 128);
   3136       __m128i xw = _mm_set1_epi16(255); // alpha channel
   3137 
   3138       for (; i+7 < count; i += 8) {
   3139          // load
   3140          __m128i y_bytes = _mm_loadl_epi64((__m128i *) (y+i));
   3141          __m128i cr_bytes = _mm_loadl_epi64((__m128i *) (pcr+i));
   3142          __m128i cb_bytes = _mm_loadl_epi64((__m128i *) (pcb+i));
   3143          __m128i cr_biased = _mm_xor_si128(cr_bytes, signflip); // -128
   3144          __m128i cb_biased = _mm_xor_si128(cb_bytes, signflip); // -128
   3145 
   3146          // unpack to short (and left-shift cr, cb by 8)
   3147          __m128i yw  = _mm_unpacklo_epi8(y_bias, y_bytes);
   3148          __m128i crw = _mm_unpacklo_epi8(_mm_setzero_si128(), cr_biased);
   3149          __m128i cbw = _mm_unpacklo_epi8(_mm_setzero_si128(), cb_biased);
   3150 
   3151          // color transform
   3152          __m128i yws = _mm_srli_epi16(yw, 4);
   3153          __m128i cr0 = _mm_mulhi_epi16(cr_const0, crw);
   3154          __m128i cb0 = _mm_mulhi_epi16(cb_const0, cbw);
   3155          __m128i cb1 = _mm_mulhi_epi16(cbw, cb_const1);
   3156          __m128i cr1 = _mm_mulhi_epi16(crw, cr_const1);
   3157          __m128i rws = _mm_add_epi16(cr0, yws);
   3158          __m128i gwt = _mm_add_epi16(cb0, yws);
   3159          __m128i bws = _mm_add_epi16(yws, cb1);
   3160          __m128i gws = _mm_add_epi16(gwt, cr1);
   3161 
   3162          // descale
   3163          __m128i rw = _mm_srai_epi16(rws, 4);
   3164          __m128i bw = _mm_srai_epi16(bws, 4);
   3165          __m128i gw = _mm_srai_epi16(gws, 4);
   3166 
   3167          // back to byte, set up for transpose
   3168          __m128i brb = _mm_packus_epi16(rw, bw);
   3169          __m128i gxb = _mm_packus_epi16(gw, xw);
   3170 
   3171          // transpose to interleave channels
   3172          __m128i t0 = _mm_unpacklo_epi8(brb, gxb);
   3173          __m128i t1 = _mm_unpackhi_epi8(brb, gxb);
   3174          __m128i o0 = _mm_unpacklo_epi16(t0, t1);
   3175          __m128i o1 = _mm_unpackhi_epi16(t0, t1);
   3176 
   3177          // store
   3178          _mm_storeu_si128((__m128i *) (out + 0), o0);
   3179          _mm_storeu_si128((__m128i *) (out + 16), o1);
   3180          out += 32;
   3181       }
   3182    }
   3183 #endif
   3184 
   3185 #ifdef STBI_NEON
   3186    // in this version, step=3 support would be easy to add. but is there demand?
   3187    if (step == 4) {
   3188       // this is a fairly straightforward implementation and not super-optimized.
   3189       uint8x8_t signflip = vdup_n_u8(0x80);
   3190       int16x8_t cr_const0 = vdupq_n_s16(   (short) ( 1.40200f*4096.0f+0.5f));
   3191       int16x8_t cr_const1 = vdupq_n_s16( - (short) ( 0.71414f*4096.0f+0.5f));
   3192       int16x8_t cb_const0 = vdupq_n_s16( - (short) ( 0.34414f*4096.0f+0.5f));
   3193       int16x8_t cb_const1 = vdupq_n_s16(   (short) ( 1.77200f*4096.0f+0.5f));
   3194 
   3195       for (; i+7 < count; i += 8) {
   3196          // load
   3197          uint8x8_t y_bytes  = vld1_u8(y + i);
   3198          uint8x8_t cr_bytes = vld1_u8(pcr + i);
   3199          uint8x8_t cb_bytes = vld1_u8(pcb + i);
   3200          int8x8_t cr_biased = vreinterpret_s8_u8(vsub_u8(cr_bytes, signflip));
   3201          int8x8_t cb_biased = vreinterpret_s8_u8(vsub_u8(cb_bytes, signflip));
   3202 
   3203          // expand to s16
   3204          int16x8_t yws = vreinterpretq_s16_u16(vshll_n_u8(y_bytes, 4));
   3205          int16x8_t crw = vshll_n_s8(cr_biased, 7);
   3206          int16x8_t cbw = vshll_n_s8(cb_biased, 7);
   3207 
   3208          // color transform
   3209          int16x8_t cr0 = vqdmulhq_s16(crw, cr_const0);
   3210          int16x8_t cb0 = vqdmulhq_s16(cbw, cb_const0);
   3211          int16x8_t cr1 = vqdmulhq_s16(crw, cr_const1);
   3212          int16x8_t cb1 = vqdmulhq_s16(cbw, cb_const1);
   3213          int16x8_t rws = vaddq_s16(yws, cr0);
   3214          int16x8_t gws = vaddq_s16(vaddq_s16(yws, cb0), cr1);
   3215          int16x8_t bws = vaddq_s16(yws, cb1);
   3216 
   3217          // undo scaling, round, convert to byte
   3218          uint8x8x4_t o;
   3219          o.val[0] = vqrshrun_n_s16(rws, 4);
   3220          o.val[1] = vqrshrun_n_s16(gws, 4);
   3221          o.val[2] = vqrshrun_n_s16(bws, 4);
   3222          o.val[3] = vdup_n_u8(255);
   3223 
   3224          // store, interleaving r/g/b/a
   3225          vst4_u8(out, o);
   3226          out += 8*4;
   3227       }
   3228    }
   3229 #endif
   3230 
   3231    for (; i < count; ++i) {
   3232       int y_fixed = (y[i] << 20) + (1<<19); // rounding
   3233       int r,g,b;
   3234       int cr = pcr[i] - 128;
   3235       int cb = pcb[i] - 128;
   3236       r = y_fixed + cr* float2fixed(1.40200f);
   3237       g = y_fixed + cr*-float2fixed(0.71414f) + ((cb*-float2fixed(0.34414f)) & 0xffff0000);
   3238       b = y_fixed                             +   cb* float2fixed(1.77200f);
   3239       r >>= 20;
   3240       g >>= 20;
   3241       b >>= 20;
   3242       if ((unsigned) r > 255) { if (r < 0) r = 0; else r = 255; }
   3243       if ((unsigned) g > 255) { if (g < 0) g = 0; else g = 255; }
   3244       if ((unsigned) b > 255) { if (b < 0) b = 0; else b = 255; }
   3245       out[0] = (stbi_uc)r;
   3246       out[1] = (stbi_uc)g;
   3247       out[2] = (stbi_uc)b;
   3248       out[3] = 255;
   3249       out += step;
   3250    }
   3251 }
   3252 #endif
   3253 
   3254 // set up the kernels
   3255 static void stbi__setup_jpeg(stbi__jpeg *j)
   3256 {
   3257    j->idct_block_kernel = stbi__idct_block;
   3258    j->YCbCr_to_RGB_kernel = stbi__YCbCr_to_RGB_row;
   3259    j->resample_row_hv_2_kernel = stbi__resample_row_hv_2;
   3260 
   3261 #ifdef STBI_SSE2
   3262    if (stbi__sse2_available()) {
   3263       j->idct_block_kernel = stbi__idct_simd;
   3264       #ifndef STBI_JPEG_OLD
   3265       j->YCbCr_to_RGB_kernel = stbi__YCbCr_to_RGB_simd;
   3266       #endif
   3267       j->resample_row_hv_2_kernel = stbi__resample_row_hv_2_simd;
   3268    }
   3269 #endif
   3270 
   3271 #ifdef STBI_NEON
   3272    j->idct_block_kernel = stbi__idct_simd;
   3273    #ifndef STBI_JPEG_OLD
   3274    j->YCbCr_to_RGB_kernel = stbi__YCbCr_to_RGB_simd;
   3275    #endif
   3276    j->resample_row_hv_2_kernel = stbi__resample_row_hv_2_simd;
   3277 #endif
   3278 }
   3279 
   3280 // clean up the temporary component buffers
   3281 static void stbi__cleanup_jpeg(stbi__jpeg *j)
   3282 {
   3283    int i;
   3284    for (i=0; i < j->s->img_n; ++i) {
   3285       if (j->img_comp[i].raw_data) {
   3286          STBI_FREE(j->img_comp[i].raw_data);
   3287          j->img_comp[i].raw_data = NULL;
   3288          j->img_comp[i].data = NULL;
   3289       }
   3290       if (j->img_comp[i].raw_coeff) {
   3291          STBI_FREE(j->img_comp[i].raw_coeff);
   3292          j->img_comp[i].raw_coeff = 0;
   3293          j->img_comp[i].coeff = 0;
   3294       }
   3295       if (j->img_comp[i].linebuf) {
   3296          STBI_FREE(j->img_comp[i].linebuf);
   3297          j->img_comp[i].linebuf = NULL;
   3298       }
   3299    }
   3300 }
   3301 
   3302 typedef struct
   3303 {
   3304    resample_row_func resample;
   3305    stbi_uc *line0,*line1;
   3306    int hs,vs;   // expansion factor in each axis
   3307    int w_lores; // horizontal pixels pre-expansion
   3308    int ystep;   // how far through vertical expansion we are
   3309    int ypos;    // which pre-expansion row we're on
   3310 } stbi__resample;
   3311 
   3312 static stbi_uc *load_jpeg_image(stbi__jpeg *z, int *out_x, int *out_y, int *comp, int req_comp)
   3313 {
   3314    int n, decode_n;
   3315    z->s->img_n = 0; // make stbi__cleanup_jpeg safe
   3316 
   3317    // validate req_comp
   3318    if (req_comp < 0 || req_comp > 4) return stbi__errpuc("bad req_comp", "Internal error");
   3319 
   3320    // load a jpeg image from whichever source, but leave in YCbCr format
   3321    if (!stbi__decode_jpeg_image(z)) { stbi__cleanup_jpeg(z); return NULL; }
   3322 
   3323    // determine actual number of components to generate
   3324    n = req_comp ? req_comp : z->s->img_n;
   3325 
   3326    if (z->s->img_n == 3 && n < 3)
   3327       decode_n = 1;
   3328    else
   3329       decode_n = z->s->img_n;
   3330 
   3331    // resample and color-convert
   3332    {
   3333       int k;
   3334       unsigned int i,j;
   3335       stbi_uc *output;
   3336       stbi_uc *coutput[4];
   3337 
   3338       stbi__resample res_comp[4];
   3339 
   3340       for (k=0; k < decode_n; ++k) {
   3341          stbi__resample *r = &res_comp[k];
   3342 
   3343          // allocate line buffer big enough for upsampling off the edges
   3344          // with upsample factor of 4
   3345          z->img_comp[k].linebuf = (stbi_uc *) stbi__malloc(z->s->img_x + 3);
   3346          if (!z->img_comp[k].linebuf) { stbi__cleanup_jpeg(z); return stbi__errpuc("outofmem", "Out of memory"); }
   3347 
   3348          r->hs      = z->img_h_max / z->img_comp[k].h;
   3349          r->vs      = z->img_v_max / z->img_comp[k].v;
   3350          r->ystep   = r->vs >> 1;
   3351          r->w_lores = (z->s->img_x + r->hs-1) / r->hs;
   3352          r->ypos    = 0;
   3353          r->line0   = r->line1 = z->img_comp[k].data;
   3354 
   3355          if      (r->hs == 1 && r->vs == 1) r->resample = resample_row_1;
   3356          else if (r->hs == 1 && r->vs == 2) r->resample = stbi__resample_row_v_2;
   3357          else if (r->hs == 2 && r->vs == 1) r->resample = stbi__resample_row_h_2;
   3358          else if (r->hs == 2 && r->vs == 2) r->resample = z->resample_row_hv_2_kernel;
   3359          else                               r->resample = stbi__resample_row_generic;
   3360       }
   3361 
   3362       // can't error after this so, this is safe
   3363       output = (stbi_uc *) stbi__malloc(n * z->s->img_x * z->s->img_y + 1);
   3364       if (!output) { stbi__cleanup_jpeg(z); return stbi__errpuc("outofmem", "Out of memory"); }
   3365 
   3366       // now go ahead and resample
   3367       for (j=0; j < z->s->img_y; ++j) {
   3368          stbi_uc *out = output + n * z->s->img_x * j;
   3369          for (k=0; k < decode_n; ++k) {
   3370             stbi__resample *r = &res_comp[k];
   3371             int y_bot = r->ystep >= (r->vs >> 1);
   3372             coutput[k] = r->resample(z->img_comp[k].linebuf,
   3373                                      y_bot ? r->line1 : r->line0,
   3374                                      y_bot ? r->line0 : r->line1,
   3375                                      r->w_lores, r->hs);
   3376             if (++r->ystep >= r->vs) {
   3377                r->ystep = 0;
   3378                r->line0 = r->line1;
   3379                if (++r->ypos < z->img_comp[k].y)
   3380                   r->line1 += z->img_comp[k].w2;
   3381             }
   3382          }
   3383          if (n >= 3) {
   3384             stbi_uc *y = coutput[0];
   3385             if (z->s->img_n == 3) {
   3386                z->YCbCr_to_RGB_kernel(out, y, coutput[1], coutput[2], z->s->img_x, n);
   3387             } else
   3388                for (i=0; i < z->s->img_x; ++i) {
   3389                   out[0] = out[1] = out[2] = y[i];
   3390                   out[3] = 255; // not used if n==3
   3391                   out += n;
   3392                }
   3393          } else {
   3394             stbi_uc *y = coutput[0];
   3395             if (n == 1)
   3396                for (i=0; i < z->s->img_x; ++i) out[i] = y[i];
   3397             else
   3398                for (i=0; i < z->s->img_x; ++i) *out++ = y[i], *out++ = 255;
   3399          }
   3400       }
   3401       stbi__cleanup_jpeg(z);
   3402       *out_x = z->s->img_x;
   3403       *out_y = z->s->img_y;
   3404       if (comp) *comp  = z->s->img_n; // report original components, not output
   3405       return output;
   3406    }
   3407 }
   3408 
   3409 static unsigned char *stbi__jpeg_load(stbi__context *s, int *x, int *y, int *comp, int req_comp)
   3410 {
   3411    stbi__jpeg j;
   3412    j.s = s;
   3413    stbi__setup_jpeg(&j);
   3414    return load_jpeg_image(&j, x,y,comp,req_comp);
   3415 }
   3416 
   3417 static int stbi__jpeg_test(stbi__context *s)
   3418 {
   3419    int r;
   3420    stbi__jpeg j;
   3421    j.s = s;
   3422    stbi__setup_jpeg(&j);
   3423    r = stbi__decode_jpeg_header(&j, STBI__SCAN_type);
   3424    stbi__rewind(s);
   3425    return r;
   3426 }
   3427 
   3428 static int stbi__jpeg_info_raw(stbi__jpeg *j, int *x, int *y, int *comp)
   3429 {
   3430    if (!stbi__decode_jpeg_header(j, STBI__SCAN_header)) {
   3431       stbi__rewind( j->s );
   3432       return 0;
   3433    }
   3434    if (x) *x = j->s->img_x;
   3435    if (y) *y = j->s->img_y;
   3436    if (comp) *comp = j->s->img_n;
   3437    return 1;
   3438 }
   3439 
   3440 static int stbi__jpeg_info(stbi__context *s, int *x, int *y, int *comp)
   3441 {
   3442    stbi__jpeg j;
   3443    j.s = s;
   3444    return stbi__jpeg_info_raw(&j, x, y, comp);
   3445 }
   3446 #endif
   3447 
   3448 // public domain zlib decode    v0.2  Sean Barrett 2006-11-18
   3449 //    simple implementation
   3450 //      - all input must be provided in an upfront buffer
   3451 //      - all output is written to a single output buffer (can malloc/realloc)
   3452 //    performance
   3453 //      - fast huffman
   3454 
   3455 #ifndef STBI_NO_ZLIB
   3456 
   3457 // fast-way is faster to check than jpeg huffman, but slow way is slower
   3458 #define STBI__ZFAST_BITS  9 // accelerate all cases in default tables
   3459 #define STBI__ZFAST_MASK  ((1 << STBI__ZFAST_BITS) - 1)
   3460 
   3461 // zlib-style huffman encoding
   3462 // (jpegs packs from left, zlib from right, so can't share code)
   3463 typedef struct
   3464 {
   3465    stbi__uint16 fast[1 << STBI__ZFAST_BITS];
   3466    stbi__uint16 firstcode[16];
   3467    int maxcode[17];
   3468    stbi__uint16 firstsymbol[16];
   3469    stbi_uc  size[288];
   3470    stbi__uint16 value[288];
   3471 } stbi__zhuffman;
   3472 
   3473 stbi_inline static int stbi__bitreverse16(int n)
   3474 {
   3475   n = ((n & 0xAAAA) >>  1) | ((n & 0x5555) << 1);
   3476   n = ((n & 0xCCCC) >>  2) | ((n & 0x3333) << 2);
   3477   n = ((n & 0xF0F0) >>  4) | ((n & 0x0F0F) << 4);
   3478   n = ((n & 0xFF00) >>  8) | ((n & 0x00FF) << 8);
   3479   return n;
   3480 }
   3481 
   3482 stbi_inline static int stbi__bit_reverse(int v, int bits)
   3483 {
   3484    STBI_ASSERT(bits <= 16);
   3485    // to bit reverse n bits, reverse 16 and shift
   3486    // e.g. 11 bits, bit reverse and shift away 5
   3487    return stbi__bitreverse16(v) >> (16-bits);
   3488 }
   3489 
   3490 static int stbi__zbuild_huffman(stbi__zhuffman *z, stbi_uc *sizelist, int num)
   3491 {
   3492    int i,k=0;
   3493    int code, next_code[16], sizes[17];
   3494 
   3495    // DEFLATE spec for generating codes
   3496    memset(sizes, 0, sizeof(sizes));
   3497    memset(z->fast, 0, sizeof(z->fast));
   3498    for (i=0; i < num; ++i)
   3499       ++sizes[sizelist[i]];
   3500    sizes[0] = 0;
   3501    for (i=1; i < 16; ++i)
   3502       if (sizes[i] > (1 << i))
   3503          return stbi__err("bad sizes", "Corrupt PNG");
   3504    code = 0;
   3505    for (i=1; i < 16; ++i) {
   3506       next_code[i] = code;
   3507       z->firstcode[i] = (stbi__uint16) code;
   3508       z->firstsymbol[i] = (stbi__uint16) k;
   3509       code = (code + sizes[i]);
   3510       if (sizes[i])
   3511          if (code-1 >= (1 << i)) return stbi__err("bad codelengths","Corrupt PNG");
   3512       z->maxcode[i] = code << (16-i); // preshift for inner loop
   3513       code <<= 1;
   3514       k += sizes[i];
   3515    }
   3516    z->maxcode[16] = 0x10000; // sentinel
   3517    for (i=0; i < num; ++i) {
   3518       int s = sizelist[i];
   3519       if (s) {
   3520          int c = next_code[s] - z->firstcode[s] + z->firstsymbol[s];
   3521          stbi__uint16 fastv = (stbi__uint16) ((s << 9) | i);
   3522          z->size [c] = (stbi_uc     ) s;
   3523          z->value[c] = (stbi__uint16) i;
   3524          if (s <= STBI__ZFAST_BITS) {
   3525             int j = stbi__bit_reverse(next_code[s],s);
   3526             while (j < (1 << STBI__ZFAST_BITS)) {
   3527                z->fast[j] = fastv;
   3528                j += (1 << s);
   3529             }
   3530          }
   3531          ++next_code[s];
   3532       }
   3533    }
   3534    return 1;
   3535 }
   3536 
   3537 // zlib-from-memory implementation for PNG reading
   3538 //    because PNG allows splitting the zlib stream arbitrarily,
   3539 //    and it's annoying structurally to have PNG call ZLIB call PNG,
   3540 //    we require PNG read all the IDATs and combine them into a single
   3541 //    memory buffer
   3542 
   3543 typedef struct
   3544 {
   3545    stbi_uc *zbuffer, *zbuffer_end;
   3546    int num_bits;
   3547    stbi__uint32 code_buffer;
   3548 
   3549    char *zout;
   3550    char *zout_start;
   3551    char *zout_end;
   3552    int   z_expandable;
   3553 
   3554    stbi__zhuffman z_length, z_distance;
   3555 } stbi__zbuf;
   3556 
   3557 stbi_inline static stbi_uc stbi__zget8(stbi__zbuf *z)
   3558 {
   3559    if (z->zbuffer >= z->zbuffer_end) return 0;
   3560    return *z->zbuffer++;
   3561 }
   3562 
   3563 static void stbi__fill_bits(stbi__zbuf *z)
   3564 {
   3565    do {
   3566       STBI_ASSERT(z->code_buffer < (1U << z->num_bits));
   3567       z->code_buffer |= (unsigned int) stbi__zget8(z) << z->num_bits;
   3568       z->num_bits += 8;
   3569    } while (z->num_bits <= 24);
   3570 }
   3571 
   3572 stbi_inline static unsigned int stbi__zreceive(stbi__zbuf *z, int n)
   3573 {
   3574    unsigned int k;
   3575    if (z->num_bits < n) stbi__fill_bits(z);
   3576    k = z->code_buffer & ((1 << n) - 1);
   3577    z->code_buffer >>= n;
   3578    z->num_bits -= n;
   3579    return k;
   3580 }
   3581 
   3582 static int stbi__zhuffman_decode_slowpath(stbi__zbuf *a, stbi__zhuffman *z)
   3583 {
   3584    int b,s,k;
   3585    // not resolved by fast table, so compute it the slow way
   3586    // use jpeg approach, which requires MSbits at top
   3587    k = stbi__bit_reverse(a->code_buffer, 16);
   3588    for (s=STBI__ZFAST_BITS+1; ; ++s)
   3589       if (k < z->maxcode[s])
   3590          break;
   3591    if (s == 16) return -1; // invalid code!
   3592    // code size is s, so:
   3593    b = (k >> (16-s)) - z->firstcode[s] + z->firstsymbol[s];
   3594    STBI_ASSERT(z->size[b] == s);
   3595    a->code_buffer >>= s;
   3596    a->num_bits -= s;
   3597    return z->value[b];
   3598 }
   3599 
   3600 stbi_inline static int stbi__zhuffman_decode(stbi__zbuf *a, stbi__zhuffman *z)
   3601 {
   3602    int b,s;
   3603    if (a->num_bits < 16) stbi__fill_bits(a);
   3604    b = z->fast[a->code_buffer & STBI__ZFAST_MASK];
   3605    if (b) {
   3606       s = b >> 9;
   3607       a->code_buffer >>= s;
   3608       a->num_bits -= s;
   3609       return b & 511;
   3610    }
   3611    return stbi__zhuffman_decode_slowpath(a, z);
   3612 }
   3613 
   3614 static int stbi__zexpand(stbi__zbuf *z, char *zout, int n)  // need to make room for n bytes
   3615 {
   3616    char *q;
   3617    int cur, limit, old_limit;
   3618    z->zout = zout;
   3619    if (!z->z_expandable) return stbi__err("output buffer limit","Corrupt PNG");
   3620    cur   = (int) (z->zout     - z->zout_start);
   3621    limit = old_limit = (int) (z->zout_end - z->zout_start);
   3622    while (cur + n > limit)
   3623       limit *= 2;
   3624    q = (char *) STBI_REALLOC_SIZED(z->zout_start, old_limit, limit);
   3625    STBI_NOTUSED(old_limit);
   3626    if (q == NULL) return stbi__err("outofmem", "Out of memory");
   3627    z->zout_start = q;
   3628    z->zout       = q + cur;
   3629    z->zout_end   = q + limit;
   3630    return 1;
   3631 }
   3632 
   3633 static int stbi__zlength_base[31] = {
   3634    3,4,5,6,7,8,9,10,11,13,
   3635    15,17,19,23,27,31,35,43,51,59,
   3636    67,83,99,115,131,163,195,227,258,0,0 };
   3637 
   3638 static int stbi__zlength_extra[31]=
   3639 { 0,0,0,0,0,0,0,0,1,1,1,1,2,2,2,2,3,3,3,3,4,4,4,4,5,5,5,5,0,0,0 };
   3640 
   3641 static int stbi__zdist_base[32] = { 1,2,3,4,5,7,9,13,17,25,33,49,65,97,129,193,
   3642 257,385,513,769,1025,1537,2049,3073,4097,6145,8193,12289,16385,24577,0,0};
   3643 
   3644 static int stbi__zdist_extra[32] =
   3645 { 0,0,0,0,1,1,2,2,3,3,4,4,5,5,6,6,7,7,8,8,9,9,10,10,11,11,12,12,13,13};
   3646 
   3647 static int stbi__parse_huffman_block(stbi__zbuf *a)
   3648 {
   3649    char *zout = a->zout;
   3650    for(;;) {
   3651       int z = stbi__zhuffman_decode(a, &a->z_length);
   3652       if (z < 256) {
   3653          if (z < 0) return stbi__err("bad huffman code","Corrupt PNG"); // error in huffman codes
   3654          if (zout >= a->zout_end) {
   3655             if (!stbi__zexpand(a, zout, 1)) return 0;
   3656             zout = a->zout;
   3657          }
   3658          *zout++ = (char) z;
   3659       } else {
   3660          stbi_uc *p;
   3661          int len,dist;
   3662          if (z == 256) {
   3663             a->zout = zout;
   3664             return 1;
   3665          }
   3666          z -= 257;
   3667          len = stbi__zlength_base[z];
   3668          if (stbi__zlength_extra[z]) len += stbi__zreceive(a, stbi__zlength_extra[z]);
   3669          z = stbi__zhuffman_decode(a, &a->z_distance);
   3670          if (z < 0) return stbi__err("bad huffman code","Corrupt PNG");
   3671          dist = stbi__zdist_base[z];
   3672          if (stbi__zdist_extra[z]) dist += stbi__zreceive(a, stbi__zdist_extra[z]);
   3673          if (zout - a->zout_start < dist) return stbi__err("bad dist","Corrupt PNG");
   3674          if (zout + len > a->zout_end) {
   3675             if (!stbi__zexpand(a, zout, len)) return 0;
   3676             zout = a->zout;
   3677          }
   3678          p = (stbi_uc *) (zout - dist);
   3679          if (dist == 1) { // run of one byte; common in images.
   3680             stbi_uc v = *p;
   3681             if (len) { do *zout++ = v; while (--len); }
   3682          } else {
   3683             if (len) { do *zout++ = *p++; while (--len); }
   3684          }
   3685       }
   3686    }
   3687 }
   3688 
   3689 static int stbi__compute_huffman_codes(stbi__zbuf *a)
   3690 {
   3691    static stbi_uc length_dezigzag[19] = { 16,17,18,0,8,7,9,6,10,5,11,4,12,3,13,2,14,1,15 };
   3692    stbi__zhuffman z_codelength;
   3693    stbi_uc lencodes[286+32+137];//padding for maximum single op
   3694    stbi_uc codelength_sizes[19];
   3695    int i,n;
   3696 
   3697    int hlit  = stbi__zreceive(a,5) + 257;
   3698    int hdist = stbi__zreceive(a,5) + 1;
   3699    int hclen = stbi__zreceive(a,4) + 4;
   3700 
   3701    memset(codelength_sizes, 0, sizeof(codelength_sizes));
   3702    for (i=0; i < hclen; ++i) {
   3703       int s = stbi__zreceive(a,3);
   3704       codelength_sizes[length_dezigzag[i]] = (stbi_uc) s;
   3705    }
   3706    if (!stbi__zbuild_huffman(&z_codelength, codelength_sizes, 19)) return 0;
   3707 
   3708    n = 0;
   3709    while (n < hlit + hdist) {
   3710       int c = stbi__zhuffman_decode(a, &z_codelength);
   3711       if (c < 0 || c >= 19) return stbi__err("bad codelengths", "Corrupt PNG");
   3712       if (c < 16)
   3713          lencodes[n++] = (stbi_uc) c;
   3714       else if (c == 16) {
   3715          c = stbi__zreceive(a,2)+3;
   3716          memset(lencodes+n, lencodes[n-1], c);
   3717          n += c;
   3718       } else if (c == 17) {
   3719          c = stbi__zreceive(a,3)+3;
   3720          memset(lencodes+n, 0, c);
   3721          n += c;
   3722       } else {
   3723          STBI_ASSERT(c == 18);
   3724          c = stbi__zreceive(a,7)+11;
   3725          memset(lencodes+n, 0, c);
   3726          n += c;
   3727       }
   3728    }
   3729    if (n != hlit+hdist) return stbi__err("bad codelengths","Corrupt PNG");
   3730    if (!stbi__zbuild_huffman(&a->z_length, lencodes, hlit)) return 0;
   3731    if (!stbi__zbuild_huffman(&a->z_distance, lencodes+hlit, hdist)) return 0;
   3732    return 1;
   3733 }
   3734 
   3735 static int stbi__parse_uncomperssed_block(stbi__zbuf *a)
   3736 {
   3737    stbi_uc header[4];
   3738    int len,nlen,k;
   3739    if (a->num_bits & 7)
   3740       stbi__zreceive(a, a->num_bits & 7); // discard
   3741    // drain the bit-packed data into header
   3742    k = 0;
   3743    while (a->num_bits > 0) {
   3744       header[k++] = (stbi_uc) (a->code_buffer & 255); // suppress MSVC run-time check
   3745       a->code_buffer >>= 8;
   3746       a->num_bits -= 8;
   3747    }
   3748    STBI_ASSERT(a->num_bits == 0);
   3749    // now fill header the normal way
   3750    while (k < 4)
   3751       header[k++] = stbi__zget8(a);
   3752    len  = header[1] * 256 + header[0];
   3753    nlen = header[3] * 256 + header[2];
   3754    if (nlen != (len ^ 0xffff)) return stbi__err("zlib corrupt","Corrupt PNG");
   3755    if (a->zbuffer + len > a->zbuffer_end) return stbi__err("read past buffer","Corrupt PNG");
   3756    if (a->zout + len > a->zout_end)
   3757       if (!stbi__zexpand(a, a->zout, len)) return 0;
   3758    memcpy(a->zout, a->zbuffer, len);
   3759    a->zbuffer += len;
   3760    a->zout += len;
   3761    return 1;
   3762 }
   3763 
   3764 static int stbi__parse_zlib_header(stbi__zbuf *a)
   3765 {
   3766    int cmf   = stbi__zget8(a);
   3767    int cm    = cmf & 15;
   3768    /* int cinfo = cmf >> 4; */
   3769    int flg   = stbi__zget8(a);
   3770    if ((cmf*256+flg) % 31 != 0) return stbi__err("bad zlib header","Corrupt PNG"); // zlib spec
   3771    if (flg & 32) return stbi__err("no preset dict","Corrupt PNG"); // preset dictionary not allowed in png
   3772    if (cm != 8) return stbi__err("bad compression","Corrupt PNG"); // DEFLATE required for png
   3773    // window = 1 << (8 + cinfo)... but who cares, we fully buffer output
   3774    return 1;
   3775 }
   3776 
   3777 // @TODO: should statically initialize these for optimal thread safety
   3778 static stbi_uc stbi__zdefault_length[288], stbi__zdefault_distance[32];
   3779 static void stbi__init_zdefaults(void)
   3780 {
   3781    int i;   // use <= to match clearly with spec
   3782    for (i=0; i <= 143; ++i)     stbi__zdefault_length[i]   = 8;
   3783    for (   ; i <= 255; ++i)     stbi__zdefault_length[i]   = 9;
   3784    for (   ; i <= 279; ++i)     stbi__zdefault_length[i]   = 7;
   3785    for (   ; i <= 287; ++i)     stbi__zdefault_length[i]   = 8;
   3786 
   3787    for (i=0; i <=  31; ++i)     stbi__zdefault_distance[i] = 5;
   3788 }
   3789 
   3790 static int stbi__parse_zlib(stbi__zbuf *a, int parse_header)
   3791 {
   3792    int final, type;
   3793    if (parse_header)
   3794       if (!stbi__parse_zlib_header(a)) return 0;
   3795    a->num_bits = 0;
   3796    a->code_buffer = 0;
   3797    do {
   3798       final = stbi__zreceive(a,1);
   3799       type = stbi__zreceive(a,2);
   3800       if (type == 0) {
   3801          if (!stbi__parse_uncomperssed_block(a)) return 0;
   3802       } else if (type == 3) {
   3803          return 0;
   3804       } else {
   3805          if (type == 1) {
   3806             // use fixed code lengths
   3807             if (!stbi__zdefault_distance[31]) stbi__init_zdefaults();
   3808             if (!stbi__zbuild_huffman(&a->z_length  , stbi__zdefault_length  , 288)) return 0;
   3809             if (!stbi__zbuild_huffman(&a->z_distance, stbi__zdefault_distance,  32)) return 0;
   3810          } else {
   3811             if (!stbi__compute_huffman_codes(a)) return 0;
   3812          }
   3813          if (!stbi__parse_huffman_block(a)) return 0;
   3814       }
   3815    } while (!final);
   3816    return 1;
   3817 }
   3818 
   3819 static int stbi__do_zlib(stbi__zbuf *a, char *obuf, int olen, int exp, int parse_header)
   3820 {
   3821    a->zout_start = obuf;
   3822    a->zout       = obuf;
   3823    a->zout_end   = obuf + olen;
   3824    a->z_expandable = exp;
   3825 
   3826    return stbi__parse_zlib(a, parse_header);
   3827 }
   3828 
   3829 STBIDEF char *stbi_zlib_decode_malloc_guesssize(const char *buffer, int len, int initial_size, int *outlen)
   3830 {
   3831    stbi__zbuf a;
   3832    char *p = (char *) stbi__malloc(initial_size);
   3833    if (p == NULL) return NULL;
   3834    a.zbuffer = (stbi_uc *) buffer;
   3835    a.zbuffer_end = (stbi_uc *) buffer + len;
   3836    if (stbi__do_zlib(&a, p, initial_size, 1, 1)) {
   3837       if (outlen) *outlen = (int) (a.zout - a.zout_start);
   3838       return a.zout_start;
   3839    } else {
   3840       STBI_FREE(a.zout_start);
   3841       return NULL;
   3842    }
   3843 }
   3844 
   3845 STBIDEF char *stbi_zlib_decode_malloc(char const *buffer, int len, int *outlen)
   3846 {
   3847    return stbi_zlib_decode_malloc_guesssize(buffer, len, 16384, outlen);
   3848 }
   3849 
   3850 STBIDEF char *stbi_zlib_decode_malloc_guesssize_headerflag(const char *buffer, int len, int initial_size, int *outlen, int parse_header)
   3851 {
   3852    stbi__zbuf a;
   3853    char *p = (char *) stbi__malloc(initial_size);
   3854    if (p == NULL) return NULL;
   3855    a.zbuffer = (stbi_uc *) buffer;
   3856    a.zbuffer_end = (stbi_uc *) buffer + len;
   3857    if (stbi__do_zlib(&a, p, initial_size, 1, parse_header)) {
   3858       if (outlen) *outlen = (int) (a.zout - a.zout_start);
   3859       return a.zout_start;
   3860    } else {
   3861       STBI_FREE(a.zout_start);
   3862       return NULL;
   3863    }
   3864 }
   3865 
   3866 STBIDEF int stbi_zlib_decode_buffer(char *obuffer, int olen, char const *ibuffer, int ilen)
   3867 {
   3868    stbi__zbuf a;
   3869    a.zbuffer = (stbi_uc *) ibuffer;
   3870    a.zbuffer_end = (stbi_uc *) ibuffer + ilen;
   3871    if (stbi__do_zlib(&a, obuffer, olen, 0, 1))
   3872       return (int) (a.zout - a.zout_start);
   3873    else
   3874       return -1;
   3875 }
   3876 
   3877 STBIDEF char *stbi_zlib_decode_noheader_malloc(char const *buffer, int len, int *outlen)
   3878 {
   3879    stbi__zbuf a;
   3880    char *p = (char *) stbi__malloc(16384);
   3881    if (p == NULL) return NULL;
   3882    a.zbuffer = (stbi_uc *) buffer;
   3883    a.zbuffer_end = (stbi_uc *) buffer+len;
   3884    if (stbi__do_zlib(&a, p, 16384, 1, 0)) {
   3885       if (outlen) *outlen = (int) (a.zout - a.zout_start);
   3886       return a.zout_start;
   3887    } else {
   3888       STBI_FREE(a.zout_start);
   3889       return NULL;
   3890    }
   3891 }
   3892 
   3893 STBIDEF int stbi_zlib_decode_noheader_buffer(char *obuffer, int olen, const char *ibuffer, int ilen)
   3894 {
   3895    stbi__zbuf a;
   3896    a.zbuffer = (stbi_uc *) ibuffer;
   3897    a.zbuffer_end = (stbi_uc *) ibuffer + ilen;
   3898    if (stbi__do_zlib(&a, obuffer, olen, 0, 0))
   3899       return (int) (a.zout - a.zout_start);
   3900    else
   3901       return -1;
   3902 }
   3903 #endif
   3904 
   3905 // public domain "baseline" PNG decoder   v0.10  Sean Barrett 2006-11-18
   3906 //    simple implementation
   3907 //      - only 8-bit samples
   3908 //      - no CRC checking
   3909 //      - allocates lots of intermediate memory
   3910 //        - avoids problem of streaming data between subsystems
   3911 //        - avoids explicit window management
   3912 //    performance
   3913 //      - uses stb_zlib, a PD zlib implementation with fast huffman decoding
   3914 
   3915 #ifndef STBI_NO_PNG
   3916 typedef struct
   3917 {
   3918    stbi__uint32 length;
   3919    stbi__uint32 type;
   3920 } stbi__pngchunk;
   3921 
   3922 static stbi__pngchunk stbi__get_chunk_header(stbi__context *s)
   3923 {
   3924    stbi__pngchunk c;
   3925    c.length = stbi__get32be(s);
   3926    c.type   = stbi__get32be(s);
   3927    return c;
   3928 }
   3929 
   3930 static int stbi__check_png_header(stbi__context *s)
   3931 {
   3932    static stbi_uc png_sig[8] = { 137,80,78,71,13,10,26,10 };
   3933    int i;
   3934    for (i=0; i < 8; ++i)
   3935       if (stbi__get8(s) != png_sig[i]) return stbi__err("bad png sig","Not a PNG");
   3936    return 1;
   3937 }
   3938 
   3939 typedef struct
   3940 {
   3941    stbi__context *s;
   3942    stbi_uc *idata, *expanded, *out;
   3943 } stbi__png;
   3944 
   3945 
   3946 enum {
   3947    STBI__F_none=0,
   3948    STBI__F_sub=1,
   3949    STBI__F_up=2,
   3950    STBI__F_avg=3,
   3951    STBI__F_paeth=4,
   3952    // synthetic filters used for first scanline to avoid needing a dummy row of 0s
   3953    STBI__F_avg_first,
   3954    STBI__F_paeth_first
   3955 };
   3956 
   3957 static stbi_uc first_row_filter[5] =
   3958 {
   3959    STBI__F_none,
   3960    STBI__F_sub,
   3961    STBI__F_none,
   3962    STBI__F_avg_first,
   3963    STBI__F_paeth_first
   3964 };
   3965 
   3966 static int stbi__paeth(int a, int b, int c)
   3967 {
   3968    int p = a + b - c;
   3969    int pa = abs(p-a);
   3970    int pb = abs(p-b);
   3971    int pc = abs(p-c);
   3972    if (pa <= pb && pa <= pc) return a;
   3973    if (pb <= pc) return b;
   3974    return c;
   3975 }
   3976 
   3977 static stbi_uc stbi__depth_scale_table[9] = { 0, 0xff, 0x55, 0, 0x11, 0,0,0, 0x01 };
   3978 
   3979 // create the png data from post-deflated data
   3980 static int stbi__create_png_image_raw(stbi__png *a, stbi_uc *raw, stbi__uint32 raw_len, int out_n, stbi__uint32 x, stbi__uint32 y, int depth, int color)
   3981 {
   3982    stbi__context *s = a->s;
   3983    stbi__uint32 i,j,stride = x*out_n;
   3984    stbi__uint32 img_len, img_width_bytes;
   3985    int k;
   3986    int img_n = s->img_n; // copy it into a local for later
   3987 
   3988    STBI_ASSERT(out_n == s->img_n || out_n == s->img_n+1);
   3989    a->out = (stbi_uc *) stbi__malloc(x * y * out_n); // extra bytes to write off the end into
   3990    if (!a->out) return stbi__err("outofmem", "Out of memory");
   3991 
   3992    img_width_bytes = (((img_n * x * depth) + 7) >> 3);
   3993    img_len = (img_width_bytes + 1) * y;
   3994    if (s->img_x == x && s->img_y == y) {
   3995       if (raw_len != img_len) return stbi__err("not enough pixels","Corrupt PNG");
   3996    } else { // interlaced:
   3997       if (raw_len < img_len) return stbi__err("not enough pixels","Corrupt PNG");
   3998    }
   3999 
   4000    for (j=0; j < y; ++j) {
   4001       stbi_uc *cur = a->out + stride*j;
   4002       stbi_uc *prior = cur - stride;
   4003       int filter = *raw++;
   4004       int filter_bytes = img_n;
   4005       int width = x;
   4006       if (filter > 4)
   4007          return stbi__err("invalid filter","Corrupt PNG");
   4008 
   4009       if (depth < 8) {
   4010          STBI_ASSERT(img_width_bytes <= x);
   4011          cur += x*out_n - img_width_bytes; // store output to the rightmost img_len bytes, so we can decode in place
   4012          filter_bytes = 1;
   4013          width = img_width_bytes;
   4014       }
   4015 
   4016       // if first row, use special filter that doesn't sample previous row
   4017       if (j == 0) filter = first_row_filter[filter];
   4018 
   4019       // handle first byte explicitly
   4020       for (k=0; k < filter_bytes; ++k) {
   4021          switch (filter) {
   4022             case STBI__F_none       : cur[k] = raw[k]; break;
   4023             case STBI__F_sub        : cur[k] = raw[k]; break;
   4024             case STBI__F_up         : cur[k] = STBI__BYTECAST(raw[k] + prior[k]); break;
   4025             case STBI__F_avg        : cur[k] = STBI__BYTECAST(raw[k] + (prior[k]>>1)); break;
   4026             case STBI__F_paeth      : cur[k] = STBI__BYTECAST(raw[k] + stbi__paeth(0,prior[k],0)); break;
   4027             case STBI__F_avg_first  : cur[k] = raw[k]; break;
   4028             case STBI__F_paeth_first: cur[k] = raw[k]; break;
   4029          }
   4030       }
   4031 
   4032       if (depth == 8) {
   4033          if (img_n != out_n)
   4034             cur[img_n] = 255; // first pixel
   4035          raw += img_n;
   4036          cur += out_n;
   4037          prior += out_n;
   4038       } else {
   4039          raw += 1;
   4040          cur += 1;
   4041          prior += 1;
   4042       }
   4043 
   4044       // this is a little gross, so that we don't switch per-pixel or per-component
   4045       if (depth < 8 || img_n == out_n) {
   4046          int nk = (width - 1)*img_n;
   4047          #define CASE(f) \
   4048              case f:     \
   4049                 for (k=0; k < nk; ++k)
   4050          switch (filter) {
   4051             // "none" filter turns into a memcpy here; make that explicit.
   4052             case STBI__F_none:         memcpy(cur, raw, nk); break;
   4053             CASE(STBI__F_sub)          cur[k] = STBI__BYTECAST(raw[k] + cur[k-filter_bytes]); break;
   4054             CASE(STBI__F_up)           cur[k] = STBI__BYTECAST(raw[k] + prior[k]); break;
   4055             CASE(STBI__F_avg)          cur[k] = STBI__BYTECAST(raw[k] + ((prior[k] + cur[k-filter_bytes])>>1)); break;
   4056             CASE(STBI__F_paeth)        cur[k] = STBI__BYTECAST(raw[k] + stbi__paeth(cur[k-filter_bytes],prior[k],prior[k-filter_bytes])); break;
   4057             CASE(STBI__F_avg_first)    cur[k] = STBI__BYTECAST(raw[k] + (cur[k-filter_bytes] >> 1)); break;
   4058             CASE(STBI__F_paeth_first)  cur[k] = STBI__BYTECAST(raw[k] + stbi__paeth(cur[k-filter_bytes],0,0)); break;
   4059          }
   4060          #undef CASE
   4061          raw += nk;
   4062       } else {
   4063          STBI_ASSERT(img_n+1 == out_n);
   4064          #define CASE(f) \
   4065              case f:     \
   4066                 for (i=x-1; i >= 1; --i, cur[img_n]=255,raw+=img_n,cur+=out_n,prior+=out_n) \
   4067                    for (k=0; k < img_n; ++k)
   4068          switch (filter) {
   4069             CASE(STBI__F_none)         cur[k] = raw[k]; break;
   4070             CASE(STBI__F_sub)          cur[k] = STBI__BYTECAST(raw[k] + cur[k-out_n]); break;
   4071             CASE(STBI__F_up)           cur[k] = STBI__BYTECAST(raw[k] + prior[k]); break;
   4072             CASE(STBI__F_avg)          cur[k] = STBI__BYTECAST(raw[k] + ((prior[k] + cur[k-out_n])>>1)); break;
   4073             CASE(STBI__F_paeth)        cur[k] = STBI__BYTECAST(raw[k] + stbi__paeth(cur[k-out_n],prior[k],prior[k-out_n])); break;
   4074             CASE(STBI__F_avg_first)    cur[k] = STBI__BYTECAST(raw[k] + (cur[k-out_n] >> 1)); break;
   4075             CASE(STBI__F_paeth_first)  cur[k] = STBI__BYTECAST(raw[k] + stbi__paeth(cur[k-out_n],0,0)); break;
   4076          }
   4077          #undef CASE
   4078       }
   4079    }
   4080 
   4081    // we make a separate pass to expand bits to pixels; for performance,
   4082    // this could run two scanlines behind the above code, so it won't
   4083    // intefere with filtering but will still be in the cache.
   4084    if (depth < 8) {
   4085       for (j=0; j < y; ++j) {
   4086          stbi_uc *cur = a->out + stride*j;
   4087          stbi_uc *in  = a->out + stride*j + x*out_n - img_width_bytes;
   4088          // unpack 1/2/4-bit into a 8-bit buffer. allows us to keep the common 8-bit path optimal at minimal cost for 1/2/4-bit
   4089          // png guarante byte alignment, if width is not multiple of 8/4/2 we'll decode dummy trailing data that will be skipped in the later loop
   4090          stbi_uc scale = (color == 0) ? stbi__depth_scale_table[depth] : 1; // scale grayscale values to 0..255 range
   4091 
   4092          // note that the final byte might overshoot and write more data than desired.
   4093          // we can allocate enough data that this never writes out of memory, but it
   4094          // could also overwrite the next scanline. can it overwrite non-empty data
   4095          // on the next scanline? yes, consider 1-pixel-wide scanlines with 1-bit-per-pixel.
   4096          // so we need to explicitly clamp the final ones
   4097 
   4098          if (depth == 4) {
   4099             for (k=x*img_n; k >= 2; k-=2, ++in) {
   4100                *cur++ = scale * ((*in >> 4)       );
   4101                *cur++ = scale * ((*in     ) & 0x0f);
   4102             }
   4103             if (k > 0) *cur++ = scale * ((*in >> 4)       );
   4104          } else if (depth == 2) {
   4105             for (k=x*img_n; k >= 4; k-=4, ++in) {
   4106                *cur++ = scale * ((*in >> 6)       );
   4107                *cur++ = scale * ((*in >> 4) & 0x03);
   4108                *cur++ = scale * ((*in >> 2) & 0x03);
   4109                *cur++ = scale * ((*in     ) & 0x03);
   4110             }
   4111             if (k > 0) *cur++ = scale * ((*in >> 6)       );
   4112             if (k > 1) *cur++ = scale * ((*in >> 4) & 0x03);
   4113             if (k > 2) *cur++ = scale * ((*in >> 2) & 0x03);
   4114          } else if (depth == 1) {
   4115             for (k=x*img_n; k >= 8; k-=8, ++in) {
   4116                *cur++ = scale * ((*in >> 7)       );
   4117                *cur++ = scale * ((*in >> 6) & 0x01);
   4118                *cur++ = scale * ((*in >> 5) & 0x01);
   4119                *cur++ = scale * ((*in >> 4) & 0x01);
   4120                *cur++ = scale * ((*in >> 3) & 0x01);
   4121                *cur++ = scale * ((*in >> 2) & 0x01);
   4122                *cur++ = scale * ((*in >> 1) & 0x01);
   4123                *cur++ = scale * ((*in     ) & 0x01);
   4124             }
   4125             if (k > 0) *cur++ = scale * ((*in >> 7)       );
   4126             if (k > 1) *cur++ = scale * ((*in >> 6) & 0x01);
   4127             if (k > 2) *cur++ = scale * ((*in >> 5) & 0x01);
   4128             if (k > 3) *cur++ = scale * ((*in >> 4) & 0x01);
   4129             if (k > 4) *cur++ = scale * ((*in >> 3) & 0x01);
   4130             if (k > 5) *cur++ = scale * ((*in >> 2) & 0x01);
   4131             if (k > 6) *cur++ = scale * ((*in >> 1) & 0x01);
   4132          }
   4133          if (img_n != out_n) {
   4134             int q;
   4135             // insert alpha = 255
   4136             cur = a->out + stride*j;
   4137             if (img_n == 1) {
   4138                for (q=x-1; q >= 0; --q) {
   4139                   cur[q*2+1] = 255;
   4140                   cur[q*2+0] = cur[q];
   4141                }
   4142             } else {
   4143                STBI_ASSERT(img_n == 3);
   4144                for (q=x-1; q >= 0; --q) {
   4145                   cur[q*4+3] = 255;
   4146                   cur[q*4+2] = cur[q*3+2];
   4147                   cur[q*4+1] = cur[q*3+1];
   4148                   cur[q*4+0] = cur[q*3+0];
   4149                }
   4150             }
   4151          }
   4152       }
   4153    }
   4154 
   4155    return 1;
   4156 }
   4157 
   4158 static int stbi__create_png_image(stbi__png *a, stbi_uc *image_data, stbi__uint32 image_data_len, int out_n, int depth, int color, int interlaced)
   4159 {
   4160    stbi_uc *final;
   4161    int p;
   4162    if (!interlaced)
   4163       return stbi__create_png_image_raw(a, image_data, image_data_len, out_n, a->s->img_x, a->s->img_y, depth, color);
   4164 
   4165    // de-interlacing
   4166    final = (stbi_uc *) stbi__malloc(a->s->img_x * a->s->img_y * out_n);
   4167    for (p=0; p < 7; ++p) {
   4168       int xorig[] = { 0,4,0,2,0,1,0 };
   4169       int yorig[] = { 0,0,4,0,2,0,1 };
   4170       int xspc[]  = { 8,8,4,4,2,2,1 };
   4171       int yspc[]  = { 8,8,8,4,4,2,2 };
   4172       int i,j,x,y;
   4173       // pass1_x[4] = 0, pass1_x[5] = 1, pass1_x[12] = 1
   4174       x = (a->s->img_x - xorig[p] + xspc[p]-1) / xspc[p];
   4175       y = (a->s->img_y - yorig[p] + yspc[p]-1) / yspc[p];
   4176       if (x && y) {
   4177          stbi__uint32 img_len = ((((a->s->img_n * x * depth) + 7) >> 3) + 1) * y;
   4178          if (!stbi__create_png_image_raw(a, image_data, image_data_len, out_n, x, y, depth, color)) {
   4179             STBI_FREE(final);
   4180             return 0;
   4181          }
   4182          for (j=0; j < y; ++j) {
   4183             for (i=0; i < x; ++i) {
   4184                int out_y = j*yspc[p]+yorig[p];
   4185                int out_x = i*xspc[p]+xorig[p];
   4186                memcpy(final + out_y*a->s->img_x*out_n + out_x*out_n,
   4187                       a->out + (j*x+i)*out_n, out_n);
   4188             }
   4189          }
   4190          STBI_FREE(a->out);
   4191          image_data += img_len;
   4192          image_data_len -= img_len;
   4193       }
   4194    }
   4195    a->out = final;
   4196 
   4197    return 1;
   4198 }
   4199 
   4200 static int stbi__compute_transparency(stbi__png *z, stbi_uc tc[3], int out_n)
   4201 {
   4202    stbi__context *s = z->s;
   4203    stbi__uint32 i, pixel_count = s->img_x * s->img_y;
   4204    stbi_uc *p = z->out;
   4205 
   4206    // compute color-based transparency, assuming we've
   4207    // already got 255 as the alpha value in the output
   4208    STBI_ASSERT(out_n == 2 || out_n == 4);
   4209 
   4210    if (out_n == 2) {
   4211       for (i=0; i < pixel_count; ++i) {
   4212          p[1] = (p[0] == tc[0] ? 0 : 255);
   4213          p += 2;
   4214       }
   4215    } else {
   4216       for (i=0; i < pixel_count; ++i) {
   4217          if (p[0] == tc[0] && p[1] == tc[1] && p[2] == tc[2])
   4218             p[3] = 0;
   4219          p += 4;
   4220       }
   4221    }
   4222    return 1;
   4223 }
   4224 
   4225 static int stbi__expand_png_palette(stbi__png *a, stbi_uc *palette, int len, int pal_img_n)
   4226 {
   4227    stbi__uint32 i, pixel_count = a->s->img_x * a->s->img_y;
   4228    stbi_uc *p, *temp_out, *orig = a->out;
   4229 
   4230    p = (stbi_uc *) stbi__malloc(pixel_count * pal_img_n);
   4231    if (p == NULL) return stbi__err("outofmem", "Out of memory");
   4232 
   4233    // between here and free(out) below, exitting would leak
   4234    temp_out = p;
   4235 
   4236    if (pal_img_n == 3) {
   4237       for (i=0; i < pixel_count; ++i) {
   4238          int n = orig[i]*4;
   4239          p[0] = palette[n  ];
   4240          p[1] = palette[n+1];
   4241          p[2] = palette[n+2];
   4242          p += 3;
   4243       }
   4244    } else {
   4245       for (i=0; i < pixel_count; ++i) {
   4246          int n = orig[i]*4;
   4247          p[0] = palette[n  ];
   4248          p[1] = palette[n+1];
   4249          p[2] = palette[n+2];
   4250          p[3] = palette[n+3];
   4251          p += 4;
   4252       }
   4253    }
   4254    STBI_FREE(a->out);
   4255    a->out = temp_out;
   4256 
   4257    STBI_NOTUSED(len);
   4258 
   4259    return 1;
   4260 }
   4261 
   4262 static int stbi__unpremultiply_on_load = 0;
   4263 static int stbi__de_iphone_flag = 0;
   4264 
   4265 STBIDEF void stbi_set_unpremultiply_on_load(int flag_true_if_should_unpremultiply)
   4266 {
   4267    stbi__unpremultiply_on_load = flag_true_if_should_unpremultiply;
   4268 }
   4269 
   4270 STBIDEF void stbi_convert_iphone_png_to_rgb(int flag_true_if_should_convert)
   4271 {
   4272    stbi__de_iphone_flag = flag_true_if_should_convert;
   4273 }
   4274 
   4275 static void stbi__de_iphone(stbi__png *z)
   4276 {
   4277    stbi__context *s = z->s;
   4278    stbi__uint32 i, pixel_count = s->img_x * s->img_y;
   4279    stbi_uc *p = z->out;
   4280 
   4281    if (s->img_out_n == 3) {  // convert bgr to rgb
   4282       for (i=0; i < pixel_count; ++i) {
   4283          stbi_uc t = p[0];
   4284          p[0] = p[2];
   4285          p[2] = t;
   4286          p += 3;
   4287       }
   4288    } else {
   4289       STBI_ASSERT(s->img_out_n == 4);
   4290       if (stbi__unpremultiply_on_load) {
   4291          // convert bgr to rgb and unpremultiply
   4292          for (i=0; i < pixel_count; ++i) {
   4293             stbi_uc a = p[3];
   4294             stbi_uc t = p[0];
   4295             if (a) {
   4296                p[0] = p[2] * 255 / a;
   4297                p[1] = p[1] * 255 / a;
   4298                p[2] =  t   * 255 / a;
   4299             } else {
   4300                p[0] = p[2];
   4301                p[2] = t;
   4302             }
   4303             p += 4;
   4304          }
   4305       } else {
   4306          // convert bgr to rgb
   4307          for (i=0; i < pixel_count; ++i) {
   4308             stbi_uc t = p[0];
   4309             p[0] = p[2];
   4310             p[2] = t;
   4311             p += 4;
   4312          }
   4313       }
   4314    }
   4315 }
   4316 
   4317 #define STBI__PNG_TYPE(a,b,c,d)  (((a) << 24) + ((b) << 16) + ((c) << 8) + (d))
   4318 
   4319 static int stbi__parse_png_file(stbi__png *z, int scan, int req_comp)
   4320 {
   4321    stbi_uc palette[1024], pal_img_n=0;
   4322    stbi_uc has_trans=0, tc[3];
   4323    stbi__uint32 ioff=0, idata_limit=0, i, pal_len=0;
   4324    int first=1,k,interlace=0, color=0, depth=0, is_iphone=0;
   4325    stbi__context *s = z->s;
   4326 
   4327    z->expanded = NULL;
   4328    z->idata = NULL;
   4329    z->out = NULL;
   4330 
   4331    if (!stbi__check_png_header(s)) return 0;
   4332 
   4333    if (scan == STBI__SCAN_type) return 1;
   4334 
   4335    for (;;) {
   4336       stbi__pngchunk c = stbi__get_chunk_header(s);
   4337       switch (c.type) {
   4338          case STBI__PNG_TYPE('C','g','B','I'):
   4339             is_iphone = 1;
   4340             stbi__skip(s, c.length);
   4341             break;
   4342          case STBI__PNG_TYPE('I','H','D','R'): {
   4343             int comp,filter;
   4344             if (!first) return stbi__err("multiple IHDR","Corrupt PNG");
   4345             first = 0;
   4346             if (c.length != 13) return stbi__err("bad IHDR len","Corrupt PNG");
   4347             s->img_x = stbi__get32be(s); if (s->img_x > (1 << 24)) return stbi__err("too large","Very large image (corrupt?)");
   4348             s->img_y = stbi__get32be(s); if (s->img_y > (1 << 24)) return stbi__err("too large","Very large image (corrupt?)");
   4349             depth = stbi__get8(s);  if (depth != 1 && depth != 2 && depth != 4 && depth != 8)  return stbi__err("1/2/4/8-bit only","PNG not supported: 1/2/4/8-bit only");
   4350             color = stbi__get8(s);  if (color > 6)         return stbi__err("bad ctype","Corrupt PNG");
   4351             if (color == 3) pal_img_n = 3; else if (color & 1) return stbi__err("bad ctype","Corrupt PNG");
   4352             comp  = stbi__get8(s);  if (comp) return stbi__err("bad comp method","Corrupt PNG");
   4353             filter= stbi__get8(s);  if (filter) return stbi__err("bad filter method","Corrupt PNG");
   4354             interlace = stbi__get8(s); if (interlace>1) return stbi__err("bad interlace method","Corrupt PNG");
   4355             if (!s->img_x || !s->img_y) return stbi__err("0-pixel image","Corrupt PNG");
   4356             if (!pal_img_n) {
   4357                s->img_n = (color & 2 ? 3 : 1) + (color & 4 ? 1 : 0);
   4358                if ((1 << 30) / s->img_x / s->img_n < s->img_y) return stbi__err("too large", "Image too large to decode");
   4359                if (scan == STBI__SCAN_header) return 1;
   4360             } else {
   4361                // if paletted, then pal_n is our final components, and
   4362                // img_n is # components to decompress/filter.
   4363                s->img_n = 1;
   4364                if ((1 << 30) / s->img_x / 4 < s->img_y) return stbi__err("too large","Corrupt PNG");
   4365                // if SCAN_header, have to scan to see if we have a tRNS
   4366             }
   4367             break;
   4368          }
   4369 
   4370          case STBI__PNG_TYPE('P','L','T','E'):  {
   4371             if (first) return stbi__err("first not IHDR", "Corrupt PNG");
   4372             if (c.length > 256*3) return stbi__err("invalid PLTE","Corrupt PNG");
   4373             pal_len = c.length / 3;
   4374             if (pal_len * 3 != c.length) return stbi__err("invalid PLTE","Corrupt PNG");
   4375             for (i=0; i < pal_len; ++i) {
   4376                palette[i*4+0] = stbi__get8(s);
   4377                palette[i*4+1] = stbi__get8(s);
   4378                palette[i*4+2] = stbi__get8(s);
   4379                palette[i*4+3] = 255;
   4380             }
   4381             break;
   4382          }
   4383 
   4384          case STBI__PNG_TYPE('t','R','N','S'): {
   4385             if (first) return stbi__err("first not IHDR", "Corrupt PNG");
   4386             if (z->idata) return stbi__err("tRNS after IDAT","Corrupt PNG");
   4387             if (pal_img_n) {
   4388                if (scan == STBI__SCAN_header) { s->img_n = 4; return 1; }
   4389                if (pal_len == 0) return stbi__err("tRNS before PLTE","Corrupt PNG");
   4390                if (c.length > pal_len) return stbi__err("bad tRNS len","Corrupt PNG");
   4391                pal_img_n = 4;
   4392                for (i=0; i < c.length; ++i)
   4393                   palette[i*4+3] = stbi__get8(s);
   4394             } else {
   4395                if (!(s->img_n & 1)) return stbi__err("tRNS with alpha","Corrupt PNG");
   4396                if (c.length != (stbi__uint32) s->img_n*2) return stbi__err("bad tRNS len","Corrupt PNG");
   4397                has_trans = 1;
   4398                for (k=0; k < s->img_n; ++k)
   4399                   tc[k] = (stbi_uc) (stbi__get16be(s) & 255) * stbi__depth_scale_table[depth]; // non 8-bit images will be larger
   4400             }
   4401             break;
   4402          }
   4403 
   4404          case STBI__PNG_TYPE('I','D','A','T'): {
   4405             if (first) return stbi__err("first not IHDR", "Corrupt PNG");
   4406             if (pal_img_n && !pal_len) return stbi__err("no PLTE","Corrupt PNG");
   4407             if (scan == STBI__SCAN_header) { s->img_n = pal_img_n; return 1; }
   4408             if ((int)(ioff + c.length) < (int)ioff) return 0;
   4409             if (ioff + c.length > idata_limit) {
   4410                stbi__uint32 idata_limit_old = idata_limit;
   4411                stbi_uc *p;
   4412                if (idata_limit == 0) idata_limit = c.length > 4096 ? c.length : 4096;
   4413                while (ioff + c.length > idata_limit)
   4414                   idata_limit *= 2;
   4415                STBI_NOTUSED(idata_limit_old);
   4416                p = (stbi_uc *) STBI_REALLOC_SIZED(z->idata, idata_limit_old, idata_limit); if (p == NULL) return stbi__err("outofmem", "Out of memory");
   4417                z->idata = p;
   4418             }
   4419             if (!stbi__getn(s, z->idata+ioff,c.length)) return stbi__err("outofdata","Corrupt PNG");
   4420             ioff += c.length;
   4421             break;
   4422          }
   4423 
   4424          case STBI__PNG_TYPE('I','E','N','D'): {
   4425             stbi__uint32 raw_len, bpl;
   4426             if (first) return stbi__err("first not IHDR", "Corrupt PNG");
   4427             if (scan != STBI__SCAN_load) return 1;
   4428             if (z->idata == NULL) return stbi__err("no IDAT","Corrupt PNG");
   4429             // initial guess for decoded data size to avoid unnecessary reallocs
   4430             bpl = (s->img_x * depth + 7) / 8; // bytes per line, per component
   4431             raw_len = bpl * s->img_y * s->img_n /* pixels */ + s->img_y /* filter mode per row */;
   4432             z->expanded = (stbi_uc *) stbi_zlib_decode_malloc_guesssize_headerflag((char *) z->idata, ioff, raw_len, (int *) &raw_len, !is_iphone);
   4433             if (z->expanded == NULL) return 0; // zlib should set error
   4434             STBI_FREE(z->idata); z->idata = NULL;
   4435             if ((req_comp == s->img_n+1 && req_comp != 3 && !pal_img_n) || has_trans)
   4436                s->img_out_n = s->img_n+1;
   4437             else
   4438                s->img_out_n = s->img_n;
   4439             if (!stbi__create_png_image(z, z->expanded, raw_len, s->img_out_n, depth, color, interlace)) return 0;
   4440             if (has_trans)
   4441                if (!stbi__compute_transparency(z, tc, s->img_out_n)) return 0;
   4442             if (is_iphone && stbi__de_iphone_flag && s->img_out_n > 2)
   4443                stbi__de_iphone(z);
   4444             if (pal_img_n) {
   4445                // pal_img_n == 3 or 4
   4446                s->img_n = pal_img_n; // record the actual colors we had
   4447                s->img_out_n = pal_img_n;
   4448                if (req_comp >= 3) s->img_out_n = req_comp;
   4449                if (!stbi__expand_png_palette(z, palette, pal_len, s->img_out_n))
   4450                   return 0;
   4451             }
   4452             STBI_FREE(z->expanded); z->expanded = NULL;
   4453             return 1;
   4454          }
   4455 
   4456          default:
   4457             // if critical, fail
   4458             if (first) return stbi__err("first not IHDR", "Corrupt PNG");
   4459             if ((c.type & (1 << 29)) == 0) {
   4460                #ifndef STBI_NO_FAILURE_STRINGS
   4461                // not threadsafe
   4462                static char invalid_chunk[] = "XXXX PNG chunk not known";
   4463                invalid_chunk[0] = STBI__BYTECAST(c.type >> 24);
   4464                invalid_chunk[1] = STBI__BYTECAST(c.type >> 16);
   4465                invalid_chunk[2] = STBI__BYTECAST(c.type >>  8);
   4466                invalid_chunk[3] = STBI__BYTECAST(c.type >>  0);
   4467                #endif
   4468                return stbi__err(invalid_chunk, "PNG not supported: unknown PNG chunk type");
   4469             }
   4470             stbi__skip(s, c.length);
   4471             break;
   4472       }
   4473       // end of PNG chunk, read and skip CRC
   4474       stbi__get32be(s);
   4475    }
   4476 }
   4477 
   4478 static unsigned char *stbi__do_png(stbi__png *p, int *x, int *y, int *n, int req_comp)
   4479 {
   4480    unsigned char *result=NULL;
   4481    if (req_comp < 0 || req_comp > 4) return stbi__errpuc("bad req_comp", "Internal error");
   4482    if (stbi__parse_png_file(p, STBI__SCAN_load, req_comp)) {
   4483       result = p->out;
   4484       p->out = NULL;
   4485       if (req_comp && req_comp != p->s->img_out_n) {
   4486          result = stbi__convert_format(result, p->s->img_out_n, req_comp, p->s->img_x, p->s->img_y);
   4487          p->s->img_out_n = req_comp;
   4488          if (result == NULL) return result;
   4489       }
   4490       *x = p->s->img_x;
   4491       *y = p->s->img_y;
   4492       if (n) *n = p->s->img_out_n;
   4493    }
   4494    STBI_FREE(p->out);      p->out      = NULL;
   4495    STBI_FREE(p->expanded); p->expanded = NULL;
   4496    STBI_FREE(p->idata);    p->idata    = NULL;
   4497 
   4498    return result;
   4499 }
   4500 
   4501 static unsigned char *stbi__png_load(stbi__context *s, int *x, int *y, int *comp, int req_comp)
   4502 {
   4503    stbi__png p;
   4504    p.s = s;
   4505    return stbi__do_png(&p, x,y,comp,req_comp);
   4506 }
   4507 
   4508 static int stbi__png_test(stbi__context *s)
   4509 {
   4510    int r;
   4511    r = stbi__check_png_header(s);
   4512    stbi__rewind(s);
   4513    return r;
   4514 }
   4515 
   4516 static int stbi__png_info_raw(stbi__png *p, int *x, int *y, int *comp)
   4517 {
   4518    if (!stbi__parse_png_file(p, STBI__SCAN_header, 0)) {
   4519       stbi__rewind( p->s );
   4520       return 0;
   4521    }
   4522    if (x) *x = p->s->img_x;
   4523    if (y) *y = p->s->img_y;
   4524    if (comp) *comp = p->s->img_n;
   4525    return 1;
   4526 }
   4527 
   4528 static int stbi__png_info(stbi__context *s, int *x, int *y, int *comp)
   4529 {
   4530    stbi__png p;
   4531    p.s = s;
   4532    return stbi__png_info_raw(&p, x, y, comp);
   4533 }
   4534 #endif
   4535 
   4536 // Microsoft/Windows BMP image
   4537 
   4538 #ifndef STBI_NO_BMP
   4539 static int stbi__bmp_test_raw(stbi__context *s)
   4540 {
   4541    int r;
   4542    int sz;
   4543    if (stbi__get8(s) != 'B') return 0;
   4544    if (stbi__get8(s) != 'M') return 0;
   4545    stbi__get32le(s); // discard filesize
   4546    stbi__get16le(s); // discard reserved
   4547    stbi__get16le(s); // discard reserved
   4548    stbi__get32le(s); // discard data offset
   4549    sz = stbi__get32le(s);
   4550    r = (sz == 12 || sz == 40 || sz == 56 || sz == 108 || sz == 124);
   4551    return r;
   4552 }
   4553 
   4554 static int stbi__bmp_test(stbi__context *s)
   4555 {
   4556    int r = stbi__bmp_test_raw(s);
   4557    stbi__rewind(s);
   4558    return r;
   4559 }
   4560 
   4561 
   4562 // returns 0..31 for the highest set bit
   4563 static int stbi__high_bit(unsigned int z)
   4564 {
   4565    int n=0;
   4566    if (z == 0) return -1;
   4567    if (z >= 0x10000) n += 16, z >>= 16;
   4568    if (z >= 0x00100) n +=  8, z >>=  8;
   4569    if (z >= 0x00010) n +=  4, z >>=  4;
   4570    if (z >= 0x00004) n +=  2, z >>=  2;
   4571    if (z >= 0x00002) n +=  1, z >>=  1;
   4572    return n;
   4573 }
   4574 
   4575 static int stbi__bitcount(unsigned int a)
   4576 {
   4577    a = (a & 0x55555555) + ((a >>  1) & 0x55555555); // max 2
   4578    a = (a & 0x33333333) + ((a >>  2) & 0x33333333); // max 4
   4579    a = (a + (a >> 4)) & 0x0f0f0f0f; // max 8 per 4, now 8 bits
   4580    a = (a + (a >> 8)); // max 16 per 8 bits
   4581    a = (a + (a >> 16)); // max 32 per 8 bits
   4582    return a & 0xff;
   4583 }
   4584 
   4585 static int stbi__shiftsigned(int v, int shift, int bits)
   4586 {
   4587    int result;
   4588    int z=0;
   4589 
   4590    if (shift < 0) v <<= -shift;
   4591    else v >>= shift;
   4592    result = v;
   4593 
   4594    z = bits;
   4595    while (z < 8) {
   4596       result += v >> z;
   4597       z += bits;
   4598    }
   4599    return result;
   4600 }
   4601 
   4602 typedef struct
   4603 {
   4604    int bpp, offset, hsz;
   4605    unsigned int mr,mg,mb,ma, all_a;
   4606 } stbi__bmp_data;
   4607 
   4608 static void *stbi__bmp_parse_header(stbi__context *s, stbi__bmp_data *info)
   4609 {
   4610    int hsz;
   4611    if (stbi__get8(s) != 'B' || stbi__get8(s) != 'M') return stbi__errpuc("not BMP", "Corrupt BMP");
   4612    stbi__get32le(s); // discard filesize
   4613    stbi__get16le(s); // discard reserved
   4614    stbi__get16le(s); // discard reserved
   4615    info->offset = stbi__get32le(s);
   4616    info->hsz = hsz = stbi__get32le(s);
   4617    
   4618    if (hsz != 12 && hsz != 40 && hsz != 56 && hsz != 108 && hsz != 124) return stbi__errpuc("unknown BMP", "BMP type not supported: unknown");
   4619    if (hsz == 12) {
   4620       s->img_x = stbi__get16le(s);
   4621       s->img_y = stbi__get16le(s);
   4622    } else {
   4623       s->img_x = stbi__get32le(s);
   4624       s->img_y = stbi__get32le(s);
   4625    }
   4626    if (stbi__get16le(s) != 1) return stbi__errpuc("bad BMP", "bad BMP");
   4627    info->bpp = stbi__get16le(s);
   4628    if (info->bpp == 1) return stbi__errpuc("monochrome", "BMP type not supported: 1-bit");
   4629    if (hsz != 12) {
   4630       int compress = stbi__get32le(s);
   4631       if (compress == 1 || compress == 2) return stbi__errpuc("BMP RLE", "BMP type not supported: RLE");
   4632       stbi__get32le(s); // discard sizeof
   4633       stbi__get32le(s); // discard hres
   4634       stbi__get32le(s); // discard vres
   4635       stbi__get32le(s); // discard colorsused
   4636       stbi__get32le(s); // discard max important
   4637       if (hsz == 40 || hsz == 56) {
   4638          if (hsz == 56) {
   4639             stbi__get32le(s);
   4640             stbi__get32le(s);
   4641             stbi__get32le(s);
   4642             stbi__get32le(s);
   4643          }
   4644          if (info->bpp == 16 || info->bpp == 32) {
   4645             info->mr = info->mg = info->mb = 0;
   4646             if (compress == 0) {
   4647                if (info->bpp == 32) {
   4648                   info->mr = 0xffu << 16;
   4649                   info->mg = 0xffu <<  8;
   4650                   info->mb = 0xffu <<  0;
   4651                   info->ma = 0xffu << 24;
   4652                   info->all_a = 0; // if all_a is 0 at end, then we loaded alpha channel but it was all 0
   4653                } else {
   4654                   info->mr = 31u << 10;
   4655                   info->mg = 31u <<  5;
   4656                   info->mb = 31u <<  0;
   4657                }
   4658             } else if (compress == 3) {
   4659                info->mr = stbi__get32le(s);
   4660                info->mg = stbi__get32le(s);
   4661                info->mb = stbi__get32le(s);
   4662                // not documented, but generated by photoshop and handled by mspaint
   4663                if (info->mr == info->mg && info->mg == info->mb) {
   4664                   // ?!?!?
   4665                   return stbi__errpuc("bad BMP", "bad BMP");
   4666                }
   4667             } else
   4668                return stbi__errpuc("bad BMP", "bad BMP");
   4669          }
   4670       } else {
   4671          int i;
   4672          if (hsz != 108 && hsz != 124)
   4673             return stbi__errpuc("bad BMP", "bad BMP");
   4674          info->mr = stbi__get32le(s);
   4675          info->mg = stbi__get32le(s);
   4676          info->mb = stbi__get32le(s);
   4677          info->ma = stbi__get32le(s);
   4678          stbi__get32le(s); // discard color space
   4679          for (i=0; i < 12; ++i)
   4680             stbi__get32le(s); // discard color space parameters
   4681          if (hsz == 124) {
   4682             stbi__get32le(s); // discard rendering intent
   4683             stbi__get32le(s); // discard offset of profile data
   4684             stbi__get32le(s); // discard size of profile data
   4685             stbi__get32le(s); // discard reserved
   4686          }
   4687       }
   4688    }
   4689    return (void *) 1;
   4690 }
   4691 
   4692 
   4693 static stbi_uc *stbi__bmp_load(stbi__context *s, int *x, int *y, int *comp, int req_comp)
   4694 {
   4695    stbi_uc *out;
   4696    unsigned int mr=0,mg=0,mb=0,ma=0, all_a;
   4697    stbi_uc pal[256][4];
   4698    int psize=0,i,j,width;
   4699    int flip_vertically, pad, target;
   4700    stbi__bmp_data info;
   4701 
   4702    info.all_a = 255;   
   4703    if (stbi__bmp_parse_header(s, &info) == NULL)
   4704       return NULL; // error code already set
   4705 
   4706    flip_vertically = ((int) s->img_y) > 0;
   4707    s->img_y = abs((int) s->img_y);
   4708 
   4709    mr = info.mr;
   4710    mg = info.mg;
   4711    mb = info.mb;
   4712    ma = info.ma;
   4713    all_a = info.all_a;
   4714 
   4715    if (info.hsz == 12) {
   4716       if (info.bpp < 24)
   4717          psize = (info.offset - 14 - 24) / 3;
   4718    } else {
   4719       if (info.bpp < 16)
   4720          psize = (info.offset - 14 - info.hsz) >> 2;
   4721    }
   4722 
   4723    s->img_n = ma ? 4 : 3;
   4724    if (req_comp && req_comp >= 3) // we can directly decode 3 or 4
   4725       target = req_comp;
   4726    else
   4727       target = s->img_n; // if they want monochrome, we'll post-convert
   4728 
   4729    out = (stbi_uc *) stbi__malloc(target * s->img_x * s->img_y);
   4730    if (!out) return stbi__errpuc("outofmem", "Out of memory");
   4731    if (info.bpp < 16) {
   4732       int z=0;
   4733       if (psize == 0 || psize > 256) { STBI_FREE(out); return stbi__errpuc("invalid", "Corrupt BMP"); }
   4734       for (i=0; i < psize; ++i) {
   4735          pal[i][2] = stbi__get8(s);
   4736          pal[i][1] = stbi__get8(s);
   4737          pal[i][0] = stbi__get8(s);
   4738          if (info.hsz != 12) stbi__get8(s);
   4739          pal[i][3] = 255;
   4740       }
   4741       stbi__skip(s, info.offset - 14 - info.hsz - psize * (info.hsz == 12 ? 3 : 4));
   4742       if (info.bpp == 4) width = (s->img_x + 1) >> 1;
   4743       else if (info.bpp == 8) width = s->img_x;
   4744       else { STBI_FREE(out); return stbi__errpuc("bad bpp", "Corrupt BMP"); }
   4745       pad = (-width)&3;
   4746       for (j=0; j < (int) s->img_y; ++j) {
   4747          for (i=0; i < (int) s->img_x; i += 2) {
   4748             int v=stbi__get8(s),v2=0;
   4749             if (info.bpp == 4) {
   4750                v2 = v & 15;
   4751                v >>= 4;
   4752             }
   4753             out[z++] = pal[v][0];
   4754             out[z++] = pal[v][1];
   4755             out[z++] = pal[v][2];
   4756             if (target == 4) out[z++] = 255;
   4757             if (i+1 == (int) s->img_x) break;
   4758             v = (info.bpp == 8) ? stbi__get8(s) : v2;
   4759             out[z++] = pal[v][0];
   4760             out[z++] = pal[v][1];
   4761             out[z++] = pal[v][2];
   4762             if (target == 4) out[z++] = 255;
   4763          }
   4764          stbi__skip(s, pad);
   4765       }
   4766    } else {
   4767       int rshift=0,gshift=0,bshift=0,ashift=0,rcount=0,gcount=0,bcount=0,acount=0;
   4768       int z = 0;
   4769       int easy=0;
   4770       stbi__skip(s, info.offset - 14 - info.hsz);
   4771       if (info.bpp == 24) width = 3 * s->img_x;
   4772       else if (info.bpp == 16) width = 2*s->img_x;
   4773       else /* bpp = 32 and pad = 0 */ width=0;
   4774       pad = (-width) & 3;
   4775       if (info.bpp == 24) {
   4776          easy = 1;
   4777       } else if (info.bpp == 32) {
   4778          if (mb == 0xff && mg == 0xff00 && mr == 0x00ff0000 && ma == 0xff000000)
   4779             easy = 2;
   4780       }
   4781       if (!easy) {
   4782          if (!mr || !mg || !mb) { STBI_FREE(out); return stbi__errpuc("bad masks", "Corrupt BMP"); }
   4783          // right shift amt to put high bit in position #7
   4784          rshift = stbi__high_bit(mr)-7; rcount = stbi__bitcount(mr);
   4785          gshift = stbi__high_bit(mg)-7; gcount = stbi__bitcount(mg);
   4786          bshift = stbi__high_bit(mb)-7; bcount = stbi__bitcount(mb);
   4787          ashift = stbi__high_bit(ma)-7; acount = stbi__bitcount(ma);
   4788       }
   4789       for (j=0; j < (int) s->img_y; ++j) {
   4790          if (easy) {
   4791             for (i=0; i < (int) s->img_x; ++i) {
   4792                unsigned char a;
   4793                out[z+2] = stbi__get8(s);
   4794                out[z+1] = stbi__get8(s);
   4795                out[z+0] = stbi__get8(s);
   4796                z += 3;
   4797                a = (easy == 2 ? stbi__get8(s) : 255);
   4798                all_a |= a;
   4799                if (target == 4) out[z++] = a;
   4800             }
   4801          } else {
   4802             int bpp = info.bpp;
   4803             for (i=0; i < (int) s->img_x; ++i) {
   4804                stbi__uint32 v = (bpp == 16 ? (stbi__uint32) stbi__get16le(s) : stbi__get32le(s));
   4805                int a;
   4806                out[z++] = STBI__BYTECAST(stbi__shiftsigned(v & mr, rshift, rcount));
   4807                out[z++] = STBI__BYTECAST(stbi__shiftsigned(v & mg, gshift, gcount));
   4808                out[z++] = STBI__BYTECAST(stbi__shiftsigned(v & mb, bshift, bcount));
   4809                a = (ma ? stbi__shiftsigned(v & ma, ashift, acount) : 255);
   4810                all_a |= a;
   4811                if (target == 4) out[z++] = STBI__BYTECAST(a);
   4812             }
   4813          }
   4814          stbi__skip(s, pad);
   4815       }
   4816    }
   4817    
   4818    // if alpha channel is all 0s, replace with all 255s
   4819    if (target == 4 && all_a == 0)
   4820       for (i=4*s->img_x*s->img_y-1; i >= 0; i -= 4)
   4821          out[i] = 255;
   4822 
   4823    if (flip_vertically) {
   4824       stbi_uc t;
   4825       for (j=0; j < (int) s->img_y>>1; ++j) {
   4826          stbi_uc *p1 = out +      j     *s->img_x*target;
   4827          stbi_uc *p2 = out + (s->img_y-1-j)*s->img_x*target;
   4828          for (i=0; i < (int) s->img_x*target; ++i) {
   4829             t = p1[i], p1[i] = p2[i], p2[i] = t;
   4830          }
   4831       }
   4832    }
   4833 
   4834    if (req_comp && req_comp != target) {
   4835       out = stbi__convert_format(out, target, req_comp, s->img_x, s->img_y);
   4836       if (out == NULL) return out; // stbi__convert_format frees input on failure
   4837    }
   4838 
   4839    *x = s->img_x;
   4840    *y = s->img_y;
   4841    if (comp) *comp = s->img_n;
   4842    return out;
   4843 }
   4844 #endif
   4845 
   4846 // Targa Truevision - TGA
   4847 // by Jonathan Dummer
   4848 #ifndef STBI_NO_TGA
   4849 // returns STBI_rgb or whatever, 0 on error
   4850 static int stbi__tga_get_comp(int bits_per_pixel, int is_grey, int* is_rgb16)
   4851 {
   4852    // only RGB or RGBA (incl. 16bit) or grey allowed
   4853    if(is_rgb16) *is_rgb16 = 0;
   4854    switch(bits_per_pixel) {
   4855       case 8:  return STBI_grey;
   4856       case 16: if(is_grey) return STBI_grey_alpha;
   4857             // else: fall-through
   4858       case 15: if(is_rgb16) *is_rgb16 = 1;
   4859             return STBI_rgb;
   4860       case 24: // fall-through
   4861       case 32: return bits_per_pixel/8;
   4862       default: return 0;
   4863    }
   4864 }
   4865 
   4866 static int stbi__tga_info(stbi__context *s, int *x, int *y, int *comp)
   4867 {
   4868     int tga_w, tga_h, tga_comp, tga_image_type, tga_bits_per_pixel, tga_colormap_bpp;
   4869     int sz, tga_colormap_type;
   4870     stbi__get8(s);                   // discard Offset
   4871     tga_colormap_type = stbi__get8(s); // colormap type
   4872     if( tga_colormap_type > 1 ) {
   4873         stbi__rewind(s);
   4874         return 0;      // only RGB or indexed allowed
   4875     }
   4876     tga_image_type = stbi__get8(s); // image type
   4877     if ( tga_colormap_type == 1 ) { // colormapped (paletted) image
   4878         if (tga_image_type != 1 && tga_image_type != 9) {
   4879             stbi__rewind(s);
   4880             return 0;
   4881         }
   4882         stbi__skip(s,4);       // skip index of first colormap entry and number of entries
   4883         sz = stbi__get8(s);    //   check bits per palette color entry
   4884         if ( (sz != 8) && (sz != 15) && (sz != 16) && (sz != 24) && (sz != 32) ) {
   4885             stbi__rewind(s);
   4886             return 0;
   4887         }
   4888         stbi__skip(s,4);       // skip image x and y origin
   4889         tga_colormap_bpp = sz;
   4890     } else { // "normal" image w/o colormap - only RGB or grey allowed, +/- RLE
   4891         if ( (tga_image_type != 2) && (tga_image_type != 3) && (tga_image_type != 10) && (tga_image_type != 11) ) {
   4892             stbi__rewind(s);
   4893             return 0; // only RGB or grey allowed, +/- RLE
   4894         }
   4895         stbi__skip(s,9); // skip colormap specification and image x/y origin
   4896         tga_colormap_bpp = 0;
   4897     }
   4898     tga_w = stbi__get16le(s);
   4899     if( tga_w < 1 ) {
   4900         stbi__rewind(s);
   4901         return 0;   // test width
   4902     }
   4903     tga_h = stbi__get16le(s);
   4904     if( tga_h < 1 ) {
   4905         stbi__rewind(s);
   4906         return 0;   // test height
   4907     }
   4908     tga_bits_per_pixel = stbi__get8(s); // bits per pixel
   4909     stbi__get8(s); // ignore alpha bits
   4910     if (tga_colormap_bpp != 0) {
   4911         if((tga_bits_per_pixel != 8) && (tga_bits_per_pixel != 16)) {
   4912             // when using a colormap, tga_bits_per_pixel is the size of the indexes
   4913             // I don't think anything but 8 or 16bit indexes makes sense
   4914             stbi__rewind(s);
   4915             return 0;
   4916         }
   4917         tga_comp = stbi__tga_get_comp(tga_colormap_bpp, 0, NULL);
   4918     } else {
   4919         tga_comp = stbi__tga_get_comp(tga_bits_per_pixel, (tga_image_type == 3) || (tga_image_type == 11), NULL);
   4920     }
   4921     if(!tga_comp) {
   4922       stbi__rewind(s);
   4923       return 0;
   4924     }
   4925     if (x) *x = tga_w;
   4926     if (y) *y = tga_h;
   4927     if (comp) *comp = tga_comp;
   4928     return 1;                   // seems to have passed everything
   4929 }
   4930 
   4931 static int stbi__tga_test(stbi__context *s)
   4932 {
   4933    int res = 0;
   4934    int sz, tga_color_type;
   4935    stbi__get8(s);      //   discard Offset
   4936    tga_color_type = stbi__get8(s);   //   color type
   4937    if ( tga_color_type > 1 ) goto errorEnd;   //   only RGB or indexed allowed
   4938    sz = stbi__get8(s);   //   image type
   4939    if ( tga_color_type == 1 ) { // colormapped (paletted) image
   4940       if (sz != 1 && sz != 9) goto errorEnd; // colortype 1 demands image type 1 or 9
   4941       stbi__skip(s,4);       // skip index of first colormap entry and number of entries
   4942       sz = stbi__get8(s);    //   check bits per palette color entry
   4943       if ( (sz != 8) && (sz != 15) && (sz != 16) && (sz != 24) && (sz != 32) ) goto errorEnd;
   4944       stbi__skip(s,4);       // skip image x and y origin
   4945    } else { // "normal" image w/o colormap
   4946       if ( (sz != 2) && (sz != 3) && (sz != 10) && (sz != 11) ) goto errorEnd; // only RGB or grey allowed, +/- RLE
   4947       stbi__skip(s,9); // skip colormap specification and image x/y origin
   4948    }
   4949    if ( stbi__get16le(s) < 1 ) goto errorEnd;      //   test width
   4950    if ( stbi__get16le(s) < 1 ) goto errorEnd;      //   test height
   4951    sz = stbi__get8(s);   //   bits per pixel
   4952    if ( (tga_color_type == 1) && (sz != 8) && (sz != 16) ) goto errorEnd; // for colormapped images, bpp is size of an index
   4953    if ( (sz != 8) && (sz != 15) && (sz != 16) && (sz != 24) && (sz != 32) ) goto errorEnd;
   4954 
   4955    res = 1; // if we got this far, everything's good and we can return 1 instead of 0
   4956 
   4957 errorEnd:
   4958    stbi__rewind(s);
   4959    return res;
   4960 }
   4961 
   4962 // read 16bit value and convert to 24bit RGB
   4963 void stbi__tga_read_rgb16(stbi__context *s, stbi_uc* out)
   4964 {
   4965    stbi__uint16 px = stbi__get16le(s);
   4966    stbi__uint16 fiveBitMask = 31;
   4967    // we have 3 channels with 5bits each
   4968    int r = (px >> 10) & fiveBitMask;
   4969    int g = (px >> 5) & fiveBitMask;
   4970    int b = px & fiveBitMask;
   4971    // Note that this saves the data in RGB(A) order, so it doesn't need to be swapped later
   4972    out[0] = (r * 255)/31;
   4973    out[1] = (g * 255)/31;
   4974    out[2] = (b * 255)/31;
   4975 
   4976    // some people claim that the most significant bit might be used for alpha
   4977    // (possibly if an alpha-bit is set in the "image descriptor byte")
   4978    // but that only made 16bit test images completely translucent..
   4979    // so let's treat all 15 and 16bit TGAs as RGB with no alpha.
   4980 }
   4981 
   4982 static stbi_uc *stbi__tga_load(stbi__context *s, int *x, int *y, int *comp, int req_comp)
   4983 {
   4984    //   read in the TGA header stuff
   4985    int tga_offset = stbi__get8(s);
   4986    int tga_indexed = stbi__get8(s);
   4987    int tga_image_type = stbi__get8(s);
   4988    int tga_is_RLE = 0;
   4989    int tga_palette_start = stbi__get16le(s);
   4990    int tga_palette_len = stbi__get16le(s);
   4991    int tga_palette_bits = stbi__get8(s);
   4992    int tga_x_origin = stbi__get16le(s);
   4993    int tga_y_origin = stbi__get16le(s);
   4994    int tga_width = stbi__get16le(s);
   4995    int tga_height = stbi__get16le(s);
   4996    int tga_bits_per_pixel = stbi__get8(s);
   4997    int tga_comp, tga_rgb16=0;
   4998    int tga_inverted = stbi__get8(s);
   4999    // int tga_alpha_bits = tga_inverted & 15; // the 4 lowest bits - unused (useless?)
   5000    //   image data
   5001    unsigned char *tga_data;
   5002    unsigned char *tga_palette = NULL;
   5003    int i, j;
   5004    unsigned char raw_data[4];
   5005    int RLE_count = 0;
   5006    int RLE_repeating = 0;
   5007    int read_next_pixel = 1;
   5008 
   5009    //   do a tiny bit of precessing
   5010    if ( tga_image_type >= 8 )
   5011    {
   5012       tga_image_type -= 8;
   5013       tga_is_RLE = 1;
   5014    }
   5015    tga_inverted = 1 - ((tga_inverted >> 5) & 1);
   5016 
   5017    //   If I'm paletted, then I'll use the number of bits from the palette
   5018    if ( tga_indexed ) tga_comp = stbi__tga_get_comp(tga_palette_bits, 0, &tga_rgb16);
   5019    else tga_comp = stbi__tga_get_comp(tga_bits_per_pixel, (tga_image_type == 3), &tga_rgb16);
   5020 
   5021    if(!tga_comp) // shouldn't really happen, stbi__tga_test() should have ensured basic consistency
   5022       return stbi__errpuc("bad format", "Can't find out TGA pixelformat");
   5023 
   5024    //   tga info
   5025    *x = tga_width;
   5026    *y = tga_height;
   5027    if (comp) *comp = tga_comp;
   5028 
   5029    tga_data = (unsigned char*)stbi__malloc( (size_t)tga_width * tga_height * tga_comp );
   5030    if (!tga_data) return stbi__errpuc("outofmem", "Out of memory");
   5031 
   5032    // skip to the data's starting position (offset usually = 0)
   5033    stbi__skip(s, tga_offset );
   5034 
   5035    if ( !tga_indexed && !tga_is_RLE && !tga_rgb16 ) {
   5036       for (i=0; i < tga_height; ++i) {
   5037          int row = tga_inverted ? tga_height -i - 1 : i;
   5038          stbi_uc *tga_row = tga_data + row*tga_width*tga_comp;
   5039          stbi__getn(s, tga_row, tga_width * tga_comp);
   5040       }
   5041    } else  {
   5042       //   do I need to load a palette?
   5043       if ( tga_indexed)
   5044       {
   5045          //   any data to skip? (offset usually = 0)
   5046          stbi__skip(s, tga_palette_start );
   5047          //   load the palette
   5048          tga_palette = (unsigned char*)stbi__malloc( tga_palette_len * tga_comp );
   5049          if (!tga_palette) {
   5050             STBI_FREE(tga_data);
   5051             return stbi__errpuc("outofmem", "Out of memory");
   5052          }
   5053          if (tga_rgb16) {
   5054             stbi_uc *pal_entry = tga_palette;
   5055             STBI_ASSERT(tga_comp == STBI_rgb);
   5056             for (i=0; i < tga_palette_len; ++i) {
   5057                stbi__tga_read_rgb16(s, pal_entry);
   5058                pal_entry += tga_comp;
   5059             }
   5060          } else if (!stbi__getn(s, tga_palette, tga_palette_len * tga_comp)) {
   5061                STBI_FREE(tga_data);
   5062                STBI_FREE(tga_palette);
   5063                return stbi__errpuc("bad palette", "Corrupt TGA");
   5064          }
   5065       }
   5066       //   load the data
   5067       for (i=0; i < tga_width * tga_height; ++i)
   5068       {
   5069          //   if I'm in RLE mode, do I need to get a RLE stbi__pngchunk?
   5070          if ( tga_is_RLE )
   5071          {
   5072             if ( RLE_count == 0 )
   5073             {
   5074                //   yep, get the next byte as a RLE command
   5075                int RLE_cmd = stbi__get8(s);
   5076                RLE_count = 1 + (RLE_cmd & 127);
   5077                RLE_repeating = RLE_cmd >> 7;
   5078                read_next_pixel = 1;
   5079             } else if ( !RLE_repeating )
   5080             {
   5081                read_next_pixel = 1;
   5082             }
   5083          } else
   5084          {
   5085             read_next_pixel = 1;
   5086          }
   5087          //   OK, if I need to read a pixel, do it now
   5088          if ( read_next_pixel )
   5089          {
   5090             //   load however much data we did have
   5091             if ( tga_indexed )
   5092             {
   5093                // read in index, then perform the lookup
   5094                int pal_idx = (tga_bits_per_pixel == 8) ? stbi__get8(s) : stbi__get16le(s);
   5095                if ( pal_idx >= tga_palette_len ) {
   5096                   // invalid index
   5097                   pal_idx = 0;
   5098                }
   5099                pal_idx *= tga_comp;
   5100                for (j = 0; j < tga_comp; ++j) {
   5101                   raw_data[j] = tga_palette[pal_idx+j];
   5102                }
   5103             } else if(tga_rgb16) {
   5104                STBI_ASSERT(tga_comp == STBI_rgb);
   5105                stbi__tga_read_rgb16(s, raw_data);
   5106             } else {
   5107                //   read in the data raw
   5108                for (j = 0; j < tga_comp; ++j) {
   5109                   raw_data[j] = stbi__get8(s);
   5110                }
   5111             }
   5112             //   clear the reading flag for the next pixel
   5113             read_next_pixel = 0;
   5114          } // end of reading a pixel
   5115 
   5116          // copy data
   5117          for (j = 0; j < tga_comp; ++j)
   5118            tga_data[i*tga_comp+j] = raw_data[j];
   5119 
   5120          //   in case we're in RLE mode, keep counting down
   5121          --RLE_count;
   5122       }
   5123       //   do I need to invert the image?
   5124       if ( tga_inverted )
   5125       {
   5126          for (j = 0; j*2 < tga_height; ++j)
   5127          {
   5128             int index1 = j * tga_width * tga_comp;
   5129             int index2 = (tga_height - 1 - j) * tga_width * tga_comp;
   5130             for (i = tga_width * tga_comp; i > 0; --i)
   5131             {
   5132                unsigned char temp = tga_data[index1];
   5133                tga_data[index1] = tga_data[index2];
   5134                tga_data[index2] = temp;
   5135                ++index1;
   5136                ++index2;
   5137             }
   5138          }
   5139       }
   5140       //   clear my palette, if I had one
   5141       if ( tga_palette != NULL )
   5142       {
   5143          STBI_FREE( tga_palette );
   5144       }
   5145    }
   5146 
   5147    // swap RGB - if the source data was RGB16, it already is in the right order
   5148    if (tga_comp >= 3 && !tga_rgb16)
   5149    {
   5150       unsigned char* tga_pixel = tga_data;
   5151       for (i=0; i < tga_width * tga_height; ++i)
   5152       {
   5153          unsigned char temp = tga_pixel[0];
   5154          tga_pixel[0] = tga_pixel[2];
   5155          tga_pixel[2] = temp;
   5156          tga_pixel += tga_comp;
   5157       }
   5158    }
   5159 
   5160    // convert to target component count
   5161    if (req_comp && req_comp != tga_comp)
   5162       tga_data = stbi__convert_format(tga_data, tga_comp, req_comp, tga_width, tga_height);
   5163 
   5164    //   the things I do to get rid of an error message, and yet keep
   5165    //   Microsoft's C compilers happy... [8^(
   5166    tga_palette_start = tga_palette_len = tga_palette_bits =
   5167          tga_x_origin = tga_y_origin = 0;
   5168    //   OK, done
   5169    return tga_data;
   5170 }
   5171 #endif
   5172 
   5173 // *************************************************************************************************
   5174 // Photoshop PSD loader -- PD by Thatcher Ulrich, integration by Nicolas Schulz, tweaked by STB
   5175 
   5176 #ifndef STBI_NO_PSD
   5177 static int stbi__psd_test(stbi__context *s)
   5178 {
   5179    int r = (stbi__get32be(s) == 0x38425053);
   5180    stbi__rewind(s);
   5181    return r;
   5182 }
   5183 
   5184 static stbi_uc *stbi__psd_load(stbi__context *s, int *x, int *y, int *comp, int req_comp)
   5185 {
   5186    int   pixelCount;
   5187    int channelCount, compression;
   5188    int channel, i, count, len;
   5189    int bitdepth;
   5190    int w,h;
   5191    stbi_uc *out;
   5192 
   5193    // Check identifier
   5194    if (stbi__get32be(s) != 0x38425053)   // "8BPS"
   5195       return stbi__errpuc("not PSD", "Corrupt PSD image");
   5196 
   5197    // Check file type version.
   5198    if (stbi__get16be(s) != 1)
   5199       return stbi__errpuc("wrong version", "Unsupported version of PSD image");
   5200 
   5201    // Skip 6 reserved bytes.
   5202    stbi__skip(s, 6 );
   5203 
   5204    // Read the number of channels (R, G, B, A, etc).
   5205    channelCount = stbi__get16be(s);
   5206    if (channelCount < 0 || channelCount > 16)
   5207       return stbi__errpuc("wrong channel count", "Unsupported number of channels in PSD image");
   5208 
   5209    // Read the rows and columns of the image.
   5210    h = stbi__get32be(s);
   5211    w = stbi__get32be(s);
   5212 
   5213    // Make sure the depth is 8 bits.
   5214    bitdepth = stbi__get16be(s);
   5215    if (bitdepth != 8 && bitdepth != 16)
   5216       return stbi__errpuc("unsupported bit depth", "PSD bit depth is not 8 or 16 bit");
   5217 
   5218    // Make sure the color mode is RGB.
   5219    // Valid options are:
   5220    //   0: Bitmap
   5221    //   1: Grayscale
   5222    //   2: Indexed color
   5223    //   3: RGB color
   5224    //   4: CMYK color
   5225    //   7: Multichannel
   5226    //   8: Duotone
   5227    //   9: Lab color
   5228    if (stbi__get16be(s) != 3)
   5229       return stbi__errpuc("wrong color format", "PSD is not in RGB color format");
   5230 
   5231    // Skip the Mode Data.  (It's the palette for indexed color; other info for other modes.)
   5232    stbi__skip(s,stbi__get32be(s) );
   5233 
   5234    // Skip the image resources.  (resolution, pen tool paths, etc)
   5235    stbi__skip(s, stbi__get32be(s) );
   5236 
   5237    // Skip the reserved data.
   5238    stbi__skip(s, stbi__get32be(s) );
   5239 
   5240    // Find out if the data is compressed.
   5241    // Known values:
   5242    //   0: no compression
   5243    //   1: RLE compressed
   5244    compression = stbi__get16be(s);
   5245    if (compression > 1)
   5246       return stbi__errpuc("bad compression", "PSD has an unknown compression format");
   5247 
   5248    // Create the destination image.
   5249    out = (stbi_uc *) stbi__malloc(4 * w*h);
   5250    if (!out) return stbi__errpuc("outofmem", "Out of memory");
   5251    pixelCount = w*h;
   5252 
   5253    // Initialize the data to zero.
   5254    //memset( out, 0, pixelCount * 4 );
   5255 
   5256    // Finally, the image data.
   5257    if (compression) {
   5258       // RLE as used by .PSD and .TIFF
   5259       // Loop until you get the number of unpacked bytes you are expecting:
   5260       //     Read the next source byte into n.
   5261       //     If n is between 0 and 127 inclusive, copy the next n+1 bytes literally.
   5262       //     Else if n is between -127 and -1 inclusive, copy the next byte -n+1 times.
   5263       //     Else if n is 128, noop.
   5264       // Endloop
   5265 
   5266       // The RLE-compressed data is preceeded by a 2-byte data count for each row in the data,
   5267       // which we're going to just skip.
   5268       stbi__skip(s, h * channelCount * 2 );
   5269 
   5270       // Read the RLE data by channel.
   5271       for (channel = 0; channel < 4; channel++) {
   5272          stbi_uc *p;
   5273 
   5274          p = out+channel;
   5275          if (channel >= channelCount) {
   5276             // Fill this channel with default data.
   5277             for (i = 0; i < pixelCount; i++, p += 4)
   5278                *p = (channel == 3 ? 255 : 0);
   5279          } else {
   5280             // Read the RLE data.
   5281             count = 0;
   5282             while (count < pixelCount) {
   5283                len = stbi__get8(s);
   5284                if (len == 128) {
   5285                   // No-op.
   5286                } else if (len < 128) {
   5287                   // Copy next len+1 bytes literally.
   5288                   len++;
   5289                   count += len;
   5290                   while (len) {
   5291                      *p = stbi__get8(s);
   5292                      p += 4;
   5293                      len--;
   5294                   }
   5295                } else if (len > 128) {
   5296                   stbi_uc   val;
   5297                   // Next -len+1 bytes in the dest are replicated from next source byte.
   5298                   // (Interpret len as a negative 8-bit int.)
   5299                   len ^= 0x0FF;
   5300                   len += 2;
   5301                   val = stbi__get8(s);
   5302                   count += len;
   5303                   while (len) {
   5304                      *p = val;
   5305                      p += 4;
   5306                      len--;
   5307                   }
   5308                }
   5309             }
   5310          }
   5311       }
   5312 
   5313    } else {
   5314       // We're at the raw image data.  It's each channel in order (Red, Green, Blue, Alpha, ...)
   5315       // where each channel consists of an 8-bit value for each pixel in the image.
   5316 
   5317       // Read the data by channel.
   5318       for (channel = 0; channel < 4; channel++) {
   5319          stbi_uc *p;
   5320 
   5321          p = out + channel;
   5322          if (channel >= channelCount) {
   5323             // Fill this channel with default data.
   5324             stbi_uc val = channel == 3 ? 255 : 0;
   5325             for (i = 0; i < pixelCount; i++, p += 4)
   5326                *p = val;
   5327          } else {
   5328             // Read the data.
   5329             if (bitdepth == 16) {
   5330                for (i = 0; i < pixelCount; i++, p += 4)
   5331                   *p = (stbi_uc) (stbi__get16be(s) >> 8);
   5332             } else {
   5333                for (i = 0; i < pixelCount; i++, p += 4)
   5334                   *p = stbi__get8(s);
   5335             }
   5336          }
   5337       }
   5338    }
   5339 
   5340    if (req_comp && req_comp != 4) {
   5341       out = stbi__convert_format(out, 4, req_comp, w, h);
   5342       if (out == NULL) return out; // stbi__convert_format frees input on failure
   5343    }
   5344 
   5345    if (comp) *comp = 4;
   5346    *y = h;
   5347    *x = w;
   5348 
   5349    return out;
   5350 }
   5351 #endif
   5352 
   5353 // *************************************************************************************************
   5354 // Softimage PIC loader
   5355 // by Tom Seddon
   5356 //
   5357 // See http://softimage.wiki.softimage.com/index.php/INFO:_PIC_file_format
   5358 // See http://ozviz.wasp.uwa.edu.au/~pbourke/dataformats/softimagepic/
   5359 
   5360 #ifndef STBI_NO_PIC
   5361 static int stbi__pic_is4(stbi__context *s,const char *str)
   5362 {
   5363    int i;
   5364    for (i=0; i<4; ++i)
   5365       if (stbi__get8(s) != (stbi_uc)str[i])
   5366          return 0;
   5367 
   5368    return 1;
   5369 }
   5370 
   5371 static int stbi__pic_test_core(stbi__context *s)
   5372 {
   5373    int i;
   5374 
   5375    if (!stbi__pic_is4(s,"\x53\x80\xF6\x34"))
   5376       return 0;
   5377 
   5378    for(i=0;i<84;++i)
   5379       stbi__get8(s);
   5380 
   5381    if (!stbi__pic_is4(s,"PICT"))
   5382       return 0;
   5383 
   5384    return 1;
   5385 }
   5386 
   5387 typedef struct
   5388 {
   5389    stbi_uc size,type,channel;
   5390 } stbi__pic_packet;
   5391 
   5392 static stbi_uc *stbi__readval(stbi__context *s, int channel, stbi_uc *dest)
   5393 {
   5394    int mask=0x80, i;
   5395 
   5396    for (i=0; i<4; ++i, mask>>=1) {
   5397       if (channel & mask) {
   5398          if (stbi__at_eof(s)) return stbi__errpuc("bad file","PIC file too short");
   5399          dest[i]=stbi__get8(s);
   5400       }
   5401    }
   5402 
   5403    return dest;
   5404 }
   5405 
   5406 static void stbi__copyval(int channel,stbi_uc *dest,const stbi_uc *src)
   5407 {
   5408    int mask=0x80,i;
   5409 
   5410    for (i=0;i<4; ++i, mask>>=1)
   5411       if (channel&mask)
   5412          dest[i]=src[i];
   5413 }
   5414 
   5415 static stbi_uc *stbi__pic_load_core(stbi__context *s,int width,int height,int *comp, stbi_uc *result)
   5416 {
   5417    int act_comp=0,num_packets=0,y,chained;
   5418    stbi__pic_packet packets[10];
   5419 
   5420    // this will (should...) cater for even some bizarre stuff like having data
   5421     // for the same channel in multiple packets.
   5422    do {
   5423       stbi__pic_packet *packet;
   5424 
   5425       if (num_packets==sizeof(packets)/sizeof(packets[0]))
   5426          return stbi__errpuc("bad format","too many packets");
   5427 
   5428       packet = &packets[num_packets++];
   5429 
   5430       chained = stbi__get8(s);
   5431       packet->size    = stbi__get8(s);
   5432       packet->type    = stbi__get8(s);
   5433       packet->channel = stbi__get8(s);
   5434 
   5435       act_comp |= packet->channel;
   5436 
   5437       if (stbi__at_eof(s))          return stbi__errpuc("bad file","file too short (reading packets)");
   5438       if (packet->size != 8)  return stbi__errpuc("bad format","packet isn't 8bpp");
   5439    } while (chained);
   5440 
   5441    *comp = (act_comp & 0x10 ? 4 : 3); // has alpha channel?
   5442 
   5443    for(y=0; y<height; ++y) {
   5444       int packet_idx;
   5445 
   5446       for(packet_idx=0; packet_idx < num_packets; ++packet_idx) {
   5447          stbi__pic_packet *packet = &packets[packet_idx];
   5448          stbi_uc *dest = result+y*width*4;
   5449 
   5450          switch (packet->type) {
   5451             default:
   5452                return stbi__errpuc("bad format","packet has bad compression type");
   5453 
   5454             case 0: {//uncompressed
   5455                int x;
   5456 
   5457                for(x=0;x<width;++x, dest+=4)
   5458                   if (!stbi__readval(s,packet->channel,dest))
   5459                      return 0;
   5460                break;
   5461             }
   5462 
   5463             case 1://Pure RLE
   5464                {
   5465                   int left=width, i;
   5466 
   5467                   while (left>0) {
   5468                      stbi_uc count,value[4];
   5469 
   5470                      count=stbi__get8(s);
   5471                      if (stbi__at_eof(s))   return stbi__errpuc("bad file","file too short (pure read count)");
   5472 
   5473                      if (count > left)
   5474                         count = (stbi_uc) left;
   5475 
   5476                      if (!stbi__readval(s,packet->channel,value))  return 0;
   5477 
   5478                      for(i=0; i<count; ++i,dest+=4)
   5479                         stbi__copyval(packet->channel,dest,value);
   5480                      left -= count;
   5481                   }
   5482                }
   5483                break;
   5484 
   5485             case 2: {//Mixed RLE
   5486                int left=width;
   5487                while (left>0) {
   5488                   int count = stbi__get8(s), i;
   5489                   if (stbi__at_eof(s))  return stbi__errpuc("bad file","file too short (mixed read count)");
   5490 
   5491                   if (count >= 128) { // Repeated
   5492                      stbi_uc value[4];
   5493 
   5494                      if (count==128)
   5495                         count = stbi__get16be(s);
   5496                      else
   5497                         count -= 127;
   5498                      if (count > left)
   5499                         return stbi__errpuc("bad file","scanline overrun");
   5500 
   5501                      if (!stbi__readval(s,packet->channel,value))
   5502                         return 0;
   5503 
   5504                      for(i=0;i<count;++i, dest += 4)
   5505                         stbi__copyval(packet->channel,dest,value);
   5506                   } else { // Raw
   5507                      ++count;
   5508                      if (count>left) return stbi__errpuc("bad file","scanline overrun");
   5509 
   5510                      for(i=0;i<count;++i, dest+=4)
   5511                         if (!stbi__readval(s,packet->channel,dest))
   5512                            return 0;
   5513                   }
   5514                   left-=count;
   5515                }
   5516                break;
   5517             }
   5518          }
   5519       }
   5520    }
   5521 
   5522    return result;
   5523 }
   5524 
   5525 static stbi_uc *stbi__pic_load(stbi__context *s,int *px,int *py,int *comp,int req_comp)
   5526 {
   5527    stbi_uc *result;
   5528    int i, x,y;
   5529 
   5530    for (i=0; i<92; ++i)
   5531       stbi__get8(s);
   5532 
   5533    x = stbi__get16be(s);
   5534    y = stbi__get16be(s);
   5535    if (stbi__at_eof(s))  return stbi__errpuc("bad file","file too short (pic header)");
   5536    if ((1 << 28) / x < y) return stbi__errpuc("too large", "Image too large to decode");
   5537 
   5538    stbi__get32be(s); //skip `ratio'
   5539    stbi__get16be(s); //skip `fields'
   5540    stbi__get16be(s); //skip `pad'
   5541 
   5542    // intermediate buffer is RGBA
   5543    result = (stbi_uc *) stbi__malloc(x*y*4);
   5544    memset(result, 0xff, x*y*4);
   5545 
   5546    if (!stbi__pic_load_core(s,x,y,comp, result)) {
   5547       STBI_FREE(result);
   5548       result=0;
   5549    }
   5550    *px = x;
   5551    *py = y;
   5552    if (req_comp == 0) req_comp = *comp;
   5553    result=stbi__convert_format(result,4,req_comp,x,y);
   5554 
   5555    return result;
   5556 }
   5557 
   5558 static int stbi__pic_test(stbi__context *s)
   5559 {
   5560    int r = stbi__pic_test_core(s);
   5561    stbi__rewind(s);
   5562    return r;
   5563 }
   5564 #endif
   5565 
   5566 // *************************************************************************************************
   5567 // GIF loader -- public domain by Jean-Marc Lienher -- simplified/shrunk by stb
   5568 
   5569 #ifndef STBI_NO_GIF
   5570 typedef struct
   5571 {
   5572    stbi__int16 prefix;
   5573    stbi_uc first;
   5574    stbi_uc suffix;
   5575 } stbi__gif_lzw;
   5576 
   5577 typedef struct
   5578 {
   5579    int w,h;
   5580    stbi_uc *out, *old_out;             // output buffer (always 4 components)
   5581    int flags, bgindex, ratio, transparent, eflags, delay;
   5582    stbi_uc  pal[256][4];
   5583    stbi_uc lpal[256][4];
   5584    stbi__gif_lzw codes[4096];
   5585    stbi_uc *color_table;
   5586    int parse, step;
   5587    int lflags;
   5588    int start_x, start_y;
   5589    int max_x, max_y;
   5590    int cur_x, cur_y;
   5591    int line_size;
   5592 } stbi__gif;
   5593 
   5594 static int stbi__gif_test_raw(stbi__context *s)
   5595 {
   5596    int sz;
   5597    if (stbi__get8(s) != 'G' || stbi__get8(s) != 'I' || stbi__get8(s) != 'F' || stbi__get8(s) != '8') return 0;
   5598    sz = stbi__get8(s);
   5599    if (sz != '9' && sz != '7') return 0;
   5600    if (stbi__get8(s) != 'a') return 0;
   5601    return 1;
   5602 }
   5603 
   5604 static int stbi__gif_test(stbi__context *s)
   5605 {
   5606    int r = stbi__gif_test_raw(s);
   5607    stbi__rewind(s);
   5608    return r;
   5609 }
   5610 
   5611 static void stbi__gif_parse_colortable(stbi__context *s, stbi_uc pal[256][4], int num_entries, int transp)
   5612 {
   5613    int i;
   5614    for (i=0; i < num_entries; ++i) {
   5615       pal[i][2] = stbi__get8(s);
   5616       pal[i][1] = stbi__get8(s);
   5617       pal[i][0] = stbi__get8(s);
   5618       pal[i][3] = transp == i ? 0 : 255;
   5619    }
   5620 }
   5621 
   5622 static int stbi__gif_header(stbi__context *s, stbi__gif *g, int *comp, int is_info)
   5623 {
   5624    stbi_uc version;
   5625    if (stbi__get8(s) != 'G' || stbi__get8(s) != 'I' || stbi__get8(s) != 'F' || stbi__get8(s) != '8')
   5626       return stbi__err("not GIF", "Corrupt GIF");
   5627 
   5628    version = stbi__get8(s);
   5629    if (version != '7' && version != '9')    return stbi__err("not GIF", "Corrupt GIF");
   5630    if (stbi__get8(s) != 'a')                return stbi__err("not GIF", "Corrupt GIF");
   5631 
   5632    stbi__g_failure_reason = "";
   5633    g->w = stbi__get16le(s);
   5634    g->h = stbi__get16le(s);
   5635    g->flags = stbi__get8(s);
   5636    g->bgindex = stbi__get8(s);
   5637    g->ratio = stbi__get8(s);
   5638    g->transparent = -1;
   5639 
   5640    if (comp != 0) *comp = 4;  // can't actually tell whether it's 3 or 4 until we parse the comments
   5641 
   5642    if (is_info) return 1;
   5643 
   5644    if (g->flags & 0x80)
   5645       stbi__gif_parse_colortable(s,g->pal, 2 << (g->flags & 7), -1);
   5646 
   5647    return 1;
   5648 }
   5649 
   5650 static int stbi__gif_info_raw(stbi__context *s, int *x, int *y, int *comp)
   5651 {
   5652    stbi__gif g;
   5653    if (!stbi__gif_header(s, &g, comp, 1)) {
   5654       stbi__rewind( s );
   5655       return 0;
   5656    }
   5657    if (x) *x = g.w;
   5658    if (y) *y = g.h;
   5659    return 1;
   5660 }
   5661 
   5662 static void stbi__out_gif_code(stbi__gif *g, stbi__uint16 code)
   5663 {
   5664    stbi_uc *p, *c;
   5665 
   5666    // recurse to decode the prefixes, since the linked-list is backwards,
   5667    // and working backwards through an interleaved image would be nasty
   5668    if (g->codes[code].prefix >= 0)
   5669       stbi__out_gif_code(g, g->codes[code].prefix);
   5670 
   5671    if (g->cur_y >= g->max_y) return;
   5672 
   5673    p = &g->out[g->cur_x + g->cur_y];
   5674    c = &g->color_table[g->codes[code].suffix * 4];
   5675 
   5676    if (c[3] >= 128) {
   5677       p[0] = c[2];
   5678       p[1] = c[1];
   5679       p[2] = c[0];
   5680       p[3] = c[3];
   5681    }
   5682    g->cur_x += 4;
   5683 
   5684    if (g->cur_x >= g->max_x) {
   5685       g->cur_x = g->start_x;
   5686       g->cur_y += g->step;
   5687 
   5688       while (g->cur_y >= g->max_y && g->parse > 0) {
   5689          g->step = (1 << g->parse) * g->line_size;
   5690          g->cur_y = g->start_y + (g->step >> 1);
   5691          --g->parse;
   5692       }
   5693    }
   5694 }
   5695 
   5696 static stbi_uc *stbi__process_gif_raster(stbi__context *s, stbi__gif *g)
   5697 {
   5698    stbi_uc lzw_cs;
   5699    stbi__int32 len, init_code;
   5700    stbi__uint32 first;
   5701    stbi__int32 codesize, codemask, avail, oldcode, bits, valid_bits, clear;
   5702    stbi__gif_lzw *p;
   5703 
   5704    lzw_cs = stbi__get8(s);
   5705    if (lzw_cs > 12) return NULL;
   5706    clear = 1 << lzw_cs;
   5707    first = 1;
   5708    codesize = lzw_cs + 1;
   5709    codemask = (1 << codesize) - 1;
   5710    bits = 0;
   5711    valid_bits = 0;
   5712    for (init_code = 0; init_code < clear; init_code++) {
   5713       g->codes[init_code].prefix = -1;
   5714       g->codes[init_code].first = (stbi_uc) init_code;
   5715       g->codes[init_code].suffix = (stbi_uc) init_code;
   5716    }
   5717 
   5718    // support no starting clear code
   5719    avail = clear+2;
   5720    oldcode = -1;
   5721 
   5722    len = 0;
   5723    for(;;) {
   5724       if (valid_bits < codesize) {
   5725          if (len == 0) {
   5726             len = stbi__get8(s); // start new block
   5727             if (len == 0)
   5728                return g->out;
   5729          }
   5730          --len;
   5731          bits |= (stbi__int32) stbi__get8(s) << valid_bits;
   5732          valid_bits += 8;
   5733       } else {
   5734          stbi__int32 code = bits & codemask;
   5735          bits >>= codesize;
   5736          valid_bits -= codesize;
   5737          // @OPTIMIZE: is there some way we can accelerate the non-clear path?
   5738          if (code == clear) {  // clear code
   5739             codesize = lzw_cs + 1;
   5740             codemask = (1 << codesize) - 1;
   5741             avail = clear + 2;
   5742             oldcode = -1;
   5743             first = 0;
   5744          } else if (code == clear + 1) { // end of stream code
   5745             stbi__skip(s, len);
   5746             while ((len = stbi__get8(s)) > 0)
   5747                stbi__skip(s,len);
   5748             return g->out;
   5749          } else if (code <= avail) {
   5750             if (first) return stbi__errpuc("no clear code", "Corrupt GIF");
   5751 
   5752             if (oldcode >= 0) {
   5753                p = &g->codes[avail++];
   5754                if (avail > 4096)        return stbi__errpuc("too many codes", "Corrupt GIF");
   5755                p->prefix = (stbi__int16) oldcode;
   5756                p->first = g->codes[oldcode].first;
   5757                p->suffix = (code == avail) ? p->first : g->codes[code].first;
   5758             } else if (code == avail)
   5759                return stbi__errpuc("illegal code in raster", "Corrupt GIF");
   5760 
   5761             stbi__out_gif_code(g, (stbi__uint16) code);
   5762 
   5763             if ((avail & codemask) == 0 && avail <= 0x0FFF) {
   5764                codesize++;
   5765                codemask = (1 << codesize) - 1;
   5766             }
   5767 
   5768             oldcode = code;
   5769          } else {
   5770             return stbi__errpuc("illegal code in raster", "Corrupt GIF");
   5771          }
   5772       }
   5773    }
   5774 }
   5775 
   5776 static void stbi__fill_gif_background(stbi__gif *g, int x0, int y0, int x1, int y1)
   5777 {
   5778    int x, y;
   5779    stbi_uc *c = g->pal[g->bgindex];
   5780    for (y = y0; y < y1; y += 4 * g->w) {
   5781       for (x = x0; x < x1; x += 4) {
   5782          stbi_uc *p  = &g->out[y + x];
   5783          p[0] = c[2];
   5784          p[1] = c[1];
   5785          p[2] = c[0];
   5786          p[3] = 0;
   5787       }
   5788    }
   5789 }
   5790 
   5791 // this function is designed to support animated gifs, although stb_image doesn't support it
   5792 static stbi_uc *stbi__gif_load_next(stbi__context *s, stbi__gif *g, int *comp, int req_comp)
   5793 {
   5794    int i;
   5795    stbi_uc *prev_out = 0;
   5796 
   5797    if (g->out == 0 && !stbi__gif_header(s, g, comp,0))
   5798       return 0; // stbi__g_failure_reason set by stbi__gif_header
   5799 
   5800    prev_out = g->out;
   5801    g->out = (stbi_uc *) stbi__malloc(4 * g->w * g->h);
   5802    if (g->out == 0) return stbi__errpuc("outofmem", "Out of memory");
   5803 
   5804    switch ((g->eflags & 0x1C) >> 2) {
   5805       case 0: // unspecified (also always used on 1st frame)
   5806          stbi__fill_gif_background(g, 0, 0, 4 * g->w, 4 * g->w * g->h);
   5807          break;
   5808       case 1: // do not dispose
   5809          if (prev_out) memcpy(g->out, prev_out, 4 * g->w * g->h);
   5810          g->old_out = prev_out;
   5811          break;
   5812       case 2: // dispose to background
   5813          if (prev_out) memcpy(g->out, prev_out, 4 * g->w * g->h);
   5814          stbi__fill_gif_background(g, g->start_x, g->start_y, g->max_x, g->max_y);
   5815          break;
   5816       case 3: // dispose to previous
   5817          if (g->old_out) {
   5818             for (i = g->start_y; i < g->max_y; i += 4 * g->w)
   5819                memcpy(&g->out[i + g->start_x], &g->old_out[i + g->start_x], g->max_x - g->start_x);
   5820          }
   5821          break;
   5822    }
   5823 
   5824    for (;;) {
   5825       switch (stbi__get8(s)) {
   5826          case 0x2C: /* Image Descriptor */
   5827          {
   5828             int prev_trans = -1;
   5829             stbi__int32 x, y, w, h;
   5830             stbi_uc *o;
   5831 
   5832             x = stbi__get16le(s);
   5833             y = stbi__get16le(s);
   5834             w = stbi__get16le(s);
   5835             h = stbi__get16le(s);
   5836             if (((x + w) > (g->w)) || ((y + h) > (g->h)))
   5837                return stbi__errpuc("bad Image Descriptor", "Corrupt GIF");
   5838 
   5839             g->line_size = g->w * 4;
   5840             g->start_x = x * 4;
   5841             g->start_y = y * g->line_size;
   5842             g->max_x   = g->start_x + w * 4;
   5843             g->max_y   = g->start_y + h * g->line_size;
   5844             g->cur_x   = g->start_x;
   5845             g->cur_y   = g->start_y;
   5846 
   5847             g->lflags = stbi__get8(s);
   5848 
   5849             if (g->lflags & 0x40) {
   5850                g->step = 8 * g->line_size; // first interlaced spacing
   5851                g->parse = 3;
   5852             } else {
   5853                g->step = g->line_size;
   5854                g->parse = 0;
   5855             }
   5856 
   5857             if (g->lflags & 0x80) {
   5858                stbi__gif_parse_colortable(s,g->lpal, 2 << (g->lflags & 7), g->eflags & 0x01 ? g->transparent : -1);
   5859                g->color_table = (stbi_uc *) g->lpal;
   5860             } else if (g->flags & 0x80) {
   5861                if (g->transparent >= 0 && (g->eflags & 0x01)) {
   5862                   prev_trans = g->pal[g->transparent][3];
   5863                   g->pal[g->transparent][3] = 0;
   5864                }
   5865                g->color_table = (stbi_uc *) g->pal;
   5866             } else
   5867                return stbi__errpuc("missing color table", "Corrupt GIF");
   5868 
   5869             o = stbi__process_gif_raster(s, g);
   5870             if (o == NULL) return NULL;
   5871 
   5872             if (prev_trans != -1)
   5873                g->pal[g->transparent][3] = (stbi_uc) prev_trans;
   5874 
   5875             return o;
   5876          }
   5877 
   5878          case 0x21: // Comment Extension.
   5879          {
   5880             int len;
   5881             if (stbi__get8(s) == 0xF9) { // Graphic Control Extension.
   5882                len = stbi__get8(s);
   5883                if (len == 4) {
   5884                   g->eflags = stbi__get8(s);
   5885                   g->delay = stbi__get16le(s);
   5886                   g->transparent = stbi__get8(s);
   5887                } else {
   5888                   stbi__skip(s, len);
   5889                   break;
   5890                }
   5891             }
   5892             while ((len = stbi__get8(s)) != 0)
   5893                stbi__skip(s, len);
   5894             break;
   5895          }
   5896 
   5897          case 0x3B: // gif stream termination code
   5898             return (stbi_uc *) s; // using '1' causes warning on some compilers
   5899 
   5900          default:
   5901             return stbi__errpuc("unknown code", "Corrupt GIF");
   5902       }
   5903    }
   5904 
   5905    STBI_NOTUSED(req_comp);
   5906 }
   5907 
   5908 static stbi_uc *stbi__gif_load(stbi__context *s, int *x, int *y, int *comp, int req_comp)
   5909 {
   5910    stbi_uc *u = 0;
   5911    stbi__gif g;
   5912    memset(&g, 0, sizeof(g));
   5913 
   5914    u = stbi__gif_load_next(s, &g, comp, req_comp);
   5915    if (u == (stbi_uc *) s) u = 0;  // end of animated gif marker
   5916    if (u) {
   5917       *x = g.w;
   5918       *y = g.h;
   5919       if (req_comp && req_comp != 4)
   5920          u = stbi__convert_format(u, 4, req_comp, g.w, g.h);
   5921    }
   5922    else if (g.out)
   5923       STBI_FREE(g.out);
   5924 
   5925    return u;
   5926 }
   5927 
   5928 static int stbi__gif_info(stbi__context *s, int *x, int *y, int *comp)
   5929 {
   5930    return stbi__gif_info_raw(s,x,y,comp);
   5931 }
   5932 #endif
   5933 
   5934 // *************************************************************************************************
   5935 // Radiance RGBE HDR loader
   5936 // originally by Nicolas Schulz
   5937 #ifndef STBI_NO_HDR
   5938 static int stbi__hdr_test_core(stbi__context *s)
   5939 {
   5940    const char *signature = "#?RADIANCE\n";
   5941    int i;
   5942    for (i=0; signature[i]; ++i)
   5943       if (stbi__get8(s) != signature[i])
   5944          return 0;
   5945    return 1;
   5946 }
   5947 
   5948 static int stbi__hdr_test(stbi__context* s)
   5949 {
   5950    int r = stbi__hdr_test_core(s);
   5951    stbi__rewind(s);
   5952    return r;
   5953 }
   5954 
   5955 #define STBI__HDR_BUFLEN  1024
   5956 static char *stbi__hdr_gettoken(stbi__context *z, char *buffer)
   5957 {
   5958    int len=0;
   5959    char c = '\0';
   5960 
   5961    c = (char) stbi__get8(z);
   5962 
   5963    while (!stbi__at_eof(z) && c != '\n') {
   5964       buffer[len++] = c;
   5965       if (len == STBI__HDR_BUFLEN-1) {
   5966          // flush to end of line
   5967          while (!stbi__at_eof(z) && stbi__get8(z) != '\n')
   5968             ;
   5969          break;
   5970       }
   5971       c = (char) stbi__get8(z);
   5972    }
   5973 
   5974    buffer[len] = 0;
   5975    return buffer;
   5976 }
   5977 
   5978 static void stbi__hdr_convert(float *output, stbi_uc *input, int req_comp)
   5979 {
   5980    if ( input[3] != 0 ) {
   5981       float f1;
   5982       // Exponent
   5983       f1 = (float) ldexp(1.0f, input[3] - (int)(128 + 8));
   5984       if (req_comp <= 2)
   5985          output[0] = (input[0] + input[1] + input[2]) * f1 / 3;
   5986       else {
   5987          output[0] = input[0] * f1;
   5988          output[1] = input[1] * f1;
   5989          output[2] = input[2] * f1;
   5990       }
   5991       if (req_comp == 2) output[1] = 1;
   5992       if (req_comp == 4) output[3] = 1;
   5993    } else {
   5994       switch (req_comp) {
   5995          case 4: output[3] = 1; /* fallthrough */
   5996          case 3: output[0] = output[1] = output[2] = 0;
   5997                  break;
   5998          case 2: output[1] = 1; /* fallthrough */
   5999          case 1: output[0] = 0;
   6000                  break;
   6001       }
   6002    }
   6003 }
   6004 
   6005 static float *stbi__hdr_load(stbi__context *s, int *x, int *y, int *comp, int req_comp)
   6006 {
   6007    char buffer[STBI__HDR_BUFLEN];
   6008    char *token;
   6009    int valid = 0;
   6010    int width, height;
   6011    stbi_uc *scanline;
   6012    float *hdr_data;
   6013    int len;
   6014    unsigned char count, value;
   6015    int i, j, k, c1,c2, z;
   6016 
   6017 
   6018    // Check identifier
   6019    if (strcmp(stbi__hdr_gettoken(s,buffer), "#?RADIANCE") != 0)
   6020       return stbi__errpf("not HDR", "Corrupt HDR image");
   6021 
   6022    // Parse header
   6023    for(;;) {
   6024       token = stbi__hdr_gettoken(s,buffer);
   6025       if (token[0] == 0) break;
   6026       if (strcmp(token, "FORMAT=32-bit_rle_rgbe") == 0) valid = 1;
   6027    }
   6028 
   6029    if (!valid)    return stbi__errpf("unsupported format", "Unsupported HDR format");
   6030 
   6031    // Parse width and height
   6032    // can't use sscanf() if we're not using stdio!
   6033    token = stbi__hdr_gettoken(s,buffer);
   6034    if (strncmp(token, "-Y ", 3))  return stbi__errpf("unsupported data layout", "Unsupported HDR format");
   6035    token += 3;
   6036    height = (int) strtol(token, &token, 10);
   6037    while (*token == ' ') ++token;
   6038    if (strncmp(token, "+X ", 3))  return stbi__errpf("unsupported data layout", "Unsupported HDR format");
   6039    token += 3;
   6040    width = (int) strtol(token, NULL, 10);
   6041 
   6042    *x = width;
   6043    *y = height;
   6044 
   6045    if (comp) *comp = 3;
   6046    if (req_comp == 0) req_comp = 3;
   6047 
   6048    // Read data
   6049    hdr_data = (float *) stbi__malloc(height * width * req_comp * sizeof(float));
   6050 
   6051    // Load image data
   6052    // image data is stored as some number of sca
   6053    if ( width < 8 || width >= 32768) {
   6054       // Read flat data
   6055       for (j=0; j < height; ++j) {
   6056          for (i=0; i < width; ++i) {
   6057             stbi_uc rgbe[4];
   6058            main_decode_loop:
   6059             stbi__getn(s, rgbe, 4);
   6060             stbi__hdr_convert(hdr_data + j * width * req_comp + i * req_comp, rgbe, req_comp);
   6061          }
   6062       }
   6063    } else {
   6064       // Read RLE-encoded data
   6065       scanline = NULL;
   6066 
   6067       for (j = 0; j < height; ++j) {
   6068          c1 = stbi__get8(s);
   6069          c2 = stbi__get8(s);
   6070          len = stbi__get8(s);
   6071          if (c1 != 2 || c2 != 2 || (len & 0x80)) {
   6072             // not run-length encoded, so we have to actually use THIS data as a decoded
   6073             // pixel (note this can't be a valid pixel--one of RGB must be >= 128)
   6074             stbi_uc rgbe[4];
   6075             rgbe[0] = (stbi_uc) c1;
   6076             rgbe[1] = (stbi_uc) c2;
   6077             rgbe[2] = (stbi_uc) len;
   6078             rgbe[3] = (stbi_uc) stbi__get8(s);
   6079             stbi__hdr_convert(hdr_data, rgbe, req_comp);
   6080             i = 1;
   6081             j = 0;
   6082             STBI_FREE(scanline);
   6083             goto main_decode_loop; // yes, this makes no sense
   6084          }
   6085          len <<= 8;
   6086          len |= stbi__get8(s);
   6087          if (len != width) { STBI_FREE(hdr_data); STBI_FREE(scanline); return stbi__errpf("invalid decoded scanline length", "corrupt HDR"); }
   6088          if (scanline == NULL) scanline = (stbi_uc *) stbi__malloc(width * 4);
   6089 
   6090          for (k = 0; k < 4; ++k) {
   6091             i = 0;
   6092             while (i < width) {
   6093                count = stbi__get8(s);
   6094                if (count > 128) {
   6095                   // Run
   6096                   value = stbi__get8(s);
   6097                   count -= 128;
   6098                   for (z = 0; z < count; ++z)
   6099                      scanline[i++ * 4 + k] = value;
   6100                } else {
   6101                   // Dump
   6102                   for (z = 0; z < count; ++z)
   6103                      scanline[i++ * 4 + k] = stbi__get8(s);
   6104                }
   6105             }
   6106          }
   6107          for (i=0; i < width; ++i)
   6108             stbi__hdr_convert(hdr_data+(j*width + i)*req_comp, scanline + i*4, req_comp);
   6109       }
   6110       STBI_FREE(scanline);
   6111    }
   6112 
   6113    return hdr_data;
   6114 }
   6115 
   6116 static int stbi__hdr_info(stbi__context *s, int *x, int *y, int *comp)
   6117 {
   6118    char buffer[STBI__HDR_BUFLEN];
   6119    char *token;
   6120    int valid = 0;
   6121 
   6122    if (stbi__hdr_test(s) == 0) {
   6123        stbi__rewind( s );
   6124        return 0;
   6125    }
   6126 
   6127    for(;;) {
   6128       token = stbi__hdr_gettoken(s,buffer);
   6129       if (token[0] == 0) break;
   6130       if (strcmp(token, "FORMAT=32-bit_rle_rgbe") == 0) valid = 1;
   6131    }
   6132 
   6133    if (!valid) {
   6134        stbi__rewind( s );
   6135        return 0;
   6136    }
   6137    token = stbi__hdr_gettoken(s,buffer);
   6138    if (strncmp(token, "-Y ", 3)) {
   6139        stbi__rewind( s );
   6140        return 0;
   6141    }
   6142    token += 3;
   6143    *y = (int) strtol(token, &token, 10);
   6144    while (*token == ' ') ++token;
   6145    if (strncmp(token, "+X ", 3)) {
   6146        stbi__rewind( s );
   6147        return 0;
   6148    }
   6149    token += 3;
   6150    *x = (int) strtol(token, NULL, 10);
   6151    *comp = 3;
   6152    return 1;
   6153 }
   6154 #endif // STBI_NO_HDR
   6155 
   6156 #ifndef STBI_NO_BMP
   6157 static int stbi__bmp_info(stbi__context *s, int *x, int *y, int *comp)
   6158 {
   6159    void *p;
   6160    stbi__bmp_data info;
   6161 
   6162    info.all_a = 255;   
   6163    p = stbi__bmp_parse_header(s, &info);
   6164    stbi__rewind( s );
   6165    if (p == NULL)
   6166       return 0;
   6167    *x = s->img_x;
   6168    *y = s->img_y;
   6169    *comp = info.ma ? 4 : 3;
   6170    return 1;
   6171 }
   6172 #endif
   6173 
   6174 #ifndef STBI_NO_PSD
   6175 static int stbi__psd_info(stbi__context *s, int *x, int *y, int *comp)
   6176 {
   6177    int channelCount;
   6178    if (stbi__get32be(s) != 0x38425053) {
   6179        stbi__rewind( s );
   6180        return 0;
   6181    }
   6182    if (stbi__get16be(s) != 1) {
   6183        stbi__rewind( s );
   6184        return 0;
   6185    }
   6186    stbi__skip(s, 6);
   6187    channelCount = stbi__get16be(s);
   6188    if (channelCount < 0 || channelCount > 16) {
   6189        stbi__rewind( s );
   6190        return 0;
   6191    }
   6192    *y = stbi__get32be(s);
   6193    *x = stbi__get32be(s);
   6194    if (stbi__get16be(s) != 8) {
   6195        stbi__rewind( s );
   6196        return 0;
   6197    }
   6198    if (stbi__get16be(s) != 3) {
   6199        stbi__rewind( s );
   6200        return 0;
   6201    }
   6202    *comp = 4;
   6203    return 1;
   6204 }
   6205 #endif
   6206 
   6207 #ifndef STBI_NO_PIC
   6208 static int stbi__pic_info(stbi__context *s, int *x, int *y, int *comp)
   6209 {
   6210    int act_comp=0,num_packets=0,chained;
   6211    stbi__pic_packet packets[10];
   6212 
   6213    if (!stbi__pic_is4(s,"\x53\x80\xF6\x34")) {
   6214       stbi__rewind(s);
   6215       return 0;
   6216    }
   6217 
   6218    stbi__skip(s, 88);
   6219 
   6220    *x = stbi__get16be(s);
   6221    *y = stbi__get16be(s);
   6222    if (stbi__at_eof(s)) {
   6223       stbi__rewind( s);
   6224       return 0;
   6225    }
   6226    if ( (*x) != 0 && (1 << 28) / (*x) < (*y)) {
   6227       stbi__rewind( s );
   6228       return 0;
   6229    }
   6230 
   6231    stbi__skip(s, 8);
   6232 
   6233    do {
   6234       stbi__pic_packet *packet;
   6235 
   6236       if (num_packets==sizeof(packets)/sizeof(packets[0]))
   6237          return 0;
   6238 
   6239       packet = &packets[num_packets++];
   6240       chained = stbi__get8(s);
   6241       packet->size    = stbi__get8(s);
   6242       packet->type    = stbi__get8(s);
   6243       packet->channel = stbi__get8(s);
   6244       act_comp |= packet->channel;
   6245 
   6246       if (stbi__at_eof(s)) {
   6247           stbi__rewind( s );
   6248           return 0;
   6249       }
   6250       if (packet->size != 8) {
   6251           stbi__rewind( s );
   6252           return 0;
   6253       }
   6254    } while (chained);
   6255 
   6256    *comp = (act_comp & 0x10 ? 4 : 3);
   6257 
   6258    return 1;
   6259 }
   6260 #endif
   6261 
   6262 // *************************************************************************************************
   6263 // Portable Gray Map and Portable Pixel Map loader
   6264 // by Ken Miller
   6265 //
   6266 // PGM: http://netpbm.sourceforge.net/doc/pgm.html
   6267 // PPM: http://netpbm.sourceforge.net/doc/ppm.html
   6268 //
   6269 // Known limitations:
   6270 //    Does not support comments in the header section
   6271 //    Does not support ASCII image data (formats P2 and P3)
   6272 //    Does not support 16-bit-per-channel
   6273 
   6274 #ifndef STBI_NO_PNM
   6275 
   6276 static int      stbi__pnm_test(stbi__context *s)
   6277 {
   6278    char p, t;
   6279    p = (char) stbi__get8(s);
   6280    t = (char) stbi__get8(s);
   6281    if (p != 'P' || (t != '5' && t != '6')) {
   6282        stbi__rewind( s );
   6283        return 0;
   6284    }
   6285    return 1;
   6286 }
   6287 
   6288 static stbi_uc *stbi__pnm_load(stbi__context *s, int *x, int *y, int *comp, int req_comp)
   6289 {
   6290    stbi_uc *out;
   6291    if (!stbi__pnm_info(s, (int *)&s->img_x, (int *)&s->img_y, (int *)&s->img_n))
   6292       return 0;
   6293    *x = s->img_x;
   6294    *y = s->img_y;
   6295    *comp = s->img_n;
   6296 
   6297    out = (stbi_uc *) stbi__malloc(s->img_n * s->img_x * s->img_y);
   6298    if (!out) return stbi__errpuc("outofmem", "Out of memory");
   6299    stbi__getn(s, out, s->img_n * s->img_x * s->img_y);
   6300 
   6301    if (req_comp && req_comp != s->img_n) {
   6302       out = stbi__convert_format(out, s->img_n, req_comp, s->img_x, s->img_y);
   6303       if (out == NULL) return out; // stbi__convert_format frees input on failure
   6304    }
   6305    return out;
   6306 }
   6307 
   6308 static int      stbi__pnm_isspace(char c)
   6309 {
   6310    return c == ' ' || c == '\t' || c == '\n' || c == '\v' || c == '\f' || c == '\r';
   6311 }
   6312 
   6313 static void     stbi__pnm_skip_whitespace(stbi__context *s, char *c)
   6314 {
   6315    for (;;) {
   6316       while (!stbi__at_eof(s) && stbi__pnm_isspace(*c))
   6317          *c = (char) stbi__get8(s);
   6318 
   6319       if (stbi__at_eof(s) || *c != '#')
   6320          break;
   6321 
   6322       while (!stbi__at_eof(s) && *c != '\n' && *c != '\r' )
   6323          *c = (char) stbi__get8(s);
   6324    }
   6325 }
   6326 
   6327 static int      stbi__pnm_isdigit(char c)
   6328 {
   6329    return c >= '0' && c <= '9';
   6330 }
   6331 
   6332 static int      stbi__pnm_getinteger(stbi__context *s, char *c)
   6333 {
   6334    int value = 0;
   6335 
   6336    while (!stbi__at_eof(s) && stbi__pnm_isdigit(*c)) {
   6337       value = value*10 + (*c - '0');
   6338       *c = (char) stbi__get8(s);
   6339    }
   6340 
   6341    return value;
   6342 }
   6343 
   6344 static int      stbi__pnm_info(stbi__context *s, int *x, int *y, int *comp)
   6345 {
   6346    int maxv;
   6347    char c, p, t;
   6348 
   6349    stbi__rewind( s );
   6350 
   6351    // Get identifier
   6352    p = (char) stbi__get8(s);
   6353    t = (char) stbi__get8(s);
   6354    if (p != 'P' || (t != '5' && t != '6')) {
   6355        stbi__rewind( s );
   6356        return 0;
   6357    }
   6358 
   6359    *comp = (t == '6') ? 3 : 1;  // '5' is 1-component .pgm; '6' is 3-component .ppm
   6360 
   6361    c = (char) stbi__get8(s);
   6362    stbi__pnm_skip_whitespace(s, &c);
   6363 
   6364    *x = stbi__pnm_getinteger(s, &c); // read width
   6365    stbi__pnm_skip_whitespace(s, &c);
   6366 
   6367    *y = stbi__pnm_getinteger(s, &c); // read height
   6368    stbi__pnm_skip_whitespace(s, &c);
   6369 
   6370    maxv = stbi__pnm_getinteger(s, &c);  // read max value
   6371 
   6372    if (maxv > 255)
   6373       return stbi__err("max value > 255", "PPM image not 8-bit");
   6374    else
   6375       return 1;
   6376 }
   6377 #endif
   6378 
   6379 static int stbi__info_main(stbi__context *s, int *x, int *y, int *comp)
   6380 {
   6381    #ifndef STBI_NO_JPEG
   6382    if (stbi__jpeg_info(s, x, y, comp)) return 1;
   6383    #endif
   6384 
   6385    #ifndef STBI_NO_PNG
   6386    if (stbi__png_info(s, x, y, comp))  return 1;
   6387    #endif
   6388 
   6389    #ifndef STBI_NO_GIF
   6390    if (stbi__gif_info(s, x, y, comp))  return 1;
   6391    #endif
   6392 
   6393    #ifndef STBI_NO_BMP
   6394    if (stbi__bmp_info(s, x, y, comp))  return 1;
   6395    #endif
   6396 
   6397    #ifndef STBI_NO_PSD
   6398    if (stbi__psd_info(s, x, y, comp))  return 1;
   6399    #endif
   6400 
   6401    #ifndef STBI_NO_PIC
   6402    if (stbi__pic_info(s, x, y, comp))  return 1;
   6403    #endif
   6404 
   6405    #ifndef STBI_NO_PNM
   6406    if (stbi__pnm_info(s, x, y, comp))  return 1;
   6407    #endif
   6408 
   6409    #ifndef STBI_NO_HDR
   6410    if (stbi__hdr_info(s, x, y, comp))  return 1;
   6411    #endif
   6412 
   6413    // test tga last because it's a crappy test!
   6414    #ifndef STBI_NO_TGA
   6415    if (stbi__tga_info(s, x, y, comp))
   6416        return 1;
   6417    #endif
   6418    return stbi__err("unknown image type", "Image not of any known type, or corrupt");
   6419 }
   6420 
   6421 #ifndef STBI_NO_STDIO
   6422 STBIDEF int stbi_info(char const *filename, int *x, int *y, int *comp)
   6423 {
   6424     FILE *f = stbi__fopen(filename, "rb");
   6425     int result;
   6426     if (!f) return stbi__err("can't fopen", "Unable to open file");
   6427     result = stbi_info_from_file(f, x, y, comp);
   6428     fclose(f);
   6429     return result;
   6430 }
   6431 
   6432 STBIDEF int stbi_info_from_file(FILE *f, int *x, int *y, int *comp)
   6433 {
   6434    int r;
   6435    stbi__context s;
   6436    long pos = ftell(f);
   6437    stbi__start_file(&s, f);
   6438    r = stbi__info_main(&s,x,y,comp);
   6439    fseek(f,pos,SEEK_SET);
   6440    return r;
   6441 }
   6442 #endif // !STBI_NO_STDIO
   6443 
   6444 STBIDEF int stbi_info_from_memory(stbi_uc const *buffer, int len, int *x, int *y, int *comp)
   6445 {
   6446    stbi__context s;
   6447    stbi__start_mem(&s,buffer,len);
   6448    return stbi__info_main(&s,x,y,comp);
   6449 }
   6450 
   6451 STBIDEF int stbi_info_from_callbacks(stbi_io_callbacks const *c, void *user, int *x, int *y, int *comp)
   6452 {
   6453    stbi__context s;
   6454    stbi__start_callbacks(&s, (stbi_io_callbacks *) c, user);
   6455    return stbi__info_main(&s,x,y,comp);
   6456 }
   6457 
   6458 #endif // STB_IMAGE_IMPLEMENTATION
   6459 
   6460 /*
   6461    revision history:
   6462       2.10  (2016-01-22) avoid warning introduced in 2.09 by STBI_REALLOC_SIZED
   6463       2.09  (2016-01-16) allow comments in PNM files
   6464                          16-bit-per-pixel TGA (not bit-per-component)
   6465                          info() for TGA could break due to .hdr handling
   6466                          info() for BMP to shares code instead of sloppy parse
   6467                          can use STBI_REALLOC_SIZED if allocator doesn't support realloc
   6468                          code cleanup
   6469       2.08  (2015-09-13) fix to 2.07 cleanup, reading RGB PSD as RGBA
   6470       2.07  (2015-09-13) fix compiler warnings
   6471                          partial animated GIF support
   6472                          limited 16-bpc PSD support
   6473                          #ifdef unused functions
   6474                          bug with < 92 byte PIC,PNM,HDR,TGA
   6475       2.06  (2015-04-19) fix bug where PSD returns wrong '*comp' value
   6476       2.05  (2015-04-19) fix bug in progressive JPEG handling, fix warning
   6477       2.04  (2015-04-15) try to re-enable SIMD on MinGW 64-bit
   6478       2.03  (2015-04-12) extra corruption checking (mmozeiko)
   6479                          stbi_set_flip_vertically_on_load (nguillemot)
   6480                          fix NEON support; fix mingw support
   6481       2.02  (2015-01-19) fix incorrect assert, fix warning
   6482       2.01  (2015-01-17) fix various warnings; suppress SIMD on gcc 32-bit without -msse2
   6483       2.00b (2014-12-25) fix STBI_MALLOC in progressive JPEG
   6484       2.00  (2014-12-25) optimize JPG, including x86 SSE2 & NEON SIMD (ryg)
   6485                          progressive JPEG (stb)
   6486                          PGM/PPM support (Ken Miller)
   6487                          STBI_MALLOC,STBI_REALLOC,STBI_FREE
   6488                          GIF bugfix -- seemingly never worked
   6489                          STBI_NO_*, STBI_ONLY_*
   6490       1.48  (2014-12-14) fix incorrectly-named assert()
   6491       1.47  (2014-12-14) 1/2/4-bit PNG support, both direct and paletted (Omar Cornut & stb)
   6492                          optimize PNG (ryg)
   6493                          fix bug in interlaced PNG with user-specified channel count (stb)
   6494       1.46  (2014-08-26)
   6495               fix broken tRNS chunk (colorkey-style transparency) in non-paletted PNG
   6496       1.45  (2014-08-16)
   6497               fix MSVC-ARM internal compiler error by wrapping malloc
   6498       1.44  (2014-08-07)
   6499               various warning fixes from Ronny Chevalier
   6500       1.43  (2014-07-15)
   6501               fix MSVC-only compiler problem in code changed in 1.42
   6502       1.42  (2014-07-09)
   6503               don't define _CRT_SECURE_NO_WARNINGS (affects user code)
   6504               fixes to stbi__cleanup_jpeg path
   6505               added STBI_ASSERT to avoid requiring assert.h
   6506       1.41  (2014-06-25)
   6507               fix search&replace from 1.36 that messed up comments/error messages
   6508       1.40  (2014-06-22)
   6509               fix gcc struct-initialization warning
   6510       1.39  (2014-06-15)
   6511               fix to TGA optimization when req_comp != number of components in TGA;
   6512               fix to GIF loading because BMP wasn't rewinding (whoops, no GIFs in my test suite)
   6513               add support for BMP version 5 (more ignored fields)
   6514       1.38  (2014-06-06)
   6515               suppress MSVC warnings on integer casts truncating values
   6516               fix accidental rename of 'skip' field of I/O
   6517       1.37  (2014-06-04)
   6518               remove duplicate typedef
   6519       1.36  (2014-06-03)
   6520               convert to header file single-file library
   6521               if de-iphone isn't set, load iphone images color-swapped instead of returning NULL
   6522       1.35  (2014-05-27)
   6523               various warnings
   6524               fix broken STBI_SIMD path
   6525               fix bug where stbi_load_from_file no longer left file pointer in correct place
   6526               fix broken non-easy path for 32-bit BMP (possibly never used)
   6527               TGA optimization by Arseny Kapoulkine
   6528       1.34  (unknown)
   6529               use STBI_NOTUSED in stbi__resample_row_generic(), fix one more leak in tga failure case
   6530       1.33  (2011-07-14)
   6531               make stbi_is_hdr work in STBI_NO_HDR (as specified), minor compiler-friendly improvements
   6532       1.32  (2011-07-13)
   6533               support for "info" function for all supported filetypes (SpartanJ)
   6534       1.31  (2011-06-20)
   6535               a few more leak fixes, bug in PNG handling (SpartanJ)
   6536       1.30  (2011-06-11)
   6537               added ability to load files via callbacks to accomidate custom input streams (Ben Wenger)
   6538               removed deprecated format-specific test/load functions
   6539               removed support for installable file formats (stbi_loader) -- would have been broken for IO callbacks anyway
   6540               error cases in bmp and tga give messages and don't leak (Raymond Barbiero, grisha)
   6541               fix inefficiency in decoding 32-bit BMP (David Woo)
   6542       1.29  (2010-08-16)
   6543               various warning fixes from Aurelien Pocheville
   6544       1.28  (2010-08-01)
   6545               fix bug in GIF palette transparency (SpartanJ)
   6546       1.27  (2010-08-01)
   6547               cast-to-stbi_uc to fix warnings
   6548       1.26  (2010-07-24)
   6549               fix bug in file buffering for PNG reported by SpartanJ
   6550       1.25  (2010-07-17)
   6551               refix trans_data warning (Won Chun)
   6552       1.24  (2010-07-12)
   6553               perf improvements reading from files on platforms with lock-heavy fgetc()
   6554               minor perf improvements for jpeg
   6555               deprecated type-specific functions so we'll get feedback if they're needed
   6556               attempt to fix trans_data warning (Won Chun)
   6557       1.23    fixed bug in iPhone support
   6558       1.22  (2010-07-10)
   6559               removed image *writing* support
   6560               stbi_info support from Jetro Lauha
   6561               GIF support from Jean-Marc Lienher
   6562               iPhone PNG-extensions from James Brown
   6563               warning-fixes from Nicolas Schulz and Janez Zemva (i.stbi__err. Janez (U+017D)emva)
   6564       1.21    fix use of 'stbi_uc' in header (reported by jon blow)
   6565       1.20    added support for Softimage PIC, by Tom Seddon
   6566       1.19    bug in interlaced PNG corruption check (found by ryg)
   6567       1.18  (2008-08-02)
   6568               fix a threading bug (local mutable static)
   6569       1.17    support interlaced PNG
   6570       1.16    major bugfix - stbi__convert_format converted one too many pixels
   6571       1.15    initialize some fields for thread safety
   6572       1.14    fix threadsafe conversion bug
   6573               header-file-only version (#define STBI_HEADER_FILE_ONLY before including)
   6574       1.13    threadsafe
   6575       1.12    const qualifiers in the API
   6576       1.11    Support installable IDCT, colorspace conversion routines
   6577       1.10    Fixes for 64-bit (don't use "unsigned long")
   6578               optimized upsampling by Fabian "ryg" Giesen
   6579       1.09    Fix format-conversion for PSD code (bad global variables!)
   6580       1.08    Thatcher Ulrich's PSD code integrated by Nicolas Schulz
   6581       1.07    attempt to fix C++ warning/errors again
   6582       1.06    attempt to fix C++ warning/errors again
   6583       1.05    fix TGA loading to return correct *comp and use good luminance calc
   6584       1.04    default float alpha is 1, not 255; use 'void *' for stbi_image_free
   6585       1.03    bugfixes to STBI_NO_STDIO, STBI_NO_HDR
   6586       1.02    support for (subset of) HDR files, float interface for preferred access to them
   6587       1.01    fix bug: possible bug in handling right-side up bmps... not sure
   6588               fix bug: the stbi__bmp_load() and stbi__tga_load() functions didn't work at all
   6589       1.00    interface to zlib that skips zlib header
   6590       0.99    correct handling of alpha in palette
   6591       0.98    TGA loader by lonesock; dynamically add loaders (untested)
   6592       0.97    jpeg errors on too large a file; also catch another malloc failure
   6593       0.96    fix detection of invalid v value - particleman@mollyrocket forum
   6594       0.95    during header scan, seek to markers in case of padding
   6595       0.94    STBI_NO_STDIO to disable stdio usage; rename all #defines the same
   6596       0.93    handle jpegtran output; verbose errors
   6597       0.92    read 4,8,16,24,32-bit BMP files of several formats
   6598       0.91    output 24-bit Windows 3.0 BMP files
   6599       0.90    fix a few more warnings; bump version number to approach 1.0
   6600       0.61    bugfixes due to Marc LeBlanc, Christopher Lloyd
   6601       0.60    fix compiling as c++
   6602       0.59    fix warnings: merge Dave Moore's -Wall fixes
   6603       0.58    fix bug: zlib uncompressed mode len/nlen was wrong endian
   6604       0.57    fix bug: jpg last huffman symbol before marker was >9 bits but less than 16 available
   6605       0.56    fix bug: zlib uncompressed mode len vs. nlen
   6606       0.55    fix bug: restart_interval not initialized to 0
   6607       0.54    allow NULL for 'int *comp'
   6608       0.53    fix bug in png 3->4; speedup png decoding
   6609       0.52    png handles req_comp=3,4 directly; minor cleanup; jpeg comments
   6610       0.51    obey req_comp requests, 1-component jpegs return as 1-component,
   6611               on 'test' only check type, not whether we support this variant
   6612       0.50  (2006-11-19)
   6613               first released version
   6614 */