- no longer pass the array length to `utf8_decode`
- add `utf8_decode_len` for border cases
- use switch based dispatch in `utf8_decode_len` to work around a gcc 12.2 optimizer bug
integer conversions:
- improve `u32toa_radix` and `u64toa_radix`, add `i32toa_radix`
- use `i32toa_radix` for small ints in `js_number_toString`
floating point conversions (`js_dtoa`):
- complete rewrite with fewer calls to `snprintf`
- remove `JS_DTOA_FORMAT`, define 4 possible modes for `js_dtoa`
- remove the radix argument in `js_dtoa`
- merge `js_dtoa1` into `js_dtoa`
- add `js_dtoa_infinite` for non finite values
- simplify sign handling
- handle locale specific decimal point transparently
helper function `js_fcvt`:
- simplify `js_fcvt`, remove `js_fcvt1`, reduce overhead
- round up manually instead of using `fesetround(FE_UPWARD)`.
helper function `js_ecvt`:
- document `js_ecvt` and `js_ecvt1` behavior
- avoid redundant `js_ecvt1` calls in `js_ecvt`
- fixed buffer contents, no buffer copies
- simplify decimal point handling
- round up manually instead of using `fesetround(FE_UPWARD)`.
miscellaneous:
- remove `CONFIG_PRINTF_RNDN`. This fixes some of the conversion errors
on Windows. Updated the tests accordingly
- this fixes a v8.sh bug on macOS: `0.5.toFixed(0)` used to produce `0` instead of `1`
- add regression tests, update test_conv unit tests
- add benchmarks for `toFixed`, `toPrecision` and `toExponential` number methods
- benchmarks show all conversions are now 40 to 45% faster (M2)
Ensure proper UTF-8 encoding (1 to 4 bytes).
Handle invalid encodings (return 0xFFFD and consume a single byte)
Individually encoded surrogate code points are accepted.
- add `utf8_scan()` to analyze a byte array for UTF-8 contents
detects invalid encoding, computes number of codepoints and content kind:
plain ASCII, 8-bit, 16-bit or larger codepoints.
- add `utf8_encode_len(c)` to compute the number of bytes to encode `c`
- rename `unicode_to_utf8` as `utf8_encode`
- rename `unicode_from_utf8` as `utf8_decode`
- add `utf8_decode_buf8(dest, size, src, len)` to decode a UTF-8 encoded
byte array known to contain only ASCII and 8-bit codepoints.
- add `utf8_decode_buf16(dest, size, src, len)` to decode a UTF-8 encoded
byte array into an array of 16-bit codepoints using UTF-16 surrogate pairs
for non-BMP1 codepoints.
- add `utf8_encode_buf8(dest, size, src, len)` to encode an array of 8-bit
codepoints as a UTF-8 encoded null terminated string
- add `utf16_encode_buf8(dest, size, src, len)` to decode an array of 16-bit
codepoints (including surrogate pairs) as a UTF-8 encoded null terminated string
- detect invalid UTF-8 encoding in RegExp parser
- simplify `JS_AtomGetStrRT`, `JS_NewStringLen` using the above functions
- simplify UTF-8 decoding and error testing
* Add utility functions, improve integer conversion functions
- move `is_be()` to cutils.h
- add `is_upper_ascii()` and `to_upper_ascii()`
- add extensive benchmark for integer conversion variants in **tests/test_conv.c**
- add `u32toa()`, `i32toa()`, `u64toa()`, `i64toa()` based on register shift variant
- add `u32toa_radix()`, `u64toa_radix()`, `i64toa_radix()` based on length_loop variant
- use direct converters instead of `snprintf()`
- copy NaN and Infinity directly in `js_dtoa1()`
- optimize `js_number_toString()` for small integers
- use `JS_NewStringLen()` instead of `JS_NewString()` when possible
- add more precise conversion tests in microbench.js
- disable some benchmark tests for gcc (they cause ASAN failures)