quickjs

Author	SHA1	Message	Date
Charlie Gordon	921c1eef50	Simpler utf8_decode (#414) - no longer pass the array length to `utf8_decode` - add `utf8_decode_len` for border cases - use switch based dispatch in `utf8_decode_len` to work around a gcc 12.2 optimizer bug	2024-05-27 08:15:52 +02:00
Charlie Gordon	1baa6763f8	Improve UTF-8 decoding and encoding functions (#410) Ensure proper UTF-8 encoding (1 to 4 bytes). Handle invalid encodings (return 0xFFFD and consume a single byte) Individually encoded surrogate code points are accepted. - add `utf8_scan()` to analyze a byte array for UTF-8 contents detects invalid encoding, computes number of codepoints and content kind: plain ASCII, 8-bit, 16-bit or larger codepoints. - add `utf8_encode_len(c)` to compute the number of bytes to encode `c` - rename `unicode_to_utf8` as `utf8_encode` - rename `unicode_from_utf8` as `utf8_decode` - add `utf8_decode_buf8(dest, size, src, len)` to decode a UTF-8 encoded byte array known to contain only ASCII and 8-bit codepoints. - add `utf8_decode_buf16(dest, size, src, len)` to decode a UTF-8 encoded byte array into an array of 16-bit codepoints using UTF-16 surrogate pairs for non-BMP1 codepoints. - add `utf8_encode_buf8(dest, size, src, len)` to encode an array of 8-bit codepoints as a UTF-8 encoded null terminated string - add `utf16_encode_buf8(dest, size, src, len)` to decode an array of 16-bit codepoints (including surrogate pairs) as a UTF-8 encoded null terminated string - detect invalid UTF-8 encoding in RegExp parser - simplify `JS_AtomGetStrRT`, `JS_NewStringLen` using the above functions - simplify UTF-8 decoding and error testing	2024-05-21 14:08:33 +02:00
Charlie Gordon	5abbeacc62	Fix bug in GET_PREV_CHAR macro (#278) * Fix bug in `GET_PREV_CHAR` macro - pass `cbuf_type` variable to `XXX_CHAR` macros in `lre_exec_backtrack()` - improve readability of these macros - fix `GET_PREV_CHAR` macro: `cptr` was decremented twice on invalid high surrogate. - minimize non functional changes	2024-03-03 17:12:52 +01:00
Charlie Gordon	708dbcbf5b	Fix big endian serialization (#269) * Fix big endian serialization Big endian serialization was broken because: - it partially relied on `WORDS_ENDIAN` (unconditionally undef'd in cutils.h) - endianness was not handled at all in the bc reader. - `bc_tag_str` was missing the `"RegExp"` string - `lre_byte_swap()` was broken for `REOP_range` and `REOP_range32` Modifications: - remove `WORDS_ENDIAN` - use `bc_put_u32()` / `bc_put_u64()` in `JS_WriteBigInt()` - use `bc_get_u32()` / `bc_get_u64()` in `JS_ReadBigInt()` - handle host endianness in `bc_get_u16()`, `bc_get_u32()`, `bc_get_u64()` and `JS_ReadFunctionBytecode()` - handle optional littleEndian argument as specified in `js_dataview_getValue()` and `js_dataview_setValue()` - fix `bc_tag_str` and `lre_byte_swap()`	2024-03-02 18:38:29 +01:00
Ben Noordhuis	f406d6f78c	Accept /[\-]/u as a valid regular expression (#288) The non-Unicode version of the pattern was already accepted. test262 tests it in an inverted sense in test/built-ins/RegExp/unicode_restricted_identity_escape.js but it appears to be per spec and both V8 and Spidermonkey accept it. Fixes: https://github.com/quickjs-ng/quickjs/issues/286	2024-03-02 13:29:15 +01:00
Ben Noordhuis	f0ef9e1593	Implement RegExp 'v' flag, part 1 (#229) This commit implements the flag itself and teaches the regex engine to reject previously accepted patterns when in unicodeSets mode. Refs: https://github.com/quickjs-ng/quickjs/issues/228	2023-12-21 19:37:31 +01:00
Ben Noordhuis	f6ed206bd5	Change regexp flags field from uint8 to uint16 (#185) I need the extra bits to store the 'v' flag as described in https://github.com/tc39/proposal-regexp-v-flag	2023-12-09 16:47:05 +01:00
Ben Noordhuis	f7d2169999	Rename LRE_FLAG_UTF16 to LRE_FLAG_UNICODE (#186) Prep work for https://github.com/tc39/proposal-regexp-v-flag a.k.a. UnicodeSets.	2023-12-08 10:58:00 +01:00
Ben Noordhuis	42b708622c	Use named constant for regexp bytecode size field (#183)	2023-12-07 23:00:32 +01:00
Linus Groh	3b034b84d9	Fix null pointer arithmetic UB in libregexp (#136) This is a patch I originally wrote for the Kiesel JS engine: https://codeberg.org/kiesel-js/kiesel/src/branch/main/patches/libregexp.patch	2023-11-29 14:43:02 +01:00
Ben Noordhuis	5c3077e091	Implement RegExp serialization (#153) JS_WriteObject() and JS_ReadObject() now support RegExp objects.	2023-11-29 08:50:53 +01:00
Saúl Ibarra Corretgé	a721bda7b5	Drop CONFIG_ALL_UNICODE and enable it by default	2023-11-20 10:52:04 +01:00
Ben Noordhuis	bef2a12566	DRY surrogate pair handling (#95)	2023-11-20 09:46:02 +01:00
Ben Noordhuis	d1960d1bfe	Implement RegExp 'd' flag (#86)	2023-11-20 09:45:44 +01:00
Ben Noordhuis	e2bc6441f8	Optimize RegExp ASCII literal matching (#94) Add REOP_char8 that matches single bytes. Compresses bytecode for the ASCII common case by 33% and reduces regexp_ascii benchmark running time by 4%. The regexp_utf16 benchmark is unaffected.	2023-11-19 17:26:45 +01:00
Ben Noordhuis	b56cbb143c	Implement extended named capture group identifiers (#90) Perfectly reasonable and not at all uncommon regular expressions like /(?<𝑓𝑜𝑥>fox).*(?<𝓓𝓸𝓰>dog)/ are now accepted.	2023-11-19 11:01:38 +01:00
Ben Noordhuis	162a8b7409	Remove trailing whitespace (#46) Not purely cosmetic because it breaks navigation with { and } in the One True Editor.	2023-11-12 10:01:40 +01:00
bellard	b1f67dfc1a	2020-11-08 release	2020-11-08 14:30:56 +01:00
bellard	7c312df422	2020-09-06 release	2020-09-06 19:10:15 +02:00
bellard	8900766099	2020-07-05 release	2020-09-06 19:07:30 +02:00
bellard	383e2b06c8	2020-03-16 release	2020-09-06 19:02:03 +02:00
bellard	91459fb672	2020-01-05 release	2020-09-06 18:53:08 +02:00

22 commits
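
The UTF-8 rules described in commits 1baa6763f8 and 921c1eef50 are: sequences of 1 to 4 bytes, 0xFFFD returned with a single byte consumed on invalid input, individually encoded surrogates accepted, and a length-aware decoder for buffer ends. They can be illustrated with the small C sketch below. The names `sketch_utf8_decode_len` and `sketch_utf8_encode_len` are hypothetical. This is not the QuickJS code and does not reproduce the real signatures of `utf8_decode`, `utf8_decode_len` or `utf8_encode_len`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: number of bytes needed to encode code point c
   (c must not exceed 0x10FFFF). */
static int sketch_utf8_encode_len(uint32_t c)
{
    if (c < 0x80)
        return 1;
    if (c < 0x800)
        return 2;
    if (c < 0x10000)
        return 3;
    return 4;
}

/* Hypothetical helper: decode one code point from p[0..len-1] and store
   the number of bytes consumed in *plen. Invalid, overlong or truncated
   sequences yield 0xFFFD and consume exactly one byte. Individually
   encoded surrogates (U+D800..U+DFFF as 3 bytes) are accepted. */
static uint32_t sketch_utf8_decode_len(const uint8_t *p, size_t len,
                                       size_t *plen)
{
    uint32_t c;
    size_t n, i;

    assert(len > 0);            /* caller must not pass an empty buffer */
    *plen = 1;
    c = p[0];
    if (c < 0x80)
        return c;               /* plain ASCII */
    if (c >= 0xC2 && c <= 0xDF) {
        n = 1;                  /* 2-byte sequence */
        c &= 0x1F;
    } else if (c >= 0xE0 && c <= 0xEF) {
        n = 2;                  /* 3-byte sequence */
        c &= 0x0F;
    } else if (c >= 0xF0 && c <= 0xF4) {
        n = 3;                  /* 4-byte sequence */
        c &= 0x07;
    } else {
        return 0xFFFD;          /* stray continuation byte, C0, C1, F5..FF */
    }
    if (n >= len)
        return 0xFFFD;          /* sequence runs past the end of the buffer */
    for (i = 1; i <= n; i++) {
        if ((p[i] & 0xC0) != 0x80)
            return 0xFFFD;      /* expected a continuation byte */
        c = (c << 6) | (p[i] & 0x3F);
    }
    /* reject overlong forms and values beyond U+10FFFF */
    if (c > 0x10FFFF || (size_t)sketch_utf8_encode_len(c) != n + 1)
        return 0xFFFD;
    *plen = n + 1;
    return c;
}
```

For example, decoding the bytes C3 A9 80 in a loop yields U+00E9 (2 bytes consumed) followed by U+FFFD for the stray continuation byte (1 byte consumed). Consuming only one byte on error lets the caller resynchronize on the next byte instead of skipping over valid input.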
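
The big endian fix (708dbcbf5b) depends on two things. The host CPU's byte order must never leak into a serialized format, and DataView's optional littleEndian argument must be honored (when it is omitted or false, DataView uses big-endian order). The C sketch below shows byte-order-explicit accessors. `sketch_get_u32`, `sketch_put_u32` and `sketch_get_u64` are hypothetical names, not the QuickJS `bc_get_u32()` family. The sketch assumes the caller always states the byte order explicitly.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: read 4 bytes as an unsigned 32-bit value in the
   byte order chosen by the caller. Building the value with shifts gives
   the same result on little- and big-endian hosts. */
static uint32_t sketch_get_u32(const uint8_t *p, bool little_endian)
{
    assert(p != NULL);
    if (little_endian)
        return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
               (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
    return (uint32_t)p[0] << 24 | (uint32_t)p[1] << 16 |
           (uint32_t)p[2] << 8 | (uint32_t)p[3];
}

/* Hypothetical helper: store v as 4 bytes in the requested byte order. */
static void sketch_put_u32(uint8_t *p, uint32_t v, bool little_endian)
{
    int i;

    assert(p != NULL);
    for (i = 0; i < 4; i++) {
        int shift = little_endian ? 8 * i : 8 * (3 - i);
        p[i] = (uint8_t)(v >> shift);
    }
}

/* Hypothetical helper: 64-bit read built from two 32-bit halves; which
   half comes first in memory depends on the requested byte order. */
static uint64_t sketch_get_u64(const uint8_t *p, bool little_endian)
{
    uint64_t lo = sketch_get_u32(p + (little_endian ? 0 : 4), little_endian);
    uint64_t hi = sketch_get_u32(p + (little_endian ? 4 : 0), little_endian);
    return hi << 32 | lo;
}
```

Because the value is assembled byte by byte with shifts rather than by copying memory into an integer, no compile-time endianness macro is needed. The same reader works unchanged on either kind of host.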
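
The `GET_PREV_CHAR` fix (5abbeacc62) and the surrogate pair cleanup (bef2a12566) both involve walking UTF-16 text. When reading backwards, the cursor should move back a second unit only if a genuine high surrogate precedes a low surrogate. Otherwise the lone unit is returned and the cursor moves back exactly once. The C sketch below shows that rule. All helper names are hypothetical, and this is not the libregexp macro itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

static int sketch_is_hi_surrogate(uint32_t c) { return c >= 0xD800 && c <= 0xDBFF; }
static int sketch_is_lo_surrogate(uint32_t c) { return c >= 0xDC00 && c <= 0xDFFF; }

/* Hypothetical helper: combine a high/low surrogate pair into a code point. */
static uint32_t sketch_from_surrogate(uint32_t hi, uint32_t lo)
{
    return 0x10000 + ((hi - 0xD800) << 10) + (lo - 0xDC00);
}

/* Hypothetical helper: split a non-BMP code point into a surrogate pair. */
static void sketch_to_surrogates(uint32_t c, uint16_t *hi, uint16_t *lo)
{
    assert(c >= 0x10000 && c <= 0x10FFFF);
    c -= 0x10000;
    *hi = (uint16_t)(0xD800 + (c >> 10));
    *lo = (uint16_t)(0xDC00 + (c & 0x3FF));
}

/* Hypothetical helper: return the code point that ends just before
   buf[*pos] and move *pos back by 1 or 2 units. The second step back
   happens only when a real high surrogate precedes a low surrogate;
   a lone surrogate is returned as-is and costs a single step. */
static uint32_t sketch_prev_char(const uint16_t *buf, size_t *pos)
{
    uint32_t c;

    assert(*pos > 0);
    *pos -= 1;
    c = buf[*pos];
    if (sketch_is_lo_surrogate(c) && *pos > 0 &&
        sketch_is_hi_surrogate(buf[*pos - 1])) {
        c = sketch_from_surrogate(buf[*pos - 1], c);
        *pos -= 1;
    }
    return c;
}
```

With the buffer 0x0041 0xD83D 0xDE00 and the cursor at the end, the first call returns U+1F600 and moves back two units, and the next call returns U+0041. Keeping the pair arithmetic in one pair of helpers, rather than repeating it at every call site, is one way to avoid the kind of double-decrement slip described in the `GET_PREV_CHAR` commit.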