Handle BIGINT_ERROR failure on DOM Parse #2642
harshanagd started this conversation in Ideas
Hi @lemire @jkeiser — I'd appreciate your thoughts on this approach.
Context: I'm building go-simdjson, a Go wrapper around simdjson's DOM API via CGo.
`BIGINT_ERROR` is the only case where valid JSON (per RFC 8259, which places no limit on number magnitude) causes a DOM parse failure. Every other error code represents genuinely malformed input or a resource limit. On Demand users can recover via `raw_json_token()`, but DOM users have no recovery path: the parse fails entirely on documents containing big integers.

What #2640 does: When `parser.number_as_string(true)` is set, `visit_number` catches `BIGINT_ERROR` and writes the raw digits to the string buffer with a `BIGINT = 'Z'` tape tag. Users read the value back via `element.get_bigint()`, which returns a `string_view`. Default behavior is completely unchanged, with zero cost when the option is disabled.

Design decisions I'd like your input on:

1. Runtime flag vs compile-time `#ifdef`: I used a runtime `parser.number_as_string(bool)` for ergonomics, but I know the project prefers compile-time options in hot paths. The check sits only on the error path (it runs after `parse_number` has already returned `BIGINT_ERROR`), so it should add nothing to the hot path. Would you prefer `#ifdef SIMDJSON_NUMBER_AS_STRING` instead?
2. Re-scanning digits: After `parse_number` returns `BIGINT_ERROR`, I re-scan the digits with `is_digit()` to determine the length. `parse_number` already computed this but doesn't expose it. I considered passing the digit count back, but that would change `parse_number`'s return type and affect On Demand. Is there a better way, or is the re-scan acceptable given that big integers are rare and the data is L1-hot?
3. API naming: I used `get_bigint()` returning `string_view`, and `element_type::BIGINT`. I'm open to alternatives such as `get_number_string()` or `get_raw_number()`. What feels right for the project?
4. Scope: Should this also handle numbers-as-strings more broadly (e.g., very large floats), or is big integer support sufficient for now?
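To make the re-scan question concrete, here is a minimal standalone sketch of the idea, not the code from #2640. The names `rescan_integer` and `is_ascii_digit` are hypothetical, and the sketch takes a bounds-checked `std::string_view` plus a start offset rather than the raw padded byte pointer that the parser works with internally. It returns the raw text of the integer (an optional leading `-` followed by digits), which is what would be copied into the string buffer.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical stand-in for the parser's internal is_digit() helper.
inline bool is_ascii_digit(unsigned char c) {
  return c >= '0' && c <= '9';
}

// Hypothetical helper: given the document text and the offset where a number
// starts, return the raw integer token (optional '-' followed by digits).
// This is the "re-scan" step: it only measures the token, it does not
// validate or convert it, because parse_number has already done the
// validation before reporting BIGINT_ERROR.
inline std::string_view rescan_integer(std::string_view json, std::size_t start) {
  if (start >= json.size()) {
    return std::string_view{};  // nothing to scan
  }
  std::size_t pos = start;
  if (json[pos] == '-') {
    ++pos;  // keep the sign as part of the raw token
  }
  while (pos < json.size() &&
         is_ascii_digit(static_cast<unsigned char>(json[pos]))) {
    ++pos;  // one comparison per digit; the bytes were just read by the parser
  }
  return json.substr(start, pos - start);
}
```

The loop touches each digit once more, so the extra cost is linear in the length of the big integer and is paid only for values that already failed the fast path. Exposing the digit count from `parse_number` would remove this pass, at the price of the signature change discussed above.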