Conversation
📝 Walkthrough

This pull request optimizes constant collection folding in compiler codegen, reorders closure cell variable emission to match CPython semantics, restructures context manager and method-call stack protocols, refines static attribute detection in classes, and updates corresponding stack effect metadata in the bytecode instruction set.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    actor Compiler
    participant AST as Collection Literal
    participant ConstFolder as Constant Folder
    participant CodeGen as Code Generator
    participant Bytecode as Bytecode Stream
    Compiler->>AST: Parse tuple/list/set literal
    Compiler->>ConstFolder: scan_for_constants()
    alt All elements are constant (no stars)
        ConstFolder->>ConstFolder: try_fold_constant_collection()
        ConstFolder-->>CodeGen: ConstantData::Tuple
        alt Tuple (stack empty, len ≥ 3)
            CodeGen->>Bytecode: emit LoadConst(folded_tuple)
        else List/Set
            CodeGen->>Bytecode: emit BuildList/BuildSet(0)
            CodeGen->>Bytecode: emit LoadConst(folded_tuple)
            CodeGen->>Bytecode: emit ListExtend/SetUpdate(1)
        end
    else Contains variables or stars
        ConstFolder-->>CodeGen: proceed to normal codegen
        CodeGen->>Bytecode: emit standard BUILD_*/unpacking ops
    end
```
```mermaid
sequenceDiagram
    actor VM as Virtual Machine
    participant Stack
    participant LoadSpecial as LoadSpecial (Old)
    participant LoadSpecial_New as LoadSpecial (New)
    participant CallProto as CALL Protocol
    rect rgba(200, 150, 150, 0.5)
        Note over VM,CallProto: OLD: Single bound method
        VM->>LoadSpecial: execute LoadSpecial
        LoadSpecial->>Stack: create PyBoundMethod
        LoadSpecial->>Stack: push bound
        Stack-->>CallProto: [bound]
        CallProto->>VM: invoke bound method
    end
    rect rgba(150, 200, 150, 0.5)
        Note over VM,CallProto: NEW: Callable + self_or_null pair
        VM->>LoadSpecial_New: execute LoadSpecial
        LoadSpecial_New->>Stack: extract callable (func or attr)
        LoadSpecial_New->>Stack: extract self_or_null (bound target or NULL)
        Stack-->>CallProto: [callable, self_or_null]
        CallProto->>VM: invoke with CALL convention
    end
```
```mermaid
sequenceDiagram
    actor Compiler
    participant ExitHandler as With Exit Handler
    participant Stack
    participant VM as Virtual Machine
    rect rgba(150, 150, 200, 0.5)
        Note over Compiler,VM: NEW: With/Async With Exit
        Compiler->>Stack: setup context (push TOS for exit)
        Compiler->>Stack: push exit_func
        Compiler->>Stack: push self_or_null (or NULL)
        Stack-->>ExitHandler: [TOS-5=exit_func, TOS-4=self, TOS-3..0=exc]
        ExitHandler->>VM: emit WithExceptStart (stack layout aware)
        VM->>VM: conditionally prepend self_exit
        VM->>VM: call exit_func(tp, val, tb) or exit(self, tp, val, tb)
        VM->>Stack: pop and restore layout
    end
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed (inconclusive)
Force-pushed from 3d4dbbf to 1cb8325 (compare)
LOAD_SPECIAL now pushes (callable, self_or_null) matching CPython's CALL convention, instead of a single bound method:
- Function descriptors: push (func, self)
- Plain attributes: push (bound, NULL)

Updated all with-statement paths:
- Entry: add SWAP 3 after SWAP 2, remove PUSH_NULL before CALL 0
- Normal exit: remove PUSH_NULL before CALL 3
- Exception handler (WITH_EXCEPT_START): read exit_func at TOS-4 and self_or_null at TOS-3
- Suppress block: 3 POP_TOPs after POP_EXCEPT (was 2)
- FBlock exit (preserve_tos): SWAP 3 + SWAP 2 rotation
- UnwindAction::With: remove PUSH_NULL

Stack effects updated: LoadSpecial (2,1), WithExceptStart (7,6)
Add LOAD_FAST_CHECK → LOAD_FAST and JUMP_BACKWARD_NO_INTERRUPT → JUMP_BACKWARD to opname normalization in dis_dump.py. These are optimization variants with identical semantics.
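A sketch of the normalization table this describes (the actual dis_dump.py implementation may differ; the names here are illustrative):

```python
# Optimization variants are mapped to their semantic base opcode before diffing,
# so that LOAD_FAST_CHECK vs LOAD_FAST differences don't count as mismatches.
OPNAME_ALIASES = {
    "LOAD_FAST_CHECK": "LOAD_FAST",
    "JUMP_BACKWARD_NO_INTERRUPT": "JUMP_BACKWARD",
}

def normalize_opname(opname: str) -> str:
    return OPNAME_ALIASES.get(opname, opname)

print(normalize_opname("LOAD_FAST_CHECK"))  # LOAD_FAST
print(normalize_opname("RETURN_VALUE"))     # unchanged
```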
Skip duplicate_end_returns for blocks that already end with LOAD_CONST + RETURN_VALUE. Run DCE + unreachable elimination after duplication to remove the now-unreachable original return block.
- Support tuple/list unpacking targets: (self.x, self.y) = val
- Skip @staticmethod and @classmethod decorated methods
- Use scan_target_for_attrs helper for recursive target scanning
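On CPython 3.13+ the intended behavior is observable via `__static_attributes__` (new in 3.13, hence the version gate); the unpacking-target case from the first bullet looks like this:

```python
import sys

class Point:
    def __init__(self, val):
        # Tuple-unpacking target: both attribute names should be recorded.
        (self.x, self.y) = val

if sys.version_info >= (3, 13):
    # CPython records the self.* stores inside the unpacking target too.
    print(Point.__static_attributes__)
```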
Function-local imports (scope is Local+IMPORTED) should use method mode LOAD_ATTR like regular names, not plain mode. Only module/class scope imports use plain LOAD_ATTR + PUSH_NULL.
Convert BUILD_LIST/SET 0 + LOAD_CONST + LIST_EXTEND/SET_UPDATE + GET_ITER to just LOAD_CONST (tuple) + GET_ITER, matching CPython's optimization for constant list/set literals in for-loop iterables. Also fix is_name_imported to use method mode for function-local imports, and improve __static_attributes__ accuracy (skip @classmethod/@staticmethod, handle tuple/list unpacking targets).
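This mirrors an optimization CPython has performed for a long time; on recent CPython (3.9+) the folded for-loop iterable is easy to confirm:

```python
import dis

code = compile("for x in [1, 2, 3]:\n    pass", "<demo>", "exec")
ops = [ins.opname for ins in dis.get_instructions(code)]

# The constant list iterable is folded to a tuple constant, so the loop
# setup is just LOAD_CONST (1, 2, 3) + GET_ITER...
print((1, 2, 3) in code.co_consts)
# ...and no BUILD_LIST / LIST_EXTEND is emitted at all.
print("BUILD_LIST" in ops, "GET_ITER" in ops)
```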
CPython orders cell variables with parameter cells first (in parameter definition order), then non-parameter cells sorted alphabetically. Previously all cells were sorted alphabetically. Also add for-loop iterable optimization: constant BUILD_LIST/SET before GET_ITER is folded to just LOAD_CONST tuple.
CPython emits COPY_FREE_VARS first, then MAKE_CELL instructions. Previously RustPython emitted them in reverse order.
CPython 3.14 uses RESUME arg=5 for after-yield, not 1. Also reorder COPY_FREE_VARS before MAKE_CELL and fix cell variable ordering (parameters first, then alphabetical).
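The cell-ordering rule (parameter cells in definition order, then non-parameter cells alphabetically) can be checked against CPython directly. The ordering assertion assumes CPython 3.11+, where the localsplus layout introduced this order; earlier versions sorted all cells alphabetically.

```python
import sys

def outer(z, a):
    m = 1
    b = 2
    def inner():
        return z + a + m + b
    return inner

cells = outer.__code__.co_cellvars
print(cells)
# On CPython 3.11+ this is ('z', 'a', 'b', 'm'): parameter cells z, a first in
# definition order, then non-parameter cells b, m alphabetically. A purely
# alphabetical ordering (the old RustPython behavior) would give
# ('a', 'b', 'm', 'z').
```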
- Set is_generator flag for generator expressions in scan_comprehension
- Fix posonlyargs priority in collect_static_attributes first param
- Add match statement support to scan_store_attrs
- Fix stale decorator stack comment
- Reorder NOP removal after fold_unary_negative for better collection folding
When all elements of a list/set/tuple literal are constants and there are 3+ elements, fold them into a single constant:
- list: BUILD_LIST 0 + LOAD_CONST (tuple) + LIST_EXTEND 1
- set: BUILD_SET 0 + LOAD_CONST (tuple) + SET_UPDATE 1
- tuple: LOAD_CONST (tuple)

This matches CPython's compiler optimization and fixes the most common bytecode difference (92/200 sampled files). Also add bytecode comparison scripts (dis_dump.py, compare_bytecode.py) for systematic parity tracking.
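The three folded shapes listed above are directly observable on recent CPython (3.9+), which is the parity target here:

```python
import dis

# List literal with 3+ constant elements: folded tuple + LIST_EXTEND.
lst = compile("x = [1, 2, 3]", "<demo>", "exec")
lst_ops = [i.opname for i in dis.get_instructions(lst)]
print((1, 2, 3) in lst.co_consts, "LIST_EXTEND" in lst_ops)

# Set literal: same shape with SET_UPDATE (the constant is a frozenset).
st = compile("x = {1, 2, 3}", "<demo>", "exec")
st_ops = [i.opname for i in dis.get_instructions(st)]
print("SET_UPDATE" in st_ops)

# Tuple literal: collapses to a single LOAD_CONST, no BUILD_TUPLE.
tup = compile("x = (1, 2, 3)", "<demo>", "exec")
tup_ops = [i.opname for i in dis.get_instructions(tup)]
print("BUILD_TUPLE" in tup_ops)
```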
Match CPython's compiler behavior: dicts with 16+ key-value pairs use BUILD_MAP 0 followed by MAP_ADD for each pair, instead of pushing all keys/values on the stack and calling BUILD_MAP N.
Force-pushed from 89d6818 to cab59e7 (compare)
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (3)
crates/codegen/src/compile.rs (1)
736-742: ⚠️ Potential issue | 🟡 Minor
Limit the enclosing import scan to module/class scopes.
Lines 736-742 currently treat any imported name in an outer function as if it were a module/class-scope import. For `def outer(): import m; def inner(): return m.f()`, `inner` will take the imported-name path again even though `m` is a closure-captured local, not a global/module binding.
Possible fix
```diff
-        self.symbol_table_stack.iter().rev().skip(1).any(|table| {
-            table
-                .symbols
-                .get(name)
-                .is_some_and(|sym| sym.flags.contains(SymbolFlags::IMPORTED))
-        })
+        self.symbol_table_stack.iter().rev().skip(1).any(|table| {
+            matches!(table.typ, CompilerScope::Module | CompilerScope::Class)
+                && table
+                    .symbols
+                    .get(name)
+                    .is_some_and(|sym| sym.flags.contains(SymbolFlags::IMPORTED))
+        })
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/codegen/src/compile.rs` around lines 736 - 742: the enclosing-scope scan currently treats any imported symbol found in an outer function as a module-level import; change the iteration over self.symbol_table_stack to only consider module- or class-level scopes (e.g., stop/skipping once you hit a non-module/class scope) instead of scanning all outer tables. Concretely, replace the current .iter().rev().skip(1).any(...) with a loop or iterator that filters tables by their scope kind (check the symbol table's scope/kind field or method on the symbol table) and only checks tables where the scope is Module or Class before testing symbols.get(name).is_some_and(|sym| sym.flags.contains(SymbolFlags::IMPORTED)). This ensures closure-captured imports in outer functions are not treated as module/class imports.

crates/codegen/src/ir.rs (1)
2036-2050: ⚠️ Potential issue | 🟡 Minor
Comment claims a check that isn't implemented.
The comment on line 2038 states "AND the return instructions have no explicit line number (lineno <= 0)", but the `is_return_block` check below (lines 2039-2047) only verifies the block length and instruction types; it doesn't check `lineno_override` or `location.line`.
If the lineno check is required for CPython parity, please add it:
Possible fix if lineno check is needed
```diff
         let is_return_block = last_insts.len() == 2
             && matches!(
                 last_insts[0].instr,
                 AnyInstruction::Real(Instruction::LoadConst { .. })
             )
             && matches!(
                 last_insts[1].instr,
                 AnyInstruction::Real(Instruction::ReturnValue)
-            );
+            )
+            && last_insts[0].lineno_override.unwrap_or(last_insts[0].location.line.get() as i32) <= 0;
```

Otherwise, please update the comment to accurately reflect what the code checks.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/codegen/src/ir.rs` around lines 2036 - 2050: the comment claims the code should only apply when the return instructions have no explicit line number (lineno <= 0) but the is_return_block predicate only checks instruction count and types; update the predicate in the block containing last_insts and is_return_block to also verify the first instruction's line info (e.g., check last_insts[0].lineno_override <= 0 or last_insts[0].location.line <= 0 as appropriate for this IR) so the condition matches the comment, or if that lineno check is not required for parity, change the comment to accurately describe that is_return_block only checks length and instruction kinds; reference last_insts, is_return_block, lineno_override and location.line to locate the exact variables to modify.

crates/compiler-core/src/bytecode/oparg.rs (1)
289-311: ⚠️ Potential issue | 🔴 Critical
The `ResumeType::AfterYield` encoding is incomplete for CPython 3.14 compatibility.
According to CPython 3.14's RESUME opcode specification, the oparg encoding uses the lowest 2 bits for the location (0–3) and bit 2 (value 4) for except-depth. This yields valid values 0–7:
- 0–3: base locations (AtFuncStart=0, AfterYield=1, AfterYieldFrom=2, AfterAwait=3) at except-depth 0
- 4–7: the same locations at except-depth 1

Your current mapping encodes `AfterYield` as `5` (which is 1 + 4, representing AfterYield at except-depth 1), but the code doesn't handle AfterYield at except-depth 0 (oparg=1). Additionally, it lacks mappings for `AtFuncStart` (4), `AfterYieldFrom` (6), and `AfterAwait` (7) at except-depth 1.
Either the encoding needs to distinguish except-depth explicitly in `ResumeType`, or the current hardcoded mappings should reflect that reality.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/compiler-core/src/bytecode/oparg.rs` around lines 289 - 311, The mapping for ResumeType must support CPython 3.14's 3-bit oparg (low 2 bits = location 0–3, bit 2 = except-depth), so update ResumeType and its conversions: change the enum (ResumeType) to carry an except_depth flag (e.g., a bool or u8) or add separate variants that include except-depth, then modify impl From<u32> for ResumeType to decode value & 0b11 into AtFuncStart/AfterYield/AfterYieldFrom/AfterAwait and (value & 0b100) >> 2 into except-depth; likewise update impl From<ResumeType> for u32 to emit (location as 0..3) | (except_depth << 2) and preserve Other(v) for unknown values. Ensure all places referencing ResumeType are adjusted to the new shape.
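A minimal sketch of the 3-bit layout the review describes. The location names are taken from the review; modeling the except-depth flag this way is an assumption about the eventual Rust shape, shown here in Python for brevity:

```python
# RESUME oparg (CPython 3.14): low two bits select the location,
# bit 2 is the except-depth flag.
LOCATIONS = ("AtFuncStart", "AfterYield", "AfterYieldFrom", "AfterAwait")

def decode_resume(oparg: int) -> tuple[str, int]:
    location = LOCATIONS[oparg & 0b11]
    except_depth = (oparg >> 2) & 0b1
    return location, except_depth

def encode_resume(location: str, except_depth: int) -> int:
    return LOCATIONS.index(location) | (except_depth << 2)

print(decode_resume(5))                 # the value currently hardcoded for AfterYield
print(encode_resume("AfterYield", 0))   # the except-depth-0 case the review flags as unhandled
```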
🧹 Nitpick comments (1)
crates/vm/src/frame.rs (1)
2957-2965: Collapse the duplicated `LoadSpecial` push path.
Both success arms only differ in the produced `(callable, self_or_null)` pair. Computing that pair first keeps the new CALL convention in one place and removes the duplicated push sequence.
♻️ Suggested cleanup
```diff
-                match vm.get_special_method(&obj, method_name)? {
-                    Some(PyMethod::Function { target, func }) => {
-                        self.push_value(func); // callable (deeper)
-                        self.push_value(target); // self (TOS)
-                    }
-                    Some(PyMethod::Attribute(bound)) => {
-                        self.push_value(bound); // callable (deeper)
-                        self.push_null(); // NULL (TOS)
-                    }
+                let (callable, self_or_null) = match vm.get_special_method(&obj, method_name)? {
+                    Some(PyMethod::Function { target, func }) => (func, Some(target)),
+                    Some(PyMethod::Attribute(bound)) => (bound, None),
                     None => {
                         return Err(vm.new_type_error(get_special_method_error_msg(
                             oparg,
                             &obj.class().name(),
                             special_method_can_suggest(&obj, oparg, vm)?,
                         )));
                     }
-                };
+                };
+                self.push_value(callable); // callable (deeper)
+                self.push_value_opt(self_or_null); // self or NULL (TOS)
```

As per coding guidelines, "When branches differ only in a value but share common logic, extract the differing value first, then call the common logic once to avoid duplicate code".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/vm/src/frame.rs` around lines 2957 - 2965, Replace the duplicated push sequence by first computing the (callable, self_or_null) pair from the result of vm.get_special_method(&obj, method_name) — for PyMethod::Function produce (func, target) and for PyMethod::Attribute produce (bound, Null) — and then perform the common push logic once (calling self.push_value for the callable and either self.push_value or self.push_null for the self_or_null depending on the pair). Update the match over vm.get_special_method to return this pair (or an Option of it) and then call the common push sequence using self.push_value/self.push_null so the push logic is not duplicated; keep references to PyMethod::Function, PyMethod::Attribute, self.push_value, self.push_null, vm.get_special_method, method_name, and obj to locate the code.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yml
Review profile: CHILL
Plan: Pro
Run ID: f61ef7aa-f8d3-41be-883f-a6cf46ac67fc
⛔ Files ignored due to path filters (4)
- `crates/codegen/src/snapshots/rustpython_codegen__compile__tests__if_ands.snap` is excluded by `!**/*.snap`
- `crates/codegen/src/snapshots/rustpython_codegen__compile__tests__if_mixed.snap` is excluded by `!**/*.snap`
- `crates/codegen/src/snapshots/rustpython_codegen__compile__tests__if_ors.snap` is excluded by `!**/*.snap`
- `crates/codegen/src/snapshots/rustpython_codegen__compile__tests__nested_double_async_with.snap` is excluded by `!**/*.snap`
📒 Files selected for processing (6)
- `crates/codegen/src/compile.rs`
- `crates/codegen/src/ir.rs`
- `crates/codegen/src/symboltable.rs`
- `crates/compiler-core/src/bytecode/instruction.rs`
- `crates/compiler-core/src/bytecode/oparg.rs`
- `crates/vm/src/frame.rs`
```diff
     let cell_symbols: Vec<_> = ste
         .symbols
         .iter()
         .filter(|(_, s)| {
             s.scope == SymbolScope::Cell || s.flags.contains(SymbolFlags::COMP_CELL)
         })
         .map(|(name, _)| name.clone())
         .collect();
-    cell_names.sort();
-    for name in cell_names {
+    let mut param_cells = Vec::new();
+    let mut nonparam_cells = Vec::new();
+    for name in cell_symbols {
+        if varname_cache.contains(&name) {
+            param_cells.push(name);
+        } else {
+            nonparam_cells.push(name);
+        }
+    }
+    // param_cells are already in parameter order (from varname_cache insertion order)
+    param_cells.sort_by_key(|n| varname_cache.get_index_of(n.as_str()).unwrap_or(usize::MAX));
+    nonparam_cells.sort();
+    for name in param_cells {
+        cellvar_cache.insert(name);
+    }
+    for name in nonparam_cells {
+        cellvar_cache.insert(name);
```
Detect parameter cells from PARAMETER, not from varname_cache.
Line 1082 is using `varname_cache.contains(&name)`, but `varname_cache` contains every local, not just arguments. That means ordinary captured locals also land in `param_cells`, so non-parameter cellvars keep local-definition order instead of the advertised alphabetical order. Because cellvars are indexed positionally later, this still shifts `co_cellvars`/localsplus metadata away from CPython.
Possible fix

```diff
-        let cell_symbols: Vec<_> = ste
+        let cell_symbols: Vec<_> = ste
             .symbols
             .iter()
             .filter(|(_, s)| {
                 s.scope == SymbolScope::Cell || s.flags.contains(SymbolFlags::COMP_CELL)
             })
-            .map(|(name, _)| name.clone())
+            .map(|(name, sym)| (name.clone(), sym.flags))
             .collect();
         let mut param_cells = Vec::new();
         let mut nonparam_cells = Vec::new();
-        for name in cell_symbols {
-            if varname_cache.contains(&name) {
+        for (name, flags) in cell_symbols {
+            if flags.contains(SymbolFlags::PARAMETER) {
                 param_cells.push(name);
             } else {
                 nonparam_cells.push(name);
             }
         }
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```rust
        let cell_symbols: Vec<_> = ste
            .symbols
            .iter()
            .filter(|(_, s)| {
                s.scope == SymbolScope::Cell || s.flags.contains(SymbolFlags::COMP_CELL)
            })
            .map(|(name, sym)| (name.clone(), sym.flags))
            .collect();
        let mut param_cells = Vec::new();
        let mut nonparam_cells = Vec::new();
        for (name, flags) in cell_symbols {
            if flags.contains(SymbolFlags::PARAMETER) {
                param_cells.push(name);
            } else {
                nonparam_cells.push(name);
            }
        }
        // param_cells are already in parameter order (from varname_cache insertion order)
        param_cells.sort_by_key(|n| varname_cache.get_index_of(n.as_str()).unwrap_or(usize::MAX));
        nonparam_cells.sort();
        for name in param_cells {
            cellvar_cache.insert(name);
        }
        for name in nonparam_cells {
            cellvar_cache.insert(name);
        }
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@crates/codegen/src/compile.rs` around lines 1071 - 1095, The code is
misclassifying parameter cellvars by checking varname_cache.contains(&name);
instead, classify a cell symbol as a parameter by inspecting its
SymbolFlags::PARAMETER on the original symbol entry (the map iterated as
ste.symbols) and push names into param_cells only when that flag is present,
otherwise push into nonparam_cells; keep the subsequent ordering logic (sort
param_cells by varname_cache.get_index_of(...) and nonparam_cells.sort()) and
then insert into cellvar_cache as before (referencing ste.symbols,
SymbolFlags::PARAMETER, varname_cache, param_cells, nonparam_cells, and
cellvar_cache).
```rust
    /// Convert a string literal AST node to Wtf8Buf, handling surrogates correctly.
    fn compile_string_value(&self, string: &ast::ExprStringLiteral) -> Wtf8Buf {
        let value = string.value.to_str();
        if value.contains(char::REPLACEMENT_CHARACTER) {
            // Might have a surrogate literal; reparse from source to preserve them
            string
                .value
                .iter()
                .map(|lit| {
                    let source = self.source_file.slice(lit.range);
                    crate::string_parser::parse_string_literal(source, lit.flags.into())
                })
                .collect()
        } else {
            value.into()
        }
    }
```
This still leaves surrogate-bearing docstrings on the lossy path.
`compile_string_value()` fixes regular string literals, but docstrings still go through `split_doc()` and `clean_doc(doc.to_str())`. So a module/class/function docstring containing an isolated surrogate will still be normalized to U+FFFD, which means literal parity is only partial after this change.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@crates/codegen/src/compile.rs` around lines 8557 - 8573, Docstrings are still
being converted lossily because split_doc() and clean_doc(doc.to_str()) operate
on the lossy UTF-8 string; fix by making docstring handling follow the same
non-lossy path as compile_string_value(): when splitting/cleaning docs, detect
replacement characters and, instead of using doc.to_str(), reparse the original
source range(s) with crate::string_parser::parse_string_literal (or call
compile_string_value on the underlying ast::ExprStringLiteral) to produce a
Wtf8Buf preserving surrogates, then pass that Wtf8Buf into split_doc/clean_doc
logic (or change clean_doc to accept Wtf8Buf) so module/class/function
docstrings retain literal parity with compile_string_value.
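For reference, CPython itself preserves escaped lone surrogates in docstring constants, which is the parity target this comment describes:

```python
# A module whose body is a docstring written with an escaped lone surrogate.
# On CPython the surrogate survives into co_consts; a lossy to-str path
# would turn it into U+FFFD instead.
code = compile('"""doc \\ud800 tail"""', "<demo>", "exec")
doc = next(c for c in code.co_consts if isinstance(c, str))
print("\ud800" in doc)   # the surrogate is preserved
print("\ufffd" in doc)   # no replacement character is introduced
```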
Summary by CodeRabbit
Performance Improvements
Bug Fixes
Code Quality