Conversation
📝 WalkthroughWalkthroughThis PR modifies bytecode compilation and runtime VM logic in RustPython. Changes include: refining class static attribute scanning with decorator awareness, relocating async-generator yield wrapping from runtime to compile-time, adjusting call argument construction and comprehension call protocols, filtering parameter cells from variable resolution, and adding safety checks for missing comparison slots. Changes
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 inconclusive)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
5e65683 to
57f44c6
Compare
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
crates/codegen/src/compile.rs (2)
4596-4611:⚠️ Potential issue | 🟡 MinorMove the
__new__special-case above the first-parameter guard.Line 4602 still exits before the
__new__branch runs, sodef __new__(*args, **kwargs): self.x = 1will never contribute to__static_attributes__. If__new__is meant to key off the literalself, that branch needs to run before looking for a positional parameter.Suggested fix
- let first_param = f - .parameters - .posonlyargs - .first() - .or(f.parameters.args.first()) - .map(|p| &p.parameter.name); - let Some(self_name) = first_param else { - continue; - }; // For __new__, first param is cls → scan for "self" instead // (CPython scans for the literal name "self" in __new__ bodies) if fname == "__new__" { Self::scan_store_attrs(&f.body, "self", attrs); - } else { - Self::scan_store_attrs(&f.body, self_name.as_str(), attrs); + continue; } + let first_param = f + .parameters + .posonlyargs + .first() + .or(f.parameters.args.first()) + .map(|p| &p.parameter.name); + let Some(self_name) = first_param else { + continue; + }; + Self::scan_store_attrs(&f.body, self_name.as_str(), attrs);🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/codegen/src/compile.rs` around lines 4596 - 4611, The guard that extracts first_param (from f.parameters.posonlyargs or args) runs before handling the __new__ special-case, so functions like def __new__(*args, **kwargs): self.x = 1 are skipped; move the __new__ branch above the first_param extraction/guard so you always call Self::scan_store_attrs(&f.body, "self", attrs) when fname == "__new__". Specifically, check fname == "__new__" and call Self::scan_store_attrs with the literal "self" before using f.parameters.posonlyargs/.args and the first_param/self_name logic.
5474-5485:⚠️ Potential issue | 🔴 CriticalThis async-for loop transformation does not match CPython behavior and introduces a divergence.
CPython 3.13/3.14 does not rewrite list literals to tuples before
GET_AITERin async-for loops. The list remains a list through the entire process, and whenGET_AITERfails (e.g., forasync for x in [1]: ...), the error message reports the actual type as'list'.RustPython's current implementation converts the list to a tuple, which causes:
- Bytecode to differ from CPython's
- Error messages to report
'tuple'instead of'list'when async iteration failsThis rewrite should either be removed to match CPython's behavior, or if intentional, the rationale should be documented. If the goal is CPython parity for bytecode generation, this change needs to be reverted.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/codegen/src/compile.rs` around lines 5474 - 5485, The code rewrites a list literal into a tuple before async-for iteration (the ast::Expr::List match that emits Instruction::BuildTuple), causing bytecode and error-message divergence from CPython (GET_AITER should see a list); revert this transformation so list literals are compiled as lists in async-for contexts by removing the special-case that turns ast::Expr::List into a tuple (leave compilation to self.compile_expression for the list expression and do not emit Instruction::BuildTuple here), or add a clear comment if you intentionally keep the behavior.
🧹 Nitpick comments (2)
crates/vm/src/builtins/code.rs (1)
1293-1302: Consider avoidingStringallocations by using reference-based comparison.Building a
HashSet<String>withto_string()on every call creates heap allocations. SincePyStrInternedreferences are interned and can be compared directly, you can avoid these allocations:♻️ Suggested optimization using pointer comparison
- let varnames_set: std::collections::HashSet<String> = - self.code.varnames.iter().map(|s| s.to_string()).collect(); - let nonparam_cellvars: Vec<_> = self - .code - .cellvars - .iter() - .filter(|s| !varnames_set.contains(&s.to_string())) - .collect(); + let nonparam_cellvars: Vec<_> = self + .code + .cellvars + .iter() + .filter(|cv| !self.code.varnames.iter().any(|v| *v == **cv)) + .collect();Alternatively, if the cellvars/varnames lists are large enough that O(n²) becomes a concern, consider caching the
nonparam_cellvarscomputation in thePyCodestruct.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/vm/src/builtins/code.rs` around lines 1293 - 1302, The current code builds a HashSet<String> by calling to_string() on each element of self.code.varnames which causes unnecessary heap allocations; change the comparison to use references/borrowed interned strings instead (e.g., compare PyStrInterned or &str values directly) so you avoid allocating in varnames_set and when filtering self.code.cellvars for nonparam_cellvars, or alternatively compute and cache nonparam_cellvars inside PyCode to avoid repeated O(n²) work; update usages of varnames_set, nonparam_cellvars, self.code.varnames, and self.code.cellvars accordingly to use reference-based comparisons rather than String allocations.crates/codegen/src/ir.rs (1)
2029-2038: Fix doc-comment association after inserting the new helper.
duplicate_end_returnsdocs are now attached toadd_generator_stop_handler, which makes rustdoc misleading. Move the return-duplication docs back abovefn duplicate_end_returns(...)and keep only StopIteration-handler docs above this helper.Suggested doc-comment move
-/// Duplicate `LOAD_CONST None + RETURN_VALUE` for blocks that fall through -/// to the final return block. Matches CPython's behavior of ensuring every -/// code path that reaches the end of a function/module has its own explicit -/// return instruction. /// Add a StopIteration → RuntimeError handler for generators/coroutines. /// CPython: flowgraph.c add_stopiteration_handler() /// Appends a handler block that wraps the entire generator body. fn add_generator_stop_handler(_flags: CodeFlags, _blocks: &mut Vec<Block>) { @@ } +/// Duplicate `LOAD_CONST None + RETURN_VALUE` for blocks that fall through +/// to the final return block. Matches CPython's behavior of ensuring every +/// code path that reaches the end of a function/module has its own explicit +/// return instruction. fn duplicate_end_returns(blocks: &mut [Block]) {As per coding guidelines, "Do not delete or rewrite existing comments unless they are factually wrong or directly contradict the new code."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/codegen/src/ir.rs` around lines 2029 - 2038, The doc comment block describing return-duplication was accidentally left above the new helper; move the duplicate_end_returns doc comment back directly above the fn duplicate_end_returns(...) declaration and leave only the StopIteration/handler documentation immediately above fn add_generator_stop_handler(...), ensuring each function has its correct doc-comment so rustdoc associates them with the intended symbols.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Outside diff comments:
In `@crates/codegen/src/compile.rs`:
- Around line 4596-4611: The guard that extracts first_param (from
f.parameters.posonlyargs or args) runs before handling the __new__ special-case,
so functions like def __new__(*args, **kwargs): self.x = 1 are skipped; move the
__new__ branch above the first_param extraction/guard so you always call
Self::scan_store_attrs(&f.body, "self", attrs) when fname == "__new__".
Specifically, check fname == "__new__" and call Self::scan_store_attrs with the
literal "self" before using f.parameters.posonlyargs/.args and the
first_param/self_name logic.
- Around line 5474-5485: The code rewrites a list literal into a tuple before
async-for iteration (the ast::Expr::List match that emits
Instruction::BuildTuple), causing bytecode and error-message divergence from
CPython (GET_AITER should see a list); revert this transformation so list
literals are compiled as lists in async-for contexts by removing the
special-case that turns ast::Expr::List into a tuple (leave compilation to
self.compile_expression for the list expression and do not emit
Instruction::BuildTuple here), or add a clear comment if you intentionally keep
the behavior.
---
Nitpick comments:
In `@crates/codegen/src/ir.rs`:
- Around line 2029-2038: The doc comment block describing return-duplication was
accidentally left above the new helper; move the duplicate_end_returns doc
comment back directly above the fn duplicate_end_returns(...) declaration and
leave only the StopIteration/handler documentation immediately above fn
add_generator_stop_handler(...), ensuring each function has its correct
doc-comment so rustdoc associates them with the intended symbols.
In `@crates/vm/src/builtins/code.rs`:
- Around line 1293-1302: The current code builds a HashSet<String> by calling
to_string() on each element of self.code.varnames which causes unnecessary heap
allocations; change the comparison to use references/borrowed interned strings
instead (e.g., compare PyStrInterned or &str values directly) so you avoid
allocating in varnames_set and when filtering self.code.cellvars for
nonparam_cellvars, or alternatively compute and cache nonparam_cellvars inside
PyCode to avoid repeated O(n²) work; update usages of varnames_set,
nonparam_cellvars, self.code.varnames, and self.code.cellvars accordingly to use
reference-based comparisons rather than String allocations.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yml
Review profile: CHILL
Plan: Pro
Run ID: 4562e2f0-aaec-4585-aaed-fcbb04e40640
⛔ Files ignored due to path filters (1)
crates/codegen/src/snapshots/rustpython_codegen__compile__tests__nested_double_async_with.snapis excluded by!**/*.snap
📒 Files selected for processing (5)
crates/codegen/src/compile.rscrates/codegen/src/ir.rscrates/vm/src/builtins/code.rscrates/vm/src/frame.rscrates/vm/src/protocol/object.rs
57f44c6 to
15b4c0a
Compare
15b4c0a to
40cab9c
Compare
There was a problem hiding this comment.
🧹 Nitpick comments (1)
crates/codegen/src/compile.rs (1)
7919-7934: Please lock the new call lowering with a disassembly regression.Both the explicit tuple materialization for
CallFunctionExand the comprehensionCALL { argc: 0 }path are stack-shape-sensitive. A focused snapshot for a starred/**kwargscall and a comprehension invocation would make parity regressions much easier to catch.As per coding guidelines, "When modifying bytecode instructions, perform a clean build by running:
rm -r target/debug/build/rustpython-* && find . | grep -E "\.pyc$" | xargs rm -r".Also applies to: 8292-8293
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/codegen/src/compile.rs` around lines 7919 - 7934, Add a disassembly regression test that locks the new call lowering behavior for both the starred/**kwargs path (CallFunctionEx lowering) and the comprehension invocation path that emits CALL { argc: 0 }; specifically create focused snapshot tests that exercise a starred/`**kwargs` call and a comprehension call so any stack-shape-sensitive changes are caught. Locate the lowering logic around compile_expression, emit_load_const, set_source_range and the Instruction::BuildTuple / CallFunctionEx emission in compile.rs (also check the similar spots near the referenced 8292-8293 change) and add tests that assert the produced bytecode/disassembly matches the expected snapshot for both cases; run the clean build steps described in the guidelines before committing the snapshots.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@crates/codegen/src/compile.rs`:
- Around line 7919-7934: Add a disassembly regression test that locks the new
call lowering behavior for both the starred/**kwargs path (CallFunctionEx
lowering) and the comprehension invocation path that emits CALL { argc: 0 };
specifically create focused snapshot tests that exercise a starred/`**kwargs`
call and a comprehension call so any stack-shape-sensitive changes are caught.
Locate the lowering logic around compile_expression, emit_load_const,
set_source_range and the Instruction::BuildTuple / CallFunctionEx emission in
compile.rs (also check the similar spots near the referenced 8292-8293 change)
and add tests that assert the produced bytecode/disassembly matches the expected
snapshot for both cases; run the clean build steps described in the guidelines
before committing the snapshots.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yml
Review profile: CHILL
Plan: Pro
Run ID: 2b118189-25bd-4908-9a6a-f0b0c12eab42
⛔ Files ignored due to path filters (1)
crates/codegen/src/snapshots/rustpython_codegen__compile__tests__nested_double_async_with.snapis excluded by!**/*.snap
📒 Files selected for processing (4)
crates/codegen/src/compile.rscrates/vm/src/builtins/code.rscrates/vm/src/frame.rscrates/vm/src/protocol/object.rs
🚧 Files skipped from review as they are similar to previous changes (1)
- crates/vm/src/builtins/code.rs
Summary by CodeRabbit
Bug Fixes
Performance