-
-
Notifications
You must be signed in to change notification settings - Fork 34.3k
Description
Bug report
Bug description
I've been testing the free-threaded build for thread safety issues in unaudited modules (ref #116738 — _pickle.c is unchecked). Found a segfault when pickle.dumps() runs concurrently with dict mutations on the same dict.
The crash happens in batch_dict_exact at Modules/_pickle.c:3496 — it iterates dict items via _PyDict_Next which gives borrowed references, but there's no critical section on the dict. If another thread pops or replaces a key mid-iteration, the borrowed reference goes stale and Py_INCREF hits a freed/NULL pointer.
Sometimes it raises RuntimeError: dictionary changed size during iteration cleanly, but other times the reference gets invalidated before the size check and it just segfaults.
Reproducer
import pickle
import threading
shared = {str(i): list(range(50)) for i in range(100)}
for trial in range(2000):
barrier = threading.Barrier(3)
def dumper():
try:
barrier.wait()
pickle.dumps(shared)
except Exception:
pass
def mutator():
try:
barrier.wait()
for j in range(200):
key = str(j % 100)
shared[key] = list(range(j % 50))
if j % 10 == 0:
shared.pop(str(j % 100), None)
shared[str(j % 100)] = [j]
except Exception:
pass
threads = [
threading.Thread(target=dumper),
threading.Thread(target=dumper),
threading.Thread(target=mutator),
]
for t in threads: t.start()
for t in threads: t.join()Crashes within a few hundred iterations usually. I ran it 10 times, got 9 segfaults.
Tested on
3.14.0a4 free-threading (stock install, no sanitizers):
$ python3.14t reproducer.py
Segmentation fault (core dumped)
3.15.0a7+ free-threading (built with --with-address-sanitizer):
AddressSanitizer:DEADLYSIGNAL
==1334802==ERROR: AddressSanitizer: SEGV on unknown address 0x00000000000c
#0 _Py_atomic_load_uint32_relaxed Include/cpython/pyatomic_gcc.h:367
#1 Py_INCREF Include/refcount.h:267
#2 batch_dict_exact Modules/_pickle.c:3496
#3 save_dict Modules/_pickle.c:3563
#4 save Modules/_pickle.c:4568
#5 dump Modules/_pickle.c:4768
#6 _pickle_dumps_impl Modules/_pickle.c:7992
Root cause
batch_dict_exactcallsPyDict_Next()which returns borrowed references to key and value- Between
PyDict_Next()returning andPy_INCREF(key)/Py_INCREF(value), another thread can mutate the dict - The borrowed reference points to a freed object,
Py_INCREFdereferences it and segfaults - The size check at line 3512 (
PyDict_GET_SIZE(obj) != dict_size) only catches mutations that change dict size, and only after the batch — too late to prevent the stale reference - Same issue exists in both the single-item path (line 3474) and the batch loop (line 3494)
Possible fix
- Wrap
PyDict_Next()+Py_INCREF(key)+Py_INCREF(value)inPy_BEGIN_CRITICAL_SECTION(obj)/Py_END_CRITICAL_SECTION()to prevent dict mutation while converting borrowed refs to owned refs - Keep
save()calls outside the critical section since they can reenter Python - Same approach as the set iteration path at line 3644 which already uses
Py_BEGIN_CRITICAL_SECTION(obj)with_PySet_NextEntryRef - Both
PyDict_Nextcall sites inbatch_dict_exactneed this (line 3474 and line 3494)
CPython versions tested on
3.14, CPython main branch
Operating systems tested on
Linux
Linked PRs
Metadata
Metadata
Assignees
Labels
Projects
Status