Message393715
Sorry, I have a better solution for the (init_size > UINT32_MAX) problem.
Please imagine this scenario:
- before the patch
- in 64-bit build
- use zlib.decompress() function
- the exact decompressed size is known and > UINT32_MAX (e.g. 10 GiB)
If the `bufsize` argument is set to the exact decompressed size, there used to be a fast path:
zlib.decompress(data, bufsize=10*1024*1024*1024)
Fast path when (the initial size == the actual size):
https://github.com/python/cpython/blob/v3.9.5/Modules/zlibmodule.c#L424-L426
https://github.com/python/cpython/blob/v3.9.5/Objects/bytesobject.c#L3008-L3011
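From Python code, the fast path is simply a matter of passing the exact output size as `bufsize`, so the initial buffer already holds the whole result and never needs resizing. A minimal sketch (scaled down to 10 MiB for illustration; the scenario in this message is about data larger than UINT32_MAX):

```python
import zlib

# A payload whose exact decompressed size is known in advance.
original = b"a" * (10 * 1024 * 1024)  # 10 MiB stand-in for the 10 GiB case
compressed = zlib.compress(original)

# Passing the exact size as bufsize means the initial buffer can hold the
# entire output, so decompression finishes without growing the buffer.
result = zlib.decompress(compressed, bufsize=len(original))
assert result == original
```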
But in the current code, the initial size is clamped to UINT32_MAX, so there are two regressions:
1. Double the RAM is allocated (~20 GiB: the blocks plus the final bytes object).
2. A memcpy from the blocks to the final bytes object is needed.
PR 26143 uses a UINT32_MAX sliding window for the first block, so the initial buffer size can now be greater than UINT32_MAX.
_BlocksOutputBuffer_Finish() already has a fast path for a single block. Benchmark of this code:
zlib.decompress(data, bufsize=10*1024*1024*1024)
          time        RAM
before:   7.92 sec    ~20 GiB
after:    6.61 sec     10 GiB
(AMD 3600X, DDR4-3200, decompressed data is 10_GiB * b'a')
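The timing methodology can be sketched as follows. This is scaled down to 100 MiB so it runs quickly; the numbers in the table above came from 10 GiB of b'a' on the hardware noted:

```python
import time
import zlib

SIZE = 100 * 1024 * 1024  # scaled down from the 10 GiB test in the message
compressed = zlib.compress(b"a" * SIZE)

start = time.perf_counter()
# Exact-size bufsize hint: with the PR, the whole output fits in the
# initial buffer even when SIZE exceeds UINT32_MAX.
out = zlib.decompress(compressed, bufsize=SIZE)
elapsed = time.perf_counter() - start

assert len(out) == SIZE
print(f"decompress took {elapsed:.3f} sec")
```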
Maybe some user code relies on this corner case.
This should be the last revision; then there is no regression in any case.
| Date | User | Action | Args |
| 2021-05-15 14:33:59 | malin | set | recipients: + malin, gregory.p.smith, methane |
| 2021-05-15 14:33:59 | malin | set | messageid: <1621089239.57.0.967535507634.issue41486@roundup.psfhosted.org> |
| 2021-05-15 14:33:59 | malin | link | issue41486 messages |
| 2021-05-15 14:33:58 | malin | create | |