Message393715
Sorry, I have a better solution for the (init_size > UINT32_MAX) problem.
Please imagine this scenario:
- before the patch
- in 64-bit build
- use zlib.decompress() function
- the exact decompressed size is known and > UINT32_MAX (e.g. 10 GiB)
If the `bufsize` argument is set to the exact decompressed size, there used to be a fast path:
zlib.decompress(data, bufsize=10*1024*1024*1024)
Fast path when (the initial size == the actual size):
https://github.com/python/cpython/blob/v3.9.5/Modules/zlibmodule.c#L424-L426
https://github.com/python/cpython/blob/v3.9.5/Objects/bytesobject.c#L3008-L3011
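From Python code, the fast path is simply a matter of passing the exact output size as `bufsize`, so the initial buffer already holds the whole result and never needs resizing. A minimal sketch (scaled down to 10 MiB for illustration; the scenario in this message is about data larger than UINT32_MAX):

```python
import zlib

# A payload whose exact decompressed size is known in advance.
original = b"a" * (10 * 1024 * 1024)  # 10 MiB stand-in for the 10 GiB case
compressed = zlib.compress(original)

# Passing the exact size as bufsize means the initial buffer can hold the
# entire output, so decompression finishes without growing the buffer.
result = zlib.decompress(compressed, bufsize=len(original))
assert result == original
```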
But in the current code, the initial size is clamped to UINT32_MAX, so there are two regressions:
1. Double the RAM is allocated (~20 GiB: the blocks plus the final bytes object).
2. A memcpy from the blocks to the final bytes object is needed.
PR 26143 uses a UINT32_MAX sliding window for the first block, so the initial buffer size can now be greater than UINT32_MAX.
_BlocksOutputBuffer_Finish() already has a fast path for a single block. Benchmark of this code:
zlib.decompress(data, bufsize=10*1024*1024*1024)
          time        RAM
before:   7.92 sec    ~20 GiB
after:    6.61 sec     10 GiB
(AMD 3600X, DDR4-3200, decompressed data is 10_GiB * b'a')
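The timing methodology can be sketched as follows. This is scaled down to 100 MiB so it runs quickly; the numbers in the table above came from 10 GiB of b'a' on the hardware noted:

```python
import time
import zlib

SIZE = 100 * 1024 * 1024  # scaled down from the 10 GiB test in the message
compressed = zlib.compress(b"a" * SIZE)

start = time.perf_counter()
# Exact-size bufsize hint: with the PR, the whole output fits in the
# initial buffer even when SIZE exceeds UINT32_MAX.
out = zlib.decompress(compressed, bufsize=SIZE)
elapsed = time.perf_counter() - start

assert len(out) == SIZE
print(f"decompress took {elapsed:.3f} sec")
```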
Maybe some user code relies on this corner case.
This should be the last revision; then there is no regression in any case.
| Date | User | Action | Args |
| 2021-05-15 14:33:59 | malin | set | recipients: + malin, gregory.p.smith, methane |
| 2021-05-15 14:33:59 | malin | set | messageid: <1621089239.57.0.967535507634.issue41486@roundup.psfhosted.org> |
| 2021-05-15 14:33:59 | malin | link | issue41486 messages |
| 2021-05-15 14:33:58 | malin | create | |