bpo-41486: zlib uses an UINT32_MAX sliding window for the output buffer · Pull Request #26143 · python/cpython

ghost · 2021-05-15T14:32:48Z

These funtions have an initial output buffer size parameter:

zlib.decompress(data, /, wbits=MAX_WBITS, bufsize=DEF_BUF_SIZE)
zlib.Decompress.flush([length])

If the initial size > UINT32_MAX, use an UINT32_MAX sliding window, instead of clamping to UINT32_MAX.
Speed up when (the initial size == the actual size).

https://bugs.python.org/issue41486

These funtions have an initial output buffer size parameter: - zlib.decompress(data, /, wbits=MAX_WBITS, bufsize=DEF_BUF_SIZE) - zlib.Decompress.flush([length]) If the initial size > UINT32_MAX, use an UINT32_MAX sliding window, instead of clamping to UINT32_MAX. Speed up when (the initial size == the actual size).

ghost · 2021-05-26T08:05:15Z

@gpshead
Sorry to bother you again.
Could you review this PR? This should be the last revision of blocks output buffer.

See this comment for details.
https://bugs.python.org/issue41486#msg393715

I had used half a month to polish this patch, mainly trying various code styles.

The code has been tested with this script:

import zlib, time, random

_1G = 1024*1024*1024
UINT32_MAX = 0xFFFF_FFFF

def test(DATA_SIZE, INIT_BUFF_SIZES):
    b = random.choice((b'a', b'b', b'c', b'd'))
    raw_dat = b * DATA_SIZE

    t1=time.perf_counter()
    compressed_dat = zlib.compress(raw_dat, 1)
    t2=time.perf_counter()
    del raw_dat
    print(f'compressed size: {len(compressed_dat)}, time: {t2-t1:.5f}')

    for init_size in INIT_BUFF_SIZES:
        t1=time.perf_counter()
        decompressed_dat = zlib.decompress(compressed_dat, bufsize=init_size)
        t2=time.perf_counter()

        assert len(decompressed_dat) == DATA_SIZE
        assert decompressed_dat.count(b) == DATA_SIZE
        del decompressed_dat

        print(f'data size: {DATA_SIZE:>8}, init buff size: {init_size:>8}, time: {t2-t1:.5f}')
    print()

SIZE = _1G
test(SIZE, (100, SIZE, SIZE+1, UINT32_MAX, UINT32_MAX+1))

SIZE = UINT32_MAX
test(SIZE, (100, SIZE, SIZE+1, 2*UINT32_MAX+1))

SIZE = 10*_1G
test(SIZE, (_1G, SIZE-1, SIZE, SIZE+1, UINT32_MAX, UINT32_MAX+1,
            2*UINT32_MAX+1, 3*UINT32_MAX+1, 4*UINT32_MAX+1))
            
SIZE = 20*_1G
test(SIZE, (3*_1G, SIZE-1, SIZE, SIZE+1, UINT32_MAX, UINT32_MAX+1,
            2*UINT32_MAX+1, 3*UINT32_MAX+1, 4*UINT32_MAX+1, 5*UINT32_MAX+1, 6*UINT32_MAX+1))

github-actions · 2021-06-26T00:06:13Z

This PR is stale because it has been open for 30 days with no activity.

miss-islington · 2021-07-05T01:10:47Z

Thanks @animalize for the PR, and @gpshead for merging it 🌮🎉.. I'm working now to backport this PR to: 3.10.
🐍🍒⛏🤖

…er (pythonGH-26143) * zlib uses an UINT32_MAX sliding window for the output buffer These funtions have an initial output buffer size parameter: - zlib.decompress(data, /, wbits=MAX_WBITS, bufsize=DEF_BUF_SIZE) - zlib.Decompress.flush([length]) If the initial size > UINT32_MAX, use an UINT32_MAX sliding window, instead of clamping to UINT32_MAX. Speed up when (the initial size == the actual size). This fixes a memory consumption and copying performance regression in earlier 3.10 beta releases if someone used an output buffer larger than 4GiB with zlib.decompress. Reviewed-by: Gregory P. Smith (cherry picked from commit a9a69bb) Co-authored-by: Ma Lin <animalize@users.noreply.github.com>

bedevere-bot · 2021-07-05T01:10:58Z

GH-27025 is a backport of this pull request to the 3.10 branch.

…er (GH-26143) * zlib uses an UINT32_MAX sliding window for the output buffer These funtions have an initial output buffer size parameter: - zlib.decompress(data, /, wbits=MAX_WBITS, bufsize=DEF_BUF_SIZE) - zlib.Decompress.flush([length]) If the initial size > UINT32_MAX, use an UINT32_MAX sliding window, instead of clamping to UINT32_MAX. Speed up when (the initial size == the actual size). This fixes a memory consumption and copying performance regression in earlier 3.10 beta releases if someone used an output buffer larger than 4GiB with zlib.decompress. Reviewed-by: Gregory P. Smith (cherry picked from commit a9a69bb) Co-authored-by: Ma Lin <animalize@users.noreply.github.com>

bedevere-bot added the awaiting review label May 15, 2021

the-knights-who-say-ni added the CLA signed label May 15, 2021

github-actions bot added the stale Stale PR or inactive for long period of time. label Jun 26, 2021

gpshead self-assigned this Jul 5, 2021

gpshead added needs backport to 3.10 only security fixes performance Performance or resource usage labels Jul 5, 2021

gpshead approved these changes Jul 5, 2021

View reviewed changes

bedevere-bot added awaiting merge and removed awaiting review labels Jul 5, 2021

gpshead added the type-bug An unexpected behavior, bug, or error label Jul 5, 2021

news entry

340bdf8

gpshead merged commit a9a69bb into python:main Jul 5, 2021

bedevere-bot removed the awaiting merge label Jul 5, 2021

bedevere-bot removed the needs backport to 3.10 only security fixes label Jul 5, 2021

ghost deleted the sliding_window branch July 12, 2021 18:32

pablogsal mentioned this pull request Jul 18, 2021

[3.10] Correct the order of regen-abidump #27228

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

bpo-41486: zlib uses an UINT32_MAX sliding window for the output buffer#26143

bpo-41486: zlib uses an UINT32_MAX sliding window for the output buffer#26143
gpshead merged 2 commits intomainfrom
unknown repository

ghost commented May 15, 2021 •

edited by ghost

Loading

Uh oh!

ghost commented May 26, 2021

Uh oh!

github-actions bot commented Jun 26, 2021

Uh oh!

miss-islington commented Jul 5, 2021

Uh oh!

bedevere-bot commented Jul 5, 2021

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

Uh oh!

Conversation

ghost commented May 15, 2021 • edited by ghost Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

ghost commented May 26, 2021

Uh oh!

github-actions bot commented Jun 26, 2021

Uh oh!

miss-islington commented Jul 5, 2021

Uh oh!

bedevere-bot commented Jul 5, 2021

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

ghost commented May 15, 2021 •

edited by ghost

Loading