[BUG] CRITICAL DATA LOSS VULNERABILITY: CJK Characters Corrupted and Files Permanently Damaged on Large File Scroll/Save #17234

@srcdabada

Is there an existing issue for this?

  • I have searched the existing issues

Is the issue reproducible in Notepad++ without plugin?

  • I have tried Notepad++ without plugin

Is the issue reproducible in portable version of Notepad++?

  • I have tried portable version of Notepad++

Does the issue affect SciTE?

  • I have tried SciTE

Description of the Issue

This is a critical and deceptive bug with a high risk of permanent data loss. When a large UTF-8 file containing CJK (Chinese, Japanese, Korean) characters is opened, the beginning of the content displays normally, which can mislead the user into believing the file opened correctly. Further down the document, however, the characters are severely corrupted. What makes this severe is that if the user performs a 'Save' while the document is in this state, the entire file is rewritten using the incorrect Windows-1252 encoding, permanently destroying the original data.

Crucial Comparison Note:
The exact same 'sample.txt' file opens and displays completely normally (in UTF-8) when viewed using the standard Windows built-in Notepad application. This confirms the file itself is valid UTF-8 and the issue resides specifically within Notepad++.
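
For anyone who wants to confirm the validity of the file independently of any editor, here is a minimal Python sketch that strictly decodes the whole file as UTF-8 in chunks. The function names and the chunk size are my own choices for illustration and have nothing to do with Notepad++ internals. An incremental decoder is used so that a multi-byte character split across two chunks is not misreported as invalid.

```python
import codecs


def chunks_are_valid_utf8(chunks):
    """Return True if the concatenation of `chunks` (bytes) is strict UTF-8."""
    decoder = codecs.getincrementaldecoder("utf-8")(errors="strict")
    try:
        for chunk in chunks:
            # Incomplete trailing sequences are buffered until the next chunk.
            decoder.decode(chunk)
        # final=True raises if the data ends in the middle of a sequence.
        decoder.decode(b"", final=True)
    except UnicodeDecodeError:
        return False
    return True


def file_is_valid_utf8(path, chunk_size=64 * 1024):
    """Check an entire file, not just its first chunk."""
    with open(path, "rb") as f:
        return chunks_are_valid_utf8(iter(lambda: f.read(chunk_size), b""))
```

Calling `file_is_valid_utf8("sample.txt")` on the extracted attachment should return `True` if the file is valid UTF-8 from start to end.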

sample.zip

Steps To Reproduce

  1. Create a new, empty document in Notepad++.
  2. Set the encoding to UTF-8 (with BOM).
  3. Repeat a string consisting of specific CJK characters, such as '가' (U+AC00) and '川' (U+5DDD), until the file size exceeds approximately 200 KB.
  4. Save the file (e.g., as 'sample.txt').
  5. Close and restart Notepad++.
  6. Open the saved 'sample.txt' file in Notepad++.
  7. Crucially: Notice that the file is incorrectly identified as Windows-1252.
  8. Scroll down to the latter half of the document.
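
Steps 1 to 4 can also be approximated outside Notepad++ with a short Python script. This is only a sketch under my own assumptions: the repeated unit is '가川' with no line breaks, and the helper name `build_sample` is made up for this example. The attached sample.zip may be laid out differently.

```python
def build_sample(min_size=200 * 1024, unit="가川"):
    """Return UTF-8 bytes (with BOM) of `unit` repeated until the size exceeds `min_size`."""
    bom = b"\xef\xbb\xbf"  # UTF-8 byte order mark
    encoded_unit = unit.encode("utf-8")  # 3 bytes per character for these code points
    repeats = (min_size - len(bom)) // len(encoded_unit) + 1
    return bom + encoded_unit * repeats


if __name__ == "__main__":
    # Writes the file into the current directory; open it in Notepad++ afterwards.
    with open("sample.txt", "wb") as f:
        f.write(build_sample())
```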

Current Behavior

  1. Deceptive Initial Display: The beginning of the file appears normal and undamaged, making the user unaware of the underlying encoding error.
  2. Corrupted Rendering: When scrolling down, the UTF-8 characters ('가川') are severely corrupted and rendered as garbled sequences (e.g., "·�ê°€å").
  3. Fatal Data Loss Scenario: Since the file is internally recognized as Windows-1252, if the user makes any changes and performs a 'Save' action, the entire file is irrevocably rewritten in the incorrect Windows-1252 encoding. This permanently destroys the original UTF-8 CJK data.
  4. Manual Fix Fails: Manually switching the encoding back to UTF-8 does not restore the already corrupted characters in the view.
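
The garbled output is consistent with UTF-8 bytes being read as Windows-1252. The sketch below uses Python's `cp1252` codec to illustrate this; it is not Notepad++'s own conversion code, which may treat undefined bytes differently, and the function names are hypothetical. '가' (U+AC00) is `EA B0 80` in UTF-8 and '川' (U+5DDD) is `E5 B7 9D`. Byte `9D` has no mapping in Python's cp1252 table, so it becomes U+FFFD, which in turn cannot be written back as the original byte.

```python
def misread_as_cp1252(text):
    """Encode `text` as UTF-8, then decode those bytes as Windows-1252."""
    raw = text.encode("utf-8")
    # Bytes with no cp1252 mapping (e.g. 0x9D) become U+FFFD.
    return raw.decode("cp1252", errors="replace")


def lossy_round_trip(text):
    """Misread as cp1252, then write the garbled text back out as cp1252."""
    garbled = misread_as_cp1252(text)
    # U+FFFD is not encodable in cp1252, so it is written as b"?".
    return garbled.encode("cp1252", errors="replace")


if __name__ == "__main__":
    print(repr(misread_as_cp1252("가")))  # 'ê°€'
    print(repr(misread_as_cp1252("川")))  # 'å·\ufffd'
    print(lossy_round_trip("川"))         # b'\xe5\xb7?'
```

In this model, '가' happens to survive the round trip because all three of its bytes have cp1252 mappings, while '川' does not: its last byte is replaced, so the original UTF-8 sequence can no longer be recovered from the saved file.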

Expected Behavior

The file must be correctly and consistently recognized as UTF-8 upon opening, and the text '가川' should be displayed correctly throughout the entire document, matching the reliable behavior of the standard Windows Notepad application.

Debug Information

Notepad++ v8.8.8   (64-bit)
Build time: Nov 16 2025 - 20:55:01
Scintilla/Lexilla included: 5.5.7/5.4.5
Boost Regex included: 1_85
Path: C:\Program Files\Notepad++\notepad++.exe
Command Line: 
Admin mode: OFF
Local Conf mode: OFF
Cloud Config: OFF
Periodic Backup: OFF
Placeholders: OFF
Scintilla Rendering Mode: SC_TECHNOLOGY_DEFAULT (0)
Multi-instance Mode: monoInst
asNotepad: OFF
File Status Auto-Detection: cdEnabledNew (for current file/tab only)
Dark Mode: OFF
Display Info:
    primary monitor: 1920x1080, scaling 125%
    visible monitors count: 1
    installed Display Class adapters: 
        0000: Description - Intel(R) HD Graphics 520
        0000: DriverVersion - 25.20.100.6518
        0001: Description - NVIDIA GeForce 940MX
        0001: DriverVersion - 22.21.13.8264
OS Name: Windows 10 Pro (64-bit)
OS Version: 22H2
OS Build: 19045.6575
Current ANSI codepage: 65001
Plugins: 
    mimeTools (3.1)
    NppConverter (4.7)
    NppExport (0.4)

Anything else?

No response
