[BUG] Notepad++ does not handle system default code page UTF-8 well. #17057

@Coises

Is there an existing issue for this?

  • I have searched the existing issues

Description of the Issue

When the Windows setting at Control Panel | Region | Administrative | Change system locale... | Beta: Use Unicode UTF-8 for worldwide language support is checked, Notepad++ does not properly handle documents in ANSI mode. This has various repercussions. I believe these issues are related: #11667, #13116, #15271, #15324 and the community post "N++ opens ANSI file incorrectly in Win11 - why?". None of them gets to the heart of the problem: when the system default code page is 65001, Notepad++ treats ANSI as if it were a single-byte code page. It fails to recognize that in this situation, ANSI is UTF-8. Below I give a simple, though contrived, demonstration that something is wrong.

Steps To Reproduce

  1. At Control Panel | Region | Administrative | Change system locale... check Beta: Use Unicode UTF-8 for worldwide language support and reboot.
  2. Open Notepad++ to a blank tab. If the default document encoding is UTF-8, select Encoding | ANSI.
  3. Type the letters abcde into the empty document.
  4. Select this text: áßçðé and copy it to the clipboard.
  5. Return to the document in Notepad++ and paste what you copied.

Current Behavior

All characters in the document disappear.

Expected Behavior

The copied text pastes normally.

Debug Information

Notepad++ v8.8.6   (64-bit)
Build time: Oct  1 2025 - 01:23:06
Scintilla/Lexilla included: 5.5.7/5.4.5
Boost Regex included: 1_85
Path: C:\Program Files\Notepad++\notepad++.exe
Command Line: 
Admin mode: OFF
Local Conf mode: OFF
Cloud Config: OFF
Periodic Backup: OFF
Placeholders: OFF
Scintilla Rendering Mode: SC_TECHNOLOGY_DIRECTWRITE (1)
Multi-instance Mode: monoInst
asNotepad: OFF
File Status Auto-Detection: cdEnabledNew (for current file/tab only)
Dark Mode: OFF
Display Info:
    primary monitor: 1920x1080, scaling 100%
    visible monitors count: 1
    installed Display Class adapters: 
        0000: Description - NVIDIA GeForce RTX 2060 SUPER
        0000: DriverVersion - 32.0.15.6094
        0001: Description - Microsoft Remote Display Adapter
        0001: DriverVersion - 10.0.19041.5794
        0003: Description - Microsoft Remote Display Adapter
        0003: DriverVersion - 10.0.19041.5794
        0004: Description - Microsoft Remote Display Adapter
        0004: DriverVersion - 10.0.19041.5794
    - warning, search has been limited to maximum number of adapter records: 4
OS Name: Windows 10 Pro (64-bit)
OS Version: 22H2
OS Build: 19045.6216
Current ANSI codepage: 65001
Plugins: 
    ColumnsPlusPlus (1.3)
    ComparePlus (1.2)
    ConfigUpdater (2.2.1)
    ControlledAutoIndent (1.1)
    DSpellCheck (1.5)
    mimeTools (3.1)
    NppConverter (4.7)
    NppExport (0.4)
    PythonScript (2.1)
    Unicode Normalize (1)

Anything else?

If you have the PythonScript plugin installed, after step 2 (or 3) above, select Show Console, type editor.getCodePage() into the console and press Enter. Note that the response is 0. Per the Scintilla documentation, this value indicates a single-byte code page. Since ANSI is 65001 with the Windows setting described, the code page should be 65001.
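To illustrate why the code page value matters, here is a minimal Python sketch (plain Python, not a PythonScript snippet) that counts how many characters the text from the reproduction steps appears to contain under each interpretation. It does not reproduce the disappearing text. It only shows the mismatch between a single-byte reading and a UTF-8 reading of the same bytes. The name SC_CP_SINGLE_BYTE and the helper chars_seen are made up for this sketch, and cp1252 is assumed as the legacy code page purely as an example.

```python
# Scintilla code page values: 0 means a single-byte code page,
# 65001 (SC_CP_UTF8) means UTF-8.
SC_CP_SINGLE_BYTE = 0   # hypothetical name, used only in this sketch
SC_CP_UTF8 = 65001

text = "\u00e1\u00df\u00e7\u00f0\u00e9"  # the five characters from step 4

# "ANSI" bytes when the system default code page is 65001.
utf8_bytes = text.encode("utf-8")
# "ANSI" bytes on a system whose default code page is 1252 (assumed example).
cp1252_bytes = text.encode("cp1252")


def chars_seen(data: bytes, code_page: int) -> int:
    """Count the characters a consumer sees under the given code page."""
    if code_page == SC_CP_UTF8:
        return len(data.decode("utf-8"))
    # Single-byte interpretation: every byte is treated as one character.
    return len(data)


# The same UTF-8 bytes read as if they were Windows-1252 text.
misread = utf8_bytes.decode("cp1252")

print(len(cp1252_bytes), len(utf8_bytes))
print(chars_seen(utf8_bytes, SC_CP_UTF8))
print(chars_seen(utf8_bytes, SC_CP_SINGLE_BYTE))
```

With the code page set to 0, each of the two bytes that make up one UTF-8 character is counted as a separate character, so character counts and positions no longer agree with the UTF-8 text.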

In the community topic example cited above, it is understandable that Notepad++ doesn’t know the encoding should be Windows-1252. However, it should recognize that the file doesn’t look like UTF-8, and it should know that “ANSI” is UTF-8 on this system, so ANSI won’t work, either. It could fall back to some sort of assumption derived from the Windows locale (which has a good chance of being correct), or it could ask the user what encoding to use. What it should not do is set the Scintilla code page to 0 when the encoding is UTF-8.
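The first fallback could look roughly like the following Python sketch. This is not how Notepad++ is implemented. The function decode_ansi_file and its locale_codec parameter are hypothetical, and cp1252 merely stands in for "the code page normally associated with the user's locale". The sketch covers only the case where the system default code page is 65001.

```python
from typing import Tuple


def decode_ansi_file(data: bytes, locale_codec: str = "cp1252") -> Tuple[str, str]:
    """Decode an "ANSI" file on a system whose default code page is 65001.

    Returns the decoded text and the name of the codec that was used.
    locale_codec is a hypothetical stand-in for a locale-derived guess.
    """
    try:
        # On such a system "ANSI" is UTF-8, so try strict UTF-8 first.
        return data.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        # The bytes are not valid UTF-8, so "ANSI" cannot be the answer.
        # Fall back to the locale-derived guess instead of a bare
        # single-byte interpretation; undecodable bytes become U+FFFD.
        return data.decode(locale_codec, errors="replace"), locale_codec


legacy_file = "\u00e1\u00df\u00e7\u00f0\u00e9".encode("cp1252")
text, codec = decode_ansi_file(legacy_file)
print(codec)
```

A real implementation would also need a way to ask the user when the guess is wrong, which is the other option mentioned above.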

I suggest that ANSI encoding should be entirely disabled when “Current ANSI codepage” is 65001. A more user-friendly, but more complex, choice would be to make ANSI mean “the code page normally associated with the current user locale” when “Current ANSI codepage” is 65001.
