Description
Is there an existing issue for this?
- I have searched the existing issues
Description of the Issue
When the Windows setting at Control Panel | Region | Administrative | Change system locale... | Beta: Use Unicode UTF-8 for worldwide language support is checked, Notepad++ does not properly handle documents in ANSI mode. This has various repercussions. I believe these issues are related: #11667, #13116, #15271, #15324 and the community post “N++ opens ANSI file incorrectly in Win11 - why?”. None of them gets to the heart of the problem: when the system default code page is 65001, Notepad++ treats ANSI as if it were a single-byte code page. It fails to recognize that in this situation, ANSI is UTF-8. Below is a simple, though contrived, demonstration that something is wrong.
Steps To Reproduce
- At Control Panel | Region | Administrative | Change system locale... check Beta: Use Unicode UTF-8 for worldwide language support and reboot.
- Open Notepad++ to a blank tab. If the default document encoding is UTF-8, select Encoding | ANSI.
- Type the letters abcde into the empty document.
- Select this text, áßçðé, and copy it to the clipboard.
- Return to the document in Notepad++ and paste what you copied.
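The steps above depend on the pasted characters being multibyte in UTF-8. The following Python check runs in any Python 3 interpreter and does not involve Notepad++. It shows that in UTF-8 each of the five characters takes two bytes, while Windows-1252, a typical single-byte ANSI code page, uses one byte per character. Reading the UTF-8 bytes one byte per character, which is what a single-byte code page implies, produces ten characters instead of five. This only illustrates the byte-level mismatch. It does not reproduce Notepad++'s actual paste path.

```python
# Illustrates the byte-level mismatch behind the repro steps (not Notepad++ code).
pasted = "áßçðé"

utf8_bytes = pasted.encode("utf-8")      # what "ANSI" means when the ANSI code page is 65001
cp1252_bytes = pasted.encode("cp1252")   # a typical single-byte ANSI code page

print(len(pasted), len(utf8_bytes), len(cp1252_bytes))  # 5 10 5

# Interpreting the UTF-8 bytes one byte per character, as a single-byte
# code page would, yields ten characters instead of five.
misread = utf8_bytes.decode("cp1252")
print(misread)  # Ã¡ÃŸÃ§Ã°Ã©
```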
Current Behavior
All characters in the document disappear.
Expected Behavior
The copied text pastes normally.
Debug Information
Notepad++ v8.8.6 (64-bit)
Build time: Oct 1 2025 - 01:23:06
Scintilla/Lexilla included: 5.5.7/5.4.5
Boost Regex included: 1_85
Path: C:\Program Files\Notepad++\notepad++.exe
Command Line:
Admin mode: OFF
Local Conf mode: OFF
Cloud Config: OFF
Periodic Backup: OFF
Placeholders: OFF
Scintilla Rendering Mode: SC_TECHNOLOGY_DIRECTWRITE (1)
Multi-instance Mode: monoInst
asNotepad: OFF
File Status Auto-Detection: cdEnabledNew (for current file/tab only)
Dark Mode: OFF
Display Info:
primary monitor: 1920x1080, scaling 100%
visible monitors count: 1
installed Display Class adapters:
0000: Description - NVIDIA GeForce RTX 2060 SUPER
0000: DriverVersion - 32.0.15.6094
0001: Description - Microsoft Remote Display Adapter
0001: DriverVersion - 10.0.19041.5794
0003: Description - Microsoft Remote Display Adapter
0003: DriverVersion - 10.0.19041.5794
0004: Description - Microsoft Remote Display Adapter
0004: DriverVersion - 10.0.19041.5794
- warning, search has been limited to maximum number of adapter records: 4
OS Name: Windows 10 Pro (64-bit)
OS Version: 22H2
OS Build: 19045.6216
Current ANSI codepage: 65001
Plugins:
ColumnsPlusPlus (1.3)
ComparePlus (1.2)
ConfigUpdater (2.2.1)
ControlledAutoIndent (1.1)
DSpellCheck (1.5)
mimeTools (3.1)
NppConverter (4.7)
NppExport (0.4)
PythonScript (2.1)
Unicode Normalize (1)
Anything else?
If you have PythonScript installed, after step 2 (or 3) above, select Show Console, type editor.getCodePage() into the console and press Enter. The response is 0. According to the Scintilla documentation, this value indicates a single-byte code page. Because the ANSI code page is 65001 under the Windows setting described, the Scintilla code page should also be 65001.
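The expected relationship can be written as a small Python sketch. SC_CP_UTF8 (65001) is Scintilla's value for UTF-8, and 0 is its single-byte setting. The double-byte set lists the code pages that Scintilla's SCI_SETCODEPAGE accepts. The function name expected_scintilla_codepage is hypothetical, and the sketch does not claim to describe how Notepad++ handles double-byte ANSI code pages today.

```python
SC_CP_UTF8 = 65001  # Scintilla's code page value for UTF-8
# Double-byte code pages that Scintilla's SCI_SETCODEPAGE also accepts.
SCINTILLA_DBCS = {932, 936, 949, 950, 1361}

def expected_scintilla_codepage(ansi_codepage: int) -> int:
    """Hypothetical: the Scintilla code page an "ANSI" document should get,
    given the Windows ANSI code page (the value GetACP() would return)."""
    if ansi_codepage == SC_CP_UTF8:
        return SC_CP_UTF8        # "ANSI" is UTF-8 here, so Scintilla must treat text as UTF-8
    if ansi_codepage in SCINTILLA_DBCS:
        return ansi_codepage     # multibyte legacy code pages Scintilla handles natively
    return 0                     # single-byte code pages such as 1252

print(expected_scintilla_codepage(65001))  # 65001, whereas editor.getCodePage() reports 0
print(expected_scintilla_codepage(1252))   # 0
```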
In the community topic cited above, it is understandable that Notepad++ doesn’t know the encoding should be Windows-1252. However, it should know that the file doesn’t look like UTF-8, and it should know that “ANSI” is UTF-8, so ANSI won’t work either. It could fall back to an assumption derived from the Windows locale (which has a good chance of being correct), or it could ask the user which encoding to use. What it should not do is set the Scintilla code page to 0 when the encoding is UTF-8.
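The fallback described above can be sketched in Python. The sketch first checks whether the file's bytes decode as strict UTF-8. If they do not, and the ANSI code page is 65001 so that ANSI offers no alternative, it falls back to a locale-derived code page. The function choose_codepage and its locale_codepage parameter are hypothetical. The parameter stands in for however the application would derive a code page from the Windows locale, and 1252 is used only as an example. A return value of None marks the point where the application could ask the user instead.

```python
from typing import Optional

def choose_codepage(data: bytes, ansi_codepage: int,
                    locale_codepage: Optional[int]) -> Optional[int]:
    """Hypothetical decision logic for opening a file that has no BOM."""
    try:
        data.decode("utf-8")          # strict decoding raises on invalid sequences
        return 65001                  # the bytes are valid UTF-8
    except UnicodeDecodeError:
        pass
    if ansi_codepage != 65001:
        return ansi_codepage          # ANSI is a real legacy code page, so use it
    # ANSI is UTF-8 and the data is not valid UTF-8, so ANSI cannot be right either.
    return locale_codepage            # None means: ask the user

sample = "café".encode("cp1252")      # b"caf\xe9", which is not valid UTF-8
print(choose_codepage(sample, 65001, 1252))                   # 1252
print(choose_codepage(sample, 65001, None))                   # None
print(choose_codepage("café".encode("utf-8"), 65001, 1252))   # 65001
```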
I suggest that ANSI encoding be disabled entirely when “Current ANSI codepage” is 65001. A more user-friendly, but more complex, option would be to make ANSI mean “the code page normally associated with the current user locale” when “Current ANSI codepage” is 65001.
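The two options in this suggestion differ only in what “ANSI” resolves to when the ANSI code page is 65001. The following is a hypothetical Python sketch. resolve_ansi is an illustrative name, and locale_codepage stands in for the code page associated with the user locale.

```python
from typing import Optional

def resolve_ansi(ansi_codepage: int, locale_codepage: int,
                 disable_when_utf8: bool) -> Optional[int]:
    """Hypothetical: the code page the ANSI encoding should mean, or None if disabled."""
    if ansi_codepage != 65001:
        return ansi_codepage      # unchanged behavior on systems without the UTF-8 setting
    if disable_when_utf8:
        return None               # option 1: ANSI encoding is unavailable
    return locale_codepage        # option 2: ANSI means the user locale's code page

print(resolve_ansi(1252, 1252, True))    # 1252
print(resolve_ansi(65001, 1252, True))   # None
print(resolve_ansi(65001, 1252, False))  # 1252
```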