Description of the Issue
Similar to the data corruption that can happen with the Goto dialog's Offset option (see #9101 and #9129 (comment)), the -p command line parameter can set the current position between the bytes of a multibyte-encoded UTF-8 character, or between the two bytes of a Windows CRLF line-ending. This should not be allowed to occur.
Multibyte UTF-8 characters should be considered "atomic" (I feel strongly about this), and so should Windows line-endings (I feel less strongly about this, but still fairly strongly).
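To illustrate what treating these sequences as "atomic" could mean, here is a minimal Python sketch that moves a requested byte offset back to the nearest position that splits neither a UTF-8 character nor a CRLF pair. It is not Notepad++ code: the function names are hypothetical, the buffer is assumed to be valid UTF-8, and rounding backward (rather than forward) is an arbitrary choice made for the example.

```python
def is_continuation_byte(b: int) -> bool:
    """True for UTF-8 continuation bytes, which have the bit pattern 10xxxxxx."""
    return (b & 0xC0) == 0x80


def snap_to_boundary(data: bytes, pos: int) -> int:
    """Move pos back so it splits neither a UTF-8 character nor a CRLF pair.

    Hypothetical helper for illustration; assumes data is valid UTF-8.
    """
    # Clamp the requested offset into the buffer.
    pos = max(0, min(pos, len(data)))
    # Step back over continuation bytes until we reach a lead byte.
    while 0 < pos < len(data) and is_continuation_byte(data[pos]):
        pos -= 1
    # If the offset sits between CR and LF, step back to before the CR.
    if 0 < pos < len(data) and data[pos - 1] == 0x0D and data[pos] == 0x0A:
        pos -= 1
    return pos


# Example buffer: one 4-byte character (F0 9F 98 80), a CRLF, then "abc".
sample = "\U0001F600".encode("utf-8") + b"\r\nabc"
safe_pos = snap_to_boundary(sample, 1)  # offset 1 is inside the 4-byte character
```

With a check of this kind applied to the offset given by -p, a request for position 1 in a file that starts with a 4-byte character would land before that character instead of inside it.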
Steps to Reproduce the Issue
1. Turn visible line-endings on via View menu > Show Symbol > Show End of Line.
2. Open the test_4byte_utf8.txt file (attached, below) and observe the UTF-8 character (after zooming).
3. Optional: using the HexEditor plugin, look at the hex view.
4. Open the test_crlf.txt file (attached, below) and observe the line-endings.
5. Close all files; quit Notepad++.
6. Run the command line notepad++.exe -p1 test_4byte_utf8.txt, using the attached file of the same name.
7. After the file loads but before doing anything else, type a.
8. Observe that the data is corrupted, as the 4-byte UTF-8 character has been split.
9. Repeat steps 5 through 7, using the test_crlf.txt file instead of the test_4byte_utf8.txt file in step 6.
10. Observe that the line-endings, which should be CRLF, are "corrupted": one line-ending is CR, the other is LF.
Expected Behavior
No data corruption.
Actual Behavior
The data corruption shown in steps 8 and 10.
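The corruption can be imitated outside Notepad++ with a short Python sketch that inserts a typed byte at raw byte offset 1, which is where -p1 places the caret. The sample buffers are hypothetical stand-ins, since the contents of the attached files are not reproduced here, and the helper names are made up for this example.

```python
def insert_at_byte_offset(data: bytes, offset: int, typed: bytes) -> bytes:
    """Insert typed bytes at a raw byte offset, with no boundary checks."""
    return data[:offset] + typed + data[offset:]


def is_valid_utf8(data: bytes) -> bool:
    """Report whether the buffer still decodes as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# Hypothetical stand-ins for the attached files (assumed contents).
four_byte_sample = "\U0001F600".encode("utf-8")  # one 4-byte character
crlf_sample = b"\r\n"                            # one CRLF line-ending

# Typing "a" at byte offset 1 splits each sequence.
broken_utf8 = insert_at_byte_offset(four_byte_sample, 1, b"a")
broken_crlf = insert_at_byte_offset(crlf_sample, 1, b"a")

print(is_valid_utf8(four_byte_sample), is_valid_utf8(broken_utf8))
print(broken_crlf)
```

In the first case the lead byte is separated from its continuation bytes, so the buffer no longer decodes as UTF-8; in the second, the CR and LF end up on either side of the typed character, leaving a lone CR and a lone LF.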
Debug Information
Notepad++ v7.9.1 (64-bit)
Build time : Nov 2 2020 - 01:07:46
Path : C:\........\npp.7.9.1.portable.x64\notepad++.exe
Admin mode : OFF
Local Conf mode : ON
OS Name : Windows 10 Enterprise (64-bit)
OS Version : 1809
OS Build : 17763.1518
Current ANSI codepage : 1252
Plugins : mimeTools.dll NppConverter.dll NppExport.dll