Remark: This issue was first noticed by Alan Kilborn, revisited by Peter Jones, and discussed in this topic:
https://community.notepad-plus-plus.org/post/59397
Description of the Issue
When using the Search > Go to... feature with the Offset option ticked, the offsets corresponding to each byte of a multi-byte encoded character, after the first one, should be inaccessible!
Steps to Reproduce the Issue
- Open a new tab in N++
- If necessary, use Encoding > Convert to UTF-8 to get an empty UTF-8 encoded file
- Just type in the text A👨Z on the first line
Note that, as the MAN emoji 👨 is the Unicode character with code point U+1F468, we can describe this line, in a UTF-8 encoded file, as:
Characters : A 👨 Z
Bytes : 41 F0 9F 91 A8 5A
Offset : 0 1 2 3 4 5
- If you move the caret right before the A char, the Search > Go to... feature says you're at offset 0
- If you move the caret right before the 👨 char, the Search > Go to... feature says you're at offset 1
- If you move the caret right before the Z char, the Search > Go to... feature says you're at offset 5
All these offsets are correct. But these values should be the only offsets that can be typed into the You want to go to zone!
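The byte layout and the three reported offsets can be checked with a minimal Python sketch. This is not Notepad++ code; the helper name `char_start_offsets` is purely illustrative. It encodes the test line as UTF-8 and records the byte offset at which each character begins (the end-of-text position is not included).

```python
def char_start_offsets(text):
    """Return the UTF-8 byte offset at which each character of `text` begins."""
    offsets = []
    position = 0
    for ch in text:
        offsets.append(position)
        # Advance by the number of bytes this character takes in UTF-8.
        position += len(ch.encode("utf-8"))
    return offsets


line = "A\U0001F468Z"  # A, MAN emoji (U+1F468), Z
encoded = line.encode("utf-8")

print(encoded.hex(" ").upper())  # 41 F0 9F 91 A8 5A
print(char_start_offsets(line))  # [0, 1, 5]
```

Offsets 2, 3 and 4 do not appear in the result because they fall on the second, third and fourth bytes of the emoji.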
Actual Behavior
Now, let's force a move to offset 3, exactly in the middle of the multi-byte sequence of the emoji char (byte 91), and then click on the Go button.
- The caret appears to be right before the Z letter. In fact:
  - If you hit the Backspace key, you get the text Ax91xA8Z, so the first two bytes of the encoding, xF0 and x9F, before the offset, are deleted
  - If you hit the Delete key, you get the text AxF0x9FxA8Z, so the next byte, x91, after the offset, is deleted
- In addition, as you can see, the actions of the two keys Backspace and Delete are not symmetrical, as the former deletes two bytes (the beginning of the multi-byte sequence) whereas the latter deletes just one byte (x91)
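The two reported results can be reproduced at the byte level with plain slicing. The sketch below only mimics the outcome described above and makes no claim about how Notepad++ or Scintilla perform the deletion internally; `is_valid_utf8` is a hypothetical helper.

```python
data = "A\U0001F468Z".encode("utf-8")  # 41 F0 9F 91 A8 5A
caret = 3        # forced offset, pointing at byte 91
emoji_start = 1  # offset of the first byte (F0) of the emoji

# Reported Backspace result: bytes F0 9F (offsets 1 and 2) disappear.
after_backspace = data[:emoji_start] + data[caret:]
# Reported Delete result: only byte 91 (offset 3) disappears.
after_delete = data[:caret] + data[caret + 1:]


def is_valid_utf8(raw):
    """Return True if `raw` decodes cleanly as UTF-8."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


print(after_backspace.hex(" ").upper())  # 41 91 A8 5A
print(after_delete.hex(" ").upper())     # 41 F0 9F A8 5A
print(is_valid_utf8(after_backspace), is_valid_utf8(after_delete))  # False False
```

Either key leaves an invalid UTF-8 sequence behind, which is why the stray bytes are displayed as x91, xA8, and so on.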
Expected Behavior
Offset values that point to the individual bytes of a multi-byte sequence, after the 1st byte, should not be allowed in a Unicode encoded file! For instance, in the example above, the only allowed values should be 0, 1 and 5.
Then, Backspace and Delete would each act on exactly one character, as expected!
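One possible way to enforce this, sketched in Python under the assumption that the buffer holds valid UTF-8: an offset is acceptable only if it does not point at a continuation byte (bit pattern 10xxxxxx). The names `is_char_boundary` and `snap_to_boundary` are hypothetical, and whether an invalid offset is rejected or moved back to the start of the character is a design choice that this issue does not prescribe.

```python
def is_char_boundary(data, offset):
    """True if `offset` is inside the buffer and not on a continuation byte."""
    if offset < 0 or offset > len(data):
        return False
    if offset == len(data):
        return True  # end of buffer
    # UTF-8 continuation bytes have the form 10xxxxxx.
    return (data[offset] & 0xC0) != 0x80


def snap_to_boundary(data, offset):
    """Move `offset` back to the first byte of the character it falls inside."""
    offset = max(0, min(offset, len(data)))
    while offset > 0 and not is_char_boundary(data, offset):
        offset -= 1
    return offset


data = "A\U0001F468Z".encode("utf-8")  # 41 F0 9F 91 A8 5A

print(is_char_boundary(data, 3))  # False
print(snap_to_boundary(data, 3))  # 1
print(snap_to_boundary(data, 5))  # 5
```

With such a check, a request for offset 3 would either be refused or land on offset 1, right before the emoji.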
Best Regards,
guy038