The "Search > Go to..." feature should not allow moving inside a multi-byte encoding of a character !

**Remark** : This issue was **first** noticed by **`Alan Kilborn`**, revisited by **`Peter Jones`** and **discussed** in this topic :

https://community.notepad-plus-plus.org/post/59397

### Description of the Issue

When using the **`Search > Go to...`** feature with the **`Offset`** option **ticked**, the **offsets** of every **byte** of a **multi-byte** encoding, after the **first** one, should be **inaccessible** !

### Steps to Reproduce the Issue

- Open a new tab in N++

- If necessary, use **`Encoding > Convert to UTF-8`** to get an **empty** **`UTF-8`** encoded file

- Just type in the text **`A&#128104;Z`** on the **first** line

Note that, as the **`emoji`** **MAN** &#128104; is the **Unicode** character of code-point **`U+1F468`**, we can describe this line, in a **`UTF-8`** **encoded** file, as :

~~~z
Characters :   A         &#128104;        Z
Bytes      :   41  F0  9F  91  A8  5A
Offset     :   0   1   2   3   4   5
~~~

- If you move the caret right **before** the **`A`** char, the **`Search > Go to...`** feature says you're at offset **`0`**

- If you move the caret right **before** the **`&#128104;`** char, the **`Search > Go to...`** feature says you're at offset **`1`**

- If you move the caret right **before** the **`Z`** char, the **`Search > Go to...`** feature says you're at offset **`5`**

All these offsets are **correct**. But these values **should** be the **only** **offsets** that can be entered in the **`You want to go to`** zone !
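As a cross-check of the table above, here is a minimal Python sketch that encodes the test line as UTF-8 and lists the byte offset at which each character starts. The helper name `char_start_offsets` is hypothetical and only used for illustration; it has nothing to do with the Notepad++ code base.

```python
# The test line: 'A', the MAN emoji U+1F468, 'Z'
text = "A\U0001F468Z"
data = text.encode("utf-8")

# Byte layout of the line in a UTF-8 encoded file
print(data.hex(" ").upper())

def char_start_offsets(s):
    """Return the byte offset at which each character of s starts in UTF-8."""
    offsets = []
    pos = 0
    for ch in s:
        offsets.append(pos)
        pos += len(ch.encode("utf-8"))  # 1 to 4 bytes per character
    return offsets

valid_offsets = char_start_offsets(text)
print(valid_offsets)
```

The offsets `2`, `3` and `4` never appear in the result, because they fall inside the four-byte sequence of the emoji.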

### Actual Behavior

Now, let's force a move to offset **`3`**, exactly in the **middle** of the **multi-byte** sequence of the **emoji** char ( byte **`91`** ), and then click on the **`Go`** button

- The caret appears to be right **before** the **`Z`** letter. In fact :
 
    - If you hit the **`Backspace`** key, you get the text A**x91xA8**Z, so the **first two** bytes of the encoding **`xF0x9F`**, **before** the offset, are **deleted**

    - If you hit the **`Delete`** key, you get the text A**xF0x9FxA8**Z, so the next **`x91`** byte, **after** the offset, is **deleted**

- In addition, as you can see, the actions of the **two** keys **`Backspace`** and **`Delete`** are **not** *symmetrical*, as the former deletes **two** bytes ( the beginning of the **multi-byte** sequence ) whereas the latter just deletes **one** byte ( **`x91`** )
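To illustrate why both results are a problem, the short Python sketch below replays the two byte removals described above on the raw bytes and checks whether what remains is still valid UTF-8. It does not model Notepad++ or Scintilla in any way; it only assumes the deletions happen exactly as reported, and `is_valid_utf8` is a hypothetical helper.

```python
data = "A\U0001F468Z".encode("utf-8")  # 41 F0 9F 91 A8 5A

# Backspace at offset 3, as reported: bytes F0 9F (offsets 1 and 2) disappear
after_backspace = data[:1] + data[3:]

# Delete at offset 3, as reported: byte 91 (offset 3) disappears
after_delete = data[:3] + data[4:]

def is_valid_utf8(b):
    """Return True if the byte string b decodes cleanly as UTF-8."""
    try:
        b.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

for label, result in (("Backspace", after_backspace), ("Delete", after_delete)):
    print(label, result.hex(" ").upper(), is_valid_utf8(result))
```

In both cases the remaining bytes are no longer a valid UTF-8 sequence, so the file ends up with a broken character instead of losing one whole character.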

### Expected Behavior

The offset values relative to the **individual** bytes of a multi-byte sequence, **after** the **first** **byte**, in a **Unicode** encoded file, should **not** be allowed ! For instance, in the example above, the **allowed** values should be, **exclusively**, **`0`**, **`1`** and **`5`**

Then, the **`Backspace`** and **`Delete`** keys would each act on exactly **one** character, as **expected** !
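For illustration only, here is one possible way to detect such offsets, sketched in Python. It relies on the fact that, in UTF-8, every continuation byte has the bit pattern `10xxxxxx`. The two helpers `is_char_boundary` and `snap_to_char_start` are hypothetical names and are not part of Notepad++ or Scintilla; whether a forbidden offset should be rejected or moved back to the start of the character is a design choice left open here.

```python
def is_char_boundary(data, offset):
    """True if offset is at the start of a UTF-8 character, or at the end of data."""
    if offset == len(data):
        return True
    # Continuation bytes look like 10xxxxxx; any other byte starts a character
    return (data[offset] & 0xC0) != 0x80

def snap_to_char_start(data, offset):
    """Move offset back to the start of the character that contains it."""
    offset = max(0, min(offset, len(data)))  # clamp to the valid range
    while offset > 0 and not is_char_boundary(data, offset):
        offset -= 1
    return offset

data = "A\U0001F468Z".encode("utf-8")  # 41 F0 9F 91 A8 5A

for requested in range(len(data)):
    print(requested, is_char_boundary(data, requested), snap_to_char_start(data, requested))
```

With the example line, a request for offset `3` would either be refused or be turned into offset `1`, right **before** the emoji.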

Best Regards,

guy038

