To reproduce:
- Paste the text ȺRGH into a UTF-8 encoded document.
- Select it and press Ctrl+U to lower-case it.
Expected result: ⱥrgh
Actual result: ⱥrg
This happens because Ⱥ (U+023A) encodes to two bytes in UTF-8 and ⱥ (U+2C65) encodes to three bytes, and the case-conversion code assumes that the length in bytes doesn't change. If the length increases (as above, from 5 bytes to 6) then you lose bytes from the end. If it decreases (e.g. upper-casing ⱥrgh) then you get NULs and other garbage at the end.
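
The byte arithmetic can be checked with a short Python sketch. This is only a model of the symptom, not the editor's actual code: `buggy_case_convert` is a hypothetical helper that forces the converted text into a buffer of the original byte length, truncating when the result is longer and NUL-padding when it is shorter. The real code may leave uninitialised memory rather than NULs in the second case.

```python
def buggy_case_convert(text, convert):
    """Hypothetical model of the bug: the result is forced to fit
    a buffer whose size equals the input's UTF-8 byte length."""
    src = text.encode("utf-8")
    converted = convert(text).encode("utf-8")
    # Longer results are cut off; shorter ones are padded with NULs
    # (an assumption standing in for whatever was left in the buffer).
    return converted[:len(src)].ljust(len(src), b"\x00")


upper = "\u023aRGH"  # ȺRGH: 2 + 3 = 5 bytes in UTF-8
lower = "\u2c65rgh"  # ⱥrgh: 3 + 3 = 6 bytes in UTF-8

print(len(upper.encode("utf-8")), len(lower.encode("utf-8")))  # 5 6

# Lower-casing grows the text by one byte, so the final "h" is lost.
print(buggy_case_convert(upper, str.lower).decode("utf-8"))  # ⱥrg

# Upper-casing shrinks it by one byte, leaving a trailing NUL.
print(buggy_case_convert(lower, str.upper))  # b'\xc8\xbaRGH\x00'
```

A fix along these lines would size the replacement from the converted text's own byte length rather than reusing the length of the selection.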