Description of the Issue
The Boost case modifiers (\U, \L, \u and \l), used in the replacement part, do not change the case of any accented character, that is, of any letter with a Unicode code point above U+007F!
Steps to Reproduce the Issue
- Place the simple French sentence below in a new tab:

C'est là, près de la forêt, dans un gîte où régnait un grand capharnaüm, que l'aïeul ôta sa flûte et son bâton de son canoë.

- Open the Replace dialog (Ctrl + H)
- SEARCH \w
- REPLACE \U$0 or \u$0
- Tick the Wrap around option
- Click on the Replace All button
Expected Behavior
After the replacement, the text should be changed like below :
C'EST LÀ, PRÈS DE LA FORÊT, DANS UN GÎTE OÙ RÉGNAIT UN GRAND CAPHARNAÜM, QUE L'AÏEUL ÔTA SA FLÛTE ET SON BÂTON DE SON CANOË.

These expected modifications can be done, of course, by selecting the text and using the default Shift + Ctrl + U shortcut
Actual Behavior
After the replacement, we get this output :

C'EST Là, PRèS DE LA FORêT, DANS UN GîTE Où RéGNAIT UN GRAND CAPHARNAüM, QUE L'AïEUL ôTA SA FLûTE ET SON BâTON DE SON CANOë.

Clearly, none of the accented characters has been converted to its uppercase form by the regex S/R !
The same reasoning applies to the \L and \l case modifiers, applied to a text that is initially uppercase
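
For illustration only, the actual output looks like what an ASCII-only case conversion would produce. Below is a minimal Python sketch of that hypothesis. It is not the Boost or N++ code, and the helper name `ascii_only_upper` is made up for this example. The expected result is computed with Python's own `re` module, where `\w` and `str.upper()` are Unicode-aware:

```python
import re

SENTENCE = (
    "C'est là, près de la forêt, dans un gîte où régnait un grand "
    "capharnaüm, que l'aïeul ôta sa flûte et son bâton de son canoë."
)


def ascii_only_upper(text: str) -> str:
    """Hypothetical helper: uppercase a-z only, leave chars above U+007F as-is."""
    return "".join(chr(ord(c) - 32) if "a" <= c <= "z" else c for c in text)


# Rough equivalent of SEARCH \w / REPLACE \U$0 with Unicode-aware case mapping
expected = re.sub(r"\w", lambda m: m.group(0).upper(), SENTENCE)

# What an ASCII-only conversion would give (assumption about the cause)
actual = ascii_only_upper(SENTENCE)

if __name__ == "__main__":
    print(expected)
    print(actual)
```

Under that assumption, `expected` corresponds to the text in the Expected Behavior section and `actual` to the output reported above.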
Notes
- This issue occurs both in ANSI and in Unicode encoded files, such as UTF-8
- This issue has existed since the implementation of the Boost regex library in N++ v6.0.0 !
- The last N++ version used for the tests was the v7.9.2 release
Best Regards,
guy038