-
Notifications
You must be signed in to change notification settings - Fork 1.4k
Handling of BOM (byte order mark) in source files #4353
Copy link
Copy link
Closed
Description
Feature
PEP 263 specifies that
To aid with platforms such as Windows, which add Unicode BOM marks to the beginning of Unicode files, the UTF-8 signature \xef\xbb\xbf will be interpreted as ‘utf-8’ encoding as well (even if no magic encoding comment is given).
However, RustPython currently fails on files that include a BOM.
$ printf '\xEF\xBB\xBF' > f.py
$ cargo run f.py
SyntaxError: Got unexpected token at line 1 column 1
^Python Documentation
Not familiar with CPython source code, but some pointers:
https://github.com/python/cpython/blob/3.11/Lib/tokenize.py#L299 (detect_encoding)
https://github.com/python/cpython/blob/3.11/Lib/test/test_importlib/source/test_source_encoding.py
Reactions are currently unavailable
Metadata
Metadata
Assignees
Labels
No labels