how to handle surrogate encoding: read from fs write to database
marko at pacujo.net
Sun Jun 12 15:08:54 EDT 2016
Random832 <random832 at fastmail.com>:
> On Sun, Jun 12, 2016, at 12:50, Steven D'Aprano wrote:
>> I think Windows also gets it almost right: NTFS uses UTF-16, and (I
>> think) only allows valid Unicode file names.
> Nope. Windows allows any sequence of 16-bit units (except for a dozen or
> so ASCII characters) in filenames.
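Also, somewhat related, Python allows strings to contain lone surrogate
code points (U+D800 through U+DFFF), which are not Unicode scalar values
and cannot be encoded in well-formed UTF-8, UTF-16 or UTF-32. Thus,
Python's native character set is a superset of what valid Unicode text
can contain.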
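
A minimal Python 3 sketch of the point, assuming the standard
"surrogateescape" error handler is what carries undecodable filename
bytes into a str. The sample bytes and the variable names are made up
for illustration:

```python
# A str may hold a lone surrogate code point.
s = "abc\udc80"

# Strict UTF-8 refuses to encode it, which is typically what goes
# wrong when such a filename is written to a database.
try:
    s.encode("utf-8")
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False

# With surrogateescape, an undecodable byte 0xNN becomes U+DCNN on
# decoding and is turned back into the same byte on encoding.
raw = b"abc\x80"  # hypothetical filename bytes, not valid UTF-8
decoded = raw.decode("utf-8", errors="surrogateescape")
roundtrip = decoded.encode("utf-8", errors="surrogateescape")

print(strict_ok, decoded == s, roundtrip == raw)
```

So the original bytes survive the round trip, but only if both ends
agree on the error handler. Storing the raw bytes (for example via
os.fsencode) is one alternative when the database cannot accept such
strings.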