[Python-Dev] Windows: Remove support of bytes filenames in the os module?

Paul Moore p.f.moore at gmail.com
Mon Feb 8 13:26:32 EST 2016


On 8 February 2016 at 14:32, Victor Stinner <victor.stinner at gmail.com> wrote:
> Since 3.3, functions of the os module started to emit
> DeprecationWarning when called with bytes filenames.

Everywhere? Or just on Windows? I can't tell from your email and I
don't have a Unix system to hand to check.
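
For anyone who wants to check on their own platform, something like the
following rough sketch should show whether a bytes path triggers the
warning. The helper name bytes_listdir_warnings is made up for
illustration, and the result depends on the platform and Python version,
so no particular output is claimed here.

```python
import os
import warnings


def bytes_listdir_warnings(path=b"."):
    """Return the DeprecationWarnings raised by os.listdir() on a bytes path.

    Hypothetical helper, for illustration only.
    """
    with warnings.catch_warnings(record=True) as caught:
        # Make sure no warning is swallowed by the default filters.
        warnings.simplefilter("always")
        os.listdir(path)
    return [w for w in caught if issubclass(w.category, DeprecationWarning)]


if __name__ == "__main__":
    hits = bytes_listdir_warnings()
    print(os.name, "DeprecationWarning raised:", bool(hits))
```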

> The rationale is quite simple: the native Windows type for filenames is
> Unicode, and Windows has a weird behaviour when you use bytes. For
> example, os.listdir(b'.') gives you paths which cannot be used with
> open() for filenames which are not encodable in the ANSI code page.
> Unencodable characters are replaced with "?". The following issue was
> opened to document this weird behaviour (but the doc was never
> completed):
>
> "Document that bytes OS API can returns unusable results on Windows"
> http://bugs.python.org/issue16700

OK, that seems fine, but obviously of limited interest to Unix users
who aren't worried about cross-platform portability :-)
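
The "?" replacement can be imitated on any platform with a plain
encode call. This is only an approximation: cp1252 is assumed here as
the ANSI code page, ansi_roundtrip is a made-up helper, and the real
Windows conversion may differ in detail. It does show why the resulting
name can no longer be passed to open().

```python
def ansi_roundtrip(name, codepage="cp1252"):
    """Encode a filename lossily, then decode it again.

    Hypothetical helper: errors="replace" stands in for the conversion
    Windows applies when a bytes API is used (an assumption).
    """
    lossy = name.encode(codepage, errors="replace")
    return lossy, lossy.decode(codepage)


if __name__ == "__main__":
    for name in ["report.txt", "caf\u00e9.txt", "\u65e5\u672c.txt"]:
        lossy, back = ansi_roundtrip(name)
        # A name that does not round-trip no longer identifies the file.
        print(repr(name), "->", lossy, "round-trips:", back == name)
```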

> When the new os.scandir() API was designed, I asked to *not* support
> bytes filenames since they are "broken by design".
> https://www.python.org/dev/peps/pep-0471/
>
> Recently, a user complained that os.walk() doesn't work with bytes on
> Windows anymore:
>
> "Regression: os.walk now using os.scandir() breaks bytes filenames on windows"
> http://bugs.python.org/issue25911
>
> Serhiy Storchaka just pushed a change to reintroduce bytes
> support on Windows in os.walk(), but I would prefer to do the
> *opposite*: drop support for bytes filenames on Windows.

But leave those APIs as Unix only? That seems like a regression, too
(sure, the bytes APIs are problematic on Windows, but only for certain
characters AIUI). Windows users who currently run programs written
against the bytes API (presumably originally intended for Unix, where
the bytes API was a deliberate choice), and who don't hit any encoding
issues, will see those programs break for no reason other than
"users using different character sets than you may have been hitting
issues before". That seems like a weird justification to me...

> Are we brave enough to force users to use the "right" type for filenames?

If it were *all* users I'd say it's worth considering. But
practicality beats purity here IMO, and I feel that allowing people's
code to be "portable by default" is a more important goal than
enforcing encoding purity on a single platform.
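
As a rough sketch of what "portable by default" could look like from
the caller's side, a program can accept either type and convert at the
boundary with os.fsdecode() and os.fsencode(). The helper listdir_any
is made up for illustration. On Windows the conversion back to bytes
can still fail for names the filesystem encoding cannot represent, so
this narrows the problem rather than removing it.

```python
import os


def listdir_any(path):
    """List a directory given str or bytes, returning names of the same type.

    Hypothetical helper, not part of the os module.
    """
    want_bytes = isinstance(path, bytes)
    # Always do the real work with str, the native type on Windows.
    names = os.listdir(os.fsdecode(path))
    if want_bytes:
        return [os.fsencode(name) for name in names]
    return names
```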

Paul

