<html><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8"></head><body><div><div style="font-family: Calibri,sans-serif; font-size: 11pt;">Hmm, doesn't seem to be explicitly listed as a deprecation, though discussion from around that time makes it clear that everyone thought it was.<br><br>I also found this proposal to use strict mbcs to decode bytes for use against the file system, which is basically the same as what I'm proposing now apart from the more limited encoding: https://mail.python.org/pipermail/python-dev/2011-October/114203.html<br><br>It definitely results in less C code to maintain if we do the decode ourselves. We could use strict mbcs, but I'd leave the deprecation warnings in there. Or perhaps we provide an env var to use mbcs as the file system encoding but default to utf8 (I already have one for selecting legacy console encoding)? Callers should be asking the sys module for the encoding anyway, so I'd expect few libraries to be impacted, though applications might prefer it.<br><br>Top-posted from my Windows Phone</div></div><div dir="ltr"><hr><span style="font-family: Calibri,sans-serif; font-size: 11pt; font-weight: bold;">From: </span><span style="font-family: Calibri,sans-serif; font-size: 11pt;"><a href="mailto:p.f.moore@gmail.com">Paul Moore</a></span><br><span style="font-family: Calibri,sans-serif; font-size: 11pt; font-weight: bold;">Sent: </span><span style="font-family: Calibri,sans-serif; font-size: 11pt;">8/16/2016 3:54</span><br><span style="font-family: Calibri,sans-serif; font-size: 11pt; font-weight: bold;">To: </span><span style="font-family: Calibri,sans-serif; font-size: 11pt;"><a href="mailto:ncoghlan@gmail.com">Nick Coghlan</a></span><br><span style="font-family: Calibri,sans-serif; font-size: 11pt; font-weight: bold;">Cc: </span><span style="font-family: Calibri,sans-serif; font-size: 11pt;"><a href="mailto:python-ideas@python.org">python-ideas</a></span><br><span style="font-family: Calibri,sans-serif; 
font-size: 11pt; font-weight: bold;">Subject: </span><span style="font-family: Calibri,sans-serif; font-size: 11pt;">Re: [Python-ideas] Fix default encodings on Windows</span><br><br></div>On 15 August 2016 at 19:26, Steve Dower &lt;steve.dower@python.org&gt; wrote:<br>> Passing path_as_bytes in that location has been deprecated since 3.3, so we<br>> are well within our rights (and probably overdue) to make it a TypeError in<br>> 3.6. While it's obviously an invalid assumption, for the purposes of<br>> changing the language we can assume that no existing code is passing bytes<br>> into any functions where it has been deprecated.<br>><br>> As far as I'm concerned, there are currently no filesystem APIs on Windows<br>> that accept paths as bytes.<br><br>[...]<br><br>On 16 August 2016 at 03:00, Nick Coghlan &lt;ncoghlan@gmail.com&gt; wrote:<br>> The problem is that bytes-as-paths actually *does* work for Mac OS X<br>> and systemd based Linux distros properly configured to use UTF-8 for<br>> OS interactions. 
This means that a lot of backend network service code<br>> makes that assumption, especially when it was originally written for<br>> Python 2, and rather than making it work properly on Windows, folks<br>> just drop Windows support as part of migrating to Python 3.<br>><br>> At an ecosystem level, that means we're faced with a choice between<br>> implicitly encouraging folks to make their code *nix only, and finding<br>> a way to provide a more *nix like experience when running on Windows<br>> (where UTF-8 encoded binary data just works, and either other<br>> encodings lead to mojibake or else you use chardet to figure things<br>> out).<br>><br>> Steve is suggesting that the latter option is preferable, a view I<br>> agree with since it lowers barriers to entry for Windows based<br>> developers to contribute to primarily *nix focused projects.<br><br>So does this mean that you're recommending reverting the deprecation<br>of bytes as paths in favour of documenting that bytes as paths is<br>acceptable, but it will require an encoding of UTF-8 rather than the<br>current behaviour? If so, that raises some questions:<br><br>1. Is it OK to backtrack on a deprecation by changing the behaviour<br>like this? (I think it is, but others who rely on the current,<br>deprecated, behaviour may not).<br>2. Should we be making "always UTF-8" the behaviour on all platforms,<br>rather than just Windows (e.g., Unix systems which haven't got UTF-8<br>as their locale setting)? 
This doesn't seem to be a Windows-specific<br>question any more (I'm assuming that if bytes-as-paths are deprecated,<br>that's a cross-platform change, but see below).<br><br>Having said all this, I can't find the documentation stating that<br>bytes paths are deprecated - the open() documentation for 3.5 says<br>"file is either a string or bytes object giving the pathname (absolute<br>or relative to the current working directory) of the file to be opened<br>or an integer file descriptor of the file to be wrapped" and there's<br>no mention of a deprecation. Steve - could you provide a reference?<br><br>Paul<br>_______________________________________________<br>Python-ideas mailing list<br>Python-ideas@python.org<br>https://mail.python.org/mailman/listinfo/python-ideas<br>Code of Conduct: http://python.org/psf/codeofconduct/<br></body></html>