[issue21927] BOM appears in stdin when using Powershell

eryksun report at bugs.python.org
Wed Jul 16 19:43:08 CEST 2014


eryksun added the comment:

> PS C:\Users\jaraco> echo £ | py -3 -c "import sys; print(repr(sys.stdin.buffer.read()))"
> b'?\r\n'

> Curiously, it appears as if powershell is actually receiving 
> a question mark from the pipe.

PowerShell calls ReadConsoleW to read the console input buffer, so it reads "£" as a wide character from the command line. The default encoding when writing to the pipe should be ASCII [*]. If that's the case, it explains the question mark that Python reads from stdin: "?" is the default replacement character (WC_DEFAULTCHAR) that WideCharToMultiByte substitutes for a character it can't represent in the target encoding.

[*] http://blogs.msdn.com/b/powershell/archive/2006/12/11/outputencoding-to-the-rescue.aspx

You can change PowerShell's output encoding to match the console:

    $OutputEncoding = [Console]::OutputEncoding

If the console codepage is 65001, the above is equivalent to setting 

    $OutputEncoding = [System.Text.Encoding]::UTF8

http://msdn.microsoft.com/en-us/library/system.text.encoding.utf8

As Victor mentioned, this setting always writes a BOM, and under codepage 65001 it actually writes two BOMs (at least in PowerShell 2). Victor also mentioned that you can avoid the BOM by passing $False to the UTF8Encoding constructor:

    $OutputEncoding = New-Object System.Text.UTF8Encoding($False)

http://msdn.microsoft.com/en-us/library/system.text.utf8encoding

Even with the BOM disabled this way, there's still one BOM under codepage 65001, but maybe that's fixed in PowerShell 3.
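
If the BOM can't be suppressed on the PowerShell side, the reading script can drop it instead. The following is a minimal sketch, not a recommended fix for the issue; strip_utf8_boms is a hypothetical helper name, and the input is simulated rather than read from a real pipe. Note that decoding with "utf-8-sig" removes only the first BOM, which is why the loop is used here for the two-BOM case.

```python
import codecs

def strip_utf8_boms(data: bytes) -> bytes:
    """Remove every leading UTF-8 BOM (EF BB BF) from data."""
    bom = codecs.BOM_UTF8
    while data.startswith(bom):
        data = data[len(bom):]
    return data

# Simulated pipe input carrying two BOMs, as described for codepage 65001.
raw = codecs.BOM_UTF8 * 2 + "\u00a3\r\n".encode("utf-8")
cleaned = strip_utf8_boms(raw)
print(repr(cleaned))
```

In a real script, raw would come from sys.stdin.buffer.read() instead of the simulated bytes.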

I avoid setting the console to codepage 65001 anyway. ReadFile/WriteFile incorrectly return the number of characters read/written instead of the number of bytes because the call is actually handled by ReadConsoleA/WriteConsoleA. Maybe that's finally fixed in Windows 8.
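
To illustrate why a character count is a problem for callers that expect a byte count, here is the arithmetic for a single non-ASCII character. This sketch doesn't call ReadFile/WriteFile at all; it only shows how the two counts diverge for UTF-8 data.

```python
# One character, two bytes in UTF-8: a caller that asked to write 2 bytes
# and is told "1 written" would wrongly conclude the write was partial.
text = "\u00a3"
encoded = text.encode("utf-8")
chars = len(text)      # what the console call is described as reporting
nbytes = len(encoded)  # what a ReadFile/WriteFile caller expects
print(chars, nbytes)
```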

----------
nosy: +eryksun

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue21927>
_______________________________________