First two bytes of 'stdout' are lost
Cameron Simpson
cs at cskk.id.au
Thu Apr 11 17:55:55 EDT 2024
On 11Apr2024 14:42, Olivier B. <perso.olivier.barthelemy at gmail.com> wrote:
>I am trying to use StringIO to capture stdout, in code that looks like this:
>
>import sys
>from io import StringIO
>old_stdout = sys.stdout
>sys.stdout = mystdout = StringIO()
>print("patate")
>mystdout.seek(0)
>sys.stdout = old_stdout
>print(mystdout.read())
>
>Well, it is not exactly like this, since this works properly
Aye, I just tried that. All good.
>This code is actually run from C++ using the C Python API.
>This worked quite well, so the code was right at some point. But now,
>two things changed:
> - Now using python 3.11.7 instead of 3.7.12
> - Now using only the python limited C API
Maybe you should post the code then: the exact Python code and the exact
C++ code.
>And it seems that now, mystdout.read() always misses the first two
>characters that have been written to stdout.
>
>My first idea was something related to the BOM being improperly truncated
>at some point, but I am manipulating UTF-8, so the BOM would be 3
>bytes, not 2.
I didn't think UTF-8 needed a BOM. Someone will doubtless correct me.
However, does the `mystdout.read()` code _know_ you're using UTF-8? I
have the vague impression that e.g. some Windows systems default to UTF-16
of some flavour, possibly _with_ a BOM.
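For what it's worth, the BOM sizes are easy to check from Python itself. This is only a sketch using the stdlib `codecs` constants; the variable names are mine, and it says nothing about what your C++ side is actually doing:

```python
import codecs

# Standard-library BOM constants; lengths are counted in bytes.
boms = {
    "utf-8": codecs.BOM_UTF8,
    "utf-16-le": codecs.BOM_UTF16_LE,
    "utf-16-be": codecs.BOM_UTF16_BE,
}
bom_lengths = {name: len(bom) for name, bom in boms.items()}
print(bom_lengths)

# The generic "utf-16" codec writes a 2-byte BOM in front of the data.
utf16_bytes = "patate".encode("utf-16")
starts_with_bom = utf16_bytes[:2] in (codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)

# "utf-8-sig" strips a leading UTF-8 BOM; plain "utf-8" keeps it as U+FEFF.
data = codecs.BOM_UTF8 + "patate".encode("utf-8")
with_sig = data.decode("utf-8-sig")
plain = data.decode("utf-8")
```

So a 2-byte loss would at least be the right size for a UTF-16 BOM, which is why I ask about the encoding.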
I'm suggesting that you rigorously check that the bytes->text bits know
what text encoding they're using. If you've left an encoding out
anywhere, put it in explicitly.
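As a rough illustration of what I mean by being explicit, here is a pure Python sketch. The `BytesIO` buffer is just my stand-in for whatever byte-level storage sits under your capture; it is an assumption, not your actual setup:

```python
import io

# Hypothetical byte buffer standing in for the real capture target.
raw = io.BytesIO()

# Name the encoding on the way in...
text_out = io.TextIOWrapper(raw, encoding="utf-8", newline="\n")
text_out.write("patate\n")
text_out.flush()  # push the text through to the underlying bytes

# ...and name the same encoding on the way out.
captured_bytes = raw.getvalue()
captured_text = captured_bytes.decode("utf-8")
print(repr(captured_text))
```

If both ends state the same encoding, nothing should go missing from the front of the text.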
>Hopefully someone has a clue on what would have changed in Python for
>this to stop working compared to python 3.7?
None at all, alas. My experience with the Python C API is very limited.