fileinput
patatetom at gmail.com
Tue Oct 29 05:27:56 EDT 2019
On Monday, 28 October 2019 at 11:48:29 UTC+1, Peter J. Holzer wrote:
> On 2019-10-25 22:12:23 +0200, Pascal wrote:
> > for line in fileinput.input(source):
> >     print(line.strip())
> >
> > -----------------------
> >
> > python3.7.4 myscript.py myfile.log
> > Traceback (most recent call last):
> > ...
> > UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe8 in position 799:
> > invalid continuation byte
> [...]
> > for line in fileinput.input(source,
> >         openhook=fileinput.hook_encoded("utf-8", "ignore")):
> >     print(line.strip())
>
> The file you were trying to read was obviously not encoded in UTF-8,
> since you got a decode error.
>
> So the first question you should ask is:
>
> Is it supposed to be encoded in UTF-8 (and just corrupted) or is it
> supposed to be encoded in something else (e.g. iso-8859-1 or win-1252)?
>
> If it is supposed to be in UTF-8 but may contain errors, ignoring errors
> may be reasonable.
>
> If it is supposed to be something else, determine what that "something
> else" actually is, and use that.
>
> hp
>
> --
> _ | Peter J. Holzer | we build much bigger, better disasters now
> |_|_) | | because we have much more sophisticated
> | | | hjp at hjp.at | management tools.
> __/ | http://www.hjp.at/ | -- Ross Anderson <https://www.edge.org/>
You're right: the log file came from Windows and was encoded in iso-8859-1. But my question was about why the result differs between reading from a file and reading from stdin.
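
For reference, here is a minimal sketch of reading such a file with its actual encoding rather than ignoring UTF-8 errors. The helper names read_lines and decode_stream and the sample bytes are made up for illustration; only fileinput.input, fileinput.hook_encoded and io.TextIOWrapper come from the standard library. As far as I can tell, fileinput does not call the openhook for standard input and uses sys.stdin as it is, so the second helper decodes a binary stream (for example sys.stdin.buffer) explicitly. Treat that explanation as an assumption, not as a confirmed answer to the question above.

```python
import fileinput
import io
import os
import tempfile


def read_lines(path, encoding="iso-8859-1"):
    """Return the stripped lines of the file at `path`, decoded with `encoding`."""
    hook = fileinput.hook_encoded(encoding)
    with fileinput.input(files=[path], openhook=hook) as stream:
        return [line.strip() for line in stream]


def decode_stream(binary_stream, encoding="iso-8859-1"):
    """Return the stripped lines of a binary stream, e.g. sys.stdin.buffer."""
    text = io.TextIOWrapper(binary_stream, encoding=encoding)
    return [line.strip() for line in text]


# Hypothetical sample data: byte 0xe8 is "è" in ISO-8859-1,
# but here it is not part of a valid UTF-8 sequence.
sample = b"premi\xe8re ligne\nsecond line\n"

with tempfile.NamedTemporaryFile(suffix=".log", delete=False) as tmp:
    tmp.write(sample)
    sample_path = tmp.name

file_lines = read_lines(sample_path)
os.remove(sample_path)

# Stand-in for sys.stdin.buffer so the sketch runs without piped input.
stream_lines = decode_stream(io.BytesIO(sample))

print(file_lines)
print(stream_lines)
```

When the data really arrives on stdin, something like decode_stream(sys.stdin.buffer) (after import sys) should give the same lines as the file case, whatever encoding the locale makes sys.stdin use.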