fileinput
Peter J. Holzer
hjp-python at hjp.at
Mon Oct 28 06:48:11 EDT 2019
On 2019-10-25 22:12:23 +0200, Pascal wrote:
> for line in fileinput.input(source):
>     print(line.strip())
>
> -----------------------
>
> python3.7.4 myscript.py myfile.log
> Traceback (most recent call last):
> ...
> UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe8 in position 799:
> invalid continuation byte
[...]
> for line in fileinput.input(source,
>         openhook=fileinput.hook_encoded("utf-8", "ignore")):
>     print(line.strip())
The file you were trying to read was obviously not encoded in UTF-8,
since you got a decode error.
So the first question you should ask is:
Is it supposed to be encoded in UTF-8 (and just corrupted) or is it
supposed to be encoded in something else (e.g. iso-8859-1 or win-1252)?
If it is supposed to be in UTF-8 but may contain errors, ignoring errors
may be reasonable.
If it is supposed to be something else, determine what that "something
else" actually is, and use that.
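
A rough sketch of how one might go about that, in case it helps. The
helper names (guess_encoding, read_lines) and the candidate list are
made up for illustration; only fileinput.input() and
fileinput.hook_encoded() come from the standard library. Note that a
successful decode does not prove the guess is right: iso-8859-1 accepts
every byte sequence, so it only serves as a last resort here, and the
result should still be checked by eye.

```python
import fileinput

# Hypothetical candidate list; put the most plausible encodings first.
CANDIDATES = ("utf-8", "cp1252", "iso-8859-1")


def guess_encoding(path, candidates=CANDIDATES):
    """Return the first candidate that decodes the whole file without error."""
    with open(path, "rb") as f:
        data = f.read()
    for enc in candidates:
        try:
            data.decode(enc)  # strict decoding: raises on any invalid byte
        except UnicodeDecodeError:
            continue
        return enc
    return None


def read_lines(path, encoding):
    """Read stripped lines through fileinput with an explicit encoding."""
    hook = fileinput.hook_encoded(encoding)
    with fileinput.input(path, openhook=hook) as f:
        return [line.strip() for line in f]
```

With a file containing the byte 0xe8 followed by a newline, utf-8 is
rejected and cp1252 is the first candidate that fits, so the byte comes
out as "è" instead of being silently dropped as it would be with
errors="ignore".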
hp
--
_ | Peter J. Holzer | we build much bigger, better disasters now
|_|_) | | because we have much more sophisticated
| | | hjp at hjp.at | management tools.
__/ | http://www.hjp.at/ | -- Ross Anderson <https://www.edge.org/>