read a file and remove Mojibake chars
Peter Otten
__peter__ at web.de
Thu Apr 7 05:49:33 EDT 2016
Daiyue Weng wrote:
> Hi, when I read a file, the resulting string contains Mojibake chars at the
> beginning. The code is like this:
>
> file_str = open(file_path, 'r', encoding='utf-8').read()
> print(repr(open(file_path, 'r', encoding='utf-8').read()))
>
> Part of the printed string containing the Mojibake chars looks like this:
>
> '锘縶\n "name": "__NAME__"'
>
> I tried to remove the non-UTF-8 chars using this code:
>
> def read_config_file(fname):
>     with open(fname, "r", encoding='utf-8') as fp:
>         for line in fp:
>             line = line.strip()
>             line = line.decode('utf-8','ignore').encode("utf-8")
>
>         return fp.read()
>
> but it doesn't work. How can I remove the Mojibake in this case?
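Separately from the encoding question, the quoted attempt has two problems in Python 3. A file opened in text mode yields `str` lines, and `str` has no `.decode()` method, so that line raises AttributeError. And the `for` loop consumes the whole file, so the final `fp.read()` would return an empty string anyway. A small self-contained check, using a hypothetical sample file in a temporary directory:

```python
import os
import tempfile

# Hypothetical sample contents standing in for the real config file.
sample = '{\n "name": "__NAME__"\n}\n'

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "config.json")  # hypothetical file name
    with open(path, "w", encoding="utf-8") as fp:
        fp.write(sample)

    with open(path, "r", encoding="utf-8") as fp:
        first = next(fp)                       # text mode yields str objects
        has_decode = hasattr(first, "decode")  # str has no .decode() in Python 3
        for _ in fp:                           # the loop consumes every line...
            pass
        leftover = fp.read()                   # ...so read() finds nothing left

print(type(first).__name__, has_decode, repr(leftover))
# prints: str False ''
```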
I'd first investigate whether the file can be decoded correctly using an
encoding other than UTF-8. If it's really hopeless and your best bet is to
ignore all non-ASCII characters, try:
def read_config_file(fname):
    with open(fname, "r", encoding="ascii", errors="ignore") as f:
        return f.read()
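One way to carry out that investigation, sketched under an assumption: the leading '锘縶' is consistent with a UTF-8 file that started with a byte order mark (EF BB BF) followed by '{', was decoded as GBK, and was then saved again as UTF-8. If so, the '{' has been absorbed into '縶', which would explain why it is missing from the repr, and ignoring non-ASCII characters would drop it too. The helper `repair_mojibake` and its list of candidate codecs are hypothetical. It re-encodes the text with each suspected wrong codec and decodes the resulting bytes as UTF-8:

```python
def repair_mojibake(text, wrong_codecs=("gbk", "cp1252", "latin-1")):
    """Hypothetical helper: undo UTF-8 bytes that were decoded with the wrong
    codec and saved again. Returns (codec, repaired) for the first codec that
    round-trips cleanly, or (None, text) if none does."""
    for codec in wrong_codecs:
        try:
            # Recover the original bytes with the suspected wrong codec, then
            # decode them as UTF-8; 'utf-8-sig' also drops a leading BOM.
            return codec, text.encode(codec).decode("utf-8-sig")
        except UnicodeError:
            continue  # this codec cannot explain the damage; try the next one
    return None, text


# Simulate the assumed damage: a UTF-8 file with a BOM, decoded as GBK.
original = '\ufeff{\n "name": "__NAME__"'
damaged = original.encode("utf-8").decode("gbk")

codec, fixed = repair_mojibake(damaged)
print(codec, repr(fixed))
```

In real use you would pass the text read with encoding='utf-8'. A clean round trip is evidence, not proof: pure-ASCII text round-trips under every codec listed, so inspect the result before rewriting the file.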