Changing strings in files
Cameron Simpson
cs at cskk.id.au
Tue Nov 10 17:55:26 EST 2020
On 11Nov2020 07:25, Chris Angelico <rosuav at gmail.com> wrote:
>If the main job of the program, as in this situation, is to read the
>entire file, I would probably have it read in the first 1KB or 16KB or
>thereabouts, see if that has any NUL bytes, and if not, proceed to
>read in the rest of the file. But depending on the situation, I might
>actually have a hard limit on the file size (say, "any file over 1GB
>isn't what I'm looking for"), so that would reduce the risks too.
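For reference, the sniffing approach described above might look roughly like the sketch below. The function name `should_skip`, the 16KB sample size and the optional `max_size` limit are assumptions chosen for illustration, not something taken from the earlier posts.

```python
import os

def should_skip(filename, sample_size=16 * 1024, max_size=None):
    """Return True if the file looks binary or is larger than max_size.

    Hypothetical helper: only the first sample_size bytes are read and
    checked for a NUL byte, so a huge binary file is rejected cheaply.
    """
    # Optional hard limit on file size, checked before reading anything.
    if max_size is not None and os.path.getsize(filename) > max_size:
        return True
    with open(filename, 'rb') as f:
        sample = f.read(sample_size)
    # A NUL byte in the sample is treated as a sign of binary data.
    return b'\0' in sample
```

A file which passes this check can still fail a stricter test later; this is only a cheap first filter.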
You could adapt my suggested code to do this efficiently.
It had a loop body like this:
    is_text = False
    try:
        # expect utf-8, fail if non-utf-8 bytes encountered
        with open(filename, encoding='utf-8', errors='strict') as f:
            for lineno, line in enumerate(f, 1):
                # ... other checks on each line of the file ...
                if not line.endswith('\n'):
                    raise ValueError("line %d: no trailing newline" % lineno)
                if not str.isprintable(line[:-1]):
                    raise ValueError("line %d: not all printable" % lineno)
        # if we get here all checks passed, consider the file to
        # be text
        is_text = True
    except Exception as e:
        print(filename, "not text", e)
    if not is_text:
        print("skip", filename)
        continue
which scans the entire file to see if it is all text (criteria to be
changed to suit the user, but I was going for clean strict utf-8 decode,
all chars "printable"). Since we're doing that, we could accumulate the
lines as we went and make the replacement in memory. If we get all the
way out the bottom, rewrite the file.
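A minimal sketch of that in-memory variant, assuming a plain `str.replace` substitution. The function name `replace_in_text_file` and its `old`/`new` parameters are made up for illustration, and the checks are the same strict ones as above (so a file containing TAB characters, for instance, is rejected):

```python
def replace_in_text_file(filename, old, new):
    """Replace old with new in filename if the whole file passes the checks.

    Returns True if the file was rewritten, False if it was left alone.
    """
    new_lines = []
    changed = False
    try:
        # Default newline handling: line endings are read as '\n'.
        with open(filename, encoding='utf-8', errors='strict') as f:
            for lineno, line in enumerate(f, 1):
                if not line.endswith('\n'):
                    raise ValueError("line %d: no trailing newline" % lineno)
                if not line[:-1].isprintable():
                    raise ValueError("line %d: not all printable" % lineno)
                # Accumulate the (possibly modified) line as we go.
                new_line = line.replace(old, new)
                if new_line != line:
                    changed = True
                new_lines.append(new_line)
    except (OSError, ValueError) as e:
        # UnicodeDecodeError is a subclass of ValueError.
        print(filename, "not text", e)
        return False
    if not changed:
        return False
    # Every check passed and something changed: rewrite the file.
    with open(filename, 'w', encoding='utf-8') as f:
        f.writelines(new_lines)
    return True
```

The original is only opened for writing after every line has passed, so a file which fails partway through is left untouched.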
If memory is a concern, we could copy the modified lines to a temporary
file, and copy that back over the original if everything was good (or
skip the copy if we made no replacements).
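Roughly like this, again as a sketch only: `replace_via_tempfile` is a hypothetical name, the substitution is assumed to be a plain `str.replace`, and the copy back uses `shutil.copyfile` so that the original file keeps its own permissions.

```python
import os
import shutil
import tempfile

def replace_via_tempfile(filename, old, new):
    """Like the in-memory version, but holds only one line at a time.

    Returns True if the file was rewritten, False if it was left alone.
    """
    changed = False
    # delete=False so that we can reopen the file by name for the copy.
    tmp = tempfile.NamedTemporaryFile('w', encoding='utf-8', delete=False)
    try:
        with tmp, open(filename, encoding='utf-8', errors='strict') as f:
            for lineno, line in enumerate(f, 1):
                if not line.endswith('\n'):
                    raise ValueError("line %d: no trailing newline" % lineno)
                if not line[:-1].isprintable():
                    raise ValueError("line %d: not all printable" % lineno)
                new_line = line.replace(old, new)
                if new_line != line:
                    changed = True
                tmp.write(new_line)
        # Both files are closed here; copy back only if something changed.
        if changed:
            shutil.copyfile(tmp.name, filename)
        return changed
    except (OSError, ValueError) as e:
        print(filename, "not text", e)
        return False
    finally:
        # Always remove the temporary file, whatever happened above.
        os.unlink(tmp.name)
```

The temporary file costs an extra pass over the data when a replacement is made, in exchange for not holding the whole file in memory.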
Cheers,
Cameron Simpson <cs at cskk.id.au>
More information about the Python-list
mailing list