Changing strings in files
Cameron Simpson
cs at cskk.id.au
Tue Nov 10 02:37:54 EST 2020
On 10Nov2020 07:24, Manfred Lotz <ml_news at posteo.de> wrote:
>I have a situation where in a directory tree I want to change a certain
>string in all files where that string occurs.
>
>My idea was to do
>
>- os.scandir and for each file
Use os.walk for trees. scandir does a single directory.
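
To make the difference concrete, a rough sketch. The helper names list_one_level and list_tree are made up for this example; only os.scandir and os.walk are real:

```python
import os

def list_one_level(top):
    """Names of files directly inside top; os.scandir does not recurse."""
    with os.scandir(top) as entries:
        return sorted(e.name for e in entries if e.is_file())

def list_tree(top):
    """Paths of all non-directory entries below top, relative to top."""
    found = []
    # os.walk visits top and every subdirectory beneath it
    for dirpath, dirnames, filenames in os.walk(top):
        for filename in filenames:
            found.append(os.path.relpath(os.path.join(dirpath, filename), top))
    return sorted(found)
```

For a tree holding a.txt and sub/b.txt, the first returns only a.txt while the second returns both paths.
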
> - check if a file is a text file
This requires reading the entire file. You want to check that it
consists entirely of lines of text in your expected text encoding.
These days UTF-8 is the common default, but getting this correct is
essential if you want to recognise text. So as a first cut, totally
untested:
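    import os

    top_dirpath = '.'  # placeholder: set this to the top of your tree

    # os.walk yields (dirpath, dirnames, filenames) for each directory
    for dirpath, dirnames, filenames in os.walk(top_dirpath):
        for filename in filenames:
            filepath = os.path.join(dirpath, filename)
            is_text = False
            try:
                # expect utf-8, fail if non-utf-8 bytes encountered
                with open(filepath, encoding='utf-8', errors='strict') as f:
                    for lineno, line in enumerate(f, 1):
                        # ... other checks on each line of the file ...
                        if not line.endswith('\n'):
                            raise ValueError("line %d: no trailing newline" % lineno)
                        # note: isprintable() is False for tabs, so a
                        # tab-indented file will be rejected here
                        if not line[:-1].isprintable():
                            raise ValueError("line %d: not all printable" % lineno)
                # if we get here all checks passed, consider the file to
                # be text
                is_text = True
            except Exception as e:
                print(filepath, "not text", e)
            if not is_text:
                print("skip", filepath)
                continue
            # ... the file passed the checks, process it here ...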
You could add all sorts of other checks. "text" is a loosely defined
idea. But you could assert: all these lines decoded cleanly, so I can't
do much damage rewriting them.
> - if it is not a text file skip that file
> - change the string as often as it occurs in that file
You could, above, gather up all the lines in the file in a list. If you
get through, replace your string in the list and if anything was
changed, rewrite the file from the list of lines.
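
As a sketch of that gather, replace and rewrite step, assuming the file has already passed the text check and is UTF-8. The function name replace_in_text_file is made up for illustration, and like the code above this is a first cut:

```python
def replace_in_text_file(path, old, new, encoding='utf-8'):
    """Replace old with new in the file at path; return True if rewritten."""
    # newline='' keeps the original line endings instead of translating them
    with open(path, encoding=encoding, errors='strict', newline='') as f:
        lines = list(f)
    new_lines = [line.replace(old, new) for line in lines]
    if new_lines == lines:
        return False  # nothing changed, leave the file untouched
    with open(path, 'w', encoding=encoding, newline='') as f:
        f.writelines(new_lines)
    return True
```

Because it works line by line, it will not find a string that spans a line break. It also rewrites the file in place, so an interruption partway through the write can leave a truncated file; writing to a temporary file and renaming it over the original would be safer.
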
>What is the best way to check if a file is a text file? In a script I
>could use the `file` command which is not ideal as I have to grep the
>result.
Not to mention relying on file, which (a) has a simple idea of text and
(b) only looks at the start of each file, not the whole content. Very
dodgy.
If you're really batch editing files, you could (a) put everything into
a VCS (eg hg or git) so you can roll back changes or (b) work on a copy
of your directory tree or (c) just print the "text" filenames to stdout
and pipe that into GNU parallel, invoking "sed -i.bak s/this/that/g" to
batch edit the checked files, keeping a backup.
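
If you would rather stay in Python than shell out to sed, something like the following keeps a backup in the same spirit as "sed -i.bak". It is only a sketch: rewrite_with_backup is a made-up name, and it does a literal string replacement, not a regular expression substitution like sed's s/this/that/g:

```python
import shutil

def rewrite_with_backup(path, old, new, suffix='.bak', encoding='utf-8'):
    """Replace old with new in path, keeping the original as path + suffix."""
    # newline='' keeps the original line endings instead of translating them
    with open(path, encoding=encoding, errors='strict', newline='') as f:
        text = f.read()
    if old not in text:
        return False  # nothing to do, so no backup is made either
    # copy first, so the original content survives a failed rewrite
    shutil.copy2(path, path + suffix)
    with open(path, 'w', encoding=encoding, newline='') as f:
        f.write(text.replace(old, new))
    return True
```

Note that an existing .bak file of the same name gets overwritten.
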
Cheers,
Cameron Simpson <cs at cskk.id.au>