Concatenating files in order
Tim Chase
python.list at tim.thechases.com
Tue May 23 16:46:52 EDT 2017
On 2017-05-23 19:29, Mahmood Naderan via Python-list wrote:
> There are some text files ending with _chunk_i where 'i' is an
> integer. For example,
>
> XXX_chunk_0
> XXX_chunk_1
> ...
>
> I want to concatenate them in order. Thing is that the total number
> of files may be variable. Therefore, I can not specify the number
> in my python script. It has to be "for all files ending with
> _chunk_i".
>
> Next, I can write
>
> with open('final.txt', 'w') as outf:
>     for fname in filenames:
>         with open(fname) as inf:
>             for line in inf:
>                 outf.write(line)
>
>
> How can I specify the "filenames"?
Does the *file* or the *filename* end in _chunk_i? If it's the
filename and the names already come in order, you can just skip the
ones that don't match:
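with open('final.txt', 'w') as outf:
    for fname in filenames:
        # keep only names ending in "_chunk_<digits>"; skip everything else
        parts = fname.split('_')
        if len(parts) >= 2 and parts[-2] == "chunk" and parts[-1].isdigit():
            with open(fname) as inf:
                for line in inf:
                    outf.write(line)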
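A caveat on "in order": if the names come from something like a sorted directory listing, a plain string sort is not numeric order once i reaches 10. A quick illustration with made-up names (the list below is hypothetical):

```python
# Hypothetical filenames, deliberately listed out of order.
names = ["XXX_chunk_10", "XXX_chunk_2", "XXX_chunk_1", "XXX_chunk_0"]

# A plain sort compares strings character by character, so "10" < "2".
lexical = sorted(names)

# Keying on the integer after the last "_" gives true numeric order.
numeric = sorted(names, key=lambda v: int(v.rsplit("_", 1)[-1]))

print(lexical)   # ['XXX_chunk_0', 'XXX_chunk_1', 'XXX_chunk_10', 'XXX_chunk_2']
print(numeric)   # ['XXX_chunk_0', 'XXX_chunk_1', 'XXX_chunk_2', 'XXX_chunk_10']
```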
If they're not sorted, you'd have to sort & filter them first. I'd
recommend a sorting & filtering generator:
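import re

# raw string, so "\d" reaches the regex engine without an escape warning
interesting_re = re.compile(r'chunk_(\d+)$', re.I)

def filter_and_sort(filenames):
    # keep only matching names, then order them by the trailing integer
    yield from sorted(
        (fname for fname in filenames if interesting_re.search(fname)),
        key=lambda v: int(v.rsplit('_', 1)[-1]),
    )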
with open('final.txt', 'w') as outf:
    for fname in filter_and_sort(filenames):
        with open(fname) as inf:
            for line in inf:
                outf.write(line)
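For the "how do I get filenames" part, here's a self-contained sketch that globs one directory for candidates, filters them with a regex, and sorts numerically. It assumes all chunks sit in a single directory and that the output file's own name doesn't match the pattern; it also copies raw bytes with shutil.copyfileobj instead of line by line. The names chunk_files, concatenate and demo_concatenate are hypothetical.

```python
import glob
import os
import re
import shutil
import tempfile

# Matches names ending in "_chunk_<digits>" and captures the number.
CHUNK_RE = re.compile(r"_chunk_(\d+)$")


def chunk_files(directory):
    """Return paths in `directory` ending in _chunk_<i>, sorted by i as an int."""
    found = []
    pattern = os.path.join(glob.escape(directory), "*_chunk_*")
    for path in glob.glob(pattern):
        m = CHUNK_RE.search(os.path.basename(path))
        if m:  # the glob alone would also accept e.g. "XXX_chunk_notes"
            found.append((int(m.group(1)), path))
    found.sort()
    return [path for _, path in found]


def concatenate(paths, out_path):
    """Copy each file's bytes into out_path, in the order given."""
    with open(out_path, "wb") as outf:
        for path in paths:
            with open(path, "rb") as inf:
                shutil.copyfileobj(inf, outf)


def demo_concatenate():
    """Build a throwaway directory of chunk files and return the joined text."""
    with tempfile.TemporaryDirectory() as d:
        for i in (10, 2, 0, 1):
            with open(os.path.join(d, f"XXX_chunk_{i}"), "w") as f:
                f.write(f"line {i}\n")
        # Picked up by the glob but rejected by the regex.
        with open(os.path.join(d, "XXX_chunk_notes"), "w") as f:
            f.write("not a chunk\n")
        out = os.path.join(d, "final.txt")
        concatenate(chunk_files(d), out)
        with open(out) as f:
            return f.read()


if __name__ == "__main__":
    print(demo_concatenate(), end="")
```

Copying bytes sidesteps any encoding or newline translation, which seems reasonable when the goal is just to glue files end to end.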
If the "chunk_i" is *content* in the file, it's a good bit more work:
you'd have to search through all the files for the data, note which
file contains which tag, then reopen (or seek(0)) each file and write
them out in order. You'd also have to handle the edge case where a
file contains more than one "chunk_i" and those chunks straddle
chunks held in other files.
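A rough two-pass sketch of that content-based case, under loudly stated assumptions: the tag is a line consisting solely of "chunk_<digits>" (a hypothetical format), each file has at most one tag, untagged files are dropped, and the straddling case is not handled. The helper names order_by_tag, write_in_tag_order and demo_tags are hypothetical.

```python
import os
import re
import tempfile

# Hypothetical tag format: a line consisting solely of "chunk_<digits>".
TAG_RE = re.compile(r"^chunk_(\d+)$")


def order_by_tag(paths):
    """First pass: find the chunk number tagged inside each file.

    Assumes at most one tag per file; files with no tag are left out.
    """
    tagged = []
    for path in paths:
        with open(path) as inf:
            for line in inf:
                m = TAG_RE.match(line.strip())
                if m:
                    tagged.append((int(m.group(1)), path))
                    break  # stop at the first tag in this file
    tagged.sort()
    return [path for _, path in tagged]


def write_in_tag_order(paths, out_path):
    """Second pass: reopen each tagged file and copy it out in tag order."""
    with open(out_path, "w") as outf:
        for path in order_by_tag(paths):
            with open(path) as inf:
                for line in inf:
                    outf.write(line)


def demo_tags():
    """Files whose names carry no ordering; the order comes from the tags."""
    contents = {"a.txt": "chunk_2\nthird\n", "b.txt": "chunk_0\nfirst\n",
                "c.txt": "chunk_1\nsecond\n", "d.txt": "no tag here\n"}
    with tempfile.TemporaryDirectory() as d:
        for name, text in contents.items():
            with open(os.path.join(d, name), "w") as f:
                f.write(text)
        out = os.path.join(d, "final.txt")
        write_in_tag_order([os.path.join(d, n) for n in contents], out)
        with open(out) as f:
            return f.read()
```

The tag lines themselves are copied through along with the rest of each file; stripping them would be a one-line change if that's what you want.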
-tkc