generating list of files matching condition
Seb
spluque at gmail.com
Wed Nov 23 22:58:16 EST 2016
Hello,
Given a list of files:
In [81]: ec_files[0:10]
Out[81]:
[u'EC_20160604002000.csv',
u'EC_20160604010000.csv',
u'EC_20160604012000.csv',
u'EC_20160604014000.csv',
u'EC_20160604020000.csv']
where the numbers are a timestamp with format %Y%m%d%H%M%S, I'd like
to generate a list of matching files for each 2-hr period in a 2-hr
frequency time series. Ultimately I'm using Pandas to read and handle
the data in each group of files. For the task of generating the files
for each 2-hr period, I've done the following:
beg_tstamp = pd.to_datetime(ec_files[0][-18:-4],
                            format="%Y%m%d%H%M%S")
end_tstamp = pd.to_datetime(ec_files[-1][-18:-4],
                            format="%Y%m%d%H%M%S")
tstamp_win = pd.date_range(beg_tstamp, end_tstamp, freq="2H")
So tstamp_win is the 2-hr frequency time series spanning the timestamps
in the files in ec_files.
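As a small worked illustration of the slicing and the resulting windows, here is a sketch that uses only the five file names shown above. Passing a Timedelta as freq is an assumption made here in place of the "2H" alias, which recent pandas releases warn about; FMT and stamp are names introduced just for this example.

```python
import pandas as pd

# The five file names shown in the listing above.
ec_files = [
    "EC_20160604002000.csv",
    "EC_20160604010000.csv",
    "EC_20160604012000.csv",
    "EC_20160604014000.csv",
    "EC_20160604020000.csv",
]

FMT = "%Y%m%d%H%M%S"  # hypothetical constant name, for illustration only

# x[-18:-4] drops the 4-character ".csv" suffix and keeps the 14 digits.
stamp = ec_files[0][-18:-4]

beg_tstamp = pd.to_datetime(stamp, format=FMT)
end_tstamp = pd.to_datetime(ec_files[-1][-18:-4], format=FMT)

# Window starts every 2 hours, counted from the first file's timestamp.
tstamp_win = pd.date_range(beg_tstamp, end_tstamp,
                           freq=pd.Timedelta(hours=2))

print(stamp)
print(list(tstamp_win))
```

With these five names the range holds a single window start, 2016-06-04 00:20:00, so the windows are anchored at the first file's timestamp rather than on even hours.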
I've generated the list of matching files for each window in tstamp_win
using a loop around a list comprehension:
win_files = []
for i, w in enumerate(tstamp_win):
    nextw = w + pd.Timedelta(2, "h")
    ifiles = [x for x in ec_files if
              pd.to_datetime(x[-18:-4], format="%Y%m%d%H%M%S") >= w and
              pd.to_datetime(x[-18:-4], format="%Y%m%d%H%M%S") < nextw]
    win_files.append(ifiles)
However, this is proving very slow, and I was wondering whether there's a
better/faster way to do this. Any tips would be appreciated.
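One possible direction, offered as a sketch rather than a definitive answer: the loop above re-parses every file name twice for every window, so parsing all timestamps once and computing each file's window index by integer division may help. The function name group_files_by_window and its parameters are hypothetical, and the sketch assumes every name ends in a 14-digit timestamp followed by ".csv".

```python
import pandas as pd


def group_files_by_window(files, hours=2, fmt="%Y%m%d%H%M%S"):
    """Group file names into consecutive windows of `hours` hours.

    Hypothetical helper: windows start at the earliest timestamp,
    matching a date_range that begins at the first file's timestamp.
    """
    if not files:
        return []

    # Parse every timestamp exactly once, in a single vectorized call.
    stamps = pd.to_datetime([f[-18:-4] for f in files], format=fmt)

    win = pd.Timedelta(hours=hours)
    beg = stamps.min()

    # Integer window index for each file: 0 for [beg, beg + win), etc.
    idx = (stamps - beg) // win

    # One (possibly empty) list per window, as in the original loop.
    win_files = [[] for _ in range(int(idx.max()) + 1)]
    for f, i in zip(files, idx):
        win_files[i].append(f)
    return win_files


ec_files = [
    "EC_20160604002000.csv",
    "EC_20160604010000.csv",
    "EC_20160604012000.csv",
    "EC_20160604014000.csv",
    "EC_20160604020000.csv",
]
win_files = group_files_by_window(ec_files)
print(win_files)
```

This makes one pass over the files instead of one pass per window. Windows that contain no files still appear as empty lists, so the result lines up position by position with tstamp_win.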
--
Seb