Data structure for plotting monotonically expanding data set
Loris Bennett
loris.bennett at fu-berlin.de
Thu May 27 05:28:11 EDT 2021
Hi,
I currently have around 3 years' worth of files like
home.20210527
home.20210526
home.20210525
...
so around 1000 files, each of which contains information about data
usage in lines like
name kb
alice 123
bob 4
...
zebedee 9999999
(there are actually more columns). I have about 400 users and the
individual files are around 70 KB in size.
Once a month I want to plot, for each user, the historical usage as a
line graph covering the whole period for which I have data.
I already have some code to extract the current usage for a single user from
the most recent file:
regex = re.compile(user)  # compile once, outside the loop
with open(file, "r") as handle:
    for line in handle:
        columns = line.split()
        # columns[data_column] needs at least data_column + 1 columns
        if len(columns) <= data_column:
            logging.debug("no. of cols.: %i too few for data col", len(columns))
            continue
        if regex.match(columns[user_column]):
            usage = columns[data_column]
            logging.info(usage)
            return usage
logging.error("unable to find %s in %s", user, file)
return "none"
Obviously I will want to extract all the data for all users from a file
once I have opened it. After looping over all files I would naively end
up with, say, a nested dict like
{"20210527": { "alice" : 123, ..., "zebedee": 9999999},
 "20210526": { "alice" : 123, "bob" : 3, ..., "zebedee": 9},
 "20210525": { "alice" : 123, "bob" : 1, ..., "zebedee": 9999999},
 "20210524": { "alice" : 123, ..., "zebedee": 9},
 "20210523": { "alice" : 123, ..., "zebedee": 9999999},
 ...}
where the user keys would vary over time as accounts, such as 'bob', are
added and later deleted.
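A minimal sketch of how such a nested dict might be built is below. It
assumes the file names follow the "home.YYYYMMDD" pattern shown above,
that the user name and the usage are in columns 0 and 1, and that any
line without an integer in the data column (such as the "name kb"
header) can be skipped. The names read_usage and read_all are made up
for illustration.

```python
import re
from pathlib import Path

# Assumed file name pattern "home.YYYYMMDD", as in the listing above.
FILE_RE = re.compile(r"^home\.(\d{8})$")


def read_usage(path, user_column=0, data_column=1):
    """Return {user: kb} for one file, skipping the header and short lines."""
    usage = {}
    with open(path, "r") as handle:
        for line in handle:
            columns = line.split()
            # Skip blank lines and lines with too few columns.
            if len(columns) <= max(user_column, data_column):
                continue
            try:
                usage[columns[user_column]] = int(columns[data_column])
            except ValueError:
                # Non-numeric value, e.g. the "name kb" header line.
                continue
    return usage


def read_all(directory):
    """Return {date_string: {user: kb}} for every home.YYYYMMDD file."""
    history = {}
    for path in sorted(Path(directory).iterdir()):
        match = FILE_RE.match(path.name)
        if match:
            history[match.group(1)] = read_usage(path)
    return history
```

Each file is opened only once, and a user who is absent on a given day
simply has no key in that day's inner dict.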
Is creating a potentially rather large structure like this the best way
to go (I obviously could limit the size by, say, only considering the
last 5 years)? Or is there some better approach for this kind of
problem? For plotting I would probably use matplotlib.
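For scale, roughly 1000 files with about 400 users each comes to
something on the order of 400,000 numbers. One possible alternative
layout, sketched below under the assumption that the nested dict has
the shape shown above, is one list of values per user, aligned to a
shared sorted list of dates, which is the form a line plot wants. The
function names per_user_series and plot_user are hypothetical, and
days on which a user has no entry are filled with NaN so that they
show up as gaps in the line.

```python
import math
from datetime import datetime


def per_user_series(history):
    """Turn {date_string: {user: kb}} into (dates, {user: [kb, ...]}).

    Dates are sorted ascending; a user missing on a date gets NaN.
    """
    date_strings = sorted(history)
    dates = [datetime.strptime(d, "%Y%m%d").date() for d in date_strings]
    users = sorted({user for day in history.values() for user in day})
    series = {
        user: [history[d].get(user, math.nan) for d in date_strings]
        for user in users
    }
    return dates, series


def plot_user(dates, series, user, outfile):
    """Save a line graph of one user's usage (needs matplotlib installed)."""
    import matplotlib
    matplotlib.use("Agg")  # render to a file without needing a display
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    ax.plot(dates, series[user])
    ax.set_xlabel("date")
    ax.set_ylabel("usage (KB)")
    ax.set_title(user)
    fig.autofmt_xdate()  # slant the date labels so they do not overlap
    fig.savefig(outfile)
    plt.close(fig)
```

Calling plot_user once per key in series would then give one graph per
user for the monthly run.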
Cheers,
Loris
--
This signature is currently under construction.