Data structure for plotting monotonically expanding data set
Peter J. Holzer
hjp-python at hjp.at
Thu May 27 13:03:53 EDT 2021
On 2021-05-27 11:28:11 +0200, Loris Bennett wrote:
> I currently have around 3 years' worth of files like
>
> home.20210527
> home.20210526
> home.20210525
> ...
>
> so around 1000 files, each of which contains information about data
> usage in lines like
>
> name kb
> alice 123
> bob 4
> ...
> zebedee 9999999
>
> (there are actually more columns). I have about 400 users and the
> individual files are around 70 KB in size.
>
> Once a month I want to plot the historical usage as a line graph for the
> whole period for which I have data for each user.
[...]
> Obviously I will want to extract all the data for all users from a file
> once I have opened it. After looping over all files I would naively end
> up with, say, a nested dict like
>
> {"20210527": { "alice" : 123, ..., "zebedee": 9999999},
> "20210526": { "alice" : 123, "bob" : 3, ..., "zebedee": 9},
> "20210525": { "alice" : 123, "bob" : 1, ..., "zebedee": 9999999},
> "20210524": { "alice" : 123, ..., "zebedee": 9},
> "20210523": { "alice" : 123, ..., "zebedee": 9999999},
> ...}
>
> where the user keys would vary over time as accounts, such as 'bob', are
> added and later deleted.
>
> Is creating a potentially rather large structure like this the best way
> to go (I obviously could limit the size by, say, only considering the
> last 5 years)?
I don't think that would be a problem. However, I assume that you want
to create one graph per user, not a single graph with 400 lines (that
would be very cluttered). So I would swap the levels around:
{
"alice": { "20210527": 123, "20210526": 123, ... },
"bob": { "20210526": 3, "20210525": 1, ... },
"zebedee": { "20210527": 9999999, "20210526": 9, ... }
}
That way you have the data for each graph grouped together.
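A minimal sketch of how the per-user structure could be built while looping over the files. The function name collect_usage and the (date_string, lines) input shape are hypothetical, chosen so the example runs without touching the file system; it also assumes whitespace-separated columns with the name first and the kb value second, and it skips any line whose second column is not a plain integer (which covers the "name kb" header).

```python
from collections import defaultdict


def collect_usage(files):
    """Build {user: {date_string: kb}} from (date_string, lines) pairs.

    Assumption: each data line looks like "name kb [more columns]".
    Lines whose second column is not a plain integer (such as the
    "name kb" header) are skipped.
    """
    usage = defaultdict(dict)
    for date_string, lines in files:
        for line in lines:
            fields = line.split()
            if len(fields) < 2 or not fields[1].isdigit():
                continue  # blank line, header, or malformed entry
            usage[fields[0]][date_string] = int(fields[1])
    return dict(usage)


# Made-up sample input standing in for two of the daily files.
sample = [
    ("20210527", ["name kb", "alice 123", "zebedee 9999999"]),
    ("20210526", ["name kb", "alice 123", "bob 3", "zebedee 9"]),
]
usage = collect_usage(sample)
```

Users that are missing from a given day's file simply have no entry for that date, so accounts that come and go need no special handling.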
It might also be a good idea to use actual date objects instead of
strings.
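As a rough illustration of the date-object suggestion, the date can be parsed from the file name's suffix. The helper name date_from_filename is hypothetical, and the sketch assumes every file name ends in ".YYYYMMDD" as in the examples quoted above.

```python
from datetime import datetime


def date_from_filename(filename):
    """Return a datetime.date parsed from a name like 'home.20210527'.

    Assumption: the part after the last dot is always YYYYMMDD.
    """
    suffix = filename.rsplit(".", 1)[-1]
    return datetime.strptime(suffix, "%Y%m%d").date()


d = date_from_filename("home.20210527")
```

Date objects compare and sort chronologically, and plotting libraries such as matplotlib can generally place them on a time axis directly, which is less error-prone than treating the strings as labels.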
hp
--
_ | Peter J. Holzer | Story must make more sense than reality.
|_|_) | |
| | | hjp at hjp.at | -- Charles Stross, "Creative writing
__/ | http://www.hjp.at/ | challenge!"