Suggested datatype for getting latest information from log files
Martin A. Brown
martin at linux-ip.net
Thu Feb 11 13:58:33 EST 2016
Greetings,
>I have timestamped log files I need to read through and keep track
>of the most upto date information.
>
>For example lets say we had a log file
>
>timeStamp,name,marblesHeld,timeNow,timeSinceLastEaten
I do not quite understand the distinction between timeStamp and
timeNow.
>I need to keep track of every 'name' in this table, I don't want
>duplicate values so if values come in from a later timestamp that
>is different then that needs to get updated. For example if a later
>timestamp showed 'dave' with less marbles that should get updated.
>
>I thought a dictionary would be a good idea because of the key
>restrictions ensuring no duplicates, so the data would always
>update -
Yes. A dictionary seems reasonable.
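A minimal sketch of that idea, assuming each record boils down to a name, a numeric timestamp and a marble count. The helper name `update` and the sample values are made up for illustration; they are not taken from your log:

```python
latest = {}

def update(latest, name, timestamp, marbles):
    """Keep only the newest (timestamp, marbles) pair for each name."""
    current = latest.get(name)
    # Create the entry on first sight, or replace it if this record is newer.
    if current is None or timestamp >= current[0]:
        latest[name] = (timestamp, marbles)

update(latest, "dave", 100, 12)
update(latest, "steve", 105, 7)
update(latest, "dave", 110, 9)   # later timestamp: replaces dave's entry
update(latest, "dave", 90, 50)   # older timestamp: ignored
```

Comparing timestamps before overwriting means the result does not depend on the records arriving in chronological order.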
>However because they are unordered and I need to do some more
>processing on the data afterwards I'm having trouble.
Ordered how? For each name, you need to keep the stream of data
ordered? This is what I'm assuming based on your problem
description.
If the order of names (dave, steve and jenny) is important, then you
should look to OrderedDict as JM has suggested.
I am inferring from your description that the order of events (along
a timeline) is what is important, not the order of the players
relative to each other (since that is already in the logfile).
>For example lets assume that once I have the most upto date values
>from dave,steve,jenny I wanted to do timeNow - timeSinceLastEaten
>to get an interval then write all the info together to some other
>database. Crucially order is important here.
Again, it's not utterly clear what "order" means. If order of
events for a single player is important, then see below.
>I don't know of a particular name will appear in the records or
>not, so it needs to created on the first instance and updated from
>then on.
Again, a dictionary is great for this.
It seems that you could also benefit from a list (to store an event
and the time at which the event occurred). But, you don't want to
store all of history, so you want to use a bounded-length list. You
may find a collections.deque useful here.
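To illustrate the bounded-length behaviour, here is a small sketch. The maxlen of 3 and the marble counts are arbitrary values chosen for the example:

```python
import collections

# A deque with maxlen silently discards the oldest item once it is full.
history = collections.deque(maxlen=3)
for marbles in [5, 8, 2, 9, 4]:
    history.append(marbles)

print(list(history))  # only the three most recent counts remain
```

Appending to a full deque drops items from the opposite end, so no manual trimming is needed.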
>Could anyone suggest some good approaches or suggested data
>structures for this?
First, JM already pointed you to OrderedDict, which may help
depending on exactly what you are trying to order.
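If it is the order of the names that matters, a sketch along these lines may help. It assumes you want names kept in the order they first appear; the sample pairs are invented:

```python
import collections

latest = collections.OrderedDict()
for name, marbles in [("dave", 12), ("steve", 7), ("jenny", 3), ("dave", 9)]:
    # Assigning to an existing key updates the value but keeps its position.
    latest[name] = marbles

print(list(latest.items()))
```

If you would rather have the most recently updated name last, OrderedDict also provides move_to_end().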
There are two other data structures in the collections module that
may be helpful for you. I perceive the following (from your
description).
You have a set of names (players).
You wish to store, for each name, a value (marblesHeld).
You wish to store, for each name, a value (timeSinceLastEaten).
I recommend learning how to use both:
  collections.defaultdict [0]: so you can dynamically create
    entries for new players in the marble game without checking if
    they already exist in the dictionary (very convenient!)
  collections.deque [1]: in this case, I'm suggesting using it as
    a bounded-length list; you keep adding stuff to it and after
    it stores X entries, the old ones will "fall off"
Note, I fabricated players and data, but the bit that you are
probably interested in is the interaction between the dictionary,
whose keys are the names of the players, and its values, each of
which is a deque capturing the last 10 entries of that user's marble
count and the time at which each entry was recorded.
  mydeque = functools.partial(collections.deque, maxlen=10)
  record = collections.defaultdict(mydeque)
Storing both the marble count and the time will allow you to
calculate at any time later the duration since the user last had a
marble count change.
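As a rough sketch of that calculation, reusing the mydeque/record structure: the function name `seconds_since_last_change` and the timestamps are hypothetical, and the function assumes the player already has at least one entry.

```python
import collections
import functools

mydeque = functools.partial(collections.deque, maxlen=10)
record = collections.defaultdict(mydeque)

# Each entry is a (marbles, time) tuple; these values are invented.
record["dave"].append((12, 1000.0))
record["dave"].append((9, 1060.0))

def seconds_since_last_change(record, who, now):
    """Return the time elapsed since the newest entry stored for `who`."""
    marbles, when = record[who][-1]   # newest entry is at the right-hand end
    return now - when

elapsed = seconds_since_last_change(record, "dave", now=1090.0)
print(elapsed)
```

Note that looking up a name that has never been seen creates an empty deque, so the [-1] index would raise IndexError for an unknown player.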
I don't understand how the eating fits into your problem, but maybe
my code (below) will afford you an example of how to approach the
problem with a few of Python's wonderfully convenient standard
library data structures.
Good luck,
-Martin
P.S. I just read your reply to JM, and it looks like you also are
trying to figure out how to read the input data. Is it CSV? Could
you simply use the csv module [2]?
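If it is CSV, something like the following sketch might be a starting point. The header row is the one from your message; the data rows are invented, and the loop assumes the file is already in chronological order, so a later row simply replaces an earlier one:

```python
import csv
import io

# Stand-in for an open log file; the data rows are made up.
sample = io.StringIO(
    "timeStamp,name,marblesHeld,timeNow,timeSinceLastEaten\n"
    "100,dave,12,100,40\n"
    "105,steve,7,105,20\n"
    "110,dave,9,110,50\n"
)

latest = {}
for row in csv.DictReader(sample):
    # Each row is a dict keyed by the header names; values are strings.
    latest[row["name"]] = row

for name, row in latest.items():
    print(name, row["marblesHeld"])
```

With a real file you would pass the object returned by open(filename, newline='') to csv.DictReader instead of the StringIO.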
[0] https://docs.python.org/3/library/collections.html#collections.defaultdict
[1] https://docs.python.org/3/library/collections.html#collections.deque
[2] https://docs.python.org/3/library/csv.html
#! /usr/bin/python3

import time
import random
import functools
import collections
import pprint

players = ['Steve', 'Jenny', 'Dave', 'Samuel', 'Jerzy', 'Ellen']

# Factory for deques that keep only the 10 most recent entries.
mydeque = functools.partial(collections.deque, maxlen=10)


def marblegame(rounds):
    # Missing players get an empty bounded deque on first access.
    record = collections.defaultdict(mydeque)
    for _ in range(rounds):
        now = time.time()
        who = random.choice(players)
        marbles = random.randint(0, 100)
        # Store the marble count together with the time it was seen.
        record[who].append((marbles, now))
    for whom, marblehistory in record.items():
        print(whom, end=": ")
        pprint.pprint(marblehistory)


if __name__ == '__main__':
    import sys
    # Optional first argument: number of rounds to simulate.
    if len(sys.argv) > 1:
        count = int(sys.argv[1])
    else:
        count = 30
    marblegame(count)

# -- end of file
--
Martin A. Brown
http://linux-ip.net/