[Tutor] suggestions for splitting file based on date

Dave Angel davea at davea.name
Sat Jul 20 11:24:12 CEST 2013


On 07/20/2013 01:00 AM, Sivaram Neelakantan wrote:
> On Sat, Jul 20 2013, Dave Angel wrote:
>
      <snip>
>
> These are small, fixed-line extracts.
>
>>
>> Once you determine the offset in the file for those 180, 90, and 30
>> day points, it's a simple matter to just seek to one such spot and
>> process all the records following.  Most records need never be read
>> from disk at all.
>
> Will this work when trading days and calendar days are not the
> same?  My 30 days are calendar days, while 30 trading days could
> span an extra 1-2 calendar weeks.
>

Certainly it'll work.  The search keys on the date stamp itself, so gaps
for weekends and holidays don't matter.  Once you've done your binary
search to find one of the 3 starting places, you can process the data
sequentially.

If I understand your file spec correctly, you have a file of fixed-length
records, each with a date stamp.  You have a variable number of records
per day (zero or one, the way you describe it, but that doesn't affect
our algorithm).  You want to make a list of all of the records since
today's date minus N, where N is 30, 90, 180, or whatever.

Since you don't have tons of data, you could load all of it into a list 
ahead of time.  Manipulating that list will be easier than manipulating 
the file.  But because the records are fixed size, the two are isomorphic.
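To make the isomorphism concrete, here is a rough sketch of the same binary search done directly on the file with seek().  It assumes (hypothetically) that every record is exactly RECORD_LEN = 20 bytes including its newline, that each record starts with an ISO 'YYYY-MM-DD' date, and that the file is sorted by date and opened in binary mode.  Record i lives at byte offset i * RECORD_LEN, just as it would live at index i in a list.  The function names are illustrative only.

```python
import datetime
import io

RECORD_LEN = 20  # hypothetical fixed record length in bytes, newline included


def read_date(f, index):
    """Seek to record `index` and parse its leading 'YYYY-MM-DD' date."""
    f.seek(index * RECORD_LEN)
    raw = f.read(10).decode("ascii")
    return datetime.datetime.strptime(raw, "%Y-%m-%d").date()


def first_record_on_or_after(f, target):
    """Binary search over the file's records; return the index of the
    first record dated >= target (the record count if there is none)."""
    f.seek(0, io.SEEK_END)
    lo, hi = 0, f.tell() // RECORD_LEN
    while lo < hi:
        mid = (lo + hi) // 2
        if read_date(f, mid) < target:
            lo = mid + 1
        else:
            hi = mid
    return lo


# In-memory stand-in for the real file: three 20-byte records.
f = io.BytesIO(
    b"2013-07-15 100.5000\n"
    b"2013-07-16 101.0000\n"
    b"2013-07-19 099.8000\n"
)
idx = first_record_on_or_after(f, datetime.date(2013, 7, 17))
f.seek(idx * RECORD_LEN)  # jump straight to the first wanted record
tail = f.read()           # everything from there to the end of the file
print(idx, tail)
```

With a real file you would use open(path, "rb") in place of BytesIO; only the records the search touches are ever read before the final sequential read.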

So:  read all the records into a list, where each item in the list is a
(date, data) tuple.
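A minimal sketch of that loading step follows.  The record layout is an assumption, since the actual file format isn't shown: each line is taken to begin with an ISO 'YYYY-MM-DD' date, with the rest of the line treated as the record's data.  The name load_records is hypothetical.

```python
import datetime
import io


def load_records(lines):
    """Parse lines into a list of (date, data) tuples.

    Assumed (hypothetical) layout: each line starts with 'YYYY-MM-DD',
    and the remainder of the line is the record's data.
    """
    records = []
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        date = datetime.datetime.strptime(line[:10], "%Y-%m-%d").date()
        records.append((date, line[10:].strip()))
    return records


# In-memory stand-in for the real file.
sample = io.StringIO(
    "2013-07-15 100.5\n"
    "2013-07-16 101.0\n"
    "2013-07-19 99.8\n"
)
records = load_records(sample)
print(records[0])  # (datetime.date(2013, 7, 15), '100.5')
```

For the real data, `with open(path) as f: records = load_records(f)` works the same way, because a text file iterates line by line.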

For a particular value of "age", create a sublist of the records newer
than that age:  the target date is today minus age.  Do a binary search
in the list for target, then return the slice of the list starting at
the index it finds.


Now just call that function 3 times, for your three different values of age.
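Here is one way that function might look, as a sketch rather than a definitive implementation.  It assumes the list is already sorted by date, counts age in calendar days, and includes records dated exactly on the target day.  The name records_since and the sample data are made up for illustration; "today" is pinned so the output is reproducible.

```python
import bisect
import datetime


def records_since(records, age_days, today=None):
    """Return the records dated on or after (today - age_days).

    `records` must be a list of (date, data) tuples sorted by date.
    Dates with no record (weekends, holidays) don't matter, because
    bisect_left simply lands on the first date >= the target.
    """
    if today is None:
        today = datetime.date.today()
    target = today - datetime.timedelta(days=age_days)
    dates = [d for d, _ in records]  # sorted keys for bisect
    start = bisect.bisect_left(dates, target)
    return records[start:]


D = datetime.date
records = [
    (D(2013, 1, 2), "a"),
    (D(2013, 3, 1), "b"),
    (D(2013, 5, 1), "c"),
    (D(2013, 6, 24), "d"),
    (D(2013, 7, 19), "e"),
]
today = D(2013, 7, 20)  # fixed so the example is reproducible
for age in (30, 90, 180):
    subset = records_since(records, age, today=today)
    print(age, len(subset))  # 30 -> 2, 90 -> 3, 180 -> 4
```

Rebuilding the dates list on every call costs O(n), which is negligible for three calls on small data; you could build it once and pass it in.  On Python 3.10 and later, bisect_left also accepts a key= argument, which avoids the separate list.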

Since you've got an in-memory list, it's straightforward to use
bisect.bisect_left() to do the binary search.  But since your data is
small, a simple linear search isn't much slower either.  See the
following links:

http://docs.python.org/2/library/bisect.html#searching-sorted-lists
http://docs.python.org/3.3/library/bisect.html#searching-sorted-lists
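The two options can be compared side by side.  In the sketch below (function names hypothetical), both return the index of the first date >= target in a sorted list, so either one can supply the starting point for the slice.  The sample dates have gaps, as trading days would.

```python
import bisect
import datetime


def first_index_linear(dates, target):
    """Linear scan: index of the first date >= target, or len(dates)."""
    for i, d in enumerate(dates):
        if d >= target:
            return i
    return len(dates)


def first_index_bisect(dates, target):
    """Binary search giving the same answer on a sorted list."""
    return bisect.bisect_left(dates, target)


# Sorted dates with gaps, as with trading days.
dates = [datetime.date(2013, 7, day) for day in (1, 2, 3, 5, 8, 9)]
for day in range(1, 12):
    target = datetime.date(2013, 7, day)
    assert first_index_linear(dates, target) == first_index_bisect(dates, target)
print("linear and bisect agree")
```

The linear version is O(n) per search and bisect is O(log n); with a few hundred records and three searches, the difference is unlikely to be noticeable.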


-- 
DaveA


