[Tutor] suggestions for splitting file based on date
Dave Angel
davea at davea.name
Sat Jul 20 11:24:12 CEST 2013
On 07/20/2013 01:00 AM, Sivaram Neelakantan wrote:
> On Sat, Jul 20 2013,Dave Angel wrote:
>
<snip>
>
> These are small,fixed line extracts.
>
>>
>> Once you determine the offset in the file for those 180, 90, and 30
>> day points, it's a simple matter to just seek to one such spot and
>> process all the records following. Most records need never be read
>> from disk at all.
>
> Will this work when the trading days and calendar days are not the
> same? My 30 days is the calendar days while the 30 trading days could
> mean an extra 1-2 calendar weeks.
>
Certainly it'll work. The search compares date stamps, not record
counts, so days with no record don't matter. Once you've done your
binary search to find one of the 3 starting places, you can process
the data sequentially.
If I can describe your file spec, you have a file of fixed-length
records, each with a date stamp. You have a variable number of records
per day (zero or one, the way you describe it, but that doesn't affect
our algorithm). You want to make a list of all of the records since
todays_date-N, where N is 30, 90, 180, or whatever.
Since you don't have tons of data, you could load all of it into a list
ahead of time. Manipulating that list will be easier than manipulating
the file. But because the records are fixed size, the two are isomorphic.
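To show what "isomorphic" means here, this is a rough sketch of the same
binary search done directly against the file with seek(). The record
layout is my assumption, not your actual spec: RECORD_SIZE, DATE_WIDTH,
the ISO date at the start of each record, and the make_demo_file()
helper with its made-up data are all hypothetical.

```python
import datetime
import io

RECORD_SIZE = 32  # hypothetical: every record is exactly 32 bytes, newline included
DATE_WIDTH = 10   # hypothetical: each record starts with an ISO date, YYYY-MM-DD


def record_date(f, index):
    """Seek to record number `index` and return its date stamp."""
    f.seek(index * RECORD_SIZE)
    raw = f.read(DATE_WIDTH).decode("ascii")
    return datetime.date(int(raw[0:4]), int(raw[5:7]), int(raw[8:10]))


def first_record_on_or_after(f, target):
    """Binary search for the index of the first record dated >= target."""
    f.seek(0, io.SEEK_END)
    lo, hi = 0, f.tell() // RECORD_SIZE  # hi is the number of records
    while lo < hi:
        mid = (lo + hi) // 2
        if record_date(f, mid) < target:
            lo = mid + 1
        else:
            hi = mid
    return lo


def make_demo_file():
    """Build a small in-memory stand-in for the data file (made-up data)."""
    days = [datetime.date(2013, 7, d) for d in (1, 2, 3, 5, 8, 9, 10, 12)]
    lines = []
    for d in days:
        body = d.isoformat() + " demo-data"
        # Pad to a fixed width so that record i starts at byte i * RECORD_SIZE.
        lines.append(body.ljust(RECORD_SIZE - 1) + "\n")
    return io.BytesIO("".join(lines).encode("ascii"))
```

Only the handful of records probed by the search get read. Once you
have the index, seek to index * RECORD_SIZE and read sequentially from
there. Note that a target date with no record of its own (a weekend,
say) simply lands on the next record after it.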
So: read all the records into a list. (Each item in the list is a
tuple of date and data)
For a particular value of "age", create a sublist of the records newer
than that age: the target date is today-age. Do a binary search in the
list for the target. Using that index as a starting point, return a
slice of the list.
Now just call that function 3 times, for your three different values of age.
Since you've got an in-memory list, it's straightforward to use
bisect.bisect_left() to do the binary search. But since your data is
small, a simple linear search isn't too much slower either. See the
following links:
http://docs.python.org/2/library/bisect.html#searching-sorted-lists
http://docs.python.org/3.3/library/bisect.html#searching-sorted-lists
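
Here is a minimal sketch of the in-memory version, following the
"searching sorted lists" recipe in those docs. The function name
records_newer_than and its parameters are mine, chosen for
illustration, and it assumes the list of (date, data) tuples is already
sorted by date.

```python
import bisect
import datetime


def records_newer_than(records, age_days, today):
    """Return the slice of `records` dated on or after today - age_days.

    `records` is a list of (date, data) tuples sorted by date.
    """
    target = today - datetime.timedelta(days=age_days)
    # bisect needs a plain sorted list of keys to compare against.
    # For a big list, build this once rather than on every call.
    dates = [date for date, _ in records]
    start = bisect.bisect_left(dates, target)
    return records[start:]
```

Called three times with age_days of 30, 90 and 180, it gives the three
sublists. bisect_left() returns the position of the first date that is
not less than the target, so it doesn't matter whether the target date
itself has a record.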
--
DaveA