[Tutor] suggestions for splitting file based on date
Sivaram Neelakantan
nsivaram.net at gmail.com
Sat Jul 20 07:00:04 CEST 2013
On Sat, Jul 20 2013, Dave Angel wrote:
> On 07/19/2013 04:00 PM, Peter Otten wrote:
>> Sivaram Neelakantan wrote:
>>
[snipped 35 lines]
>
> I see Alan has assumed that the data is already divided into day-size
> hunks, so that subscripting those hunks is possible. He also assumed
> all the data will fit in memory at one time.
Yes, I'm only taking the end-of-day index close, so each fixed-format line
is small and there are only about 2K records.
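With only about 2K small records, reading everything into memory (as Dave says Alan assumed) is perfectly reasonable. Here is a minimal sketch. It assumes a hypothetical line format of `YYYY-MM-DD,close`, because the real extract layout isn't shown in this thread. The helper names `load_closes` and `last_calendar_days` are also made up for illustration. It needs Python 3.7+ for `date.fromisoformat`.

```python
from datetime import date, timedelta

def load_closes(lines):
    """Parse hypothetical 'YYYY-MM-DD,close' lines into (date, float) pairs."""
    records = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        day_str, close_str = line.split(",")
        records.append((date.fromisoformat(day_str), float(close_str)))
    return records

def last_calendar_days(records, days, today):
    """Return the records dated within `days` calendar days before `today`."""
    cutoff = today - timedelta(days=days)
    return [r for r in records if r[0] >= cutoff]

# Usage: three sample closes, windowed relative to 2013-07-20.
sample = ["2013-02-01,100.0", "2013-06-01,101.5", "2013-07-15,102.0"]
records = load_closes(sample)
windows = {n: last_calendar_days(records, n, date(2013, 7, 20)) for n in (30, 90, 180)}
```

Each window is a simple list comprehension over the whole list. At this size, that is clearer than any file-offset trickery.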
>
> But in my envisioning of your description, I pictured a variable
> number of records per day, with each record being a variable length
> stream of bytes starting with a length field. I pictured needing to
> handle a month with either zero entries or one with 3 billion entries.
> And even if a month is reasonable, I pictured the file as having 10
> years of spurious data before you get to the 180 day point.
I'll get there eventually... once you see my name splashed on Wall
Street, raking in the millions. :-)
>
> Are you looking for an optimal solution, or just one that works? What
> order do you want the final data to be in? How is the data organized
> on disk? Is each record a fixed size? If so, you can efficiently do
> a binary search in the file to find the 30, 90, and 180 day points.
These are small, fixed-length line extracts.
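With fixed-length, date-sorted lines, record *i* starts at byte `i * RECLEN`, so Dave's binary search can be done with `seek` alone. Here is a minimal sketch. It assumes hypothetical 20-byte records of the form `YYYY-MM-DD,close`, space-padded and newline-terminated. `RECLEN` and the helper names are illustrative and not from the thread. The function returns the byte offset of the first record dated on or after a cutoff.

```python
import os
import tempfile

RECLEN = 20  # hypothetical record length in bytes, including the trailing newline

def read_date(f, index):
    """Seek to record `index` and return its 'YYYY-MM-DD' prefix as a string."""
    f.seek(index * RECLEN)
    return f.read(10).decode("ascii")

def first_offset_on_or_after(path, cutoff):
    """Binary-search a date-sorted file for the first record dated >= cutoff.

    ISO dates sort correctly as plain strings, so no date parsing is needed.
    Returns a byte offset, which equals the file size if every record is older.
    """
    lo, hi = 0, os.path.getsize(path) // RECLEN
    with open(path, "rb") as f:
        while lo < hi:
            mid = (lo + hi) // 2
            if read_date(f, mid) < cutoff:
                lo = mid + 1
            else:
                hi = mid
    return lo * RECLEN

# Usage: four padded records, searching for a 2013-06-20 cutoff.
rows = ["2013-01-02,100.00", "2013-04-01,101.50", "2013-06-25,102.00", "2013-07-19,103.25"]
with tempfile.NamedTemporaryFile("wb", delete=False) as tmp:
    for row in rows:
        tmp.write(row.ljust(RECLEN - 1).encode("ascii") + b"\n")
    path = tmp.name

offset = first_offset_on_or_after(path, "2013-06-20")  # 40, the third record
os.remove(path)
```

Each probe reads only the 10-byte date prefix. The search therefore touches about log2(N) records rather than the whole file.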
>
> Once you determine the offset in the file for those 180, 90, and 30
> day points, it's a simple matter to just seek to one such spot and
> process all the records following. Most records need never be read
> from disk at all.
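Dave's "seek to one such spot and process all the records following" can be sketched as a single `seek` followed by sequential fixed-size reads. Nothing before the offset is read. This assumes hypothetical 20-byte `YYYY-MM-DD,close` records, space-padded and newline-terminated. Here the offset is supplied by hand rather than found by a search, and `records_from` is an illustrative name.

```python
import os
import tempfile

RECLEN = 20  # hypothetical fixed record length in bytes, including the newline

def records_from(path, offset):
    """Seek straight to `offset` and yield (date, close) for each later record."""
    with open(path, "rb") as f:
        f.seek(offset)
        while True:
            rec = f.read(RECLEN)
            if len(rec) < RECLEN:
                break  # end of file, or a truncated final record
            day, close = rec.decode("ascii").strip().split(",")
            yield day, float(close)

# Usage: skip the first two records by starting at byte 2 * RECLEN.
rows = ["2013-01-02,100.00", "2013-04-01,101.50", "2013-06-25,102.00", "2013-07-19,103.25"]
with tempfile.NamedTemporaryFile("wb", delete=False) as tmp:
    for row in rows:
        tmp.write(row.ljust(RECLEN - 1).encode("ascii") + b"\n")
    path = tmp.name

tail = list(records_from(path, 2 * RECLEN))
os.remove(path)
```

Because `records_from` is a generator, a caller can stop early (for example after the first N records) without reading the rest of the file.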
Will this work when trading days and calendar days are not the
same? My 30 days are calendar days, while 30 trading days could
span an extra one to two calendar weeks.
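One way to see the difference is a sketch that assumes the file holds exactly one fixed-length record per trading day, with no gaps or duplicates. Under that assumption, the last N trading days are simply the last N records, so their offset is `filesize - N * RECLEN` with no search at all. A calendar-day window instead needs a date cutoff such as `today - timedelta(days=30)`. The 20-byte `YYYY-MM-DD,close` layout and the helper names below are hypothetical.

```python
import os
import tempfile
from datetime import date, timedelta

RECLEN = 20  # hypothetical fixed record length in bytes, including the newline

def trading_day_offset(path, n):
    """Offset of the last `n` records, i.e. the last `n` trading days
    (valid only if there is exactly one record per trading day)."""
    return max(0, os.path.getsize(path) - n * RECLEN)

def calendar_cutoff(today, days):
    """ISO date string for the first day of a `days`-long calendar window."""
    return (today - timedelta(days=days)).isoformat()

# Usage: six weekday records, Fri 2013-07-12 through Fri 2013-07-19.
days = ["2013-07-12", "2013-07-15", "2013-07-16", "2013-07-17", "2013-07-18", "2013-07-19"]
with tempfile.NamedTemporaryFile("wb", delete=False) as tmp:
    for i, d in enumerate(days):
        tmp.write(f"{d},{100 + i:.2f}".ljust(RECLEN - 1).encode("ascii") + b"\n")
    path = tmp.name

# Last 3 trading days: jump straight to the offset and read 3 records.
with open(path, "rb") as f:
    f.seek(trading_day_offset(path, 3))
    last_3_trading = [f.read(RECLEN)[:10].decode("ascii") for _ in range(3)]

# Last 7 calendar days: compare each record's date against the cutoff.
cutoff = calendar_cutoff(date(2013, 7, 20), 7)
with open(path, "rb") as f:
    all_dates = [rec[:10].decode("ascii") for rec in iter(lambda: f.read(RECLEN), b"")]
last_7_calendar = [d for d in all_dates if d >= cutoff]  # 5 trading days fit in 7 calendar days
os.remove(path)
```

The calendar window uses a linear scan here only for brevity. On a large sorted file, the same cutoff string could be fed to a binary search instead.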
sivaram