A faster way of finding historical highs/lows

Eamonn Sullivan eamonn_sullivan at blueyonder.co.uk
Sat Jun 12 15:32:47 CEST 2004


David Fraser <davidf at sjsoft.com> wrote in message news:<cad8uc$e9r$2 at ctb-nnrp2.saix.net>...
> Eamonn Sullivan wrote:
> > Peter Hansen <peter at engcorp.com> wrote in message news:<SN2dnT9ob92aPFTdRVn-gQ at powergate.ca>...
> > 
> >>Eamonn Sullivan wrote:
> >>
> >>
> >>>1. Find the most recent date when there was an equal or higher (or
> >>>lower) value than X. 
> >>
> >>The fastest algorithm might depend on how you use the data, as well.
> >>For example, do you update the data often, and search it rarely,
> >>or update it rarely and do the search very often?  If searching
> >>many times between updates, some preprocessing will likely make things
> >>go much faster.
> >>
> >>Both of your examples sound to me like they would benefit by
> >>using sort(), then a binary search.  Sort is very fast relative
> >>to things like the Python loops you are doing, so using it to
> >>prepare the data before the search can be a good step.
> >>
> >>-Peter
> > 
> > 
> > Thanks for this. At the moment, the software answers a few questions
> > (highest/lowest, longest streak, etc.) once per retrieval of data. The
> > database retrieval, though, is *by far* the biggest time sapper, so a
> > little preprocessing would be almost unnoticeable in comparison,
> > probably (depends on how much, of course).
> > 
> > So, I'm guessing I could sort on the particular price field I'm using
> > (decorate-sort-undecorate), and then just find the most recent date
> > among the subset of data that meets the criteria (higher or lower). Is
> > that what you mean? By binary search, do you mean further reorganizing
> > the data into a binary tree using date?
> 
> If you have this in a relational database, you might find that the 
> database can answer the question quicker for you, using indexes if 
> neccessary ...
>    select max(xxx) from yyy where zzz > 40
> with an index on xxx and zzz will usually be done quickly internally by 
> the database, and then you just get the result returned rather than 
> having to process it
> 
> David


Unfortunately, I don't have that level of control. It's market data
through an ActiveX control (on Windows) or C API (elsewhere) with only
a handful of calls -- e.g. GetHistoricalData or Subscribe (for
realtime). So, what I do is pull down the history (up to the present
moment) on a security for a number of price fields and then answer a
bunch of questions on a local copy: high/low value since, biggest
gain/loss since, longest streak since, percentage of average volume,
etc. User may then click a Refresh button (to fill today's field with
fresher data) and then typically moves on to another security. In most
cases, the hourglass stays on for 3-4 seconds or less to answer *all*
of the questions, or as long as 15 seconds if, for example, it's the
longest streak in eons.

The reason I'm doing this, though, is that I'd like to eventually make
the Refresh button unnecessary -- automatically pull in near realtime
data as it happens. The several seconds then become a bit too much
overhead. I know I can do other tricks (if the value hasn't exceeded a
certain window, then the high/low since doesn't need recalculating),
but I'm taking the problem in bits and seeing if I can improve the
searching algorithms first.



More information about the Python-list mailing list