[Numpy-discussion] OT: A Way to Approximate and Compress a 3D Surface

Tue Nov 20 13:43:44 EST 2007

On Tuesday 20 November 2007, Geoffrey Zhu wrote:
> Hi Everyone,
>
> This is off topic for this mailing list but I don't know where else
> to ask.
>
> I have N tabulated data points { (x_i, y_i, z_i) } that describe a
> 3D surface. The surface is pretty "smooth." However, the number of
> data points is too large to be stored and manipulated efficiently. To
> make it easier to deal with, I am looking for an easy method to
> compress and approximate the data. Maybe the approximation can be
> described by far fewer coefficients.
>
> If you can give me some hints about possible numpy or non-numpy
> solutions, or let me know where it would be better to ask this kind
> of question, I would really appreciate it.

First, a good and easy thing to try would be PyTables.  It supports
on-the-fly compression, that is, it lets you access slices of a
compressed dataset without decompressing the complete dataset.  This, in
combination with the handy 'shuffle' filter (also included), allows for
pretty good compression ratios on numerical data.  See [1] [2] for a
discussion of how to use the compressor/shuffle process in PyTables
and what you can expect from it.
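
To give a feeling for why shuffling helps, here is a minimal sketch
that emulates a byte shuffle with plain numpy and zlib.  It is not
PyTables code: the helpers byte_shuffle and byte_unshuffle and the
test surface sin(x)*cos(y) are made up for illustration.  The idea is
that the bytes of equal significance (sign, exponent, high mantissa
bits) of all values are stored together, and on smooth data those
bytes repeat a lot.

```python
import zlib
import numpy as np

def byte_shuffle(a):
    """Group the k-th byte of every element together (hypothetical helper)."""
    a = np.ascontiguousarray(a)
    # One row per element, one column per byte of that element.
    planes = a.reshape(-1).view(np.uint8).reshape(-1, a.itemsize)
    # Transposing puts all first bytes first, then all second bytes, etc.
    return planes.T.tobytes()

def byte_unshuffle(buf, dtype, shape):
    """Inverse of byte_shuffle (hypothetical helper)."""
    dtype = np.dtype(dtype)
    planes = np.frombuffer(buf, dtype=np.uint8).reshape(dtype.itemsize, -1)
    flat = np.ascontiguousarray(planes.T).reshape(-1)
    return flat.view(dtype).reshape(shape)

# A smooth test surface standing in for the tabulated z values (assumption).
x = np.linspace(0.0, np.pi, 300)
y = np.linspace(0.0, np.pi, 300)
z = np.sin(x)[:, None] * np.cos(y)[None, :]

plain = zlib.compress(z.tobytes(), 6)
shuffled = zlib.compress(byte_shuffle(z), 6)
restored = byte_unshuffle(zlib.decompress(shuffled), z.dtype, z.shape)

print("raw bytes:          ", z.nbytes)
print("zlib only:          ", len(plain))
print("shuffle, then zlib: ", len(shuffled))
```

The shuffle is lossless, so restored is bit-for-bit equal to z.  The
actual sizes depend on your data, so try it on a sample of your own
surface.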

Also, if you can afford lossy compression, you may want to try
truncation (quantization) before compressing, as it improves the
compression ratio quite a lot.  Feel free to experiment with the
following function (Jeffrey Whittaker was the original author):

import math
import numpy

def _quantize(data, least_significant_digit):
    """Quantize data to improve compression.

    Data is quantized using around(scale*data)/scale,
    where scale is 2**bits, and bits is determined from
    the least_significant_digit.
    For example, if least_significant_digit=1, bits will be 4."""

    # Decimal precision to preserve, e.g. 0.1 for least_significant_digit=1.
    precision = 10.**-least_significant_digit
    exp = math.log(precision, 10)
    # Round the exponent away from zero to an integer.
    if exp < 0:
        exp = int(math.floor(exp))
    else:
        exp = int(math.ceil(exp))
    # Smallest number of bits such that 2**bits >= 10**-exp.
    bits = math.ceil(math.log(10.**-exp, 2))
    scale = 2.**bits
    # Round to the nearest multiple of 1/scale.
    return numpy.around(scale*data)/scale
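
As a rough way to see what quantization buys you, the sketch below
quantizes a smooth test surface to three decimal digits and compares
the zlib-compressed sizes before and after.  The quantize helper is a
compact restatement of the function above that also returns the scale
it used; the helper name, the test surface and the choice of
least_significant_digit=3 are assumptions made for illustration only.

```python
import math
import zlib
import numpy as np

def quantize(data, least_significant_digit):
    """Round data to multiples of 1/2**bits (hypothetical helper)."""
    # Smallest bits such that 2**bits >= 10**least_significant_digit.
    bits = math.ceil(math.log2(10.0 ** least_significant_digit))
    scale = 2.0 ** bits
    return np.around(scale * data) / scale, scale

# A smooth test surface standing in for the tabulated z values (assumption).
x = np.linspace(0.0, np.pi, 300)
y = np.linspace(0.0, np.pi, 300)
z = np.sin(x)[:, None] * np.cos(y)[None, :]

zq, scale = quantize(z, 3)

# The rounding error is at most half a quantization step, 0.5/scale.
max_err = np.abs(zq - z).max()

raw_size = len(zlib.compress(z.tobytes(), 6))
quant_size = len(zlib.compress(zq.tobytes(), 6))

print("scale:                 ", scale)
print("max abs error:         ", max_err)
print("zlib, original data:   ", raw_size)
print("zlib, quantized data:  ", quant_size)
```

Quantized values keep only a few significant mantissa bits, so the
remaining bytes are mostly zeros and compress very well.  The price is
an absolute error of up to 0.5/scale per value, so pick
least_significant_digit from the accuracy your surface really needs.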

[1] http://www.pytables.org/docs/manual/ch05.html#compressionIssues
[2] http://www.pytables.org/docs/manual/ch05.html#ShufflingOptim

Cheers,

-- 
>0,0<   Francesc Altet     http://www.carabos.com/
V   V   Cárabos Coop. V.   Enjoy Data
 "-"