[Image-SIG] experimental data diagram digitalization

Fri Nov 26 20:20:39 CET 2010

Hi Tomislav,

To me it seems like PIL would only do a (small) part of this, but I think it 
could work the way you've outlined the process.

On Fri, 26 Nov 2010 23:45:09 tomislav_maric at gmx.com wrote:
> 1) Create a .png of the diagram I find in the literature (.pdf articles, or
> theses).
> 2) Clean up the diagram (remove the axes, the text and leave only the data
> that I am interested in).
> 3) Read the image.

I think after this step, it might be good to convert the image to a NumPy 
array. If you've got the image data binarised, or otherwise converted to 
something with a strong contrast, you can then use the "query" operation 
numpy.where(condition) to get the indices of all the pixels that are, for 
example, non-white. Assuming dark data on a white background in an 8 bit 
greyscale image, this could look like this:

numpy.array(numpy.where(graph_array < 128), dtype=float)

Here is an example with a floating point array:

In [3]: a = numpy.random.normal(0.5, size=(5, 5))

In [4]: a
Out[4]: 
array([[ 1.02824407, -0.10784655,  0.50478651,  1.8077713 ,  0.73332092],
       [ 1.21246923, -0.33658738,  0.29709342, -0.56360425, -0.2158604 ],
       [ 0.14956347,  0.44197572,  0.11578998,  1.39439779,  1.71079914],
       [ 1.06089915,  0.68276441,  1.65573349,  0.79238584,  1.15568584],
       [ 0.97881477,  0.14273089, -0.93478545,  0.38605599, -0.36599775]])

In [5]: numpy.where(a > 0.5)
Out[5]: 
(array([0, 0, 0, 0, 1, 2, 2, 3, 3, 3, 3, 3, 4]),
 array([0, 2, 3, 4, 0, 3, 4, 0, 1, 2, 3, 4, 0]))

This way you've got arrays with the coordinates of all pixels belonging to 
the graph. Note that numpy.where returns the row indices (y) first and the 
column indices (x) second. The conversion to a new numpy array, as in the 
line above the example, stacks the results into one 2D array of floats, so 
you can conveniently scale the coordinates to millimetres/units rather than 
keeping them as integers.
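
As a minimal sketch of that step: the small array below is a made-up stand-in 
for a cleaned-up greyscale diagram, and the threshold of 128 as well as the 
flipping of the y axis are my assumptions, not something your data requires.

```python
import numpy

# Hypothetical stand-in for a cleaned-up greyscale diagram
# (0 = black, 255 = white). A real one could be obtained via PIL, e.g.
# numpy.asarray(Image.open("diagram.png").convert("L")).
graph_array = numpy.full((4, 6), 255, dtype=numpy.uint8)
graph_array[3, 0] = 0
graph_array[2, 2] = 40
graph_array[0, 5] = 100

# numpy.where returns (row indices, column indices), i.e. (y, x).
rows, cols = numpy.where(graph_array < 128)

# Stack into one float array with an (x, y) pair per row. Image rows count
# downwards, so flip y to make it increase upwards like a plot axis.
height = graph_array.shape[0]
points = numpy.column_stack((cols, height - 1 - rows)).astype(float)
print(points)
```

Each row of "points" is then one data pixel as (x, y) in pixel units.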

> 4) Apply a filter that will result in only those pixels that are non-white
> (pick up the experimental data).
> 5) Scale the result data of the filter (in pixels) to the actual coordinates
> in the image in millimeters.

If you're keeping all the coordinates in one big numpy array, you can apply 
the scaling to the whole array at once, just by multiplying it by the scaling 
coefficients.
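
A rough sketch of what I mean, assuming the coordinates are stored with one 
(x, y) pair per row. The pixel values, the 300 dpi resolution and the 
per-axis factors are invented for illustration only.

```python
import numpy

# Hypothetical pixel coordinates, one (x, y) pair per row.
points_px = numpy.array([[0.0, 0.0], [100.0, 50.0], [200.0, 150.0]])

# Assumed resolution of the .png; one inch is 25.4 mm.
dpi = 300.0
mm_per_pixel = 25.4 / dpi

# A scalar factor scales every coordinate in one go.
points_mm = points_px * mm_per_pixel

# A two-element array scales the x and y columns separately
# (NumPy broadcasts it across all rows).
scale_xy = numpy.array([0.05, 0.2])  # hypothetical factors for x and y
points_scaled = points_px * scale_xy
print(points_mm)
print(points_scaled)
```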

> 6) Scale the millimeter coordinates to the actual scale of the diagram (read
> from the original .pdf), to get the true coordinates (in my case, I have
> time in seconds and pressure in kPa).
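
For this last step, one possible approach (a sketch only, assuming both axes 
are linear) is to read two known tick positions per axis from the diagram and 
derive a scale and an offset from them. The helper calibrate_axis and all 
tick positions and values below are hypothetical.

```python
import numpy

def calibrate_axis(pos_a, value_a, pos_b, value_b):
    """Return (scale, offset) such that value = scale * pos + offset."""
    scale = (value_b - value_a) / (pos_b - pos_a)
    offset = value_a - scale * pos_a
    return scale, offset

# Hypothetical reference ticks read off the diagram:
# x axis: 10 mm corresponds to 0 s, 110 mm to 5 s
scale_x, offset_x = calibrate_axis(10.0, 0.0, 110.0, 5.0)
# y axis: 20 mm corresponds to 100 kPa, 80 mm to 400 kPa
scale_y, offset_y = calibrate_axis(20.0, 100.0, 80.0, 400.0)

# Hypothetical data points in millimetres, one (x, y) pair per row.
points_mm = numpy.array([[10.0, 20.0], [60.0, 50.0], [110.0, 80.0]])

# Apply scale and offset to the whole array at once.
scales = numpy.array([scale_x, scale_y])
offsets = numpy.array([offset_x, offset_y])
data = points_mm * scales + offsets
print(data)  # columns: time in s, pressure in kPa
```

A logarithmic axis would need the logarithm applied before this linear 
mapping, which the sketch does not cover.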

Hope that helps,

Guy

-- 
Guy K. Kloss
School of Computing + Mathematical Sciences
Auckland University of Technology
Private Bag 92006, Auckland 1142
phone: +64 9 921 9999 ext. 5032
eMail: Guy.Kloss at aut.ac.nz