# [Image-SIG] experimental data diagram digitalization

Guy K. Kloss guy.kloss at aut.ac.nz
Fri Nov 26 20:20:39 CET 2010

```
Hi Tomislav,

to me it seems like PIL would only do a (small) part of this, but I think it
could work the way you've outlined the process.

On Fri, 26 Nov 2010 23:45:09 tomislav_maric at gmx.com wrote:
> 1) Create a .png of the diagram I find in the literature (.pdf articles, or
> theses).
> 2) Clean up the diagram (remove the axes, the text and leave
> only the data that I am interested in).

I think after this step, it might be good to convert the image to a NumPy
array. If you've got the image data binarised, or otherwise converted to
something with a strong contrast, you can then use the "query" operation
numpy.where(condition[, x, y]) with only the condition given to get the
indices of all the points that are, for example, non-white. Assuming dark data
on a white background, this could look like this:

numpy.array(numpy.where(graph_array < 128), dtype=float)

Here is an example with a floating point array:

In [3]: a = numpy.random.normal(0.5, size=(5, 5))

In [4]: a
Out[4]:
array([[ 1.02824407, -0.10784655,  0.50478651,  1.8077713 ,  0.73332092],
       [ 1.21246923, -0.33658738,  0.29709342, -0.56360425, -0.2158604 ],
       [ 0.14956347,  0.44197572,  0.11578998,  1.39439779,  1.71079914],
       [ 1.06089915,  0.68276441,  1.65573349,  0.79238584,  1.15568584],
       [ 0.97881477,  0.14273089, -0.93478545,  0.38605599, -0.36599775]])

In [5]: numpy.where(a > 0.5)
Out[5]:
(array([0, 0, 0, 0, 1, 2, 2, 3, 3, 3, 3, 3, 4]),
 array([0, 2, 3, 4, 0, 3, 4, 0, 1, 2, 3, 4, 0]))

This way you get two arrays: the first holds the row (y) indices and the
second the column (x) indices of all pixels belonging to the graph. Note that
rows are counted downwards from the top of the image. Wrapping the result in
numpy.array(..., dtype=float), as in the line above the example, turns it into
a single 2D array of floats, so you can conveniently apply scaling to
millimetres/units to the coordinates rather than keeping them as integers.

> 4) Apply a filter that will result in only those pixels that are non-white
> (pick up the experimental data).
> 5) Scale the result data of the filter (in pixels) to the actual coordinates
> in the image in millimeters.

If you're using all the coordinates in one big numpy array, you can apply the
scaling to the whole array at once, just by multiplying it with the scaling
coefficients.

> 6) Scale the millimeter coordinates to the actual scale of the diagram (read
> from the original .pdf), to get the true coordinates (in my case, I have
> time in seconds and pressure in kPa).

Hope that helps,

Guy

--
Guy K. Kloss
School of Computing + Mathematical Sciences
Auckland University of Technology
Private Bag 92006, Auckland 1142
phone: +64 9 921 9999 ext. 5032
eMail: Guy.Kloss at aut.ac.nz
```
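
The steps discussed in the message can be put together in a minimal sketch. This is not code from the original thread: the helper names `extract_points` and `pixels_to_data`, the threshold of 128, and the scale and offset values are all illustrative assumptions, and the image is a small synthetic array standing in for a cleaned-up diagram with dark data on a white background.

```python
import numpy


def extract_points(graph_array, threshold=128):
    """Return an (N, 2) float array of (x, y) pixel coordinates of dark pixels.

    Hypothetical helper: assumes a 2D greyscale array, dark data on white.
    """
    # numpy.where with only a condition returns (row_indices, col_indices).
    rows, cols = numpy.where(graph_array < threshold)
    # Image rows count downwards; flip them so y grows upwards as on a plot.
    height = graph_array.shape[0]
    x = cols.astype(float)
    y = (height - 1 - rows).astype(float)
    return numpy.column_stack((x, y))


def pixels_to_data(points, scale, offset):
    """Scale all points at once; scale and offset are (x, y) pairs."""
    return points * numpy.asarray(scale) + numpy.asarray(offset)


# Synthetic 5x5 "diagram": white background with three dark data pixels.
# With PIL, a real image could be loaded along the lines of
#   numpy.asarray(Image.open(path).convert("L"))
image = numpy.full((5, 5), 255, dtype=numpy.uint8)
image[4, 0] = 0
image[2, 2] = 0
image[0, 4] = 0

points = extract_points(image)

# Assumed calibration: 0.5 s per pixel in x, 10 kPa per pixel in y,
# with the bottom-left pixel sitting at (0 s, 100 kPa).
data = pixels_to_data(points, scale=(0.5, 10.0), offset=(0.0, 100.0))
print(data)
```

Because the scale and offset are broadcast across the whole array, the conversion from pixels to physical units is a single expression regardless of how many points were found. In practice the calibration values would be read off the axes of the original diagram.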