Generating a dataset of images in RAM
Dear all,

I posted on Kaggle an image processing/supervised classification problem <https://www.kaggle.com/jeanpat/d/jeanpat/metaphase/generating-overlapping-ch...> concerning the resolution of overlapping chromosomes. The aim is to produce a large dataset of examples. A first dataset was produced <https://www.kaggle.com/jeanpat/overlapping-chromosomes>, but it seems to be too small to yield good results for supervised classification with a neural network. As explained in the first notebook, 8 GB of RAM is not enough to process the images, mainly to resize and crop them. My question: how can a large batch of images (>>100,000) be resized?

Thanks.
Jean-Patrick

PS: It would be great if someone were interested in the problem itself and could offer help with the resolution or advice on the proposed code.
Hi Jean-Patrick,

Why do you need to load everything into RAM to resize it? This is a perfect use case for streaming data processing. Have a look at my notebook from EuroSciPy 2015 for some examples: https://github.com/jni/streaming-talk/blob/master/Big%20data%20in%20little%2...

Specifically, as you generate examples, you should write them to disk directly. Then you are limited by disk size instead of RAM size.

I hope that helps!

Juan.

On 27 September 2016 at 8:27:09 pm, Jean-Patrick Pommier wrote:
> My question: how can a large batch of images (>>100,000) be resized?
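The write-as-you-go idea above could look roughly like the sketch below. It is only an illustration under stated assumptions: `generate_examples`, `resize_nearest` and `stream_to_disk` are hypothetical names, the random arrays stand in for the real chromosome images, and the nearest-neighbour index sampling is a NumPy-only stand-in for a proper resize such as `skimage.transform.resize`. The point is that a generator hands over one image at a time, so only a single image is held in memory at any moment.

```python
import tempfile
from pathlib import Path

import numpy as np


def generate_examples(n, shape=(64, 48), seed=0):
    """Yield n synthetic images one at a time (hypothetical stand-in for the real generator)."""
    rng = np.random.default_rng(seed)
    for _ in range(n):
        yield rng.integers(0, 256, size=shape, dtype=np.uint8)


def resize_nearest(image, out_shape):
    """Nearest-neighbour resize by index sampling (assumed stand-in for a real resize)."""
    rows = np.arange(out_shape[0]) * image.shape[0] // out_shape[0]
    cols = np.arange(out_shape[1]) * image.shape[1] // out_shape[1]
    return image[np.ix_(rows, cols)]


def stream_to_disk(images, out_dir, out_shape=(32, 32)):
    """Resize each image as it arrives, save it as .npy, and return the number written."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    count = 0
    for i, image in enumerate(images):
        # Only the current image is in memory; earlier ones are already on disk.
        np.save(out_dir / f"example_{i:06d}.npy", resize_nearest(image, out_shape))
        count += 1
    return count


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        n_written = stream_to_disk(generate_examples(1000), tmp)
        print(n_written, "images written")
```

Because the images are consumed lazily, raising the count from 1000 to more than 100,000 changes the disk usage and running time but not the peak memory.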
Thanks Juan, I didn't know toolz.
participants (2)
- Jean-Patrick Pommier
- Juan Nunez-Iglesias