Hi,

It is long overdue, but we finally have an attempt at making our text IO functions actually use text IO instead of bytes IO. This means genfromtxt, loadtxt, fromregex and savetxt should support unicode input files in any encoding Python supports, as well as universal newlines. This is the first stepping stone to finally making NumPy Python 3 compatible.
The code is available in: https://github.com/numpy/numpy/pull/4208
Great effort has been spent on keeping it backward compatible, but our only reference is our test suite, which certainly does not cover all of the workarounds employed for this issue over the last 8 years. So we need people to dig out their ugliest hacks and test whether they still work with this changeset. The functions that need testing are: loadtxt, genfromtxt, fromregex and savetxt.
Test on any input that worked in older versions of numpy (including gzip compressed files) and on inputs that did not work because they were encoded in something other than latin1 or had issues with line breaks.
The PR adds an encoding keyword argument to all functions dealing with text input and output. All streams opened by the function have been changed from byte streams to text streams. As previously only latin1 encoded byte streams were supported, all input bytestreams are still decoded as such.
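As a rough illustration of the kind of test that would help, here is a minimal round-trip sketch. It assumes the new keyword is spelled encoding= and accepts Python codec names, as described above; the file name and the sample data are made up for the example.

```python
import os
import tempfile

import numpy as np

# Hypothetical sample data containing characters outside latin1.
labels = np.array(["α", "β", "γ"])

# Temporary file used only for this illustration.
tmpdir = tempfile.mkdtemp()
path = os.path.join(tmpdir, "labels.txt")

# Write the strings as UTF-8 text, one per line.
np.savetxt(path, labels, fmt="%s", encoding="utf-8")

# Read them back as unicode strings using the same encoding.
roundtrip = np.loadtxt(path, dtype=str, encoding="utf-8")

os.remove(path)
os.rmdir(tmpdir)
```

If roundtrip does not compare equal to labels on your setup, that is exactly the sort of report we are looking for.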
Converters supplied by the user may have been relying on their input being bytes. To deal with that, the default value of the encoding argument is 'bytes', which behaves like the default encoding (None) but additionally converts values to latin1 encoded bytes before passing them to user converters. If you want to use converters that operate on strings, you now have to explicitly set encoding to something else (e.g. None).
Currently the functions do not support the newline keyword argument that Python's text IO streams support. This will probably still get added.
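A small sketch of a string-based converter, again assuming the encoding= keyword works as described in this mail. The helper comma_float and the sample file are hypothetical; the point is only that the converter receives str rather than bytes once an explicit encoding is given.

```python
import os
import tempfile

import numpy as np


def comma_float(text):
    # Hypothetical converter: expects str input such as "1,5".
    return float(text.replace(",", "."))


# Hypothetical UTF-8 file: a non-latin1 label and a decimal-comma number.
tmpdir = tempfile.mkdtemp()
path = os.path.join(tmpdir, "data.txt")
with open(path, "w", encoding="utf-8") as fh:
    fh.write("α;1,5\nβ;2,5\n")

# An explicit encoding means fields are handled as str, not bytes.
names = np.loadtxt(path, dtype=str, delimiter=";", usecols=(0,),
                   encoding="utf-8")
values = np.loadtxt(path, delimiter=";", usecols=(1,),
                    converters={1: comma_float}, encoding="utf-8")

os.remove(path)
os.rmdir(tmpdir)
```

Existing converters written for bytes input (for example ones that call .decode() on their argument) should keep working as long as encoding is left at its default.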
Related issues and discussions:
https://github.com/numpy/numpy/issues/4600
https://github.com/numpy/numpy/issues/3184
https://github.com/numpy/numpy/issues/4939
https://github.com/numpy/numpy/issues/4543
http://numpy-discussion.10968.n7.nabble.com/using-loadtxt-to-load-a-text-fil...
http://numpy-discussion.10968.n7.nabble.com/genfromtxt-universal-newline-sup...
https://github.com/dhomeier/numpy/commit/995ec93