[Tutor] multiprocessing question

eryksun eryksun at gmail.com
Fri Nov 28 18:14:57 CET 2014


On Thu, Nov 27, 2014 at 2:40 PM, Albert-Jan Roskam <fomcl at yahoo.com> wrote:
>
>>CsvIter._get_row_lookup should work on a regular file from built-in
>>open (not codecs.open), opened in binary mode. I/O on a regular file
>>will release the GIL back to the main thread. mmap objects don't do
>>this.
>
> Will io.open also work? Until today I thought that Python 3's open was
> what is codecs.open in Python 2 (probably because Python3 is all about
> ustrings, and py3-open has an encoding argument).

If you're using mmap in __getitem__, then open the file in binary mode to
parse the byte offsets for lines. This makes the operation of __getitem__
lockless, except for initialization. If you instead use the file interface
(tell, seek, read) in __getitem__, you'll have to synchronize access to
protect the file pointer.
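
A minimal sketch of the mmap variant, in case it helps. The class name
LineIndex, its methods, and the sample data are made up for illustration;
this is not the CsvIter code from the thread. It assumes a non-empty file,
since mmap refuses to map an empty one.

```python
import mmap
import os
import tempfile


class LineIndex(object):
    """Hypothetical helper: random access to lines by byte offset."""

    def __init__(self, path):
        # Binary mode, so tell() reports plain byte offsets that are
        # valid as indexes into the mmap object.
        self._file = open(path, 'rb')
        self._offsets = [0]
        # Use readline() rather than iterating the file: in 2.x the
        # iterator's read-ahead buffer makes tell() unreliable.
        for _line in iter(self._file.readline, b''):
            self._offsets.append(self._file.tell())
        self._map = mmap.mmap(self._file.fileno(), 0,
                              access=mmap.ACCESS_READ)

    def __len__(self):
        return len(self._offsets) - 1

    def __getitem__(self, i):
        if not 0 <= i < len(self):
            raise IndexError(i)
        # Slicing the map does not move a shared file pointer, so no
        # lock is needed here.
        return self._map[self._offsets[i]:self._offsets[i + 1]]

    def close(self):
        self._map.close()
        self._file.close()


# Usage with a throwaway UTF-8 encoded file.
fd, demo_path = tempfile.mkstemp()
os.write(fd, b'a,b\r\n\xc3\xa9,2\r\n')
os.close(fd)

index = LineIndex(demo_path)
second = index[1]   # raw bytes of the second line; decode as needed
index.close()
os.remove(demo_path)
```

The offsets are built once in __init__, which is the initialization step
that still has to be protected if several threads can trigger it.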

>>Binary mode ensures the offsets are valid for use with
>>the mmap object in __getitem__. This requires an ASCII compatible
>>encoding such as UTF-8.
>
> What do you mean exactly with "ascii compatible"? Does it mean 'superset
> of ascii', such as utf-8, windows-1252, latin-1? Hmmm, but Asian encodings
> like cp874 and shift-JIS are thai/japanese on top of ascii, so this makes
> me doubt. In my code I am using icu to guess the encoding; I simply put
> 'utf-8' in the sample code for brevity.

The 2.x csv module only works with byte strings that are ASCII compatible.
It doesn't support encodings such as UTF-16 that have nulls. Also, the
reader is hard-coded to use ASCII '\r' and '\n' as line terminators. I'd
have to read the source to see what else is hard coded.
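
To make the null-byte point concrete, here is a small check. The helper
name is_null_free and the sample string are made up for illustration; it
only shows what the encoded bytes look like and does not call the csv
module.

```python
def is_null_free(text, encoding):
    """Hypothetical check: True if the encoded bytes contain no nulls."""
    return b'\x00' not in text.encode(encoding)


sample = u'caf\xe9,1\r\n'

# UTF-8 keeps the ASCII characters as single ASCII bytes, and every byte
# of a multi-byte sequence is >= 0x80, so '\r' and '\n' bytes only occur
# as real line terminators.
utf8_bytes = sample.encode('utf-8')        # b'caf\xc3\xa9,1\r\n'

# UTF-16 pads each ASCII character with a null byte.
utf16_bytes = u'a,b\n'.encode('utf-16-le')  # b'a\x00,\x00b\x00\n\x00'

utf8_ok = is_null_free(sample, 'utf-8')        # True
utf16_ok = is_null_free(sample, 'utf-16-le')   # False
```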