[Numpy-discussion] Question about memmap

Robert Kern robert.kern at gmail.com
Wed Jun 10 03:30:28 EDT 2009


On Wed, Jun 10, 2009 at 02:25, Gökhan SEVER <gokhansever at gmail.com> wrote:
> On Wed, Jun 10, 2009 at 2:13 AM, Pauli Virtanen <pav at iki.fi> wrote:
>>
>> Wed, 10 Jun 2009 01:51:19 -0500, Gökhan SEVER wrote:
>> > What's the reason again that memmap only works with binary files?
>>
>> There are no separate "text files" and "binary files". All files are
>> binary, some just contain text that in some cases represents an array of
>> numbers.
>>
>> Memmap also views text files as binary. It returns an array
>> representing the *character data* in the file.
>>
>> > Could the functionality be extended to text files as well?
>>
>> In principle, yes. But this would need special parsing of the text in the
>> memmap. Doing this right would be considerably more work than just
>> representing the binary data. Also, I doubt that this would be very
>> useful: representing large amounts of data as text is not efficient. I
>> also think few people have interest in this feature.
>
> I was expecting memmap() to give a result similar to loadtxt().
> Instead of an array of numbers, I get the whole file represented
> as characters. Now I see why the contents of my test.txt don't show up
> as numbers.
>
> Reading more of memmap.py, I see that it uses the mmap module. Your
> explanation confirms my observation that text files should also work here,
> provided the missing special parsing is added. I don't have much idea of
> how to implement this...

No, numpy.memmap cannot be made to deal meaningfully with text files
(except as an array of characters, perhaps, but that's not what we're
talking about). In order to parse the text into an array of numbers,
the entire file has to be read. The resulting floating point array
will not (and cannot) be synchronized in any way back to the text in
the file.
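
A minimal sketch of the distinction, assuming a small hypothetical
test.txt written to a temporary directory (the file name and its
contents are illustrative, not taken from the thread). It shows memmap
exposing the raw character codes, and loadtxt producing a separate
in-memory float array whose changes never reach the file:

```python
import os
import tempfile

import numpy as np

# Hypothetical sample file standing in for the test.txt discussed above.
# Written in binary mode so the bytes on disk are the same on every platform.
original = b"1.5 2.5\n3.5 4.5\n"
path = os.path.join(tempfile.mkdtemp(), "test.txt")
with open(path, "wb") as f:
    f.write(original)

# memmap maps the raw bytes of the file: one array element per character.
chars = np.memmap(path, dtype=np.uint8, mode="r")
n_chars = chars.shape[0]
first_code = int(chars[0])  # character code of "1", not the number 1.5
del chars  # drop the mapping before going on

# loadtxt reads and parses the whole file into a new in-memory float array.
values = np.loadtxt(path)

# Modifying the parsed array does not touch the text in the file.
values[0, 0] = 99.0
with open(path, "rb") as f:
    after = f.read()

print(n_chars, first_code)
print(values)
print(after == original)
```

Writing the modified numbers back would need an explicit step such as
numpy.savetxt, which rewrites the text rather than synchronizing it.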

-- 
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth."
  -- Umberto Eco


