[Numpy-discussion] read not byte aligned records

Sun May 10 15:11:29 EDT 2015

For the archive: I tried bitarray instead of bitstring, and parsing the
same file went from 180 ms to 60 ms. The code ended up shorter and
simpler, but bitarray is harder to get started with (documentation).

Performance is still far from what fromstring or fromfile achieve: about
5 ms for a file of similar size whose records are byte aligned.
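
For comparison, here is a minimal sketch of the byte-aligned case that
the 5 ms figure refers to. The record layout (a uint16 followed by a
float32) and the field names are hypothetical, chosen only for
illustration. numpy.frombuffer is used because binary-mode fromstring
has since been deprecated in its favour.

```python
import numpy as np

# Hypothetical packed record layout: little-endian uint16 then float32.
record_dtype = np.dtype([("counter", "<u2"), ("value", "<f4")])

# Build two records in memory to stand in for the file contents.
raw = np.array([(1, 0.5), (2, 1.5)], dtype=record_dtype).tobytes()

# One call decodes every record; no per-record Python loop is needed.
rec = np.frombuffer(raw, dtype=record_dtype).view(np.recarray)
```

This only works because every field starts and ends on a byte boundary,
which is exactly what the records discussed here do not guarantee.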

Aymeric

My code is below:

def readBitarray(self, bita, channelList=None):
        """Read a stream of record bytes using the bitarray module,
        needed for data that is not byte aligned.

        Parameters
        ------------
        bita : stream
            stream of bytes
        channelList : List of str, optional
            names of the channels to read; defaults to all channels

        Returns
        --------
        rec : numpy recarray
            contains a matrix of raw data in a recarray (attributes
            corresponding to channel name)
        """
        from bitarray import bitarray
        from numpy import asarray, recarray
        # use little-endian bit order (bitarray itself defaults to big)
        B = bitarray(endian="little")
        B.frombytes(bytes(bita))
        # initialise data structure
        if channelList is None:
            channelList = self.channelNames
        format = []
        for channel in self:
            if channel.name in channelList:
                format.append(channel.RecordFormat)
        buf = recarray(self.numberOfRecords, format)
        # read data
        for chan in range(len(self)):
            if self[chan].name in channelList:
                record_bit_size = self.CGrecordLength * 8
                temp = [B[self[chan].posBitBeg + record_bit_size * i:\
                        self[chan].posBitEnd + record_bit_size * i]\
                         for i in range(self.numberOfRecords)]
                nbytes = len(temp[0].tobytes())
                if nbytes != self[chan].nBytes and \
                        self[chan].signalDataType not in (6, 7, 8, 9, 10, 11, 12):
                    # not a C type byte length: pad each record with zero
                    # bits to match the byte size numpy requires
                    byte = 8 * (self[chan].nBytes - nbytes) * bitarray([False])
                    for i in range(self.numberOfRecords):
                        # extend() adds all padding bits; append() would
                        # add a single bit only
                        temp[i].extend(byte)
                temp = [self[chan].CFormat.unpack(temp[i].tobytes())[0] \
                        for i in range(self.numberOfRecords)]
                buf[self[chan].name] = asarray(temp)
        return buf
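
A possible way to avoid the per-record Python loop, sketched here as an
untested-at-scale alternative and not as the method used above, is to
unpack all bits at once with numpy.unpackbits. This assumes a NumPy
version that supports the bitorder argument (1.17 or later), fixed-size
records, and an unsigned integer field stored with little-endian bit
order. The function name and its parameters are hypothetical.

```python
import numpy as np

def read_unsigned_field(raw, record_nbytes, bit_beg, bit_end):
    """Extract one unsigned field from every fixed-size record.

    raw           : bytes-like object holding all records back to back
    record_nbytes : size of one record in bytes
    bit_beg, bit_end : field position in bits within a record
                       (end exclusive), least significant bit first
    """
    # One row per record, one column per byte.
    data = np.frombuffer(raw, dtype=np.uint8).reshape(-1, record_nbytes)
    # Expand to one column per bit, least significant bit of each byte first.
    bits = np.unpackbits(data, axis=1, bitorder="little")
    field = bits[:, bit_beg:bit_end].astype(np.uint64)
    # Weight bit k by 2**k and sum along each record.
    weights = np.uint64(1) << np.arange(bit_end - bit_beg, dtype=np.uint64)
    return (field * weights).sum(axis=1)
```

Unpacking costs eight bytes of memory per input byte, so for large files
it may be worth processing the records in chunks. Signed or
floating-point fields would need extra handling that is not shown here.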

On 05/05/15 15:39, Benjamin Root wrote:
> I have been very happy with the bitarray package. I don't know if it
> is faster than bitstring, but it is worth a mention. Just watch out
> for any hashing operations on its objects, it doesn't seem to do them
> right (set(), dict(), etc...), but comparison operations work just fine.
>
> Ben Root
