[Numpy-discussion] Numpy Array of dtype=object with strings and floats question
Darryl Wallace
darryl.wallace at prosensus.ca
Tue Nov 10 13:53:21 EST 2009
Hello,
On Tue, Nov 10, 2009 at 1:32 PM, Gökhan Sever <gokhansever at gmail.com> wrote:
> On Tue, Nov 10, 2009 at 12:09 PM, Darryl Wallace
> <darryl.wallace at prosensus.ca> wrote:
> > Hello again,
> > The best way so far that's come to my attention is to use:
> > numpy.ma.masked_object
> > The problem with this is that it only matches one specific value. So if
> > the user's array contained any string other than, for example,
> > "randomString", it would not be picked up.
> > e.g.
> > ---
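> > import numpy as np
> > # dtype=object keeps the numbers and the strings as they are
> > mixedArray = np.array([1, 2, '', 3, 4, 'randomString'], dtype=object)
> > # masks only the elements equal to 'randomString'
> > mixedArrayMask = np.ma.masked_object(mixedArray, 'randomString').mask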
> > ---
> > then mixedArrayMask will yield:
> >
> > array([ False, False, False, False, False, True])
> > Can anyone help me so that all strings are found in the array without
> > having to explicitly loop through them in Python?
> > Thanks,
> > Darryl
>
> Why not stick to a single Missing-Value-Code (MVC) for all the non-valid
> data? I don't know how the MA module would handle mixed MVCs in the same
> array without modifying the existing code. Otherwise, looping over the
> array and masking the str instances as NaN would be my alternative
> solution.
>
The reason I don't stick to a standard missing value code is that a user
may import other things in the datasheet that we need, like row or column
labels, or may be getting data from a source that reports missing data as
a specific string.
I currently do as you suggested, but when the dataset becomes large it
gets quite slow due to the overhead of Python looping.
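For reference, here is a minimal sketch of one way to build the string mask and the masked float array without writing the loop by hand. It assumes a recent NumPy on Python 3 (so text cells are `str`), and the names `is_string`, `string_mask`, `numeric` and `masked` are only illustrative. Note that `np.frompyfunc` still calls the Python function once per element, so it tidies the code but may not remove all of the per-element overhead.

```python
import numpy as np
import numpy.ma as ma

# Same kind of data as in the example: numbers mixed with strings.
mixed = np.array([1, 2, '', 3, 4, 'randomString'], dtype=object)

# Wrap an isinstance check as a ufunc; the result is an object array
# of Python bools, so cast it to a real boolean array.
is_string = np.frompyfunc(lambda x: isinstance(x, str), 1, 1)
string_mask = is_string(mixed).astype(bool)

# Start from an all-NaN float array and copy over only the numeric cells.
numeric = np.full(mixed.shape, np.nan)
numeric[~string_mask] = mixed[~string_mask].astype(float)

# Masked float array in which every string cell is a missing point.
masked = ma.masked_array(numeric, mask=string_mask)

print(string_mask)
print(masked)
```

With this, any string (including the empty one) ends up masked, and reductions such as `masked.mean()` skip the missing points.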
Thanks
>
>
> >
> > On Fri, Nov 6, 2009 at 3:56 PM, Darryl Wallace
> > <darryl.wallace at prosensus.ca> wrote:
> >>
> >> What I'm doing is importing some data from Excel, and sometimes there
> >> are strings in the worksheet. Oftentimes a user will use an empty cell
> >> or a string to represent data that is missing.
> >> e.g.
> >> import numpy as np
> >> mixedArray = np.array([1, 2, '', 3, 4, 'String'], dtype=object)
> >> Two questions:
> >> 1) Is there a quick way to find the elements in the array that are the
> >> strings without iterating over each element in the array?
> >> or
> >> 2) Could I quickly turn it into a masked array of type float where all
> >> string elements are set as missing points?
> >> I've been struggling with this for a while and can't come across a
> >> method that will allow me to do it without iterating over each element.
> >> Any help or pointers in the right direction would be greatly
> >> appreciated.
> >> Thanks,
> >> Darryl
> >
> >
> >
> > --
> > ______________________________________
> > Darryl Wallace: Project Leader
> > ProSensus Inc.
> > McMaster Innovation Park
> > 175 Longwood Road South, Suite 301
> > Hamilton, Ontario, L8P 0A1
> > Canada (GMT -05:00)
> >
> > Tel: 1-905-528-9136
> > Fax: 1-905-546-1372
> >
> > Web site: http://www.prosensus.ca/
> > ______________________________________
> >
> > _______________________________________________
> > NumPy-Discussion mailing list
> > NumPy-Discussion at scipy.org
> > http://mail.scipy.org/mailman/listinfo/numpy-discussion
> >
> >
>
>
>
> --
> Gökhan
>