<div class="gmail_quote">Hello,</div><div class="gmail_quote"><br></div><div class="gmail_quote">On Tue, Nov 10, 2009 at 1:32 PM, Gökhan Sever <span dir="ltr"><<a href="mailto:gokhansever@gmail.com">gokhansever@gmail.com</a>></span> wrote:<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">On Tue, Nov 10, 2009 at 12:09 PM, Darryl Wallace<br>

<div class="im"><<a href="mailto:darryl.wallace@prosensus.ca">darryl.wallace@prosensus.ca</a>> wrote:<br>

</div><div class="im">> Hello again,<br>

> The best way so far that's come to my attention is to use:<br>

> numpy.ma.masked_object<br>

> The problem with this is that it only matches one specific value. So if<br>

> the user had other strings in the array (for example, both '' and<br>

> "randomString"), only the one passed in would be picked up.<br>

> e.g.<br>

> ---<br>

> import numpy as np<br>

> mixedArray = np.array([1, 2, '', 3, 4, 'randomString'], dtype=object)<br>

> mixedArrayMask = np.ma.masked_object(mixedArray, 'randomString').mask<br>

> ---<br>

> then mixedArrayMask will yield:<br>

><br>

> array([ False, False, False, False, False, True])<br>

> Can anyone help me find all the strings in the array without having<br>

> to explicitly loop through them in Python?<br>

> Thanks,<br>

> Darryl<br>

<br>

</div>Why not stick to the same Missing-Value-Code for all the non-valid<br>

data? I don't know how the MA module would handle mixed MVCs in the same<br>

array without modifying the existing code. Otherwise, looping over the<br>

array and masking the str instances as NaN would be my alternative<br>

solution.<br></blockquote><div><br></div><div>The reason I don't stick to a standard missing-value code is that a user may import other things in the datasheet that we need, like row or column labels, or may be getting data from a specific source that reports missing data as a specific string.</div>
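<div><br></div><div>For reference, here is a minimal sketch of building the string mask and the masked float array without an explicit Python for loop. It assumes NumPy and that every non-string cell holds a number. It wraps an isinstance check with np.frompyfunc, so the loop over elements runs inside NumPy, but each element still costs one Python function call. The gain over a hand-written loop may therefore be modest and should be measured on real data. The helper name strings_to_masked is hypothetical.</div>

```python
import numpy as np

# Wrap an isinstance check as a ufunc. frompyfunc always returns an
# object array, so the result is cast to bool afterwards.
_is_str = np.frompyfunc(lambda x: isinstance(x, str), 1, 1)


def strings_to_masked(mixed):
    """Hypothetical helper: mask every str element of an object array.

    Assumes every non-string element can be converted to float.
    """
    mask = _is_str(mixed).astype(bool)
    # Replace the strings with NaN so the cast to float cannot fail.
    data = np.where(mask, np.nan, mixed).astype(float)
    return np.ma.masked_array(data, mask=mask)


mixedArray = np.array([1, 2, '', 3, 4, 'randomString'], dtype=object)
masked = strings_to_masked(mixedArray)
print(masked.mask)
print(masked.mean())  # 2.5: masked entries are ignored
```

<div>Unlike ma.masked_object, this masks every str element, including the empty string, whatever its value.</div>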

<div><br></div><div>I currently do as you suggested, but when the dataset becomes large it gets quite slow because of the overhead of looping in Python.</div><div><br></div><div>Thanks<br></div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">


<div><div></div><div class="h5"><br>

<br>

><br>

> On Fri, Nov 6, 2009 at 3:56 PM, Darryl Wallace <<a href="mailto:darryl.wallace@prosensus.ca">darryl.wallace@prosensus.ca</a>><br>

> wrote:<br>

>><br>

>> What I'm doing is importing some data from Excel, and sometimes there are<br>

>> strings in the worksheet. Often a user will use an empty cell or a<br>

>> string to represent missing data.<br>

>> e.g.<br>

>> import numpy as np<br>

>> mixedArray = np.array([1, 2, '', 3, 4, 'String'], dtype=object)<br>

>> Two questions:<br>

>> 1) Is there a quick way to find the elements in the array that are the<br>

>> strings without iterating over each element in the array?<br>

>> or<br>

>> 2) Could I quickly turn it into a masked array of type float where all<br>

>> string elements are set as missing points?<br>

>> I've been struggling with this for a while and can't find a method<br>

>> that will allow me to do it without iterating over each element.<br>

>> Any help or pointers in the right direction would be greatly appreciated.<br>

>> Thanks,<br>

>> Darryl<br>
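<div>For question 2, an alternative sketch, assuming that any cell which cannot be parsed as a number should count as missing: a hypothetical to_float helper is applied through np.vectorize, and np.ma.masked_invalid then masks the resulting NaNs. Unlike an isinstance check, this keeps numeric strings such as '3.5' as numbers. Note that np.vectorize is a convenience wrapper around a Python-level loop, so it tidies the code rather than speeding it up.</div>

```python
import numpy as np


def to_float(value):
    """Hypothetical helper: return value as a float, or NaN if it cannot be parsed."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return np.nan


# Declaring otypes fixes the output dtype to float64 instead of
# inferring it from the first call.
_to_float = np.vectorize(to_float, otypes=[float])

mixedArray = np.array([1, 2, '', 3, 4, 'String'], dtype=object)
# masked_invalid masks NaN (and +/-inf) entries.
masked = np.ma.masked_invalid(_to_float(mixedArray))
print(masked)
```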

><br>

><br>

><br>

> --<br>

> ______________________________________<br>

> Darryl Wallace: Project Leader<br>

> ProSensus Inc.<br>

> McMaster Innovation Park<br>

> 175 Longwood Road South, Suite 301<br>

> Hamilton, Ontario, L8P 0A1<br>

> Canada        (GMT -05:00)<br>

><br>

> Tel:       1-905-528-9136<br>

> Fax:       1-905-546-1372<br>

><br>

> Web site:  <a href="http://www.prosensus.ca/" target="_blank">http://www.prosensus.ca/</a><br>

> ______________________________________<br>

><br>

</div></div>> _______________________________________________<br>

> NumPy-Discussion mailing list<br>

> <a href="mailto:NumPy-Discussion@scipy.org">NumPy-Discussion@scipy.org</a><br>

> <a href="http://mail.scipy.org/mailman/listinfo/numpy-discussion" target="_blank">http://mail.scipy.org/mailman/listinfo/numpy-discussion</a><br>

><br>

><br>

<font color="#888888"><br>

<br>

<br>

--<br>

Gökhan<br>


</font></blockquote></div>