
Hi everyone,

I'm quite new to both numpy and Python. Could someone please tell me what I'm doing wrong? Here is my piece of code:

    def stats(filename):
        """Utility to perform some basic statistics on columns."""
        tab = get_textab(filename)
        stat_list = []
        for row in sort_tab(tab):
            if row['length'] >= 15:
                stat_list.append(row)
        stat_array = np.array(stat_list)
        print type(sort_tab(tab))
        print type(stat_array)
        #print stat_array.mean(axis=0)
        print np.mean(stat_array, axis=0)

Which results in:

    <type 'numpy.ndarray'>
    <type 'numpy.ndarray'>
    Traceback (most recent call last):
      File "/home/ferreirafm/bin/cross.py", line 213, in <module>
        main()
      File "/home/ferreirafm/bin/cross.py", line 204, in main
        stats(filename)
      File "/home/ferreirafm/bin/cross.py", line 146, in stats
        print np.mean(stat_array, axis=0)
      File "/usr/lib64/python2.7/site-packages/numpy/core/fromnumeric.py", line 2374, in mean
        return mean(axis, dtype, out)
    TypeError: unsupported operand type(s) for +: 'numpy.void' and 'numpy.void'

--
View this message in context: http://old.nabble.com/numpy.mean-problems-tp32945124p32945124.html
Sent from the Numpy-discussion mailing list archive at Nabble.com.

On Fri, Dec 9, 2011 at 11:47 AM, ferreirafm <ferreirafm@lim12.fm.usp.br> wrote:
> Could someone, please, tell me what I'm doing wrong?
> [...]
When posting to the mailing list, it's a good idea to have a small, self-contained example (otherwise we can't reproduce your problem). In this specific case, I'd like to be able to see what the outputs of "print tab" and "print stat_array" are.

Regards
Stéfan
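A small, self-contained example of the kind requested here might look like the sketch below. It assumes that tab is a structured array, which is confirmed further down the thread; the two-field dtype and the values are made up for illustration, and get_textab() and sort_tab() are left out because their code was not posted. It is written in Python 3 syntax, unlike the original post.

```python
import numpy as np

# Hypothetical stand-in for the poster's table: a structured array with
# two of the field names that appear later in the thread.
tab = np.array(
    [(12, 98.5), (20, 87.0), (31, 91.5)],
    dtype=[("length", "<i8"), ("pident", "<f8")],
)

# Same steps as in stats(): collect matching rows in a list,
# then rebuild an array from that list.
stat_list = [row for row in tab if row["length"] >= 15]
stat_array = np.array(stat_list)

print(tab)
print(stat_array)
print(stat_array.dtype)

# Taking the mean over whole records fails, because two records
# cannot be added together.
try:
    np.mean(stat_array, axis=0)
    mean_failed = False
except TypeError as exc:
    mean_failed = True
    print("TypeError:", exc)
```

The exact wording of the TypeError depends on the NumPy version, so it may differ from the traceback in the original post.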

Stéfan van der Walt wrote:
> When posting to the mailing list, it's a good idea to have a small, self-contained example (otherwise we can't reproduce your problem). In this specific case, I'd like to be able to see what the outputs of "print tab" and "print stat_array" are.
Hi Stéfan,

Thanks for your reply. Have a look at the arrays at: http://ompldr.org/vYm83ZA

Regards,
Fred

On Sat, Dec 10, 2011 at 5:47 AM, ferreirafm <ferreirafm@lim12.fm.usp.br> wrote:
> Hi Stéfan, Thanks for your reply. Have a look at the arrays at: http://ompldr.org/vYm83ZA
I can recreate this error if tab is a structured ndarray - what is the dtype of tab?

If that is correct, I think you could fix this by simplifying things. Since tab is already an ndarray, you should not need to convert it back into a Python list. By converting the ndarray back to a list you are making an extra level of "wrapping" as a Python object, which is ultimately why you get that error about adding numpy.void.

Unfortunately you cannot directly take a mean of a struct dtype; structs are generic, so they could have fields with strings, objects, etc., that would be invalid for a mean calculation. However, the following code fragment should work pretty efficiently. It will make a 1-element array of the same dtype as tab, and then populate it with the mean value of all elements where the length is >= 15. Note that dtype.fields.keys() gives you a nice way to iterate over the fields in the struct dtype:

    length_mask = tab['length'] >= 15
    tab_means = np.zeros(1, dtype=tab.dtype)
    for k in tab.dtype.fields.keys():
        # average only the rows selected by the mask
        tab_means[k] = np.mean(tab[k][length_mask])

In general this would not work if tab has a field that is not a simple numeric type, such as a str, object, ... But it looks like your arrays are all numeric from your example above.

Hope that helps,
Aronne
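For readers who want to try the per-field mean on their own machine, here is a self-contained sketch in Python 3. The data values are invented; only the field names come from the thread. One deliberate difference from the fragment in the message: when tab_means shares tab's dtype, a mean stored in an integer field such as 'length' is truncated, so this sketch stores the results in an all-float dtype instead.

```python
import numpy as np

# Hypothetical data; only the field names are taken from the thread.
tab = np.array(
    [(10, 90.0, 100.0), (15, 80.0, 200.0), (20, 70.0, 300.0), (25, 60.0, 400.0)],
    dtype=[("length", "<i8"), ("pident", "<f8"), ("score", "<f8")],
)

# Boolean mask selecting the records with length >= 15.
length_mask = tab["length"] >= 15

# Use an all-float result dtype so that the means of integer fields
# (such as 'length') are not truncated when they are stored.
mean_dtype = [(name, "<f8") for name in tab.dtype.names]
tab_means = np.zeros(1, dtype=mean_dtype)

# Average each field separately over the selected records.
for name in tab.dtype.names:
    tab_means[name] = np.mean(tab[name][length_mask])

print(tab_means)
```

Iterating over tab.dtype.names visits the fields in declaration order, which keeps the printed result aligned with the original column order.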

Aronne Merrelli wrote:
> I can recreate this error if tab is a structured ndarray - what is the dtype of tab?
> [...]
Hi Aronne,

Thanks for your reply. Indeed, tab is a mix of different column types:

    tab.dtype: [('sgi', '<i8'), ('length', '<i8'), ('nident', '<i8'), ('pident', '<f8'),
                ('positive', '<i8'), ('ppos', '<f8'), ('mismatch', '<i8'), ('qstart', '<i8'),
                ('qend', '<i8'), ('sstart', '<i8'), ('send', '<i8'), ('gapopen', '<i8'),
                ('gaps', '<i8'), ('evalue', '<f8'), ('bitscore', '<f8'), ('score', '<f8')]

Interestingly, I wasn't able to import some columns of digits as strings, as one can with R data frame objects. I'll try to adapt your example to my needs and let you know the results.

Regards.
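Since every field in the dtype listed here is numeric, another option is to convert the selected records into an ordinary 2-D float array and average along axis 0. The sketch below assumes a NumPy release that ships numpy.lib.recfunctions.structured_to_unstructured, a helper that was added well after this thread and is not mentioned by any of the participants. The rows are hypothetical and use only a subset of the fields.

```python
import numpy as np
from numpy.lib import recfunctions as rfn

# Hypothetical rows using a subset of the fields listed above.
tab = np.array(
    [(10, 5, 50.0), (16, 8, 50.0), (20, 15, 75.0)],
    dtype=[("length", "<i8"), ("nident", "<i8"), ("pident", "<f8")],
)

# Keep only the records with length >= 15.
selected = tab[tab["length"] >= 15]

# Convert the records to a plain 2-D float array: one row per record,
# one column per field, in the order given by tab.dtype.names.
plain = rfn.structured_to_unstructured(selected, dtype=np.float64)

# With an ordinary float array, the column-wise mean works directly.
column_means = plain.mean(axis=0)
for name, value in zip(tab.dtype.names, column_means):
    print(name, value)
```

This only makes sense when all fields can be cast to a common numeric type; a string or object field would need to be dropped first.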

Hi Fred,

I would suggest you have a look at pandas (http://pandas.sourceforge.net/). It was really helpful for me. It seems well suited for the type of data that you are working with. It has nice "broadcasting" capabilities to apply numpy functions to a set of columns.

http://pandas.sourceforge.net/basics.html#descriptive-statistics
http://pandas.sourceforge.net/basics.html#function-application

Cheers,
Eraldo

On Sun, Dec 11, 2011 at 1:49 PM, ferreirafm <ferreirafm@lim12.fm.usp.br> wrote:
> Indeed, tab is a mix of different column types [...] I'll try to adapt your example to my needs and let you know the results.

_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion
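The pandas suggestion in this message can be sketched as follows, assuming pandas is installed alongside NumPy. The data values are invented and only three of the field names from the thread are used; DataFrame.from_records is one of several ways to build a DataFrame from a structured array.

```python
import numpy as np
import pandas as pd

# Hypothetical structured array with a few of the thread's field names.
tab = np.array(
    [(10, 90.0, 100.0), (15, 80.0, 200.0), (25, 60.0, 400.0)],
    dtype=[("length", "<i8"), ("pident", "<f8"), ("score", "<f8")],
)

# A structured array maps onto a DataFrame with one column per field.
df = pd.DataFrame.from_records(tab)

# Boolean indexing keeps the rows with length >= 15; mean() then
# averages each column separately and returns a Series.
means = df[df["length"] >= 15].mean()
print(means)
```

The filtering and the column-wise mean take one line here, which is probably the convenience the message has in mind.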

Hi Eraldo,

Thanks for your suggestion. I was using pytables but gave up after learning that some very useful capabilities are sold as a professional package. However, it is still useful for many printing and data manipulation tasks and it can also handle extremely large datasets (which is not my case).

Regards,
Fred

Eraldo Pomponi wrote:
> I would suggest you to have a look at pandas (http://pandas.sourceforge.net/). [...]

Hi Fred,

Pandas has a nice interface to PyTables if you still need it:

http://pandas.sourceforge.net/io.html#hdf5-pytables

However, my intention was just to point you to pandas because it is a really powerful tool if you need to deal with heterogeneous tabular data. It is also worth noting that there are plans in the numpy community to include/port "part" of this package directly in the codebase. This says a lot about how good it is...

Best,
Eraldo

On Tue, Dec 13, 2011 at 9:01 PM, ferreirafm <ferreirafm@lim12.fm.usp.br> wrote:
> I was using pytables but gave up after learning that some very useful capabilities are sold as a professional package. [...]

Hi Eraldo,

Indeed, Pandas is a really, really nice module! If it is going to become part of numpy, that's even better. Thanks for the suggestion.

All the Best,
Fred

Eraldo Pomponi wrote:
> Pandas has a nice interface to PyTable if you still need it: http://pandas.sourceforge.net/io.html#hdf5-pytables
> [...]


Note that the PyTables Pro you are referring to is no longer behind a paywall. Recently the project went through some changes and the Pro version disappeared. All Pro features were merged into the main project and are, as a consequence, also available for free.

Regards,
David

On 13/12/11 21:01, ferreirafm wrote:
> I was using pytables but gave up after learning that some very useful capabilities are sold as a professional package. [...]

Thanks for the correction. Good to know! I got this outdated information from the PyTables mailing list.

Regards,
Fred

David Verelst wrote:
> Note that the pytables pro you are referring to is no longer behind a pay wall. [...]
participants (5): Aronne Merrelli, David Verelst, Eraldo Pomponi, ferreirafm, Stéfan van der Walt