On Wed, Feb 19, 2020 at 5:23 PM FilippoM <privacy_please@filippo.it> wrote:
Hi, I've got a Pandas data frame that looks like this:
In [69]: data.head
Out[69]:
<bound method NDFrame.head of      OS and Version         Status
0            Android       VIDEO_OK
1      Android 4.2.2       VIDEO_OK
2          Android 9       VIDEO_OK
3           iOS 13.3       VIDEO_OK
4         Windows 10       VIDEO_OK
5          Android 9       VIDEO_OK
...              ...            ...
24        Windows 10       VIDEO_OK
25         Android 9       VIDEO_OK
26     Android 6.0.1       VIDEO_OK
27        Windows XP       VIDEO_OK
28     Android 8.0.0  VIDEO_FAILURE
29       Android 6.0       VIDEO_OK
...              ...            ...
2994         iOS 9.1       VIDEO_OK
2995       Android 9       VIDEO_OK
2996      Windows 10       VIDEO_OK
2997       Android 9       VIDEO_OK
2998      Windows 10       VIDEO_OK
2999        iOS 13.3       VIDEO_OK
with 109 possible values in the OS column and just two possible values (VIDEO_OK and VIDEO_FAILURE) in the Status column.
How can I use Pandas' dataframe magic to calculate, for each of the 109 possible values, how many rows have VIDEO_OK and how many have VIDEO_FAILURE?
I would like to end up with something like:
In[]: num_of_oks["iOS 13.3"]
Out: 15
In[]: num_of_not_oks["iOS 13.3"]
Out: 3
I am trying to do some matplotlib scatter plotting.
Thanks
Have you considered using traditional Unix tools, like cut and count? Or traditional SQL?
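
That said, if you would rather stay inside pandas, something along these lines might do it. This is only a minimal sketch: it assumes the columns really are named "OS and Version" and "Status" as in your printout, and it uses a small made-up frame in place of your 3000 rows. pd.crosstab counts the rows for each (OS, status) pair, which is the same thing a SQL GROUP BY with COUNT(*) would give you.

```python
import pandas as pd

# Made-up stand-in for the real frame; only the column names come from the post.
data = pd.DataFrame({
    "OS and Version": ["iOS 13.3", "Android 9", "iOS 13.3",
                       "Windows 10", "Android 9", "iOS 13.3"],
    "Status": ["VIDEO_OK", "VIDEO_OK", "VIDEO_FAILURE",
               "VIDEO_OK", "VIDEO_FAILURE", "VIDEO_OK"],
})

# One row per OS value, one column per status, cells hold row counts.
counts = pd.crosstab(data["OS and Version"], data["Status"])

# Make sure both status columns exist even if one never occurs in the data.
counts = counts.reindex(columns=["VIDEO_OK", "VIDEO_FAILURE"], fill_value=0)

# Plain dicts keyed by OS value, matching the lookups asked for.
num_of_oks = counts["VIDEO_OK"].to_dict()
num_of_not_oks = counts["VIDEO_FAILURE"].to_dict()

print(num_of_oks["iOS 13.3"])
print(num_of_not_oks["iOS 13.3"])
```

For the scatter plot, the two columns of counts can be passed to matplotlib directly, for example as the x and y values, without going through the dicts.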