[Numpy-discussion] [pystatsmodels] Re: ANN: pandas 0.10.0 released

Chang She chang at lambdafoundry.com
Fri Dec 21 16:13:32 EST 2012


On Dec 21, 2012, at 3:27 PM, Collin Sellman <collin.sellman at gmail.com> wrote:

> Thanks, Wes and team.  I've been looking through the new features, but haven't found any documentation on the integration with the Google Analytics API.  I was just in the midst of trying to pull data into Pandas from GA in v.0.9.0, so would love to try what you built in .10.
> 
> -Collin
> 
> On Monday, December 17, 2012 10:19:49 AM UTC-7, Wes McKinney wrote:
> hi all, 
> 
> I'm super excited to announce the pandas 0.10.0 release. This is 
> a major release including a new high performance file reading 
> engine with tons of new user-facing functionality as well, a 
> bunch of work on the HDF5/PyTables integration layer, 
> much-expanded Unicode support, a new option/configuration 
> interface, integration with the Google Analytics API, and a wide 
> array of other new features, bug fixes, and performance 
> improvements. I strongly recommend that all users get upgraded as 
> soon as feasible. Many of the performance improvements over 0.9.x 
> are quite substantial; see the vbench results at the end of the e-mail. 
> 
> As of this release, we are no longer supporting Python 2.5. Also, 
> this is the first release to officially support Python 3.3. 
> 
> Note: there are a number of minor, but necessary API changes that 
> long-time pandas users should pay attention to in the What's New. 
> 
> Thanks to all who contributed to this release, especially Chang 
> She, Yoval P, and Jeff Reback (and everyone else listed in the 
> commit log!). 
> 
> As always, source archives and Windows installers are on PyPI. 
> 
> What's new: http://pandas.pydata.org/pandas-docs/stable/whatsnew.html 
> Installers: http://pypi.python.org/pypi/pandas 
> 
> $ git log v0.9.1..v0.10.0 --pretty=format:%aN | sort | uniq -c | sort -rn 
>     246 Wes McKinney 
>     140 y-p 
>      99 Chang She 
>      45 jreback 
>      18 Abraham Flaxman 
>      17 Jeff Reback 
>      14 locojaydev 
>      11 Keith Hughitt 
>       5 Adam Obeng 
>       2 Dieter Vandenbussche 
>       1 zach powers 
>       1 Luke Lee 
>       1 Laurent Gautier 
>       1 Ken Van Haren 
>       1 Jay Bourque 
>       1 Donald Curtis 
>       1 Chris Mulligan 
>       1 alex arsenovic 
>       1 A. Flaxman 
> 
> Happy data hacking! 
> 
> - Wes 
> 
> What is it 
> ========== 
> pandas is a Python package providing fast, flexible, and 
> expressive data structures designed to make working with 
> relational, time series, or any other kind of labeled data both 
> easy and intuitive. It aims to be the fundamental high-level 
> building block for doing practical, real world data analysis in 
> Python. 
> 
> Links 
> ===== 
> Release Notes: http://github.com/pydata/pandas/blob/master/RELEASE.rst 
> Documentation: http://pandas.pydata.org 
> Installers: http://pypi.python.org/pypi/pandas 
> Code Repository: http://github.com/pydata/pandas 
> Mailing List: http://groups.google.com/group/pydata 
> 
> Performance vs. v0.9.0 
> ====================== 
> 
> Benchmarks from https://github.com/pydata/pandas/tree/master/vb_suite 
> Ratio < 1 means that v0.10.0 is faster 
> 
>                                            v0.10.0     v0.9.0      ratio 
> name 
> unstack_sparse_keyspace                     1.2813   144.1262     0.0089 
> groupby_frame_apply_overhead               20.1520   337.3330     0.0597 
> read_csv_comment2                          25.3097   363.2860     0.0697 
> groupbym_frame_apply                       75.1554   504.1661     0.1491 
> frame_iteritems_cached                      0.0711     0.3919     0.1815 
> read_csv_thou_vb                           35.2690   191.9360     0.1838 
> concat_small_frames                        12.9019    55.3561     0.2331 
> join_dataframe_integer_2key                 5.8184    21.5823     0.2696 
> series_value_counts_strings                 5.3824    19.1262     0.2814 
> append_frame_single_homogenous              0.3413     0.9319     0.3662 
> read_csv_vb                                18.4084    46.9500     0.3921 
> read_csv_standard                          12.0651    29.9940     0.4023 
> panel_from_dict_all_different_indexes      73.6860   158.2949     0.4655 
> frame_constructor_ndarray                   0.0471     0.0958     0.4918 
> groupby_first                               3.8502     7.1988     0.5348 
> groupby_last                                3.6962     6.7792     0.5452 
> panel_from_dict_two_different_indexes      50.7428    86.4980     0.5866 
> append_frame_single_mixed                   1.2950     2.1930     0.5905 
> frame_get_numeric_data                      0.0695     0.1119     0.6212 
> replace_fillna                              4.6349     7.0540     0.6571 
> frame_to_csv                              281.9340   427.7921     0.6590 
> replace_replacena                           4.7154     7.1207     0.6622 
> frame_iteritems                             2.5862     3.7463     0.6903 
> series_align_int64_index                   29.7370    41.2791     0.7204 
> join_dataframe_integer_key                  1.7980     2.4303     0.7398 
> groupby_multi_size                         31.0066    41.7001     0.7436 
> groupby_frame_singlekey_integer             2.3579     3.1649     0.7450 
> write_csv_standard                        326.8259   427.3241     0.7648 
> groupby_simple_compress_timing             41.2113    52.3993     0.7865 
> frame_fillna_inplace                       16.2843    20.0491     0.8122 
> reindex_fillna_backfill                     0.1364     0.1667     0.8181 
> groupby_multi_series_op                    15.2914    18.6651     0.8193 
> groupby_multi_cython                       17.2169    20.4420     0.8422 
> frame_fillna_many_columns_pad              14.9510    17.5114     0.8538 
> panel_from_dict_equiv_indexes              25.8427    29.9682     0.8623 
> merge_2intkey_nosort                       19.0755    22.1138     0.8626 
> sparse_series_to_frame                    167.8529   192.9920     0.8697 
> reindex_fillna_pad                          0.1410     0.1617     0.8720 
> merge_2intkey_sort                         44.7863    51.3315     0.8725 
> reshape_stack_simple                        2.6698     3.0502     0.8753 
> groupby_indices                             7.2264     8.2314     0.8779 
> sort_level_one                              4.3845     4.9902     0.8786 
> sort_level_zero                             4.3362     4.9198     0.8814 
> write_store                                16.0587    18.2042     0.8821 
> frame_reindex_both_axes                     0.3726     0.4183     0.8907 
> groupby_multi_different_numpy_functions    13.4164    15.0509     0.8914 
> index_int64_intersection                   25.3705    28.1867     0.9001 
> groupby_frame_median                        7.7491     8.6011     0.9009 
> frame_drop_dup_na_inplace                   2.6290     2.9155     0.9017 
> dataframe_reindex_columns                   0.3052     0.3372     0.9049 
> join_dataframe_index_multi                 20.5651    22.6893     0.9064 
> frame_ctor_list_of_dict                   101.7439   112.2260     0.9066 
> groupby_pivot_table                        18.4551    20.3184     0.9083 
> reindex_frame_level_align                   0.9644     1.0531     0.9158 
> stat_ops_level_series_sum_multiple          7.3637     8.0230     0.9178 
> write_store_mixed                          38.2528    41.6604     0.9182 
> frame_reindex_both_axes_ix                  0.4550     0.4950     0.9192 
> stat_ops_level_frame_sum_multiple           8.1975     8.9055     0.9205 
> panel_from_dict_same_index                 25.7938    28.0147     0.9207 
> groupby_series_simple_cython                5.1310     5.5624     0.9224 
> frame_sort_index_by_columns                41.9577    45.1816     0.9286 
> groupby_multi_python                       54.9727    59.0400     0.9311 
> datetimeindex_add_offset                    0.2417     0.2584     0.9356 
> frame_boolean_row_select                    0.2905     0.3100     0.9373 
> frame_reindex_axis1                         2.9760     3.1742     0.9376 
> stat_ops_level_series_sum                   2.3382     2.4937     0.9376 
> groupby_multi_different_functions          14.0333    14.9571     0.9382 
> timeseries_timestamp_tzinfo_cons            0.0159     0.0169     0.9397 
> stats_rolling_mean                          1.6904     1.7959     0.9413 
> melt_dataframe                              1.5236     1.6181     0.9416 
> timeseries_asof_single                      0.0548     0.0582     0.9416 
> frame_ctor_nested_dict_int64              134.3100   142.6389     0.9416 
> join_dataframe_index_single_key_bigger     15.6578    16.5949     0.9435 
> stat_ops_level_frame_sum                    3.2475     3.4414     0.9437 
> indexing_dataframe_boolean_rows             0.2382     0.2518     0.9459 
> timeseries_asof_nan                        10.0433    10.6006     0.9474 
> frame_reindex_axis0                         1.4403     1.5184     0.9485 
> concat_series_axis1                        69.2988    72.8099     0.9518 
> join_dataframe_index_single_key_small       6.8492     7.1847     0.9533 
> dataframe_reindex_daterange                 0.4054     0.4240     0.9562 
> join_dataframe_index_single_key_bigger      6.4616     6.7578     0.9562 
> timeseries_timestamp_downsample_mean        4.5849     4.7787     0.9594 
> frame_fancy_lookup                          2.5498     2.6544     0.9606 
> series_value_counts_int64                   2.5569     2.6581     0.9619 
> frame_fancy_lookup_all                     30.7510    31.8465     0.9656 
> index_int64_union                          82.2279    85.1500     0.9657 
> indexing_dataframe_boolean_rows_object      0.4809     0.4977     0.9662 
> frame_ctor_nested_dict                     91.6129    94.8122     0.9663 
> stat_ops_series_std                         0.2450     0.2533     0.9673 
> groupby_frame_cython_many_columns           3.7642     3.8894     0.9678 
> timeseries_asof                            10.4352    10.7721     0.9687 
> series_ctor_from_dict                       3.7707     3.8749     0.9731 
> frame_drop_dup_inplace                      3.0007     3.0746     0.9760 
> timeseries_large_lookup_value               0.0242     0.0248     0.9764 
> read_table_multiple_date_baseline        1201.2930  1224.3881     0.9811 
> dti_reset_index                             0.6339     0.6457     0.9817 
> read_table_multiple_date                 2600.7280  2647.8729     0.9822 
> reindex_frame_level_reindex                 0.9524     0.9674     0.9845 
> reindex_multiindex                          1.3483     1.3685     0.9853 
> frame_insert_500_columns                  102.1249   103.4329     0.9874 
> frame_drop_duplicates                      19.3780    19.6157     0.9879 
> reindex_daterange_backfill                  0.1870     0.1889     0.9899 
> stats_rank2d_axis0_average                 25.0480    25.2801     0.9908 
> series_align_left_monotonic                13.1929    13.2558     0.9953 
> timeseries_add_irregular                   22.4635    22.5122     0.9978 
> read_store_mixed                           13.4398    13.4560     0.9988 
> lib_fast_zip                               11.1289    11.1354     0.9994 
> match_strings                               0.3831     0.3833     0.9995 
> read_store                                  5.5526     5.5290     1.0043 
> timeseries_sort_index                      22.7172    22.5976     1.0053 
> timeseries_1min_5min_mean                   0.6224     0.6175     1.0079 
> stats_rank2d_axis1_average                 14.6569    14.5339     1.0085 
> reindex_daterange_pad                       0.1886     0.1867     1.0102 
> timeseries_period_downsample_mean           6.4241     6.3480     1.0120 
> frame_drop_duplicates_na                   19.3303    19.0970     1.0122 
> stats_rank_average_int                     23.3569    22.9996     1.0155 
> lib_fast_zip_fillna                        14.1394    13.8473     1.0211 
> index_datetime_intersection                17.2626    16.8986     1.0215 
> timeseries_1min_5min_ohlc                   0.7054     0.6891     1.0237 
> stats_rank_average                         31.3440    30.3845     1.0316 
> timeseries_infer_freq                      10.9854    10.6439     1.0321 
> timeseries_slice_minutely                   0.0637     0.0611     1.0418 
> index_datetime_union                       17.9083    17.1640     1.0434 
> series_align_irregular_string              89.9470    85.1344     1.0565 
> series_constructor_ndarray                  0.0127     0.0119     1.0742 
> indexing_panel_subset                       0.5692     0.5214     1.0917 
> groupby_apply_dict_return                  46.3497    42.3220     1.0952 
> reshape_unstack_simple                      3.2901     2.9089     1.1310 
> timeseries_to_datetime_iso8601              4.2305     3.6015     1.1746 
> frame_to_string_floats                     53.6217    37.2041     1.4413 
> reshape_pivot_time_series                 170.4340   107.9068     1.5795 
> sparse_frame_constructor                    6.2714     3.5053     1.7891 
> datetimeindex_normalize                    37.2718     6.9329     5.3761 
> 
> Columns: test_name | target_duration [ms] | baseline_duration [ms] | ratio 
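
For anyone reading the table above: the ratio column appears to be the v0.10.0 (target) duration divided by the v0.9.0 (baseline) duration, so values below 1 mean v0.10.0 is faster. Here is a minimal sketch that recomputes it for three rows copied from the table. The helper name `ratio` is only illustrative and is not part of vb_suite.

```python
def ratio(target_ms, baseline_ms):
    """Return target / baseline; a value below 1 means the target run is faster."""
    return target_ms / baseline_ms


# (name, v0.10.0 duration in ms, v0.9.0 duration in ms), copied from the table.
rows = [
    ("unstack_sparse_keyspace", 1.2813, 144.1262),
    ("groupby_frame_apply_overhead", 20.1520, 337.3330),
    ("datetimeindex_normalize", 37.2718, 6.9329),
]

for name, target, baseline in rows:
    r = ratio(target, baseline)
    verdict = "faster" if r < 1 else "slower"
    print(f"{name:32s} ratio={r:.4f} ({verdict} in v0.10.0)")
```

The first two rows come out well below 1 (large speedups), while datetimeindex_normalize comes out above 1, matching the regression at the bottom of the table.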


Hi Collin, 

I left it out of the official docs because the authentication step complicates the doc build, but this brief blog post I wrote covers it: 
http://quantabee.wordpress.com/2012/12/17/google-analytics-pandas/

Best,

Chang