Re: [Numpy-discussion] [pystatsmodels] Re: ANN: pandas 0.10.0 released
On Dec 21, 2012, at 3:27 PM, Collin Sellman <collin.sellman@gmail.com> wrote:
Thanks, Wes and team. I've been looking through the new features, but haven't found any documentation on the integration with the Google Analytics API. I was just in the midst of trying to pull data into Pandas from GA in v.0.9.0, so would love to try what you built in .10.
-Collin
On Monday, December 17, 2012 10:19:49 AM UTC-7, Wes McKinney wrote:
hi all,
I'm super excited to announce the pandas 0.10.0 release. This is a major release including a new high-performance file reading engine with tons of new user-facing functionality, a bunch of work on the HDF5/PyTables integration layer, much-expanded Unicode support, a new option/configuration interface, integration with the Google Analytics API, and a wide array of other new features, bug fixes, and performance improvements. I strongly recommend that all users upgrade as soon as feasible. Many of the performance improvements are quite substantial over 0.9.x; see the vbenchmarks at the end of the e-mail.
As of this release, we are no longer supporting Python 2.5. Also, this is the first release to officially support Python 3.3.
Note: there are a number of minor but necessary API changes that long-time pandas users should pay attention to in the What's New.
Thanks to all who contributed to this release, especially Chang She, Yoval P, and Jeff Reback (and everyone else listed in the commit log!).
As always, source archives and Windows installers are on PyPI.
What's new: http://pandas.pydata.org/pandas-docs/stable/whatsnew.html
Installers: http://pypi.python.org/pypi/pandas
$ git log v0.9.1..v0.10.0 --pretty=format:%aN | sort | uniq -c | sort -rn
   246 Wes McKinney
   140 y-p
    99 Chang She
    45 jreback
    18 Abraham Flaxman
    17 Jeff Reback
    14 locojaydev
    11 Keith Hughitt
     5 Adam Obeng
     2 Dieter Vandenbussche
     1 zach powers
     1 Luke Lee
     1 Laurent Gautier
     1 Ken Van Haren
     1 Jay Bourque
     1 Donald Curtis
     1 Chris Mulligan
     1 alex arsenovic
     1 A. Flaxman
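For readers less familiar with the shell pipeline above, here is a rough Python sketch of what `sort | uniq -c | sort -rn` does to a list of author names: count how often each name occurs and list the names by descending count. The `log_output` sample and the `count_authors` helper are hypothetical illustrations, not part of pandas or git, and the order of names with equal counts may differ from what the shell produces.

```python
from collections import Counter

# Hypothetical sample of `git log --pretty=format:%aN` output:
# one author name per commit, in arbitrary order.
log_output = """Wes McKinney
y-p
Wes McKinney
Chang She
y-p
Wes McKinney"""


def count_authors(text):
    """Return (name, commit_count) pairs, highest count first."""
    names = [line.strip() for line in text.splitlines() if line.strip()]
    return Counter(names).most_common()


for name, commits in count_authors(log_output):
    # Right-align the count, similar to the layout of `uniq -c`.
    print(f"{commits:6d} {name}")
```

Note that a tally like this treats each distinct spelling of a name as a separate author, which is why the same contributor can appear on more than one line.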
Happy data hacking!
- Wes
What is it
==========
pandas is a Python package providing fast, flexible, and expressive data structures designed to make working with relational, time series, or any other kind of labeled data both easy and intuitive. It aims to be the fundamental high-level building block for doing practical, real world data analysis in Python.
Links
=====
Release Notes: http://github.com/pydata/pandas/blob/master/RELEASE.rst
Documentation: http://pandas.pydata.org
Installers: http://pypi.python.org/pypi/pandas
Code Repository: http://github.com/pydata/pandas
Mailing List: http://groups.google.com/group/pydata
Performance vs. v0.9.0
======================
Benchmarks from https://github.com/pydata/pandas/tree/master/vb_suite
Ratio < 1 means that v0.10.0 is faster
name                                       v0.10.0     v0.9.0    ratio
unstack_sparse_keyspace                     1.2813   144.1262   0.0089
groupby_frame_apply_overhead               20.1520   337.3330   0.0597
read_csv_comment2                          25.3097   363.2860   0.0697
groupbym_frame_apply                       75.1554   504.1661   0.1491
frame_iteritems_cached                      0.0711     0.3919   0.1815
read_csv_thou_vb                           35.2690   191.9360   0.1838
concat_small_frames                        12.9019    55.3561   0.2331
join_dataframe_integer_2key                 5.8184    21.5823   0.2696
series_value_counts_strings                 5.3824    19.1262   0.2814
append_frame_single_homogenous              0.3413     0.9319   0.3662
read_csv_vb                                18.4084    46.9500   0.3921
read_csv_standard                          12.0651    29.9940   0.4023
panel_from_dict_all_different_indexes      73.6860   158.2949   0.4655
frame_constructor_ndarray                   0.0471     0.0958   0.4918
groupby_first                               3.8502     7.1988   0.5348
groupby_last                                3.6962     6.7792   0.5452
panel_from_dict_two_different_indexes      50.7428    86.4980   0.5866
append_frame_single_mixed                   1.2950     2.1930   0.5905
frame_get_numeric_data                      0.0695     0.1119   0.6212
replace_fillna                              4.6349     7.0540   0.6571
frame_to_csv                              281.9340   427.7921   0.6590
replace_replacena                           4.7154     7.1207   0.6622
frame_iteritems                             2.5862     3.7463   0.6903
series_align_int64_index                   29.7370    41.2791   0.7204
join_dataframe_integer_key                  1.7980     2.4303   0.7398
groupby_multi_size                         31.0066    41.7001   0.7436
groupby_frame_singlekey_integer             2.3579     3.1649   0.7450
write_csv_standard                        326.8259   427.3241   0.7648
groupby_simple_compress_timing             41.2113    52.3993   0.7865
frame_fillna_inplace                       16.2843    20.0491   0.8122
reindex_fillna_backfill                     0.1364     0.1667   0.8181
groupby_multi_series_op                    15.2914    18.6651   0.8193
groupby_multi_cython                       17.2169    20.4420   0.8422
frame_fillna_many_columns_pad              14.9510    17.5114   0.8538
panel_from_dict_equiv_indexes              25.8427    29.9682   0.8623
merge_2intkey_nosort                       19.0755    22.1138   0.8626
sparse_series_to_frame                    167.8529   192.9920   0.8697
reindex_fillna_pad                          0.1410     0.1617   0.8720
merge_2intkey_sort                         44.7863    51.3315   0.8725
reshape_stack_simple                        2.6698     3.0502   0.8753
groupby_indices                             7.2264     8.2314   0.8779
sort_level_one                              4.3845     4.9902   0.8786
sort_level_zero                             4.3362     4.9198   0.8814
write_store                                16.0587    18.2042   0.8821
frame_reindex_both_axes                     0.3726     0.4183   0.8907
groupby_multi_different_numpy_functions    13.4164    15.0509   0.8914
index_int64_intersection                   25.3705    28.1867   0.9001
groupby_frame_median                        7.7491     8.6011   0.9009
frame_drop_dup_na_inplace                   2.6290     2.9155   0.9017
dataframe_reindex_columns                   0.3052     0.3372   0.9049
join_dataframe_index_multi                 20.5651    22.6893   0.9064
frame_ctor_list_of_dict                   101.7439   112.2260   0.9066
groupby_pivot_table                        18.4551    20.3184   0.9083
reindex_frame_level_align                   0.9644     1.0531   0.9158
stat_ops_level_series_sum_multiple          7.3637     8.0230   0.9178
write_store_mixed                          38.2528    41.6604   0.9182
frame_reindex_both_axes_ix                  0.4550     0.4950   0.9192
stat_ops_level_frame_sum_multiple           8.1975     8.9055   0.9205
panel_from_dict_same_index                 25.7938    28.0147   0.9207
groupby_series_simple_cython                5.1310     5.5624   0.9224
frame_sort_index_by_columns                41.9577    45.1816   0.9286
groupby_multi_python                       54.9727    59.0400   0.9311
datetimeindex_add_offset                    0.2417     0.2584   0.9356
frame_boolean_row_select                    0.2905     0.3100   0.9373
frame_reindex_axis1                         2.9760     3.1742   0.9376
stat_ops_level_series_sum                   2.3382     2.4937   0.9376
groupby_multi_different_functions          14.0333    14.9571   0.9382
timeseries_timestamp_tzinfo_cons            0.0159     0.0169   0.9397
stats_rolling_mean                          1.6904     1.7959   0.9413
melt_dataframe                              1.5236     1.6181   0.9416
timeseries_asof_single                      0.0548     0.0582   0.9416
frame_ctor_nested_dict_int64              134.3100   142.6389   0.9416
join_dataframe_index_single_key_bigger     15.6578    16.5949   0.9435
stat_ops_level_frame_sum                    3.2475     3.4414   0.9437
indexing_dataframe_boolean_rows             0.2382     0.2518   0.9459
timeseries_asof_nan                        10.0433    10.6006   0.9474
frame_reindex_axis0                         1.4403     1.5184   0.9485
concat_series_axis1                        69.2988    72.8099   0.9518
join_dataframe_index_single_key_small       6.8492     7.1847   0.9533
dataframe_reindex_daterange                 0.4054     0.4240   0.9562
join_dataframe_index_single_key_bigger      6.4616     6.7578   0.9562
timeseries_timestamp_downsample_mean        4.5849     4.7787   0.9594
frame_fancy_lookup                          2.5498     2.6544   0.9606
series_value_counts_int64                   2.5569     2.6581   0.9619
frame_fancy_lookup_all                     30.7510    31.8465   0.9656
index_int64_union                          82.2279    85.1500   0.9657
indexing_dataframe_boolean_rows_object      0.4809     0.4977   0.9662
frame_ctor_nested_dict                     91.6129    94.8122   0.9663
stat_ops_series_std                         0.2450     0.2533   0.9673
groupby_frame_cython_many_columns           3.7642     3.8894   0.9678
timeseries_asof                            10.4352    10.7721   0.9687
series_ctor_from_dict                       3.7707     3.8749   0.9731
frame_drop_dup_inplace                      3.0007     3.0746   0.9760
timeseries_large_lookup_value               0.0242     0.0248   0.9764
read_table_multiple_date_baseline        1201.2930  1224.3881   0.9811
dti_reset_index                             0.6339     0.6457   0.9817
read_table_multiple_date                 2600.7280  2647.8729   0.9822
reindex_frame_level_reindex                 0.9524     0.9674   0.9845
reindex_multiindex                          1.3483     1.3685   0.9853
frame_insert_500_columns                  102.1249   103.4329   0.9874
frame_drop_duplicates                      19.3780    19.6157   0.9879
reindex_daterange_backfill                  0.1870     0.1889   0.9899
stats_rank2d_axis0_average                 25.0480    25.2801   0.9908
series_align_left_monotonic                13.1929    13.2558   0.9953
timeseries_add_irregular                   22.4635    22.5122   0.9978
read_store_mixed                           13.4398    13.4560   0.9988
lib_fast_zip                               11.1289    11.1354   0.9994
match_strings                               0.3831     0.3833   0.9995
read_store                                  5.5526     5.5290   1.0043
timeseries_sort_index                      22.7172    22.5976   1.0053
timeseries_1min_5min_mean                   0.6224     0.6175   1.0079
stats_rank2d_axis1_average                 14.6569    14.5339   1.0085
reindex_daterange_pad                       0.1886     0.1867   1.0102
timeseries_period_downsample_mean           6.4241     6.3480   1.0120
frame_drop_duplicates_na                   19.3303    19.0970   1.0122
stats_rank_average_int                     23.3569    22.9996   1.0155
lib_fast_zip_fillna                        14.1394    13.8473   1.0211
index_datetime_intersection                17.2626    16.8986   1.0215
timeseries_1min_5min_ohlc                   0.7054     0.6891   1.0237
stats_rank_average                         31.3440    30.3845   1.0316
timeseries_infer_freq                      10.9854    10.6439   1.0321
timeseries_slice_minutely                   0.0637     0.0611   1.0418
index_datetime_union                       17.9083    17.1640   1.0434
series_align_irregular_string              89.9470    85.1344   1.0565
series_constructor_ndarray                  0.0127     0.0119   1.0742
indexing_panel_subset                       0.5692     0.5214   1.0917
groupby_apply_dict_return                  46.3497    42.3220   1.0952
reshape_unstack_simple                      3.2901     2.9089   1.1310
timeseries_to_datetime_iso8601              4.2305     3.6015   1.1746
frame_to_string_floats                     53.6217    37.2041   1.4413
reshape_pivot_time_series                 170.4340   107.9068   1.5795
sparse_frame_constructor                    6.2714     3.5053   1.7891
datetimeindex_normalize                    37.2718     6.9329   5.3761
Columns: test_name | target_duration [ms] | baseline_duration [ms] | ratio
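As a quick check on how to read the table, the ratio column is the v0.10.0 (target) duration divided by the v0.9.0 (baseline) duration, so values below 1 are speedups and values above 1 are regressions. The sketch below recomputes the ratio for three rows copied from the table; the `ratio` and `verdict` helpers are hypothetical names used only for illustration, and because the published timings are already rounded, the recomputed ratios may differ from the table in the last digit.

```python
# (target_ms, baseline_ms) pairs copied from three rows of the table.
timings = {
    "unstack_sparse_keyspace": (1.2813, 144.1262),
    "read_csv_standard": (12.0651, 29.9940),
    "datetimeindex_normalize": (37.2718, 6.9329),
}


def ratio(target_ms, baseline_ms):
    """Target duration relative to baseline; below 1 means the target is faster."""
    return target_ms / baseline_ms


def verdict(r):
    """Describe a ratio from the point of view of v0.10.0."""
    if r < 1:
        return "faster"
    if r > 1:
        return "slower"
    return "unchanged"


for name, (target_ms, baseline_ms) in timings.items():
    r = ratio(target_ms, baseline_ms)
    print(f"{name}: ratio {r:.4f}, v0.10.0 is {verdict(r)}")
```

Taking the reciprocal of a ratio gives the speedup factor, so a ratio near 0.4 corresponds to roughly a 2.5x speedup.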
Hi Collin,
I didn't add it to the official docs because the authentication step complicates the doc build, but you can refer to this brief blog post I wrote:
http://quantabee.wordpress.com/2012/12/17/google-analytics-pandas/
Best,
Chang