[Numpy-discussion] [pystatsmodels] Re: ANN: pandas 0.10.0 released
Chang She
chang at lambdafoundry.com
Fri Dec 21 16:13:32 EST 2012
On Dec 21, 2012, at 3:27 PM, Collin Sellman <collin.sellman at gmail.com> wrote:
> Thanks, Wes and team. I've been looking through the new features, but haven't found any documentation on the integration with the Google Analytics API. I was just in the midst of trying to pull data into pandas from GA in v0.9.0, so I would love to try what you built in 0.10.
>
> -Collin
>
> On Monday, December 17, 2012 10:19:49 AM UTC-7, Wes McKinney wrote:
> hi all,
>
> I'm super excited to announce the pandas 0.10.0 release. This is
> a major release including a new high performance file reading
> engine with tons of new user-facing functionality as well, a
> bunch of work on the HDF5/PyTables integration layer,
> much-expanded Unicode support, a new option/configuration
> interface, integration with the Google Analytics API, and a wide
> array of other new features, bug fixes, and performance
> improvements. I strongly recommend that all users upgrade as
> soon as feasible. Many of the performance improvements over 0.9.x
> are quite substantial; see the vbenchmarks at the end of the e-mail.
>
> As of this release, we are no longer supporting Python 2.5. Also,
> this is the first release to officially support Python 3.3.
>
> Note: there are a number of minor but necessary API changes that
> long-time pandas users should pay attention to in the What's New.
>
> Thanks to all who contributed to this release, especially Chang
> She, Yoval P, and Jeff Reback (and everyone else listed in the
> commit log!).
>
> As always, source archives and Windows installers are on PyPI.
>
> What's new: http://pandas.pydata.org/pandas-docs/stable/whatsnew.html
> Installers: http://pypi.python.org/pypi/pandas
>
> $ git log v0.9.1..v0.10.0 --pretty=format:%aN | sort | uniq -c | sort -rn
> 246 Wes McKinney
> 140 y-p
> 99 Chang She
> 45 jreback
> 18 Abraham Flaxman
> 17 Jeff Reback
> 14 locojaydev
> 11 Keith Hughitt
> 5 Adam Obeng
> 2 Dieter Vandenbussche
> 1 zach powers
> 1 Luke Lee
> 1 Laurent Gautier
> 1 Ken Van Haren
> 1 Jay Bourque
> 1 Donald Curtis
> 1 Chris Mulligan
> 1 alex arsenovic
> 1 A. Flaxman
>
> Happy data hacking!
>
> - Wes
>
> What is it
> ==========
> pandas is a Python package providing fast, flexible, and
> expressive data structures designed to make working with
> relational, time series, or any other kind of labeled data both
> easy and intuitive. It aims to be the fundamental high-level
> building block for doing practical, real world data analysis in
> Python.
>
> Links
> =====
> Release Notes: http://github.com/pydata/pandas/blob/master/RELEASE.rst
> Documentation: http://pandas.pydata.org
> Installers: http://pypi.python.org/pypi/pandas
> Code Repository: http://github.com/pydata/pandas
> Mailing List: http://groups.google.com/group/pydata
>
> Performance vs. v0.9.0
> ======================
>
> Benchmarks from https://github.com/pydata/pandas/tree/master/vb_suite
> Ratio < 1 means that v0.10.0 is faster
>
> name v0.10.0 [ms] v0.9.0 [ms] ratio
> unstack_sparse_keyspace 1.2813 144.1262 0.0089
> groupby_frame_apply_overhead 20.1520 337.3330 0.0597
> read_csv_comment2 25.3097 363.2860 0.0697
> groupbym_frame_apply 75.1554 504.1661 0.1491
> frame_iteritems_cached 0.0711 0.3919 0.1815
> read_csv_thou_vb 35.2690 191.9360 0.1838
> concat_small_frames 12.9019 55.3561 0.2331
> join_dataframe_integer_2key 5.8184 21.5823 0.2696
> series_value_counts_strings 5.3824 19.1262 0.2814
> append_frame_single_homogenous 0.3413 0.9319 0.3662
> read_csv_vb 18.4084 46.9500 0.3921
> read_csv_standard 12.0651 29.9940 0.4023
> panel_from_dict_all_different_indexes 73.6860 158.2949 0.4655
> frame_constructor_ndarray 0.0471 0.0958 0.4918
> groupby_first 3.8502 7.1988 0.5348
> groupby_last 3.6962 6.7792 0.5452
> panel_from_dict_two_different_indexes 50.7428 86.4980 0.5866
> append_frame_single_mixed 1.2950 2.1930 0.5905
> frame_get_numeric_data 0.0695 0.1119 0.6212
> replace_fillna 4.6349 7.0540 0.6571
> frame_to_csv 281.9340 427.7921 0.6590
> replace_replacena 4.7154 7.1207 0.6622
> frame_iteritems 2.5862 3.7463 0.6903
> series_align_int64_index 29.7370 41.2791 0.7204
> join_dataframe_integer_key 1.7980 2.4303 0.7398
> groupby_multi_size 31.0066 41.7001 0.7436
> groupby_frame_singlekey_integer 2.3579 3.1649 0.7450
> write_csv_standard 326.8259 427.3241 0.7648
> groupby_simple_compress_timing 41.2113 52.3993 0.7865
> frame_fillna_inplace 16.2843 20.0491 0.8122
> reindex_fillna_backfill 0.1364 0.1667 0.8181
> groupby_multi_series_op 15.2914 18.6651 0.8193
> groupby_multi_cython 17.2169 20.4420 0.8422
> frame_fillna_many_columns_pad 14.9510 17.5114 0.8538
> panel_from_dict_equiv_indexes 25.8427 29.9682 0.8623
> merge_2intkey_nosort 19.0755 22.1138 0.8626
> sparse_series_to_frame 167.8529 192.9920 0.8697
> reindex_fillna_pad 0.1410 0.1617 0.8720
> merge_2intkey_sort 44.7863 51.3315 0.8725
> reshape_stack_simple 2.6698 3.0502 0.8753
> groupby_indices 7.2264 8.2314 0.8779
> sort_level_one 4.3845 4.9902 0.8786
> sort_level_zero 4.3362 4.9198 0.8814
> write_store 16.0587 18.2042 0.8821
> frame_reindex_both_axes 0.3726 0.4183 0.8907
> groupby_multi_different_numpy_functions 13.4164 15.0509 0.8914
> index_int64_intersection 25.3705 28.1867 0.9001
> groupby_frame_median 7.7491 8.6011 0.9009
> frame_drop_dup_na_inplace 2.6290 2.9155 0.9017
> dataframe_reindex_columns 0.3052 0.3372 0.9049
> join_dataframe_index_multi 20.5651 22.6893 0.9064
> frame_ctor_list_of_dict 101.7439 112.2260 0.9066
> groupby_pivot_table 18.4551 20.3184 0.9083
> reindex_frame_level_align 0.9644 1.0531 0.9158
> stat_ops_level_series_sum_multiple 7.3637 8.0230 0.9178
> write_store_mixed 38.2528 41.6604 0.9182
> frame_reindex_both_axes_ix 0.4550 0.4950 0.9192
> stat_ops_level_frame_sum_multiple 8.1975 8.9055 0.9205
> panel_from_dict_same_index 25.7938 28.0147 0.9207
> groupby_series_simple_cython 5.1310 5.5624 0.9224
> frame_sort_index_by_columns 41.9577 45.1816 0.9286
> groupby_multi_python 54.9727 59.0400 0.9311
> datetimeindex_add_offset 0.2417 0.2584 0.9356
> frame_boolean_row_select 0.2905 0.3100 0.9373
> frame_reindex_axis1 2.9760 3.1742 0.9376
> stat_ops_level_series_sum 2.3382 2.4937 0.9376
> groupby_multi_different_functions 14.0333 14.9571 0.9382
> timeseries_timestamp_tzinfo_cons 0.0159 0.0169 0.9397
> stats_rolling_mean 1.6904 1.7959 0.9413
> melt_dataframe 1.5236 1.6181 0.9416
> timeseries_asof_single 0.0548 0.0582 0.9416
> frame_ctor_nested_dict_int64 134.3100 142.6389 0.9416
> join_dataframe_index_single_key_bigger 15.6578 16.5949 0.9435
> stat_ops_level_frame_sum 3.2475 3.4414 0.9437
> indexing_dataframe_boolean_rows 0.2382 0.2518 0.9459
> timeseries_asof_nan 10.0433 10.6006 0.9474
> frame_reindex_axis0 1.4403 1.5184 0.9485
> concat_series_axis1 69.2988 72.8099 0.9518
> join_dataframe_index_single_key_small 6.8492 7.1847 0.9533
> dataframe_reindex_daterange 0.4054 0.4240 0.9562
> join_dataframe_index_single_key_bigger 6.4616 6.7578 0.9562
> timeseries_timestamp_downsample_mean 4.5849 4.7787 0.9594
> frame_fancy_lookup 2.5498 2.6544 0.9606
> series_value_counts_int64 2.5569 2.6581 0.9619
> frame_fancy_lookup_all 30.7510 31.8465 0.9656
> index_int64_union 82.2279 85.1500 0.9657
> indexing_dataframe_boolean_rows_object 0.4809 0.4977 0.9662
> frame_ctor_nested_dict 91.6129 94.8122 0.9663
> stat_ops_series_std 0.2450 0.2533 0.9673
> groupby_frame_cython_many_columns 3.7642 3.8894 0.9678
> timeseries_asof 10.4352 10.7721 0.9687
> series_ctor_from_dict 3.7707 3.8749 0.9731
> frame_drop_dup_inplace 3.0007 3.0746 0.9760
> timeseries_large_lookup_value 0.0242 0.0248 0.9764
> read_table_multiple_date_baseline 1201.2930 1224.3881 0.9811
> dti_reset_index 0.6339 0.6457 0.9817
> read_table_multiple_date 2600.7280 2647.8729 0.9822
> reindex_frame_level_reindex 0.9524 0.9674 0.9845
> reindex_multiindex 1.3483 1.3685 0.9853
> frame_insert_500_columns 102.1249 103.4329 0.9874
> frame_drop_duplicates 19.3780 19.6157 0.9879
> reindex_daterange_backfill 0.1870 0.1889 0.9899
> stats_rank2d_axis0_average 25.0480 25.2801 0.9908
> series_align_left_monotonic 13.1929 13.2558 0.9953
> timeseries_add_irregular 22.4635 22.5122 0.9978
> read_store_mixed 13.4398 13.4560 0.9988
> lib_fast_zip 11.1289 11.1354 0.9994
> match_strings 0.3831 0.3833 0.9995
> read_store 5.5526 5.5290 1.0043
> timeseries_sort_index 22.7172 22.5976 1.0053
> timeseries_1min_5min_mean 0.6224 0.6175 1.0079
> stats_rank2d_axis1_average 14.6569 14.5339 1.0085
> reindex_daterange_pad 0.1886 0.1867 1.0102
> timeseries_period_downsample_mean 6.4241 6.3480 1.0120
> frame_drop_duplicates_na 19.3303 19.0970 1.0122
> stats_rank_average_int 23.3569 22.9996 1.0155
> lib_fast_zip_fillna 14.1394 13.8473 1.0211
> index_datetime_intersection 17.2626 16.8986 1.0215
> timeseries_1min_5min_ohlc 0.7054 0.6891 1.0237
> stats_rank_average 31.3440 30.3845 1.0316
> timeseries_infer_freq 10.9854 10.6439 1.0321
> timeseries_slice_minutely 0.0637 0.0611 1.0418
> index_datetime_union 17.9083 17.1640 1.0434
> series_align_irregular_string 89.9470 85.1344 1.0565
> series_constructor_ndarray 0.0127 0.0119 1.0742
> indexing_panel_subset 0.5692 0.5214 1.0917
> groupby_apply_dict_return 46.3497 42.3220 1.0952
> reshape_unstack_simple 3.2901 2.9089 1.1310
> timeseries_to_datetime_iso8601 4.2305 3.6015 1.1746
> frame_to_string_floats 53.6217 37.2041 1.4413
> reshape_pivot_time_series 170.4340 107.9068 1.5795
> sparse_frame_constructor 6.2714 3.5053 1.7891
> datetimeindex_normalize 37.2718 6.9329 5.3761
>
> Columns: test_name | target_duration [ms] | baseline_duration [ms] | ratio
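
A note on reading the table above: the ratio column appears to be the v0.10.0 (target) duration divided by the v0.9.0 (baseline) duration, which matches the figures listed. The short Python sketch below recomputes it for two rows copied from the table. The helper name speed_ratio is hypothetical and is not part of pandas or vb_suite.

```python
def speed_ratio(target_ms, baseline_ms):
    """Return target / baseline; a value below 1 means the target is faster."""
    return target_ms / baseline_ms


# Two rows copied from the benchmark table: (v0.10.0 ms, v0.9.0 ms).
rows = {
    "unstack_sparse_keyspace": (1.2813, 144.1262),
    "datetimeindex_normalize": (37.2718, 6.9329),
}

for name, (target, baseline) in rows.items():
    ratio = speed_ratio(target, baseline)
    verdict = "faster" if ratio < 1 else "slower"
    print(f"{name}: ratio={ratio:.4f}, v0.10.0 is {verdict}")
```

Under that reading, the first row is the largest speedup in the table and the second is the largest regression.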
Hi Collin,
I didn't add it to the official docs because the authentication step complicates the doc build, but you can refer to this brief blog post I wrote:
http://quantabee.wordpress.com/2012/12/17/google-analytics-pandas/
Best,
Chang