Hi all,

We have a PR pending to unify the string representation of the different Index objects: https://github.com/pydata/pandas/pull/9901

What are the most important changes:

- We propose to reduce the default number of values shown from 100 to 10 (an option controllable as pd.options.display.max_seq_items).
- The datetime-like indices (DatetimeIndex, TimedeltaIndex, PeriodIndex) were always somewhat different and get a new repr that is now more consistent with how it is for other Index types like Int64Index. This is the biggest change.

So for e.g. Int64Index not much changes (only 'name' is now also shown, and the number of shown values has changed), but for DatetimeIndex the change is larger.

*But we would like to get some feedback on this!*

Do you like the changes? For DatetimeIndex? For the number of shown values? Would you want different behaviour for repr() and str()?

Some examples of the changes with the current state of the PR are shown below:

Previous Behavior

In [1]: pd.get_option('max_seq_items')
Out[1]: 100

In [2]: pd.Index(range(4), name='foo')
Out[2]: Int64Index([0, 1, 2, 3], dtype='int64')

In [3]: pd.Index(range(104), name='foo')
Out[3]: Int64Index([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, ...], dtype='int64')

In [4]: pd.date_range('20130101', periods=4, name='foo', tz='US/Eastern')
Out[4]:
<class 'pandas.tseries.index.DatetimeIndex'>
[2013-01-01 00:00:00-05:00, ..., 2013-01-04 00:00:00-05:00]
Length: 4, Freq: D, Timezone: US/Eastern

In [5]: pd.date_range('20130101', periods=104, name='foo', tz='US/Eastern')
Out[5]:
<class 'pandas.tseries.index.DatetimeIndex'>
[2013-01-01 00:00:00-05:00, ..., 2013-04-14 00:00:00-04:00]
Length: 104, Freq: D, Timezone: US/Eastern

New Behavior

In [1]: pd.get_option('max_seq_items')
Out[1]: 10

In [9]: pd.Index(range(4), name='foo')
Out[9]: Int64Index([0, 1, 2, 3], dtype='int64', name=u'foo')

In [10]: pd.Index(range(104), name='foo')
Out[10]: Int64Index([0, 1, ..., 102, 103], dtype='int64', name=u'foo', length=104)

In [11]: pd.date_range('20130101', periods=4, name='foo', tz='US/Eastern')
Out[11]: DatetimeIndex(['2013-01-01 00:00:00-05:00', '2013-01-02 00:00:00-05:00', '2013-01-03 00:00:00-05:00', '2013-01-04 00:00:00-05:00'], dtype='datetime64[ns]', name=u'foo', freq='D', tz='US/Eastern')

In [12]: pd.date_range('20130101', periods=104, name='foo', tz='US/Eastern')
Out[12]: DatetimeIndex(['2013-01-01 00:00:00-05:00', '2013-01-02 00:00:00-05:00', ..., '2013-04-13 00:00:00-04:00', '2013-04-14 00:00:00-04:00'], dtype='datetime64[ns]', name=u'foo', length=104, freq='D', tz='US/Eastern')
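
For anyone who wants to compare the two limits side by side, here is a minimal sketch rather than anything taken from the PR. It assumes a reasonably recent pandas and uses only the option named above (display.max_seq_items). The index is built from list(range(104)) because newer pandas versions turn a bare range into a RangeIndex, whose repr has no values to truncate; the exact layout of the printed repr differs between pandas versions.

```python
import pandas as pd

# 104 integer labels, mirroring the examples in this thread.
# list(...) avoids getting a RangeIndex on newer pandas versions.
idx = pd.Index(list(range(104)), name="foo")

# Temporarily apply the proposed limit of 10 shown items.
with pd.option_context("display.max_seq_items", 10):
    short = repr(idx)

# Raise the limit above len(idx) so that nothing is truncated.
with pd.option_context("display.max_seq_items", 200):
    full = repr(idx)

print(short)
print(full)
```

option_context restores the previous setting when the block exits, so trying this out does not change the session-wide default.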
I like the changes you propose; the new version is much more readable. I used to be wary of calling df.index because it can be slow and the output is a bit messy, and I'm usually too lazy to select just a slice of it, so having something like this done by default is a welcome change.

Just a question: does it also apply to MultiIndexes?

Cheers!

On Friday, April 17, 2015 at 12:07:44 PM UTC+2, Joris Van den Bossche wrote:
> Hi all,
> We have a PR pending to unify the string representation of the different Index objects: https://github.com/pydata/pandas/pull/9901
> [...]
This is probably not the sort of comment you're looking for, but I'd like to see more of a table-style output. I can just put a 'values' at the end to get the more numpy-like output (which is easier to read IMO), but it won't stop at 10 or 100 unless I tell it to. Nevertheless, I think it's much easier to read this:

pd.date_range('20130101', periods=104, name='foo', tz='US/Eastern').values
Out[442]:
array(['2013-01-01T00:00:00.000000000-0500',
       '2013-01-02T00:00:00.000000000-0500',
       '2013-01-03T00:00:00.000000000-0500',
       '2013-01-04T00:00:00.000000000-0500',
       '2013-01-05T00:00:00.000000000-0500',

than this:

pd.date_range('20130101', periods=104, name='foo', tz='US/Eastern')
Out[443]:
<class 'pandas.tseries.index.DatetimeIndex'>
[2013-01-01 00:00:00-05:00, ..., 2013-04-14 00:00:00-04:00]
Length: 104, Freq: D, Timezone: US/Eastern

On Friday, April 17, 2015 at 6:07:44 AM UTC-4, Joris Van den Bossche wrote:
> Hi all,
> We have a PR pending to unify the string representation of the different Index objects: https://github.com/pydata/pandas/pull/9901
> [...]
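
On the point that .values "won't stop at 10 or 100": numpy only summarizes an array once its size exceeds a print threshold (1000 by default), and that threshold can be lowered. The sketch below is an illustration, not something proposed in the thread. To stay self-contained it uses plain numpy datetime64 days instead of the tz-aware pandas values shown above, which is an assumption made for the example.

```python
import numpy as np

# 104 consecutive days starting 2013-01-01 (the end date is exclusive).
days = np.arange("2013-01-01", "2013-04-15", dtype="datetime64[D]")

# threshold: summarize once the array has more elements than this.
# edgeitems: how many values to keep at each end when summarizing.
text = np.array2string(days, threshold=10, edgeitems=2)
print(text)
```

The same two settings can be applied to an ordinary repr() with the np.printoptions(threshold=10, edgeitems=2) context manager.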
John, you are quoting the current impl (which is first), the new is like this:

In [11]: pd.date_range('20130101',periods=4,name='foo',tz='US/Eastern')
Out[11]: DatetimeIndex(['2013-01-01 00:00:00-05:00', '2013-01-02 00:00:00-05:00', '2013-01-03 00:00:00-05:00', '2013-01-04 00:00:00-05:00'], dtype='datetime64[ns]', name=u'foo', freq='D', tz='US/Eastern')

In [12]: pd.date_range('20130101',periods=104,name='foo',tz='US/Eastern')
Out[12]: DatetimeIndex(['2013-01-01 00:00:00-05:00', '2013-01-02 00:00:00-05:00', ..., '2013-04-13 00:00:00-04:00', '2013-04-14 00:00:00-04:00'], dtype='datetime64[ns]', name=u'foo', length=104, freq='D', tz='US/Eastern')

Lorenzo, to answer your question, MultiIndexes are unchanged (and CategoricalIndex is new). We *could* make them a single line, but that would be pretty crowded.

Note that the MultiIndex and CategoricalIndex reprs are multi-line and do not truncate sequences (of e.g. labels); this is consistent with previous versions (easy to change, though).

In [1]: MultiIndex.from_product([list('abcdefg'),range(10)],names=['first','second'])
Out[1]:
MultiIndex(levels=[[u'a', u'b', u'c', u'd', u'e', u'f', u'g'], [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]],
           labels=[[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6], [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9]],
           names=[u'first', u'second'])

In [4]: pd.CategoricalIndex(np.random.randint(0,5,size=100),name='foo')
Out[4]:
CategoricalIndex([3, 0, 0, 3, 1, 3, 0, 4, 2, 3, 0, 4, 0, 1, 2, 0, 4, 1, 4, 2,
                  3, 1, 0, 4, 4, 3, 0, 3, 0, 1, 2, 3, 3, 1, 1, 0, 0, 4, 4, 1,
                  1, 3, 1, 1, 4, 4, 3, 0, 0, 0, 4, 4, 0, 1, 3, 1, 2, 0, 3, 1,
                  2, 2, 2, 1, 1, 4, 1, 0, 4, 3, 3, 0, 0, 0, 4, 4, 1, 4, 2, 2,
                  1, 4, 0, 0, 0, 4, 3, 0, 4, 0, 0, 0, 3, 3, 1, 2, 2, 3, 4, 1],
                 categories=[0, 1, 2, 3, 4], ordered=False, name=u'foo', dtype='category')

On Monday, April 20, 2015 at 8:37:01 PM UTC-4, John E wrote:
> This is probably not the sort of comment you're looking for, but I'd like to see more of a table-style output.
> [...]
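
To make the "easy to change" remark about untruncated label sequences concrete, here is a rough sketch of head/tail truncation in the style of the proposed Int64Index repr. It is plain Python and not the pandas implementation: the helper summarize and its edge parameter are hypothetical names invented for this illustration, and the thread does not say how the number of shown head and tail items is actually chosen.

```python
def summarize(values, max_items=10, edge=2):
    """Hypothetical helper: show all values, or only `edge` items at each end.

    max_items plays the role of display.max_seq_items; edge is an
    assumption made for this sketch.
    """
    if edge < 1:
        raise ValueError("edge must be at least 1")
    values = list(values)
    if len(values) <= max_items:
        # Short sequences are shown in full.
        return "[" + ", ".join(repr(v) for v in values) + "]"
    head = [repr(v) for v in values[:edge]]
    tail = [repr(v) for v in values[-edge:]]
    # Long sequences keep both ends and report the total length.
    return "[" + ", ".join(head + ["..."] + tail) + "], length=%d" % len(values)


print(summarize(range(4)))    # [0, 1, 2, 3]
print(summarize(range(104)))  # [0, 1, ..., 102, 103], length=104
```

Applied to the 70 MultiIndex labels above, a helper like this would show only the first and last few codes together with the length.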
I like the suggestion of John to have something more like the output of numpy arrays.

For example, the proposed repr:

In [12]: pd.date_range('20130101',periods=104,name='foo',tz='US/Eastern')
Out[12]: DatetimeIndex(['2013-01-01 00:00:00-05:00', '2013-01-02 00:00:00-05:00', ..., '2013-04-13 00:00:00-04:00', '2013-04-14 00:00:00-04:00'], dtype='datetime64[ns]', name=u'foo', length=104, freq='D', tz='US/Eastern')

would then be something like this:

In [12]: pd.date_range('20130101',periods=104,name='foo',tz='US/Eastern')
Out[12]:
DatetimeIndex(['2013-01-01 00:00:00-05:00', '2013-01-02 00:00:00-05:00', ...,
               '2013-04-13 00:00:00-04:00', '2013-04-14 00:00:00-04:00'],
              dtype='datetime64[ns]', name=u'foo', length=104, freq='D', tz='US/Eastern')

2015-04-21 2:53 GMT+02:00 Jeff <jeffreback@gmail.com>:
> John, you are quoting the current impl (which is first), the new is like this:
> [...]
Follow-up on this discussion: as you may have seen, the changes were released in 0.16.1 (see the whatsnew docs: http://pandas.pydata.org/pandas-docs/stable/whatsnew.html#index-representati... ).

In the end, we used the suggestion of John to go for a bit more numpy-style output. There will probably still be some quirks or things to improve; you can report them at this follow-up issue: https://github.com/pydata/pandas/issues/10095

Joris

2015-04-21 2:59 GMT+02:00 Joris Van den Bossche <jorisvandenbossche@gmail.com>:
> I like the suggestion of John to have something more like the output of numpy arrays.
> [...]
participants (4)
- Jeff
- John E
- Joris Van den Bossche
- Lorenzo De Leo