[Matplotlib-users] Reshaping Tabular Data to Column Data
Scott Lasley
selasley at icloud.com
Tue Mar 24 22:56:32 EDT 2020
Here's a version without the unnecessary string manipulations
# read data from noaa, set the index to the first column
df = pd.read_csv('https://www.esrl.noaa.gov/psd/data/correlation/pna.data',
skiprows=1,
skipfooter=3,
index_col=[0],
header=None,
engine='python', # c engine doesn't have skipfooter
delim_whitespace=True)
# unstack the dataframe to get a Series with a multiindex
# consisting of the column name and date
df = df.unstack()
# create a new index using the existing multiindex levels
ndx = [y * 100 + c
for (c, y) in df.index.to_flat_index()]
df.index = ndx
df = df.sort_index()
> On Mar 24, 2020, at 9:28 PM, Scott Lasley via Matplotlib-users <matplotlib-users at python.org> wrote:
>
> Does this do what you want?
>
> # read data from noaa, set index to first column and column names to '00' - '12'
> df = pd.read_csv('https://www.esrl.noaa.gov/psd/data/correlation/pna.data',
> skiprows=1,
> skipfooter=3,
> index_col=[0],
> header=None,
> names=['{:02d}'.format(i) for i in range(13)],
> delim_whitespace=True)
> # unstack the dataframe to get a Series with and a multiindex
> # consisting of the date and column name
> df = df.unstack()
> # sort index by [date, column name] to "group" the dates
> df = df.sort_index(level=[1,0])
> # create a new index using the existing multiindex levels
> ndx = [int(''.join([y,c]))
> for y in df.index.levels[1].astype(str)
> for c in df.index.levels[0]]
> df.index = ndx
> df.head(28)
>
> 194801 -99.90
> 194802 -99.90
> 194803 -99.90
> ...
> 194911 -99.90
> 194912 -99.90
> 195001 -3.65
> 195002 -1.69
> 195003 -0.06
> 195004 -0.23
>
> df.tail(12)
>
> 202001 -0.95
> 202002 -0.07
> 202003 -99.90
> 202004 -99.90
> 202005 -99.90
> 202006 -99.90
> 202007 -99.90
> 202008 -99.90
> 202009 -99.90
> 202010 -99.90
> 202011 -99.90
> 202012 -99.90
>
> Note that the 202001 and 202002 values do not agree with the values in your email message but do agree with the values on the website.
>
> Hth,
> Scott
>
>
>> On Mar 24, 2020, at 6:12 PM, Benson, Randall <Randall.Benson at avangrid.com> wrote:
>>
>> Hi –
>> I seem to be stuck here trying to convert downloaded data that is in tabular form since I need to analyze it in single column format. The data is from here and I’m using read_csv as shown below –
>>
>> pnadata1= pd.read_csv('https://www.esrl.noaa.gov/psd/data/correlation/pna.data')
>>
>> The downloaded data looks like this with the values in months 1-12 across the columns below for each year 1948-present –
>>
>> 1948 2020
>> 0 1948 -99.90 -99.90 -99.90 -99.90 -99.90 -99.9...
>> 1 1949 -99.90 -99.90 -99.90 -99.90 -99.90 -99.9...
>> 2 1950 -3.65 -1.69 -0.06 -0.23 -0.40 -1.9...
>> 3 1951 -1.18 -2.11 -1.09 0.47 1.19 -0.5...
>> 4 1952 -3.19 0.08 -0.69 0.89 -0.79 0.5...
>>
>> And, I need it to be reshaped to look like this in yyyymm values format so I can compare it easily to other downloaded data in the column format –
>>
>> 194801 -99.90
>> 194802 -99.90
>> 194803 -99.90
>> ...
>> 202001 -3.45
>> 202002 0.87
>>
>> I’ve tried working with “melt” and “group” in different combinations but cannot figure it out given the online references. Thank you,
>> Randall
>>
>>
>> <image001.jpg>
>> Randall P. Benson, PhD
>> Wind Asset Meteorologist, Energy Resource - Onshore
>>
>> 1125 NW Couch Street, Suite 700
>> Portland, Oregon, USA 97209
>> Telephone 503-796-7129
>> Cell 971-227-2477
>> randall.benson at avangrid.com
>>
>> <image002.png>
>>
>> In the interest of the environment,
>> please print only if necessary and recycle.
>>
>>
>>
>> ==============================================================
>>
>> Please consider the environment before printing this email.
>>
>> If you have received this message in error, please notify the sender and immediately delete this message and any attachment hereto and/or copy hereof, as such message contains confidential information intended solely for the individual or entity to whom it is addressed. The use or disclosure of such information to third parties is prohibited by law and may give rise to civil or criminal liability.
>>
>> The views presented in this message are solely those of the author(s) and do not necessarily represent the opinion of Avangrid Renewables, LLC. or any company of its group. Neither Avangrid Renewables, LLC. nor any company of its group guarantees the integrity, security or proper receipt of this message. Likewise, neither Avangrid Renewables, LLC. nor any company of its group accepts any liability whatsoever for any possible damages arising from, or in connection with, data interception, software viruses or manipulation by third parties.
>>
>> ==============================================================
>>
>> _______________________________________________
>> Matplotlib-users mailing list
>> Matplotlib-users at python.org
>> https://mail.python.org/mailman/listinfo/matplotlib-users
>
> _______________________________________________
> Matplotlib-users mailing list
> Matplotlib-users at python.org
> https://mail.python.org/mailman/listinfo/matplotlib-users
More information about the Matplotlib-users
mailing list