Problem in defining multidimensional array matrix and regression
Thomas Jollans
tjol at tjol.eu
Sun Nov 19 17:01:20 EST 2017
On 19/11/17 18:55, shalu.ashu50 at gmail.com wrote:
> Hello Peter,
>
> Many thanks for your suggestion.
> Now I am using Pandas, and I have already done that, but now I need to make a multi-dimensional array that holds all the variables (5 in this case) on one x-axis, so that I can perform a multiple regression analysis.
>
> I do not understand how to bring all the variables onto one axis (e.g. the x-axis).
Pandas is great at this: index a single row of a DataFrame with your
favourite selector from
http://pandas.pydata.org/pandas-docs/stable/indexing.html (or just loop
over the rows with the DataFrame's .iterrows() method).
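A minimal sketch of that, assuming the data is whitespace-delimited with the header row shown in your post. The data is inlined here only so the example runs on its own, and the variable names (df, X, y, corr) are just for illustration; with a real file you would pass its path to read_csv instead.

```python
import io

import pandas as pd

# Data from the original post, inlined so the sketch is self-contained.
SAMPLE = """\
RF P1 P2 P3 P4 P5
120.235 0.234 -0.012 0.145 21.023 0.233
200.14 0.512 -0.021 0.214 22.21 0.332
185.362 0.147 -0.32 0.136 24.65 0.423
201.895 0.002 -0.12 0.217 30.25 0.325
165.235 0.256 0.001 0.22 31.245 0.552
198.236 0.012 -0.362 0.215 32.25 0.333
350.263 0.98 -0.85 0.321 38.412 0.411
145.25 0.046 -0.36 0.147 39.256 0.872
198.654 0.65 -0.45 0.224 40.235 0.652
245.214 0.47 -0.325 0.311 26.356 0.632
214.02 0.18 -0.012 0.242 22.01 0.745
147.256 0.652 -0.785 0.311 18.256 0.924
"""

# sep=r"\s+" splits on runs of whitespace; the first line becomes the header.
df = pd.read_csv(io.StringIO(SAMPLE), sep=r"\s+")

y = df["RF"]               # dependent variable (rainfall)
X = df.drop(columns="RF")  # the five predictors, one column each

first_row = df.iloc[0]     # a single row, selected by position
corr = df.corr()           # pairwise correlations between all six columns

# Or visit the rows one at a time.
for index, row in df.iterrows():
    print(index, row["RF"], row["P1"])
```

Here X is already the "all predictors on one axis" table: each row is one observation and each column one predictor.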
If you want a multi-dimensional array with all the data, numpy.loadtxt
can do that for you (pass skiprows=1 so it skips the header line).
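A rough sketch of the numpy route, again with the data from your post inlined so that it runs as-is. Note that for the fit I'm using numpy.linalg.lstsq (ordinary least squares with an added intercept column) rather than statsmodels, purely to keep the example to numpy; the variable names are my own.

```python
import io

import numpy as np

# Data from the original post, inlined so the sketch is self-contained.
SAMPLE = """\
RF P1 P2 P3 P4 P5
120.235 0.234 -0.012 0.145 21.023 0.233
200.14 0.512 -0.021 0.214 22.21 0.332
185.362 0.147 -0.32 0.136 24.65 0.423
201.895 0.002 -0.12 0.217 30.25 0.325
165.235 0.256 0.001 0.22 31.245 0.552
198.236 0.012 -0.362 0.215 32.25 0.333
350.263 0.98 -0.85 0.321 38.412 0.411
145.25 0.046 -0.36 0.147 39.256 0.872
198.654 0.65 -0.45 0.224 40.235 0.652
245.214 0.47 -0.325 0.311 26.356 0.632
214.02 0.18 -0.012 0.242 22.01 0.745
147.256 0.652 -0.785 0.311 18.256 0.924
"""

# skiprows=1 skips the header, which loadtxt cannot convert to floats.
data = np.loadtxt(io.StringIO(SAMPLE), skiprows=1)  # 2-D array, shape (12, 6)

y = data[:, 0]   # first column: rainfall
X = data[:, 1:]  # remaining columns: the five predictors, shape (12, 5)

# Least-squares fit of y = b0 + b1*P1 + ... + b5*P5.
A = np.column_stack([np.ones(len(y)), X])  # prepend an intercept column
coef, residuals, rank, sv = np.linalg.lstsq(A, y, rcond=None)

# Correlation matrix; rowvar=False treats each column as one variable.
corr = np.corrcoef(data, rowvar=False)

print(coef)
print(corr[0])  # correlation of rainfall with every column
```

With only 12 observations and 5 predictors the fitted coefficients will not be very stable, so treat the numbers as a demonstration of the mechanics only.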
>
> Thanks
> Vishal
>
> On Sunday, 19 November 2017 22:32:06 UTC+5:30, Peter Otten wrote:
>> shalu.ashu50 at gmail.com wrote:
>>
>>> Hi, All,
>>>
>>> I have 6 variables in CSV file. One is rainfall (dependent, at y-axis) and
>>> others are predictors (at x). I want to do multiple regression and create
>>> a correlation matrix between rainfall (y) and predictors (x; n1=5). Thus I
>>> want to read rainfall as a separate variable and others in separate
>>> columns, so I can apply the algo. However, I am not able to make a proper
>>> matrix for them.
>>>
>>> Here are my data and code.
>>> Please advise.
>>> I am new to Python.
>>>
>>> RF P1 P2 P3 P4 P5
>>> 120.235 0.234 -0.012 0.145 21.023 0.233
>>> 200.14 0.512 -0.021 0.214 22.21 0.332
>>> 185.362 0.147 -0.32 0.136 24.65 0.423
>>> 201.895 0.002 -0.12 0.217 30.25 0.325
>>> 165.235 0.256 0.001 0.22 31.245 0.552
>>> 198.236 0.012 -0.362 0.215 32.25 0.333
>>> 350.263 0.98 -0.85 0.321 38.412 0.411
>>> 145.25 0.046 -0.36 0.147 39.256 0.872
>>> 198.654 0.65 -0.45 0.224 40.235 0.652
>>> 245.214 0.47 -0.325 0.311 26.356 0.632
>>> 214.02 0.18 -0.012 0.242 22.01 0.745
>>> 147.256 0.652 -0.785 0.311 18.256 0.924
>>>
>>> import numpy as np
>>> import statsmodels as sm
>>> import statsmodels.formula as smf
>>> import csv
>>>
>>> with open("pcp1.csv", "r") as csvfile:
>>>     readCSV=csv.reader(csvfile)
>>>
>>>     rainfall = []
>>>     csvFileList = []
>>>
>>>     for row in readCSV:
>>>         Rain = row[0]
>>>         rainfall.append(Rain)
>>>
>>>         if len (row) !=0:
>>>             csvFileList = csvFileList + [row]
>>>
>>> print(csvFileList)
>>> print(rainfall)
>>
>> You are not the first to read tabular data from a file; therefore numpy (and
>> pandas) offer high-level functions to do just that. Once you have the complete
>> table, extracting a specific column is easy. For instance:
>>
>> $ cat rainfall.txt
>> RF P1 P2 P3 P4 P5
>> 120.235 0.234 -0.012 0.145 21.023 0.233
>> 200.14 0.512 -0.021 0.214 22.21 0.332
>> 185.362 0.147 -0.32 0.136 24.65 0.423
>> 201.895 0.002 -0.12 0.217 30.25 0.325
>> 165.235 0.256 0.001 0.22 31.245 0.552
>> 198.236 0.012 -0.362 0.215 32.25 0.333
>> 350.263 0.98 -0.85 0.321 38.412 0.411
>> 145.25 0.046 -0.36 0.147 39.256 0.872
>> 198.654 0.65 -0.45 0.224 40.235 0.652
>> 245.214 0.47 -0.325 0.311 26.356 0.632
>> 214.02 0.18 -0.012 0.242 22.01 0.745
>> 147.256 0.652 -0.785 0.311 18.256 0.924
>> $ python3
>> Python 3.4.3 (default, Nov 17 2016, 01:08:31)
>> [GCC 4.8.4] on linux
>> Type "help", "copyright", "credits" or "license" for more information.
>> >>> import numpy
>> >>> rf = numpy.genfromtxt("rainfall.txt", names=True)
>> >>> rf["RF"]
>> array([ 120.235, 200.14 , 185.362, 201.895, 165.235, 198.236,
>>         350.263, 145.25 , 198.654, 245.214, 214.02 , 147.256])
>> >>> rf["P3"]
>> array([ 0.145, 0.214, 0.136, 0.217, 0.22 , 0.215, 0.321, 0.147,
>>         0.224, 0.311, 0.242, 0.311])
>