[Tutor] Reading CSV files in Pandas

Manish Tripathi tr.manish at gmail.com
Sat Oct 19 16:29:13 CEST 2013


I am trying to import a csv file in Pandas but it throws an error. The
format of the data when opened in notepad++ is as follows with first row
being column names:

"End Customer Organization ID,End Customer Organization Name,End
Customer Top Parent Organization ID,End Customer Top Parent
Organization Name,Reseller Top Parent ID,Reseller Top Parent
Name,Business,Rev Sum Division,Rev Sum Category,Product
Family,Version,Pricing Level,Summary Pricing Level,Detail Pricing
Level,MS Sales Amount,MS Sales Licenses,Fiscal Year,Sales
Date""11027676,Baroda Western Uttar Pradesh Gramin
Bankgfhgfnjgfnmjmhgmghmghmghmnghnmghnmhgnmghnghngh,4078446,Bank Of
Barodadfhhgfjyjtkyukujkyujkuhykluiluilui;iooi';po'fserwefvegwegf,1809012,""Hcl
Infosystems Ltd - Partnerdghftrutyhb
frhywer5y5tyu6ui7iukluyj,lgjmfgnhfrgweffw"",Server &
CALsdgrgrfgtrhytrnhjdgthjtyjkukmhjmghmbhmgfngdfbndfhtgh,SQL Server &
CALdfhtrhtrgbhrghrye5y45y45yu56juhydsgfaefwe,SQL
CALdhdfthtrutrjurhjethfdehrerfgwerweqeadfawrqwerwegtrhyjuytjhyj,SQL
CALdtrye45y3t434tjkabcjkasdhfhasdjkcbaksmjcbfuigkjasbcjkasbkdfhiwh,2005,Openfkvgjesropiguwe90fujklascnioawfy98eyfuiasdbcvjkxsbhg,Open
Lklbjdfoigueroigbjvwioergyuiowerhgosdhvgfoisdhyguiserhguisrh,""Open
Stddfm,vdnoghioerivnsdflierohgushdfovhsiodghuiohdbvgsjdhgouiwerho"",125.85,1,FY07,12/28/2006""12835756,Uttam
Strips Pvt Ltd,12835756,Uttam Strips Pvt Ltd,12565538,Redington C/O
Fortis Financial Services Ltd,MBS,Dynamics ERP,Dynamics NAV,Dynamics
NAV Business Essentials,Non-specific,Other,MBS SA,MBS New Customer
Enhanc. Def,0,0,FY09,9/15/2008""12233135,Bhagwan Singh
Tondon,12233135,Bhagwan Singh Tondon,2652941,H B S Systems Pvt
Ltd,Server & CAL,SQL Server & CAL,SQL CAL,SQL
CAL,Non-specific,Open,Open L&SA,Deferred Open L&SA -
New,0,0,FY09,9/15/2008""11602305,Maya Academy Of Advanced
Cinematics,9750934,Maya Entertainment Ltd,336146,Embee Software Pvt
Ltd,Server & CAL,Windows Server & CAL,Windows Server HPC,Windows
Compute Cluster Server,Non-specific,Open,Open V/MYO - Rec,OLV Perpet
L&SA Recur-Def,0,0,FY09,9/25/2008""13336009,Remiel Softech Solution
Pvt Ltd,13336009,Remiel Softech Solution Pvt Ltd,13335482,Redington
C/O Remiel Softech Solutions Pvt Ltd,MBS,Dynamics ERP,Dynamics
NAV,Dynamics NAV Business Essentials,Non-specific,Other,MBS SA,MBS New
Customer Enhanc. Def,0,0,FY09,12/23/2008""7872800,Science Application
International Corporation,2839760,GOVERNMENT OF
KARNATAKA,10237455,Cubic Computing P.L,Server & CAL,SQL Server &
CAL,SQL Server Standard,SQL Server Standard
Edition,Non-specific,Open,Open SA/UA,Deferred Open SA -
Renewal,0,0,FY09,1/15/2009""13096361,Pratham Software Pvt
Ltd,13096361,Pratham Software Pvt Ltd,10133086,Krap
Computer,Information Worker,Office,Office Standard / Basic,Office
Standard,2007,Open,Open L,Open
Std,7132.44,28,FY09,9/24/2008""12192276,Texmo Precision
Castings,12192276,Texmo Precision Castings,4059430,Quadra Systems. -
Partner,Server & CAL,Windows Server & CAL,Windows Standard
Server,Windows Server Standard,Non-specific,Open,Open L&SA,Deferred
Open L&SA - New,0,0,FY09,11/15/2008"

*Kindly note that the same file when double clicked in the csv format opens
in excel with comma separated values BUT with NO quotation marks in each
line as shown in notepad++.*

I have used encoding as UTF-8 which gives the following error:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0x91 in position
13: invalid start byte

Then used encoding='cp1252' first and then tried with latin1.

df=pd.read_csv(filename,encoding='cp1252')
or

df=pd.read_csv(filename,encoding='latin1')

With both the encodings it didn't give any error and the data got imported
but as one single column and not as different columns.

Does it have to do with the "" marks present before each line in the data?
I had a similar csv file with comma separated values, but that didn't have
double quotation marks in each line and that got imported correctly both
with cp1252 and latin1. But not for UTF-8 even though the file was saved in
utf8 format in notepad++. But in this case utf8 doesnt work as usual and
other two import it as single column.

Please advise.

Thanks
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/tutor/attachments/20131019/664e57cd/attachment.html>


More information about the Tutor mailing list