[Tutor] CVS File Opening

Wed May 27 00:51:06 CEST 2009

2009/5/26 Paras K. <paras80 at gmail.com>:
> Hello,
>
> I have been working on this script / program all weekend. I emailed this
> address before and got some great help. I hope that I can get that again!
>
>
> First to explain what I need to do:
>
> Have about 6 CSV files that I need to read. Then I need to split based on a
> range of IP address and if the count number is larger than 75.
>
> I currently merge all the CSV files by using the command line:
>
> C:Reports> copy *.csv merge.csv
>
> Then I run the dos command: for /F "TOKENS=* SKIP=1" %i in ('find "."
> merge.csv ^| find /v "----"') do echo %i>> P2PMerge.csv
>
> From some of my friends they tell me that should remove that last carriage
> return, which it does, however when it goes through the python script it
> returns no values.

Why would you need to strip off a carriage return? And why would you
not process the csv files one after another? It would be easier to
help if you posted some example data.

> Now if I open the merge.csv and remove that carriage return manually and
> save it as P2PMerge.csv the script runs just fine.
>
> Here is my source code:
>
> # P2P Report / Bitorrent Report
> # Version 1.0
> # Last Updated: May 26, 2009
> # This script is designed to go through the cvs files and find the valid IP
> Address
> # Then copys them all to a new file
> import sys
> import win32api
> import win32ui
> import shutil
> import string
> import os
> import os.path
> import csv

You import csv but do not use it below?
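
For what it is worth, the csv module could do the field splitting for
you. A minimal sketch, assuming each row looks like ip,name,count (the
real column layout is not shown in this thread, so the sample rows below
are made up) and written for a current Python 3:

```python
import csv
import io

# Made-up sample data; the real report layout is not shown in the thread.
sample = io.StringIO(u"192.168.1.10,foo,80\r\n151.10.0.5,bar,12\r\n")

rows = []
for row in csv.reader(sample):
    if not row:              # csv.reader yields [] for a blank line
        continue
    ip, count = row[0], int(row[2])   # assumed: IP first, count third
    rows.append((ip, count))
```

csv.reader copes with the \r\n line endings itself, so nothing has to be
stripped by hand in this version.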

> #Global Variables
> P2Pfiles = []
> totalcount = 0
> t = 0
> #still in the development process -- where to get the files from
> #right now the location is C:\P2P
> def getp2preportdestion():
>     win32ui.MessageBox('Welcome to P2P Reporting.\nThis program is designed
> to aid in the P2P reporting. \n\nThe locations of P2P Reports should be in
> C:\P2P \nWith no subdirectories.\n\nVersion 1.0 - \n\nPress "OK" to continue
> with this program.')
>     p2preport = 'C://P2P\\'
>     return p2preport
>
>
> #Main Program
> #Get location of directories
> p2ploc = getp2preportdestion()
> #Checking to make sure directory is there.
> if os.path.exists(p2ploc):
>     if os.path.isfile(p2ploc +'/p2pmerge.csv'):
>         win32ui.MessageBox('P2PMerge.csv file does exists.\n\nWill continue
> with P2P Reporting.')
>     else:
>          win32ui.MessageBox('P2PMerge.csv files does not exists. \n\nPlease
> run XXXXXXX.bat files first.')
>          sys.exit()
> else:
>     win32ui.MessageBox('The C:\P2P directory does not exists.\n\nPlease
> create and copy all the files there.\nThen re-run this script')
>     sys.exit()
> fh = open('C://P2P/P2PMerge.csv', "rb")
> ff = open('C://P2P/P2PComplete.csv', "wb")
> igot1 = fh.readlines()
>
> for line in igot1:

You can also write the line below and get rid of igot1.
for line in fh.readlines():

>     readline = line
>     ipline = readline
>     ctline = readline

You are binding several names to the same object, and none of them are
necessary. The IDLE session below should show what I mean.

>>> line = [1,2,3,4]
>>> readline = line
>>> ipline = readline
>>> ctline = readline
>>> line
[1, 2, 3, 4]
>>> line.append('This also shows up in readline, ipline and ctline')
>>> readline
[1, 2, 3, 4, 'This also shows up in readline, ipline and ctline']
>>> ipline
[1, 2, 3, 4, 'This also shows up in readline, ipline and ctline']
>>> ctline
[1, 2, 3, 4, 'This also shows up in readline, ipline and ctline']
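
To make the point explicit: assignment binds another name to the same
object, it does not copy anything. A small sketch (the names alias and
copy_of_line are just for illustration). In your script line is a string,
which cannot be modified in place, so the extra names are harmless there,
just unnecessary:

```python
line = [1, 2, 3, 4]
alias = line                 # second name, same list object
copy_of_line = list(line)    # an actual (shallow) copy

line.append(5)

same_object = alias is line            # both names refer to one list
copy_changed = copy_of_line == line    # the copy did not see the append
```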

>     count = ctline.split(',')[2]
>     count2 = int(count)
>     print count2
>     t = count2

Again, more names for the same value? You really do not need t.

>     ip = ipline.split(' ')[0]

So all of the above can be simplified to something like this (keeping
your two different separators, ',' for the count and ' ' for the IP):
       count = int(line.split(',')[2])
       ip = line.split(' ')[0]

>     split_ip = ip.split('.')
>     if ((split_ip[0] == '192') and (t >=75)):

The above then would be (note the trailing dot, so that only a first
octet of exactly 192 matches):
       if ip.startswith('192.') and count >= 75:

>         ff.write(readline)
This will change as well:
           ff.write(line)

You can figure out the rest ;-)
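
In case it helps, here is one way the three branches could be folded into
a single test. It is only a sketch: the helper name wanted and the two
constants are mine, and it assumes ip is a dotted string and count is
already an int.

```python
# Prefixes taken from the three branches of the original script.
# The trailing dots make sure whole octets are compared.
PREFIXES = ('192.', '151.', '142.152.')
MIN_COUNT = 75

def wanted(ip, count):
    """Return True if the line should be written to the output file."""
    # str.startswith accepts a tuple and matches any of its entries.
    return count >= MIN_COUNT and ip.startswith(PREFIXES)
```

With that, the loop body shrinks to a single if wanted(ip, count): followed
by the write and the counter increment.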

>         totalcount +=1
>     elif ((split_ip[0] == '151') and (t >=75)):
>         ff.write(readline)
>         totalcount +=1
>     elif (((split_ip[0] == '142') and (split_ip[1]) == '152') and (t >=75)):
>           ff.write(readline)
>           totalcount +=1
>
> tc = str(totalcount)
> win32ui.MessageBox('Total Number of IPs in P2P Reporting: '+ tc)
> fh.close()
> ff.close()
>
>
> What I am looking for is an working example of how to go through the
> directory and read each csv file within that directory or how to remove the
> carriage return at the end of the csv file.

You can avoid removing this carriage return altogether, read below. But
if you really need to, you can strip it with line.rstrip('\r\n').
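
For example, a small sketch of stripping line endings (the sample strings
are made up). Note that rstrip takes a set of characters to remove from
the right-hand end, not a literal suffix:

```python
lines = ["192.168.1.10,foo,80\r\n", "151.10.0.5,bar,12\n", "\r\n"]

# rstrip('\r\n') removes any trailing '\r' and '\n' characters.
stripped = [line.rstrip('\r\n') for line in lines]

# Drop lines that are empty after stripping, such as a stray final newline.
non_empty = [line for line in stripped if line]
```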

> NOTE: This is not for a class - it is for work to assist me in reading
> multiple csv files within a couple days.
>
> Any assistance is greatly appreciated.

Use the glob module, which can easily find all csv files in a
directory. In general I would loop over each file and do your
processing there. Like this:

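import glob

totalcount = 0
# 'inpath' is a placeholder for your directory, including the trailing slash.
for fname in glob.glob('inpath' + '*.csv'):
    fh = open(fname)    # glob returns file names, not file objects
    for line in fh:
        pass            # Your code comes here.
    fh.close()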
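
Putting the pieces together, a fuller sketch might look like the one
below. Treat it as an illustration under assumptions, not a drop-in
replacement: the function name filter_reports, the constants, and the
ip,name,count column layout are all mine (no sample data was posted), it
is written for a current Python 3, and the demo at the end uses made-up
rows in a temporary directory.

```python
import glob
import os
import tempfile

PREFIXES = ('192.', '151.', '142.152.')   # ranges from the original script
MIN_COUNT = 75

def filter_reports(indir, outpath):
    """Write matching lines from every *.csv in indir to outpath.

    Returns the number of lines written.
    """
    # Never read the output file as input, even if it lives in indir.
    names = [n for n in sorted(glob.glob(os.path.join(indir, '*.csv')))
             if os.path.abspath(n) != os.path.abspath(outpath)]
    total = 0
    with open(outpath, 'w') as out:
        for name in names:
            with open(name) as fh:
                for line in fh:
                    line = line.strip()   # removes '\r', '\n' and spaces
                    if not line:          # skip blank lines
                        continue
                    fields = line.split(',')
                    ip, count = fields[0], int(fields[2])  # assumed layout
                    if count >= MIN_COUNT and ip.startswith(PREFIXES):
                        out.write(line + '\n')
                        total += 1
    return total

# Tiny demo with made-up data.
workdir = tempfile.mkdtemp()
with open(os.path.join(workdir, 'a.csv'), 'w') as f:
    f.write('192.168.1.10,foo,80\n151.10.0.5,bar,12\n\n')
with open(os.path.join(workdir, 'b.csv'), 'w') as f:
    f.write('142.152.7.7,baz,75\n10.0.0.1,qux,900\n')

result_path = os.path.join(workdir, 'P2PComplete.txt')
written = filter_reports(workdir, result_path)
```

Because each file is read on its own and blank lines are skipped, there is
no need for the copy / find merge step or for removing the trailing
carriage return by hand.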

Greets
Sander