[Tutor] CVS File Opening

Paras K. paras80 at gmail.com
Wed May 27 15:44:52 CEST 2009


As requested, here are some example rows from the CSV files:


    117.86.68.157 BitTorrent Client Activity 1 5/21/2009 6:56
    82.210.106.99 BitTorrent Client Activity 1 5/20/2009 12:39
    81.132.134.83 BitTorrent Client Activity 1 5/21/2009 3:14

The columns are: IP, Activity, Count, Date / Time. These are typical log files.
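
For anyone who wants to try this on the sample rows, here is a minimal sketch of reading them with the csv module in Python 3. It assumes the real files are comma-separated, with the four columns in the order listed above (the pasted rows lost their delimiters), and the name parse_rows is made up for illustration.

```python
import csv
import io

def parse_rows(text):
    """Return (ip, activity, count, timestamp) tuples from CSV text.

    Assumes comma-separated columns in the order IP, Activity, Count,
    Date / Time; rows that do not fit that shape are skipped.
    """
    rows = []
    for fields in csv.reader(io.StringIO(text)):
        if len(fields) != 4:
            continue  # blank line or a row with a different layout
        ip, activity, count, timestamp = [f.strip() for f in fields]
        if not count.isdigit():
            continue  # e.g. a header row
        rows.append((ip, activity, int(count), timestamp))
    return rows

# Hypothetical comma-separated versions of the rows shown above.
sample = (
    "117.86.68.157,BitTorrent Client Activity,1,5/21/2009 6:56\n"
    "82.210.106.99,BitTorrent Client Activity,1,5/20/2009 12:39\n"
    "\n"
)
for row in parse_rows(sample):
    print(row)
```

Converting the count to an int at this point means later comparisons such as count >= 75 work on numbers, not on strings.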



On Tue, May 26, 2009 at 6:51 PM, Sander Sweers <sander.sweers at gmail.com> wrote:

> 2009/5/26 Paras K. <paras80 at gmail.com>:
> > Hello,
> >
> > I have been working on this script / program all weekend. I emailed this
> > address before and got some great help. I hope that I can get that again!
> >
> >
> > First to explain what I need to do:
> >
> > I have about 6 CSV files that I need to read. Then I need to split them
> > based on a range of IP addresses and on whether the count is larger than 75.
> >
> > I currently merge all the CSV files by using the command line:
> >
> > C:\Reports> copy *.csv merge.csv
> >
> > Then I run the dos command: for /F "TOKENS=* SKIP=1" %i in ('find "."
> > merge.csv ^| find /v "----"') do echo %i>> P2PMerge.csv
> >
> > Some of my friends tell me that this should remove the last carriage
> > return, which it does. However, when the result goes through the Python
> > script, it returns no values.
>
> Why would you need to strip off a carriage return? And why would you
> not process the csv files one after another? It would be easier to
> have some example data.
>
> > Now if I open the merge.csv and remove that carriage return manually and
> > save it as P2PMerge.csv the script runs just fine.
> >
> > Here is my source code:
> >
> > # P2P Report / Bitorrent Report
> > # Version 1.0
> > # Last Updated: May 26, 2009
> > # This script is designed to go through the csv files and find the valid
> > # IP addresses, then copy them all to a new file
> > import sys
> > import win32api
> > import win32ui
> > import shutil
> > import string
> > import os
> > import os.path
> > import csv
>
> You import csv but do not use it below?
>
> > #Global Variables
> > P2Pfiles = []
> > totalcount = 0
> > t = 0
> > #still in the development process -- where to get the files from
> > #right now the location is C:\P2P
> > def getp2preportdestion():
> >     win32ui.MessageBox('Welcome to P2P Reporting.\n'
> >         'This program is designed to aid in the P2P reporting. \n\n'
> >         'The locations of P2P Reports should be in C:\P2P \n'
> >         'With no subdirectories.\n\nVersion 1.0 - \n\n'
> >         'Press "OK" to continue with this program.')
> >     p2preport = 'C://P2P\\'
> >     return p2preport
> >
> >
> > #Main Program
> > #Get location of directories
> > p2ploc = getp2preportdestion()
> > #Checking to make sure directory is there.
> > if os.path.exists(p2ploc):
> >     if os.path.isfile(p2ploc +'/p2pmerge.csv'):
> >         win32ui.MessageBox('P2PMerge.csv file exists.\n\n'
> >             'Will continue with P2P Reporting.')
> >     else:
> >         win32ui.MessageBox('P2PMerge.csv file does not exist.\n\n'
> >             'Please run XXXXXXX.bat files first.')
> >         sys.exit()
> > else:
> >     win32ui.MessageBox('The C:\P2P directory does not exist.\n\n'
> >         'Please create and copy all the files there.\n'
> >         'Then re-run this script')
> >     sys.exit()
> > fh = open('C://P2P/P2PMerge.csv', "rb")
> > ff = open('C://P2P/P2PComplete.csv', "wb")
> > igot1 = fh.readlines()
> >
> > for line in igot1:
>
> You can also write the below and get rid of igot1.
> for line in fh.readlines():
>
> >     readline = line
> >     ipline = readline
> >     ctline = readline
>
> You are binding several names to the same object, and none of them are
> necessary. See the IDLE session below, which should show what I mean.
>
> >>> line = [1,2,3,4]
> >>> readline = line
> >>> ipline = readline
> >>> ctline = readline
> >>> line
> [1, 2, 3, 4]
> >>> line.append('This also shows up in readline, ipline and ctline')
> >>> readline
> [1, 2, 3, 4, 'This also shows up in readline, ipline and ctline']
> >>> ipline
> [1, 2, 3, 4, 'This also shows up in readline, ipline and ctline']
> >>> ctline
> [1, 2, 3, 4, 'This also shows up in readline, ipline and ctline']
>
> >     count = ctline.split(',')[2]
> >     count2 = int(count)
> >     print count2
> >     t = count2
>
> Again binding several names to the same object? And you really do not
> need t.
>
> >     ip = ipline.split(' ')[0]
>
> So all the above can be simplified like:
>       data = line.split(',')
>       count = int(data[2])
>       ip = data[0]
>
> >     split_ip = ip.split('.')
> >     if ((split_ip[0] == '192') and (t >=75)):
>
> The above then would be:
>       if ip.startswith('192') and count >= 75:
>
> >         ff.write(readline)
> This will change as well:
>           ff.write(line)
>
> You can figure out the rest ;-)
>
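
One way the remaining branches could be folded together, as a sketch, not the final script: a single helper that checks the three address ranges and the count threshold from the original if/elif chain. The names is_reportable, WATCHED_PREFIXES and MIN_COUNT are invented here.

```python
# Prefixes taken from the original if/elif chain. The trailing dot makes
# each prefix match whole octets only.
WATCHED_PREFIXES = ('192.', '151.', '142.152.')
MIN_COUNT = 75  # the original script keeps rows with count >= 75

def is_reportable(ip, count):
    """Return True when ip is in a watched range and count is high enough."""
    # str.startswith accepts a tuple and matches if any prefix fits.
    return count >= MIN_COUNT and ip.startswith(WATCHED_PREFIXES)

print(is_reportable('192.168.1.10', 80))  # True
print(is_reportable('142.152.3.4', 75))   # True
print(is_reportable('142.10.3.4', 200))   # False: second octet is not 152
print(is_reportable('151.1.2.3', 10))     # False: count below threshold
```

With a helper like this, the loop body reduces to one test followed by ff.write(line) and totalcount += 1.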
> >         totalcount +=1
> >     elif ((split_ip[0] == '151') and (t >=75)):
> >         ff.write(readline)
> >         totalcount +=1
> >     elif (((split_ip[0] == '142') and (split_ip[1]) == '152') and (t >=75)):
> >           ff.write(readline)
> >           totalcount +=1
> >
> > tc = str(totalcount)
> > win32ui.MessageBox('Total Number of IPs in P2P Reporting: '+ tc)
> > fh.close()
> > ff.close()
> >
> >
> > What I am looking for is a working example of how to go through the
> > directory and read each csv file within that directory, or how to remove
> > the carriage return at the end of the csv file.
>
> You can avoid the removal of this carriage return, read below. But if
> you really need to, you can call rstrip('\r\n') on each line.
>
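
A quick sketch of what stripping the line ending looks like in practice. rstrip takes the set of characters to remove, so '\r\n' covers both Windows and Unix line endings. The sample string below is made up.

```python
# A line as it might come out of a Windows-written CSV file.
raw = "117.86.68.157,BitTorrent Client Activity,1,5/21/2009 6:56\r\n"

# rstrip removes any trailing characters found in the given set,
# here carriage returns and newlines, and leaves the rest alone.
clean = raw.rstrip("\r\n")
print(repr(clean))

# A line that is only a line ending becomes an empty string, which makes
# stray blank lines at the end of a merged file easy to detect and skip.
blank = "\r\n".rstrip("\r\n")
print(repr(blank))
```

Testing for an empty string after the strip is usually enough to skip a trailing blank line without editing the file by hand.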
> > NOTE: This is not for a class - it is for work to assist me in reading
> > multiple csv files within a couple days.
> >
> > Any assistance is greatly appreciated.
>
> Use the glob module, which can easily find all csv files in a
> directory. In general I would loop over each file and do your
> processing. Like,
>
> import glob
>
> totalcount = 0
> for name in glob.glob('inpath' + '*.csv'):
>    f = open(name, 'rb')  # glob returns file names, so open each one
>    for line in f:
>        pass  # Your code comes here.
>    f.close()
>
> Greets
> Sander
>
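
Putting the suggestions together, here is a minimal sketch in Python 3 of looping over every CSV file in a directory and copying the matching rows to one output file. It assumes comma-separated files with the IP in the first column and the count in the third. The function name filter_reports and its parameters are hypothetical. Rows whose count is not a number (headers, blank lines) are skipped so they do not crash the run.

```python
import csv
import glob
import os

def filter_reports(in_dir, out_path, prefixes=('192.', '151.', '142.152.'),
                   min_count=75):
    """Copy rows with a watched IP and count >= min_count to out_path.

    Returns the number of rows written.
    """
    total = 0
    out_abs = os.path.abspath(out_path)
    with open(out_path, 'w', newline='') as out_file:
        writer = csv.writer(out_file)
        # sorted() gives a stable file order; glob itself promises none.
        for name in sorted(glob.glob(os.path.join(in_dir, '*.csv'))):
            if os.path.abspath(name) == out_abs:
                continue  # do not read the file we are writing
            with open(name, newline='') as in_file:
                for row in csv.reader(in_file):
                    if len(row) < 3 or not row[2].strip().isdigit():
                        continue  # blank line, header, or malformed row
                    if (row[0].strip().startswith(prefixes)
                            and int(row[2]) >= min_count):
                        writer.writerow(row)
                        total += 1
    return total
```

Called as filter_reports('C:/P2P', 'C:/P2P/P2PComplete.csv'), this should make the copy and for /F merge step unnecessary, since each file is read on its own. Any old merge.csv or P2PMerge.csv left in that directory would be read as well, so its rows would be counted twice unless it is removed first.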