[Tutor] How long can a line be for readline() or readlines() ?? ((LONG)) file, example and console

Schmidt, Allen J. aschmidt@nv.cc.va.us
Fri, 9 Nov 2001 11:09:59 -0500


importClassifieds.py

------
import MySQLdb
import string
import re
import time
import fileinput
import os

def doit(dir,ext1,ext2):
   """Rename files in "dir" with extension "ext1" to extension "ext2"."""

   print "Directory",dir
   print "Ext1", ext1
   print "Ext2", ext2
   files = os.listdir(dir)
   files.sort()

   for file in files:
      root,ext = os.path.splitext(file)

      if ext == ext1:
         oldfile = os.path.join(dir,file)
         newfile = os.path.join(dir,root+ext2)
         print "Renaming %s to %s"%(oldfile,newfile)
   
         if os.path.exists(newfile):
            print "*** Unable to rename %s to %s (already exists)"%(oldfile,newfile)
         else:
            try:
               os.rename(oldfile,newfile)
               print "Rename worked for file: ",newfile
            except OSError:
               print "*** Unable to rename %s"%oldfile

def main():
   """ main """
   print "starting...."
   curtime=time.localtime(time.time())
   fmt='%Y%m%d_%H:%M'
   ourtimestamp=time.strftime(fmt,curtime)
   starttime=time.strftime(fmt,curtime)
   
   replaceApos=re.compile('\'', re.IGNORECASE|re.DOTALL)
   replaceComma=re.compile(',', re.IGNORECASE|re.DOTALL)
   replacePeriod=re.compile('\.', re.IGNORECASE|re.DOTALL)
   
   files = os.listdir("F:\\classifieds\\current\\")
   
   # here's where we hook up to the database and get a cursor
   dbc = MySQLdb.connect(host="127.0.0.1", db="Classifieds")
   crsr = dbc.cursor() 
   for file in files:
     if file[-3:]=='txt':
       filein = open("F:\\classifieds\\current\\"+file,"r")
       dash = file.index('-')
       dot = file.index('.')
       date = file[:dash]
       day = file[dash+1:dot]
       print "Processing: ",file
       for linein in filein.readlines():
         lineout = replaceApos.sub('\'\'', linein)
         lineout=string.split(lineout,'|')
         sql=("insert into current (adnum, logo, rundate, runday, status, "
              "adtext, category) values ('"+lineout[1]+"','"+lineout[2]+"','"
              +date+"','"+day+"','ENABLED','"+lineout[3]+"','"+lineout[4]+"')")
         crsr.execute(sql)
       filein.close()
   doit('F:\\classifieds\\current\\','.txt','.ARCHIVE')
     

if __name__ == "__main__":
    main()

-----------------------
END of importClassifieds.py
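
The script above doubles apostrophes by hand and builds the SQL statement by
string concatenation. One possible alternative is to pass the values as query
parameters and let the database driver do the quoting. The sketch below uses
the standard-library sqlite3 module purely as a stand-in so that it runs
without a MySQL server; the table name "ads" and the sample values are made
up for illustration. With MySQLdb the placeholder is %s rather than ?.

```python
import sqlite3

# In-memory SQLite database, used only as a stand-in for the MySQL table.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute(
    "create table ads (adnum text, logo text, rundate text, runday text, "
    "status text, adtext text, category text)"
)

# Hypothetical field values; the apostrophe needs no manual doubling.
fields = ["", "00822chp", "", "Find a Great Job at Joe's", "051"]
date, day = "20011101", "HAR"

# The driver substitutes each ? with the matching value, quoted safely.
cur.execute(
    "insert into ads (adnum, logo, rundate, runday, status, adtext, category) "
    "values (?, ?, ?, ?, 'ENABLED', ?, ?)",
    (fields[1], fields[2], date, day, fields[3], fields[4]),
)
conn.commit()

stored = cur.execute("select adtext, status from ads").fetchone()
conn.close()
```

With parameters, the ad text is stored exactly as read from the file, so the
apostrophe-doubling step would no longer be needed.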

START of one line of the file
------------------------

|00822chp||<A
HREF="http://www.harvestadsdepot.com/fredricksbrg/outgoing/00822chp.gif">Cli
ck here to see the display ad!</A><BR>SPOTSYLVANIA COUNTY<BR><BR>Find a
Great Job Close to Home<BR><BR>WHERE SERVICE, INTEGRITY & PRIDE ARE
VALUED<BR><BR>COUNTY OF SPOTSYLVANIA * VIRGINIA<BR><BR>PATIOR UT
POTIAR<BR><BR>Spotsylvania has become one of the fastest growing communities
in Virginia and is continually seeking highly motivated, self-starting
applicants. Persons seeking employment opportunities, position details,
qualifications and/or special requirements may access Job Information (24
hours, 7 days a week) by dialing 540/582-7192, Press #2. Selections may be
made from one of three menus: #1: Job Information Line; #2: Request a Job
Application; #3: Human Resources Staff Assistance. You may also obtain the
same information along with a County application by visiting our website: <A
HREF="http://www.spotsylvania.va.us/gov/hr/jobs.htm">www.spotsylvania.va.us/
gov/hr/jobs.htm</A><BR><BR>Parks & Recreation:<BR><BR>Facility Attendant
(Part Time)...$7.00/hr.--$9.33/hr.--December 1, 2001<BR><BR>Senior/Teen
Center Attendant (Part Time)...$7.65/hr.--$11.48/hr.--November 16,
2001<BR><BR>General Services:<BR><BR>Solid Waste Equip. Op./Refuse Truck
Driver (Part Time)...$11.42/hr.--$14.28/hr.--November 9, 2001<BR><BR>Solid
Waste Equip. Op. Composting (Full Time)...$11.42/hr.--$14.28/hr.--November
9, 2001<BR><BR>Maintenance Worker/Custodial (Part
Time)...$8.78/hr.--$11.19/hr.--November 16, 2001<BR><BR>Maintenance
Technician (Full Time)...$10.88/hr.--$13.61/hr.--November 9,
2001<BR><BR>Clerk Typist (Part Time)...$8.53/hr.--$10.66/hr.--November 16,
2001<BR><BR>Utilities:<BR><BR>Utility Field Crew Worker/Op. (Full
Time)...$11.08/hr.--$13.85/hr.--November 16, 2001<BR><BR>Human
Resources:<BR><BR>Secretary (Part Time)...$9.87/hr.--$10.29/hr.--November
16, 2001<BR><BR>Benefits for full-time positions include fully paid health
insurance, paid retirement, including LEOS (Law Enforcement and Firefighter
uniformed personnel), 12 1/2 paid holidays, accrual of annual and sick
leave, and a variety of voluntary retirement and disability programs.
Regular part-time positions are eligible to receive prorated annual and sick
leave benefits.<BR><BR>A County application is required for each advertised
position.<BR><BR>For personal assistance, please contact 540/582-7018, ext.
674<BR><BR>Monday--Friday, 8:00 a.m.--4:30 p.m. * TTP
540/582-7178<BR><BR>Minorities are encouraged to
apply.<BR><BR>E.O.E/ADA<BR><BR><A
HREF="http://www.harvestadsdepot.com/fredricksbrg/outgoing/00822chp.gif">Cli
ck here to see the display ad!</A>|051|RM|
--------------
End of line of file

Start of console messages...
----------------

starting....
Processing:  20011101-HAR.txt
Traceback (most recent call last):
  File "importClassifieds.py", line 68, in ?
    main()
  File "importClassifieds.py", line 61, in main
    sql="insert into current (adnum, logo, rundate, runday, status, adtext, category) values ('"+lineout[1]+"','"+lineout[2]+"','"+date+"','"+day+"','ENABLED','"+lineout[3]+"','"+lineout[4]+"')"
IndexError: list index out of range

---------------
End of console messages
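
The IndexError in the traceback means that, for at least one line, splitting
on '|' produced fewer than five fields, so one of lineout[1] through
lineout[4] does not exist. A blank line, or an ad whose text is broken across
several physical lines in the file, would have this effect. A minimal sketch
of a guard, assuming every complete record has at least five '|'-separated
fields as in the sample line; the names split_record and min_fields are
hypothetical and not part of the script:

```python
def split_record(line, min_fields=5):
    """Split one '|'-delimited line; return None if it has too few fields.

    split_record and min_fields are illustrative names, not part of the
    original script.
    """
    fields = line.rstrip("\r\n").split("|")
    if len(fields) < min_fields:
        return None
    return fields


# A shortened stand-in for the sample record, and a blank line.
good = split_record("|00822chp||some ad text|051|RM|\n")
bad = split_record("\n")  # a blank line yields a single empty field
```

Inside the loop, a None result could be reported with the file name and
skipped, which would also show which input lines are the short ones.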



Does this help? It's obvious I am a PyNewbian... I borrowed some of the
code from the web (probably from someone on this list) and some from
examples I already had of DB methods that work.

Thanks 

Allen

-----Original Message-----
From: Danny Yoo [mailto:dyoo@hkn.eecs.berkeley.edu]
Sent: Thursday, November 08, 2001 8:52 PM
To: Schmidt, Allen J.
Cc: tutor@python.org
Subject: Re: [Tutor] How long can a line be for readline() or
readlines() ??


On Thu, 8 Nov 2001, Schmidt, Allen J. wrote:

> Is there a limit to the length of an individual line when using
> readlines() or readline() ?
> 
> I get errors when a line exceeds 2048 characters.

Hmm!  There should be no hard limit on how long a line can be!  That's
one of the reasons why Python is good for text manipulation: it can handle
these files without any handholding from us.

Can you show us the error message, as well as a sample file that triggers
the error?  Thanks!
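
As a quick check of the point that readline() has no hard limit on line
length, the following sketch writes a single line well over 2048 characters
to a temporary file and reads it back. The 5000-character length and the
temporary file are arbitrary choices for illustration:

```python
import os
import tempfile

# Build one line far longer than 2048 characters.
long_line = "x" * 5000 + "\n"

# Write it to a temporary file (an arbitrary location chosen for the demo).
fd, path = tempfile.mkstemp(suffix=".txt")
os.close(fd)
out = open(path, "w")
out.write(long_line)
out.close()

# Read it back; readline() returns the whole line, newline included.
infile = open(path, "r")
line_read = infile.readline()
infile.close()
os.remove(path)

line_length = len(line_read)
```

If line_length comes back as the full length written, the line itself is
being read intact and the error has to come from what is done with it
afterwards.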