[Tutor] Handling hundreds of thousands of inserts with MySQLdb

Gonzillaaa gonzillaaa at gmail.com
Sat Sep 2 18:03:03 CEST 2006


Hello all. I post this here since is my first attempt to solve a  
problem with python.

I have fairly big log files that I'm doing some pre-processing to, to  
cleanup the data before they go into a MySQL database. After  
processing the files look something like this:

17, , 2006-8-21 12:04:29, 0, 3.0846, 25.105, 918, -0.12183, 0.20305,  
25.389, 25.254, 180
18, , 2006-8-21 12:05:20, 17, 3.1705, 23.62, 949, 0.015228, 0.040609,  
24.984, 110.2, 186
17, , 2006-8-21 12:07:30, 0, 3.0846, 25.353, 939, -0.1269, 0.20305,  
25.254, 25.254, 293
18, , 2006-8-21 12:08:23, 17, 3.1705, 23.538, 958, 0.015228,  
0.045685, 24.984, 110.2, 188
16, , 2006-8-21 12:09:21, 17, 3.0922, 24.691, 969, 0.26904, 0.10152,  
25.389, 25.389, 175

then I have written another script to which I pass the filename as  
argument to insert that data into the db. The problem I'm getting is  
that some of the files contain 300000 records aprox. I have tried two  
approaches to inset the data but both have failed

1- use cursor.executemany which throws an error like this also if I  
understand it correctly executemany() does one insert at a time which  
seems hardly efficient.

File "./xbow_MySQL_insert.py", line 39, in MySQLInsert
     cursor.executemany("INSERT INTO table \
   File "/Library/Frameworks/Python.framework/Versions/2.4/lib/ 
python2.4/site-packages/MySQLdb/cursors.py", line 216, in executemany
     r = self._query(',\n'.join(q))
   File "/Library/Frameworks/Python.framework/Versions/2.4/lib/ 
python2.4/site-packages/MySQLdb/cursors.py", line 309, in _query
     rowcount = self._do_query(q)
   File "/Library/Frameworks/Python.framework/Versions/2.4/lib/ 
python2.4/site-packages/MySQLdb/cursors.py", line 273, in _do_query
     db.query(q)
_mysql_exceptions.OperationalError: (1153, "Got a packet bigger than  
'max_allowed_packet' bytes")

2- use cursor.execute with a DELAYED INSERT but I think I'm just  
passing the wrong arguments to it:

File "./xbow_MySQL_insert.py", line 43, in MySQLInsert
     cursor.execute("INSERT DELAYED INTO arup_03 \
   File "/Library/Frameworks/Python.framework/Versions/2.4/lib/ 
python2.4/site-packages/MySQLdb/cursors.py", line 148, in execute
     query = query % db.literal(args)
TypeError: not all arguments converted during string formatting


Another solution I thought of but I'm not sure how to approach is to  
break the data into chunks and do delayed inserts with those chunks,  
although I'm not sure how to approach this (breaking the data up, and  
getting delayed inserts to work).

Here is the code so far, any comments on general improvements are  
also welcome.

Many thanks,

Gonzalo.

#######################################

import sys,string,MySQLdb

#db config
host, user, password, database = "localhost", "user", "pass",  
"xbow_data"

def MySQLInsert(mysql_data):
	db = MySQLdb.connect(host, user, password, database)
	cursor = db.cursor()
	
	cursor.executemany("INSERT INTO table \
	(id, sample, sample_time, parent, voltage, temp, light, accel_x,  
accel_y, mag_x, mag_y, mic) \
	VALUES (%s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s)",  mysql_data)

	# cursor.execute("INSERT DELAYED INTO table \
	# (id, sample, sample_time, parent, voltage, temp, light, accel_x,  
accel_y, mag_x, mag_y, mic) \
	# VALUES (%s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s)",   
mysql_data)

	db.close()
	print "done inserting into db"

def returnTuple(formatted_data):
	_data = []
	for line in formatted_data:
		line = line.split(',')
		line = [item.strip() for item in line]
		_data.append(line)
	
	return [tuple(line) for line in _data]
	

def Main():
	#check if filename was passed when the program was called
	if (len(sys.argv) < 2):
		print "Usage:" + sys.argv[0] + " <filename> "
		sys.exit()
		
	#open file to work on
	filename = sys.argv[1]
	raw_data = open(filename,"r",1)
	
	# Get data on file converted into a series of tuples to be inserted  
into db
	mysql_data =  returnTuple(raw_data)
	
	MySQLInsert(mysql_data)
	
	#close file
	raw_data.close()
	
	#print result
	print "done! " + str(len(mysql_data)) + " lines inserted"

if __name__ == '__main__': Main()









More information about the Tutor mailing list