Performance improvement

Fri Sep 19 08:15:33 CEST 2003

Good morning, I have spotted a few things that can be improved:

On Thu, 2003-09-18 at 23:52, Tomás Javier Robles Prado wrote:
> Combining several of the suggestions proposed:
> 
> from sys import argv
> from time import strptime, mktime
> 
> DATE_FORMAT = "%d/%b/%Y:%H:%M:%S +0200"
> 
> def cmp (l1, l2):
>     """Compare two Apache-style log lines by their date."""
>     #If the dates are equal, I don't care what order it puts them in
>     if (l1[1] >=  l2[1]) :
>         return 1
>     else :
>         return -1
> 
>     
> def run (log, output):
>     try:
>         f = file (log, 'r')
>     except IOError:
>         print "No se ha podido abrir ", log
>         return
> 
>     lineas = []
> 
> 
>     print ("Leyendo de %s..." % log)
>     #Read the file
>     l = f.readline()
>     while l != "":
> 
>         #We need to try to discard lines like this one:
>         #127.0.0.1 34922 ==> 226 Transfer complete.
>         #or this one:
>         #/usr/lib/zope/z2.py:385: UserWarning: You are running... 
>         try:
>             if l.split()[1][0] == '-':
> 
>                 #This holds if it is a regular hit
# split takes a second parameter (the maximum number of times we want
to split the string):
fecha = mktime(strptime(l.split('[',2)[1].split(']',1)[0], DATE_FORMAT))
>  
>                 fecha = mktime(strptime(l.split('[')[1].split(']')[0], DATE_FORMAT))
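
A rough sketch of what the limit buys, assuming a made-up log line (the line itself and the name extraer_fecha are hypothetical, not from the original script). It uses a limit of 1 on both splits, which should be enough here because only the text between the first '[' and the first ']' is needed. Written to run on a current Python 3:

```python
# Hypothetical Apache-style log line, invented for this example.
linea = '127.0.0.1 - - [18/Sep/2003:23:52:10 +0200] "GET /a[1].html HTTP/1.0" 200 512'


def extraer_fecha(l):
    """Return the text between the first '[' and the first ']' of l."""
    # maxsplit=1 makes each split stop after the first separator found,
    # so the rest of the line is never cut into pieces.
    return l.split('[', 1)[1].split(']', 1)[0]


# Without a limit the line is cut at every '[' (3 pieces here);
# with maxsplit=1 it is cut only once (2 pieces).
piezas_sin_limite = linea.split('[')
piezas_con_limite = linea.split('[', 1)

print(len(piezas_sin_limite), len(piezas_con_limite))
print(extraer_fecha(linea))
```

Both variants yield the same date string; the limited one simply does less work per line.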

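The following line can be changed to: lineas.append((fecha, l)), and then
the cmp function is no longer needed for sorting, since tuples are compared
by their first element first. Note that the extraction further down then has
to use x[1] instead of x[0].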

>                 lineas.append((l, fecha))
> 
>         except IndexError:
>             pass
> 
>         l = f.readline()
> 
>     f.close()    
> 
>     print ("Ordenando %d líneas..." % len(lineas))
> 
>     #Sort the list
>     lineas.sort(cmp)
As a result of the previous change, the line above would become:
      lineas.sort()
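
To make the idea concrete, here is a minimal sketch with three invented (fecha, l) pairs; the timestamps and line texts are made up, and solo_lineas is a hypothetical name. It relies only on the fact that Python compares tuples element by element, so the date in first position decides the order:

```python
# Hypothetical (fecha, l) pairs; the numbers stand in for mktime() results.
lineas = [
    (1063921930.0, "c - - third hit\n"),
    (1063921800.0, "a - - first hit\n"),
    (1063921860.0, "b - - second hit\n"),
]

# No comparison function: tuples are ordered by their first element,
# and only on equal dates does the line text break the tie.
lineas.sort()

# The log line is now the second element of each tuple.
solo_lineas = [x[1] for x in lineas]

print("".join(solo_lineas), end="")
```

With the tuple order swapped, whatever extracts the lines afterwards has to read index 1 rather than index 0.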
> 
>     print ("Guardando en %s..." % output)
>     try:
>         f = file(output, 'w')
>     except IOError:
>         print "No se ha podido abrir ", output
>         return       
> 
>     
> 
>     #Extract the actual lines
>     lineas = [x[0] for x in lineas]
>     f.writelines(lineas)
>     f.close()
> 
>     return
> 
> 
> And testing with a 1000-line file and measuring times with the
> profile module:
> 
> original version:
> >>> profile.run('pysort.run ("prueba.txt", "prueba-sort.txt")')
> Leyendo de prueba.txt...
> Ordenando 1000 líneas...
> Guardando en prueba-sort.txt...
>          1002 function calls in 0.300 CPU seconds
>  
>    Ordered by: standard name
>  
>    ncalls  tottime  percall  cumtime  percall filename:lineno(function)
>         1    0.000    0.000    0.300    0.300 <string>:1(?)
>         0    0.000             0.000          profile:0(profiler)
>         1    0.000    0.000    0.300    0.300 profile:0(pysort.run
> ("prueba.txt", "prueba-sort.txt"))
>         1    0.080    0.080    0.300    0.300 pysort.py:18(run)
>       999    0.220    0.000    0.220    0.000 pysort.py:8(cmp)
>  
> 
> improved version:
> >>> profile.run('pysort2.run ("prueba.txt", "prueba-sort.txt")')
> Leyendo de prueba.txt...
> Ordenando 1000 líneas...
> Guardando en prueba-sort.txt...
>          1002 function calls in 0.160 CPU seconds
>  
>    Ordered by: standard name
>  
>    ncalls  tottime  percall  cumtime  percall filename:lineno(function)
>         1    0.000    0.000    0.160    0.160 <string>:1(?)
>         0    0.000             0.000          profile:0(profiler)
>         1    0.000    0.000    0.160    0.160 profile:0(pysort2.run
> ("prueba.txt", "prueba-sort.txt"))
>         1    0.140    0.140    0.160    0.160 pysort2.py:17(run)
>       999    0.020    0.000    0.020    0.000 pysort2.py:8(cmp)
>  
> Im-pressive :)
> 
> Although it still crashes with python2.3 ...
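
For anyone who wants to measure the difference between sorting with a comparison function and sorting plain (fecha, l) tuples, something along these lines might do. It is only a sketch under several assumptions: it targets a current Python 3 (where list.sort no longer accepts a comparison function, so functools.cmp_to_key is used instead), the data is synthetic, and cmp_fecha, ordenar_con_cmp and ordenar_tuplas are hypothetical names. The timings it prints will vary from machine to machine:

```python
import random
import timeit
from functools import cmp_to_key


def cmp_fecha(l1, l2):
    """Comparison in the style of the quoted cmp: compares element 1."""
    if l1[1] >= l2[1]:
        return 1
    return -1


# Synthetic data: 1000 distinct fake timestamps in shuffled order.
random.seed(0)
fechas = [float(n) for n in range(1000)]
random.shuffle(fechas)

con_cmp = [("linea %d\n" % i, f) for i, f in enumerate(fechas)]  # (l, fecha)
sin_cmp = [(f, "linea %d\n" % i) for i, f in enumerate(fechas)]  # (fecha, l)


def ordenar_con_cmp():
    # Every comparison goes through a Python-level function call.
    return [x[0] for x in sorted(con_cmp, key=cmp_to_key(cmp_fecha))]


def ordenar_tuplas():
    # Tuples are compared natively, first by fecha.
    return [x[1] for x in sorted(sin_cmp)]


t_cmp = timeit.timeit(ordenar_con_cmp, number=20)
t_tuplas = timeit.timeit(ordenar_tuplas, number=20)
print("cmp function: %.4f s" % t_cmp)
print("plain tuples: %.4f s" % t_tuplas)
```

Both functions return the lines in the same order; only the cost of each comparison differs.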

_______________________________________________
Python-es mailing list
Python-es at aditel.org
http://listas.aditel.org/listinfo/python-es