socket send O(N**2) complexity

Rob Williscroft rtw at freenet.co.uk
Mon Sep 21 16:00:27 EDT 2009


Zac Burns wrote in news:mailman.211.1253559803.2807.python-list at python.org 
in comp.lang.python:

> The mysocket.mysend method given at
> http://docs.python.org/howto/sockets.html has an (unwitting?) O(N**2)
> complexity for long msg due to the string slicing.
> 
> I've been looking for a way to optimize this, but aside from a pure
> python 'string slice view' that looks at the original string I can't
> think of anything. Perhaps start and end keywords could be added to
> send? I can't think of a reason for the end keyword,  but it would be
> there for symmetry.
> 

I ran this script on various versions of python I have access to:

#encoding: utf-8
raw_input( "start" )

s = 'x' * 1000000
r = [None] * 1000

raw_input( "allocated 1 meg + " )

for i in xrange(1000):
  r[i] = s[:]

raw_input( "end" )

Niether of the CPython versions (2.5 and 3.0 (with modified code))
exibited any memory increase between "allocated 1 meg + " and "end"

pypy-c (1.0.0) showed a 30k jump, and IronPython 2.0 showed a few megs
jump.

AIUI, as a python string is imutable, a slice of a string is a
new string which points (C char *) to the start of the slice data
and with a length that is the length of the slice, about 8 bytes
on 32 bit machine.

So even though a slice assignment new_s = s[:] appears to a python
programmer to make a "copy" of s, its only the a few bytes of 
metadata (the pointer and the length) that is really copied, the 
strings character data stays where it is.

So the code you cite is in fact O(N) as the copy is constant size.

Rob.
-- 
http://www.victim-prime.dsl.pipex.com/



More information about the Python-list mailing list