Optional separator argument for file.writelines() and StringIO.writelines()

Currently, writelines() does not add trailing line separators. This is fine when working with readlines() but a PITA in other situations. If we added an optional separator argument, it would be easier to add newlines and we would gain some of the flexibility of str.join() at full C speed. Raymond Hettinger

Raymond Hettinger wrote:
Currently, writelines() does not add trailing line separators. This is fine when working with readlines() but a PITA in other situations.
If we added an optional separator argument, it would be easier to add newlines and we would gain some of the flexibility of str.join() at full C speed.
Maybe not a separator but suffix, so newline will be added to last line too? -- Dmitry Vasiliev (dima at hlabs.spb.ru) http://hlabs.spb.ru

On 2004 Feb 25, at 12:13, Dmitry Vasiliev wrote:
Raymond Hettinger wrote:
Currently, writelines() does not add trailing line separators. This is fine when working with readlines() but a PITA in other situations. If we added an optional separator argument, it would be easier to add newlines and we would gain some of the flexibility of str.join() at full C speed.
Maybe not a separator but suffix, so newline will be added to last line too?
Good point. And while a separator would be a slight nuisance to express otherwise, a "suffix" isn't -- it seems to me that f.writelines(x+'\n' for x in mylines) is a rather good way of expressing "suffix each line with a \n". I don't think this suffixing operation is so much more important than other elaborations on items of mylines to make it worth special-casing into a writelines argument [if anything, f.writelines(str(x) for x in mylines) would be the one elaboration that seems to me to be by far the most frequent -- still not worth special-casing though, IMHO]. Alex
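A minimal sketch of the distinction Dmitry and Alex are drawing, assuming Python 3 and io.StringIO in place of a real file (neither appears in the thread). The helper names write_with_suffix and write_with_separator are hypothetical, chosen only for illustration; the generator expressions are the ones Alex gives.

```python
import io

def write_with_suffix(f, lines, suffix="\n"):
    """Write every item followed by suffix, so the last line is terminated too."""
    f.writelines(x + suffix for x in lines)

def write_with_separator(f, lines, sep="\n"):
    """Write items with sep only between them, the way str.join() behaves."""
    f.write(sep.join(lines))

mylines = ["alpha", "beta", "gamma"]

buf = io.StringIO()
write_with_suffix(buf, mylines)
suffixed = buf.getvalue()       # ends with a newline

buf = io.StringIO()
write_with_separator(buf, mylines)
separated = buf.getvalue()      # no newline after the last item

# The other elaboration Alex mentions: convert each item with str() first.
buf = io.StringIO()
buf.writelines(str(x) for x in [1, 2.5, None])
stringified = buf.getvalue()
```

The only difference between the two helpers is whether the final item is terminated, which is the case a separator argument would not cover.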

This is a big YES! A function or method should do what it says. At present, writelines() should be named append(), because it does not actually write LINES.

Us older folks often forget which environment we are working in. I am too used to typing "write" when I do not want a line terminator, and "writeln" when I do. (This agrees well with Python where, if I do NOT want an automatic end-of-line, I must put a trailing comma in the Print command.) But, suddenly, when I am sending text to a file, I must remember to hang "\n" on the end of everything. Confusing and non-user-friendly. Backslash-N is not exactly a worldwide way of stating the concept "go to a new line." Not everyone is a C programmer.

This also ties in with the "UNIVERSAL_NEWLINE" thing. Only in *nix does a linefeed character equate to the end of a line in a text file. So, in every output line of every program I must pretend to be a C programmer and append a '\n' so that the runtime system can replace it with the actual End-of-Line used by my operating system. N.U.T.S.!

Let me tell the file method that I want a UNIVERSAL_NEWLINE added to each line I send and let it be done in one step. Let it be:

    file.writelines(x for x in mylines, True)

or even better:

    file.writeNewlines(True)
    file.writelines(x for x in mylines)

Then in Python 3.0 file.writeNewlines can be defaulted True for all files opened in TEXT mode. Perhaps that will help the *nix crowd remember to actually OPEN binary files in binary mode.
------------
Vernon
..............
Dmitry Vasiliev wrote:
Raymond Hettinger wrote:
Currently, writelines() does not add trailing line separators. This is fine when working with readlines() but a PITA in other situations.
If we added an optional separator argument, it would be easier to add newlines and we would gain some of the flexibility of str.join() at full C speed.
Maybe not a separator but suffix, so newline will be added to last line too?

This is a big YES!
Actually, it's a big no. Alex immediately and correctly pointed out that what is needed is a suffix rather than a separator, and the way to get that is with a generator expression: f.writelines(x+'\n' for x in mylines). I had been led astray because I was experimenting with using cStringIO.writelines() as a basis for implementing str.join() for general iterables without creating an intermediate tuple. Right now, ''.join(it) will unexpectedly consume much more memory than really needed. Raymond Hettinger

On Feb 26, 2004, at 8:13 PM, Raymond Hettinger wrote:
This is a big YES!
Actually, it's a big no. Alex immediately and correctly pointed out that what is needed is a suffix rather than a separator, and the way to get that is with a generator expression: f.writelines(x+'\n' for x in mylines).
That'll still cause the copying of every string in the iterable; interleaving the newline with the strings themselves would probably be a faster solution, I think.
I had been led astray because I was experimenting with using cStringIO.writelines() as a basis for implementing str.join() for general iterables without creating an intermediate tuple. Right now, ''.join(it) will unexpectedly consume much more memory than really needed.
This sounds like a cool idea, are you still going to implement it? Jeremy
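Jeremy's interleaving idea can be sketched as a small generator, assuming Python 3 and io.StringIO (neither is in the thread). The name with_suffix is hypothetical. It hands writelines() the original strings and the suffix as separate items, so no x+'\n' string is built per line; whether that is actually faster is not measured here.

```python
import io

def with_suffix(lines, suffix="\n"):
    """Yield each line followed by suffix, without concatenating the two."""
    for line in lines:
        yield line
        yield suffix

buf = io.StringIO()
buf.writelines(with_suffix(["alpha", "beta"]))
result = buf.getvalue()
```

The written text is the same as with f.writelines(x+'\n' for x in mylines); only the number of temporary strings differs.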

I had been led astray because I was experimenting with using cStringIO.writelines() as a basis for implementing str.join() for general iterables without creating an intermediate tuple. Right now, ''.join(it) will unexpectedly consume much more memory than really needed.
[Jeremy Fincher]
This sounds like a cool idea, are you still going to implement it?
One way or another, I'll make str.join() smarter and faster than it is now. Which way proves to be better is still open. Raymond

On Feb 26, 2004, at 10:03 PM, Raymond Hettinger wrote:
I had been led astray because I was experimenting with using cStringIO.writelines() as a basis for implementing str.join() for general iterables without creating an intermediate tuple. Right now, ''.join(it) will unexpectedly consume much more memory than really needed.
[Jeremy Fincher]
This sounds like a cool idea, are you still going to implement it?
One way or another, I'll make str.join() smarter and faster than it is now. Which way proves to be better is still open.
In trying to time this in pure Python, I discovered that cStringIO.writelines doesn't take generators; apparently only objects that support len(...) are allowed.

Currently, it seems that ''.join is faster than cStringIO.writelines anyway, even if you just pass it a big list.

    from cStringIO import StringIO

    def join2(sep, seq):
        sio = StringIO()
        if not sep:
            sio.writelines(seq)
        else:
            # does not work
            def join2():
                yield seq.next()
                for s in seq:
                    yield sep
                    yield s
            sio.writelines(join2())
        return sio.getvalue()

    import time
    from itertools import repeat
    import operator

    def timeit(fn, *args, **kwargs):
        lst = []
        for ig in xrange(100):
            t0 = time.time()
            fn(*args, **kwargs)
            lst.append(time.time() - t0)
        print fn.__name__
        print '', 'max:', max(lst)
        print '', 'min:', min(lst)
        print '', 'avg:', reduce(operator.add, lst)/len(lst)

    if __name__ == '__main__':
        strings = [' ' * 1000] * 1000
        print 'no separator'
        print '------------'
        timeit(join2, '', strings)
        timeit(''.join, strings)

    [crack:~] bob% python strio.py
    no separator
    ------------
    join2
     max: 0.0453701019287
     min: 0.0157110691071
     avg: 0.0181557154655
    join
     max: 0.0425598621368
     min: 0.00337600708008
     avg: 0.00432039260864
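For comparison, here is one way the separator branch marked "does not work" might be written. This is a sketch under stated assumptions, not Bob's code: it uses Python 3 and io.StringIO, whose writelines() accepts any iterable, and the helper name interleave is hypothetical. The original branch calls seq.next() on a list, which has no such method; the sketch takes an explicit iterator first.

```python
import io

def interleave(sep, iterable):
    """Yield the items of iterable with sep between consecutive items."""
    it = iter(iterable)
    try:
        first = next(it)
    except StopIteration:
        return                      # empty input: nothing to yield
    yield first
    for item in it:
        yield sep
        yield item

def join2(sep, iterable):
    """Join strings through StringIO.writelines() instead of sep.join()."""
    sio = io.StringIO()
    if sep:
        sio.writelines(interleave(sep, iterable))
    else:
        sio.writelines(iterable)    # no separator: write the items directly
    return sio.getvalue()
```

Because interleave() works on an iterator, join2() accepts generators as well as lists, which is the case Raymond's experiment was aimed at.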

[Bob Ippolito]
Currently, it seems that ''.join is faster than cStringIO.writelines anyway, even if you just pass it a big list.
I suspect something wrong with your timing suite. Under the hood, cStringIO.writelines() actually is ''.join(). It appears that the whole comparison is circular. Raymond Hettinger

On Feb 27, 2004, at 4:12 AM, Raymond Hettinger wrote:
[Bob Ippolito]
Currently, it seems that ''.join is faster than cStringIO.writelines anyway, even if you just pass it a big list.
I suspect something wrong with your timing suite. Under the hood, cStringIO.writelines() actually is ''.join(). It appears that the whole comparison is circular.
Circular in that "ends up doing approximately the same thing but has a lot more overhead" kind of way ;)... I think the difference is easily explained by:
* N calls to join2
* N StringIO()
* N StringIO.writelines
* N StringIO.getvalue
* N StringIO dealloc
* [ probably also memcpy's 2 or 3 times as much ]
-bob
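The per-call overhead Bob lists can be measured with the standard timeit module rather than a hand-rolled loop. A rough sketch, assuming Python 3 and io.StringIO instead of cStringIO; the function names and the repeat counts are illustrative choices, and no particular result is implied for any given interpreter.

```python
import io
import timeit

def join_via_stringio(strings):
    """One StringIO creation, one writelines(), one getvalue() per call."""
    sio = io.StringIO()
    sio.writelines(strings)
    return sio.getvalue()

def join_direct(strings):
    """The plain str.join() baseline."""
    return "".join(strings)

def best_time(fn, strings, number=20, repeat=3):
    """Return the best observed per-call time in seconds."""
    timings = timeit.repeat(lambda: fn(strings), number=number, repeat=repeat)
    return min(timings) / number

strings = [" " * 1000] * 1000       # same test data as in the post above
t_stringio = best_time(join_via_stringio, strings)
t_direct = best_time(join_direct, strings)
print("StringIO.writelines:", t_stringio)
print("''.join:            ", t_direct)
```

Taking the minimum over several repeats reduces the influence of outliers such as the large "max" values in the posted output.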

Vernon Cole <kf7xm@netscape.net>:
Only in *nix does a linefeed character equate to the end of a line in a text file. So, in every output line of every program I must pretend to be a C programmer and append a '\n' so that the runtime system can replace it with the actual End-of-Line used by my operating system.
Seems to me the 'n' in '\n' is meant to stand for "newline", not "linefeed". So it's already named in an OS-independent way.
Greg Ewing, Computer Science Dept, +--------------------------------------+
University of Canterbury,          | A citizen of NewZealandCorp, a       |
Christchurch, New Zealand          | wholly-owned subsidiary of USA Inc.  |
greg@cosc.canterbury.ac.nz         +--------------------------------------+
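Greg's point, that '\n' is an abstract newline which the text layer translates on output, can be seen with Python 3's io.TextIOWrapper. This is a sketch under that assumption (the thread itself concerns Python 2 file objects), and written_bytes is a hypothetical helper name. Passing newline explicitly keeps the result independent of the platform the example runs on.

```python
import io

def written_bytes(text, newline):
    """Return the bytes a text stream emits for text with the given newline setting."""
    raw = io.BytesIO()
    wrapper = io.TextIOWrapper(raw, encoding="ascii", newline=newline)
    wrapper.write(text)
    wrapper.flush()             # push buffered text down to the byte stream
    data = raw.getvalue()
    wrapper.detach()            # leave raw open when wrapper goes away
    return data

# '\n' in the program becomes the configured line ending in the output.
dos_style = written_bytes("one\ntwo\n", newline="\r\n")

# With newline="\n" no translation takes place.
unix_style = written_bytes("one\ntwo\n", newline="\n")
```

The program text is identical in both calls; only the stream's newline setting decides which bytes represent the end of a line.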
participants (8)
- Alex Martelli
- Bob Ippolito
- Dmitry Vasiliev
- Greg Ewing
- Jeremy Fincher
- Raymond Hettinger
- Raymond Hettinger
- Vernon Cole