Splitting on '^' ?

Stephen Hansen apt.shansen at gmail.com
Sun Aug 16 20:50:09 CEST 2009


> And .splitlines seems to be able to handle all "standard" end-of-line
> markers without any special direction (which, ironically, strikes
> me as a *little* Perlish, somehow):
>
> >>> "spam\015\012ham\015eggs\012".splitlines(True)
> ['spam\r\n', 'ham\r', 'eggs\n']
>

... actually "working correctly" and robustly is "perlish"? :)

The only reason I've ever actually used this method is this very feature,
which you can't readily reproduce with other methods unless you start
getting into regular expressions (and I really believe regular expressions
should not be the default place one looks to solve a problem in Python).
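
To make the comparison concrete, here is a minimal sketch that puts str.splitlines() next to a rough regular-expression stand-in. The sample string and the pattern are only illustrative assumptions; the pattern covers \r\n, \r and \n, whereas str.splitlines() on a Python 3 str also recognises a few rarer boundaries such as \v and \f.

```python
import re

# Illustrative sample with three different end-of-line markers.
text = "spam\r\nham\reggs\n"

# str.splitlines() recognises \r\n, \r and \n with no configuration.
lines = text.splitlines()

# A rough regex stand-in; \r\n is listed first so it is consumed as one ending.
regex_lines = re.split(r"\r\n|\r|\n", text)

print(lines)        # ['spam', 'ham', 'eggs']
print(regex_lines)  # ['spam', 'ham', 'eggs', '']
```

Note that the regex version leaves a trailing empty string when the text ends with a newline, which splitlines() does not.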

Then again, as soon as Python started allowing you to open files with mode
"rU", I gleefully ran through my codebase and changed every operation to
that and made sure to write out with platform-local newlines exclusively,
thus finally flipping off those darn files that users kept producing with
mixed line endings.
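
A minimal sketch of that read-anything, write-local workflow follows. It assumes Python 3, where text mode translates newlines by default and the "U" flag is no longer needed; the file names and sample bytes are made up for illustration.

```python
import os
import tempfile

# Hypothetical sample data with three different line endings.
raw = b"spam\r\nham\reggs\n"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "mixed.txt")
    with open(path, "wb") as f:
        f.write(raw)

    # Text mode with the default newline=None turns \r\n and \r into \n on read.
    with open(path, "r", encoding="ascii") as f:
        lines = f.readlines()

    # Text mode with the default newline=None writes "\n" as os.linesep.
    out_path = os.path.join(tmp, "clean.txt")
    with open(out_path, "w", encoding="ascii") as f:
        f.writelines(lines)

    # Read the result back as bytes to see what actually hit the disk.
    with open(out_path, "rb") as f:
        written = f.read()

print(lines)  # ['spam\n', 'ham\n', 'eggs\n']
```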


> Amazing.  I'm not sure this is the *best* way to do this in general
> (I would have preferred it, and IMHO it would have been more
> Pythonic, if .splitlines accepted an additional optional argument
> where one could specify the end-of-line sequence to be used for
> the splitting, defaulting to the OS's conventional sequence, and
> then it split *strictly* on that sequence).
>

If you want strict and absolute splitting, you don't need another method;
just do mystring.split(os.linesep). Sure, it doesn't have the
'keepends' feature, but I don't actually understand why you'd want keepends
with a strict definition of endings. If you /only/ want to split on \n,
you know every piece in the returned list except the last was followed by
an \n, and can easily be sure to write it back out (for example) :)
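
As a small sketch of strict splitting, the separator below is hard-coded to "\n" so the result is the same on every platform; os.linesep could be passed instead. The sample string is an assumption for illustration.

```python
# Illustrative sample: one line ends in \r\n, the others in \n.
text = "spam\nham\r\neggs\n"

# Strict split on exactly one separator; the stray \r stays attached to 'ham'.
sep = "\n"  # os.linesep could be used here instead
strict = text.split(sep)

# Because the separator is known, joining with it restores the original text.
rebuilt = sep.join(strict)

print(strict)           # ['spam', 'ham\r', 'eggs', '']
print(rebuilt == text)  # True
```

The trailing empty string marks that the text ended with the separator, which is why the join round-trips exactly.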

In the modern world of mixed systems and the internet, with files being flung
around willy-nilly and editors configured to varying degrees of
correctness, it's Pythonic to be able to handle all these files
that anyone made on any system and treat them as they are clearly *meant* to
be treated. The intention *is* clear that these are all *end of line*
markers (it's explicitly stated, just slightly differently depending on the
OS), so Python treats all of the line endings as equal on read if you want it
to, by using either str.splitlines() or opening a text file as "rU". Thank
goodness for that :)

In some cases you may need a more pedantic approach to line endings. In that
case, just use str.split() :)

--S


More information about the Python-list mailing list