Unix line endings required for PyRun* breaking embedded Python
There is a discussion going on at the moment in postgresql-general about plpythonu (which allows you to write stored procedures in Python) and line endings. The discussion starts here:

http://archives.postgresql.org/pgsql-general/2005-01/msg00792.php

The problem appears to be that things are working as documented in PEP-278:

    There is no support for universal newlines in strings passed to eval() or exec. It is envisioned that such strings always have the standard \n line feed, if the strings come from a file that file can be read with universal newlines.

So what happens is that if a Windows or Mac user tries to create a Python stored procedure, it will go through to the server with that platform's line endings, and the embedded Python interpreter will raise a syntax error for everything except single-line functions.

I don't think it is possible for plpythonu to fix this by simply translating the line endings, as this would require significant knowledge of Python syntax to do correctly (triple quoted strings and character escaping I think).

The timing of this thread is very unfortunate, as PostgreSQL 8.0 is being released this weekend and the (hopefully) last release of the 2.3 series next week :-(

--
Stuart Bishop <stuart@stuartbishop.net>
http://www.stuartbishop.net/
Stuart Bishop wrote:
I don't think it is possible for plpythonu to fix this by simply translating the line endings, as this would require significant knowledge of Python syntax to do correctly (triple quoted strings and character escaping I think).
of course it's possible: that's what the interpreter does when it loads a script or module, after all... or in other words,

    print repr("""
    """)

always prints "\n" (at least on Unix (\n) and Windows (\r\n)).

</F>
On 2005 Jan 20, at 00:14, Fredrik Lundh wrote:
of course it's possible: that's what the interpreter does when it loads a script or module, after all... or in other words,
    print repr("""
    """)
always prints "\n" (at least on Unix (\n) and Windows (\r\n)).
Mac, too (but then, that IS Unix to all intents and purposes, nowadays).

Alex
Stuart> I don't think it is possible for plpythonu to fix this by simply
Stuart> translating the line endings, as this would require significant
Stuart> knowledge of Python syntax to do correctly (triple quoted
Stuart> strings and character escaping I think).

I don't see why not. If you treat the string as a file in text mode, I think you'd replace all [\r\n]+ with \n, even if it was embedded in a string:

    >>> s
    'from math import pi\r\n"""triple-quoted string embedding CR:\rrest of string"""\r\nprint 2*pi*7\r'
    >>> open("foo", "w").write(s)
    >>> open("foo", "rU").read()
    'from math import pi\n"""triple-quoted string embedding CR:\nrest of string"""\nprint 2*pi*7\n'

Just re.sub("[\r\n]+", "\n", s) and I think you're good to go.

Skip
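The text-mode round trip above relies on the "rU" file mode of the Python versions under discussion. As a rough sketch for a current Python 3, the same translation can be done in memory, assuming that io.StringIO with newline=None applies universal-newline translation to its initial value. The helper name translate_newlines is made up for this illustration.

```python
import io


def translate_newlines(source):
    # Assumption: newline=None turns on universal-newline translation,
    # so both "\r\n" and a lone "\r" come back as "\n" when read.
    return io.StringIO(source, newline=None).read()


s = ('from math import pi\r\n'
     '"""triple-quoted string embedding CR:\rrest of string"""\r\n'
     'print 2*pi*7\r')
print(repr(translate_newlines(s)))
```

Unlike the [\r\n]+ regex, this translation is one for one, so runs of empty lines are kept.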
Skip Montanaro wrote:
Just re.sub("[\r\n]+", "\n", s) and I think you're good to go.
I don't think that in general you want to fold multiple empty lines into one. This would be my preferred regex:

    s = re.sub(r"\r\n?", "\n", s)

Catches both DOS and old-style Mac line endings. Alternatively, you can use s.splitlines():

    s = "\n".join(s.splitlines()) + "\n"

This also makes sure the string ends with a \n, which may or may not be a good thing, depending on your application.

Just
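To make the difference between the two regexes concrete, here is a small sketch. The function names normalize_newlines and fold_newlines are invented for the example; only the two patterns come from the thread.

```python
import re


def normalize_newlines(source):
    # One-for-one: each DOS (CR LF) or old Mac (CR) ending becomes one LF.
    return re.sub(r"\r\n?", "\n", source)


def fold_newlines(source):
    # The earlier suggestion: any run of CR/LF characters becomes one LF,
    # which also swallows empty lines.
    return re.sub(r"[\r\n]+", "\n", source)


sample = "def f():\r\n    return 1\r\n\r\nx = f()\r"
print(repr(normalize_newlines(sample)))
print(repr(fold_newlines(sample)))
```

The empty line after the function body survives the first version and disappears in the second. One caveat on the splitlines() variant: on Python 3 strings, str.splitlines() also splits on characters such as form feed and U+2028, so it can change more than CR and LF.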
Just van Rossum wrote:
I don't think that in general you want to fold multiple empty lines into one. This would be my preferred regex:
s = re.sub(r"\r\n?", "\n", s)
Catches both DOS and old-style Mac line endings. Alternatively, you can use s.splitlines():
s = "\n".join(s.splitlines()) + "\n"
This also makes sure the string ends with a \n, which may or may not be a good thing, depending on your application.
s = s.replace("\r", "\n"["\n" in s:]) </F>
Fredrik> s = s.replace("\r", "\n"["\n" in s:])

This fails on admittedly weird strings that mix line endings:

    >>> s = "abc\rdef\r\n"
    >>> s = s.replace("\r", "\n"["\n" in s:])
    >>> s
    'abcdef\n'

where universal newline mode or Just's re.sub() gadget would work.

Skip
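A side-by-side sketch of the one-liner and the regex may help here. The wrapper names replace_trick and regex_normalize are hypothetical; the expressions inside them are the ones posted in the thread.

```python
import re


def replace_trick(s):
    # "\n"["\n" in s:] is "" when s already contains an LF (every CR is
    # then deleted) and "\n" otherwise (every CR then becomes an LF).
    return s.replace("\r", "\n"["\n" in s:])


def regex_normalize(s):
    # CR LF or a lone CR becomes a single LF.
    return re.sub(r"\r\n?", "\n", s)


for text in ("a\r\nb\r\n", "a\rb\r", "abc\rdef\r\n"):
    print(repr(text), repr(replace_trick(text)), repr(regex_normalize(text)))
```

The two agree on pure DOS and pure Mac input and differ only on the mixed string, where the one-liner deletes the lone CR instead of turning it into a line break.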
Just> Skip Montanaro wrote:
>> Just re.sub("[\r\n]+", "\n", s) and I think you're good to go.
Just> I don't think that in general you want to fold multiple empty
Just> lines into one.

Whoops. Yes.

Skip
Just van Rossum wrote:
Skip Montanaro wrote:
Just re.sub("[\r\n]+", "\n", s) and I think you're good to go.
I don't think that in general you want to fold multiple empty lines into one. This would be my preferred regex:
s = re.sub(r"\r\n?", "\n", s)
Catches both DOS and old-style Mac line endings. Alternatively, you can use s.splitlines():
s = "\n".join(s.splitlines()) + "\n"
This also makes sure the string ends with a \n, which may or may not be a good thing, depending on your application.
Do people consider this a bug that should be fixed in Python 2.4.1 and Python 2.3.6 (if it ever exists), or is the responsibility for doing this transformation on the application that embeds Python?

--
Stuart Bishop <stuart@stuartbishop.net>
http://www.stuartbishop.net/
Stuart Bishop wrote:
Do people consider this a bug that should be fixed in Python 2.4.1 and Python 2.3.6 (if it ever exists), or is the responsibility for doing this transformation on the application that embeds Python?
the text you quoted is pretty clear on this:

    It is envisioned that such strings always have the standard \n line feed, if the strings come from a file that file can be read with universal newlines.

just add the fix, already (you don't want plpythonu to depend on a future release anyway)

</F>
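As an illustration of where such a fix would sit, here is a minimal Python-level sketch of an embedding application that normalizes the source before handing it to the interpreter. plpythonu itself is C code calling the PyRun* functions, so this only shows the order of operations; run_user_source and its filename default are hypothetical names, not part of any real API.

```python
import re


def run_user_source(source, filename="<stored procedure>"):
    # Hypothetical helper: normalize line endings first, then compile and
    # execute the code in a fresh namespace that is returned to the caller.
    normalized = re.sub(r"\r\n?", "\n", source)
    namespace = {}
    exec(compile(normalized, filename, "exec"), namespace)
    return namespace


body = "def double(n):\r\n    return n * 2\r\n\r\nresult = double(21)\r\n"
ns = run_user_source(body)
print(ns["result"])
```

Later Python releases accept Windows and Mac newlines in compile() directly, so the explicit normalization step matters mainly for interpreters of the vintage discussed in this thread.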
On 21 Jan 2005, at 08:18, Stuart Bishop wrote:
Do people consider this a bug that should be fixed in Python 2.4.1 and Python 2.3.6 (if it ever exists), or is the responsibility for doing this transformation on the application that embeds Python?
It could theoretically break something: a program that uses unix line-endings but embeds \r or \r\n in string data.

But this is rather theoretical, I don't think I'd have a problem with fixing this. The real problem is: who will fix it, because the fix isn't going to be as trivial as the Python code posted here, I'm afraid...

--
Jack Jansen, <Jack.Jansen@cwi.nl>, http://www.cwi.nl/~jack
If I can't dance I don't want to be part of your revolution -- Emma Goldman
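The theoretical breakage can be shown at the level of the source text. In this sketch (names invented for the example), an escaped \r written as backslash plus "r" is left alone, while a real CR character sitting inside a string literal is rewritten along with the line endings.

```python
import re


def normalize_newlines(source):
    # CR LF or a lone CR becomes a single LF, wherever it appears.
    return re.sub(r"\r\n?", "\n", source)


# Escaped form: the source text holds a backslash followed by "r".
escaped = 'data = "a\\rb"\n'
# Raw form: the source text holds an actual CR inside the literal.
raw = 'data = """a\rb"""\n'

print(normalize_newlines(escaped) == escaped)
print(normalize_newlines(raw) == raw)
```

Only the second case changes, which is the situation described above: Unix line endings throughout, with a CR embedded as string data.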
On Jan 21, 2005, at 7:44, Jack Jansen wrote:
It could theoretically break something: a program that uses unix line-endings but embeds \r or \r\n in string data.
But this is rather theoretical, I don't think I'd have a problem with fixing this. The real problem is: who will fix it, because the fix isn't going to be as trivial as the Python code posted here, I'm afraid...
Well, Python already does the right thing in Py_Main, but it does not do the right thing from the other places you can use to run code. Surely it can't be that hard to fix if the code is already there?

-bob
On 21-jan-05, at 14:07, Bob Ippolito wrote:
Well, Python already does the right thing in Py_Main, but it does not do the right thing from the other places you can use to run code, surely it can't be that hard to fix if the code is already there?
IIRC the universal newline support is in the file I/O routines, which I assume aren't used when you execute Python code from a string.

--
Jack Jansen, <Jack.Jansen@cwi.nl>, http://www.cwi.nl/~jack
If I can't dance I don't want to be part of your revolution -- Emma Goldman
participants (7)
- Alex Martelli
- Bob Ippolito
- Fredrik Lundh
- Jack Jansen
- Just van Rossum
- Skip Montanaro
- Stuart Bishop