eval and triple quoted strings
Hello all!

This surprised me:

>>> eval("'''\r\n'''")
'\n'

Where did the \r go? ast.literal_eval() has the same problem:

>>> ast.literal_eval("'''\r\n'''")
'\n'

Is this a bug/worth fixing?

Servus,
   Walter
Not a bug. The same is done for file input -- CRLF is changed to LF before tokenizing. On Jun 14, 2013 8:27 AM, "Walter Dörwald" <walter@livinglogic.de> wrote:
Hello all!
This surprised me:
>>> eval("'''\r\n'''")
'\n'
Where did the \r go? ast.literal_eval() has the same problem:
>>> ast.literal_eval("'''\r\n'''")
'\n'
Is this a bug/worth fixing?
Servus,
   Walter

_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/guido%40python.org
On 06/14/2013 10:36 AM, Guido van Rossum wrote:
Not a bug. The same is done for file input -- CRLF is changed to LF before tokenizing.
Should this be the same?

$ python3 -c 'print(bytes("""\r\n""", "utf8"))'
b'\r\n'

>>> eval('print(bytes("""\r\n""", "utf8"))')
b'\n'
Ron
On Jun 14, 2013 8:27 AM, "Walter Dörwald" <walter@livinglogic.de> wrote:
Hello all!
This surprised me:
>>> eval("'''\r\n'''")
'\n'
Where did the \r go? ast.literal_eval() has the same problem:
>>> ast.literal_eval("'''\r\n'''")
'\n'
Is this a bug/worth fixing?
Servus,
   Walter
On Fri, Jun 14, 2013 at 2:11 PM, Ron Adam <ron3200@gmail.com> wrote:
On 06/14/2013 10:36 AM, Guido van Rossum wrote:
Not a bug. The same is done for file input -- CRLF is changed to LF before tokenizing.
Should this be the same?
$ python3 -c 'print(bytes("""\r\n""", "utf8"))'
b'\r\n'

>>> eval('print(bytes("""\r\n""", "utf8"))')
b'\n'
No, but:

eval(r'print(bytes("""\r\n""", "utf8"))')

should be. (And is.) What I believe you and Walter are missing is that the \r\n in the eval strings are converted early if you don't make the enclosing string raw. So what you're eval-ing is not what you think you are eval-ing, hence the confusion.
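PJ's distinction can be checked directly; a minimal sketch comparing what string object each spelling actually hands to eval():

```python
# What eval() receives differs before it even starts parsing.
cooked = 'bytes("""\r\n""", "utf8")'    # contains a real CR LF pair
raw = r'bytes("""\r\n""", "utf8")'      # contains backslash escapes

# The cooked source holds an actual CR character; the raw source
# holds the four characters backslash, r, backslash, n instead.
assert '\r' in cooked and '\r' not in raw

# eval's tokenizer normalizes the real CRLF to LF, while the
# escape sequences survive to eval's own string parsing:
assert eval(cooked) == b'\n'
assert eval(raw) == b'\r\n'
```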
On 06/14/2013 04:03 PM, PJ Eby wrote:
Should this be the same?
python3 -c 'print(bytes("""\r\n""", "utf8"))'
b'\r\n'

>>> eval('print(bytes("""\r\n""", "utf8"))')
b'\n'

No, but:
eval(r'print(bytes("""\r\n""", "utf8"))')
should be. (And is.)
What I believe you and Walter are missing is that the \r\n in the eval strings are converted early if you don't make the enclosing string raw. So what you're eval-ing is not what you think you are eval-ing, hence the confusion.
Yes thanks, seems like an easy mistake to make.

To be clear...

The string to eval is parsed when the eval line is tokenized in the scope containing the eval() function. The eval function then parses the resulting string object it receives as its input.

There is no mention of using raw strings in the docs on eval and exec. I think there should be, because the intention (in most cases) is for eval to parse the string, and not for it to be parsed or changed before it's evaluated by eval or exec.

An example using a string with escape characters might make it clearer.

Cheers,
   Ron
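A sketch of the kind of doc example Ron is asking for (illustrative only, not proposed documentation text):

```python
# Three spellings of "a literal containing \r\n" handed to eval(),
# and what each actually produces.

# Plain literal: the escapes become a real CR and LF before eval
# runs, and eval's tokenizer then normalizes the CRLF to '\n'.
assert eval("'''\r\n'''") == '\n'

# Raw literal: the backslash sequences reach eval intact, so
# eval's own string parsing produces the CR and LF characters.
assert eval(r"'''\r\n'''") == '\r\n'

# Doubled backslashes: equivalent to the raw form.
assert eval("'''\\r\\n'''") == '\r\n'
```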
The semantics of raw strings are clear. I don't see that they should be called out especially in any context. (Except for regexps.) Usually exec() is not used with a literal anyway (what would be the point).

--Guido van Rossum (sent from Android phone)

On Jun 15, 2013 1:03 PM, "Ron Adam" <ron3200@gmail.com> wrote:
On 06/14/2013 04:03 PM, PJ Eby wrote:
Should this be the same?
python3 -c 'print(bytes("""\r\n""", "utf8"))'
b'\r\n'

>>> eval('print(bytes("""\r\n""", "utf8"))')
b'\n'
No, but:
eval(r'print(bytes("""\r\n""", "utf8"))')
should be. (And is.)
What I believe you and Walter are missing is that the \r\n in the eval strings are converted early if you don't make the enclosing string raw. So what you're eval-ing is not what you think you are eval-ing, hence the confusion.
Yes thanks, seems like an easy mistake to make.
To be clear...
The string to eval is parsed when the eval line is tokenized in the scope containing the eval() function. The eval function then parses the resulting string object it receives as its input.
There is no mention of using raw strings in the docs on eval and exec. I think there should be, because the intention (in most cases) is for eval to parse the string, and not for it to be parsed or changed before it's evaluated by eval or exec.
An example using a string with escape characters might make it clearer.
Cheers,
   Ron
On 06/15/2013 03:23 PM, Guido van Rossum wrote:
The semantics of raw strings are clear. I don't see that they should be called out especially in any context. (Except for regexps.) Usually exec() is not used with a literal anyway (what would be the point).
There are about a hundred instances of eval/exec(some_string_literal) in Python's library. Most of them are in the tests, and maybe about half of those test the compiler, eval, and exec.

$ egrep -owr --include="*.py" "(eval|exec)\(('.*'|\".*\")\)" * | wc -l
114

I have no idea in how many places a string literal is assigned to a name first and then used later in eval or exec. It's harder to grep for but would be less than...

$ egrep -owr --include="*.py" "(eval|exec)\(.*\)" * | wc -l
438

That's overstated because some of those are comments, and some may be functions with names ending in eval or exec.

I do think that eval and exec are a similar case to regexps. And possibly often enough, the string may contain a raw string, regular expression, or a file/path name.

Only a short note is needed in the docs for eval, nothing more. And not even that if no one thinks it's an issue.

Cheers,
   Ron
On 14.06.13 23:03, PJ Eby wrote:
On Fri, Jun 14, 2013 at 2:11 PM, Ron Adam <ron3200@gmail.com> wrote:
On 06/14/2013 10:36 AM, Guido van Rossum wrote:
Not a bug. The same is done for file input -- CRLF is changed to LF before tokenizing.
Should this be the same?
python3 -c 'print(bytes("""\r\n""", "utf8"))'
b'\r\n'

>>> eval('print(bytes("""\r\n""", "utf8"))')
b'\n'
No, but:
eval(r'print(bytes("""\r\n""", "utf8"))')
should be. (And is.)
What I believe you and Walter are missing is that the \r\n in the eval strings are converted early if you don't make the enclosing string raw. So what you're eval-ing is not what you think you are eval-ing, hence the confusion.
I expected that eval()ing a string that contains the characters

U+0027: APOSTROPHE
U+0027: APOSTROPHE
U+0027: APOSTROPHE
U+000D: CR
U+000A: LF
U+0027: APOSTROPHE
U+0027: APOSTROPHE
U+0027: APOSTROPHE

to return a string containing the characters:

U+000D: CR
U+000A: LF

Making the string raw, of course, turns it into:

U+0027: APOSTROPHE
U+0027: APOSTROPHE
U+0027: APOSTROPHE
U+005C: REVERSE SOLIDUS
U+0072: LATIN SMALL LETTER R
U+005C: REVERSE SOLIDUS
U+006E: LATIN SMALL LETTER N
U+0027: APOSTROPHE
U+0027: APOSTROPHE
U+0027: APOSTROPHE

and eval()ing that does indeed give "\r\n" as expected.

Hmm, it seems that codecs.unicode_escape_decode() does what I want:
>>> codecs.unicode_escape_decode("\r\n\\r\\n\\x0d\\x0a\\u000d\\u000a")
('\r\n\r\n\r\n\r\n', 26)
Servus, Walter
On 17.06.13 19:04, Walter Dörwald wrote:
Hmm, it seems that codecs.unicode_escape_decode() does what I want:
>>> codecs.unicode_escape_decode("\r\n\\r\\n\\x0d\\x0a\\u000d\\u000a")
('\r\n\r\n\r\n\r\n', 26)
Hmm, no it doesn't:
>>> codecs.unicode_escape_decode("\u1234")
('á\x88´', 3)
Servus, Walter
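A common workaround for the limitation Walter hit is to round-trip through latin-1 first; a sketch, with the caveat that input containing non-Latin-1 characters still fails, just loudly at the encode step instead of being silently garbled:

```python
import codecs

# 'unicode_escape' decodes escape sequences, but when given a str
# it effectively treats the text as latin-1 bytes first, which is
# why "\u1234" came out mangled above: its UTF-8 byte sequence
# E1 88 B4 was decoded byte-by-byte as latin-1.
s = "\\r\\n\\x0d\\x0a\\u000d\\u000a"
decoded = codecs.decode(s.encode('latin-1'), 'unicode_escape')
assert decoded == '\r\n\r\n\r\n'

# Input that already holds non-latin-1 characters now raises
# instead of producing mojibake:
try:
    codecs.decode("\u1234".encode('latin-1'), 'unicode_escape')
except UnicodeEncodeError:
    print("U+1234 cannot survive the latin-1 round trip")
```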
On Mon, Jun 17, 2013 at 10:04 AM, Walter Dörwald <walter@livinglogic.de> wrote:
I expected that eval()ing a string that contains the characters
U+0027: APOSTROPHE
U+0027: APOSTROPHE
U+0027: APOSTROPHE
U+000D: CR
U+000A: LF
U+0027: APOSTROPHE
U+0027: APOSTROPHE
U+0027: APOSTROPHE
to return a string containing the characters:
U+000D: CR
U+000A: LF
No. Executing a file containing those exact characters produces a string containing only '\n' and exec/eval is meant to behave the same way. The string may not have originated from a file, so the universal newlines behavior of the io module is irrelevant here -- the parser must implement its own equivalent processing, and it does. -- --Guido van Rossum (python.org/~guido)
Guido van Rossum wrote:
No. Executing a file containing those exact characters produces a string containing only '\n' and exec/eval is meant to behave the same way. The string may not have originated from a file, so the universal newlines behavior of the io module is irrelevant here -- the parser must implement its own equivalent processing, and it does.
I'm still not convinced that this is necessary or desirable behaviour. I can understand the parser doing this as a workaround before we had universal newlines, but now that we do, I'd expect any Python string to already have newlines converted to their canonical representation, and that any CRs it contains are meant to be there. The parser shouldn't need to do newline translation a second time. -- Greg
On Mon, Jun 17, 2013 at 3:18 PM, Greg Ewing <greg.ewing@canterbury.ac.nz> wrote:
Guido van Rossum wrote:
No. Executing a file containing those exact characters produces a string containing only '\n' and exec/eval is meant to behave the same way. The string may not have originated from a file, so the universal newlines behavior of the io module is irrelevant here -- the parser must implement its own equivalent processing, and it does.
I'm still not convinced that this is necessary or desirable behaviour. I can understand the parser doing this as a workaround before we had universal newlines, but now that we do, I'd expect any Python string to already have newlines converted to their canonical representation, and that any CRs it contains are meant to be there. The parser shouldn't need to do newline translation a second time.
There are other ways to get a string besides reading it from a file. Anyway, I think that if you want a string literal that contains \r\n as its line endings, you should use a syntactic solution, and the syntax ought to be the same regardless of whether you are reading it from a file or from a string literal. That syntactic solution is very clear:

"""line one\r
line two\r
line three\r
"""

This works everywhere.

--
--Guido van Rossum (python.org/~guido)
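The "works everywhere" claim can be illustrated like this (a sketch, not from the thread): the escape-based spelling produces the same result whether the source goes through eval() directly or through compile() as file-style input:

```python
# The \r escape spelling of a CRLF line ending survives both
# routes into the compiler, unlike a literal CRLF pair.
src = '"""line one\\r\nline two\\r\n"""'

# Route 1: eval the source string directly.
from_eval = eval(src)

# Route 2: compile it explicitly, as file input would be.
from_compile = eval(compile(src, '<string>', 'eval'))

assert from_eval == from_compile == 'line one\r\nline two\r\n'
```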
2013/6/17 Greg Ewing <greg.ewing@canterbury.ac.nz>:
Guido van Rossum wrote:
No. Executing a file containing those exact characters produces a string containing only '\n' and exec/eval is meant to behave the same way. The string may not have originated from a file, so the universal newlines behavior of the io module is irrelevant here -- the parser must implement its own equivalent processing, and it does.
I'm still not convinced that this is necessary or desirable behaviour. I can understand the parser doing this as a workaround before we had universal newlines, but now that we do, I'd expect any Python string to already have newlines converted to their canonical representation, and that any CRs it contains are meant to be there. The parser shouldn't need to do newline translation a second time.
It used to be that way until 2.7. People like to do things like

with open("myfile.py", "rb") as fp:
    exec fp.read() in ns

which used to fail with CRLF newlines because binary mode doesn't have them. I think this is actually the correct way to execute Python sources, because the parser then handles the somewhat complicated process of decoding Python source for you.

--
Regards,
Benjamin
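The Python 3 equivalent of Benjamin's pattern still works, since exec() accepts bytes and runs them through the parser's own source decoding; a small sketch with hypothetical file contents:

```python
# exec() of raw bytes lets the parser handle newline
# normalization (and encoding detection) itself, just as it
# would for real file input.
source = b"msg = '''\r\n'''\r\nn = 1\r\n"

ns = {}
exec(source, ns)

# The CRLF inside the triple-quoted string was normalized by
# the tokenizer, exactly as the thread describes.
assert ns['msg'] == '\n'
assert ns['n'] == 1
```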
On Mon, Jun 17, 2013 at 4:40 PM, Benjamin Peterson <benjamin@python.org> wrote:
2013/6/17 Greg Ewing <greg.ewing@canterbury.ac.nz>:
Guido van Rossum wrote:
No. Executing a file containing those exact characters produces a string containing only '\n' and exec/eval is meant to behave the same way. The string may not have originated from a file, so the universal newlines behavior of the io module is irrelevant here -- the parser must implement its own equivalent processing, and it does.
I'm still not convinced that this is necessary or desirable behaviour. I can understand the parser doing this as a workaround before we had universal newlines, but now that we do, I'd expect any Python string to already have newlines converted to their canonical representation, and that any CRs it contains are meant to be there. The parser shouldn't need to do newline translation a second time.
It used to be that way until 2.7. People like to do things like
with open("myfile.py", "rb") as fp:
    exec fp.read() in ns
which used to fail with CRLF newlines because binary mode doesn't have them. I think this is actually the correct way to execute Python sources because the parser then handles the somewhat complicated process of decoding Python source for you.
What exactly does the parser handle better than the io module? Is it just the coding cookies? I suppose that works as long as the file is encoded using an ASCII superset like the Latin-N variants or UTF-8. It would fail pretty badly if it was UTF-16 (and yes, that's an abominable encoding for other reasons :-).

--
--Guido van Rossum (python.org/~guido)
2013/6/17 Guido van Rossum <guido@python.org>:
On Mon, Jun 17, 2013 at 4:40 PM, Benjamin Peterson <benjamin@python.org> wrote:
2013/6/17 Greg Ewing <greg.ewing@canterbury.ac.nz>:
Guido van Rossum wrote:
No. Executing a file containing those exact characters produces a string containing only '\n' and exec/eval is meant to behave the same way. The string may not have originated from a file, so the universal newlines behavior of the io module is irrelevant here -- the parser must implement its own equivalent processing, and it does.
I'm still not convinced that this is necessary or desirable behaviour. I can understand the parser doing this as a workaround before we had universal newlines, but now that we do, I'd expect any Python string to already have newlines converted to their canonical representation, and that any CRs it contains are meant to be there. The parser shouldn't need to do newline translation a second time.
It used to be that way until 2.7. People like to do things like
with open("myfile.py", "rb") as fp:
    exec fp.read() in ns
which used to fail with CRLF newlines because binary mode doesn't have them. I think this is actually the correct way to execute Python sources because the parser then handles the somewhat complicated process of decoding Python source for you.
What exactly does the parser handle better than the io module? Is it just the coding cookies? I suppose that works as long as the file is encoded using an ASCII superset like the Latin-N variants or UTF-8. It would fail pretty badly if it was UTF-16 (and yes, that's an abominable encoding for other reasons :-).
The coding cookie is the main one. In fact, if you can't parse that, you don't really know what encoding to open the file with at all. There are also small things like BOM handling (you have to use the utf-16-sig encoding with TextIO to get it removed) and defaulting to UTF-8 (which the io module doesn't do) which are better left to the parser.

--
Regards,
Benjamin
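The detection logic Benjamin describes is exposed as tokenize.detect_encoding(); a minimal sketch of the three cases he mentions:

```python
import io
import tokenize

# A coding cookie is found and its name normalized.
buf = io.BytesIO(b"# -*- coding: latin-1 -*-\nx = 1\n")
encoding, lines = tokenize.detect_encoding(buf.readline)
assert encoding == 'iso-8859-1'

# A UTF-8 BOM is recognized and mapped to utf-8-sig, so the
# BOM gets stripped when the file is decoded.
buf = io.BytesIO(b'\xef\xbb\xbfx = 1\n')
encoding, lines = tokenize.detect_encoding(buf.readline)
assert encoding == 'utf-8-sig'

# No cookie, no BOM: the parser's UTF-8 default applies.
buf = io.BytesIO(b"x = 1\n")
encoding, lines = tokenize.detect_encoding(buf.readline)
assert encoding == 'utf-8'
```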
On Mon, Jun 17, 2013 at 5:02 PM, Benjamin Peterson <benjamin@python.org> wrote:
2013/6/17 Guido van Rossum <guido@python.org>:
On Mon, Jun 17, 2013 at 4:40 PM, Benjamin Peterson <benjamin@python.org> wrote:
2013/6/17 Greg Ewing <greg.ewing@canterbury.ac.nz>:
Guido van Rossum wrote:
No. Executing a file containing those exact characters produces a string containing only '\n' and exec/eval is meant to behave the same way. The string may not have originated from a file, so the universal newlines behavior of the io module is irrelevant here -- the parser must implement its own equivalent processing, and it does.
I'm still not convinced that this is necessary or desirable behaviour. I can understand the parser doing this as a workaround before we had universal newlines, but now that we do, I'd expect any Python string to already have newlines converted to their canonical representation, and that any CRs it contains are meant to be there. The parser shouldn't need to do newline translation a second time.
It used to be that way until 2.7. People like to do things like
with open("myfile.py", "rb") as fp:
    exec fp.read() in ns
which used to fail with CRLF newlines because binary mode doesn't have them. I think this is actually the correct way to execute Python sources because the parser then handles the somewhat complicated process of decoding Python source for you.
What exactly does the parser handle better than the io module? Is it just the coding cookies? I suppose that works as long as the file is encoded using an ASCII superset like the Latin-N variants or UTF-8. It would fail pretty badly if it was UTF-16 (and yes, that's an abominable encoding for other reasons :-).
The coding cookie is the main one. In fact, if you can't parse that, you don't really know what encoding to open the file with at all. There are also small things like BOM handling (you have to use the utf-16-sig encoding with TextIO to get it removed) and defaulting to UTF-8 (which the io module doesn't do) which are better left to the parser.
Maybe there are some lessons here that the TextIO module could learn? -- --Guido van Rossum (python.org/~guido)
It may be possible to implement parsing the codec cookie as a Python codec :-)

Victor

2013/6/18 Guido van Rossum <guido@python.org>:
On Mon, Jun 17, 2013 at 5:02 PM, Benjamin Peterson <benjamin@python.org> wrote:
2013/6/17 Guido van Rossum <guido@python.org>:
On Mon, Jun 17, 2013 at 4:40 PM, Benjamin Peterson <benjamin@python.org> wrote:
2013/6/17 Greg Ewing <greg.ewing@canterbury.ac.nz>:
Guido van Rossum wrote:
No. Executing a file containing those exact characters produces a string containing only '\n' and exec/eval is meant to behave the same way. The string may not have originated from a file, so the universal newlines behavior of the io module is irrelevant here -- the parser must implement its own equivalent processing, and it does.
I'm still not convinced that this is necessary or desirable behaviour. I can understand the parser doing this as a workaround before we had universal newlines, but now that we do, I'd expect any Python string to already have newlines converted to their canonical representation, and that any CRs it contains are meant to be there. The parser shouldn't need to do newline translation a second time.
It used to be that way until 2.7. People like to do things like
with open("myfile.py", "rb") as fp:
    exec fp.read() in ns
which used to fail with CRLF newlines because binary mode doesn't have them. I think this is actually the correct way to execute Python sources because the parser then handles the somewhat complicated process of decoding Python source for you.
What exactly does the parser handle better than the io module? Is it just the coding cookies? I suppose that works as long as the file is encoded using an ASCII superset like the Latin-N variants or UTF-8. It would fail pretty badly if it was UTF-16 (and yes, that's an abominable encoding for other reasons :-).
The coding cookie is the main one. In fact, if you can't parse that, you don't really know what encoding to open the file with at all. There are also small things like BOM handling (you have to use the utf-16-sig encoding with TextIO to get it removed) and defaulting to UTF-8 (which the io module doesn't do) which are better left to the parser.
Maybe there are some lessons here that the TextIO module could learn?
--
--Guido van Rossum (python.org/~guido)
2013/6/17 Guido van Rossum <guido@python.org>:
On Mon, Jun 17, 2013 at 5:02 PM, Benjamin Peterson <benjamin@python.org> wrote:
2013/6/17 Guido van Rossum <guido@python.org>:
On Mon, Jun 17, 2013 at 4:40 PM, Benjamin Peterson <benjamin@python.org> wrote:

What exactly does the parser handle better than the io module? Is it just the coding cookies? I suppose that works as long as the file is encoded using an ASCII superset like the Latin-N variants or UTF-8. It would fail pretty badly if it was UTF-16 (and yes, that's an abominable encoding for other reasons :-).
The coding cookie is the main one. In fact, if you can't parse that, you don't really know what encoding to open the file with at all. There are also small things like BOM handling (you have to use the utf-16-sig encoding with TextIO to get it removed) and defaulting to UTF-8 (which the io module doesn't do) which are better left to the parser.
Maybe there are some lessons here that the TextIO module could learn?
UTF-8 by default would be great, but that ship has sailed. Reading Python coding cookies is outside the purview of TextIOWrapper. However, it would be good to have a function in the stdlib to read a python source file to Unicode; I've definitely implemented that several times. -- Regards, Benjamin
Le 17/06/2013 20:49, Benjamin Peterson a écrit :
Reading Python coding cookies is outside the purview of TextIOWrapper. However, it would be good to have a function in the stdlib to read a python source file to Unicode; I've definitely implemented that several times.
IIUC you want http://docs.python.org/3/library/tokenize#tokenize.open (3.2+). Regards
2013/6/17 Éric Araujo <merwok@netwok.org>:
Le 17/06/2013 20:49, Benjamin Peterson a écrit :
Reading Python coding cookies is outside the purview of TextIOWrapper. However, it would be good to have a function in the stdlib to read a python source file to Unicode; I've definitely implemented that several times.
IIUC you want http://docs.python.org/3/library/tokenize#tokenize.open (3.2+).
Yep. :) -- Regards, Benjamin
On 06/17/2013 05:18 PM, Greg Ewing wrote:
I'm still not convinced that this is necessary or desirable behaviour. I can understand the parser doing this as a workaround before we had universal newlines, but now that we do, I'd expect any Python string to already have newlines converted to their canonical representation, and that any CRs it contains are meant to be there. The parser shouldn't need to do newline translation a second time.
It's the other way around. Eval and exec should generate the same results as Python's compiler with the same input, including errors and exceptions. The only way we can have that is if eval and exec parse everything the same way.

It's the first parsing that needs to be avoided or compensated for in these cases. Raw strings (my preference) work for string literals, or you can escape the escape codes so they are still individual characters after the first translation. Or read the code directly from a file rather than importing it.

For example, if you wrote your own Python console program, you would want all the errors and exceptions to come from eval, including those for bad strings. You would still need to feed the bad strings to eval. If you don't, then you won't get the same output from eval as the compiler does.

Cheers,
   Ron
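Ron's point, that eval/exec and the compiler should fail the same way on the same input, can be sketched like this (illustrative only):

```python
# A bad string literal should raise a SyntaxError whether it is
# compiled as file-style input or fed to eval().
bad = "'unterminated"

try:
    compile(bad, '<file>', 'exec')
except SyntaxError as e:
    compile_error = type(e)

try:
    eval(bad)
except SyntaxError as e:
    eval_error = type(e)

# Both routes report the problem through the same exception type.
assert compile_error is eval_error is SyntaxError
```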
On 06/17/2013 12:04 PM, Walter Dörwald wrote:
Making the string raw, of course turns it into:
U+0027: APOSTROPHE U+0027: APOSTROPHE U+0027: APOSTROPHE U+005C: REVERSE SOLIDUS U+0072: LATIN SMALL LETTER R U+005C: REVERSE SOLIDUS U+006E: LATIN SMALL LETTER N U+0027: APOSTROPHE U+0027: APOSTROPHE U+0027: APOSTROPHE
and eval()ing that does indeed give "\r\n" as expected.
You can also escape the backslashes in a regular string to get the same result.
>>> s1 = "'''\\r\\n'''"
>>> list(s1)
["'", "'", "'", '\\', 'r', '\\', 'n', "'", "'", "'"]

>>> s2 = eval(s1)
>>> list(s2)
['\r', '\n']

>>> s3 = "'''%s'''" % s2
>>> list(s3)
["'", "'", "'", '\r', '\n', "'", "'", "'"]

>>> s4 = eval(s3)
>>> list(s4)
['\n']
When a standard string literal is used with eval, it's evaluated first to a string object in the same scope the eval function is called from; then the eval function is called with that string object and it's evaluated again. So it's really being parsed twice. (That's the part that got me.)

The transformation between s1 and s2 is what Phillip is referring to, and Guido is referring to the transformation from s2 to s4. (s3 is needed to avoid the end-of-line error of evaluating a single-quoted string with \n in it.)

When a string literal is used directly with eval, it looks like it is evaluated from s1 to s4 in one step, but that isn't what is happening.

Cheers,
   Ron

(ps: Was informed my posts were showing up twice.. hopefully I got that fixed now.)
Guido van Rossum wrote:
Not a bug. The same is done for file input -- CRLF is changed to LF before tokenizing.
I'm not convinced it's reasonable behaviour to re-scan the string as though it's being read from a file. It's a Python string, so it's already been through whatever line-ending transformation is appropriate to get it into memory. -- Greg
On 15 June 2013 14:08, Greg Ewing <greg.ewing@canterbury.ac.nz> wrote:
Guido van Rossum wrote:
Not a bug. The same is done for file input -- CRLF is changed to LF before tokenizing.
I'm not convinced it's reasonable behaviour to re-scan the string as though it's being read from a file. It's a Python string, so it's already been through whatever line-ending transformation is appropriate to get it into memory.
No, that's not the way the Python compiler works. The transformation Guido is talking about is the way the tokenizer identifies "NEWLINE" tokens:

>>> list(tokenize.tokenize((l for l in (b"""'\r\n'""", b"")).__next__))[2]
TokenInfo(type=4 (NEWLINE), string='\r\n', start=(1, 1), end=(1, 3), line="'\r\n'")

This long predates universal newlines mode - it's part of the compilation process, not part of the IO system. The compiler then sees the NEWLINE token in the tokenizer output, and inserts a "\n" into the triple-quoted string.

Cheers,
Nick.

--
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
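Nick's description can be checked with a simpler source line: the NEWLINE token carries the raw '\r\n' text, while compilation treats it as an ordinary logical newline (a small sketch):

```python
import io
import tokenize

# Tokenize a one-line source that ends with CRLF.
src = b"x = 1\r\n"
tokens = list(tokenize.tokenize(io.BytesIO(src).readline))

# The NEWLINE token still carries the raw '\r\n' text...
newline = next(t for t in tokens if t.type == tokenize.NEWLINE)
assert newline.string == '\r\n'

# ...but compiling the same bytes behaves exactly as if the
# line ended with a bare '\n'.
ns = {}
exec(src, ns)
assert ns['x'] == 1
```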
participants (9)

- Benjamin Peterson
- Greg Ewing
- Guido van Rossum
- Nick Coghlan
- PJ Eby
- Ron Adam
- Victor Stinner
- Walter Dörwald
- Éric Araujo