Readability of hex strings (Was: Use of coding cookie in 3.x stdlib)
I find "\xXX\xXX\xXX\xXX..." notation for binary data totally unreadable. Everybody who uses and analyses binary data is more familiar with plain hex dumps in the form of "XX XX XX XX...".
I wonder if it is possible to introduce an effective binary string type that will be represented as h"XX XX XX" in language syntax? It will be much easier to analyze printed binary data and copy/paste such data as-is from hex editors/views.
On Mon, Jul 19, 2010 at 9:45 AM, Guido van Rossum <guido@python.org> wrote:
Sounds like a good idea to try to remove redundant cookies *and* to remove most occasional use of non-ASCII characters outside comments (except for unittests specifically trying to test Unicode features). Personally I would use \xXX escapes instead of spelling out the characters in shlex.py, for example.
With or without the coding cookies, many ways of displaying text files garble characters outside the ASCII range, so it's better to stick to ASCII as much as possible.
--Guido
On Mon, Jul 19, 2010 at 1:21 AM, Alexander Belopolsky <alexander.belopolsky@gmail.com> wrote:
I was looking at the inspect module and noticed that its source starts with "# -*- coding: iso-8859-1 -*-". I have checked and there are no non-ASCII characters in the file. There are several other modules that still use the cookie:
Lib/ast.py:# -*- coding: utf-8 -*-
Lib/getopt.py:# -*- coding: utf-8 -*-
Lib/inspect.py:# -*- coding: iso-8859-1 -*-
Lib/pydoc.py:# -*- coding: latin-1 -*-
Lib/shlex.py:# -*- coding: iso-8859-1 -*-
Lib/encodings/punycode.py:# -*- coding: utf-8 -*-
Lib/msilib/__init__.py:# -*- coding: utf-8 -*-
Lib/sqlite3/__init__.py:#-*- coding: ISO-8859-1 -*-
Lib/sqlite3/dbapi2.py:#-*- coding: ISO-8859-1 -*-
Lib/test/bad_coding.py:# -*- coding: uft-8 -*-
Lib/test/badsyntax_3131.py:# -*- coding: utf-8 -*-
I understand that coding: utf-8 is strictly redundant in 3.x. There are cases such as Lib/shlex.py where using an encoding other than utf-8 is justified. (See http://svn.python.org/view?view=rev&revision=82560). What are the guidelines for other cases? Should redundant cookies be removed? Since not all editors respect the -*- cookie, I think the answer should be "yes", particularly when the cookie sets an encoding other than utf-8.
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/guido%40python.org
--
--Guido van Rossum (python.org/~guido)
[+Python-ideas -Python-Dev]

import binascii

def h(s):
    return binascii.unhexlify("".join(s.split()))

h("DE AD BE EF CA FE BA BE")

-- Alexandre
On Mon, Jul 26, 2010 at 11:29 AM, anatoly techtonik <techtonik@gmail.com> wrote:
I find "\xXX\xXX\xXX\xXX..." notation for binary data totally unreadable. Everybody who uses and analyses binary data is more familiar with plain hex dumps in the form of "XX XX XX XX...".
I wonder if it is possible to introduce an effective binary string type that will be represented as h"XX XX XX" in language syntax? It will be much easier to analyze printed binary data and copy/paste such data as-is from hex editors/views.
On Tue, 27 Jul 2010 04:29:31 am anatoly techtonik wrote:
I find "\xXX\xXX\xXX\xXX..." notation for binary data totally unreadable. Everybody who uses and analyses binary data is more familiar with plain hex dumps in the form of "XX XX XX XX...".
I wonder if it is possible to introduce an effective binary string type that will be represented as h"XX XX XX" in language syntax? It will be much easier to analyze printed binary data and copy/paste such data as-is from hex editors/views.
With the moratorium on new language features, this would not even be considered until Python 3.3. If you are serious in pursuing this idea, it is off-topic for this list and should be taken to python-ideas, or even python-list for community feedback, first.
Since it only takes a pair of small helper functions to convert hex dumps in the form "XXXX XXXX ..." to and from byte strings, I don't see the need for new syntax and would vote -1 on the idea. However, I'd vote +0 on a matching bytes.tohex() method to partner with the existing bytes.fromhex().
--
Steven D'Aprano
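For illustration, the "pair of small helper functions" mentioned above might look roughly like the following. This is only a sketch: the names hex_to_bytes and bytes_to_hex and the group parameter are made up here and do not come from the thread, and binascii is used so the code does not depend on any bytes method beyond those named in the discussion.

```python
import binascii

def hex_to_bytes(dump):
    """Convert a hex dump such as 'DEAD BEEF' or 'DE AD BE EF' to bytes."""
    # Drop all whitespace so any grouping of hex digits is accepted.
    digits = "".join(dump.split())
    return binascii.unhexlify(digits.encode("ascii"))

def bytes_to_hex(data, group=1):
    """Convert bytes to an upper-case hex dump with `group` bytes per chunk."""
    digits = binascii.hexlify(data).decode("ascii").upper()
    step = 2 * group  # two hex digits per byte
    return " ".join(digits[i:i + step] for i in range(0, len(digits), step))

print(bytes_to_hex(hex_to_bytes("DEAD BEEF CAFE BABE")))
```

Passing group=2 to the hypothetical bytes_to_hex would give the "XXXX XXXX ..." layout instead of one byte per chunk.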
anatoly techtonik wrote:
I wonder if it is possible to introduce an effective binary string type that will be represented as h"XX XX XX" in language syntax?
Rather than a new type, maybe bytes objects could just have a bit indicating whether they were best thought of as containing characterish stuff or just raw data. It wouldn't affect the behaviour in any way except that the repr would come out in hex instead of text. Then b"..." and h"..." literals could produce bytes objects with different settings for the raw-data bit.
--
Greg
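The effect of such a raw-data bit can be approximated today with a subclass. This is a rough sketch only: HexBytes is a hypothetical name, a subclass stands in for the proposed flag on bytes itself, and h"..." is not real Python syntax, so it appears here only as the text of the repr.

```python
class HexBytes(bytes):
    """bytes whose repr is a hex dump; all other behaviour is inherited."""

    def __repr__(self):
        # Iterating over a bytes object yields ints in Python 3.
        return 'h"%s"' % " ".join("%02X" % b for b in self)

data = HexBytes(b"\xde\xad\xbe\xef")
print(repr(data))
print(data == b"\xde\xad\xbe\xef")
```

Unlike a flag stored on the object, the subclass is lost by operations such as slicing, which return plain bytes, so this only mimics the repr part of the idea.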
participants (4)
- Alexandre Vassalotti
- anatoly techtonik
- Greg Ewing
- Steven D'Aprano