[issue11726] linecache becomes specific to Python scripts in Python 3
New submission from STINNER Victor <victor.stinner@haypocalc.com>: The linecache documentation doesn't mention that the module reads the #coding:xxx cookie to get the encoding of the Python file. linecache has read this cookie since 41665 (May 09 2007). "The linecache module allows one to get any line from any file, ..." => "any file"! And the example uses /etc/passwd, which is not a Python file. Not only does it read the #coding:xxx cookie, but updatecache() also tries a PEP 302 loader to read the file. linecache should be marked as very specific to Python scripts, or it should be patched to become more generic (don't read the cookie / don't use the loader by default). Note: the locale encoding may change between two calls to the linecache module. ---------- assignee: docs@python components: Documentation, Library (Lib) messages: 132641 nosy: docs@python, haypo priority: normal severity: normal status: open title: linecache becomes specific to Python scripts in Python 3 versions: Python 3.1, Python 3.2, Python 3.3 _______________________________________ Python tracker <report@bugs.python.org> <http://bugs.python.org/issue11726> _______________________________________
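A minimal sketch of the two behaviours described above, assuming a CPython 3 linecache: a Latin-1 file is decoded through its coding cookie, and a filename that does not exist on disk is served by a loader's get_source(). The helper names, the file contents, DemoLoader and the module name "demo_module" are hypothetical and exist only for illustration.

```python
import linecache
import os
import tempfile
import types


def read_with_cookie() -> str:
    """Return line 2 of a Latin-1 file that declares its encoding in a cookie."""
    # 0xF6 is "ö" in Latin-1 but is not valid UTF-8 on its own.
    source = b'# -*- coding: latin-1 -*-\nname = "L\xf6wis"\n'
    fd, path = tempfile.mkstemp(suffix=".py")  # mkstemp returns an absolute path
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(source)
        # linecache opens the file with tokenize.open(), which honours the cookie.
        return linecache.getline(path, 2)
    finally:
        os.remove(path)
        linecache.clearcache()


class DemoLoader:
    """Hypothetical PEP 302-style loader that serves source from memory."""

    def get_source(self, name: str) -> str:
        return "first line\nsecond line\n"


def read_via_loader() -> str:
    """Return line 2 of a 'file' that exists only behind a loader."""
    loader = DemoLoader()
    # Set both __loader__ and __spec__.loader, in case a given Python
    # version consults one rather than the other.
    module_globals = {
        "__name__": "demo_module",
        "__loader__": loader,
        "__spec__": types.SimpleNamespace(name="demo_module", loader=loader),
    }
    # The filename does not exist on disk, so linecache falls back to the loader.
    try:
        return linecache.getline("not_on_disk_demo_module.py", 2, module_globals)
    finally:
        linecache.clearcache()


print(repr(read_with_cookie()))  # 'name = "Löwis"\n'
print(repr(read_via_loader()))   # 'second line\n'
```

Neither path is what "any line from any file" suggests: the first depends on a Python-specific cookie, the second on import machinery.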
Changes by Raymond Hettinger <raymond.hettinger@gmail.com>: ---------- priority: normal -> high
Terry J. Reedy <tjreedy@udel.edu> added the comment: The help(linecache) Description is more specific as to the intention (based on traceback usage): "This is intended to read lines from modules imported -- hence if a filename is not found, it will look down the module search path for a file by that name." My experiments show that this is too specific. It *can* read any file that it can find and decode (as utf-8 by default or, as you say, with the locale encoding, or with the coding given in a cookie line). Find = absolute path
>>> linecache.getline('c:/programs/pydev/py32/LICENSE', 1)
'A. HISTORY OF THE SOFTWARE\n'
or relative path on sys.path
>>> linecache.getline('idlelib/ChangeLog', 1)
'Please refer to the IDLEfork and IDLE CVS repositories for\n'
>>> linecache.getline('idlelib/extend.txt', 1)
'Writing an IDLE extension\n'
Decoding fails on a byte that is illegal in utf-8:
>>> linecache.getline('idlelib/CREDITS.txt', 1)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xf6 in position 1566: invalid start byte
(It reads and decodes the entire file even though only line 1 was requested. It choked on Löwis. I believe Py3 distributed text files should be utf-8 instead of latin-1.) If I got the rules right, the doc should say "Filename must be an absolute path or a relative path that can be found on sys.path." and "File must be utf-8 encoded or locale encoded or a Python file with a coding cookie." (If you tried /etc/passwd, how did it fail?) ---------- nosy: +terry.reedy
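The lookup and decoding rules in the message above can be checked with a small script. This is a sketch under assumptions: the file names are made up, and whether the last getline() call raises UnicodeDecodeError (as in the session above) or quietly returns '' depends on the Python version, so the code accepts either outcome.

```python
import linecache
import os
import sys
import tempfile


def demo_lookup_rules():
    """Exercise the absolute-path, sys.path and decoding rules."""
    with tempfile.TemporaryDirectory() as tmpdir:
        # A plain UTF-8 text file that is not Python source.
        notes = os.path.join(tmpdir, "linecache_demo_notes.txt")
        with open(notes, "w", encoding="utf-8") as f:
            f.write("first\nsecond\n")

        # A Latin-1 file with no coding cookie; the 0xF6 byte sits on line 3,
        # past the (at most two) lines tokenize inspects for a cookie.
        credits = os.path.join(tmpdir, "linecache_demo_credits.txt")
        with open(credits, "wb") as f:
            f.write(b"CREDITS\n\nMartin v. L\xf6wis\n")

        # 1. An absolute path is opened directly.
        absolute = linecache.getline(notes, 1)

        # 2. A bare relative name is searched for along sys.path.
        sys.path.insert(0, tmpdir)
        try:
            relative = linecache.getline("linecache_demo_notes.txt", 2)
        finally:
            sys.path.remove(tmpdir)

        # 3. The whole file is decoded even though only line 1 is requested.
        #    Some versions let UnicodeDecodeError escape; others may catch it
        #    and return ''.
        try:
            undecodable = linecache.getline(credits, 1)
        except UnicodeDecodeError:
            undecodable = None

        linecache.clearcache()
        return absolute, relative, undecodable


print(demo_lookup_rules())
```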
Changes by Éric Araujo <merwok@netwok.org>: ---------- nosy: +eric.araujo
Changes by Terry J. Reedy <tjreedy@udel.edu>: ---------- versions: -Python 3.1
STINNER Victor added the comment: Well, my initial message doesn't convince me anymore today (especially after reading Terry's message), so I prefer to close the issue as rejected. I don't think that it's really a problem :-) ---------- resolution: -> rejected status: open -> closed
Thomas Kluyver added the comment: Someone on reddit ran into this, expecting that linecache can be used for an arbitrary text file: http://www.reddit.com/r/Python/comments/2yetxc/utf8_encoding_problems/ I was quite surprised that the docs say "allows one to get any line from any file." I've always understood that linecache is specifically for Python files, and the use of tokenize.open() means that it will only work for files that are UTF-8 or have the #coding: magic comment in the first two lines. I think the docs should at least mention this; I'm happy to work on a patch for it at some point if people agree. ---------- nosy: +takluyver
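tokenize.detect_encoding() is the function tokenize.open() uses to pick an encoding, so it shows which files linecache can decode. A short sketch with made-up byte strings: without a cookie in the first two lines the result is the utf-8 default, and a cookie on line 2 only counts when line 1 is blank or a comment.

```python
import io
import tokenize


def detected(source: bytes) -> str:
    """Return the encoding tokenize would use for these source bytes."""
    encoding, _consumed_lines = tokenize.detect_encoding(io.BytesIO(source).readline)
    return encoding


print(detected(b"hello\n"))                                      # utf-8
print(detected(b"# -*- coding: latin-1 -*-\nx = 1\n"))           # iso-8859-1
print(detected(b"#!/usr/bin/env python\n# coding: latin-1\n"))  # iso-8859-1
print(detected(b"x = 1\n# coding: latin-1\n"))                   # utf-8
print(detected(b"\n\n# coding: latin-1\n"))                      # utf-8
```

Note that "latin-1" is normalized to "iso-8859-1", and a cookie on line 3 is never seen.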
R. David Murray added the comment: Sure, clarifying the docs seems sensible. "Any file" is slightly different from the reality. ---------- nosy: +r.david.murray resolution: rejected -> stage: -> needs patch status: closed -> open title: linecache becomes specific to Python scripts in Python 3 -> clarify that linecache only works on files that can be decoded successfully type: -> behavior versions: +Python 3.4, Python 3.5 -Python 3.2, Python 3.3
Thomas Kluyver added the comment: First attempt at describing this attached. ---------- keywords: +patch Added file: http://bugs.python.org/file38424/linecache-encoding-doc.patch
Thomas Kluyver added the comment: Anything else I should be doing here? ----------
Roundup Robot added the comment: New changeset 51341af466e3 by Victor Stinner in branch '3.4': Issue #11726: clarify linecache doc: linecache is written to cache Python https://hg.python.org/cpython/rev/51341af466e3 ---------- nosy: +python-dev
Roundup Robot added the comment: New changeset 01cb2107cbc3 by Victor Stinner in branch '3.4': Issue #11726: Fix linecache example in the doc https://hg.python.org/cpython/rev/01cb2107cbc3 ----------
STINNER Victor added the comment: 4 years to fix this minor documentation issue, I feel ashamed... ---------- resolution: -> fixed status: open -> closed
R. David Murray added the comment: I think that that patch that Victor committed is incorrect, and that Thomas's patch is closer to correct. People *do* use linecache with files other than python source files, and as far as I can see we are not going to stop supporting that. Given the original docs the intent clearly was that the interface be general, not python-file-specific. ---------- status: closed -> open
R. David Murray added the comment: OK, on further investigation I guess it wasn't intended to be so general :) But I still think we should make a nod to the reality that it can be used on other text files. I'll re-close the issue but I may add a sentence to the docs. ---------- status: open -> closed
Roundup Robot added the comment: New changeset ceb14ecc1942 by R David Murray in branch '3.4': #11726: Make linecache docs reflect that all files are treated the same. https://hg.python.org/cpython/rev/ceb14ecc1942 New changeset 1a5c72f9ff53 by R David Murray in branch 'default': Merge: #11726: Make linecache docs reflect that all files are treated the same. https://hg.python.org/cpython/rev/1a5c72f9ff53 ----------
Antti Haapala added the comment: Every now and then there are new questions and answers regarding the use of `linecache` module on Stack Overflow for doing random access to text files, even though the documentation states that it is meant for Python source code files. One problem is that the title still states: "11.9. linecache — Random access to text lines"; the title should really be changed to "Random access to Python source code lines" so that the title wouldn't imply that this is a general-purpose random access library for text files. ---------- nosy: +ztane
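For genuine random access to arbitrary text files, one option is to read the file with an explicitly chosen encoding and cache the lines yourself. The helper below is a hypothetical sketch, not a standard-library API; unlike linecache.checkcache(), it never notices that a file has changed on disk, so call _read_lines.cache_clear() when that matters.

```python
import functools
import os
import tempfile


@functools.lru_cache(maxsize=32)
def _read_lines(path: str, encoding: str) -> tuple:
    """Read and cache all lines of a text file decoded with `encoding`."""
    with open(path, encoding=encoding) as f:
        return tuple(f.readlines())


def get_text_line(path: str, lineno: int, encoding: str = "utf-8") -> str:
    """Return 1-based line `lineno` of a text file, or '' if out of range.

    Hypothetical helper: the caller states the encoding instead of relying
    on a coding cookie, and no sys.path search or loader fallback is tried.
    """
    lines = _read_lines(path, encoding)
    if 1 <= lineno <= len(lines):
        return lines[lineno - 1]
    return ""


if __name__ == "__main__":
    # A Latin-1 file with no cookie: linecache may fail on it, this helper won't.
    fd, path = tempfile.mkstemp(suffix=".txt")
    with os.fdopen(fd, "wb") as f:
        f.write(b"CREDITS\nMartin v. L\xf6wis\n")
    try:
        print(repr(get_text_line(path, 2, encoding="latin-1")))  # 'Martin v. Löwis\n'
    finally:
        os.remove(path)
```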
participants (8)
- Antti Haapala
- R. David Murray
- Raymond Hettinger
- Roundup Robot
- STINNER Victor
- Terry J. Reedy
- Thomas Kluyver
- Éric Araujo