EOF while scanning triple-quoted string literal

Chris Torek nospam at torek.net
Fri Oct 15 15:49:33 EDT 2010


>> On 2010-10-15, Grant Edwards <invalid at invalid.invalid> wrote:
>>> How do you create a [Unix] file with a name that contains a NULL byte?

>On 2010-10-15, Seebs <usenet-nospam at seebs.net> wrote:
>> So far as I know, in canonical Unix, you don't -- the syscalls all work
>> with something like C strings under the hood, meaning that no matter what
>> path name you send, the first null byte actually terminates it.

In article <i9a84m$rp9$1 at reader1.panix.com>
Grant Edwards  <invalid at invalid.invalid> wrote:
>Yes, all of the Unix syscalls use NULL-terminated path parameters (AKA
>"C strings").  What I don't know is whether the underlying filesystem
>code also uses NULL-terminated strings for filenames or if they have
>explicit lengths.  If the latter, there might be some way to bypass
>the normal Unix syscalls and actually create a file with a NULL in its
>name -- a file that then couldn't be accessed via the normal Unix
>system calls.  My _guess_ is that the underlying filesystem code in
>most all Unices also uses NULL-terminated strings, but I haven't
>looked yet.

Multiple common on-disk formats (BSD's UFS variants and Linux's
ext2/3/4, for instance) use counted strings, that is, names stored
with an explicit length, so it is possible -- via disk corruption or
similar -- to get "impossible" file names (those containing either
an embedded NUL or an embedded '/').
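
To make the "counted string" point concrete, here is a rough sketch
of an ext2-style directory entry built and parsed with Python's
struct module.  The header layout (4-byte inode, 2-byte record
length, 1-byte name length, 1-byte file type, little-endian) follows
the usual ext2 entry format; the helper names pack_entry and
unpack_name are made up for illustration, and nothing here touches a
real filesystem.

```python
import struct

# ext2-style directory entry header, little-endian:
# inode (4 bytes), record length (2), name length (1), file type (1).
HEADER = struct.Struct("<IHBB")

def pack_entry(inode, name, file_type=1):
    """Build one directory entry; `name` is raw bytes (hypothetical helper)."""
    # Entries are padded so the record length is a multiple of 4 bytes.
    rec_len = (HEADER.size + len(name) + 3) & ~3
    entry = HEADER.pack(inode, rec_len, len(name), file_type) + name
    return entry.ljust(rec_len, b"\0")

def unpack_name(entry):
    """Recover the name using the stored length, not a NUL terminator."""
    inode, rec_len, name_len, file_type = HEADER.unpack_from(entry)
    return entry[HEADER.size:HEADER.size + name_len]

# A name no Unix syscall would accept: it has both a NUL and a '/'.
raw = pack_entry(12, b"a\0b/c")
recovered = unpack_name(raw)
```

Because the length travels with the name, the format itself has no
objection to the NUL or the slash; only the path-handling code above
it does.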

More notoriously, earlier versions of NFS could create files with
embedded slashes when serving non-Unix clients.  These were easily
removed with the same non-Unix client, but not on the server! :-)

None of this has anything to do with the original problem, in which
a triple-quoted string is left to contain arbitrary binary data
(up to, of course, the closing triple-quote).  Should that arbitrary
binary data itself happen to include a triple-quote, this trivial
encoding technique will fail.  (And of course, as others have noted,
it fails on some systems that distinguish between text and binary
file formats in the first place.)  This is why using some
"text-friendly" encoding scheme, such as base64, is a good idea.
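
A minimal sketch of the base64 approach, assuming Python 3 and the
standard base64 module (the payload bytes and the name DATA are
invented for the example):

```python
import base64

# Arbitrary binary payload that happens to contain a NUL, a
# triple-quote, and a CR-LF pair, all of which break the naive approach.
payload = b'\x00\x01"""\xff binary \r\n data'

# The base64 alphabet is A-Z, a-z, 0-9, '+', '/', and '=' padding,
# so the encoded text can never contain a quote character.
encoded = base64.b64encode(payload).decode("ascii")

# Generate source text that embeds the encoded data in a triple-quoted string.
source = 'DATA = """\n%s\n"""\n' % encoded

# Run the generated source and decode the embedded data again.
namespace = {}
exec(source, namespace)
decoded = base64.b64decode(namespace["DATA"])
```

The surrounding newlines inside the string are harmless, since
b64decode by default skips characters outside the base64 alphabet.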
-- 
In-Real-Life: Chris Torek, Wind River Systems
Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W)  +1 801 277 2603
email: gmail (figure it out)      http://web.torek.net/torek/index.html


