On 5/11/20, Oleg Broytman wrote:
On Mon, May 11, 2020 at 09:12:52PM -0000, Steve Jorgensen wrote:
When the platform is Windows, certainly, "<letter>:" should not be allowed, and perhaps colon should not be allowed at all.
The meaning of "<letter>:name" is context dependent. If it occurs at the beginning of a path, it's relative to the working directory on drive "<letter>:", which defaults to the root directory on the drive. For example, if the working directory on drive "X:" is "X:\spam\eggs", then "X:foo" resolves to "X:\spam\eggs\foo". "X:foo" in this context is not a valid component name; it's actually a filepath.
Otherwise "<letter>:" is part of an NTFS or ReFS stream path, where ":" is the stream delimiter. To be valid, it needs to be followed by either the name of the stream or the name plus the type, e.g. "filename:streamname" or "filename:streamname:streamtype". Should file streams be supported?
More on File Streams
An open or create will fail as an invalid filename if it uses invalid stream syntax or references a stream type that's unknown, or if the filesystem doesn't support streams and disallows colon in filenames (e.g. FAT32).
The stream name can be empty to indicate an anonymous or default stream, but only if the stream type is specified. For example, in NTFS "filename::$DATA" is the anonymous data stream in a file named "filename". For a regular data file, it's the same as just accessing "filename".
A directory can have named data streams, but it cannot have an anonymous data stream. The default stream in a directory is an index stream named "$I30". The following are equivalent names for a directory in NTFS: "dirname", "dirname::$INDEX_ALLOCATION", and "dirname:$I30:$INDEX_ALLOCATION". But "dirname:$I30" doesn't work because the default stream type is $DATA.
To access a stream in a single-letter filename relative to the current directory, the current directory has to be referenced explicitly via the "." component. For example, "./C:spam" is a stream named "spam" in a file named "C" that's in the current working directory, but "C:spam" is a file named "spam" in the working directory on drive "C:".
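To make the two readings of a colon concrete, here is a minimal sketch in pure string logic. The helper names is_drive_relative and split_stream are hypothetical, not part of any library; only ntpath.splitdrive is a real standard-library call. The sketch checks syntax only and does not know which stream types a given filesystem actually accepts.

```python
import ntpath


def is_drive_relative(path):
    """Return True for paths such as 'X:foo' that are relative to the
    working directory on a drive (hypothetical helper)."""
    drive, rest = ntpath.splitdrive(path)
    is_letter_drive = len(drive) == 2 and drive[0].isalpha() and drive[1] == ":"
    # 'X:\\foo' and 'X:/foo' are absolute, so only a rest without a
    # leading separator counts as drive-relative.
    return is_letter_drive and not rest.startswith(("\\", "/"))


def split_stream(component):
    """Split 'name[:stream[:type]]' into (name, stream, type).

    Missing parts come back as empty strings. Raises ValueError for
    syntax the text above describes as invalid (hypothetical helper).
    """
    parts = component.split(":")
    if len(parts) > 3:
        raise ValueError("too many colons in stream path")
    name = parts[0]
    stream = parts[1] if len(parts) > 1 else ""
    stream_type = parts[2] if len(parts) > 2 else ""
    if not name:
        raise ValueError("missing file name")
    if len(parts) == 2 and not stream:
        # An empty stream name is allowed only when a type is given.
        raise ValueError("empty stream name requires a stream type")
    if len(parts) == 3 and not stream_type:
        raise ValueError("empty stream type")
    return name, stream, stream_type
```

With this sketch, "X:foo" is reported as drive-relative while "./C:spam" is not, and the final component "C:spam" of the latter splits into a file named "C" with a stream named "spam".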
Forbidden characters:
chr(0) < > : " / \ | ? *
characters in range from chr(1) through chr(31),
See the above discussion regarding ":". An NTFS stream name can include any character except for nul (0), colon, backslash, and slash.
The characters *?"<> are the five wildcard characters that almost all NT filesystems disallow in filenames. These are important to disallow because the filesystem driver (in the kernel) is expected to support filtering a directory listing with a wildcard pattern. NT's * and ? wildcards have Unix shell semantics. The other three are DOS_DOT ("), DOS_STAR (<), and DOS_QM (>), which help to emulate MS-DOS behavior.
The vertical bar or pipe (|) has no significance in filepaths, but it's a special shell character that's usually disallowed in filenames. Control characters 1-31 usually are also disallowed. That said, some non-Microsoft filesystems may allow these characters. For example, the VirtualBox shared-folder filesystem allows pipe and control characters in filenames.
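As a rough illustration of the character rules, the check could be sketched like this. The names FORBIDDEN_CHARS and forbidden_chars are hypothetical, and the set is the conservative list quoted above; as noted, some filesystems accept pipe and control characters, so a real sanitizer might want this to be configurable.

```python
# Conservative set taken from the list quoted above: nul, control
# characters 1-31, the path separators, colon, pipe, and the five
# wildcard characters.
FORBIDDEN_CHARS = frozenset('<>:"/\\|?*') | frozenset(map(chr, range(32)))


def forbidden_chars(name):
    """Return the sorted list of distinct forbidden characters found in
    a single path component (hypothetical helper)."""
    return sorted({ch for ch in name if ch in FORBIDDEN_CHARS})
```

An empty result means the component contains none of the listed characters; it says nothing about trailing dots and spaces or reserved device names.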
a space or a period at the end of file/directory name.
Trailing spaces and dots are stripped from the final path component in almost all contexts. The exception is "\\?\" device paths, which are never normalized in an open or create context. For example, creating "\\?\C:\Temp\spam. . . " will name the file "spam. . . " instead of the normal name "spam". The name "spam. . . " will appear in the directory listing, but opening it will require using a "\\?\" device path.
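The stripping rule can be approximated in a few lines. This is only a sketch of the behavior described above, not a reimplementation of the Windows path normalizer, and effective_final_name is a hypothetical helper name.

```python
def effective_final_name(path):
    """Approximate the name that would be used for the final component
    of a path in an open or create context (hypothetical helper)."""
    if path.startswith("\\\\?\\"):
        # "\\?\" device paths are not normalized, so the final
        # component is kept exactly as written.
        return path.rsplit("\\", 1)[-1]
    final = path.replace("/", "\\").rsplit("\\", 1)[-1]
    if final in (".", ".."):
        # The special directory entries are not names to be stripped.
        return final
    # Trailing spaces and dots are dropped from the final component.
    return final.rstrip(" .")
```

A sanitizer could compare a proposed name with the result of this function and reject or rewrite the name when the two differ.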
Forbidden file names (with any extensions):
CON, PRN, AUX, NUL, COM1, COM2, COM3, COM4, COM5, COM6, COM7, COM8, COM9, LPT1, LPT2, LPT3, LPT4, LPT5, LPT6, LPT7, LPT8, and LPT9.
In an attempt to replicate how MS-DOS implemented devices, Windows reserves DOS device names such as "NUL" in the final component of DOS drive-letter paths and relative paths. They are not reserved in the final component of UNC and device paths, though a server may disallow them by policy, as Microsoft's SMB server does. Matching the device name ignores everything after a trailing colon or dot that follows the name with 0 or more intervening spaces. This is more than ignoring an extension, which is typically taken as the characters following the last dot in a filename.
"CONIN$" and "CONOUT$" are mistakenly excluded from the documented list of reserved DOS device names. Windows has always reserved them as unqualified relative names in a create/open context. Starting with Windows 8, they're reserved exactly the same as the classic DOS device names.
Examples with trailing dots and spaces:
    >>> os.getcwd()
    'C:\\'
    >>> nt._getfullpathname('spam. . . ')
    'C:\\spam'
    >>> nt._getfullpathname('foo/spam. . . ')
    'C:\\foo\\spam'
DOS devices:
    >>> nt._getfullpathname('conin$:spam.eggs')
    '\\\\.\\conin$'
    >>> nt._getfullpathname('foo/conin$ .spam.eggs')
    '\\\\.\\conin$'
Non-final component:
    >>> nt._getfullpathname('spam. . . /foo')
    'C:\\spam. . . \\foo'
    >>> nt._getfullpathname('conin$/foo')
    'C:\\conin$\\foo'
UNC and device paths:
    >>> nt._getfullpathname('//server/share/conin$')
    '\\\\server\\share\\conin$'
    >>> nt._getfullpathname('//./C:/conin$')
    '\\\\.\\C:\\conin$'
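For a sanitizer that cannot call into Windows, the matching rule described above might be approximated with a regular expression. This is a sketch under the assumption that the rule is exactly as stated in this message (device name, optional spaces, then optionally a dot or colon followed by anything); is_reserved_device_name is a hypothetical helper, and actual behavior may differ between Windows versions. It is meant for the final component of drive-letter and relative paths only, since the names are not reserved in UNC and device paths.

```python
import re

# Device name, zero or more spaces, then optionally a dot or colon
# followed by anything. CONIN$ and CONOUT$ are included as discussed.
_RESERVED = re.compile(
    r"(CON|PRN|AUX|NUL|COM[1-9]|LPT[1-9]|CONIN\$|CONOUT\$)"
    r" *"
    r"([.:].*)?",
    re.IGNORECASE | re.DOTALL,
)


def is_reserved_device_name(component):
    """Return True if a final path component would be taken as a DOS
    device name under the rule sketched above (hypothetical helper)."""
    return _RESERVED.fullmatch(component) is not None
```

Note that this catches "nul.tar.gz" and "conin$ .spam.eggs", which a check that only strips the last extension would miss, while leaving names such as "console" and "COM10" alone.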