Setting aside the question of whether it can easily be determined whether a given string is a literal or not (I don't know, but would be interested in knowing the answer)... Strings are immutable, so an attacker can't "change" the string as you suggest. A variable can point to a new string, that string could also be a string literal, and it could be malformed/malicious. So I'm not getting what the benefit would be here. If I want to dynamically assemble a malicious string literal, all I would have to do is generate code that can be evaluated to produce a string literal.
>>> eval("".join(['"', "evil", " ", "string", " ", "literal", '"']))
'evil string literal'
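On the first question, at least at runtime the answer appears to be no: a `str` object carries no record of how it was created. A minimal sketch (the variable names are just for illustration) comparing a source-code literal, the `eval` result above, and a plainly concatenated string:

```python
# Three strings with the same contents, produced in three different ways.
literal = "evil string literal"  # written directly in source
assembled = eval("".join(['"', "evil", " ", "string", " ", "literal", '"']))
built = " ".join(["evil", "string", "literal"])  # built at runtime, no eval

# All three are ordinary str objects and compare equal; nothing on the
# object distinguishes the one that was a literal in the source.
same_type = type(literal) is type(assembled) is type(built) is str
same_value = literal == assembled == built
```

So any literal-versus-dynamic distinction would have to live in a static checker, not in a runtime test.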
On Wed, 2021-08-04 at 12:52 -0700, S Pradeep Kumar wrote:
Literal Types ([PEP 586](https://www.python.org/dev/peps/pep-0586/)) allow us to type a specific literal string like `x: Literal["foo"] = "foo"`. This is useful when we know exactly which string or set of strings we want to accept. However, I’ve run into use cases where we'd want to accept *any* literal string such as “foo”, “bar”, etc.
For example, we might have a custom format string function. For security reasons, we would want the typechecker to enforce that the format string is a literal, not an arbitrary string. Otherwise, an attacker could read or write arbitrary data by changing the format string (the so-called “format string attack” [1]):
```
def my_format_string(s: str, *args: FormatArgument) -> str: ...

my_format_string("hello: %A", a)  # OK
my_format_string(user_controlled_string, a)  # BAD
```
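To make the risk concrete in Python terms, here is a minimal sketch that uses the built-in `str.format` instead of the hypothetical `my_format_string`. The `Account` class, the `render` helper, and the field names are all invented for illustration:

```python
class Account:
    """Hypothetical record with one public and one sensitive field."""

    def __init__(self, owner: str, api_key: str) -> None:
        self.owner = owner
        self.api_key = api_key


def render(template: str, account: Account) -> str:
    # str.format resolves attribute lookups written inside the template,
    # so whoever controls the template decides which fields are read.
    return template.format(account=account)


acct = Account(owner="alice", api_key="s3cr3t")

# Intended use: the template is a literal written by the developer.
greeting = render("Hello, {account.owner}!", acct)

# Attack: the template comes from the user and names the sensitive field.
leaked = render("{account.api_key}", acct)
```

Both calls look identical to a checker that only sees `str`, which is the gap the proposal is aimed at.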
Likewise, if we have a custom shell execution command like `my_execute`, we might want to enforce that the command name is a literal, not an arbitrary string. Otherwise, an attacker might be able to insert arbitrary shell code in the string and execute it:
```
def my_execute(command: str, *args: str) -> None: ...

my_execute("ls", file1, file2)  # OK

command = input()
my_execute(command, file1, file2)  # BAD
```
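As a rough illustration of why the command position matters, the sketch below only builds the shell line and never runs it. `build_shell_line` is a hypothetical stand-in for whatever `my_execute` does internally, assuming it quotes the arguments but trusts the command verbatim:

```python
import shlex


def build_shell_line(command: str, *args: str) -> str:
    # Assumption: arguments are quoted with shlex.quote, but the command
    # itself is pasted into the shell line unchanged.
    return " ".join([command, *(shlex.quote(arg) for arg in args)])


# Literal command: the result is a single, predictable invocation.
safe = build_shell_line("ls", "my file.txt")

# User-controlled command: the ';' starts a second command and the
# trailing '#' turns the quoted arguments into a shell comment.
unsafe = build_shell_line("ls; echo pwned #", "my file.txt")
```

Quoting the arguments does not help here, because the injected text arrives through the one parameter that is not quoted.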
There is no way to specify the above in the current type system.
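For comparison, the closest PEP 586 gets today is enumerating every permitted string up front. A rough sketch follows; the `Command` alias and the runtime check are illustrative additions, and this `my_execute` just returns the argument list instead of running anything:

```python
from typing import List, Literal, get_args

# Every accepted command has to be listed explicitly.
Command = Literal["ls", "cat"]


def my_execute(command: Command, *args: str) -> List[str]:
    # Type checkers reject unlisted strings statically; this runtime check
    # mirrors that for callers that are not type checked.
    if command not in get_args(Command):
        raise ValueError(f"unsupported command: {command!r}")
    return [command, *args]


argv = my_execute("ls", "file1", "file2")
```

This works for a small fixed set, but it cannot express "any string that appears as a literal in the source".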
# Proposal
We can allow `Literal[str]`, which would represent *any* literal string:
```
from typing import Literal

def my_format_string(s: Literal[str], *args: FormatArgument) -> str: ...

my_format_string("hello: %A: %B", a, b)  # OK because it is a literal string.

my_format_string(user_controlled_string, sensitive_data)  # Type error: Expected Literal[str], got str.
```
The same goes for the shell command function:
```
def my_execute(command: Literal[str], *args: CommandArgument) -> None: ...

my_execute("ls", files)  # OK
my_execute(arbitrary_string, files)  # Type error: Expected Literal[str], got str.
```
Other usage will work as expected:
```
from typing import Literal, TypeVar

# Type variable that accepts only literal strings.
TLiteral = TypeVar("TLiteral", bound=Literal[str])

def identity(s: TLiteral) -> TLiteral: ...

y = identity("hello")
reveal_type(y)  # Literal["hello"]

s: Literal[str]
y2 = identity(s)
reveal_type(y2)  # Literal[str]

literal_string: Literal[str]
s: str = literal_string  # OK
literal_string: Literal[str] = s  # Type error

literal_string: Literal[str] = "hello"  # OK

x = "hello"
literal_string: Literal[str] = x  # OK
```
## Backward compatibility
**Backward compatibility**: `Literal[str]` is acceptable at runtime, so this doesn’t require any changes to Python itself.
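A quick way to check the runtime claim, as a sketch assuming Python 3.8+ where `typing.Literal` is available (`AnyLiteralStr` and `dynamic` are names made up for this example):

```python
from typing import Literal, TypeVar, get_args

# Building the annotation raises nothing: at runtime Literal accepts
# arbitrary arguments, including the class `str` itself.
AnyLiteralStr = Literal[str]
TLiteral = TypeVar("TLiteral", bound=Literal[str])


def my_execute(command: Literal[str], *args: str) -> str:
    # Annotations are not enforced at runtime, so this body runs for any str.
    return command


# A dynamically built string passes straight through at runtime; only a
# type checker that implements the proposal would flag this call.
dynamic = "".join(["l", "s"])
result = my_execute(dynamic)
```

This also shows that the check is purely static: the proposal adds no runtime protection on its own.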
**Reference Implementation**: This was quite easy to implement and is available in Pyre v0.9.3.
**Rejected alternatives**:
`T = TypeVar("T", bound=Literal[Any])` isn’t something allowed in PEP 586 and would anyway be too broad. It would also allow literal bools to be passed in when we want only literal strings.
## Other uses for Literal[str]
Other places where it might be useful to statically enforce literal strings for safety and readability:
```
# struct
struct.unpack("

# datetime
datetime.now().strftime('%B %d, %Y - %X')

# builtins
open(path, encoding='utf-8')
my_string.encode('latin-1')

# PyTorch
self.register_buffer("weight", torch.zeros(a, b))

# argparse
parser.add_argument("--my-flag", action="store_true")
```
The same idea would apply to `Literal[int]` and `Literal[bool]`, but I don’t have compelling use cases for them yet. I suspect `Literal[int]` will be useful for Tensor types in the future. Others might have run into use cases in the wild.
Thoughts? Opinions?
-- S Pradeep Kumar