Setting aside the question of whether it can easily be determined whether a given string is a literal or not (I don't know, but would be interested in knowing the answer)...

Strings are immutable, so an attacker can't "change" the string as you suggest. A variable can point to a new string, that string could also be a string literal, and it could be malformed or malicious. So I don't see what the benefit would be here.
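A minimal illustration in plain Python. A `str` object cannot be modified in place, so the only thing that can "change" is which object a name is bound to. The variable names here are purely illustrative.

```python
command = "ls"
original = command

# Item assignment on a str raises TypeError because strings are immutable.
error = ""
try:
    command[0] = "r"
except TypeError as exc:
    error = str(exc)

# Rebinding the name is allowed. It now refers to a different object,
# and the original string object is untouched.
command = "rm"
assert original == "ls"
assert command is not original
```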

If I want to dynamically assemble a malicious string literal, all I would have to do is generate code that can be evaluated to produce a string literal.

```
>>> eval("".join(['"', "evil", " ", "string", " ", "literal", '"']))
'evil string literal'
```
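On whether a literal can be recognized at runtime: once evaluated, a string written as a literal and one assembled at runtime are ordinary `str` objects with the same type and value. Any "is this a literal?" check would therefore have to happen statically, in a type checker, not in the running program. A small sketch:

```python
literal = "evil string literal"
assembled = " ".join(["evil", "string", "literal"])
from_eval = eval('"' + assembled + '"')

# All three are plain str objects with equal values; the str objects
# themselves carry no marker saying they came from a source-code literal.
assert type(literal) is type(assembled) is type(from_eval) is str
assert literal == assembled == from_eval
```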


On Wed, 2021-08-04 at 12:52 -0700, S Pradeep Kumar wrote:
Literal Types ([PEP 586](https://www.python.org/dev/peps/pep-0586/)) allow us to type a specific literal string, as in `x: Literal["foo"] = "foo"`. This is useful when we know exactly which string or set of strings we want to accept. However, I’ve run into use cases where we want to accept *any* literal string, such as `"foo"`, `"bar"`, etc.

For example, we might have a custom format string function. For security reasons, we would want the typechecker to enforce that the format string is a literal, not an arbitrary string. Otherwise, an attacker could read or write arbitrary data by changing the format string (the so-called “format string attack” [1]):

```
def my_format_string(s: str, *args: FormatArgument) -> str: ...

my_format_string("hello: %A", a)  # OK
my_format_string(user_controlled_string, a)  # BAD
```
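Python's own `str.format` shows the same risk. Replacement fields can follow attribute lookups, so whoever controls the format string can read any data reachable from the arguments. The `Config` and `Event` classes and the `render` helper below are hypothetical. Note that `%A` above belongs to the hypothetical `my_format_string`, not to `str.format`.

```python
class Config:
    # Hypothetical secret that should never appear in rendered output.
    SECRET_KEY = "hunter2"


class Event:
    def __init__(self, name: str, config: Config) -> None:
        self.name = name
        self.config = config


def render(fmt: str, event: Event) -> str:
    # The format string, not this function, decides which attributes are read.
    return fmt.format(event=event)


event = Event("login", Config())

# Intended use: a fixed, literal template.
print(render("Event: {event.name}", event))  # Event: login

# An attacker-supplied template walks the object graph to the secret.
print(render("{event.config.SECRET_KEY}", event))  # hunter2
```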

Likewise, if we have a custom shell execution function like `my_execute`, we might want to enforce that the command name is a literal, not an arbitrary string. Otherwise, an attacker might be able to insert arbitrary shell code in the string and execute it:

```
def my_execute(command: str, *args: str) -> None: ...

my_execute("ls", file1, file2) # OK

command = input()
my_execute(command, file1, file2) # BAD
```
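To see why the command string matters, the standard-library `shlex` module can tokenize a command line roughly the way a POSIX shell would. This sketch only tokenizes and runs nothing; the `tokens` helper and the sample input are illustrative. A string assembled from untrusted input can smuggle in a `;` that starts a second command, while `shlex.quote` keeps the input as a single argument.

```python
import shlex
from typing import List


def tokens(command_line: str) -> List[str]:
    # punctuation_chars=True splits out ';', '&', '|' and similar as
    # separate tokens, the way a shell treats command separators.
    lexer = shlex.shlex(command_line, posix=True, punctuation_chars=True)
    lexer.whitespace_split = True
    return list(lexer)


user_input = "file1; echo pwned"

unsafe = "ls " + user_input
print(tokens(unsafe))  # ['ls', 'file1', ';', 'echo', 'pwned']

safe = "ls " + shlex.quote(user_input)
print(tokens(safe))  # ['ls', 'file1; echo pwned']
```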

There is no way to specify the above in the current type system.

# Proposal

We can allow `Literal[str]`, which would represent *any* literal string:

```
from typing import Literal

def my_format_string(s: Literal[str], *args: FormatArgument) -> str: ...

my_format_string("hello: %A: %B", a, b)  # OK because it is a literal string.

my_format_string(user_controlled_string, sensitive_data)
# Type error: Expected Literal[str], got str.
```

The same goes for the shell command function:

```
def my_execute(command: Literal[str], *args: CommandArgument) -> None: ...

my_execute("ls", files)  # OK
my_execute(arbitrary_string, files)  # Type error: Expected Literal[str], got str.
```

Other usage will work as expected:

```
from typing import Literal, TypeVar

# Type variable that accepts only literal strings.
TLiteral = TypeVar("TLiteral", bound=Literal[str])

def identity(s: TLiteral) -> TLiteral: ...

y = identity("hello")
reveal_type(y)  # Literal["hello"]

s: Literal[str]
y2 = identity(s)
reveal_type(y2)  # Literal[str]

literal_string: Literal[str]
plain: str = literal_string  # OK
literal_string = plain  # Type error

literal_string = "hello"  # OK

x = "hello"
literal_string = x  # OK
```

## Backward compatibility

**Backward compatibility**: `Literal[str]` is acceptable at runtime, so this doesn’t require any changes to Python itself.
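The runtime claim can be checked with the standard `typing` module (Python 3.8+). `typing.Literal` does not validate its parameters, so `Literal[str]` can be constructed, used as a `TypeVar` bound as in the `TLiteral` example, and placed in annotations; only a type checker would give it the proposed meaning. The function body and the `object` type for `*args` are hypothetical stand-ins for the unspecified `FormatArgument`.

```python
from typing import Literal, TypeVar, get_args, get_origin

# Constructing the proposed form raises no error at runtime.
LiteralStr = Literal[str]
assert get_origin(LiteralStr) is Literal
assert get_args(LiteralStr) == (str,)

# It is also accepted as a TypeVar bound.
TLiteral = TypeVar("TLiteral", bound=Literal[str])
assert TLiteral.__bound__ == Literal[str]


def my_format_string(s: Literal[str], *args: object) -> str:
    # Hypothetical body; the annotation is not enforced at runtime.
    return s % args


print(my_format_string("hello: %s", "a"))  # hello: a
```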

**Reference Implementation**: This was quite easy to implement and is available in Pyre v0.9.3.

**Rejected alternatives**:

`T = TypeVar("T", bound=Literal[Any])` is not allowed by PEP 586 and would be too broad anyway: it would also accept literal bools when we want only literal strings.

## Other uses for Literal[str]

Other places where it might be useful to statically enforce literal strings for safety and readability:

```
# struct
struct.unpack("<I", self.read(n))

# datetime
datetime.now().strftime('%B %d, %Y - %X')

# builtins
open(path, encoding='utf-8')
my_string.encode('latin-1')

# PyTorch
self.register_buffer("weight", torch.zeros(a, b))

# argparse
parser.add_argument("--my-flag", action="store_true")
```
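Each string in those calls is a small language of its own (a `struct` layout, a `strftime` pattern, a codec name, a command-line flag), which is why a fixed literal is the expected input. A runnable version of a few of them follows, with placeholder bytes standing in for `self.read(n)` and a fixed date standing in for `datetime.now()`; `%X` is dropped because its output depends on the locale.

```python
import argparse
import struct
from datetime import datetime

# "<I": little-endian unsigned 32-bit integer.
(value,) = struct.unpack("<I", b"\x01\x00\x00\x00")
print(value)  # 1

# %B is locale-dependent; "August" assumes the default C locale.
print(datetime(2021, 8, 4, 12, 52).strftime("%B %d, %Y"))  # August 04, 2021

# Codec names are strings, too.
print("café".encode("latin-1"))  # b'caf\xe9'

parser = argparse.ArgumentParser()
parser.add_argument("--my-flag", action="store_true")
print(parser.parse_args(["--my-flag"]).my_flag)  # True
```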

The same idea would apply to `Literal[int]` and `Literal[bool]`, but I don’t have compelling use cases for them yet. I suspect `Literal[int]` will be useful for Tensor types in the future. Others might have run into use cases in the wild.

Thoughts? Opinions?

[1]: https://owasp.org/www-community/attacks/Format_string_attack
--
S Pradeep Kumar

_______________________________________________
Typing-sig mailing list -- typing-sig@python.org