New features in Python 3.11 related to the type system.

PEP 646 - Variadic Generics

Before introducing this PEP we need some background, so let's first take a closer look at generics.

Generics let you define a function or class without fixing a concrete type in advance; the type is supplied at the point of use.

In a dynamic language like Python everything is an object reference, so you can inspect the type directly at run time:

In : def say_type(obj):
...:     match obj:
...:         case int():
...:             print('int')
...:         case str():
...:             print('str')
...:         case float():
...:             print('float')
...:         case _:
...:             print('other type')
...:

In : say_type(1)
int

In : say_type('ss')
str

In : say_type(1.1)
float

In : say_type([])
other type

When it comes to static type checking, though, generics become relevant, because the goal is to find problems before the code runs. For example:

U = int | str


def max_1(a: U, b: U) -> U:
    return max(a, b)


max_1("foo", 1)
max_1(1, "foo")
max_1("foo", "bar")
max_1(1, 2)

In this example each parameter can be either a number or a string. max_1("foo", 1) and max_1(1, "foo") obviously raise a TypeError at run time because the two argument types differ, yet mypy doesn't report them: U only says that each argument is an int or a str, not that both must be the same type.

In Python's type system, generic type variables are declared with TypeVar. With a TypeVar, the problem is exposed:

from typing import TypeVar

T = TypeVar("T", int, str)  # Define type variables


def max_2(a: T, b: T) -> T:  # Generic functions
    return max(a, b)


max_2("foo", 1)
max_2(1, "foo")
max_2("foo", "bar")
max_2(1, 2)

In the two examples above, U is a type alias (a reusable name for int | str, which also makes complex structures easier to express), while T is a type variable. Because the parameters and the return value share the same type variable, the checker substitutes one and the same type for every occurrence of T in a given call. A TypeVar can be restricted in two ways: with an upper bound via the bound argument, or with a list of constraints such as int and str, as I wrote above. I'd still recommend taking a look at the official documentation to see how TypeVar is used, as it's really important in generics.
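To make the difference between constraints and a bound more concrete, here is a minimal sketch. The names largest, clamp_low and N are my own, made up for illustration; only TypeVar itself comes from the typing module.

```python
from typing import TypeVar

# Constrained: T must be exactly int or exactly str within a single call.
T = TypeVar("T", int, str)

# Bounded: N may be float or any subclass of float (the bound is an upper limit).
N = TypeVar("N", bound=float)


def largest(a: T, b: T) -> T:
    return max(a, b)


def clamp_low(value: N, floor: N) -> N:
    # Hypothetical helper: never return less than floor.
    return value if value >= floor else floor


print(largest(1, 2))          # 2
print(largest("foo", "bar"))  # foo
print(clamp_low(0.5, 1.0))    # 1.0
# largest("foo", 1) is the kind of call a type checker is expected to reject.
```

Nothing here is enforced at run time; the annotations only matter to a type checker.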

And such variables can also be used as elements in a container, as an example:

def max_3(items: list[T]) -> T:
    return max(items)  # pass the list itself; max(*items) breaks on a one-element list


max_3([1, 2, 3])  # OK
max_3([1, 2, '3'])  # Rejected

In common cases, combining types with a Union lets mypy understand that the arguments and return values in a program may have several types; in the example above, items is a list of strings or a list of numbers. Python's built-in collection types (the ones described by collections.abc.Collection) can hold elements of various types because they are generic classes. In real-world development we inevitably define our own classes, and sometimes those custom classes need to support generics too.

Recall the definition of generics above: the type is not fixed in advance but supplied at the point of use. We define such a class with typing.Generic:

from typing import Generic, TypeVar


K = TypeVar("K", int, str)
V = TypeVar("V")


class Item(Generic[K, V]):  # Item is a generic class that identifies 2 of the types
    key: K
    value: V
    def __init__(self, k: K, v: V):
        self.key = k
        self.value = v


i = Item(1, 'a')  # OK Item is a generic class, so any type value that meets the requirements can be used as a parameter
i2 = Item[int, str](1, 'a')  #  OK Explicitly specify the type of K, V of Item
i3 = Item[int, int](1, 2)  #  OK Explicitly specified as a different type
i4 = Item[int, int](1, 'a')  # Rejected Because the parameters passed in are different from the specified type V
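It is worth stressing that "Rejected" here means rejected by the type checker. As a small sketch, repeating the Item class so the snippet is self-contained, the last call still runs in the interpreter:

```python
from typing import Generic, TypeVar

K = TypeVar("K", int, str)
V = TypeVar("V")


class Item(Generic[K, V]):
    def __init__(self, k: K, v: V) -> None:
        self.key = k
        self.value = v


# A type checker is expected to flag this call, but Python itself runs it:
# the subscript only records the intended types, it does not validate them.
i4 = Item[int, int](1, "a")
print(type(i4.value).__name__)  # str
```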

Okay, with the above set up, let’s get to the point.

As the title of the PEP says, it is about generics that take a variable number of type parameters. The TypeVar introduced earlier stands for a single type; this PEP adds TypeVarTuple, which stands for an arbitrary number of types.

Let’s look at an example to understand it.

from typing import Generic, TypeVar, TypeVarTuple


K2 = TypeVar("K2", int, str)
V2 = TypeVarTuple("V2")


class Item2(Generic[K2, *V2]):
    def __init__(self, k: K2, *v: *V2):
        self.key = k
        self.values = v


d = Item2(1, 2, '3', {'d': 4})
d = Item2(1, 2, 3, 4)
d = Item2(1, {}, set(), [])
d = Item2('1', {}, set(), [])
d = Item2('1', {})

In this example, both the key and values attributes of Item2 are generic. key can be an int or a str, while values has no fixed length, and since TypeVarTuple places no restriction on the types, any types can be used. Introducing TypeVarTuple makes this kind of type checking much more flexible.

At the time of writing, mypy does not support this feature yet; running it reports an error:

"TypeVarTuple" is not supported by mypy yet

PEP 673 - Self Type

Self, as the name implies, is a type that refers to the class it is used in. Here are two examples of how this was commonly written in the past.

class Result:
    def __init__(self, value):
        self.value = value

    def __str__(self):
        return f'{self.__class__.__name__}(value={self.value})'

    def add_value(self, value: int) -> 'Result':
        self.value += value
        return self

    @classmethod
    def get(cls, value) -> 'Result':
        return cls(value)


class NewResult(Result):
    ...


r = NewResult(10)
print(r.add_value(5))
print(NewResult.get(20))


class Node:
    def __init__(self, data):
        self.data = data
        self.next: 'Node'|None = None
        self.previous: 'Node'|None = None


node = Node(10)
node.next = Node(20)
node.previous = Node(5)

There are three places in this example where a class refers to itself through a string annotation:

the instance method add_value is declared to return a Result instance
the class method get is declared to return a Result instance
self.next and self.previous are declared as None or a Node instance when a Node is initialized

You can't write the bare class name here, because the class does not exist yet while its body is being executed. mypy understands the bare name, but at run time Python raises NameError: name 'XXX' is not defined.

In addition to using string definitions, there are three other methods, which I’ll briefly mention here without going into detail (because Python 3.11 solves this problem more perfectly).

use typing.ForwardRef, available in Python 3.8 and up.
add from __future__ import annotations (this was planned to become the default in Python 3.10, but the change was postponed, so the import is still required)
use TResult = TypeVar("TResult", bound="Result") to bind to a TypeVar.
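For reference, the TypeVar option can be sketched roughly as follows. This is my own reduced version of the Result class above; annotating self and cls with the TypeVar is what lets a checker carry the subclass through the call.

```python
from typing import TypeVar

TResult = TypeVar("TResult", bound="Result")


class Result:
    def __init__(self, value: int) -> None:
        self.value = value

    def add_value(self: TResult, value: int) -> TResult:
        self.value += value
        return self

    @classmethod
    def get(cls: type[TResult], value: int) -> TResult:
        return cls(value)


class NewResult(Result):
    ...


r = NewResult.get(20).add_value(5)
print(type(r).__name__, r.value)  # NewResult 25
```

The price is visible: every method has to repeat the TypeVar on self or cls.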

But the first two methods, like the string annotation, share a rather rigid limitation: how inherited classes are supported and represented. For example, with the string annotations above, NewResult inherits from Result and also inherits the method annotations, which means that a method like NewResult.get is declared to return an instance of Result. As far as isinstance is concerned this is fine, since a NewResult is a Result, but it doesn't really express "the same class as self". The TypeVar approach can express it, at the cost of extra boilerplate on every method. PEP 673 provides Self, which is the best solution.

from typing import Self

class Result:
    def __init__(self, value):
        self.value = value

    def __str__(self):
        return f'{self.__class__.__name__}(value={self.value})'

    def add_value(self, value: int) -> Self:
        self.value += value
        return self

    @classmethod
    def get(cls, value) -> Self:
        return cls(value)


class NewResult(Result):
    ...


class Node:
    def __init__(self, data):
        self.data = data
        self.next: Self|None = None
        self.previous: Self|None = None

At the time of writing, mypy does not support this feature yet; running it reports an error:

error: Variable "typing.Self" is not valid as a type
note: See https://mypy.readthedocs.io/en/stable/common_issues.html#variables-vs-type-aliases

PEP 675 - Arbitrary Literal String Type

Before we talk about LiteralString, it helps to understand Literal, which was introduced in Python 3.8. A literal is a value written directly in the source code. typing.Literal accepts only a limited set of them (integers, strings, bytes, booleans, enum members and None), and a parameter annotated with it accepts only the literal values listed.

from typing import Literal

def accepts_only_four(x: Literal[4]) -> None:
    pass

accepts_only_four(4)   # OK
accepts_only_four(19)  # Rejected
accepts_only_four(2 + 2)  # Rejected

Literal[4] means that the only accepted argument is the value 4, so the second call, which passes 19, is rejected. The third call passes 2 + 2, which evaluates to 4, yet it is rejected as well, because a literal has to be given directly and explicitly rather than computed. The checker only sees that the expression 2 + 2 is not the literal 4, and rejects it. This point is important for everything that follows, so make sure it is clear.
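A more everyday use of Literal is restricting a string parameter to a few allowed values. In this sketch, Mode and open_log are names I made up for illustration; get_args is the real typing helper for reading the listed values back at run time.

```python
from typing import Literal, get_args

Mode = Literal["r", "w", "a"]


def open_log(path: str, mode: Mode) -> str:
    # Hypothetical helper: returns a description instead of touching the disk.
    return f"{path} opened with mode {mode!r}"


print(open_log("app.log", "w"))  # app.log opened with mode 'w'
print(get_args(Mode))            # ('r', 'w', 'a')
# open_log("app.log", "x") is expected to be rejected by a checker:
# "x" is not one of the listed values.
```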

Back to the point, the motivation for this PEP comes from providing a more intuitive and general solution to the SQL injection problem. Let’s look at the example provided in the PEP.

def query_user(conn: Connection, user_id: str) -> User:
    query = f"SELECT * FROM data WHERE user_id = {user_id}"
    conn.execute(query)

query_user(conn, "user123")  # OK.

# Delete the table.
query_user(conn, "user123; DROP TABLE data;")

# Fetch all users (since 1 = 1 is always true).
query_user(conn, "user123 OR 1 = 1")

Normally user_id is a well-formed string, but because it may come from an untrusted external source, there is a risk that the concatenated SQL statement ends up doing something it was never meant to do. For example:

query_user(conn, input())

Here I'm using the input function to stand for a user_id that comes from an external source. With the old str annotation the checker sees no problem, but with the new LiteralString it rejects the call.

from typing import LiteralString

def query_user(conn: Connection, user_id: LiteralString) -> User:
    query = f"SELECT * FROM data WHERE user_id = {user_id}"
    conn.execute(query)

query_user(conn, input())  # Rejected

The call is rejected because user_id is not passed in directly as a string literal but is produced by input. As the title of the PEP suggests, LiteralString can represent any literal string value, unlike typing.Literal, which can only list a few specific values and is too inflexible for this purpose.

Also if the string is spliced, all parts need to be literal.

def execute_sql(query: LiteralString) -> None:
    ...  # stub: only the signature matters for the type checker


user_input = input()
execute_sql("SELECT * FROM " + user_input)  # Rejected
execute_sql(f"SELECT * FROM {user_input}")  # Rejected

Both calls above are rejected as well, because the user_input part is not a literal value.

I think this feature is mainly aimed at f-strings; after all, calling input directly is rare.

PS: at the time of writing mypy doesn't support this feature yet, so it doesn't report errors in the problematic places.

PEP 681 - Data Class Transforms

Type checkers already have good support for the standard library, including dataclasses. This PEP provides a way for ordinary classes, decorators and metaclasses that behave like the standard library's dataclasses to get the same type checking treatment automatically. These behaviors include:

synthesizing the __init__ method from the declared data fields.
optionally synthesizing the __eq__, __ne__, __lt__, __le__, __gt__ and __ge__ methods.
supporting the frozen parameter, so that static type checking can confirm the immutability of the fields.
supporting field specifiers (called field descriptors in earlier drafts), so that static type checking understands the properties of each field, such as whether the field provides a default value.

Before this PEP, when your project used libraries such as attrs, pydantic or the various ORMs (e.g. SQLAlchemy, Django), those libraries had to provide their own type annotations or checker support for static type checking; otherwise you had to write them yourself or find a way to silence the related checks. The PEP is designed to reduce this cost: with dataclass_transform, type checking can be supported at the decorator, class and metaclass levels without writing additional annotations.

Personally, I think this PEP is mainly intended to help library authors. Unless you reinvent the wheel in your own project with something that behaves like dataclasses, it has little direct impact on application developers.

An example may make this easier to understand. Personally, I prefer attrs, and this is how I define a Model in my project (greatly simplified for the sake of the example).

import attr


@attr.define()
class Model:
    id: int
    title: str



Model(1, 2)  # Rejected

I didn't define the __init__ method, but attrs creates it and a series of related methods for me automatically. Model(1, 2) should not pass the type check, because title should be a string and I passed an int.

Now install a version of attrs that doesn't support this feature yet and run pyright (another static type checker; mypy doesn't support this PEP yet) to try it out.

➜ pip install attrs==20.3.0
➜ pip install pyright
➜ pyright pep681.py
...
pyright 1.1.276
/home/ubuntu/mp/2022-10-23/pep681.py
  /home/ubuntu/mp/2022-10-23/pep681.py:11:1 - error: Expected no arguments to "Model" constructor (reportGeneralTypeIssues)
1 error, 0 warnings, 0 informations

pyright gets this wrong: it thinks the class does not define the constructor __init__ at all, because this version of attrs does not yet provide the corresponding annotations. For those interested, see the corresponding PR: Implement pyright support via dataclass_transforms

With a newer attrs release, pyright understands the usage above:

➜ pip install attrs==22.1.0
➜ pyright pep681.py
...
pyright 1.1.276
/home/ubuntu/mp/2022-10-23/pep681.py
  /home/ubuntu/mp/2022-10-23/pep681.py:11:10 - error: Argument of type "Literal[2]" cannot be assigned to parameter "title" of type "str" in function "__init__"
    "Literal[2]" is incompatible with "str" (reportGeneralTypeIssues)
1 error, 0 warnings, 0 informations

You can see that this error above is correct.

PEP 655 - Marking individual TypedDict items as required or potentially-missing

TypedDict is a very useful type added in Python 3.8, so let's cover it first. Defining a complex dictionary type is common in everyday development, and if you want mypy to validate the keys and value types of such a dictionary, this is what you need.

def get_summary() -> dict[str, int|str|list[str]]:
    return {
        'total': 100,
        'title': 'test',
        'items': ['1', '2']
    }

I've tried to be as specific as possible in the example above, but since the values of this dictionary have several different types, I have to combine them with a Union. That is still not precise enough for mypy. For example, take this logic:

summary = get_summary()
total = summary['total']
items = summary['items']
print(total / len(items))

mypy will throw an error:

pep655.py:12: error: Unsupported operand types for / ("str" and "int")
pep655.py:12: error: Unsupported operand types for / ("List[str]" and "int")
pep655.py:12: note: Left operand is of type "Union[int, str, List[str]]"
pep655.py:12: error: Argument 1 to "len" has incompatible type "Union[int, str, List[str]]"; expected "Sized"

With a plain dict annotation it is often impossible to describe the return value precisely. TypedDict is the solution to this problem.

from typing import TypedDict


class Summary(TypedDict):
    total: int
    title: str
    items: list[str]



def get_summary() -> Summary:
    return {
        'total': 100,
        'title': 'test',
        'items': ['1', '2']
    }


summary = get_summary()
total = summary['total']
items = summary['items']
print(total / len(items))

TypedDict specifies the type of each key value in a form similar to dataclass. This helps mypy to better understand the structure of the returned values, and thus to determine more type issues in the logic.


x = summary['x']  # TypedDict "Summary" has no key "x"
summary['total'] = 'total'  # Value of "total" has incompatible type "str"; expected "int"

But before Python 3.11 the handling of required keys was all or nothing: either every key had to be present, or the checker didn't care which keys were missing.

class Summary(TypedDict):
    total: int
    title: str
    items: list[str]

s: Summary = {'total': 10}  # Missing keys ("title", "items") for TypedDict "Summary"


class Summary2(TypedDict, total=False):  # Using total=False will make the type check not focus on missing keys
    total: int
    title: str
    items: list[str]


s2: Summary2 = {'total': 10}  # OK
s3: Summary2 = {}  # OK

In the past, the only way to make some keys required and others optional was to use inheritance.

class Summary3(TypedDict):
    total: int
    title: str


class Summary4(Summary3, total=False):
    items: list[str]


s4: Summary4 = {}  # Missing keys ("total", "title") for TypedDict "Summary4"
s5: Summary4 = {'total': 10, 'title': 'Title'}  # OK It doesn't matter if items are missing

PEP 655 introduces Required[] and NotRequired[] so that each key in a TypedDict can be explicitly marked as required or not.

from typing import Required, NotRequired


class Summary5(TypedDict):
    total: Required[int]  # Make it clear that total is mandatory
    title: str  # The default is Required
    items: NotRequired[list[str]]  # Make it clear that items are optional


s6: Summary5 = {}  # Missing keys ("total", "title") for TypedDict "Summary5"
s7: Summary5 = {'total': 10, 'title': 'Title'} # OK Achieve the same effect as above

Of the PEPs covered here, PEP 655 is the only one that mypy supports at the time of writing.
