Understanding descriptors in Python

Descriptors are an advanced concept in Python and the basis for many of Python’s internal mechanisms, so this article will go into them in some depth.

Definition of a descriptor

The definition of a descriptor is simple: any Python object whose class implements at least one of the following methods is a descriptor.

__get__(self, obj, type=None)
__set__(self, obj, value)
__delete__(self, obj)

The meanings of the parameters of these methods are as follows.

self is the currently defined descriptor object instance.
obj is the instance through which the descriptor is accessed; it is None when the attribute is accessed on the class itself.
type is the type of the object on which the descriptor acts (i.e., the class it belongs to).

Together, these methods are known as the descriptor protocol. Python calls each of them at a specific point and passes arguments as the protocol prescribes, so if we define a method with different parameters, the call may fail.

The role of descriptors

Descriptors can be used to control how attributes are accessed, enabling features such as computed attributes, lazy loading, and access control. Let’s start with a simple example.

class Descriptor:

    def __get__(self, instance, owner):
        if instance is None:
            print('__get__(): Accessing x from the class', owner)
            return self
        
        print('__get__(): Accessing x from the object', instance)
        return 'X from descriptor'

    def __set__(self, instance, value):
        print('__set__(): Setting x on the object', instance)
        instance.__dict__['_x'] = value

class Foo:
    x = Descriptor()

In this example we created a descriptor instance and assigned it to the class attribute x of the Foo class. Now, when you access Foo.x, Python automatically calls the __get__() method of the descriptor instance bound to that attribute.

>>> print(Foo.x)
__get__(): Accessing x from the class <class '__main__.Foo'>
<__main__.Descriptor object at 0x106e138e0>

Next, instantiate an object foo and access the x property through the foo object.

>>> foo = Foo()
>>> print(foo.x)
__get__(): Accessing x from the object <__main__.Foo object at 0x105dc9340>
X from descriptor

The corresponding methods defined by the descriptor are also executed.

If we assign to x on the foo object, the descriptor’s __set__() method is called as well.

>>> foo.x = 1
__set__(): Setting x on the object <__main__.Foo object at 0x105dc9340>
>>> print(foo.x)
__get__(): Accessing x from the object <__main__.Foo object at 0x105dc9340>
X from descriptor
>>> print(foo.__dict__)
{'_x': 1}

Similarly, if we define the __delete__() method in the descriptor, this method will be called when del foo.x is executed.
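As a minimal sketch of the __delete__() case, the descriptor below stores its value under the key _x, much like the example above. The class names Deletable and Baz are made up for this illustration and are not part of the original example.

```python
class Deletable:
    """Hypothetical data descriptor that stores its value under '_x'."""

    def __get__(self, instance, owner=None):
        if instance is None:
            return self
        return instance.__dict__.get('_x')

    def __set__(self, instance, value):
        instance.__dict__['_x'] = value

    def __delete__(self, instance):
        # Called for `del instance.x`; drop the stored value if present.
        print('__delete__(): Deleting x on the object')
        instance.__dict__.pop('_x', None)


class Baz:
    x = Deletable()


baz = Baz()
baz.x = 1
print(baz.__dict__)   # {'_x': 1}
del baz.x             # routed to Deletable.__delete__()
print(baz.__dict__)   # {}
```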

Descriptors are invoked during attribute lookup by the dot operator (.), and they only take effect when defined as class variables.

If assigned directly to an instance property, the descriptor will not take effect.

>>> foo.__dict__['y'] = Descriptor()
>>> print(foo.y)
<__main__.Descriptor object at 0x100f0d130>

If the descriptor is accessed indirectly with some_class.__dict__[descriptor_name], the protocol method of the descriptor is also not called, but the descriptor instance itself is returned.

>>> print(Foo.__dict__['x'])
<__main__.Descriptor object at 0x10b66d8e0>

Types of descriptors

Descriptors can be further divided into two categories depending on the implemented protocol methods.

If either the __set__() or the __delete__() method is implemented, the descriptor is a data descriptor.
If only the __get__() method is implemented, the descriptor is a non-data descriptor.
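The two rules above can be sketched as a small helper. The function name classify is an assumption made for this illustration. It inspects the type of the object, because that is where Python looks up the protocol methods.

```python
def classify(obj):
    """Return which descriptor category obj falls into (illustrative helper)."""
    cls = type(obj)  # protocol methods are looked up on the type
    if hasattr(cls, '__set__') or hasattr(cls, '__delete__'):
        return 'data descriptor'
    if hasattr(cls, '__get__'):
        return 'non-data descriptor'
    return 'not a descriptor'


def plain_function():
    pass


print(classify(property()))       # data descriptor
print(classify(plain_function))   # non-data descriptor
print(classify(42))               # not a descriptor
```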

The two kinds behave differently during attribute lookup.

A data descriptor always takes precedence over an entry of the same name in the instance dictionary __dict__.
A non-data descriptor can be overridden by an entry of the same name in the instance dictionary __dict__.

The example above showed a data descriptor. Next, we remove the __set__() method to get a non-data descriptor.

class NonDataDescriptor:

    def __get__(self, instance, owner):
        if instance is None:
            print('__get__(): Accessing y from the class', owner)
            return self

        print('__get__(): Accessing y from the object', instance)
        return 'Y from non-data descriptor'

class Bar:
    y = NonDataDescriptor()

bar = Bar()

When bar.__dict__ has no entry with the key y, accessing bar.y behaves just like accessing foo.x: the descriptor’s __get__() method is called.

>>> print(bar.y)
__get__(): Accessing y from the object <__main__.Bar object at 0x...>
Y from non-data descriptor

But if we modify the __dict__ of the bar object directly by adding the y property to it, the object property will override the y descriptor defined in the Bar class, and accessing bar.y will no longer call the __get__() method of the descriptor.

>>> bar.__dict__['y'] = 2
>>> print(bar.y)
2
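This shadowing is what makes lazily computed attributes possible. The following is a minimal sketch, with the hypothetical names LazyAttribute and Report: the non-data descriptor computes a value on first access and stores it in the instance dictionary under the same name, so later lookups never reach the descriptor again. The standard library’s functools.cached_property works along similar lines.

```python
class LazyAttribute:
    """Hypothetical non-data descriptor: compute once, then cache on the instance."""

    def __init__(self, func):
        self.func = func
        self.name = func.__name__

    def __get__(self, instance, owner=None):
        if instance is None:
            return self
        value = self.func(instance)
        # Store the result under the same name; later lookups find it in
        # instance.__dict__ first and never reach this descriptor again.
        instance.__dict__[self.name] = value
        return value


class Report:
    calls = 0  # counts how often the expensive function actually runs

    @LazyAttribute
    def total(self):
        Report.calls += 1
        return sum(range(1000))


report = Report()
first = report.total    # runs the function and caches the result
second = report.total   # served from report.__dict__
print(first, second)    # 499500 499500
print(Report.calls)     # 1
```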

In the data descriptor example above, access to the x property is always controlled by the descriptor, even if we modify foo.__dict__.

>>> foo.__dict__['x'] = 1
>>> print(foo.x)
__get__(): Accessing x from the object <__main__.Foo object at 0x102b40340>
X from descriptor

The following sections explain how this difference is implemented.

Descriptor implementation

The key to descriptor-controlled attribute access is what happens between the evaluation of foo.x and the call to the __get__() method.

How object properties are stored

In general, an object’s attributes are stored in its __dict__ attribute.

According to the Python documentation, object.__dict__ is a dictionary or other mapping object used to store an object’s (writable) attributes.
Most user-defined objects have a __dict__ attribute, although some built-in objects do not.
For an instance, __dict__ is an ordinary dict holding the attributes set on that instance; for a class, __dict__ is exposed as a read-only mappingproxy object.

Let’s continue from the previous example.

>>> print(foo.__dict__)
{'_x': 1}
>>> foo.x
__get__(): Accessing x from the object <__main__.Foo object at 0x105dc9340>
'X from descriptor'

When we access foo.x, how does Python decide whether to call the descriptor method or fetch the value from __dict__? The key lies in the dot operator (.).

How object properties are accessed

The lookup logic for the dot operator lives in the object.__getattribute__() method, which is called every time the dot operator is applied to an object. CPython implements this method in C; let’s look at an equivalent Python version.

def object_getattribute(obj, name):
    "Emulate PyObject_GenericGetAttr() in Objects/object.c"
    null = object()
    objtype = type(obj)
    cls_var = getattr(objtype, name, null)
    descr_get = getattr(type(cls_var), '__get__', null)
    if descr_get is not null:
        if (hasattr(type(cls_var), '__set__')
            or hasattr(type(cls_var), '__delete__')):
            return descr_get(cls_var, obj, objtype)     # data descriptor
    if hasattr(obj, '__dict__') and name in vars(obj):
        return vars(obj)[name]                          # instance variable
    if descr_get is not null:
        return descr_get(cls_var, obj, objtype)         # non-data descriptor
    if cls_var is not null:
        return cls_var                                  # class variable
    raise AttributeError(name)

From the code above, we can see that accessing obj.name performs the following steps in order.

First, look up name on the class that obj belongs to. If a class variable cls_var is found, try to get the __get__ attribute from the type of cls_var.
If __get__ exists, cls_var is at least a non-data descriptor. If its type also defines __set__ or __delete__, cls_var is a data descriptor: its __get__ method is called with the current object obj and its class objtype as arguments, the result is returned, and the lookup ends. This is how a data descriptor completely overrides the object’s own __dict__.
Otherwise (cls_var is a non-data descriptor, is not a descriptor at all, or was not found), look for name in the object’s __dict__ and return the corresponding value if it is present.
If name is not in obj’s __dict__ and cls_var is a non-data descriptor, call its __get__ method with the same arguments as above and return the result.
If cls_var exists but is not a descriptor, return it directly.
If nothing has been found by this point, raise an AttributeError exception.
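The precedence described by this procedure can be checked with a small experiment. This is only a sketch; the classes DataDesc, NonDataDesc, and Demo are made up for the illustration. We place the same names in the class and in the instance dictionary and observe which one wins.

```python
class DataDesc:
    def __get__(self, instance, owner=None):
        return 'data descriptor'

    def __set__(self, instance, value):
        raise AttributeError('read-only')


class NonDataDesc:
    def __get__(self, instance, owner=None):
        return 'non-data descriptor'


class Demo:
    a = DataDesc()
    b = NonDataDesc()
    c = 'class variable'


demo = Demo()
# Write straight into the instance dictionary, bypassing __set__().
demo.__dict__.update(a='instance', b='instance', c='instance')

print(demo.a)   # data descriptor (beats the instance dictionary)
print(demo.b)   # instance        (beats the non-data descriptor)
print(demo.c)   # instance        (beats the plain class variable)

del demo.__dict__['b'], demo.__dict__['c']
print(demo.b)   # non-data descriptor
print(demo.c)   # class variable
```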

In the process above, when name is looked up on the class objtype and is not found in objtype itself, Python goes on to search its parent classes, in the order given by the class’s __mro__ attribute.

>>> print(Foo.__mro__)
(<class '__main__.Foo'>, <class 'object'>)
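The search along the MRO can be sketched as a short helper that walks the class dictionaries in order. The name find_name_in_mro and the classes Base and Child are chosen for this illustration. Note that the helper returns the raw entry from the class dictionary without invoking any descriptor.

```python
def find_name_in_mro(cls, name, default=None):
    """Return the first match for name in the dictionaries along cls.__mro__."""
    for base in cls.__mro__:
        if name in vars(base):
            return vars(base)[name]
    return default


class Base:
    greeting = 'hello from Base'


class Child(Base):
    pass


print([c.__name__ for c in Child.__mro__])   # ['Child', 'Base', 'object']
print(find_name_in_mro(Child, 'greeting'))   # hello from Base
print(find_name_in_mro(Child, 'missing'))    # None
```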

Now we know that descriptors are invoked from the object.__getattribute__() method under specific conditions, and this is how they control attribute access. If we override __getattribute__() in a class, we can even disable all descriptor calls.
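To illustrate that last point, here is a deliberately simplified sketch with the hypothetical classes Temperature and NoDescriptors. The overriding __getattribute__() only consults the class dictionaries and returns whatever is stored there, so a property is handed back as a raw object instead of being invoked. Unlike the real lookup, it ignores the instance dictionary entirely.

```python
class Temperature:
    """Hypothetical class used only for this illustration."""

    @property
    def celsius(self):
        return 21.5


class NoDescriptors(Temperature):
    def __getattribute__(self, name):
        # Look the name up in the class dictionaries ourselves and return
        # whatever is stored there, without ever calling __get__().
        for base in type(self).__mro__:
            if name in vars(base):
                return vars(base)[name]
        raise AttributeError(name)


print(Temperature().celsius)                           # 21.5
print(isinstance(NoDescriptors().celsius, property))   # True
```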

The __getattr__ method

In fact, the attribute lookup does not call object.__getattribute__() directly; the dot operator performs the attribute lookup via a helper function.

def getattr_hook(obj, name):
    "Emulate slot_tp_getattr_hook() in Objects/typeobject.c"
    try:
        return obj.__getattribute__(name)
    except AttributeError:
        if not hasattr(type(obj), '__getattr__'):
            raise
    return type(obj).__getattr__(obj, name)             # __getattr__

Therefore, if obj.__getattribute__() raises an AttributeError and type(obj) defines a __getattr__() method, that method is called. If the user calls obj.__getattribute__() directly, the fallback lookup through __getattr__() is bypassed.

Suppose we add a __getattr__() method to the Foo class.

class Foo:
    x = Descriptor()

    def __getattr__(self, item):
        print(f'{item} is indeed not found')

foo = Foo()

Then access foo.z and bar.z in turn.

>>> foo.z
z is indeed not found
>>> bar.z
AttributeError: 'Bar' object has no attribute 'z'

This behavior only applies when the __getattr__() method is defined on the class the object belongs to. Defining __getattr__ on the object itself, i.e. adding it to obj.__dict__, has no effect, and the same is true of the __getattribute__() method.

>>> bar.__getattr__ = lambda item:print(f'{item} is indeed not found')
>>> print(bar.__dict__)
{'__getattr__': <function <lambda> at 0x1086e1430>}
>>> bar.z
AttributeError: 'Bar' object has no attribute 'z'
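Returning to the earlier remark that calling obj.__getattribute__() directly bypasses the __getattr__() fallback, a minimal sketch with the hypothetical class WithFallback shows the difference between the two paths.

```python
class WithFallback:
    def __getattr__(self, name):
        # Only reached after the normal lookup raises AttributeError.
        return f'fallback for {name}'


obj = WithFallback()
print(obj.missing)                    # fallback for missing

try:
    obj.__getattribute__('missing')   # direct call skips __getattr__()
except AttributeError:
    print('AttributeError raised')
```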

Python’s internal descriptors

In addition to some custom scenarios, Python’s own language mechanism makes extensive use of descriptors.

property

We won’t go into what property does in detail; the following shows its common use as decorator syntax.

class C:
    def __init__(self):
        self._x = None

    @property
    def x(self):
        """I'm the 'x' property."""
        return self._x

    @x.setter
    def x(self, value):
        self._x = value

    @x.deleter
    def x(self):
        del self._x

property itself is a class that implements the descriptor protocol, and it can also be used in the following equivalent way.

class C:
    def __init__(self):
        self._x = None

    def getx(self):
        return self._x

    def setx(self, value):
        self._x = value

    def delx(self):
        del self._x

    x = property(getx, setx, delx, "I'm the 'x' property.")

In the above example property(getx, setx, delx, "I'm the 'x' property.") creates an instance of the descriptor and assigns it to x. The implementation of the property class is equivalent to the following Python code.

class Property:
    "Emulate PyProperty_Type() in Objects/descrobject.c"

    def __init__(self, fget=None, fset=None, fdel=None, doc=None):
        self.fget = fget
        self.fset = fset
        self.fdel = fdel
        if doc is None and fget is not None:
            doc = fget.__doc__
        self.__doc__ = doc

    def __get__(self, obj, objtype=None):  # descriptor protocol method
        if obj is None:
            return self
        if self.fget is None:
            raise AttributeError("unreadable attribute")
        return self.fget(obj)

    def __set__(self, obj, value):  # descriptor protocol method
        if self.fset is None:
            raise AttributeError("can't set attribute")
        self.fset(obj, value)

    def __delete__(self, obj):  # descriptor protocol method
        if self.fdel is None:
            raise AttributeError("can't delete attribute")
        self.fdel(obj)

    def getter(self, fget):  # create a new descriptor object with the given fget
        return type(self)(fget, self.fset, self.fdel, self.__doc__)

    def setter(self, fset):  # create a new descriptor object with the given fset
        return type(self)(self.fget, fset, self.fdel, self.__doc__)

    def deleter(self, fdel):  # create a new descriptor object with the given fdel
        return type(self)(self.fget, self.fset, fdel, self.__doc__)

property stores read, write, and delete functions within the dictionary of the descriptor instance, and then determines whether the corresponding function exists when the protocol method is called to achieve control over reading, writing, and deleting of properties.
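We can observe this with the built-in property by fetching the raw descriptor from the class dictionary and calling its protocol method by hand. The class Celsius below is a made-up example for this sketch.

```python
class Celsius:
    """Hypothetical class used only for this illustration."""

    def __init__(self):
        self._value = 0.0

    @property
    def value(self):
        return self._value


c = Celsius()
prop = Celsius.__dict__['value']       # the raw property object

print(type(prop).__name__)             # property
print(prop.__get__(c, Celsius))        # 0.0, same as c.value
print(prop.fset is None)               # True: no setter was registered

try:
    c.value = 1.0                      # __set__() finds that fset is None
except AttributeError:
    print('AttributeError raised')
```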

Functions

Yes, every function object we define is a non-data descriptor instance.

The purpose of using descriptors here is to allow the functions defined in the class definition to become bound methods when called through the object.

A method differs from a regular function only in that the object instance is automatically passed as the first argument when it is called. By convention, this parameter is named self in the method definition. The method object’s class is roughly equivalent to the following code.

class MethodType:
    "Emulate PyMethod_Type in Objects/classobject.c"

    def __init__(self, func, obj):
        self.__func__ = func
        self.__self__ = obj

    def __call__(self, *args, **kwargs):
        func = self.__func__
        obj = self.__self__
        return func(obj, *args, **kwargs)

It takes a function func and an object obj in the initialization method and passes obj into func when it is called.

Let’s take a practical example.

>>> class D:
...     def f(self, x):
...         return x
...
>>> d = D()
>>> D.f(None, 2)
2
>>> d.f(2)
2

As you can see, when f is accessed through the class it behaves as a plain function, and any object can be passed as the self argument; when f is accessed through an instance it becomes a bound method, and the bound object is automatically passed as the first argument of the call. Creating a MethodType object when an attribute is accessed through an instance is exactly the kind of thing a descriptor can do.

The descriptor behavior of a function can be emulated as follows.

class Function:
    ...

    def __get__(self, obj, objtype=None):
        "Simulate func_descr_get() in Objects/funcobject.c"
        if obj is None:
            return self
        return MethodType(self, obj)

Defining a function with def f() is roughly equivalent to f = Function(), that is, creating a non-data descriptor instance and binding it to the name f.

When we access this attribute through the class, the call to the __get__() method returns the function object itself.

>>> D.f
<function D.f at 0x10f1903a0>

When we access this property through an object instance, the __get__() method is called to create a MethodType object initialized with the above function and object.

>>> d.f
<bound method D.f of <__main__.D object at 0x10eb6fb50>>

To recap, function objects have a __get__() method, which makes them non-data descriptors, so that they can be converted to bound methods when accessed as attributes. The instance call obj.f(*args) is converted to f(obj, *args), and the class call cls.f(*args) is converted to f(*args).
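The binding step can also be performed by hand. In this short sketch (Greeter is a made-up class), calling the function’s __get__() method with an instance yields the same kind of bound method that the dot operator produces.

```python
class Greeter:
    def greet(self, name):
        return f'hello, {name}'


g = Greeter()
func = Greeter.__dict__['greet']     # the plain function object
bound = func.__get__(g, Greeter)     # what g.greet does behind the scenes

print(bound.__func__ is func)        # True
print(bound.__self__ is g)           # True
print(bound('world'))                # hello, world
print(func.__get__(None, Greeter) is func)   # True: class access returns the function
```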

classmethod

classmethod is a variant implemented on top of function descriptors and is used as follows.

class F:
    @classmethod
    def f(cls, x):
        return cls.__name__, x

>>> F.f(3)
('F', 3)
>>> F().f(3)
('F', 3)

The equivalent Python implementation is as follows, and it should be easy to understand given the background above.

class ClassMethod:
    "Emulate PyClassMethod_Type() in Objects/funcobject.c"

    def __init__(self, f):
        self.f = f

    def __get__(self, obj, cls=None):
        if cls is None:
            cls = type(obj)
        if hasattr(type(self.f), '__get__'):  # the wrapped callable is itself a descriptor
            return self.f.__get__(cls, cls)
        return MethodType(self.f, cls)

@classmethod returns a non-data descriptor that converts the instance call obj.f(*args) to f(type(obj), *args) and the class call cls.f(*args) to f(cls, *args).
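A quick check with the built-in classmethod illustrates both conversions. Shape and Square are hypothetical classes for this sketch. In each case the class through which the attribute is reached is passed as the first argument.

```python
class Shape:
    @classmethod
    def create(cls, sides):
        return cls.__name__, sides


class Square(Shape):
    pass


raw = Shape.__dict__['create']             # the classmethod object itself
print(type(raw).__name__)                  # classmethod
print(raw.__func__(Square, 4))             # ('Square', 4)
print(Square.create(4))                    # ('Square', 4)
print(Square().create(4))                  # ('Square', 4)
print(Square.create.__self__ is Square)    # True: bound to the class
```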

staticmethod

The effect of the staticmethod implementation is that, whether we call it by instance or by class, we end up calling the original function.

class E:
    @staticmethod
    def f(x):
        return x * 10

>>> E.f(3)
30
>>> E().f(3)
30

The equivalent Python implementation is as follows.

class StaticMethod:
    "Emulate PyStaticMethod_Type() in Objects/funcobject.c"

    def __init__(self, f):
        self.f = f

    def __get__(self, obj, objtype=None):
        return self.f

Calls to the __get__() method return the original function object, which is stored in the descriptor’s own __dict__ (as self.f), and therefore do not trigger the function’s own descriptor behavior.

@staticmethod returns a non-data descriptor that converts both the instance call obj.f(*args) and the class call cls.f(*args) to f(*args).
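The same kind of check works for the built-in staticmethod. In this sketch (Converter is a made-up class), access through the class and through an instance both yield the very same underlying function object.

```python
class Converter:
    @staticmethod
    def scale(x):
        return x * 10


raw = Converter.__dict__['scale']              # the staticmethod object
func = raw.__get__(None, Converter)            # the underlying function

print(type(raw).__name__)                      # staticmethod
print(func is raw.__func__)                    # True
print(Converter.scale is Converter().scale)    # True: same function either way
print(Converter().scale(3))                    # 30
```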
