Descriptors are an advanced concept in Python and the basis for many of Python’s internal mechanisms, so this article will go into them in some depth.
Definition of a descriptor
The definition of a descriptor is simple: any Python object whose class defines at least one of the following methods is a descriptor.
__get__(self, obj, type=None)
__set__(self, obj, value)
__delete__(self, obj)
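The parameters of these methods mean the following.
- self: the descriptor instance itself.
- obj: the instance through which the attribute is accessed (None when __get__() is triggered by accessing the attribute on the class).
- type: the class through which the attribute is accessed, i.e. the class obj belongs to, or the class itself when the attribute is accessed on the class.
- value: for __set__(), the value being assigned.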
Together, these methods make up the descriptor protocol. Python calls them at specific points during attribute access and passes arguments as described above; if we don’t define a method with the expected parameters, the call fails (typically with a TypeError).
The role of descriptors
Descriptors let us control how attributes are accessed, which enables features such as computed attributes, lazy loading, and access control. Let’s start with a simple example.
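The original example code is not reproduced here, so the following is a minimal sketch under stated assumptions: the descriptor class name Desc and the calls list are hypothetical, each protocol method records its arguments instead of printing them, and the third __get__() parameter is named objtype to avoid shadowing the built-in type.

```python
calls = []  # hypothetical log of which protocol method ran, with its arguments


class Desc:
    """A data descriptor that records every protocol call (illustrative only)."""

    def __get__(self, obj, objtype=None):
        calls.append(("get", obj, objtype))
        return "value from __get__"

    def __set__(self, obj, value):
        calls.append(("set", obj, value))

    def __delete__(self, obj):
        calls.append(("delete", obj))


class Foo:
    x = Desc()  # the descriptor lives in the class namespace


print(Foo.x)  # class access: __get__(desc, None, Foo)
foo = Foo()
print(foo.x)  # instance access: __get__(desc, foo, Foo)
foo.x = 42    # assignment: __set__(desc, foo, 42)
del foo.x     # deletion: __delete__(desc, foo)
```

Each line of the usage part triggers exactly one protocol method, which is what the walkthrough of Foo.x and foo.x describes.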
In the example, we create an instance of the descriptor and assign it to the class attribute x of the Foo class. When we access Foo.x, Python automatically calls the __get__() method of the descriptor instance bound to that attribute, with obj set to None.
Next, we instantiate an object foo and access the x attribute through it. The descriptor’s __get__() method is executed again, this time with obj set to foo.
If we assign to x on the foo object (foo.x = ...), the descriptor’s __set__() method is called.
Similarly, if we define the __delete__() method in the descriptor, it is called when del foo.x is executed.
Descriptors are invoked by the dot operator (.) during attribute lookup, and they only take effect when stored as class variables. If a descriptor is assigned directly to an instance attribute, it has no effect.
If the descriptor is accessed indirectly through some_class.__dict__[descriptor_name], its protocol methods are not called either; the descriptor instance itself is returned.
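A short sketch of both cases, using a hypothetical data descriptor Desc (not the article’s original code): a descriptor stored on an instance is returned as-is, and so is one fetched through the class __dict__.

```python
class Desc:
    """Hypothetical data descriptor used only for this illustration."""

    def __get__(self, obj, objtype=None):
        return "from __get__"

    def __set__(self, obj, value):
        raise AttributeError("read-only")


class Foo:
    x = Desc()


foo = Foo()
foo.y = Desc()            # stored in foo.__dict__, not on the class
print(foo.y)              # the Desc object itself: __get__ is not called
print(Foo.__dict__["x"])  # the Desc object itself: __get__ is not called
print(foo.x)              # from __get__ (normal dot lookup on a class variable)
```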
Types of descriptors
Descriptors can be further divided into two categories depending on which protocol methods they implement.
- If __set__() or __delete__() is implemented, the descriptor is a data descriptor.
- If only __get__() is implemented, the descriptor is a non-data descriptor.
The two behave differently:
- A data descriptor always takes precedence over an entry with the same name in the instance dictionary __dict__.
- A non-data descriptor can be overridden by an entry with the same name in the instance dictionary __dict__.
The example above shows the behavior of a data descriptor. Next, we drop the __set__() method (and __delete__(), if present) to get a non-data descriptor.
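A minimal sketch of the Bar example, assuming a hypothetical descriptor class NonDataDesc that defines only __get__(); the original code is not reproduced here.

```python
class NonDataDesc:
    """Defines only __get__, so it is a non-data descriptor."""

    def __get__(self, obj, objtype=None):
        return "from __get__"


class Bar:
    y = NonDataDesc()


bar = Bar()
print(bar.y)              # from __get__ (bar.__dict__ has no 'y' yet)
bar.__dict__["y"] = "from instance dict"
print(bar.y)              # from instance dict (the instance entry wins)
bar.y = "assigned"        # with no __set__, plain assignment also goes to bar.__dict__
print(bar.__dict__["y"])  # assigned
```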
As long as bar.__dict__ has no key y, accessing bar.y behaves the same as accessing foo.x: the descriptor’s __get__() method is called.
But if we modify the bar object’s __dict__ directly by adding a y key, that instance attribute overrides the y descriptor defined in the Bar class, and accessing bar.y no longer calls the descriptor’s __get__() method.
In the data descriptor example above, by contrast, access to the x attribute is always controlled by the descriptor, even if we modify foo.__dict__.
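For contrast, a minimal sketch with a hypothetical data descriptor DataDesc, where writing to foo.__dict__ directly does not change what foo.x returns:

```python
class DataDesc:
    def __get__(self, obj, objtype=None):
        return "from __get__"

    def __set__(self, obj, value):
        pass  # ignore writes; having __set__ is what makes this a data descriptor


class Foo:
    x = DataDesc()


foo = Foo()
foo.__dict__["x"] = "from instance dict"
print(foo.x)              # from __get__ (the data descriptor takes precedence)
print(foo.__dict__["x"])  # from instance dict (the entry exists but is ignored)
```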
Next, we look at how this difference in behavior is implemented.
Descriptor implementation
The key to descriptor-controlled attribute access is what happens between the evaluation of foo.x and the call to the __get__() method.
How object properties are stored
In general, an object’s attributes are stored in its __dict__ attribute.
- According to the Python documentation, object.__dict__ is a dictionary or other mapping object used to store an object’s (writable) attributes.
- Most objects have a __dict__ attribute, with the exception of some built-in types (for example, int instances do not).
- An instance’s __dict__ is a plain dict containing the attributes defined on that instance, while a class’s __dict__ is a read-only mappingproxy object.
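A quick check of these points, assuming a throwaway class Foo with a single instance attribute a:

```python
import types


class Foo:
    def __init__(self):
        self.a = 1


foo = Foo()
print(type(foo.__dict__))  # <class 'dict'>
print(foo.__dict__)        # {'a': 1}
print(isinstance(Foo.__dict__, types.MappingProxyType))  # True
print(hasattr(1, "__dict__"))  # False: int instances have no __dict__
```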
Let’s continue with the previous example. When we access foo.x, how does Python decide whether to call the descriptor method or to get the value from __dict__? The key role is played by the dot operator (.).
How object properties are accessed
The lookup logic of the dot operator lives in the object.__getattribute__() method, which is invoked every time the dot operator is used on an object. CPython implements this method in C; let’s look at an equivalent Python version.
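The article’s own version is not reproduced here, so the following is a simplified sketch modeled on the pure-Python emulation in the official Descriptor HowTo Guide. The helper names object_getattribute and find_name_in_mro are assumptions, and the sketch ignores details such as __slots__ and the __getattr__ fallback.

```python
def find_name_in_mro(cls, name, default):
    """Return the first class attribute called `name` along cls.__mro__."""
    for base in cls.__mro__:
        if name in vars(base):
            return vars(base)[name]
    return default


def object_getattribute(obj, name):
    """Simplified emulation of object.__getattribute__ (sketch only)."""
    null = object()
    objtype = type(obj)
    cls_var = find_name_in_mro(objtype, name, null)
    descr_get = getattr(type(cls_var), "__get__", null)
    if descr_get is not null:
        if (hasattr(type(cls_var), "__set__")
                or hasattr(type(cls_var), "__delete__")):
            # Data descriptor: always wins over the instance dict.
            return descr_get(cls_var, obj, objtype)
    inst_dict = getattr(obj, "__dict__", None)
    if inst_dict is not None and name in inst_dict:
        return inst_dict[name]                   # instance attribute
    if descr_get is not null:
        return descr_get(cls_var, obj, objtype)  # non-data descriptor
    if cls_var is not null:
        return cls_var                           # plain class variable
    raise AttributeError(name)
```

The order of the branches is what produces the precedence rules: data descriptor, then instance dictionary, then non-data descriptor, then plain class variable.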
Working through the code above, we can see that when we access obj.name, the following steps are performed in order.
- First, look up name in the class that obj belongs to (objtype). If a corresponding class variable cls_var exists, look up the __get__ attribute on the class that cls_var belongs to.
- If __get__ exists, cls_var is at least a non-data descriptor. If its class also defines __set__ or __delete__, cls_var is a data descriptor: its __get__() method is called with the current object obj and its class objtype as arguments, the result is returned, and the lookup ends. This is how a data descriptor completely overrides the object’s own __dict__.
- If cls_var is a non-data descriptor (or not a descriptor at all), look for name in the object’s dictionary __dict__ and return the corresponding value if it is present.
- If name is not found in obj.__dict__ and cls_var is a non-data descriptor, call the descriptor’s __get__() method with the same arguments as above and return the result.
- If cls_var exists but is not a descriptor, return it directly.
- If nothing is found, raise an AttributeError exception.
In this process, when name is looked up in objtype, the class obj belongs to, Python does not stop at objtype itself: if the name is not found there, it searches the base classes in the order given by objtype.__mro__ (the method resolution order).
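A short sketch, assuming hypothetical classes Base and Child, showing that a descriptor defined on a base class is found through the MRO and that __get__() receives the class actually used for the lookup:

```python
class Desc:
    def __get__(self, obj, objtype=None):
        return f"__get__ called via {objtype.__name__}"


class Base:
    x = Desc()


class Child(Base):
    pass  # x is not defined here, so lookup continues along Child.__mro__


print(Child.__mro__ == (Child, Base, object))  # True
print(Child().x)                               # __get__ called via Child
```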
Now we know that object.__getattribute__() calls descriptors under different conditions, and this is how descriptors control attribute access. If a class overrides __getattribute__(), it can even bypass descriptor calls entirely.
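A deliberately crude sketch of such an override, assuming hypothetical classes Desc, Normal, and NoDescriptors: the custom __getattribute__() reads the instance dict and then returns raw class attributes without ever calling __get__().

```python
class Desc:
    def __get__(self, obj, objtype=None):
        return "from __get__"

    def __set__(self, obj, value):
        raise AttributeError("x is read-only")


class Normal:
    x = Desc()


class NoDescriptors:
    x = Desc()

    def __getattribute__(self, name):
        # Read the instance dict without recursing into this method.
        inst_dict = object.__getattribute__(self, "__dict__")
        if name in inst_dict:
            return inst_dict[name]
        # Return the raw class attribute; __get__ is deliberately NOT called.
        for base in type(self).__mro__:
            if name in vars(base):
                return vars(base)[name]
        raise AttributeError(name)


print(Normal().x)         # from __get__
print(NoDescriptors().x)  # the Desc instance itself
```

Because this sketch skips every __get__() call, methods defined on NoDescriptors would also come back as plain, unbound functions; it only illustrates the point.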
The __getattr__ method
In fact, the dot operator does not call object.__getattribute__() directly; it performs the attribute lookup through a helper function. If obj.__getattribute__() raises an AttributeError and the class of obj defines __getattr__(), the helper calls __getattr__() as a fallback. If the user calls obj.__getattribute__() directly, this fallback to __getattr__() is bypassed.
For example, suppose we add a __getattr__() method to the Foo class, but not to Bar, and then access foo.z and bar.z respectively.
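A minimal sketch, assuming simplified Foo and Bar classes (the article’s own versions are not reproduced here): Foo defines __getattr__(), Bar does not. It also shows that calling __getattribute__() directly skips the fallback.

```python
class Foo:
    def __init__(self):
        self.x = 1

    def __getattr__(self, name):
        # Only reached after the normal lookup raised AttributeError.
        return f"__getattr__ fallback for {name!r}"


class Bar:
    pass  # no __getattr__ defined


foo, bar = Foo(), Bar()
print(foo.x)  # 1 (normal lookup succeeds, __getattr__ is not called)
print(foo.z)  # __getattr__ fallback for 'z'
try:
    bar.z
except AttributeError:
    print("bar.z raised AttributeError")
try:
    foo.__getattribute__("z")  # calling it directly skips the fallback
except AttributeError:
    print("foo.__getattribute__('z') raised AttributeError")
```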
This fallback only works when __getattr__() is defined on the class the object belongs to. Defining __getattr__ on the object itself, i.e. adding it to obj.__dict__, has no effect; the same applies to __getattribute__().
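A small sketch with a hypothetical class Baz, storing a __getattr__ function on the instance to show that it is ignored:

```python
class Baz:
    pass


baz = Baz()
# Store a __getattr__ function on the instance, i.e. in baz.__dict__.
baz.__getattr__ = lambda name: "instance-level fallback"
print("__getattr__" in baz.__dict__)  # True
try:
    baz.missing
except AttributeError:
    print("instance-level __getattr__ was ignored")
```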
Python’s internal descriptors
Beyond custom use cases, Python’s own language machinery makes extensive use of descriptors.
property
We won’t go into everything property can do; the following is its common decorator form (syntactic sugar).
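The decorator form looks like the following sketch, which follows the pattern in the Python documentation for property; the class name C and the backing attribute _x are illustrative.

```python
class C:
    def __init__(self):
        self._x = None

    @property
    def x(self):
        """I'm the 'x' property."""
        return self._x

    @x.setter
    def x(self, value):
        self._x = value

    @x.deleter
    def x(self):
        del self._x


c = C()
c.x = 10
print(c.x)               # 10
del c.x
print(hasattr(c, "_x"))  # False
```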
property itself is a class that implements the descriptor protocol, so it can also be used in the following equivalent way.
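A sketch of the equivalent call form; the class C and the helper names getx, setx, and delx follow the example in the Python documentation for property.

```python
class C:
    def __init__(self):
        self._x = None

    def getx(self):
        return self._x

    def setx(self, value):
        self._x = value

    def delx(self):
        del self._x

    # Same result as the decorator form: a property descriptor stored as x.
    x = property(getx, setx, delx, "I'm the 'x' property.")
```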
In the above example, property(getx, setx, delx, "I'm the 'x' property.") creates an instance of the descriptor and assigns it to x. The property class is roughly equivalent to the following Python code.
property stores the getter, setter, and deleter functions as attributes of the descriptor instance. When a protocol method is called, it checks whether the corresponding function exists, which is how it controls reading, writing, and deleting the attribute.
Functions
Every function object we define is in fact a non-data descriptor.
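A quick check, using an arbitrary function f, that the function type defines __get__() but neither __set__() nor __delete__():

```python
def f(x):
    return x * 2


print(hasattr(type(f), "__get__"))     # True: functions are descriptors
print(hasattr(type(f), "__set__"))     # False
print(hasattr(type(f), "__delete__"))  # False: so they are non-data descriptors
```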
Functions are descriptors so that functions defined in a class body become bound methods when accessed through an instance. When a bound method is called, the instance is automatically passed as the first argument; that is the only difference between a method and a plain function. By convention we name this parameter self when defining the method. The class of a bound method object is roughly equivalent to the following code.
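The original listing is not available; the following sketch is modeled on the pure-Python MethodType in the Descriptor HowTo Guide. It intentionally reuses the name MethodType, shadowing types.MethodType, and is simplified.

```python
class MethodType:
    """Pure-Python sketch of a bound method (simplified)."""

    def __init__(self, func, obj):
        self.__func__ = func  # the underlying plain function
        self.__self__ = obj   # the object the method is bound to

    def __call__(self, *args, **kwargs):
        func = self.__func__
        obj = self.__self__
        return func(obj, *args, **kwargs)  # prepend the bound object
```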
It takes a function func and an object obj in its initialization method and passes obj to func as the first argument when it is called.
Let’s take a practical example.
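A minimal sketch of such an example, assuming a hypothetical class D with a method f:

```python
import types


class D:
    def f(self, x):
        return x


d = D()
print(D.f(None, 3))                  # 3: plain function, self passed explicitly
print(d.f(3))                        # 3: bound method, d passed automatically
print(D.f is D.__dict__["f"])        # True: class access returns the function
print(isinstance(d.f, types.MethodType))  # True: instance access binds it
print(d.f.__self__ is d)             # True
```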
As you can see, when f is accessed through the class, it behaves like a plain function, and any object can be passed as the self argument; when f is accessed through an instance, it becomes a bound method, so the bound object is automatically passed as the first argument. Creating a MethodType object when an attribute is accessed through an instance is exactly what a descriptor makes possible.
A simplified Python equivalent of how functions implement this is as follows.
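A sketch modeled on the Function class in the Descriptor HowTo Guide; the name Function and the _code attribute (wrapping an ordinary callable) are assumptions, since real function objects are implemented in C.

```python
from types import MethodType


class Function:
    """Sketch of a function's descriptor behavior (simplified)."""

    def __init__(self, code):
        self._code = code  # hypothetical: wraps an ordinary callable

    def __call__(self, *args, **kwargs):
        return self._code(*args, **kwargs)

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self               # accessed on the class: plain function
        return MethodType(self, obj)  # accessed on an instance: bound method


class D:
    f = Function(lambda self, x: (self, x))


d = D()
print(D.f(None, 2))  # (None, 2)
print(d.f(1) == (d, 1))  # True
```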
Defining a function with def f() is equivalent to f = Function(), i.e. creating an instance of a non-data descriptor and assigning it to the variable f.
When we access this attribute through the class, the __get__() method returns the function object itself.
When we access it through an object instance, the __get__() method creates a MethodType object initialized with the function and the instance.
To recap, functions have a __get__() method, which makes them non-data descriptors, so they can be turned into bound methods when accessed as attributes. The function descriptor turns the instance call obj.f(*args) into f(obj, *args) and the class call cls.f(*args) into f(*args).
classmethod
classmethod is a variant built on top of function descriptors and is used as follows.
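A short usage sketch, assuming hypothetical classes F and G; the first argument is always a class, and with an instance it is the instance’s class.

```python
class F:
    @classmethod
    def f(cls, x):
        return cls.__name__, x


class G(F):
    pass


print(F.f(3))    # ('F', 3)
print(F().f(3))  # ('F', 3)
print(G().f(3))  # ('G', 3)
```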
The equivalent Python implementation is as follows; with the above in mind, it is easy to follow.
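A simplified sketch modeled on the HowTo Guide’s ClassMethod; the class name is an assumption, and version-specific details of the real classmethod are omitted.

```python
from types import MethodType


class ClassMethod:
    """Pure-Python sketch of classmethod (simplified)."""

    def __init__(self, f):
        self.f = f

    def __get__(self, obj, cls=None):
        if cls is None:
            cls = type(obj)
        # Bind the class (not the instance) as the first argument.
        return MethodType(self.f, cls)
```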
@classmethod returns a non-data descriptor that turns the instance call obj.f(*args) into f(type(obj), *args) and the class call cls.f(*args) into f(cls, *args).
staticmethod
The effect of staticmethod is that, whether we call it through an instance or through the class, the original function is called.
The equivalent Python implementation is as follows.
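A simplified sketch modeled on the HowTo Guide’s StaticMethod; the class name is an assumption.

```python
class StaticMethod:
    """Pure-Python sketch of staticmethod (simplified)."""

    def __init__(self, f):
        self.f = f

    def __get__(self, obj, objtype=None):
        # Return the stored function as-is: no binding, for class or instance.
        return self.f

    def __call__(self, *args, **kwargs):
        return self.f(*args, **kwargs)
```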
The __get__() method returns the function object stored in the descriptor instance’s __dict__ as-is. Because that function is returned directly rather than looked up as a class attribute, its own __get__() is never triggered, so no binding takes place.
@staticmethod returns a non-data descriptor that turns the instance call obj.f(*args) into f(*args) and the class call cls.f(*args) into f(*args) as well.