PHP is simple, but it’s not easy to master. In addition to knowing how to use it, we also need to know how it works under the hood.
Why study PHP’s underlying implementation? To use a dynamic language well, we must first understand it. PHP’s memory management and framework model are worth learning from, and by developing extensions we can optimize the performance of our programs and add more powerful features.
PHP is a dynamic language for web development. More specifically, it is a software framework, implemented in C, that contains a large number of component modules. Viewed more narrowly, it can be regarded as a powerful UI framework.
In short, PHP executes code as follows: after receiving a piece of source code, it translates the program into individual instructions (opcodes) through lexical and syntactic analysis, and the Zend virtual machine then executes these instructions in order. PHP itself is implemented in C, so every call ultimately ends in a C function; in effect, we can think of PHP as a piece of software written in C.
PHP directory structure
The PHP source code also includes several files generated during development, and several sections maintained in their respective locations upstream. (Note: PHP version 7.4.13).
Directory | Description |
---|---|
TSRM | Thread safety implementation. PHP’s thread safety is built on the TSRM (Thread Safe Resource Manager) library, and the *G macros commonly seen in PHP’s implementation are usually wrappers around TSRM. |
Zend | Core implementation of the PHP engine: lexical and syntactic parsing of scripts, opcode execution, the extension mechanism, and so on. |
build | Build-related files for Linux. |
ext | PHP extensions, including the definition and implementation of most PHP functions (the array, pdo, and spl families, among others). Extensions you write yourself can also be placed here during development for easier testing and debugging. |
main | PHP’s main code, holding the most central PHP files and implementing PHP’s basic infrastructure. This differs from the Zend engine, which implements the core language runtime. |
netware | Support files for the Novell NetWare platform (not general networking or socket code). |
pear | PEAR is the PHP Extension and Application Repository, a code repository for PHP extensions and applications. Simply put, PEAR is to PHP what CPAN (Comprehensive Perl Archive Network) is to Perl. |
sapi | PHP’s application-layer interface: code for the various server abstraction layers, such as Apache’s mod_php, cli, cgi, embed, and fpm. |
scripts | Scripts used under Linux. |
tests | Test scripts, containing test files for the various PHP functions. |
travis | Travis CI build scripts; not specific to PHP itself. |
win32 | Code and scripts for building PHP on Windows. For example, the socket implementation differs between Windows and *nix platforms. |
Although there are many source directories, the core ones are sapi, main, Zend, ext, and TSRM.
SAPI
The input to a PHP program can be standard input from the command line or a network request based on the cgi/fastcgi protocol. PHP can even be embedded so that C and C++ programs can call it. These cases correspond to cli mode, fpm/cgi mode, and embed mode; in addition there are the apache2handler and litespeed modes.
- apache2handler: how PHP works with Apache as the web server, running as mod_php. It is currently the most widely used mode.
- cgi: another way for the web server and PHP to interact directly, namely the well-known fastcgi protocol. fastcgi+PHP has been seeing more and more use in recent years, and it is the only mode supported by asynchronous web servers; nginx is a typical example. To be clear, fastcgi support is itself an add-on to PHP.
- cli: invocation from the command line.
The sapi directory is an abstraction of the input and output layers, and is the specification for PHP to provide external services.
Similarly, the output can be written to the standard output of the command line or returned to the client as a network response based on the cgi/fastcgi protocol.
SAPI stands for Server API. It is the specification for the services PHP provides to the outside world. It defines the structure sapi_module_struct, which holds many hook function pointers for a mode’s startup, shutdown, activation, deactivation, and so on. Each mode points these function pointers at its own functions, which makes it easy to add new ways for PHP to serve requests. The modes above are all implementations of sapi_module_struct, and together they let PHP run in many different scenarios.
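The hook-pointer idea can be sketched in a few lines of C. This is a minimal illustration, not the real sapi_module_struct: the type demo_sapi, its three hooks, and the two modes are hypothetical names invented for the example, and the real structure has many more members.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, heavily reduced stand-in for sapi_module_struct:
 * a named set of hook function pointers. */
typedef struct demo_sapi {
    const char *name;
    int    (*startup)(void);
    size_t (*write)(const char *body, char *out, size_t cap);
    int    (*shutdown)(void);
} demo_sapi;

static int demo_noop(void) { return 0; }

/* cli mode: the body goes to the output unchanged. */
static size_t cli_write(const char *body, char *out, size_t cap) {
    int n = snprintf(out, cap, "%s", body);
    return n < 0 ? 0 : (size_t)n;
}

/* cgi mode: the body is preceded by a response header. */
static size_t cgi_write(const char *body, char *out, size_t cap) {
    int n = snprintf(out, cap, "Content-type: text/html\r\n\r\n%s", body);
    return n < 0 ? 0 : (size_t)n;
}

/* Each mode fills the same structure with its own functions. */
const demo_sapi demo_cli = { "cli", demo_noop, cli_write, demo_noop };
const demo_sapi demo_cgi = { "cgi", demo_noop, cgi_write, demo_noop };

/* Pick a mode by name; returns NULL for an unknown mode. */
const demo_sapi *demo_find_mode(const char *name) {
    if (strcmp(name, demo_cli.name) == 0) return &demo_cli;
    if (strcmp(name, demo_cgi.name) == 0) return &demo_cgi;
    return NULL;
}

/* The caller only talks to the hooks and never to a concrete mode.
 * Returns the length snprintf reports for the full output. */
size_t demo_serve(const demo_sapi *m, const char *body, char *out, size_t cap) {
    size_t n;
    assert(m && m->startup && m->write && m->shutdown);
    if (m->startup() != 0) return 0;
    n = m->write(body, out, cap);
    m->shutdown();
    return n;
}
```

Calling demo_serve(demo_find_mode("cgi"), "hi", buf, sizeof buf) produces the header followed by the body, while the cli mode produces the body alone. Adding a new mode means supplying another filled-in structure, without touching demo_serve.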
fastcgi process
- The web server loads the FastCGI process manager (IIS ISAPI or Apache Module) at startup.
- The FastCGI process manager initializes itself, starts multiple CGI interpreter processes (visible as multiple php-cgi processes), and waits for connections from the web server.
- When a client request reaches the web server, the FastCGI process manager selects and connects to a CGI interpreter. The web server sends the CGI environment variables and standard input to the FastCGI subprocess php-cgi.
- The FastCGI subprocess finishes processing and returns standard output and error messages to the web server over the same connection. When the subprocess closes the connection, the request is complete. The subprocess then waits for and handles the next connection from the FastCGI process manager (running in the web server). In CGI mode, php-cgi would exit at this point.
- From the above you can see why plain CGI is usually slow: for every web request, PHP has to re-parse php.ini, reload all extensions, and re-initialize all its data structures. With FastCGI, all of this happens only once, when the process starts. An additional benefit is that persistent database connections work.
main
The main directory is the glue between the SAPI layer and the Zend layer.
Its role is to take requests from SAPI, work out the script file and parameters to execute, and initialize the environment and configuration: initializing variables and constants, registering functions, parsing configuration files, loading extensions, and so on.
Zend
The Zend engine is the kernel of PHP. It translates PHP code into executable opcodes and implements the corresponding handlers, the basic data structures, memory allocation and management, and so on. It consists of two parts: the compiler and the executor.
The compiler performs lexical and syntactic analysis of the PHP code and generates an abstract syntax tree, which is then compiled into opcodes. An opcode is an instruction recognized by the Zend virtual machine; php7 has 173 opcodes in total, and all PHP syntax is built from them. The executor is responsible for executing the opcodes output by the compiler.
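To make the compiler/executor split concrete, here is a toy executor in C. It is only a sketch under loud assumptions: the four opcodes and the names demo_op and demo_execute are invented for this example, and a small value stack is used for brevity, which is not how the Zend virtual machine lays out its operands.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical opcodes; the real Zend opcode set is far larger. */
typedef enum { DEMO_PUSH, DEMO_ADD, DEMO_MUL, DEMO_RETURN } demo_opcode;

/* One instruction: an opcode plus an optional operand. */
typedef struct {
    demo_opcode op;
    long operand;
} demo_op;

/* Executor: walk the instruction array in order and dispatch on the opcode. */
long demo_execute(const demo_op *ops, size_t count) {
    long stack[16];
    size_t sp = 0;

    for (size_t i = 0; i < count; i++) {
        switch (ops[i].op) {
        case DEMO_PUSH:
            assert(sp < 16);
            stack[sp++] = ops[i].operand;
            break;
        case DEMO_ADD:
            assert(sp >= 2);
            stack[sp - 2] += stack[sp - 1];
            sp--;
            break;
        case DEMO_MUL:
            assert(sp >= 2);
            stack[sp - 2] *= stack[sp - 1];
            sp--;
            break;
        case DEMO_RETURN:
            assert(sp >= 1);
            return stack[sp - 1];
        }
    }
    return 0; /* program ended without DEMO_RETURN */
}
```

A "compiler" for the expression (2 + 3) * 4 would emit PUSH 2, PUSH 3, ADD, PUSH 4, MUL, RETURN, and demo_execute would run those six instructions in sequence.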
Extensions
ext (extension) is the way to extend the functionality of the PHP kernel. Extensions are divided into PHP extensions and Zend extensions; both kinds support user-defined development and both are common. PHP extensions include gd, json, date, array, and others, while the familiar opcache is a Zend extension.
TSRM
TSRM (Thread Safe Resource Manager) is a thread-safe resource manager.
A global variable is a variable defined outside any function, so it is a shared resource. In a multi-threaded environment, access to shared resources can cause conflicts, and TSRM was created to solve this problem.
The main purpose of TSRM is to keep shared resources safe. PHP’s thread safety mechanism is simple and intuitive: in a multi-threaded environment, each thread is given its own copy of the global variables. TSRM implements this by allocating each thread an independent, self-incrementing ID (taking a lock before the allocation) that serves as the index of that thread’s global variable memory area, so that later accesses to global variables are completely independent between threads.
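The ID-as-index scheme can be sketched as follows. This is an assumption-laden illustration rather than TSRM’s actual code: demo_globals, demo_thread_register, and the DEMO_G macro are hypothetical names, no real threads are created, and the lock that would protect the allocation is only marked by a comment.

```c
#include <assert.h>
#include <stdlib.h>

#define DEMO_MAX_THREADS 8

/* Hypothetical block of "global" state that each thread gets a copy of. */
typedef struct {
    long error_count;
    int in_request;
} demo_globals;

static demo_globals *demo_table[DEMO_MAX_THREADS]; /* indexed by thread ID */
static int demo_next_id = 0;                       /* self-incrementing ID */

/* Allocate a fresh ID plus a zeroed private copy of the globals.
 * A real implementation would hold a mutex for the whole function. */
int demo_thread_register(void) {
    demo_globals *g;
    if (demo_next_id >= DEMO_MAX_THREADS) return -1;
    g = calloc(1, sizeof *g);
    if (!g) return -1;
    demo_table[demo_next_id] = g;
    return demo_next_id++;
}

/* Look up the globals that belong to one thread ID. */
demo_globals *demo_globals_for(int id) {
    assert(id >= 0 && id < demo_next_id);
    return demo_table[id];
}

/* Accessor macro in the spirit of the *G macros mentioned earlier. */
#define DEMO_G(id, field) (demo_globals_for(id)->field)

/* Release every per-thread copy and reset the ID counter. */
void demo_threads_shutdown(void) {
    for (int i = 0; i < demo_next_id; i++) {
        free(demo_table[i]);
        demo_table[i] = NULL;
    }
    demo_next_id = 0;
}
```

Two registered IDs see independent values: writing DEMO_G(a, error_count) leaves DEMO_G(b, error_count) untouched, which is the isolation the text describes.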
Most PHP SAPIs are single-threaded, so thread safety rarely needs attention. With Apache, or when users build their own PHP environment, thread safety does need to be considered.
PHP design philosophy and features
- multi-process model: because PHP uses a multi-process model, requests do not interfere with each other, so one request hanging does not bring down the whole service. PHP has also supported a multi-threaded model for a long time.
- weakly typed language: unlike C/C++, Java, C#, and similar languages, PHP is weakly typed. A variable’s type is not fixed when it is first declared; it is determined at run time and may change through implicit or explicit type conversion. This flexibility is convenient and efficient in web development, and is covered in more detail in the PHP variables section later.
- engine (Zend) + component (ext) model, which reduces internal coupling.
- middle layer (sapi): SAPI, the Server Application Programming Interface, isolates the web server from PHP.
- simple and flexible syntax with few rules. The drawback is that this leads to mixed coding styles.
php execution flow & opcode
The PHP execution process: after receiving a piece of code, PHP translates the source program into individual instructions (opcodes) through lexical and syntactic analysis, and the Zend virtual machine then executes these instructions sequentially. PHP itself is implemented in C, so every call ultimately ends in a C function; in effect, we can think of PHP as a piece of software developed in C.
The core of PHP execution is the translated instructions (opcodes), which are the basic unit of PHP program execution.
There are several common processing functions.
HashTable - the core data structure
HashTable is the core data structure of Zend and is used to implement almost all common functionality in PHP. The PHP array is its typical application; in addition, inside Zend the function symbol table, global variables, and so on are also based on the hash table, which has the following features.
- supports typical key->value queries
- can be used as an array
- O(1) complexity for adding and deleting nodes
- keys support mixed types: associative and indexed keys can coexist in one array
- values support mixed types: array("string", 2332)
- supports linear traversal, such as foreach
The Zend hash table implements a typical hash table structure and, by attaching a doubly linked list, also provides forward and reverse traversal of arrays. The structure is shown in the following figure.
As you can see, the hash table contains both a hash structure in key->value form and a doubly linked list, which makes it convenient to support both fast lookup and linear traversal.
- Hash structure: Zend’s hash structure is a typical hash table model that resolves collisions with linked lists. Note that the Zend hash table is a self-growing data structure: when the table is full, it doubles in size and repositions its elements. Zend has also made some optimizations that trade space for time to speed up key->value lookup. For example, each element stores the key length in a variable nKeyLength so that mismatches can be ruled out quickly.
- Doubly linked list: the Zend hash table implements linear traversal of its elements through a linked list. In theory a singly linked list is enough for traversal; the main reason for using a doubly linked list is to make deletion fast and avoid a traversal. The Zend hash table is a composite structure: it supports the usual associative arrays, can be used as a sequentially indexed array, and even allows a mixture of the two.
- PHP associative arrays: associative arrays are the typical application of the hash table. A query is an ordinary hash lookup, with some quick checks added to speed it up.
- PHP indexed arrays: indexed arrays are the ordinary arrays we access by subscript, for example $arr[0]. The Zend hash table normalizes keys internally: an index key is also given a hash value and an nKeyLength (set to 0). The internal member nNextFreeElement tracks the largest index assigned so far and is advanced by one after each push. This normalization is what allows PHP to mix associative and non-associative keys. Because of how push works, the order of index keys in a PHP array is determined by the order of insertion, not by the size of the subscripts. For example, $arr[1] = 2; $arr[2] = 3; are stored in the order they were assigned. A key of type double is treated by the Zend hash table as an index key.
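The combination described above (hash slots with collision chains, plus one doubly linked list in insertion order, plus normalized integer keys) can be sketched in C. Everything here is illustrative: demo_hash, demo_bucket, and the functions are hypothetical names, the slot count is fixed instead of doubling, keys are not copied, and deletion is left out.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define DEMO_NSLOTS 8 /* fixed size; the table described above doubles when full */

typedef struct demo_bucket {
    unsigned long h;                 /* hash of the string key, or the integer key */
    const char *key;                 /* NULL for integer keys; not copied */
    long value;
    struct demo_bucket *chain;       /* next bucket in the same slot (collision) */
    struct demo_bucket *prev, *next; /* global list in insertion order */
} demo_bucket;

typedef struct {
    demo_bucket *slots[DEMO_NSLOTS];
    demo_bucket *head, *tail;
    unsigned long next_free;         /* next integer key used by push */
    size_t count;
} demo_hash;

static unsigned long demo_hash_str(const char *s) {
    unsigned long h = 5381;          /* simple multiply-by-33 string hash */
    while (*s) h = h * 33 + (unsigned char)*s++;
    return h;
}

static demo_bucket *demo_find(demo_hash *ht, unsigned long h, const char *key) {
    for (demo_bucket *b = ht->slots[h % DEMO_NSLOTS]; b; b = b->chain) {
        if (b->h != h) continue;     /* quick check before comparing keys */
        if (!key && !b->key) return b;
        if (key && b->key && strcmp(key, b->key) == 0) return b;
    }
    return NULL;
}

/* Insert or update. key == NULL means "use h as an integer key". */
int demo_set(demo_hash *ht, unsigned long h, const char *key, long value) {
    demo_bucket *b;
    if (key) h = demo_hash_str(key);
    if ((b = demo_find(ht, h, key)) != NULL) { b->value = value; return 0; }
    if ((b = calloc(1, sizeof *b)) == NULL) return -1;
    b->h = h; b->key = key; b->value = value;
    b->chain = ht->slots[h % DEMO_NSLOTS];   /* link into the slot chain */
    ht->slots[h % DEMO_NSLOTS] = b;
    b->prev = ht->tail;                      /* append to the ordered list */
    if (ht->tail) ht->tail->next = b; else ht->head = b;
    ht->tail = b;
    if (!key && h >= ht->next_free) ht->next_free = h + 1;
    ht->count++;
    return 0;
}

/* Append with the next free integer key, like $arr[] = value. */
int demo_push(demo_hash *ht, long value) {
    return demo_set(ht, ht->next_free, NULL, value);
}

/* Returns a pointer to the stored value, or NULL if the key is absent. */
long *demo_get(demo_hash *ht, unsigned long h, const char *key) {
    demo_bucket *b;
    if (key) h = demo_hash_str(key);
    b = demo_find(ht, h, key);
    return b ? &b->value : NULL;
}
```

Walking from head through next visits elements in insertion order regardless of which slot they hashed to, which is the traversal a foreach-style loop relies on. A delete operation, not shown, could use prev and next to unlink a bucket from the ordered list without walking it.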
PHP variables
PHP is a weakly typed language that does not itself strictly distinguish between variable types. PHP does not require a type to be specified when a variable is declared, and it may implicitly convert a variable’s type while the program runs. As in strongly typed languages, a program can also perform explicit type conversions.
PHP variables can be classified as simple types (int, string, bool), collection types (array, resource, object), and constants (const). All of these share the same underlying structure, the zval.
A zval consists of three main parts.
- type: specifies the type of the variable (integer, string, array, etc.)
- refcount & is_ref: used to implement reference counting (described in detail later)
- value: the core part, which stores the actual data of the variable
The zvalue stores the actual data of a variable. Because it must hold several types, zvalue is a union, which is how weak typing is implemented.
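A tagged union of this shape can be sketched in C as follows. The layout is an assumption made for illustration: demo_zval, demo_zvalue, and the DEMO_IS_* tags are hypothetical names that only mirror the three parts listed above, and the conversion rules in demo_to_long are simplified.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical type tags. */
typedef enum { DEMO_IS_NULL, DEMO_IS_LONG, DEMO_IS_DOUBLE, DEMO_IS_STRING } demo_type;

/* value: one storage area shared by all types. */
typedef union {
    long lval;
    double dval;
    struct { const char *val; size_t len; } str;
} demo_zvalue;

/* The three parts from the text: value, refcount/is_ref, type. */
typedef struct {
    demo_zvalue value;
    unsigned int refcount;
    unsigned char type;
    unsigned char is_ref;
} demo_zval;

void demo_set_long(demo_zval *z, long v) {
    z->type = DEMO_IS_LONG;
    z->value.lval = v;
}

void demo_set_double(demo_zval *z, double v) {
    z->type = DEMO_IS_DOUBLE;
    z->value.dval = v;
}

/* The string is borrowed, not copied, and must be NUL-terminated here. */
void demo_set_string(demo_zval *z, const char *s, size_t len) {
    z->type = DEMO_IS_STRING;
    z->value.str.val = s;
    z->value.str.len = len;
}

/* Implicit conversion: read any variable as an integer based on its tag. */
long demo_to_long(const demo_zval *z) {
    switch (z->type) {
    case DEMO_IS_LONG:   return z->value.lval;
    case DEMO_IS_DOUBLE: return (long)z->value.dval;
    case DEMO_IS_STRING: return strtol(z->value.str.val, NULL, 10);
    default:             return 0;
    }
}
```

The same demo_zval variable can hold 42, then "123abc", then 3.9; the type tag says which union member is valid, and demo_to_long shows how a reader of the value converts on demand.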
The correspondence between PHP variable types and their actual storage is as follows.
- Reference counting is widely used in memory reclamation, string handling, and so on, and PHP variables are a typical application of it. A zval’s reference counting is implemented by the members is_ref and ref_count, which let multiple variables share a single copy of the data and avoid frequent copying. On assignment, Zend points the new variable at the same zval and increments ref_count; on unset, it decrements ref_count. The actual destruction is performed only when ref_count drops to 0. For a reference assignment, Zend sets is_ref to 1.
- PHP variables share data through reference counting, so what happens when one of the variables is changed? When a write is attempted, if Zend finds that the zval the variable points to is shared by several variables, it makes a copy of the zval with a ref_count of 1 and decrements the ref_count of the original. This process is called ‘zval separation’. Because Zend copies only when a write occurs, the technique is also called copy-on-write.
- Integers and floating point numbers are basic types in PHP and are simple type variables. Their values are stored directly in the zvalue, as long and double respectively.
- The zvalue structure shows that, unlike strongly typed languages such as C, PHP does not distinguish between int, unsigned int, long, and so on. It has a single integer type, long, so the range of PHP integers depends on the word size the compiler targets rather than being fixed.
- Floating point numbers are handled similarly: PHP does not distinguish between float and double and uses double throughout. What happens when an integer goes out of range? It is automatically converted to double, so be careful, because many tricks arise from this behavior.
- Like integers, strings are basic, simple type variables in PHP. The zvalue structure shows that a PHP string consists of a pointer to the actual data plus a length, which resembles string in C++. Because the length is stored explicitly, strings can, unlike in C, hold binary data (including \0), and strlen is an O(1) operation. When a string is added to, modified, or appended to, PHP reallocates memory to produce a new string. Finally, for safety, PHP still appends a \0 to the end of a string when it is created.
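The reference counting and separation steps from the list above can be sketched in C. This is a simplified model under stated assumptions: demo_val and demo_var are hypothetical names, values are plain long integers, and is_ref handling for reference assignment is left out.

```c
#include <assert.h>
#include <stdlib.h>

/* A shared value with its reference count. */
typedef struct {
    long value;
    unsigned int refcount;
} demo_val;

/* A "variable" is just a slot that points at a possibly shared value. */
typedef struct {
    demo_val *val;
} demo_var;

/* Create a fresh value owned by one variable. */
int demo_init(demo_var *v, long value) {
    v->val = malloc(sizeof *v->val);
    if (!v->val) return -1;
    v->val->value = value;
    v->val->refcount = 1;
    return 0;
}

/* $dst = $src: share the value and bump the count. dst must be empty. */
void demo_assign(demo_var *dst, demo_var *src) {
    dst->val = src->val;
    dst->val->refcount++;
}

/* unset($v): drop one reference; destroy only when the count reaches 0. */
void demo_unset(demo_var *v) {
    if (v->val && --v->val->refcount == 0) free(v->val);
    v->val = NULL;
}

/* Write: if the value is shared, separate first (copy-on-write). */
int demo_write(demo_var *v, long value) {
    if (v->val->refcount > 1) {
        demo_val *copy = malloc(sizeof *copy);
        if (!copy) return -1;
        copy->refcount = 1;   /* the copy belongs to this variable only */
        v->val->refcount--;   /* the original loses one sharer */
        v->val = copy;
    }
    v->val->value = value;
    return 0;
}
```

After demo_assign the two variables point at the same demo_val with a count of 2; the first demo_write through either of them triggers the separation, and from then on each variable owns its own value with a count of 1.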
Common string concatenation methods and a speed comparison. Suppose there are 4 variables as follows.
Now a comparison and explanation of the string concatenation methods above.
PHP arrays are implemented directly on top of the Zend hash table. How is the foreach operation implemented?
- For an array, foreach traverses the doubly linked list in the hash table. For indexed arrays, foreach is more efficient than for because it avoids a key->value lookup. count reads HashTable->nNumOfElements, an O(1) operation. For numeric strings such as '123', Zend converts the key to an integer, so $arr[123] and $arr['123'] are equivalent.
- Resource types are the most complex variables in PHP and are a kind of compound structure. PHP’s zval can represent a wide range of data types, but it is hard to describe custom data types adequately. Because there is no effective way to represent these composite structures, there is also no way to apply traditional operators to them. The solution is to refer to the underlying pointer through an essentially arbitrary identifier (label); this is what is known as a resource.
In a zval, a resource uses lval as a handle that leads to the address where the resource is located. A resource can be any composite structure; the familiar mysqli, fsock, memcached, and so on are all resources.
How to use resources
- Registration: before a custom data type can be used as a resource, it must be registered, and Zend assigns it a globally unique label.
- Getting a resource variable: Zend maintains a hash table of id -> actual data for resources. The zval of a resource records only its id; fetch looks up the actual value in the hash table by id and returns it.
- Resource destruction: resources come in many data types, and Zend itself has no way to destroy them, so a destructor function must be supplied when the resource type is registered.
- When a resource is unset, Zend calls the corresponding function to destroy it and also removes it from the global resource table.
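The register / fetch / destroy cycle above can be sketched in C. This is a minimal model, not Zend’s resource API: every name here (demo_register_type, demo_resource_add, and so on) is hypothetical, and fixed-size arrays stand in for the id -> data hash table.

```c
#include <assert.h>
#include <stdlib.h>

#define DEMO_MAX_TYPES 4
#define DEMO_MAX_RES   16

typedef void (*demo_dtor)(void *ptr);

static demo_dtor demo_dtors[DEMO_MAX_TYPES]; /* one destructor per type label */
static int demo_ntypes = 0;

/* Global resource table: id -> actual data. */
static struct { void *ptr; int type; int used; } demo_res[DEMO_MAX_RES];

/* Registration: remember the destructor and hand back a unique type label. */
int demo_register_type(demo_dtor dtor) {
    if (demo_ntypes >= DEMO_MAX_TYPES) return -1;
    demo_dtors[demo_ntypes] = dtor;
    return demo_ntypes++;
}

/* Store a pointer; the returned id is all that a variable would record. */
int demo_resource_add(void *ptr, int type) {
    assert(type >= 0 && type < demo_ntypes);
    for (int id = 0; id < DEMO_MAX_RES; id++) {
        if (!demo_res[id].used) {
            demo_res[id].ptr = ptr;
            demo_res[id].type = type;
            demo_res[id].used = 1;
            return id;
        }
    }
    return -1;
}

/* Fetch: look up by id and check that the type label matches. */
void *demo_resource_fetch(int id, int type) {
    if (id < 0 || id >= DEMO_MAX_RES || !demo_res[id].used) return NULL;
    return demo_res[id].type == type ? demo_res[id].ptr : NULL;
}

/* Destroy: call the registered destructor, then drop the table entry. */
int demo_resource_delete(int id) {
    if (id < 0 || id >= DEMO_MAX_RES || !demo_res[id].used) return -1;
    if (demo_dtors[demo_res[id].type])
        demo_dtors[demo_res[id].type](demo_res[id].ptr);
    demo_res[id].ptr = NULL;
    demo_res[id].used = 0;
    return 0;
}

/* Example destructor for malloc-backed resources; counts its calls. */
int demo_freed_count = 0;
void demo_free_counted(void *ptr) {
    free(ptr);
    demo_freed_count++;
}
```

A caller registers a type once with demo_register_type(demo_free_counted), adds a pointer to get an id, and later passes that id to demo_resource_fetch or demo_resource_delete. The table never needs to know what the pointer refers to, because the registered destructor carries that knowledge.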
A resource can persist for a long time: not only after all the variables that reference it have gone out of scope, but even after one request has ended and a new one has begun. Such resources are called persistent resources because they persist through the entire lifecycle of the SAPI unless they are deliberately destroyed. In many cases persistent resources improve performance to some extent. In the common case of mysql_pconnect, for example, persistent resources are allocated via pemalloc so that they are not freed at the end of the request. For Zend itself, there is no inherent distinction between the two kinds.
How local and global variables are implemented in PHP
- For a request, at any given moment PHP can see two symbol tables, symbol_table and active_symbol_table. The former maintains the global variables; the latter is a pointer to the currently active symbol table. When the program enters a function, Zend allocates a symbol table x for it and points active_symbol_table at x. This is how global and local variables are kept apart.
- Getting variable values: PHP’s symbol tables are implemented with the hash table. Each variable is assigned a unique identifier, and the corresponding zval is looked up in the table when the variable is fetched.
- Using global variables in functions: inside a function, a global variable can be used by explicitly declaring it with global. This creates, in active_symbol_table, a reference to the variable of the same name in symbol_table; if symbol_table has no variable of that name, it is created first.
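The two-table arrangement can be sketched in C. The sketch makes several simplifying assumptions: demo_symtab and the functions are hypothetical names, a linear array replaces the hash table, values are plain long integers instead of zvals, and nested function calls are not modeled.

```c
#include <assert.h>
#include <string.h>

#define DEMO_MAX_VARS 8

/* One name -> slot binding; slot may point into another table's storage. */
typedef struct { const char *name; long *slot; } demo_entry;

typedef struct {
    demo_entry vars[DEMO_MAX_VARS];
    long storage[DEMO_MAX_VARS];
    int count;
} demo_symtab;

static demo_symtab demo_global_table;                 /* plays symbol_table */
static demo_symtab *demo_active = &demo_global_table; /* plays active_symbol_table */

/* Look a name up in one table; NULL if it is not defined there. */
long *demo_lookup(demo_symtab *t, const char *name) {
    for (int i = 0; i < t->count; i++)
        if (strcmp(t->vars[i].name, name) == 0) return t->vars[i].slot;
    return NULL;
}

/* Find or create a variable in table t; new variables start at 0. */
long *demo_declare(demo_symtab *t, const char *name) {
    long *slot = demo_lookup(t, name);
    if (slot) return slot;
    if (t->count >= DEMO_MAX_VARS) return NULL;
    t->storage[t->count] = 0;
    t->vars[t->count].name = name;
    t->vars[t->count].slot = &t->storage[t->count];
    return t->vars[t->count++].slot;
}

/* Entering a function: switch the active pointer to a fresh local table. */
void demo_enter_function(demo_symtab *locals) {
    memset(locals, 0, sizeof *locals);
    demo_active = locals;
}

/* Leaving the function: the global table becomes active again. */
void demo_leave_function(void) {
    demo_active = &demo_global_table;
}

/* `global $name;` inside a function: bind the local name to the global
 * slot, creating the global variable first if it does not exist. */
long *demo_import_global(const char *name) {
    long *g = demo_declare(&demo_global_table, name);
    if (!g || demo_active->count >= DEMO_MAX_VARS) return NULL;
    demo_active->vars[demo_active->count].name = name;
    demo_active->vars[demo_active->count].slot = g;
    demo_active->count++;
    return g;
}
```

Inside a function, a lookup through demo_active does not see a global x until demo_import_global("x") binds it; after that, writes through the local name land in the global slot and remain visible once the function returns.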