When deploying web applications written in PHP or Python, you will run into terms such as CGI, FastCGI, WSGI, and uWSGI. I used to find these concepts confusing and did not know how to choose between them, so I took the time to sort them out. The main points follow.
CGI
CGI, or Common Gateway Interface, is a standard for interfacing external applications (CGI programs) with web servers: a protocol for passing information between the two. The CGI specification allows a web server to execute external programs and send their output to the web browser, which turns the Web’s simple set of static hypermedia documents into a new, interactive medium. In layman’s terms, CGI is a bridge between a web page and a program running on the web server: it passes the input received from the HTML page to the program and returns the program’s results to the page. CGI is highly portable and can be implemented on almost any operating system.
In the CGI model, the server creates a CGI child process for each connection request (user request), has it process the request, and terminates it when the request is finished. This is the fork-and-execute model. A server using CGI therefore runs as many CGI child processes as there are concurrent requests, and repeatedly starting these processes is the main reason for CGI’s low performance. When the number of user requests is very large, the processes compete for system resources such as memory and CPU time, and performance suffers.
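The fork-and-execute model can be sketched in a few lines of Python. This is only an illustration under assumptions: run_cgi and CGI_SCRIPT are hypothetical names, and a real web server would also pass the request body on standard input and set many more variables. Each call starts a fresh interpreter, which is the per-request cost described above.

```python
import os
import subprocess
import sys

# A tiny stand-in for a CGI program (hypothetical): it reads its input from
# environment variables and writes headers, a blank line, then the body.
CGI_SCRIPT = """
import os
print("Content-Type: text/plain")
print()
print("query=" + os.environ.get("QUERY_STRING", ""))
"""


def run_cgi(query_string):
    """Handle one request the CGI way: start a new process, collect its output."""
    env = dict(os.environ)
    env.update({
        "GATEWAY_INTERFACE": "CGI/1.1",
        "REQUEST_METHOD": "GET",
        "QUERY_STRING": query_string,
    })
    # One new interpreter process per request: this is the expensive part.
    completed = subprocess.run(
        [sys.executable, "-c", CGI_SCRIPT],
        env=env,
        stdout=subprocess.PIPE,
        check=True,
    )
    return completed.stdout.decode("utf-8")
```

Calling run_cgi("name=world") returns the headers and body that the child process printed; calling it a thousand times starts a thousand processes.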
CGI script workflow:
- The browser requests a URL to a CGI application through an HTML form or hyperlink.
- The server receives the request.
- The server executes the specified CGI application.
- The CGI application performs the required action, usually based on the visitor’s input.
- The CGI application formats the result into a document (usually an HTML page) that the web server and the browser can understand.
- The web server returns the result to the browser.
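Seen from the program’s side, the workflow above might look like the following minimal sketch. The function name render and the form field name are assumptions made for illustration; the essential point is only that input arrives in environment variables such as QUERY_STRING and the response (headers, blank line, body) leaves on standard output.

```python
import html
import os
from urllib.parse import parse_qs


def render(environ):
    """Build the full CGI response (headers, blank line, body) from the environment."""
    query = parse_qs(environ.get("QUERY_STRING", ""))
    # "name" is a hypothetical form field; fall back to a default if absent.
    name = query.get("name", ["stranger"])[0]
    body = "<html><body><h1>Hello, {}!</h1></body></html>".format(html.escape(name))
    return "Content-Type: text/html\n\n" + body


if __name__ == "__main__":
    # The web server puts request data in environment variables and
    # reads the response from the program's standard output.
    print(render(os.environ), end="")
```

The value is escaped with html.escape before it is placed in the page, since it comes straight from the visitor.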
FastCGI
FastCGI is a scalable, high-speed interface for communicating between HTTP servers and dynamic scripting languages. FastCGI is supported by most popular HTTP servers, including Apache, Nginx, and lighttpd, and is also supported by many scripting languages, among them PHP.
FastCGI evolved from CGI. The main disadvantage of the traditional CGI interface is poor performance: each time the HTTP server encounters a dynamic program, it has to start the script parser again to run it, and only then is the result returned to the HTTP server. FastCGI is like a resident (long-lived) CGI: once started, it keeps running and does not pay the cost of a fork for every request (the most criticized aspect of CGI’s fork-and-execute model). Because FastCGI programs do not need to create new processes constantly, they greatly reduce the load on the server and make applications more efficient. FastCGI is often claimed to be at least 5 times faster than CGI, although the actual gain depends on the workload. It also supports distributed deployment: FastCGI programs can run on hosts other than the web server and accept requests from other web servers.
FastCGI is a language-independent, scalable, open extension of CGI whose main idea is to keep the CGI interpreter process in memory and so obtain higher performance. As noted, repeated loading of the CGI interpreter is the main reason for CGI’s low performance; if the interpreter processes stay in memory and are scheduled by a FastCGI process manager, they can provide good performance, scalability, fail-over, and so on. The FastCGI approach uses a client/server (C/S) architecture that separates the HTTP server from the script parsing server, with one or more script parsing daemons started on the script parsing server. Whenever the HTTP server encounters a dynamic program, it hands the request directly to a FastCGI process for execution and then returns the result to the browser. The HTTP server can then concentrate on serving static requests and relaying the results of the dynamic script server to the client, which greatly improves the performance of the application as a whole.
The workflow of FastCGI:
- The FastCGI process manager (PHP-CGI, PHP-FPM, or spawn-fcgi) is loaded when the web server starts.
- The FastCGI process manager initializes itself, starts multiple CGI interpreter processes (visible as multiple php-cgi processes), and waits for connections from the web server.
- When a client request reaches the web server, the FastCGI process manager selects a CGI interpreter and connects to it. The web server sends the CGI environment variables and standard input to the FastCGI subprocess php-cgi.
- When the FastCGI subprocess has finished processing, it returns standard output and error information to the web server over the same connection. The request is complete when the subprocess closes the connection.
- The FastCGI subprocess then waits for and processes the next connection from the FastCGI process manager (running in the web server). In CGI mode, php-cgi would exit at this point.
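To make "the web server sends CGI environment variables and standard input" more concrete, here is a rough sketch of how those variables are framed on the wire, following the record layout in the FastCGI specification (an 8-byte header, then name-value pairs). The function names are my own, and the sketch assumes each record’s content fits in 65535 bytes; it is not how any particular server implements it.

```python
import struct

FCGI_VERSION_1 = 1
FCGI_PARAMS = 4  # record type carrying CGI environment variables
FCGI_STDIN = 5   # record type carrying the request body


def encode_length(n):
    """Lengths below 128 take one byte; longer ones take four with the high bit set."""
    if n < 128:
        return bytes([n])
    return struct.pack("!I", n | 0x80000000)


def encode_pair(name, value):
    """Encode one name-value pair: name length, value length, name, value."""
    return encode_length(len(name)) + encode_length(len(value)) + name + value


def make_record(record_type, request_id, content):
    """Prefix content with the 8-byte header: version, type, request id,
    content length, padding length, reserved."""
    header = struct.pack("!BBHHBB", FCGI_VERSION_1, record_type,
                         request_id, len(content), 0, 0)
    return header + content


def encode_params(request_id, environ):
    """Frame a dict of CGI variables as a PARAMS stream."""
    content = b"".join(
        encode_pair(key.encode(), value.encode()) for key, value in environ.items()
    )
    # An empty PARAMS record marks the end of the variable stream.
    return (make_record(FCGI_PARAMS, request_id, content)
            + make_record(FCGI_PARAMS, request_id, b""))
```

The request body travels the same way in FCGI_STDIN records, and the response comes back in records on the same connection, which is why one long-lived process can serve many requests.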
Features of FastCGI
- It breaks with traditional page processing. Traditionally, the program had to run on the same machine as the web server or application server. FastCGI removes this restriction: applications can be installed on any server in a server farm and communicate with the web server over TCP/IP, which suits both large distributed web clusters and efficient database control.
- Explicit request model. CGI does not assign programs an explicit role, whereas a FastCGI program is given one: responder, authorizer, or filter.
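The role is stated explicitly at the start of every request. As a hedged sketch based on the FastCGI specification (which names the three roles Responder, Authorizer, and Filter), the first record of a request could be built like this; begin_request and parse_role are hypothetical helper names.

```python
import struct

FCGI_VERSION_1 = 1
FCGI_BEGIN_REQUEST = 1  # record type that opens a request

# Role values defined by the FastCGI specification.
FCGI_RESPONDER = 1
FCGI_AUTHORIZER = 2
FCGI_FILTER = 3

FCGI_KEEP_CONN = 1  # flag: keep the connection open after this request

ROLE_NAMES = {
    FCGI_RESPONDER: "responder",
    FCGI_AUTHORIZER: "authorizer",
    FCGI_FILTER: "filter",
}


def begin_request(request_id, role, keep_conn=False):
    """Build a BEGIN_REQUEST record: 8-byte header plus 8-byte body."""
    # Body: 2-byte role, 1-byte flags, 5 reserved bytes.
    body = struct.pack("!HB5x", role, FCGI_KEEP_CONN if keep_conn else 0)
    header = struct.pack("!BBHHBB", FCGI_VERSION_1, FCGI_BEGIN_REQUEST,
                         request_id, len(body), 0, 0)
    return header + body


def parse_role(record):
    """Read the role back out of a BEGIN_REQUEST record."""
    (role,) = struct.unpack_from("!H", record, 8)  # body starts after the header
    return ROLE_NAMES[role]
```

In practice almost every deployment uses the responder role, which behaves like a classic CGI program: it receives the request and produces the response.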
ISAPI
ISAPI (Internet Server Application Programming Interface) is a set of APIs provided by Microsoft for web services. It can do everything CGI does and extends it, for example by providing a filter API. ISAPI applications are mostly built as DLLs. A DLL is loaded when a user requests it and is not unloaded immediately after the request is processed; it stays resident in memory, waiting to handle further requests. In addition, the ISAPI DLL runs in the same process as the web server, which is significantly more efficient than CGI. Because it is a Microsoft-only technology, it runs only on Windows.
ISAPI server extensions are an alternative to Common Gateway Interface (CGI) applications on an Internet server. Unlike a CGI application, an ISAPI extension (ISA) runs in the same address space as the HTTP server and has access to all resources available to the HTTP server. ISAs have lower overhead than CGI applications because they do not require additional processes and do not communicate across process boundaries, which is very time consuming. Both extension and filter DLLs may be unloaded if memory is needed by other processes. ISAPI allows multiple commands in one DLL, implemented as member functions of the CHttpServer object in the DLL. CGI requires a separate name for each task and a URL mapping to a separate executable file. Each new CGI request starts a new process, and each different request lives in its own executable that is loaded and unloaded per request, so the overhead is higher than that of an ISA.
PHP-CGI
PHP-CGI is the FastCGI manager that comes with PHP. Its shortcomings:
- After php.ini is changed, php-cgi must be restarted for the new configuration to take effect; a smooth (graceful) restart is not possible.
- If the php-cgi process is killed directly, PHP stops working. (PHP-FPM and Spawn-FCGI do not have this problem: the daemon smoothly spawns new child processes.)
Spawn-FCGI
Spawn-FCGI is a generic FastCGI process manager that originated as part of lighttpd. Many people use lighttpd’s Spawn-FCGI to manage FastCGI processes, but it has quite a few drawbacks. PHP-FPM alleviates some of them, but it has the drawback of requiring PHP to be recompiled, which may carry some risk for environments that are already running; from PHP 5.3.3 onward, PHP-FPM can be used directly. Spawn-FCGI has very little code: about 630 lines of C in total, and the last commit was 5 years ago. Code homepage: https://github.com/lighttpd/spawn-fcgi
An analysis of the Spawn-FCGI code:
- spawn-fcgi first creates the server socket in three steps: socket, bind, listen. Call this socket fcgi_fd.
- It uses dup2 to duplicate fcgi_fd onto FCGI_LISTENSOCK_FILENO (whose value is 0, the file descriptor that the FastCGI protocol designates as the listening socket).
- It calls execl, which replaces the current process image with a new one. (The process image is the program code and data that the process holds in memory at run time.)
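The same three steps can be sketched in Python for Unix-like systems. This is an illustration of the technique, not a port of spawn-fcgi: the function names are mine, the fork before the exec is my assumption about where the child process comes from, and os.execvp stands in for the C execl call.

```python
import os
import socket

FCGI_LISTENSOCK_FILENO = 0  # fd on which a FastCGI program expects its listening socket


def create_listen_socket(host="127.0.0.1", port=0):
    """socket, bind, listen: the three steps that produce fcgi_fd."""
    fcgi_sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    fcgi_sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    fcgi_sock.bind((host, port))  # port 0 lets the OS pick a free port
    fcgi_sock.listen(16)
    return fcgi_sock


def spawn(fcgi_sock, argv):
    """Fork a child, move the listening socket to fd 0 in it, then exec argv."""
    pid = os.fork()
    if pid == 0:
        try:
            # Duplicate fcgi_fd onto fd 0 and make sure it survives the exec.
            os.dup2(fcgi_sock.fileno(), FCGI_LISTENSOCK_FILENO)
            os.set_inheritable(FCGI_LISTENSOCK_FILENO, True)
            os.execvp(argv[0], argv)  # replaces the process image; does not return
        finally:
            os._exit(127)  # only reached if dup2 or exec failed
    return pid
```

After the exec, the spawned program simply calls accept on file descriptor 0; the spawner takes no further part in the network I/O.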
Evidently, Spawn-FCGI also follows a pre-fork model, but it is written in an old-fashioned style of C, full of obscure Unix programming tricks.
Spawn-FCGI does only one thing:
- In 2009 I used spawn-fcgi to deploy php-cgi. After running for a while, all the processes would hang, and the only remedy was to restart spawn-fcgi periodically from crontab.
- It is not responsible for network I/O in the child process. It just puts the socket in the specified place; everything after that is handled by the spawned program.
Spawn-FCGI is a very early program. There is also code from 1996 written in the same style as spawn-fcgi: http://www.fastcgi.com/om_archive/kit/cgi-fcgi/cgi-fcgi.c
PHP-FPM
PHP-FPM is a FastCGI process manager for PHP only, available for download at http://php-fpm.org/download. PHP-FPM is actually a patch to the PHP source code, designed to integrate FastCGI process management into the PHP package. For PHP versions before 5.3.3, it must be patched into your PHP source code and is available after PHP is compiled and installed. FPM (FastCGI Process Manager) replaces most of the additional functionality of PHP-CGI and is very useful for high-load websites.
Its features include:
- Advanced process management with smooth (graceful) stop/start.
- The ability to start workers with different uid/gid/chroot/environment settings, listening on different ports and using different php.ini files (which can replace the safe_mode setting).
- stdout and stderr logging.
- Emergency restart in case of accidental opcode cache corruption.
- Accelerated file upload support.
- “Slow log”: logs scripts that run abnormally slowly (not only their file names but also their PHP backtraces, which are read using ptrace or similar tools).
- fastcgi_finish_request(): a special function that finishes the request and flushes all data while continuing to do time-consuming work in the background (video conversion, statistics processing, etc.).
- Dynamic/static child process spawning.
- Basic SAPI runtime status information (similar to Apache’s mod_status).
- php.ini-based configuration files.
WSGI
The Web Server Gateway Interface (WSGI) is a simple, generic interface between a web server and a web application or framework, defined for the Python language. Since WSGI was developed, similar interfaces have appeared in many other languages. WSGI is designed as a low-level interface between a web server and a web application or framework, to promote the portability of web applications across servers. It was designed on the basis of the existing CGI standard.
WSGI has two sides: the “server” or “gateway” side, and the “application” or “framework” side. To process a WSGI request, the server provides the application with environment information and a callback function. When the application has finished processing the request, it passes the results back to the server, using that callback function for the response status and headers. So-called WSGI middleware implements both sides of the API and therefore mediates between the WSGI server and the WSGI application: from the server’s point of view the middleware acts as the application, and from the application’s point of view it acts as the server. A middleware component can perform functions such as the following:
- Rewriting environment variables and routing requests to different application objects based on the target URL.
- Allowing multiple applications or frameworks to run side by side in a single process.
- Load balancing and remote processing, by forwarding requests and responses over the network.
- Content post-processing, such as applying XSLT stylesheets.
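As a small, hedged illustration of the last item, here is a post-processing middleware that rewrites the response body on its way out. UppercaseMiddleware and demo_app are hypothetical names, and upper-casing stands in for a real transformation such as an XSLT pass; bytes.upper() only touches ASCII letters, so the body length (and any Content-Length header) stays valid.

```python
def demo_app(environ, start_response):
    """A stand-in application that returns a short plain-text body."""
    body = b"hello from the app"
    start_response("200 OK", [
        ("Content-Type", "text/plain"),
        ("Content-Length", str(len(body))),
    ])
    return [body]


class UppercaseMiddleware:
    """Wraps a WSGI app and upper-cases every chunk of its response body."""

    def __init__(self, app):
        self.app = app

    def __call__(self, environ, start_response):
        # To the wrapped app we look like a server: we call it with environ
        # and a start_response callable. To the real server we look like an
        # application: we are a callable that produces the body chunks.
        result = self.app(environ, start_response)
        try:
            for chunk in result:
                yield chunk.upper()
        finally:
            # Honour the optional close() method of the app's iterable.
            if hasattr(result, "close"):
                result.close()


# The wrapped object is itself a WSGI application.
application = UppercaseMiddleware(demo_app)
```

Because the wrapper is again a plain WSGI callable, any number of such layers can be stacked without the server or the innermost application noticing.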
Before WSGI, choosing a web framework was a problem for Python beginners because, in general, the choice of framework limited the choice of usable web servers, and vice versa. Python applications of that time were usually written for one of CGI, FastCGI, mod_python, or even a custom API of a particular web server. WSGI has no official implementation, because WSGI is more like a protocol. Loosely speaking, WSGI plays for Python the role that FastCGI typically plays for PHP: the standard interface between the web server and the application code.
WSGI divides web components into three categories: web server, web middleware, and web application. The basic WSGI processing model is: WSGI Server -> (WSGI Middleware)* -> WSGI Application.
WSGI Server/gateway
A WSGI server can be understood as a web server that conforms to the WSGI specification: it receives requests, packages a set of environment variables, calls the registered WSGI application as the specification prescribes, and finally returns the response to the client. It is hard to explain in the abstract what a WSGI server is and does, so the most intuitive way is to look at the code of one. Take wsgiref from the Python standard library as an example: it is a simple WSGI server implemented according to the WSGI specification, and its code is not complicated. It works as follows:
- The server creates a socket, listens on a port, and waits for clients to connect.
- When a request comes in, the server puts the client information into the environment dictionary environ and calls the bound handler to process the request.
- The handler parses the HTTP request and puts the request information (method, path, and so on) into environ.
- The WSGI handler then adds server-side information, so that environ ends up holding the server information, the client information, and the information for this request.
- The WSGI handler calls the registered WSGI app, passing it environ and the callback function.
- The WSGI app passes the response status, headers, and body back to the WSGI handler.
- Finally, the handler sends the response back to the client over the socket.
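The handler steps above can be condensed into one function. This is a simplified sketch and not the wsgiref code: handle_request is a hypothetical name, the socket handling is left out (raw request bytes in, raw response bytes out), and the server name and port placed in environ are made-up values.

```python
import io
import sys


def handle_request(app, raw_request):
    """Parse one HTTP request, call the WSGI app, and build the raw response."""
    head, _, body = raw_request.partition(b"\r\n\r\n")
    request_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
    method, target, protocol = request_line.split(" ", 2)
    path, _, query = target.partition("?")

    # Request, server, and client information all end up in environ.
    environ = {
        "REQUEST_METHOD": method,
        "SCRIPT_NAME": "",
        "PATH_INFO": path,
        "QUERY_STRING": query,
        "SERVER_NAME": "localhost",  # assumed value for the sketch
        "SERVER_PORT": "8000",       # assumed value for the sketch
        "SERVER_PROTOCOL": protocol,
        "wsgi.version": (1, 0),
        "wsgi.url_scheme": "http",
        "wsgi.input": io.BytesIO(body),
        "wsgi.errors": sys.stderr,
        "wsgi.multithread": False,
        "wsgi.multiprocess": False,
        "wsgi.run_once": False,
    }
    for line in header_lines:
        name, _, value = line.partition(":")
        key = name.strip().upper().replace("-", "_")
        if key not in ("CONTENT_TYPE", "CONTENT_LENGTH"):
            key = "HTTP_" + key
        environ[key] = value.strip()

    chunks = []
    captured = {}

    def start_response(status, headers, exc_info=None):
        # The callback through which the app reports status and headers.
        captured["status"] = status
        captured["headers"] = list(headers)
        return chunks.append  # the legacy write() callable required by the spec

    result = app(environ, start_response)
    try:
        chunks.extend(result)  # the returned iterable is the response body
    finally:
        if hasattr(result, "close"):
            result.close()

    head_lines = ["HTTP/1.1 " + captured["status"]]
    head_lines += ["{}: {}".format(k, v) for k, v in captured["headers"]]
    return "\r\n".join(head_lines).encode("iso-8859-1") + b"\r\n\r\n" + b"".join(chunks)
```

A real server would read raw_request from the accepted socket and write the returned bytes back to it; everything in between is what the list describes.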
WSGI Application
A WSGI application is an ordinary callable object that the WSGI server calls when a request comes in. It receives two arguments: environ and start_response. start_response is a callback function that the application calls to return the response status and headers to the WSGI server. The application then returns an iterable, which yields the response body.
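A minimal application, served by wsgiref from the standard library, might look like this. The names application and serve, the greeting text, and port 8000 are arbitrary choices for the sketch.

```python
from wsgiref.simple_server import make_server


def application(environ, start_response):
    """The whole WSGI contract: take environ, call start_response, return the body."""
    body = "Hello from {}\n".format(environ.get("PATH_INFO", "/")).encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]  # an iterable of bytes objects


def serve(port=8000):
    """Run the app on the reference server until interrupted."""
    with make_server("127.0.0.1", port, application) as httpd:
        httpd.serve_forever()
```

Calling serve() and opening http://127.0.0.1:8000/ in a browser exercises the application; the same callable can be handed unchanged to any other WSGI server.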
WSGI MiddleWare
Some functionality sits between the server program and the application. For example, once the server has obtained the URL requested by the client, different URLs need to be handled by different functions. This is called URL routing, and it can be placed between the two; that intermediate layer is middleware. Middleware is transparent to both the server program and the application: the server program thinks it is the application, and the application thinks it is the server. Middleware therefore has to present itself as a server to the application it receives and calls, and as an application to the server program it is passed to.
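URL routing as middleware could be sketched as follows. make_router, the example applications, and the path prefixes are all hypothetical; the sketch only shows the double role: the router is called like an application and calls the real applications like a server would.

```python
def make_router(routes, default_app):
    """Return a WSGI callable that dispatches on the path prefix.

    routes is a list of (prefix, app) pairs, checked in order.
    """
    def router(environ, start_response):
        path = environ.get("PATH_INFO", "")
        for prefix, app in routes:
            if path == prefix or path.startswith(prefix + "/"):
                # Move the matched prefix from PATH_INFO to SCRIPT_NAME so the
                # target app sees paths relative to its own mount point.
                environ = dict(environ)
                environ["SCRIPT_NAME"] = environ.get("SCRIPT_NAME", "") + prefix
                environ["PATH_INFO"] = path[len(prefix):]
                return app(environ, start_response)
        return default_app(environ, start_response)
    return router


def blog_app(environ, start_response):
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [("blog:" + environ["PATH_INFO"]).encode("utf-8")]


def not_found(environ, start_response):
    start_response("404 Not Found", [("Content-Type", "text/plain")])
    return [b"not found"]


application = make_router([("/blog", blog_app)], not_found)
```

A request for /blog/post/1 reaches blog_app with PATH_INFO set to /post/1, while any other path falls through to not_found.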
In fact, the server program, the middleware, and the application all live on the server side and together provide a service to the client. They are abstracted into separate layers to control complexity, so that each layer stays simple and does its own job.
uWSGI
The uWSGI project aims to develop a complete solution for deploying distributed clusters of web applications. uWSGI is primarily oriented towards the web and its standard services, and has been used successfully with many different languages. Thanks to its extensible architecture, uWSGI can be extended to support more platforms and languages. Currently, you can write plugins in C, C++ and Objective-C. The “WSGI” in the project name is a tribute to the Python web standard of the same name, as WSGI was the first plugin developed for the project.
uWSGI is a web server that implements the WSGI protocol, the uwsgi protocol, HTTP, and others. Note that uWSGI (the server) and uwsgi (the protocol) are different things. For communication with the front-end web server, uWSGI uses neither WSGI nor FastCGI but its own uwsgi protocol, which is native to the uWSGI server and defines the type of information being transmitted. The first 4 bytes of every uwsgi packet describe the type of information that follows. The uwsgi protocol is a different thing from WSGI. It is said to be about 10 times faster than the FastCGI protocol.
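Those 4 bytes can be sketched as follows, based on my reading of the uwsgi protocol documentation: one byte modifier1, a 16-bit little-endian data size, and one byte modifier2, followed (for a request) by length-prefixed key/value pairs. The function names are assumptions, and the sketch ignores the 65535-byte limit of a single packet.

```python
import struct


def encode_uwsgi_vars(variables, modifier1=0, modifier2=0):
    """Build a uwsgi packet: 4-byte header plus length-prefixed key/value pairs.

    modifier1 == 0 marks a standard WSGI request.
    """
    body = b""
    for key, value in variables.items():
        k, v = key.encode(), value.encode()
        # Each key and each value is preceded by its 16-bit little-endian size.
        body += struct.pack("<H", len(k)) + k + struct.pack("<H", len(v)) + v
    header = struct.pack("<BHB", modifier1, len(body), modifier2)
    return header + body


def decode_uwsgi_header(packet):
    """Return (modifier1, datasize, modifier2) from the first 4 bytes."""
    return struct.unpack_from("<BHB", packet)
```

Compared with the 8-byte record header and separate streams of FastCGI, this framing is deliberately minimal, which is part of why the protocol is cheap to parse.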
The main features of uWSGI are as follows:
- Very high performance.
- Low memory consumption (reportedly about half that of apache2’s mod_wsgi).
- Multi-app management.
- Extensive logging capabilities (which can be used to analyze app performance and bottlenecks).
- Highly customizable (memory size limits, restart after serving a certain number of requests, etc.).
Further reading: Java Servlet, Sinatra, Rack.