PostgreSQL Foreign Data Wrappers (hereinafter referred to as FDW) is a very useful feature in real database usage scenarios. PostgreSQL’s FDW is similar to Oracle’s dblink and DB2’s Federation, which allows you to establish connections between local and external databases so that you can manipulate external data as if it were local data.
FDW What’s the point
- Data sharding uses FDW to distribute data across multiple databases to achieve data sharding (e.g. pg_shardman plug-in, which uses postgres_fdw and pg_pathman plug-ins to achieve data sharding).
- Data synchronization uses FDW to establish a connection between local and external databases to synchronize external data to local at regular intervals.
- Data Migration Use FDW to establish a connection between local database and external database to perform data migration.
- ETL (Extract-Transform-Load) uses FDW to extract data from different types of databases into one data warehouse for easy and uniform access.
PostgreSQL FDW Development Overview
In 2003, SQL/MED (SQL Management of External Data) was added to the SQL standard, which provides specifications for external data management. In 2011, PostgreSQL 9.1 was released to support external data reads, and in 2013, PostgreSQL 9.3 was released to support external data writes.
Currently, PostgreSQL (PostgreSQL 14 at the time of writing) provides several extensions to support operations on various types of external databases or files (e.g., postgres_fdw for connecting to external PostgreSQL databases, oracle_fdw for connecting to external Oracle databases, mysql _fdw supports connecting to external MySQL databases, jdbc_fdw supports connecting to external relational databases using the JDBC protocol, file_fdw supports connecting to external files of a specific format, etc.).
This article focuses only on postgres_fdw, how the PostgreSQL database connects to the external PostgreSQL database and how it manages the external data.
Using postgres_fdw
To use postgres_fdw to access a remote database, there are several steps.
- Install the postgres_fdw extension
- Create an external server
- Create a user map
- Create an external table or import an external schema
This article uses a local PostgreSQL database to emulate both the remote and local databases. Before starting the formal steps, you need to do a little bit of preparation in advance.
-
Check the PostgreSQL version
-
Creating a user in a remote PostgreSQL database
Use superuser to execute the following statement in the remote PostgreSQL database to create a normal user
fdw_user
to be used later when the local database establishes an FDW connection.1
CREATE USER fdw_user WITH ENCRYPTED PASSWORD 'secret';
-
Create a table in a remote PostgreSQL database
Create the weather table
weather
for testing in the remote database, insert the test data, and grant the userfdw_user
permission to add, delete, and check against the table.1 2 3 4 5 6 7 8 9 10 11 12
CREATE TABLE weather ( city varchar(80), -- city name (城市名) temp_low int, -- low temperature (最低温度) temp_high int, -- high temperature (最高温度) prcp real, -- precipitation (降水量) date date -- date (日期) ); INSERT INTO weather (city, temp_low, temp_high, prcp, date) VALUES ('Beijing', 18, 32, 0.25, '2021-05-19'), ('Beijing', 20, 30, 0.0, '2021-05-20'), ('Dalian', 16, 24, 0.0, '2021-05-21');
1
GRANT SELECT,INSERT,UPDATE,DELETE ON TABLE weather TO fdw_user;
Connect locally with user
fdw_user
to the remote database (this article is special in that it uses both local and remote databases, so the remote host is also localhost) and verify the authorized privileges.1 2 3 4 5 6 7 8 9
$ psql -h localhost -U fdw_user postgres postgres=> SELECT * FROM weather; city | temp_low | temp_high | prcp | date ---------+----------+-----------+------+------------ Beijing | 18 | 32 | 0.25 | 2021-05-19 Beijing | 20 | 30 | 0 | 2021-05-20 Dalian | 16 | 24 | 0 | 2021-05-21 (3 rows)
Note: For a real remote database, you need to add a record to the pg_hba.conf configuration file of the remote database to firewall the access IP in order to establish a connection locally.
-
Creating a user in the local PostgreSQL database
Use superuser to create the normal user
local_user
in the local PostgreSQL database by executing the following statement.1
CREATE USER local_user WITH ENCRYPTED PASSWORD 'secret';
All the preparations are done, now you can start the formal steps in the local database using superuser.
Installing the postgres_fdw extension
Use the CREATE EXTENSION
statement to install the postgres_fdw
extension.
|
|
Grant access to postgres_fdw
for user local_user
.
|
|
Creating an external server
To create an external server using the CREATE SERVER
statement, specify the host, port and database name of the remote database.
Grant access to the external server foreign_server
for the user local_user
.
|
|
Creating a User Mapping
Use the CREATE USER MAPPING
statement to create a mapping of remote users to local users, you need to provide the user name and password of the remote user.
Creating an External Table or Importing an External Schema
Use the CREATE FOREIGN TABLE
statement to create a remote table. Note that the column types need to match the actual remote table and the column names should ideally be the same, otherwise you will need to specify the column names in the remote table separately for each column using the column_name
parameter.
In most cases, you can just use the IMPORT FOREIGN SCHEMA
statement to directly import all the tables in the external schema into the local schema.
Note: Since no user mapping is specified for super_user, the following statement needs to be executed with user
local_user
, otherwise it will reportERROR: user mapping not found for "super_user"
error.
Authorize CRUD permissions for all tables (including external tables) in public mode for local_user
.
|
|
This allows you to use the user local_user
to connect to the local database and operate on external tables.
|
|
At this point, we have basically mastered the use of postgres_fdw
. Next, we will look at the system tables and functions related to FDW, and finally look at FDW transaction management and performance optimization to get a deeper understanding of FDW.
Some important parameters when setting up postgres_fdw
-
updatable
This option sets whether the external table can be updated, i.e. whether postgres_fdw allows external tables to be modified using the
INSERT
,UPDATE
andDELETE
commands, the default istrue
. It can be specified on an external table or on an external server, and the one specified on the table will override the one specified on the server. The specific statements to set or update this parameter are as follows.1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
-- 创建外部服务器时指定 CREATE SERVER foreign_server FOREIGN DATA WRAPPER postgres_fdw OPTIONS (..., updatable 'false', ...); -- 创建外部表时指定 CREATE FOREIGN TABLE foreign_weather ( ... ) SERVER foreign_server OPTIONS (schema_name ..., table_name ..., updatable 'false', ...); -- 更新外部服务器参数选项 ALTER SERVER foreign_server OPTIONS (updatable 'false'); -- 更新外部表参数选项 ALTER FOREIGN TABLE foreign_weather OPTIONS(updatable 'false');
Of course, if the remote table is not actually updatable, then an error will occur anyway. Using this option mainly allows to throw errors locally without having to query the remote server.
-
truncatable
This option sets whether external tables can be truncated, i.e. whether postgres_fdw allows external tables to be truncated using the
TRUNCATE
command, the default istrue
. This parameter can also be specified on the external table or on the external server, and specifying it on the table will override specifying it on the server.The specific statements to set or update this parameter are exactly the same as
updatable
above.Of course, if the remote table is not actually truncable, then an error will occur anyway. Again, the main point of using this option is that it allows errors to be thrown locally without having to query the remote server.
-
keep_connections
This option sets whether postgres_fdw will keep connections to the remote server in the local session for reuse, the default is
on
(if set tooff
, all connections to the external server will be dropped at the end of each transaction). It can only be specified on external servers. The specific statements to set or update this parameter are as follows.
FDW related system tables and functions
system tables
FDW-related system tables are as follows (for the _pg_*
table, only super_user has access to it)
|
|
Functions
-
postgres_fdw_get_connections()
Calling this function returns the external server name of all open connections established by postgres_fdw from the local session to the external server and whether the connection is valid.
Note: This function gets the current connection status of the local session to the external server, not the connection status of the local database to the external server. Therefore, remote table queries made by opening a separate Shell Tab will not be recorded by the current local session.
This article uses the default option (
on
) for thekeep_connections
parameter when creating the external server, so the connection is kept.As you can see below
psql
connects to the local database and after doing an external table query, the querypostgres_fdw_get_connections()
function will return a row. -
postgres_fdw_disconnect(server_name text)
Disconnects all connections from postgres_fdw from the local session (local session) to the given external server, based on the name passed in.
There can be multiple connections to the given server using different user mappings (when multiple users are used to access the external server, multiple user mappings are configured and postgres_fdw will create a connection for each user mapping). If the connection is being used in the current local transaction, it will not be disconnected and a warning message will be output. Returns true if at least one connection is disconnected, false otherwise, and an error is reported if no external server with the given name is found (
ERROR: server "..." does not exist
).Following the session, execute
SELECT postgres_fdw_disconnect('foreign_server')
and returntrue
; query thepostgres_fdw_get_connections()
function again to find no more connections. -
postgres_fdw_disconnect_all()
Disconnects postgres_fdw from the local session to the external server. Used in a similar way to
postgres_fdw_disconnect(server_name text)
and will not be repeated here.
FDW Transaction Management and Performance Optimization
Transaction Management
When querying a remote table, postgres_fdw opens a new transaction on the remote server if the transaction corresponding to the current local transaction is not already open. When the local transaction is committed or aborted, the remote transaction is also committed or aborted. Savepoints are also managed by creating the corresponding remote savepoints.
When the local transaction has a SERIALIZABLE isolation level, the remote transaction also uses that isolation level; otherwise, the REPEATABLE READ isolation level is used.
If a query performs multiple table scans on a remote server, this option ensures that it will get snapshot-consistent results for all scans. The result is that successive queries in a single transaction will see the same data from the remote server, even if other activities perform concurrent updates on the remote server. This behavior is expected if the local transaction uses the SERIALIZABLE or REPEATABLE READ isolation levels, but it may be surprising for local transactions with the READ COMMITTED isolation level. Future versions of PostgreSQL may modify these rules.
Performance Optimization
postgres_fdw will be more intelligent in determining whether a query statement (the query statements to be tested include SELECT
, UPDATE
, DELETE
statements, statements involving operators, functions, joins, filter conditions and aggregate functions, etc.) should be moved down to the remote server for execution.
Ideally, the tables involved are on the remote server, and the operators, functions, etc. are all built-in types, so that postgres_fdw sends the entire query to the remote server for computation, and then just takes the results. In most cases, postgres_fdw needs to fetch the necessary data locally to perform operations such as joins, filters, and aggregate function processing. That is, postgres_fdw optimizes the queries sent to the remote server (optimizing the WHERE clause and not fetching unneeded columns) to reduce the data transfer from the remote server.
Let’s look at two examples.
-
Pure Remote Table Query
Original query statement:
1
SELECT * FROM foreign_weather;
Use
EXPLAIN VERBOSE
to view the actual query (Remote SQL) sent to the remote server as :1
SELECT city, temp_low, temp_high, prcp, date FROM public.weather
1 2 3 4 5 6 7 8 9
$ psql -U local_user postgres postgres=> EXPLAIN VERBOSE SELECT * FROM foreign_weather; QUERY PLAN ---------------------------------------------------------------------------------- Foreign Scan on public.foreign_weather (cost=100.00..121.25 rows=375 width=194) Output: city, temp_low, temp_high, prcp, date Remote SQL: SELECT city, temp_low, temp_high, prcp, date FROM public.weather (3 rows)
-
Remote table and local table join query
Create a new local table
cities
and insert test data.For queries.
The SQL sent by postgres_fdw to the remote server is
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
$ psql -U local_user postgres postgres=> EXPLAIN VERBOSE SELECT * FROM cities c, foreign_weather w WHERE c.name = w.city; QUERY PLAN ------------------------------------------------------------------------------------------ Hash Join (cost=118.10..163.91 rows=675 width=388) Output: c.name, c.location, w.city, w.temp_low, w.temp_high, w.prcp, w.date Hash Cond: ((w.city)::text = (c.name)::text) -> Foreign Scan on public.foreign_weather w (cost=100.00..121.25 rows=375 width=194) Output: w.city, w.temp_low, w.temp_high, w.prcp, w.date Remote SQL: SELECT city, temp_low, temp_high, prcp, date FROM public.weather -> Hash (cost=13.60..13.60 rows=360 width=194) Output: c.name, c.location -> Seq Scan on public.cities c (cost=0.00..13.60 rows=360 width=194) Output: c.name, c.location (10 rows)
That is, postgres_fdw will fetch all the data of
foreign_weather
locally and join it with the tablecities
.And for the query.
The SQL sent by postgres_fdw to the remote server is
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
$ psql -U local_user postgres postgres=> EXPLAIN VERBOSE SELECT c.name, max(w.temp_high) FROM cities c, foreign_weather w WHERE c.name = w.city AND w.temp_high <= 30 group by c.name; QUERY PLAN ------------------------------------------------------------------------------------------------------ HashAggregate (cost=143.17..145.17 rows=200 width=182) Output: c.name, max(w.temp_high) Group Key: c.name -> Hash Join (cost=119.25..141.98 rows=238 width=182) Output: c.name, w.temp_high Hash Cond: ((c.name)::text = (w.city)::text) -> Seq Scan on public.cities c (cost=0.00..13.60 rows=360 width=178) Output: c.name, c.location -> Hash (cost=117.60..117.60 rows=132 width=182) Output: w.temp_high, w.city -> Foreign Scan on public.foreign_weather w (cost=100.00..117.60 rows=132 width=182) Output: w.temp_high, w.city Remote SQL: SELECT city, temp_high FROM public.weather WHERE ((temp_high <= 30)) (13 rows)
That is, postgres_fdw optimizes the
WHERE
conditions sent to the remote server, gets only the data it needs from the remote tableforeign_weather
, and then computes it locally with the tablecities
for join, filtering, and aggregate function processing.
In summary, we have a more detailed understanding of the basic concepts of PostgreSQL external data wrappers and how to use postgres_fdw.