1. Installation
GoReplay is written in Go and ships as a single executable, which can be downloaded from the official Releases page and placed in a directory on your PATH.
2. Basic Use
On the command line, you specify an input side and an output side, and GoReplay copies the traffic from the input to the output.
2.1. Real-time traffic replication
The GoReplay input can be a TCP address, in which case GoReplay copies the traffic arriving on that port to the output. The following example copies the traffic on 127.0.0.1:8000 and prints it to the console.
First start an HTTP server. Here we simply use Python's built-in HTTP server (python3 -m http.server 8000).
Then have gor listen on the same port, using --output-stdout to send the output to the console (gor --input-raw :8000 --output-stdout).
Now, when we access the Python HTTP server with curl, gor copies each HTTP request and prints it to the console.
If we instead set the output to another HTTP server with the --output-http option, gor forwards a copy of each request to that server in real time.
2.2. Traffic capture and replay
2.2.1. Basic use
GoReplay can save traffic to a file if you set the output to a file. It can then read the saved file and replay the traffic against a specified HTTP server.
First, save the requests to a file with the --output-file option.
Then read the traffic back with the --input-file option and send it to the target server with the --output-http option.
2.2.2. Extended Options
When saving traffic to a file, GoReplay by default writes it in chunks, each with its own file name (e.g. test_0.gor). To write all chunks to a single file, set --output-file-append to true.
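The naming scheme can be sketched as follows. The pattern is inferred from the test_0.gor example, chunkFileName is a hypothetical helper, and when GoReplay actually starts a new chunk is not covered here.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// chunkFileName (hypothetical helper) returns the file that chunk i of an
// --output-file path is written to. Without append mode each chunk gets an
// index before the extension (test.gor -> test_0.gor, test_1.gor, ...);
// in append mode every chunk goes to the original path.
func chunkFileName(path string, i int, appendMode bool) string {
	if appendMode {
		return path
	}
	ext := filepath.Ext(path)
	return fmt.Sprintf("%s_%d%s", strings.TrimSuffix(path, ext), i, ext)
}

func main() {
	for i := 0; i < 3; i++ {
		fmt.Println(chunkFileName("test.gor", i, false)) // test_0.gor, test_1.gor, test_2.gor
	}
	fmt.Println(chunkFileName("test.gor", 5, true)) // test.gor
}
```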
GoReplay output file names also support date placeholders. For example, --output-file %Y%m%d.gor generates a file name like 20210801.gor. The available placeholders are:
%Y: year including the century (at least 4 digits)
%m: month of the year (01..12)
%d: day of the month (01..31)
%H: hour of the day, 24-hour clock (00..23)
%M: minute of the hour (00..59)
%S: second of the minute (00..60)
With many requests, the saved file can grow large. If the file name ends in .gz, GoReplay automatically compresses it with gzip.
To replay several files together, simply specify all of them. GoReplay keeps the requests in their original order during replay.
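Keeping several files in order is essentially a k-way merge by capture time. Here is a minimal Go sketch, assuming each file has already been decoded into records sorted by timestamp. The record type and mergeByTime are hypothetical, and GoReplay's file format is not modeled.

```go
package main

import (
	"container/heap"
	"fmt"
)

// record is a hypothetical captured request tagged with its capture time.
type record struct {
	ts      int64 // capture timestamp, e.g. Unix nanoseconds
	payload string
}

// cursor points at the next unread record of one file.
type cursor struct {
	recs []record
	pos  int
}

// minHeap orders cursors by the timestamp of their next record.
type minHeap []*cursor

func (h minHeap) Len() int            { return len(h) }
func (h minHeap) Less(i, j int) bool  { return h[i].recs[h[i].pos].ts < h[j].recs[h[j].pos].ts }
func (h minHeap) Swap(i, j int)       { h[i], h[j] = h[j], h[i] }
func (h *minHeap) Push(x interface{}) { *h = append(*h, x.(*cursor)) }
func (h *minHeap) Pop() interface{} {
	old := *h
	c := old[len(old)-1]
	*h = old[:len(old)-1]
	return c
}

// mergeByTime (hypothetical helper) merges per-file record lists, each already
// in capture order, into a single stream ordered by timestamp.
func mergeByTime(files ...[]record) []record {
	h := &minHeap{}
	for _, f := range files {
		if len(f) > 0 {
			*h = append(*h, &cursor{recs: f})
		}
	}
	heap.Init(h)
	var out []record
	for h.Len() > 0 {
		c := (*h)[0] // file whose next record is earliest
		out = append(out, c.recs[c.pos])
		c.pos++
		if c.pos == len(c.recs) {
			heap.Pop(h) // this file is exhausted
		} else {
			heap.Fix(h, 0) // its next record may be later; restore heap order
		}
	}
	return out
}

func main() {
	a := []record{{1, "GET /a1"}, {4, "GET /a2"}}
	b := []record{{2, "GET /b1"}, {3, "GET /b2"}}
	for _, r := range mergeByTime(a, b) {
		fmt.Println(r.ts, r.payload)
	}
}
```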
When replaying from a file, GoReplay can also be used for stress testing. Appending a rate to the file name, as in test.gor|200%, replays the requests at twice the recorded rate.
2.3. Data Loss and Buffers
GoReplay intercepts traffic at a fairly low level: it captures TCP packets as they arrive at the kernel. Packets can arrive out of order, so the TCP stream has to be reassembled before upper-layer applications can read the data in the correct order. While that happens, packets are held in a buffer. By default this buffer is 2 MB on Linux and 1 MB on Windows. When a single HTTP request is larger than the buffer, GoReplay cannot capture it completely, and it needs the complete request to save it to a file or replay it. The result can be lost or corrupted requests.
To address this, GoReplay provides the --input-raw-buffer-size option for adjusting the buffer size. For example, --input-raw-buffer-size 10485760 sets the buffer to 10 MB.
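A toy reassembly buffer shows why an oversized request gets lost. In the sketch below, early segments wait in a bounded buffer, and a segment that does not fit is dropped, leaving a hole in the stream. The types are hypothetical. The sketch ignores sequence-number wraparound, retransmissions, and overlapping segments, and it is not how the kernel or GoReplay implement reassembly. For reference, 10485760 bytes is 10 × 1024 × 1024, i.e. 10 MB.

```go
package main

import (
	"errors"
	"fmt"
)

var errBufferFull = errors.New("reassembly buffer full: segment dropped")

// reassembler (hypothetical) rebuilds an in-order byte stream from segments
// that may arrive out of order, holding early segments in a bounded buffer.
// Simplified: no wraparound, no retransmissions, no overlapping segments.
type reassembler struct {
	next     uint32            // sequence number expected next
	pending  map[uint32][]byte // early segments keyed by sequence number
	buffered int               // bytes currently held in pending
	limit    int               // buffer capacity in bytes
	stream   []byte            // bytes delivered in order so far
}

func newReassembler(isn uint32, limit int) *reassembler {
	return &reassembler{next: isn, pending: map[uint32][]byte{}, limit: limit}
}

// add accepts one segment. An early segment must fit in the remaining buffer
// space; otherwise it is dropped and the stream has a gap.
func (r *reassembler) add(seq uint32, data []byte) error {
	if seq != r.next {
		if r.buffered+len(data) > r.limit {
			return errBufferFull
		}
		r.pending[seq] = data
		r.buffered += len(data)
		return nil
	}
	r.stream = append(r.stream, data...)
	r.next += uint32(len(data))
	// Drain buffered segments that are now contiguous with the stream.
	for {
		seg, ok := r.pending[r.next]
		if !ok {
			return nil
		}
		delete(r.pending, r.next)
		r.buffered -= len(seg)
		r.stream = append(r.stream, seg...)
		r.next += uint32(len(seg))
	}
}

func main() {
	r := newReassembler(1000, 16)
	_ = r.add(1005, []byte("index")) // arrives early: held in the buffer
	_ = r.add(1000, []byte("GET /")) // fills the gap: both segments delivered
	fmt.Println(string(r.stream))    // GET /index

	small := newReassembler(0, 4)
	fmt.Println(small.add(5, []byte("index"))) // too big for the buffer: dropped
}
```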
2.4. Rate limiting
For debugging, we sometimes capture production traffic and mirror it to a test environment for replay. Production traffic volume is high, and the test environment may not need the full request rate, so GoReplay can rate-limit the mirrored requests.
Absolute limit: with a parameter of the form --output-http "ADDRESS|N", GoReplay ensures that the mirrored traffic does not exceed N requests per second.
Percentage limit: with a parameter of the form --output-http "ADDRESS|N%", GoReplay keeps the mirrored traffic at N% of the total traffic.
2.5. Request Filtering
Sometimes we only want to redirect specific production traffic to the test environment, or to keep some traffic from being redirected there. GoReplay's filtering options handle this:
--http-allow-header: HTTP headers allowed to be replayed (regular expressions supported)
--http-allow-method: HTTP methods allowed to be replayed
--http-allow-url: URLs allowed to be replayed (regular expressions supported)
--http-disallow-header: HTTP headers that are not allowed to be replayed (regular expressions supported)
--http-disallow-url: HTTP URLs that are not allowed to be replayed (regular expressions supported)
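For example, the official documentation uses --http-allow-url /api to replay only requests whose URL matches /api, and --http-allow-header api-version:^1\.0\d to replay only requests whose api-version header matches 1.0x.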
2.6. Request Rewrite
The URL paths in the test environment may differ completely from those in production, so replaying production traffic directly could send requests to the wrong paths. For this reason GoReplay can rewrite URLs, set URL parameters, set request headers, and more.
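Rewrite URLs with the --http-rewrite-url option, which takes a regular expression and a replacement separated by a colon (for example, --http-rewrite-url /v1/user/([^\\/]+)/ping:/v2/user/$1/ping).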
Set URL parameters with the --http-set-param option (for example, --http-set-param api_key=1).
Set request headers with the --http-set-header option (for example, --http-set-header "User-Agent: Replayed by Gor").
The Host header is special: by default GoReplay sets it to the host of the target replay address. To turn off this behavior, use the --http-original-host option.
3. Other advanced configurations
3.1. Relay server
GoReplay can use a relay server to chain traffic. To do so, set the output of the capturing instance to TCP mode (--output-tcp) and the input of the relay server to TCP mode (--input-tcp).
If there are several relay servers, the --split-output option makes each capturing GoReplay instance distribute its traffic across them in round-robin fashion, instead of sending a full copy to every one.
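The following sketch shows round-robin distribution, as opposed to copying every request to each output. roundRobin and the relay addresses are hypothetical.

```go
package main

import "fmt"

// roundRobin (hypothetical) hands each payload to the next output in turn,
// so each relay receives a share of the traffic rather than a full copy.
type roundRobin struct {
	outputs []string // hypothetical relay addresses
	next    int
}

// pick returns the output for the next payload and advances the cursor.
func (rr *roundRobin) pick() string {
	o := rr.outputs[rr.next]
	rr.next = (rr.next + 1) % len(rr.outputs)
	return o
}

func main() {
	rr := &roundRobin{outputs: []string{"relay1:28020", "relay2:28020"}}
	for i := 0; i < 4; i++ {
		fmt.Println(rr.pick()) // relay1, relay2, relay1, relay2
	}
}
```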
3.2. Output to ElasticSearch
GoReplay also supports ElasticSearch as an output, via the --output-http-elasticsearch option.
There is no need to create an index before exporting to ES, because GoReplay creates it automatically.
3.3. Kafka Integration
In addition to exporting to ES, GoReplay can export traffic to Kafka (--output-kafka-host, --output-kafka-topic) and read it back from Kafka (--input-kafka-host, --input-kafka-topic).
Reference: https://mritd.com/2021/08/03/use-goreplay-to-record-your-live-traffic/