Recently, the project needs to implement version control of single data, so there is a table (foo) dedicated to storing key data and another table (bar) dedicated to storing Data data, how to get the latest version of all the key data in the bar table? Let’s take a look at the schema example.
|
|
The name + key in the foo
table is a unique value, so it will be a one-to-many state, and a key will correspond to multiple sets of data in the bar
. The timestamp
in the bar is used to handle version control, and each modification will create a new set of timestamp
data. The following section describes how to get the first few data for each key.
Using the UNION method
First of all, if you don’t have much data, you can use UNION to solve the problem as follows:
This approach is not expected to solve the version control problem, assuming that the same foo_id
data in each timestamp version of the number of strokes is not the same, so it will be sprayed wrong.
foo_id | timestamp | data |
---|---|---|
1 | 100 | test_01 |
1 | 100 | test_02 |
1 | 100 | test_03 |
1 | 101 | test_01 |
1 | 101 | test_02 |
1 | 101 | test_03 |
1 | 101 | test_04 |
If we only use limit method, we can’t get the data with timestamp 101 (because there are four strokes and we can only get 3 strokes by limit). So this solution is not suitable at all.
Using the rank() method
The rank() method can be supported by MySQL, SQLite or Postgres, and since my development mode is to use SQLite natively and Postgres for the production environment, I will take care of whether all three can work together while writing SQL (XD for open source projects), so let’s experiment with rank to The timeestamp is marked by rank.
You will get the following data.
foo_id | timestamp | data | rank |
---|---|---|---|
1 | 101 | test_01 | 1 |
1 | 101 | test_02 | 1 |
1 | 101 | test_03 | 1 |
1 | 101 | test_04 | 1 |
1 | 100 | test_01 | 2 |
1 | 100 | test_02 | 2 |
1 | 100 | test_03 | 2 |
This time we need to get the data when foo_id is 1, so we can solve the limit problem by using rank = 1
. Next, we need to handle how to get the latest version (timestamp) of each foo_id. Assume the following data:
foo_id | timestamp | data |
---|---|---|
1 | 100 | 1_test_01 |
1 | 101 | 1_test_01 |
1 | 101 | 1_test_02 |
2 | 100 | 2_test_01 |
2 | 101 | 2_test_02 |
2 | 102 | 2_test_03 |
3 | 100 | 3_test_01 |
3 | 103 | 3_test_02 |
3 | 104 | 3_test_03 |
3 | 105 | 3_test_04 |
We need to get the latest version
- version 101 if foo_id is 1
- version 102 if foo_id is 2
- version 105 when foo_id is 3
The data are as follows.
foo_id | timestamp | data | rank |
---|---|---|---|
1 | 101 | 1_test_01 | 1 |
1 | 101 | 1_test_02 | 1 |
2 | 102 | 2_test_03 | 1 |
3 | 105 | 3_test_04 | 1 |
By using rank = 1
, we can get the latest version of each foo. Next, suppose we want to get the version with timestamp 102, then we need to find the version of each foo that is closest to 102.
- version 101 when foo_id is 1
- version 102 when foo_id is 2
- version 100 when foo_id is 3 (100 is closest to 102)
The data are as follows.
foo_id | timestamp | data | rank |
---|---|---|---|
1 | 101 | 1_test_01 | 1 |
1 | 101 | 1_test_02 | 1 |
2 | 102 | 2_test_03 | 1 |
3 | 100 | 3_test_01 | 1 |
This is how to solve the data version control problem by rank().