PostgreSQL version schema performance comparison

Requirement
  1. Developers keep releasing new software, the version format was {major}.{minor}.{micro}.{build}
  2. When a device ask for upgrade information, we need figure out the latest versions of software
  3. But not only the latest one, we may need to know following version information, so that we can provide suitable recommendation
    1. is there are 3 newer versions of software?
    2. How many newer software versions?

Challenge
  1. Database sort text by "Natural Sorting"
  2. These 2 versions, the older version will be treated as the newer one if we use natural sorting
    1. 1.2.3.100 => Will be treated as older because 2 > 1
    2. 1.2.3.20 => So the version 20 will be treated as newer
Ideas
  1. Persist major, minor, micro, build in different columns
  2. Calculate versions to be a number, which can be sorted correctly
  3. Merge versions to be a single text, and need to be sorted correctly (We need padLeft 0 to let all versions become the same length)
  4. Persist major, minor, micro, build in a byte array (BLOB),
    assume the natural sorting will compare BLOB from the first element of an array.
Test Steps
  1. Prepare TestContainer for PostgreSQL testing
  2. Generate 10000 records
    1. major 0
    2. minor 0-9
    3. micro 0-9
    4. build 0-99
  3. Compare with a specified version: 0.6.7.58
  4. Compare query plan and query result (Query the latest 3 records)

Code for the test

Tables
Version1
DDL
CREATE TABLE version1 (
id uuid PRIMARY KEY,
major int,
minor int,
micro int,
build int
);

CREATE INDEX version1_major_idx ON version1 (major DESC);
CREATE INDEX version1_major_minor_idx ON version1 (major DESC, minor DESC);
CREATE INDEX version1_major_minor_micro_idx ON version1 (major DESC, minor DESC, micro DESC);
CREATE INDEX version1_major_minor_micro_build_idx ON version1 (major DESC, minor DESC, micro DESC, build DESC);
SQL
String sql = "SELECT * FROM Version1 " +
"WHERE (major > 0) " +
"OR (major = 0 AND minor > 6) " +
"OR (major = 0 AND minor = 6 AND micro > 7) " +
"OR (major = 0 AND minor = 6 AND micro = 7 AND build > 58) " +
"ORDER BY major DESC, minor DESC, micro DESC, build DESC " +
"LIMIT 3";
Query Plan
QUERY PLAN: Limit (cost=0.29..1.16 rows=3 width=32) (actual time=0.065..0.193 rows=3 loops=1)
QUERY PLAN: -> Index Scan using version1_major_minor_micro_build_idx on version1 (cost=0.29..982.84 rows=3366 width=32) (actual time=0.051..0.073 rows=3 loops=1)
QUERY PLAN: Filter: ((major > 0) OR ((major = 0) AND (minor > 6)) OR ((major = 0) AND (minor = 6) AND (micro > 7)) OR ((major = 0) AND (minor = 6) AND (micro = 7) AND (build > 58)))
QUERY PLAN: Planning Time: 0.920 ms
QUERY PLAN: Execution Time: 0.307 ms
QUERY RESULT: {major=0, minor=9, micro=9, build=99, id=7e5878ff-9d25-46dc-a0c0-79196fd8c5d3}
QUERY RESULT: {major=0, minor=9, micro=9, build=98, id=ee2097ef-fe86-491e-869b-afda5976a354}
QUERY RESULT: {major=0, minor=9, micro=9, build=97, id=ada64c48-72e2-4a39-a6b0-5eabda135d0d}

Version2
DDL
CREATE TABLE version2 (
id uuid PRIMARY KEY,
build int
);

CREATE INDEX version2_order_idx ON version2 (build DESC);
SQL
String sql = "SELECT * FROM Version2 " +
"WHERE build > " + getVersion2Number(0,6,7,58) +
" ORDER BY build DESC" +
" LIMIT 3";
Query Plan
QUERY PLAN: Limit (cost=0.29..0.61 rows=3 width=20) (actual time=0.053..0.124 rows=3 loops=1)
QUERY PLAN: -> Index Scan using version2_order_idx on version2 (cost=0.29..391.76 rows=3627 width=20) (actual time=0.037..0.060 rows=3 loops=1)
QUERY PLAN: Index Cond: (build > 6758)
QUERY PLAN: Planning Time: 0.388 ms
QUERY PLAN: Execution Time: 0.195 ms
QUERY RESULT: {build=9999, id=e32b8067-9cae-4c49-b776-372e8a2137e4}
QUERY RESULT: {build=9998, id=c2902963-6579-48d3-bb91-deac6a8528bd}
QUERY RESULT: {build=9997, id=de28843c-b245-4b46-8e2d-7e549f0b862e}

Version3
DDL
CREATE TABLE version3 (
id uuid PRIMARY KEY,
build bytea
);

CREATE INDEX version3_order_idx ON version3 (build DESC);
SQL
String sql = "SELECT * FROM Version3 " +
"WHERE build > ?" +
" ORDER BY build DESC" +
" LIMIT 3";
Query Plan
QUERY PLAN: Limit (cost=0.28..0.77 rows=3 width=48) (actual time=0.063..0.130 rows=3 loops=1)
QUERY PLAN: -> Index Scan using version3_order_idx on version3 (cost=0.28..368.24 rows=2283 width=48) (actual time=0.047..0.068 rows=3 loops=1)
QUERY PLAN: Index Cond: (build > '\x000506073a'::bytea)
QUERY PLAN: Planning Time: 0.329 ms
QUERY PLAN: Execution Time: 0.214 ms
[0, 9, 9, 99]
[0, 9, 9, 98]
[0, 9, 9, 97]

Version4
DDL
CREATE TABLE version4 (
id uuid PRIMARY KEY,
build VARCHAR
);

CREATE INDEX version4_build_idx ON version4 (build DESC);
SQL
String sql = "SELECT * FROM Version4 " +
"WHERE build > '" + getVersion4Text(0, 6, 7, 58) + "'" +
" ORDER BY build DESC" +
" LIMIT 3";
Query Plan
QUERY PLAN: Limit (cost=0.29..0.80 rows=3 width=48) (actual time=0.074..0.137 rows=3 loops=1)
QUERY PLAN: -> Index Scan using version4_build_idx on version4 (cost=0.29..512.72 rows=2996 width=48) (actual time=0.060..0.079 rows=3 loops=1)
QUERY PLAN: Index Cond: ((build)::text > '0000000600070058'::text)
QUERY PLAN: Planning Time: 0.313 ms
QUERY PLAN: Execution Time: 0.256 ms
QUERY RESULT: {build=0000000900090099, id=29f9f1a0-40eb-4a31-873f-ff04868fa3d1}
QUERY RESULT: {build=0000000900090098, id=1ead5f4d-73d9-4cda-b899-f745b90f8598}
QUERY RESULT: {build=0000000900090097, id=bc9a4dfe-12c9-4c34-8925-836e8c4a7ad3}

Comparison
OptionscostPros & Cons
Version1 (split columns)(cost=0.29..982.84 rows=3366 width=32)
Pros: Flexible, can change SQL easily
Cons: Slow
Version2
(calculate to number)
(cost=0.29..391.76 rows=3627 width=20)
Pros: Fast
Cons:
  1. Need calculate before persist and may need migrate if the logic to compare changed
  2. Sorting will be broken if the number exceed max number
Version3
(calculate to string)
(cost=0.28..368.24 rows=2283 width=48)
Pros: Fast and don't have max number issue
Cons:  Need calculate before persist and may need migrate if the logic to compare changed
Version4
(calculate to byte array)
(cost=0.29..512.72 rows=2996 width=48)
Pros: Don't need extra calculation
Cons:
  1. Easy to exceed max byte number, so still need extra calculation
  2. Slow


別名演算法 Alias Method

 題目 每個伺服器支援不同的 TPM (transaction per minute) 當 request 來的時候, 系統需要馬上根據 TPM 的能力隨機找到一個適合的 server. 雖然稱為 "隨機", 但還是需要有 TPM 作為權重. 解法 別名演算法...