Decided to re-create this in MySQL on fairly old hardware, and with actual phone numbers - the latter shouldn't make a difference since they're still VARCHAR, but I already have a program [0] to generate schema with them, so why not?
I did have to do a few manual updates after the data load, because the aforementioned program can't create foreign keys yet, and for bools (which MySQL stores as tinyint(1)) I'm generating values via `id & 1`, which isn't what you had.
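For reference, the post-load fixups looked roughly like this (a sketch; the exact statements I ran may have differed slightly):

```sql
-- add the FK the generator can't emit yet
ALTER TABLE hn_phone
  ADD CONSTRAINT hn_phone_ibfk_1
  FOREIGN KEY (user_id) REFERENCES hn_user (id);

-- derive the bool from the row id, since the generator
-- doesn't produce tinyint(1) values itself
UPDATE hn_phone SET is_primary = id & 1;
```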
Also, I gave `hn_phone` its own auto-increment int as a PK, so I could have a non-unique index on `user_id`. In MySQL, if you create a table without a PK, you get one of these, in descending order of precedence:
* The first indexed UNIQUE NOT NULL column promoted to PK
* An invisible, auto-generated, auto-incrementing integer column called `my_row_id` as PK (MySQL >= 8.0.30, if sql_generate_invisible_primary_key=1)
* A hidden index called `GEN_CLUST_INDEX` created on a super-invisible (i.e. doesn't show up in the table definition) column called `ROW_ID` - but that column's value space is shared across the entire InnoDB instance, so please don't do this
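You can see the second behavior directly (assumes MySQL >= 8.0.30 with the variable settable in your session):

```sql
SET SESSION sql_generate_invisible_primary_key = ON;
CREATE TABLE t (val INT);
SHOW CREATE TABLE t;
-- the definition now includes:
--   `my_row_id` bigint unsigned NOT NULL AUTO_INCREMENT /*!80023 INVISIBLE */,
--   PRIMARY KEY (`my_row_id`)
```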
It's worth noting that since the first 10,000,000 rows all have `is_primary` set, this query can finish extremely quickly. If you invert that match with these tables, you force a table scan on `hn_phone`, and the time jumps to about 5,650 msec. If you instead change the `hn_phone` index to a composite on (`user_id`, `is_primary`) and rewrite the query to use a subquery instead of a join, the time drops to around 7 msec. You might also see a slight speed-up from indexing `created_at` in descending order, if descending reads were the normal access pattern.
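The composite-index variant would look something like this (a sketch - the index name and exact subquery shape are mine, not necessarily what I benchmarked verbatim):

```sql
-- composite index so the (user_id, is_primary) predicate
-- can be answered entirely from the index
ALTER TABLE hn_phone
  ADD INDEX user_id_is_primary (user_id, is_primary);

-- inverted match as a semi-join subquery instead of a JOIN
SELECT u.id, u.created_at
FROM hn_user u
WHERE u.id IN (SELECT user_id FROM hn_phone WHERE NOT is_primary)
ORDER BY u.created_at DESC
LIMIT 10;
```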
Anyway:
OS: Debian Bullseye 5.10.0-23-amd64
Virtualized: Yes (Proxmox)
CPU: E5-2650 v2 @ 2.60GHz
Allocated Core Count: 16
Allocated RAM: 64 GiB PC3-12800R
Disk: Samsung PM983 1.92 TiB via Ceph
Filesystem: XFS
Mount Options: defaults,noatime
MySQL Version: 8.0.34
MySQL Options (non-default):
innodb_buffer_pool_instances = 16
innodb_buffer_pool_chunk_size = 134217728
innodb_buffer_pool_size = 17179869184
innodb_numa_interleave = 1
innodb_sync_array_size = 16 # this shouldn't apply here, but listing anyway
innodb_flush_method = O_DIRECT
innodb_read_io_threads = 16
innodb_write_io_threads = 16 # this shouldn't apply here, but listing anyway
CREATE TABLE `hn_user` (
`id` int unsigned NOT NULL AUTO_INCREMENT,
`created_at` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (`id`),
KEY `user_created_at` (`created_at`)
);
CREATE TABLE `hn_phone` (
`id` int unsigned NOT NULL AUTO_INCREMENT,
`user_id` int unsigned NOT NULL,
`is_primary` tinyint(1) NOT NULL DEFAULT '1',
`phone` varchar(255) NOT NULL,
PRIMARY KEY (`id`),
KEY `user_id` (`user_id`),
CONSTRAINT `hn_phone_ibfk_1` FOREIGN KEY (`user_id`) REFERENCES `hn_user` (`id`)
);
mysql> SELECT COUNT(*) FROM hn_user UNION SELECT COUNT(*) FROM hn_phone;
+----------+
| COUNT(*) |
+----------+
| 10000000 |
| 10200000 |
+----------+
2 rows in set (1.20 sec)
mysql> SELECT
u.id,
u.created_at,
ut.is_primary,
ut.phone
FROM
hn_user u
JOIN hn_phone ut ON u.id = ut.user_id
WHERE
ut.is_primary
ORDER BY
u.created_at DESC
LIMIT 10;
+---------+---------------------+------------+--------------------+
| id | created_at | is_primary | phone |
+---------+---------------------+------------+--------------------+
| 6906106 | 2023-08-12 06:08:25 | 1 | +61 02 5317 2261 |
| 6906106 | 2023-08-12 06:08:25 | 1 | +254 20 294 205 |
| 6738922 | 2023-08-12 06:07:12 | 1 | +61 02 1247 3361 |
| 6738922 | 2023-08-12 06:07:12 | 1 | +44 0131 8386 4494 |
| 7449553 | 2023-08-12 06:03:55 | 1 | +61 02 7649 6731 |
| 7449553 | 2023-08-12 06:03:55 | 1 | +61 02 7893 9835 |
| 6908862 | 2023-08-12 05:51:52 | 1 | +81 03 6743-6893 |
| 6908862 | 2023-08-12 05:51:52 | 1 | +44 0131 8414 7888 |
| 4134961 | 2023-08-12 05:51:42 | 1 | +1 614-908-1719 |
| 4134961 | 2023-08-12 05:51:42 | 1 | +44 0131 9898 8958 |
+---------+---------------------+------------+--------------------+
10 rows in set (0.00 sec)
mysql> WITH latest_event AS (
SELECT
event_id
FROM
performance_schema.events_statements_history_long
WHERE
sql_text LIKE 'SELECT u.id%'
ORDER BY
event_id DESC
LIMIT 1
)
SELECT
event_name,
TRUNCATE(
TIMER_WAIT / POW(10, 9),
3
) AS 'duration (msec)'
FROM
performance_schema.events_stages_history_long stg
JOIN latest_event ON stg.nesting_event_id = latest_event.event_id
UNION
SELECT
"total",
TRUNCATE(
TIMER_WAIT / POW(10, 9),
3
)
FROM
performance_schema.events_statements_history_long stmt
JOIN latest_event ON stmt.event_id = latest_event.event_id;
+------------------------------------------------+-----------------+
| event_name | duration (msec) |
+------------------------------------------------+-----------------+
| stage/sql/starting | 0.261 |
| stage/sql/Executing hook on transaction begin. | 0.003 |
| stage/sql/starting | 0.016 |
| stage/sql/checking permissions | 0.006 |
| stage/sql/checking permissions | 0.005 |
| stage/sql/Opening tables | 0.134 |
| stage/sql/init | 0.008 |
| stage/sql/System lock | 0.023 |
| stage/sql/optimizing | 0.034 |
| stage/sql/statistics | 0.087 |
| stage/sql/preparing | 0.074 |
| stage/sql/executing | 0.74 |
| stage/sql/end | 0.003 |
| stage/sql/query end | 0.003 |
| stage/sql/waiting for handler commit | 0.025 |
| stage/sql/closing tables | 0.019 |
| stage/sql/freeing items | 0.176 |
| stage/sql/cleaning up | 0.003 |
| total | 1.654 |
+------------------------------------------------+-----------------+
19 rows in set (0.00 sec)
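If anyone wants to reproduce the stage timings: stage history isn't collected by default, so you'd first need something like this (standard performance_schema setup; the LIKE patterns are my guess at the minimum needed):

```sql
-- enable timing for stage instruments
UPDATE performance_schema.setup_instruments
SET ENABLED = 'YES', TIMED = 'YES'
WHERE NAME LIKE 'stage/%';

-- enable the stage history consumers
UPDATE performance_schema.setup_consumers
SET ENABLED = 'YES'
WHERE NAME LIKE 'events_stages_%';
```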
[0]: https://github.com/stephanGarland/genSQL # shameless plug; it's super messy and probably unintuitive, but it's getting better/faster and has been a fun ride learning how fast you can make Python (and when to offload to C).