The spike was in March 2020, the start of the COVID-19 pandemic and shelter-in-place orders in the United States. People were behaving oddly then, so a spike wouldn't be too weird.
It turns out that dang posted a special "Who is hiring?" thread soon after, and it was massively popular: https://news.ycombinator.com/item?id=22665398
I thought I had filtered out nonstandard threads in the query:
  WITH whoishiring_threads AS (
    SELECT id FROM `bigquery-public-data.hacker_news.full`
    WHERE `by` = "whoishiring"
    AND REGEXP_CONTAINS(title, "Ask HN: Who is hiring?")
  )
...but that filter is a regex, and in a regex `?` is a quantifier (it makes the preceding character optional), not a literal. So the query combines the top-level comment counts of that special thread and the regular monthly one. Data science is fun like that, and surprisingly this isn't the first time I've made that particular query mistake.
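A minimal Python sketch of the pitfall, assuming Python's `re` treats `?` the same way as the RE2 engine behind BigQuery's REGEXP_CONTAINS (it does for this quantifier). The thread titles are hypothetical examples, not rows from the dataset. Because `?` only makes the `g` optional, the unescaped pattern effectively searches for "Ask HN: Who is hirin" and matches both titles; escaping restores the literal match.

```python
import re

# Hypothetical thread titles for illustration; not pulled from the dataset.
TITLES = [
    "Ask HN: Who is hiring? (March 2020)",
    "Ask HN: Who is hiring right now?",
    "Ask HN: Who wants to be hired? (March 2020)",
]

# Unescaped: "?" makes the preceding "g" optional, so this is effectively
# a substring search for "Ask HN: Who is hirin" plus an optional "g".
LOOSE = re.compile("Ask HN: Who is hiring?")

# Escaped: re.escape turns "?" (and other metacharacters) into literals.
STRICT = re.compile(re.escape("Ask HN: Who is hiring?"))


def matching(pattern, titles):
    """Return titles where the pattern matches anywhere (like REGEXP_CONTAINS)."""
    return [t for t in titles if pattern.search(t)]


loose_hits = matching(LOOSE, TITLES)
strict_hits = matching(STRICT, TITLES)
print(loose_hits)   # both "Who is hiring" titles
print(strict_hits)  # only the title containing a literal "hiring?"
```

In the BigQuery query itself, the analogous fix would be escaping the question mark, e.g. with a raw string literal such as r"Ask HN: Who is hiring\?".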
The unescaped `.` is more pernicious than `?`, though, since it matches any single character. Not that it matters in your case, but a lot of matches really can turn out to be "oops, a one-character substitution totally matches, too."
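To illustrate the one-character-substitution problem, here is a small Python sketch; the strings and the "v1.2" search term are hypothetical. An unescaped `.` accepts any character in that position, so near-misses slip through.

```python
import re

# Hypothetical strings; the intent is to find the literal text "v1.2".
TEXTS = ["upgraded to v1.2", "upgraded to v102", "upgraded to v1-2", "upgraded to v12"]

# Unescaped "." matches any single character, so "v102" and "v1-2" also match.
LOOSE = re.compile("v1.2")

# Escaped "." only matches a literal dot.
STRICT = re.compile(re.escape("v1.2"))

loose_hits = [t for t in TEXTS if LOOSE.search(t)]
strict_hits = [t for t in TEXTS if STRICT.search(t)]
print(loose_hits)
print(strict_hits)
```

Note that "v12" matches neither pattern: `.` requires exactly one character, so it catches substitutions, not deletions.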
That said, in my work I tend to use REGEXP_CONTAINS() as an efficient multi-filter that checks for several different terms in one call, which speeds things up too.
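One way to read "multi-filter" is a single alternation pattern (`term1|term2|...`) in place of several separate conditions. The Python sketch below assumes that interpretation; the search terms and comment snippets are hypothetical. Each term is escaped before joining, which avoids exactly the `?` and `.` surprises discussed above.

```python
import re


def build_multifilter(terms):
    """Combine literal search terms into one case-insensitive alternation regex.

    Each term is escaped so characters like "?", "." or "+" stay literal,
    then the terms are joined with "|" so a single search checks them all.
    """
    return re.compile("|".join(re.escape(t) for t in terms), re.IGNORECASE)


# Hypothetical search terms and comment snippets, for illustration only.
TERMS = ["remote", "node.js", "c++"]
COMMENTS = [
    "Acme Corp | Backend engineer | REMOTE (US)",
    "Widgets Inc | Node.js developer | Onsite",
    "Foo LLC | C++ / Rust | Berlin",
    "Bar Co | nodexjs wizard | Onsite",
]

pattern = build_multifilter(TERMS)
hits = [c for c in COMMENTS if pattern.search(c)]
print(hits)  # the first three comments; "nodexjs" is rejected by the escaped dot
```

One pass with a combined pattern scans each string once rather than once per term, which is the kind of speedup the multi-filter approach is after.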
It was a combination of factors: large companies needing programmers to enable remote work or pandemic-driven process changes, stimulus funds pushing a lot of VC money into startups, hiring binges at FAANG companies, and everyone suddenly being remote-friendly.