angrais 3y ago

Does anyone have any idea _why_ there's a peak in Q1 2020?

minimaxir 3y ago

So that appears to be a fun edge case with that quick SQL query!

The spike was in March 2020, the start of the COVID-19 pandemic and shelter-in-place orders in the United States. That was a period of unusual behavior, so a spike on its own wouldn't be too surprising.

It turns out that dang posted a special whoishiring thread soon after, and it was massively popular: https://news.ycombinator.com/item?id=22665398

I thought I filtered out nonstandard threads in the query:

    WITH whoishiring_threads AS (
      SELECT id FROM `bigquery-public-data.hacker_news.full`
      WHERE `by` = "whoishiring"
        AND REGEXP_CONTAINS(title, "Ask HN: Who is hiring?")
    )

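...but that filter is a regex, and in a regex `?` is a quantifier rather than a literal: it makes the preceding `g` optional. So the pattern matches any title containing "Ask HN: Who is hiring", whether or not a question mark follows, and the query combined the top-level comment counts of that special thread with those of the regular March thread.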
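
To make the pitfall concrete, here is a small Python sketch. Python's `re` treats `?` the same way as RE2 (the engine behind BigQuery's REGEXP_CONTAINS) for this pattern. The titles and the `matching` helper are made up for illustration, not pulled from the dataset. In BigQuery itself, the escaped pattern would typically be written as a raw string such as r"Ask HN: Who is hiring\?" so the backslash actually reaches the regex engine.

```python
import re

# The filter string exactly as written in the query. The trailing "?" is a
# quantifier that makes the preceding "g" optional, not a literal character.
buggy = "Ask HN: Who is hiring?"

# Escaping the "?" turns it back into a literal question mark.
escaped = r"Ask HN: Who is hiring\?"

# Hypothetical titles, made up for illustration only.
titles = [
    "Ask HN: Who is hiring? (March 2020)",
    "Ask HN: Who is hiring remotely?",
]


def matching(pattern):
    """Return titles where `pattern` matches anywhere, like REGEXP_CONTAINS."""
    return [t for t in titles if re.search(pattern, t)]


print(matching(buggy))    # both titles match
print(matching(escaped))  # only the first title matches
```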

Data science is fun like that, and surprisingly, this isn't the first time I've made that particular query mistake.

boulos 3y ago

This is part of why I prefer to use LIKE directly when what I'm doing isn't actually a regexp. Then again, it's just as easy to get the % wrong, but I feel those mistakes are more visible because the special characters are less common. (My primary reason is clarity for the reader, so they don't have to try to parse a potential regexp.)
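
A rough illustration of why LIKE reads more literally: only `%` (any run of characters) and `_` (exactly one character) are special, so `?` and `.` are ordinary characters, and the pattern must match the whole string. The `like` and `like_to_regex` helpers below are hypothetical, written just for this sketch; they are not BigQuery's implementation and ignore LIKE's backslash escaping.

```python
import re


def like_to_regex(pattern: str) -> str:
    """Translate a SQL LIKE pattern into a regex (sketch; no escape handling)."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")  # any sequence of characters, possibly empty
        elif ch == "_":
            parts.append(".")  # exactly one character
        else:
            parts.append(re.escape(ch))  # everything else, including ? and ., is literal
    return "".join(parts)


def like(value: str, pattern: str) -> bool:
    # LIKE matches the entire value, so use fullmatch; DOTALL lets % span newlines.
    return re.fullmatch(like_to_regex(pattern), value, flags=re.DOTALL) is not None


title = "Ask HN: Who is hiring? (March 2020)"
print(like(title, "Ask HN: Who is hiring?%"))  # True: "?" is literal, % covers the rest
print(like(title, "Ask HN: Who is hiring?"))   # False: forgot the trailing %
print(like("Ask HN: Who is hiring remotely?", "Ask HN: Who is hiring?%"))  # False
```

The second call shows the flip side mentioned above: with LIKE the typical mistake is a missing `%`, which tends to fail loudly (no rows) rather than silently matching extra rows.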

The `.` is more pernicious than `?`, though. Not that it matters in your case, but a lot of accidental matches really are "oops, a one-character substitution matches too."
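
A quick sketch of the one-character-substitution problem, using hypothetical sample strings: an unescaped `.` matches any single character, so a pattern meant to match "1.5" also accepts "105" or "1-5". Escaping the pattern (here with Python's `re.escape`) restores the literal meaning.

```python
import re

# Hypothetical pattern and samples: the intent is to match the literal text "version 1.5".
pattern = "version 1.5"
samples = ["version 1.5", "version 105", "version 1-5"]

# Unescaped: "." matches any single character, so every sample matches.
loose = [s for s in samples if re.search(pattern, s)]

# Escaped: "." is treated as a literal dot.
strict = [s for s in samples if re.search(re.escape(pattern), s)]

print(loose)   # all three samples
print(strict)  # only "version 1.5"
```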

minimaxir 3y ago

You have more experience in this area, but REGEXP_CONTAINS() seems faster than LIKE for bigger datasets IMO.

That said, in my work I tend to use REGEXP_CONTAINS() as an efficient multifilter for different inputs, which also speeds things up.
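
One plausible reading of the "multifilter" idea (an assumption, not necessarily minimaxir's exact approach) is to fold several search terms into a single alternation regex, so one REGEXP_CONTAINS call checks all of them. The sketch below does this in Python; `build_multifilter`, the terms, and the titles are hypothetical. Escaping each term first keeps characters like `?` literal, which avoids the bug from the original query. It makes no claim about performance.

```python
import re


def build_multifilter(terms):
    """Combine literal terms into one alternation regex (hypothetical helper).

    Each term is escaped so "?" and "." stay literal, then the terms are
    joined with "|" so a single search checks all of them.
    """
    return "|".join(re.escape(t) for t in terms)


# Hypothetical terms and titles, for illustration only.
terms = ["Who is hiring?", "Who wants to be hired?", "Freelancer?"]
pattern = build_multifilter(terms)

titles = [
    "Ask HN: Who is hiring? (March 2020)",
    "Ask HN: Who wants to be hired? (March 2020)",
    "Ask HN: Freelancer? Seeking freelancer? (March 2020)",
    "Ask HN: Who is hiring remotely?",
]

hits = [t for t in titles if re.search(pattern, t)]
print(hits)  # the first three titles; the last lacks a literal "hiring?"
```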


ericskiff 3y ago

I can confirm we saw this happening in real-life hiring. Competition for remote hires got FIERCE from the beginning of the pandemic and stayed that way through most of 2021. It has since settled down quite a bit.

It was a combination of large companies needing programmers to enable remote work or pandemic-driven process changes, stimulus funds pushing a lot of VC money into startups, hiring binges at FAANG companies, and everyone suddenly becoming remote-friendly.

tyingq 3y ago

Covid opening up more companies to the idea of remote workers? Or forcing development to fill gaps in processes that used to happen offline?

randomdata 3y ago

Things that used to be done in person suddenly went online and everyone was in a rush to make that transition happen quickly and smoothly.
