Ask HN: Would a DB of startup tech stacks be valuable to you?

19 points · lsj0627 · 3y ago · 23 comments

I'm imagining the user would be a Hiring Manager or Recruiter looking for Engineers. If they need Ruby Engineers with startup experience, they click the Ruby box from a tech drop-down list and the search will retrieve the startups that also use it. Ideally, you'd be able to sort by geographical area, founding year, latest funding phase, number of employees (e.g. 50-200), and more. I would also aim for matching the right area of the stack - for example, the option to pick Python AND Backend, so you don't end up with startups using Python only for Data Science/ML work.
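A minimal sketch of the search described above, assuming a hypothetical in-memory schema where each technology is tagged with the areas of the stack it is used in. The `Startup` fields, the `search` function, and the sample companies are all made up for illustration; a real product would sit on a database and its own data model.

```python
from dataclasses import dataclass, field


@dataclass
class Startup:
    # Hypothetical schema: every field name here is an assumption.
    name: str
    region: str
    founded: int
    employees: int
    # Maps a technology to the stack areas where the company uses it.
    stack: dict[str, set[str]] = field(default_factory=dict)


def search(startups, tech, area=None, region=None,
           min_employees=0, max_employees=None):
    """Return startups using `tech`, optionally only in one stack area."""
    results = []
    for s in startups:
        areas = s.stack.get(tech)
        if areas is None:
            continue  # company does not use this technology at all
        if area is not None and area not in areas:
            continue  # uses it, but not in the requested part of the stack
        if region is not None and s.region != region:
            continue
        if s.employees < min_employees:
            continue
        if max_employees is not None and s.employees > max_employees:
            continue
        results.append(s)
    # Oldest companies first; any other sort key would work the same way.
    return sorted(results, key=lambda s: s.founded)


# Made-up sample data.
STARTUPS = [
    Startup("Acme API", "Berlin", 2019, 120,
            {"Python": {"backend"}, "React": {"frontend"}}),
    Startup("ModelWorks", "Berlin", 2021, 60,
            {"Python": {"data science"}, "Go": {"backend"}}),
    Startup("RubyRail", "Austin", 2017, 80, {"Ruby": {"backend"}}),
]

if __name__ == "__main__":
    # "Python AND Backend" skips the company that only uses Python for ML work.
    for s in search(STARTUPS, "Python", area="backend"):
        print(s.name)
```

Tagging each technology with a stack area is what makes the "Python AND Backend" filter possible; a flat list of technologies per company could not tell the two Berlin examples apart.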

Note: I did try the StackShare API and there is no filtering feature. So if you purchase the 1,000-company plan, you have no control over what they send you. It'll be a randomly generated list of 1,000 companies that use the technology you requested, a hodgepodge of companies all around the world, big and small, new and old.

I look forward to hearing your thoughts. Thanks!

23 comments

Tanjreeve · 3y ago

In HN style, I'm going to suggest you could "just" scrape job postings for software devs and get the same thing, with much more confidence that it accurately reflects what people are using.

dirtbag__dad · 3y ago

I have done this before, and you get a lot of postings looking for “experience in Go, Java, or Elixir.”

Then two lines down it states that the team uses Python. Or maybe it doesn’t explicitly state the language at all.

Parsing around this sucks. Maybe one of the new OpenAI tools could solve this. I attempted it before their time.
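To illustrate the parsing problem described above, here is a rough heuristic sketch, not the approach the commenter actually used. It assumes a small hand-picked technology list and a few cue phrases ("uses", "written in", and so on) to separate technologies a posting says the team uses from ones it merely lists as acceptable experience. `TECHS`, `USES_RE`, and `classify_mentions` are hypothetical names.

```python
import re

# Hypothetical, hand-picked list; a real one would be much longer.
TECHS = ["Go", "Java", "Elixir", "Python", "Ruby", "Rust"]
CANONICAL = {t.lower(): t for t in TECHS}
# Names that are also common English words must match with exact case.
CASE_SENSITIVE = {"Go"}

TECH_RE = re.compile(
    r"\b(" + "|".join(re.escape(t) for t in TECHS) + r")\b", re.IGNORECASE
)
# Cue phrases assumed to signal "this is what we actually run".
USES_RE = re.compile(
    r"\b(uses|built (?:with|in|on)|written in|our stack)\b", re.IGNORECASE
)


def find_techs(sentence):
    """Return canonical names of the known technologies in one sentence."""
    found = set()
    for match in TECH_RE.finditer(sentence):
        raw = match.group(0)
        name = CANONICAL[raw.lower()]
        if name in CASE_SENSITIVE and raw != name:
            continue  # e.g. the verb "go" is not the language Go
        found.add(name)
    return found


def classify_mentions(posting):
    """Split mentions into (confirmed in use, merely listed as experience)."""
    confirmed, candidates = set(), set()
    for sentence in re.split(r"(?<=[.!?])\s+", posting):
        techs = find_techs(sentence)
        if not techs:
            continue
        if USES_RE.search(sentence):
            confirmed |= techs
        else:
            candidates |= techs
    return confirmed, candidates - confirmed


if __name__ == "__main__":
    posting = (
        "We are looking for experience in Go, Java, or Elixir. "
        "You will go far here. "
        "Our team uses python for all backend services."
    )
    print(classify_mentions(posting))
```

Even on this toy input the heuristic is fragile: a sentence that starts with "Go" as a verb, or a posting that never names its language, defeats it, which is the commenter's point.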

lsj0627 (OP) · 3y ago

Great point, thank you. However, I think this leaves out many companies - those that don't have job postings.

I think a job board like startup.jobs would solve this by creating a job archive - then it would be prime scraping material. But it's only a job board with (mainly) current jobs.

Thoughts?

Tanjreeve · 3y ago

>Great point, thank you. However, I think this leaves out many companies - those that don't have job postings

Unless you're planning to cold-call people and get them to pinky swear to tell you honestly what they're using, or you have some other plan, you're somewhat stuck anyway for companies that don't post jobs.

Also worth noting: plenty of companies won't really tell you anyway. E.g. plenty of postings will have language like "systems programming language" or "object-oriented language" when they could be using anything from the C family to Haskell (leaving aside how secretive many Haskell jobs are, or how they hide behind custom dialects). You are going to run into all kinds of human BS. It'll be fun, but a can of worms nonetheless.

>I think a job board like startup.jobs would solve this by creating a job archive - then it would be prime scraping material. But it's only a job board with (mainly) current jobs

Not sure how much experience you have modelling data, but capturing postings by date can also be trickier than expected, even leaving aside the fun of unstructured data, the differences in data models between platforms, and the judgement calls needed to decide where you're crawling.

Having cut my teeth scraping property listings from competitor websites, I came to realise that most boards incentivise people to delete and repost ads to boost their recency score and appear higher in search. So now you have duplicates messing up your data, which you need to deal with if you're trying to create value from it.

The classified site doesn't like this either and will try to stop people gaming the system, and that game of cat and mouse will normally mess up your scraping and dedupe logic too.
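As a rough illustration of the repost problem described above, the sketch below collapses postings whose company, title, and body are identical after normalisation, and keeps the earliest and latest dates seen. The dict keys (`company`, `title`, `body`, `posted`) and the function names are assumptions for the example, not any job board's real schema.

```python
import hashlib
import re
from datetime import date


def fingerprint(company, title, body):
    """Hash the posting text after lowercasing and stripping punctuation."""
    text = f"{company} {title} {body}".lower()
    normalized = " ".join(re.sub(r"[^a-z0-9]+", " ", text).split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def dedupe(postings):
    """Merge reposts of the same ad, tracking when it was first and last seen."""
    merged = {}
    for p in postings:
        key = fingerprint(p["company"], p["title"], p["body"])
        entry = merged.get(key)
        if entry is None:
            merged[key] = {**p, "first_seen": p["posted"],
                           "last_seen": p["posted"], "reposts": 0}
        else:
            entry["first_seen"] = min(entry["first_seen"], p["posted"])
            entry["last_seen"] = max(entry["last_seen"], p["posted"])
            entry["reposts"] += 1
    return list(merged.values())


if __name__ == "__main__":
    # Made-up sample: the second ad is a repost of the first.
    sample = [
        {"company": "Acme", "title": "Backend Engineer",
         "body": "We use Ruby  on Rails.", "posted": date(2023, 1, 10)},
        {"company": "ACME", "title": "Backend engineer",
         "body": "We use Ruby on Rails!", "posted": date(2023, 2, 1)},
    ]
    for row in dedupe(sample):
        print(row["title"], row["first_seen"], row["last_seen"], row["reposts"])
```

An exact hash only catches reposts that are word-for-word the same after normalisation. Once posters or the site start tweaking the text, as the comment describes, something fuzzier (shingling or similarity scoring, for instance) would be needed, and that is where the can of worms opens.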

As said, it's a potentially fun can of worms to open. I was just making a joke about HN commenters' tendency to massively underestimate the oceans of complexity that separate their hello-world project from an enterprise-grade "just a CRUD app" system that people pay for. E.g. all the people who could totally build Twitter with a SQLite DB and some bash scripts + sellotape etc.

hitpointdrew · 3y ago

If I wanted to know what tech stack a site is using, I would just pop open Maltego and find out. That seems to be what you are doing, just automated over a large list of companies with the results stored in a DB.

lsj0627 (OP) · 3y ago

Well, you lost me. I just went over to Maltego and it seems to be an investigations tech company (forensics, security, threat intelligence).

How do you use this to find a company's tech stack?

hitpointdrew · 3y ago

Install the software, open it up, put in a website you want to know the stack of, scan it, get results back. It's been a while since I have used it, and the interface/UI has a bit of a learning curve, so it might not be super intuitive how to get the tech stack info, but it's there somewhere.

jonas-w · 3y ago

I think any website where you can filter without ads, SEO, algorithms that think they are smarter than you, etc. is valuable. Imagine you had "direct" DB access to the data that Google, Reddit, Twitter, Hacker News (we have HN on Google BigQuery, and it's awesome), GitHub, Stack Overflow, YouTube, ... hold. Anyone who knows exactly what they want will find it. People who don't know exactly what they want may find it harder to find anything.

I don't know about your specific use case, but personally anything like that is valuable to me.

chzblck · 3y ago

This already exists in many of the sales tools that are out there today.

ZoomInfo, Apollo, and Seamless all have the ability to show what types of technology a company is using.

lsj0627 (OP) · 3y ago

I didn't know this - I thought they were just sales prospecting tools with tons of features, but nothing involving finding company tech stacks.

Do you have experience working with these products and seeing their capabilities in finding the technologies companies are using?

Thanks.

dirtbag__dad · 3y ago

https://www.rocks.gold/ is a comprehensive repository of jobs and company data.

There are also enterprise companies like predictleads.com that offer jobs data.

Pricing is all over the place and quality was an issue when I was evaluating them because they go for volume over accuracy.

chzblck · 3y ago

Yes, I'm in sales currently and use it to segment accounts, e.g. AWS/GCP/Azure or GitLab/GitHub/Bitbucket.

Apollo.io is free to sign up and use.

nithayakumar · 3y ago

I've seen datasets like this and they've been bad. My main issues have been that the dataset isn't kept updated, there's no sense of proportion (e.g. is 1% of the team Java or 50%?), and there often aren't enough companies in the dataset.

So bad that I probably wouldn't buy this data without some proof that it's good data.

lsj0627 (OP) · 3y ago

This is really helpful feedback, thanks!

Yes, keeping it updated is a hard job, but it needs to be done regularly.

Can you elaborate on the sense of proportion problem?

And as for having enough companies, can you give me an example of a specific search you might conduct and how many results you'd expect?

Thanks!

pcthrowaway · 3y ago

I would use it in a job search.

I'm sure tech sales people would find it useful as well

lsj0627 (OP) · 3y ago

You're right. I forgot to mention that it'd be useful for job seekers, e.g. Rustaceans, click that box and find your companies!

Question: how would tech sales people use it? I'm not familiar enough with the field to know their use case.

pcthrowaway · 3y ago

Well if it's just programming languages, then they probably wouldn't.

You said "tech stacks" which led me to believe it was everything from databases to SaaS to dependencies.

There is massive competition in the database space to sell people support services.

If a company is using the Timescale community edition, Timescale's enterprise sales might reach out to them. They also might reach out to people using Influx.

Someone selling a UI framework might reach out to people using similar ones. And so on.

edit: And Datadog sales can continue reaching out to anyone not using Datadog
