

Show HN: IPDetective – An API for IP bot detection

(ipdetective.io)

68 points · AndrewCopeland · 3y ago · 58 comments

IPDetective collects data from 60+ different sources, such as official cloud provider endpoints and public VPN/proxy/Tor/botnet lists. It then aggregates this data into a fast, easy-to-use API that can be integrated into applications or scripts.

IPDetective started as a hobby project for my other hobby projects :) and I decided to wrap a simple website around it and offer it as a service.

Let me know your thoughts, whether you find value in this service, or if you have any feature requests.


stevenicr · 3y ago · 4 in thread

I checked the posted privacy policy, and I have no idea what your system logs or for how long.

There are a few sites I could use this for, but some of them would end up sending private customer IPs for lookups. For that reason, I manually check only the suspicious ones and deliberately don't look up the ones I know have passed the 'already a member, human, not a hacker' gate.

I keep looking for something where I can just add lists to my server and check against them, but I'm not going to spend thousands on it.

marvinblum · 3y ago

Same here. I would like to integrate a simple blacklist into our system. Does anybody know of a good IP list that contains known bots, data centers, and so on?

Also, from a performance perspective, I can't do hundreds of millions of requests a month over the wire. That's just a waste.

AndrewCopeland (OP) · 3y ago

IPDetective collects data from botnets, datacenters, Tor nodes, proxies, and VPNs. Is that what you are looking for?

Also IPDetective can be used just as a detection solution rather than a prevention solution.

Sometimes I run it against my nginx access logs to see how much of my traffic could be from bots.
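A minimal sketch of that kind of offline log analysis, assuming nginx's default `combined` log format (client address as the first field) and a hypothetical local set of flagged CIDR ranges; it does not call the IPDetective API, whose response format isn't shown in this thread.

```python
import ipaddress
from typing import Iterable

# Hypothetical flagged ranges (documentation prefixes); in practice these would
# come from a lookup service or a downloaded list.
FLAGGED_NETWORKS = [ipaddress.ip_network(c) for c in ("203.0.113.0/24", "198.51.100.0/24")]


def client_ip(line: str) -> str:
    # In nginx's default "combined" log format, the client address is the first field.
    return line.split(" ", 1)[0]


def is_flagged(ip: str) -> bool:
    addr = ipaddress.ip_address(ip)
    # Membership is simply False when the address and network IP versions differ.
    return any(addr in net for net in FLAGGED_NETWORKS)


def bot_share(lines: Iterable[str]) -> float:
    """Fraction of non-empty log lines whose client IP falls in a flagged range."""
    total = flagged = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue
        total += 1
        if is_flagged(client_ip(line)):
            flagged += 1
    return flagged / total if total else 0.0
```

For example, `with open("/var/log/nginx/access.log") as f: print(bot_share(f))` reports the share for one log file; that path is a common default and may differ on your system. If nginx sits behind a load balancer, the first field is the balancer's address unless the real client IP is logged instead.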

AndrewCopeland (OP) · 3y ago

Thank you very much for your input. I will look into making the privacy policy more explicit about the system logs and how long I store them.

> I keep looking for something where I can just add lists to my server and check against them, but I'm not going to spend thousands on it.

What do you mean by this? Like having the ability to create your own deny/black list of ip addresses and then you can validate against it?

toast0 · 3y ago

>> I keep looking for something where I can just add lists to my server and check against them, but I'm not going to spend thousands on it.

> What do you mean by this? Like having the ability to create your own deny/black list of ip addresses and then you can validate against it?

They want to download your database and run queries on it locally, rather than call your API. This way they don't share data with you and don't need to worry about your data practices.
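A minimal sketch of what querying a downloaded list locally could look like, assuming the list is distributed as IPv4 CIDR strings (a hypothetical format; nothing in this thread says IPDetective offers a download). Ranges are converted to integer start/end pairs, merged, and searched with binary search, so each lookup is O(log n) and no customer IP ever leaves the server.

```python
import bisect
import ipaddress
from typing import Iterable


class LocalIPList:
    """In-memory IPv4 range list built from CIDR strings, queried with binary search."""

    def __init__(self, cidrs: Iterable[str]):
        ranges = []
        for cidr in cidrs:
            net = ipaddress.IPv4Network(cidr, strict=False)
            ranges.append((int(net.network_address), int(net.broadcast_address)))
        ranges.sort()
        # Merge overlapping or adjacent ranges so starts are strictly increasing
        # and only the nearest preceding range ever needs to be checked.
        merged = []
        for start, end in ranges:
            if merged and start <= merged[-1][1] + 1:
                merged[-1] = (merged[-1][0], max(merged[-1][1], end))
            else:
                merged.append((start, end))
        self._starts = [s for s, _ in merged]
        self._ends = [e for _, e in merged]

    def __contains__(self, ip: str) -> bool:
        value = int(ipaddress.IPv4Address(ip))
        # Index of the last range whose start is <= value (-1 if none).
        i = bisect.bisect_right(self._starts, value) - 1
        return i >= 0 and value <= self._ends[i]
```

For example, with `blocklist = LocalIPList(["203.0.113.0/24", "198.51.100.0/25", "198.51.100.128/25"])`, the expression `"198.51.100.200" in blocklist` is `True`, because the two adjacent /25 ranges are merged into one. Sorted integer ranges keep memory small even for millions of entries, which also addresses the concern about making hundreds of millions of requests over the wire.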

bragr · 3y ago · 3 in thread

>Let me know what your thoughts

Do you have any data showing your service is better/more accurate/better false positive rate than competing offerings?

>IPDetective collects data from about 60+ different sources such as official cloud provider endpoints and public VPN/Proxy/Tor/Bot net lists.

Are you on the up and up with all those public sources? I'm not sure which ones you're sourcing, but many do not allow commercial use, resale, or at the very least have some attribution clause to keep security companies from mooching off crowdsourced data.

>No, currently IPDetective does not support ipv6 addresses. However this feature is on the road map.

That's a major shortcoming in 2022.

Finally, as itake pointed out below, I'm not giving you my email just to run a simple test query and see what the results look like.

AndrewCopeland (OP) · 3y ago

Thanks for your input.

>Do you have any data showing your service is better/more accurate/better false positive rate than competing offerings

I have signed up for some of the competitors. My service averaged about 20 ms per request for a single IP address, while the competitors typically took around 200 to 300 ms. Showing a better false-positive rate would be rather hard to do.

> Are you on the up and up with all those public sources?

I gathered data only from sources that had no licensing restricting them to non-commercial use. Would you recommend I reach out to all of the sources regardless, even those that say you can use their data commercially, or those that say nothing about commercial use at all?

> No, currently IPDetective does not support ipv6 addresses. However this feature is on the road map.

Yeah, I know; I'm just focusing on IPv4 right now. I am still collecting IPv6 addresses, but I have not implemented support for them in the service yet.

Thanks a bunch for writing the comments above.

Yep… measure the number of requests for IPv6 support you get from your users, not HN users :-)

>> currently IPDetective does not support ipv6 addresses. However this feature is on the road map.

> That's a major shortcoming in 2022.

In the US, it'd be nice if the national ISPs (fiber, cable, and mobile) thought so too.

exabrial · 3y ago · 3 in thread

Is there a way we can, please, for the love of god, submit exceptions? We need a corporate VPN, and we don't need every friggin' website blocking us.

Without this, you are also probably inadvertently contributing to the de-democratization of email. I used to run my own email off a cloud server, but that time has passed.

AndrewCopeland (OP) · 3y ago

I have added the feature for a user to add known addresses so they do not get flagged as bots. Take a look at the API docs here: https://ipdetective.io/api

Currently, known addresses are scoped to the user/client who is using the service.
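The per-client scoping described above can be pictured as an allow list that is consulted before the shared bot data. This is a minimal sketch under that assumption; `KnownAddressStore`, its methods, and the in-memory storage are hypothetical and are not IPDetective's actual implementation or API (see the linked docs for that).

```python
import ipaddress
from collections import defaultdict
from typing import Dict, Iterable, Set


def _normalise(ip: str) -> str:
    # Canonical text form (e.g. compressed IPv6) so equal addresses compare equal.
    return str(ipaddress.ip_address(ip))


class KnownAddressStore:
    """Hypothetical per-client allow list that is consulted before shared bot data."""

    def __init__(self, shared_bot_ips: Iterable[str]):
        self._shared_bot_ips: Set[str] = {_normalise(ip) for ip in shared_bot_ips}
        self._known: Dict[str, Set[str]] = defaultdict(set)  # client_id -> allowed IPs

    def add_known(self, client_id: str, ip: str) -> None:
        self._known[client_id].add(_normalise(ip))

    def is_bot(self, client_id: str, ip: str) -> bool:
        ip = _normalise(ip)
        # A client's known addresses override the shared verdict for that client only.
        if ip in self._known.get(client_id, set()):
            return False
        return ip in self._shared_bot_ips
```

Scoping the override per client means one customer whitelisting a corporate VPN exit cannot unflag that address for everyone else, which is the trade-off raised in the replies about letting exceptions apply globally.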

AndrewCopeland (OP) · 3y ago

Sounds like a great idea and an even better feature. Would that exception only work for you? Or would it also work for other users of IPDetective?

I think allowing users to register exceptions to your list as 'known IPs', so that they stay off the list, would be useful for all parties involved. If there were abusive behavior, the IP could go to a graylist while the matter is handled.

RockRobotRock · 3y ago · 3 in thread

Congrats on the launch! I personally hate anti-bot tech but I also make money from bots and scrapers so I’m not exactly unbiased.

Do you have any plans to expand the business? Captchas as a service? A cheap version of CF’s proxy maybe?

AndrewCopeland (OP) · 3y ago

I would like to expand. See what users need and pivot as needed. A cheap version of CF's proxy seems like an interesting start. I do not know if I have the expertise though.

asdadsdad · 3y ago

What kinds of things do you do with bots and scrapers, if you can disclose? I'm interested in scraping too and am curious what applications people pursue nowadays. A lot of the earlier ideas, like price monitoring, seem so crowded now.

RockRobotRock · 3y ago

A lot of it was for the end goal of lead generation, so any place where a lot of names, phone numbers, email addresses, etc. were accessible. I also got into the habit of reverse engineering mobile apps, because those would typically touch endpoints that weren't monitored for rate limiting and had more interesting information. It was typical for a company to monitor its competitors.

itake · 3y ago · 2 in thread

1/ It needs an authentication-less way to search. All the other IP lookup tools let me test without an account. I'm not going to register.

2/ I've tried blocking vpn or proxy users (by ip hosting provider), but found too many false positives. Using a VPN is common for IT professionals or consumers trying to 'protect' their privacy and this impacted the growth of my app.

How is your tool better at detecting a bot vs a human using a vpn?

AndrewCopeland (OP) · 3y ago

> 1/ It needs an authentication-less way to search. All the other IP lookup tools let me test without an account. I'm not going to register.

Okay I will look into a solution.

> 2/ I've tried blocking vpn or proxy users (by ip hosting provider), but found too many false positives. Using a VPN is common for IT professionals or consumers trying to 'protect' their privacy and this impacted the growth of my app.

That's good to know. I do not see a lot of false positives, but I imagine it all depends on the audience.

>How is your tool better at detecting a bot vs a human using a vpn?

It does not know the difference as of right now. I was thinking about adding user-agent validation as well, which could add another layer.

michaelmior · 3y ago

> I was thinking about adding a user-agent validation as well which could add another layer.

Presumably this service exists because bots try to avoid detection. I don't think UA validation would really help much and there are plenty of libraries already that do this.

whodev · 3y ago · 2 in thread

I wonder where you get your data from. I checked the dedicated IP I use against other services and they all came back clean, but yours labelled it as a bot.

AndrewCopeland (OP) · 3y ago

Is it your home dedicated IP?

whodev · 3y ago

No, it's from my VPN. That's what makes me wonder about the data you used. If other intel sources say it's a clean IP not used for bots, I want to know what makes your service say otherwise. Are you labeling all IPs from specific ASNs as bots?

mxuribe · 3y ago · 2 in thread

I didn't see my question in the FAQ, so asking here: do you have a process to contest accidental inclusion of an IP address?

AndrewCopeland (OP) · 3y ago

I do not, but it seems like a great feature request. I will look into adding something like this.

Thanks so much @AndrewCopeland!

I should have added context (or at least an anecdote) to help you during any product roadmap meetings. I used to oversee web ops for a global real estate company (one of the biggest in the U.S. and the world), and our consumer-facing websites showed tons and tons of listings of residential homes for sale. Of course, we were really just showing listings from our own data store as well as from other real estate companies that agreed to share listings data. As in many data-sharing and data-sync arenas, there were data issues. The most common scenario: "Hey, you're showing a home that sold X time ago... stop showing it!" In all cases it was "someone else's data," but to the customer, or even to other realtors, we were the bad guys. Even realtors who should have known better (there are always data issues we cannot fully control, at least not at the source) would complain to us.

Now, one might assume that once the source data updates properly, the "correct data" should flow through, right? Well, not in real estate! Clearly this is a different arena. But I've learned that, even if it's only to remove a local cache of data, it's a good idea to give users and/or customers a mechanism to at least properly tell you about stale data, and, if appropriate, maybe even allow self-service for a user or customer to get rid of the "bad data". Obviously this merits putting protocols in place to avoid abuse, but I hope you get the idea. Good luck!

mousetree · 3y ago · 2 in thread

Good work on the pricing and free tier. Much cheaper than your competitors (we're busy testing out ipinfo.io). But I would recommend you add a bit more data to your API response if you already have it or can build it out easily: for example, geocoding, ASN name, whether it's a proxy or a VPN, etc.

AndrewCopeland (OP) · 3y ago

Sounds good; this has been a common request and is definitely on the road map. I am trying to create a very fast service, and extra data querying slows it down. With that said, I am thinking about adding a query parameter like `?info=true` that would return more information, like the examples you gave above.

mousetree · 3y ago

I'm not so sure that low latency is the most important thing in every case. Personally, in our case, we wouldn't care: we're only requesting (detailed) information per IP on a small set of specific events (i.e. login, signup), and those calls to ipinfo/ipdetective would be async anyway. So perhaps there are two use cases here: 1. low latency, high volume; 2. higher latency but lower volume, where you don't care about minimising latency to below 20ms.
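A minimal sketch of the second use case, where the lookup runs off the request path. It assumes a hypothetical async `lookup_ip` coroutine, stubbed here with a short sleep instead of a real HTTP call to ipinfo or IPDetective; the signup handler schedules the check as a background task and returns immediately, and flagged signups are recorded for later review.

```python
import asyncio
from typing import List, Set, Tuple

# Hypothetical sink where flagged signups are recorded for later review.
flagged_signups: List[Tuple[str, str]] = []


async def lookup_ip(ip: str) -> bool:
    """Stub for a remote IP-reputation call; a real version would use an HTTP client."""
    await asyncio.sleep(0.01)  # simulate network latency
    return ip.startswith("203.0.113.")


async def check_in_background(user_id: str, ip: str) -> None:
    if await lookup_ip(ip):
        flagged_signups.append((user_id, ip))


async def handle_signup(user_id: str, ip: str, background: Set["asyncio.Task[None]"]) -> str:
    # Schedule the lookup without awaiting it, so it adds no latency to the signup.
    task = asyncio.create_task(check_in_background(user_id, ip))
    background.add(task)  # keep a strong reference until the task finishes
    task.add_done_callback(background.discard)
    return "created"


async def main() -> None:
    background: Set["asyncio.Task[None]"] = set()
    print(await handle_signup("u1", "203.0.113.9", background))
    print(await handle_signup("u2", "192.0.2.4", background))
    # Demo only: wait for the background checks so their results are visible.
    await asyncio.gather(*background)
    print(flagged_signups)
```

Running `asyncio.run(main())` prints `created` twice and then `[('u1', '203.0.113.9')]`. With this pattern, a 20 ms versus 300 ms lookup makes no visible difference to the user, which is the point about latency mattering less for low-volume events.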

asdadsdad · 3y ago · 2 in thread

I think ip blacklisting is really a graveyard of false positives.

asdadsdad · 3y ago

Also, how does it compare to https://focsec.com/pricing

AndrewCopeland (OP) · 3y ago

It seems more affordable; I do not know if it's more accurate.

azalemeth · 3y ago · 1 in thread

I have to say, I really, really hate IPv4-based bot detection. If you end up on an IP that is labelled erroneously as "bad", your life becomes a gCaptcha hell for no real reason – my cable ISP does cGNAT and I have no control over it and frequently end up being blacklisted. I tend to use a variety of always-on VPNs to both avoid censorship and try to improve this – I can pick different endpoints.

VPN usage is increasingly common by consumers and in my country I've seen ads for it in places ranging from Mozilla in my browser to NordVPN on my TV. There's massive overshare of IPv4 addresses and it's really, really annoying to find out that you can't use a service because you're on a naughty list. I feel like yelling "I am a customer and want to buy something!" on occasion – the net result is I go elsewhere.

Stopping abuse and bot detection is one thing. Banning people for something they might have literally no control over is quite another.

AndrewCopeland (OP) · 3y ago

I would agree with everything you are saying. I did not start this service for the sake of stopping actual people from using certain websites. I started this service to mitigate/detect bot abuse of consumer applications.

I think it really depends on the consumer application. But, as some other folks mentioned, adding the ability to have IP exceptions (essentially an allow list) is something I am in favor of. This brings up an entirely different issue, which is identifying the user as legitimate.

bootsmann · 3y ago · 1 in thread

How do you plan to keep the list up to date and, especially, reliably accurate? (Providers change, botnets get cleaned, ASNs, especially in 3rd world countries, are very unreliable, crawlers might scrape weird things, etc.)

Disclaimer: I work for a company that operates in a similar space so don't go into too much secret sauce if you don't want to :)

AndrewCopeland (OP) · 3y ago

So I have a scraper that gathers the information from about 60 different sources using a wide array of techniques, and it runs every day.

The main thing I look at is making sure my sources are not older than one year. Sometimes these sources break and I update them, or they go offline altogether. I am always looking for new hosting providers, VPNs, and proxies.

I can also do further port analysis on these IPs. A lot of VPN services have specific ports open. I would say that automating every source from the start was a good decision, since it lets me stay up to date.

ThinkBeat · 3y ago · 1 in thread

It would be interesting to see a breakdown of shared / unique entries across the different competitors in this space.

I have a feeling that there would be overwhelming overlap.

That might not be bad, just low hanging fruit everyone can get.

The speed at acquiring high probability threats I guess would be a better / more valuable comparison.

Perhaps running two or more of these offerings side by side for a couple of months.

I would prefer a feed of ALL, with frequent updates, instead of querying a 3rd party every time. (well obviously you can cache responses for a period of time locally at least)
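The local caching mentioned in the parenthetical can be as simple as a time-to-live cache in front of whatever lookup function you use. A minimal sketch; `lookup` stands in for a hypothetical remote call, the one-hour default TTL is an arbitrary assumption, and the injectable `clock` exists only to make expiry easy to test.

```python
import time
from typing import Callable, Dict, Tuple


class TTLCache:
    """Caches results of an expensive IP lookup for ttl_seconds per address."""

    def __init__(self, lookup: Callable[[str], bool], ttl_seconds: float = 3600.0,
                 clock: Callable[[], float] = time.monotonic):
        self._lookup = lookup
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[float, bool]] = {}  # ip -> (expiry, result)

    def get(self, ip: str) -> bool:
        now = self._clock()
        entry = self._entries.get(ip)
        if entry is not None and entry[0] > now:
            return entry[1]  # still fresh: no remote call
        # Missing or expired: query the third party and remember the answer.
        result = self._lookup(ip)
        self._entries[ip] = (now + self._ttl, result)
        return result
```

The TTL bounds how stale a verdict can get (for example, after a botnet is cleaned up) while cutting repeat queries for busy addresses; a long-running process would also want to evict expired entries periodically so the dictionary does not grow without bound.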

AndrewCopeland (OP) · 3y ago

How would you plan on storing the feed of IP addresses? I currently reach out to third parties and then aggregate all of the IP addresses once a day.

joshmn · 3y ago · 1 in thread

Does this handle residential or LTE proxies? Obviously harder to identify, but it's not unfeasible.

AndrewCopeland (OP) · 3y ago

Currently, we collect the data from several free public proxy lists. It is definitely more difficult to do with the proxies you mentioned. I think traffic analytics and port scanning would need to be implemented to catch these residential and LTE proxies.

throwaway742 · 3y ago

I think what you are doing is counterproductive and contrary to the free and open nature of the internet. Actual bad actors can afford clean residential IPs from dubious sources, but if I want the privacy of a VPN I am blacklisted from accessing large swathes of the internet.

AndrewCopeland (OP) · 3y ago

Please feel free to leave comments if you have any questions or any feedback.

arepaw · 3y ago

Excellent
