Detecting Algorithmically Generated Domain Names

(nbviewer.ipython.org)

35 points · wooster · 11y ago · 6 comments

llasram · 11y ago

If the goal is to actually detect DGA activity vs just playing around with data, check out https://www.usenix.org/conference/usenixsecurity12/technical... for a much more effective approach.

The key difference is the use of sets of (NXDomain) domains vs single domains. With a few additional features, the boost in signal is sufficient to allow classification as individual DGAs with essentially no false positives.
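A minimal sketch of the set-versus-single-domain idea, assuming the NXDomain names have already been grouped (for example per client and time window). The feature names and the `set_features` helper are hypothetical and chosen only for illustration; they are not the feature set used in the linked paper.

```python
import math
from collections import Counter
from statistics import mean, pstdev
from typing import Dict, List


def char_entropy(label: str) -> float:
    """Shannon entropy (bits per character) of a single label."""
    counts = Counter(label)
    total = len(label)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def set_features(nx_domains: List[str]) -> Dict[str, float]:
    """Aggregate simple per-domain statistics over a SET of NXDomain names.

    Hypothetical features for illustration only.
    """
    if not nx_domains:
        raise ValueError("need at least one domain")
    # Use the leftmost label as a rough stand-in for the generated part.
    labels = [d.split(".")[0].lower() for d in nx_domains]
    entropies = [char_entropy(label) for label in labels]
    lengths = [len(label) for label in labels]
    tlds = {d.rsplit(".", 1)[-1].lower() for d in nx_domains}
    return {
        "n_domains": float(len(labels)),
        "mean_entropy": mean(entropies),
        "std_entropy": pstdev(entropies),
        "mean_length": mean(lengths),
        "std_length": pstdev(lengths),
        "distinct_tlds": float(len(tlds)),
    }
```

A single name yields one noisy entropy or length value; a set yields means and spreads, and a given generator tends to produce names that look alike on those aggregates. Any real classifier would need many more features and labeled sets to train on.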

meowface · 11y ago

The project discussed in this paper requires having access to all of the passive DNS data in an entire ISP's network, which isn't that practical for many researchers.

OP's machine learning is arguably even more impressive, because it has a decent success rate based entirely on open source data and the domain names themselves, with no other corroborating information (like an NXDOMAIN response).

llasram · 11y ago

You only need large quantities of pDNS data for the discovery portion. For classification, all you need are collections of domains produced by the same algorithm (which are readily available for the widely known DGAs). The domains being NX isn't so much corroboration as foundation -- the NX-producing search over multiple domains is the observable behavioral distinction between AGD-based and static C&C discovery.

(I've worked with both Antonakakis and Yadav, and implemented the production version of Damballa's AGD classifier as per Antonakakis).

a-dub · 11y ago

I've always thought that this would be the way to do it. Run a smart DNS proxy/server that does pattern matching on the DNS query/response traffic using both traffic profile and query contents as features, then when you see something that matches for a given time window, stop service for that client and complain.
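A rough sketch of the time-window part of that idea, assuming the proxy can hand each response to a callback with the client address, the query name, whether the answer was NXDOMAIN, and a timestamp. `NxWindowMonitor`, its parameters, and the threshold values are hypothetical; it uses only the traffic profile (distinct NXDOMAIN names per window) and leaves out query-content features.

```python
from collections import defaultdict, deque


class NxWindowMonitor:
    """Flag clients that hit too many distinct NXDOMAIN names in a window."""

    def __init__(self, window_seconds=60.0, max_nx=20):
        self.window_seconds = window_seconds  # hypothetical default
        self.max_nx = max_nx                  # hypothetical default
        self._events = defaultdict(deque)     # client -> deque of (timestamp, qname)
        self.blocked = set()

    def observe(self, client, qname, is_nxdomain, timestamp):
        """Record one response; return True if the client is (now) blocked."""
        if client in self.blocked:
            return True
        if not is_nxdomain:
            return False
        events = self._events[client]
        events.append((timestamp, qname.lower()))
        # Drop events that have fallen out of the sliding window.
        while events and timestamp - events[0][0] > self.window_seconds:
            events.popleft()
        distinct = {name for _, name in events}
        if len(distinct) > self.max_nx:
            self.blocked.add(client)
            return True
        return False
```

A caller would stop answering (and raise an alert) once `observe` returns True for a client. Counting distinct names rather than raw responses keeps a client that retries one mistyped name from tripping the threshold.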

a-dub · 11y ago

That said, cool notebook!

pestaa · 11y ago

    # I'm SURE there's a better way to store all the counts but not sure...

I'm seeing the progress with which this comment came to life.
