Zone Transfers on The Alexa Top 1 Million Part 2

In part 1 of this blog post I conducted a DNS Zone Transfer (AXFR) against the top 2000 sites of the Alexa Top 1 Million to create a better subdomain brute-forcing word list. At the time, using a single-threaded bash script, the Zone Transfers against the top 2000 sites took about 12 hours. I was pretty proud of this achievement and thought that doing the same for the whole top 1 million sites was beyond the time and resources that I had.

After I wrote a multithreaded, parallelised PoC in Ruby, the Zone Transfers against the top 2000 took about 5 minutes, compared to the 12 hours the single-threaded script needed. With that speed-up, I decided it was possible to do a Zone Transfer against the whole top 1 million sites.
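To illustrate why parallelising helps so much here, below is a minimal sketch in Python (not the Ruby PoC described above, which isn't shown in this post). It assumes the `dig` command is installed and that you already have a list of (domain, nameserver) pairs; the function names `dig_axfr` and `transfer_all` and the worker count are my own choices for illustration. Zone Transfers spend most of their time waiting on the network, so a thread pool lets many of them wait at once.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def dig_axfr(domain, nameserver, timeout=10):
    """Attempt a zone transfer with the `dig` CLI; return raw output or None."""
    try:
        result = subprocess.run(
            ["dig", "axfr", domain, "@" + nameserver, "+time=5", "+tries=1"],
            capture_output=True, text=True, timeout=timeout,
        )
    except (subprocess.TimeoutExpired, OSError):
        # dig hung, or is not installed on this machine.
        return None
    # Assumption: dig reports a refused transfer as "; Transfer failed."
    if result.returncode != 0 or "Transfer failed" in result.stdout:
        return None
    return result.stdout


def transfer_all(targets, transfer=dig_axfr, workers=50):
    """Run transfer(domain, nameserver) for every target on a thread pool.

    targets is a list of (domain, nameserver) pairs. Returns a dict mapping
    domain -> output for the attempts that succeeded.
    """
    def attempt(target):
        domain, nameserver = target
        return domain, transfer(domain, nameserver)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(attempt, targets)
        return {domain: output for domain, output in results if output}
```

Calling `transfer_all([("example.com", "ns1.example.com")])` would attempt one transfer; passing a different `transfer` callable makes it easy to swap in another DNS client or a stub. Only run this against zones you are permitted to test.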

There were 60,472 successful Zone Transfers (about 6%) out of the Alexa Top 1 Million, which equates to 566MB of raw data on disk. This amount of data brings its own challenges when attempting to manipulate it.

The top 10 subdomains in the Alexa Top 1 Million are:

Instances Subdomain
54520 www
41581 mail
39873 ftp
38590 localhost
22771 webmail
17643 smtp
17410 webdisk
15439 pop
15155 cpanel
14904 whm
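A tally like the one above can be sketched as follows. This is an illustration under assumptions rather than the script actually used: it assumes each zone is stored as dig-style text (owner name in the first column), treats a "subdomain" as the owner name relative to the zone's domain, and counts each name at most once per zone. The names `relative_names` and `count_subdomains` are hypothetical.

```python
from collections import Counter


def relative_names(zone_text, domain):
    """Yield each distinct record owner name in zone_text, relative to domain."""
    suffix = "." + domain.lower().rstrip(".")
    seen = set()
    for line in zone_text.splitlines():
        line = line.strip()
        if not line or line.startswith(";"):
            continue  # skip blank lines and dig comment lines
        owner = line.split()[0].lower().rstrip(".")
        if owner.endswith(suffix):
            name = owner[: -len(suffix)]
            if name and name not in seen:
                seen.add(name)
                yield name


def count_subdomains(zones):
    """zones maps domain -> zone text; returns a Counter of subdomain names."""
    counts = Counter()
    for domain, zone_text in zones.items():
        counts.update(relative_names(zone_text, domain))
    return counts
```

`count_subdomains(zones).most_common(10)` then gives a top 10 in the same shape as the table.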

There are some big differences between this top 10 and the top 10 from the top 2000 sites. Here the www subdomain takes the number one spot. The m subdomain is not in this list. The cpanel subdomain is in this list but didn't feature in the top 2000 list.

The data

In the lists below, any subdomain containing a '*' or '@' character was removed, as was any subdomain that was seen only once.

Subdomains including instances: subdomains-top1mil-with-rank.txt (42MB)
Subdomains not including instances: subdomains-top1mil.txt (1.1MB)
Subdomains top 20,000: subdomains-top1mil-20000.txt (146KB)
Subdomains top 5000: subdomains-top1mil-5000.txt (33KB)
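The filtering and truncation behind these lists can be sketched roughly as below. This is a hedged illustration, not the exact processing used: it assumes the counts are held in a mapping of subdomain name to number of instances, and `filter_and_rank` and `top_words` are hypothetical helper names.

```python
def filter_and_rank(counts, min_instances=2):
    """Drop names containing '*' or '@' and names seen only once, then rank."""
    kept = [
        (name, seen)
        for name, seen in counts.items()
        if seen >= min_instances and "*" not in name and "@" not in name
    ]
    # Most frequent first; ties are broken alphabetically for stable output.
    return sorted(kept, key=lambda item: (-item[1], item[0]))


def top_words(ranked, limit=None):
    """Return just the names from a ranked list, optionally truncated."""
    return [name for name, _ in ranked[:limit]]
```

Under these assumptions, `top_words(ranked, 5000)` would correspond to the 5000-entry list and `top_words(ranked)` to the full list without instance counts.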

On a daily basis I would probably use the subdomains-top1mil-5000.txt list over the others because of the time it takes to complete a subdomain brute force. You may want to use the subdomains-top1mil-20000.txt list to be more thorough. For the best results, at the cost of the most time, I'd use the subdomains-top1mil.txt list.
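For readers who have not used such a word list before, here is a minimal sketch of a subdomain brute force in Python. It assumes the list has one subdomain per line and uses the system resolver via `socket.getaddrinfo`; the function names and the worker count are illustrative choices, and a dedicated tool will be faster and more careful than this.

```python
import socket
from concurrent.futures import ThreadPoolExecutor


def load_words(path):
    """Read a word list with one subdomain per line, skipping blank lines."""
    with open(path, encoding="utf-8") as handle:
        return [line.strip() for line in handle if line.strip()]


def resolve(hostname):
    """Return the sorted addresses hostname resolves to, or [] if none."""
    try:
        infos = socket.getaddrinfo(hostname, None)
    except (socket.gaierror, UnicodeError):
        return []
    return sorted({info[4][0] for info in infos})


def brute_force(domain, words, resolver=resolve, workers=20):
    """Try word.domain for every word; return the hosts that resolved."""
    candidates = [f"{word}.{domain}" for word in words]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        answers = pool.map(resolver, candidates)
        return {host: addrs for host, addrs in zip(candidates, answers) if addrs}
```

A run might look like `brute_force("example.com", load_words("subdomains-top1mil-5000.txt"))`. The run time grows with the length of the list, which is the trade-off between the three lists described above.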