I have a domain that is not live. As expected, loading the domain returns: Error 1016.
However, I have a subdomain with a non-obvious name, like: userfileupload.sampledomain.com
This subdomain IS LIVE but has NOT been publicized/posted anywhere. It’s a custom URL for authenticated users to upload media with a presigned URL to my Cloudflare R2 bucket.
I am using Cloudflare for my DNS.
How did the internet find my subdomain? Some sample user agents are:
“Expanse, a Palo Alto Networks company, searches across the global IPv4 space multiple times per day to identify customers' presences on the Internet. If you would like to be excluded from our scans, please send IP addresses/domains to: scaninfo@paloaltonetworks.com”,
“Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_7; en-us) AppleWebKit/534.20.8 (KHTML, like Gecko) Version/5.1 Safari/534.20.8”,
“Mozilla/5.0 (Linux; Android 9; Redmi Note 5 Pro) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/76.0.3809.89 Mobile Safari/537.36”,
The bots' GET requests are failing, as designed, but I’m wondering how the bots even knew the subdomain existed?!
33 Comments
codingdave
If it is on DNS, it is discoverable. Even if it were not, the message you pasted says outright that they scan the entire IP space, so they could be hitting your server's IP without having a clue there is a subdomain serving your stuff from it.
Kikawala
Is it available under HTTPS? Then it's probably in a Certificate Transparency log.
daggersandscars
The DNS query type AXFR (a zone transfer) returns every record in a zone, subdomains included. There are security restrictions around who can run it against which DNS servers. Given the number of places online where one can run a subdomain query, I suspect it's mostly a matter of paying the right fees to the right DNS provider.
artursapek
presumably it has a DNS record
vince14
I'm having the same issue.
https://securitytrails.com/ also had my "secret" staging subdomain.
I made a catch-all certificate, so the subdomain didn't show up in CT logs.
It's still a secret to me how my subdomain ended up in their database.
parliament32
Certificate Transparency logs, or they don't actually know the domain name: just port-scanning[1] then making requests to open web ports.
[1] Turns out you can port-scan the entire internet in under 5 minutes: https://github.com/robertdavidgraham/masscan
fsckboy
LPT, this is an object lesson in the weakness of security through obscurity
OuterVale
https://www.merklemap.com pops to mind.
8bitchemistry
Did you ever email the URL to somebody?
We had the same issue years ago, where Google seemed to be crawling/indexing new subdomains it found in emails.
andix
I'm surprised nobody mentioned subfinder yet: https://github.com/projectdiscovery/subfinder
Subfinder uses different public and private sources to discover subdomains. Certificate Transparency logs are a great source, but it also has some other options.
spl757
Does the IP address for that subdomain have a DNS PTR record set? If it does, someone can discover the subdomain by querying the PTR record for the IP.
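To make the PTR point concrete, here is a minimal Python sketch of what such a reverse lookup looks like. The function names are mine and purely illustrative; `reverse_lookup` only returns something if a PTR record is actually set for the IP.

```python
import ipaddress
import socket
from typing import Optional


def ptr_query_name(ip: str) -> str:
    """Return the name a PTR query for this IP address is sent for."""
    return ipaddress.ip_address(ip).reverse_pointer


def reverse_lookup(ip: str) -> Optional[str]:
    """Ask the system resolver for the PTR hostname; None if nothing comes back."""
    try:
        hostname, _aliases, _addresses = socket.gethostbyaddr(ip)
    except OSError:  # covers socket.herror and socket.gaierror
        return None
    return hostname
```

For example, `ptr_query_name("192.0.2.10")` gives `10.2.0.192.in-addr.arpa`, which is the name a scanner would query for each address it walks.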
andix
If an HTTPS service should be hard to discover, an easy way is to hide it behind a subdirectory. Something like https://subdomain.domain.example/hard_to_find_secret_string.
Another option is a wildcard certificate.
This obviously can't be the only protection. But if an attacker doesn't know about a service, or misses it during discovery, they can't attack it.
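A rough Python sketch of the secret-path idea, assuming the server generates the path once and compares every incoming request path against it. The function names are hypothetical, and this is an extra layer on top of real authentication, not a substitute for it.

```python
import hmac
import secrets


def new_secret_path(num_bytes: int = 32) -> str:
    """Generate a URL path that is impractical to guess."""
    return "/" + secrets.token_urlsafe(num_bytes)


def path_is_authorized(request_path: str, secret_path: str) -> bool:
    """Compare in constant time so response timing does not leak the secret."""
    return hmac.compare_digest(request_path.encode(), secret_path.encode())
```

Because the path travels inside the encrypted part of an HTTPS request, it shows up in neither DNS nor the certificate, unlike the hostname.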
LinuxBender
As others have said, likely cert transparency logs. Use a wildcard cert to avoid this. They are free using Let's Encrypt and possibly a couple of other ACME providers. I have loads of wildcard certs. Bots will try guessing names, but like you I do not use easily guessable names and the bots never find them. I log all DNS answers. I assume Cloudflare supports strict-SNI but have no idea if they have their own automation around wildcard certs. Sometimes I renew wildcard certs I am not even using just to give the bots something to do.
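Why a wildcard cert helps: the only name that lands in the CT log is the wildcard itself, and it matches any single label. A simplified Python sketch of that matching rule follows (my own illustration; real TLS libraries apply more checks than this).

```python
def wildcard_covers(cert_name: str, hostname: str) -> bool:
    """True if a certificate name such as '*.sampledomain.com' covers hostname.

    Simplified rule: '*' must be the whole leftmost label and stands for
    exactly one label, so it never matches the bare domain or deeper names.
    """
    cert_labels = cert_name.lower().rstrip(".").split(".")
    host_labels = hostname.lower().rstrip(".").split(".")
    if len(cert_labels) != len(host_labels):
        return False
    if cert_labels[0] == "*":
        return cert_labels[1:] == host_labels[1:]
    return cert_labels == host_labels
```

So `*.sampledomain.com` covers `userfileupload.sampledomain.com`, but someone reading the log learns only that some subdomain may exist, not its name.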
pabs3
ArchiveTeam has some docs about this:
https://wiki.archiveteam.org/index.php/Finding_subdomains
ciaovietnam
There is a chance that your subdomain is the first/default virtual host in your web server setup (or the subdomain's access log is the default log file) so any requests to the server's IP address get logged to this virtual host. That means they didn't access your subdomain, they accessed via your server IP address but got logged in your subdomain's access log.
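One way to check this, assuming your logs record the Host header of each request: classify each entry by whether the client actually sent your subdomain. This is a hypothetical helper in Python, not something from any particular web server.

```python
import ipaddress


def classify_host_header(host_header: str, expected_hostname: str) -> str:
    """Label a request by what its Host header says about the client."""
    host = host_header.strip().lower()
    if host.startswith("["):
        # Bracketed IPv6 literal, e.g. "[2001:db8::1]:443"
        host = host[1:].split("]", 1)[0]
    elif host.count(":") == 1:
        # "name:port" or "1.2.3.4:port": drop the port
        host = host.rsplit(":", 1)[0]
    if not host:
        return "missing"
    try:
        ipaddress.ip_address(host)
        return "ip-scan"  # client addressed the server by IP only
    except ValueError:
        pass
    if host == expected_hostname.lower():
        return "knows-hostname"  # client really knew the subdomain
    return "other-hostname"
```

If nearly everything comes out as `ip-scan` or `missing`, the bots most likely never knew the subdomain and only landed in its log via the default virtual host.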
alberth
This site will find any subdomain, for any domain, so long as a certificate (SSL/TLS) was previously issued for that name:
https://crt.sh/
thedougd
Some CAs (Amazon) allow not publishing to the Certificate Transparency Log. But if you do this, browsers will block the connection by default. Chromium browsers have a policy option to skip this check for selected URLs. See: CertificateTransparencyEnforcementDisabledForURLs.
Some may find this more desirable than wildcard certificates and their drawbacks.
rempargo
I assume you host this with an HTTPS certificate, so you can look up your subdomains at:
https://crt.sh/?q=sampledomain.com
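If you want to script that lookup, here is a minimal Python sketch. It assumes crt.sh returns JSON when `output=json` is added to the query and that each entry carries a `name_value` field with newline-separated names; that is my understanding of its output format, so treat both as assumptions and check against a real response.

```python
import json
import urllib.parse
import urllib.request
from typing import Set


def names_from_crtsh_json(payload: str, domain: str) -> Set[str]:
    """Collect every logged name that is the domain or falls under it."""
    domain = domain.lower().rstrip(".")
    names = set()
    for entry in json.loads(payload):
        # Assumed format: one entry can list several names, one per line.
        for name in str(entry.get("name_value", "")).splitlines():
            name = name.strip().lower().rstrip(".")
            if name == domain or name.endswith("." + domain):
                names.add(name)
    return names


def fetch_crtsh_names(domain: str, timeout: float = 30.0) -> Set[str]:
    """Query crt.sh over HTTPS and return the names found for the domain."""
    query = urllib.parse.urlencode({"q": domain, "output": "json"})
    url = f"https://crt.sh/?{query}"
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return names_from_crtsh_json(response.read().decode("utf-8"), domain)
```

If your subdomain shows up in the result of `fetch_crtsh_names("sampledomain.com")`, the CT logs are a sufficient explanation on their own.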
melson
Someone might have used an open-source tool like Sublist3r.
paxys
Not sure why everyone is going on about certificate transparency logs when the answer is right there in the user agent. The company is scanning the ipv4 space and came upon your IP and port.
arkfil
Palo Alto (network devices like firewalls, etc.) is able to scan the sites that users behind their devices want to visit. These are very popular devices in many companies. Users can also have agents installed on their computers that have access to the sites they visit.
govideo
Thanks for everyone's perspectives. Very educational and admittedly lots outside the boundaries of my current knowledge. I have thus far relied on CloudFlare's automatic https and simple instant subdomain setup for their worker microservice I'm using.
There are evidently technical/footprint implications of that convenience. Fortunately, I'm not really concerned with the subdomain being publicly known; I was more curious how it became publicly known.
bashwizard
Like people have said already; Certificate Transparency logs.
There are countless tools to use for subdomain enumeration. I personally use subfinder or amass when doing recon on bug bounty targets.
3oil3
What happens if you google your subdomain?
Maybe the bots have some sort of dictionary files and they just run them, and when there is a match, then they append it with some .html extension, or maybe they prepend it to the match as a subdomain of it?
f4c39012
CSP headers can leak urls, but I assume that isn't the cause here if the subdomain is an entirely separate project
ThePowerOfFuet
Others are saying CT logs but my own subdomains are on wildcard certificates, in which case I suspect they are discovered by DPI analysis of DNS traffic and resold, such as by Team Cymru.
BLKNSLVR
There are a number of companies, not just Palo Alto Networks, that scan the entire IPv4 space at various scales; some of them perform these scans multiple times per day.
I set up a set of scripts to log all "uninvited activity" to a couple of my systems, from which I discovered a whole bunch of these scanner "security" companies. Personally, I treat them all as malicious.
There are also services that track Newly Registered Domains (NRDs).
Tangentially:
NRD lists are useful for DNS block lists since a large number of NRDs are used for short term scam sites.
My little, very amateur, project to block them can be found here:
https://github.com/UninvitedActivity/UninvitedActivity
Edited to add:
Direct link to the list of scanner IP addresses (although hasn't been updated in 8 months – crikey, I've been busy longer than I thought):
https://github.com/UninvitedActivity/UninvitedActivity/blob/…
lockhead
Most likely passive DNS data. If you use your subdomain, you make DNS queries for it. If the DNS server you use to resolve your domains shares this data, it can be picked up by others.
nusl
It's pretty common to bruteforce subdomains of a domain you might be interested in, especially by attackers.
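A bare-bones Python sketch of that kind of bruteforce, assuming a wordlist of candidate labels (the words and function names here are made up for illustration). Real tools add concurrency, large wordlists, and wildcard-DNS detection.

```python
import socket
from typing import Callable, Iterable, List


def resolves(hostname: str) -> bool:
    """True if the system resolver returns any address for the hostname."""
    try:
        socket.getaddrinfo(hostname, None)
    except (socket.gaierror, UnicodeError):
        return False
    return True


def brute_force_subdomains(
    domain: str,
    words: Iterable[str],
    resolver: Callable[[str], bool] = resolves,
) -> List[str]:
    """Try word.domain for every word and keep the names that resolve."""
    found = []
    for word in words:
        candidate = f"{word.strip().lower()}.{domain}"
        if resolver(candidate):
            found.append(candidate)
    return found
```

A name like `userfileupload` is unlikely to be in a short wordlist, but `upload`, `files` and `staging` almost certainly are, which is why guessable names get found quickly.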
xg15
TIL (from this thread) : You can abuse TLS handshakes to effectively reverse-DNS an IP address without ever talking to a DNS server! Is this built into dig yet? :)
(Alright, some IP addresses, not all of them)
I also wonder if this is a potential footgun for eSNI deployments: If you add eSNI support to a server, you must remember to also make regular SNI mandatory – otherwise, an eavesdropper can just ask your server nicely for the domain that the eSNI encryption was trying to hide from it.
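For the curious, a minimal Python sketch of that trick: connect to the bare IP without sending SNI and read the DNS names out of whatever default certificate the server presents. The function names are mine. Certificate verification is left on so that `getpeercert()` returns a populated dict, which means this only works for certs that chain to a trusted CA; a self-signed default cert makes the handshake fail instead.

```python
import socket
import ssl
from typing import List


def dns_names_from_cert(cert: dict) -> List[str]:
    """Pull the DNS entries out of a certificate dict's subjectAltName."""
    return [value for kind, value in cert.get("subjectAltName", ()) if kind == "DNS"]


def names_served_on_ip(ip: str, port: int = 443, timeout: float = 5.0) -> List[str]:
    """Handshake with an IP, sending no SNI, and list the certificate's names."""
    context = ssl.create_default_context()
    context.check_hostname = False  # there is no hostname to check against
    with socket.create_connection((ip, port), timeout=timeout) as sock:
        # No server_hostname is passed, so no SNI extension is sent.
        with context.wrap_socket(sock) as tls:
            return dns_names_from_cert(tls.getpeercert())
```

A server that refuses handshakes without a matching SNI (strict SNI) gives this nothing to read.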
_trampeltier
Did you send a link over email, WhatsApp or something like that?
ralferoo
If you're using HTTPS, then you're probably using Let's Encrypt, and so your subdomain will appear in the CT logs, which are publicly accessible.
One thing you could do is use a wildcard certificate, and then use a non-obvious subdomain under it. I actually have something similar: in my setup, all my web traffic goes to haproxy frontends which forward traffic to the appropriate backend, and I was sick of setting up new certificates for each new subdomain, so I replaced them all with a single wildcard cert instead. This means that I'm not advertising each new subdomain on the CT list. They all look nominally the same when visited (same holding page on index and same /api handling); just one of the subdomains decodes an additional URL path that provides access to status monitoring.
Separately, that Palo Alto Networks company is a real pain. They connect to absolutely everything in their attempts to spam the internet. Frankly, I'm sick of even my mail servers being bombarded with HTTP requests on port 25 and the resultant log spam.