Ask HN: How did the internet discover my subdomain? by govideo

By HackTech, March 7, 2025

I have a domain that is not live. As expected, loading the domain returns: Error 1016.

However, I have a subdomain with a non-obvious name, like userfileupload.sampledomain.com

This subdomain IS LIVE but has NOT been publicized/posted anywhere. It’s a custom URL for authenticated users to upload media via presigned URLs to my Cloudflare R2 bucket.

I am using Cloudflare for my DNS.

How did the internet find my subdomain? Some sample user agents are:
“Expanse, a Palo Alto Networks company, searches across the global IPv4 space multiple times per day to identify customers' presences on the Internet. If you would like to be excluded from our scans, please send IP addresses/domains to: scaninfo@paloaltonetworks.com”
“Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_7; en-us) AppleWebKit/534.20.8 (KHTML, like Gecko) Version/5.1 Safari/534.20.8”
“Mozilla/5.0 (Linux; Android 9; Redmi Note 5 Pro) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/76.0.3809.89 Mobile Safari/537.36”

The bots' GET requests are failing, as designed, but I’m wondering how the bots even knew the subdomain existed?!

33 Comments

codingdave

Posted March 6, 2025 at 10:39 pm

If it is on DNS, it is discoverable. Even if it were not, the message you pasted says outright that they scan the entire IP space, so they could be hitting your server's IP without having a clue there is a subdomain serving your stuff from it.

Kikawala

Posted March 6, 2025 at 10:39 pm

Is it available under HTTPS? Then it's probably in a Certificate Transparency log.

daggersandscars

Posted March 6, 2025 at 10:48 pm

DNS query type AXFR (a zone transfer) returns every record in a zone, subdomains included. There are security restrictions around who can do it on which DNS servers. Given the number of places online one can run a subdomain query, I suspect it's mostly a matter of paying the right fees to the right DNS provider.
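For the curious, here is a minimal standard-library sketch of what an AXFR request looks like on the wire. The function names are hypothetical, and `nameserver_ip` is a placeholder you would fill in yourself. A well-configured authoritative server normally refuses the transfer, which shows up as RCODE 5 (REFUSED) in the response header. Only try this against servers you are allowed to test.

```python
import socket
import struct

AXFR = 252  # QTYPE for a full zone transfer (RFC 5936)
IN = 1      # QCLASS "Internet"


def encode_name(name: str) -> bytes:
    """Encode a domain name as length-prefixed DNS labels."""
    out = b""
    for label in name.rstrip(".").split("."):
        raw = label.encode("ascii")
        out += bytes([len(raw)]) + raw
    return out + b"\x00"  # the empty root label terminates the name


def build_axfr_query(zone: str, query_id: int = 0x1234) -> bytes:
    """Build a TCP-framed DNS message asking for a zone transfer of `zone`."""
    # Header: ID, flags=0, QDCOUNT=1, ANCOUNT=NSCOUNT=ARCOUNT=0
    header = struct.pack("!HHHHHH", query_id, 0, 1, 0, 0, 0)
    question = encode_name(zone) + struct.pack("!HH", AXFR, IN)
    message = header + question
    # DNS over TCP prefixes every message with its 2-byte length.
    return struct.pack("!H", len(message)) + message


def rcode(response: bytes) -> int:
    """Response code from a DNS message header (0 = NOERROR, 5 = REFUSED)."""
    (flags,) = struct.unpack("!H", response[2:4])
    return flags & 0x000F


def _recv_exact(sock: socket.socket, n: int) -> bytes:
    """Read exactly n bytes or raise if the peer closes early."""
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("connection closed mid-message")
        buf += chunk
    return buf


def try_axfr(zone: str, nameserver_ip: str, timeout: float = 5.0) -> bytes:
    """Send one AXFR query over TCP and return the first response message."""
    with socket.create_connection((nameserver_ip, 53), timeout=timeout) as sock:
        sock.sendall(build_axfr_query(zone))
        (length,) = struct.unpack("!H", _recv_exact(sock, 2))
        return _recv_exact(sock, length)
```

Calling `rcode(try_axfr("sampledomain.com", nameserver_ip))` and getting 5 back means the server declined to hand over the zone.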

artursapek

Posted March 7, 2025 at 12:00 am

presumably it has a DNS record

vince14

Posted March 7, 2025 at 12:06 am

I'm having the same issue.

https://securitytrails.com/ also had my "secret" staging subdomain.

I made a catch-all certificate, so the subdomain didn't show up in CT logs.

It's still a secret to me how my subdomain ended up in their database.

parliament32

Posted March 7, 2025 at 12:15 am

Certificate Transparency logs, or they don't actually know the domain name: just port-scanning[1] then making requests to open web ports.

[1] Turns out you can port-scan the entire internet in under 5 minutes: https://github.com/robertdavidgraham/masscan

fsckboy

Posted March 7, 2025 at 12:26 am

LPT, this is an object lesson in the weakness of security through obscurity

OuterVale

Posted March 7, 2025 at 12:27 am

https://www.merklemap.com pops to mind.

8bitchemistry

Posted March 7, 2025 at 12:29 am

Did you ever email the URL to somebody?
We had the same issue years ago, where Google seemed to be crawling/indexing new subdomains it found in emails.

andix

Posted March 7, 2025 at 12:35 am

I'm surprised nobody mentioned subfinder yet: https://github.com/projectdiscovery/subfinder

Subfinder uses different public and private sources to discover subdomains. Certificate Transparency logs are a great source, but it also has some other options.

spl757

Posted March 7, 2025 at 12:41 am

Does the IP address for that subdomain have a DNS PTR record set? If it does, someone can discover the subdomain by querying the PTR record for the IP.
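A minimal sketch of that reverse lookup with the Python standard library: `ptr_name` shows which name a PTR query actually asks about, and `reverse_lookup` (a hypothetical helper) asks the system resolver for it. One caveat, as far as I understand it: if the subdomain is proxied through Cloudflare, the IP the world sees belongs to Cloudflare, so this mostly matters for hosts served directly from your own IP.

```python
import ipaddress
import socket
from typing import Optional


def ptr_name(ip: str) -> str:
    """The name a reverse (PTR) lookup queries, e.g. 1.2.0.192.in-addr.arpa."""
    return ipaddress.ip_address(ip).reverse_pointer


def reverse_lookup(ip: str) -> Optional[str]:
    """Reverse-resolve `ip` via the system resolver (normally a PTR query); None if nothing comes back."""
    try:
        hostname, _aliases, _addresses = socket.gethostbyaddr(ip)
        return hostname
    except (socket.herror, socket.gaierror):
        return None
```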

andix

Posted March 7, 2025 at 12:44 am

If an HTTPS service should be hard to discover, an easy way is to hide it behind a subdirectory. Something like https://subdomain.domain.example/hard_to_find_secret_string.

Another option is wildcard certificates.

This obviously can't be the only protection. But if an attacker doesn't know about a service, or misses it during discovery, they can't attack it.
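A minimal sketch of the secret-path idea, assuming the secret is the first path segment. The helper names are hypothetical. TLS encrypts the path (unlike the hostname, which travels in SNI), but the secret can still end up in access logs, browser history and Referer headers, so this is obscurity on top of real authentication, not a replacement for it.

```python
import hmac
import secrets


def new_path_secret(nbytes: int = 32) -> str:
    """Random URL-safe token; 32 random bytes encode to 43 characters."""
    return secrets.token_urlsafe(nbytes)


def first_segment(path: str) -> str:
    """'/abc/upload?x=1' -> 'abc' (the query string is dropped)."""
    return path.split("?", 1)[0].lstrip("/").split("/", 1)[0]


def path_is_authorized(request_path: str, secret: str) -> bool:
    """Constant-time check that the first path segment equals the secret."""
    return hmac.compare_digest(first_segment(request_path).encode(), secret.encode())
```

`hmac.compare_digest` is used instead of `==` so that response timing does not reveal how many leading characters of a guess were correct.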

LinuxBender

Posted March 7, 2025 at 1:33 am

As others have said, likely cert transparency logs. Use a wildcard cert to avoid this. They are free using Let's Encrypt and possibly a couple of other ACME providers. I have loads of wildcard certs. Bots will try guessing names, but like you I do not use easily guessable names, and the bots never find them. I log all DNS answers. I assume Cloudflare supports strict SNI, but I have no idea if they have their own automation around wildcard certs. Sometimes I renew wildcard certs I am not even using just to give the bots something to do.
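Why a wildcard hides the name: the CT log only records `*.sampledomain.com`, and that single certificate covers any one-label subdomain. Below is a simplified sketch of the matching rule from RFC 6125 (the `*` must be the whole left-most label and matches exactly one label); the function name is hypothetical and partial wildcards like `f*.example.com` are deliberately not handled. Note that Let's Encrypt only issues wildcard certificates via the DNS-01 challenge.

```python
def wildcard_matches(pattern: str, hostname: str) -> bool:
    """True if a certificate name like '*.example.com' covers `hostname`.

    Simplified RFC 6125 rules: '*' must be the entire left-most label and
    matches exactly one label, so it covers neither the apex nor deeper names.
    """
    p_labels = pattern.lower().rstrip(".").split(".")
    h_labels = hostname.lower().rstrip(".").split(".")
    if len(p_labels) != len(h_labels):
        return False
    if p_labels[0] == "*":
        return h_labels[0] != "" and p_labels[1:] == h_labels[1:]
    return p_labels == h_labels
```

So `*.sampledomain.com` covers `userfileupload.sampledomain.com` but not `sampledomain.com` or `a.b.sampledomain.com`. The DNS record for the subdomain still exists, so this only closes the CT-log route.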

pabs3

Posted March 7, 2025 at 2:35 am

ArchiveTeam has some docs about this:

https://wiki.archiveteam.org/index.php/Finding_subdomains

ciaovietnam

Posted March 7, 2025 at 3:08 am

There is a chance that your subdomain is the first/default virtual host in your web server setup (or the subdomain's access log is the default log file) so any requests to the server's IP address get logged to this virtual host. That means they didn't access your subdomain, they accessed via your server IP address but got logged in your subdomain's access log.
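One way to test this theory against your own logs, assuming they record the Host header of each request: classify each request by what the client actually asked for. Requests labelled `ip` or `missing` never named your subdomain at all. The function is a hypothetical helper, not part of any server's API.

```python
import ipaddress
from typing import Optional


def classify_host(host_header: Optional[str], expected: str) -> str:
    """Return 'named', 'ip', 'other' or 'missing' for a request's Host header."""
    if not host_header:
        return "missing"
    host = host_header.strip().lower()
    if host.startswith("["):          # bracketed IPv6 literal, e.g. [2001:db8::1]:443
        end = host.find("]")
        host = host[1:end] if end != -1 else host[1:]
    elif host.count(":") == 1:        # hostname or IPv4 address followed by a port
        host = host.split(":", 1)[0]
    if host.rstrip(".") == expected.lower().rstrip("."):
        return "named"
    try:
        ipaddress.ip_address(host)
        return "ip"
    except ValueError:
        return "other"
```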

alberth

Posted March 7, 2025 at 3:29 am

This site will find any subdomain, for any domain, so long as an SSL/TLS certificate naming it was ever issued and logged to Certificate Transparency:

https://crt.sh/

thedougd

Posted March 7, 2025 at 5:23 am

Some CAs (e.g. Amazon) let you opt out of publishing to Certificate Transparency logs. But if you do this, browsers will block the connection by default. Chromium browsers have a policy option to skip this check for selected URLs. See: CertificateTransparencyEnforcementDisabledForURLs.

Some may find this more desirable than wildcard certificates and their drawbacks.

rempargo

Posted March 7, 2025 at 5:31 am

I assume you host this with an HTTPS certificate, so you can look up your subdomains at:

https://crt.sh/?q=sampledomain.com
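A minimal sketch for doing the same lookup from a script. It assumes crt.sh's `output=json` mode and its `name_value` field (one or more names separated by newlines), which is how the site responds today but is not a formally documented API; `%` is crt.sh's wildcard. Large domains can be slow or time out. The function names are hypothetical.

```python
import json
import urllib.parse
import urllib.request
from typing import List, Set


def crtsh_url(domain: str) -> str:
    """Query URL for every logged certificate name under `domain`."""
    query = urllib.parse.urlencode({"q": f"%.{domain}", "output": "json"})
    return f"https://crt.sh/?{query}"


def names_from_crtsh(entries: List[dict]) -> Set[str]:
    """Collect the distinct names from crt.sh JSON entries."""
    names = set()
    for entry in entries:
        for name in entry.get("name_value", "").splitlines():
            names.add(name.strip().lower())
    names.discard("")
    return names


def fetch_subdomains(domain: str, timeout: float = 30.0) -> Set[str]:
    """Fetch and parse crt.sh results for `domain` (needs network access)."""
    with urllib.request.urlopen(crtsh_url(domain), timeout=timeout) as resp:
        return names_from_crtsh(json.load(resp))
```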

melson

Posted March 7, 2025 at 5:41 am

Someone might have used an open-source tool like Sublist3r.

paxys

Posted March 7, 2025 at 5:55 am

Not sure why everyone is going on about certificate transparency logs when the answer is right there in the user agent. The company is scanning the IPv4 space and came upon your IP and port.

arkfil

Posted March 7, 2025 at 7:21 am

Palo Alto (which makes network devices like firewalls) is able to scan the sites that users behind their devices want to visit. These are very popular devices in many companies. Users can also have agents installed on their computers that see the sites they visit.

govideo

Posted March 7, 2025 at 7:51 am

Thanks for everyone's perspectives. Very educational, and admittedly a lot of it is outside the boundaries of my current knowledge. I have thus far relied on Cloudflare's automatic HTTPS and simple instant subdomain setup for the Worker microservice I'm using.

There are evidently technical/footprint implications of that convenience. Fortunately, I'm not really concerned with the subdomain being publicly known; I was more curious how it became publicly known.

bashwizard

Posted March 7, 2025 at 8:00 am

Like people have said already; Certificate Transparency logs.

There are countless tools for subdomain enumeration. I personally use subfinder or amass when doing recon on bug bounty targets.

3oil3

Posted March 7, 2025 at 9:03 am

What happens if you google your subdomain?
Maybe the bots have some sort of dictionary files they just run through, and when there is a match, they append some .html extension to it, or maybe they prepend it to the domain as a subdomain?

f4c39012

Posted March 7, 2025 at 9:26 am

CSP headers can leak URLs, but I assume that isn't the cause here if the subdomain is an entirely separate project.
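To see the kind of leak meant here: if some public page's Content-Security-Policy allowlisted the upload host (say, in `connect-src` for direct-to-bucket uploads), anyone fetching that page can read it. A rough sketch that pulls hostnames out of a CSP header value; the function is hypothetical and the filtering is crude (it keeps only dotted names, which also drops tokens like sandbox flags).

```python
from typing import Set
from urllib.parse import urlsplit


def csp_hosts(header_value: str) -> Set[str]:
    """Collect hostnames that appear as sources in a Content-Security-Policy."""
    hosts = set()
    for directive in header_value.split(";"):
        tokens = directive.split()
        for source in tokens[1:]:  # tokens[0] is the directive name
            if source.startswith("'") or source.endswith(":"):
                continue  # keywords like 'self' and bare schemes like data:
            url = source if "://" in source else "//" + source
            host = urlsplit(url).hostname
            if host and "." in host:  # crude filter for real host names
                hosts.add(host)
    return hosts
```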

ThePowerOfFuet

Posted March 7, 2025 at 10:23 am

Others are saying CT logs but my own subdomains are on wildcard certificates, in which case I suspect they are discovered by DPI analysis of DNS traffic and resold, such as by Team Cymru.

BLKNSLVR

Posted March 7, 2025 at 10:26 am

There are a number of companies, not just Palo Alto Networks, that perform various different scales of scans of the entire IPv4 space, some of them perform these scans multiple times per day.

I set up some scripts to log all "uninvited activity" to a couple of my systems, from which I discovered a whole bunch of these scanner "security" companies. Personally, I treat them all as malicious.

There are also services that track Newly Registered Domains (NRDs).

Tangentially:

NRD lists are useful for DNS block lists since a large number of NRDs are used for short term scam sites.

My little, very amateur, project to block them can be found here:
https://github.com/UninvitedActivity/UninvitedActivity

Edited to add:
Direct link to the list of scanner IP addresses (although hasn't been updated in 8 months – crikey, I've been busy longer than I thought):
https://github.com/UninvitedActivity/UninvitedActivity/blob/…

lockhead

Posted March 7, 2025 at 10:28 am

Most likely passive DNS data: if you use your subdomain, you make DNS queries for it. If the DNS resolver you use shares this data, it can be picked up by others.

nusl

Posted March 7, 2025 at 10:40 am

It's pretty common to brute-force the subdomains of a domain you're interested in, especially for attackers.
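A minimal sketch of how that brute-forcing works: try each word from a list as a label and keep the names that resolve. Before trusting results, check for wildcard DNS by resolving a random label, since a wildcard record makes every guess "resolve". The wordlist and function names are hypothetical; tools like subfinder or amass combine this with many passive sources. A name built from two common words, like `userfileupload`, is within reach of combinatorial wordlists.

```python
import secrets
import socket
from typing import Callable, Iterable, List


def resolves(hostname: str) -> bool:
    """True if the system resolver returns any address for `hostname`."""
    try:
        socket.getaddrinfo(hostname, None)
        return True
    except socket.gaierror:
        return False


def has_wildcard_dns(domain: str, resolver: Callable[[str], bool] = resolves) -> bool:
    """A random, surely unused label resolving means a wildcard record exists."""
    return resolver(f"{secrets.token_hex(12)}.{domain}")


def bruteforce_subdomains(domain: str, words: Iterable[str],
                          resolver: Callable[[str], bool] = resolves) -> List[str]:
    """Return the candidate names `<word>.<domain>` that resolve."""
    return [f"{w}.{domain}" for w in words if resolver(f"{w}.{domain}")]
```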

xg15

Posted March 7, 2025 at 10:56 am

TIL (from this thread) : You can abuse TLS handshakes to effectively reverse-DNS an IP address without ever talking to a DNS server! Is this built into dig yet? :)

(Alright, some IP addresses, not all of them)

I also wonder if this is a potential footgun for eSNI deployments: If you add eSNI support to a server, you must remember to also make regular SNI mandatory – otherwise, an eavesdropper can just ask your server nicely for the domain that the eSNI encryption was trying to hide from it.
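A minimal sketch of the TLS trick: connect to a bare IP, send no SNI (`server_hostname=None`), and read the DNS names from whatever certificate the server presents by default. Hostname checking is turned off because there is no name to check, but verification stays on: Python's `getpeercert()` only returns the parsed fields for a validated certificate, so a self-signed default cert makes the handshake fail instead. Servers that insist on SNI (the "strict SNI" mentioned above) typically reject the handshake. The function names are hypothetical.

```python
import socket
import ssl
from typing import List


def dns_names(cert: dict) -> List[str]:
    """Pull DNS names out of the dict returned by SSLSocket.getpeercert()."""
    return [value for kind, value in cert.get("subjectAltName", ()) if kind == "DNS"]


def names_presented_without_sni(ip: str, port: int = 443, timeout: float = 5.0) -> List[str]:
    """Names on the certificate a server sends when the client omits SNI."""
    ctx = ssl.create_default_context()
    ctx.check_hostname = False  # connecting by IP, so there is no name to check
    # verify_mode stays CERT_REQUIRED, so getpeercert() returns the parsed fields.
    with socket.create_connection((ip, port), timeout=timeout) as raw:
        with ctx.wrap_socket(raw, server_hostname=None) as tls:  # None: no SNI sent
            return dns_names(tls.getpeercert())
```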

_trampeltier

Posted March 7, 2025 at 11:01 am

Did you send a link over email, WhatsApp or something like that?

ralferoo

Posted March 7, 2025 at 11:32 am

If you're using HTTPS, then you're probably using Let's Encrypt, and so your subdomain will appear in the publicly accessible CT logs.

One thing you could do is use a wildcard certificate and then use a non-obvious subdomain under it. I actually have something similar. In my setup, all my web traffic goes to haproxy frontends, which forward it to the appropriate backend. I was sick of setting up a new certificate for each new subdomain, so I replaced them all with a single wildcard cert. This means I'm not advertising each new subdomain on the CT logs. They all look nominally the same when visited (same holding page on the index, same /api handling); just one of the subdomains decodes an additional URL path that provides access to status monitoring.

Separately, that Palo Alto Networks company is a real pain. They connect to absolutely everything in their attempts to spam the internet. Frankly, I'm sick of even my mail servers being bombarded with HTTP requests on port 25 and the resultant log spam.
