Is Your Website Being Cloned Without You Knowing? How I Caught Mine, Fought Back, and How You Can Protect Yours!
The Problem Discovered
Years ago, I built a personal website—a simple CV-style page. Nothing fancy, just a place to say, “Hey, I exist!” It was hosted on free static storage because, honestly, who needs dynamic content for a glorified online business card?
But recently, I decided to give my humble site a glow-up. The plan? Start blogging about programming and software architecture. Then I remembered: years ago, I built a CRM for a friend in Laravel. It was quite the overachiever, packed with features like automated logging and analysis. Since I’m not exactly a WordPress enthusiast, I thought, “Why not use my old creation?” A few tweaks here, a sprinkle of updates there, and voilà—the new site was live!
Now, here’s where it gets interesting. This CRM, despite its age, had some serious chops. Shortly after launching, it started sending me automated alerts about suspicious visitors. Curious, I decided to check things out manually.
And surprise! Not only was my site cloned, but the entire thing was being proxied through another domain! Imagine my shock—my site, starring in a performance it didn’t audition for. After some digging, I discovered the offending domain was likely being used for SEO in… let’s just say “industries” unrelated to programming.
Annoying? Yes. World-ending? Not for me—this is just a personal site. But think about a corporate website. Something like this could tank your SEO and seriously hurt your brand.
So, what did I do? Let me walk you through how I took back control.
Investigating: What Went Wrong?
Alright, it is what it is—my website’s doppelgänger is out there. But instead of panicking, I decided to channel my inner detective. Think of this as the Tom and Jerry show: I’m Tom, and the cloner? Well, they’re Jerry—smugly stealing my cheese (or in this case, my content).
It didn’t take long to figure out their trick. Honestly, it was as simple as a toddler’s toy. They set up a basic proxy to mimic my site. No advanced wizardry here—they banked on the fact that my old setup was just a static file server with no room for fancy configurations. And, admittedly, I had rushed things when moving to a DigitalOcean droplet with default Apache settings. Whoops.
The straightforward fix? Easy. Since I’d already upgraded from a static server to a virtual machine, I could have just configured Apache to reject direct IP access and rely solely on virtual hosting. Problem solved, right?
But where’s the fun in that? Instead, I decided to gather some intel on the cloner. Time to play cat-and-mouse with their shady operation. My plan? Collect enough evidence to make a solid case, then report them to their domain provider and relevant organizations that handle spam, copyright, and fraud reports. It’s not just about fixing my site—it’s about holding these folks accountable.
So, let’s dig in. Are these organizations up to the task of dealing with the trickery I uncovered? We’re about to find out. Stay tuned!
New Features Implemented
Once I realized my site was being cloned, I decided to fight back. Not just to defend my site, but to have a bit of fun along the way. Here’s what I did:
- Gathering More Information: The first step was simple: log everything. I added a feature to check the referrer headers (the same clue that led me to the cloner in the first place) and log the associated IP addresses. Any IP linked to suspicious activity was added to a ban list, and my server would refuse connections from it.
Did it work? Sort of. My content disappeared from their clone for a short while—but the celebration was short-lived. They quickly caught on, changed their IPs, and removed the unique signature from their requests. Time for Plan B.
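The referrer check and ban list described above could look roughly like this. It is a minimal sketch, not the CRM's actual code (which is written in Laravel and isn't shown here): `SUSPECT_HOSTS`, `inspectRequest` and the `clone.example` domain are all made-up names for illustration.

```javascript
// Hypothetical names throughout; the real CRM feature is not shown in this post.
// Hostnames already identified as clones (placeholder value).
const SUSPECT_HOSTS = new Set(["clone.example"]);
const bannedIps = new Set();
const sightings = []; // evidence log: one entry per suspicious request

// Returns the hostname found in a Referer header, or null if absent or malformed.
function refererHost(referer) {
  if (!referer) return null;
  try {
    return new URL(referer).hostname.toLowerCase();
  } catch {
    return null;
  }
}

// Logs and bans the client IP when the Referer points at a known clone.
// Returns true when the request should be refused.
function inspectRequest(ip, referer, now = new Date()) {
  if (bannedIps.has(ip)) return true;
  const host = refererHost(referer);
  if (host !== null && SUSPECT_HOSTS.has(host)) {
    sightings.push({ ip, host, at: now.toISOString() });
    bannedIps.add(ip);
    return true;
  }
  return false;
}
```

Matching against a list of known clone hosts, rather than "anything that isn't my domain", keeps ordinary visitors arriving from search engines or other sites out of the ban list.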
- Luring the Cloner: If they wouldn’t give up, I’d make them slip up. I created a regular job aptly named "Cloner Lurer." Its mission? Generate a dynamic page that existed for only a few seconds, load it through the clone, and log the IP of whoever requested it from my server. If it was the cloner, I’d ban them.
But banning wasn’t enough. Their automated systems would likely notice the lack of valid responses and adjust. So, I took a more mischievous approach.
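Here is one way the lure could be sketched. I'm assuming the job invents a short-lived, unguessable path and then requests it through the clone's domain, so that the only party who could ask my server for that path is the cloner's proxy. `createLure`, `recordLureHit` and the five-second lifetime are illustrative choices, not the original implementation.

```javascript
// Hypothetical sketch of a "lure" registry; names and TTL are assumptions.
const LURE_TTL_MS = 5000;
const lures = new Map(); // path -> expiry timestamp in milliseconds
const caughtIps = new Set();

// Creates a one-off path that is valid for a few seconds.
// Math.random() is fine for a sketch; a real job should use a crypto-grade token.
function createLure(now = Date.now()) {
  const token = Math.random().toString(36).slice(2) + now.toString(36);
  const path = "/lure/" + token;
  lures.set(path, now + LURE_TTL_MS);
  return path;
}

// Called by the origin server for every request to a /lure/ path.
// Returns true (and remembers the IP) only for a live lure.
function recordLureHit(path, ip, now = Date.now()) {
  const expiry = lures.get(path);
  if (expiry === undefined || now > expiry) return false;
  caughtIps.add(ip);
  return true;
}
```

The job would then fetch the clone's domain plus the returned path. Because nobody else knows the path, whichever IP asks the origin for it during those few seconds is very likely the proxy.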
A Warning Page: Instead of returning an error or a blank page, I sent them a fully valid page, except this one was special. It displayed a big, bold warning explaining the situation and urging visitors to report the clone. My message was clear: “This site is stolen, and it’s not just me. They might be cloning yours too!”
Google Shenanigans: Next, I played another card. I added a robots meta tag with noindex and nofollow to the warning page. Since that page was only served to the cloner, it told Google not to index their proxy, effectively kicking them out of search results. Sweet revenge!
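To make the last two ideas concrete, here is a rough sketch of a warning page that carries the robots meta tag. The wording, the `renderWarningPage` name and the `realUrl` parameter are my own placeholders; the actual page was produced by the Laravel app.

```javascript
// Escapes text for safe use inside HTML content and attribute values.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// Builds a complete, valid HTML page that warns visitors and asks
// search engines not to index it or follow its links.
function renderWarningPage(realUrl) {
  const safeUrl = escapeHtml(realUrl);
  return [
    "<!DOCTYPE html>",
    '<html lang="en">',
    "<head>",
    '  <meta charset="utf-8">',
    '  <meta name="robots" content="noindex, nofollow">',
    "  <title>Warning: this site is a clone</title>",
    "</head>",
    "<body>",
    "  <h1>This site is stolen</h1>",
    `  <p>The original lives at <a href="${safeUrl}">${safeUrl}</a>. Please report this copy.</p>`,
    "</body>",
    "</html>",
  ].join("\n");
}
```

Serving this with a normal 200 status matters: the proxy sees nothing unusual and happily passes the page on to its visitors and to crawlers.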
Did It Work? Oh, yes. Not only did I see my glorious warning message on their clone, but I also collected around 100 IP addresses from their operation. With this evidence in hand, it was time to contact their domain provider and another organization (I’ll keep that as a surprise for now).
Stay tuned—next, we find out if the cloners finally met their match!
Reporting.
Alright, time for some domain whois magic! I’ll just send a generic report to the domain provider, explaining how I think their domain is up to no good. Did I get a response? You bet—an automated one. Then… cue the waiting game. Tick, tock, tick, tock. The art of silence in full effect. I guess they’re really committed to internet security.
Next step: time to analyze the IPs crawling all over my site. And what do I find? They’re all Cloudflare-managed IP addresses. Well, this can only be good news, right? Cloudflare, the knights in shining armor of the internet, guarding us against DDoS attacks and hackers. I’m thinking, "Oh yeah, I’ll just send them a massive list of their IPs to sort this out."
So, what happens next? Get this. Cloudflare looks up my IP, realizes I’m a DigitalOcean customer, and promptly passes the buck: “Please handle this with your client.” And that’s it. The end.
And here I am, thinking, “Great, so I have to deal with it, huh?” The lesson? If you're asking whether you’re protected, the answer is: not unless you’re willing to become a vigilant security ninja. You’re on your own, my friend. Time to lock it down and stay alert.
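If you want to do the same IP analysis yourself, checking logged addresses against a provider's published ranges is a small exercise in CIDR arithmetic. The sketch below handles IPv4 only. The two ranges are ones I believe appear in Cloudflare's published list, but treat them as placeholders and load the current list from the provider before relying on it.

```javascript
// Converts a dotted IPv4 string into an unsigned 32-bit integer.
function ipv4ToInt(ip) {
  const parts = ip.split(".");
  const valid =
    parts.length === 4 &&
    parts.every((p) => /^\d{1,3}$/.test(p) && Number(p) <= 255);
  if (!valid) throw new Error("not an IPv4 address: " + ip);
  const [a, b, c, d] = parts.map(Number);
  return ((a << 24) | (b << 16) | (c << 8) | d) >>> 0;
}

// True when the address falls inside the CIDR block, e.g. "173.245.48.0/20".
function inCidr(ip, cidr) {
  const [base, bitsText] = cidr.split("/");
  const bits = Number(bitsText);
  // A shift by 32 is a no-op in JavaScript, so /0 needs its own case.
  const mask = bits === 0 ? 0 : (0xffffffff << (32 - bits)) >>> 0;
  return ((ipv4ToInt(ip) & mask) >>> 0) === ((ipv4ToInt(base) & mask) >>> 0);
}

// Placeholder sample; assumed to be part of Cloudflare's published IPv4 ranges.
const SAMPLE_RANGES = ["173.245.48.0/20", "104.16.0.0/13"];

// Returns the first range containing the address, or null.
function matchingRange(ip, ranges = SAMPLE_RANGES) {
  return ranges.find((range) => inCidr(ip, range)) || null;
}
```

Running every logged address through `matchingRange` quickly shows whether your "100 different IPs" are really one provider's edge network.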
Is there anyone else I don’t know about?
Alright, I’ve spotted them, and it’s time to wrap up the server configuration and block them for good.
And guess what? I’ve tracked down their location. They used a transparent proxy once—yep, you guessed it: Navi Mumbai (Reliance Corporate Park). Not exactly a shocker, right?
As I mentioned earlier, they took full advantage of the previous site being hosted on static storage, then the new one got installed with the classic default configuration. Rule number one, folks: Never trust default settings. Always stay ahead of the game.
So, I did a little server reconfiguration to make sure it only serves content for the virtual domain and doesn’t fall back to any random site hosted on the same virtual machine. Simple stuff. Here's how you do it:
Create a deny config:
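# Catch-all host: answers every request whose Host header matches no other virtual host.
# "default" is just a label; this only acts as the fallback if it is the first host Apache loads.
<VirtualHost *:80>
    ServerName default
    <Location />
        Require all denied
    </Location>
    ErrorLog ${APACHE_LOG_DIR}/catchall_error.log
    CustomLog ${APACHE_LOG_DIR}/catchall_access.log combined
</VirtualHost>
<IfModule mod_ssl.c>
    # Same catch-all for HTTPS. It will likely also need SSLEngine on plus
    # certificate directives so that Apache can complete the TLS handshake.
    <VirtualHost *:443>
        ServerName default
        <Location />
            Require all denied
        </Location>
        ErrorLog ${APACHE_LOG_DIR}/catchall_error.log
        CustomLog ${APACHE_LOG_DIR}/catchall_access.log combined
    </VirtualHost>
</IfModule>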
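Save it under /etc/apache2/sites-available and add a symlink to it in /etc/apache2/sites-enabled (a2ensite does this for you). Apache uses the first virtual host it loads as the fallback for unmatched Host headers, so give the file a name that sorts first, for example 000-catchall.conf.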
In your own virtual host, reject any request whose Host header is not your domain:
<IfModule mod_ssl.c>
    <VirtualHost *:443>
        ServerName yourdomain.com
        ServerAlias www.yourdomain.com
        # Forward requests to the application listening on port 5000
        ProxyPass / http://localhost:5000/
        ProxyPassReverse / http://localhost:5000/
        ErrorLog /var/log/aolb-error.log
        CustomLog /var/log/aolb-access.log combined
        # Belt and braces: answer 403 if the Host header is anything but our own names
        <If "%{HTTP_HOST} != 'yourdomain.com' && %{HTTP_HOST} != 'www.yourdomain.com'">
            Redirect 403 /
        </If>
        Include /etc/letsencrypt/options-ssl-apache.conf
        SSLCertificateFile /etc/letsencrypt/live/yourdomain.com/fullchain.pem
        SSLCertificateKeyFile /etc/letsencrypt/live/yourdomain.com/privkey.pem
    </VirtualHost>
</IfModule>
But hold on a second… what if? What if there’s someone even sneakier out there, who rewrites the Host header in the HTTP request to point to my domain? It's a piece of cake to do, and guess what? It could work again.
So, what did I do to make sure no one else is pulling this trick? Let’s take a look at how I figured it all out...
Add some JavaScript code.
The solution? Oh, it was simple. Just sprinkle a little JavaScript magic, and boom—problem solved!
I generated some hashes from my domain names, both the “www” and non-“www” versions. Then I slapped together a JavaScript snippet that loads with the page and checks whether the hash of window.location.hostname matches one of them. If it does not, the snippet hits a relative endpoint with a bit of cache-busting at the end, passing the window.location.hostname as a parameter. Because the URL is relative, the request travels through the clone's own proxy and arrives at my server from the cloner's IP.
On the backend, we record the IP address and, if someone’s cloning my site, we automatically block them.
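The backend half might look something like the sketch below. It is written as a plain function so it can sit behind any framework; the real handler lives in the Laravel app and isn't shown in this post. `handleCloneReport`, `OWN_HOSTS` and the response shape are assumptions, and the `/me/<hostname>` path is the one the page script calls.

```javascript
// Hypothetical handler for GET /me/<hostname>; names are illustrative.
const OWN_HOSTS = new Set(["yourdomain.com", "www.yourdomain.com"]);
const cloneReports = []; // evidence: which hostname was seen from which IP
const blockedIps = new Set();

function handleCloneReport(path, clientIp) {
  const match = /^\/me\/([^/?]+)/.exec(path);
  if (!match) return { status: 404, body: "" };

  let host;
  try {
    host = decodeURIComponent(match[1]).toLowerCase();
  } catch {
    return { status: 400, body: "" }; // malformed percent-encoding
  }

  // A report about our own hostname is not a clone; answer without content.
  if (OWN_HOSTS.has(host)) return { status: 204, body: "" };

  // The request reached us through the clone's proxy, so clientIp is theirs.
  cloneReports.push({ host, ip: clientIp });
  blockedIps.add(clientIp);
  return { status: 200, body: "<h1>This site is a clone</h1>" };
}
```

Returning 200 with a body only for foreign hostnames lines up with the page script, which swaps the page content in only when it gets a 200 back.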
Why does this work? Well, here’s the fun part: More and more sites are rendered using JavaScript these days. There are tons of JavaScript frameworks out there, and guess what? Crawlers are starting to render your site just like a browser does. They might even be using a headless Chrome browser to do it. In this case, your API gets called with the hostname passed along, and bam, you’re notified.
But hold on: if a real person stumbles upon the clone in an ordinary browser, the same script runs. Guess who gets a notification? That’s right, me!
Example code:
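(function () {
    // XOR all character codes of the hostname into one small fingerprint.
    const h = window.location.hostname;
    let x = 0;
    for (let i = 0; i < h.length; i++) {
        x ^= h.charCodeAt(i);
    }
    // Precomputed fingerprints of the hostnames this site may legitimately be served from.
    if (![102, 63, 109].includes(x)) {
        // Unknown hostname: report it through a relative URL,
        // so the request travels via the clone's own proxy.
        const cacheBuster = Math.floor(Math.random() * (1000000 - 10 + 1)) + 10;
        const u = "/me/" + encodeURIComponent(h) + "?cache=" + cacheBuster;
        const xhr = new XMLHttpRequest();
        xhr.onload = function () {
            if (xhr.status === 200) {
                // Replace the page with whatever the backend sends back.
                document.body.innerHTML = xhr.responseText;
            }
        };
        xhr.open("GET", u, true);
        xhr.send();
    }
})();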
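To adapt the snippet to your own site you need the numbers for your own hostnames. This little helper repeats the same XOR loop outside the browser; `xorFingerprint` is just a name I picked for it.

```javascript
// Same calculation as the page script: XOR of every character code in the hostname.
function xorFingerprint(hostname) {
  let x = 0;
  for (let i = 0; i < hostname.length; i++) {
    x ^= hostname.charCodeAt(i);
  }
  return x;
}

// Builds the allow-list array to paste into the page script.
function fingerprintsFor(hostnames) {
  return hostnames.map(xorFingerprint);
}

// "localhost" works out to 109, one of the values in the example list above.
console.log(fingerprintsFor(["localhost"]));
```

One design note: this is obfuscation, not security. The result is a tiny number and the XOR ignores character order, so "ab" and "ba" produce the same value and collisions with unrelated hostnames are possible. It keeps your domain name from appearing in plain text in the script, which is enough to slip past a lazy search-and-replace.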
Did it work? Oh, absolutely. I caught someone else trying to clone my site. And… guess what? I accidentally blocked myself. Oops.
Stick around, and I’ll show you how I did that!
Panic: I Banned Myself
So there I was, casually monitoring my site when suddenly—bam!—I see my beautiful warning message: This site is a clone. Whoa, hold on! What happened? Let's dig into the bans. And lo and behold, there’s my own IP address, linked to a Chinese domain. Whoops. Something went horribly wrong, but what exactly?
Turns out, this wasn’t your average cloner. Oh no, this one went for the low-tech approach: they simply set the A record of their domain to point directly at my server's address. Simple, but effective. With no proxy in between, the IP behind the "clone" was my own server's. My clone detection picked it up and dutifully blocked that IP.
The fix? Easy. Just remove my IP from the blocklist and add a new rule: Do not block myself. If the IP is mine, it's all good.
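The "do not block myself" rule boils down to a guard in front of the blocklist. A minimal sketch, with made-up names and a placeholder address standing in for the server's public IP:

```javascript
// Addresses that must never end up on the blocklist.
// 203.0.113.10 is a placeholder for the server's own public IP.
const NEVER_BLOCK = new Set(["127.0.0.1", "::1", "203.0.113.10"]);
const blocklist = new Set();

// Adds the address to the blocklist unless it is protected.
// Returns true when the address was actually blocked.
function blockIp(ip) {
  if (NEVER_BLOCK.has(ip)) return false;
  blocklist.add(ip);
  return true;
}
```

It is worth logging the refused attempts too: an attempt to block your own address is a strong hint that somebody has pointed a DNS record straight at your server.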
Other things to consider…
Here’s a little extra brain food: Transparent proxies can forward the original IP address. So, for example, if Google decides to render a clone of your page and it’s being proxied, you might end up blocking Googlebot. Not ideal, right? So make sure you’re only banning the primary IP address, meaning the one that actually connects to your server rather than one passed along in a forwarded header, and not yourself in the process. Because, let’s face it, you don’t want to accidentally block Google, unless you’re feeling really rebellious.
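As a rough illustration of that last point, the sketch below separates the address worth banning (whoever opened the connection) from anything a proxy claims in an X-Forwarded-For header, which is kept as evidence only. `describeClient` is a hypothetical helper, and it assumes your server is not itself sitting behind a load balancer or CDN.

```javascript
// Node reports IPv4 peers on dual-stack sockets as "::ffff:1.2.3.4"; strip that prefix.
function normalizeIp(address) {
  return address.startsWith("::ffff:") ? address.slice(7) : address;
}

// socketAddress: the peer address of the TCP connection.
// forwardedForHeader: raw X-Forwarded-For value, if any (may be undefined).
function describeClient(socketAddress, forwardedForHeader) {
  const forwardedFor = (forwardedForHeader || "")
    .split(",")
    .map((part) => part.trim())
    .filter((part) => part.length > 0);
  return {
    banCandidate: normalizeIp(socketAddress), // the proxy that actually connected
    forwardedFor, // claimed original clients: log them, never ban them
  };
}
```

If a crawler renders the clone, its address shows up in `forwardedFor` while the proxy's shows up in `banCandidate`, so the crawler stays unblocked.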
Conclusion, Settings:
Alright, here’s the takeaway:
- Never trust default settings. Seriously, they're just begging for trouble.
- Monitor everything, both automatically and manually. You can never have too many eyes on your site.
- Take preventative measures—because, let’s face it, the hackers aren’t taking a vacation.
Also, don’t forget the basics:
Send those proper security headers like a responsible adult. Always include your canonical tag to let Google know, “Hey, this is my content. Hands off!”
Examples:
Add some headers. This example is from a Laravel middleware; you can set the same headers in your Apache configuration files or .htaccess as well.
private function addSecurityHeaders(Response $response): Response
{
    $response->header('X-Frame-Options', 'DENY'); // Prevent clickjacking
    // Legacy header: modern browsers ignore it and rely on the Content-Security-Policy below
    $response->header('X-XSS-Protection', '1; mode=block');
    $response->header('X-Content-Type-Options', 'nosniff'); // Disable MIME type sniffing
    $response->header('Access-Control-Allow-Origin', config('app.url')); // Only our own origin may read responses cross-origin
    $response->header('Copyright', 'Attila Olbrich'); // Non-standard, informational only
    $response->header('Cross-Origin-Opener-Policy', 'same-origin');
    $response->header('Cross-Origin-Embedder-Policy', 'require-corp');
    // Restrict where scripts, styles and fonts may load from, and forbid framing by other sites
    $response->header('Content-Security-Policy', "default-src 'self'; script-src 'self' https://www.google.com/recaptcha/ https://www.gstatic.com/recaptcha/; style-src 'self' https://fonts.googleapis.com; font-src 'self' https://fonts.gstatic.com; object-src 'none'; frame-ancestors 'none';");
    return $response;
}
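Don't forget to add your canonical tag. Build it from the configured app URL plus the current path, never from the incoming Host header, so every page points at its real address even when served through a clone (Laravel example):
<link rel="canonical" href="{{ rtrim(config('app.url'), '/') }}/{{ ltrim(request()->path(), '/') }}" />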
So now, it’s time to get your hands dirty. Dive into your site, take a good look around, and make sure you're not leaving any virtual doors open. Trust me, your future self will thank you.