March 9, 2025
Human Friend Digital Podcast
The Internet Is Disappearing: A Look at Link Rot
This time on Human Friend Digital, Jacob and Jeff explore link rot, the slow decay of the internet as web pages vanish and URLs break over time. Websites may feel permanent, but in reality, they’re as fragile as a pamphlet—easily deleted, lost, or restructured, leaving broken links in their wake.
They discuss the impact of link rot on everything from SEO rankings to Supreme Court opinions, where nearly half of referenced links are now dead. Tools like the Wayback Machine offer a glimpse into lost pages, but most websites aren’t archived, meaning crucial information disappears forever.
To combat link rot, Jacob recommends 301 redirects for site updates, SEMrush for monitoring broken links, and periodic audits to preserve credibility. In the end, the episode is a reminder that the internet, despite its vastness, is more fragile than we think.
Links:
SEMrush: https://www.semrush.com
Wayback Machine: http://web.archive.org/
BrokenLinkCheck.com: https://www.brokenlinkcheck.com
SwiperJS: https://swiperjs.com
Flickity: https://flickity.metafizzy.co
[This transcript has been edited for clarity]
Jeff:
Hey, Jacob.
Jacob:
Hey, Jeff.
Jeff:
Welcome to another episode of the Human Friend Digital Podcast.
Jacob:
Welcome, Jeff. I’m happy to be here. I got my gunpowder green tea.
Jeff:
I do love gunpowder tea.
Jacob:
It’s very delightful.
Jeff:
Today, we are talking about link rot because Vox, which we listen to from time to time, posted an episode last week on Today, Explained about, well, it’s about the Trump administration removing web pages. But we’re not going to talk about the Trump administration.
We’re going to talk about the disappearance of pages from the World Wide Web, right?
Jacob:
Yes, it happens all the time. It really does. Because, unlike a house, or a building, or a museum that has exhibits or a room, it’s really easy to delete a room from a website, so to speak—a page.
If someone doesn’t take a picture of it or a snapshot of it in that moment in time, there’s no way you’re going to get it back. It’s just gone.
It’s funny because we use websites all the time, but if you were looking at them objectively, I really think they fall under the category of ephemera.
Jeff:
I mean, they are very fragile things, and it reminds me—do you remember in high school when social media was becoming a thing, and all of our teachers were like, “What you put on the internet is going to be there forever?” Evidently, that’s not true.
Jacob:
No, not everything. Some things are. It depends on the service and their commitment to saving that slice of history. So we’ll talk about some of those. I saw that in the questionnaire coming up—one of my favorite ones.
But yeah, let’s dive in a little bit.
Jeff:
Link rot. Yeah, let’s just define it.
Jacob:
So, link rot is essentially when links break on the internet. You’ll be on a website, you click a link, and it says 404 – Page Not Found. It often happens between two different websites. You can have this happen on your own website, which is really bad for SEO if you’re getting 404s on your own site. But if you have a website that’s decades old with thousands of articles and posts, it’s almost inevitable that you’re going to experience link rot as things get moved around.
Then you basically have all these dead links that create, quote-unquote, a “rotting” environment in these web experiences. A lot of times, what happens is a big website, maybe a news website, will link to an external resource, like the White House, and that page is not there anymore. What happened to it? It’s broken. There’s a rotten link. This just happens over time. It’s inevitable. But that’s essentially what’s occurring.
Jeff:
Yeah, I was reading when I was doing research for this that there’s a half-life for links.
Some of the estimations—it depends on what the link is linking to—but sometimes it’s like nine years half-life: Half of them will die within nine years, and some of them are like two years.
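[Editor's note, not part of the episode: the half-life figures above can be turned into rough survival numbers. This is a minimal sketch that assumes links die off exponentially, which is what "half-life" suggests but is an assumption here, not something the cited estimates are confirmed to use. The function name is made up for illustration.]

```python
def surviving_fraction(years: float, half_life_years: float) -> float:
    """Fraction of links still alive after `years`, assuming exponential decay."""
    return 0.5 ** (years / half_life_years)


# Compare the two half-lives mentioned in the conversation (2 and 9 years).
for half_life in (2, 9):
    for years in (2, 9, 18):
        share = surviving_fraction(years, half_life)
        print(f"half-life {half_life}y, after {years}y: {share:.1%} still alive")
```

[Under this assumption, a nine-year half-life leaves half the links after nine years and a quarter after eighteen.]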
Jacob:
I can imagine that. No, I agree with what you’re saying. An example of a really long-lasting link would be a homepage of a website. That doesn’t change as long as that URL is alive. Google.com hasn’t changed Google.com around. Or the New York Times—what is it, NYTimes.com? NYT.com? Anyway, the homepage of any of these websites doesn’t change the URL because that’s the root domain.
But all those subpages—those get moved around.
Let’s say you want to redesign your website. You’re restructuring all your pages, and you had all your services under mywebsite.com/services/this. Now you have five service offerings, so you restructure it as mywebsite.com/service-offer1, mywebsite.com/engineering, mywebsite.com/product-development.
All those subfolders get moved around when a website update happens, and this is probably what you’re referring to with the half-life. I think people redesign and update their website approximately every two to five years.
Jeff:
And so that introduces potential for those links to die. Like, if you change around your domain name or your URL structure, that can kill those links.
Jacob:
Right. From an SEO perspective, this is really bad. If you restructure your whole website and don’t set up what’s called 301 redirects, you’re going to have problems. Let’s say you had a really great link from a U.S. government website or a .edu website linking to a service page on your site, and it’s giving you a lot of, quote-unquote, “link juice.” Well, if that page becomes a 404 on your website, Google stops counting it.
Jeff:
Can you just— I mean, we did a whole episode on error codes, but what’s a 404? What’s a 301?
Jacob:
A 404 is when the page can’t be found by the content management system, the server.
It’s basically telling your web browser—Firefox, Chrome, whatever—or the Google search bot that the page doesn’t exist anymore. When it’s a 404, it says, I can’t find that. It doesn’t exist anymore.
If Google was using that page as part of its ranking, you’ve just hobbled the link that you got.
Now you can save that by using a 301 redirect. So, let’s say you had mywebsite.com/link1, but now the new page is mywebsite.com/link2.
You can use something called a 301 redirect. So instead of issuing a 404 error when someone visits the old page, it will automatically send them to the new URL. Google will then recognize the new page and preserve your link juice. That’s the best way to handle it. If you don’t, you’re actively contributing to link rot.
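[Editor's note, not part of the episode: a minimal sketch of the decision a server makes for the /link1 and /link2 example above. The names REDIRECTS, LIVE_PAGES, and respond are hypothetical; real sites usually configure this in the web server or CMS rather than in hand-written code.]

```python
from typing import Optional, Tuple

# Hypothetical redirect map: old path -> new path (paths from the example above).
REDIRECTS = {"/link1": "/link2"}

# Hypothetical set of pages that currently exist on the site.
LIVE_PAGES = {"/", "/link2"}


def respond(path: str) -> Tuple[int, Optional[str]]:
    """Return (status_code, Location header or None) for a requested path."""
    if path in LIVE_PAGES:
        return 200, None              # page exists, serve it
    if path in REDIRECTS:
        return 301, REDIRECTS[path]   # moved permanently, point to the new URL
    return 404, None                  # nothing here and no redirect set up


for requested in ("/link1", "/link2", "/old-page"):
    print(requested, respond(requested))
```

[Without the entry in REDIRECTS, a request for /link1 would fall through to the 404 branch, which is the link rot case described here.]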
If you don’t have a web developer working on your team, and you’re doing this yourself, you might realize too late that you’ve killed all your old links. And sometimes, I’ve seen this happen, someone at your company will say, “Hey, I used to send my teachers to this website URL, and I don’t see these PDFs anymore.” And then it’s like, “Oh no, how did we miss the PDFs? Where did they go? How did we miss this?” And now your SEO scores are dropping.
If you came to someone like me at that point, your only hope is the Wayback Machine, the Internet Archive, which was actually talked about in the Vox episode.
Jeff:
Let’s talk about that. I mean, you and I have talked about it before, but I don’t think we’ve ever brought it up on the podcast.
Jacob:
No, it’s kind of one of those things that web people know all about because it saves our butts a lot.
But it’s also one of those weird situations where a client will say, “I didn’t back up my website,” and you’re like, “Okay, we’ll find it.”
The Wayback Machine is a tool. It’s part of a project called the Internet Archive. And the Internet Archive, or archive.org, doesn’t just archive the internet and website pages. It archives books, audio files, anything that is in the public domain. It is like this digital repository of humanity. It is quite large.
One of their products, or I shouldn’t say maybe a product since they’re nonprofit, but one of their services is called the Wayback Machine. And what it does is it goes out to the internet all the time. It crawls website pages, and then it copies them to its server and timestamps them. So, you can actually go back and see what Google looked like in 1999. You can go back and see what Google looked like in 2000. You can go back and see what Google looked like in 2005.
Jeff:
Or like Amazon’s first iteration. I remember I did that on the Wayback Machine.
I was just like, “This is what Amazon looked like when it was first founded. This is wild.”
Jacob:
Yeah, not a polished experience. Sometimes, it’s actually good to look back at that moment to realize—if you make a website for somebody and it’s not perfect right out the door—nobody’s website was perfect right out the door.
Jeff:
It could turn into Amazon.
Jacob:
Yeah, Amazon was kind of a turd, and then it turned into a money-making machine, making one of the wealthiest men in the world.
But anyways, the point of the Wayback Machine is to save this stuff. Because, like I was talking about earlier—ephemera. Even though you don’t want to think of a website as ephemera, it is basically the equivalent of printing a pamphlet. And then it goes in the trash. Eventually, after it’s been used for so long, you’ll make a new one, and the old one will go in the trash. Your website is actually like that—it looks cooler, it feels more concrete, but eventually, it just gets thrown away. So, Wayback Machine saves that.
Let’s say you did what my example was, and you forgot to back up your website. Three months later, you need to get something off of it. The Wayback Machine is your last hope. If you have a website that has thousands of visitors a month or more, there’s a really good chance your website has been archived in history. Maybe not completely, but at least a couple of pages have been captured. And the more visitors you have to your website on a regular basis, the larger the number of times your site will be crawled and saved by the Wayback Machine.
But it is very cool. It helps journalists a lot, too. Like in the Vox story, they can say, “Here’s what the White House website looked like under the Biden administration in month one.”
Jeff:
And here’s how it’s changed since then, and so there’s that record.
Jacob:
Yeah, it’s not a perfect machine. If somebody uses a lot of external resources in their web development—
Jeff:
What do you mean by that?
Jacob:
Oh, so let’s say you have a slider on your website, like a little carousel, right? And you want to use a plugin, maybe you want to use SwiperJS or Flickity. These are some really popular options out there. There are two options: You can copy their—you can buy the license and copy the code to your website.
Or, you can buy a license or use something free that is hosted on a CDN (Content Delivery Network). The nice thing about using that external URL is that if the developer updates their code, they can post it there, and then anybody using that code will automatically have the newest version. So it’s like a way to have an external backup.
But the Wayback Machine struggles with this. It really can only store files that are hosted directly on a website. So, sometimes, more modern websites that use a lot of external resources might look a little broken in the Wayback Machine.
Because it will save everything that’s there and linked to in that website, but if it pulls resources from third parties, they’re not going to store that external content. So, if you go back and you’re like, “Why is this site broken?” a lot of times, it’s because that site used third-party resources on the live website.
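[Editor's note, not part of the episode: one way to see how much of a page depends on third parties is to list the scripts and stylesheets it loads from other hosts. This is a rough sketch using only Python's standard library. The sample HTML and the cdn.example.net host are made up for illustration.]

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class AssetCollector(HTMLParser):
    """Collect URLs of scripts and stylesheets referenced by a page."""

    def __init__(self, page_url):
        super().__init__()
        self.page_url = page_url
        self.assets = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "script" and attrs.get("src"):
            self.assets.append(urljoin(self.page_url, attrs["src"]))
        elif tag == "link" and attrs.get("href") and "stylesheet" in (attrs.get("rel") or ""):
            self.assets.append(urljoin(self.page_url, attrs["href"]))


def third_party_assets(page_url, html):
    """Return asset URLs whose host differs from the page's own host."""
    collector = AssetCollector(page_url)
    collector.feed(html)
    site_host = urlparse(page_url).netloc
    return [url for url in collector.assets if urlparse(url).netloc != site_host]


# Made-up page: one slider script from a CDN, the rest hosted on the site itself.
SAMPLE_HTML = """
<html><head>
<link rel="stylesheet" href="/css/site.css">
<script src="https://cdn.example.net/slider.min.js"></script>
<script src="/js/site.js"></script>
</head><body></body></html>
"""

print(third_party_assets("https://mywebsite.com/services/", SAMPLE_HTML))
```

[Anything this lists is a file the site does not control, so it is a candidate for the breakage described above.]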
Jeff:
And if those third-party resources go out of business or fail, you would lose anything that was running on them.
Jacob:
Yeah. Basically, that JavaScript would then be broken on that site.
So, the point with link rot is that if you want to find out what was on a page of a broken link in time, you can take that broken URL, that 404, and put it into the Wayback Machine to see what was there at one time. If you’re trying to clean up your own mess—or someone else’s mess—the Wayback Machine is the hope. But like I said, if you have a small website that doesn’t have a whole lot of visitors, the number of snapshots will be smaller.
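[Editor's note, not part of the episode: besides pasting a dead URL into the Wayback Machine by hand, the Internet Archive offers an availability endpoint that returns JSON. The sketch below only builds the query URL and reads a response. It makes no network request, and the sample response is made up to match the response shape as the editor understands it, so treat the field names as an assumption to verify.]

```python
import json
from typing import Optional
from urllib.parse import urlencode


def availability_query(url: str) -> str:
    """Build a query URL for the Internet Archive's availability endpoint."""
    return "https://archive.org/wayback/available?" + urlencode({"url": url})


def closest_snapshot(response_text: str) -> Optional[str]:
    """Return the closest archived snapshot URL from a JSON response, if any."""
    data = json.loads(response_text)
    closest = data.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest["url"]
    return None


# Illustrative response text (invented for this sketch, not real capture data).
SAMPLE_RESPONSE = json.dumps({
    "archived_snapshots": {
        "closest": {
            "available": True,
            "status": "200",
            "timestamp": "20200101000000",
            "url": "http://web.archive.org/web/20200101000000/http://mywebsite.com/link1",
        }
    }
})

print(availability_query("mywebsite.com/link1"))
print(closest_snapshot(SAMPLE_RESPONSE))
```

[If no snapshot exists, closest_snapshot returns None, which matches the small-site case where little or nothing was captured.]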
Jeff:
It’s not a hundred percent of the internet. I mean, they do their best. They’re archivists.
Jacob:
It’s impressive, though.
Jeff:
They’re trying to be the Library of Alexandria, but they can’t get everything.
Alright, so Jacob, what tools or strategies should a company use to try to mitigate or prevent link rotting from happening on their end?
Jacob:
So, it’s really hard to know when external resources are coming to you and breaking on your site. However, there are tools that allow you to detect 404 errors and keep a log of 404 errors happening on your website. Some website hosts do this. There’s a WordPress plugin called 301 Redirect, and as part of their premium version, you can get a log that tells you about broken links.
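[Editor's note, not part of the episode: if your host gives you raw access logs, a 404 log can be approximated in a few lines. This sketch assumes the logs use the Common Log Format, which varies by host. The sample lines are invented and use documentation-only IP addresses.]

```python
import re
from collections import Counter

# Matches the request path and status code in a Common Log Format line.
LOG_PATTERN = re.compile(r'"(?:GET|HEAD|POST) (?P<path>\S+) [^"]*" (?P<status>\d{3}) ')


def count_404s(log_lines):
    """Count how often each requested path returned a 404."""
    counts = Counter()
    for line in log_lines:
        match = LOG_PATTERN.search(line)
        if match and match.group("status") == "404":
            counts[match.group("path")] += 1
    return counts


# Invented sample log lines for illustration.
SAMPLE_LOG = [
    '203.0.113.5 - - [09/Mar/2025:10:00:00 +0000] "GET /services/this HTTP/1.1" 404 512',
    '203.0.113.5 - - [09/Mar/2025:10:00:05 +0000] "GET /engineering HTTP/1.1" 200 2048',
    '198.51.100.7 - - [09/Mar/2025:10:01:00 +0000] "GET /services/this HTTP/1.1" 404 512',
]

for path, hits in count_404s(SAMPLE_LOG).most_common():
    print(f"{hits} x 404 for {path}")
```

[The paths with the most hits are the ones most worth a 301 redirect.]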
For your own website, I recommend hooking up a tool like SEMrush. They have a Site Audit tool that will crawl your website weekly or monthly and give you a report on broken links. It will give you a list, a to-do list, and then you can fix them by entering 301 redirects. That’s the best way to prevent it for yourself: monitor the 404s coming in, and crawl your site regularly for broken links.
Now, SEMrush is expensive. There is a free option for just checking broken links on your site called BrokenLinkCheck.com. If you have a big website, you might have to pay for it, but it’s free for most websites under 100 pages, maybe even 200, which is a lot of websites on the internet.
Jeff:
Yeah, that’s a big site.
Jacob:
Yeah. If you use BrokenLinkCheck.com, it won’t give you a lot of details, but it will tell you where a broken link was detected. It will show which page it’s on and the status, so you have to do some detective work to hunt it down. And it’s not automated: you have to manually run a broken link check yourself on your website, whereas SEMrush can be automated and gives you a report.
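[Editor's note, not part of the episode: the kind of report described here (which page the link is on, plus the status) can be sketched as follows. This is not how SEMrush or any other tool is implemented. The status lookup is passed in as a function so the sketch runs offline, and the pages and statuses below are made up.]

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def find_broken_links(pages, get_status):
    """Return (page_url, link_url, status) for every link with status >= 400.

    pages: dict mapping page URL -> HTML text.
    get_status: callable taking a URL and returning an HTTP status code.
    """
    broken = []
    for page_url, html in pages.items():
        collector = LinkCollector()
        collector.feed(html)
        for href in collector.hrefs:
            link_url = urljoin(page_url, href)
            status = get_status(link_url)
            if status >= 400:
                broken.append((page_url, link_url, status))
    return broken


# Made-up site content and statuses for illustration.
PAGES = {
    "https://mywebsite.com/": '<a href="/engineering">Engineering</a> '
                              '<a href="/services/this">Old service page</a>',
}
FAKE_STATUSES = {
    "https://mywebsite.com/engineering": 200,
    "https://mywebsite.com/services/this": 404,
}

for page, link, status in find_broken_links(PAGES, FAKE_STATUSES.get):
    print(f"{status} on {page}: {link}")
```

[In real use, get_status would make an HTTP request for each link. That part is left out so the example stays self-contained.]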
But those are the two ways. So, the next question I see here, which leads directly into this, is like, “why am I going to do all this effort? How does this affect my SEO?”
So, authority on the internet is dictated, in the Google search result, by the quality of your content and your link profile. There are other factors as well, but those are the two biggest ones. If you have really good content and you spent the effort to write really good stuff, you do want to rank well for it. Links give Google the guideposts to be like, “Oh, if they have a really big, huge, great link profile, these people are really trustworthy.”
Jeff:
Like if people are linking to you to cite your expertise on whatever subject, that ranks you higher in Google?
Jacob:
Exactly. And so, if your site has been rotting away, so to speak, and you’re leaving all these external links coming to your site to 404 out—well, Google’s not going to give you the juice from those anymore. They’re not going to add that to your quota. So, having a good plan here and keeping your link profile tidy is going to keep your SEO health up.
This is not a “if you do this, you will rank better.” Maybe if you’re currently going through a big change in your website and your rankings drop, this could help restore it. But this is more about maintaining it. It’s kind of like an oil change equivalent. It’s not going to make your car perform a hell of a lot better, but if you don’t do it, it could make your car perform a hell of a lot worse.
Jeff:
I need to get an oil change.
Jacob:
You know, I feel like since COVID, the amount of times I get oil changes is less.
Jeff:
Well, I just don’t drive that much anymore.
Jacob:
Right.
Jeff:
So yeah, that’s why it matters—just maintaining your position within Google. You need to make sure that your links are not all broken, or like dying.
Jacob:
Yeah, and just from another perspective outside of Google—it’s nicer for humans if they don’t run into 404s because it is confusing. It creates a click-back. It puts a bad taste in the experience.
Jeff:
Yeah, and when I was doing research for this, I saw that in a 2013 study—which is like a decade old at this point because we’re old—it found that 49% of links on Supreme Court opinions are dead. So it’s like, you know, research articles or any of those things where it’s linking to something—if you don’t have that source, can you even prove the arguments that you’re making? Having these things broken can impact a lot of different things. Maybe your website is not as important as the Supreme Court’s, but this can impact a lot of different things.
Jacob:
Yeah, essentially. Basically, I totally agree with you on that one. And I’m sure what will happen is certain resources online will become a lot more valuable—like the paid services where you have to pay to get links. It’s really the public-facing information that gets hurt here. And it is essentially like… If you imagine the Supreme Court operating its opinions for the public on pamphlets only. And you’re like, “Wait a second. That’s all going to end up in the trash. Shouldn’t you back that up somewhere?” No. It’s because people get very confused about the internet. I would say that, by and large, website literacy is the hardest day-to-day thing that everybody interacts with, but most people don’t know how it actually works.
And so they just take a lot of things for granted. Or they take them as really substantially built, like a building. And it really doesn’t take much to break it. I mean, you can destroy a whole website—just put one line of code into the header of a website and break the entire thing.
Jeff:
Or like a server goes down, and yeah—see you later.
Jacob:
Yeah, or what if the guy that was running the website died, unfortunately, and he didn’t have a backup of stuff going on his credit cards and his things? No one could get into it for a while, and the domain expired, the hosting expired, and the website disappears, and all the backups disappear. Those things can really happen. And they just don’t happen with cars or houses. We have a relic left over—
Jeff:
Way more fragile than people realize.
Jacob:
Yeah. One last thing—if you don’t want to deal with website backups all the time and you do want important resources archived, the Wayback Machine for free will allow you to submit your URL.
If you go to the Wayback Machine, like web.archive.org, there is, in the bottom right, a Save Page Now button. And you can put your site in there.
Now, I don’t think that guarantees complete backup of everything, because it depends on the resources of this nonprofit service. But there you go. That is something you could start doing today. You can save key pages over time and use that third party as a service. But even the Wayback Machine could eventually—
Jeff:
Yeah, it could go down. The Library of Alexandria of the modern age.
Jacob:
Just—poof.
Alright, Jeff.
Jeff:
Alright. Any final thoughts?
Jacob:
No, just back stuff up and get with somebody that can help you analyze your website’s 404 errors and broken links. If you’re a business owner and you have a website with more than, let’s say, 50 pages, monitoring that alone is a big burden on your brain. So, I would just recommend getting with an expert if you can. If you have a tiny website with five pages, this conversation is not for you. You don’t have to worry.
Jeff:
Stop listening, even though we’re at the end.
Okay, very cool. So, this was, what, episode four of season two?
Jacob:
Yep.
Jeff:
Well, we will see you guys in two weeks, probably.
Jacob:
Yeah.
Oh, also, if anybody wants to be a guest—anybody that is listening right now can be a guest.
I don’t care what you do. If you flip burgers, I will interview you. I don’t care. We just need a guest on here to break it up.
Jeff:
We like having guests on the show, it’s fun.
Jacob:
Just reach out to us on our website, and you can be a guest. The threshold is zero.
Jeff:
Very low.
Alright. See ya.
Jacob:
See ya, guys.