September 13, 2024
Human Friend Digital Podcast

Getting Found Online: Search Engine Indexing

in this episode

In this episode of the Human Friend Digital Podcast, Jacob and Jeffrey break down one of the most essential aspects of SEO: Google indexing. They discuss how Google’s “spider bot” crawls the web, visiting sites to gather information, and the importance of being indexed—because if Google can’t find your site, it won’t show up in search results.

Jacob explains the key factors that impact whether or not a site gets indexed. Things like “no index” tags, technical issues, or even slow Google response times can affect how quickly (or if) your site gets added to Google’s vast database. Tools like Google Search Console can help monitor this, but it’s a slow and often frustrating process—likened to the DMV of the internet.

They also touch on best practices like using proper meta tags, URL structures, and schema markup to ensure that your site is not only indexed but indexed correctly. Jacob and Jeffrey make it clear: if your site isn’t indexed, it’s as good as invisible online.

Google Search Console:

https://search.google.com/search-console/about

Yoast:

https://yoast.com

IndexNow:

https://indexnow.org/

Schema:

https://schema.org

Schema JSON Creation Tool:

https://technicalseo.com/tools/schema-markup-generator

Test Your Schema Before Going Live:

https://developers.google.com/search/docs/appearance/structured-data

[this transcript has been edited for clarity]

Jacob:

Welcome to the Human Friend Digital Podcast. I’m your co-host, Jacob Meyer.

Jeffrey:

I’m your other co-host, Jeffrey Caruso, and today we’re talking about how Google indexes and crawls the information on your website.

Jacob:

So, you start it off—hit me with a question.

Jeffrey:

So, just briefly explain what crawling and indexing mean. I know that we’ve actually talked about it on one of our episodes about SEO timelines, but if we want to just, like, recap it here…

Jacob:

Yeah, no, it’s something that’s going to come up a lot because a functional part of getting your website online is crawling and indexing. Google does this, Bing does this, DuckDuckGo does this, and every search engine I know of uses the same method.

Essentially, they have a bot. I don’t know if they still call it a spider bot, but back in the day, they used to call it a spider bot because it crawls the web from site to site to find all the websites that want to be crawled. Then it will read them, put them into this processing system, and if it meets some basic criteria—like, does the page allow being indexed, and does it have content—Google will index it.

When I say “index,” I mean that you’re getting put in the Google database so your website can show up when someone searches a keyword.
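
[Editor's note: the sketch below is not from the episode. It is a toy illustration of the crawl-then-index idea Jacob describes, assuming a tiny hypothetical in-memory "web" (the `PAGES` dictionary) instead of real network requests. Real search engine crawlers are far more involved.]

```python
from collections import deque

# Hypothetical in-memory "web": URL -> (page text, outgoing links).
PAGES = {
    "a.example/": ("mechanical engineering services", ["a.example/about", "b.example/"]),
    "a.example/about": ("about our engineering team", []),
    "b.example/": ("best pizza place", ["a.example/"]),
}

def crawl_and_index(start_url, pages):
    """Breadth-first crawl from start_url, building a word -> set-of-URLs index."""
    index = {}
    seen = {start_url}
    queue = deque([start_url])
    while queue:
        url = queue.popleft()
        if url not in pages:
            continue  # broken link: nothing to read
        text, links = pages[url]
        # "Indexing": record every word on the page against its URL.
        for word in text.split():
            index.setdefault(word, set()).add(url)
        # "Crawling": follow links to pages we have not visited yet.
        for link in links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return index

index = crawl_and_index("a.example/", PAGES)
print(sorted(index["engineering"]))  # ['a.example/', 'a.example/about']
```

Looking up a word in `index` is the toy equivalent of a keyword search: only pages that were crawled and recorded can come back.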

Jeffrey:

If you’re not indexed, will it not show up?

Jacob:

Correct. You can actually set things on your site. Most websites allow you to set your webpage to “no index” or “index” by default. If you don’t say anything, you’ll be indexed, but you can tell Google, “I don’t want this page included in the search index,” or even your whole website.
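
[Editor's note: not from the episode. The "no index" setting Jacob mentions is commonly expressed as a robots meta tag in the page's HTML. The sketch below, using only Python's standard library, checks a page for that tag. The class and function names are made up for illustration, and the check ignores other mechanisms such as HTTP headers.]

```python
from html.parser import HTMLParser

class RobotsMetaParser(HTMLParser):
    """Looks for <meta name="robots" content="... noindex ..."> in a page."""

    def __init__(self):
        super().__init__()
        self.noindex = False

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        name = (attr.get("name") or "").lower()
        content = (attr.get("content") or "").lower()
        if name in ("robots", "googlebot"):
            directives = [d.strip() for d in content.split(",")]
            # "none" is shorthand for "noindex, nofollow".
            if "noindex" in directives or "none" in directives:
                self.noindex = True

def allows_indexing(html):
    """Return True unless the page carries a noindex robots meta tag."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return not parser.noindex

print(allows_indexing('<head><meta name="robots" content="noindex, nofollow"></head>'))  # False
print(allows_indexing("<head><title>Home</title></head>"))  # True
```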

Jeffrey:

Right, so that might be if you’re still working on the website?

Jacob:

Exactly. Staging sites are always marked as “no index” just in case Google finds them. Google doesn’t usually index staging sites because they have weird URLs and subdomains…

Jeffrey:

If, for whatever reason, you didn’t want one of your pages to show up on Google, you could tell it not to.

Jacob:

And if you wanted to, like, do illicit drug deals or show up on the dark web, this is how you ensure your anonymity.

That’s basically step one. After you make your website and it’s launched—so to speak—and people can access it at the live URL, you need to go to Google or Bing and say, “I exist.”

Jeffrey:

Right. So, you can put a tag on your website to say “no indexing,” right? What are some other reasons why it might not index, other than you telling it not to?

Jacob:

There’s a place called Google Search Console, and Bing has an equivalent. When it crawls your website, if it’s having trouble indexing a page, it will give you a list of reasons why. Some reasons might be that Google is slow, and it will say, “discovered, but not currently indexed.”

Jeffrey:

Right, so it’s in the queue?

Jacob:

Yes, and how long it takes to get through that queue has only gotten longer over the years. Then, there are basic issues, like Google trying to crawl your site but hitting a 301 redirect, or a 404 error, which means the page didn’t exist when it tried to crawl it. Some issues might be server-related, or maybe you’ve accidentally marked “no index.” All of these things can prevent your page from being indexed.
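
[Editor's note: not from the episode. As a rough sketch of the categories Jacob lists, the function below maps an HTTP status code and a noindex flag to a plain-language outcome. The wording of each outcome is invented for illustration and is not the text Google Search Console displays.]

```python
def crawl_outcome(status_code, noindex=False):
    """Return a rough reason a fetched URL may or may not end up indexed."""
    if 300 <= status_code < 400:
        return "redirect: the destination URL is the candidate, not this one"
    if status_code == 404:
        return "not found: nothing to index"
    if 500 <= status_code < 600:
        return "server error: the crawler could not read the page"
    if status_code == 200 and noindex:
        return "excluded by noindex"
    if status_code == 200:
        return "eligible for indexing"
    return "other: check the crawl report"

for code, flag in [(200, False), (200, True), (301, False), (404, False), (503, False)]:
    print(code, flag, "->", crawl_outcome(code, flag))
```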

Jeffrey:

So, what would you do if it’s a page you want indexed? What would you do to be like, “Hey, Google! Come back, I fixed it”?

Jacob:

Right, and this is a big problem for people, and they often don’t know it. Most clients get their website online and never check the Google index again. It’s up to SEOs like me to check regularly. You can go in manually, page by page, or submit your whole sitemap and hope Google crawls it again.

Some tools, like Yoast on WordPress or the IndexNow plugins, will auto-ping search engines to say, “Put me back in the queue to crawl.” Basically, after you’ve made a fix, you’re trying to get yourself back in the queue to be crawled again.
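
[Editor's note: not from the episode. For readers who have never seen a sitemap, this is a minimal sketch of generating one with Python's standard library. The URLs are placeholders, and plugins such as Yoast normally generate this file for you.]

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

def build_sitemap(urls):
    """Return a sitemap XML string listing each URL in a <url><loc> entry."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
    return ET.tostring(urlset, encoding="unicode")

sitemap_xml = build_sitemap([
    "https://somewebsite.com/",
    "https://somewebsite.com/mechanical-engineering",
])
print(sitemap_xml)
```

The resulting file is what you would point Google Search Console at when submitting a sitemap.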

Overall, what’s interesting about this process is that it’s so technical that Google Search Console is almost like the DMV of the internet. You don’t get a lot of responsive information back. It’s like filling out government paperwork. You just have to stand in line forever to get serviced. And then if you screw up, you go to the back of the line and wait all over again. So it is very much the closest thing to the DMV.

Jeffrey:

Yeah, that actually tracks so hard, because we were reindexing for someone, and I remember that there’s a quota system: you only get so many requests per day, and then I’d have to come back the next day only to find out, “Oh, you did this one wrong, this one wrong…” and I’d have to go back and do it again. It’s very much like the DMV; that is very apt.

Jacob:

This is the experience, so everything is going to feel like that.

Jeffrey:

And so, when I was reading up on topics for this episode, I came across some of the how-tos that Google had put out. They mentioned that, in order to make sure you get indexed properly, you need to consider things like your meta tags and URL structure—and keywords, I guess. Can you explain what those are and why they matter?

Jacob:

Yeah, so Google is… well, they say all these things, and I want the listener to realize that Google has built its entire system around indexing a ton of things. Most people on the internet do all of this poorly and still get indexed. What I’m about to say is best practices to improve your rankings so you can get indexed better and faster, but you can still get indexed even without doing these things, as long as your site allows indexing. Most people on the internet don’t really know how to do all this, if that makes sense—they’re just normal people. It’s kind of like the DMV; most people have to figure it out when they get there.

Jeffrey:

I don’t know what I need when I go in.

Jacob:

Exactly. So, all of these things are important to include. Page titles are what show up in Google search results. When you type something into Google, that’s the link text that appears. If you have a good keyword match—like if someone Googles, um, I don’t know, what do you want to Google today? Mechanical engineering?

Jeffrey:

Sure.

Jacob:

So, let’s say you Google “mechanical engineering.” If your website has “mechanical engineering” in the page title and you want to show up for that term, you’re going to rank a little better because it’s a match.

Jeffrey:

And it’ll help Google index you? Like, help it find you better? Or does it not really matter?

Jacob:

In a certain way, it’s helping Google understand what the page should be indexed under. Think of an index like the one in a textbook: when you open the back, you have all these topics listed there.

Jeffrey:

So it’s like literally indexing them under topics on the backend. You didn’t really explain it that way at the beginning, but that actually makes more sense.

Jacob:

Well, I mean, it’s broader than that. You’re getting tossed into this giant pool of billions of websites.

Jeffrey:

That’s what I thought it was, just like a giant, yeah, pool of websites. But you’re saying that they do categorize them.

Jacob:

When you get into keywords, it will try to categorize them. You can see this pretty easily if you type in something like “best pizza place near me.” That’s a food and location category, so it’s indexed that way, and then you’ll see the results page with a map.

Jeffrey:

And it’s not going to show you pizza places in Burbank, California, because that’s not near me.

Jacob:

Exactly. It’s kind of like the index section in the back of a textbook, right? So if you want to show up for “Mechanical Engineering,” you need to have that keyword in your page title, in your meta description—which is a short description of what the page is—and you can also put it in your URL structure, like somewebsite.com/mechanical-engineering.
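
[Editor's note: not from the episode. A small sketch of the three placements Jacob names (page title, meta description, URL), assuming a simple case-insensitive phrase match. This is only a self-check for your own pages and says nothing about how Google actually scores them.]

```python
def keyword_placement(keyword, title, meta_description, url):
    """Report whether the keyword phrase appears in each of the three places."""
    words = keyword.lower().split()
    phrase = " ".join(words)      # "mechanical engineering"
    slug_form = "-".join(words)   # "mechanical-engineering"
    return {
        "title": phrase in title.lower(),
        "meta_description": phrase in meta_description.lower(),
        "url": slug_form in url.lower(),
    }

report = keyword_placement(
    "Mechanical Engineering",
    title="Mechanical Engineering Services | Some Website",
    meta_description="We design machines for local manufacturers.",
    url="https://somewebsite.com/mechanical-engineering",
)
print(report)  # {'title': True, 'meta_description': False, 'url': True}
```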

Jeffrey:

And so that’s where URL structure would be important? Not have it be something like suchandsuch.com with a bunch of garbled letters that takes you to a section about mechanical engineering.

Jacob:

Right, right, exactly. You don’t want that. Back in the day, some databases were like that, where it would just be a series of numbers, like “page ID equals 1256.” And that doesn’t help Google, because Google’s trying to learn, “What are you teaching with this page? What are you offering?” So when it crawls your site and understands it, and when it’s ready to pull it up in the index and someone Googles “mechanical engineer” or “pizza place near me,” it can match them with the best result. And that’s all part of the indexing step.
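
[Editor's note: not from the episode. To contrast a readable URL with the "page ID equals 1256" style, here is one common way to turn a page title into a URL slug. The `slugify` helper is an illustration written for this note; most content management systems do this for you.]

```python
import re

def slugify(title):
    """Lowercase the title and replace runs of non-alphanumerics with hyphens."""
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower())
    return slug.strip("-")

print("somewebsite.com/" + slugify("Mechanical Engineering"))
# somewebsite.com/mechanical-engineering
```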

One other thing I should mention, which isn’t part of this topic but is important, is schema. Schema is a markup language, and it’s a very technical way to write something. It’s all about the syntax—like, I’ve got to put my colon here, my quotes there—and what you do is list out all these variables, like “I am a local business,” “My business name is officially this,” “My official business logo is this,” and “My official business number is this.”

I like to use JSON, which is like a JavaScript thing. You should really get an expert for this step if you haven’t done it before—it’s kind of tedious. But there are also tools, and I can link to them on the blog post on our website. I’ll link to a little thing about it, and what schema does is ensure that when you get indexed, Google has something like government-style paperwork for that page, or for your website in general, to make sure it doesn’t get confused.
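
[Editor's note: not from the episode. The JSON format Jacob refers to is usually JSON-LD placed in a script tag. The sketch below builds a minimal LocalBusiness snippet using the schema.org properties `name`, `logo`, and `telephone`. The business details are placeholders, and a real site would likely include more properties.]

```python
import json

def local_business_jsonld(name, logo_url, telephone):
    """Return a <script> tag containing LocalBusiness structured data."""
    data = {
        "@context": "https://schema.org",
        "@type": "LocalBusiness",
        "name": name,
        "logo": logo_url,
        "telephone": telephone,
    }
    body = json.dumps(data, indent=2)
    return f'<script type="application/ld+json">\n{body}\n</script>'

snippet = local_business_jsonld(
    "Example Business",                # placeholder name
    "https://example.com/logo.png",    # placeholder logo URL
    "+1-555-555-0100",                 # placeholder phone number
)
print(snippet)
```

Generating the JSON with `json.dumps` rather than typing it by hand avoids the misplaced colon and quote problems Jacob mentions.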

Essentially, schema makes sure Google doesn’t mess it up because the robot is deaf, dumb, and blind. It just reads code. It cannot make stylistic changes or adjustments to its indexing, whether your website is very pretty or not. There are other tools that Google uses to analyze your website, like visuals to some degree, if they’re usable, but we’re not getting into that here. Schema is there to ensure that when Google visits your homepage—let’s say it’s a beautiful homepage filled with images about your business but doesn’t say much—Google sees it as doing nothing because it cannot read pictures unless you add alt tags.
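
[Editor's note: not from the episode. Since the crawler relies on alt text to understand images, a quick check like the one below can list images that have none. It uses Python's standard library; the function name is made up for this note. An intentionally empty alt can be fine for purely decorative images, but this sketch flags those too.]

```python
from html.parser import HTMLParser

class MissingAltFinder(HTMLParser):
    """Collects the src of every <img> with a missing or empty alt attribute."""

    def __init__(self):
        super().__init__()
        self.missing = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            attr = dict(attrs)
            if not (attr.get("alt") or "").strip():
                self.missing.append(attr.get("src"))

def images_missing_alt(html):
    finder = MissingAltFinder()
    finder.feed(html)
    return finder.missing

page = '<img src="hero.png"><img src="shop.png" alt="Our shop front">'
print(images_missing_alt(page))  # ['hero.png']
```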

Jeffrey:

Oh, it’s like on the backend.

Jacob:

Yes, it’s something you put in your code, often in the footer of your site, that tells Google exactly what your business is, including your name and your phone number, so it can’t get confused. Otherwise, you’re relying on it successfully reading what’s in your footer or on your contact page and making that connection. If you do schema right, it goes a long way toward making sure that, once you’re indexed, Google understands your service area and your primary offerings for your business.

If you’re posting jobs on your website, Job Schema is super helpful because it ensures Google can read the job listing correctly. People use schema on their site—for example, when you see reviews show up in Google search results, that’s thanks to schema. It’s the way to ensure your reviews show up, or special images if you want them. Schema is like the magic nerd form that makes sure Google and its blind robot can actually index your site properly.
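
[Editor's note: not from the episode. For job listings, the relevant schema.org type is JobPosting. The sketch below checks a posting for a handful of properties that, to the editor's understanding, Google's job posting guidelines treat as required. Treat the list as an assumption and confirm it against the current documentation linked above.]

```python
# Assumed set of required properties; verify against Google's current docs.
REQUIRED_JOB_FIELDS = (
    "title",
    "description",
    "datePosted",
    "hiringOrganization",
    "jobLocation",
)

def missing_job_fields(job_posting):
    """Return the assumed-required properties that are absent or empty."""
    return [field for field in REQUIRED_JOB_FIELDS if not job_posting.get(field)]

posting = {
    "@context": "https://schema.org",
    "@type": "JobPosting",
    "title": "Mechanical Engineer",
    "description": "Design and test mechanical systems.",
    "datePosted": "2024-09-13",
}
print(missing_job_fields(posting))  # ['hiringOrganization', 'jobLocation']
```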

Jeffrey:

Very cool. So, I know we were talking about keeping the episode shorter, so I think this is a good place to wrap up.

Jacob:

That’s Google indexing in a nutshell. What you should do right now is get your website on Google Search Console. I’ll have a link in the description of this episode. Go check it out and see how your website’s doing.

Jeffrey:

Make sure there are no random pages that got left behind unintentionally.

Jacob:

Exactly. Or, you might find that you’re not in there at all yet if you have a brand new website, because Google can take a long time to find it.

Jeffrey:

Yeah, very good. Awesome, Jacob! Until next week.
