
Indexing in search engines. A quick way to check page indexing in Yandex, Google, and other search engines

Site indexing is the first, most important and indispensable step in its optimization. It is precisely thanks to the index that search engines can respond to user queries so quickly and accurately.

What is site indexing?

Site indexing is the process of adding information about a site's content to the search engines' database; the index is that database. For a site to be indexed and appear in the search results, a special search bot must visit it. The bot examines the entire resource, page by page, according to a certain algorithm, finding and indexing links, images, articles, and so on. In the search results, sites with higher authority rank above the rest.

There are two ways a site can get into a search engine's index:

  • The search robot discovers the new pages or the newly created resource on its own. This method works well if there are active links to your site from other, already indexed sites; otherwise you may wait for the search robot indefinitely;
  • Entering the site URL manually into the search engine form intended for this. This option lets the new site "queue up" for indexing, which can take quite a long time. The method is simple, free, and requires entering only the address of the resource's main page. It can be done through the Yandex and Google webmaster panels.

How to prepare a site for indexing?

It should be noted right away that it is highly undesirable to publish a site while it is still under development. Search engines may index incomplete pages with incorrect information, spelling errors, and so on. As a result, this will negatively affect the site's ranking and how the resource is shown in the search.

Now let's list the points that should not be forgotten at the stage of preparing a resource for indexing:

  • indexing restrictions apply to Flash files, so it is better to build the site with HTML;
  • JavaScript content is also not indexed by search robots, so site navigation should be duplicated with text links, and important information that must be indexed should not be rendered with JavaScript;
  • remove all broken internal links so that each link leads to a real page of your resource;
  • the structure of the site should make it easy to navigate from the deepest pages to the main page and back;
  • unnecessary and secondary information and blocks are better moved to the bottom of the page and hidden from bots with special tags (see the sketch after this list).
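As a rough illustration of the last two points, here is a minimal HTML sketch. The menu items and the secondary block are hypothetical; the plain text links duplicate the script-driven navigation, and the noindex comments are the Yandex-specific markup for hiding a fragment from its robot (Google ignores them), so treat this as an assumption to verify against current documentation:

  <!-- Text duplicate of the script-driven navigation, visible to all robots -->
  <nav>
    <a href="/catalog/">Catalog</a>
    <a href="/delivery/">Delivery</a>
    <a href="/contacts/">Contacts</a>
  </nav>

  <!-- Secondary block hidden from the Yandex robot -->
  <!--noindex-->
  <div class="counters-and-banners">...</div>
  <!--/noindex-->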

How often does indexing take place?

Site indexing, depending on a number of reasons, can take from several hours to several weeks, up to a whole month. Indexing update, or search engine ups occur at different intervals. According to statistics, on average, Yandex indexes new pages and sites for a period of 1 to 4 weeks, and Google manages for a period of up to 7 days.

But with proper preliminary preparation of the created resource, these terms can be reduced to a minimum. After all, in fact, all PS indexing algorithms and the logic of their work come down to giving the most accurate and up-to-date answer to a user's request. Accordingly, the more regularly quality content appears on your resource, the faster it will be indexed.

Methods for accelerating indexing

First you need to “notify” the search engines that you have created a new resource, as mentioned in the paragraph above. Also, many people recommend adding a new site to social bookmarking systems, but I don’t do that. This really made it possible to speed up indexing a few years ago, since search robots often “visit” such resources, but, in my opinion, now it’s better to put a link from a popular social network. Soon they will notice a link to your resource and index it. A similar effect can be achieved with direct links to a new site from already indexed resources.

After several pages have already been indexed and the site has begun to develop, you can try to “feed” the search bot to speed up indexing. To do this, you need to periodically publish new content at approximately equal intervals of time (for example, every day, 1-2 articles). Of course, the content must be unique, high-quality, competent and not oversaturated with key phrases. I also recommend creating an XML sitemap, which will be discussed below, and adding it to the webmaster panel of both search engines.

robots.txt and sitemap files

The robots.txt text file contains instructions for search engine bots. Among other things, it makes it possible to prohibit indexing of selected pages of the site for a given search engine. If you create it manually, it is important that the file name is written only in lowercase letters and that the file is located in the root directory of the site; most CMSs generate it on their own or via plugins. A minimal example is shown below.
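For reference, a minimal robots.txt sketch might look like this (the /admin/ and /search/ paths are hypothetical, substitute your own service sections):

  User-agent: *
  Disallow: /admin/
  Disallow: /search/

Here "User-agent: *" addresses all robots, and each Disallow line closes one section from crawling; an empty Disallow value would allow everything.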

Sitemap, or site map, is a page containing a complete model of the site structure to help "lost users" move from page to page without using the site navigation. For search engines it is advisable to also create such a map in XML format and reference it in the robots.txt file to improve indexing (a sketch follows below).
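A minimal XML sitemap and the robots.txt line that points to it might look roughly like this (example.com and the file location are assumptions, adjust them to your site):

  <?xml version="1.0" encoding="UTF-8"?>
  <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
    <url>
      <loc>https://example.com/</loc>
    </url>
    <url>
      <loc>https://example.com/blog/first-article/</loc>
    </url>
  </urlset>

And in robots.txt:

  Sitemap: https://example.com/sitemap.xml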

You can get more detailed information about these files in the relevant sections by clicking on the links.

How to prevent a site from being indexed?

You can manage indexing, including prohibiting a site or an individual page from being indexed, using the robots.txt file already mentioned above. To do this, create a text document with that name, place it in the root folder of the site, and specify in the file which search engine the site should be hidden from. The * sign addresses all bots at once, Google's and Yandex's included. The following instruction in robots.txt will prohibit indexing by all search engines:

  User-agent: *
  Disallow: /
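If you want to hide the site from only one search engine, address its robot by name. A sketch that blocks only Yandex while leaving other bots unrestricted could look like this (the empty Disallow for other bots means "nothing is forbidden"):

  User-agent: Yandex
  Disallow: /

  User-agent: *
  Disallow: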

For WordPress sites, you can disable indexing through the control panel. To do this, in the search engine visibility settings check the box "Discourage search engines from indexing this site." Yandex will most likely respect this wish, but with Google there is no guarantee, and some problems may arise.
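As far as I know, recent WordPress versions implement this checkbox not by editing robots.txt but by adding a robots meta tag to every page, roughly like the following; treat the exact output as an assumption and check your page source:

  <meta name="robots" content="noindex, nofollow">

A search robot that honors this tag will not add the page to its index and will not follow the links on it.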

Hello, dear readers of the blog. Today I will tell you how to find and fix possible problems with the indexing of your site. Let's look at three main points.

  1. The robot must index the necessary pages of the site together with their content;
  2. These pages should be indexed quickly;
  3. The robot should not visit unnecessary pages of the site.

Everything seems to be quite simple. But in fact, most webmasters face the same problems when setting up indexing. Let's look at them carefully.

At the moment, in most cases, if we are talking about a new page of the site, it will appear in the search results within a few tens of minutes. If we are talking about already indexed pages, updating them takes 3-5 days.

So, in order for your site's pages to be crawled and indexed quickly, you need to remember three rules:

  1. First, you must have a valid and regularly updated sitemap file;
  2. Second, do not use the Crawl-delay directive just because you feel like it. Right now, open your robots.txt and check whether such a directive is there; if it is set, think about whether you really need it (an example of what it looks like follows this list);
  3. Third, use the "page recrawl" tool to send the robot to the most important new pages of your site.
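For reference, this is roughly what the Crawl-delay directive looks like in robots.txt; the 2-second value is a made-up example, and if your server has no load problems you usually do not need the directive at all:

  User-agent: *
  Crawl-delay: 2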

Keeping the robot away from unnecessary pages of the site

When a robot crawls your resource, this does not always have a positive effect on the indexing of the site's good pages. Imagine a situation: say the robot makes 5 requests per second to your resource. It seems like a great result, but what is the point of those five visits per second if they all go to the service pages or duplicates of your site, while the really important pages get no attention at all? That is what this next part is about: how to keep unnecessary pages out of the crawl and the index.

  1. We use the "Crawl statistics" section in Yandex.Webmaster;
  2. We collect the addresses of pages that the robot should not index;
  3. We compose the correct robots.txt file.

Let's take a look at the "Crawl Statistics" tool, it looks like this. There are also charts here. We are interested in scrolling down the page a little with the "all pages" button. You will see everything that the robot has visited in recent days.

If there are any service pages among them, they must be prohibited in the robots.txt file. What exactly needs to be banned? Let's go point by point.

  1. First, as I said earlier, filter pages, product selections and sorting pages should be prohibited in the robots.txt file.
  2. Secondly, we must prohibit various action pages: add to compare, add to favorites, add to cart. The cart page itself is also prohibited.
  3. Thirdly, we prohibit crawling of all service sections such as the site search, the admin panel of your resource, and sections with user data (for example, delivery information, phone numbers, and so on) in the robots.txt file.
  4. And pages with parameters in the URL, for example utm tags, are also worth excluding from crawling in the robots.txt file using the Clean-param directive (a sketch follows this list).
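Putting these points together, a fragment of robots.txt for a hypothetical online store might look like this; every path here is an assumption, substitute the real addresses from your own crawl statistics:

  User-agent: *
  Disallow: /compare/
  Disallow: /favorites/
  Disallow: /cart/
  Disallow: /search/
  Disallow: /admin/
  Disallow: /*?sort=
  Clean-param: utm_source&utm_medium&utm_campaign /

The Clean-param line tells the Yandex robot to ignore the listed utm parameters, so addresses that differ only in tags are not crawled as separate pages.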

If you are wondering whether a page should be prohibited or left open for crawling, ask yourself a simple question: do search engine users need this page? If the page should not appear in the search for any query, it can be prohibited.

And a small practical case; I hope it will motivate you. On one of the resources, the robot was making thousands of requests a day to pages with a redirect. Those redirect pages turned out to be the add-to-cart pages.

We made changes to the robots.txt file, and the graph shows that requests to such pages have practically disappeared. At the same time the dynamics turned positive right away: crawling of the necessary pages of the site, the ones that respond with code 200, increased sharply.

- Duplicate pages on the site, how to find them and what to do with them

And here another danger awaits you: duplicate pages. By duplicates we mean several pages of the same site that are available at different addresses but contain absolutely identical content. The main danger of duplicates is that, if they exist, the page shown in the search results can change: a page may end up in the search at an address you do not need and compete with the main page you are promoting for particular queries. In addition, a large number of duplicate pages makes it harder for the indexing robot to crawl the site. In general, they bring a lot of problems.

I think almost all webmasters are sure that there are no duplicate pages on their resource. I want to upset you a little: in fact, there are duplicates on almost all sites in RuNet. I have a detailed article about this; after reading it, you will not have a single question left.

- Checking the server response code

In addition to the robots.txt file, I would like to tell you about correct HTTP response codes. This, too, seems to have been said more than once. The HTTP response code is essentially the status of a specific page for the indexing robot (a short list follows, with a sketch for checking the codes after it).

  1. HTTP 200 - the page is available, can be indexed and included in the search.
  2. HTTP 404 - the page has been deleted (not found).
  3. HTTP 301 - the page has been permanently redirected.
  4. HTTP 503 - the page is temporarily unavailable.
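A quick way to see which codes your key pages actually return is a small script. Here is a minimal Python sketch; the addresses are hypothetical placeholders, and redirects are not followed so that 301/302 codes are visible directly:

  import requests

  # Replace these with the real pages of your own site.
  pages = [
      "https://example.com/",
      "https://example.com/catalog/",
      "https://example.com/old-page/",
  ]

  for url in pages:
      try:
          # allow_redirects=False shows 301/302 instead of the final target's code
          response = requests.get(url, allow_redirects=False, timeout=10)
          print(url, response.status_code)
      except requests.RequestException as error:
          print(url, "request failed:", error)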

What are the advantages of returning the correct HTTP codes:

  1. Firstly, you will avoid broken links on your resource, that is, links leading to pages that do not respond with a 200 code. If a page does not exist, the robot will understand that from the 404 response code.
  2. Secondly, it helps the robot plan the crawl of the really necessary pages, the ones that respond with a 200 code.
  3. And thirdly, it keeps various garbage out of the search results.

The next screenshot is also from practice. While the resource was unavailable for technical work, the robot received a stub page with an HTTP 200 response code, and it is exactly this stub's description that you see in the search results.

Because the pages respond with a 200 code, they are returned in the search. Naturally, pages with such content cannot be found or shown for any meaningful queries. The correct setting here is an HTTP 503 response: if a page or the whole resource is temporarily unavailable, this response code prevents the pages from being excluded from the search results.

The opposite also happens: important and necessary pages of your resource become inaccessible to the robot, for example they respond with a 503 or 404 code or return such a stub instead of their content.

Such situations can be tracked with the "Important pages" tool. Add to it the pages that bring the most traffic to your resource, set up notifications by e-mail or in the service, and you will receive information about what is happening with each page: its response code, its title, when it was last visited, and its status in the search results.


You can check which response code a particular page returns using the appropriate tool in Yandex.Webmaster. In this case we check the response code of a non-existent page: I made up an address, entered it into the tool, pressed the check button, and got a 404 response.

Everything is in order here: since the page was unavailable, it correctly answered with a 404 code and will not be included in the search. So, to keep robots away from unnecessary pages of the site, actively use the crawl statistics tool, make changes to the robots.txt file, and make sure your pages return the correct HTTP response codes.

- Summing up

We gave the robot the correct pages of the site with their content. We made sure they are indexed quickly. We forbade the robot to index unnecessary pages. All three of these large groups of tasks are interconnected: if the robot is not restricted from crawling service pages, it will most likely have less time left to index the necessary pages of the site.

If the robot does not receive the content of the required pages in full, it will not include those pages in the search results quickly. So you need to work on the indexing of your resource as a whole, on all three of these tasks, and then the desired pages will quickly get into the search results.

Yandex official answers

Pages in uppercase got into the index, even though the site contains no such pages. If uppercase pages appear in the index, the robot most likely found links to them somewhere on the Internet. Check your site first; an incorrect link is probably posted somewhere. The robot came, saw it, and started downloading the uppercase page. For such pages it is better to set up 301 redirects.

The sitemap consists of several files - is this normal? If we are talking about a sitemap index, that is, the special sitemap format in which you can list links to other sitemap files, then of course it is normal (a minimal sketch is shown below).
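A sitemap index file simply lists the individual sitemap files; a minimal example, with example.com standing in for your own domain, looks roughly like this:

  <?xml version="1.0" encoding="UTF-8"?>
  <sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
    <sitemap>
      <loc>https://example.com/sitemap-articles.xml</loc>
    </sitemap>
    <sitemap>
      <loc>https://example.com/sitemap-products.xml</loc>
    </sitemap>
  </sitemapindex>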

If you place links to all catalog sections at the bottom of the site, in a block shown on every page, will it help indexing or hurt it? In fact, you do not need to do this at all: if your visitors do not need it, there is no point in adding it specifically. A simple sitemap file is enough; the robot will learn about all these pages from it and add them to its database.

Do I need to specify the update frequency in the sitemap? The sitemap file can be used to pass additional information to the indexing robot. Besides the addresses themselves, our robot understands a few more tags: the update frequency, the crawl priority, and the last modification date. It takes all this information from the sitemap while processing the file, adds it to its database, and later uses it to adjust its crawl policies (an example entry follows below).
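For reference, a single sitemap entry with these optional tags might look like this; the address, date and values are hypothetical, and search engines treat the tags as hints rather than commands:

  <url>
    <loc>https://example.com/blog/new-article/</loc>
    <lastmod>2024-05-01</lastmod>
    <changefreq>weekly</changefreq>
    <priority>0.8</priority>
  </url>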

Is it possible to do without a sitemap? Yes, just make sure your site has transparent navigation so that every internal page has accessible links. But keep in mind that if this is a new resource (from the author: read about how to launch a new site) and some page sits deep on your site, say 5-10 clicks away, the robot will need a lot of time to learn that it exists: it first downloads the main page and collects the links, then downloads the pages it has learned about, and so on. The sitemap file lets you hand the robot information about all pages at once.

The robot makes 700 thousand requests per day for non-existent pages. First of all, you need to understand where such non-existent pages came from. Perhaps relative links are used incorrectly on your site, or some section was removed from the site long ago and the robot keeps checking those pages anyway. In this case, simply disallow them in the robots.txt file; within about 12 hours the robot will stop accessing such pages.

If service pages have already been indexed, how can I remove them from the search? Also use robots.txt. It does not matter whether you set the ban when creating the site or after launching the resource: the pages will disappear from the search results within about a week.

Is an auto-generated sitemap good or not? In most cases sitemaps are generated automatically, so we can say this is probably fine: you do not have to do anything by hand and can pay attention to something else.

How will a page be indexed if it is made canonical to itself, that is, if the canonical attribute points to the page itself? Is such a page considered canonical? It will be indexed normally and included in the search results; it is quite correct to use this technique.

What does the status "non-canonical" mean? W The page starts with the canonical attribute set, which leads to another page on your site. Therefore, this page will not be able to get into the search. Open the source code of the page, do a search, see where canonical leads, and check the canonical page in the search.
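The canonical attribute is a link element in the page's head section. In this hypothetical example, the page that carries the tag declares /catalog/red-shoes/ as its canonical address, so only that address is eligible to appear in the search:

  <link rel="canonical" href="https://example.com/catalog/red-shoes/">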

What is more correct for a shopping cart page: a ban in robots.txt or noindex? If the page is closed with the noindex method, the robot will still periodically visit it to check whether the ban is still in place. To keep the robot from doing this, it is better to use a ban in the robots.txt file (both variants are sketched below).
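For comparison, the two variants for a hypothetical /cart/ address look like this. The robots.txt ban stops the robot from requesting the page at all, while the meta tag only tells it not to index a page it has already downloaded:

  In robots.txt:

  User-agent: *
  Disallow: /cart/

  Or in the HTML of the cart page itself:

  <meta name="robots" content="noindex">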


See you soon on the pages of the blog!


Fast indexing of pages in Google

Everything is very simple with Google. You need to add your site to the webmaster tools at https://www.google.com/webmasters/tools/, then select the added site, which takes you into the Search Console for your site. Next, in the left menu, select the "Scanning" section, and in it the "View as Googlebot" item.

On the page that opens, enter in the empty field the address of the new page that we want to index quickly (the site's domain name is already filled in) and click the "Scan" button to the right. Wait for the page to be scanned and appear at the top of the table of addresses previously scanned in this way, then click the "Add to Index" button.

Hooray, your new page is instantly indexed by Google! In a couple of minutes you will be able to find it in the Google search results.

Fast indexing of pages in Yandex

A similar tool for adding new pages to the index is available in the new version of Yandex webmaster tools. Accordingly, your site must also have been added to Yandex.Webmaster beforehand. Get there by selecting the desired site in the webmaster panel, going to the "Indexing" section, and choosing the "Page Recrawl" item. In the window that opens, enter the addresses of the new pages we want to index quickly, one link per line.

Unlike Google, indexing in Yandex does not yet happen instantly, although it strives for that. With the actions above you inform the Yandex robot about the new page, and it is usually indexed within half an hour to an hour - at least that is what my own practice shows. The page indexing speed in Yandex may depend on a number of parameters (the reputation of your domain, of your account, and/or others). In most cases you can stop at this.

If you see that the pages of your site are being indexed poorly by Yandex, here are a few general recommendations on how to deal with it:

  • The best, but also the most difficult, recommendation is to get the Yandex fast bot onto your site. To do this, it is desirable to add fresh materials to the site every day, preferably 2-3 or more, and not all at once but spread over time, for example in the morning, afternoon and evening. It is even better to keep roughly the same publication schedule (approximately the same times for adding new materials). Many also recommend creating an RSS feed for the site so that search robots can read updates directly from it.
  • Naturally, not everyone can add new materials to the site in such volumes - it is already good if you manage 2-3 materials per week. In that case you cannot really count on the Yandex fast bot, but you can try to get new pages into the index in other ways. The most effective is considered to be posting links to new pages from upgraded Twitter accounts. With special programs like Twidium Accounter you can "pump up" the number of Twitter accounts you need and use them to quickly push new pages of the site into the search engine index. If you cannot post links to upgraded Twitter accounts yourself, you can buy such posts through special exchanges. One post with your link costs on average from 3-4 rubles and up (depending on how strong the chosen account is), but this option will be quite expensive.
  • The third option for quick indexing is the http://getbot.guru/ service, which for just 3 rubles will help you achieve the desired effect with a guaranteed result. It is well suited for sites with an infrequent publication schedule. There are also cheaper rates; the details and the differences between them are best viewed on the service's own website. Personally, I am very satisfied with this service as an indexing accelerator.

Of course, you can also add new publications to social bookmarks, which in theory should also contribute to quick indexing of the site. But the effectiveness of such additions also depends on the level of your accounts: if there is little activity on them and you use them only for this kind of spam, there will be practically no useful output.



If you want to know whether a certain page has been indexed by a search engine and how many pages of your site are in the search overall, you should learn the four easiest ways to check site indexing, the ones used by all SEO specialists.

When indexing a portal, the search bot first scans it, that is, crawls it to study the content, and then adds information about the web resource to its database. The search engine then builds its results from these databases. Do not confuse crawling with indexing - they are two different things.

To understand how many pages of your project are not yet indexed, you need to know their total number. This lets you judge how quickly your site is being indexed. There are several ways to do this (a small counting sketch follows the list):

  1. Look at the sitemap. You will find it at: your_site_name.ru/sitemap.xml. Basically, it lists all the pages hosted on the resource. But sometimes the sitemap is not generated correctly, and some pages may be missing from it.
  2. Use a special program. Such programs crawl your entire site and list all of its pages; examples are Screaming Frog SEO Spider (paid) and Xenu's Link Sleuth (free).
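If the sitemap exists, counting the pages listed in it is easy to automate. Here is a minimal Python sketch under the assumption that the sitemap lives at the standard address and is a plain urlset (for a sitemap index you would count the <sitemap> entries instead); example.com is a placeholder:

  import urllib.request
  import xml.etree.ElementTree as ET

  # Hypothetical sitemap address - replace with your own site's sitemap.
  SITEMAP_URL = "https://example.com/sitemap.xml"

  with urllib.request.urlopen(SITEMAP_URL, timeout=10) as response:
      tree = ET.fromstring(response.read())

  # Sitemap files use the sitemaps.org namespace for <url> and <loc> tags.
  ns = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}
  urls = [loc.text for loc in tree.findall(".//sm:url/sm:loc", ns)]

  print("Pages listed in the sitemap:", len(urls))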

Ways to check site indexing

We bring to your attention the 4 most common and simple ways to check which pages are in the index and which are not.

1. Through the webmaster panel

With this method, web resource owners check their presence in the search most often.

Yandex

  1. Log in to Yandex.Webmaster.
  2. Go to menu "Site Indexing".
  3. Under it find the line "Pages in Search".

You can also go the other way:

  1. Select "Site Indexing".
  2. Next go to "History".
  3. Then click on the tab "Pages in Search".

In both the first and second ways, you can study the dynamics of growth or decline in the number of pages in the search engine.

Google

  1. Go to the Google Webmaster Tools control panel.
  2. Click on the Search Console tab.
  3. Go to "Google Index".
  4. Click on the "Indexing Status" option.

2. Through search engine operators

They help refine search results. For example, the "site:" operator lets you see the approximate number of pages that are already in the index. To check this parameter, enter in the Yandex or Google search bar: "site:url_of_your_site" (sample queries are shown below).
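Assuming a hypothetical domain example.com, the queries would look like this. The first one shows roughly how many pages of the site are in the index; in Google you can also put a full page address after the operator to check a single page, and in Yandex, as far as I know, the url: operator serves that purpose:

  site:example.com
  site:example.com/blog/how-indexing-works/
  url:example.com/blog/how-indexing-works/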


Important! If the results in Google and Yandex differ greatly, your site has problems with its structure, garbage pages or indexing, or sanctions have been imposed on it.

For search, you can also use additional tools, for example, to find out how page indexing has changed over a certain period of time. To do this, under the search bar, click on the tab "Search Tools" and select a period, for example, "For 24 hours".

3. Through plugins and extensions

With special plugins and extensions, also known as bookmarklets, checking the indexing of a web resource happens automatically. These are JavaScript programs stored in the browser as ordinary bookmarks.

The advantage of plugins and extensions is that the webmaster does not need to re-enter the search engine every time and enter site addresses, operators, and so on. The scripts will do everything automatically.

The most popular plugin used for this purpose is RDS bar, which can be downloaded from the app store of any browser.

It is worth noting that the plugin has far more features in Mozilla Firefox than in other browsers. RDS bar provides information about both the entire website and its individual pages.

On a note: there are paid and free plugins, and the biggest disadvantage of the free ones is that you regularly need to enter a captcha.

The "Indexing check" bookmarklet is also worth mentioning. To enable it, simply drag the link onto your browser's bookmarks bar, then open your portal and click the bookmarklet. A new tab with Yandex or Google will open, showing the necessary information about the indexing of the pages in question.

4. With special services

I mainly use third-party services, because they clearly show which pages are in the index and which are not there.

Free service

https://serphunt.ru/indexing/ - checks indexing in both Yandex and Google. You can check up to 50 pages per day for free.

Paid service

Of the paid ones, I like Topvisor the most - it costs 0.024 rubles per page checked.

You upload all the pages of your site to the service and it will show you which are in the search engine index and which are not.

Conclusion

The main goal of the owner of any web resource is to achieve indexing of all pages that will be available to search robots for scanning and copying information into the database. Implementing this task on a large site can be very difficult.

But with the right integrated approach, that is, competent SEO optimization, regular filling of the site with high-quality content and constant monitoring of the process of including pages in the search engine index, you can achieve positive results. To do this, in this article we talked about four methods for checking the indexing of a site.

Know that if pages start dropping out of the search too abruptly, something is wrong with your resource. But often the problem lies not in the indexing process but in the optimization itself. Do you want to be indexed quickly and get into the TOP of the search results? Offer your target audience content that outperforms your competitors.

For a young site, fast indexing in search engines is especially important, because it does not yet have any accumulated weight (or "trust"). It is especially important in the first months of the site's life to update it regularly, and the content must be of high quality.

Quick indexing of a new site in Yandex

In order for your new site to be indexed quickly in Yandex, you need to add it to Yandex.Webmaster. Next, select the "Indexing" -> "Page Re-Crawl" block (see figure).

Page crawling in Yandex Webmaster

For a young site, be sure to include the main page in this list. There are frequent cases when the robot visits the main page and indexes all the internal links from it. In this way far more than 20 pages can get indexed.

Fast site indexing in Google

Similarly to Yandex, to speed up the indexing of a new site in Google, it must be added to Google Webmaster (Google Search Console). There you need to select the "Scanning" tab -> "View as Googlebot".

Feature View as GoogleBot

In the form that opens, paste the address of the desired page and click "Scan". After the page has been crawled, the result appears and the magic "Request indexing" button becomes available.

Functionality Request indexing

Click on the button and you will see something like this window:

How to index a website on Google

Here, be sure to select "Crawl this URL and its direct links". In this case, the robot will try to crawl all the internal links on the page you specified, and with a high degree of probability they will all enter the Google index as quickly as possible!

Indexing a new site on an old domain

In this case, the task is not as trivial as it seems. It is often difficult to get a new site indexed on a domain with history; the process can take weeks or months, depending on that history: whether sanctions were imposed on the domain before, and which ones.

The scheme of actions in this case is simple:

  • Add the site to the Yandex and Google webmaster panels;
  • Request re-indexing through the appropriate functionality;
  • Wait 2-3 index updates;
  • If nothing has changed, write to support and resolve the issue on an individual basis.

Methods for speeding up the indexing of a young site

In addition to the methods that I indicated above, there are several more that work:

  1. Sharing material on social networks. I recommend using the following: VKontakte, Facebook, Twitter, Google+ (even though Google's social network is effectively dead, it helps speed up the indexing of new pages).
  2. Regular website updates. Over time, the site accumulates statistics on the publication of new materials, which helps new pages get indexed. Update regularly and you may manage to "catch the fast bot" (in that case indexing of new pages takes 1-10 minutes).
  3. For news sites: get into Yandex News. It is not as difficult as it might seem, but the effect will be amazing: a fast bot lives on every site included in Yandex News.
  4. A competent internal structure of the site. Be sure to use internal linking, TOP materials, and so on. Increasing the number of internal links to a page (within reasonable limits) will also speed up indexing.

Fast indexing of new pages of the old site

Q&A on indexing young sites

Do you have questions about indexing young sites? Ask them in the comments!

Q: Should a new site be closed from indexing?
A: I recommend that you don't expose your site to crawlers until it's populated with starter content. As my practice shows, it takes much more time to reindex existing pages than to index new ones.

Q: How long does it take for Yandex to index a new site?
A: On average, it's 1-2 updates (from 1 to 3 weeks). But the situations may be different.

Q: What problems can there be with indexing a young site?
A: Probably the main problem is poor content; because of it the site may not get indexed. There have also been cases when a young but large site with thousands of pages was rolled out at once. Search engines still remember doorways, so young sites with thousands of pages get a "special attitude".

Q: How many pages should a new site be opened with for indexing, and how often should it be updated?
A: You can open a site for indexing starting from a single page. It is important to follow a simple rule: do not put thousands of pages on the site at once, because this can be taken as search spam, and do add new material to the site regularly. Even one piece of material every 3 days is fine, as long as it is regular. This is very important!

Q: How often does Yandex index sites?
A: According to official information from Yandex, the indexing frequency can range from 2 days to several weeks. Methods of speeding it up are described above.


