Search Engine Ranking

Tuesday, 6 March 2007

All about robots

Posted on 09:30

Search engine robots, including our very own Googlebot, are incredibly polite. They work hard to respect your every wish regarding what pages they should and should not crawl. How can they tell the difference? You have to tell them, and you have to speak their language, which is an industry standard called the Robots Exclusion Protocol.

Dan Crow has written about this on the Google Blog recently, including an introduction to setting up your own rules for robots and a description of some of the more advanced options. His first two posts in the series are:
Controlling how search engines access and index your website
The Robots Exclusion Protocol
Stay tuned for the next installment.

While we're on the topic, I'd also like to point you to the robots section of our help center and our earlier posts on this topic:
Debugging Blocked URLs
All About Googlebot
Using a robots.txt File

Update: For more information, please see our robots.txt documentation.

Posted in crawling and indexing

Monday, 5 March 2007

Using the robots meta tag

Posted on 16:05

Recently, Danny Sullivan brought up good questions about how search engines handle meta tags. Here are some answers about how we handle these tags at Google.

Multiple content values
We recommend that you place all content values in one meta tag. This keeps the meta tags easy to read and reduces the chance of conflicts. For instance:

<META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">

If the page contains multiple meta tags of the same type, we will aggregate the content values. For instance, we will interpret

<META NAME="ROBOTS" CONTENT="NOINDEX">
<META NAME="ROBOTS" CONTENT="NOFOLLOW">

the same way as:

<META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">

If content values conflict, we will use the most restrictive. So, if the page has these meta tags:

<META NAME="ROBOTS" CONTENT="NOINDEX">
<META NAME="ROBOTS" CONTENT="INDEX">

We will obey the NOINDEX value.
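To make the aggregation and conflict rules concrete, here is a minimal Python sketch of how a crawler could merge several robots meta tags. It is an illustration, not Googlebot's actual code: the helper name `effective_directives` is hypothetical, and it assumes the only conflicting pairs are INDEX/NOINDEX and FOLLOW/NOFOLLOW.

```python
def effective_directives(content_values: list[str]) -> set[str]:
    """Merge the CONTENT attributes of several robots meta tags.

    Hypothetical helper: values from all tags are pooled, and when a
    permissive value and its restrictive counterpart both appear, the
    restrictive one wins.
    """
    merged: set[str] = set()
    for content in content_values:
        # Each CONTENT attribute is a comma-separated list of values.
        for value in content.split(","):
            value = value.strip().upper()
            if value:
                merged.add(value)
    # Assumed conflicting pairs, written as (permissive, restrictive).
    for permissive, restrictive in (("INDEX", "NOINDEX"), ("FOLLOW", "NOFOLLOW")):
        if restrictive in merged:
            merged.discard(permissive)
    return merged


# Two separate tags are treated like one combined tag.
print(sorted(effective_directives(["NOINDEX", "NOFOLLOW"])))  # ['NOFOLLOW', 'NOINDEX']
# Conflicting tags: the more restrictive NOINDEX is obeyed.
print(sorted(effective_directives(["NOINDEX", "INDEX"])))  # ['NOINDEX']
```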

Unnecessary content values
By default, Googlebot will index a page and follow the links on it, so there's no need to tag pages with content values of INDEX or FOLLOW.

Directing a robots meta tag specifically at Googlebot
To provide instruction for all search engines, set the meta name to "ROBOTS". To provide instruction for only Googlebot, set the meta name to "GOOGLEBOT". If you want to provide different instructions for different search engines (for instance, if you want one search engine to index a page, but not another), it's best to use a specific meta tag for each search engine rather than use a generic robots meta tag combined with a specific one. You can find a list of bots at robotstxt.org.
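A small sketch of the naming rule, assuming a page whose meta tags have already been extracted as (name, content) pairs and a hypothetical second crawler called `otherbot`. It shows why mixing a generic tag with a specific one can be confusing: the generic ROBOTS tag still applies to every crawler, Googlebot included.

```python
def applicable_contents(meta_tags: list[tuple[str, str]], crawler: str) -> list[str]:
    """Return the CONTENT values of the meta tags that apply to `crawler`.

    A tag applies if its name is the generic "robots" or matches the
    crawler's own name; the comparison is case-insensitive.
    """
    wanted = {"robots", crawler.lower()}
    return [content for name, content in meta_tags if name.lower() in wanted]


# "otherbot" is a made-up crawler name used only for illustration.
tags = [("ROBOTS", "NOARCHIVE"), ("GOOGLEBOT", "NOINDEX"), ("otherbot", "NOFOLLOW")]
print(applicable_contents(tags, "Googlebot"))  # ['NOARCHIVE', 'NOINDEX']
print(applicable_contents(tags, "otherbot"))   # ['NOARCHIVE', 'NOFOLLOW']
```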

Casing and spacing
Googlebot understands any combination of lowercase and uppercase. So each of these meta tags is interpreted in exactly the same way:

<meta name="ROBOTS" content="NOODP">
<meta name="robots" content="noodp">
<meta name="Robots" content="NoOdp">

If you have multiple content values, you must place a comma between them, but it doesn't matter if you also include spaces. So the following meta tags are interpreted the same way:

<META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">
<META NAME="ROBOTS" CONTENT="NOINDEX,NOFOLLOW">
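The casing and spacing rules can be illustrated with Python's standard `html.parser` module, which already lowercases tag and attribute names; the attribute values still need normalizing. This is a hedged sketch: the class name `RobotsMetaParser` is our own, and it only looks at tags named ROBOTS or GOOGLEBOT.

```python
from html.parser import HTMLParser

ROBOTS_NAMES = {"robots", "googlebot"}


class RobotsMetaParser(HTMLParser):
    """Collect the values of robots meta tags, normalized to lowercase.

    HTMLParser lowercases tag and attribute names for us, so <META NAME=...>
    and <meta name=...> look identical here; attribute values keep their
    original case, so they are lowercased explicitly.
    """

    def __init__(self) -> None:
        super().__init__()
        self.directives: list[tuple[str, list[str]]] = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = (attributes.get("name") or "").lower()
        if name not in ROBOTS_NAMES:
            return
        content = attributes.get("content") or ""
        # Values are comma-separated; spaces around the commas are optional.
        values = [v.strip().lower() for v in content.split(",") if v.strip()]
        self.directives.append((name, values))


parser = RobotsMetaParser()
parser.feed('<META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">'
            '<meta name="robots" content="noindex,nofollow">'
            '<meta name="Robots" content="NoOdp">')
parser.close()
print(parser.directives)
# [('robots', ['noindex', 'nofollow']), ('robots', ['noindex', 'nofollow']), ('robots', ['noodp'])]
```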

If you use both a robots.txt file and robots meta tags
If the robots.txt and meta tag instructions for a page conflict, Googlebot follows the more restrictive of the two. More specifically:

If you block a page with robots.txt, Googlebot will never crawl the page and will never read any meta tags on the page.
If you allow a page with robots.txt but block it from being indexed using a meta tag, Googlebot will access the page, read the meta tag, and subsequently not index it.
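The interaction between robots.txt and the meta tag can be sketched with Python's standard `urllib.robotparser`. The function `crawl_outcome`, the example robots.txt rule, and the URLs are hypothetical; the point is the order of the checks: robots.txt is consulted first, and the meta tag only matters if the page can be fetched at all.

```python
from urllib.robotparser import RobotFileParser


def crawl_outcome(robots_txt: str, url: str, page_meta: set[str],
                  agent: str = "Googlebot") -> str:
    """Sketch of how robots.txt and a page's robots meta tag interact.

    `page_meta` holds the lowercase robots meta values on the page. It is
    only consulted when robots.txt allows the crawler to fetch the page.
    """
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    if not rules.can_fetch(agent, url):
        # Blocked by robots.txt: the page is never fetched, so its meta tags are never read.
        return "not crawled"
    if "noindex" in page_meta:
        # Fetched and meta tag read, but the page is kept out of the index.
        return "crawled, not indexed"
    return "crawled and indexable"


ROBOTS_TXT = "User-agent: *\nDisallow: /private/\n"
print(crawl_outcome(ROBOTS_TXT, "http://www.example.com/private/a.html", {"noindex"}))
# → not crawled (the NOINDEX tag is never seen)
print(crawl_outcome(ROBOTS_TXT, "http://www.example.com/public/a.html", {"noindex"}))
# → crawled, not indexed
```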

Valid meta robots content values
Googlebot interprets the following robots meta tag values:

NOINDEX - prevents the page from being included in the index.
NOFOLLOW - prevents Googlebot from following any links on the page. (Note that this is different from the link-level NOFOLLOW attribute, which prevents Googlebot from following an individual link.)
NOARCHIVE - prevents a cached copy of this page from being available in the search results.
NOSNIPPET - prevents a description from appearing below the page in the search results, as well as prevents caching of the page.
NOODP - blocks the Open Directory Project description of the page from being used in the description that appears below the page in the search results.
NONE - equivalent to "NOINDEX, NOFOLLOW".
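To illustrate the difference between the page-level NOFOLLOW value and the link-level attribute, here is a minimal sketch using Python's `html.parser`. The class and function names and the sample links are hypothetical, and a real crawler does far more than this.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect each link's href and whether it carries rel="nofollow"."""

    def __init__(self) -> None:
        super().__init__()
        self.links: list[tuple[str, bool]] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        href = attributes.get("href")
        if href:
            # rel can hold several space-separated tokens.
            rel_tokens = (attributes.get("rel") or "").lower().split()
            self.links.append((href, "nofollow" in rel_tokens))


def links_to_follow(html: str, page_nofollow: bool) -> list[str]:
    """Page-level NOFOLLOW drops every link; rel="nofollow" drops just that link."""
    if page_nofollow:
        return []
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    return [href for href, nofollow in collector.links if not nofollow]


page = '<a href="/a.html">A</a> <a href="/b.html" rel="nofollow">B</a>'
print(links_to_follow(page, page_nofollow=False))  # ['/a.html']
print(links_to_follow(page, page_nofollow=True))   # []
```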

A word about content value "NONE"
As defined by robotstxt.org, the following directive means NOINDEX, NOFOLLOW.

<META NAME="ROBOTS" CONTENT="NONE">

However, some webmasters use this tag to indicate no robots restrictions and inadvertently block all search engines from their content.
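One way to avoid that mistake when auditing your own pages is to expand NONE before reasoning about a tag. This small Python sketch (the `SHORTHANDS` table and `expand` function are hypothetical names) shows that NONE produces exactly the same restrictions as NOINDEX, NOFOLLOW, not an absence of them.

```python
# Shorthand values and what they expand to. NONE is the only one covered here.
SHORTHANDS = {"none": ("noindex", "nofollow")}


def expand(values: list[str]) -> set[str]:
    """Normalize robots meta values and replace shorthands with their meaning."""
    expanded: set[str] = set()
    for value in values:
        value = value.strip().lower()
        # Values that are not shorthands pass through unchanged.
        expanded.update(SHORTHANDS.get(value, (value,)))
    return expanded


print(sorted(expand(["NONE"])))  # ['nofollow', 'noindex']
```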

Update: For more information, please see our robots meta tag documentation.

Posted in crawling and indexing

Friday, 2 March 2007

Using the site: command

Posted on 13:49

The site: command enables you to search through a particular site. For instance, a searcher could look for references to [Buffy] in this blog by doing the following search:

site:googlewebmastercentral.blogspot.com buffy

Webmasters sometimes use this command to see a list of indexed pages for a site, like this:

site:www.google.com

Note that with this command, there's no space between the colon and the URL. A search for site:www.site.com returns URLs that begin with www, and a search for site:site.com returns URLs for all subdomains. (So, site:google.com returns URLs such as www.google.com, checkout.google.com, and finance.google.com.) You can do this search from Google, or you can go to your webmaster tools account and use the link under Statistics > Index stats. Note that whether this link includes the www depends on how you have added the site to your account.
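Here is a rough Python model of that scoping behavior, assuming a site: query compares only the host name and ignores paths and other operators. The function name is hypothetical; it simply encodes the rule that a bare domain covers all of its subdomains, while site:www.google.com does not match finance.google.com.

```python
from urllib.parse import urlparse


def in_site_scope(scope: str, url: str) -> bool:
    """Rough model of which URLs a site: query covers.

    A URL matches if its host equals the scope or is a subdomain of it.
    The leading "." check stops "google.com" from matching "notgoogle.com".
    """
    host = (urlparse(url).hostname or "").lower()
    scope = scope.lower()
    return host == scope or host.endswith("." + scope)


print(in_site_scope("google.com", "http://finance.google.com/"))      # True
print(in_site_scope("www.google.com", "http://finance.google.com/"))  # False
```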

Historically, Google has avoided showing pages that appear to be duplicates (e.g., pages with the same title and description) in search results, because our goal is to provide useful results to the searcher. However, with a site: command, searchers are likely looking for a full list of results from that site, so we are changing the site: command to show them. In some cases, a site: search doesn't show a full list of results even when the pages are different, and we are resolving that issue as well.

Note that this is a display issue only and doesn't in any way affect search rankings. If you see this behavior, simply click the "repeat the search with omitted results included" link to see the full list. The pages that initially don't display continue to show up for regular queries. The display issue affects only a site: search with no associated query. In addition, this display issue is unrelated to supplemental results; any pages in supplemental results display "Supplemental Result" beside the URL.

Because this change to show all results for site: queries doesn't affect search rankings at all, it will probably roll out in the normal course of events, the next time we push a new executable for handling the site: command. As a result, it may be several weeks before you start to see this change, but we'll keep monitoring it to make sure the change goes out.

Posted in general tips, search results

Tuesday, 27 February 2007

Traveling Down Under: GWC at Search Engine Room and Search Summit Australia

Posted on 15:53

G'day Webmasters! Google Webmaster Central is excited to be heading to Sydney for Search Summit and Search Engine Room on March 1-2 and 20-21, respectively.

In addition to our coverage of topics in bot obedience and site architecture, we'll also provide a clinic for building Sitemaps, and chances to "chew the fat" with the Aussies in the "Google Breakfast" and "Google Webmaster Central Q&A." Our Search Evangelist, Adam Lasnik, will lead a fun session in "Living the Non 9-5 Life, Tips for Achieving Balance, Sanity...", where mostly, we hope to learn from you.

Search Summit

Thursday, March 1st
Site Architecture, CSS and Tableless Design 14:45 - 15:30
Peeyush Ranjan, Engineering Manager

Friday, March 2nd
Bot Obedience 09:45 - 10:00
Dan Crow, Product Manager, Crawl Systems

Web 2.0 & Search 11:15 - 12:00
Dan Crow, Product Manager, Crawl Systems

Google Linking Clinic 12:00 - 12:45
Adam Lasnik, Search Evangelist

Lunch with Google Webmaster Central 12:45 - 13:30

Sitemap Clinic 13:30 - 14:15
Maile Ohye, Developer Support Engineer

Google Webmaster Central Q&A 14:15 - 15:00

Living the Non 9-5 Life, Tips for Achieving Balance, Sanity... 15:00 - 15:45
Adam Lasnik, Search Evangelist

Search Engine Room

Tuesday, March 20th
Google Breakfast 07:30 - 09:00
Aaron D'Souza, Software Engineer, Search Quality

Don't Be Evil 09:30 - 10:30
Richard Kimber, Managing Director of Sales and Operations

Posted in events

Monday, 26 February 2007

Better badware notifications for webmasters

Posted on 12:33

In the fight against badware, protecting Google users by showing warnings before they visit dangerous sites is only a small piece of the puzzle. It's even more important to help webmasters protect their own users, and we've been working on this with StopBadware.org. A few months ago we took the first step and integrated malware notifications into webmaster tools. I'm pleased to announce that we are now including more detailed information in these notifications, and are also sending them to webmasters via email.

Webmaster tools notifications
Now instead of simply informing webmasters that their sites have been flagged and suggesting next steps, we're also showing example URLs that we've determined to be dangerous. This can be helpful when the malicious content is hard to find. For example, a common occurrence with compromised sites is the insertion of a 1-pixel iframe causing the automatic download of badware from another site. By providing example URLs, webmasters are one step closer to diagnosing the problem and ultimately re-securing their sites.
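If you are hunting for an injected iframe like this, a crude first pass is to scan your pages for iframes sized 0 or 1 pixel. The Python sketch below is a heuristic of our own, not the check Google uses: it reads only the width and height attributes, so iframes hidden with CSS or inserted by scripts will slip past it. The class name and sample URLs are hypothetical.

```python
from html.parser import HTMLParser


class TinyIframeFinder(HTMLParser):
    """Flag iframes whose width or height attribute is 0 or 1 pixel."""

    def __init__(self) -> None:
        super().__init__()
        self.suspicious: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "iframe":
            return
        attributes = dict(attrs)
        for size in (attributes.get("width"), attributes.get("height")):
            # Accept both "1" and "1px"; CSS sizing is not examined.
            if size is not None and size.strip().lower().removesuffix("px") in ("0", "1"):
                self.suspicious.append(attributes.get("src") or "(no src)")
                break


finder = TinyIframeFinder()
finder.feed('<p>Welcome</p>'
            '<iframe src="http://malware.example.net/x" width="1" height="1"></iframe>'
            '<iframe src="/video.html" width="640" height="360"></iframe>')
finder.close()
print(finder.suspicious)  # ['http://malware.example.net/x']
```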

Email notifications
In addition to notifying webmaster tools users, we've also begun sending email notifications to some of the webmasters of sites that we flag for badware. We don't have a perfect process for determining a webmaster's email address, so for now we're sending the notifications to likely webmaster aliases for the domain in question (e.g., webmaster@, admin@, etc.). We considered using whois records, but these often contain contact information for the hosting provider or registrar, and you can guess what might happen if a web host learned that one of its client sites was distributing badware. We're planning to allow webmasters to provide a preferred email address for notifications through webmaster tools, so look for this change in the future.

Update: For more information, please see our Help Center article on malware and hacked sites.

Posted in feedback and communication, webmaster tools

Tuesday, 20 February 2007

Tips on using feeds and information on subscriber counts in Reader

Posted on 12:18

Does your site have a feed? A feed can connect you to your readers and keep them returning to your content. Most blogs have feeds, but increasingly, other types of sites with frequently changing content are making feeds available as well. Some examples of sites that offer feeds:

News sources such as the New York Times publish feeds of their latest stories
Companies like Apple publish feeds of their press releases, as well as a few other feeds.
Blogs including the Official Google Blog publish feeds with their latest posts
Shopping sites like Buy.com publish feeds with noteworthy deals

Find out how many readers are subscribed to your feed
If your site has a feed, you can now get information about the number of Google Reader and Google Personalized Homepage subscribers. If you use Feedburner, you'll start to see numbers from these subscriptions taken into account. You can also find this number in the crawling data in your logs. We crawl feeds with the user-agent Feedfetcher-Google, so simply look for this user-agent in your logs to find the subscriber number. If multiple URLs point to the same feed, we may crawl each separately, so in this case, just count up the subscriber numbers listed for each unique feed-id. An example of what you might see in your logs is below:

User-Agent: Feedfetcher-Google; (+http://www.google.com/feedfetcher.html; 4 subscribers; feed-id=1794595805790851116)
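If you'd rather tally this yourself, here is a Python sketch that pulls the subscriber count and feed-id out of Feedfetcher-Google user-agent strings and counts each unique feed-id once. The regular expression is written against the example line above and may need adjusting for your log format; keeping the most recent count per feed-id is our assumption, and the second feed-id in the sample is made up.

```python
import re

# Matches the subscriber count and feed-id in a Feedfetcher-Google user-agent string.
FEEDFETCHER = re.compile(r"Feedfetcher-Google.*?(\d+) subscribers?; feed-id=(\d+)")


def total_subscribers(user_agents: list[str]) -> int:
    """Keep the latest count seen for each unique feed-id, then sum them."""
    latest: dict[str, int] = {}
    for ua in user_agents:
        match = FEEDFETCHER.search(ua)
        if match:
            count, feed_id = match.groups()
            latest[feed_id] = int(count)
    return sum(latest.values())


LOG_LINES = [
    "Feedfetcher-Google; (+http://www.google.com/feedfetcher.html; 4 subscribers; feed-id=1794595805790851116)",
    # The same feed-id crawled again: counted once, not twice.
    "Feedfetcher-Google; (+http://www.google.com/feedfetcher.html; 4 subscribers; feed-id=1794595805790851116)",
    # Hypothetical second feed-id, e.g. the same feed reached through another URL.
    "Feedfetcher-Google; (+http://www.google.com/feedfetcher.html; 2 subscribers; feed-id=1234567890)",
]
print(total_subscribers(LOG_LINES))  # 6
```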

Making your feed available to Google
You can submit your feed as a Sitemap in webmaster tools. This will let us know about the URLs listed in the feed so we can crawl and index them for web search. In addition, if you want to make sure your feed shows up in the list of available feeds for Google products, simply add a <link> tag with the feed URL to the <head> section of your page. For instance:

<link rel="alternate" type="application/atom+xml" title="Your Feed Title" href="http://www.example.com/atom.xml" />
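To check that your pages advertise a feed this way, you can look for link tags with rel="alternate" and a feed MIME type. This Python sketch is illustrative only: the class name is hypothetical, and accepting application/rss+xml alongside the Atom type shown above is an assumption on our part.

```python
from html.parser import HTMLParser

FEED_TYPES = {"application/atom+xml", "application/rss+xml"}


class FeedLinkFinder(HTMLParser):
    """Collect feed URLs advertised with <link rel="alternate"> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.feeds: list[str] = []

    def handle_starttag(self, tag, attrs):
        # Self-closing <link ... /> tags also arrive here via handle_startendtag.
        if tag != "link":
            return
        attributes = dict(attrs)
        rel = (attributes.get("rel") or "").lower().split()
        feed_type = (attributes.get("type") or "").lower()
        if "alternate" in rel and feed_type in FEED_TYPES and attributes.get("href"):
            self.feeds.append(attributes["href"])


feed_finder = FeedLinkFinder()
feed_finder.feed('<head><link rel="alternate" type="application/atom+xml" '
                 'title="Your Feed Title" href="http://www.example.com/atom.xml" /></head>')
feed_finder.close()
print(feed_finder.feeds)  # ['http://www.example.com/atom.xml']
```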

Remember that Feedfetcher-Google retrieves feeds only for use in Google Reader and Personalized Homepage. For the content to appear in web search results, Googlebot will have to crawl it as well.

Don't yet have a feed?
If you use a content management system or blogging platform, feed functionality may be built right in. For instance, if you use Blogger, you can go to Settings > Site Feed and make sure that Publish Site Feed is set to Yes. You can also set the feed to either full or short and add a footer. The URL listed here is what subscribers add to their feed readers. A link to this URL will appear on your blog.

More tips from the Google Reader team
In order to provide the best experience for your users, the Google Reader team has also put together some tips for feed publishers. This document covers feed best practices, common implementation pitfalls, and various ways to promote your feeds. Whether you're creating your feeds from scratch or have been publishing them for a long time, we encourage you to take a look at our tips to make the most of your feeds. If you have any questions, please get in touch.

Posted in products and services, sitemaps

Wednesday, 14 February 2007

Our Valentine's day gift: out of beta and adding comments

Posted on 01:48

Here at webmaster central, we love the webmaster community -- and today, Valentine's Day, we want to show you that our commitment to you is stronger than ever. We're taking webmaster tools out of beta and enabling comments on this blog.

Bye, bye beta
We've come a long way since our initial launch of the Sitemaps protocol in June 2005. Since then, we've expanded to a full set of webmaster tools, changed our name, listened to your input, and expanded even more. 2006 was a year of great progress, and we're just getting started. Coming out of beta means that we're committed to partnering with webmasters around the world to provide all the tools and information you need about your sites in our index. Together, we can provide the most relevant and useful search results. And more than a million of you, speaking at least 18 different languages, have joined in that partnership.

In addition to the many new features that we've provided, we've been making lots of improvements behind the scenes to ensure that webmaster tools are reliable, scalable, and secure.

The Sitemaps protocol has evolved into version 0.9, and Microsoft and Yahoo have joined us in supporting it, giving you a common standard that makes it easier to communicate with search engines. We're excited about how much we've been able to learn about your sites, and we plan to keep developing the best ways for you to provide us with information about individual pages on your sites.

Hello, comments
Our goal is improved communication with webmasters, and while our blog, discussion forum, and tools help us reach that goal, you can now post comments and feedback directly on this blog as well. This lets you talk with us about the topics we post on. We want to do all we can to encourage an open dialogue between Google and the webmaster community; this is another avenue to do that.

As always, if you have questions or want to talk about things other than a particular blog post, head over to our discussion forum. You'll find our team there often, answering questions and gathering feedback. And if you haven't already, check out the "links to this post" link under every post to see other discussions of this blog across the web.

Thank you, webmasters, for joining us in this great collaboration. Happy Valentine's Day.

Posted in feedback and communication