Google Bots Learning to Search in New Ways
Posted by David Goldstein, Wednesday, 16 April 2008
Those who optimise their websites so that Google's search bots rank them higher in search results will be interested to know that the bots have begun a new, more active phase of indexing that reaches deeper into websites.

In a blog post last week, Jayant Madhavan and Alon Halevy said on the Google Webmaster Central Blog that over the past few months Google has been exploring some HTML forms to try to discover new web pages and URLs it otherwise couldn't find and index for users who search on Google. Specifically, when Google's crawlers encounter a <FORM> element on a high-quality site, they might choose to do a small number of queries using the form. For text boxes, the crawlers automatically choose words from the site that has the form; for select menus, check boxes, and radio buttons, they choose from among the values in the HTML. Having chosen the values for each input, they generate and then try to crawl URLs that correspond to a possible query a user may have made. If Google ascertains that the web page resulting from a query is valid, interesting, and includes content not already in its index, it may include the page in the index much as it would include any other web page.
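The value-selection and URL-generation steps described above can be illustrated with a rough Python sketch. This is not Google's implementation: the `candidate_urls` function, the dictionary layout used to describe a form, the `max_queries` cap, and the example.com addresses are all hypothetical, and the sketch assumes the form submits with GET so that each query maps to a URL.

```python
from itertools import islice, product
from urllib.parse import urlencode, urljoin


def candidate_urls(page_url, form, site_words, max_queries=5):
    """Build a small number of GET URLs that a user of the form could have produced.

    `form` is a hypothetical description of a parsed <FORM> element:
    {"action": str, "fields": [{"name": str, "type": str, "options": [...]}]}
    """
    choices = []
    for field in form["fields"]:
        if field["type"] == "text":
            # Text boxes: candidate words come from the site that hosts the form.
            values = site_words
        else:
            # Select menus, check boxes, radio buttons: values listed in the HTML.
            values = field["options"]
        choices.append([(field["name"], value) for value in values])

    # Resolve the form's action against the page it was found on.
    action = urljoin(page_url, form["action"])
    # Keep only the first few combinations so the number of queries stays small.
    combos = islice(product(*choices), max_queries)
    return [f"{action}?{urlencode(combo)}" for combo in combos]


form = {
    "action": "/search",
    "fields": [
        {"name": "q", "type": "text"},
        {"name": "category", "type": "select", "options": ["books", "music"]},
    ],
}
urls = candidate_urls("http://example.com/catalog/", form, ["jazz", "history"], max_queries=3)
for url in urls:
    print(url)
```

A real crawler would then fetch each URL and decide whether the resulting page is valid and adds content not already indexed; that judgement is outside the scope of this sketch.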

The article later says, "This experiment is part of Google's broader effort to increase its coverage of the web. In fact, HTML forms have long been thought to be the gateway to large volumes of data beyond the normal scope of search engines. The terms Deep Web, Hidden Web, or Invisible Web have been used collectively to refer to such content that has so far been invisible to search engine users. By crawling using HTML forms (and abiding by robots.txt), we are able to lead search engine users to documents that would otherwise not be easily found in search engines, and provide webmasters and users alike with a better and more comprehensive search experience."
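For webmasters wondering what "abiding by robots.txt" means for form-generated URLs, the following Python sketch filters candidate URLs against a site's robots.txt rules using the standard library's `urllib.robotparser`. The robots.txt content, the `ExampleBot` user agent, the `allowed_urls` helper, and the example.com URLs are assumptions made for illustration; the post does not describe how Google performs this check.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; a real crawler would fetch it from the site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /search
"""


def allowed_urls(urls, robots_txt, user_agent="ExampleBot"):
    """Return only the URLs that the given robots.txt permits for this user agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if parser.can_fetch(user_agent, url)]


candidates = [
    "http://example.com/search?q=jazz",
    "http://example.com/catalog/list?genre=jazz",
]
permitted = allowed_urls(candidates, ROBOTS_TXT)
print(permitted)
```

Under these assumed rules, a site that disallows its search results path keeps form-generated queries against that path out of the crawl, while other pages remain eligible.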

To read the full post, "Crawling through HTML forms", see http://googlewebmastercentral.blogspot.com/2008/04/crawling-through-html-forms.html