Thursday, April 14, 2011

Search Engine Ideology


   To understand the SEO process, you must know about the architecture of search engines. A search engine contains the following components:

   Spider. This program downloads web pages just like a web browser. The difference is that a browser displays the information presented on each page (text, graphics, etc.) while a spider does not have any visual components and works directly with the underlying HTML code of the page.
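   As an illustration, here is a minimal Python sketch of the spider's job: fetch a URL and return the raw HTML as text, without rendering anything. The function name `download` and the User-Agent string `ExampleSpider/0.1` are hypothetical. A real spider would also honor robots.txt, retry failures and throttle its requests, all of which this sketch skips.

```python
from urllib.request import Request, urlopen

def download(url, user_agent="ExampleSpider/0.1", timeout=10.0):
    """Fetch a page and return its raw HTML as a string (no rendering).

    `user_agent` is a hypothetical identifier for this sketch.
    """
    req = Request(url, headers={"User-Agent": user_agent})
    with urlopen(req, timeout=timeout) as resp:
        # Use the charset announced in the Content-Type header, else assume UTF-8.
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")
```

Because it relies only on the standard library's `urllib.request`, it also accepts `data:` URLs, which makes it easy to try offline.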

   Crawler. This program finds all links on each page. Its task is to determine where the spider should go either by evaluating the links or according to a predefined list of addresses. The crawler follows these links and tries to find documents not already known to the search engine.
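   A minimal sketch of the crawler step, assuming pages arrive as HTML strings: pull every `<a href>` out of a page, resolve it against the page's own URL, and keep only links the engine has not seen before. The names `LinkExtractor` and `new_links` and the in-memory `known` set are hypothetical stand-ins for a real URL store.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin

class LinkExtractor(HTMLParser):
    """Collects absolute link targets from <a href="..."> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links and drop "#fragment" parts.
                absolute, _fragment = urldefrag(urljoin(self.base_url, value))
                self.links.append(absolute)

def new_links(html, base_url, known):
    """Return HTTP(S) links not yet in `known`, adding them to it."""
    parser = LinkExtractor(base_url)
    parser.feed(html)
    parser.close()
    fresh = []
    for link in parser.links:
        if link.startswith(("http://", "https://")) and link not in known:
            known.add(link)
            fresh.append(link)
    return fresh
```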

   Indexer. This component parses each page and analyzes the various elements, such as text, headers, structural or stylistic features, special HTML tags, etc.
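   The indexer's analysis can be sketched with Python's built-in `html.parser`. The hypothetical `analyze` function below counts words separately for the title, the headings and the body text, skips `<script>` and `<style>` content, and records the meta description tag. Which fields to track is an illustrative assumption, not a description of any particular engine.

```python
import re
from collections import Counter
from html.parser import HTMLParser

# Which tags feed which field (an illustrative choice, not any engine's schema).
FIELD_TAGS = {"title": "title", "h1": "heading", "h2": "heading", "h3": "heading"}
SKIP_TAGS = {"script", "style"}  # content that is not visible text

def tokenize(text):
    """Split text into lower-case alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

class PageAnalyzer(HTMLParser):
    def __init__(self):
        super().__init__()
        self.fields = {"title": Counter(), "heading": Counter(), "body": Counter()}
        self.meta_description = None
        self._open_fields = []  # stack of special fields we are currently inside
        self._skip_depth = 0    # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag == "meta":
            a = dict(attrs)
            if (a.get("name") or "").lower() == "description":
                self.meta_description = a.get("content")
        elif tag in SKIP_TAGS:
            self._skip_depth += 1
        elif tag in FIELD_TAGS:
            self._open_fields.append(FIELD_TAGS[tag])

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self._skip_depth:
            self._skip_depth -= 1
        elif tag in FIELD_TAGS and self._open_fields:
            self._open_fields.pop()

    def handle_data(self, data):
        if self._skip_depth:
            return
        field = self._open_fields[-1] if self._open_fields else "body"
        self.fields[field].update(tokenize(data))

def analyze(html):
    """Parse one page and return the analyzer holding per-field word counts."""
    analyzer = PageAnalyzer()
    analyzer.feed(html)
    analyzer.close()
    return analyzer
```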

   Database. This is the storage area for the data that the search engine downloads and analyzes. Sometimes it is called the index of the search engine.
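   Such an index is commonly organized as an inverted index: for each word, the pages that contain it and how often. The in-memory class below is a minimal sketch of that idea; the class and method names are hypothetical, and a production index would be persistent and far more compact.

```python
from collections import Counter

class InvertedIndex:
    """Minimal in-memory inverted index: word -> {url: occurrences}."""

    def __init__(self):
        self.postings = {}  # word -> {url: count}
        self.docs = {}      # url -> Counter of that page's words

    def add(self, url, terms):
        """Store (or re-store, after a re-crawl) the analyzed words of one page."""
        self.remove(url)
        counts = Counter(terms)
        self.docs[url] = counts
        for term, count in counts.items():
            self.postings.setdefault(term, {})[url] = count

    def remove(self, url):
        """Drop a page and any words that no longer appear anywhere."""
        for term in self.docs.pop(url, {}):
            del self.postings[term][url]
            if not self.postings[term]:
                del self.postings[term]

    def lookup(self, term):
        """Return a copy of {url: count} for one word (empty if unknown)."""
        return dict(self.postings.get(term, {}))

    @property
    def num_docs(self):
        return len(self.docs)
```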

   Results Engine. The results engine ranks pages. It determines which pages best match a user's query and in what order the pages should be listed. This is done according to the ranking algorithms of the search engine.
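   Search engines do not publish their exact ranking algorithms, so the sketch below uses classic TF-IDF scoring as a stand-in: a page scores higher when it contains the query words often, and rare words count for more than common ones. The `rank` function and the shape of `postings` (word to {url: count}) are assumptions for illustration; real engines combine many additional signals.

```python
import math

def rank(query_terms, postings, num_docs):
    """Score pages by summed TF-IDF over the query words, best first.

    postings: hypothetical mapping word -> {url: occurrences of word in url}.
    """
    scores = {}
    for term in query_terms:
        docs = postings.get(term, {})
        if not docs:
            continue  # word appears nowhere; contributes nothing
        idf = math.log(num_docs / len(docs))  # rarer words weigh more
        for url, tf in docs.items():
            scores[url] = scores.get(url, 0.0) + tf * idf
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))
```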

   Web server. The search engine's web server usually serves an HTML page with an input field where the user can type the search query he or she is interested in. The web server is also responsible for displaying the search results to the user in the form of an HTML page.
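   A bare-bones version of that front end can be written with Python's standard `http.server` module. The `search` function here is a hypothetical placeholder for the results engine, and `render_page`, `SearchHandler` and `serve` are illustrative names. User input is HTML-escaped before it is echoed back into the page.

```python
import html
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs, urlparse

def search(query):
    """Hypothetical placeholder for the results engine; returns ranked URLs."""
    return []

def render_page(query, results):
    """Build the search form plus an ordered list of result links."""
    items = "".join(
        f'<li><a href="{html.escape(url)}">{html.escape(url)}</a></li>'
        for url in results
    )
    return (
        "<html><body>"
        '<form action="/search" method="get">'
        f'<input type="text" name="q" value="{html.escape(query)}">'
        '<input type="submit" value="Search">'
        "</form>"
        f"<ol>{items}</ol>"
        "</body></html>"
    )

class SearchHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Read the query from the "q" parameter, e.g. /search?q=widgets
        params = parse_qs(urlparse(self.path).query)
        query = params.get("q", [""])[0]
        body = render_page(query, search(query) if query else []).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

def serve(host="127.0.0.1", port=8000):
    """Run the front end until interrupted."""
    HTTPServer((host, port), SearchHandler).serve_forever()
```

Calling `serve()` starts it on port 8000 of the local machine; opening `/search?q=widgets` in a browser shows the form with the query filled in.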

   The spider, crawler and indexer may be implemented together as a single program that downloads web pages, analyzes them and then follows their links to find new resources.
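   One way such a combined program could look is a breadth-first loop that downloads a page, records it, and queues any new links it contains. This `crawl` function is a hypothetical sketch: the download step is passed in as `fetch` (any function that returns HTML, or `None` on failure), and storing the raw HTML stands in for real indexing.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin

class _Hrefs(HTMLParser):
    """Collects raw href values from <a> tags."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.hrefs.extend(v for k, v in attrs if k == "href" and v)

def crawl(seed, fetch, max_pages=100):
    """Breadth-first crawl from `seed`; returns {url: html} in visit order."""
    frontier = deque([seed])
    seen = {seed}
    pages = {}
    while frontier and len(pages) < max_pages:
        url = frontier.popleft()
        html = fetch(url)                # spider: download the page
        if html is None:
            continue
        pages[url] = html                # indexer stand-in: just store it
        parser = _Hrefs()                # crawler: find links on the page
        parser.feed(html)
        parser.close()
        for href in parser.hrefs:
            link, _fragment = urldefrag(urljoin(url, href))
            if link.startswith(("http://", "https://")) and link not in seen:
                seen.add(link)
                frontier.append(link)
    return pages
```

Passing `fetch` in as a parameter keeps the loop testable offline with a dictionary of fake pages, while a real run could plug in an HTTP download function.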
