Google: Changes in Google Ranking Strategies

THE IMPORTANCE OF LINKAGE

Who gets linked to, therefore, is not determined by quality, but by visibility, and visibility is no measure of quality. Good content can only overcome its competition by achieving high visibility, but if it has no visibility to begin with, it will accrue less visibility than any site that is already visible.

On the other hand, if Google succeeds in rebalancing their search results by favoring FIXED CONTENT for inactive topics, by favoring government (.GOV, .MIL) and educational (.EDU) sites over commercial (.COM) sites, and by customizing data sets to satisfy classes of queries, they will have bought themselves some breathing space in the continual struggle with search engine optimizers. Google will have gained an advantage for two reasons.

First, the search engine optimizers have, over the course of about two years, worked themselves into a link-building frenzy. Most search optimization forums now routinely respond to requests for help by advising people to build more links, without examining on-page content, search term competitiveness, or off-page factors other than links.

Secondly, Google has finally succeeded in composing (if not yet in presenting) a multi-vectored definition of relevance. An optimizer has, at this point in time, no way of knowing which vector is being applied to a targeted search term. In fact, the multiplicity of vectors affords Google an opportunity to experiment with random results. If they devise several methodologies for ascertaining relevance, each with approximately the same probability of success as the others, they can cycle through the methodologies in satisfying queries.

Furthermore, if Google is in fact tracking queries by IP address and user-agent, the optimizers will have to adopt spoofing tactics, and those tactics can easily be predicted. That is, any class of queries which is subject to manipulation will distort itself in response to manipulation.
There is already a historical database in place. As soon as Google detects a shift in vectors, their alarms should start sounding (but if they are not anticipating such a response, a window of opportunity exists for optimizers to skew the data sets before Google can adapt to the new strategies).

About the only option an optimizer will have is to set up a bank of servers which issue randomly mutated queries to multiple data centers, cycling through several thousand IP addresses. Google's tendency to look at IP addresses, however, suggests that even C-block differentiation may not be adequate. Optimizers who want to spoof query relevance will have to position servers around the world. On today's Web, that is not very difficult. Existing investments in server pools already being used to simulate click activity may have to be phased out, although click-fakers have probably already scattered their servers to the four winds anyway.

An alternative would be to implement classes of anchor text expressions. Search optimizers have already begun discussing the viability of the strategy, but their priorities are misplaced. Whereas many speculate that "site-wide links" (i.e., a text link replicated across an entire site) are being filtered, penalized, or somehow identified, the truth may be that the links themselves have been reclassed by Google. It may only be necessary to reanchor a few sub-sets of links in order to affect a specific document's relevance.

Linkage from TRUSTED CONTENT SITES will continue to be important. In fact, Trusted Content may be used to determine pre-query ranking results for some classes of queries. Trusted Content will, by definition, have passed a significant threshold in the tests. It may be that WEAK RELEVANCE will be applied to data sets which have a large proportion of TRUSTED CONTENT SITES. In that case, optimizers will be struggling to propel their client sites into Trusted Site Status.
RELEVANCE strength can easily be influenced by the site-specific damping factors used in Classic PageRank. A TRUSTED CONTENT SITE will have a more favorable damping factor. All other sites will have less favorable damping factors. The damping factor may thus be the fulcrum with which Google moves the universe. It is the most flexible and resilient part of the PageRank algorithm. If a mechanism can be devised for adjusting the damping factor, then a document's PageRank becomes more useful.

In my followup to "On the Googleness of Being", I wrote:

"So, outbound links are now more important than inbound links. Inbound links tell Google that a Web site is popular -- someone is linking to it, therefore, it must be a relatively legitimate site. But outbound links tell Google what is important to the page. If the outbound links don't seem to be relevant to the page's content, there is a disconnect. I would say that Google applies a sliding scale to the relevance of outbound links, or some sort of weighting system."

Google can assess a document's purpose as much by what the document links to as by what links to the document. A newer page, having few if any inbound links, should be able to influence the determination of its relevance by what it links to. That is, the page makes an initial footprint in its history of association with other pages by saying, "These are the documents I want to be associated with".

The "sliding scale" could, in fact, be an indication of the impact of acquired inbound links. That is, at the start of a document's life, its relevance is determined in part by the documents it links to. But as it accumulates inbound links, its relevance will be determined more and more by those inbound links (though never completely so -- and on-page factors will continue to be important).
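The sliding scale described above can be illustrated with a small sketch. This is not Google's formula, which is unknown. The function names, the `half_point` and `onpage_share` parameters, and the saturating curve are all assumptions, chosen only to show how weight could shift from outbound links toward inbound links as a document accumulates them, without the outbound weight ever reaching zero.

```python
def link_weights(inbound_count, half_point=50, onpage_share=0.3):
    """Return (onpage, outbound, inbound) weights that sum to 1.

    Hypothetical model: on-page factors keep a fixed share, and the
    remaining share slides from outbound to inbound links as the
    inbound link count grows. half_point is the assumed number of
    inbound links at which the two link weights are equal.
    """
    link_share = 1.0 - onpage_share
    # Saturating fraction in [0, 1): approaches but never reaches 1,
    # so outbound links are never ignored completely.
    inbound_fraction = inbound_count / (inbound_count + half_point)
    inbound = link_share * inbound_fraction
    outbound = link_share - inbound
    return onpage_share, outbound, inbound


def relevance(onpage_score, outbound_score, inbound_score, inbound_count):
    """Blend three component scores (each assumed to lie in 0..1)."""
    w_on, w_out, w_in = link_weights(inbound_count)
    return w_on * onpage_score + w_out * outbound_score + w_in * inbound_score


if __name__ == "__main__":
    for count in (0, 10, 50, 500):
        print(count, link_weights(count))
```

Under these assumed defaults, a page with no inbound links draws all of its link-based relevance from what it links to, while a page with 50 inbound links splits that share evenly between outbound and inbound links.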
A disconnect between the document's outbound links and its inbound links, or a disconnect between the outbound links and the on-page content, could weaken the document's relevance score. A disconnect between inbound links and on-page content has always weakened relevance scores, but the practice of link bombing offsets that imbalance. In effect, link anchor text can cumulatively overwhelm on-page text for a document. The link anchor text would start out at a deficit because it does not agree with the on-page content, but as more links are acquired with similar anchor text, the deficit shifts to the on-page content. A self-sustaining process kicks in whereby off-page factors propel the document to the top of selected search results.

NEW STRATEGIES

NATURAL RELEVANCE should be equally determined by off-page factors (inbound link anchor text and directory descriptions), on-page factors (titles, headers, body text, outbound link anchor text), and historical performance (queries for which the document was served as a valid result). ARTIFICIAL RELEVANCE is driven by off-page factors (inbound link anchor text) with little regard for on-page factors or historical performance.

ARTIFICIAL RELEVANCE can, in some cases, be shaped to look more natural by adjusting on-page content to agree with off-page content. It may subsequently be adjusted through carefully selected queries which generate little interest but are semantically close enough to natural queries to be included in Google's natural query classes. If the natural query classes are weighted by performance within a timeframe (the most logical approach), they will be most vulnerable to consistent artificial querying, rather than to sudden spikes.

End notes.

1. Google operates several dozen data centers around the world. Each maintains its own index, data set, and crawling/indexing software.

2. Google divides its index and data tables into sub-files called "shards".
The shards are replicated across multiple servers within a data center to create redundancy as a means of compensating for anticipated hardware failure. Important sites with high Calculated PageRank are replicated across more servers.

3. While search engine optimization strategies vary, the basic technique employed on Xenite.Org (with some variations and exceptions) is to select a specific keyword phrase for use in the title tag, in 1 or 2 visible H1 or large-font header tags, and in occasional repetition throughout the body text. The keyword phrase is usually applied as anchor text for on-site navigational bars and in the site map.

4. There is no universally recognized method for measuring the competitiveness of a targeted search expression. The methodology used for evaluating data in this paper grades competitiveness on the basis of the number of results returned for an EXACT FIND query (a query string with quotation marks surrounding it), the technical design of the top 10 ranking pages (denoted as "optimized" or "not optimized"), the age of the sites in the top 10 results (denoted by "less than 1 year old" or "1 year old or older"), and the use of the targeted phrase in the title tags of the sites in the top 10 results.

The initial determination is based on the number of listings reported by Google for the EXACT FIND query, graded on a scale of 1 to 10, with 10 indicating the highest degree of competitiveness; i.e., < 10,000 = 1; < 50,000 = 2; < 100,000 = 3; < 250,000 = 4; < 500,000 = 5; < 1,000,000 = 6; < 5,000,000 = 7; < 10,000,000 = 8; < 20,000,000 = 9; >= 20,000,000 = 10.

The initial grade is then modified by the sum of the values assigned for the other determining criteria. For example, if all 10 of the top 10 listings are deemed "optimized" (by examination of page design and/or number of inbound links, where any site with 100 or more inbound links per year of age is considered "optimized"), then 10 points are added to the initial grade.
If all 10 top listings are more than 1 year old, another 10 points are added to the grade. If all 10 top listings employ the phrase in their title tags, another 10 points are added to the grade. Multiple listings from any site are only counted once. A "site" is defined as a collection of pages intentionally designed to be clustered together as a single content entity. Some domains, such as Geocities.com, may be host to thousands of sites. Some domains, such as Google.com, may be a single cohesive site.

The maximum competitiveness grade possible for any search expression is therefore 40. The EXACT FIND match for "search engine optimization" would equate to 8 (approx. 7,000,000 listings) + 9 (all top 10 are optimized, but there is one secondary page) + 9 (all top 10 are more than 1 year old, but there is one secondary page) + 8 (title tags which include the exact phrase "search engine optimization"). The competitiveness grade for this expression is thus 34.

The competitive grading scale is denoted as: 31 - 40 = Hyperoptimized; 21 - 30 = Moderately Optimized; 11 - 20 = Optimized; 1 - 10 = Not Optimized.

The more competitively optimized a targeted expression is, the more difficult it becomes for new content to break into the top 10 listings and the more likely that a significant number of inbound links will be required to enter the top 10 listings for any new site.
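The grading arithmetic in end note 4 can be sketched as follows. The function and variable names are hypothetical, and the points for each of the three secondary criteria (0 to 10) are taken as inputs, since the note does not fully specify how partial values such as 9 or 8 are derived.

```python
from bisect import bisect_right

# Exclusive upper bounds for initial grades 1 through 9, as listed in
# end note 4. A count of 20,000,000 or more receives grade 10.
THRESHOLDS = [
    10_000, 50_000, 100_000, 250_000, 500_000,
    1_000_000, 5_000_000, 10_000_000, 20_000_000,
]

# Lower bound of each band on the competitive grading scale.
SCALE = [
    (31, "Hyperoptimized"),
    (21, "Moderately Optimized"),
    (11, "Optimized"),
    (1, "Not Optimized"),
]


def initial_grade(listing_count):
    """Map an EXACT FIND listing count to the 1-10 initial grade."""
    # bisect_right counts how many thresholds are <= listing_count.
    return bisect_right(THRESHOLDS, listing_count) + 1


def competitiveness_grade(listing_count, optimized_pts, age_pts, title_pts):
    """Add the three criterion scores (each 0-10) to the initial grade."""
    for pts in (optimized_pts, age_pts, title_pts):
        if not 0 <= pts <= 10:
            raise ValueError("criterion points must be between 0 and 10")
    return initial_grade(listing_count) + optimized_pts + age_pts + title_pts


def classify(grade):
    """Return the scale label for a total grade between 1 and 40."""
    for lower_bound, label in SCALE:
        if grade >= lower_bound:
            return label
    raise ValueError("grade must be at least 1")


if __name__ == "__main__":
    # The worked example from the note: "search engine optimization".
    grade = competitiveness_grade(7_000_000, 9, 9, 8)
    print(grade, classify(grade))
```

Run as a script, this reproduces the worked example in the note: an initial grade of 8 plus 9, 9, and 8 gives 34, which falls in the Hyperoptimized band.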
The page was created by Michael Martinez. This page is Copyright © 2005-2006 Michael L. Martinez. All Rights Reserved. No portions of this document may be reproduced electronically or otherwise without express written permission, except as occurs through normal browser caching or search engine indexing. Original document copyrights remain those of their respective owners.