How And Where Search Engines See Duplicate Content
By: Danny Wirken
Introduction
Search engines have become the gateway to information on the Internet. They are so important that websites need to rank well in search engine results pages (SERPs) in order to get noticed. With numerous websites vying for the coveted top 30 positions in SERPs, more and more website owners are using search engine optimization (SEO) techniques to improve their rankings. People who use SEO know that certain factors can affect a ranking positively and others negatively. Of the negative factors, one of the most well known is duplicate content.
Search engines are biased against duplicate content. As a matter of fact, some sites do not get listed in SERPs because of this factor. This happens when crawlers do not index sites that they have previously determined to be duplicates of another site. The crawlers skip the duplicate site to be more efficient and save time. Crawlers also do this for another reason: to avoid listing duplicate pages in SERPs and thus pointing users to different sites containing the same information. Search engines do not like that to happen because it would irritate users who expect to see different content behind the different links they click. For similar sites, search engines also usually list just one of the sites and relegate the others under a link that says "See related pages". For those that do manage to be listed in the SERPs, the page rank is still usually affected, which in turn affects the site's standing.
Where Search Engines See Duplicate Content
So where do crawlers see this duplicate content, and what kinds of content would they interpret as duplicate? According to William Slawski's article "Duplicate Content Issues and Search Engines", search engines see duplicate content in the following kinds of web pages:
1. Product descriptions from manufacturers, publishers, and producers reproduced by a number of different distributors in large ecommerce sites.
2. Alternative print pages – This happens when user-friendly website owners offer copies of the same document in different formats for varied printing options. Although helpful to users, these copies might actually be indexed by crawlers as duplicate pages.
3. Pages that reproduce syndicated RSS feeds through a server side script.
4. Canonicalization issues, where a search engine may see the same page as different pages with different URLs.
5. Pages that serve session IDs to search engines, so that they try to crawl and index the same page under different URLs.
6. Pages that serve multiple data variables through URLs, so that they crawl and index the same page under different URLs.
7. Pages that share too many common elements, or where those elements are very similar from one page to another, including titles, meta descriptions, headings, navigation, and text that is shared globally – This is common for company websites that insist on putting their logo, description, etc. on every page.
8. Copyright infringement – Plagiarism is of course a good reason for not being indexed. The problem is that crawlers cannot distinguish the original from the duplicate and might mistakenly filter out the original instead.
9. Use of the same or very similar pages on different subdomains or different country top level domains (TLDs).
10. Article syndication – Some writers allow their articles to be published on other websites as long as they are given credit for their work. The problem arises when the crawler sees the original article as the duplicate and opts to index the duplicate page or at least give it a higher rating.
11. Mirrored sites – Mirrored sites are used to handle the traffic of a very popular site. Mirror sites have a good chance of being ignored by web crawlers and so won’t be indexed.
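Items 4 to 6 above all come down to one page being reachable under several URLs. As a rough illustration only, the following Python sketch collapses such variants into a single form before comparison. The parameter names in `IGNORED_PARAMS` and the normalization rules (dropping `www.`, treating `index.html` as the directory root) are assumptions made for this example, not rules published by any search engine.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Hypothetical parameter names treated as session or tracking noise.
IGNORED_PARAMS = {"sessionid", "sid", "phpsessid", "utm_source"}


def canonical_url(url: str) -> str:
    """Reduce a URL to one normalized form for duplicate comparison."""
    parts = urlsplit(url)

    # Host names are case-insensitive; the "www." rule is an assumption.
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]

    # Treat an empty path and a trailing index.html as the directory root.
    path = parts.path or "/"
    if path.endswith("/index.html"):
        path = path[: -len("index.html")]

    # Drop ignored parameters and sort the rest so order does not matter.
    query = sorted(
        (key, value)
        for key, value in parse_qsl(parts.query)
        if key.lower() not in IGNORED_PARAMS
    )

    # The fragment is discarded because it never reaches the server.
    return urlunsplit((parts.scheme.lower(), host, path, urlencode(query), ""))


variant_a = canonical_url("http://WWW.Example.com/index.html?sid=abc&b=2&a=1")
variant_b = canonical_url("http://example.com/?a=1&b=2&PHPSESSID=xyz")
print(variant_a == variant_b)
```

Both example URLs reduce to the same string, so a crawler applying rules like these could recognize them as one page instead of two.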
How Search Engines See Duplicate Content
There are many methods employed by different search engines to identify pages with duplicate content. The methods differ in many ways, from the concept to the algorithms and, of course, their effectiveness. Search engines are, however, all finding new ways to improve their duplicate detection, as seen in the patents filed by search engine companies like AltaVista, Microsoft Corporation, and Google, and by other bodies like Digital Equipment Corporation and even the Regents of the University of California.
The different patents include methods for detecting query-specific duplicate documents, detecting duplicate and near-duplicate files, clustering closely resembling data objects, identifying near-duplicate pages in a hyperlinked database, indexing duplicate database records using a full-record fingerprint, indexing duplicate records of information of a database, utilizing information redundancy to improve text searches, detecting and summarizing document similarity within large document sets, and finding mirrored hosts by analyzing URLs.
Each method is unique and interesting in its approach. The methods vary greatly, from generating fingerprints for records to using query-relevant information to limit the portion of the documents to be compared. Discussing each method would shed light on how different search engines approach the problem. The new methods are all innovative, and if some of them were used in concert, they would surely improve a search engine's ability to detect duplicate documents. However, since the patent holders are competing companies, collaboration between them is unlikely.
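To give a feel for what a full-record fingerprint might look like, here is a minimal Python sketch. It is not the method described in any of the patents: the normalization (lowercasing and collapsing whitespace), the choice of SHA-256, and the function names `fingerprint` and `group_exact_duplicates` are all assumptions made for illustration.

```python
import hashlib
from collections import defaultdict


def fingerprint(text: str) -> str:
    """Hash the whitespace-normalized, lowercased text of a whole record."""
    normalized = " ".join(text.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def group_exact_duplicates(pages: dict[str, str]) -> list[list[str]]:
    """Return groups of page names whose fingerprints are identical."""
    index = defaultdict(list)
    for name, text in pages.items():
        index[fingerprint(text)].append(name)
    # Only fingerprints shared by more than one page indicate duplicates.
    return [names for names in index.values() if len(names) > 1]


pages = {
    "shop-a/widget": "Durable   steel widget, 10 cm.",
    "shop-b/widget": "durable steel widget, 10 cm.",
    "shop-c/gadget": "Lightweight plastic gadget.",
}
print(group_exact_duplicates(pages))
```

A fingerprint like this only catches copies that are identical after normalization. Pages that differ by a few words produce completely different hashes, which is why near-duplicate detection needs other techniques.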
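The idea of using query-relevant information to limit what gets compared can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the patented technique: it treats the sentences that mention a query term as the "relevant portion" and compares documents by the Jaccard similarity of the words in those sentences. The function names are hypothetical.

```python
import re


def query_relevant_words(text: str, query: str) -> set[str]:
    """Collect words from the sentences that mention any query term."""
    terms = set(query.lower().split())
    words: set[str] = set()
    for sentence in re.split(r"[.!?]+", text.lower()):
        tokens = set(re.findall(r"\w+", sentence))
        if tokens & terms:
            words |= tokens
    return words


def query_similarity(doc_a: str, doc_b: str, query: str) -> float:
    """Jaccard similarity of the query-relevant words of two documents."""
    words_a = query_relevant_words(doc_a, query)
    words_b = query_relevant_words(doc_b, query)
    if not words_a or not words_b:
        return 0.0
    return len(words_a & words_b) / len(words_a | words_b)


page_a = "Cheap red widgets ship today. Visit our store in Ohio."
page_b = "Cheap red widgets ship today. Call our office in Texas."
print(query_similarity(page_a, page_b, "widgets"))
print(query_similarity(page_a, page_b, "our"))
```

The two sample pages are not identical overall, but for the query "widgets" the relevant sentences match exactly, so a user searching for that term would see them as duplicates. For the query "our" the relevant sentences differ and the score drops.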
Conclusion
As search engines further refine their methods for detecting duplicate content, it will become harder for plagiarists to get away with what they do. However, web pages containing duplicate content for a good reason could suffer as well. Furthermore, since none of the published patents tackles the issue of differentiating original content from duplicates, refinement of the search engines' methods might mean further trouble for the owners of websites with original content. Because of this, search engines ought to invent new methods for distinguishing original content from duplicates, and for recognizing valid duplicate content.
Article keywords: internet, SEO, Online Business
Article Source: http://www.articles2k.com
www.theinternetone.net