Duplicate content has become a huge topic of
discussion lately, thanks to the new filters that search engines have
implemented. This article will help you understand why you might be
caught in the filter and how to avoid it. We'll also show you how to
determine whether your pages contain duplicate content, and what to do to
fix it.
Search engine spam is any deliberate attempt
to trick the search engine into returning inappropriate,
redundant, or poor-quality search results. Often this behavior is
seen in pages that are exact replicas of other pages, created
to rank better in the search engines. Many people assume that
creating multiple or similar copies of the same page will either
increase their chances of getting listed in search engines or help them
get multiple listings, due to the presence of more keywords.
To make searches more relevant to
users, search engines use a filter that removes duplicate content
pages from the search results, and the spam along with them.
Unfortunately, good, hardworking webmasters have fallen prey to these
duplicate content filters. They are the webmasters who unknowingly spam
the search engines, even though there are things they can do to avoid
being filtered out. To truly understand what you can do to avoid the
duplicate content filter, you first need to know how this filter works.
First, we must understand that the term
"duplicate content penalty" is actually a misnomer. When we refer to
penalties in search engine rankings, we are actually talking about
points that are deducted from a page in order to come to an overall
relevancy score. But in reality, duplicate content pages are not
penalized. Rather they are simply filtered, the way you would use a
sieve to remove unwanted particles. Sometimes, "good particles" are
accidentally filtered out.
Knowing the difference between the filter and
the penalty, you can now understand how a search engine determines what
duplicate content is. There are basically four types of duplicate
content that are filtered out:
- Websites with Identical Pages - Pages that are identical to one another are considered duplicates, and websites that are identical to another website on the Internet are considered spam. Affiliate sites with the same look and feel that contain identical content, for example, are especially vulnerable to a duplicate content filter. Another example is a website with doorway pages. Often these doorways are skewed versions of landing pages that are themselves identical to other landing pages. Doorway pages are generally intended to spam the search engines and manipulate search engine results.
- Scraped Content - Scraped content is content taken from a website and repackaged to look different, but in essence it is nothing more than a duplicate page. With the popularity of blogs on the Internet and the syndication of those blogs, scraping is becoming more of a problem for search engines.
- E-Commerce Product Descriptions - Many eCommerce sites out there use the manufacturer's descriptions for the products, which hundreds or thousands of other eCommerce stores in the same competitive markets are using too. This duplicate content, while harder to spot, is still considered spam.
- Distribution of Articles - If you publish an article and it gets copied and put all over the Internet, that's good, right? Not necessarily for all the sites that feature the same article. This type of duplicate content can be tricky: even though Yahoo and MSN determine the source of the original article and deem it most relevant in search results, other search engines, like Google, may not, according to some experts.
So, how does a search engine's duplicate
content filter work? Essentially, when a search engine robot crawls a
website, it reads the pages and stores the information in its
database. Then it compares its findings to the other information in
its database. Depending on a few factors, such as the overall
relevancy score of a website, it determines which pages are duplicate
content, and then filters out the pages or websites that qualify as
spam. Unfortunately, even if your pages are not spam, if they have enough
similar content, they may still be regarded as spam.
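Search engines do not publish the exact algorithms behind their filters, but a common technique in the duplicate detection literature is to compare pages by the overlap of their "shingles" (short, overlapping word sequences). Here is a minimal sketch in Python; the shingle size and what counts as a "high" score are illustrative assumptions, not values any search engine has confirmed:

```python
def shingles(text, size=3):
    """Break text into overlapping word sequences ("shingles")."""
    words = text.lower().split()
    return {" ".join(words[i:i + size]) for i in range(len(words) - size + 1)}

def similarity(page_a, page_b, size=3):
    """Jaccard similarity of two pages' shingle sets, from 0.0 to 1.0."""
    a, b = shingles(page_a, size), shingles(page_b, size)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

# Two near-identical product descriptions overlap heavily...
desc = "This widget is the best widget money can buy today"
copy = "This widget is the best widget money can buy online"
print(similarity(desc, copy))     # high score: likely seen as duplicate

# ...while a rewritten description shares almost no shingles.
rewrite = "A sturdy, hand-finished widget designed for daily use"
print(similarity(desc, rewrite))  # near zero: treated as unique
```

Note how changing a single word barely moves the score, while a genuine rewrite drops it to nearly zero. That is why lightly "spinning" content rarely escapes a duplicate content filter.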
There are several things you can do to avoid
the duplicate content filter. First, you must be able to check your
pages for duplicate content. Using our Similar Page Checker,
you can determine the similarity between two pages and make
them as unique as possible. By entering the URLs of two pages, this
tool will compare them and point out where they are similar, so
that you can make them unique.
Since you need to know which sites might have
copied your site or pages, you will need some help. We recommend using
a tool that searches for copies of your page on the Internet: www.copyscape.com.
Here, you can put in your web page URL to find replicas of your page on
the Internet. This can help you create unique content, or even address
the issue of someone "borrowing" your content without your permission.
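A quick manual alternative is to search for a distinctive sentence from your page in quotes and see which other sites it turns up. Below is a small, purely illustrative sketch that picks candidate sentences worth searching for; the minimum-length threshold is an assumption chosen to avoid short, generic phrases:

```python
import re

def candidate_phrases(text, min_words=8, max_phrases=3):
    """Pick a few long, distinctive sentences to search for in quotes."""
    sentences = re.split(r"(?<=[.!?])\s+", text)
    long_ones = [s.strip() for s in sentences if len(s.split()) >= min_words]
    # Longest sentences first: the more specific the phrase,
    # the fewer false matches a search for it will return.
    long_ones.sort(key=lambda s: len(s.split()), reverse=True)
    return [f'"{s}"' for s in long_ones[:max_phrases]]
```

Paste each returned phrase, quotes included, into a search engine; pages other than your own that match word-for-word are likely copies.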
Let's look at the issue of some search
engines possibly not recognizing the source of the original content
in distributed articles. Remember, some search engines, like Google,
use link popularity to determine the most relevant results. Continue to
build your link popularity, while using tools like www.copyscape.com
to find out how many other sites carry the same article; if the
author allows it, you may be able to alter the article so as to make
the content unique.
If you use distributed articles for your
content, consider how relevant the article is to your web page
and to the site as a whole. Sometimes simply adding your own
commentary to an article can be enough to avoid the duplicate content
filter; the Similar Page Checker
can help you make your content unique. Further, the more relevant
articles you can add to complement the first one, the better.
Search engines look at the entire web page and its relationship to the
whole site, so as long as you aren't exactly copying someone's pages,
you should be fine.
If you have an eCommerce site, you should
write original descriptions for your products. This can be hard to do
if you have many products, but it really is necessary if you wish to
avoid the duplicate content filter. Here's another reason the
Similar Page Checker
is a great idea: it can show you where your descriptions match others,
so you can change them and keep unique, original content on your site.
The same approach works for scraped content. Many scraped content sites
offer news. With the Similar Page Checker, you can easily determine
where the news content is similar, and then change it to make it unique.
Do not rely on an affiliate site that is
identical to other sites, and do not create identical doorway pages.
Not only is this kind of content filtered out immediately as spam, but
when another site or page is found to be a duplicate, there is generally
no comparison of the page to the site as a whole, which can get your
entire site in trouble.
The duplicate content filter is sometimes
hard on sites that don't intend to spam the search engines. But it is
ultimately up to you to help the search engines determine that your
site is as unique as possible. By using the tools in this article to
eliminate as much duplicate content as you can, you'll help keep your
site original and fresh.
Source: webconfs