
Duplicated Content in DotNetNuke

Every new and existing website must be careful to avoid duplicate content.

 

Duplicated content tends to rank poorly, because the search engine treats only the original version as relevant.

 

If your content is intentionally duplicated (copied from another website, freely or otherwise), then remove it or rewrite it in your own words.

 

But what about unintentionally duplicated content? You may not even be aware of it. DotNetNuke will let you create duplicate content, so we will look at how it happens and what to do about it. Most of the problems are caused by the search engine seeing one page many times and treating it as many separate pages, each with duplicated content.

The login control and the register control

 

The login control, www.yoursite.com/default.aspx?ctl=Login, and the register control, www.yoursite.com/default.aspx?ctl=Register, are present on most DotNetNuke sites. Both of these controls work by reloading the web page and showing only the login or register module.

 

This leads to a problem: the original page, the page with the login control and the page with the register control are treated as three physical pages by the search engine, even though they are actually all the same page. The login page is treated as a near duplicate of the original, and furthermore the login and register pages are near duplicates of each other.

 

Edit: The following problem has been fixed since DNN version 4.6.2, which is another good reason to upgrade to the latest version if you haven't already:
[If a website uses url rewriting, then the problem is worsened. /dnn/first-page?ctl=Login and /dnn/second-page.aspx?ctl=Login are seen as exact duplicates. So a website with 100 pages will have 200 exact duplicate pages and 200 near duplicate pages. Not a good look!]

Fortunately, the Google webmaster blog says that this sort of duplication within the same website probably won't be penalized. Instead, Google will choose just one version of the duplicates and index only that one.

But since there is absolutely no need to have your login or register pages indexed, we can and should solve this by excluding the login and register control pages using robots.txt.

 

The privacy control and the terms control

 

The privacy control, www.yoursite.com/default.aspx?ctl=Privacy, and the terms control, www.yoursite.com/default.aspx?ctl=Terms, work in the same way as login and register: they reload the page with a certain web control. However, it is the privacy and terms controls that cause the worst duplication. Look at the privacy or terms page of any DotNetNuke site. These two controls contain a big chunk of text which is almost 100% identical to each and every other DotNetNuke site on the internet. The only thing different on these two pages is the site's admin email. So if you have these pages, you have duplicated content that matches 1000s of other DotNetNuke sites. Your site will also match www.dotnetnuke.com, which has a high PageRank (PR) of 8, and the search engine will know for sure that dotnetnuke.com did not copy you.

 

So your site's privacy/terms pages will never get a good ranking, but what about the rest of your site? How much effect will this have on your overall ranking? It won't help, I can guarantee that. Because the content duplicated here is also found on many other websites, the chances of a penalty or loss of ranking are greatly increased.

 

Two or more urls, one page

In DotNetNuke, it is easy to have two or more urls that point to the same page, e.g. www.mywebsite.com/ and www.mywebsite.com/home.aspx could be the same DotNetNuke tab, or /tabid/111/default.aspx and /red-widgets.aspx could be the same tab.

There are two things that must be done here. First, use robots.txt to exclude all but one of the urls that point to the same tab, so that the others are no longer indexed and no longer seen as duplicates. Secondly, make sure that all links in skins use consistent urls: each hyperlink to a tab should always use the same url for that tab.

This is also duplicate content within the same website, so it probably won't harm your ranking, but why risk it?
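As a rough illustration of the first step, here is a minimal Python sketch (it is not part of DotNetNuke) that takes groups of urls pointing to the same tab, keeps the first url in each group, and writes Disallow lines for the rest. The function name and the example groups are made up for illustration. One thing to watch: robots.txt rules match by prefix, so excluding / would block the whole site. The sketch refuses to do that, which means / should be the url you keep.

```python
def disallow_duplicates(url_groups):
    """Return robots.txt text that excludes every url in a group except the first.

    url_groups is a list of lists; each inner list holds the paths that
    load the same tab, with the url you want indexed listed first.
    """
    lines = ["User-agent: *"]
    for canonical, *duplicates in url_groups:
        for path in duplicates:
            if path == "/":
                # Disallow rules are prefix matches, so "Disallow: /"
                # would exclude the entire site, not just the home page.
                raise ValueError("keep '/' as the indexed url; do not exclude it")
            lines.append(f"Disallow: {path}")
    return "\n".join(lines)


# Hypothetical groups based on the examples in this post.
groups = [
    ["/", "/home.aspx"],
    ["/red-widgets.aspx", "/tabid/111/default.aspx"],
]
print(disallow_duplicates(groups))
```

Whichever url you keep in robots.txt should be the same one your skin links use, otherwise the two steps work against each other.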

 

robots.txt

 

Removing duplicates created by the login, register, privacy and terms controls is simple: use robots.txt to exclude all urls that end in ?ctl=xyz, for example:

User-agent: *
Disallow: /web/red-widgets.aspx?ctl=Login
Disallow: /web/red-widgets.aspx?ctl=Register
Disallow: /web/red-widgets.aspx?ctl=Privacy
Disallow: /web/red-widgets.aspx?ctl=Terms
Disallow: /web/blue-widgets.aspx?ctl=Login
Disallow: /web/blue-widgets.aspx?ctl=Register
Disallow: /web/blue-widgets.aspx?ctl=Privacy
Disallow: /web/blue-widgets.aspx?ctl=Terms

Do this for every page that has one of the above controls.   The search engines will not index these control pages anymore, and slowly but surely the duplication will no longer be a problem. Have patience, as the duplicates will not be removed immediately.
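Writing these lines by hand gets tedious on a larger site. The following is a minimal Python sketch of generating them, assuming you already have a list of your page paths. The function name and the page list are hypothetical, and the script is only an illustration of the pattern shown above, not a DotNetNuke feature.

```python
# The four controls discussed in this post.
CONTROLS = ["Login", "Register", "Privacy", "Terms"]


def build_robots_txt(page_paths, controls=CONTROLS):
    """Return robots.txt text with one Disallow line per page and control."""
    lines = ["User-agent: *"]
    for path in page_paths:
        for ctl in controls:
            lines.append(f"Disallow: {path}?ctl={ctl}")
    return "\n".join(lines) + "\n"


# Hypothetical page list; replace it with the paths of your own pages.
pages = ["/web/red-widgets.aspx", "/web/blue-widgets.aspx"]
print(build_robots_txt(pages), end="")
```

For the two pages above, this prints the same nine lines as the listing earlier in this section. Save the output as robots.txt.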

The robots.txt file must be placed in your website's root directory.  If you can open it in your web browser at www.yoursite.com/robots.txt then it is in the correct place. 

We now have a module for DotNetNuke that can automatically generate the contents of your robots.txt file with just one click, saving you a large amount of time and effort.


// this blog post has also been published on www.codeproject.com //

 
