Posts tagged SEO
… or how to avoid duplicate content by keeping the current language in the URL
Preface: Earlier this year I posted about the Django CMS 2.2 features that I want to see. One of the things mentioned there was that once you have chosen the language of the site, it does not matter whether you open “/my_page/” or “/en/my_page/”: it just shows the same content. The problem is that this can be considered both duplicate and inconsistent content.
It is duplicate because you see the same content with and without the language code in the URL, and inconsistent because the same URL can return different language versions, i.e. different content.
Solution: This can easily be fixed with a custom middleware that redirects every URL that does not contain a language code. In my case the middleware is stored in “middleware/URLMiddlewares.py” (the path is relative to my project root directory) and contains the following code:
```python
from cms.middleware.multilingual import MultilingualURLMiddleware
from django.conf import settings
from django.http import HttpResponseRedirect
from django.utils import translation

class CustomMultilingualURLMiddleware(MultilingualURLMiddleware):
    def process_request(self, request):
        lang_path = request.path.split('/')[1]  # '/en/news/' -> 'en'
        if lang_path in settings.URLS_WITHOUT_LANGUAGE_REDIRECT:
            return None  # not language specific: leave the request alone
        language = self.get_language_from_request(request)
        translation.activate(language)
        request.LANGUAGE_CODE = language
        if lang_path == '':
            return HttpResponseRedirect('/%s/' % language)  # home page
        if len([z for z in settings.LANGUAGES if z[0] == lang_path]) == 0:
            return HttpResponseRedirect('/%s%s' % (language, request.path))
```
Here is a short explanation of what happens in this middleware.
Note: If you are not familiar with how middleware works, check the Django Middlewares documentation first.
Back to the code. First we split the URL path by ‘/’, take the second element (this is where the language code should be) and store it in lang_path (line 8).
URLS_WITHOUT_LANGUAGE_REDIRECT is just a list of URLs that should not be redirected: if lang_path matches any of them, we return None, i.e. the request passes through unchanged (lines 9-10). This is used for sections of the site that are not language specific, for example media files.
Then we determine the language from the request, activate it and store it in request.LANGUAGE_CODE (lines 11-13).
If lang_path is empty, the user has requested the home page and we redirect them to the correct language version of it (lines 14-15).
If lang_path does not match any of the language codes declared in settings.LANGUAGES, the language code is missing from the URL and the user is redirected to the correct language version of this page (lines 16-17).
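The decision logic walked through above can be sketched as a plain function with no Django dependency, which makes it easy to unit test. This is a minimal sketch, assuming the request path, the already chosen language, settings.LANGUAGES (a sequence of (code, name) pairs) and the exempt list are passed in explicitly. The name language_redirect is hypothetical and is not part of Django or django-cms; it only mirrors the redirect decisions (lines 8-10 and 14-17), not the language activation.

```python
from typing import Optional, Sequence, Tuple


def language_redirect(path: str,
                      language: str,
                      languages: Sequence[Tuple[str, str]],
                      exempt: Sequence[str]) -> Optional[str]:
    """Return the URL to redirect to, or None to let the request through.

    Hypothetical helper mirroring CustomMultilingualURLMiddleware's checks.
    """
    # '/en/news/'.split('/') == ['', 'en', 'news', ''], so index 1 holds
    # the first path segment, where the language code should be.
    lang_path = path.split('/')[1]
    # Sections that are not language specific (e.g. 'css', 'js') pass through.
    if lang_path in exempt:
        return None
    # '/' was requested: send the user to the home page of their language.
    if lang_path == '':
        return '/%s/' % language
    # The first segment is not a declared language code: add the prefix.
    if not any(code == lang_path for code, _name in languages):
        return '/%s%s' % (language, path)
    # The URL already carries a valid language code.
    return None


LANGS = [('en', 'English'), ('bg', 'Bulgarian')]  # example values
print(language_redirect('/news/', 'en', LANGS, ['css', 'js']))         # /en/news/
print(language_redirect('/', 'bg', LANGS, ['css', 'js']))              # /bg/
print(language_redirect('/en/news/', 'en', LANGS, ['css', 'js']))      # None
print(language_redirect('/css/site.css', 'en', LANGS, ['css', 'js']))  # None
```

As in the middleware, only the path is used, so a query string such as ?page=2 is not carried over to the redirect target.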
To make the middleware above work, you have to update your settings.py.
First, add the middleware to your MIDDLEWARE_CLASSES; in my case the path is ‘middleware.URLMiddlewares.CustomMultilingualURLMiddleware’.
Second, add the URLS_WITHOUT_LANGUAGE_REDIRECT list and put in it the URLs that should not be redirected, for example:
```python
URLS_WITHOUT_LANGUAGE_REDIRECT = [
    'css',
    'js',
]
```
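For context, the other two settings the middleware relies on could look like the fragment below, next to the URLS_WITHOUT_LANGUAGE_REDIRECT list. The middleware path comes from this post; the LANGUAGES entries are example values, and the middleware's check expects LANGUAGES to be a sequence of (code, name) pairs. Because the class subclasses django-cms's MultilingualURLMiddleware, it presumably takes that entry's place in MIDDLEWARE_CLASSES rather than being listed next to it; that is an assumption, as the post does not say.

```python
# settings.py (fragment). The LANGUAGES entries are example values.
LANGUAGES = (
    ('en', 'English'),
    ('bg', 'Bulgarian'),
)

MIDDLEWARE_CLASSES = (
    # ... the rest of your middleware stays as it is ...
    'middleware.URLMiddlewares.CustomMultilingualURLMiddleware',
)
```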
Specialties: If the language code is not in the URL and no language cookie is set, your browser's language settings will be used to determine your preferred language. Unfortunately most users do not know about this option, and it often stays at its default value. If you want this setting to be ignored, just add the following code after line 10 of the middleware above:
```python
# Ignore the browser's language preference (the Accept-Language header)
request.META.pop('HTTP_ACCEPT_LANGUAGE', None)
```
This removes the HTTP_ACCEPT_LANGUAGE header sent by the browser, so Django uses the default language set in its settings.
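To see why removing the header changes the outcome, here is a deliberately simplified model of the fallback order described above: a language cookie first, then the browser's Accept-Language header, then the default language from settings. resolve_language is a hypothetical name; the real lookup in Django and django-cms is more involved (for example, it weighs q-values), and this sketch only shows the order of the fallbacks.

```python
from typing import Mapping, Optional, Sequence


def resolve_language(meta: Mapping[str, str],
                     cookie_language: Optional[str],
                     supported: Sequence[str],
                     default: str) -> str:
    """Hypothetical, simplified language lookup (no q-value handling)."""
    # 1. A supported language stored in the cookie wins.
    if cookie_language is not None and cookie_language in supported:
        return cookie_language
    # 2. Otherwise take the first supported code from Accept-Language,
    #    e.g. 'bg,en;q=0.8' -> 'bg' (q-values are ignored here).
    for part in meta.get('HTTP_ACCEPT_LANGUAGE', '').split(','):
        code = part.split(';')[0].strip().lower()
        if code in supported:
            return code
    # 3. No usable hint: use the default language from settings.
    return default


meta = {'HTTP_ACCEPT_LANGUAGE': 'bg,en;q=0.8'}
print(resolve_language(meta, None, ['en', 'bg'], 'en'))  # bg
meta.pop('HTTP_ACCEPT_LANGUAGE', None)  # what the snippet above does
print(resolve_language(meta, None, ['en', 'bg'], 'en'))  # en
```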
URLS_WITHOUT_LANGUAGE_REDIRECT is extremely useful if you are developing with the built-in development server and serve the media files through it. But once you put your website in production, I strongly encourage you to serve these files directly from the web server instead of through Django's static serve view.
Final words: Django 1.4 will bring big changes to multilingual URLs, but until then you can use this code to improve your website's SEO. Any ideas for improvement are appreciated.
… or how to make a user-editable 404 page that stays in the CMS page tree
Basics: Yes, you need it! You need a 404 page because you never know what may happen to a link: a badly pasted link, an obsolete or deleted article, someone just playing with your URLs, etc. It is better for both you and your visitors to have a nice page that follows the website design instead of the web server's default one, which usually contains server information and can be a security issue. With Django this is easy: just make an HTML template named 404.html, place it in your root template directory and voilà, you are ready (Django's default handler only uses it when DEBUG is False). You will also automatically have a request_path variable in the context, which carries the URL that was not found.
Problem: sometimes clients need to be able to edit their 404 pages. Other times you need some custom context, or you want to integrate plugins and be able to modify them easily through the CMS administration. For example, you may want to display your brand new awesome “Sitemap Plug-in” on the 404 page.
Solution: Django allows you to specify a custom 404 handler view, so you just need to define one, set it in urls.py and make it render the wanted page:
```python
# in urls.py
handler404 = 'site_utils.handler404'

# in site_utils.py
from cms.views import details

def handler404(request):
    return details(request, '404-page-url')
```
Here '404-page-url' is the URL of the CMS page you want to show for 404 errors. Everything seems fine, but here is the pitfall: used this way, your page will return “200 OK” instead of “404 Not Found”. This could kill your SEO (unless you want your 404 page as the first result for your website). So you just need to set the 404 status on the response:
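```python
from cms.views import details

def handler404(request):
    response = details(request, '404-page-url')
    # details() renders the page as "200 OK"; mark it as "404 Not Found"
    response.status_code = 404
    return response
```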
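If you need the same correction in more than one view, the status override can be packaged as a small decorator. This is a sketch: with_status is a hypothetical helper, and FakeResponse only stands in for Django's HttpResponse so the example runs without a project. In site_utils.py you would decorate the real handler404 and drop the manual status_code line.

```python
import functools


def with_status(status_code):
    """Hypothetical decorator: set status_code on the view's response."""
    def decorator(view):
        @functools.wraps(view)
        def wrapper(request, *args, **kwargs):
            response = view(request, *args, **kwargs)
            # Keep the rendered content, change only the HTTP status.
            response.status_code = status_code
            return response
        return wrapper
    return decorator


class FakeResponse:
    """Stand-in for django.http.HttpResponse; only status_code matters."""
    def __init__(self):
        self.status_code = 200


@with_status(404)
def handler404(request, *args, **kwargs):
    # In site_utils.py this would be: return details(request, '404-page-url')
    return FakeResponse()


print(handler404(None).status_code)  # 404
```

The wrapper forwards any extra positional and keyword arguments, so it works regardless of how many arguments the handler receives.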
Final words: Why are these HTTP status codes so important? Because they tell search engines and other crawlers what the status of a page is: a normal page, a redirect, not found, an error or something else. Providing incorrect status codes can have a negative effect on your website's SEO, so try to keep them correct, especially when it is as easy to achieve as in the example above.
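As a rough illustration of the categories listed above, the sketch below maps a status code to its standard reason phrase (via Python's http.HTTPStatus) and to one of those buckets. The describe function and its bucket labels are hypothetical and purely illustrative; they are not how any particular search engine classifies pages. The interesting case is the first one: a 404 handler that returns 200 looks like a normal page to a crawler.

```python
from http import HTTPStatus


def describe(status: int) -> str:
    """Hypothetical helper: reason phrase plus a rough page category."""
    phrase = HTTPStatus(status).phrase  # e.g. 404 -> 'Not Found'
    if 200 <= status < 300:
        kind = 'normal page'
    elif 300 <= status < 400:
        kind = 'redirect'
    elif status in (404, 410):
        kind = 'not found'
    elif 400 <= status < 500:
        kind = 'client error'
    elif 500 <= status < 600:
        kind = 'server error'
    else:
        kind = 'something else'
    return '%d %s: %s' % (status, phrase, kind)


print(describe(200))  # 200 OK: normal page  (404 handler without the fix)
print(describe(404))  # 404 Not Found: not found  (after the fix)
print(describe(301))  # 301 Moved Permanently: redirect
```

HTTPStatus raises ValueError for codes it does not know, which is acceptable for a sketch like this.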
Note: If the code above is not working for you, please check Allan’s solution in the comments
… this is the “Post on request” for February 2011
Preface: I have to admit that I was expecting more interest in the “Post on request” topic, but my blog is probably too young for this; I think I will try again (sooner or later). More importantly, Jonas Obrist is the undisputed winner of this month's “contest”.
Features: One of the most useful features in the next Django CMS would be the ability to copy placeholder content between different language versions of a page. For example, imagine that you have a home page with several placeholders, each with several plugins/snippets inside (latest news, featured products, etc.). When creating a new language version of the page, it is really annoying to add them one by one.
Another feature, requested by my colleague Miro, is somewhat the opposite of the one I want: he wants to be able to set a different page template for each language version of a page. I also find this useful, because sometimes the language versions of the same page are not fully mirrored.
Bugs: I am not sure this is actually a bug, but I think it is bad for SEO, so I will list it here. If you have a multilingual website, once the language is set in the cookie, the same page is displayed whether or not the language code is in the URL. For example, “/en/news/” is the same as “/news/”. This causes duplicate content, which is considered bad for SEO, and is also misleading, because every visit to “/news/” with a different language in the cookie returns different content. I have made a fix for this that will be presented in the next post.
Conclusion: Thanks to all participants and to the guys at Django CMS: you are doing a really amazing job. I hope you will like the features I proposed and that we will see them in the next version. Comments and replies are welcome as always.
For a long time in the web world there has been a feud between developers and SEO people, both groups claiming that their work is more important, more sophisticated and of more value to the client and the website visitors. I have to admit that I (as a developer) was also focused only on my own work, ignoring the SEO part and leaving it to others. But over the last year I have started to pay more attention to this field.
Do not get me wrong, I am not going to become an SEO expert, but the truth is that, like many other things, the web is a complex system. For example, you can hire the best automotive engineer, but if he works alone you will never get a car that sells. You need a team: engineers, designers, managers, ergonomics experts, etc. Each of them should focus on his own field but also cooperate with the others to find the optimal solution.
As developers, our task is to know the basic principles of the semantic web and search engine optimization and to build applications that conform to them. Then the SEO experts come and do their magic. But it is a process that both of us have to walk hand in hand. It is always easier to get it right by design (in the architecture) than to rewrite it later.
As a final word, I want to say that I had a really great time at this meeting, and I hope it was my first but not my last event of this kind.