… or how to avoid duplicate content by keeping the current language in the URL

Preface: Earlier this year I posted about Django CMS 2.2 features that I want to see. One of the things mentioned there was that once you have chosen the language of the site, it does not matter whether you open “/my_page/” or “/en/my_page/” – both show the same content. The problem is that this can be considered both duplicate and inconsistent content.
Duplicate because you see the same content with and without the language code in the URL, and inconsistent because the same URL can return different language versions, i.e. different content.

Solution: This can be easily fixed with a custom middleware that redirects any URL that does not contain a language code. In my case the middleware is stored in “middleware/URLMiddlewares.py” (the path is relative to my project root directory) and contains the following code.

from cms.middleware.multilingual import MultilingualURLMiddleware 
from django.conf import settings
from django.http import HttpResponseRedirect
from django.utils import translation

class CustomMultilingualURLMiddleware(MultilingualURLMiddleware): 
    def process_request(self, request):
        lang_path = request.path.split('/')[1]
        if lang_path in settings.URLS_WITHOUT_LANGUAGE_REDIRECT:
            return None
        language = self.get_language_from_request(request) 
        translation.activate(language) 
        request.LANGUAGE_CODE = language
        if lang_path == '': 
            return HttpResponseRedirect('/%s/' % language)
        if lang_path not in dict(settings.LANGUAGES):
            return HttpResponseRedirect('/%s%s' % (language, request.path))

Now a little explanation of what happens in this middleware.

Note: If you are not familiar with how middleware works, check the Django middleware documentation first.

Back to the code (the numbers in parentheses are line numbers in the listing above). First we split the URL by ‘/’ and take the second element (this is where the language code should be) and store it in lang_path (8).
URLS_WITHOUT_LANGUAGE_REDIRECT is just a list of URL prefixes that should not be redirected. If lang_path matches any of them we return None, i.e. the request is passed on unchanged (9-10). This is used for sections of the site that are not language-specific, for example media files.
Then we determine the language from the request, activate it and store it on the request (11-13).
If lang_path is empty, the user has requested the home page and we redirect them to the correct language version of it (14-15).
If lang_path does not match any of the declared languages, this means that the language code is missing from the URL and the user is redirected to the correct language version of the page (16-17).
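To make the branching easier to follow (and to test without a running Django project), the same decisions can be sketched as a plain function. This is only an illustration of the logic described above, not part of the middleware: the function name language_redirect and its parameters are made up for this example, and the language is assumed to have been resolved already.

```python
def language_redirect(path, language, languages, skip_prefixes):
    """Return the URL to redirect to, or None if the request should pass through.

    path          -- request path, e.g. '/my_page/'
    language      -- language already resolved for this request, e.g. 'en'
    languages     -- iterable of (code, name) pairs, like settings.LANGUAGES
    skip_prefixes -- first path segments that are never redirected
    """
    # '/en/my_page/'.split('/') gives ['', 'en', 'my_page', ''],
    # so index 1 is the first path segment.
    first_segment = path.split('/')[1]
    if first_segment in skip_prefixes:
        return None  # not language-specific, leave it alone
    if first_segment == '':
        return '/%s/' % language  # home page
    if first_segment not in dict(languages):
        return '/%s%s' % (language, path)  # language code is missing
    return None  # URL already carries a language code


LANGUAGES = (('en', 'English'), ('bg', 'Bulgarian'))
SKIP = ['css', 'js']

print(language_redirect('/', 'en', LANGUAGES, SKIP))              # /en/
print(language_redirect('/my_page/', 'bg', LANGUAGES, SKIP))      # /bg/my_page/
print(language_redirect('/en/my_page/', 'bg', LANGUAGES, SKIP))   # None
print(language_redirect('/css/site.css', 'en', LANGUAGES, SKIP))  # None
```

One thing worth knowing: request.path does not include the query string, so a redirect built from it drops any GET parameters. Django's request.get_full_path() returns the path together with the query string if you need to keep them.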

To make the middleware above work you have to update your settings.py.
First add the middleware to your MIDDLEWARE_CLASSES – in my case the path is ‘middleware.URLMiddlewares.CustomMultilingualURLMiddleware’.
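As a rough illustration, the entry could look like the fragment below. The other middleware entries are only placeholders for whatever your project already lists, and I am assuming here that the custom class takes the place of the stock ‘cms.middleware.multilingual.MultilingualURLMiddleware’ entry, since it subclasses it.

```python
# settings.py (fragment) - the surrounding entries are illustrative only
MIDDLEWARE_CLASSES = (
    'django.middleware.common.CommonMiddleware',
    'django.contrib.sessions.middleware.SessionMiddleware',
    'django.contrib.auth.middleware.AuthenticationMiddleware',
    # Assumption: used instead of
    # 'cms.middleware.multilingual.MultilingualURLMiddleware'
    'middleware.URLMiddlewares.CustomMultilingualURLMiddleware',
)
```

For this dotted path to be importable, the “middleware” directory has to be a Python package, i.e. contain an __init__.py file.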

Second add URLS_WITHOUT_LANGUAGE_REDIRECT list and place there the URLs that should not be redirected, example:

URLS_WITHOUT_LANGUAGE_REDIRECT = [
    'css',
    'js',
]

Special cases: If the language code is not in the URL and no language cookie is set, your browser settings will be used to determine your preferred language. Unfortunately, most users do not know about this option and it often stays at its default value. If you want this setting to be ignored, just add the following code after line 10 of the middleware above (right after the return None statement):

if 'HTTP_ACCEPT_LANGUAGE' in request.META:
    del request.META['HTTP_ACCEPT_LANGUAGE']

This removes the HTTP_ACCEPT_LANGUAGE header sent by the browser, so Django falls back to the language set in its settings (LANGUAGE_CODE) as the default.
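Since request.META is an ordinary dictionary, the same effect can be had in one line with pop(). The small helper below is just a sketch to show the idea on a plain dict; the name drop_accept_language is made up for this example.

```python
def drop_accept_language(meta):
    """Remove the browser's Accept-Language header from a request.META-style dict."""
    # pop() with a default does nothing when the key is absent,
    # so no membership test is needed beforehand.
    meta.pop('HTTP_ACCEPT_LANGUAGE', None)
    return meta


meta = {'HTTP_ACCEPT_LANGUAGE': 'bg,en;q=0.8', 'HTTP_HOST': 'example.com'}
print(drop_accept_language(meta))  # {'HTTP_HOST': 'example.com'}
```

Inside the middleware this would simply be request.META.pop('HTTP_ACCEPT_LANGUAGE', None).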

URLS_WITHOUT_LANGUAGE_REDIRECT is extremely useful if you are developing with the built-in dev server and serve the media files through it. But once you put your website in production, I strongly encourage you to serve these files directly from the web server instead of using Django's static serve view.

Final words: In Django 1.4 there will be big changes to multilingual URLs, but until then you can use this code to improve your website's SEO. Any ideas for improvement will be appreciated.