Here's a failed attempt to localize my blog

This post is not written in Chinese. You can use Google Translate to help you read it.

I just finished the i18n (internationalization) feature of this blog. The natural next step would be L10n (localization), which I have worked on sporadically over the last few weeks. This time, however, I am out of luck: the constraints and limitations of Hexo made it impossible.

So I decided to write down this failed attempt and move on to other subjects.

What is L10n

One may wonder what L10n is, since we usually talk about "internationalization and localization" together. The two ideas are indeed often coupled and used in the same context, but there are differences between them.

More concretely, i18n refers to supporting various languages and cultures through a generalized process. For a website like this static blog, this mostly means that common text such as the header, footer, display messages and menu items should be displayed in every supported language.

But i18n won't change the content of a post, nor will it display different lists of posts for different locales or languages. It is precisely this idea of "changing content" and "adapting to a locale" that falls within the scope of L10n.

Compared with i18n, L10n is not only a technical issue: it also requires a strategy for localizing the content, and that strategy can vary from case to case.

This post gives an intuitive example of the difference between i18n and L10n, in Java-like pseudocode:

// i18n
datetime.ToUtc()

// L10n
localTimezone.ConvertDateTime(utcDatetime)
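
The same distinction can be sketched in JavaScript, the language Hexo itself is written in. This is only an illustration under my own assumptions: the function names `toUtcIso` and `formatForLocale` are hypothetical, and the locale and time zone are arbitrary examples.

```javascript
// i18n side: keep one neutral, locale-independent representation (UTC).
function toUtcIso(date) {
  return date.toISOString();
}

// L10n side: present that same instant for a specific locale and time zone.
function formatForLocale(date, locale, timeZone) {
  return new Intl.DateTimeFormat(locale, {
    hour: '2-digit',
    minute: '2-digit',
    hourCycle: 'h23',
    timeZone,
  }).format(date);
}

const instant = new Date(Date.UTC(2024, 0, 24, 12, 0, 0));
console.log(toUtcIso(instant)); // 2024-01-24T12:00:00.000Z
console.log(formatForLocale(instant, 'en-GB', 'Asia/Shanghai'));
```

The first function produces the same string for every reader, while the second one depends on who is reading, which is the part that needs a localization strategy.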

On the Hexo side, i18n is partly supported through the official plugin and the Fluid theme, although both are incomplete. I worked on i18n last winter to complete this feature.

L10n, on the other hand, is not supported at all by existing plugins.

Some people may suggest building a separate site for each language, which would indeed give a working L10n feature. But I discarded this idea from the beginning, not only because it would break the i18n feature I had just implemented, but also because it would impose a single localization strategy, which may not be what I want.

So I dug into the code of Hexo, Fluid and the i18n generator.

The i18n generator works by copying the original pages and generating the same pages under each language. So the ideal approach would be to rely on the generator for pages that are not localized yet, and to disable this "copy and generate" step for pages that are.
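
A minimal sketch of that idea, assuming posts are plain objects with a `path`, that the first entry of `languages` is the default language, and that we already know which language-prefixed paths have a hand-written localized version. The names `planRoutes` and `localizedPaths` are hypothetical and are not taken from the i18n generator:

```javascript
// Decide, for every post and every non-default language, whether the
// generator should emit a copy or leave the slot to a localized page.
function planRoutes(posts, languages, localizedPaths) {
  const routes = [];
  for (const post of posts) {
    // languages[0] is treated as the default language (assumption).
    for (const lang of languages.slice(1)) {
      const target = `${lang}/${post.path}`;
      if (localizedPaths.has(target)) {
        continue; // a localized page exists: do not copy the original
      }
      routes.push({ path: target, lang, source: post.path });
    }
  }
  return routes;
}

const routes = planRoutes(
  [{ path: '2023/12/15/a/' }, { path: '2023/12/16/b/' }],
  ['zh-CN', 'en'],
  new Set(['en/2023/12/15/a/'])
);
console.log(routes.map((r) => r.path)); // [ 'en/2023/12/16/b/' ]
```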

Attempts

My first attempt was to rely on the existing i18n generator and the folder hierarchy. I tried the following layouts:

# Original page
/_posts/<xxxxx>.md

# Attempt 1
/_posts/en/<xxxxx>.md

# Attempt 2
/en/_posts/<xxxxxx>.md

No luck: neither attempt 1 nor attempt 2 gets generated alongside the original page.

Then I had a look at the path variable generated for each page, and there is clearly an issue:

# Original page
2023/12/15/<xxxxx>.md

# Attempt 1 without locale
2023/12/15/en/<xxxxx>.md

# Attempt 1 with locale
en/2023/12/15/en/<xxxxx>.md

# Attempt 2 will not be generated

# This is what should be done
en/2023/12/15/<xxxxx>.md

I tried to change the path in the generator, like this:

locals.posts.forEach(function (page) {
  // With a path like 2023/12/15/en/<xxxxx>, the 4th segment is the language
  var lang = page.path.split('/')[3];
  if (languages.indexOf(lang) == -1) {
    i18n.push(page);
  } else {
    // The page is in a localized lang: move the lang segment to the front
    page.path = lang + '/' + page.path.replace(new RegExp(`${lang}/`, 'i'), '');

    // original code...
    langPaths.push(page.path);
    page.lang = lang;
  }
});

But it didn't work. The reason may be that path is not a stored field of the page data but a read-only getter, so assigning an updated path to it changes nothing.
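
The effect can be reproduced without Hexo. The sketch below is an assumption about the mechanism, not Hexo's actual model code: it defines a hypothetical `makePage` whose `path` is computed by a getter with no setter, and a hypothetical `tryAssignPath` that shows the assignment leaves the value unchanged.

```javascript
// Hypothetical stand-in for a page whose path is derived, not stored.
function makePage(date, slug) {
  const page = { date, slug };
  Object.defineProperty(page, 'path', {
    get() {
      return `${this.date}/${this.slug}`;
    },
    enumerable: true,
  });
  return page;
}

// Try to overwrite path; returns true only if the new value sticks.
function tryAssignPath(page, newPath) {
  try {
    page.path = newPath; // ignored in sloppy mode, TypeError in strict mode
  } catch (err) {
    // strict mode reports the failed assignment instead of ignoring it
  }
  return page.path === newPath;
}

const page = makePage('2023/12/15', 'en/my-post');
console.log(tryAssignPath(page, 'en/2023/12/15/my-post')); // false
console.log(page.path); // 2023/12/15/en/my-post
```

In non-strict code the assignment fails silently, which would be consistent with the path staying unchanged and no error being reported.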

Of course, the original i18n generator code has problems of its own: the hard-coded path.split('/')[0] (in the original) won't work in my case, and langPaths is not used at all, which makes the control flow hard to follow. But that is another story.

This read-only path also affects helpers such as tagcloud, since the tagcloud helper obviously doesn't take the locale into account. I would have to change the Hexo script files in node_modules to get a working tag cloud for the L10n feature.

Since it's impossible to fix the path of the generated pages, I went further down the i18n generator to see whether the i18n generation could be disabled for localized pages, assuming that I would come back to the wrong paths later.

for (var i = 1; i < languages.length; i++) {
  var l = languages[i];
  var path = l + '/' + page.path;
  if (langPaths.indexOf(path) != -1) {
    continue;
  }
  // original code till here, added the following guard
  if (Array.isArray(page.language)) {
    if (!page.language.includes(l)) {
      continue;
    }
  }

The intention should be quite clear: while looping over the pages, if a page has a language attribute in its front-matter listing several languages, and the current language is not among them, then the page will not be generated in that language.

With this change, it is possible to add two pages and give them complementary language attributes to achieve L10n, at least in the post list. However, since most posts don't have a language attribute in their front-matter, this approach could be quite cumbersome.
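
To make the "complementary attributes" idea concrete, here is a small standalone sketch. It is a simplification under my own assumptions: posts are plain objects, the helper names `isVisibleIn` and `titlesFor` are hypothetical, and every language (including the default one) is treated the same way, which is not what the generator loop above does.

```javascript
// Keep a post for `lang` unless its front-matter `language` list excludes it.
function isVisibleIn(post, lang) {
  if (Array.isArray(post.language)) {
    return post.language.includes(lang);
  }
  return true; // no language attribute: available in every language
}

const posts = [
  { title: 'Hello (Chinese version)', language: ['zh-CN', 'zh-TW'] },
  { title: 'Hello (English version)', language: ['en'] },
  { title: 'Untagged post' },
];

function titlesFor(lang) {
  return posts.filter((p) => isVisibleIn(p, lang)).map((p) => p.title);
}

console.log(titlesFor('en'));
// [ 'Hello (English version)', 'Untagged post' ]
console.log(titlesFor('zh-CN'));
// [ 'Hello (Chinese version)', 'Untagged post' ]
```

The untagged post shows why the approach gets cumbersome: anything without a language attribute keeps appearing in every list until it is tagged by hand.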

Another problem appeared. What we want is a set of different posts under the same title, each serving a different locale. However, Hexo and the i18n generator treat every file as a separate post and generate i18n versions of each of them. This means that if I continue this way, I have to rewrite the whole i18n feature, including Hexo, Fluid and the i18n generator. That is clearly out of scope.

The following example illustrates how Hexo understands my L10n pages:

# File
2023/12/15/en/<xxxxx>.md
2023/12/15/zh-TW/<xxxxx>.md

# Generated
2023/12/15/en/<xxxxx>.md
en/2023/12/15/en/<xxxxx>.md
zh-TW/2023/12/15/en/<xxxxx>.md
2023/12/15/zh-TW/<xxxxx>.md
en/2023/12/15/zh-TW/<xxxxx>.md
zh-TW/2023/12/15/zh-TW/<xxxxx>.md
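
The listing above is simply every file combined with every language prefix. A rough sketch of that behaviour, assuming a hypothetical `generatedPaths` helper and a placeholder file name `post.md` (this is not the generator's real code):

```javascript
// Every source file is treated as an independent post, so each one is
// emitted once as-is and once more under every language prefix.
function generatedPaths(files, languages) {
  const out = [];
  for (const file of files) {
    out.push(file);
    for (const lang of languages) {
      out.push(`${lang}/${file}`);
    }
  }
  return out;
}

const files = ['2023/12/15/en/post.md', '2023/12/15/zh-TW/post.md'];
const generated = generatedPaths(files, ['en', 'zh-TW']);
console.log(generated.length); // 6
console.log(generated.join('\n'));
```

Two localized files and two non-default languages already give six pages, none of which sits at the path that was actually wanted.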

Maybe some more investigation?

While writing this post, I realized that the current i18n generator works in a single direction. For example, it can generate i18n versions of a zh-CN (default language) post, but if a post is written in English, it won't generate the zh-CN version back. Accessing such a post from the main page would therefore result in an error.

This issue was quickly resolved by adding a step for the reverse generation, modeled on the original i18n function:

function i18n_post(locals) {
  // ... original code

  // Storage for pages written in languages other than the default one
  var nonDefault = [];
  // The first configured language is the default one (zh-CN here)
  var defaultLang = languages[0];

  locals.posts.forEach(function (page) {
    // ... original code

    // Check whether the post supports the default lang
    if (Array.isArray(page.language) && !page.language.includes(defaultLang)) {
      nonDefault.push(page);
    }
  });

  // ... original code

  // Similar process, in the reverse direction
  nonDefault.forEach(function (page) {
    var layouts = ['post', 'page', 'index'];

    var copy = {};
    _.extend(copy, page);
    copy.lang = defaultLang;
    copy.__post = true;
    copy.path = page.path;
    _self.log.debug("generate default lang post " + copy.path);
    result.push({
      path: copy.path,
      layout: layouts,
      data: copy
    });
  });
  return result.length > 0 ? result : []; // original code
}

This change was trivial, but it reveals how pages are handled in the i18n generator. So there may still be some possibility of implementing an L10n feature from scratch.

But I also admit that changing library code directly in node_modules is quite horrible. Working on Git-versioned code would be a much more proper way.

Or maybe it would be preferable to switch to a more modern and user-friendly framework?

Conclusion

So after several attempts, it seems there is simply no easy solution for localization within the Hexo framework. I have therefore decided to move on to other subjects.

Building i18n and L10n features on top of an existing codebase is a lot of work, and it is hard to implement them consistently. Maybe it would be preferable to choose a less restrictive approach, such as coding my own site in Vue.js from scratch, but I will consider that later.


http://inori.moe/2024/01/24/failed-l12n-attempt/
Author: inori
Published on January 24, 2024