
Unrestricted Character Encoding for Japanese

The glyphs of the Japanese writing system mainly consist of Chinese characters, of which there are tens of thousands. Because of the number of characters involved, glyph database creation and, more generally, character representation on computer systems have been the focus of numerous research efforts and various software systems. Character information is usually represented in a computer system by an encoding. Some encodings specifically target Chinese characters as used in a given language, for instance Big-5 (Traditional Chinese) and Shift-JIS (Japanese). Others aim at covering several, possibly all, writing systems, for instance Unicode. However, whichever solution is adopted, a significant portion of Chinese characters remains uncovered by current encoding methods. Thanks to the properties and relations featured by Chinese characters, they can be classified into a database with respect to various attributes. In this paper, the formal structure of such a database is first described as a character encoding, thus addressing the character representation issue. Importantly, we show that the proposed logical structure overcomes the limitations of existing encodings, most notably the restriction on the number of glyphs and the lack of coherency in the code. This theoretical proposal is then followed by the practical realisation of the proposed database and the visualisation of the corresponding code structure. Finally, an additional experiment is conducted to measure the memory size overhead induced by the proposed encoding, compared with the memory size required by an implementation of Unicode. Once the files are compressed, this memory size overhead is significantly reduced.
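To make the coverage and size issues described in the abstract more concrete, the following Python sketch uses only the standard library's built-in codecs. It is an illustration under stated assumptions, not the encoding proposed in the paper: it checks whether a character can be represented in Shift-JIS (Python's `shift_jis` codec, based on JIS X 0208), reports its byte length in several Unicode encoding forms, and compares raw and zlib-compressed sizes of a sample text. The sample string, the helper names `encoded_size` and `raw_and_compressed`, and the use of UTF-32 as a stand-in for a wider fixed-width code are all hypothetical choices made for illustration.

```python
import zlib
from typing import Optional, Tuple

CODECS = ("shift_jis", "utf-8", "utf-16-be", "utf-32-be")


def encoded_size(text: str, codec: str) -> Optional[int]:
    """Byte length of `text` in `codec`, or None if the codec cannot represent it."""
    try:
        return len(text.encode(codec))
    except UnicodeEncodeError:
        # The character lies outside the codec's repertoire (e.g. not in JIS X 0208).
        return None


def raw_and_compressed(text: str, codec: str) -> Tuple[int, int]:
    """Return (raw size, zlib-compressed size) in bytes of `text` encoded with `codec`."""
    raw = text.encode(codec)
    return len(raw), len(zlib.compress(raw))


if __name__ == "__main__":
    # 漢 (U+6F22) is in JIS X 0208; 𠮷 (U+20BB7) is not, and lies outside the BMP.
    for ch in ("漢", "𠮷"):
        sizes = {codec: encoded_size(ch, codec) for codec in CODECS}
        print(ch, f"U+{ord(ch):04X}", sizes)

    # Hypothetical, highly repetitive sample; UTF-32 stands in for a wider fixed-width code.
    sample = "漢字" * 500
    for codec in ("utf-8", "utf-32-be"):
        raw, packed = raw_and_compressed(sample, codec)
        print(f"{codec}: {raw} bytes raw, {packed} bytes compressed")
```

On CPython, 漢 takes 2 bytes in Shift-JIS and 3 in UTF-8, whereas 𠮷 (a variant of 吉) cannot be encoded in Shift-JIS at all and needs 4 bytes in UTF-8, UTF-16 (as a surrogate pair) and UTF-32. The compressed sizes depend on the zlib version and on the input, so the sketch prints them rather than predicting them; it only suggests, for a toy input, why a wider code's overhead can shrink once files are compressed.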

IOS Press

Antoine Bossard, Keiichi Kaneko

Frontiers in Artificial Intelligence and Applications

2025

