Last year I wrote a post on how to write Japanese on a Western-style keyboard. You can find that HOWTO along with some fascinating backstory here. As I was already comfortable with Hangul (Korean), that meant that of the “big 3” Asian languages, I had a (very) basic comprehension of how to write 2 of them. Well, I figured today seemed like a great day to tackle what is perhaps the most daunting one of all. That’s right, this HOWTO is going to cover how to write Chinese on your Western keyboard! However, before we can delve into the mechanics of how the process works, we need to understand a little bit of the history.
Disclaimer: Everything you are about to read is based on my time spent with Google and Wikipedia. I’ve almost certainly got some or even much of this wrong so if you do see something I have incorrect, please let me know in the comments and I can update it.
First and foremost, it’s important to know that according to Wikipedia, “Chinese” is not one cohesive language but rather a collection of between 7 and 13 “dialects” that collectively form the “language family” that Westerners tend to think of as Chinese. Of these dialects, by far the most common is one you’ve almost certainly heard of: “Mandarin.” But that’s not the only one. There are others such as “Wu”, “Yue”, and “Min”. What’s fascinating about these is that they are described as “mutually unintelligible”. That is to say, despite sharing similar roots, a speaker of one dialect will not be able to readily communicate with a speaker of another. Think Spanish and Italian: same Latin root, but a Spanish speaker travelling to Italy would have difficulty communicating.
“Standard Chinese”, that is to say the “official” language adopted by the Chinese government, is a form of Mandarin based on the “Beijing dialect” (which is also known as Pekingese). It is the standard adopted by the “People’s Republic of China” (PRC) as well as the “Republic of China” (aka Taiwan). It’s also one of four official languages in Singapore. Another variety of Chinese is Cantonese, which is considered the “prestige” form of Yue and is the form of Chinese used officially in Hong Kong. The important thing to take away from this is that, at least as far as my research has revealed for the level of detail I require for this post, all of the languages above use effectively the same written characters. That is to say, if you know what you want to say, you should be able to follow the steps below to write your electronic message in any of the possible Chinese dialects.
The next thing to keep in mind is that, like Korean and Japanese, a character generally represents one spoken syllable. That is to say, each “block” should represent one syllable. Also like Korean and Japanese, but seemingly taken much further, a block/character may be a word on its own or part of a polysyllabic word. As you’ve likely heard, Chinese doesn’t have an alphabet in the sense that Westerners are familiar with. Often, a character (aka a series of strokes) represents an entire meaning on its own. Put another way, each character is an atomic unit and cannot be further broken down. It is estimated that there are over 60,000 unique characters in the Chinese language. The good news is that research has suggested that a college-educated Chinese citizen only needs to know about 3,000 to 4,000 characters to be considered fluent.
While not directly related to this HOWTO, an interesting by-product comes from this approach to language. No alphabet means no way of readily organizing a list of words and definitions in a way that can be quickly referenced. In other words, dictionaries become exceptionally difficult to build and as a result many different approaches are used. One approach works by sorting by “radicals”, aka the roots of a given word. One such system is called the “Kangxi system” and was created in the 1700s. It breaks the language down into 214 discrete radicals. A full list of those radicals can be found here. Another method is organizing words by the number of strokes they contain. With so many characters though, that means you’ll have hundreds or even thousands of words per number of strokes. I must admit that I have a new appreciation for Chinese students, especially those who had to do language homework before the arrival of the computer!
If you’ve read this far, there should be a really obvious question: If there are tens of thousands of characters and each one is atomic, how the heck can you write the language on a keyboard? Forget if the keyboard is designed for Roman alphabets or not, how would you write the language on any keyboard that didn’t have thousands of keys on it? Based on my research it appears that as computers started to be developed, the Chinese came to the same conclusion. Their solution was instead to find a way to directly bridge the Roman Alphabet with Chinese characters so that the existing computer infrastructure could be easily leveraged. In Korean and even in Japanese by contrast, it’s possible technically to have an exclusively domestic language keyboard where the Romanized keys are not even present. This does not appear to be the case with Chinese however.
Instead, in the 1950s the Chinese developed what is essentially a built-in Roman to Chinese conversion mechanism called Hanyu Pinyin, which is now the official way to convert between Chinese characters and the Roman alphabet. That is to say, if you need to convert a Chinese name into English, you’d use Pinyin to do it. It’s also the method used to enter Chinese characters into a computer, which will be our focus today.
Let’s look at an example, shall we, and use one of the more famous introductory language words: apple.
In Simplified Chinese, this looks like: 苹果
This is broken up into two syllables which (very roughly) transliterate into “ping” and “guo”. Therefore, to write this word in Chinese on a computer, you literally type “pingguo” and the IME (Input Method Editor, discussed in more detail in my Japanese HOWTO) will look up and insert the corresponding Chinese characters. Below is a chart of the 412 possible combinations of sounds that from what I understand can ultimately be used to make any Chinese character and therefore even Chinese speakers who have no interest in learning English whatsoever must still memorize this chart.
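To make the lookup step concrete, here is a minimal sketch of what an IME does when you type “pingguo”. It is only an illustration: the names `CANDIDATES`, `lookup`, and `pick` are made up for this post, and the tiny table is hand-written, whereas a real IME ships a far larger dictionary and ranks its candidates.

```python
# Toy lookup table. The entries are illustrative and not taken from any
# real IME dictionary.
CANDIDATES = {
    "pingguo": ["苹果"],
    "wo": ["我", "握", "窝"],
    "xue": ["学", "雪", "血"],
    "putonghua": ["普通话"],
}


def lookup(pinyin):
    """Return the candidate character strings for toneless pinyin input."""
    key = pinyin.lower().replace("'", "")
    return CANDIDATES.get(key, [])


def pick(pinyin, choice=1):
    """Return the chosen candidate; choice 1 is the default suggestion."""
    options = lookup(pinyin)
    if not options:
        # Nothing matched, so leave the typed letters untouched.
        return pinyin
    return options[choice - 1]


if __name__ == "__main__":
    print(pick("pingguo"))
    print(lookup("xue"))
```

Because several characters share the same toneless spelling, the lookup returns a list rather than a single answer, which is why the input method has to offer you a choice.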
That’s the high level so now let’s delve into the details. Note that for my screenshots and examples below, I am using an English language copy of Windows 8.1 Professional.
- First we need to add the Chinese language to our machine. To do this go to Start / Control Panel and select Language. Then choose Add a language. In my screenshot below you’ll note that I’ve previously added Japanese and Chinese.
- When you add the language, you’ll have the option to include Chinese (Simplified) or Chinese (Traditional). Note that based on my research, the handwriting recognition components are only installed if you install the Traditional language (unless, oddly, you have a copy of Windows purchased in Taiwan).
Actually, let’s take a moment to discuss politics. Language is forever intertwined with the politics and culture of a country and with one as old and as storied as China, that has produced some rather complex rules and oddities. Take the screenshot below for example. You’ll note that Windows makes reference to “Hong Kong SAR”. It took me a while to figure out what SAR meant but turns out it stands for “Special Administrative Region”. I’m not going to open the can of worms that is the “one country” debate in the region but it’s interesting to see how new language terms have to be invented to try to accommodate these political realities.
- Once the language is installed, you’ll have access to an Options menu. We don’t need to change anything in there but I wanted to show it to you anyway to give you an idea of the kinds of options that the developers felt were necessary to include.
- The two most interesting options to me are the Cloud Input Method and the Fuzzy pinyin. It turns out that due to the sheer number of characters and the massive number of people that use the language, the rules are not quite as absolute for some aspects of the language as you might expect. From what I understand, the fuzzy pinyin is a means of expanding the number of matching Chinese characters displayed based on what is entered. In other words, if you enter “in” or “ing”, the characters matching either spelling would be included in the results. The cloud input method appears to effectively be a kind of reverse spell check where common pinyin misspellings are mapped to the correct Chinese characters.
- Once you have the Chinese language installed, you’ll have the language bar in your system tray, which should default to display ENG. If you click on it, you’ll have the option to select the new Chinese IME you have installed.
- Once you’ve got that (and provided you also installed Traditional Chinese), you’ll have two options for Chinese character input: Pinyin and handwriting recognition. Let’s look at the latter first. Obviously if you have a tablet or a touch-enabled device, this would be easier to work with, but even with a mouse it can still be functional. You can bring up the handwriting option by first selecting the touch keyboard on your system tray (which, if not present, can be brought up by right clicking on the taskbar and choosing Toolbars / Touch Keyboard) and then selecting the icon that looks like a pen writing on a screen as shown below.
- At this point you can use your mouse (or finger or stylus as available) to write the desired Chinese characters. As you can see below, here is my attempt at writing “apple”. The recognition algorithm is surprisingly good.
- As you can see below, it correctly matched the second character but also gives you a string of other possible matches along the top.
- This next piece is also optional but I felt it was important to include. Chinese is a very tonal language and as such, what sounds like the same word to an English speaker can actually be several different words, each written differently.
- Pinyin accounts for this by making heavy use of “accents”, aka those little ticks you see on the letter ‘e’ in French for example. The critical thing to remember here is that the Chinese have for all intents and purposes repurposed the Roman alphabet for their own purposes. The shapes may match what you recognize but that was only to make their life easier in integrating with existing computers. The sounds of these letters are often quite different from what you and I know.
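Going back to the Fuzzy pinyin option for a moment, the idea can be sketched in a few lines. This is my own guess at the behaviour and not how Microsoft’s IME is actually implemented: only the in/ing pair comes from the options described above, the other pairs are assumed for illustration, and `FUZZY_FINALS`, `TABLE`, `fuzzy_variants`, and `fuzzy_lookup` are made-up names.

```python
# Endings treated as interchangeable. Only in/ing is mentioned in this
# post; the other two pairs are assumptions added for illustration.
FUZZY_FINALS = [("in", "ing"), ("en", "eng"), ("an", "ang")]

# Toy candidate table, not taken from a real IME dictionary.
TABLE = {
    "pin": ["品", "拼"],
    "ping": ["平", "瓶"],
}


def fuzzy_variants(syllable):
    """Return the syllable plus any spelling with a swapped fuzzy ending."""
    variants = {syllable}
    for short, long_ in FUZZY_FINALS:
        if syllable.endswith(long_):
            variants.add(syllable[: -len(long_)] + short)
        elif syllable.endswith(short):
            variants.add(syllable + "g")
    return sorted(variants)


def fuzzy_lookup(syllable, table):
    """Collect the candidates for every fuzzy spelling of the syllable."""
    results = []
    for variant in fuzzy_variants(syllable):
        results.extend(table.get(variant, []))
    return results


if __name__ == "__main__":
    print(fuzzy_variants("pin"))
    print(fuzzy_lookup("pin", TABLE))
```

With fuzzy matching switched on, typing “pin” would also offer the characters normally filed under “ping”, at the cost of a longer candidate list.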
Let’s look at an example that brings together everything we’ve learned. I want to write the sentence “I am learning Mandarin” in Mandarin on my computer. How do I do that? Well, first I found a website that included the phrase as well as an audio file of someone saying it. That can be found here.
Next, I used Google Translate to break down each of the base characters / syllables into their English meanings. Fortunately with a simple sentence like this, they broke up quite naturally into a 1 to 1 relationship. Remember that is not going to happen often. That leaves us with this chart:
You’ll note the section for the “other possible intonations”. The link above includes how each of those sounds and, as you can see, each word has 4 different possible accents which are defined by the type of tick they have on their vowel. Unicode does define these Pinyin tone marks, but a standard Western keyboard layout gives you no easy way to type them, so I found a guy who wrote a small application that allowed me to easily insert them. That can be found at http://pinyintones.codeplex.com/. Once it’s installed, you’ll have a new IME called Japanese PinyinTones. Why “Japanese”? The developer says it’s to get around a bug when using the application with Microsoft Word. Anyway, once this is installed and this IME is activated, if you type any one of the 412 valid Pinyin combinations and then press a number from 1 through 4 on your keyboard, the proper accent will be automatically added.
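If you’re curious how a tool like that might turn a number into an accent, here is a rough sketch. It is not the PinyinTones source code: `add_tone` and `TONE_MARKS` are names I made up, and the placement rule is the commonly taught one (an “a” or “e” takes the mark, “ou” marks the “o”, otherwise the last vowel does). It uses Python’s standard `unicodedata` module to attach a combining accent and compose the result.

```python
import unicodedata

# Combining diacritics for tones 1 to 4: macron, acute, caron, grave.
TONE_MARKS = {1: "\u0304", 2: "\u0301", 3: "\u030c", 4: "\u0300"}


def add_tone(syllable, tone):
    """Place a tone mark on a toneless pinyin syllable such as 'xue'."""
    if tone not in TONE_MARKS:
        return syllable
    lowered = syllable.lower()
    # Commonly taught rule: 'a' or 'e' takes the mark, 'ou' marks the 'o',
    # otherwise the last vowel in the syllable takes it.
    if "a" in lowered:
        index = lowered.index("a")
    elif "e" in lowered:
        index = lowered.index("e")
    elif "ou" in lowered:
        index = lowered.index("ou")
    else:
        positions = [i for i, ch in enumerate(lowered) if ch in "iouü"]
        if not positions:
            return syllable
        index = positions[-1]
    marked = syllable[: index + 1] + TONE_MARKS[tone] + syllable[index + 1 :]
    # NFC folds the base letter and the combining mark into one character.
    return unicodedata.normalize("NFC", marked)


if __name__ == "__main__":
    for syllable, tone in [("wo", 3), ("xue", 2), ("pu", 3), ("tong", 1), ("hua", 4)]:
        print(add_tone(syllable, tone))
```

Typing “wo” followed by 3 would therefore give wǒ, which is the same effect as the number keys in the PinyinTones IME described above.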
Let’s put it all together now!
It’s time to bring home everything we learned and write our first sentence in Chinese! Remember, our goal is to write “I am learning Mandarin” – in Mandarin.
- Open up your word processor. In my case I’m just using WordPad.
- Make sure your IME is set to use the Simplified Chinese language in the system tray as shown here:
- Because we are diligent students, we have memorized that the word for “I” in Mandarin is 我 and is pronounced roughly like “Wha”. In Pinyin, this is written as wo so type that into your word processor.
- A popup will appear. If the correct character is displayed, press Space to accept it (note that Chinese doesn’t rely on spaces to separate words the way English does).
- Next type in xue which is the Pinyin for “learning”. If the default character is the wrong one, press the corresponding number on your keyboard to insert the correct character.
- Now type putonghua which is the Pinyin for “Mandarin”. (Note: The apostrophe separators are inserted automatically and will clear automatically once you insert the Mandarin.)
Congratulations, you are now learning Mandarin!
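As an aside, those automatic apostrophe separators are the IME splitting what you typed into valid syllables. Here is a minimal sketch of one way that could work, using a greedy longest match. This is my assumption and not a description of the Windows IME: `SYLLABLES` is a tiny hand-picked subset of the full chart and `segment` is a made-up name.

```python
# A small hand-picked subset of valid pinyin syllables. The full chart
# is much larger.
SYLLABLES = {"wo", "xue", "pu", "tong", "hua", "ping", "pin", "guo"}
MAX_LEN = max(len(s) for s in SYLLABLES)


def segment(text):
    """Split toneless pinyin into syllables using greedy longest match."""
    parts = []
    i = 0
    while i < len(text):
        # Try the longest possible chunk first, then shrink it.
        for size in range(min(MAX_LEN, len(text) - i), 0, -1):
            chunk = text[i : i + size]
            if chunk in SYLLABLES:
                parts.append(chunk)
                i += size
                break
        else:
            raise ValueError(f"no valid syllable at position {i}: {text[i:]!r}")
    return parts


if __name__ == "__main__":
    print("'".join(segment("putonghua")))
    print("'".join(segment("pingguo")))
```

Greedy matching is the simplest choice but it can guess wrong when a spelling is ambiguous, so a real IME presumably does something smarter.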
BONUS: How to make your entire Operating System in Chinese!
While not required to write Chinese characters, what if you wanted to make your entire operating system Chinese but you didn’t buy your computer in a Chinese-speaking country? Unlike in previous versions of Windows, it is now possible to completely change the core language of the OS without reinstalling Windows. Here’s what you need to do:
- First you need to download the “MUI” or “Multilingual User Interface” for your desired language. Someone was nice enough to link to all of the available MUI packs here:
http://winaero.com/blog/download-official-mui-language-packs-for-windows-8-1-windows-8-and-windows-7/ (Warning, these files can be hundreds of megabytes in size)
- Download the file you need from that site and note that it is a .CAB file.
- From Windows, open a Run prompt and type lpksetup.
- Choose Install Display Languages and choose Browse. Find the file you downloaded and click on it.
- Note that this will take several minutes or even much longer to install so be patient.
- Once it’s installed, go to Control Panel / Language and choose Advanced settings on the left hand pane.
- Under Override for Windows display language, choose the language from the pack you downloaded. You will be prompted to log out and back in again. Do so.
Congratulations, your Windows operating system is now running in your desired target language!