An Introduction to Japanese Pronunciation

Ryo Furue (Email: my family name AT hawaii PERIOD edu)

I no longer maintain this page, but I leave it here because it seems some people occasionally find it useful.

Recently, I was informed of a brilliant website which shows the standard Japanese pronunciation of any piece of Japanese text you input: Japanese Phonetic Converter – kanji to romaji or furigana


Children learn their mother tongues without learning theories. Why not adults? Why bother to learn theories, you might ask. Unfortunately, we adults can't learn languages as children do. You could try, but that would be inefficient. What we usually do is first learn some theories and then judge our performance against them when we practice. If you are rich, you could hire a "coach" who knows those theories and whose job is to observe your performance and advise you on how to improve, just as professional baseball players hire private coaches. If you don't hire a coach, you need to be your own coach. So, you want to learn the theories.

Here I offer some informal and unauthoritative theories on the pronunciation of the Japanese language. My target audience is American English speakers, but speakers of other languages may also find this guide helpful. I should really cite references; I owe a lot to books I have read. It's simply out of laziness that I don't try to determine which book says what. As such, I can't guarantee the correctness of the information given here.

0. The five vowels

Here is the minimum you need to know about Japanese vowels. I postpone a more detailed description, since there are much more important things than accurate knowledge of each Japanese sound.

The Japanese language has only five vowels. Each vowel is transliterated as one of the five vowel letters of the Roman alphabet. In Japanese written in the Roman alphabet, they are "a", "i", "u", "e", and "o".

The most important thing to note is that each letter almost always represents one single vowel sound. (There's one minor exception for "i" and another for "u".) In English, the "i" in "sit" and the "i" in "site" represent quite different vowels. This type of wild variation never occurs in Japanese.

1. Each syllable has the same length and strength.

This rule is the simplest but by far the most important. Of course, even in Japanese, important syllables are pronounced more strongly and can be slightly longer than unimportant ones. But the distinction between stressed and unstressed syllables is much smaller than in English, so much so that it's better to think of Japanese syllables as having equal lengths and strengths. You can learn the subtle variations after you become proficient in Japanese.

A precise definition of a "syllable" will be given shortly. For the moment, let's pretend that we already know what a syllable is in Japanese.

An effective method of practicing this rule is to clap your hands while you pronounce Japanese words. For example, say "yokozuna" (a sumo champion). This word consists of four syllables: yo-ko-zu-na. So, clap your hands four times: clap-clap-clap-clap, keeping the same interval between the claps. Do it again and say yo-ko-zu-na at the same time so that each clap coincides with one syllable. Also, try to say all the syllables evenly with equal strength.

2. There are no diphthongs in Japanese.

A diphthong is a slide from one vowel to another as in the English word "rain". The vowel in this word is written as [ei] in the International Phonetic Alphabet (IPA). This [ei] is considered a single vowel, a slide from an e-vowel to an i-vowel. Therefore "rain" has only one syllable although it has two vowel letters. The Japanese language doesn't have any diphthongs. Two consecutive vowel letters simply indicate two separate vowels and hence two separate syllables. For example, the word "Inoue" (a common family name) is pronounced as four syllables: i-no-u-e, with four hand claps: clap-clap-clap-clap (see rule 1). Consequently, long sequences of vowels aren't uncommon in Japanese words, such as "Aioi" (a-i-o-i, a place name) and "aoi ie" (a-o-i-i-e, a blue house).

By the way, the astute reader (yes, that's you) may ask this question: How can you tell that "Inoue" should be divided as i-no-u-e? Why not in-o-u-e? A good question. That's the theme of the next three sections.

3. All Japanese syllables are open.

An open syllable is one which ends with a vowel. A syllable ending with a consonant is---, that's right, a closed syllable. For example, "get", "an", "cast", and "sports" are closed, and "sky", "knee", "we", and "a" are open.

The Japanese language has only open syllables, with two exceptions, which will be explained in the next two sections. Since all syllables are open, you can in many cases unambiguously divide Japanese words written in the Roman alphabet into syllables. For example, "yokozuna" should be yo-ko-zu-na; other divisions would contain one or more closed syllables: yok-o-zu-na, yo-koz-un-a, etc.
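The division rule above can be sketched in a few lines of Python. This is only an illustration, not an authoritative algorithm: the function name `split_open_syllables` is my own invention, and the sketch knows nothing but rule 3 (zero or more consonant letters followed by exactly one vowel letter), so it ignores the two exceptions explained in the next two sections.

```python
import re

def split_open_syllables(word):
    """Divide a romanized word into open syllables (rule 3 only).

    Assumption: a syllable is any run of non-vowel letters followed by
    exactly one of the five vowel letters. The syllabic N and double
    consonants are deliberately not handled here.
    """
    word = word.lower()
    pieces = re.findall(r"[^aiueo]*[aiueo]", word)
    # If letters are left over (e.g. a final consonant), the word cannot
    # be covered by open syllables alone.
    if "".join(pieces) != word:
        raise ValueError(f"{word!r} is not made of open syllables only")
    return pieces

print("-".join(split_open_syllables("yokozuna")))  # yo-ko-zu-na
print("-".join(split_open_syllables("Aioi")))      # a-i-o-i
```

Because every syllable must end at the first vowel it meets, there is never a choice to make, which is why the division is unambiguous for words like these.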

4. The special syllable "N"

One of the two exceptions to the preceding rule is the N-syllable. In Japanese, the sound of "n" in certain circumstances forms a syllable by itself. For example, "kanji" (a Chinese character) has three syllables: ka-N-ji. (As in this example, I'll indicate this syllable with a capital "N" in what follows.)

The word "kanji" is unambiguously divided into syllables because of rule 3 ("kan" cannot be a single syllable because it is closed). On the other hand, there are words which cannot be uniquely divided into syllables from their Roman transliterations alone. For example, "Inoue" could be i-N-o-u-e instead of i-no-u-e. This is one of the shortcomings of Roman transliteration. If the word is written in Japanese characters, there is no such ambiguity.

There are two schemes to avoid this ambiguity. One is to use an apostrophe as in "Shin'ichi" (a common given name), which is unambiguously divided into syllables as shi-N-i-chi: the apostrophe is there to prevent the division shi-ni-chi. The other scheme is to use a hyphen: "Shin-ichi". Unfortunately, these conventions aren't always followed, so you need to be aware that a spelling like "kanan" can be ka-N-a-N or ka-na-N. (In fact, both are possible. Ka-N-a-N means "to take into account" and ka-na-N is the Japanese pronunciation of a Chinese district.)
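To see where the ambiguity comes from, here is a rough Python sketch that lists every division a spelling allows. Everything in it is an assumption made for illustration: the function name `divisions`, the simplified table of syllable-initial consonants in `OPEN`, and the rule that a syllabic N may appear only right after a vowel. It doesn't handle double consonants or long-vowel spellings.

```python
import re

# Assumed, simplified pattern for one open syllable in common romanization:
# an optional consonant (or consonant + y, or sh/ch/ts) plus one vowel.
OPEN = re.compile(r"(?:[kgsztdnhbpmr]y|sh|ch|ts|[kgsztdnhbpmyrwfj])?[aiueo]")

def divisions(word):
    """Return every way to divide `word` into open syllables and N."""
    word = word.lower()
    found = []

    def walk(i, syllables):
        if i == len(word):
            found.append("-".join(syllables))
            return
        if word[i] in "'-":
            # An apostrophe or hyphen is accepted only right after N.
            if syllables and syllables[-1] == "N":
                walk(i + 1, syllables)
            return
        # Choice 1: read "n" as the syllabic N (only after a vowel).
        if word[i] == "n" and i > 0 and word[i - 1] in "aiueo":
            walk(i + 1, syllables + ["N"])
        # Choice 2: read an ordinary open syllable starting here.
        m = OPEN.match(word, i)
        if m:
            walk(m.end(), syllables + [m.group()])

    walk(0, [])
    return found

print(divisions("kanan"))      # ['ka-N-a-N', 'ka-na-N']
print(divisions("Shin'ichi"))  # ['shi-N-i-chi']
```

Without the apostrophe, "Shinichi" yields two divisions; with it, only one survives, which is exactly the job the apostrophe is meant to do.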

An important question is, what N-sound should we use? I'll answer this question in an appendix. For the moment, let's use the English n-sound.

5. A double consonant is a syllable.

The other exception to rule 3 is syllables consisting of double consonants. What's a double consonant? Read aloud the (nonsensical) phrase "seat top". Compare it with "sea top". What's the difference? The time spent pronouncing the "tt" in "seat top" is approximately twice as long as that spent on the "t" in "sea top".

To understand this phenomenon, we first need to examine how a single [t] sound is produced. Open your mouth and release air from your lungs; the air flows freely through your mouth and exits. Next, block your airway using your tongue and upper teeth. That's the initial position of your speech organs before the articulation of a [t] sound. Gently apply pressure from your lungs. Because your airway is blocked, the air can go nowhere and so pressure builds up behind your tongue in your mouth. Then, release the blockage at the tip of your tongue. Because of the pressure, air rushes through the narrow gap between the tip of your tongue and your upper front teeth. The noise so created is the sound of [t]. The most important thing to note here is that there are two phases in the articulation of [t]: (1) the building-up of pressure and (2) its release. Such a sound is called a "plosive" in phonetics. (This is an imprecise definition, sorry. If I find the time, I'll give a more precise definition later. In this discussion, however, this imprecise one will do.) There are several plosives in English and Japanese. For [p], you block your airway with your lips, but otherwise it's the same as [t]. For [k], you block your airway using the back of your tongue and the upper part (ceiling) of your mouth.

Note that phase 1 of a plosive can be indefinitely prolonged. Block your airway, apply pressure, and freeze. You can stay in that state at least for a few seconds. (If you are a good swimmer, you can do that even for a minute.)

Now let's examine what's going on for the "tt" in "seat top". We recognize that the first "t" of "tt" has only phase 1 of [t]. The release of pressure occurs only at the end of the second "t". In other words, instead of making two separate real t's, we actually make a single "t" but indicate the existence of two t's by lengthening phase 1. That's why the "tt" of "seat top" is about twice as long as the "t" in "sea top" although only one [t] is produced in fact.

Perhaps you have by now guessed why I'm explaining all this. The Japanese language treats the double plosive "tt", "kk", "pp", etc. as a syllable. For example, "kakko" (parenthesis) has three syllables: ka-k-ko, clap-clap-clap. The second syllable consists entirely of phase 1 of the [k] sound, which is silence. In contrast, "kako" (the past) consists of two syllables: ka-ko. Although the second syllable of "kakko" is silent (no sound produced), we discern the presence of the syllable thanks to rule 1. The absence of sound for a fixed interval of time unambiguously indicates the presence of the syllable. The word "kakko" can be distinguished from "kako" solely by this presence of the silent second syllable! I think you've now understood how important rule 1 is for the Japanese language.
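The clap counts for "kakko" and "kako" can be checked with a small Python sketch. It is only a sketch under stated assumptions: the function name `claps` and the simplified syllable pattern `OPEN` are mine, a doubled consonant letter is taken as the only sign of a silent syllable, and the syllabic N is not handled at all.

```python
import re

# Assumed, simplified pattern for one open syllable.
OPEN = re.compile(r"(?:[kgsztdnhbpmr]y|sh|ch|ts|[kgsztdnhbpmyrwfj])?[aiueo]")

def claps(word):
    """Divide `word` into syllables, one per hand clap.

    The first letter of a doubled consonant becomes a syllable of its
    own: the silent phase 1 of the plosive.
    """
    word = word.lower()
    syllables = []
    i = 0
    while i < len(word):
        # A consonant letter (other than n) immediately repeated.
        if word[i] not in "aiueon" and word[i + 1:i + 2] == word[i]:
            syllables.append(word[i])
            i += 1
            continue
        m = OPEN.match(word, i)
        if m is None:
            raise ValueError(f"cannot divide {word!r} at position {i}")
        syllables.append(m.group())
        i = m.end()
    return syllables

for w in ("kakko", "kako"):
    parts = claps(w)
    print(w, "-".join(parts), len(parts))
# kakko ka-k-ko 3
# kako ka-ko 2
```

The two words differ only in the count, which is the point of rule 1: the extra clap is the whole difference.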

6. A long vowel comprises two syllables.

This rule is quite simple. Each of the five Japanese vowels has a long and a short version. The long version comprises two syllables, and so, in accordance with rule 1, the long version is precisely twice as long as the short one.

Perhaps, we can regard this rule as a corollary to rule 2. In fact, the long vowel can be viewed as a sequence of two vowels of the same kind. For example, "Iida" (a common family name) consists of three syllables: i-i-da. The "ii" of "Iida" can be viewed as the long version of "i" or as two consecutive short i's. These two views are equivalent: "ii" comprises two syllables (clap-clap) and is pronounced without any break between the two i's. The word "Iida" can be distinguished from "Ida" (another common family name) thanks to rule 1.

Here's a list of how the long version of each vowel is represented in Roman transliteration:

In section 0, I said that each vowel letter represents exactly one vowel, with a minor exception for the letter "i" and another for "u". As the list above shows, "ei" is pronounced e-e, not e-i. This is the only instance where the letter "i" doesn't represent the [i] vowel. Also, the combination "ou" is pronounced o-o, not o-u. This is the only exception for the letter "u". But these exceptions are minor because if you really pronounce "ei" as e-i, that's passable; your interlocutor may not even notice. Likewise for "ou". As you can see from the list above, there's no universally accepted method of indicating long vowels except for "ei" and "ii". I'll give more details on this matter in an appendix.


The most important things have now been covered. This section explains each Japanese sound in more detail.

(t,d), (k,g), (s,z), (p,b)

The good news is that most Japanese consonants are the same as English ones. In particular, these eight sounds are almost identical or very similar between the two languages.

The bad news is that the spellings are sometimes somewhat confusing. [TO BE WRITTEN.]

The letter "y" following another consonant letter

English doesn't use the letter "y" in the way Japanese does, although English does have the same "phenomenon". Say "you" aloud and then "oo" (as in "fool"). What's the difference? The difference is obvious: the presence or absence of the initial y-sound, of course. In the International Phonetic Alphabet, this y-sound is written as [j], so the pronunciations of these two words are written as [ju:] and [u:], respectively. Next, say "few" and "foo" aloud. We have the same difference: [fju:] and [fu:].

The Japanese language has similar pairs, and the spellings for them are more systematic. For example, "Tokyo" is pronounced [to:kjo:] (to-o-kyo-o, four syllables) and "toko" (to submit a manuscript) is pronounced [to:ko:] (to-o-ko-o, four syllables).

This kind of y-sound following another consonant can be difficult for English speakers. For example, there's no such combination as "ry" in English. To practice this combination, first say "yo" (as in "yoke") and "o" (as in "oh") alternately. And then, say "ryo" and "ro" alternately.


A1. The pronunciation of the syllabic "N"

A1.1. Before a vowel

English-speaking people have great difficulty in pronouncing a syllabic N followed by a vowel. For example, "Shin'ichi" should be pronounced shi-N-i-chi, not shi-N-ni-chi. That is, there shouldn't be an [n] sound at the beginning of the third syllable. How is that possible?

One solution is to briefly stop your voice just after the N syllable, as Germans do before the vowel starting a word. ("Guten Abend" would sound like "Gute Nabend" if pronounced by an English speaker.) This strategy will do in Japanese, although it's not common at all.

The usual pronunciation of the syllabic N before a vowel is a nasal vowel. To understand what a nasal vowel is, we first need to understand what ordinary vowels are and what nasal consonants are.

Air from the lungs has two potential exits: the mouth and the nose. When we pronounce a normal vowel, we close the "door" (uvula) to the nasal cavity, so that the air flows only through the mouth. On the other hand, when we pronounce the English [n] and [m], we block the airway with the tongue and upper teeth (for [n]) or with the lips (for [m]). At the same time, we open the door to the nasal cavity, so that the air exits solely through the nose. Now, what if we open the "door" (uvula) and the mouth at the same time? That's the nasal vowel. The air exits through both the mouth and the nose. The "vowel" so produced takes on a color or hint of an [n].

The N in shi-N-i-chi is actually the nasal version of [i] because it follows a real vowel [i]. If you produce the real [n] by blocking the mouth airway by the tongue and upper teeth, then the next syllable will inevitably become ni, instead of the plain i.

A1.2. Before "k" or "g"

This pronunciation of the syllabic N is easy for English speakers because it is exactly what they are doing in speaking English. Consider the English word "ankle". How is the "n" in it pronounced? Pronounce "Anne" (the female name) and "ankle" alternately and observe the difference in the positions of your tongue when you pronounce the "n" parts of the words. [TO BE CONTINUED]

A2. How to spell long vowels?

The biggest shortcoming of the Roman transliteration of Japanese is that there's no universally accepted method of indicating long vowels except for "ii". The Ministry of Education of Japan once endorsed the use of an overbar (a short horizontal bar placed over the vowel letter), but it never saw wide use.

For the vowels [a] and [o], there are three common schemes.

One of the most frequently used methods is just to give up indicating the length! For example, "Ryo" (my first name) could be ryo-o (two syllables with a long "o") or ryo (one syllable with a short "o"). From the spelling alone, you can't tell which. It's in fact a long "o", but I've given up indicating it. "Tokyo" is the same: both o's are in fact long (which makes this word consist of four syllables: to-o-kyo-o). "Kyoto", on the other hand, is Kyo-o-to.

Another widely used scheme is to add an "h" after "a" and "o" as in "rahmen" for ra-a-me-N (a Japanized Chinese noodle) and "Endoh" for e-N-do-o (a common family name). This scheme works in many cases, but can result in ambiguity as in "Ohita" (a place name): Is it o-o-i-ta (with a long "o") or o-hi-ta (with a short "o" plus a syllable "hi")?
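The ambiguity of the "h" scheme can be made concrete with a rough Python sketch that tries both readings of every "h" after "a" or "o". The function name `readings` and the simplified syllable pattern `OPEN` are assumptions made for this illustration only. To keep it short, the sketch recognizes the syllabic N only where it is unambiguous (at the end of a word or before a consonant other than "y"), and it ignores double consonants.

```python
import re

# Assumed, simplified pattern for one open syllable.
OPEN = re.compile(r"(?:[kgsztdnhbpmr]y|sh|ch|ts|[kgsztdnhbpmyrwfj])?[aiueo]")

def readings(word):
    """List the divisions allowed when "ah"/"oh" may mark a long vowel."""
    word = word.lower()
    found = []

    def walk(i, syllables):
        if i == len(word):
            found.append("-".join(syllables))
            return
        # Syllabic N, unambiguous cases only: not followed by a vowel or y.
        if word[i] == "n" and word[i + 1:i + 2] not in ("a", "i", "u", "e", "o", "y"):
            walk(i + 1, syllables + ["N"])
            return
        m = OPEN.match(word, i)
        if m is None:
            return  # dead end: this reading doesn't fit the spelling
        syl, j = m.group(), m.end()
        # Reading 1: whatever follows starts a new syllable.
        walk(j, syllables + [syl])
        # Reading 2: an "h" after "a" or "o" is a length mark, so the
        # vowel is repeated as a second syllable and the "h" is skipped.
        if syl[-1] in "ao" and word[j:j + 1] == "h":
            walk(j + 1, syllables + [syl, syl[-1]])

    walk(0, [])
    return found

print(readings("rahmen"))  # ['ra-a-me-N']
print(readings("Ohita"))   # ['o-hi-ta', 'o-o-i-ta']
```

"Rahmen" and "Endoh" come out with a single reading because the "h" can't begin a syllable there; "Ohita" comes out with two, and the spelling alone can't tell you which one is meant.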

Another scheme uses two vowel letters as in "raamen" and "Endoo". "Raamen" is OK, but "Endoo" would be pronounced wrongly by English speakers.

For the long [o], there's yet another scheme: "ou". This is the exception for the letter "u" mentioned in section 0. The combination "ou" is pronounced o-o (long o), not o-u. But I said the exception is a minor one, because even if you pronounce o-u instead of o-o, you'll sound OK.

For the vowel [e], the long version is most often written as "ei". For example, "sensei" (teacher, doctor, or professor) is pronounced se-N-se-e with a long "e", not se-N-se-i. This is the exception for the letter "i" I mentioned in section 0. But, remember I said this exception is a minor one. It is minor because even if you really say se-N-se-i instead of se-N-se-e, you will sound OK. Your Japanese interlocutor may not even notice that your pronunciation isn't quite right. I would say that the spelling "ei" should be pronounced e-e not e-i, which is substandard, but that the latter isn't odd, either.

For the long [u], we usually give up indicating that it's long and simply write "u". So, "Kyushu" (a place name) is kyu-u-shu-u.

You see, a lot of the confusion is due to the Roman alphabet being used to write Japanese.