International Enhancements in Java SE 6
One important strength of the Java
Resource Control and Access

To provide localized resources in applications, programmers should use resource bundles as defined by the java.util.ResourceBundle
class. This class initiates the searching and loading of localized resources when you invoke its static getBundle
method. The method returns a ResourceBundle
instance that is responsible for providing the localized text, images, and other elements for a target locale. A locale is a cultural identifier defined by a language and geographical region.
Although the default algorithms for searching and loading bundles are well defined, Java SE 6 more clearly specifies resource caching and provides you more control over how your applications search and load localized resources. Applications should continue to use ResourceBundle
methods to retrieve resources, but new features allow you great flexibility in how and where you store the actual content that ResourceBundle
objects provide.
Before Java SE 6, programmers usually stored localized content either in a subclass of ListResourceBundle or in a properties file. Now, however, you can specify different formats for your resource files. You can, for example, create and use an XML-formatted resource file. You might also decide to change the default naming scheme for localized files. This extra control is available from custom ResourceBundle.Control classes that you can implement.
The ResourceBundle.Control
class exposes the major steps of the bundle-loading process. Each step has a separate method in the class. You can override those methods to provide customized strategies for searching, loading, and caching resources. Because the Control
class defines methods that implement the existing default strategies, you have to implement only the customized functionality that you want for your particular subclass. By providing your own Control
subclass to the getBundle
method, you control exactly how your application finds and uses localized resources.
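As an illustration of the XML-formatted resource files mentioned earlier, here is a minimal sketch of a custom Control that loads bundles from XML properties files. The class names XmlControl and XmlBundle and the format string "xml" are assumptions made for this example, not part of the platform. The sketch relies on java.util.Properties.loadFromXML to parse the file, ignores the reload flag for brevity, and uses try-with-resources, which requires a release newer than Java SE 6.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.Collections;
import java.util.Enumeration;
import java.util.List;
import java.util.Locale;
import java.util.Properties;
import java.util.ResourceBundle;

// Hypothetical control that looks for bundles stored as XML properties files.
class XmlControl extends ResourceBundle.Control {

    // Only the custom "xml" format is searched; .class and .properties are skipped.
    @Override
    public List<String> getFormats(String baseName) {
        return Collections.singletonList("xml");
    }

    // Loads <bundleName>.xml through the class loader, or returns null if it is absent.
    @Override
    public ResourceBundle newBundle(String baseName, Locale locale, String format,
                                    ClassLoader loader, boolean reload) throws IOException {
        if (!"xml".equals(format)) {
            return null;
        }
        String resourceName = toResourceName(toBundleName(baseName, locale), format);
        try (InputStream in = loader.getResourceAsStream(resourceName)) {
            return (in == null) ? null : new XmlBundle(in);
        }
    }

    // Hypothetical bundle backed by a Properties object read from XML.
    static class XmlBundle extends ResourceBundle {
        private final Properties props = new Properties();

        XmlBundle(InputStream in) throws IOException {
            props.loadFromXML(in);
        }

        @Override
        protected Object handleGetObject(String key) {
            return props.getProperty(key);
        }

        // Returns the keys of this bundle only; parent keys are omitted for brevity.
        @Override
        public Enumeration<String> getKeys() {
            return Collections.enumeration(props.stringPropertyNames());
        }
    }

    public static void main(String[] args) {
        XmlControl control = new XmlControl();
        String bundleName =
            control.toBundleName("com.sun.demo.intl.res.Warnings", new Locale("fr", "FR"));
        // prints com/sun/demo/intl/res/Warnings_fr_FR.xml
        System.out.println(control.toResourceName(bundleName, "xml"));
    }
}
```

Passing an instance of such a control as the last argument of ResourceBundle.getBundle would make the lookup search for .xml resources while keeping the default locale suffixes and caching behavior.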
Of course, you don't have to create your own Control class. You can always use the predefined, default Control. The default class provides methods that implement the default behavior. In the following example, the getBundle method uses the default Control:

    Locale targetLocale = new Locale("fr", "FR"); // French language, French region
    ResourceBundle myResources =
        ResourceBundle.getBundle("com.sun.demo.intl.AppResource", targetLocale);

If your host's default locale is en_US, the default Control object searches for the following localized AppResource names in this example:

    com.sun.demo.intl.AppResource_fr_FR
    com.sun.demo.intl.AppResource_fr
    com.sun.demo.intl.AppResource_en_US
    com.sun.demo.intl.AppResource_en
    com.sun.demo.intl.AppResource

For each bundle name in the preceding list, the default Control searches for two implementation formats: ResourceBundle subclasses (.class format) and PropertyResourceBundle property files (.properties format). When it finds a resource in either format, it determines the bundle's parent chain and returns the ResourceBundle instance. Notice also that the bundle names use locale-specific suffixes (for example, fr_FR, fr, and en_US) to differentiate among the various localized bundles with the same base name, AppResource. Additionally, the default behavior caches bundles. Repeated invocations of getBundle return cached resources if you ask for the same bundle name. The Java platform documentation describes the getBundle method behavior in detail.
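To see where these names come from, the following sketch asks the default Control for the candidate locales of a target locale and converts each one to a bundle name. The class name DefaultSearchDemo and its helper method are assumptions made for this example; getCandidateLocales and toBundleName are methods of ResourceBundle.Control.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.ResourceBundle;

class DefaultSearchDemo {

    // Lists the bundle names that the default Control derives from the target locale.
    static List<String> candidateNames(String baseName, Locale target) {
        ResourceBundle.Control control =
            ResourceBundle.Control.getControl(ResourceBundle.Control.FORMAT_DEFAULT);
        List<String> names = new ArrayList<String>();
        for (Locale candidate : control.getCandidateLocales(baseName, target)) {
            names.add(control.toBundleName(baseName, candidate));
        }
        return names;
    }

    public static void main(String[] args) {
        // Prints, one per line:
        //   com.sun.demo.intl.AppResource_fr_FR
        //   com.sun.demo.intl.AppResource_fr
        //   com.sun.demo.intl.AppResource
        for (String name : candidateNames("com.sun.demo.intl.AppResource",
                                          new Locale("fr", "FR"))) {
            System.out.println(name);
        }
    }
}
```

The en_US and en names in the list above do not appear here because they come from the fallback locale, which the Control supplies separately through its getFallbackLocale method.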
In some situations, you may prefer a different bundle-loading strategy. The next few sections describe scenarios that differ from the default. The scenarios are the following:
- You deploy only properties files. No class file bundles will be in your application.
- You decide that you want to store resources in locale-specific subdirectories.
- You want cached resources to expire after some time period.
Properties-Only Searches
Some bundle-loading strategies don't require a fully customized Control
subclass. Instead, use the Control
class's static getControl
method to enforce some standard options that differ only slightly from the default. For example, if your application uses properties files exclusively, you can avoid the overhead of searching for ResourceBundle
subclasses. Instead, you can retrieve a control that searches only for properties files.
Call the Control.getControl method with a List<String> of file formats that should be supported. The predefined string values are java.class and java.properties. Three static final constants each define an unmodifiable list of these format strings:
- FORMAT_CLASS defines a list containing the string java.class.
- FORMAT_PROPERTIES defines a list containing the string java.properties.
- FORMAT_DEFAULT defines a list containing both java.class and java.properties.
Use the Control.FORMAT_PROPERTIES constant to create a Control object that searches for properties files only:

    ResourceBundle.Control propOnlyControl =
        ResourceBundle.Control.getControl(ResourceBundle.Control.FORMAT_PROPERTIES);
    ResourceBundle bundle =
        ResourceBundle.getBundle("com.sun.demo.intl.res.Warnings", propOnlyControl);

Using the propOnlyControl instance, the getBundle method ignores bundle files ending in .class and searches only for bundles ending in .properties.
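A quick way to confirm which formats a given control will search is to call its getFormats method. The sketch below does that for the three predefined format lists; the class name FormatListDemo and its helper are assumptions made for this example.

```java
import java.util.List;
import java.util.ResourceBundle.Control;

class FormatListDemo {

    // Returns the formats that a control built from the given list will search.
    static List<String> formatsFor(List<String> formats) {
        Control control = Control.getControl(formats);
        return control.getFormats("com.sun.demo.intl.res.Warnings");
    }

    public static void main(String[] args) {
        System.out.println(formatsFor(Control.FORMAT_DEFAULT));    // [java.class, java.properties]
        System.out.println(formatsFor(Control.FORMAT_PROPERTIES)); // [java.properties]
        System.out.println(formatsFor(Control.FORMAT_CLASS));      // [java.class]
    }
}
```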
Locales as Part of the Package Name
Different localizations of the same base bundle name are usually differentiated by locale identifier suffixes. For example, the default or root Warnings bundle is simply named Warnings.properties. However, a French version of that bundle would be named Warnings_fr_FR.properties. Using the default Control, these different bundles would exist together in the same package. But you can change the way that localized bundles are named.
Imagine that you prefer to put different localizations of the same bundle into separately defined subdirectories or packages. You might create the following property files in your file structure:

    com/sun/demo/intl/res/root/Warnings.properties
    com/sun/demo/intl/res/fr_FR/Warnings.properties
    com/sun/demo/intl/res/ja_JP/Warnings.properties

To do this, you must define your own Control subclass. The subclass must override the following methods:
- getFormats
- toBundleName
Override the getFormats
method because your application will use only properties files. Override the toBundleName
method because your application will use the specified locale as part of the new bundle's package name rather than append locale-specific suffixes to the bundle base name.
The following code shows a customized Control
class that allows bundle searches for .properties
files and locale-specific package names instead of bundle-name suffixes.

    import java.util.List;
    import java.util.Locale;
    import java.util.ResourceBundle.Control;

    class SubdirControl extends Control {

        // This control provides only properties file formats for every base name.
        @Override
        public List<String> getFormats(String baseName) {
            return Control.FORMAT_PROPERTIES;
        }

        // Given a base bundle name and a locale, this method creates a
        // bundle name for a specific locale. In this case, the bundle name
        // uses the locale as a part of the package name, not a
        // bundle-name suffix.
        @Override
        public String toBundleName(String bundleName, Locale locale) {
            StringBuilder localizedBundle = new StringBuilder();

            // Find the base bundle name.
            int nBaseName = bundleName.lastIndexOf('.');
            String baseName = bundleName;

            // Create a new name starting with the package name.
            if (nBaseName >= 0) {
                localizedBundle.append(bundleName.substring(0, nBaseName));
                baseName = bundleName.substring(nBaseName + 1);
            }

            // Now append the locale identification to the package name.
            String strLocale = locale.toString();
            if (strLocale.length() > 0) {
                localizedBundle.append(".").append(strLocale);
            } else {
                localizedBundle.append(".root");
            }

            // Now append the base name to the fully qualified package.
            localizedBundle.append(".").append(baseName);
            return localizedBundle.toString();
        }
    }

The following code shows how to provide the customized Control
object to the getBundle
method:

    String bundleName = "com.sun.demo.intl.res.Warnings";
    SubdirControl control = new SubdirControl();
    Locale locale = new Locale("fr", "FR");
    ResourceBundle bundle = ResourceBundle.getBundle(bundleName, locale, control);

If the default locale is en_US, the getBundle method will use the Control object to search the following candidate and fallback bundle names:

    com.sun.demo.intl.res.fr_FR.Warnings
    com.sun.demo.intl.res.fr.Warnings
    com.sun.demo.intl.res.en_US.Warnings
    com.sun.demo.intl.res.en.Warnings
    com.sun.demo.intl.res.root.Warnings

Cache Controls
The default behavior for loading bundles includes a check to determine whether the bundle has already been loaded. However, you can change this cache option. If you simply want to clear the cache before a bundle reload, you can call the clearCache
method of the ResourceBundle
class:

    ResourceBundle.clearCache();
    ResourceBundle myBundle =
        ResourceBundle.getBundle("com.sun.demo.intl.res.Warnings");

You can even control a cache's expiration by setting a custom time-to-live value. In your own Control, override the getTimeToLive method to return the bundle's lifetime in milliseconds. Two predefined values exist: TTL_DONT_CACHE and TTL_NO_EXPIRATION_CONTROL.
The default Control
returns TTL_NO_EXPIRATION_CONTROL
, which means that bundles are cached without any expiration value. The value TTL_DONT_CACHE
indicates that the bundle must not be cached at all. If you would like your bundles to expire every four hours to support live updates without restarting your application, for example, you can override the getTimeToLive
method like this:

    @Override
    public long getTimeToLive(String baseName, Locale locale) {
        return 4L * 60 * 60 * 1000; // 14,400,000 milliseconds is four hours.
    }

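For context, here is a sketch of a complete Control subclass built around that method. The class name ExpiringControl and its constructor parameter are assumptions made for this example. Once a cached bundle's time-to-live has elapsed, getBundle consults the control's needsReload method to decide whether the bundle should actually be loaded again, so this sketch leaves that method at its default.

```java
import java.util.Locale;
import java.util.ResourceBundle;

// Hypothetical control whose cached bundles expire after a configurable lifetime.
class ExpiringControl extends ResourceBundle.Control {

    private final long timeToLiveMillis;

    ExpiringControl(long timeToLiveMillis) {
        // Negative values are reserved for the predefined TTL_* constants.
        if (timeToLiveMillis < 0) {
            throw new IllegalArgumentException("time-to-live must not be negative");
        }
        this.timeToLiveMillis = timeToLiveMillis;
    }

    // The same lifetime applies to every bundle name and locale.
    @Override
    public long getTimeToLive(String baseName, Locale locale) {
        return timeToLiveMillis;
    }

    public static void main(String[] args) {
        ExpiringControl control = new ExpiringControl(4L * 60 * 60 * 1000);
        // prints 14400000
        System.out.println(
            control.getTimeToLive("com.sun.demo.intl.res.Warnings", Locale.FRANCE));
    }
}
```

An instance would be passed to ResourceBundle.getBundle as the control argument, in the same way as the other custom controls in this article.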
The Control
class offers many options for specifying precise bundle searching and loading. This article presents only a few. Some of the other methods you may override include the following:
- getCandidateLocales
- getFallbackLocale
- newBundle
- needsReload
See the complete platform documentation for the ResourceBundle.Control
class for more details on these and other methods.

The java.text
and java.util
packages support more than 100 locales. Although existing locales represent the needs of many geographical regions, the locale-sensitive classes in the Java platform do not yet support some areas. Supporting a locale and its data requires a lot of research, including investigating and confirming date and number formats, country name translations, and sort orders. Sometimes even political influences affect locale data. Unfortunately, it is practically impossible to keep the platform's locale data completely up-to-date at all times, even though customers want and need access to new locale data in the platform.
One solution is to provide new application programming interfaces (APIs) that allow you to use any locale data that you may need for your own application. Java SE 6 provides an interface that developers can use to plug in their own locale data and related services. Fortunately, an active project called the Common Locale Data Repository (CLDR) attempts to track global locale data and maintain it. The Unicode Consortium hosts the project. Using the new Locale-Sensitive Services Service Provider Interface (SPI), you can use this or any other locale data in your application.
To provide locale data and services, you must first decide which functionality you want to provide to the application. You can provide locale data for the following locale-sensitive classes:
- java.text.BreakIterator
- java.text.Collator
- java.text.DateFormat
- java.text.DateFormatSymbols
- java.text.DecimalFormatSymbols
- java.text.NumberFormat
- java.util.Currency
- java.util.Locale
- java.util.TimeZone
Once you decide which functionality you want to provide with your locale, you must implement the corresponding service provider interface (SPI), which resides in either the java.text.spi
or java.util.spi
packages.
Imagine that you want to provide a DateFormat
object for a new locale. You should implement the java.text.spi.DateFormatProvider
class. Because java.text.spi.DateFormatProvider
is an abstract class, you must extend it and implement the following methods:
- getAvailableLocales
- getDateInstance
- getDateTimeInstance
- getTimeInstance
Notice that the getAvailableLocales method is inherited from the parent class LocaleServiceProvider, so every SPI provider must implement it to declare its supported locales. The other three methods mirror factory methods in the corresponding API class. For example, the getDateInstance method also exists in the java.text.DateFormat class.
After implementing the required methods, you must package your service so that you can deploy it with the Java Runtime Environment (JRE). Because the Locale-Sensitive Services SPIs are based on the standard Java Extension Mechanism, you can package them as a JAR file in the JRE extension directory. JREs that use your extension can now provide locale data for previously undefined or unsupported locales.
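The following sketch shows what a minimal DateFormatProvider subclass might look like. The class name SketchDateFormatProvider, the locale xx_YY, and the date and time patterns are all placeholders chosen for this example; a real provider would supply researched locale data. A deployable provider must also be a public class with a public no-argument constructor, which the sketch omits so that it stays self-contained.

```java
import java.text.DateFormat;
import java.text.SimpleDateFormat;
import java.text.spi.DateFormatProvider;
import java.util.Date;
import java.util.Locale;

// Hypothetical provider that supplies date formats for a made-up locale, xx_YY.
class SketchDateFormatProvider extends DateFormatProvider {

    static final Locale SKETCH_LOCALE = new Locale("xx", "YY");

    // Declares the locales for which this provider supplies data.
    @Override
    public Locale[] getAvailableLocales() {
        return new Locale[] { SKETCH_LOCALE };
    }

    @Override
    public DateFormat getDateInstance(int style, Locale locale) {
        requireSupported(locale);
        return new SimpleDateFormat(datePattern(style), Locale.ROOT);
    }

    @Override
    public DateFormat getTimeInstance(int style, Locale locale) {
        requireSupported(locale);
        return new SimpleDateFormat(timePattern(style), Locale.ROOT);
    }

    @Override
    public DateFormat getDateTimeInstance(int dateStyle, int timeStyle, Locale locale) {
        requireSupported(locale);
        return new SimpleDateFormat(
            datePattern(dateStyle) + " " + timePattern(timeStyle), Locale.ROOT);
    }

    // Placeholder patterns; real locale data would differ per style.
    private static String datePattern(int style) {
        return style == DateFormat.SHORT ? "yy-MM-dd" : "yyyy-MM-dd";
    }

    private static String timePattern(int style) {
        return style == DateFormat.SHORT ? "HH:mm" : "HH:mm:ss";
    }

    private static void requireSupported(Locale locale) {
        if (locale == null) {
            throw new NullPointerException("locale");
        }
        if (!SKETCH_LOCALE.equals(locale)) {
            throw new IllegalArgumentException("Unsupported locale: " + locale);
        }
    }

    public static void main(String[] args) {
        DateFormat df = new SketchDateFormatProvider()
            .getDateTimeInstance(DateFormat.LONG, DateFormat.SHORT, SKETCH_LOCALE);
        System.out.println(df.format(new Date()));
    }
}
```

For the runtime to discover a provider, the JAR file also needs a provider-configuration file named META-INF/services/java.text.spi.DateFormatProvider that lists the fully qualified name of the implementing class.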

The Unicode Standard allows users to create equivalent text in different ways. For example, the é character, named LATIN SMALL LETTER E WITH ACUTE in the Unicode Standard, has the code point value U+00E9. The base character e and the acute accent mark, ´, are combined into a single code point called a composite or composed character.
However, you can also represent the same visual character by combining the two separate code points for the lowercase letter e and the acute accent. The two code points U+0065 and U+0301 combine to create the same visual and semantic effect, which is the é character. A base character followed by one or more combining characters is called a combining sequence. Other characters in the Unicode Standard can combine to create similar effects with different combining sequences and character forms.
Some character sequences are visually different but have the same meaning for most practical purposes. For example, the three-character sequence 1/2 has essentially the same meaning as the single character ½, or U+00BD. Similarly, the character 2 and the superscript character ² are visually different but similar in meaning. These similarities among characters give users many different ways to enter the same text. As you might imagine, text operations such as searching and sorting become quite complicated if you must consider all the various ways to form equivalent text.
The Java platform's java.text.Collator
class understands Unicode text forms and normalizes text for accurate comparisons. The normalization process converts text from disparate text forms to a single form that allows accurate text processing. Until Java SE 6, Collator
used private APIs to normalize text. However, those APIs are now public in Java SE 6.
Use java.text.Normalizer
to normalize text. You might want to normalize text for text-processing operations, serialization, transfer, or database storage. The API has only two static methods: normalize
and isNormalized
. As you might expect, the normalize
method will normalize text into a specific form. The isNormalized
method checks whether text is already normalized to a specific form.
The Normalizer.Form
enumeration represents each Unicode normalization form:
- NFD (Normalization Form D)
- NFC (Normalization Form C)
- NFKD (Normalization Form KD)
- NFKC (Normalization Form KC)
NFD is canonical decomposition. The decomposition process converts composed character forms to combining sequences as mapped by the Unicode Standard. For example, the single code point U+00F1 for the ñ character becomes the decomposed combining sequence U+006E U+0303 in NFD. The new sequence contains the common character n followed by a combining tilde, ˜.
NFC is canonical decomposition followed by canonical composition. After canonically decomposing text, the process maps combining sequences into standard composed code points. For example, applying NFC to the sequence U+0065 U+0300 creates just a single code point: U+00E8, or è. NFC is the World Wide Web Consortium's recommended normalization form to transfer and process text on the Internet.
NFKD is compatibility decomposition. After applying canonical decomposition, the process applies a compatibility mapping that transforms some characters to a standard compatible form. Compatibility is determined by a predefined mapping from one character to another character, and the Unicode Standard defines and maintains the mappings. NFKD creates noticeable changes to the TRADE MARK SIGN character, which has code point U+2122. Compatibility decomposition transforms the single code point to the characters TM, which are the common characters for LATIN CAPITAL LETTER T (U+0054) and LATIN CAPITAL LETTER M (U+004D).
NFKC is compatibility decomposition followed by canonical composition. This normalization form tries to create composed characters that are compatible with the original characters. Equivalent compatible characters are defined by the Unicode Standard. If you apply NFKC to the code point U+1E9B, LATIN SMALL LETTER LONG S WITH DOT ABOVE, canonical decomposition first creates the sequence U+017F U+0307, and the compatibility mapping then replaces the long s (U+017F) with the ordinary letter s, giving U+0073 U+0307. Finally, the composition step transforms that sequence to the single composite character U+1E61, LATIN SMALL LETTER S WITH DOT ABOVE.
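The examples in the preceding paragraphs can be checked with a few lines of code. This sketch applies each normalization form to the characters discussed above and prints the resulting code points; the class name NormalizationForms and the codePoints helper are assumptions made for this example.

```java
import java.text.Normalizer;
import java.text.Normalizer.Form;

class NormalizationForms {

    // Formats each code point of the text as U+XXXX, separated by spaces.
    static String codePoints(String text) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", cp));
            i += Character.charCount(cp);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // n with tilde, decomposed: U+006E U+0303
        System.out.println(codePoints(Normalizer.normalize("\u00F1", Form.NFD)));
        // e followed by combining grave, composed: U+00E8
        System.out.println(codePoints(Normalizer.normalize("e\u0300", Form.NFC)));
        // trade mark sign, compatibility decomposition: U+0054 U+004D
        System.out.println(codePoints(Normalizer.normalize("\u2122", Form.NFKD)));
        // long s with dot above, canonical decomposition: U+017F U+0307
        System.out.println(codePoints(Normalizer.normalize("\u1E9B", Form.NFD)));
        // long s with dot above, compatibility decomposition: U+0073 U+0307
        System.out.println(codePoints(Normalizer.normalize("\u1E9B", Form.NFKD)));
        // long s with dot above, compatibility decomposition then composition: U+1E61
        System.out.println(codePoints(Normalizer.normalize("\u1E9B", Form.NFKC)));
    }
}
```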
The following code sample shows how to use the Normalizer class to transform text to Normalization Form D (NFD):

    String strName = "Jos\u00E9"; // using a composed é
    String strNFD = Normalizer.normalize(strName, Normalizer.Form.NFD);

The resulting string strNFD now contains five code point values: Jose´. These characters have the Unicode values U+004A U+006F U+0073 U+0065 U+0301.
You can also test whether text is in a specific normalization form using the isNormalized method:

    boolean bNormalized = Normalizer.isNormalized(strNFD, Normalizer.Form.NFD);
    System.out.printf("NFD? %b\n", bNormalized);

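One practical use of these two methods is comparing text that may arrive in different but equivalent forms. The sketch below normalizes both strings to the same form before comparing them; the class name EquivalentText and the canonicallyEqual helper are assumptions made for this example, and NFC is chosen only as one reasonable common form.

```java
import java.text.Normalizer;
import java.text.Normalizer.Form;

class EquivalentText {

    // Compares two strings after bringing both to the same normalization form.
    static boolean canonicallyEqual(String a, String b) {
        return Normalizer.normalize(a, Form.NFC)
                         .equals(Normalizer.normalize(b, Form.NFC));
    }

    public static void main(String[] args) {
        String composed = "Jos\u00E9";    // four code points, composed e-acute
        String decomposed = "Jose\u0301"; // five code points, e plus combining acute

        System.out.println(composed.equals(decomposed));                  // false
        System.out.println(canonicallyEqual(composed, decomposed));       // true
        System.out.println(Normalizer.isNormalized(decomposed, Form.NFC)); // false
        System.out.println(Normalizer.isNormalized(decomposed, Form.NFD)); // true
    }
}
```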

The Internationalizing Domain Names in Applications (IDNA) standard, defined by RFC 3490, allows domain names to use characters beyond the ASCII character set. With a few restrictions, the full set of characters in the Unicode 3.2 standard is available to define domain names. Unfortunately, Domain Name System (DNS) servers and resolver services are not all capable of reliably storing and using non-ASCII characters. The IDNA solution defines a method for representing non-ASCII characters with an encoding that uses only ASCII characters. As a result, DNS and name resolver software continue to work with an ASCII-compatible encoding (ACE), while end users can use internationalized domain names drawn from an expanded set of Unicode characters.
Java SE 6 supports the IDNA standard by providing the java.net.IDN
class. This class provides methods for converting a Unicode domain name to an ASCII-compatible name. The available operations are toASCII
and toUnicode
. Applications should convert domain names to ACE using the toASCII
method before submitting the domain names to DNS or resolver services. Applications can use the toUnicode
method to create the Unicode text that users should see.
If you enter a non-ASCII domain name into your application, the application should convert the name using the toASCII
method before submitting it across the Internet.

    // Retrieve the domain name from the user interface.
    String strUnicodeName = txtUnicodeName.getText();
    // Convert the international domain name to
    // an ASCII-compatible encoding.
    String strACEName = IDN.toASCII(strUnicodeName);

Using the Japanese domain name shown in Figure 1, the conversion stores the text xn--wgv71a119e.jp
in the strACEName
variable. The new text is the ACE equivalent of the Japanese domain name.
Figure 1. The IDN class creates an encoded equivalent name for DNS and resolver software.
The text xn--wgv71a119e
doesn't mean anything to most people. It's encoded text, suitable for machine or software consumption. Your applications should use the toUnicode
method to convert these ASCII-encoded names into a suitable form that people can typically read and understand. The following code snippet shows how to convert the text back to its original form:

    String strACEName = txtACEName.getText();
    String strUnicodeName = IDN.toUnicode(strACEName);

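The two conversions can be tried together without a user interface. This sketch uses a German example name, which is an assumption made for this illustration and not the domain shown in Figure 1, and then decodes the ACE name quoted above. The class name IdnRoundTrip is also hypothetical.

```java
import java.net.IDN;

class IdnRoundTrip {

    public static void main(String[] args) {
        // A name containing u with diaeresis (U+00FC).
        String unicodeName = "b\u00FCcher.example";

        String aceName = IDN.toASCII(unicodeName);
        System.out.println(aceName); // xn--bcher-kva.example

        // Converting back restores the original Unicode name.
        System.out.println(IDN.toUnicode(aceName).equals(unicodeName)); // true

        // Decode the ACE name quoted in this article for display to users.
        System.out.println(IDN.toUnicode("xn--wgv71a119e.jp"));
    }
}
```

Names that contain only ASCII characters pass through toASCII unchanged, so an application can call it on every host name before a lookup.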

Your customers in Japan frequently use two calendars, the Gregorian calendar and the traditional Japanese Imperial calendar. Although everyone is familiar with the Gregorian calendar, and it may be used more often than not, the Japanese government often uses the Imperial calendar in its forms and documents. The Imperial calendar defines eras based on the reigning period of Japanese emperors.
The Java platform provides calendars by way of the getInstance
method of the java.util.Calendar
class. You can construct a Japanese Imperial calendar by using the locale ja_JP_JP
like this:

    Calendar calJapanese = Calendar.getInstance(new Locale("ja", "JP", "JP"));

Once you've created the calendar, you can use it to set, retrieve, and manipulate dates using Imperial calendar rules for era and year names.
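As a small illustration of reading Imperial calendar fields, the following sketch converts a Gregorian date to the corresponding Imperial year. The class name ImperialYearDemo and its helper method are assumptions made for this example, and the calculation is pinned to UTC at noon so that the result does not depend on the host's time zone.

```java
import java.util.Calendar;
import java.util.GregorianCalendar;
import java.util.Locale;
import java.util.TimeZone;

class ImperialYearDemo {

    // Returns the Imperial calendar year for the given Gregorian date.
    // The month argument uses the Calendar month constants (January is 0).
    static int imperialYearOf(int gregorianYear, int month, int day) {
        TimeZone utc = TimeZone.getTimeZone("UTC");

        Calendar gregorian = new GregorianCalendar(utc);
        gregorian.clear();
        gregorian.set(gregorianYear, month, day, 12, 0, 0);

        // The ja_JP_JP locale selects the Japanese Imperial calendar.
        Calendar imperial = Calendar.getInstance(utc, new Locale("ja", "JP", "JP"));
        imperial.setTimeInMillis(gregorian.getTimeInMillis());
        return imperial.get(Calendar.YEAR);
    }

    public static void main(String[] args) {
        System.out.println(imperialYearOf(2007, Calendar.JUNE, 15)); // 19
    }
}
```

The era itself is available through the Calendar.ERA field of the same calendar instance.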
The difference between Gregorian and Imperial calendars is most obvious when you format dates. The java.text.SimpleDateFormat
and java.text.DateFormat
classes support date formats for the new calendar. Create a formatter and display the current date using code like this:

    Date now = new Date();
    Locale localeJapanese = new Locale("ja", "JP");
    Locale localeImperialJapanese = new Locale("ja", "JP", "JP");
    DateFormat dfGregorian =
        DateFormat.getDateInstance(DateFormat.FULL, localeJapanese);
    DateFormat dfImperial =
        DateFormat.getDateInstance(DateFormat.FULL, localeImperialJapanese);
    String strGregorianDate = dfGregorian.format(now);
    String strImperialDate = dfImperial.format(now);
    txtGregorianDate.setText(strGregorianDate);
    txtImperialDate.setText(strImperialDate);

When using the ja_JP
locale, DateFormat
produces a Gregorian date using Japanese characters for year, month, and day terms. When using the ja_JP_JP
locale, DateFormat
creates a date string using the new Imperial calendar. Figure 2 shows a date in which the Gregorian year 2007 shows as the Imperial year Heisei 19.
Figure 2. Java SE 6 provides support for the Japanese Imperial calendar.

With Java SE 6, the already long list of supported locales just got longer. The platform now includes new locales that are fully supported by the various locale-sensitive classes. Locale data comes from the increasingly popular CLDR data. Although the platform uses CLDR data for the new locales, pre-existing locales in the platform are unaffected.
Table 1: New Locales Available in Java SE 6

| Language | Country | Locale ID |
|---|---|---|
| Chinese (Simplified) | Singapore | zh_SG |
| English | Malta | en_MT |
| English | Philippines | en_PH |
| English | Singapore | en_SG |
| Greek | Cyprus | el_CY |
| Indonesian | Indonesia | in_ID |
| Irish | Ireland | ga_IE |
| Japanese (Japanese Imperial calendar) | Japan | ja_JP_JP |
| Malay | Malaysia | ms_MY |
| Maltese | Malta | mt_MT |
| Serbian | Bosnia and Herzegovina | sr_BA |
| Serbian | Serbia and Montenegro | sr_CS |
| Spanish | United States | es_US |

Java SE 6 updates the platform's already extensive internationalization support by giving developers more control over how resources are found and cached. Also, using the Locale-Sensitive Services SPI, you can add locale support that is not already in Java SE 6. The Normalizer class is no longer private. You can use it to normalize text into the four forms defined by the Unicode Standard: NFC, NFD, NFKC, and NFKD. You don't have to limit your application to plain ASCII domain names. The IDN class provides an API to convert non-ASCII domain names to ASCII-compatible encodings suitable for interacting with DNS and resolver services. You can now use and format dates using the Japanese Imperial calendar. Finally, more than a dozen new locales are available, and their data comes from the Unicode Consortium's CLDR project. The new CLDR-based locale definitions do not affect existing locales.
- Documentation for the java.net.IDN class
- Documentation for the java.text.Normalizer class
- Documentation for the java.util.ResourceBundle.Control class
- Documentation for the java.util.Calendar class
- Technical notes for the new Japanese Imperial calendar support
- Technical notes for internationalization enhancements
- The Locale-Sensitive Services SPI section of the Java Tutorial
- Java Internationalization Technical Articles and Tips
- Naoto Sato's blog about international features in the Java platform
- John O'Conner's blog about international features in the Java platform

John O'Conner is an engineer and writer at Sun Microsystems. He coaches Little League baseball and AYSO soccer teams, which are always populated by at least one of his five children.
Naoto Sato is a Java internationalization engineer in the client software group at Sun Microsystems. Currently, his work is focused on enhancements of locales in the Java platform. Before joining Sun, he worked with the internationalization team at IBM Japan.