Sunday, May 29, 2022

Java 18 Features - UTF-8 by Default - Simple Web Server

Simple Web Server

Simply put, Java 18 adds a new command-line tool named jwebserver (JEP 408).
Usage example: jwebserver -p 9000

Running the command fires up a simple web server that serves static files from the current directory. 

You can define a custom port via -p and a custom directory via -d.

Right now, the server is intended only for education, experiments, testing, and similar purposes; it is not intended for production use.

For our example, we place all the files in the same folder.

To test the Simple Web Server, we provide an optional HTML file and a JSON file.

We can start the web server by running the command jwebserver from the command line inside our folder.

By default, the server serves the static files on port 8000, for example at http://localhost:8000/test.json.

To change the port, use the parameter -p followed by the port number. The full list of available options is in the official jwebserver documentation.



Instantiate the Simple Web Server from a Java class

If you want to build your own custom simple web server, or start a server from within an application, you can use the class SimpleFileServer from the package com.sun.net.httpserver.

In our example, we start the server from code after defining the port and the root folder of the static files.
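A minimal sketch of starting the server from Java 18+ code follows. The class name StaticFileServer, port 8000, and the current folder as root are assumptions for illustration, not the post's original example. It binds to the loopback address, mirroring the jwebserver default, and resolves the folder to an absolute path because SimpleFileServer.createFileServer rejects relative paths:

```java
import com.sun.net.httpserver.HttpServer;
import com.sun.net.httpserver.SimpleFileServer;
import com.sun.net.httpserver.SimpleFileServer.OutputLevel;

import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.nio.file.Path;

public class StaticFileServer {

    // Creates (but does not start) a file server for the given port and folder.
    static HttpServer create(int port, Path folder) {
        // SimpleFileServer rejects relative paths, so resolve the folder first.
        Path root = folder.toAbsolutePath().normalize();
        // Bind to the loopback address, like jwebserver does by default.
        InetSocketAddress address =
                new InetSocketAddress(InetAddress.getLoopbackAddress(), port);
        // VERBOSE logs each request, including headers, to System.out.
        return SimpleFileServer.createFileServer(address, root, OutputLevel.VERBOSE);
    }

    public static void main(String[] args) {
        int port = 8000;            // assumption: same default port as jwebserver
        Path folder = Path.of("."); // assumption: serve the current working directory
        HttpServer server = create(port, folder);
        server.start();
        System.out.println("Serving " + folder.toAbsolutePath().normalize()
                + " on loopback port " + port);
    }
}
```

With the server running, http://localhost:8000/test.json returns the JSON file, just as with the jwebserver command.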


The goal is to provide an educational server that can start from the command line, without external dependencies.

The server is deliberately minimal, but it can be useful for training and, more importantly for us, for quickly serving mock REST responses while developing a frontend application.

It’s very common to create a small Node.js server that returns a simple JSON response, so we can quickly see results in our UI during frontend development. I wrote a post about the Node.js implementation; you can find the link at the bottom of the page.

The new Java Simple Web Server lets us simulate a web service with just a JSON file and the jwebserver command, without writing a web server from scratch.

The implementation is very limited: it serves only HEAD and GET requests, and any other request receives a 501 (Not Implemented) or 405 (Method Not Allowed) response. For our test purposes, we can extend its features using the SimpleFileServer class.
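One way to extend it for mock REST responses is sketched below, assuming Java 18+. It combines SimpleFileServer.createFileHandler with the HttpHandlers and HttpServer.create factory methods that Java 18 added to com.sun.net.httpserver; the class name MockRestServer and the canned POST reply are hypothetical:

```java
import com.sun.net.httpserver.Headers;
import com.sun.net.httpserver.HttpHandler;
import com.sun.net.httpserver.HttpHandlers;
import com.sun.net.httpserver.HttpServer;
import com.sun.net.httpserver.SimpleFileServer;

import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.nio.file.Path;

public class MockRestServer {

    // Hypothetical canned reply returned for every POST request.
    static final String POST_REPLY = "{\"status\":\"created\"}";

    static HttpServer create(int port, Path folder) throws IOException {
        // GET/HEAD: serve static files from the folder, as jwebserver does.
        HttpHandler files =
                SimpleFileServer.createFileHandler(folder.toAbsolutePath().normalize());
        // POST: answer 201 with a fixed JSON body.
        HttpHandler created = HttpHandlers.of(
                201, Headers.of("Content-Type", "application/json"), POST_REPLY);
        // Route by method: POST goes to the canned reply, everything else to the files.
        HttpHandler router = HttpHandlers.handleOrElse(
                request -> request.getRequestMethod().equals("POST"), created, files);
        InetSocketAddress address =
                new InetSocketAddress(InetAddress.getLoopbackAddress(), port);
        // Backlog 0 means "use the system default".
        return HttpServer.create(address, 0, "/", router);
    }

    public static void main(String[] args) throws IOException {
        // Assumptions: port 8000 and the current folder as document root.
        HttpServer server = create(8000, Path.of("."));
        server.start();
    }
}
```

A POST to any path then returns the fixed JSON with status 201, while GET /test.json still serves the file. For more realistic mocks, a custom HttpHandler could inspect the request path or body instead of returning a constant.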

UTF-8 by Default

The conversion between raw bytes and the Java programming language’s 16-bit char values is governed by a charset. US-ASCII, UTF-8, and ISO-8859-1 are examples of supported charsets.
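To see what a charset does concretely, the sketch below (hypothetical class name CharsetBytes; Java 17+ because of HexFormat) encodes the same string with each of the three charsets named above and prints the resulting bytes in hex:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.HexFormat;

public class CharsetBytes {

    // Encodes the text with the given charset and returns the bytes as lowercase hex.
    static String hexBytes(String text, Charset charset) {
        return HexFormat.of().formatHex(text.getBytes(charset));
    }

    public static void main(String[] args) {
        // German "Gruesse" spelled with U+00FC (u-umlaut) and U+00DF (sharp s);
        // the escapes keep this source file ASCII-only.
        String text = "Gr\u00fc\u00dfe";
        Charset[] charsets = {
            StandardCharsets.US_ASCII, StandardCharsets.ISO_8859_1, StandardCharsets.UTF_8
        };
        for (Charset charset : charsets) {
            System.out.println(charset.name() + " : " + hexBytes(text, charset));
        }
    }
}
```

It prints US-ASCII : 47723f3f65, ISO-8859-1 : 4772fcdf65, and UTF-8 : 4772c3bcc39f65. US-ASCII cannot represent ü and ß and replaces each with ? (0x3F), ISO-8859-1 needs one byte per character, and UTF-8 needs two bytes for each of the two non-ASCII characters.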

In JDK 17 and earlier, standard Java APIs normally use the default charset if no charset argument is given. At startup, the JDK determines the default charset based on the run-time environment, which includes the operating system, the user’s locale, and other factors. For example, on Windows it is a codepage-based charset such as windows-1252, and on macOS it is UTF-8.

JDK 18 changes this: when no charset is specified explicitly, the default charset that the JDK picks is now UTF-8, regardless of the run-time environment.

We can view the default charset (the file.encoding entry) by running the following command:
java -XshowSettings:properties -version

The Problem

The default character set determines how strings are converted to bytes and vice versa in numerous methods of the JDK class library (e.g., when writing and reading a text file). These include, for example:

  • the constructors of FileReader, FileWriter, InputStreamReader, and OutputStreamWriter,
  • the constructors of Formatter and Scanner,
  • the static methods URLEncoder.encode() and URLDecoder.decode().

This can lead to unpredictable behavior when an application is developed and tested in one environment – and then run in another (where Java chooses a different default character set).

For example, let's run the following code on Linux or macOS (the Japanese text is "Happy Coding!" according to Google Translate):

try (FileWriter fw = new FileWriter("happy-coding.txt");
     BufferedWriter bw = new BufferedWriter(fw)) {
    bw.write("ハッピーコーディング！");
}

And then, we load this file with the following code on Windows:

try (FileReader fr = new FileReader("happy-coding.txt");
     BufferedReader br = new BufferedReader(fr)) {
    String line = br.readLine();
    System.out.println(line);
}

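Then, instead of the Japanese text, garbled characters (mojibake) are displayed.

That is because on Linux and macOS the default charset is typically UTF-8, so FileWriter stores the file in UTF-8, while FileReader on Windows decodes it as Windows-1252. Every Japanese character takes three bytes in UTF-8, and each of those bytes is read as a separate Windows-1252 character: the first character, ハ (UTF-8 bytes E3 83 8F), becomes three characters starting with "ãƒ".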
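The mismatch can be reproduced on a single machine. The minimal sketch below (hypothetical class name MojibakeDemo) encodes the Japanese text as UTF-8 and then decodes the bytes explicitly as windows-1252, standing in for FileWriter on Linux/macOS and FileReader on Windows:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class MojibakeDemo {

    // The Japanese "Happy Coding!" from the example, as Unicode escapes so the
    // result does not depend on the encoding used to compile this file.
    static final String TEXT =
            "\u30cf\u30c3\u30d4\u30fc\u30b3\u30fc\u30c7\u30a3\u30f3\u30b0\uff01";

    // Encodes with one charset and decodes the resulting bytes with another.
    static String roundTrip(String text, Charset writeCharset, Charset readCharset) {
        byte[] bytes = text.getBytes(writeCharset);
        return new String(bytes, readCharset);
    }

    public static void main(String[] args) {
        // Written with UTF-8 (Linux/macOS), read with windows-1252 (Windows).
        String garbled = roundTrip(TEXT, StandardCharsets.UTF_8, Charset.forName("windows-1252"));
        System.out.println(TEXT.length() + " characters became " + garbled.length());
    }
}
```

It reports that the 11 characters became 33: windows-1252 is a single-byte charset, so every UTF-8 byte turns into its own character.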

The Problem – Stage Two

It becomes even more chaotic because newer class library methods do not respect the default character set but always use UTF-8 if no character set is specified. These methods include, for example, Files.writeString(), Files.readString(), Files.newBufferedWriter(), and Files.newBufferedReader().

Let's run the following program, which writes the Japanese text via FileWriter and reads it directly afterward via Files.readString():

try (FileWriter fw = new FileWriter("happy-coding.txt");
     BufferedWriter bw = new BufferedWriter(fw)) {
    bw.write("ハッピーコーディング！");
}
String text = Files.readString(Path.of("happy-coding.txt"));
System.out.println(text);

Linux and macOS display the correct Japanese text. On Windows, however, we see only question marks:

???????????

That is because, on Windows, FileWriter writes the file using the default charset, Windows-1252, which cannot represent the Japanese characters and replaces each of them with a question mark. Files.readString(), on the other hand, reads the file back as UTF-8, regardless of the default charset.
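This second failure can also be simulated in memory. In the sketch below (hypothetical class name StageTwoDemo), an OutputStreamWriter with windows-1252 stands in for FileWriter on Windows, and decoding the buffer as UTF-8 stands in for Files.readString():

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class StageTwoDemo {

    // The Japanese "Happy Coding!" as Unicode escapes (source stays ASCII-only).
    static final String TEXT =
            "\u30cf\u30c3\u30d4\u30fc\u30b3\u30fc\u30c7\u30a3\u30f3\u30b0\uff01";

    // Writes with the given charset (like FileWriter with that default charset)
    // and reads the bytes back as UTF-8 (like Files.readString()).
    static String writeThenRead(String text, Charset writerCharset) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (Writer writer = new OutputStreamWriter(buffer, writerCharset)) {
            writer.write(text); // unmappable characters are replaced with '?'
        }
        return buffer.toString(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        System.out.println(writeThenRead(TEXT, Charset.forName("windows-1252")));
        System.out.println(writeThenRead(TEXT, StandardCharsets.UTF_8));
    }
}
```

With windows-1252 the result is eleven question marks, one per character the writer could not encode; with UTF-8 on both sides the text survives unchanged.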

Possible Solutions to Date

To protect an application against such errors, there have so far been two options:

  1. Specify the character set when calling all methods that convert strings to bytes and vice versa.
  2. Set the default character set via system property "file.encoding".

The first option leads to a lot of code duplication and is thus messy and error-prone:

FileWriter fw = new FileWriter("happy-coding.txt", StandardCharsets.UTF_8);
// ...
FileReader fr = new FileReader("happy-coding.txt", StandardCharsets.UTF_8);
// ...
Files.readString(Path.of("happy-coding.txt"), StandardCharsets.UTF_8);

Specifying the character set parameters also prevents us from using method references, as in the following example:

Stream<String> encodedParams = ...
Stream<String> decodedParams = encodedParams.map(URLDecoder::decode);

Instead, we would have to write:

Stream<String> encodedParams = ...
Stream<String> decodedParams = encodedParams.map(s -> URLDecoder.decode(s, StandardCharsets.UTF_8));

The second option (the system property "file.encoding") has two drawbacks. First, it was not officially documented up to and including Java 17 (see the system properties documentation).

Secondly, as explained above, the specified character set is not used by all API methods. So this option is also error-prone, as we can show with the example from above:

import java.io.BufferedWriter;
import java.io.FileWriter;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class Jep400Example {
    public static void main(String[] args) throws IOException {
        // FileWriter encodes with the default charset (controlled by file.encoding)
        try (FileWriter fw = new FileWriter("happy-coding.txt");
             BufferedWriter bw = new BufferedWriter(fw)) {
            bw.write("ハッピーコーディング！");
        }
        // Files.readString() always decodes as UTF-8
        String text = Files.readString(Path.of("happy-coding.txt"));
        System.out.println(text);
    }
}

Let's run the program once with the default encoding set to US-ASCII:

$ java -Dfile.encoding=US-ASCII Jep400Example.java
?????????????????????????????????

The result is garbage because FileWriter takes the default encoding into account, but Files.readString() ignores it and always uses UTF-8. So this option only works reliably if you use UTF-8 uniformly:

$ java -Dfile.encoding=UTF-8 Jep400Example.java
ハッピーコーディング！

JEP 400 to the Rescue

With JDK Enhancement Proposal 400, the problems mentioned above are, at least for the most part, a thing of the past as of Java 18.

The default encoding is now UTF-8, regardless of the operating system, locale, and language settings.

Also, the system property "file.encoding" is now documented, and we can use it legitimately. However, we should do so with caution: JEP 400 does not change the fact that the Files methods ignore the configured default encoding.

According to the documentation, only the values "UTF-8" and "COMPAT" should be used anyway, with UTF-8 providing consistent encoding and COMPAT simulating pre-Java 18 behavior. All other values lead to unspecified behavior.

Quite possibly, "file.encoding" will be deprecated in the future and later removed to eliminate the remaining potential source of errors (methods that respect the default encoding vs. those that do not).

The best approach is to either set "-Dfile.encoding" to UTF-8 or omit it altogether.

Reading the Encodings at Runtime

The current default encoding can be read at runtime via Charset.defaultCharset() or the system property "file.encoding". Since Java 17, the system property "native.encoding" can be used to read the encoding, which – before Java 18 – would be the default encoding if none is specified:

import java.nio.charset.Charset;

public class EncodingInfo {
    public static void main(String[] args) {
        System.out.println("Default charset : " + Charset.defaultCharset());
        System.out.println("file.encoding : " + System.getProperty("file.encoding"));
        System.out.println("native.encoding : " + System.getProperty("native.encoding"));
    }
}

Without specifying -Dfile.encoding, the program prints the following on Linux and macOS with Java 17 and Java 18:

Default charset : UTF-8
file.encoding : UTF-8
native.encoding : UTF-8

On Windows and Java 17, the output is as follows:

Default charset : windows-1252
file.encoding : Cp1252
native.encoding : Cp1252

And on Windows and Java 18:

Default charset : UTF-8
file.encoding : UTF-8
native.encoding : Cp1252

So the native encoding on Windows remains the same, but the default encoding changes to UTF-8 according to this JEP.

The Previous "Default" Character Set

If we run the little program from above on Linux or macOS and Java 17 with the -Dfile.encoding=default parameter, we get the following output:

Default charset : US-ASCII
file.encoding : default
native.encoding : UTF-8

This is because the name "default" was previously recognized as an alias for the encoding "US-ASCII".

In Java 18, this is changed: "default" is no longer recognized; the output looks like this:

Default charset : UTF-8
file.encoding : default
native.encoding : UTF-8

The system property "file.encoding" still contains "default"; any other invalid value would likewise be shown here unchanged. As of Java 18, an invalid "file.encoding" value always results in UTF-8 as the default character set; up to Java 17, it corresponds to the native encoding.

Charset.forName() Taking Fallback Default Value

Not part of the above JEP and not defined in any other JEP is the new method Charset.forName(String charsetName, Charset fallback). This method returns the specified fallback value instead of throwing an IllegalCharsetNameException or an UnsupportedCharsetException if the character set name is unknown or the character set is not supported.
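A short sketch of the new overload follows (Java 18+; the helper name charsetOrUtf8 is hypothetical). Both an unknown but legal name and an illegal name fall back to the given charset, while a null name still throws IllegalArgumentException:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetFallback {

    // Looks up a charset by name and falls back to UTF-8 instead of throwing.
    static Charset charsetOrUtf8(String name) {
        return Charset.forName(name, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(charsetOrUtf8("ISO-8859-1"));      // supported: ISO-8859-1
        System.out.println(charsetOrUtf8("no-such-charset")); // legal but unknown name: UTF-8
        System.out.println(charsetOrUtf8("not a name!"));     // illegal name: UTF-8
    }
}
```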
