Tech Twitter: Java_18_Features - UTF-8 by Default

Simple Web Server

Java 18 adds a new command-line tool named jwebserver.

Usage example: jwebserver -p 9000

Running the command fires up a simple web server that serves static files from the current directory.

You can define a custom port via -p and a custom directory via -d.

Right now, the server is intended only for education, experiments, testing, and similar purposes; it is not intended for production use.

For our example, we place all the files in the same folder.

To test the Simple Web Server, we provide an optional HTML file and a JSON file.

We can start the Web Server with the command jwebserver from the command line inside our folder.

By default, the server serves the static files on port 8000, so the JSON file is available at, for example, http://localhost:8000/test.json.

If you want to change the port, you can use the parameter -p [port number]. You can find the list of the available options in the official documentation of jwebserver.

Instantiate the Simple Web Server from a Java class

If you want to build your own custom simple web server, or start a server from within an application, you can use the class SimpleFileServer.

In our example we can start the server from the code after having defined the port and the root folder of the static files.
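
The original listing is not reproduced here, so the following is only a minimal sketch of how starting the server from code might look, assuming Java 18 or later. The class name StaticFileServerSketch, the helper createServer, the temporary folder, and the JSON content are hypothetical; SimpleFileServer.createFileServer is the JDK call that takes the address, the (absolute) root folder, and an output level. The demo stops the server after one request so that it terminates on its own.

```java
import com.sun.net.httpserver.HttpServer;
import com.sun.net.httpserver.SimpleFileServer;
import com.sun.net.httpserver.SimpleFileServer.OutputLevel;

import java.net.InetSocketAddress;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Files;
import java.nio.file.Path;

public class StaticFileServerSketch {

  // Creates (but does not start) a file server for the given port and root folder.
  static HttpServer createServer(int port, Path root) {
    // createFileServer requires an absolute path to an existing, readable directory.
    Path absoluteRoot = root.toAbsolutePath().normalize();
    // Bind to the IPv4 loopback address only (an assumption made for this sketch).
    InetSocketAddress address = new InetSocketAddress("127.0.0.1", port);
    return SimpleFileServer.createFileServer(address, absoluteRoot, OutputLevel.NONE);
  }

  public static void main(String[] args) throws Exception {
    // Hypothetical demo folder containing a single JSON file.
    Path root = Files.createTempDirectory("static-demo");
    Files.writeString(root.resolve("test.json"), "{\"message\":\"hello\"}");

    HttpServer server = createServer(0, root); // port 0 lets the OS pick a free port
    server.start();
    try {
      int port = server.getAddress().getPort();
      URI uri = URI.create("http://127.0.0.1:" + port + "/test.json");
      HttpResponse<String> response = HttpClient.newHttpClient()
          .send(HttpRequest.newBuilder(uri).build(), HttpResponse.BodyHandlers.ofString());
      System.out.println(response.statusCode() + " " + response.body());
    } finally {
      server.stop(0); // a long-running application would simply keep the server running
    }
  }
}
```

In an application that should keep serving files, createServer(8000, Path.of(".")) followed by start() would mirror the jwebserver defaults of port 8000 and the current directory.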

The goal is to provide an educational server that can start from the command line, without external dependencies.

The server has no grand ambitions, but it is useful for training and, importantly for us, for quickly generating some REST responses while developing a frontend application.

It’s very common to create a small Node.js server that returns a simple JSON document, so that we can quickly see results in the UI during frontend development. I created a post about the Node.js implementation; you can find the link at the bottom of the page.

The new Java Simple Web server allows us to simulate a web service with just a JSON file and the command jwebserver, without the need to create a simple web server from scratch.

The implementation is deliberately limited: it serves only HEAD and GET requests. For test purposes, we can extend it using the SimpleFileServer class.
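
One possible way to extend the server for tests is sketched below, assuming Java 18 or later: the static file handler from SimpleFileServer is combined with a second, hand-written handler on a plain HttpServer. The class name ExtendedServerSketch, the /echo path, and the echo behaviour are hypothetical choices for illustration and are not part of the JDK tool.

```java
import com.sun.net.httpserver.HttpHandler;
import com.sun.net.httpserver.HttpServer;
import com.sun.net.httpserver.SimpleFileServer;

import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Files;
import java.nio.file.Path;

public class ExtendedServerSketch {

  // Builds a server that serves static files at "/" and accepts POST at "/echo".
  static HttpServer createServer(int port, Path root) throws IOException {
    HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", port), 0);

    // Static files: the file handler requires an absolute root directory.
    server.createContext("/", SimpleFileServer.createFileHandler(root.toAbsolutePath().normalize()));

    // Hypothetical test endpoint: answers a POST with the request body it received.
    HttpHandler echo = exchange -> {
      try (exchange) {
        if (!"POST".equals(exchange.getRequestMethod())) {
          exchange.sendResponseHeaders(405, -1); // method not allowed, no response body
          return;
        }
        byte[] body = exchange.getRequestBody().readAllBytes();
        exchange.getResponseHeaders().set("Content-Type", "application/json; charset=utf-8");
        exchange.sendResponseHeaders(200, body.length);
        exchange.getResponseBody().write(body);
      }
    };
    server.createContext("/echo", echo);
    return server;
  }

  public static void main(String[] args) throws Exception {
    Path root = Files.createTempDirectory("extended-demo"); // hypothetical demo folder
    HttpServer server = createServer(0, root); // port 0 lets the OS pick a free port
    server.start();
    try {
      URI uri = URI.create("http://127.0.0.1:" + server.getAddress().getPort() + "/echo");
      HttpRequest request = HttpRequest.newBuilder(uri)
          .POST(HttpRequest.BodyPublishers.ofString("{\"id\":1}"))
          .build();
      HttpResponse<String> response =
          HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString());
      System.out.println(response.statusCode() + " " + response.body());
    } finally {
      server.stop(0);
    }
  }
}
```

The more specific context path wins, so requests to /echo reach the custom handler while everything else falls through to the file handler.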

UTF-8 by Default

The conversion between raw bytes and the Java programming language’s 16-bit char values is governed by a charset. US-ASCII, UTF-8, and ISO-8859-1 are examples of supported charsets.

In JDK 17 and earlier, standard Java APIs normally use the default charset if no charset is given explicitly. At startup, the JDK determines the default charset based on the run-time environment, which includes the operating system, the user’s locale, and other considerations. For example, on Windows it is a codepage-based charset such as windows-1252, and on macOS it is UTF-8.

That has changed: as of Java 18, when no charset is specified explicitly, the default charset that the JDK picks for you is always UTF-8.

We can view the default charset (the file.encoding property) by running the following command:
java -XshowSettings:properties

The Problem

The Java standard character set determines how Strings are converted to bytes and vice versa in numerous methods of the JDK class library (e.g., when writing and reading a text file). These include, for example:

the constructors of FileReader, FileWriter, InputStreamReader, OutputStreamWriter,
the constructors of Formatter and Scanner,
the static methods URLEncoder.encode() and URLDecoder.decode().

This can lead to unpredictable behavior when an application is developed and tested in one environment – and then run in another (where Java chooses a different default character set).

For example, let's run the following code on Linux or macOS (the Japanese text is "Happy Coding!" according to Google Translate):

try (FileWriter fw = new FileWriter("happy-coding.txt");
    BufferedWriter bw = new BufferedWriter(fw)) {
  bw.write("ハッピーコーディング！");
}

And then, we load this file with the following code on Windows:

try (FileReader fr = new FileReader("happy-coding.txt");
    BufferedReader br = new BufferedReader(fr)) {
  String line = br.readLine();
  System.out.println(line);
}

Then the following is displayed:

ãƒ?ãƒƒãƒ”ãƒ¼ã‚³ãƒ¼ãƒ‡ã‚£ãƒ³ã‚°ï¼?

That is because Linux and macOS store the file in UTF-8 format, and Windows tries to read it in Windows-1252 format.
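
The same effect can be reproduced inside a single JVM, without moving a file between machines. The sketch below is illustrative only: the class name MojibakeSketch and the helper misdecode are hypothetical, and it assumes that the windows-1252 charset is available in the JDK at hand. It encodes the Japanese text as UTF-8 and then decodes the bytes as windows-1252.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class MojibakeSketch {

  static final String TEXT = "ハッピーコーディング！";

  // Encodes with one charset and decodes with another, as happens across platforms.
  static String misdecode(String text, Charset writtenWith, Charset readWith) {
    byte[] bytes = text.getBytes(writtenWith);
    return new String(bytes, readWith);
  }

  public static void main(String[] args) {
    Charset windows1252 = Charset.forName("windows-1252"); // assumed to be available
    String garbled = misdecode(TEXT, StandardCharsets.UTF_8, windows1252);

    // Each character needs three bytes in UTF-8, and windows-1252 turns every byte
    // into one character, so the garbled string is three times as long.
    System.out.println(TEXT.length() + " characters became " + garbled.length());
    System.out.println(garbled);
  }
}
```

How the garbled string looks on screen depends on the console encoding; bytes that have no mapping in windows-1252 typically show up as question marks.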

The Problem – Stage Two

It becomes even more chaotic because newer class library methods do not respect the default character set but always use UTF-8 if no character set is specified. These methods include, for example, Files.writeString(), Files.readString(), Files.newBufferedWriter(), and Files.newBufferedReader().

Let's start the following program, which writes the Japanese text via FileWriter and reads it directly afterward via Files.readString():

try (FileWriter fw = new FileWriter("happy-coding.txt");
    BufferedWriter bw = new BufferedWriter(fw)) {
  bw.write("ハッピーコーディング！");
}

String text = Files.readString(Path.of("happy-coding.txt"));
System.out.println(text);

Linux and macOS display the correct Japanese text. On Windows, however, we see only question marks:

???????????

That is because, on Windows, FileWriter writes the file using the standard Java character set Windows-1252, but Files.readString() reads the file back in as UTF-8 – regardless of the standard character set.
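
To observe this on any operating system, the legacy default can be simulated by passing the charset to FileWriter explicitly. This is a sketch under that assumption, not the original experiment: the class name MixedApiSketch, the helper writeLegacyThenReadUtf8, and the temporary file are hypothetical, and windows-1252 is assumed to be available.

```java
import java.io.BufferedWriter;
import java.io.FileWriter;
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;

public class MixedApiSketch {

  static final String TEXT = "ハッピーコーディング！";

  // Writes text the way a pre-Java 18 Windows JVM would, then reads it back as UTF-8.
  static String writeLegacyThenReadUtf8(String text, Charset legacyDefault) throws IOException {
    Path file = Files.createTempFile("happy-coding", ".txt");
    try {
      // The explicit charset stands in for the old platform default.
      try (BufferedWriter bw = new BufferedWriter(new FileWriter(file.toFile(), legacyDefault))) {
        bw.write(text);
      }
      // Files.readString() without a charset argument always decodes as UTF-8.
      return Files.readString(file);
    } finally {
      Files.deleteIfExists(file);
    }
  }

  public static void main(String[] args) throws IOException {
    // None of the Japanese characters exist in windows-1252, so the writer
    // replaces each one with '?' before the file is ever read.
    System.out.println(writeLegacyThenReadUtf8(TEXT, Charset.forName("windows-1252")));
  }
}
```

The information is already lost at write time, which is why reading the file back with any charset cannot recover the text.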

Possible Solutions to Date

For protecting an application against such errors, there have been two possibilities so far:

Specify the character set when calling all methods that convert strings to bytes and vice versa.
Set the default character set via system property "file.encoding".

The first option leads to a lot of code duplication and is thus messy and error-prone:

FileWriter fw = new FileWriter("happy-coding.txt", StandardCharsets.UTF_8);
// ...
FileReader fr = new FileReader("happy-coding.txt", StandardCharsets.UTF_8);
// ...
Files.readString(Path.of("happy-coding.txt"), StandardCharsets.UTF_8);

Specifying the character set parameters also prevents us from using method references, as in the following example:

Stream<String> encodedParams = ...
Stream<String> decodedParams = encodedParams.map(URLDecoder::decode);

Instead, we would have to write:

Stream<String> encodedParams = ...
Stream<String> decodedParams =
    encodedParams.map(s -> URLDecoder.decode(s, StandardCharsets.UTF_8));

The second possibility (system property "file.encoding") was, first, not officially documented up to and including Java 17 (see system properties documentation).

Secondly, as explained above, the character set specified is not used for all API methods. So the variant is also error-prone, as we can show with the example from above:

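import java.io.BufferedWriter;
import java.io.FileWriter;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class Jep400Example {
  public static void main(String[] args) throws IOException {
    // FileWriter without a charset argument uses the default character set.
    try (FileWriter fw = new FileWriter("happy-coding.txt");
        BufferedWriter bw = new BufferedWriter(fw)) {
      bw.write("ハッピーコーディング！");
    }

    // Files.readString() without a charset argument always decodes as UTF-8.
    String text = Files.readString(Path.of("happy-coding.txt"));
    System.out.println(text);
  }
}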

Let's run the program once with standard encoding US-ASCII:

$ java -Dfile.encoding=US-ASCII Jep400Example.java
?????????????????????????????????

The result is garbage because FileWriter takes the default encoding into account, but Files.readString() ignores it and always uses UTF-8. So this variant only works reliably if you use UTF-8 uniformly:

$ java -Dfile.encoding=UTF-8 Jep400Example.java
ハッピーコーディング！

JEP 400 to the Rescue

With JDK Enhancement Proposal 400, the problems mentioned above are, at least for the most part, a thing of the past as of Java 18.

The default encoding is now always UTF-8, regardless of the operating system, locale, and language settings.

Also, the system property "file.encoding" is now documented, so we can use it legitimately. However, we should do this with caution: JEP 400 does not change the fact that the Files methods ignore the configured default encoding.

According to the documentation, only the values "UTF-8" and "COMPAT" should be used anyway, with UTF-8 providing consistent encoding and COMPAT simulating pre-Java 18 behavior. All other values lead to unspecified behavior.

Quite possibly, "file.encoding" will be deprecated in the future and later removed to eliminate the remaining potential source of errors (methods that respect the default encoding vs. those that do not).

The best way is always to set "-Dfile.encoding" to UTF-8 or omit it altogether.

Reading the Encodings at Runtime

The current default encoding can be read at runtime via Charset.defaultCharset() or the system property "file.encoding". Since Java 17, the system property "native.encoding" can be used to read the encoding, which – before Java 18 – would be the default encoding if none is specified:

System.out.println("Default charset : " + Charset.defaultCharset());
System.out.println("file.encoding   : " + System.getProperty("file.encoding"));
System.out.println("native.encoding : " + System.getProperty("native.encoding"));

Without specifying -Dfile.encoding, the program prints the following on Linux and macOS with Java 17 and Java 18:

Default charset : UTF-8
file.encoding   : UTF-8
native.encoding : UTF-8

On Windows and Java 17, the output is as follows:

Default charset : windows-1252
file.encoding   : Cp1252
native.encoding : Cp1252

And on Windows and Java 18:

Default charset : UTF-8
file.encoding   : UTF-8
native.encoding : Cp1252

So the native encoding on Windows remains the same, but the default encoding changes to UTF-8 according to this JEP.

The Previous "Default" Character Set

If we run the little program from above on Linux or macOS and Java 17 with the -Dfile.encoding=default parameter, we get the following output:

Default charset : US-ASCII
file.encoding   : default
native.encoding : UTF-8

This is because the name "default" was previously recognized as an alias for the encoding "US-ASCII".

In Java 18, this is changed: "default" is no longer recognized; the output looks like this:

Default charset : UTF-8
file.encoding   : default
native.encoding : UTF-8

The system property "file.encoding" still reads "default", just as it would echo back any other invalid value. For an invalid "file.encoding" value, the default character set is always UTF-8 as of Java 18, whereas up to Java 17 it corresponds to the native encoding.

Charset.forName() Taking Fallback Default Value

Not part of the above JEP and not defined in any other JEP is the new method Charset.forName(String charsetName, Charset fallback). This method returns the specified fallback value instead of throwing an IllegalCharsetNameException or an UnsupportedCharsetException if the character set name is unknown or the character set is not supported.
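
A small sketch of how the new overload might be used, assuming Java 18 or later. The class name CharsetFallbackSketch, the helper resolve, and the sample charset names are hypothetical; only Charset.forName(String, Charset) itself comes from the JDK.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetFallbackSketch {

  // Resolves a charset name (for example from a config file), falling back to UTF-8.
  static Charset resolve(String configuredName) {
    return Charset.forName(configuredName, StandardCharsets.UTF_8);
  }

  public static void main(String[] args) {
    System.out.println(resolve("ISO-8859-1"));        // supported charset is returned as-is
    System.out.println(resolve("no-such-charset"));   // unknown name: fallback is returned
    System.out.println(resolve("not a legal name!")); // illegal name: fallback is returned
  }
}
```

With the older single-argument Charset.forName(String), the second and third calls would need a try/catch around the lookup instead.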

Sunday, May 29, 2022
