Java Anti-Patterns
This page collects some bad code that may not look so obviously bad to beginners. Beginners often struggle with the language syntax and have little knowledge of the standard JDK class library and how to make the best use of it. In fact I have collected all examples from everyday junior code. I have modified the original code to turn it into concise examples and to highlight the problems. Many of these problems can easily be detected by SonarQube; I strongly recommend that tool.
Some of these may seem like micro-optimizations, premature optimization without profiling, or mere constant-factor improvements. But performance and memory wasted in thousands of these small places adds up quickly and will grind an application to a crawl. And when I say application, I mean a server-side application running on an application server. That's what I do for a living. On desktop GUI applications the situation may not be as bad. But then, what's the only relevant platform that runs client-side Java applications? Android. An embedded platform with very limited resources (memory!). Here even constant-factor optimizations pay off quickly, like iterating over arrays instead of lists.
If you are interested in how to program in a compiler-friendly way, look at the JDK Performance Wiki.
In the end a lot of your application's performance depends on the overall quality of your code. By the way you should never underestimate the importance of memory footprint. I can't stress that enough. I have seen too many applications with crazy garbage collection overhead and out of memory errors. Even though garbage collection is quite fast, most server-side code's scalability is dominated and limited primarily by its memory use per request/transaction and the request/transaction duration. Improving either of these by a constant factor will directly give you a higher throughput by that factor. If the factor is 10, it can mean supporting 100 or 1000 users, which can make all the difference to your customer.
Compare these scenarios (assume 100MB young generation):
Scenario | thread pool | tx duration | => max. tx / s | mem / tx | => garbage / min | GC / min |
---|---|---|---|---|---|---|
base | 30 | 100 ms | 300 | 50 KB | 900 MB | 9 |
slower | 30 | 1000 ms | 30 | 50 KB | 90 MB | 0.9 |
more mem | 30 | 100 ms | 300 | 500 KB | 9 GB | 90 |
excess mem | 30 | 100 ms | 300 | 5 MB | 90 GB | 900 |
In the slower scenario the transaction duration is 10 times longer. This immediately cuts the maximum number of transactions per second by the factor of 10 as well (limited thread-pool, limited CPU resources). In the more mem scenario each transaction uses 10 times as much memory. This directly bumps up the number of garbage collections to over one per second, which causes non-negligible overhead. Using much more memory like in scenario excess mem this would lead to 15 collections per second, leaving 66ms per collection which is clearly not enough. The system will thrash. Also 66ms is below the transaction duration of 100ms, so many running transactions will still hold onto memory, preventing it from collection, and causing a propagation of that memory to older generations. This means the older generations will start growing and will need a large (slow) collection sooner. The application in that scenario no longer performs. I think this clearly shows how bad excess memory consumption is, compared to just slow code. All your superfast code can't help you when you allocate too much memory.
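The numbers in the table follow from simple arithmetic; here is a back-of-envelope sketch in plain Java for the excess mem scenario, with the 100 MB young generation as the stated assumption:
double maxTxPerSec = 30 / 0.1;                 // 30 threads / 100 ms tx duration = 300 tx/s
double garbageMbPerMin = maxTxPerSec * 5 * 60; // 5 MB per tx -> 90'000 MB = 90 GB per minute
double gcPerMin = garbageMbPerMin / 100;       // 100 MB young generation -> 900 collections per minute
double msBetweenGc = 60_000 / gcPerMin;        // ~66 ms between collections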
String concatenation
String s = ""; for (Person p : persons) { s += ", " + p.getName(); } s = s.substring(2); //remove first commaThis is a real memory waster. The repeated concatenation of strings in a loop causes excess garbage and array copying. Moreover it is ugly that the resulting string has to be fixed for an extra comma. Amazingly in 2016 there are still people that believe that the compiler optimizes this somehow. It doesn't even in Java 8! Some morons have even benchmarked the execution time to "prove" that it's fine. No, it isn't fine to produce lots of unnecessary garbage. If you still don't believe me, then I can't help your ignorance either.
StringBuilder sb = new StringBuilder(persons.size() * 16); // well estimated buffer for (Person p : persons) { if (sb.length() > 0) sb.append(", "); // the JIT optimizes the if away out of the loop (peeling) sb.append(p.getName()); }
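If you are on Java 8, you can also let the standard library manage the separator; a minimal sketch, assuming the same Person.getName() accessor:
String s = persons.stream()
        .map(Person::getName)
        .collect(Collectors.joining(", ")); // requires java.util.stream.Collectors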
Lost StringBuffer performance
StringBuffer sb = new StringBuffer(); sb.append("Name: "); sb.append(name + '\n'); sb.append("!"); ... String s = sb.toString();
This looks like optimized code, but it isn't optimal yet. So why optimize in the first place if you then fail to do it properly? Go all the way! The most obvious mistake is the string concatenation in sb.append(name + '\n'). Appending the exclamation mark as a char would also be faster than appending it as a String. Another major omission is the missing length initialization of the buffer, which may incur unnecessary resizing (array copying). In JDK 1.5 and above a StringBuilder should have been used instead of a StringBuffer: because it is only a local variable, the implicit synchronization is overkill. Actually, using simple String concatenation compiles to almost perfect byte code here: it is only missing the length initialization.
StringBuilder sb = new StringBuilder(100); sb.append("Name: "); sb.append(name); sb.append("\n!"); String s = sb.toString();
String s = "Name: " + name + "\n!";
Testing for string equality
if (name.compareTo("John") == 0) ... if (name == "John") ... if (name.equals("John")) ... if ("".equals(name)) ...None of the above comparisons is wrong - but neither are they really good. The
compareTo
method is overkill and too
verbose. The ==
operator tests for object identity which is probably not what you want. The equals
method
is the way to go, but reversing the constant and variable would give you extra safety if name
is null
.
if ("John".equals(name)) ... if (name.length() == 0) ... if (name.isEmpty()) ...
Converting numbers to Strings
"" + set.size() new Integer(set.size()).toString()The return type of the
Set.size()
method is int
. A conversion to String
is wanted. These two
examples in fact do the conversion. But the first incurs the penalty of a concatenation operation
(translates to (new StringBuilder()).append(i).toString())
). And the second creates an
intermediate Integer wrapper. The correct way of doing it is one of these
Integer.toString(set.size())
Parsing and converting numbers
int v = Integer.valueOf(str).intValue(); int w = Long.valueOf(Double.valueOf(str).longValue()).intValue();
Learn how to use the API without allocating unnecessary objects.
int v = Integer.parseInt(str); int w = (int) Double.parseDouble(str);
Not taking advantage of immutable objects
zero = new Integer(0); return Boolean.valueOf("true");
Integer as well as Boolean are immutable. Thus it doesn't make sense to create several objects that represent the same value. Those classes have built-in caches for frequently used instances. In the case of Boolean there are even only two possible instances. The programmer can take advantage of this:
zero = Integer.valueOf(0); return Boolean.TRUE;
XML parsers are for sissies
int start = xml.indexOf("<name>") + "<name>".length(); int end = xml.indexOf("</name>"); String name = xml.substring(start, end);
This naive XML parsing only works with the most simple XML documents. It will however fail if a) the name element is not unique in the document, b) the content of name is not only character data, c) the text data of name contains escaped characters, d) the text data is specified as a CDATA section, or e) the document uses XML namespaces. XML is way too complex for string operations. There is a reason why XML parsers like Xerces are jar files of over one megabyte! The equivalent with JDOM is:
SAXBuilder builder = new SAXBuilder(false); Document doc = builder.build(new StringReader(xml)); String name = doc.getRootElement().getChild("name").getText();
Assembling XML with String operations
String name = ... String attribute = ... String xml = "<root>" +"<name att=\""+ attribute +"\">"+ name +"</name>" +"</root>";
Many beginners are tempted to produce XML output like shown above, by using String operations (which they know so well and which are so easy). Indeed it is very simple and almost beautiful code. However it has one severe shortcoming: it fails to escape reserved characters. So if the variables name or attribute contain any of the reserved characters <, >, &, " or ' this code produces invalid XML. Also, as soon as the XML uses namespaces, String operations quickly become nasty and hard to maintain. XML should instead be assembled in a DOM. The JDOM library is quite nice for that.
Element root = new Element("root"); root.setAttribute("att", attribute); root.setText(name); Document doc = new Document(); doc.setRootElement(root); XMLOutputter out = new XMLOutputter(Format.getPrettyFormat()); String xml = out.outputString(root);
The XML encoding trap
String xml = FileUtils.readTextFile("my.xml");
It is a very bad idea to read an XML file and store it in a String. An XML document specifies its encoding in the XML declaration. But when reading a file into a String you have to know the encoding beforehand! Also, storing an XML file in a String wastes memory. All XML parsers accept an InputStream as a parsing source and figure out the encoding themselves correctly. So you can feed them an InputStream instead of holding the whole file in memory temporarily. The byte order (big-endian, little-endian) is another trap when a multi-byte encoding (such as UTF-16) is used. XML files may carry a byte order mark at the beginning that specifies the byte order. XML parsers handle it correctly.
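A minimal sketch of letting the parser detect the encoding itself, assuming the same JDOM SAXBuilder used above (the file name is just an example):
InputStream in = new FileInputStream("my.xml");
Document doc = new SAXBuilder().build(in); // the parser reads the XML declaration / BOM itself
in.close();
String name = doc.getRootElement().getChildText("name");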
char is not int
int i = in.read(); char c = (char) i;The above code assumes that you can create a character from a number. It's already wrong technically: int is signed, whereas char is unsigned. A char is simply 16 bits of UTF-16 encoded Unicode. Please note that Unicode defines way more code points than fit into 16 bits (Unicode 9.0 is at 271792 code points vs. 65536 16-bit numbers). Code points like popular emojis are way beyond the BMP and are represented by more than one char even in Java! Anyway, in Java use Reader/Writer or CharsetEncoder/CharsetDecoder instead to convert between characters and their byte representation (see below).
Assuming char represents one character
"\uD83D\uDC31".length() == 2The escape sequence represents the Unicode code point 0x1F431 in UTF-16 as 2 chars. So even though this is only a single cat face symbol on the screen (🐱), the length() method returns 2.
Platform dependent filenames
File tmp = new File("C:\\Temp\\1.tmp"); File exp = new File("export-2013-02-01T12:30.txt"); File f = new File(path +'/'+ filename);Never hard code paths in a filesystem. Different platforms have different conventions, and you can never be sure that a hard coded path is actually available on a random system. Use API calls to create temporary files. Mind that different file systems have different restrictions on what makes a valid file name. Here the exp file contains a colon character, which is illegal on Windows file systems. When you construct absolute or relative paths in the filesystem, be careful of the platform dependent separator character.
File tmp = File.createTempFile("myapp","tmp"); File exp = new File("export-2013-02-01_1230.txt"); File f = new File(path + File.separatorChar + filename); // or even better File dir = new File(path); File f = new File(dir, filename);
Undefined encoding
Reader r = new FileReader(file); Writer w = new FileWriter(file); Reader r = new InputStreamReader(inputStream); Writer w = new OutputStreamWriter(outputStream); String s = new String(byteArray); // byteArray is a byte[] byte[] a = string.getBytes();
Each line of the above converts between byte and char using the default platform encoding. The code behaves differently depending on the platform it runs on. This is harmful if the data flows from one platform to another. It is considered bad practice to rely on the default platform encoding at all. Conversions should always be performed with a defined encoding.
Reader r = new InputStreamReader(new FileInputStream(file), "ISO-8859-1"); Writer w = new OutputStreamWriter(new FileOutputStream(file), "ISO-8859-1"); Reader r = new InputStreamReader(inputStream, StandardCharsets.UTF_8); Writer w = new OutputStreamWriter(outputStream, StandardCharsets.UTF_8); String s = new String(byteArray, "ASCII"); byte[] a = string.getBytes("ASCII");
Unbuffered streams
InputStream in = new FileInputStream(file); int b; while ((b = in.read()) != -1) { ... }
The above code reads a file byte by byte. Every read() call on the stream causes a native (JNI) call into the filesystem implementation, and depending on the implementation this may cause a syscall to the operating system. JNI calls are expensive and so are syscalls. The number of native calls can be reduced dramatically by wrapping the stream into a BufferedInputStream. Reading 1 MB of data from /dev/zero with the above code took about 1 second on my laptop. With the fixed code below it was down to 60 milliseconds. That's a 94% saving. This also applies to output streams of course, and it is true not only for the file system but also for sockets.
InputStream in = new BufferedInputStream(new FileInputStream(file));
Unbuffered operations on InputStreamReader, OutputStreamWriter
Writer w = new OutputStreamWriter(os, StandardCharsets.UTF_8); while (...) { // many small (<8kB) writes w.write("something"); } Reader r = new InputStreamReader(in, StandardCharsets.UTF_8); while (...) { // not reading into a buffer (char[], etc.) int c = r.read(); }
OutputStreamWriter does non-trivial work on every call to one of its write() methods, because char to byte conversion is not free; the same applies to read() on InputStreamReader. Always buffer those reads and writes:
Writer w = new BufferedWriter(new OutputStreamWriter(os, StandardCharsets.UTF_8)); Reader r = new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8));
For reading and writing a text file, the correct chain of streams becomes:
Writer w = new BufferedWriter(new OutputStreamWriter(new FileOutputStream(f), StandardCharsets.UTF_8)); Reader r = new BufferedReader(new InputStreamReader(new FileInputStream(f), StandardCharsets.UTF_8));
Using PrintWriter for file I/O
PrintWriter w = new PrintWriter(new File("out.txt"), "UTF-8"); w.println("hello world");
PrintWriter never throws IOException. Even if the disk is full. Even if you continue to call println() one million times after the disk has filled up. Not even when you call close(). You need to explicitly call checkError() to test for problems. And then you still don't get an exception that would tell you what the heck happened. All you get is a boolean saying that at some point during your writes there was a problem and your file is now corrupt.
And here lies the problem: silent file corruption is nothing anybody wants. Either produce a complete file or don't produce any at all and signal an error.
PrintWriter was invented for network I/O, not file I/O. It is used by Servlets for example. Another fine application for it is logging, when you really don't care whether your logging produces I/O errors. It is used in the JDBC API for example.
How to avoid writing corrupt files:
File f = new File("out.txt"); Writer w = null; try { w = new BufferedWriter(new OutputStreamWriter(new FileOutputStream(f), StandardCharsets.UTF_8)); w.append("hello world"); ... w.close(); w = null; } finally { if (w != null) { // there was an exception and f is corrupt try { w.close(); } catch (IOException e) { } f.delete(); } }
Infinite heap
byte[] pdf = toPdf(file);
Here a method creates a PDF file from some input and returns the binary PDF data as a byte array. This code assumes that the generated file is small enough to fit into the available heap memory. If this code cannot guarantee that, it is vulnerable to an out of memory condition, especially when it runs server-side, which usually means many parallel threads. Bulk data must never be handled with byte arrays. Streams should be used and the data should be spooled to disk or a database.
File pdf = toPdf(file);
A similar anti-pattern is to buffer streaming input from an "untrusted" (security term) source, such as buffering data that arrives on a network socket. If the application doesn't know how much data will be arriving, it must keep an eye on the size of the data. If the amount of buffered data exceeds sane limits, an error condition (exception) should be signalled to the caller, rather than driving the application against the wall by letting it run into an out of memory condition.
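A minimal sketch of such a capped read; the 1 MB limit and the method name are illustrative assumptions:
private static final int MAX_BYTES = 1024 * 1024; // sane, application-specific limit

static byte[] readLimited(InputStream in) throws IOException {
    ByteArrayOutputStream buf = new ByteArrayOutputStream();
    byte[] chunk = new byte[8192];
    int n;
    while ((n = in.read(chunk)) != -1) {
        if (buf.size() + n > MAX_BYTES) {
            throw new IOException("input exceeds " + MAX_BYTES + " bytes");
        }
        buf.write(chunk, 0, n);
    }
    return buf.toByteArray();
}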
Infinite time
Socket socket = ... socket.connect(remote); InputStream in = socket.getInputStream(); int i = in.read();
The above code has two blocking calls that use unspecified timeouts. Imagine if the timeout is infinite. That may cause the application to hang forever. Generally it is an extremely stupid idea to have infinite timeouts in the first place. Infinity is extremely long. Even by the time the Sun turns into a red giant, it's still a looong way to Infinity. The average programmer dies at 72. There is simply no real-world situation where we want to wait that long. Infinite timeout is just an absurd thing. Use an hour, a day, a week, a month, 1 year, 10 years. But not Infinity. To connect to a remote machine I personally find 20 seconds plenty of timeout. A human is not even as patient and would cancel the operation before that. While there is a nice overload of the connect() method that takes a timeout parameter, there is no such thing for read(). But you can modify a Socket's socket timeout before every blocking call. (Not just once! You can set different timeouts for different situations.) The socket will throw an exception on blocking calls after that timeout. Also, frameworks that communicate over the network should provide an API to control these timeouts and use sensible default values. Infinity is not sensible - it's insane and drives you mad. Who came up with this absolutely useless infinity timeout anyway?
Socket socket = ... socket.connect(remote, 20000); // fail after 20s InputStream in = socket.getInputStream(); socket.setSoTimeout(15000); int i = in.read();
Unfortunately the file system API (FileInputStream, FileChannel, FileDescriptor, File) provides no way to set timeouts on file operations. That's very unfortunate, because these are the most common blocking calls in a Java application: writing to stdout/stderr and reading from stdin are file operations, and writing to log files is common. Operations on the standard input/output streams depend directly on other processes outside of our Java VM. If they decide to block forever, so will reads/writes to these streams in our application. Disk I/O is a limited resource for which all processes on a system compete. There is no guarantee that a simple read/write on a file is quick; it may incur unspecified wait time. Also, today remote file systems are ubiquitous. Disks may be on a SAN/NAS, or file systems may be mounted over the network (NFS, AFS, CIFS/Samba). So a filesystem call may actually be a network call, but without the power of the network API: if the OS decides that the timeout for the write is 60 seconds, you're stuck with it. It is a mistake to assume that any disk/file operation is fast, or even remotely instantaneous. An application does the user a favour by assuming that a file operation can take seconds. So it's best avoided or done asynchronously (in the background). Solutions to this problem are adequate buffering and queueing/asynchronous processing.
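A minimal sketch of the asynchronous approach with a bounded wait; the executor setup, the writeLogEntry() method and the 5 second limit are illustrative assumptions:
ExecutorService executor = Executors.newSingleThreadExecutor();
Future<?> result = executor.submit(new Runnable() {
    public void run() {
        writeLogEntry(entry); // hypothetical, potentially slow file operation
    }
});
try {
    result.get(5, TimeUnit.SECONDS); // don't block the caller forever on disk I/O
} catch (TimeoutException e) {
    // the write is still running: decide whether to cancel, retry or report an error
} catch (InterruptedException | ExecutionException e) {
    throw new RuntimeException(e.getMessage(), e);
}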
Assuming a cheap timer call
for (...) { long t = System.currentTimeMillis(); long t = System.nanoTime(); Date d = new Date(); Calendar c = new GregorianCalendar(); }
Creating a new Date or Calendar performs a syscall to obtain the current time. On Unix/Linux this is the syscall gettimeofday, which is considered "extremely cheap". Well, extremely cheap only compared to other syscalls! It usually doesn't require a switch from userspace to kernelspace but is implemented as a read from a memory mapped page. Still, calls to gettimeofday are expensive compared to normal code execution. The exact penalty of the call strongly depends on the architecture and even the configuration (modern x86 systems have numerous timers that can be used by the OS: HPET, TSC, RTC, ACPI, clock chips etc.). On my Linux-2.6.37-rc7 system the timer calls also seem to be synchronised across the system, meaning the total available bandwidth of ~800 calls per ms is shared by all threads/processes. Consequently my dual core running 2 threads was able to make ~400 calls per ms per thread. (Thanks to J. Davies for that hint.) And last but not least, the resolution of this timer is not infinite. At best it is milliseconds, but it may well be something like 25 to 50 milliseconds with a large jitter. Modern Linux systems can easily achieve the full millisecond resolution in System.currentTimeMillis, but that has not always been the case. System.nanoTime will certainly not have its full theoretical resolution: 1 ns = 10^-9 s corresponds to 1 GHz, so on a 3 GHz CPU only ~3 instructions could execute within that interval, which is obviously not enough. I measured a large jitter between 800 ns and 1'000'000 ns (1 ms).
Clearly, calling gettimeofday every 100 nanoseconds is wasteful. Most of the time you don't need the current time that precisely. Caching it outside of the loop is trivial; this way you only access the timer once. You can still decide to clone the Date instance if you really need distinct objects. Cloning is extremely cheap compared to a timer access (factor 50 on my system).
Date d = new Date(); for (E entity : entities) { entity.doSomething(); entity.setUpdated((Date) d.clone()); }
Caching the time may not be an option if the loop runs for more than a couple of milliseconds. In that case you may setup a timer that periodically updates a timestamp variable with the current time (using interrupts). Set it to the exact granularity that you need. The coarser that granularity is, the better. On my system this loop is 200 times faster than creating a new Date each time.
private volatile long time; Timer timer = new Timer(true); try { time = System.currentTimeMillis(); timer.scheduleAtFixedRate(new TimerTask() { public void run() { time = System.currentTimeMillis(); } }, 0L, 10L); // granularity 10ms for (E entity : entities) { entity.doSomething(); entity.setUpdated(new Date(time)); } } finally { timer.cancel(); }
Catch all: I don't know the right runtime exception
Query q = ... Person p; try { p = (Person) q.getSingleResult(); } catch(Exception e) { p = null; }
This is an example of a J2EE EJB3 query. getSingleResult throws runtime exceptions when a) the result is not unique, b) there is no result, or c) the query could not be executed, due to a database failure for instance. The code above just catches any exception: a typical catch-all block. Using null as a result may be the right thing for case b), but not for cases a) or c). In general one should not catch more exceptions than necessary. The correct exception handling is:
Query q = ... Person p; try { p = (Person) q.getSingleResult(); } catch(NoResultException e) { p = null; }
Exceptions are annoying
try { doStuff(); } catch(Exception e) { log.fatal("Could not do stuff"); } doMoreStuff();
There are two problems with this tiny piece of code. First, if this is really a fatal condition then the method should abort and notify the caller of the fatal condition with an appropriate exception (so why is it caught in the first place?). Hardly ever can you just continue after a fatal condition. Second, this code is very hard to debug because the reason for the failure is lost. Exception objects carry detailed information about where the error occurred and what caused it. Individual subclasses may actually carry a lot of extra information that the caller can use to deal with the situation properly. That's a lot more than a simple error code (which is so popular in the C world; just look at the Linux kernel: return -EINVAL everywhere...). If you catch high-level exceptions, then at least log the message and stack trace. You should not see exceptions as a necessary evil. They are a great tool for error handling.
try { doStuff(); } catch(Exception e) { throw new MyRuntimeException(e.getMessage(), e); }
Re-wrapping RuntimeException
try { doStuff(); } catch(Exception e) { throw new RuntimeException(e); }
Sometimes you really want to re-throw any checked exception as a RuntimeException. The above piece of code fails to take into account, however, that RuntimeException extends Exception: a RuntimeException doesn't need to be caught and re-wrapped here. Also, the exception's message is not propagated properly. A bit better is to catch the RuntimeException separately and not wrap it. Even better is to catch all the checked exceptions individually (even if there are a lot of them).
try { doStuff(); } catch(RuntimeException e) { throw e; } catch(Exception e) { throw new RuntimeException(e.getMessage(), e); }
try { doStuff(); } catch(IOException e) { throw new RuntimeException(e.getMessage(), e); } catch(NamingException e) { throw new RuntimeException(e.getMessage(), e); }
Not properly propagating the exception
try { } catch(ParseException e) { throw new RuntimeException(); throw new RuntimeException(e.toString()); throw new RuntimeException(e.getMessage()); throw new RuntimeException(e); }
This code just wraps a parsing error into a runtime exception in different ways. None of them provides really good information to the caller. The first loses all information. The second may do anything, depending on what information toString() produces; the default toString() implementation lists the fully qualified exception name followed by the message, and nesting many exceptions will produce an unwieldy long and ugly string, unsuitable for a user. The third preserves only the message, which is better than nothing. The last preserves the cause, but sets the message of the runtime exception to the toString() of its cause (see above). The most useful and readable version is to propagate only the cause's message in the runtime exception and pass the original exception as the cause:
try { } catch(ParseException e) { throw new RuntimeException(e.getMessage(), e); }
Silly exception messages
try { } catch (ParseException e) { throw new RuntimeException("**** --> OMFG something scary happened !!!!11! <---"); }This exception is useless. It doesn't give the caller any indication why it occurred. Instead it contains ASCII art and emotional wording that helps nobody. Either add useful information or simply pass the message of the original exception. Don't add your custom "operation failed because: " string in front of the original message. It is useless. And it adds that string to the constant pool which will be full of useless strings in a big application. Strings are top space consumers in a compiled application.
try { } catch (ParseException e) { // for code so it gets access to some context throw new MyException(input, e); // for humans throw new RuntimeException(input +": "+ e.getMessage(), e); // or simply throw new RuntimeException(e.getMessage(), e); }
Catching to log
try { ... } catch(ExceptionA e) { log.error(e.getMessage(), e); throw e; } catch(ExceptionB e) { log.error(e.getMessage(), e); throw e; }
This code only catches exceptions to write out a log statement and then rethrows the same exception. This is stupid. Let the caller decide if the message is important enough to log and remove the whole try/catch clause. It is only useful when you know that the caller doesn't log it; that's the case if the method is called by a framework which is not under your control. If you log because the caller doesn't have enough information to log, then your exception class is inappropriate: pass all required information along in the exception. That's what they are for!
Incomplete exception handling
try { is = new FileInputStream(inFile); os = new FileOutputStream(outFile); } finally { try { is.close(); os.close(); } catch(IOException e) { /* we can't do anything */ } }
If streams are not closed, the underlying operating system can't free native resources. This programmer wanted to be careful about closing both streams, so he put the close calls in a finally clause. But if is.close() throws an IOException then os.close() is not even executed. Both close statements must be wrapped in their own try/catch clause. Moreover, if creating one of the streams throws an exception (because a file was not found, say), the corresponding variable is still null and calling close() on it will throw a NullPointerException. To make this less verbose, some newlines have been stripped.
try { is = new FileInputStream(inFile); os = new FileOutputStream(outFile); } finally { try { if (is != null) is.close(); } catch(IOException e) {/* we can't do anything */} try { if (os != null) os.close(); } catch(IOException e) {/* we can't do anything */} }
The exception that never happens
try { ... do risky stuff ... } catch(SomeException e) { // never happens } ... do some more ...
Here the developer executes some code in a try/catch block. He doesn't want to rethrow the exception that one of the called methods declares, to his annoyance. As the developer is clever he knows that in his particular situation the exception will never be thrown, so he just inserts an empty catch block. He even puts a nice comment in the empty catch block - but those are famous last words... The problem with this is: how can he be sure? What if the implementation of the called method changes? What if the exception is still thrown in some special case he just didn't think of? The code after the try/catch may do the wrong thing in that situation, and the exception will go completely unnoticed. The code can be made much more reliable by throwing a runtime exception in that case. This works like an assertion and adheres to the "crash early" principle. The developer will notice if his assumption was wrong. The code after the try/catch will not be executed if the exception occurs against all honest hope and expectation. If the exception really never occurs - fine, nothing changed.
try { ... do risky stuff ... } catch(SomeException e) { // never happens hopefully throw new IllegalStateException(e.getMessage(), e); // crash early, passing all information } ... do some more ...
The transient trap
public class A implements Serializable { private String someState; private transient Log log = LogFactory.getLog(getClass()); public void f() { log.debug("enter f"); ... } }
Log objects are not serializable. The programmer knew this and correctly declared the log field as transient so it is not serialized. However the initialization of this variable happens in its field initializer. Upon deserialization, initializers and constructors are not executed! This leaves the deserialized object with a null log variable, which subsequently causes a NullPointerException in f(). Rule of thumb: never initialize transient fields in a field initializer. You can solve this case either by using a static variable or by using a local variable:
public class A implements Serializable { private String someState; private static final Log log = LogFactory.getLog(A.class); public void f() { log.debug("enter f"); ... } } public class A implements Serializable { private String someState; public void f() { Log log = LogFactory.getLog(getClass()); log.debug("enter f"); ... } }
Overkill initialization
public class B { private int count = 0; private String name = null; private boolean important = false; }
This programmer used to code in C. So naturally he wants to make sure every variable is properly initialized. Here however it is not necessary. The Java language specification guarantees that member variables are initialized with their default values automatically: 0, null, false. By declaring the defaults explicitly, the programmer only causes redundant initialization code to run in every constructor. This is unnecessary overkill and should be avoided.
public class B { private int count; private String name; private boolean important; }
Log instances: static or not?
An earlier version of this section suggested not to store log instances in static variables. Turns out I was wrong. Mea culpa. I apologize. Store the darn log instance in a static final variable and be happy.
private static final Log log = LogFactory.getLog(MyClass.class);
Here is why:
- Automatically thread-safe. But only with the final keyword included!
- Usable from static and non-static code.
- No problems with serializable classes.
- Initialization cost only once: getLog() may not be as cheap as you might suppose.
- Nobody is going to unload the Log class loader anyway.
Choosing the wrong class loader
Class clazz = Class.forName(name); Class clazz = getClass().getClassLoader().loadClass(name);This code uses the class loader that loaded the current class. getClass() might return something unexpected, like a subclass, or a dynamic proxy. Something out of your control. This is hardly ever what you want when you dynamically load an additional class. Especially in managed environments like Application servers, Servlet engines or Java Webstart this is most certainly wrong. This code will behave very differently depending on the environment it is run in. Environments use the context class loader to provide applications with a class loader they should use to retrieve "their own" classes.
ClassLoader cl = Thread.currentThread().getContextClassLoader(); if (cl == null) cl = MyClass.class.getClassLoader(); // fallback Class clazz = cl.loadClass(name);
Poor use of reflection
Class beanClass = ... if (beanClass.newInstance() instanceof TestBean) ...
This programmer is struggling with the reflection API. He needs a way to check for inheritance but didn't find one, so he just creates a new instance and uses the instanceof operator he is used to. Creating an instance of a class you don't know is dangerous: you never know what that class does. It may be very expensive. Or the default constructor may not even exist, in which case this if statement throws an exception. The correct way of doing this check is the Class.isAssignableFrom(Class) method. Note that its semantics are the reverse of instanceof.
Class beanClass = ... if (TestBean.class.isAssignableFrom(beanClass)) ...
Synchronization overkill
Collection l = new Vector(); for (...) { l.add(object); }
Vector is a synchronized ArrayList, and Hashtable is a synchronized HashMap. Both classes should only be used if synchronization is explicitly required. If however those collections are used as local temporary variables, the synchronization is complete overkill and degrades performance considerably. I measured a 25% penalty.
Collection l = new ArrayList(); for (...) { l.add(object); }
Wrong list type
Without sample code. Junior developers often have difficulty choosing the right list type. They usually choose quite randomly from Vector, ArrayList and LinkedList. But there are performance considerations to make! The implementations behave quite differently when adding, iterating or accessing objects by index. I'll ignore Vector here because it behaves like an ArrayList, just slower. NB: n is the size of the list, not the number of operations! I refrain from using the O() notation here because it doesn't give a useful image of what's happening. The table lists the cost of list operations.
Operation | ArrayList | LinkedList |
---|---|---|
add (append) | const or ~log(n) if growing | const |
insert (middle) | linear or ~n*log(n) if growing | linear |
remove (middle) | linear (always performs complete copy) | linear |
iterate | linear | linear |
get by index | const | linear |
Memory considerations: LinkedList wraps every element into a wrapper object. ArrayList allocates a completely new array each time it needs to grow and performs an array copy on every remove(). All standard Collections can not reuse their Iterator objects, which may cause Iterator churn especially when recursively iterating large tree structures.
Personally I almost never use LinkedList. It would really only make sense when you wanted to insert objects in the middle of a list. But without access to the wrapper object this doesn't scale and has linear cost because you must first traverse the list until you find the insert position. So what exactly is the point of the LinkedList class? I recommend using ArrayLists only.
The HashMap size trap
Map map = new HashMap(collection.size()); for (Object o : collection) { map.put(o.key, o.value); }
This developer had good intentions and wanted to make sure that the HashMap doesn't need to be resized, so he set its initial size to the number of elements he was going to put into it. Unfortunately the HashMap implementation doesn't quite behave like this: it sets its internal threshold to threshold = (int)(capacity * loadFactor). So it will resize after 75% of the collection has been inserted into the map. The above code will thus always cause extra garbage.
Map map = new HashMap(1 + (int) (collection.size() / 0.75));
Hashtable, HashMap and HashSet are overrated
These classes are extremely popular. Because they have great usability for the developer. Unfortunately they are also horribly inefficient. Hashtables become useful when you have 100 or more entries. But not for just a few elements. In typical code such collections contain around 10 entries - which fits in a CPU cache line! Hashtable and HashMap wrap every key/value pair into an Entry wrapper object. An Entry object is surprisingly large. Not only does it hold a reference to key and value, but it also stores the hash code and a forward reference to the next Entry of the hash bucket. When you look at heap dumps with a memory analyzer you will be shocked by how much space is wasted by them in large applications like an application server. And when you look at the source code of HashSet you will see that the developers were extremely lazy and just used a HashMap as the backend!
Before using any of these classes, think again. IdentityHashMap can be a viable alternative. But be careful, it intentionally breaks the Map interface. It is much more memory efficient by implementing an open hashtable (no buckets), doesn't need an Entry wrapper and uses a simple Object[] as its backend. Instead of a HashSet a simple ArrayList may do similarly well (you can use contains(Object)) as long as it's small and lookups are rare.
For Sets that contain only a handful of entries the whole hashing is overkill and the memory wasted for the HashMap backend plus the wrapper objects is just nuts. Just use an ArrayList or even an array.
Actually it's a shame that there are no efficient Map and Set implementations in the standard JDK!
Lists are overrated
Also List implementations are very popular. But even lists are often not necessary; simple arrays may do as well. I am not saying that you should not use Lists at all. They are great to work with. But know when to use arrays. The following are indicators that you should be using an array instead of a list:
- The list has a fixed size. Example: days of the week. A set of constants.
- The list is often (10'000 times) traversed.
- The list contains wrapper objects for numbers (there are no lists of primitive types).
List<Integer> codes = new ArrayList<Integer>(); codes.add(Integer.valueOf(10)); codes.add(Integer.valueOf(20)); codes.add(Integer.valueOf(30)); codes.add(Integer.valueOf(40)); versus int[] codes = { 10, 20, 30, 40 };
// horribly slow and a memory waster if l has a few thousand elements (try it yourself!) List<Mergeable> l = ...; for (int i=0; i < l.size()-1; i++) { Mergeable one = l.get(i); Iterator<Mergeable> j = l.listIterator(i+1); // memory allocation! while (j.hasNext()) { Mergeable other = j.next(); if (one.canMergeWith(other)) { one.merge(other); j.remove(); } } }
versus
// quite fast and no memory allocation Mergeable[] l = ...; for (int i=0; i < l.length-1; i++) { Mergeable one = l[i]; for (int j=i+1; j < l.length; j++) { Mergeable other = l[j]; if (one.canMergeWith(other)) { one.merge(other); l[j] = null; } } }
You save an extra list object (wrapping an array), wrapper objects and possibly lots of iterator instances. Even Sun realized this. That's why Collections.sort() actually copies the list into an array and performs the sort on the array.
Object arrays are soooo flexible
/** * @return [1]: Location, [2]: Customer, [3]: Incident */ Object[] getDetails(int id) {...
Even though documented, this kind of passing back values from a method is ugly and error prone. You should really declare a small class that holds the objects together. This is analogous to a struct in C.
Details getDetails(int id) {...} private class Details { public Location location; public Customer customer; public Incident incident; }
Premature object decomposition
public void notify(Person p) { ... sendMail(p.getName(), p.getFirstName(), p.getEmail()); ... }
class PhoneBook { String lookup(String employeeId) { Employee emp = ... return emp.getPhone(); } }
In the first example it's painful to decompose an object just to pass its state on to a method. In the second example the use of the method is very limited. If the overall design allows it, pass the object itself.
public void notify(Person p) { ... sendMail(p); ... }
class EmployeeDirectory { Employee lookup(String employeeId) { Employee emp = ... return emp; } }
Modifying setters
private String name; public void setName(String name) { this.name = name.trim(); } public String getName() { return this.name; }
This poor developer suffered from spaces at the beginning or end of a name entered by the user. He thought he was being clever and just removed the spaces inside the setter method of a bean. But how odd is a bean that modifies its data instead of just holding it? Now the getter returns different data than was set by the setter! If this was done inside an EJB3 entity bean, a simple read from the DB would actually modify the data: for every INSERT there would be an UPDATE statement. Let alone how hard it is to debug these side-effects! In general, a bean should not modify its data. It is a data container, not business logic. Do the trimming where it makes sense: in the controller where the input occurs or in the logic where the spaces are not wanted.
person.setName(textInput.getText().trim());
Unnecessary Calendar
Calendar cal = new GregorianCalender(TimeZone.getTimeZone("Europe/Zurich")); cal.setTime(date); cal.add(Calendar.HOUR_OF_DAY, 8); date = cal.getTime();A typical mistake by a developer who is confused about date, time, calendars and time zones. To add 8 hours to a Date there is no need for a Calendar. Neither is the time zone of any relevance. (Think about is if you don't understand this!) However if we wanted to add days (not hours) we would need a Calendar, because we don't know the length of a day for sure (on DST change days may have 23 or 25 hours).
date = new Date(date.getTime() + 8L * 3600L * 1000L); // add 8 hrs
Calendar cal = new GregorianCalender(TimeZone.getTimeZone("Europe/Zurich")); SimpleDateFormat df = new SimpleDateFormat("dd.MM.yyyy HH:mm"); df.setCalendar(cal);Here the Calendar object is completely unnecessary. The DateFormat object already contains a Calendar instance. Reuse that.
SimpleDateFormat df = new SimpleDateFormat("dd.MM.yyyy HH:mm"); df.setTimeZone(TimeZone.getTimeZone("Europe/Zurich"));
Relying on the default TimeZone
Calendar cal = new GregorianCalendar(); cal.setTime(date); cal.set(Calendar.HOUR_OF_DAY, 0); cal.set(Calendar.MINUTE, 0); cal.set(Calendar.SECOND, 0); Date startOfDay = cal.getTime();
The developer wanted to calculate the start of the day (0h00). First, he obviously missed the millisecond field of the Calendar. But the really big mistake is not setting the TimeZone of the Calendar object. The Calendar will thus use the default time zone. This may be fine in a desktop application, but in server-side code it is hardly ever what you want: 0h00 in Shanghai is a very different moment than 0h00 in London. The developer needs to check which time zone is relevant for this computation.
Calendar cal = new GregorianCalendar(user.getTimeZone()); cal.setTime(date); cal.set(Calendar.HOUR_OF_DAY, 0); cal.set(Calendar.MINUTE, 0); cal.set(Calendar.SECOND, 0); cal.set(Calendar.MILLISECOND, 0); Date startOfDay = cal.getTime();
Time zone "conversion"
public static Date convertTz(Date date, TimeZone tz) { Calendar cal = Calendar.getInstance(); cal.setTimeZone(TimeZone.getTimeZone("UTC")); cal.setTime(date); cal.setTimeZone(tz); return cal.getTime(); }
If you think this method does something useful, please go and read the article about time. This developer had not read the article and was desperately trying to "fix" the time zone of his date. Actually the method does nothing. The returned Date will not have any different value than the input. Because a Date does not carry time zone information. It is always UTC. And the getTime / setTime methods of Calendar always convert between UTC and the actual time zone of the Calendar.
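If the actual goal was to display the same instant in another time zone, that is a formatting concern, not a conversion of the Date; a minimal sketch using the pattern from the previous section:
DateFormat df = new SimpleDateFormat("dd.MM.yyyy HH:mm");
df.setTimeZone(tz);              // the zone only matters when rendering the instant
String text = df.format(date);   // same Date, displayed in tz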
Using Calendar.getInstance()
Calendar c = Calendar.getInstance(); c.set(2009, Calendar.JANUARY, 15);
This code assumes a Gregorian calendar. But what if the returned Calendar subclass is a Buddhist, Julian, Hebrew, Islamic, Iranian or Discordian calendar? In those the year 2009 has a very different meaning, and a month called January doesn't exist. Calendar.getInstance() uses the current default locale to select an appropriate implementation, and it depends on the Java implementation which implementations are available. The utility of Calendar.getInstance() is thus very limited, and its use should be avoided as its result is not well defined.
Calendar c = new GregorianCalendar(timeZone); c.set(2009, Calendar.JANUARY, 15);
Dangerous Calendar manipulation
GregorianCalendar cal = new GregorianCalendar(TimeZone.getTimeZone("Europe/Zurich")); cal.set(Calendar.SECOND, 0); cal.set(Calendar.MILLISECOND, 0); if (cal.before(other)) doSomething(); cal.setTimeZone(TimeZone.getTimeZone("GMT")); cal.set(Calendar.HOUR_OF_DAY, 23); Date d = cal.getTime();
This code manipulates a Calendar object in ways that are bound to yield undefined results. Calendar objects have complex inner state: individual fields for day, hour, year etc., a millisecond since epoch value (like Date) and a time zone. Depending on what you change, some of these fields are invalidated and are only recomputed from other values when you call certain methods:
- set() invalidates the millisecond since epoch value and dependent fields (changing DATE obviously invalidates DAY_OF_WEEK)
- setTimeZone() invalidates all fields except the millisecond since epoch value
- get(), getTime(), getTimeInMillis(), add(), roll() recompute the millisecond since epoch value from the fields
- get(), add() also recompute invalid fields from the millisecond since epoch value
Whenever you change fields with set(), dependent fields do not get updated until you call get(), getTime(), getTimeInMillis(), add(), or roll(). The first paragraph of the above code calls set() followed by before(). There is no guarantee (according to the API doc) that before() will see the modified time value. The second paragraph invalidates all fields and the millisecond since epoch value by calling setTimeZone() and set(), losing the calendar's data completely.
See also bug 4827490
Calendar objects should always be manipulated according to these simple rules:
- Initialize the TimeZone (and Locale if you need it) already in the constructor
- After calls to set(), add a call to getTimeInMillis()
- After a call to setTimeZone(), add a call to get()
GregorianCalendar cal = new GregorianCalendar(TimeZone.getTimeZone("Europe/Zurich")); cal.set(Calendar.SECOND, 0); cal.set(Calendar.MILLISECOND, 0); cal.getTimeInMillis(); if (cal.before(other)) doSomething(); cal.setTimeZone(TimeZone.getTimeZone("GMT")); cal.get(Calendar.DATE); cal.set(Calendar.HOUR_OF_DAY, 23); Date d = cal.getTime();
Calling Date.setTime()
account.changePassword(oldPass, newPass); Date lastmod = account.getLastModified(); lastmod.setTime(System.currentTimeMillis());
The above code updates the last modified date of the account entity. The programmer wants to be conservative and avoids creating a new Date object. Instead she uses the setTime method to modify the existing Date instance. There is actually nothing wrong with that, but I just do not recommend this practice. Date objects are usually passed around carelessly. The same Date instance could be passed to numerous objects which don't make a copy in their setters. Dates are often used like primitives. Thus if you modify a Date instance, other objects that use this instance might behave unexpectedly. Of course it is unclean design if an object exposes its intrinsic Date instance to the outside world, if you write code that strictly adheres to classical OO principles (which I think is too inconvenient). General everyday Java practice however is to just copy Date references and not clone the object in setters. Thus every programmer should treat Date as immutable and should not modify existing instances. Modifying should only be done for performance reasons in special situations, and even then the use of a simple long is probably equally good.
account.changePassword(oldPass, newPass); account.setLastModified(new Date());
Assuming SimpleDateFormat was thread-safe
public class Constants { public static final SimpleDateFormat date = new SimpleDateFormat("dd.MM.yyyy"); }
The above code is flawed in several ways. It's broken because it shares a static instance of a SimpleDateFormat with possibly any number of threads. SimpleDateFormat is not thread-safe. If multiple threads use this object concurrently the results are undefined: you may observe strange output from format and parse, or even exceptions. Unfortunately this mistake is very common!
Yes, sharing a SimpleDateFormat requires proper synchronization. Yes that comes at a price (cache flushes, lock contention, etc.). And yes, creating a SimpleDateFormat is not free either (pattern parsing, object allocation). But simply ignoring thread-safety is not a solution, but a sure way to break your code.
Of course this code also doesn't take the time zone into account. And then defining a class called Constants screams of yet another anti-pattern (see next section).
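One common way out (a sketch, not taken from the original text) is to give each thread its own format object via ThreadLocal, or simply to create a short-lived SimpleDateFormat where it is needed:
private static final ThreadLocal<SimpleDateFormat> DATE_FORMAT =
        new ThreadLocal<SimpleDateFormat>() {
            @Override protected SimpleDateFormat initialValue() {
                SimpleDateFormat df = new SimpleDateFormat("dd.MM.yyyy");
                df.setTimeZone(TimeZone.getTimeZone("Europe/Zurich")); // pick the zone that is relevant for you
                return df;
            }
        };
...
String formatted = DATE_FORMAT.get().format(date);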
Having a global Configuration/Parameters/Constants class
public interface Constants { String version = "1.0"; String dateFormat = "dd.MM.yyyy"; String configFile = ".apprc"; int maxNameLength = 32; String someQuery = "SELECT * FROM ..."; }
Often seen in large projects: one class or interface that contains all sorts of constants that are used throughout the application. Why is this bad? Because these constants are unrelated to each other; this class is the only thing they have in common. And the reference to this class will pollute many otherwise unrelated components of the application. You want to later extract a component and use it in a different application? Or share some classes between a server and a remote client? You may need to ship the constants class as well! This class has introduced a dependency between otherwise unrelated components. It inhibits reuse and loose coupling and gives way to chaos.
Instead put constants where they belong. In no case should constants be used across component boundaries. This is only allowed if the component is a library, on which an explicit dependency is wanted.
Not noticing overflows
public int getFileSize(File f) { long l = f.length(); return (int) l; }
This developer, for whatever reason, wrapped a call to determine the size of a file into a method that returns an int instead of a long. This code does not support files larger than 2 GB and simply returns a wrong length in that case. Code that casts a value to a smaller type must first check for a possible overflow and throw an exception.
public int getFileSize(File f) { long l = f.length(); if (l > Integer.MAX_VALUE) throw new IllegalStateException("int overflow"); return (int) l; }
Another version of an overflow bug is the following. Note the missing parentheses in the first println statement: the cast applies only to b, so b is truncated to an int before the long a is subtracted, yielding a garbage value.
long a = System.currentTimeMillis(); long b = a + 100; System.out.println((int) b-a); System.out.println((int) (b-a));
And last, a true gem that I uprooted during code review. Note how the programmer tried to be careful, but then failed so badly by assuming an int could ever become larger than its maximum value.
int a = l.size(); a = a + 100; if (a > Integer.MAX_VALUE) throw new ArithmeticException("int overflow");
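If you do need to detect overflow of an int addition, check before adding or, on Java 8 and later, let the library do it; a minimal sketch:
int a = Math.addExact(l.size(), 100); // throws ArithmeticException if the int overflows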
Using == with float or double
for (float f = 10f; f!=0; f-=0.1) { System.out.println(f); }
The above code doesn't behave as expected: it causes an endless loop. Because 0.1 has no exact binary representation, f will never be exactly 0. Generally you should never compare float or double values with the equality operator ==; always use less than or greater than. Java compilers should be changed to issue a warning in that case. Or even make == an illegal operation for floating point types in the Java Language Spec. It makes really no sense to have this feature.
for (float f = 10f; f>0; f-=0.1) { System.out.println(f); }
Storing money in floating point variables
float total = 0.0f; for (OrderLine line : lines) { total += line.price * line.count; }
double a = 1.14 * 75; // 85.5 represented as 85.4999... System.out.println(Math.round(a)); // surprising output: 85 System.out.println(10.0/3); // surprising output: 3.3333333333333335 (precision lost twice during division and on conversion to decimal)
BigDecimal d = new BigDecimal(1.14); // precision has already been lost
I have seen many developers coding such a loop. Including myself in my early days. When this code sums 100 order lines with every line having one 0.30$ item, the resulting total is calculated to exactly 29.999971. The developer notices the strange behaviour and changes the float to the more precise double, only to get the result 30.000001192092896. The somewhat surprising result is of course due to the difference in representation of numbers by humans (in decimal format) and computers (in binary format). It always occurs in its most annoying form when you add fractional amounts of money or calculate the VAT.
Binary representation of floating point numbers was invented for inherently inexact values like measurements. Perfect for engineering! But unusable when you want exact math. Like banks. Or when counting.
There are business cases where you can not afford to lose precision. You lose precision when converting between decimal and binary and when rounding happens in a manner that is not well defined or at indeterminate points. To avoid losing precision you must use fixed point or integer arithmetic. That does not only apply to monetary values, but money is a frequent source of annoyance in business applications and therefore makes a good example. In the second example an unsuspecting user of the program would simply say the computer's calculator is broken. That is of course very embarrassing for the programmer.
Consequently an amount of money should never ever be stored in a floating point data type (float, double). Please note that it is not just any calculation that is inexact. Even a simple multiplication with an integer can already yield an inexact result. It is the mere fact of storing a value in a binary representation (float, double) that may already cause rounding! You simply can not store 0.3 as an exact value in float or double. Because float and double are binary IEEE754 types. See also here. You can play around with various numbers and their binary representation here. If you see a float or double in your financial code base, the code will most likely yield inexact results. Instead either a string or fixed point representation should be chosen. A text representation must be in a well-defined format and is not to be confused with user input/output in a locale specific format. Both representations must define the precision (number of digits before and after the decimal point) that is stored.
For calculations the class BigDecimal provides an excellent facility. The class can be used such that it throws runtime exceptions if precision is unexpectedly lost in an operation. This is very helpful to uproot subtle numerical bugs and enables the developer to correct the calculation.
BigDecimal total = BigDecimal.ZERO; for (OrderLine line : lines) { BigDecimal price = new BigDecimal(line.price); BigDecimal count = new BigDecimal(line.count); total = total.add(price.multiply(count)); // BigDecimal is immutable! } total = total.setScale(2, RoundingMode.HALF_UP);
BigDecimal a = (new BigDecimal("1.14")).multiply(new BigDecimal(75)); // 85.5 exact a = a.setScale(0, RoundingMode.HALF_UP); // 86 System.out.println(a); // correct output: 86
BigDecimal a = new BigDecimal("1.14");
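To illustrate the point about runtime exceptions: BigDecimal refuses to round silently whenever an operation would lose precision and no rounding mode has been given. The numbers below are only illustrative.

BigDecimal one = new BigDecimal("1");
one.divide(new BigDecimal("3"));                            // throws ArithmeticException: non-terminating decimal expansion
new BigDecimal("2.5").setScale(0);                          // throws ArithmeticException: rounding necessary
one.divide(new BigDecimal("3"), 10, RoundingMode.HALF_UP);  // fine: scale and rounding mode are explicit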
Not freeing resources in a finally block
public void save(File f) throws IOException { OutputStream out = new BufferedOutputStream(new FileOutputStream(f)); out.write(...); out.close(); } public void load(File f) throws IOException { InputStream in = new BufferedInputStream(new FileInputStream(f)); in.read(...); in.close(); }
The above code opens an output stream to a file, allocating a file handle in the operating system.
File handles are a scarce resource and need to be properly freed by calling close on the
File handles are a rare resource and need to be properly freed, by calling close on the
FileOutputStream (same for FileInputStream of course). To ensure that even in the case of an
exception (the filesystem may become full during the write), closing must happen in a finally
block. Here the stream is also wrapped into a buffering stream. That means not all data will have
been written to disk by the time we arrive at the close()
call. The close call itself will flush
the pending data in the buffer to disk and may thus itself fail with an IOException. If that close
fails the file on disk is incomplete (truncated) and thus probably corrupt. The method should
therefore propagate the IOException in that case. In the case of a FileInputStream we can safely
ignore the potential IOException from a close() call. We have read all data that we need, and there
is nothing useful that we can do if the underlying close() failed anyway. It's not even worth
logging it.
In a perfect world BufferedOutputStream.close()
would be implemented correctly. But sadly it has
a bug that's not going to be fixed:
it loses any IOException from the implicit flush and truncates your file silently. So here
we give the proper workaround with an explicit flush before close.
To be exact the corrected code below can leak in one small corner case: when the file stream was allocated but then allocating the buffered stream fails mysteriously (with an OutOfMemoryError, for instance). As a pragmatic person I think in such a pathological case we can safely rely on the garbage collector to clean up the mess. It's not worth the hassle to deal with it.
// code for your cookbook public void save() throws IOException { File f = ... OutputStream out = new BufferedOutputStream(new FileOutputStream(f)); try { out.write(...); out.flush(); // don't lose exception by implicit flush on close } finally { out.close(); } } public void load(File f) throws IOException { InputStream in = new BufferedInputStream(new FileInputStream(f)); try { in.read(...); } finally { try { in.close(); } catch (IOException e) { } } }
Let me also give you the cookbook recipe for another ubiquitous pattern: database access. Again this is the pragmatic approach. Yes, rs.close() could fail with mysterious Errors, except they only occur in your university lecture on Quantum Mechanics and not in The Real World (tm). And only perverts would write the try/finally cascade that no Error neutrino can escape. Forgive my sarcasm. Here once and for all this is how to deal with SQL objects:
Car getCar(DataSource ds, String plate) throws SQLException { Car car = null; Connection c = null; PreparedStatement s = null; ResultSet rs = null; try { c = ds.getConnection(); s = c.prepareStatement("select make, color from cars where plate=?"); s.setString(1, plate); rs = s.executeQuery(); if (rs.next()) { car = new Car(); car.make = rs.getString(1); car.color = rs.getString(2); } } finally { if (rs != null) try { rs.close(); } catch (SQLException e) { } if (s != null) try { s.close(); } catch (SQLException e) { } if (c != null) try { c.close(); } catch (SQLException e) { } } return car; }
With that said, don't miss the next paragraph.
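If you can rely on Java 7 or later, the same recipe can also be written with try-with-resources (see the section on AutoCloseable below). This sketch is equivalent, with one difference: an exception thrown by close() after a successful body is propagated (or attached as a suppressed exception) instead of being swallowed.

Car getCar(DataSource ds, String plate) throws SQLException {
    try (Connection c = ds.getConnection();
         PreparedStatement s = c.prepareStatement("select make, color from cars where plate=?")) {
        s.setString(1, plate);
        try (ResultSet rs = s.executeQuery()) {
            if (rs.next()) {
                Car car = new Car();
                car.make = rs.getString(1);
                car.color = rs.getString(2);
                return car;
            }
            return null; // resources are closed in reverse order before the method returns
        }
    }
}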
Abusing finalize()
public class FileBackedCache { private RandomAccessFile backingStore; ... protected void finalize() throws IOException { if (backingStore != null) { backingStore.close(); backingStore = null; } } }
This class uses the finalize
method to release a file handle. The problem is that
you don't know when the method will be called. The method is called by the garbage collector. If you
are running out of file handles you want this method to be called rather sooner than later. But the
GC will probably only invoke the method when you are about to run out of heap, which is a very
different situation. It may take anything from milliseconds to days until GC and finalization runs.
The garbage collector manages memory only. It does that very well. But it must
not be abused to manage any other resources apart from that. The GC is not a generic resource management
mechanism! I find Sun's API Doc of the finalize
method very misleading in that respect. It actually suggests using this method to close I/O
resources - complete bullshit if you ask me. Again: I/O has nothing to do with memory!
Better code provides a public close method, which must be called by a well-defined lifecycle management, like JBoss MBeans or so.
public class FileBackedCache { private RandomAccessFile backingStore; ... public void close() throws IOException { if (backingStore != null) { backingStore.close(); backingStore = null; } } }
JDK 1.7 (Java 7) has introduced the AutoCloseable interface. It enables an automatic call to a
close
method, when the variable (not the object) goes out of scope of a try-with-resources block.
It is very different from a finalizer. Its time of execution
is well-defined at compile time.
try (Writer w = new FileWriter(f)) { // implements Closeable w.write("abc"); // w goes out of scope here: w.close() is called automatically in ANY case } catch (IOException e) { throw new RuntimeException(e.getMessage(), e); }
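The same mechanism works for your own classes. If the FileBackedCache from above additionally declares AutoCloseable, callers with a clearly scoped lifetime can let try-with-resources invoke close(); the no-argument constructor in this sketch is assumed for illustration.

public class FileBackedCache implements AutoCloseable {
    private RandomAccessFile backingStore;
    ...
    public void close() throws IOException {
        if (backingStore != null) {
            backingStore.close();
            backingStore = null;
        }
    }
}

try (FileBackedCache cache = new FileBackedCache()) { // assumed no-arg constructor, for illustration
    // use the cache; close() is called when the block is left, in ANY case
} catch (IOException e) {
    throw new RuntimeException(e.getMessage(), e);
}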
Involuntarily resetting Thread.interrupted
try { Thread.sleep(1000); } catch (InterruptedException e) { // ok } or while (true) { if (Thread.interrupted()) break; }
The above code resets the interrupted flag of the Thread. Subsequent readers will not know that the Thread has been interrupted. If you need to pass on the information about the interrupt, rewrite the code like so.
try { Thread.sleep(1000); } catch (InterruptedException e) { Thread.currentThread().interrupt(); } or while (true) { if (Thread.currentThread().isInterrupted()) break; }
Spawning threads from static initializers
class Cache { private static final Timer evictor = new Timer(); }
java.util.Timer spawns a new thread in its constructor. Therefore the above code spawns a new thread in its static initializer. The new Thread will inherit some properties from its parent: context classloader, inheritable ThreadLocals, and some security properties (access rights). It is therefore rarely desirable to have those properties set in an uncontrolled way. This may for instance prevent GC of a class loader.
The static initializer is executed by the thread that first loads the class (in any given ClassLoader), which may be a totally random thread from a thread pool of a webserver for example. If you want to control these thread properties you will have to start threads in a static method, and take control of who is calling that method.
class Cache { private static Timer evictor; public static void setupEvictor() { evictor = new Timer(); } }
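A sketch of what taking control can look like: before the Timer spawns its thread, the setup method pins the thread name, the daemon flag and the context class loader to known values. The chosen values are assumptions for illustration.

import java.util.Timer;

class Cache {
    private static Timer evictor;

    public static void setupEvictor() {
        Thread current = Thread.currentThread();
        ClassLoader old = current.getContextClassLoader();
        // assumption: the cache's own class loader is the right context for the evictor thread
        current.setContextClassLoader(Cache.class.getClassLoader());
        try {
            evictor = new Timer("cache-evictor", true); // named daemon thread
        } finally {
            current.setContextClassLoader(old); // restore the caller's context class loader
        }
    }
}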
Canceled timer tasks that keep state
final MyClass callback = this; TimerTask task = new TimerTask() { public void run() { callback.timeout(); } }; timer.schedule(task, 300000L); try { doSomething(); } finally { task.cancel(); }
The above code uses a timer to enforce a timeout on doSomething(). The TimerTask contains an (implicit) instance reference to the outer class. Thus as long as the TimerTask exists the instance of MyClass may not be GC'ed. Unfortunately the Timer may keep cancelled TimerTasks around until their scheduled timeout has expired! That would leave the program 5 minutes with a dangling reference to the MyClass instance during which it can not get collected! It's a temporary memory leak. A better TimerTask would override the cancel() method and null the reference there. It requires slightly more code.
TimerTask task = new Job(this); timer.schedule(task, 300000L); try { doSomething(); } finally { task.cancel(); } static class Job extends TimerTask { private volatile MyClass callback; public Job(MyClass callback) { this.callback = callback; } public boolean cancel() { callback = null; return super.cancel(); } public void run() { MyClass cb = callback; if (cb == null) return; cb.timeout(); } }
Holding strong references to ClassLoaders and unflushable caches
In a dynamic system like an application server or OSGI, you should take good care not to prevent ClassLoaders from garbage collection. As you undeploy and redeploy individual applications in an application server you create new class loaders for them. The old ones are unused and should be collected. Java isn't going to let that happen if there is a single dangling reference from container code into your application code.
As various libraries are used throughout an enterprise application, that directly means that libraries should do their very best not to hold involuntary strong references to objects (and thus their class loaders).
This is not easy. Classes like java.beans.Introspector
from the JDK or
org.apache.commons.beanutils.PropertyUtils
from Apache BeanUtils or
org.springframework.beans.CachedIntrospectionResults
from Spring implement caches to speed up
their inner workings. They keep strong references to classes you pass them for analysis. Fortunately
they provide methods to flush their caches. But finding all classes that may have internal caches and
flushing them at the right time is a nearly impossible job for the developer.
If you happen to use org.apache.commons.el.BeanInfoManager
from Apache Commons EL you probably have a leak. This ancient class keeps a cache of strong
references that only ever grows until out of memory. And it has no flush method.
Even Tomcat had to implement a
workaround involving reflection to clean it.
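For the caches that do provide a flush method, a pragmatic place to call them is when your application is undeployed. Below is a minimal sketch for a servlet environment; the listener class name is made up, and flush calls for BeanUtils, Spring and friends would go in the same place.

import java.beans.Introspector;
import javax.servlet.ServletContextEvent;
import javax.servlet.ServletContextListener;

public class IntrospectorFlushListener implements ServletContextListener {

    public void contextInitialized(ServletContextEvent event) {
        // nothing to do at startup
    }

    public void contextDestroyed(ServletContextEvent event) {
        // flush the JDK BeanInfo cache so it no longer pins this application's classes
        Introspector.flushCaches();
        // the flush methods of BeanUtils, Spring, etc. would be called here as well
    }
}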
It would be much better if these libraries just used soft or weak references in the first place. A quick reminder:
Soft and weak references basically differ in the point in time when they are nulled.
- WeakReference: nulled more or less at the same time when the last strong reference to the object goes away. Typical for classloader references (of what use is a classloader if none of its classes are loaded). But be careful if you use this within a ClassLoader implementation.
- SoftReference: the reference is kept even if the last strong reference to the object goes away as long as memory allows. Typical for caches.
Only if the library just caches objects from its own packages (with no external references) may it be fine to forgo these special references and just use normal references.
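To make that concrete, here is a minimal sketch of a per-class cache that combines both reference types; the class and method names are made up. The weak key lets the Class (and with it its ClassLoader) be collected, the soft value lets the GC drop cached data under memory pressure.

import java.lang.ref.SoftReference;
import java.util.Collections;
import java.util.Map;
import java.util.WeakHashMap;

public class IntrospectionCache {
    // weak keys: the cache does not keep the Class (and its ClassLoader) alive;
    // soft values: the GC may discard cached data when memory gets tight
    private static final Map<Class<?>, SoftReference<Object>> CACHE =
            Collections.synchronizedMap(new WeakHashMap<Class<?>, SoftReference<Object>>());

    public static Object get(Class<?> type) {
        SoftReference<Object> ref = CACHE.get(type);
        return ref == null ? null : ref.get();
    }

    public static void put(Class<?> type, Object info) {
        // caution: the cached value must not strongly reference the key class,
        // otherwise the WeakHashMap entry can never be collected
        CACHE.put(type, new SoftReference<Object>(info));
    }

    public static void flush() {
        CACHE.clear();
    }
}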
Using soft or weak references also helps the runtime behaviour of your application: if memory gets tight, the last thing you want to spend memory on is caches. So the garbage collector will reclaim the memory used by caches if necessary. A bad example here is JBoss' SQL statement cache: it's completely static and can use a lot of memory, even when that is tight. Another bad example is JBoss' authentication cache.
Also every static cache must always provide a simple way to flush its contents. It's the nature of a (clean) cache (as opposed to e.g. a write cache) that its contents are not valuable and can be safely discarded at any time. The limits of the cache are another trap. Caches should never grow large, and never cache objects for too long. A really bad example here is the default settings for the JDK DNS cache (it completely ignores DNS record lifetimes, and stores negative lookups forever in an unbounded list). Your API documentation should state if and when caching happens. This also helps the user to estimate runtime performance.
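As a sketch of a cache that respects both rules, the following is bounded, evicts least recently used entries, and can be flushed at any time; the class name and the limit of 1000 entries are arbitrary assumptions.

import java.util.LinkedHashMap;
import java.util.Map;

public class BoundedCache<K, V> {
    private static final int MAX_ENTRIES = 1000; // arbitrary bound for illustration

    // access-ordered LinkedHashMap evicts the least recently used entry once the bound is reached
    private final Map<K, V> map = new LinkedHashMap<K, V>(16, 0.75f, true) {
        protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
            return size() > MAX_ENTRIES;
        }
    };

    public synchronized V get(K key) { return map.get(key); }

    public synchronized void put(K key, V value) { map.put(key, value); }

    public synchronized void flush() {
        map.clear(); // contents are not valuable and can be discarded at any time
    }
}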
Nested synchronized statements
class Message { private long id; ... public synchronized int compareTo(Message that) { synchronized (that) { return Long.compare(id, that.id); } } }
The above code wants to provide a thread-safe compareTo() method. The developer realized that access to the id field needs to be synchronized on the owning instance. So there are two nested synchronized statements here, one to protect this.id, one to protect that.id. Unfortunately this code will deadlock quickly when used by multiple threads. And multiple threads was the very thing that we wanted to support here. When thread 1 does
a.compareTo(b)
and thread 2 does
b.compareTo(a)
they will try to obtain the locks on a and b in reverse order and will deadlock.
Remember locking rule number one: locks must always be taken in the same order by all threads.
We could rewrite the method such that the synchronized statements are not nested at all.
public int compareTo(Message that) { long a; long b; synchronized (this) { a = this.id; } synchronized (that) { b = that.id; } return Long.compare(a, b); }
Doing random file access via RandomAccessFile
RandomAccessFile raf = new RandomAccessFile(f, "r"); for (...) { raf.seek(pos); byte b = raf.readByte(); }
Despite its name the
java.io.RandomAccessFile
class is not very suitable for accessing
files in a random-access way. That is: seek, read, seek, read, etc. Each of these directly issues the
corresponding system calls / ioctl on the file descriptor. Every C programmer knows that this sort of
file access is slow and should be replaced with memory mapped file access.
You can do that in Java via MappedByteBuffer. On my laptop that's 50 times faster.
FileInputStream in = new FileInputStream(f); MappedByteBuffer map = in.getChannel().map(FileChannel.MapMode.READ_ONLY, 0, f.length()); for (...) { byte b = map.get(pos); }