This article will give you an overview of some popular Java data types memory consumption. This article follows An overview of memory saving techniques in Java article earlier published in this blog. I strongly recommend you to read the previous article before reading this one.
Enum, EnumMap, EnumSet
Enums were added to Java 1.5. Technically speaking, each enum
is an Object
and has all memory consumption properties of objects described in the previous article. Luckily, they have several useful properties:
All enum
objects are singletons. This is guaranteed by Java language – you can create an instance of enum
only by its definition. This means that you pay for enum
object once and then you only spend 4 bytes per its reference (in this article I will assume that object references occupy 4 bytes). But what is the benefit of enums if they consume as much memory as int
variables and 4 times more than byte
variables, besides type safety and better support in IDE?
The answer is the ordinal()
method implemented by every enum
. It is an increasing number starting from 0 and ending at the number of values in the given enum
minus 1. Such enum
property allows us to use arrays for enum
to any other object mapping: a value related to the first enum value will be stored in array[0]
, second enum will be mapped to array[1]
and so on according to Enum.ordinal()
method result. By the way, it was a short description of JDK EnumMap
class implementation.
If you need to implement a map from Object
into enum
, you won’t have a tailored JDK implementation at hand. Of course, you can use any JDK Object
to Object
map, but it wouldn’t be that efficient. The easy way here is to use Trove TObjectByteMap
(or TObjectIntMap
in rare cases when you have more than 128 values in your enum) and map from Object
key intoEnum.ordinal()
value. You will need a decoding method for getters in order to convert byte
into an enum
. This method will require 1 byte per entry, which is the best we can do without paying CPU algorithmic penalty (of course, we can use less than 1 byte per enum
, if there are less than 128, 64, 32, etc elements in the enum
, but it may make your code more complicated for a very little memory gain).
With all this knowledge at hand, you may now realize that EnumSet
is implemented similarly toBitSet
. There are 2 EnumSet
implementations in JDK: RegularEnumSet
and JumboEnumSet
. The former is used for enums having less than 65 values (it covers 99,9999% of real-world enums), the latter is used for larger enumerations.
RegularEnumSet
utilizes the knowledge of the fact that there are less or equal to 64 values in theenum
. It allows RegularEnumSet
to use a single long
to store all “enum present in the set” flags, rather than using long[]
(utilized by JumboEnumSet
or BitSet
). Using a single long
instead of long[1]
allows to save 4 bytes on long[]
reference and 16 bytes on long
value (long
occupies 8 bytes,long[1]
needs 12 bytes for header, 8 bytes for a single long
and 4 bytes for alignment).
BitSet
As you have noticed from a previous section, a BitSet
is built on top of the long[]
. The minor disadvantage of a BitSet
is that its minimal allowed value is fixed to zero. Thus, if you need aBitSet
starting at some offset, you will have to manually subtract your offset from all your values in BitSet
methods. Besides that, a BitSet
is limited to int
values. It also means a need of some sort of envelope for storing real long
values in it. The LongBitSet
class described in my Bit sets article may give you some useful hints.
Dense set of integers as a BitSet
Details of EnumSet
implementation may give you a hint on efficient storage of a dense set of integer values: find out the minimal allowed value in your set (call it minVal
) and then use aBitSet
for integer values storage: call BitSet.set( value - minVal )
to store your value andBitSet.get( value - minVal )
to check if a value is present in the set. The dense set is a set with more than 1 value per 64 entries.
ArrayList
ArrayList
is one of the most efficient storage structures in JDK: it has an Object[]
for storage plus an int
field for tracking the list size (which is different from the array/list capacity =array.length
). As long as you keep the actual objects in the ArrayList
, it offers you near-optimal storage. All you need to remember about ArrayList
is that it is better not to allocate a largeArrayList
in advance if you are not sure that its capacity will be fully utilized. This is especially important for collections of such arrays (or collections of objects having array fields).
Another important property of ArrayList
to remember is that there is only one ArrayList
method which shrinks its capacity – ArrayList.trimToSize
. clear/remove*
methods decreaseArrayList
size instead of capacity. So, if you use an ArrayList
as a buffer of some sort and you expect some high peaks in your data, but rather low average buffer size, then it may make sense for you to trim your ArrayList
“manually” after that.
If you have a list of numeric wrappers, you’d better use Trove T[Type]ArrayList
, which allows you to store (and most important – keep) primitive values in those collections.
LinkedList
LinkedList
is the only JDK collection which implements the two following interfaces: List
andDeque
, which makes it useful for iteration operations on queue/dequeue/stack structures.
Due to being a deque, each LinkedList
node contains references to the previous and next elements as well as a reference to the data value. Such node occupies 24 bytes (12 bytes header + 3*4 bytes for references), which is 6 times more than ArrayList
in terms of per-node overhead. This structure is also CPU cache-unfriendly due to its spatial non-locality.
ArrayDeque
If all you need is a FIFO/LIFO buffer and you don’t need to access anything except its head or tail, you’d better prefer ArrayDeque
to LinkedList
.
ArrayDeque
is based on an Object[]
, which length is a power of 2. Besides the array, there are int
head and tail pointers – they allow the array to wrap around its edges if required. Because ofArrayDeque
internal memory management, its storage array length is always equal to the size ofArrayDeque
rounded to the next power of 2. It means that you spend less than 8 bytes per value in the worst case and 4 bytes per value in the best case.
Summary
The following table summarizes the storage occupied per stored value assuming that a reference occupies 4 bytes. Note that you must spend 4 byte per Object
reference in any case, so subtract 4 bytes from the values in the following table to find out the storage overhead.
EnumSet , BitSet | 1 bit per value |
EnumMap | 4 bytes (for value, nothing for key) |
ArrayList | 4 bytes (but may be more if ArrayList capacity is seriously more than its size) |
LinkedList | 24 bytes (fixed) |
ArrayDeque | 4 to 8 bytes, 6 bytes on average |
See also
- An overview of memory saving techniques in Java
- Memory consumption of popular Java data types – part 2
- A few more memory saving techniques in Java
- String.intern in Java 6, 7 and 8 – string pooling
- Java collections overview
- Bit sets
- java.util.ArrayList performance
- java.util.LinkedList performance
Recommended reading
If you want to better understand computer memory system and its implications on your software, I’d recommend you to read Memory Systems: Cache, DRAM, Disk by Bruce Jacob et al.