◉ General-purpose implementations are the most commonly used implementations, designed for everyday use. They are summarized in the table titled General-purpose-implementations.
◉ Special-purpose implementations are designed for use in special situations and display nonstandard performance characteristics, usage restrictions, or behavior.
◉ Concurrent implementations are designed to support high concurrency, typically at the expense of single-threaded performance. These implementations are part of the java.util.concurrent package.
◉ Wrapper implementations are used in combination with other types of implementations, often the general-purpose ones, to provide added or restricted functionality.
◉ Convenience implementations are mini-implementations, typically made available via static factory methods, that provide convenient, efficient alternatives to general-purpose implementations for special collections (for example, singleton sets).
◉ Abstract implementations are skeletal implementations that facilitate the construction of custom implementations — described later in the Custom Collection Implementations section. An advanced topic, it's not particularly difficult, but relatively few people will need to do it.
The general-purpose implementations are summarized in the following table.
General-purpose Implementations
Interfaces | Hash table Implementations | Resizable array Implementations | Tree Implementations | Linked list Implementations | Hash table + Linked list Implementations |
Set | HashSet | TreeSet | LinkedHashSet | ||
List | ArrayList | LinkedList | |||
Queue | |||||
Deque | ArrayDeque | LinkedList | |||
Map | HashMap | TreeMap | LinkedHashMap |
As you can see from the table, the Java Collections Framework provides several general-purpose implementations of the Set, List , and Map interfaces. In each case, one implementation — HashSet, ArrayList, and HashMap — is clearly the one to use for most applications, all other things being equal. Note that the SortedSet and the SortedMap interfaces do not have rows in the table. Each of those interfaces has one implementation (TreeSet and TreeMap) and is listed in the Set and the Map rows. There are two general-purpose Queue implementations — LinkedList, which is also a List implementation, and PriorityQueue, which is omitted from the table. These two implementations provide very different semantics: LinkedList provides FIFO semantics, while PriorityQueue orders its elements according to their values.
Each of the general-purpose implementations provides all optional operations contained in its interface. All permit null elements, keys, and values. None are synchronized (thread-safe). All have fail-fast iterators, which detect illegal concurrent modification during iteration and fail quickly and cleanly rather than risking arbitrary, nondeterministic behavior at an undetermined time in the future. All are Serializable and all support a public clone method.
The fact that these implementations are unsynchronized represents a break with the past: The legacy collections Vector and Hashtable are synchronized. The present approach was taken because collections are frequently used when the synchronization is of no benefit. Such uses include single-threaded use, read-only use, and use as part of a larger data object that does its own synchronization. In general, it is good API design practice not to make users pay for a feature they don't use. Furthermore, unnecessary synchronization can result in deadlock under certain circumstances.
If you need thread-safe collections, the synchronization wrappers, described in the Wrapper Implementations section, allow any collection to be transformed into a synchronized collection. Thus, synchronization is optional for general-purpose implementations, whereas it is mandatory for legacy implementations. Moreover, the java.util.concurrent package provides concurrent implementations of the BlockingQueue interface, which extends Queue, and of the ConcurrentMap interface, which extends Map. These implementations offer much higher concurrency than mere synchronized implementations.
As a rule, you should be thinking about the interfaces, not the implementations. That is why there are no programming examples in this section. For the most part, the choice of implementation affects only performance. The preferred style, as mentioned in the Interfaces section, is to choose an implementation when a Collection is created and to immediately assign the new collection to a variable of the corresponding interface type (or to pass the collection to a method expecting an argument of the interface type). In this way, the program does not become dependent on any added methods in a given implementation, leaving the programmer free to change implementations anytime that it is warranted by performance concerns or behavioral details.
The sections that follow briefly discuss the implementations. The performance of the implementations is described using words such as constant-time, log, linear, n log(n), and quadratic to refer to the asymptotic upper-bound on the time complexity of performing the operation. All this is quite a mouthful, and it doesn't matter much if you don't know what it means. If you're interested in knowing more, refer to any good algorithms textbook. One thing to keep in mind is that this sort of performance metric has its limitations. Sometimes, the nominally slower implementation may be faster. When in doubt, measure the performance!
Set Implementations
The Set implementations are grouped into general-purpose and special-purpose implementations.
General-Purpose Set Implementations
There are three general-purpose Set implementations — HashSet, TreeSet, and LinkedHashSet. Which of these three to use is generally straightforward. HashSet is much faster than TreeSet (constant-time versus log-time for most operations) but offers no ordering guarantees. If you need to use the operations in the SortedSet interface, or if value-ordered iteration is required, use TreeSet; otherwise, use HashSet. It's a fair bet that you'll end up using HashSet most of the time.
LinkedHashSet is in some sense intermediate between HashSet and TreeSet. Implemented as a hash table with a linked list running through it, it provides insertion-ordered iteration (least recently inserted to most recently) and runs nearly as fast as HashSet. The LinkedHashSet implementation spares its clients from the unspecified, generally chaotic ordering provided by HashSet without incurring the increased cost associated with TreeSet.
One thing worth keeping in mind about HashSet is that iteration is linear in the sum of the number of entries and the number of buckets (the capacity). Thus, choosing an initial capacity that's too high can waste both space and time. On the other hand, choosing an initial capacity that's too low wastes time by copying the data structure each time it's forced to increase its capacity. If you don't specify an initial capacity, the default is 16. In the past, there was some advantage to choosing a prime number as the initial capacity. This is no longer true. Internally, the capacity is always rounded up to a power of two. The initial capacity is specified by using the int constructor. The following line of code allocates a HashSet whose initial capacity is 64.
Set<String> s = new HashSet<String>(64);
The HashSet class has one other tuning parameter called the load factor. If you care a lot about the space consumption of your HashSet, read the HashSet documentation for more information. Otherwise, just accept the default; it's almost always the right thing to do.
If you accept the default load factor but want to specify an initial capacity, pick a number that's about twice the size to which you expect the set to grow. If your guess is way off, you may waste a bit of space, time, or both, but it's unlikely to be a big problem.
LinkedHashSet has the same tuning parameters as HashSet, but iteration time is not affected by capacity. TreeSet has no tuning parameters.
Special-Purpose Set Implementations
There are two special-purpose Set implementations — EnumSet and CopyOnWriteArraySet.
EnumSet is a high-performance Set implementation for enum types. All of the members of an enum set must be of the same enum type. Internally, it is represented by a bit-vector, typically a single long. Enum sets support iteration over ranges of enum types. For example, given the enum declaration for the days of the week, you can iterate over the weekdays. The EnumSet class provides a static factory that makes it easy.
for (Day d : EnumSet.range(Day.MONDAY, Day.FRIDAY))
System.out.println(d);
Enum sets also provide a rich, typesafe replacement for traditional bit flags.
EnumSet.of(Style.BOLD, Style.ITALIC)
CopyOnWriteArraySet is a Set implementation backed up by a copy-on-write array. All mutative operations, such as add, set, and remove, are implemented by making a new copy of the array; no locking is ever required. Even iteration may safely proceed concurrently with element insertion and deletion. Unlike most Set implementations, the add, remove, and contains methods require time proportional to the size of the set. This implementation is only appropriate for sets that are rarely modified but frequently iterated. It is well suited to maintaining event-handler lists that must prevent duplicates.
List Implementations
List implementations are grouped into general-purpose and special-purpose implementations.
General-Purpose List Implementations
There are two general-purpose List implementations — ArrayList and LinkedList. Most of the time, you'll probably use ArrayList, which offers constant-time positional access and is just plain fast. It does not have to allocate a node object for each element in the List, and it can take advantage of System.arraycopy when it has to move multiple elements at the same time. Think of ArrayList as Vector without the synchronization overhead.
If you frequently add elements to the beginning of the List or iterate over the List to delete elements from its interior, you should consider using LinkedList. These operations require constant-time in a LinkedList and linear-time in an ArrayList. But you pay a big price in performance. Positional access requires linear-time in a LinkedList and constant-time in an ArrayList. Furthermore, the constant factor for LinkedList is much worse. If you think you want to use a LinkedList, measure the performance of your application with both LinkedList and ArrayList before making your choice; ArrayList is usually faster.
ArrayList has one tuning parameter — the initial capacity, which refers to the number of elements the ArrayList can hold before it has to grow. LinkedList has no tuning parameters and seven optional operations, one of which is clone. The other six are addFirst, getFirst, removeFirst, addLast, getLast, and removeLast. LinkedList also implements the Queue interface.
Special-Purpose List Implementations
CopyOnWriteArrayList is a List implementation backed up by a copy-on-write array. This implementation is similar in nature to CopyOnWriteArraySet. No synchronization is necessary, even during iteration, and iterators are guaranteed never to throw ConcurrentModificationException. This implementation is well suited to maintaining event-handler lists, in which change is infrequent, and traversal is frequent and potentially time-consuming.
If you need synchronization, a Vector will be slightly faster than an ArrayList synchronized with Collections.synchronizedList. But Vector has loads of legacy operations, so be careful to always manipulate the Vector with the List interface or else you won't be able to replace the implementation at a later time.
If your List is fixed in size — that is, you'll never use remove, add, or any of the bulk operations other than containsAll — you have a third option that's definitely worth considering.
Map Implementations
Map implementations are grouped into general-purpose, special-purpose, and concurrent implementations.
General-Purpose Map Implementations
The three general-purpose Map implementations are HashMap, TreeMap and LinkedHashMap. If you need SortedMap operations or key-ordered Collection-view iteration, use TreeMap; if you want maximum speed and don't care about iteration order, use HashMap; if you want near-HashMap performance and insertion-order iteration, use LinkedHashMap. In this respect, the situation for Map is analogous to Set.
LinkedHashMap provides two capabilities that are not available with LinkedHashSet. When you create a LinkedHashMap, you can order it based on key access rather than insertion. In other words, merely looking up the value associated with a key brings that key to the end of the map. Also, LinkedHashMap provides the removeEldestEntry method, which may be overridden to impose a policy for removing stale mappings automatically when new mappings are added to the map. This makes it very easy to implement a custom cache.
For example, this override will allow the map to grow up to as many as 100 entries and then it will delete the eldest entry each time a new entry is added, maintaining a steady state of 100 entries.
private static final int MAX_ENTRIES = 100;
protected boolean removeEldestEntry(Map.Entry eldest) {
return size() > MAX_ENTRIES;
}
Special-Purpose Map Implementations
There are three special-purpose Map implementations — EnumMap, WeakHashMap and IdentityHashMap. EnumMap, which is internally implemented as an array, is a high-performance Map implementation for use with enum keys. This implementation combines the richness and safety of the Map interface with a speed approaching that of an array. If you want to map an enum to a value, you should always use an EnumMap in preference to an array.
WeakHashMap is an implementation of the Map interface that stores only weak references to its keys. Storing only weak references allows a key-value pair to be garbage-collected when its key is no longer referenced outside of the WeakHashMap. This class provides the easiest way to harness the power of weak references. It is useful for implementing "registry-like" data structures, where the utility of an entry vanishes when its key is no longer reachable by any thread.
IdentityHashMap is an identity-based Map implementation based on a hash table. This class is useful for topology-preserving object graph transformations, such as serialization or deep-copying. To perform such transformations, you need to maintain an identity-based "node table" that keeps track of which objects have already been seen. Identity-based maps are also used to maintain object-to-meta-information mappings in dynamic debuggers and similar systems. Finally, identity-based maps are useful in thwarting "spoof attacks" that are a result of intentionally perverse equals methods because IdentityHashMap never invokes the equals method on its keys. An added benefit of this implementation is that it is fast.
Concurrent Map Implementations
The java.util.concurrent package contains the ConcurrentMap interface, which extends Map with atomic putIfAbsent, remove, and replace methods, and the ConcurrentHashMap implementation of that interface.
ConcurrentHashMap is a highly concurrent, high-performance implementation backed up by a hash table. This implementation never blocks when performing retrievals and allows the client to select the concurrency level for updates. It is intended as a drop-in replacement for Hashtable: in addition to implementing ConcurrentMap, it supports all the legacy methods peculiar to Hashtable. Again, if you don't need the legacy operations, be careful to manipulate it with the ConcurrentMap interface.
Queue Implementations
The Queue implementations are grouped into general-purpose and concurrent implementations.
General-Purpose Queue Implementations
As mentioned in the previous section, LinkedList implements the Queue interface, providing first in, first out (FIFO) queue operations for add, poll, and so on.
The PriorityQueue class is a priority queue based on the heap data structure. This queue orders elements according to the order specified at construction time, which can be the elements' natural ordering or the ordering imposed by an explicit Comparator.
The queue retrieval operations — poll, remove, peek, and element — access the element at the head of the queue. The head of the queue is the least element with respect to the specified ordering. If multiple elements are tied for least value, the head is one of those elements; ties are broken arbitrarily.
PriorityQueue and its iterator implement all of the optional methods of the Collection and Iterator interfaces. The iterator provided in method iterator is not guaranteed to traverse the elements of the PriorityQueue in any particular order. For ordered traversal, consider using Arrays.sort(pq.toArray()).
Concurrent Queue Implementations
The java.util.concurrent package contains a set of synchronized Queue interfaces and classes. BlockingQueue extends Queue with operations that wait for the queue to become nonempty when retrieving an element and for space to become available in the queue when storing an element. This interface is implemented by the following classes:
◉ LinkedBlockingQueue — an optionally bounded FIFO blocking queue backed by linked nodes
◉ ArrayBlockingQueue — a bounded FIFO blocking queue backed by an array
◉ PriorityBlockingQueue — an unbounded blocking priority queue backed by a heap
◉ DelayQueue — a time-based scheduling queue backed by a heap
◉ SynchronousQueue — a simple rendezvous mechanism that uses the BlockingQueue interface
In JDK 7, TransferQueue is a specialized BlockingQueue in which code that adds an element to the queue has the option of waiting (blocking) for code in another thread to retrieve the element. TransferQueue has a single implementation:
◉ LinkedTransferQueue — an unbounded TransferQueue based on linked nodes
Deque Implementations
The Deque interface, pronounced as "deck", represents a double-ended queue. The Deque interface can be implemented as various types of Collections. The Deque interface implementations are grouped into general-purpose and concurrent implementations.
General-Purpose Deque Implementations
The general-purpose implementations include LinkedList and ArrayDeque classes. The Deque interface supports insertion, removal and retrieval of elements at both ends. The ArrayDeque class is the resizable array implementation of the Deque interface, whereas the LinkedList class is the list implementation.
The basic insertion, removal and retieval operations in the Deque interface addFirst, addLast, removeFirst, removeLast, getFirst and getLast. The method addFirst adds an element at the head whereas addLast adds an element at the tail of the Deque instance.
The LinkedList implementation is more flexible than the ArrayDeque implementation. LinkedList implements all optional list operations. null elements are allowed in the LinkedList implementation but not in the ArrayDeque implementation.
In terms of efficiency, ArrayDeque is more efficient than the LinkedList for add and remove operation at both ends. The best operation in a LinkedList implementation is removing the current element during the iteration. LinkedList implementations are not ideal structures to iterate.
The LinkedList implementation consumes more memory than the ArrayDeque implementation. For the ArrayDeque instance traversal use any of the following:
foreach
The foreach is fast and can be used for all kinds of lists.
ArrayDeque<String> aDeque = new ArrayDeque<String>();
. . .
for (String str : aDeque) {
System.out.println(str);
}
Iterator
The Iterator can be used for the forward traversal on all kinds of lists for all kinds of data.
ArrayDeque<String> aDeque = new ArrayDeque<String>();
. . .
for (Iterator<String> iter = aDeque.iterator(); iter.hasNext(); ) {
System.out.println(iter.next());
}
The ArrayDeque class is used in this tutorial to implement the Deque interface. The complete code of the example used in this tutorial is available in ArrayDequeSample. Both the LinkedList and ArrayDeque classes do not support concurrent access by multiple threads.
Concurrent Deque Implementations
The LinkedBlockingDeque class is the concurrent implementation of the Deque interface. If the deque is empty then methods such as takeFirst and takeLast wait until the element becomes available, and then retrieves and removes the same element.
Wrapper Implementations
Wrapper implementations delegate all their real work to a specified collection but add extra functionality on top of what this collection offers. For design pattern fans, this is an example of the decorator pattern. Although it may seem a bit exotic, it's really pretty straightforward.
These implementations are anonymous; rather than providing a public class, the library provides a static factory method. All these implementations are found in the Collections class, which consists solely of static methods.
Synchronization Wrappers
The synchronization wrappers add automatic synchronization (thread-safety) to an arbitrary collection. Each of the six core collection interfaces — Collection, Set, List, Map, SortedSet, and SortedMap — has one static factory method.
public static <T> Collection<T> synchronizedCollection(Collection<T> c);
public static <T> Set<T> synchronizedSet(Set<T> s);
public static <T> List<T> synchronizedList(List<T> list);
public static <K,V> Map<K,V> synchronizedMap(Map<K,V> m);
public static <T> SortedSet<T> synchronizedSortedSet(SortedSet<T> s);
public static <K,V> SortedMap<K,V> synchronizedSortedMap(SortedMap<K,V> m);
Each of these methods returns a synchronized (thread-safe) Collection backed up by the specified collection. To guarantee serial access, all access to the backing collection must be accomplished through the returned collection. The easy way to guarantee this is not to keep a reference to the backing collection. Create the synchronized collection with the following trick.
List<Type> list = Collections.synchronizedList(new ArrayList<Type>());
A collection created in this fashion is every bit as thread-safe as a normally synchronized collection, such as a Vector.
In the face of concurrent access, it is imperative that the user manually synchronize on the returned collection when iterating over it. The reason is that iteration is accomplished via multiple calls into the collection, which must be composed into a single atomic operation. The following is the idiom to iterate over a wrapper-synchronized collection.
Collection<Type> c = Collections.synchronizedCollection(myCollection);
synchronized(c) {
for (Type e : c)
foo(e);
}
If an explicit iterator is used, the iterator method must be called from within the synchronized block. Failure to follow this advice may result in nondeterministic behavior. The idiom for iterating over a Collection view of a synchronized Map is similar. It is imperative that the user synchronize on the synchronized Map when iterating over any of its Collection views rather than synchronizing on the Collection view itself, as shown in the following example.
Map<KeyType, ValType> m = Collections.synchronizedMap(new HashMap<KeyType, ValType>());
...
Set<KeyType> s = m.keySet();
...
// Synchronizing on m, not s!
synchronized(m) {
while (KeyType k : s)
foo(k);
}
One minor downside of using wrapper implementations is that you do not have the ability to execute any noninterface operations of a wrapped implementation. So, for instance, in the preceding List example, you cannot call ArrayList's ensureCapacity operation on the wrapped ArrayList.
Unmodifiable Wrappers
Unlike synchronization wrappers, which add functionality to the wrapped collection, the unmodifiable wrappers take functionality away. In particular, they take away the ability to modify the collection by intercepting all the operations that would modify the collection and throwing an UnsupportedOperationException. Unmodifiable wrappers have two main uses, as follows:
◉ To make a collection immutable once it has been built. In this case, it's good practice not to maintain a reference to the backing collection. This absolutely guarantees immutability.
◉ To allow certain clients read-only access to your data structures. You keep a reference to the backing collection but hand out a reference to the wrapper. In this way, clients can look but not modify, while you maintain full access.
Like synchronization wrappers, each of the six core Collection interfaces has one static factory method.
public static <T> Collection<T> unmodifiableCollection(Collection<? extends T> c);
public static <T> Set<T> unmodifiableSet(Set<? extends T> s);
public static <T> List<T> unmodifiableList(List<? extends T> list);
public static <K,V> Map<K, V> unmodifiableMap(Map<? extends K, ? extends V> m);
public static <T> SortedSet<T> unmodifiableSortedSet(SortedSet<? extends T> s);
public static <K,V> SortedMap<K, V> unmodifiableSortedMap(SortedMap<K, ? extends V> m);
Checked Interface Wrappers
The Collections.checked interface wrappers are provided for use with generic collections. These implementations return a dynamically type-safe view of the specified collection, which throws a ClassCastException if a client attempts to add an element of the wrong type. The generics mechanism in the language provides compile-time (static) type-checking, but it is possible to defeat this mechanism. Dynamically type-safe views eliminate this possibility entirely.
Convenience Implementations
This section describes several mini-implementations that can be more convenient and more efficient than general-purpose implementations when you don't need their full power. All the implementations in this section are made available via static factory methods rather than public classes.
List View of an Array
The Arrays.asList method returns a List view of its array argument. Changes to the List write through to the array and vice versa. The size of the collection is that of the array and cannot be changed. If the add or the remove method is called on the List, an UnsupportedOperationException will result.
The normal use of this implementation is as a bridge between array-based and collection-based APIs. It allows you to pass an array to a method expecting a Collection or a List. However, this implementation also has another use. If you need a fixed-size List, it's more efficient than any general-purpose List implementation. This is the idiom.
List<String> list = Arrays.asList(new String[size]);
Note that a reference to the backing array is not retained.
Immutable Multiple-Copy List
Occasionally you'll need an immutable List consisting of multiple copies of the same element. The Collections.nCopies method returns such a list. This implementation has two main uses. The first is to initialize a newly created List; for example, suppose you want an ArrayList initially consisting of 1,000 null elements. The following incantation does the trick.
List<Type> list = new ArrayList<Type>(Collections.nCopies(1000, (Type)null);
Of course, the initial value of each element need not be null. The second main use is to grow an existing List. For example, suppose you want to add 69 copies of the string "fruit bat" to the end of a List<String>. It's not clear why you'd want to do such a thing, but let's just suppose you did. The following is how you'd do it.
lovablePets.addAll(Collections.nCopies(69, "fruit bat"));
By using the form of addAll that takes both an index and a Collection, you can add the new elements to the middle of a List instead of to the end of it.
Immutable Singleton Set
Sometimes you'll need an immutable singleton Set, which consists of a single, specified element. The Collections.singleton method returns such a Set. One use of this implementation is to remove all occurrences of a specified element from a Collection.
c.removeAll(Collections.singleton(e));
A related idiom removes all elements that map to a specified value from a Map. For example, suppose you have a Map — job — that maps people to their line of work and suppose you want to eliminate all the lawyers. The following one-liner will do the deed.
job.values().removeAll(Collections.singleton(LAWYER));
One more use of this implementation is to provide a single input value to a method that is written to accept a collection of values.
Empty Set, List, and Map Constants
The Collections class provides methods to return the empty Set, List, and Map — emptySet, emptyList, and emptyMap. The main use of these constants is as input to methods that take a Collection of values when you don't want to provide any values at all, as in this example.
tourist.declarePurchases(Collections.emptySet());
Summary of Implementations
Implementations are the data objects used to store collections, which implement the interfaces described in the Interfaces lesson.
The Java Collections Framework provides several general-purpose implementations of the core interfaces:
◉ For the Set interface, HashSet is the most commonly used implementation.
◉ For the List interface, ArrayList is the most commonly used implementation.
◉ For the Map interface, HashMap is the most commonly used implementation.
◉ For the Queue interface, LinkedList is the most commonly used implementation.
◉ For the Deque interface, ArrayDeque is the most commonly used implementation.
Each of the general-purpose implementations provides all optional operations contained in its interface.
The Java Collections Framework also provides several special-purpose implementations for situations that require nonstandard performance, usage restrictions, or other unusual behavior.
The java.util.concurrent package contains several collections implementations, which are thread-safe but not governed by a single exclusion lock.
The Collections class (as opposed to the Collection interface), provides static methods that operate on or return collections, which are known as Wrapper implementations.
Finally, there are several Convenience implementations, which can be more efficient than general-purpose implementations when you don't need their full power. The Convenience implementations are made available through static factory methods.
Questions and Exercises: Implementations
Questions
1. You plan to write a program that uses several basic collection interfaces: Set, List, Queue, and Map. You're not sure which implementations will work best, so you decide to use general-purpose implementations until you get a better idea how your program will work in the real world. Which implementations are these?
2. If you need a Set implementation that provides value-ordered iteration, which class should you use?
3. Which class do you use to access wrapper implementations?
Exercises
1. Write a program that reads a text file, specified by the first command line argument, into a List. The program should then print random lines from the file, the number of lines printed to be specified by the second command line argument. Write the program so that a correctly-sized collection is allocated all at once, instead of being gradually expanded as the file is read in. Hint: To determine the number of lines in the file, use java.io.File.length to obtain the size of the file, then divide by an assumed size of an average line.
Answers to Questions and Exercises:
Questions
1. Question: You plan to write a program that uses several basic collection interfaces: Set, List, Queue, and Map. You're not sure which implementations will work best, so you decide to use general-purpose implementations until you get a better idea how your program will work in the real world. Which implementations are these?
Answer:
Set: HashSet
List: ArrayList
Queue: LinkedList
Map: HashMap
2. Question: If you need a Set implementation that provides value-ordered iteration, which class should you use?
Answer:
TreeSet guarantees that the sorted set is in ascending element order, sorted according to the natural order of the elements or by the Comparator provided.
3. Question: Which class do you use to access wrapper implementations?
Answer:
You use the Collections class, which provides static methods that operate on or return collections.
Exercises
1. Exercise: Write a program that reads a text file, specified by the first command line argument, into a List. The program should then print random lines from the file, the number of lines printed to be specified by the second command line argument. Write the program so that a correctly-sized collection is allocated all at once, instead of being gradually expanded as the file is read in. Hint: To determine the number of lines in the file, use java.io.File.length to obtain the size of the file, then divide by an assumed size of an average line.
Answer:
Since we are accessing the List randomly, we will use ArrayList. We estimate the number of lines by taking the file size and dividing by 50. We then double that figure, since it is more efficient to overestimate than to underestmate.
import java.util.*;
import java.io.*;
public class FileList {
public static void main(String[] args) {
final int assumedLineLength = 50;
File file = new File(args[0]);
List<String> fileList =
new ArrayList<String>((int)(file.length() / assumedLineLength) * 2);
BufferedReader reader = null;
int lineCount = 0;
try {
reader = new BufferedReader(new FileReader(file));
for (String line = reader.readLine(); line != null;
line = reader.readLine()) {
fileList.add(line);
lineCount++;
}
} catch (IOException e) {
System.err.format("Could not read %s: %s%n", file, e);
System.exit(1);
} finally {
if (reader != null) {
try {
reader.close();
} catch (IOException e) {}
}
}
int repeats = Integer.parseInt(args[1]);
Random random = new Random();
for (int i = 0; i < repeats; i++) {
System.out.format("%d: %s%n", i,
fileList.get(random.nextInt(lineCount - 1)));
}
}
}
This program actually spends most of its time reading in the file, so pre-allocating the ArrayList has little affect on its performance. Specifying an initial capacity in advance is more likely to be useful when your program repeatly creates large ArrayList objects without intervening I/O.
Set Implementations
The Set implementations are grouped into general-purpose and special-purpose implementations.
General-Purpose Set Implementations
There are three general-purpose Set implementations — HashSet, TreeSet, and LinkedHashSet. Which of these three to use is generally straightforward. HashSet is much faster than TreeSet (constant-time versus log-time for most operations) but offers no ordering guarantees. If you need to use the operations in the SortedSet interface, or if value-ordered iteration is required, use TreeSet; otherwise, use HashSet. It's a fair bet that you'll end up using HashSet most of the time.
LinkedHashSet is in some sense intermediate between HashSet and TreeSet. Implemented as a hash table with a linked list running through it, it provides insertion-ordered iteration (least recently inserted to most recently) and runs nearly as fast as HashSet. The LinkedHashSet implementation spares its clients from the unspecified, generally chaotic ordering provided by HashSet without incurring the increased cost associated with TreeSet.
One thing worth keeping in mind about HashSet is that iteration is linear in the sum of the number of entries and the number of buckets (the capacity). Thus, choosing an initial capacity that's too high can waste both space and time. On the other hand, choosing an initial capacity that's too low wastes time by copying the data structure each time it's forced to increase its capacity. If you don't specify an initial capacity, the default is 16. In the past, there was some advantage to choosing a prime number as the initial capacity. This is no longer true. Internally, the capacity is always rounded up to a power of two. The initial capacity is specified by using the int constructor. The following line of code allocates a HashSet whose initial capacity is 64.
Set<String> s = new HashSet<String>(64);
The HashSet class has one other tuning parameter called the load factor. If you care a lot about the space consumption of your HashSet, read the HashSet documentation for more information. Otherwise, just accept the default; it's almost always the right thing to do.
If you accept the default load factor but want to specify an initial capacity, pick a number that's about twice the size to which you expect the set to grow. If your guess is way off, you may waste a bit of space, time, or both, but it's unlikely to be a big problem.
LinkedHashSet has the same tuning parameters as HashSet, but iteration time is not affected by capacity. TreeSet has no tuning parameters.
Special-Purpose Set Implementations
There are two special-purpose Set implementations — EnumSet and CopyOnWriteArraySet.
EnumSet is a high-performance Set implementation for enum types. All of the members of an enum set must be of the same enum type. Internally, it is represented by a bit-vector, typically a single long. Enum sets support iteration over ranges of enum types. For example, given the enum declaration for the days of the week, you can iterate over the weekdays. The EnumSet class provides a static factory that makes it easy.
for (Day d : EnumSet.range(Day.MONDAY, Day.FRIDAY))
System.out.println(d);
Enum sets also provide a rich, typesafe replacement for traditional bit flags.
EnumSet.of(Style.BOLD, Style.ITALIC)
CopyOnWriteArraySet is a Set implementation backed up by a copy-on-write array. All mutative operations, such as add, set, and remove, are implemented by making a new copy of the array; no locking is ever required. Even iteration may safely proceed concurrently with element insertion and deletion. Unlike most Set implementations, the add, remove, and contains methods require time proportional to the size of the set. This implementation is only appropriate for sets that are rarely modified but frequently iterated. It is well suited to maintaining event-handler lists that must prevent duplicates.
List Implementations
List implementations are grouped into general-purpose and special-purpose implementations.
General-Purpose List Implementations
There are two general-purpose List implementations — ArrayList and LinkedList. Most of the time, you'll probably use ArrayList, which offers constant-time positional access and is just plain fast. It does not have to allocate a node object for each element in the List, and it can take advantage of System.arraycopy when it has to move multiple elements at the same time. Think of ArrayList as Vector without the synchronization overhead.
If you frequently add elements to the beginning of the List or iterate over the List to delete elements from its interior, you should consider using LinkedList. These operations require constant-time in a LinkedList and linear-time in an ArrayList. But you pay a big price in performance. Positional access requires linear-time in a LinkedList and constant-time in an ArrayList. Furthermore, the constant factor for LinkedList is much worse. If you think you want to use a LinkedList, measure the performance of your application with both LinkedList and ArrayList before making your choice; ArrayList is usually faster.
ArrayList has one tuning parameter — the initial capacity, which refers to the number of elements the ArrayList can hold before it has to grow. LinkedList has no tuning parameters and seven optional operations, one of which is clone. The other six are addFirst, getFirst, removeFirst, addLast, getLast, and removeLast. LinkedList also implements the Queue interface.
Special-Purpose List Implementations
CopyOnWriteArrayList is a List implementation backed up by a copy-on-write array. This implementation is similar in nature to CopyOnWriteArraySet. No synchronization is necessary, even during iteration, and iterators are guaranteed never to throw ConcurrentModificationException. This implementation is well suited to maintaining event-handler lists, in which change is infrequent, and traversal is frequent and potentially time-consuming.
If you need synchronization, a Vector will be slightly faster than an ArrayList synchronized with Collections.synchronizedList. But Vector has loads of legacy operations, so be careful to always manipulate the Vector with the List interface or else you won't be able to replace the implementation at a later time.
If your List is fixed in size — that is, you'll never use remove, add, or any of the bulk operations other than containsAll — you have a third option that's definitely worth considering.
Map Implementations
Map implementations are grouped into general-purpose, special-purpose, and concurrent implementations.
General-Purpose Map Implementations
The three general-purpose Map implementations are HashMap, TreeMap and LinkedHashMap. If you need SortedMap operations or key-ordered Collection-view iteration, use TreeMap; if you want maximum speed and don't care about iteration order, use HashMap; if you want near-HashMap performance and insertion-order iteration, use LinkedHashMap. In this respect, the situation for Map is analogous to Set.
LinkedHashMap provides two capabilities that are not available with LinkedHashSet. When you create a LinkedHashMap, you can order it based on key access rather than insertion. In other words, merely looking up the value associated with a key brings that key to the end of the map. Also, LinkedHashMap provides the removeEldestEntry method, which may be overridden to impose a policy for removing stale mappings automatically when new mappings are added to the map. This makes it very easy to implement a custom cache.
For example, this override will allow the map to grow up to as many as 100 entries and then it will delete the eldest entry each time a new entry is added, maintaining a steady state of 100 entries.
private static final int MAX_ENTRIES = 100;
protected boolean removeEldestEntry(Map.Entry eldest) {
return size() > MAX_ENTRIES;
}
Special-Purpose Map Implementations
There are three special-purpose Map implementations — EnumMap, WeakHashMap and IdentityHashMap. EnumMap, which is internally implemented as an array, is a high-performance Map implementation for use with enum keys. This implementation combines the richness and safety of the Map interface with a speed approaching that of an array. If you want to map an enum to a value, you should always use an EnumMap in preference to an array.
WeakHashMap is an implementation of the Map interface that stores only weak references to its keys. Storing only weak references allows a key-value pair to be garbage-collected when its key is no longer referenced outside of the WeakHashMap. This class provides the easiest way to harness the power of weak references. It is useful for implementing "registry-like" data structures, where the utility of an entry vanishes when its key is no longer reachable by any thread.
IdentityHashMap is an identity-based Map implementation based on a hash table. This class is useful for topology-preserving object graph transformations, such as serialization or deep-copying. To perform such transformations, you need to maintain an identity-based "node table" that keeps track of which objects have already been seen. Identity-based maps are also used to maintain object-to-meta-information mappings in dynamic debuggers and similar systems. Finally, identity-based maps are useful in thwarting "spoof attacks" that are a result of intentionally perverse equals methods because IdentityHashMap never invokes the equals method on its keys. An added benefit of this implementation is that it is fast.
Concurrent Map Implementations
The java.util.concurrent package contains the ConcurrentMap interface, which extends Map with atomic putIfAbsent, remove, and replace methods, and the ConcurrentHashMap implementation of that interface.
ConcurrentHashMap is a highly concurrent, high-performance implementation backed up by a hash table. This implementation never blocks when performing retrievals and allows the client to select the concurrency level for updates. It is intended as a drop-in replacement for Hashtable: in addition to implementing ConcurrentMap, it supports all the legacy methods peculiar to Hashtable. Again, if you don't need the legacy operations, be careful to manipulate it with the ConcurrentMap interface.
Queue Implementations
The Queue implementations are grouped into general-purpose and concurrent implementations.
General-Purpose Queue Implementations
As mentioned in the previous section, LinkedList implements the Queue interface, providing first in, first out (FIFO) queue operations for add, poll, and so on.
The PriorityQueue class is a priority queue based on the heap data structure. This queue orders elements according to the order specified at construction time, which can be the elements' natural ordering or the ordering imposed by an explicit Comparator.
The queue retrieval operations — poll, remove, peek, and element — access the element at the head of the queue. The head of the queue is the least element with respect to the specified ordering. If multiple elements are tied for least value, the head is one of those elements; ties are broken arbitrarily.
PriorityQueue and its iterator implement all of the optional methods of the Collection and Iterator interfaces. The iterator provided in method iterator is not guaranteed to traverse the elements of the PriorityQueue in any particular order. For ordered traversal, consider using Arrays.sort(pq.toArray()).
Concurrent Queue Implementations
The java.util.concurrent package contains a set of synchronized Queue interfaces and classes. BlockingQueue extends Queue with operations that wait for the queue to become nonempty when retrieving an element and for space to become available in the queue when storing an element. This interface is implemented by the following classes:
◉ LinkedBlockingQueue — an optionally bounded FIFO blocking queue backed by linked nodes
◉ ArrayBlockingQueue — a bounded FIFO blocking queue backed by an array
◉ PriorityBlockingQueue — an unbounded blocking priority queue backed by a heap
◉ DelayQueue — a time-based scheduling queue backed by a heap
◉ SynchronousQueue — a simple rendezvous mechanism that uses the BlockingQueue interface
In JDK 7, TransferQueue is a specialized BlockingQueue in which code that adds an element to the queue has the option of waiting (blocking) for code in another thread to retrieve the element. TransferQueue has a single implementation:
◉ LinkedTransferQueue — an unbounded TransferQueue based on linked nodes
Deque Implementations
The Deque interface, pronounced as "deck", represents a double-ended queue. The Deque interface can be implemented as various types of Collections. The Deque interface implementations are grouped into general-purpose and concurrent implementations.
General-Purpose Deque Implementations
The general-purpose implementations include LinkedList and ArrayDeque classes. The Deque interface supports insertion, removal and retrieval of elements at both ends. The ArrayDeque class is the resizable array implementation of the Deque interface, whereas the LinkedList class is the list implementation.
The basic insertion, removal and retieval operations in the Deque interface addFirst, addLast, removeFirst, removeLast, getFirst and getLast. The method addFirst adds an element at the head whereas addLast adds an element at the tail of the Deque instance.
The LinkedList implementation is more flexible than the ArrayDeque implementation. LinkedList implements all optional list operations. null elements are allowed in the LinkedList implementation but not in the ArrayDeque implementation.
In terms of efficiency, ArrayDeque is more efficient than the LinkedList for add and remove operation at both ends. The best operation in a LinkedList implementation is removing the current element during the iteration. LinkedList implementations are not ideal structures to iterate.
The LinkedList implementation consumes more memory than the ArrayDeque implementation. For the ArrayDeque instance traversal use any of the following:
foreach
The foreach is fast and can be used for all kinds of lists.
ArrayDeque<String> aDeque = new ArrayDeque<String>();
. . .
for (String str : aDeque) {
System.out.println(str);
}
Iterator
The Iterator can be used for the forward traversal on all kinds of lists for all kinds of data.
ArrayDeque<String> aDeque = new ArrayDeque<String>();
. . .
for (Iterator<String> iter = aDeque.iterator(); iter.hasNext(); ) {
System.out.println(iter.next());
}
The ArrayDeque class is used in this tutorial to implement the Deque interface. The complete code of the example used in this tutorial is available in ArrayDequeSample. Both the LinkedList and ArrayDeque classes do not support concurrent access by multiple threads.
Concurrent Deque Implementations
The LinkedBlockingDeque class is the concurrent implementation of the Deque interface. If the deque is empty then methods such as takeFirst and takeLast wait until the element becomes available, and then retrieves and removes the same element.
Wrapper Implementations
Wrapper implementations delegate all their real work to a specified collection but add extra functionality on top of what this collection offers. For design pattern fans, this is an example of the decorator pattern. Although it may seem a bit exotic, it's really pretty straightforward.
These implementations are anonymous; rather than providing a public class, the library provides a static factory method. All these implementations are found in the Collections class, which consists solely of static methods.
Synchronization Wrappers
The synchronization wrappers add automatic synchronization (thread-safety) to an arbitrary collection. Each of the six core collection interfaces — Collection, Set, List, Map, SortedSet, and SortedMap — has one static factory method.
public static <T> Collection<T> synchronizedCollection(Collection<T> c);
public static <T> Set<T> synchronizedSet(Set<T> s);
public static <T> List<T> synchronizedList(List<T> list);
public static <K,V> Map<K,V> synchronizedMap(Map<K,V> m);
public static <T> SortedSet<T> synchronizedSortedSet(SortedSet<T> s);
public static <K,V> SortedMap<K,V> synchronizedSortedMap(SortedMap<K,V> m);
Each of these methods returns a synchronized (thread-safe) Collection backed up by the specified collection. To guarantee serial access, all access to the backing collection must be accomplished through the returned collection. The easy way to guarantee this is not to keep a reference to the backing collection. Create the synchronized collection with the following trick.
List<Type> list = Collections.synchronizedList(new ArrayList<Type>());
A collection created in this fashion is every bit as thread-safe as a normally synchronized collection, such as a Vector.
In the face of concurrent access, it is imperative that the user manually synchronize on the returned collection when iterating over it. The reason is that iteration is accomplished via multiple calls into the collection, which must be composed into a single atomic operation. The following is the idiom to iterate over a wrapper-synchronized collection.
Collection<Type> c = Collections.synchronizedCollection(myCollection);
synchronized(c) {
for (Type e : c)
foo(e);
}
If an explicit iterator is used, the iterator method must be called from within the synchronized block. Failure to follow this advice may result in nondeterministic behavior. The idiom for iterating over a Collection view of a synchronized Map is similar. It is imperative that the user synchronize on the synchronized Map when iterating over any of its Collection views rather than synchronizing on the Collection view itself, as shown in the following example.
Map<KeyType, ValType> m = Collections.synchronizedMap(new HashMap<KeyType, ValType>());
...
Set<KeyType> s = m.keySet();
...
// Synchronizing on m, not s!
synchronized(m) {
while (KeyType k : s)
foo(k);
}
One minor downside of using wrapper implementations is that you do not have the ability to execute any noninterface operations of a wrapped implementation. So, for instance, in the preceding List example, you cannot call ArrayList's ensureCapacity operation on the wrapped ArrayList.
Unmodifiable Wrappers
Unlike synchronization wrappers, which add functionality to the wrapped collection, the unmodifiable wrappers take functionality away. In particular, they take away the ability to modify the collection by intercepting all the operations that would modify the collection and throwing an UnsupportedOperationException. Unmodifiable wrappers have two main uses, as follows:
◉ To make a collection immutable once it has been built. In this case, it's good practice not to maintain a reference to the backing collection. This absolutely guarantees immutability.
◉ To allow certain clients read-only access to your data structures. You keep a reference to the backing collection but hand out a reference to the wrapper. In this way, clients can look but not modify, while you maintain full access.
Like synchronization wrappers, each of the six core Collection interfaces has one static factory method.
public static <T> Collection<T> unmodifiableCollection(Collection<? extends T> c);
public static <T> Set<T> unmodifiableSet(Set<? extends T> s);
public static <T> List<T> unmodifiableList(List<? extends T> list);
public static <K,V> Map<K, V> unmodifiableMap(Map<? extends K, ? extends V> m);
public static <T> SortedSet<T> unmodifiableSortedSet(SortedSet<? extends T> s);
public static <K,V> SortedMap<K, V> unmodifiableSortedMap(SortedMap<K, ? extends V> m);
Checked Interface Wrappers
The Collections.checked interface wrappers are provided for use with generic collections. These implementations return a dynamically type-safe view of the specified collection, which throws a ClassCastException if a client attempts to add an element of the wrong type. The generics mechanism in the language provides compile-time (static) type-checking, but it is possible to defeat this mechanism. Dynamically type-safe views eliminate this possibility entirely.
Convenience Implementations
This section describes several mini-implementations that can be more convenient and more efficient than general-purpose implementations when you don't need their full power. All the implementations in this section are made available via static factory methods rather than public classes.
List View of an Array
The Arrays.asList method returns a List view of its array argument. Changes to the List write through to the array and vice versa. The size of the collection is that of the array and cannot be changed. If the add or the remove method is called on the List, an UnsupportedOperationException will result.
The normal use of this implementation is as a bridge between array-based and collection-based APIs. It allows you to pass an array to a method expecting a Collection or a List. However, this implementation also has another use. If you need a fixed-size List, it's more efficient than any general-purpose List implementation. This is the idiom.
List<String> list = Arrays.asList(new String[size]);
Note that a reference to the backing array is not retained.
Immutable Multiple-Copy List
Occasionally you'll need an immutable List consisting of multiple copies of the same element. The Collections.nCopies method returns such a list. This implementation has two main uses. The first is to initialize a newly created List; for example, suppose you want an ArrayList initially consisting of 1,000 null elements. The following incantation does the trick.
List<Type> list = new ArrayList<Type>(Collections.nCopies(1000, (Type)null);
Of course, the initial value of each element need not be null. The second main use is to grow an existing List. For example, suppose you want to add 69 copies of the string "fruit bat" to the end of a List<String>. It's not clear why you'd want to do such a thing, but let's just suppose you did. The following is how you'd do it.
lovablePets.addAll(Collections.nCopies(69, "fruit bat"));
By using the form of addAll that takes both an index and a Collection, you can add the new elements to the middle of a List instead of to the end of it.
Immutable Singleton Set
Sometimes you'll need an immutable singleton Set, which consists of a single, specified element. The Collections.singleton method returns such a Set. One use of this implementation is to remove all occurrences of a specified element from a Collection.
c.removeAll(Collections.singleton(e));
A related idiom removes all elements that map to a specified value from a Map. For example, suppose you have a Map — job — that maps people to their line of work and suppose you want to eliminate all the lawyers. The following one-liner will do the deed.
job.values().removeAll(Collections.singleton(LAWYER));
One more use of this implementation is to provide a single input value to a method that is written to accept a collection of values.
Empty Set, List, and Map Constants
The Collections class provides methods to return the empty Set, List, and Map — emptySet, emptyList, and emptyMap. The main use of these constants is as input to methods that take a Collection of values when you don't want to provide any values at all, as in this example.
tourist.declarePurchases(Collections.emptySet());
Summary of Implementations
Implementations are the data objects used to store collections, which implement the interfaces described in the Interfaces lesson.
The Java Collections Framework provides several general-purpose implementations of the core interfaces:
◉ For the Set interface, HashSet is the most commonly used implementation.
◉ For the List interface, ArrayList is the most commonly used implementation.
◉ For the Map interface, HashMap is the most commonly used implementation.
◉ For the Queue interface, LinkedList is the most commonly used implementation.
◉ For the Deque interface, ArrayDeque is the most commonly used implementation.
Each of the general-purpose implementations provides all optional operations contained in its interface.
The Java Collections Framework also provides several special-purpose implementations for situations that require nonstandard performance, usage restrictions, or other unusual behavior.
The java.util.concurrent package contains several collections implementations, which are thread-safe but not governed by a single exclusion lock.
The Collections class (as opposed to the Collection interface), provides static methods that operate on or return collections, which are known as Wrapper implementations.
Finally, there are several Convenience implementations, which can be more efficient than general-purpose implementations when you don't need their full power. The Convenience implementations are made available through static factory methods.
Questions and Exercises: Implementations
Questions
1. You plan to write a program that uses several basic collection interfaces: Set, List, Queue, and Map. You're not sure which implementations will work best, so you decide to use general-purpose implementations until you get a better idea how your program will work in the real world. Which implementations are these?
2. If you need a Set implementation that provides value-ordered iteration, which class should you use?
3. Which class do you use to access wrapper implementations?
Exercises
1. Write a program that reads a text file, specified by the first command line argument, into a List. The program should then print random lines from the file, the number of lines printed to be specified by the second command line argument. Write the program so that a correctly-sized collection is allocated all at once, instead of being gradually expanded as the file is read in. Hint: To determine the number of lines in the file, use java.io.File.length to obtain the size of the file, then divide by an assumed size of an average line.
Answers to Questions and Exercises:
Questions
1. Question: You plan to write a program that uses several basic collection interfaces: Set, List, Queue, and Map. You're not sure which implementations will work best, so you decide to use general-purpose implementations until you get a better idea how your program will work in the real world. Which implementations are these?
Answer:
Set: HashSet
List: ArrayList
Queue: LinkedList
Map: HashMap
2. Question: If you need a Set implementation that provides value-ordered iteration, which class should you use?
Answer:
TreeSet guarantees that the sorted set is in ascending element order, sorted according to the natural order of the elements or by the Comparator provided.
3. Question: Which class do you use to access wrapper implementations?
Answer:
You use the Collections class, which provides static methods that operate on or return collections.
Exercises
1. Exercise: Write a program that reads a text file, specified by the first command line argument, into a List. The program should then print random lines from the file, the number of lines printed to be specified by the second command line argument. Write the program so that a correctly-sized collection is allocated all at once, instead of being gradually expanded as the file is read in. Hint: To determine the number of lines in the file, use java.io.File.length to obtain the size of the file, then divide by an assumed size of an average line.
Answer:
Since we are accessing the List randomly, we will use ArrayList. We estimate the number of lines by taking the file size and dividing by 50. We then double that figure, since it is more efficient to overestimate than to underestmate.
import java.util.*;
import java.io.*;
public class FileList {
public static void main(String[] args) {
final int assumedLineLength = 50;
File file = new File(args[0]);
List<String> fileList =
new ArrayList<String>((int)(file.length() / assumedLineLength) * 2);
BufferedReader reader = null;
int lineCount = 0;
try {
reader = new BufferedReader(new FileReader(file));
for (String line = reader.readLine(); line != null;
line = reader.readLine()) {
fileList.add(line);
lineCount++;
}
} catch (IOException e) {
System.err.format("Could not read %s: %s%n", file, e);
System.exit(1);
} finally {
if (reader != null) {
try {
reader.close();
} catch (IOException e) {}
}
}
int repeats = Integer.parseInt(args[1]);
Random random = new Random();
for (int i = 0; i < repeats; i++) {
System.out.format("%d: %s%n", i,
fileList.get(random.nextInt(lineCount - 1)));
}
}
}
This program actually spends most of its time reading in the file, so pre-allocating the ArrayList has little affect on its performance. Specifying an initial capacity in advance is more likely to be useful when your program repeatly creates large ArrayList objects without intervening I/O.
0 comments:
Post a Comment