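Chapter 6. Data Parallelism
6.1 Parallelizing Stream Operations
Data parallelism is a way to split up work to be done on many cores at the same time.
To compute the total length of the tracks on a set of albums: take each album, get its List<Track>, use flatMap to combine the tracks into a single Stream of Track, then use mapToInt to read each track's length and sum the results: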
public int serialArraySum() {
return albums.stream()
.flatMap(Album::getTracks)
.mapToInt(Track::getLength)
.sum();
}
/*
public Stream<Track> getTracks() {
return tracks.stream(); // a Stream over the album's List<Track>
}
*/
The parallel version:
public int parallelArraySum() {
return albums.parallelStream()
.flatMap(Album::getTracks)
.mapToInt(Track::getLength)
.sum();
}
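The two methods above differ only in stream() versus parallelStream(), and for an associative operation such as sum they should return the same result. The following is a minimal, self-contained sketch of that comparison; the nested Album and Track classes and the sample lengths are hypothetical stand-ins, not the classes used in these notes.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Stream;

final class TrackLengthDemo {
    // Hypothetical stand-in for the Track class: only a length in seconds.
    static final class Track {
        private final int length;
        Track(int length) { this.length = length; }
        int getLength() { return length; }
    }

    // Hypothetical stand-in for the Album class: only a list of tracks.
    static final class Album {
        private final List<Track> tracks;
        Album(Track... tracks) { this.tracks = Arrays.asList(tracks); }
        Stream<Track> getTracks() { return tracks.stream(); }
    }

    static int serialSum(List<Album> albums) {
        return albums.stream()
                .flatMap(Album::getTracks)
                .mapToInt(Track::getLength)
                .sum();
    }

    static int parallelSum(List<Album> albums) {
        return albums.parallelStream()
                .flatMap(Album::getTracks)
                .mapToInt(Track::getLength)
                .sum();
    }

    // Made-up sample data: three tracks, 180 + 240 + 200 seconds.
    static List<Album> sampleAlbums() {
        return Arrays.asList(
                new Album(new Track(180), new Track(240)),
                new Album(new Track(200)));
    }

    public static void main(String[] args) {
        List<Album> albums = sampleAlbums();
        System.out.println(serialSum(albums));   // 620
        System.out.println(parallelSum(albums)); // 620
    }
}
```

With so little data the parallel version is unlikely to be faster; the point is only that the result does not change.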
We can split the common data sources from the core library into three main groups by performance characteristics:
- The good
An ArrayList, an array, or the IntStream.range constructor. These data sources all support random access, which means they can be split up arbitrarily with ease.
- The okay
The HashSet and TreeSet. You can't easily decompose these with perfect amounts of balance, but most of the time it's possible to do so.
- The bad
Some data structures just don't split well; for example, they may take O(N) time to decompose. Examples here include a LinkedList, which is computationally hard to split in half. Also, Stream.iterate and BufferedReader.lines have unknown length at the beginning, so it's pretty hard to estimate when to split these sources.
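One way to see these differences is to look at the Spliterator behind each source, since that is what the streams library uses to divide the work. The sketch below is only an illustration: the class name SplitDemo and the helper isSized are made up for this example, while Spliterator.trySplit, estimateSize, and the SIZED characteristic are standard java.util API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Spliterator;
import java.util.stream.IntStream;
import java.util.stream.Stream;

final class SplitDemo {
    // True if the source knows its exact element count up front.
    static boolean isSized(Spliterator<?> spliterator) {
        return spliterator.hasCharacteristics(Spliterator.SIZED);
    }

    public static void main(String[] args) {
        // An ArrayList supports random access, so it splits into even halves.
        List<Integer> numbers = new ArrayList<>(Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8));
        Spliterator<Integer> right = numbers.spliterator();
        Spliterator<Integer> left = right.trySplit(); // takes the first half
        System.out.println(left.estimateSize() + " / " + right.estimateSize()); // 4 / 4

        // IntStream.range knows its size, so it is easy to divide evenly.
        System.out.println(isSized(IntStream.range(0, 100).spliterator())); // true

        // Stream.iterate has no known length, so balanced splitting is guesswork.
        System.out.println(isSized(Stream.iterate(0, i -> i + 1).spliterator())); // false
    }
}
```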
When discussing the kinds of operation applied to each individual chunk of a stream, we can differentiate between two types of stream operation: stateless and stateful. Stateless operations need to maintain no concept of state over the whole operation; stateful operations have the overhead and constraint of maintaining state.
If you can get away with using stateless operations, then you will get better parallel performance. Examples of stateless operations include map, filter, and flatMap; sorted, distinct, and limit are stateful.
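To make the distinction concrete, here is a small sketch with one pipeline of each kind. The class name StatefulDemo and the sample numbers are made up for illustration. In the first method each element can be handled on its own; in the second, distinct and sorted have to compare elements with each other, which is the extra coordination the text refers to.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

final class StatefulDemo {
    // Stateless: filter and map look at one element at a time.
    static List<Integer> statelessPipeline(List<Integer> numbers) {
        return numbers.parallelStream()
                .filter(x -> x % 2 == 0)
                .map(x -> x * 10)
                .collect(Collectors.toList());
    }

    // Stateful: distinct and sorted need to know about other elements.
    static List<Integer> statefulPipeline(List<Integer> numbers) {
        return numbers.parallelStream()
                .distinct()
                .sorted()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Integer> numbers = Arrays.asList(4, 1, 4, 3, 2, 2);
        System.out.println(statelessPipeline(numbers)); // [40, 40, 20, 20]
        System.out.println(statefulPipeline(numbers));  // [1, 2, 3, 4]
    }
}
```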
6.2 Parallel Array Operations
These operations are all located on the utility class Arrays.
Parallel operations on arrays:
Initializing an array using a for loop:
public static double[] imperativeInitialize(int size) {
double[] values = new double[size];
for (int i = 0; i < values.length; i++) {
values[i] = i;
}
return values;
}
The parallelSetAll method does the same thing easily in parallel, for example:
public static double[] parallelInitialize(int size) {
double[] values = new double[size];
Arrays.parallelSetAll(values, i -> i);
return values;
}
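As a further illustration, the sketch below combines parallelSetAll with Arrays.parallelPrefix, another parallel operation on the Arrays class, which replaces each element with the running combination of everything up to it. The class and method names (ParallelArrayDemo, squares, runningTotals) are made up for this example and do not come from the notes.

```java
import java.util.Arrays;

final class ParallelArrayDemo {
    // Fill the array in parallel: values[i] = i * i.
    static double[] squares(int size) {
        double[] values = new double[size];
        Arrays.parallelSetAll(values, i -> (double) i * i);
        return values;
    }

    // Running totals: result[i] = values[0] + ... + values[i].
    // Works on a copy because parallelPrefix updates the array in place.
    static double[] runningTotals(double[] values) {
        double[] sums = Arrays.copyOf(values, values.length);
        Arrays.parallelPrefix(sums, Double::sum);
        return sums;
    }

    public static void main(String[] args) {
        double[] values = squares(5);
        System.out.println(Arrays.toString(values));                // [0.0, 1.0, 4.0, 9.0, 16.0]
        System.out.println(Arrays.toString(runningTotals(values))); // [0.0, 1.0, 5.0, 14.0, 30.0]
    }
}
```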
6.3 More Examples
// Sum of squares
public static int sumOfSquares(IntStream range) {
return range.parallel()
.map(x -> x * x)
.sum();
}
public static int sequentialSumOfSquares(IntStream range) {
return range.map(x -> x * x)
.sum();
}
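// Runs in parallel. Note that the initial value passed to reduce must be 1.
// The initial value has to be the identity of the combining function:
// reducing the identity with any other value leaves that value unchanged.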
public static int multiplyThrough(List<Integer> numbers) {
return 5 * numbers.parallelStream()
.reduce(1, (acc, x) -> x * acc);
}
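The reason for moving the factor 5 outside the reduce can be sketched as follows. A sequential reduce applies its initial value exactly once, so seeding it with 5 happens to work; a parallel reduce may apply the initial value once per chunk of work, so a non-identity seed could be multiplied in several times. The class name ReduceIdentityDemo and the sample list are made up for this example.

```java
import java.util.Arrays;
import java.util.List;

final class ReduceIdentityDemo {
    // Safe in parallel: 1 is the identity for multiplication,
    // so it does not matter how many chunks start from it.
    static int multiplyThrough(List<Integer> numbers) {
        return 5 * numbers.parallelStream()
                .reduce(1, (acc, x) -> x * acc);
    }

    // Only correct sequentially: the seed 5 is applied exactly once.
    // Switching this to parallelStream() could give a wrong result.
    static int sequentialMultiplyThrough(List<Integer> numbers) {
        return numbers.stream()
                .reduce(5, (acc, x) -> x * acc);
    }

    public static void main(String[] args) {
        List<Integer> numbers = Arrays.asList(1, 2, 3);
        System.out.println(multiplyThrough(numbers));           // 30
        System.out.println(sequentialMultiplyThrough(numbers)); // 30
    }
}
```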
One more comparison:
// fast
public int fastSumOfSquares() {
return arrayListOfNumbers.parallelStream()
.mapToInt(x -> x * x)
.sum();
}
// slow
public int slowSumOfSquares() {
return linkedListOfNumbers.parallelStream()
.map(x -> x * x)
.reduce(0, (acc, x) -> acc + x);
}
Chapter 7. Testing, Debugging, and Refactoring
7.1 Refactoring Example
ThreadLocal<Album> thisAlbum = new ThreadLocal<Album>() {
@Override
protected Album initialValue() {
return database.lookupCurrentAlbum();
}
};
// The same thing using a lambda expression
ThreadLocal<Album> thisAlbum = ThreadLocal.withInitial(() -> database.lookupCurrentAlbum());
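The behaviour of ThreadLocal.withInitial can be sketched in a self-contained way as follows. The supplier runs lazily, once per thread, on the first call to get(). Here lookupCurrentAlbum is a hypothetical stand-in for the database call above; it returns a String and counts how often it has been invoked.

```java
import java.util.concurrent.atomic.AtomicInteger;

final class ThreadLocalDemo {
    // Counts how many times the (hypothetical) lookup has run.
    static final AtomicInteger LOOKUPS = new AtomicInteger();

    // Hypothetical stand-in for database.lookupCurrentAlbum().
    static String lookupCurrentAlbum() {
        return "album-" + LOOKUPS.incrementAndGet();
    }

    // The supplier is invoked lazily, once for each thread that calls get().
    static final ThreadLocal<String> THIS_ALBUM =
            ThreadLocal.withInitial(ThreadLocalDemo::lookupCurrentAlbum);

    public static void main(String[] args) throws InterruptedException {
        System.out.println(THIS_ALBUM.get()); // first call on this thread runs the lookup
        System.out.println(THIS_ALBUM.get()); // second call reuses the cached value

        // A different thread gets its own value from a fresh lookup.
        Thread other = new Thread(() -> System.out.println(THIS_ALBUM.get()));
        other.start();
        other.join();
    }
}
```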
More examples:
1) Refactoring step 1
/**
* Album: an album
* Track: a song on an album
*/
// Total running time of all tracks on all albums
public long countRunningTime(List<Album> albums) {
long count = 0;
for (Album album : albums) {
for (Track track : album.getTrackList()) {
count += track.getLength();
}
}
return count;
}
// Total number of musicians across all albums
public long countMusicians(List<Album> albums) {
long count = 0;
for (Album album : albums) {
count += album.getMusicianList().size();
}
return count;
}
// Total number of tracks across all albums
public long countTracks(List<Album> albums) {
long count = 0;
for (Album album : albums) {
count += album.getTrackList().size();
}
return count;
}
2) Refactoring step 2
/**
* Album: an album
* Track: a song on an album
*/
// Total running time of all tracks on all albums
public long countRunningTime(List<Album> albums) {
return albums.stream().mapToLong(
album -> album.getTracks().mapToLong(track -> track.getLength()).sum()
).sum();
}
// Total number of musicians across all albums
public long countMusicians(List<Album> albums) {
return albums.stream().mapToLong(
album -> album.getMusicianList().size()
).sum();
}
// Total number of tracks across all albums
public long countTracks(List<Album> albums) {
return albums.stream().mapToLong(
album -> album.getTrackList().size()
).sum();
}
3) Refactoring step 3: extract the shared logic into one method that takes a ToLongFunction
/**
* Album: an album
* Track: a song on an album
*/
// Total running time of all tracks on all albums
public long countRunningTime(List<Album> albums) {
return countFeature(album -> album.getTracks().mapToLong(track -> track.getLength()).sum(), albums);
}
// Total number of musicians across all albums
public long countMusicians(List<Album> albums) {
return countFeature(album -> album.getMusicianList().size(), albums);
}
// Total number of tracks across all albums
public long countTracks(List<Album> albums) {
return countFeature(album -> album.getTrackList().size(), albums);
}
// Shared logic: apply the given function to every album and sum the results
public long countFeature(ToLongFunction<Album> function, List<Album> albums) {
return albums.stream().mapToLong(function).sum();
}
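The result of step 3 can be tried out with a self-contained sketch like the one below. The helper is named countFeature here, and the nested Album class is a hypothetical, stripped-down stand-in that stores only track lengths and musician names; the sample data is made up.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.ToLongFunction;

final class CountFeatureDemo {
    // Hypothetical minimal Album: track lengths in seconds and musician names.
    static final class Album {
        final List<Integer> trackLengths;
        final List<String> musicians;
        Album(List<Integer> trackLengths, List<String> musicians) {
            this.trackLengths = trackLengths;
            this.musicians = musicians;
        }
    }

    // Shared logic: apply the function to every album and sum the results.
    static long countFeature(ToLongFunction<Album> function, List<Album> albums) {
        return albums.stream().mapToLong(function).sum();
    }

    static long countRunningTime(List<Album> albums) {
        return countFeature(
                album -> album.trackLengths.stream().mapToLong(Integer::longValue).sum(),
                albums);
    }

    static long countMusicians(List<Album> albums) {
        return countFeature(album -> album.musicians.size(), albums);
    }

    static long countTracks(List<Album> albums) {
        return countFeature(album -> album.trackLengths.size(), albums);
    }

    // Made-up sample data: two albums, three tracks, three musicians.
    static List<Album> sampleAlbums() {
        return Arrays.asList(
                new Album(Arrays.asList(180, 240), Arrays.asList("A", "B")),
                new Album(Arrays.asList(200), Arrays.asList("C")));
    }

    public static void main(String[] args) {
        List<Album> albums = sampleAlbums();
        System.out.println(countRunningTime(albums)); // 620
        System.out.println(countMusicians(albums));   // 3
        System.out.println(countTracks(albums));      // 3
    }
}
```

Each public method now states only what is being counted, and the iteration and summing live in one place.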
7.2 Unit Test
// Converting strings into their uppercase equivalents
public static List<String> allToUpperCase(List<String> words) {
return words.stream().map(str -> str.toUpperCase()).collect(Collectors.toList());
}
public static List<String> elementFirstToUpperCaseLambdas(List<String> words) {
return words.stream().map(
value -> {
char firstChar = Character.toUpperCase(value.charAt(0));
return firstChar + value.substring(1);
}
).collect(Collectors.toList());
}
Do use method references:
public class UnitTest {
public static List<String> elementFirstToUpperCaseLambdasMethodRef(List<String> words) {
return words.stream().map(
UnitTest::firstToUpperCase
).collect(Collectors.toList());
}
public static String firstToUpperCase(String value) {
char firstChar = Character.toUpperCase(value.charAt(0));
return firstChar + value.substring(1);
}
}
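Once the lambda body lives in a named method, it can be tested on its own without building a list or a stream. The sketch below uses plain checks so that it runs without any test framework; in a real project this would more likely be a JUnit test. The class name FirstToUpperCaseCheck and the check helper are made up for this example.

```java
final class FirstToUpperCaseCheck {
    // Same logic as the firstToUpperCase method above.
    static String firstToUpperCase(String value) {
        char firstChar = Character.toUpperCase(value.charAt(0));
        return firstChar + value.substring(1);
    }

    // Tiny assertion helper so the example needs no test framework.
    static void check(String input, String expected) {
        String actual = firstToUpperCase(input);
        if (!expected.equals(actual)) {
            throw new AssertionError(
                    "for \"" + input + "\" expected \"" + expected + "\" but got \"" + actual + "\"");
        }
    }

    public static void main(String[] args) {
        check("ab", "Ab"); // two-letter string
        check("a", "A");   // single letter
        check("Ab", "Ab"); // already capitalised
        System.out.println("all checks passed");
    }
}
```

Note that an empty string would make charAt(0) throw, so the method as written assumes non-empty input.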
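The Solution: peek
Logging intermediate values with forEach means the pipeline has to be written twice, once to print and once to collect the result, because a stream can only be consumed once: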
public static Set<String> forEachLoggingFailure(Album album) {
album.getMusicians()
.filter(artist -> artist.getName().startsWith("The"))
.map(artist -> artist.getNationality())
.forEach(nationality -> System.out.println("Found: " + nationality));
Set<String> nationalities
= album.getMusicians()
.filter(artist -> artist.getName().startsWith("The"))
.map(artist -> artist.getNationality())
.collect(Collectors.toSet());
return nationalities;
}
The streams library contains a method that lets you look at each value in turn and also lets you continue to operate on the same underlying stream. It's called peek.
public static Set<String> nationalityReportUsingPeek(Album album) {
Set<String> nationalities
= album.getMusicians()
.filter(artist -> artist.getName().startsWith("The"))
.map(artist -> artist.getNationality())
.peek(nation -> System.out.println("Found nationality: " + nation))
.collect(Collectors.toSet());
return nationalities;
}
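The same idea can be tried without the Album and Artist classes. In the sketch below, which uses made-up names (PeekDemo, theBandsUpperCase) and plain strings, peek records each value that passes the filter into a list instead of printing it, and the pipeline still runs only once. The log list is a plain ArrayList, which is fine here only because the stream is sequential.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class PeekDemo {
    // Keeps names starting with "The", logs each one via peek, then upper-cases them.
    static Set<String> theBandsUpperCase(Stream<String> names, List<String> log) {
        return names
                .filter(name -> name.startsWith("The"))
                .peek(name -> log.add("kept: " + name)) // observe without consuming the stream
                .map(String::toUpperCase)
                .collect(Collectors.toSet());
    }

    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        Set<String> result =
                theBandsUpperCase(Stream.of("The Beatles", "Queen", "The Who"), log);
        System.out.println(log);    // [kept: The Beatles, kept: The Who]
        System.out.println(result); // contains THE BEATLES and THE WHO
    }
}
```

Because peek is an intermediate operation, nothing is logged until a terminal operation such as collect runs.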
In the same way, peek can also be used to write log output.
Chapter 8. Design and Architectural Principles
The critical design tool for software development is a mind well educated in design principles. It is not…technology.