Cluster by

I recently gave a "mental challenge" in form of writing a clusterBy function that take a list and a function that calculates some result on every element of the list; the result of the clusterBy function should be a list of clusters. A cluster is a list of elements from the original list for which the clustering function returns the same value.

As an example, let's cluster a list of strings using the length of the string. In other words, ("one", "two", "three", "eleven", "twelve") should cluster to (List(three), List(twelve, eleven), List(two, one)).
We create three clusters: one for strings of length 3, one for strings of length 5 and one for strings of length 6. We then add the values to the appropriate cluster.
I had a go at implementing the code in Scala, and here's my first attempt. For historical purposes, and to show different approaches, I will keep returning to this post as I discover better implementations.

def clusterBy[A, B](l: Iterable[A])(f: A => B) =
  l.map(e => (f(e), e))./:(new HashMap[B, List[A]])(
    (res, cur) => res + ((cur._1, cur._2 :: res.getOrElse(cur._1, Nil)))
  ).values

Calling clusterBy(List("one", "two", "three", "eleven", "twelve"))(_.length) returns MapLike(List(three), List(twelve, eleven), List(two, one)). Success! So, what did I do? Let's explore each step of the way.
First, the concepts. We need to go through all elements of the given container and calculate the value of f for the elements. This gives us element -> f(element) container. We then need to group the pairs into a new structure with elements that contain f(element) -> (element1, element2, …, elementn). The result of the entire operation are the values of the grouped by container.
cluster-by-1
Onwards to the code, then.
We start with def clusterBy[A, B](l: Iterable[A])(f: A => B), which defines a function called clusterBy with two generic types A and B--that's the def clusterBy[A, B] bit. The function is curried--it takes two parameter lists. The first parameter list includes just one element, l: Iterable[A]. This is the source data, a container that holds source values of type A. The second parameter list includes just one element, f: A => B, which is a function that takes the element type (A) and maps it to some other type B.
The next line starts with l.map(e => (f(e), e)) which maps the elements of the parameter l into some other container that contains tuples made up of the mapped element and the original element. So, if l were List("x", "xy")l.map(e => (f(e), e)) would return List((1, "x"), (2, "xy")).
Next, we fold left the returned list using empty map (that's the new HashMap[B, List[A]]--our grouped by container). The fold operation (working on the tuples) takes the first element of the tuple as the key in the group by map and the second element of the tuple is the value that should be added to the value associated with the key in the map. The line

(res, cur) => res + ((cur._1, cur._2 :: res.getOrElse(cur._1, Nil)))

receives the res running result of the fold left operation (the map) and cur, which is every (f(element) -> element) pair. The operation returns a map that is constructed by putting (in the old Java speak) a pair where the key is cur._1 and the value is either a new list with just cur._2 in it, or a list constructed by adding cur._2 to the existing list.
Finally, the result of the entire operation are the values of the constructed group by container.

转载于:https://my.oschina.net/u/2935389/blog/1475433

评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值