gg_sankey

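This post shows how to draw a Sankey diagram with the ggplot2 package in R: building the stacked bars on the left and right, labelling each segment with its proportion, and drawing the flow ribbons in between. A custom function computes the point coordinates of each ribbon, and geom_ribbon adds them to the plot as a layer.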


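It has been a while since my last update. I had long wanted to build a sankey diagram myself with ggplot2, so I started with the simplest possible case.

A typical sankey diagram has one column on the left, one column on the right, and ribbons in the middle that show the transitions between the left and right states.

So the first step is to build the bars on both sides and label the middle of each segment with its proportion: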

library(ggplot2)

# Fill colours, one per type (a, b, c, d)
color_list <- c("#f38181", "#fce38a", "#61c0bf", "#95e1d3")

# Two stacked bars: x = 1 (left) and x = 11 (right); y is each type's share
bar_data <- data.frame(
  x = c(1, 1, 1, 11, 11, 11, 11),
  type = c("a", "b", "c", "a", "b", "c", "d"),
  y = c(0.2, 0.3, 0.5, 0.1, 0.5, 0.2, 0.2)
)

# Label position for each segment: its vertical midpoint.
# position = "fill" draws the first type at the top, so positions are
# measured downward from 1. Assumes the shares in each bar sum to 1.
text_data_create <- function(bar_data) {
  x <- bar_data$x
  text <- bar_data$y
  start <- ave(text, x, FUN = cumsum)  # cumulative share including this segment
  end <- start - text                  # cumulative share before this segment
  data.frame(x = x, y = 1 - (start + end) / 2, text = text)
}

bar_p <- ggplot(data = bar_data) +
  geom_bar(aes(x, y, fill = type), position = "fill", stat = "identity", colour = "white", width = 0.8) +
  geom_text(data = text_data_create(bar_data), aes(x, y, label = text)) +
  scale_fill_manual(values = color_list)

The result is a pair of stacked bars with a proportion label on each segment.

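Next come the flow ribbons in the middle. Put simply, each ribbon is defined by an upper edge and a lower edge; to make it look nicer, I use a cubic \(x^{3}\) to give the edges some curvature: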

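# Build one ribbon between the bars at x = 1 and x = 11 (bar width 0.8).
# start_* are the y-values on the left bar, end_* those on the right bar.
river_data_create <- function(start_y_upper, end_y_upper, start_y_lower, end_y_lower) {
  # The ribbon runs from the right edge of the left bar to the left edge of the right bar
  x <- seq(1 + 0.8 / 2, 11 - 0.8 / 2, length.out = 10000)
  # Cubic centred at x = 6 with half-width 4.6, scaled to hit both end values
  mean_y_upper <- (start_y_upper + end_y_upper) / 2
  y_upper <- (start_y_upper - mean_y_upper) / (4.6^3) * (-x + 6)^3 + mean_y_upper
  mean_y_lower <- (start_y_lower + end_y_lower) / 2
  y_lower <- (start_y_lower - mean_y_lower) / (4.6^3) * (-x + 6)^3 + mean_y_lower
  river_data <- data.frame(x, y_upper, y_lower)
  # Label placed at the horizontal centre, showing the ribbon's width
  text_data <- data.frame(
    x = 6,
    y = (start_y_upper + end_y_lower) / 2,
    text = as.character(start_y_upper - start_y_lower)
  )
  return(list(line = river_data, text = text_data))
}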
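
To see why the constants 6 and 4.6 appear in river_data_create, it may help to evaluate a single edge on its own. The sketch below uses a hypothetical helper, cubic_edge, which is not part of the original post; it assumes the same geometry (bars at x = 1 and x = 11, width 0.8, so the ribbon spans x = 1.4 to x = 10.6).

```r
# Hypothetical helper for illustration: one edge of a ribbon.
# x_mid is the centre between the bars, half_width the distance from the
# centre to either bar edge (6 - 1.4 = 10.6 - 6 = 4.6).
cubic_edge <- function(x, y_start, y_end, x_mid = 6, half_width = 4.6) {
  y_mid <- (y_start + y_end) / 2
  (y_start - y_mid) / half_width^3 * (x_mid - x)^3 + y_mid
}

# Evaluate at the left end, the centre, and the right end
edge <- cubic_edge(c(1.4, 6, 10.6), y_start = 1, y_end = 0.9)
print(edge)
```

At the left end the cubic term equals half_width^3, so the edge returns y_start; at the right end the term changes sign and the edge returns y_end; at the centre it passes through the mean of the two, where the curve is flattest.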

That completes the calculation of the ribbon's point coordinates; after that, geom_ribbon adds them to the plot as a layer.

# Ribbon 1: from the top of left "a" to the top of right "b"
river_data <- river_data_create(1, 0.9, 0.9, 0.8)
sankey_p <- bar_p +
  geom_ribbon(data = river_data$line, aes(x, ymin = y_lower, ymax = y_upper), fill = color_list[1], colour = "white", alpha = 0.2) +
  geom_text(data = river_data$text, aes(x, y, label = text))

# Ribbon 2: from the top of left "c" to right "b"
river_data <- river_data_create(0.5, 0.6, 0.4, 0.5)
sankey_p <- sankey_p +
  geom_ribbon(data = river_data$line, aes(x, ymin = y_lower, ymax = y_upper), fill = color_list[3], colour = "white", alpha = 0.2) +
  geom_text(data = river_data$text, aes(x, y, label = text))

# Ribbon 3: from the bottom of left "c" to right "d"
river_data <- river_data_create(0.2, 0.2, 0, 0)
sankey_p <- sankey_p +
  geom_ribbon(data = river_data$line, aes(x, ymin = y_lower, ymax = y_upper), fill = color_list[3], colour = "white", alpha = 0.2) +
  geom_text(data = river_data$text, aes(x, y, label = text))
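
The y-values passed to river_data_create above are read off the stacked bars by hand. As a rough sketch of where they come from, the hypothetical helper segment_bounds below (not in the original post) computes the top and bottom edge of every segment, assuming the shares in each bar sum to 1 and that the first type is drawn at the top.

```r
# Hypothetical helper for illustration: top (ymax) and bottom (ymin) edge of
# each stacked segment, measured downward from 1.
segment_bounds <- function(bar_data) {
  cum <- ave(bar_data$y, bar_data$x, FUN = cumsum)  # running total per bar
  data.frame(
    x = bar_data$x,
    type = bar_data$type,
    ymax = 1 - (cum - bar_data$y),
    ymin = 1 - cum
  )
}

# Same data as in the post
bar_data <- data.frame(
  x = c(1, 1, 1, 11, 11, 11, 11),
  type = c("a", "b", "c", "a", "b", "c", "d"),
  y = c(0.2, 0.3, 0.5, 0.1, 0.5, 0.2, 0.2)
)
bounds <- segment_bounds(bar_data)
print(bounds)
```

With these bounds, the first ribbon leaves the top of the left "a" segment (1 down to 0.9) and arrives at the top of the right "b" segment (0.9 down to 0.8), which matches the call river_data_create(1, 0.9, 0.9, 0.8).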

The last step is adjusting the theme to remove the lines that are not needed.
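
The theme settings used for the final figure are not shown, so the following is only one possible sketch, not the author's code. It blanks the panel background, grid lines, and axes using standard ggplot2 theme elements; demo_p is a stand-in plot for illustration, and the same clean_theme could be added to sankey_p.

```r
library(ggplot2)

# Stand-in plot for illustration; with the post's objects this would be sankey_p
demo_p <- ggplot(data.frame(x = c(1, 11), y = c(1, 1)), aes(x, y)) +
  geom_col(width = 0.8)

# Assumed choice of elements to hide; adjust to taste
clean_theme <- theme(
  panel.background = element_blank(),
  panel.grid = element_blank(),
  axis.line = element_blank(),
  axis.ticks = element_blank(),
  axis.text = element_blank(),
  axis.title = element_blank()
)

clean_p <- demo_p + clean_theme
```

theme_void() would be a shorter alternative if nothing but the bars, ribbons, and legend should remain.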

Plenty of packages can already draw sankey diagrams, riverplot for example, but implementing one by hand is still quite fun.

Finally, I wish you good health.

Reposted from: https://www.cnblogs.com/wwdPeRl/p/11127051.html
