Materials to understand LSTM

本文批评了现有LSTM论文中常见的表述不清及误导性图表,并通过对比不同版本的LSTM架构图,指出了它们的问题所在。作者推荐了一篇优秀的博客文章及其直观的LSTM说明图,同时分享了一个自己绘制的更易于理解的LSTM架构图。

People never judge an academic paper by those user experience standards that they apply to software. If the purpose of a paper were really promoting understanding, then most of them suck. A while ago, I read this article talking about academic pretentiousness and it speaks my heart out. My feeling is, papers are not for better understanding but rather for self-promotion. It’s a way for scholars to declare achievements and make others admire. Therefore the golden standard for an academic paper has always been letting others acknowledge the greatness, but never understand enough to surpass itself.

When it comes to LSTM, for example, there aren’t many good materials. Most likely what they would show you is this bullshit image:

There are many things on this image pisses me off.

First of all, they use these f shapes (with a footnote “f” that looks like a “t”)to denote non-linearity and at the same time, they have f_t in their equations. And they are not the same thing!

Second, they have these dash lines and solid lines to represent time delay. So solid lines are C_t, and dash lines are C_t-1. There are 5 arrows coming out of the “Cell”, but only 4 are labelled. One C_t is incorrectly labelled with dash line. One solid line is supposed to be a dash line and C_t-1.

Third, what the hell are black dots? You have to look at the equation to figure out that they are element-wise multiplications of two vectors.

And finally, C_t is supposed to be calculated as a summation of f_t * C_t-1 and i_t * tanh(W_xc * x_t + W_hc*h_t-1 +b_c) (the third equation). But the image completely misses the plus sign.

Compared to this shitty image, the following version is slightly better:

The 3 C_t-1 are correctly represented as dash lines. The plus sign is there too. Sigmoid functions are written with the proper sigma sign, not some freaking f_f.

See, things don’t have to be so difficult. And understanding shouldn’t be a painful experience. But unfortunately, academic writing makes it so almost all the time.

Perhaps the best learning material is really this excellent blog post:

And its diagram is simply beautiful:

I think the most evil thing about the first two versions are that they treat C (the cell, or the memory) not as an input to the LSTM block, but rather a thing that is internal to the block. And they adopt this complex dash line, solid line thing to represent delay. This is really confusing.

But I do noticed one difference between the first two diagrams and the last one. The first two diagrams sum up the inputs with the outputs from the previous layer to calculate the gates (for example, f_t and i_t). But in the third version, the author concatenate them. I don’t know if it is because the third version is a variation or this doesn’t matter in terms of correctness. (But I don’t think it should be concatenation because the vector size h shouldn’t be changed. If it is really concatenation, the vector size of h will change to the size of x+h?)

I drew a new diagram, which I think is better:


原文地址: https://medium.com/@shiyan/materials-to-understand-lstm-34387d6454c1

分布式微服务企业级系统是一个基于Spring、SpringMVC、MyBatis和Dubbo等技术的分布式敏捷开发系统架构。该系统采用微服务架构和模块化设计,提供整套公共微服务模块,包括集中权限管理(支持单点登录)、内容管理、支付中心、用户管理(支持第三方登录)、微信平台、存储系统、配置中心、日志分析、任务和通知等功能。系统支持服务治理、监控和追踪,确保高可用性和可扩展性,适用于中小型企业的J2EE企业级开发解决方案。 该系统使用Java作为主要编程语言,结合Spring框架实现依赖注入和事务管理,SpringMVC处理Web请求,MyBatis进行数据持久化操作,Dubbo实现分布式服务调用。架构模式包括微服务架构、分布式系统架构和模块化架构,设计模式应用了单例模式、工厂模式和观察者模式,以提高代码复用性和系统稳定性。 应用场景广泛,可用于企业信息化管理、电子商务平台、社交应用开发等领域,帮助开发者快速构建高效、安全的分布式系统。本资源包含完整的源码和详细论文,适合计算机科学或软件工程专业的毕业设计参考,提供实践案例和技术文档,助力学生和开发者深入理解微服务架构和分布式系统实现。 【版权说明】源码来源于网络,遵循原项目开源协议。付费内容为本人原创论文,包含技术分析和实现思路。仅供学习交流使用。
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值