Mathematics Basics - Multivariate Calculus (Partial Derivatives)

This article takes a deeper look at the concept of partial derivatives and their use in computing the gradient of a multivariate function. It works through examples of how to find a function's partial derivatives and explains the role of the chain rule in the process. It then discusses the definition and properties of the Jacobian and its key role in locating the extreme points of a function, and introduces the Hessian as a tool for distinguishing maxima, minima, and saddle points.

Partial Derivatives

Building on what we have learned previously about univariate calculus, we can extend the concept of the gradient to the multivariate case. For example, for a function $f(x,y,z)=\sin(x)e^{yz^2}$ we want to understand the influence of each input variable $x$, $y$ and $z$ on this function. This requires us to differentiate $f$ with respect to each input variable separately. The results are called the partial derivatives of $f$, because only one variable is differentiated at a time.

$$\begin{aligned} \frac{\partial f}{\partial x}&=\cos(x)e^{yz^2}\\ \frac{\partial f}{\partial y}&=z^2\sin(x)e^{yz^2}\\ \frac{\partial f}{\partial z}&=2yz\sin(x)e^{yz^2} \end{aligned}$$

Notice two things here. First, the derivative symbol changes from $\frac{df}{dx}$ to $\frac{\partial f}{\partial x}$ to signify that we are only partially differentiating the function. Second, when we partially differentiate with respect to one variable, all the other variables are treated as constants. Therefore, when computing $\frac{\partial f}{\partial x}$, only the $\sin(x)$ term is differentiated and $e^{yz^2}$ is carried along as a constant factor.
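
As a quick sanity check, here is a minimal sketch that reproduces these three partial derivatives symbolically, assuming the SymPy library is available:

```python
import sympy as sp

x, y, z = sp.symbols('x y z')
f = sp.sin(x) * sp.exp(y * z**2)

# Differentiate with respect to one variable at a time; the other
# variables are treated as constants, as described above.
print(sp.diff(f, x))   # cos(x) * e^(y z^2)
print(sp.diff(f, y))   # z^2 * sin(x) * e^(y z^2)
print(sp.diff(f, z))   # 2 y z * sin(x) * e^(y z^2)
```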

Let’s take one step further. Suppose our input variables $x$, $y$ and $z$ are themselves expressed in terms of another variable $t$ as follows.

$$\begin{aligned} x&=t-1\\ y&=t^2\\ z&=\frac{1}{t} \end{aligned}$$

We can obtain the derivative of $f$ with respect to $t$ by applying the chain rule to each of its partial derivatives with respect to $x$, $y$ and $z$.

$$\begin{aligned} \frac{df(x,y,z)}{dt}&=\frac{\partial f}{\partial x}\cdot\frac{dx}{dt}+\frac{\partial f}{\partial y}\cdot\frac{dy}{dt}+\frac{\partial f}{\partial z}\cdot\frac{dz}{dt}\\ &=\cos(x)e^{yz^2}\cdot(1)+z^2\sin(x)e^{yz^2}\cdot(2t)+2yz\sin(x)e^{yz^2}\cdot\left(-\frac{1}{t^2}\right)\\ &=e^{yz^2}\left[\cos(x)+2tz^2\sin(x)-\frac{2yz}{t^2}\sin(x)\right] \end{aligned}$$

Next, we substitute the $x$, $y$ and $z$ terms by their respective expressions in $t$. The result is the derivative of $f$ with respect to $t$ alone. This process of obtaining a derivative via intermediate variables is called taking the total derivative.

$$\begin{aligned} \frac{df(x,y,z)}{dt}&=e^{t^2\cdot \frac{1}{t^2}}\left[\cos(t-1)+2t\cdot\frac{1}{t^2}\sin(t-1)-\frac{2}{t^2}\cdot t^2\cdot\frac{1}{t}\sin(t-1)\right]\\ &=e\left[\cos(t-1)+\cancel{\frac{2}{t}\sin(t-1)}-\cancel{\frac{2}{t}\sin(t-1)}\right]\\ &=e\cos(t-1) \end{aligned}$$

We can verify this result by substituting the $t$ expressions into $f(x,y,z)$ at the outset and differentiating with respect to $t$ directly.

$$\begin{aligned} \frac{df(x,y,z)}{dt}&=\frac{d}{dt}\sin(t-1)e^{t^2\cdot\frac{1}{t^2}}\\ &=\frac{d}{dt}\,e\sin(t-1)\\ &=e\cos(t-1) \end{aligned}$$

That is exactly the same as our total-derivative approach. You might be wondering why we take a detour by computing partial derivatives first and then substituting. In real-world applications, we can seldom find a nice analytical expression that explicitly relates $f$ to its input variable $t$, or such an expression may be too complicated to differentiate. Therefore, we break the differentiation into smaller, manageable pieces and join the results back together afterwards.
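
If you want to check the whole chain-rule calculation mechanically, here is a small sketch (again assuming SymPy) that computes the total derivative both ways and confirms they agree:

```python
import sympy as sp

t = sp.symbols('t', positive=True)
x, y, z = sp.symbols('x y z')
f = sp.sin(x) * sp.exp(y * z**2)
subs_map = {x: t - 1, y: t**2, z: 1 / t}

# Chain rule: sum of (partial f / partial v) * (dv/dt), then substitute for t
total = sum(sp.diff(f, v) * sp.diff(expr, t) for v, expr in subs_map.items())
total = sp.simplify(total.subs(subs_map))
print(total)                        # E*cos(t - 1)

# Substitute first, then differentiate directly -- the results should match
direct = sp.simplify(sp.diff(f.subs(subs_map), t))
print(sp.simplify(total - direct))  # 0
```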

Jacobian

We can express the partial derivatives of a function in vector form. This vector representation is called the Jacobian and is denoted by the letter $J$. For example, consider the function $f(x,y,z)=x^2y+z^3$ with partial derivatives

$$\begin{aligned} \frac{\partial f}{\partial x}&=2xy\\ \frac{\partial f}{\partial y}&=x^2\\ \frac{\partial f}{\partial z}&=3z^2 \end{aligned}$$

Then we write the Jacobian of $f$ as

$$J(x,y,z)=[2xy,\; x^2,\; 3z^2]$$
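
As a small aside, the same Jacobian can be produced programmatically. The sketch below assumes SymPy and uses its `Matrix.jacobian` helper:

```python
import sympy as sp

x, y, z = sp.symbols('x y z')
f = sp.Matrix([x**2 * y + z**3])

# Row vector of partial derivatives with respect to x, y and z
J = f.jacobian([x, y, z])
print(J)   # Matrix([[2*x*y, x**2, 3*z**2]])
```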

One property of the Jacobian is that it points in the direction of steepest ascent at any differentiable point. For example, at the point $(1,1,1)$, our previously calculated Jacobian evaluates to

$$J(1,1,1)=[2,1,3]$$

That is equivalent to saying that at the point $(1,1,1)$, the steepest slope points in the direction $[2,1,3]$. Furthermore, the steeper the slope, the greater the magnitude of the Jacobian. Therefore, we can compare the slope at one point with that at another directly through the magnitudes of their corresponding Jacobians. I will save the proof that the Jacobian gives the direction of steepest slope for a later discussion; for now, please accept it as a defining property of the Jacobian.
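
For instance, a minimal numerical sketch of this comparison could look like the following (the second sample point $(2,1,1)$ is just an arbitrary choice for illustration):

```python
import numpy as np

def jacobian(p):
    """Jacobian of f(x, y, z) = x**2 * y + z**3 evaluated at a point p."""
    x, y, z = p
    return np.array([2 * x * y, x**2, 3 * z**2])

for p in [(1.0, 1.0, 1.0), (2.0, 1.0, 1.0)]:
    J = jacobian(p)
    # A larger norm means a steeper slope at that point
    print(p, J, np.linalg.norm(J))
```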

In addition, since the Jacobian tells us the direction of steepest slope, it is also a good indicator of special points of a function. If the Jacobian evaluated at a point is the zero vector, the function is locally flat there: the point is a stationary point, which must be a maximum, a minimum or a saddle point. As a result, we can find all points satisfying $J=0$ as the special points of a function.

The Jacobian has some other interesting properties, too. For instance, when it is applied to multiple functions of multiple variables, the Jacobian is expressed in matrix form. The Jacobian matrix is a very important tool for working with nonlinear transformations. We have learned from linear algebra that a transformation matrix can map a vector from one vector space to another; however, a single matrix can only describe a linear transformation. What do we do with a transformation like the one below?

$$\begin{aligned} u(x,y)&=x+\sin(y)\\ v(x,y)&=y+\sin(x) \end{aligned}$$

Although the transformation defined by $u$ and $v$ is not linear, it turns out that a small region around each point is still transformed in a way that is very close to linear. This is called local linearity. The Jacobian matrix describes the linear transformation happening around each point where the function is differentiable. In addition, the determinant of the Jacobian matrix (if it is a square matrix) represents the local change in area or volume around a given point after the transformation. This property is used when evaluating multiple integrals by a change of variables.
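
A brief sketch of this idea, assuming SymPy: compute the $2\times 2$ Jacobian matrix of the transformation above and its determinant, which measures the local change in area.

```python
import sympy as sp

x, y = sp.symbols('x y')
F = sp.Matrix([x + sp.sin(y), y + sp.sin(x)])

# Jacobian matrix of the (u, v) transformation with respect to (x, y)
J = F.jacobian([x, y])
print(J)                     # Matrix([[1, cos(y)], [cos(x), 1]])

# Its determinant gives the local area-scaling factor of the transformation
print(sp.simplify(J.det()))  # 1 - cos(x)*cos(y)
```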

We are not going to walk through the details of the Jacobian matrix and its applications here, as they are not needed for our later topics. Nonetheless, for readers interested in learning more about the Jacobian, I highly recommend the introductory courses offered by Khan Academy and MIT. Both have an excellent discussion of this topic.

Applying the Jacobian in Practice

There is one practical concern about using the Jacobian. If we have found a point with Jacobian $J=0$, how do we tell whether it is a maximum or a minimum? One way is to check the function values at surrounding points to see whether they all lie above or below the point of interest. However, this method is not very robust. Instead, we can use a simple extension of the Jacobian called the Hessian. We know that every element of the Jacobian vector is obtained by partially differentiating $f$ with respect to one of its input variables $x$, $y$, $z$, etc. The Hessian collects the second-order derivatives of $f$: it differentiates each element of the Jacobian vector again with respect to each input variable.

Let’s consider our previous example $f(x,y,z)=x^2y+z^3$. Given its Jacobian vector $J=(2xy,\;x^2,\;3z^2)$, we can calculate its Hessian, $H$, as

$$H=\begin{pmatrix}\frac{\partial^2f}{\partial x^2}&\frac{\partial^2f}{\partial x\partial y}&\frac{\partial^2f}{\partial x\partial z}\\ \frac{\partial^2f}{\partial y\partial x}&\frac{\partial^2f}{\partial y^2}&\frac{\partial^2f}{\partial y\partial z}\\ \frac{\partial^2f}{\partial z\partial x}&\frac{\partial^2f}{\partial z\partial y}&\frac{\partial^2f}{\partial z^2}\end{pmatrix}=\begin{pmatrix}2y&2x&0\\ 2x&0&0\\ 0&0&6z\end{pmatrix}$$
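
Here is a minimal sketch that reproduces this matrix, assuming SymPy and its built-in `hessian` helper:

```python
import sympy as sp

x, y, z = sp.symbols('x y z')
f = x**2 * y + z**3

# Matrix of all second-order partial derivatives of f
H = sp.hessian(f, [x, y, z])
print(H)   # Matrix([[2*y, 2*x, 0], [2*x, 0, 0], [0, 0, 6*z]])
```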

The Hessian is an $n \times n$ square matrix, where $n$ is the number of variables of $f$. It is symmetric across the leading diagonal whenever the second partial derivatives of $f$ are continuous. We can use the Hessian to classify the maxima, minima and saddle points of a function after we have found all points with Jacobian $J=0$. For a two-variable function (a $2\times 2$ Hessian) the test works as follows; with more variables one instead checks whether the Hessian is positive or negative definite, for example through the signs of its eigenvalues. A short sketch applying these rules follows the list.

  • If the determinant of the Hessian at a point is positive, the point is either a maximum or a minimum.
  • If the determinant of the Hessian at a point is positive and the top-left entry of the Hessian is positive, the point is a minimum.
  • If the determinant of the Hessian at a point is positive and the top-left entry of the Hessian is negative, the point is a maximum.
  • If the determinant of the Hessian at a point is negative, the point is a saddle point.
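
Below is a small sketch of these rules in action. The two-variable function $g(x,y)=x^3-3x+y^2$ is a hypothetical example of my own choosing (not from the article), picked because its stationary points include both a minimum and a saddle point; the determinant test is applied exactly as listed above.

```python
import sympy as sp

x, y = sp.symbols('x y')
g = x**3 - 3*x + y**2          # hypothetical example, chosen for illustration

# Stationary points: solve J = 0
J = sp.Matrix([g]).jacobian([x, y])            # [3*x**2 - 3, 2*y]
stationary = sp.solve([J[0], J[1]], [x, y], dict=True)

# Classify each stationary point with the 2x2 Hessian determinant test
# (degenerate cases with determinant == 0 are not handled in this sketch)
H = sp.hessian(g, [x, y])
for pt in stationary:
    Hp = H.subs(pt)
    det, top_left = Hp.det(), Hp[0, 0]
    if det < 0:
        kind = 'saddle point'
    elif top_left > 0:
        kind = 'minimum'
    else:
        kind = 'maximum'
    print(pt, kind)   # expected: a saddle point at (-1, 0), a minimum at (1, 0)
```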

We typically use the Jacobian (and the Hessian) to help us solve optimization problems, because they give us the combination of input variables that yields a maximum or minimum value. However, this requires a function relating the input and output variables to exist in the first place. In reality, this can be the most challenging step. For example, in many optimization problems the dimensionality can easily go up to hundreds or thousands (think of the neurons in a neural network). It is not practical to write out an explicit expression in such a large number of variables. Moreover, even if we are solving a two-dimensional optimization problem, it may happen that no clean analytical expression exists, or that evaluating the function at each point is computationally too expensive. So it is still not viable to write out the function and evaluate the Jacobian at every single point.

What can we do if there is no explicit function for us to optimize? One approach is to use numerical methods. Recall that we derived the gradient at a point by approximating rise over run across a finite interval. Given a number of data points, we can calculate the gradient between any pair of neighboring points. These serve as approximations of the function's derivatives at different points when a well-defined analytical expression does not exist. Starting from an initial point, we take a step in the direction indicated by the gradient. At each subsequent step, we recalculate the gradient at the current point and move in its direction until the gradient becomes zero. This is how we arrive at a maximum or minimum point without an explicit function. In the multi-dimensional case, we approximate the partial derivatives with respect to each input variable and take a step in each dimension until all partial derivatives evaluate to zero.
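
As a minimal sketch of this idea, the code below minimizes a made-up two-variable function using central-difference approximations of the partial derivatives; the objective function, the step size and the stopping tolerance are all illustrative choices, not anything prescribed here.

```python
import numpy as np

def f(p):
    """Made-up objective; we only assume it can be evaluated point-wise."""
    x, y = p
    return (x - 2)**2 + (y + 1)**2

def numerical_gradient(func, p, h=1e-6):
    """Central-difference approximation of the partial derivatives at p."""
    p = np.asarray(p, dtype=float)
    grad = np.zeros_like(p)
    for i in range(p.size):
        step = np.zeros_like(p)
        step[i] = h
        grad[i] = (func(p + step) - func(p - step)) / (2 * h)
    return grad

p = np.array([0.0, 0.0])           # initial guess
lr = 0.1                           # step size
for _ in range(500):
    g = numerical_gradient(f, p)
    if np.linalg.norm(g) < 1e-6:   # all partial derivatives effectively zero
        break
    p = p - lr * g                 # step against the gradient to minimise
print(p)                           # approximately [2, -1]
```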

One practical consideration is how big a step we should take. Clearly, if we take a large step each time, we might overshoot and miss the optimal point. But taking too small a step causes problems too, not only because it will take a long time to reach the optimal point, but also because of the limits of our computational precision. Computers cannot store infinitely precise numbers, so if a step change is too small, the computer might not be able to register it at all. This happens a lot in actual machine learning practice.
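
To see why a too-small step can vanish entirely, here is a tiny sketch using double-precision floats (the specific numbers are just illustrative):

```python
import numpy as np

x = np.float64(1.0)
tiny_step = 1e-17                 # smaller than the float spacing near 1.0
print(x + tiny_step == x)         # True: the update is silently lost
print(np.finfo(np.float64).eps)   # ~2.22e-16, relative spacing of doubles at 1.0
```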

Other problems we might encounter in practice include discontinuous functions and noisy data. If we blindly follow the steepest gradient path, we could hit a sudden stop where no value exists. With noisy data, the Jacobian evaluated at some points might lead us in the wrong direction. These practical issues remind us to treat every gradient step with caution and to perform validation where possible.

This concludes our discussion of partial derivatives and the Jacobian. We now know that partial derivatives are useful for finding the maximum and minimum points of a function, and that they can be conveniently represented as a Jacobian vector. There are, however, some challenges to overcome in actual applications, because we are not always working with a nice analytical expression. We accept this fact and make use of whatever data points are available to keep moving towards the optimal point.


(Inspired by the Mathematics for Machine Learning lecture series from Imperial College London)
