When a function depends on two or more variables, "the derivative" does not quite make sense — derivative with respect to which variable? A partial derivative answers: the derivative with respect to one variable, treating all others as fixed constants.
The notation ∂f/∂x is read "the partial derivative of f with respect to x." It measures how f changes if you move in the x-direction while keeping y (and any other variables) still. The computation is exactly ordinary differentiation — just treat every other variable as a number.
Computing Partial Derivatives
- To compute ∂f/∂x: treat y (and all other variables) as constants, then differentiate normally with respect to x.
- To compute ∂f/∂y: treat x as constant, differentiate with respect to y.
- All the usual rules (power, product, chain, etc.) apply (see the sketch below).
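To make the "treat everything else as a constant" idea concrete, here is a minimal symbolic sketch (assuming sympy; the example function f = x²y + sin(y) is an arbitrary choice, not from the text). Freezing x at a number and taking an ordinary single-variable derivative gives the same answer as the partial derivative evaluated there.

```python
import sympy as sp

x, y = sp.symbols('x y')
f = x**2 * y + sp.sin(y)   # arbitrary example function

# "Treat x as a constant": freeze x at a value, then take an
# ordinary single-variable derivative in y.
g = f.subs(x, 3)                 # g(y) = 9*y + sin(y)
print(sp.diff(g, y))             # 9 + cos(y)

# Same result from the partial derivative, evaluated at x = 3.
print(sp.diff(f, y).subs(x, 3))  # 9 + cos(y)
```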
Worked Examples
- f(x, y) = x³y² + sin(xy) + eˣ
- ∂f/∂x = 3x²y² + y·cos(xy) + eˣ
- ∂f/∂y = 2x³y + x·cos(xy) (both partials are checked symbolically below)
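A minimal sketch of that check, assuming sympy is available:

```python
import sympy as sp

x, y = sp.symbols('x y')
f = x**3 * y**2 + sp.sin(x * y) + sp.exp(x)

# Partial with respect to x (y held constant):
print(sp.diff(f, x))  # 3*x**2*y**2 + y*cos(x*y) + exp(x)
# Partial with respect to y (x held constant):
print(sp.diff(f, y))  # 2*x**3*y + x*cos(x*y)
```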
Higher-Order Partial Derivatives
- fₓₓ = ∂²f/∂x² (differentiate twice with respect to x)
- fᵧᵧ = ∂²f/∂y²
- fₓᵧ = ∂²f/∂y∂x (differentiate first with respect to x, then y; Leibniz notation reads right to left)
- Clairaut's Theorem: if fₓᵧ and fᵧₓ are continuous, then fₓᵧ = fᵧₓ: mixed partials are equal (see the check below).
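Clairaut's Theorem is easy to check symbolically on the worked example's function (a sketch, assuming sympy):

```python
import sympy as sp

x, y = sp.symbols('x y')
f = x**3 * y**2 + sp.sin(x * y) + sp.exp(x)

fxy = sp.diff(f, x, y)  # differentiate in x, then in y
fyx = sp.diff(f, y, x)  # differentiate in y, then in x

print(sp.simplify(fxy - fyx))  # 0: the mixed partials agree
```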
The Chain Rule for Partial Derivatives
If z=f(x,y) and x=x(t), y=y(t), then dz/dt = (∂f/∂x)(dx/dt) + (∂f/∂y)(dy/dt). The multivariable chain rule sums up all contributions from each path through which t affects z.
The Limit Definition
Just as the single-variable derivative is defined as a limit of difference quotients, so is the partial derivative: ∂f/∂x = lim(h→0) [f(x+h, y) − f(x, y)] / h. The difference: only x changes by h; y remains fixed. This is not merely notation — it is a fundamentally different operation from the total derivative, because it isolates one direction of change in a multidimensional space.
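The convergence of this limit can be watched numerically. A finite-difference sketch (assuming numpy), reusing the worked example's function: only x is perturbed by h, while y stays pinned at y₀.

```python
import numpy as np

def f(x, y):
    return x**3 * y**2 + np.sin(x * y) + np.exp(x)

def fx_exact(x, y):
    # The symbolic partial from the worked example above.
    return 3 * x**2 * y**2 + y * np.cos(x * y) + np.exp(x)

x0, y0 = 1.0, 2.0
for h in [1e-1, 1e-3, 1e-5]:
    # Difference quotient: only x moves by h; y remains fixed at y0.
    approx = (f(x0 + h, y0) - f(x0, y0)) / h
    print(h, approx, fx_exact(x0, y0))
```

As h shrinks, the difference quotient approaches the exact partial derivative, exactly as the limit definition promises.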
Geometric Interpretation
For z = f(x,y), the surface lives in 3D space. ∂f/∂x at (a,b) is the slope of the curve obtained by slicing the surface with the plane y = b — the slope in the x-direction at the point (a, b, f(a,b)). ∂f/∂y at (a,b) is the slope of the curve obtained by slicing with the plane x = a — the slope in the y-direction. The gradient ∇f = (∂f/∂x, ∂f/∂y) combines both into one vector pointing in the direction of steepest ascent on the surface.
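A small numeric sketch of the steepest-ascent claim (assuming numpy; the bowl-shaped test function and the probe directions are arbitrary choices): estimate the gradient by central differences, then compare the rise of f along the gradient direction with the rise along other unit directions.

```python
import numpy as np

def f(x, y):
    return x**2 + 3 * y**2  # arbitrary bowl-shaped test surface

# Numerical gradient at (1, 1) via central differences.
h = 1e-6
p = np.array([1.0, 1.0])
grad = np.array([
    (f(p[0] + h, p[1]) - f(p[0] - h, p[1])) / (2 * h),  # df/dx ~= 2
    (f(p[0], p[1] + h) - f(p[0], p[1] - h)) / (2 * h),  # df/dy ~= 6
])

# A small step along the gradient raises f more than a step of the
# same length along any of the other directions sampled here.
step = 1e-3
unit_grad = grad / np.linalg.norm(grad)
for theta in np.linspace(0, 2 * np.pi, 8, endpoint=False):
    d = np.array([np.cos(theta), np.sin(theta)])
    print(theta, f(*(p + step * d)) - f(*p))
print('gradient dir:', f(*(p + step * unit_grad)) - f(*p))
```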
Higher-Order and Mixed Partial Derivatives
fₓₓ = ∂²f/∂x² (differentiate twice with respect to x). fₓᵧ = ∂²f/∂y∂x (differentiate first with respect to x, then y). Note that Leibniz notation reads right to left for mixed partials, so ∂²f/∂y∂x means differentiate in x first, then in y, matching the subscript notation fₓᵧ. Clairaut's Theorem guarantees fₓᵧ = fᵧₓ whenever both mixed partials are continuous: mixed partials commute.
The Multivariable Chain Rule in Full
If z = f(x, y) and both x = x(t) and y = y(t) are functions of t, then: dz/dt = (∂f/∂x)·(dx/dt) + (∂f/∂y)·(dy/dt). Each path from t to z contributes one term: t affects x, which affects z; t also affects y, which affects z. The total rate is the sum of these two contributions. This generalises naturally: if z = f(x₁, ..., xₙ) and each xᵢ = xᵢ(t), then dz/dt = Σᵢ (∂f/∂xᵢ)·(dxᵢ/dt) — a sum over all paths.
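The two-path formula is easy to sanity-check symbolically. A sketch assuming sympy, with an arbitrary f and the path x(t) = cos t, y(t) = sin t: the chain-rule sum must match what you get by substituting first and then taking an ordinary derivative in t.

```python
import sympy as sp

t = sp.symbols('t')
x_t = sp.cos(t)      # x(t), an arbitrary example path
y_t = sp.sin(t)      # y(t)

x, y = sp.symbols('x y')
f = x**2 * y         # arbitrary example function

# Chain rule: dz/dt = f_x * dx/dt + f_y * dy/dt, evaluated on the path.
chain = (sp.diff(f, x) * sp.diff(x_t, t)
         + sp.diff(f, y) * sp.diff(y_t, t)).subs({x: x_t, y: y_t})

# Direct route: substitute the path first, then differentiate in t.
direct = sp.diff(f.subs({x: x_t, y: y_t}), t)

print(sp.simplify(chain - direct))  # 0: both routes agree
```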
Applications in Machine Learning
In a neural network with weight matrix W, the loss L is a function of all weights simultaneously — L(W). Training requires computing ∂L/∂wᵢⱼ for every individual weight wᵢⱼ. These are partial derivatives of L with respect to each weight, holding all others fixed. Backpropagation computes all these partial derivatives efficiently using the multivariable chain rule, propagating gradients backwards through the network layer by layer. Understanding partial derivatives is therefore a prerequisite for understanding why neural networks train the way they do.
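As a minimal illustration (not the full backpropagation algorithm), consider a single linear layer with squared-error loss; the weight matrix W, input x, and target y below are hypothetical stand-ins. The analytic partials ∂L/∂wᵢⱼ follow from the chain rule, and each one can be checked by finite differences while holding every other weight fixed.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(2, 3))   # hypothetical weight matrix
x = rng.normal(size=3)        # one input example
y = rng.normal(size=2)        # its target

def loss(W):
    pred = W @ x              # linear "network"
    return 0.5 * np.sum((pred - y) ** 2)

# Chain rule gives dL/dW_ij = (pred - y)_i * x_j for this loss.
grad_analytic = np.outer(W @ x - y, x)

# Finite-difference check of one partial: perturb w_ij only,
# leaving all other weights fixed.
h = 1e-6
i, j = 0, 1
Wp = W.copy(); Wp[i, j] += h
print((loss(Wp) - loss(W)) / h, grad_analytic[i, j])
```

The two printed numbers agree to several digits, which is precisely the statement that ∂L/∂wᵢⱼ is a partial derivative with all other weights held constant.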