When a function depends on two or more variables, "the derivative" does not quite make sense — derivative with respect to which variable? A partial derivative answers: the derivative with respect to one variable, treating all others as fixed constants.
The notation ∂f/∂x is read "the partial derivative of f with respect to x." It measures how f changes if you move in the x-direction while keeping y (and any other variables) still. The computation is exactly ordinary differentiation — just treat every other variable as a number.
Computing Partial Derivatives
- To compute ∂f/∂x: treat y (and all other variables) as constants, then differentiate normally with respect to x.
- To compute ∂f/∂y: treat x as constant, differentiate with respect to y.
- All the usual rules (power, product, chain, etc.) apply (see the sketch below).
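To make the "treat everything else as a constant" idea concrete, here is a minimal symbolic sketch (assuming sympy; the example function f = x²y + sin(y) is an arbitrary choice, not from the text). Freezing x at a number and taking an ordinary single-variable derivative gives the same answer as the partial derivative evaluated there.

```python
import sympy as sp

x, y = sp.symbols('x y')
f = x**2 * y + sp.sin(y)   # arbitrary example function

# "Treat x as a constant": freeze x at a value, then take an
# ordinary single-variable derivative in y.
g = f.subs(x, 3)                 # g(y) = 9*y + sin(y)
print(sp.diff(g, y))             # 9 + cos(y)

# Same result from the partial derivative, evaluated at x = 3.
print(sp.diff(f, y).subs(x, 3))  # 9 + cos(y)
```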
Worked Examples
- f(x, y) = x³y² + sin(xy) + eˣ
- ∂f/∂x = 3x²y² + y·cos(xy) + eˣ
- ∂f/∂y = 2x³y + x·cos(xy) (both partials are checked symbolically below)
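A minimal sketch of that check, assuming sympy is available:

```python
import sympy as sp

x, y = sp.symbols('x y')
f = x**3 * y**2 + sp.sin(x * y) + sp.exp(x)

# Partial with respect to x (y held constant):
print(sp.diff(f, x))  # 3*x**2*y**2 + y*cos(x*y) + exp(x)
# Partial with respect to y (x held constant):
print(sp.diff(f, y))  # 2*x**3*y + x*cos(x*y)
```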
Higher-Order Partial Derivatives
- fₓₓ = ∂²f/∂x² (differentiate twice with respect to x)
- fᵧᵧ = ∂²f/∂y²
- fₓᵧ = ∂²f/∂y∂x (differentiate first with respect to x, then y; Leibniz notation reads right to left)
- Clairaut's Theorem: if fₓᵧ and fᵧₓ are continuous, then fₓᵧ = fᵧₓ: mixed partials are equal (see the check below).
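Clairaut's Theorem is easy to check symbolically on the worked example's function (a sketch, assuming sympy):

```python
import sympy as sp

x, y = sp.symbols('x y')
f = x**3 * y**2 + sp.sin(x * y) + sp.exp(x)

fxy = sp.diff(f, x, y)  # differentiate in x, then in y
fyx = sp.diff(f, y, x)  # differentiate in y, then in x

print(sp.simplify(fxy - fyx))  # 0: the mixed partials agree
```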
The Chain Rule for Partial Derivatives
If z=f(x,y) and x=x(t), y=y(t), then dz/dt = (∂f/∂x)(dx/dt) + (∂f/∂y)(dy/dt). The multivariable chain rule sums up all contributions from each path through which t affects z.
The Limit Definition
Just as the single-variable derivative is defined as a limit of difference quotients, so is the partial derivative: ∂f/∂x = lim(h→0) [f(x+h, y) − f(x, y)] / h. The difference: only x changes by h; y remains fixed. This is not merely notation — it is a fundamentally different operation from the total derivative, because it isolates one direction of change in a multidimensional space.
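The convergence of this limit can be watched numerically. A finite-difference sketch (assuming numpy), reusing the worked example's function: only x is perturbed by h, while y stays pinned at y₀.

```python
import numpy as np

def f(x, y):
    return x**3 * y**2 + np.sin(x * y) + np.exp(x)

def fx_exact(x, y):
    # The symbolic partial from the worked example above.
    return 3 * x**2 * y**2 + y * np.cos(x * y) + np.exp(x)

x0, y0 = 1.0, 2.0
for h in [1e-1, 1e-3, 1e-5]:
    # Difference quotient: only x moves by h; y remains fixed at y0.
    approx = (f(x0 + h, y0) - f(x0, y0)) / h
    print(h, approx, fx_exact(x0, y0))
```

As h shrinks, the difference quotient approaches the exact partial derivative, exactly as the limit definition promises.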
Geometric Interpretation
For z = f(x,y), the surface lives in 3D space. ∂f/∂x at (a,b) is the slope of the curve obtained by slicing the surface with the plane y = b — the slope in the x-direction at the point (a, b, f(a,b)). ∂f/∂y at (a,b) is the slope of the curve obtained by slicing with the plane x = a — the slope in the y-direction. The gradient ∇f = (∂f/∂x, ∂f/∂y) combines both into one vector pointing in the direction of steepest ascent on the surface.
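A small numeric sketch of the steepest-ascent claim (assuming numpy; the bowl-shaped test function and the probe directions are arbitrary choices): estimate the gradient by central differences, then compare the rise of f along the gradient direction with the rise along other unit directions.

```python
import numpy as np

def f(x, y):
    return x**2 + 3 * y**2  # arbitrary bowl-shaped test surface

# Numerical gradient at (1, 1) via central differences.
h = 1e-6
p = np.array([1.0, 1.0])
grad = np.array([
    (f(p[0] + h, p[1]) - f(p[0] - h, p[1])) / (2 * h),  # df/dx ~= 2
    (f(p[0], p[1] + h) - f(p[0], p[1] - h)) / (2 * h),  # df/dy ~= 6
])

# A small step along the gradient raises f more than a step of the
# same length along any of the other directions sampled here.
step = 1e-3
unit_grad = grad / np.linalg.norm(grad)
for theta in np.linspace(0, 2 * np.pi, 8, endpoint=False):
    d = np.array([np.cos(theta), np.sin(theta)])
    print(theta, f(*(p + step * d)) - f(*p))
print('gradient dir:', f(*(p + step * unit_grad)) - f(*p))
```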
Higher-Order and Mixed Partial Derivatives
fₓₓ = ∂²f/∂x² (differentiate twice with respect to x). fₓᵧ = ∂²f/∂y∂x (differentiate first with respect to x, then y). Note that Leibniz notation reads right to left for mixed partials, so ∂²f/∂y∂x means differentiate in x first, then in y, matching the subscript notation fₓᵧ. Clairaut's Theorem guarantees fₓᵧ = fᵧₓ whenever both mixed partials are continuous: mixed partials commute.
The Multivariable Chain Rule in Full
If z = f(x, y) and both x = x(t) and y = y(t) are functions of t, then: dz/dt = (∂f/∂x)·(dx/dt) + (∂f/∂y)·(dy/dt). Each path from t to z contributes one term: t affects x, which affects z; t also affects y, which affects z. The total rate is the sum of these two contributions. This generalises naturally: if z = f(x₁, ..., xₙ) and each xᵢ = xᵢ(t), then dz/dt = Σᵢ (∂f/∂xᵢ)·(dxᵢ/dt) — a sum over all paths.
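The two-path formula is easy to sanity-check symbolically. A sketch assuming sympy, with an arbitrary f and the path x(t) = cos t, y(t) = sin t: the chain-rule sum must match what you get by substituting first and then taking an ordinary derivative in t.

```python
import sympy as sp

t = sp.symbols('t')
x_t = sp.cos(t)      # x(t), an arbitrary example path
y_t = sp.sin(t)      # y(t)

x, y = sp.symbols('x y')
f = x**2 * y         # arbitrary example function

# Chain rule: dz/dt = f_x * dx/dt + f_y * dy/dt, evaluated on the path.
chain = (sp.diff(f, x) * sp.diff(x_t, t)
         + sp.diff(f, y) * sp.diff(y_t, t)).subs({x: x_t, y: y_t})

# Direct route: substitute the path first, then differentiate in t.
direct = sp.diff(f.subs({x: x_t, y: y_t}), t)

print(sp.simplify(chain - direct))  # 0: both routes agree
```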
Applications in Machine Learning
In a neural network with weight matrix W, the loss L is a function of all weights simultaneously — L(W). Training requires computing ∂L/∂wᵢⱼ for every individual weight wᵢⱼ. These are partial derivatives of L with respect to each weight, holding all others fixed. Backpropagation computes all these partial derivatives efficiently using the multivariable chain rule, propagating gradients backwards through the network layer by layer. Understanding partial derivatives is therefore a prerequisite for understanding why neural networks train the way they do.
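As a minimal illustration (not the full backpropagation algorithm), consider a single linear layer with squared-error loss; the weight matrix W, input x, and target y below are hypothetical stand-ins. The analytic partials ∂L/∂wᵢⱼ follow from the chain rule, and each one can be checked by finite differences while holding every other weight fixed.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(2, 3))   # hypothetical weight matrix
x = rng.normal(size=3)        # one input example
y = rng.normal(size=2)        # its target

def loss(W):
    pred = W @ x              # linear "network"
    return 0.5 * np.sum((pred - y) ** 2)

# Chain rule gives dL/dW_ij = (pred - y)_i * x_j for this loss.
grad_analytic = np.outer(W @ x - y, x)

# Finite-difference check of one partial: perturb w_ij only,
# leaving all other weights fixed.
h = 1e-6
i, j = 0, 1
Wp = W.copy(); Wp[i, j] += h
print((loss(Wp) - loss(W)) / h, grad_analytic[i, j])
```

The two printed numbers agree to several digits, which is precisely the statement that ∂L/∂wᵢⱼ is a partial derivative with all other weights held constant.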