What is the Chain Rule?
The Chain Rule handles composed functions — functions inside other functions. If y = f(g(x)), then dy/dx = f'(g(x)) · g'(x). In words: differentiate the outer function (leaving the inner alone), then multiply by the derivative of the inner function.
The Leibniz Form
In Leibniz notation: dy/dx = (dy/du)·(du/dx). This looks exactly like fraction cancellation — the du terms 'cancel'. This mnemonic is not a rigorous proof but it is extremely useful for remembering the rule and setting up chain rule applications.
Step-by-Step Method
1. Identify the outer function and inner function.
2. Differentiate the outer function, leaving the inner intact.
3. Multiply by the derivative of the inner function.
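The three steps can be sketched in Python. This is a minimal illustration rather than a general differentiator: the helper name chain_rule and its parameters are hypothetical, and the caller still supplies the outer derivative, the inner function, and the inner derivative by hand.

```python
import math

def chain_rule(outer_prime, inner, inner_prime):
    """Return the derivative of outer(inner(x)) as a function of x.

    outer_prime: derivative of the outer function (step 2)
    inner, inner_prime: the inner function and its derivative (steps 1 and 3)
    """
    def derivative(x):
        # Evaluate the outer derivative at the untouched inner value,
        # then multiply by the derivative of the inner function.
        return outer_prime(inner(x)) * inner_prime(x)
    return derivative

# Step 1: for sin(x^2), outer = sin(u) and inner = x^2.
d_sin_x2 = chain_rule(
    outer_prime=math.cos,          # d/du sin(u) = cos(u)
    inner=lambda x: x ** 2,
    inner_prime=lambda x: 2 * x,   # d/dx x^2 = 2x
)

value_at_1_5 = d_sin_x2(1.5)       # same as 2 * 1.5 * cos(1.5^2)
print(value_at_1_5)
```

The returned function evaluates f'(g(x)) · g'(x) directly, so it mirrors the formula rather than discovering the decomposition on its own.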
Worked Examples
Example 1: d/dx[sin(x²)]. Outer = sin(u), inner = x². d/du[sin(u)] = cos(u). d/dx[x²] = 2x. Answer: cos(x²)·2x = 2x·cos(x²).
Example 2: d/dx[e^(3x+1)]. Outer = eᵘ, inner = 3x+1. Derivative: e^(3x+1)·3 = 3e^(3x+1).
Example 3: d/dx[(x²+5)⁶]. Outer = u⁶, inner = x²+5. Derivative: 6(x²+5)⁵·2x = 12x(x²+5)⁵.
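A quick way to gain confidence in hand-derived answers like these is to compare them with a numerical difference quotient. The sketch below assumes a hypothetical helper central_difference and an arbitrary test point x0 = 0.7; it checks the three results above and is a sanity check, not a proof.

```python
import math

def central_difference(f, x, h=1e-6):
    """Approximate f'(x) with a symmetric difference quotient."""
    return (f(x + h) - f(x - h)) / (2 * h)

# (function, hand-derived derivative) pairs from Examples 1 to 3.
examples = {
    "sin(x^2)": (lambda x: math.sin(x ** 2), lambda x: 2 * x * math.cos(x ** 2)),
    "e^(3x+1)": (lambda x: math.exp(3 * x + 1), lambda x: 3 * math.exp(3 * x + 1)),
    "(x^2+5)^6": (lambda x: (x ** 2 + 5) ** 6, lambda x: 12 * x * (x ** 2 + 5) ** 5),
}

x0 = 0.7  # arbitrary test point
relative_errors = {}
for name, (f, f_prime) in examples.items():
    numeric = central_difference(f, x0)
    exact = f_prime(x0)
    # Use relative error because the three derivatives differ widely in size.
    relative_errors[name] = abs(numeric - exact) / abs(exact)
    print(f"{name:10s} numeric={numeric:.6f} formula={exact:.6f}")
```

If a formula were missing its inner-derivative factor, the two columns would disagree visibly at almost any test point.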
Chain Rule with Product Rule
When a composed function is also multiplied by something, use both rules. d/dx[x·sin(x²)] requires Product Rule AND Chain Rule on sin(x²). Result: sin(x²) + x·cos(x²)·2x = sin(x²) + 2x²cos(x²).
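One way to watch both rules cooperate is forward-mode automatic differentiation with dual numbers, a technique that goes beyond the text above and is shown here only as an illustration. The class Dual and the function dsin are hypothetical names for this sketch: each value carries its own derivative, multiplication applies the Product Rule, and dsin applies the Chain Rule.

```python
import math
from dataclasses import dataclass

@dataclass
class Dual:
    """A value paired with its derivative with respect to x."""
    val: float
    der: float

    def __mul__(self, other):
        # Product Rule: (uv)' = u'v + uv'
        return Dual(self.val * other.val,
                    self.der * other.val + self.val * other.der)

def dsin(u):
    # Chain Rule: d/dx sin(u) = cos(u) * du/dx
    return Dual(math.sin(u.val), math.cos(u.val) * u.der)

x = Dual(1.3, 1.0)       # seed derivative: dx/dx = 1
y = x * dsin(x * x)      # x * sin(x^2)

# Hand-derived result from the text: sin(x^2) + 2x^2 * cos(x^2)
expected = math.sin(1.3 ** 2) + 2 * 1.3 ** 2 * math.cos(1.3 ** 2)
print(y.der, expected)
```

The derivative stored in y.der agrees with the hand-derived formula because the code performs the same two rule applications in the same order.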
Why the Chain Rule Exists
Every differentiation rule so far (Power, Product, Quotient) handles functions built by arithmetic operations. But what happens when functions are composed? When you nest one function inside another, as in sin(x²), e^(3x+1), or (x²+5)⁶, none of those rules applies directly. The Chain Rule covers exactly this situation.
The intuition: if a small change in x causes a change in u = g(x), and that change in u causes a change in y = f(u), then the total rate of change dy/dx is the product of these two individual rates. This is why the Leibniz form dy/dx = (dy/du)·(du/dx) looks like fraction cancellation — it captures exactly this chaining of rates.
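The chaining of rates can be observed numerically. The sketch below uses sin(x²) with an arbitrary point x0 = 1 and step dx = 1e-5 (both choices are assumptions made for illustration) and compares the product of the two individual rates with the overall rate.

```python
import math

g = lambda x: x ** 2           # inner: u = g(x)
f = math.sin                   # outer: y = f(u)

x0, dx = 1.0, 1e-5
u0, u1 = g(x0), g(x0 + dx)
y0, y1 = f(u0), f(u1)

du_dx = (u1 - u0) / dx         # how fast u responds to x
dy_du = (y1 - y0) / (u1 - u0)  # how fast y responds to u
dy_dx = (y1 - y0) / dx         # how fast y responds to x

# For finite changes the product matches dy_dx up to rounding;
# the Chain Rule is the limit of this identity as dx shrinks to 0.
product = dy_du * du_dx
exact = math.cos(x0 ** 2) * 2 * x0
print(product, dy_dx, exact)
```

The cancellation view breaks down when the change in u is zero, which is one reason it serves as a mnemonic rather than a proof.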
Identifying When to Use It
Before computing any derivative, ask: is the argument of every function just plain x? If the answer is no — if you see sin(3x), or e^(x²), or √(x+1), or (2x−5)⁸ — the Chain Rule is required. A reliable diagnostic: if you were to evaluate the function at x=2 by hand, would you need to compute something inside before applying the outer function? If yes, that "inside computation" is your inner function g(x).
- sin(x²) → outer: sin(·), inner: x². Chain Rule required.
- sin(x) · x² → two separate functions multiplied. Product Rule required (no composition).
- sin(x²) · eˣ → Product Rule on the whole, Chain Rule on sin(x²).
- x³ → argument is plain x. No Chain Rule needed.
The Three-Layer Method
For complex expressions with multiple nested layers, work strictly from outside to inside. Each layer contributes one factor in the final product.
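As a sketch of the outside-to-inside bookkeeping, consider (sin x²)³, where the cube, the sine, and the square each contribute one factor. The helper layered_derivative is a hypothetical name, and the derivative of each layer is supplied by hand.

```python
import math

def layered_derivative(layers, x):
    """Differentiate a nested composition at x.

    layers: list of (function, derivative) pairs ordered from outermost
    to innermost, matching the outside-to-inside order of the method.
    """
    # Forward pass: find the input each layer receives,
    # starting from the innermost layer.
    inputs = [x]
    for func, _ in reversed(layers[1:]):
        inputs.append(func(inputs[-1]))
    inputs.reverse()  # inputs[i] is now the argument of layers[i]

    # Each layer contributes one factor: its derivative at its own input.
    result = 1.0
    for (_, deriv), arg in zip(layers, inputs):
        result *= deriv(arg)
    return result

layers = [
    (lambda u: u ** 3, lambda u: 3 * u ** 2),   # outer: (.)^3
    (math.sin, math.cos),                       # middle: sin(.)
    (lambda x: x ** 2, lambda x: 2 * x),        # inner: x^2
]

x0 = 0.8
value = layered_derivative(layers, x0)
by_hand = 3 * math.sin(x0 ** 2) ** 2 * math.cos(x0 ** 2) * 2 * x0
print(value, by_hand)
```

The list order mirrors the order in which the factors are written down by hand, so adding a fourth layer only means adding one more pair to the list.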
Chain Rule with Implicit Differentiation
The Chain Rule is the mechanism behind implicit differentiation. When you differentiate y² with respect to x, you are applying the Chain Rule: the outer function is (·)², the inner function is y(x). Result: 2y · (dy/dx). This is why every y-term in implicit differentiation picks up a dy/dx factor — it is the Chain Rule, every time.
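As a concrete case, take the circle x² + y² = 25 (an example chosen here for illustration). Differentiating both sides with respect to x gives 2x + 2y·(dy/dx) = 0, so dy/dx = −x/y. The sketch below compares that slope at the point (3, 4) with a difference quotient on the explicit upper branch; the function name y_upper is hypothetical.

```python
import math

def y_upper(x):
    """Upper branch of the circle x^2 + y^2 = 25, solved explicitly for y."""
    return math.sqrt(25 - x ** 2)

x0 = 3.0
y0 = y_upper(x0)               # the point (3, 4) lies on the circle

# Implicit differentiation: 2x + 2y * dy/dx = 0, so dy/dx = -x / y.
slope_implicit = -x0 / y0

# Independent check: symmetric difference quotient on the explicit branch.
h = 1e-6
slope_numeric = (y_upper(x0 + h) - y_upper(x0 - h)) / (2 * h)
print(slope_implicit, slope_numeric)
```

The 2y·(dy/dx) term is the only place the Chain Rule enters, and it is what produces the y in the denominator of the slope.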
Common Chain Rule Mistakes
- Forgetting the inner derivative: d/dx[sin(x²)] = cos(x²) is incomplete. Must multiply by 2x → correct answer: 2x·cos(x²).
- Applying to non-compositions: d/dx[sin(x)·x²] is NOT a Chain Rule problem. It needs the Product Rule.
- Wrong order: Always differentiate from outside inward, not inside outward.
- Forgetting deeper layers: d/dx[(sin x²)³] has three layers, (·)³, then sin(·), then x², and each contributes a factor: 3(sin x²)²·cos(x²)·2x = 6x·sin²(x²)·cos(x²).
Real-World Context
The Chain Rule is the mathematical foundation of backpropagation, the standard algorithm for training neural networks. A network's prediction is a composition of many functions, typically one weighted sum and one activation per layer. Computing how the loss changes with respect to each weight requires applying the Chain Rule recursively through every layer, from the output back to the input. Training a large language model such as the ones behind ChatGPT involves an enormous number of these Chain Rule computations.
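To make the recursion concrete, here is a minimal sketch of backpropagation by hand on a tiny scalar network. The architecture (one tanh unit, one output weight, squared-error loss), the function names, and the numeric values are all assumptions chosen for illustration; real frameworks automate the same bookkeeping over tensors.

```python
import math

def loss(w1, w2, x, target):
    """Tiny scalar network: x -> tanh(w1 * x) -> w2 * hidden -> squared error."""
    hidden = math.tanh(w1 * x)
    prediction = w2 * hidden
    return (prediction - target) ** 2

def gradients(w1, w2, x, target):
    """Backpropagate by hand: one Chain Rule factor per step, output to input."""
    # Forward pass, keeping the intermediate values.
    hidden = math.tanh(w1 * x)
    prediction = w2 * hidden

    # Backward pass.
    d_prediction = 2 * (prediction - target)   # dL/d(prediction)
    d_w2 = d_prediction * hidden               # dL/dw2
    d_hidden = d_prediction * w2               # dL/d(hidden)
    d_pre = d_hidden * (1 - hidden ** 2)       # tanh'(z) = 1 - tanh(z)^2
    d_w1 = d_pre * x                           # dL/dw1
    return d_w1, d_w2

w1, w2, x, target = 0.5, -1.2, 2.0, 0.3
d_w1, d_w2 = gradients(w1, w2, x, target)
print(d_w1, d_w2)
```

Each line of the backward pass reuses the result of the line before it, which is why the cost of computing all gradients stays comparable to the cost of the forward pass.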
In physics, the Chain Rule governs how derivatives transform between coordinate systems. Converting derivatives between Cartesian and polar coordinates, or between laboratory and rotating frames of reference, means applying the Chain Rule to the coordinate transformation.
Practice Problems
- d/dx[cos(5x³)] — identify outer and inner, then apply.
- d/dx[ln(x²+1)] — what is the outer function? What is the inner?
- d/dx[(3x−1)⁷] — pure power with linear inner function.
- d/dx[e^(sin x)] — exponential outer, trig inner.
- d/dx[arctan(2x)] — inverse trig outer, linear inner.
Answers, in order: −15x²·sin(5x³); 2x/(x²+1); 21(3x−1)⁶; cos(x)·e^(sin x); 2/(1+4x²).
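The listed answers can be checked against a numerical difference quotient. This sketch assumes a hypothetical helper central_difference and an arbitrary test point x0 = 0.9; agreement at one point is a sanity check, not a proof.

```python
import math

def central_difference(f, x, h=1e-6):
    """Approximate f'(x) with a symmetric difference quotient."""
    return (f(x + h) - f(x - h)) / (2 * h)

# (label, function, listed answer) for each practice problem.
problems = [
    ("cos(5x^3)", lambda x: math.cos(5 * x ** 3),
     lambda x: -15 * x ** 2 * math.sin(5 * x ** 3)),
    ("ln(x^2+1)", lambda x: math.log(x ** 2 + 1),
     lambda x: 2 * x / (x ** 2 + 1)),
    ("(3x-1)^7", lambda x: (3 * x - 1) ** 7,
     lambda x: 21 * (3 * x - 1) ** 6),
    ("e^(sin x)", lambda x: math.exp(math.sin(x)),
     lambda x: math.cos(x) * math.exp(math.sin(x))),
    ("arctan(2x)", lambda x: math.atan(2 * x),
     lambda x: 2 / (1 + 4 * x ** 2)),
]

x0 = 0.9  # arbitrary test point
max_error = 0.0
for name, f, answer in problems:
    error = abs(central_difference(f, x0) - answer(x0))
    max_error = max(max_error, error)
    print(f"{name:11s} |numeric - answer| = {error:.2e}")
```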