Before defining the derivative of a function, let's begin with two motivating examples.

Example: Driving

Imagine motoring along down highway 61 leaving Minnesota on the way to New Orleans; though lost in listening to music, still mindful of the speedometer and odometer, both prominently placed on the dashboard of the car.

The speedometer reads 60 miles per hour, what is the odometer doing? Besides recording total distance traveled, it is incrementing dutifully every hour by 60 miles. Why? Well, the well-known formula relating distance, time and rate of travel is

\[ ~ \text{distance} = \text{ rate } \times \text{ time.} ~ \]

If the rate is a constant $60$ miles/hour, then in one hour the distance traveled is 60 miles.

Of course, the odometer isn't just incrementing once per hour, it is incrementing once every 1/10th of a mile. How much time does that take? Well, we would need to solve $1/10=60 \cdot t$ which means $t=1/600$ hours, better known as once every 6 seconds.

Using some mathematical notation, would give $x(t) = v\cdot t$, where $x$ is position at time $t$, $v$ is the velocity and $t$ the time traveled in hours. A simple graph of the first three hours of travel would show:

using CalculusWithJulia
using Plots
position(t) = 60 * t
plot(position, 0, 3)
plot(sin, 0, 2pi)

Oh no, we hit traffic. In the next 30 minutes we only traveled 15 miles. We were so busy looking out for traffic, the speedometer was not checked. What would the average speed have been? Though in the 30 minutes, the displayed speed may have varied, the average speed would simply be the change in distance over the change in time, or $\Delta x / \Delta t$. That is


Now suppose that after $6$ hours of travel the GPS in the car gives us a readout of distance traveled as a function of time. The graph looks like this:

We can see with some effort that the slope is steady for the first three hours, is slightly less between $3$ and $3.5$ hours, then is a bit steeper for the next half hour. After that, it is flat for the about half an hour, then the slope continues on with same value as in the first 3 hours. What does that say about our speed during our trip?

Based on the graph, what was the average speed over the first three hours? Well, we traveled 180 miles, and took 3 hours:


What about the next half hour? Squinting shows the amount traveled was 15 miles (195 - 180) and it took 1/2 an hour:


And the half hour after that? The average speed is found from the distance traveled, 37.5 miles, divided by the time, 1/2 hour:

37.5 / (1/2)

Okay, so there was some speeding involved.

The next half hour the car did not move. What was the average speed? Well the change in position was 0, but the time was 1/2 hour, so the average was 0.

Perhaps a graph of the speed is a bit more clear. We can do this based on the above:

function speed(t)
  if     0   < t <= 3
  elseif 3   < t <= 3.5
  elseif 3.5 < t <= 4
  elseif 4   < t <= 4.5
plot(speed, 0, 6)

The jumps, as discussed before, are artifacts of the graphing algorithm. What is interesting, is we could have derived the graph of speed from that of x by just finding the slopes of the line segments, and we could have derived the graph of x from that of speed, just using the simple formula relating distance, rate, and time.

Example: Galileo's ball and ramp experiment

One of history's most famous experiments was performed by Galileo where he rolled balls down inclined ramps, making note of distance traveled with respect to time. As Galileo had no ultra-accurate measuring device, he needed to slow movement down by controlling the angle of the ramp. With this, he could measure units of distance per units of time. (Click through to Galileo and Perspective Dauben.)

Suppose that no matter what the incline was, Galileo observed that in units of the distance traveled in the first second that the distance traveled between subsequent seconds was $3$ times, then $5$ times, then $7$ times, ... This table summarizes.




















A graph of distance versus time could be found by interpolating between the measured points:

ts = [0,1,2,3,4, 5]
xs = [0,1,4,9,16,25]
plot(ts, xs)

The graph looks almost quadratic. What would the following questions have yielded?

(9-0) / (3-0)  # (xs[4] - xs[1]) / (ts[4] - ts[1])
(9-4) / (3-2)  # (xs[4] - xs[3]) / (ts[4] - ts[3])

From the graph, we can tell that the slope of the line connecting $(2,4)$ and $(3,9)$ will be greater than that connecting $(0,0)$ and $(3,9)$. In fact, given the shape of the graph (concave up), the line connecting $(0,0)$ with any point will have a slope less than or equal to any of the line segments.

The average speed between $k$ and $k+1$ for this graph is:

xs[2]-xs[1], xs[3] - xs[2], xs[4] - xs[3], xs[5] - xs[4]
(1, 3, 5, 7)

We see it increments by $2$. The acceleration is the rate of change of speed. We see the rate of change of speed is constant, as the speed increments by 2 each time unit.

Based on this - and given Galileo's insight - it appears the acceleration will be a constant and the position as a function of time will be quadratic.

The slope of the secant line

In the above examples, we see that the average speed is computed using the slope formula. This can be generalized for any univariate function $f(x)$:

The average rate of change between $a$ and $b$ is $(f(b) - f(a)) / (b - a)$. It is typical to express this as $\Delta y/ \Delta x$, where $\Delta$ means "change".

Geometrically, this is the slope of the line connecting the points $(a, f(a))$ and $(b, f(b))$. This line is called a secant line, which is just a line intersecting two specified points on a curve.

Rather than parameterize this problem using $a$ and $b$, we let $c$ and $c+h$ represent the two values for $x$, then the secant-line-slope formula becomes

\[ ~ m = \frac{f(c+h) - f(c)}{h}. ~ \]

The slope of the tangent line

The slope of the secant line represents the average rate of change over a given period, $h$. What if this rate is so variable, that it makes sense to take smaller and smaller periods $h$? In fact, what if $h$ goes to $0$?

The slope of each secant line represents the average rate of change between $c$ and $c+h$. As $h$ goes towards $0$, we recover the slope of the tangent line, which represents the instantatneous rate of change.

The graphic suggests that the slopes of the secant line converge to the slope of a "tangent" line. That is, for a given $c$, this limit exists:

\[ ~ \lim_{h \rightarrow 0} \frac{f(c+h) - f(c)}{h}. ~ \]

We'll define the tangent line at $(c, f(c))$ to be the line through the point with the slope from the limit above - provided that limit exists. Informally, the tangent line is the line through the point that best approximates the function.

The tangent line is the best linear approximation to the function at the point $(c, f(c))$. As the viewing window zooms in on $(c,f(c))$ we can see how the graph and its tangent line get more similar.

The tangent line is not just a line that intersects the graph in one point, nor does it need only intersect the line in just one point.


What is the slope of the tangent line to $f(x) = \sin(x)$ at $c=0$?

We need to compute the limit $(\sin(c+h) - \sin(c))/h$ which is the limit as $h$ goes to $0$ of $\sin(h)/h$. We know this to be 1.

f(x) = sin(x)
c = 0
tl(x) = f(c) + 1 * (x - c)
plot(f, -pi/2, pi/2)
plot!(tl, -pi/2, pi/2)

The derivative

The limit of the slope of the secant line gives an operation: for each $c$ in the domain of $f$ there is a number (the slope of the tangent line) or it does not exist. That is, there is derived function from $f$. Call this function the derivative of $f$. There are many notations for this, here we use the "prime" notation:

\[ ~ f'(x) = \lim_{h \rightarrow 0} \frac{f(x+h) - f(x)}{h}. ~ \]

The limit above is identical, only it uses $x$ instead of $c$ to emphasize that we are thinking of function now, and not just a value at a point.

Some basic derivatives

@vars x h real=true
n = 5
ex = expand((x+h)^n - x^n)
\begin{equation*}h^{5} + 5 h^{4} x + 10 h^{3} x^{2} + 10 h^{2} x^{3} + 5 h x^{4}\end{equation*}

All terms have an h in them, so we cancel it out:

cancel(ex/h, h)
\begin{equation*}h^{4} + 5 h^{3} x + 10 h^{2} x^{2} + 10 h x^{3} + 5 x^{4}\end{equation*}

We see the lone term 5x^4 without an $h$, so as we let $h$ go to $0$, this will be the limit. That is, $f'(x) = 5x^4$.

For general integer-value, positive $n$, the binomial theorem gives an expansion $(x+h)^n = x^n + nx^{n-1}\cdot h^1 + n\cdot(n-1)x^{n-2}\cdot h^2 + \cdots$. Subtracting $x^n$ then dividing by $h$ leaves just the term $nx^{n-1}$ without a power of $h$, so the limit, in general, is just this term. That is $[x^n]' = nx^{n-1}$.

It isn't a special case, but when $n=0$, we also have the above formula applies, as $x^0$ is the constant $1$, and all constant functions will have a derivative of $0$ at all $x$. We will see that in general, the power rule applies for any $n$ where $x^n$ is defined.

We need to consider the difference $\sin(x+h) - \sin(x)$:

ex = sympy.expand_trig(sin(x+h) - sin(x))  # expand_trig is not exposed in `SymPy`
\begin{equation*}\sin{\left(h \right)} \cos{\left(x \right)} + \sin{\left(x \right)} \cos{\left(h \right)} - \sin{\left(x \right)}\end{equation*}

That used the formula $\sin(x+h) = \sin(x)\cos(h) + \sin(h)\cos(x)$.

We could then rearrange the secant line slope formula to become:

\[ ~ \cos(x) \cdot \frac{\sin(h)}{h} + \sin(x) \cdot \frac{\cos(h) - 1}{h} ~ \]

and take a limit. If the answer isn't clear, we can let SymPy do this work:

limit((sin(x+h) - sin(x))/ h, h=>0)
\begin{equation*}\cos{\left(x \right)}\end{equation*}

\[ ~ \frac{\log(x+h) - \log(x)}{h} = \frac{1}{h}\log(\frac{x+h}{x}) = \log((1+h/x)^{1/h}). ~ \]

Earlier we saw that the limit as $u$ goes to $0$ of $f(u) = (1 + u)^{1/u}$ is $e$. Re-expressing the above we can get $1/x \cdot \log(f(h/x))$. The limit as $h$ goes to $0$ of this is found from the composition rules for limits: as $\lim_{h \rightarrow 0} f(h/x) = e$, and since $\log(e)$ is $1$ we get this expression has a limit of $1/x$.

We verify through:

limit((log(x+h) - log(x))/h, h=>0)

Rules of derivatives

We could proceed in a similar manner to find other derivatives, but let's not. If we have a function $f(x) = x^5 \sin(x)$, it would be nice to leverage our previous work on the derivatives of $f(x) =x^5$ and $g(x) = \sin(x)$, rather than derive an answer from scratch.

As with limits and continuity, it proves very useful to consider rules that make the process of finding derivatives of combinations of functions a matter of combining derivatives of the individual functions in some manner.

Sum rule

Let's consider $k(x) = a\cdot f(x) + b\cdot g(x)$, what is its derivative? That is, in terms of $f$, $g$ and their derivatives, can we express $k'(x)$?

We can rearrange $(k(x+h) - k(x))$ as follows:

\[ ~ (a\cdot f(x+h) + b\cdot g(x+h)) - (a\cdot f(x) + b \cdot g(x)) = a\cdot (f(x+h) - f(x)) + b \cdot (g(x+h) - g(x)). ~ \]

Dividing by $h$, we see that this becomes

\[ ~ a\cdot \frac{f(x+h) - f(x)}{h} + b \cdot \frac{g(x+h) - g(x)}{h} \rightarrow a\cdot f'(x) + b\cdot g'(x). ~ \]

This holds two rules: the derivative of a constant times a function is the constant times the derivative of the function; and the derivative of a sum of functions is the sum of the derivative of the functions.

Product rule

Other rules can be similarly derived. SymPy can give us them as well. Here we define to symbolic functions u and v and let SymPy derive a formula for the derivative of a product of functions:

u,v = SymFunction("u,v") # make symbolic functions
f(x) = u(x) * v(x)
limit((f(x+h) - f(x))/h, h=>0)
\begin{equation*}u{\left(x \right)} \left. \frac{d}{d \xi_{1}} v{\left(\xi_{1} \right)} \right|_{\substack{ \xi_{1}=x }} + v{\left(x \right)} \left. \frac{d}{d \xi_{1}} u{\left(\xi_{1} \right)} \right|_{\substack{ \xi_{1}=x }}\end{equation*}

The output uses some new notation to represent that the derivative of $u(x) \cdot v(x)$ is the $u$ times the derivative of $v$ plus $v$ times the derivative of $u$. A common shorthand is $[uv]' = u'v + uv'$.

Quotient rule

The derivative of $f(x) = u(x)/v(x)$ - a ratio of functions - can be similarly computed. The result will be $[u/v]' = (u'v - uv')/u^2$.

Chain rule

Finally, the derivative of a composition of functions can be computed. This gives a rule called the chain rule. Before deriving, let's give a slight motivation.

Consider the output of a factory for some widget. It depends on two steps: an initial manufacturing step and a finishing step. The number of employees is important in how much is initially manufactured. Suppose $x$ is the number of employees and $g(x)$ is the amount initially manufactured. Adding more employees increases the amount made by the made-up rule $g(x) = \sqrt{x}$. The finishing step depends on how much is made by the employees. If $y$ is the amount made, then $f(y)$ is the number of widgets finished. Suppose for some reason that $f(y) = y^2.$

How many widgets are made as a function of employees? The composition $u(x) = f(g(x))$ would provide that.

What is the effect of adding employees on the rate of output of widgets? In this specific case we know the answer, as $(f \circ g)(x) = x$, so the answer is just the rate is $1$.

In general, we want to express $\Delta f / \Delta x$ in a form so that we can take a limit.

But what do we know? We know $\Delta g / \Delta x$ and $\Delta f/\Delta y$. Using $y=g(x)$, this suggests that we might have luck with the right side of this equation:

\[ ~ \frac{\Delta f}{\Delta x} = \frac{\Delta f}{\Delta y} \cdot \frac{\Delta y}{\Delta x}. ~ \]

Interpreting this, we get the average rate of change in the composition can be thought of as a product: The average rate of change of the initial step ($\Delta y/ \Delta x$) times the average rate of the change of the second step evaluated not at $x$, but at $y$, $\Delta f/ \Delta y$.

Re-expressing using derivative notation with $h$ would be:

\[ ~ \frac{f(g(x+h)) - f(g(x))}{h} = \frac{f(g(x+h)) - f(g(x))}{g(x+h) - g(x)} \cdot \frac{g(x+h) - g(x)}{h}. ~ \]

The left hand side will converge to the derivative of $u(x)$ or $[f(g(x))]'$.

The right most part of the right side would have a limit $g'(x)$, were we to let $h$ go to $0$.

It isn't obvious, but the left part of the right side has the limit $f'(g(x))$. This would be clear if only $g(x+h) = g(x) + h$, for then the expression would be exactly the limit expression with $c=g(x)$. But, alas, except to some hopeful students and some special cases, it is definitely not the case in general that $g(x+h) = g(x) + h$ - that right parentheses actually means something. However, it is nearly the case that $g(x+h) = g(x) + kh$ for some $k$ and this can be used to formulate a proof (one of the two detailed here and here).

We can verify this using SymPy:

limit(( u(v(x+h)) - u(v(x)) ) / (v(x+h) - v(x)), h=>0)
\begin{equation*}\frac{d}{d v{\left(x \right)}} u{\left(v{\left(x \right)} \right)}\end{equation*}

Combined, we would end up with:

The chain rule: $[f(g(x))]' = f'(g(x)) \cdot g'(x)$. That is the derivative of the outer function evaluated at the inner function times the derivative of the inner function.

To see that this works in our specific case, we assume the general power rule that $[x^n]' = n x^{n-1}$ to get: $g'(x) = (1/2)x^{-1/2}$, $f'(x)=2x$, and $f'(g(x)) = 2(\sqrt{x})$. Together, the product is:

\[ ~ 2\sqrt{x} \cdot (1/2) 1/\sqrt{x} = 1 ~ \]

Which is the same as the derivative of $x$ found by first evaluating the composition.

Proof of the Chain Rule

A function is differentiable at $a$ if the following limit exists $\lim_{h \rightarrow 0}(f(a+h)-f(a))/h$. Reexpressing this as: $f(a+h) - f(a) - f'(a)h = \epsilon_f(h) h$ where as $h\rightarrow 0$, $\epsilon_f(h) \rightarrow 0$. Then, we have:

\[ ~ g(a+h) = g(a) + g'(a)h + \epsilon_g(h) h = g(a) + h', ~ \]

Where $h' = (g'(a) + \epsilon_g(h))h \rightarrow 0$ as $h \rightarrow 0$ will be used to simplify the following:

\[ ~ \begin{align} f(g(a+h)) - f(g(a)) &= f(g(a) + g'(a)h + \epsilon_g(h)h) - f(g(a)) \\ &= f(g(a)) + f'(g(a)) (g'(a)h + \epsilon_g(h)h) + \epsilon_f(h')(h') - f(g(a))\\ &= f'(g(a)) g'(a)h + f'(g(a))(\epsilon_g(h)h) + \epsilon_f(h')(h'). \end{align} ~ \]


\[ ~ f(g(a+h)) - f(g(a)) - f'(g(a)) g'(a) h = f'(g(a))\epsilon_g(h))h + \epsilon_f(h')(h') = (f'(g(a)) \epsilon_g(h) + \epsilon_f(h')( (g'(a) + \epsilon_g(h))))h = \epsilon(h)h, ~ \]

where $\epsilon(h)$ combines the above terms which go to zero as $h\rightarrow 0$ into one. This is the alternative definition of the derivative, showing $(f\circ g)'(a) = f'(g(a)) g'(a)$ when $g$ is differentiable at $a$ and $f$ is differentiable at $g(a)$.

More examples

This is a product of functions, using $[u\cdot v]' = u'v + uv'$ we get:

\[ ~ 5x^4 \cdot \sin(x) + x^5 \cdot \cos(x) ~ \]

This is a quotient of functions. Using $[u/v]' = (u'v - uv')/v^2$ we get

\[ ~ (5x^4 \cdot \sin(x) - x^5 \cdot \cos(x)) / (\sin(x))^2. ~ \]

\[ ~ \cos(x^5) \cdot 5x^4. ~ \]

\[ ~ 5(\sin(x))^4 \cdot \cos(x) ~ \]

We can verify these with SymPy. Rather than take a limit, we will use SymPy's diff function to compute derivatives.

diff(x^5 * sin(x))
\begin{equation*}x^{5} \cos{\left(x \right)} + 5 x^{4} \sin{\left(x \right)}\end{equation*}
\begin{equation*}- \frac{x^{5} \cos{\left(x \right)}}{\sin^{2}{\left(x \right)}} + \frac{5 x^{4}}{\sin{\left(x \right)}}\end{equation*}
\begin{equation*}5 x^{4} \cos{\left(x^{5} \right)}\end{equation*}

and finally,

\begin{equation*}5 \sin^{4}{\left(x \right)} \cos{\left(x \right)}\end{equation*}

\[ ~ \frac{e^{x+h} - e^x}{h} = \frac{e^x \cdot(e^h -1)}{h}. ~ \]

If we know that $\lim_{h \rightarrow 0}(e^h - 1)/h = 1$, we get $[e^x]' = e^x$, that is it is a function satisfying $f'=f$.

\[ ~ 1 = (\frac{1}{e^x}) \cdot [e^x]'. ~ \]

Or solving, $[e^x]' = e^x$. This is a general strategy to find the derivative of an inverse function.

\[ ~ [x^n]' = [e^{n\log(x)}]' = e^{n\log(x)} \cdot (n \frac{1}{x}) = n x^n \cdot \frac{1}{x} = n x^{n-1}. ~ \]


The power rule expresses $f=u\cdot v$. With $u(x)=x^3$ and $v(x)=(1-x)^2$ we get:

\[ ~ u'(x) = 3x^2, \quad v'(x) = 2 \cdot (1-x)^1 \cdot (-1), ~ \]

the last by the chain rule. Combining with $u' v + u v'$ we get: $f'(x) = (3x^2)\cdot (1-x)^2 + x^3 \cdot (-2) \cdot (1-x)$.

Otherwise, the polynomial can be expanded to give $f(x)=x^5-2x^4+x^3$ which has derivative $f'(x) = 5x^4 - 8x^3 + 3x^2$.

We could expand this, as above, but instead will use the product rule. Let's work symbolically first and treat the case of $3$ products:

\[ ~ [u\cdot v\cdot w]' =[ u \cdot (vw)]' = u' (vw) + u [vw]' = u'(vw) + u[v' w + v w'] = u' vw + u v' w + uvw'. ~ \]

This pattern generalizes, clearly to $~ [f_1\cdot f_2 \cdots f_n]' = f_1' f_2 \cdots f_n + f_1 \cdot f_2' \cdot (f_3 \cdots f_n) + \dots + f_1 \cdots f_{n-1} \cdot f_n'. ~$

There are $n$ terms, each where one of the $f_i$s have a derivative. Were we to multiply top and bottom by $f_i$, we would get each term looks like: $f \cdot f_i'/f_i$.

With this, we can proceed. Each term $x-i$ has derivative $1$, so the answer to $f'(x)$, with $f$ as above, is $f'(x) = f(x)/(x-1) + f(x)/(x-2) + f(x)/(x-3) + f(x)/(x-4) + f(x)/(x-5)$.

Using the product rule and then the chain rule, we have:

\[ ~ \begin{align} f'(x) &= [x \cdot e^{-x^2}]'\\ &= [x]' \cdot e^{-x^2} + x \cdot [e^{-x^2}]'\\ &= 1 \cdot e^{-x^2} + x \cdot (e^{-x^2}) \cdot [-x^2]'\\ &= e^{-x^2} + x \cdot e^{-x^2} \cdot (-2x)\\ &= e^{-x^2} (1 - 2x^2). \end{align} ~ \]

Using the product rule and then the chain rule, we have:

\[ ~ \begin{align} f'(x) &= [e^{-ax} \cdot \sin(x)]'\\ &= [e^{-ax}]' \cdot \sin(x) + e^{-ax} \cdot [\sin(x)]'\\ &= e^{-ax} \cdot [-ax]' \cdot \sin(x) + e^{-ax} \cdot \cos(x)\\ &= e^{-ax} \cdot (-a) \cdot \sin(x) + e^{-ax} \cos(x)\\ &= e^{-ax}(\cos(x) - a\sin(x)). \end{align} ~ \]

Rules of derivatives and some sample functions

This table summarizes the rules of derivatives that allow derivatives of more complicated expressions to be computed with the derivatives of their pieces.


Power rule

$[x^n]' = n\cdot x^{n-1}$


$[cf(x)]' = c \cdot f'(x)$


$[f(x) \pm g(x)]' = f'(x) \pm g'(x)$


$[f(x) \cdot g(x)]' = f'(x)\cdot g(x) + f(x) \cdot g'(x)$


$[f(x)/g(x)]' = (f'(x) \cdot g(x) - f(x) \cdot g'(x)) / g(x)^2$


$[f(g(x))]' = f'(g(x)) \cdot g'(x)$

This table gives some useful derivatives:

$x^n \text{ all } n$

Higher-order derivatives

The derivative of a function is an operator, it takes a function and returns a new, derived, function. We could repeat this operation. The result is called a higher-order derivative. The Lagrange notation uses additional "primes" to indicate how many. So $f''(x)$ is the second derivative and $f'''(x)$ the third. For even higher orders, sometimes the notation is $f^{(n)}(x)$ to indicate an $n$th derivative.


Find the second derivative of $e^{-x^2}$.

We need the chain rule and the product rule:

\[ ~ [e^{-x^2}]'' = [e^{-x^2} \cdot (-2x)]' = (e^{-x^2} \cdot (-2x) \cdot(-2x) + e^{-x^2} \cdot (-2)) = e^{-x^2}(4x^2 - 2). ~ \]

This can be verified:

diff(diff(e^(-x^2))) |> simplify
\begin{equation*}2 \left(2 x^{2} - 1\right) e^{- x^{2}}\end{equation*}

Having to iterate the use of diff is cumbersome. An alternate notation is either specifying the variable twice: diff(ex, x, x) or using a number after the variable: diff(ex, x, 2):

diff(e^(-x^2), x, x) |> simplify
\begin{equation*}2 \left(2 x^{2} - 1\right) e^{- x^{2}}\end{equation*}
Example: Details on symbolic derivatives

The ability to breakdown an expression into operations and their arguments is necessary when trying to apply the differentiation rules. Such rules are applied from the outside in. Identifying the proper "outside" function is usually most of the battle when finding derivatives.

The SymPy program has a function that can identify the outside function of a symbolic expression. Basic math expressions can be parsed as a function being called with one or more arguments. The function SymPy.Introspection.funcname will return the function, and SymPy.Introspection.args will return the arguments.

To illustrate, we define some variables and look at the operations:

args, funcname = SymPy.Introspection.args, SymPy.Introspection.funcname # args, funcname are not exported
@vars x y z
(x, y, z)

This shows how addition is identified:

ex = x + y + z

The arguments are all three variables being added:

(x, y, z)

Similarly, with expressions of a single variable

ex = x^2 + sin(x) + sqrt(x^2 + 2)

and now the arguments are the expressions being added:

(x^2, sqrt(x^2 + 2), sin(x))

When addition is the "outside" function, the next step would be to apply the sum rule.

What about multiplication?

ex = x * y * z


(x, y, z)

Like addition, multiplication is not just a binary operation, as it is typically viewed mathematically. The output of args can be 2 or more expressions.

The power rule is an easy to use derivative rule. SymPy recognizes when the "outer" function is a power:

ex = x^4

The power rule would be used if the variable is not in the exponent, of course, as here.

Exploring the quotient rule offers a surprise:

ex = x/y

The surprise? We may have expected division, but see multiplication. SymPy uses a different form to represent division, which can be gleaned from looking at the resulting arguments:

(x, 1/y)

And the expression 1/y is really a power:

ex = 1/y

So the quotient rule would be implemented here as a combination of the product rule and the power rule.

To differentiate, we need to recursively apply derivative rules, peeling back one level at a time. When would you stop? When a number or a free symbol is encountered, as with:




In each of these cases, args returns an empty container:

args(x), args(Sym(3))
((), ())

We can put all this together to create a function to differentiate an expression. This function will be called diffex (to distinguish itself from the more powerful, built-in, diff function). The first task for diffex is to identify the wrapping, outside function, if any, and apply the corresponding rule to the arguments. Julia's Val types are used in the following to specialize based on the type of "outside" function. This is technical, but is exactly what is needed as different types of functions need different rules of differentiation.

function diffex(ex)
    n = length(free_symbols(ex))

    n == 0 && return Sym(0)
    n > 1 && error("This is a simple example, use diff")

    _diff(Val{Symbol(funcname(ex))}, args(ex)...)
diffex (generic function with 1 method)

This function identifies the number of symbols in the expression to differentiate. If none, $0$ is returned; if more than one, an error is thrown, and if one, the outside function is identified.

For sums, we want the sum rule. This is the sum of the individual derivatives, simply implemented as:

_diff(::Type{Val{:Add}}, xs...) = sum(diffex(ex) for ex in xs)
_diff (generic function with 1 method)

The product rule is a bit more advanced. We use a formula derived above: if $f = f_1 \cdot f_2 \cdots f_n$ then $f' = \sum f'_i/f_i \cdot f$:

function _diff(::Type{Val{:Mul}}, fs...)
    f = prod(fs)
    sum(diffex(fi)/fi * f for fi in fs)
_diff (generic function with 2 methods)

The power rule has three cases that could be considered depending on where the variable exists (in the base, exponent, or both). The following avoids this by reexpressing $a^b = \exp(b \ln(a))$ and then applying the chain rule:

_diff(::Type{Val{:Pow}}, a, b) = sympy.powsimp(a^b * diffex(b * log(a)))
_diff (generic function with 3 methods)

The powsimp function is used to encourage SymPy to simplify $a^b/b$ into $a^{b-1}$, which may not otherwise occur. (It is not exposed, so must be qualified by the underlying sympy Python object.)

Now to catch some special cases. First, when we encounter the symbol, we want to return 1, as $d/dx[x] = 1$:

_diff(::Type{Val{:Symbol}}) = Sym(1)
_diff (generic function with 4 methods)

For numbers the derivative is $0$. As there are quite a few number types, we define a catch all distinguishing them by the fact that the args call is empty.

_diff(::Any) = Sym(0)
_diff (generic function with 5 methods)

Now for some specific functions. Here is where the chain rule comes in. For example, the derivative when the log function is encountered has an extra product of diffex(ex):

_diff(::Type{Val{:log}}, ex) =  1/ex * diffex(ex)
_diff (generic function with 6 methods)

Similarly, here is the rule for exp:

_diff(::Type{Val{:exp}}, ex) = exp(ex) * diffex(ex)
_diff (generic function with 7 methods)

Finally, we define values for some trigonometric functions. Many more special cases, as these, would be needed to fully implement this approach:

_diff(::Type{Val{:sin}}, ex) =  cos(ex)   * diffex(ex)
_diff(::Type{Val{:cos}}, ex) = -sin(ex)   * diffex(ex)
_diff(::Type{Val{:tan}}, ex) =  sec(ex)^2 * diffex(ex)
_diff (generic function with 10 methods)

(The chain rule uses fall into a pattern that could be exploited to make the code only depend on a mapping of terms like :sin -> cos.)

Here we try it out:

diffex(x^5 + x^4 + x^3 + x^2 + x + 1)
\begin{equation*}5 x^{4} + 4 x^{3} + 3 x^{2} + 2 x + 1\end{equation*}
diffex(sin(x^2) + cos(1/x))
\begin{equation*}2 x \cos{\left(x^{2} \right)} + \frac{\sin{\left(\frac{1}{x} \right)}}{x^{2}}\end{equation*}


diffex(x^x * exp(x))
\begin{equation*}x^{x} \left(\log{\left(x \right)} + 1\right) e^{x} + x^{x} e^{x}\end{equation*}



The derivative at $c$ is the slope of the tangent line at $x=c$. Answer the following based on this graph:

fn = x -> -x*exp(x)*sin(pi*x)
plot(fn, 0, 2)

At which of these points $c= 1/2, 1, 3/2$ is the derivative negative?

Which value looks bigger from reading the graph:

At $0.708 \dots$ and $1.65\dots$ the derivative has a common value. What is it?


Consider the graph of the airyai function (from SpecialFunctions) over $[-5, 5]$.

At $x = -2.5$ the derivative is postive or negative?

At $x=0$ the derivative is postive or negative?

At $x = 2.5$ the derivative is postive or negative?


Compute the derivative of $e^x$ using limit. What do you get?


Compute the derivative of $x^e$ using limit. What do you get?


Compute the derivative of $e^{e\cdot x}$ using limit. What do you get?


In the derivation of the derivative of $\sin(x)$, the following limit is needed:

\[ ~ L = \lim_{h \rightarrow 0} \frac{\cos(h) - 1}{h}. ~ \]

This is


Let $f(x) = (e^x + e^{-x})/2$ and $g(x) = (e^x - e^{-x})/2$. Which is true?


Let $f(x) = (e^x + e^{-x})/2$ and $g(x) = (e^x - e^{-x})/2$. Which is true?


Consider the function $f$ and its transformation $g(x) = a + f(x)$ (shift up by $a$). Do $f$ and $g$ have the same derivative?

Consider the function $f$ and its transformation $g(x) = f(x - a)$ (shift right by $a$). Do $f$ and $g$ have the same derivative?

Consider the function $f$ and its transformation $g(x) = f(x - a)$ (shift right by $a$). Is $f'$ at $x$ equal to $g'$ at $x-a$?

Consider the function $f$ and its transformation $g(x) = c f(x)$, $c > 1$. Do $f$ and $g$ have the same derivative?

Consider the function $f$ and its transformation $g(x) = f(x/c)$, $c > 1$. Do $f$ and $g$ have the same derivative?

Which of the following is true?


The rate of change of volume with respect to height is $3h$. The rate of change of height with respect to time is $2t$. At at $t=3$ the height is $h=14$ what is the rate of change of volume with respect to time when $t=3$?


Which equation below is $f(x) = \sin(k\cdot x)$ a solution of ($k > 1$)?


Let $f(x) = e^{k\cdot x}$, $k > 1$. Which equation below is $f(x)$ a solution of?