Derivative at a Discontinuity

(alok.github.io)

134 points · yuppiemephisto · 1y ago · 62 comments

43 comments · 11 top-level

dhosek · 1y ago · 7 in thread

One minor nit: A function can be differentiable at a and discontinuous at a even with the standard definition of the derivative. A trivial example would be the function f(x) = (x²-1)/(x-1) which is undefined at x=1, but f'(1)=1 (in fact derivatives have exactly this sort of discontinuity in them which is why they’re defined via limits). In complex analysis, this sort of “hole” in the function is called a removable singularity¹ which is one of three types of singularities that show up in complex functions.

⸻

1. Yes, this is mathematically the reason why black holes are referred to as singularities.

bikenaga · 1y ago

I'm not understanding what you're saying. The standard definition of the derivative of f at c is

f'(c) = lim_{h → 0} (f(c + h) - f(c))/h

The definition would not make sense if f wasn't defined at c (note the "f(c)" in the numerator). For instance, it can't be applied to your f(x) = (x² - 1)/(x - 1) at x = 1, because f(1) is not defined.

And it's a standard result (even stated in Calc 1 classes) that if a function is differentiable at a point, then it's continuous there. For example:

5.2 Theorem. Let f be defined on [a, b]. If f is differentiable at a point x ∈ [a, b], then f is continuous at x.

(Walter Rudin, "Principles of Mathematical Analysis", 3rd edition, p. 104)

Or:

Theorem 2.1 If f is differentiable at x = a, then f is continuous at x = a.

(Robert Smith and Roland Minton, "Calculus -Early Transcendentals", 4th edition, p. 140)

It's true that your f(x) = (x² - 1)/(x - 1) has a removable discontinuity at x = 1, since if we define g(x) = f(x) for x ≠ 1 and g(1) = 2, then g is continuous. Was this what you meant?
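
For reference, the theorem quoted from both textbooks has a short standard proof. A sketch in the notation of the definition above, writing x = c + h:

```latex
\lim_{x \to c} \bigl( f(x) - f(c) \bigr)
  = \lim_{x \to c} \frac{f(x) - f(c)}{x - c} \cdot (x - c)
  = f'(c) \cdot 0
  = 0 .
```

So f(x) → f(c) as x → c, which is continuity at c. The argument uses the value f(c) itself, so it cannot even start at a point where f is undefined.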

terminalbraid · 1y ago

This is correct. You cannot have a discontinuity with any accepted definition of a derivative (and your definition is explicit about this: the value f(c) must exist). Just allowing the limits on both sides to be equal already has a mathematical definition which is that of a functional limit, the function in this case being (f(x) - flim(c))/ (x-c) where flim(c) is the value of a (different) functional limit of f(x): x->c (as f(c) doesn't exist).

And yes, defining a new function with that hole explicitly filled in by a defined value, so that it becomes continuous, is the typical prescription. It does not imply that the derivative exists for the original function, as the other post posits.

dwattttt · 1y ago

https://en.m.wikipedia.org/wiki/Classification_of_discontinu... is responsive and quite accessible. It notes that there doesn't have to be an undefined point for a function to be discontinuous (and that terminology often conflates the two), and matches what I recall of determining that if the limit of the derivative from both sides of the discontinuity exists and is equal, the derivative exists.

smokedetector1 · 1y ago

The standard definition of the derivative at c assumes that f is defined at c.

However, you could also (probably) define the derivative as lim_{h->0} (f(c+h) - f(c-h))/(2h), which doesn't need f(c) to be defined. But that's not standard.
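
A minimal numerical sketch of that symmetric quotient, applied to the f(x) = (x² - 1)/(x - 1) example from upthread. The helper names are hypothetical, and a small finite h stands in for the limit:

```python
def f(x):
    # Undefined at x = 1: evaluating f(1.0) raises ZeroDivisionError.
    return (x * x - 1.0) / (x - 1.0)

def symmetric_quotient(func, c, h):
    # (f(c+h) - f(c-h)) / (2h): never evaluates func at c itself.
    return (func(c + h) - func(c - h)) / (2.0 * h)

def standard_quotient(func, c, h):
    # (f(c+h) - f(c)) / h: needs func(c) to exist.
    return (func(c + h) - func(c)) / h

# Close to 1, because f(x) = x + 1 everywhere except x = 1.
sym = symmetric_quotient(f, 1.0, 1e-4)

try:
    standard_quotient(f, 1.0, 1e-4)
    standard_defined = True
except ZeroDivisionError:
    standard_defined = False
```

The symmetric quotient also assigns the value 0 to |x| at x = 0, where the ordinary derivative does not exist, which is one reason it is treated as a different (weaker) notion.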

Tainnor · 1y ago

> this sort of “hole” in the function is called a removable singularity

It's called "removable" because it can be removed by a continuous extension - the original function itself is still formally discontinuous (of course, one would often "morally" treat these as the same function, but strictly speaking they're not). An important theorem in complex analysis is that any continuous extension at a single point is automatically a holomorphic (= complex differentiable) extension too.

dawnofdusk · 1y ago

I don't think it makes sense to allow derivatives of a function f to have a larger domain than the domain of f.

>which is why they’re defined via limits

They're defined via studying f(x+h) - f(x) with a limit h -> 0. But, your example is taking two limits, h->0 and x->1, simultaneously. This is not the same thing.

vouaobrasil · 1y ago

You are wrong. In order for you to make sense of what you are saying, you first must REDEFINE f(x) to be f(x) = (x^2 - 1)/(x - 1) when x != 1 and define f(1) = 2. Of course, then f will be continuous at x = 1 also.

A function is continuous at x = a if it is differentiable at x = a.

You do understand the concept, but your precision in the definitions is lacking.

ogogmad · 1y ago · 6 in thread

I think you can get a generalisation of autodiff using this idea of "nonstandard real numbers": You just need a computable field with infinitesimals in it. The Levi-Civita field looks especially convenient because it's real-closed. You might be able to get an auto-limit algorithm from it by evaluating a program infinitely close to a limit. I'm not sure if there's a problem with numerical stability when something like division by infinitesimals gets done. Does this have something to do with how Mathematica and other CASes take limits of algebraic expressions?
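
As a much-reduced illustration of that idea, the sketch below uses dual numbers a + b·ε with ε² = 0, the first-order truncation that underlies forward-mode autodiff. It is not the Levi-Civita field, which keeps all powers of ε (including negative ones); the class `Dual` and the helper `derivative` are hypothetical names for this example only.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Dual:
    """a + b*eps with eps**2 == 0 (first-order truncation only)."""
    re: float
    eps: float = 0.0

    def __add__(self, other):
        other = _lift(other)
        return Dual(self.re + other.re, self.eps + other.eps)
    __radd__ = __add__

    def __sub__(self, other):
        other = _lift(other)
        return Dual(self.re - other.re, self.eps - other.eps)

    def __rsub__(self, other):
        return _lift(other) - self

    def __mul__(self, other):
        other = _lift(other)
        # (a + b eps)(c + d eps) = ac + (ad + bc) eps, because eps^2 = 0.
        return Dual(self.re * other.re,
                    self.re * other.eps + self.eps * other.re)
    __rmul__ = __mul__

    def __truediv__(self, other):
        other = _lift(other)
        # Only valid when the real part of the divisor is nonzero.
        return Dual(self.re / other.re,
                    (self.eps * other.re - self.re * other.eps)
                    / (other.re * other.re))

def _lift(x):
    # Promote plain numbers to Dual with zero infinitesimal part.
    return x if isinstance(x, Dual) else Dual(float(x))

def derivative(func, x):
    # Evaluate func at x + eps and read off the eps coefficient.
    return func(Dual(x, 1.0)).eps

slope = derivative(lambda x: x * x * x + 2 * x, 2.0)   # 3*2^2 + 2 = 14
```

Dividing by a pure infinitesimal such as `Dual(0.0, 1.0)` raises `ZeroDivisionError` in this truncation, which is exactly the operation an "auto-limit" would need and where a genuine field with infinitesimals would have to take over.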

-----

Concerning the Dirac delta example: I think this is probably a pleasant way of using a sequence of better and better approximations to the Dirac delta. Terry Tao has some nice blog posts where he shows that a lot of NSA can be translated into sequences, either in a high-powered way using ultrafilters, or in an elementary way using passage to convergent subsequences where necessary.

An interesting question is: What does distribution theory really accomplish? Why is it useful? I have an idea myself but I think it's an interesting question.

srean · 1y ago

> I think this is probably a pleasant way of using a sequence of better and better approximations to the Dirac delta.

That can give wrong answers because the derivative of the limit is not always the limit of the derivatives.

When modeling phenomena with Dirac delta, I think the question becomes do I really need a discontinuity to have a useful model or can I get away with smoothening the discontinuity out.

cjfd · 1y ago

Distribution theory has lots of applications in physics. The charge density of a point particle is the delta function.

Also when Fourier transforming over the whole real line (not just an interval where the function is periodic), one has identities that involve delta functions. E.g. \int dx e^(i * k1 * x) e^(-i * k2 * x) = 2 * pi * delta (k1 - k2).

ogogmad · 1y ago

The article showed that Dirac deltas could be defined WITHOUT distributions. You ignored the article when answering my question.

The question is why distribution theory is a particularly good approach to notions like the Dirac delta.

elcritch · 1y ago

That's fascinating about charge density of a particle being a dirac delta function. Is that a mathematical convenience or something deeper in the theory?

srean · 1y ago

Thanks a bunch for pointing me towards the Levi-Civita field. Where can I learn more? Any pedagogic text?

yuppiemephisto (OP) · 1y ago

See my code at the end. The Wikipedia article is pretty good too. I can send you more if you like.

mturmon · 1y ago · 5 in thread

I really appreciated this piece. Thank you to OP for writing and submitting it.

The thing that piqued my interest was the side remark that the Dirac delta is a “distribution”, and that this is an unfortunate name clash with the same concept in probability (measure theory).

My training (in EE) used both Dirac delta “functions” (in signal processing) and distributions in the sense of measure theory (in estimation theory). Really two separate forks of coursework.

I had always thought that the use of delta functions in convolution integrals (signal processing) was ultimately justified by measure theory — the same machinery as I learned (with some effort) when I took measure theoretic probability.

But, as flagged by the OP, that is not the case! Mind blown.

Some of this is the result of the way these concepts are taught. There is some hand waving both in signal processing, and in estimation theory, when these difficult functions and integrals come up.

I’m not aware of signal processing courses (probably graduate level) in which convolution against delta “functions” uses the distribution concept. There are indeed words to the effect of either,

- Dirac delta is not a function, but think of it as a limit of increasingly-concentrated Gaussians;

- use of Dirac delta is ok, because we don’t need to represent it directly, only the result of an inner product against a smooth function (i.e., a convolution)

But these excuses are not rigorously justified, even at the graduate level, in my experience.

Separately from that, I wonder if OP has ever seen the book Radically Elementary Probability Theory, by Edward Nelson (https://web.math.princeton.edu/~nelson/books/rept.pdf). It uses nonstandard analysis to get around a lot of the (elegant) fussiness of measure theory.

The preface alone is fun to read.

creata · 1y ago

> But these excuses are not rigorously justified, even at the graduate level, in my experience.

Imo, the informal use is already pretty close to the formal definition. Formally, a distribution is defined purely by its inner products against certain smooth functions (usually the ones with compact support) which is what the OP alluded to when he said:

> The formal definition of a generalized function is: an element of the continuous dual space of a space of smooth functions.

That "element of the continuous dual space" is just a function that takes in a smooth function with compact support f, and returns what we take to be the inner product of f with our generalized function.

So (again, imo) "we don’t need to represent it directly, only the result of an inner product against a smooth function" isn't that distant to the formal definition.

mturmon · 1y ago

I hear you, and I admit I'm drawing a fuzzy line (is the conventional approach “rigorous”).

Here are two “test functions”-

- we learned much about impulse responses, and sometimes considered responses to dipoles, etc. However, if I read the Wikipedia article correctly (it’s not great…), the theory implies that a distribution (in the technical sense) has derivatives of any order. I’m not sure I really knew that I could count on that. A rigorous treatment would have given me that assurance.

- if I understand correctly, the concept of introducing an impulse to a system that has an identity impulse response, which implies an inner product of delta with itself, is not well-defined. Again, I’m not sure if we covered that concept. (Admittedly, it’s been a long time.)

dannyz · 1y ago

While the limit of increasingly concentrated Gaussians does result in a Dirac delta, it is not the only way the Dirac delta comes about, and it is probably not the correct way to think about it in the context of signal processing.

When we are doing signal processing the Dirac delta primarily comes about as the Fourier transform of a constant function, and if you work out the math this is roughly equivalent to a sinc function where the oscillations become infinitely fast. This distinction is important because the concentrated Gaussian limit has the function going to 0 as we move away from the origin, but the sinc function never goes to 0, it just oscillates really fast. This becomes a Dirac delta because any integral of a function multiplied by this sinc function has cancelling components from the fast oscillations.

The poor behavior of this limit (primarily numerically) is closely related to the reasons why we have things like the Gibbs phenomenon.

marcosdumay · 1y ago

The Dirac delta is a unitary vector when represented on a vectorial basis it's a component of.

I don't know what kind of justification you expect. There's a Dirac delta sized "hole" on linear algebra, that mathematicians need a name for. It's not like we can just leave it there, unfilled.

yuppiemephisto (OP) · 1y ago

Thanks! And yeah I’m familiar with Nelson

shwouchk · 1y ago · 4 in thread

It is an interesting piece but to claim that no heavy machinery is used is a bit disingenuous at best. You have defined some purely algebraic operation “differentiation”. This operation involves a choice of infinitesimal. Is it trivial to show that the definition is independent of the infinitesimal, especially if we are differentiating at a hyperreal point? I doubt it, and you would likely need to do more complicated set-theoretic limits rather than analytic limits. How do you calculate the integral of this function? Or even define it? Or rather functions, since it’s an infinite family of logistic functions? To even properly define this space you need to go quite heavily into set theory, and I doubt many would find it simpler, even than working with distributions.

bubblyworld · 1y ago

The machinery of mathematics goes arbitrarily deep. I think the interesting thing here is that with relatively little training you can start to compute with these numbers, which is definitely not the case with analysis on distributions.

Or put differently - here you can kinda ignore the deeper formalities and still be productive, whereas with distributions you actually need to sit down and pore over them before you can do anything.

That said, I'm curious why infinitesimals never took off in physics. This kind of quick, shut-up-and-calculate approach seems right up their alley.

shwouchk · 1y ago

> I think the interesting thing here is that with relatively little training you can start to compute with these numbers, which is definitely not the case with analysis on distributions.

I don’t know, this feels like a math “hold my beer” moment. Math is infinitely deep and interconnected, but you have to start somewhere, on solid ground.

I was not being facetious above - the issues that i mentioned above are actual problems when you make calculations. But let’s ignore those issues for a second.

So you found the “derivative” of a single, arbitrarily chosen representative of an infinite family of functions. What if you chose (tanh(Nx)+1)/2? What if you chose Logistic(N^2 x) instead of Logistic(N x)? You’d get different derivatives. In fact any function (up to additive constant) whose integral of the neighborhood of 0 is 1 would work there. What use are the values you are calculating if they reflect your choice and not anything inherent to the problem?
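
A small numerical sketch of that dependence, with a large finite N = 100 standing in for the infinite N (an assumption made only so the numbers can be printed). The helper names are hypothetical:

```python
import math

def logistic(z):
    # Numerically stable form of 1 / (1 + exp(-z)).
    return 0.5 * (1.0 + math.tanh(0.5 * z))

def slope_at_zero(step, h=1e-6):
    # Central-difference estimate of the derivative at x = 0.
    return (step(h) - step(-h)) / (2.0 * h)

N = 100.0   # large finite stand-in for the infinite N

representatives = {
    "logistic(N x)": lambda x: logistic(N * x),
    "logistic(N^2 x)": lambda x: logistic(N * N * x),
    "(tanh(N x) + 1) / 2": lambda x: 0.5 * (math.tanh(N * x) + 1.0),
}

# Slope at 0 depends on the representative: about N/4, N^2/4 and N/2.
slopes = {name: slope_at_zero(s) for name, s in representatives.items()}

# The integral of each derivative over [-1, 1] is step(1) - step(-1),
# which is essentially 1 for all three.
masses = {name: s(1.0) - s(-1.0) for name, s in representatives.items()}
```

The three slopes at 0 differ (roughly 25, 2500 and 50 here), while the total mass of each derivative is the same, which is the quantity that survives the choice of representative.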

As for distributions, I picked up and read a small 100-page penguin “leaflet” from the library during my undergrad that went through the subject rigorously (and with plenty of examples). It’s not that different from working rigorously with probability or real analysis. And at the end, in applications we are indeed usually interested in integrals, not derivatives, which we have not even defined. At the end of the day, you have a [X=weak L^infinity(R)] function (Heaviside). You look at the dual space and, since we established that we don't really need the deep theory, believe me when I tell you that the correct space is the space of test functions on R (X’=infinitely smooth, compact support, bounded integral). Each of those conditions is simple for our simple example of R. The inner product is via the integral.

Formally speaking elements of X are equivalence classes of sequences of functions and are not really defined pointwise, but neither was the NSA example. There we had to choose an arbitrary representative hyperreal function and here we may identify pointwise defined functions with the classes of the constant sequences of those functions.

Using integration by parts, it is simple to show that <F,G’> = -<F’,G> if F is continuously differentiable on G’s support. Let us formally define the weak derivative in this way for functions that are not traditionally differentiable, if an element satisfying all the integral relations exists and is unique. However, note that differentiation is a linear isomorphism on the space of test functions, and so the weak derivative indeed exists and is unique. Furthermore

We can also define elements of X pointwise by identifying F(x) with the limit <Txn,F> as n grows if it exists and is independent of the sequence Txn where Txn is a sequence of functions with support tending to {x} and constant integral 1. It is a simple exercise to show that for “normal” functions this holds, and by above we can pointwise define derivatives this way as well.

What about our H(x)? It is an exercise to check that pointwise we get what we should outside of 0. What about the derivative at 0? Well, do the exercise above with <T0n’,H> and we see that it is penrose undefined. Decidedly not even necessarily infinite, just undefined. However, integration by parts shows that <T,DH>=T(0), i.e. the Dirac delta at 0.
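
The integration-by-parts step can be checked numerically. The sketch below assumes the usual sign convention <DH, T> = -<H, T’>, picks the standard bump function on (-1, 1) as the test function T, and uses a plain midpoint rule; all names are hypothetical.

```python
import math

def bump(x):
    # Smooth test function supported on (-1, 1); bump(0) = exp(-1).
    if abs(x) >= 1.0:
        return 0.0
    return math.exp(-1.0 / (1.0 - x * x))

def bump_prime(x):
    # Derivative of bump, computed by the chain rule.
    if abs(x) >= 1.0:
        return 0.0
    return bump(x) * (-2.0 * x) / (1.0 - x * x) ** 2

def heaviside(x):
    return 1.0 if x > 0.0 else 0.0

def integrate(func, a, b, steps=100_000):
    # Midpoint rule on [a, b].
    h = (b - a) / steps
    return h * sum(func(a + (i + 0.5) * h) for i in range(steps))

# <DH, T> is defined as -<H, T'>; no derivative of H is ever evaluated.
pairing = -integrate(lambda x: heaviside(x) * bump_prime(x), -1.0, 1.0)
expected = bump(0.0)
```

The pairing agrees with T(0) to within the quadrature error, without ever assigning a value to the derivative of H at 0.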

Aside from all the theory that i kinda gave handwavingly much like OP in the post, the mechanics are simple integration by parts to get the only stuff that’s “real” here, which are the integrals. in NSA we haven’t even defined those. How will knowing what infinity i will get at 0 given an arbitrarily chosen representative for H help me?

Do your results depend on ZFC? stronger axioms? At what level of infinity do we stop? You can brush aside the formalities but then what better is this approach than physicists?

Tainnor · 1y ago

Even just defining the hyperreals and showing why statements about them are also valid for the reals needs to go through either ultrafilters (which are some rather abstract objects) or model theory. Of course you can just handwave all of that away but then I guess you can also do that with standard analysis.

yuppiemephisto (OP) · 1y ago

There are theories like SPOT and Internal Set Theory that don’t require filters.

Plus the ancient mathematicians did very well with just their intuition. And more to the point, I cared much more about building (hyper)number sense than some New Math “let’s learn ultrafilters before we’ve even done arithmetic”.

Animats · 1y ago · 3 in thread

Hm. Back when I was working on game physics engines this might have been useful.

In impulse/constraint mechanics, when two objects collide, their momentum changes in zero time. An impulse is an infinite force applied over zero time with finite energy transfer. You have to integrate over that to get the new velocity. This is done as a special case. It is messy for multi-body collisions, and is hard to make work with a friction model. This is why large objects in video games bounce like small ones, changing direction in zero time.

I wonder if nonstandard analysis might help.
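
For readers unfamiliar with the special case being described, here is a minimal one-dimensional sketch of textbook impulse-based collision resolution with a restitution coefficient. It is not taken from any particular engine, and the function name and parameters are hypothetical:

```python
def resolve_collision_1d(m1, v1, m2, v2, restitution=1.0):
    """Return post-collision velocities for two bodies on a line.

    Velocities change instantaneously, which is the 'zero time'
    idealization: the force is never computed, only its time integral.
    """
    relative_velocity = v1 - v2
    # Impulse chosen so the relative velocity is reversed and scaled
    # by the restitution coefficient.
    impulse = -(1.0 + restitution) * relative_velocity / (1.0 / m1 + 1.0 / m2)
    return v1 + impulse / m1, v2 - impulse / m2

v1_after, v2_after = resolve_collision_1d(m1=2.0, v1=3.0, m2=1.0, v2=-1.0)
momentum_before = 2.0 * 3.0 + 1.0 * (-1.0)
momentum_after = 2.0 * v1_after + 1.0 * v2_after
```

Nothing in the formula depends on the size of the bodies, only on their masses, which matches the observation that large objects bounce like small ones under this model.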

ogogmad · 1y ago

The following is just my opinion:

Integration can be done with its own special arithmetic: Interval arithmetic. I base this suggestion on the fact that this is apparently the only way of automatically getting error bounds on integrals. It's cool that it works.
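
A toy sketch of how interval arithmetic yields an enclosure of an integral: evaluate the integrand on each subinterval as an interval, then add up width times range. This assumes exact arithmetic; a rigorous implementation would also round endpoints outward, which is omitted here. The `Interval` class and function names are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Interval:
    lo: float
    hi: float

    def __add__(self, other):
        return Interval(self.lo + other.lo, self.hi + other.hi)

    def __mul__(self, other):
        products = (self.lo * other.lo, self.lo * other.hi,
                    self.hi * other.lo, self.hi * other.hi)
        return Interval(min(products), max(products))

    def scale(self, factor):
        # Multiply by a nonnegative scalar.
        return Interval(self.lo * factor, self.hi * factor)

def integrand(x):
    # f(x) = x*x + x, written so that it also accepts Interval arguments.
    return x * x + x

def enclose_integral(func, a, b, pieces):
    width = (b - a) / pieces
    total = Interval(0.0, 0.0)
    for i in range(pieces):
        cell = Interval(a + i * width, a + (i + 1) * width)
        # func(cell) contains every value of func on the cell.
        total = total + func(cell).scale(width)
    return total

bounds = enclose_integral(integrand, 0.0, 1.0, pieces=1000)
exact = 1.0 / 3.0 + 1.0 / 2.0   # integral of x^2 + x over [0, 1]
```

The exact value lies between `bounds.lo` and `bounds.hi`, and the gap between them shrinks as `pieces` grows, which is the automatic error bound mentioned above.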

NSA does not work with a computable field so it's not directly useful. But at the end of the article, there's a link to some code that uses the Levi-Civita field, which is a "nice" approximation to NSA because it's computable and still real-closed. You might be able to do an "auto-limit" using it, in a kind of generalisation of automatic differentiation. This might for instance turn one numerical algorithm, like Householder QR, into another one, like Gaussian elimination, by taking an appropriate limit.

I don't know if these two things interact well in practice: Levi-Civita for algebraic limits and interval arithmetic for integrals. They might! This might suggest rather provocatively that integration is only clumsily interpreted as a limit of some function. Finally tbh, I'm not sure if this is the best solution to the friction/collision detection problem you're describing.

btilly · 1y ago

Making it work in finite but short time should fix that. A large object generally can deform a larger distance. This makes all collisions inelastic, with large ones being different than small ones.

If you can get realistic billiards breaks, you're on the right track.

lupire · 1y ago

Nonstandard analysis is the mathematical description of your special case. Same thing.

tzs · 1y ago · 2 in thread

Differentiation turns out to be a deeper subject than most people expect even if you just stick to the ordinary real numbers rather than venturing into things like hyperreals.

I once saw in an elementary calculus book a note after the proof of a theorem about differentiation that the converse of the theorem was also true but needed more advanced techniques than were covered in the book.

I checked the advanced calculus and real analysis books I had and they didn't have the proof.

I then did some searching and found mention of a book titled "Differentiation" (or something similar) and found a site that had scans of the first chapter of that book. It proved the theorem on something like page 6 and I couldn't understand it at all. Starting from the beginning, I think I got through maybe a page or two before it got too deep for my mere bachelor's-degree-in-mathematics level of preparation.

I kind of wish I'd bought a copy of that book. I've never since been able to find it. I've found other books with the same or similar title but they weren't it.

perihelions · 1y ago

Do you remember what the theorem was?

tzs · 1y ago

Nope.

plus · 1y ago · 2 in thread

I've personally always thought of the Dirac delta function as being the limit of a Gaussian with variance approaching 0. From this perspective, the Heaviside step function is a limit of the error function. I feel the error function and logistic function approaches should be equivalent, though I haven't worked through the math to show it rigorously.

yuppiemephisto (OP) · 1y ago

All these would be infinitely close in the nonstandard characterization. I just picked logistic because it was easy and step is discontinuous so it shows off the approach’s power. If I started with delta instead I would have done Gaussian and integrated that and ended up with erf.
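
A quick finite-N sanity check of that claim, comparing the logistic and erf versions of the smoothed step at a few fixed nonzero points as N grows. This is only a sketch with finite N, and the function names are hypothetical:

```python
import math

def logistic_step(x, n):
    # Stable form of 1 / (1 + exp(-n x)).
    return 0.5 * (1.0 + math.tanh(0.5 * n * x))

def erf_step(x, n):
    # Integral of a Gaussian with scale 1/n, normalized to rise from 0 to 1.
    return 0.5 * (1.0 + math.erf(n * x))

def max_gap(n, points=(-1.0, -0.1, -0.01, 0.01, 0.1, 1.0)):
    # Largest disagreement between the two steps at fixed nonzero points.
    return max(abs(logistic_step(x, n) - erf_step(x, n)) for x in points)

gaps = [max_gap(n) for n in (10, 100, 1000, 10000)]
```

At any fixed nonzero x the two agree ever more closely as N grows. Their slopes at x = 0 still differ (N/4 for the logistic step, N/√π for the erf step), so the agreement is about values away from 0, not about the height of the spike.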

thrance · 1y ago

It is, in a way. The whole point of distributions is to extend the space of functions to one where more operations are permitted.

The limit of the Gaussian function as variance goes to 0 is not a function, but it is a distribution, the Dirac distribution.

Some distributions appear in intermediate steps while solving differential equations, and then disappear in the final solution. This is analogous to complex numbers sometimes appearing while computing the roots of a cubic function, but not being present in the roots themselves.

thaumasiotes · 1y ago · 2 in thread

> The Number of Pieces an Integral is Cut Into

> You’re probably familiar with the idea that each piece has infinitesimal width, but what about the question of ‘how MANY pieces are there?’. The answer to that is a hypernatural number. Let’s call it N again.

Is that right? I thought there was an important theorem specifying that no matter the infinitesimal width of an integral slice, the total area will be in the neighborhood of (= infinitely close to) the same real number, which is the value of the integral. That's why we don't have to specify the value of dx when integrating over dx... right?

yuppiemephisto (OP) · 1y ago

The number N in question will adjust with dx (up to infinitesimal error anyway). So if dx is halved, N will double. But both retain their character as infinitesimal and hyperfinite.

thaumasiotes · 1y ago

But they don't retain their status as hypernaturals! dx does not need to evenly divide the interval over which the integral is taken. Whenever it doesn't, the number of slices in the integral will fail to be a hypernatural number, because one of the slices will extend beyond the interval boundary.

The theorem tells us that the area of the extended interval that uses a hypernatural number of slices has the same real part as the area of the exact interval. It doesn't tell us that the exact interval contains a hypernatural number of slices.
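
A finite analogue of this exchange, as a sketch: take slice widths dx that do not divide the interval, use a whole number of slices so the last one overshoots the boundary, and watch the error. The function names and the choice of integrand x² on [0, 1] are assumptions for illustration.

```python
import math

def riemann_sum_with_overshoot(func, a, b, dx):
    # Use whole slices of width dx; the last slice may extend past b.
    slices = math.ceil((b - a) / dx)
    total = sum(func(a + i * dx) * dx for i in range(slices))
    overshoot = a + slices * dx - b
    return total, slices, overshoot

def square(x):
    return x * x

exact = 1.0 / 3.0   # integral of x^2 over [0, 1]
widths = (0.3, 0.03, 0.003, 0.0003)
results = [riemann_sum_with_overshoot(square, 0.0, 1.0, dx) for dx in widths]
errors = [abs(total - exact) for total, _, _ in results]
slice_counts = [slices for _, slices, _ in results]
```

The slice count grows roughly in inverse proportion to dx, the overshoot stays below one slice width, and the error shrinks with dx even though no dx here divides the interval evenly.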

agnosticmantis · 1y ago · 1 in thread

Related to the Hyperreal numbers mentioned in the article is the class of Surreal numbers which have many fun properties. There's a nice book describing them authored by Don Knuth.

yuppiemephisto (OP) · 1y ago

The hyperreals and surreals are actually isomorphic under a mild strengthening of the axiom of choice (NBG).

https://mathoverflow.net/questions/91646/surreal-numbers-vs-...

See Ehrlich’s answer.

chii · 1y ago

Wow, it never occurred to me that the step function and the dirac delta are related in this way! but now that i see it, it's obvious!

I've never learnt this level of maths formally, but it's been an interest of mine on and off. And this post explained it very well, and pretty understandably for the laymen.

hoseja · 1y ago

>We’ll use the hyperreal numbers from the unsexily named field of nonstandard analysis

There it is.
