⸻
1. Yes, this is mathematically the reason why black holes are referred to as singularities.
f'(c) = lim_{h → 0} (f(c + h) - f(c))/h
The definition would not make sense if f wasn't defined at c (note the "f(c)" in the numerator). For instance, it can't be applied to your f(x) = (x² - 1)/(x - 1) at x = 1, because f(1) is not defined.
And it's a standard result (even stated in Calc 1 classes) that if a function is differentiable at a point, then it's continuous there. For example:
5.2 Theorem. Let f be defined on [a, b]. If f is differentiable at a point x ∈ [a, b], then f is continuous at x.
(Walter Rudin, "Principles of Mathematical Analysis", 3rd edition, p. 104)
Or:
Theorem 2.1 If f is differentiable at x = a, then f is continuous at x = a.
(Robert Smith and Roland Minton, "Calculus - Early Transcendentals", 4th edition, p. 140)
It's true that your f(x) = (x² - 1)/(x - 1) has a removable discontinuity at x = 1, since if we define g(x) = f(x) for x ≠ 1 and g(1) = 2, then g is continuous. Was this what you meant?
And yes, defining a new function with that hole explicitly filled in, using the value that makes it continuous, is the typical prescription. It does not imply that the derivative exists for the original function, as the other post posits.
However, you could also (probably) define the derivative as lim_{h->0} (f(c+h) - f(c-h))/(2h), which does not need f(c) to be defined. But that's not standard.
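To make the symmetric-quotient idea concrete, here is a minimal Python sketch (the helper name symmetric_quotient and the step sizes are my own choices for illustration, not anything from the thread). It shows the quotient settling near 1 for f(x) = (x² - 1)/(x - 1) at x = 1 even though f(1) itself cannot be evaluated, and it also shows one reason the definition is not standard: it returns 0 for |x| at 0, where the ordinary derivative does not exist.

```python
def f(x):
    # The function from the thread; evaluating it at x = 1 divides by zero.
    return (x**2 - 1) / (x - 1)

def symmetric_quotient(func, c, h):
    # (func(c + h) - func(c - h)) / (2h): never evaluates func at c itself.
    return (func(c + h) - func(c - h)) / (2 * h)

# The quotient stays close to 1 as h shrinks, although f(1) is undefined.
for h in (1e-1, 1e-2, 1e-3):
    print(h, symmetric_quotient(f, 1.0, h))

# Caveat: for |x| at 0 the symmetric quotient is exactly 0,
# even though |x| has no ordinary derivative there.
print(symmetric_quotient(abs, 0.0, 1e-3))
```

So the symmetric version can assign a "derivative" at points where the usual definition (correctly) refuses to.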
It's called "removable" because it can be removed by a continuous extension - the original function itself is still formally discontinuous (of course, one would often "morally" treat these as the same function, but strictly speaking they're not). An important theorem in complex analysis is that any continuous extension at a single point is automatically a holomorphic (= complex differentiable) extension too.
>which is why they’re defined via limits
They're defined via studying f(x+h) - f(x) with a limit h -> 0. But, your example is taking two limits, h->0 and x->1, simultaneously. This is not the same thing.
A function is continuous at x = a if it is differentiable at x = a.
You do understand the concept, but your precision in the definitions is lacking.
-----
Concerning the Dirac delta example: I think this is probably a pleasant way of using a sequence of better and better approximations to the Dirac delta. Terry Tao has some nice blog posts where he shows that a lot of NSA can be translated into sequences, either in a high-powered way using ultrafilters, or in an elementary way using passage to convergent subsequences where necessary.
An interesting question is: What does distribution theory really accomplish? Why is it useful? I have an idea myself but I think it's an interesting question.
That can give wrong answers because derivative of the limit is not always the limit of the derivative.
When modeling phenomena with Dirac delta, I think the question becomes do I really need a discontinuity to have a useful model or can I get away with smoothening the discontinuity out.
Also when Fourier transforming over the whole real line (not just an interval where the function is periodic), one has identities that involve delta functions. E.g. \int dx e^(i * k1 * x) e^(-i * k2 * x) = 2 * pi * delta (k1 - k2).
The question is why distribution theory is a particularly good approach to notions like the Dirac delta.
The thing that piqued my interest was the side remark that the Dirac delta is a “distribution”, and that this is an unfortunate name clash with the same concept in probability (measure theory).
My training (in EE) used both Dirac delta “functions” (in signal processing) and distributions in the sense of measure theory (in estimation theory). Really two separate forks of coursework.
I had always thought that the use of delta functions in convolution integrals (signal processing) was ultimately justified by measure theory — the same machinery as I learned (with some effort) when I took measure theoretic probability.
But, as flagged by the OP, that is not the case! Mind blown.
Some of this is the result of the way these concepts are taught. There is some hand waving both in signal processing, and in estimation theory, when these difficult functions and integrals come up.
I’m not aware of signal processing courses (probably graduate level) in which convolution against delta “functions” uses the distribution concept. There are indeed words to the effect of either,
- Dirac delta is not a function, but think of it as a limit of increasingly-concentrated Gaussians;
- use of Dirac delta is ok, because we don’t need to represent it directly, only the result of an inner product against a smooth function (i.e., a convolution)
But these excuses are not rigorously justified, even at the graduate level, in my experience.
*
Separately from that, I wonder if OP has ever seen the book Radically Elementary Probability Theory, by Edward Nelson (https://web.math.princeton.edu/~nelson/books/rept.pdf). It uses nonstandard analysis to get around a lot of the (elegant) fussiness of measure theory.
The preface alone is fun to read.
Imo, the informal use is already pretty close to the formal definition. Formally, a distribution is defined purely by its inner products against certain smooth functions (usually the ones with compact support) which is what the OP alluded to when he said:
> The formal definition of a generalized function is: an element of the continuous dual space of a space of smooth functions.
That "element of the continuous dual space" is just a function that takes in a smooth function with compact support f, and returns what we take to be the inner product of f with our generalized function.
So (again, imo) "we don’t need to represent it directly, only the result of an inner product against a smooth function" isn't that distant to the formal definition.
Here are two “test functions”-
- we learned much about impulse responses, and sometimes considered responses to dipoles, etc. However, if I read the Wikipedia article correctly (it’s not great…), the theory implies that a distribution (in the technical sense) has derivatives of any order. I’m not sure I really knew that I could count on that. A rigorous treatment would have given me that assurance.
- if I understand correctly, the concept of introducing an impulse to a system that has an identity impulse response, which implies an inner product of delta with itself, is not well-defined. Again, I’m not sure if we covered that concept. (Admittedly, it’s been a long time.)
When we are doing signal processing the Dirac delta primarily comes about as the Fourier transform of a constant function, and if you work out the math this is roughly equivalent to a sinc function where the oscillations become infinitely fast. This distinction is important because the concentrated Gaussian limit has the function going to 0 as we move away from the origin, but the sinc function never goes to 0, it just oscillates really fast. This becomes a Dirac delta because any integral of a function multiplied by this sinc function has cancelling components from the fast oscillations.
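As a rough numerical illustration of that cancellation, here is a sketch that pairs a sinc-type kernel sin(Nx)/(πx) and a concentrated Gaussian with the same smooth function and checks that both pairings approach its value at 0. Everything here is an assumption made for the example: the smooth function exp(-x²), the finite N = 20 standing in for the limit, the integration window [-8, 8], and the hand-rolled trapezoid rule.

```python
import math

def trapezoid(func, a, b, n):
    # Plain composite trapezoid rule with n equal steps.
    h = (b - a) / n
    total = 0.5 * (func(a) + func(b))
    for i in range(1, n):
        total += func(a + i * h)
    return total * h

def sinc_kernel(x, N):
    # sin(N x) / (pi x), with its limiting value N / pi at x = 0.
    if x == 0.0:
        return N / math.pi
    return math.sin(N * x) / (math.pi * x)

def gaussian_kernel(x, N):
    # Unit-integral Gaussian with standard deviation 1 / N.
    return N / math.sqrt(2 * math.pi) * math.exp(-0.5 * (N * x) ** 2)

def smooth(x):
    # Smooth, rapidly decaying function with smooth(0) = 1.
    return math.exp(-x * x)

def pairing(kernel, N):
    # Approximates the integral of kernel(x) * smooth(x) over the real line.
    return trapezoid(lambda x: kernel(x, N) * smooth(x), -8.0, 8.0, 160_000)

print("sinc kernel:    ", pairing(sinc_kernel, 20.0))
print("gaussian kernel:", pairing(gaussian_kernel, 20.0))
# The sinc kernel itself does not die off like the Gaussian away from 0:
print(sinc_kernel(1.0, 20.0), gaussian_kernel(1.0, 20.0))
```

Both pairings come out close to smooth(0) = 1, but for very different reasons: the Gaussian is simply tiny away from the origin, while the sinc kernel is not and only contributes little because its oscillations cancel.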
The poor behavior of this limit (primarily numerically) is closely related to the reasons why we have things like the Gibbs phenomenon.
I don't know what kind of justification you expect. There's a Dirac delta sized "hole" in linear algebra that mathematicians need a name for. It's not like we can just leave it there, unfilled.
Or put differently - here you can kinda ignore the deeper formalities and still be productive, whereas with distributions you actually need to sit down and pore over them before you can do anything.
That said, I'm curious why infinitesimals never took off in physics. This kind of quick, shut-up-and-calculate approach seems right up their alley.
I don’t know, this feels like a math “hold my beer” moment. Math is infinitely deep and interconnected, but you have to start somewhere, on solid ground.
I was not being facetious above - the issues that I mentioned are actual problems when you make calculations. But let’s ignore those issues for a second.
So you found the “derivative” of a single, arbitrarily chosen representative of an infinite family of functions. What if you chose (tanh(Nx)+1)/2? What if you chose Logistic(N^2 x) instead of Logistic(N x)? You’d get different derivatives. In fact any function (up to additive constant) whose integral over a neighborhood of 0 is 1 would work there. What use are the values you are calculating if they reflect your choice and not anything inherent to the problem?
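To see the dependence on the representative numerically, here is a small sketch that uses a large but finite N as a stand-in for the infinite hyperreal (that substitution, the value N = 10, and the helper names are assumptions for illustration only). Each candidate is a smooth approximation to the step function, yet the slope at 0 differs: N/4, N²/4 and N/2 respectively.

```python
import math

def logistic(t):
    return 1.0 / (1.0 + math.exp(-t))

def central_diff(func, x, h=1e-6):
    # Ordinary numerical derivative via a central difference.
    return (func(x + h) - func(x - h)) / (2 * h)

N = 10.0  # finite stand-in for an "infinite" N

# Three step-like functions that all approximate the Heaviside step.
candidates = {
    "logistic(N x)": lambda x: logistic(N * x),
    "logistic(N^2 x)": lambda x: logistic(N * N * x),
    "(tanh(N x) + 1) / 2": lambda x: (math.tanh(N * x) + 1) / 2,
}

slopes = {name: central_diff(g, 0.0) for name, g in candidates.items()}
for name, slope in slopes.items():
    print(f"{name:>22}: slope at 0 = {slope:.4f}")
```

All three agree with the step function away from 0 as N grows, but the "value of the derivative at 0" is whatever the chosen representative makes it.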
As for distributions, I picked up and read a small 100-page penguin “leaflet” from the library during my undergrad that went through the subject rigorously (and with plenty of examples). It’s not that different from working rigorously with probability or real analysis. And in applications we are indeed usually interested in integrals, not derivatives, which we have not even defined. At the end of the day, you have a [X=weak L^infinity(R)] function (Heaviside). You look at the dual space, and since we established that we don't really need the deep theory, believe me when I tell you that the correct space is the space of test functions on R (X’=infinitely smooth, compact support, bounded integral). Each of those conditions is simple for our simple example of R. The inner product is via an integral.
Formally speaking elements of X are equivalence classes of sequences of functions and are not really defined pointwise, but neither was the NSA example. There we had to choose an arbitrary representative hyperreal function and here we may identify pointwise defined functions with the classes of the constant sequences of those functions.
Using integration by parts, it is simple to show that <F,G’> = -<F’,G> if F is continuously differentiable on G’s support (the boundary terms vanish because G has compact support). Let us formally define the weak derivative in this way for functions that are not traditionally differentiable, provided an element satisfying all the integral relations exists and is unique. However, note that differentiation is a linear isomorphism on the space of test functions, and so the weak derivative indeed exists and is unique. Furthermore
We can also define elements of X pointwise by identifying F(x) with the limit <Txn,F> as n grows, if it exists and is independent of the sequence Txn, where Txn is a sequence of functions with support tending to {x} and constant integral 1. It is a simple exercise to show that for “normal” functions this holds, and by the above we can define derivatives pointwise this way as well.
What about our H(x)? It is an exercise to check that pointwise we get what we should outside of 0. What about the derivative at 0? Well, do the exercise above with <T0n’,H> and we see that it is penrose undefined. Decidedly not even necessarily infinite, just undefined. However, integration by parts shows that <T,DH>=T(0), i.e., DH is the Dirac delta at 0.
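For anyone who wants that last integration-by-parts step written out, here is a sketch in LaTeX. It assumes the usual sign convention for the weak derivative, <T,DH> = -<T’,H>, that T is a test function with compact support (so T vanishes at infinity), and that H is the Heaviside step.

```latex
\begin{aligned}
\langle T, DH \rangle
  &= -\langle T', H \rangle
   = -\int_{-\infty}^{\infty} T'(x)\, H(x)\, dx \\
  &= -\int_{0}^{\infty} T'(x)\, dx
   = -\bigl[\, T(x) \,\bigr]_{0}^{\infty}
   = T(0).
\end{aligned}
```

The result depends only on T(0), which is exactly how the Dirac delta at 0 acts on T.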
Aside from all the theory that I kinda gave handwavingly, much like OP in the post, the mechanics are simple integration by parts to get the only stuff that’s “real” here, which are the integrals. In NSA we haven’t even defined those. How will knowing what infinity I will get at 0, given an arbitrarily chosen representative for H, help me?
Do your results depend on ZFC? stronger axioms? At what level of infinity do we stop? You can brush aside the formalities but then what better is this approach than physicists?
Plus the ancient mathematicians did very well with just their intuition. And more to the point, I cared much more about building (hyper)number sense than some New Math “let’s learn ultrafilters before we’ve even done arithmetic”.
In impulse/constraint mechanics, when two objects collide, their momentum changes in zero time. An impulse is an infinite force applied over zero time with finite energy transfer. You have to integrate over that to get the new velocity. This is done as a special case. It is messy for multi-body collisions, and is hard to make work with a friction model. This is why large objects in video games bounce like small ones, changing direction in zero time.
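For the simplest case, the "integrate over the impulse" step reduces to a closed-form velocity update. Here is a minimal sketch for a head-on collision of two bodies in one dimension, using the textbook restitution model; the function name collide_1d and the restitution parameter are my own illustrative choices, and none of this addresses the multi-body or friction cases mentioned above.

```python
def collide_1d(m1, v1, m2, v2, restitution=1.0):
    """Return post-collision velocities for a head-on 1D collision.

    restitution = 1.0 is perfectly elastic, 0.0 is perfectly inelastic.
    """
    # The impulse is the time integral of the (idealized, delta-like)
    # contact force; here it is obtained in closed form.
    reduced_mass = m1 * m2 / (m1 + m2)
    impulse = (1.0 + restitution) * reduced_mass * (v1 - v2)
    # Each body's momentum jumps by the impulse, in opposite directions.
    return v1 - impulse / m1, v2 + impulse / m2

# Equal masses, elastic: the velocities swap.
print(collide_1d(1.0, 2.0, 1.0, 0.0))
# Heavy body hitting a light one: the update is just as instantaneous.
print(collide_1d(1000.0, 1.0, 1.0, 0.0))
```

Nothing in the update depends on the size of the objects, which matches the complaint that large bodies change direction as abruptly as small ones.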
I wonder if nonstandard analysis might help.
Integration can be done with its own special arithmetic: Interval arithmetic. I base this suggestion on the fact that this is apparently the only way of automatically getting error bounds on integrals. It's cool that it works.
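A toy version of the idea, as a sketch only: split the integration range into small intervals, bound the integrand on each one with an interval evaluation, and add up the lower and upper bounds. The integrand x² and the helper names are assumptions for the example, and unlike a real interval arithmetic library this code does not round outward, so floating-point error is not covered by the bounds.

```python
def interval_square(lo, hi):
    # Exact range of x**2 over the interval [lo, hi].
    ends = (lo * lo, hi * hi)
    low = 0.0 if lo <= 0.0 <= hi else min(ends)
    return low, max(ends)

def integral_bounds(a, b, n):
    # Enclose the integral of x**2 over [a, b] using n subintervals.
    width = (b - a) / n
    lower = upper = 0.0
    for i in range(n):
        lo = a + i * width
        hi = a + (i + 1) * width
        f_lo, f_hi = interval_square(lo, hi)
        lower += f_lo * width
        upper += f_hi * width
    return lower, upper

lower, upper = integral_bounds(0.0, 1.0, 1000)
print(lower, upper)  # the true value 1/3 lies between the two
```

The gap between the two numbers is the automatically obtained error bound, and it shrinks as the subdivision gets finer.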
NSA does not work with a computable field so it's not directly useful. But at the end of the article, there's a link to some code that uses the Levi-Civita field, which is a "nice" approximation to NSA because it's computable and still real-closed. You might be able to do an "auto-limit" using it, in a kind of generalisation of automatic differentiation. This might for instance turn one numerical algorithm, like Householder QR, into another one, like Gaussian elimination, by taking an appropriate limit.
I don't know if these two things interact well in practice: Levi-Civita for algebraic limits and interval arithmetic for integrals. They might! This might suggest rather provocatively that integration is only clumsily interpreted as a limit of some function. Finally tbh, I'm not sure if this is the best solution to the friction/collision detection problem you're describing.
If you can get realistic billiards breaks, you're on the right track.
I once saw in an elementary calculus book a note after the proof of a theorem about differentiation that the converse of the theorem was also true but needed more advanced techniques than were covered in the book.
I checked the advanced calculus and real analysis books I had and they didn't have the proof.
I then did some searching and found mention of a book titled "Differentiation" (or something similar) and found a site that had scans of the first chapter of that book. It proved the theorem on something like page 6 and I couldn't understand it at all. Starting from the beginning, I think I got through maybe a page or two before it got too deep for my mere bachelor's-degree-in-mathematics level of preparation.
I kind of wish I'd bought a copy of that book. I've never since been able to find it. I've found other books with the same or similar title but they weren't it.
The limit of the Gaussian function as variance goes to 0 is not a function, but it is a distribution, the Dirac distribution.
Some distributions appear in intermediate steps while solving differential equations, and then disappear in the final solution. This is analogous to complex numbers sometimes appearing while computing the roots of a cubic function, but not being present in the roots themselves.
> You’re probably familiar with the idea that each piece has infinitesimal width, but what about the question of ‘how MANY pieces are there?’. The answer to that is a hypernatural number. Let’s call it N again.
Is that right? I thought there was an important theorem specifying that no matter the infinitesimal width of an integral slice, the total area will be in the neighborhood of (= infinitely close to) the same real number, which is the value of the integral. That's why we don't have to specify the value of dx when integrating over dx... right?
The theorem tells us that the area of the extended interval that uses a hypernatural number of slices has the same real part as the area of the exact interval. It doesn't tell us that the exact interval contains a hypernatural number of slices.
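A finite analogue of that statement, as a sketch: with ordinary (finite) slice counts, the Riemann sum changes less and less as the number of slices grows, and all sufficiently fine sums sit near the same real number. The integrand sin on [0, π] (exact integral 2), the midpoint rule, and the slice counts are assumptions chosen for the example; nothing here is hyperreal.

```python
import math

def midpoint_sum(func, a, b, n_slices):
    # Riemann sum with n_slices equal slices, sampled at slice midpoints.
    width = (b - a) / n_slices
    return sum(func(a + (i + 0.5) * width) for i in range(n_slices)) * width

for n_slices in (10, 1_000, 100_000):
    print(n_slices, midpoint_sum(math.sin, 0.0, math.pi, n_slices))
```

The hyperreal version replaces "close for large finite N" with "infinitely close for every infinite hypernatural N", which is why the particular choice of N does not affect the real part.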
https://mathoverflow.net/questions/91646/surreal-numbers-vs-...
See Ehrlich’s answer.
I've never learnt this level of maths formally, but it's been an interest of mine on and off. And this post explained it very well, and pretty understandably for the layman.
There it is.