Show HN: MathLingua – The Language of Mathematics | Better HN

46 comments

34 comments · 10 top-level

CatsAreCoolOP5y ago· 10 in thread

Hello, this is the creator of MathLingua.

MathLingua is still in development, and the documentation still needs work.

However, I wanted to get feedback earlier rather than later to help guide development and the writing of the docs.

Note that MathLingua is the language that powers https://mathlore.org.

uryga5y ago

could you explain the usecase a bit? i skimmed the docs and don't quite get what writing definitions and theorems in this format gets me, apart from an automated "go to definition". what can the tooling do, or what functionality are you planning to implement?

btw "The Language of Mathematics" sounds a bit over the top ;) how about "A Language for Mathematics"?

CatsAreCoolOP5y ago

Thanks for the feedback. I'll update the docs to make this more clear.

The goal is to allow the building of a system so someone could ask, for example, "what are the known theorems that describe when a function uniformly continuous?" or "under what conditions is a group solvable?".

To enable this, it is necessary to build a system that catalogs precise statements of math theorems, definitions, and other statements (https://mathlore.org is the start of this system that uses MathLingua but currently only uses keywords and some structure for searching).

This is similar to https://mathworld.wolfram.com or the math pages in Wikipedia except the statements are not text or LaTeX, that is hard for a computer to understand the meaning of. Instead the MathLingua language is used to identify math statements by meaning.

Right now the tooling can render math statements, find duplicate definitions, and can find missing definitions. Further https://mathlore.org uses MathLingua and has some basic searching.

Further, experimental work is also underway so that the MathLingua tooling can expand a math statement by expanding the definition of items used in that statement.

Yeah, I thought about using something like "A Language for Mathematics" but I thought "The Language of Mathematics" might draw more attention :) But it is only the title of my post on hacker news, not the title of the MathLingua website.

Cool work. You should make it very loud and clear that your toolchain does not verify proofs, since otherwise folks may think that your computer-assisted proof layout system does just that; it's rare to see tooling in this subfield which does not verify proofs.

CatsAreCoolOP5y ago

Thanks for the feedback. I will definitely do that.

it's cool but why not just Latex? Genuine question.

I am interested in learning more abt your rationale for creating MathLingua? What drove u? What existing faults do you see in existing system that made you do it.

CatsAreCoolOP5y ago

Thanks for the question. I thought about using LaTeX, however, it doesn't explain what a math statement means. Instead, it describes that it looks like. With MathLingua, the meaning is made precise so that a computer can understand the statement. (I'll make this more clear in the docs).

TLDR: Having a computer understand what a statement written in LaTeX means is a very difficult Machine Learning (Natural Language Processing) problem (if not impossible since math statements often have lots of hidden context).

For more details:

In LaTeX, what does "if $f$ is continuous on $[a, b]$, $f \geq 0$ there, and $\int_a^b f = 0$ then $f = 0$?

It looks clear at first, but what type of thing is $f$. From context of reading analysis books, it could be clear that $f$ must be a function because "continuous" describes functions (but can also describe other things).

Next, what does $\int_a^b f = 0$ mean? It is the integral of $f$ from $a$ to $b$, but is it the Riemann integral, the Lebesgue integral, the Darboux integral? Does the difference matter?

What exactly does $f = 0$ mean. Does it mean $f$ as a function is the same as $0$ as a function, or does it mean $f(x) = 0$ for all $x$? Are those two the same or different? Does the distinction matter?

All of these questions, show up after a computer has correctly parsed the sentence and identified the structure. However, just that step itself is a very hard natural language processing problem.

In fact, if someone comfortable in reading math was given a random theorem from a random paper and was told to explain what it means, they would likely have problems and would need to look at the context of the paper (or reference other papers) to understand the syntax. This just gives a feel for the difficultly to build a machine learning model.

That is why I decided to build a custom language. In MathLingua, for example, the theorem could be described as:

  Theorem:
  . for: f(x), a, b
    where:
    . 'a, b is \real'
    . 'a \leq b'
    . 'f is \continuous.function:on{\closed.interval{a, b}}'
    . 'f is \nonnegative.function:on{\closed.interval{a, b}}'
    . '\riemann.integral_a^b[x]{f(x)} = 0'
    then:
    . 'f is \identically.zero.function:on{\closed.interval{a, b}}'

It is more verbose, but gives a more precise meaning of what each item means. Then definitions can be made to describe what each referenced math definition means with a description of how its associated notation is written to make the rendering look similar to how it is written on paper.

For example, one could have the definition:

  [\identically.zero.function:on{A}]
  Defines: f(x)
  assuming: 'A is \set'
  means:
  . for: a
    where: 'a \in A'
    then: 'f(a) = 0'
  Metadata:
  . written: "f? = 0 \textrm{ on } A?"

This makes the definition more clear and the statement `g is \identically.zero.function:on{X}`, for example, is rendered as the familiar $g = 0 \textrm{ on } X$.

Other definitions can be written too, to make everything clear (or left for later. MathLingua doesn't require you to define the entire math universe before defining a more complex concept).

My response was a little long, but that is because there are a lot of subtleties that show up when trying to get LaTeX to work for describe the meaning of math statements. I hope this helps.

auggierose5y ago

There is a conference related to that topic starting today actually (CICM 2020). Registration might still be available. The subcategory that concerns things like math lingua is called mathematical knowledge management (MKM).

CatsAreCoolOP5y ago

Cool. Thanks for letting me know.

mkl5y ago

I spotted a typo on the linked page: "vistual"

The dot indentation syntax seems very strange. Why not just indent? Is the intention to reduce nested indentation a little?

CatsAreCoolOP5y ago

Thanks for letting me know about the typo.

The dot space is used to mark arguments since some arguments span multiple lines but are connected. For example, in

  Theorem:
  . for: x
    where: y
    then: z
  . 'some statement'

the 'for:where:then' group together is one argument, while the 'some statement' item is the second argument. The dot is needed to distinguish from

  Theorem:
    for: x
    where: y
    then: z
    'some statement'

where it is hard to tell where one argument stops and another starts.

fspeech5y ago· 5 in thread

Documenting mathematics rigorously is actually hard. For example see the Formal Abstracts (https://formalabstracts.github.io) project by Thomas Hales. Hales' other writings on the subject may be informative too: https://jiggerwit.wordpress.com/2019/06/20/an-argument-for-c...

CatsAreCoolOP5y ago

Thanks for the links. I was not aware of Thomas Hale's research even though I searched extensively to try to find related work.

His descriptions of what he wants to attain align with what I would like, a formalized way to describe what is known in mathematics.

He also describes how this can be used for machine learning, and area that I am also pursuing, but my results are still in an exploratory state.

Either way, I am really glad others are looking into this area too. I think it is something that will be very beneficial to the math community.

This is an area of my interest too, esp. if one can find something that is actually useful for a practicing mathematician. I mentioned Hales' project but based on their github project progress (a postdoc of his did some exploratory work) they seemed to have hit some road blocks, which is an example of this problem being hard. I don't know what your background is. If you are interested in further discussion I would be happy to p.m. over email.

This also reminds me of mmt https://uniformal.github.io/ The primary developer of that recently gave a talk about it here https://vimeo.com/421123419

jhanschoo5y ago

Also see: https://en.wikipedia.org/wiki/QED_manifesto

CatsAreCoolOP5y ago

I have read through this before, but thanks for sharing the link.

fdej5y ago· 2 in thread

Nice project! I'm working on something similar with http://fungrim.org/grim/ and http://fungrim.org/

Pretty cool. Are you aware that there is a content markup part of MathML https://www.w3.org/TR/MathML3/chapter4.html ? There are other efforts geared towards math authoring as well if that is your focus，for example people have authored open textbooks using https://pretextbook.org .

fdej5y ago

Indeed. I decided not to use MathML for various reasons:

1. The syntax is horrible (for writing by hand).

2. I'm more interested in computation than presentation, so something closer to S-expressions along the lines of Mathematica feels more comfortable to me.

3. The progress so far on defining semantics for content MathML has failed to impress me.

Concretely, I think if you are trying to do semantic mathematics, you need an implementation of the semantics in software, not just a hand-written spec. Ideally this would be backed by a formal theorem proving system. So far I just do randomized testing using symbolic evaluation and interval arithmetic. See this writeup for more details: https://arxiv.org/abs/2003.06181

xvilka5y ago· 2 in thread

Is it able to transpile to Coq/Lean/Agda/etc?

CatsAreCoolOP5y ago

It is already difficult to have a formalized math language with a level of formality that matches that found in papers.

The goal of MathLingua, at this time, is to match such a level of formality that is in math books and papers.

However, the syntax was designed to allow support for transpiling to a theorem proving language to be added at a later time. It is something, in the future, that I would like to do, if possible, but I don't have a time estimate if/when it will be available.

xvilka5y ago

I recommend to check also Decucteam[1][2] projects, they also convert between different formats. Even as they focus on mathematical logic mostly, it still can be useful to check.

[1] http://deducteam.gforge.inria.fr

[2] https://github.com/Deducteam

xiaodai5y ago· 2 in thread

It's cool and all but what's wrong with Latex?

Doesn't answer the "why" question at all.

CatsAreCoolOP5y ago

Thanks for the question. I have answered it in another question in the comments, and I'll update the docs to make the difference more clear.

magnio5y ago

Searchability I guess? LaTex is for displaying mathematics, whereas this looks like an AST for math.

jlelonm5y ago· 1 in thread

This is really cool!! I hope this becomes popular. Imagine having all theorems etc indexed like this!

CatsAreCoolOP5y ago

Thanks for the support! Yes, that is one of the goals, to make it really easy to find what is known in mathematics and contribute to that knowledge base. I have https://mathlore.org which is a demonstration of how such as system could work.

Vivtek5y ago· 1 in thread

This feels like "mathematics pseudocode" and I kind of love it. Indexable math Zettelkasten notes.

CatsAreCoolOP5y ago

Thanks! I wasn't sure how people would feel about the syntax, and I'm glad you like it.

bionhoward5y ago· 1 in thread

Would it be possible to put the theorem name (and possibly a description) at the top instead of buried at the bottom in a metadata block?

Theorem: “Fundamental Theorem of Calculus”

CatsAreCoolOP5y ago

I thought of this, but couldn't find a nice way to fit this into the syntax.

That is,

  Theorem: "some text"

is the same as

  Theorem:
  . "some text"

Thus it would be hard to distinguish a named theorem with one statement

  Theorem:
  . "some name"
  . "statement 1"

from an un-named statement with two statements:

  Theorem:
  . "statement1"
  . "statement2"

So for now I'm leaving the name in the metadata section, but I'm open to putting the title higher if I can come up with a nice way to do so.

syats5y ago

This is an awesome effort!

Regarding a comment on possible use cases.. One component that would really compliment this would be a programming-language specific checker of definitions. In the documentation's example of "integer" it would be straight forward to write in python a function that checks if a given variable is an integer. The same can be for other definitions such as "invertible matrix" or "set of orthonormal vectors".. other languages could provide such checks for more symbolic stuff like "commutative ring" etc.

Once you have this in place, you can load your nice data into a (e.g.) numpy.ndarray object, and checks would be triggered for all possible properties. This would automatically tag your data as, e.g. "stochastic matrix" and then you could even get suggestions of the sort of "build a time-discrete stochastic system based on this matrix?". This is would be very helpful for "data scientists" who are a bit far from theory.. at least I know I would use this very often.

There is of course the issue of expressivity to be taken into account. Much of mathematical progress involves invention of new notation, not all of which is neatly written in "left to right English with only binary operators" (think of commutative diagrams for example). While the language can of course be expanded in this direction, there will be a very big temptation (and I would guess frequent pushes from the future community) to change to more and more general languages. In this case, please always keep in mind the very expressive but useless notation of Russell & Whiteheads's Principia. There is a big theory of language and expresivity, and there are also practical examples (think of OWL or other semantic-web languages) of how to reach a compromise that allows for enough expressivity for the use cases of interest.

CatsAreCoolOP5y ago

A Google Group has been created to discuss MathLingua and Mathematical Knowledge Management at https://groups.google.com/d/forum/mathlingua-discuss.

Everyone is welcome to join.

j / k navigate · click thread line to collapse