The other thing is that W|A has trillions of data points and what I am only allowed to describe as the beginnings of a semantic network for inferring relations between them. It was a vastly overcomplicated system that was difficult to work with, so I am quite confident that some day there will be an open-source alternative that anyone can contribute to.
The other thing to take into account is how this affects the future. Wolfram Research thought they were going to "disrupt the calculator" (I heard this ridiculous statement once at a meeting). In reality, Wolfram Alpha queries are more often for the sake of fun than for the sake of discovery (I know this because there was a big TV in the break room that would keep displaying things that people searched on Wolfram Alpha). Is it really that useful to be able to have a computer give you an answer to "I have two apples, Jill has three apples. How many apples do we both have?"
http://www.wolframalpha.com/input/?i=I+have+two+apples%2C+Ji...
Or is it more useful to make something that can take in symptoms of your current ailment and tell you which disease you are most likely to have? Wolfram Alpha does this as well.
http://www.wolframalpha.com/input/?i=I+have+a+fever+and+a+ru...
The results are difficult to interpret, though. In my time at Wolfram Research, I was certainly convinced by the idea of knowledge engines and their ultimate emergence, but I think this will be accomplished in a more Google-esque fashion, where knowledge engine results are displayed alongside a real search algorithm. Best of luck to the people on this project; I hope you take the first step toward creating an open source knowledge engine.
http://www.wolframalpha.com/input/?i=I+have+two+mangoes%2C+J...
Doesn't work, even though Wolfram Alpha knows enough about mangoes to give me the nutritional value of three of them as its response instead.
'Oranges' as the noun works quickly. 'Pens' doesn't work at all, so it's hardly surprising that 'doowats' also fails. Surprisingly, even 'pears' is a fruit too far.
On the other hand, "I have two apples and one orange, Jill has three oranges. How many oranges do we both have?" works very well.
So I have to ask, is this just a trick? I mean, did you program it to handle apples and oranges specifically, without attempting to do any sort of semantic comprehension? Because this is a big failing of Wolfram Alpha for me. It 'knows' things but it doesn't know them. It knows that a mango has 84 calories but not that it's countable. Or perhaps it does know that and that knowledge just doesn't propagate to more complex queries.
I have two mangoes, Jill has three mangoes. Alan has
three oranges, and I have two apples and a carrot. How
many mangoes do we have? How many fruits do we all have?
How long can we survive?
One would think it did some basic parsing, and managed to file away three oranges and two apples in a parse tree of some kind -- and if those were associated with "fruits" -- it should be able to answer?
And if it were able to answer, one might be able to have it hand out advice on diets ("Give me 30 examples of a 3000 calorie diet featuring no red meat") and a lot of other things that would be "easy" to answer based on ingesting some pretty standard databases.
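To make the "associated with fruits" idea concrete, here is a minimal sketch. It assumes the parsing has already happened and produced (owner, count, noun) facts from the query above; the `IS_A` table is a hypothetical hand-written taxonomy, not anything Wolfram Alpha is known to use.

```python
from collections import Counter

# Facts as they might come out of a parser: (owner, count, noun).
# They mirror the example query; the parsing step itself is assumed.
FACTS = [
    ("I", 2, "mango"),
    ("Jill", 3, "mango"),
    ("Alan", 3, "orange"),
    ("I", 2, "apple"),
    ("I", 1, "carrot"),
]

# Hypothetical hand-written taxonomy: noun -> category.
IS_A = {
    "mango": "fruit",
    "orange": "fruit",
    "apple": "fruit",
    "carrot": "vegetable",
}


def count_by_noun(facts):
    """Total count per noun, ignoring who owns what."""
    totals = Counter()
    for _owner, count, noun in facts:
        totals[noun] += count
    return totals


def count_by_category(facts, taxonomy):
    """Total count per category; nouns missing from the taxonomy are skipped."""
    totals = Counter()
    for _owner, count, noun in facts:
        category = taxonomy.get(noun)
        if category is not None:
            totals[category] += count
    return totals


mango_total = count_by_noun(FACTS)["mango"]
fruit_total = count_by_category(FACTS, IS_A)["fruit"]
print(mango_total, fruit_total)
```

Once the nouns are tied to a category, "how many fruits" is the same aggregation as "how many mangoes". The hard part is getting from free-form text to those facts.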
1. Word problem solving is shoddy at best. There is a lot of functionality in Wolfram Alpha that exists for that minuscule chance that somebody will actually query it, so queries matching "[person has object]*, what is the total" are easily understood, but this fails when the database doesn't have "mango" or the plural "mangoes" tagged as an object. This is largely due to poor database design decisions made in the past, but I know that this will improve in the future based on the project I worked on to improve the standard data format.
2. Some functionality is literally only for the sake of demonstration. I remember one time when Stephen was demonstrating a variety of cool queries into Wolfram Alpha during the yearly all-staff meeting, and when I got back to my desk I tried all the same queries with slight variations and almost all of them failed.
3. Mathematica has powerful string manipulation and regex matching functionality, to the point where lazy engineers are easily tempted to join the dark side. So it is very possible that this particular word problem only works because of a regex match. I know it sounds crazy, but I honestly wouldn't be surprised if some lazy engineer added in a literal match for "([subject] (have|has) [number] (apple|apples|orange|pear|peach|peaches))+", fed it into a simple extraction function, and output the result.
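For illustration only, this is roughly what such a literal match could look like in Python. It is a guess at the shape of the trick, not Wolfram Alpha's code; the noun list, `NUMBER_WORDS`, and `total_items` are all made up for this sketch.

```python
import re

# Small number-word table; anything outside it is ignored.
NUMBER_WORDS = {"one": 1, "two": 2, "three": 3, "four": 4, "five": 5}

# Hard-coded noun list, in the spirit of the pattern speculated above.
NOUNS = r"apples?|oranges?|pears?|peach(?:es)?"

# One "[subject] (have|has) [number] [noun]" clause.
CLAUSE = re.compile(
    r"\b(?P<subject>[A-Z]\w*)\s+(?:have|has)\s+(?P<number>\w+)\s+(?P<noun>"
    + NOUNS
    + r")\b"
)


def total_items(question):
    """Sum the counts of every matching clause; None if nothing matched."""
    total = 0
    matched = False
    for clause in CLAUSE.finditer(question):
        word = clause.group("number").lower()
        if word.isdigit():
            total += int(word)
        elif word in NUMBER_WORDS:
            total += NUMBER_WORDS[word]
        else:
            continue  # unknown number word, skip this clause
        matched = True
    return total if matched else None


apples = total_items(
    "I have two apples, Jill has three apples. How many apples do we both have?"
)
mangoes = total_items(
    "I have two mangoes, Jill has three mangoes. How many mangoes do we both have?"
)
print(apples, mangoes)
```

A matcher like this answers the apples query and returns nothing at all for mangoes, which is exactly the brittle behavior described in this thread.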
In advanced math classes (upper undergraduate or graduate) it is almost impossible to check results or do a complex operation on a simple calculator.
In many cases I have to turn to a tool like Mathematica/Wolfram Alpha. I have a W|A Pro account and it has worked wonders for me. For example, entering "integral from 0 to infinity of (ye^(-y)((-1/y)(e^(-t))+1/y)) with respect to y" into Wolfram Alpha is so much easier than doing the same with a TI-89. I can copy/paste and adjust very easily and the software produces multiple interpretations/representations which is very useful.
It took me a minute to realize that during that time, every college student is on vacation.
I'm working on this too, but I'm tackling it from the non-math side (ie, NLP+Knowledge Graph).
one of the most challenging technical problems we faced was free-form input
Yes, it's a horrible problem(!) I'm using Quepy[1] (which in turn uses NLTK), and it does a decent job. It's still not automatically general purpose (you need to write code to map classes of queries), but it can handle questions like "Who directed The Social Network"[2].
The other thing is that W|A has trillions of data points and what I am only allowed to describe as the beginnings of a semantic network for inferring relations between them. It was a vastly overcomplicated system that was difficult to work with, so I am quite confident that some day there will be an open-source alternative that anyone can contribute to.
This is interesting to me (for obvious reasons).
Can you expand on what made it so complicated (I assume beyond the standard RDF-style inference engines)?
[1] http://quepy.machinalis.com/
[2] http://quepy.machinalis.com/#Who%20directed%20The%20Social%2...?
One great decision the Wolfram Alpha people made was to put together an excellent set of internal documentation on how to add new parsing capabilities to the language. So suppose you were tasked with adding queries about something like pregnancy data, you would just write a fairly straightforward module that would capture queries like "I am 6 months pregnant" and return a list of pods (a pod is the computed interpretation of your query, most Wolfram Alpha queries will return at least 5 of them). For pregnancy data, there is a pod that shows you how big the fetus should be, another for how much amniotic fluid there is, and so on. You would then write some Mathematica code to either scrape a website with pregnancy data or integrate with some data set that was curated by a data curator. This is not difficult to do, and I know of several WA-like projects that have accomplished this already. The problem with this is that data gets siloed, and data curators have a weak standard for how data and its relations should be expressed.
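As a rough picture of that workflow (written in Python rather than Mathematica, and not based on the internal documentation), a module boils down to a pattern that captures the query plus a function that returns a list of pods. `Pod`, `pregnancy_module`, and the placeholder data set are hypothetical names, and the lookups return descriptions instead of real figures.

```python
import re
from dataclasses import dataclass


@dataclass
class Pod:
    """One block of a result page: a title plus rendered content."""
    title: str
    content: str


# Hypothetical pattern for one class of query.
PREGNANCY_QUERY = re.compile(
    r"\bI am (\d+) (week|month)s? pregnant\b", re.IGNORECASE
)


def placeholder(label):
    """Stand-in for a curated data lookup; returns a description, not data."""
    def lookup(amount, unit):
        return f"<{label} at {amount} {unit}(s): curated value goes here>"
    return lookup


# Hypothetical data set: pod title -> lookup function.
PLACEHOLDER_DATA = {
    "Fetal size": placeholder("fetal size"),
    "Amniotic fluid volume": placeholder("amniotic fluid volume"),
}


def pregnancy_module(query, dataset):
    """Return a list of pods for a matching query, or [] if it does not match."""
    match = PREGNANCY_QUERY.search(query)
    if match is None:
        return []
    amount, unit = int(match.group(1)), match.group(2).lower()
    plural = "" if amount == 1 else "s"
    pods = [Pod("Input interpretation", f"pregnancy | {amount} {unit}{plural}")]
    for title, lookup in dataset.items():
        pods.append(Pod(title, lookup(amount, unit)))
    return pods


pods = pregnancy_module("I am 6 months pregnant", PLACEHOLDER_DATA)
for pod in pods:
    print(f"{pod.title}: {pod.content}")
```

Each module owns its own pattern and its own data, which is what makes modules easy to add and also what lets the data end up siloed.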
This leads to difficulty when you are tasked with handling a complex query like "Which country has the greatest ratio of GDP to population?" Now you're talking about interoperability between two data sets, and although it can be done quite easily using Mathematica's CountryData function:
In[1]:= First[Sort[# -> CountryData[#, "GDP"]/CountryData[#, "Population"] & /@ CountryData[], Last[#1] > Last[#2] &]]
Out[1]= "Monaco" -> 226860.
... it is nearly impossible to handle these kinds of situations for general queries that could ask about ratios of anything. For some time, one possible solution was to make "ratio of GDP to population" a column in the database table, but obviously this leads to an exponential explosion of columns if you are trying to answer general queries.
By the time I had joined the Alpha team (after working for 2 years on Mathematica) they were already moving some of their most poorly designed data sets into a much better system that used a more rigid set of standards for describing things, places, concepts, relations, etc. I wish I could elaborate more on this because it was really very cool technology running in the background, but Wolfram Research has a real track record of suing people who violate their NDA (Matthew Cook). What I can say is that it fixed some absolutely ridiculous database design decisions - for example, in one table storing athlete performance, there were multiple rows for athletes who played multiple years, where the name would be BabeRuth1942, BabeRuth1943, BabeRuth1944, and so on. It was then up to the developer to know that the name and year need to be separated, and that their code needs to handle athletes who play one year and multiple years separately.
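Two of the points above can be sketched in a few lines of Python. This is not a description of the system I can't talk about: `PROPERTIES` is a hypothetical property store with made-up placeholder numbers, and `split_athlete_key` just shows the name/year separation a developer had to know about.

```python
import re

# Hypothetical property store: property name -> {entity: value}.
# The numbers are made-up placeholders, not real statistics.
PROPERTIES = {
    "gdp": {"A": 600.0, "B": 900.0, "C": 100.0},
    "population": {"A": 3.0, "B": 9.0, "C": 4.0},
}


def greatest_ratio(properties, numerator, denominator):
    """Entity with the largest numerator/denominator ratio, computed on demand."""
    top, bottom = properties[numerator], properties[denominator]
    ratios = {
        entity: top[entity] / bottom[entity]
        for entity in top.keys() & bottom.keys()
        if bottom[entity] != 0
    }
    best = max(ratios, key=ratios.get)
    return best, ratios[best]


# Composite keys like "BabeRuth1942": a name followed by a four-digit year.
COMPOSITE_KEY = re.compile(r"^(?P<name>.*?)(?P<year>\d{4})$")


def split_athlete_key(key):
    """Split a composite key into (name, year)."""
    match = COMPOSITE_KEY.match(key)
    if match is None:
        raise ValueError(f"not a name+year key: {key!r}")
    return match.group("name"), int(match.group("year"))


print(greatest_ratio(PROPERTIES, "gdp", "population"))
print(split_athlete_key("BabeRuth1942"))
```

If every property is stored against a shared set of entity identifiers, a ratio of any two properties is a lookup and a division, with no extra column per pair. The hard part is agreeing on those identifiers across data sets.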
tl;dr: Don't over-glorify Wolfram Alpha - it gets things done, but the poor performance and unpredictable results are caused by bad planning and poor organization within. If I was going to make my own knowledge engine, I would spend a long time drawing up an incredibly detailed schema about how every single thing would be represented and how a developer would write a new module for it before writing a single line of code. These are the lessons gleaned from spending two and a half years wallowing in a Big Ball of Mud (http://laputan.org/mud/).
Have you looked at any of the research to come out of http://start.csail.mit.edu/index.php?
I've also been working with Quepy a bit lately. Very cool stuff. Are you able to comment at all on what you're working on, or is it "super secret stealth mode" stuff?
For us, we already do semantic concept extraction using Apache Stanbol, against content that flows into our enterprise social network product, and then store the associated triples in an RDF triplestore. We have a primitive search feature exposed, which lets you query using SPARQL, but realistically, we know "normals" will never, ever, ever, ever write SPARQL queries, so the big push is to do automated translation from natural language (even if it's a slightly restricted natural language) into SPARQL so users don't have to think about triples and what-not.
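A minimal sketch of the restricted-natural-language idea, assuming a small set of hand-written question templates. The `ex:` prefix, the predicate names, and `to_sparql` are hypothetical and not tied to Stanbol, Quepy, or any real vocabulary.

```python
import re

# Hypothetical namespace; a real deployment would use its own vocabulary.
PREFIX = "PREFIX ex: <http://example.org/schema#>\n"

# Each template pairs a question pattern with a SPARQL skeleton.
# Doubled braces are literal braces once str.format has run.
TEMPLATES = [
    (
        re.compile(r"^who directed (?P<title>.+?)\??$", re.IGNORECASE),
        'SELECT ?person WHERE {{ ?work ex:title "{title}" . '
        "?work ex:director ?person . }}",
    ),
    (
        re.compile(r"^who wrote (?P<title>.+?)\??$", re.IGNORECASE),
        'SELECT ?person WHERE {{ ?work ex:title "{title}" . '
        "?work ex:author ?person . }}",
    ),
]


def escape_literal(text):
    """Escape backslashes and double quotes for a SPARQL string literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def to_sparql(question):
    """Translate a question into SPARQL, or return None if no template fits."""
    cleaned = question.strip()
    for pattern, skeleton in TEMPLATES:
        match = pattern.match(cleaned)
        if match:
            values = {
                name: escape_literal(value.strip())
                for name, value in match.groupdict().items()
            }
            return PREFIX + skeleton.format(**values)
    return None


query = to_sparql("Who directed The Social Network?")
print(query)
```

Templates like these only cover the classes of question someone has written a mapping for, which is the same limitation noted for Quepy earlier in the thread.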
If you're not in super-secret stealth mode and ever want to compare notes to talk about this stuff offline, feel free to shoot me an email.
I would expect the significance of queries on any free engine to adhere to a steep Pareto distribution. For what it's worth I, as an algebraic thinker, find Wolfram Alpha's symbolic manipulation and easy syntax refreshingly useful.
1. factor the number 100: Bing intercepts it and gives you 2^2 x 5^2, while Google doesn't
2. I have two apples etc.: neither one handles this specially
3. I have a fever and a runny nose: Google intercepts it and suggests several ailments (see https://support.google.com/websearch/answer/2364942?p=g_symp), while Bing doesn't
Note that lines of Mathematica code tend to do a lot of processing, so this would be the equivalent of many times more lines in another language. It is quite an interesting process hooking in a new feature to Wolfram Alpha, and some developers described it as the "mud bowl" because when you break things, you just had to throw more mud at it.
I'm not allowed to disclose details about the technology stack, but I can say that the database querying functionality was kept separate from the actual parsing and semantic analysis, and was implemented in a different language.
If you're interested in NLP, which by the way is a wonderful and exciting field with mysteries abound, Mathematica is indeed a great way to get started quickly. Although I recommend to everyone to do their work in an open-source-able way with a popular language like Python or Java, I built a Swahili translator during my freshman year of college with Mathematica. Here it is on github:
https://github.com/keshavsaharia/Swahili-Translator
Best of luck, feel free to shoot me an email at the email address listed on my github if you have any questions or need help getting started.
For me, this is because WA starts asking me for money when I try to use it for useful things. I suspect an open-source version will have different numbers.
http://www.goofram.com/ does exactly that, albeit somewhat crudely. You can get some unexpectedly interesting stuff on the Wolfram Alpha side of things, though usually it's irrelevant unless you're "thinking" in Wolfram query mode.
The attempt is greatly appreciated. I can fully see that this is a hard problem.
I would like to see an open source W|A in Julia instead of Python. It could be a great addition to e.g. Wikipedia.org
If I were to write a Wolfram Alpha alternative in Julia right now, I'd do all of the symbolic manipulation using PyCall and sympy.
I used it to simplify some kmaps for class the other day and it's very nice: prints out a truth table and various types of minimal forms.
As a Mathematica and W|A Pro user, I find most of the utility in not having to import random datasets myself. Like every researcher, I have a disgusting library of scripts that often involve curl, groovy, awk, sed, etc., to pull info into mma. It's nice when that becomes SEP[1].
http://www.wolframalpha.com/input/?i=+%28x+%E2%88%A8+y%29+%E...
[1] Someone Else's Problem
http://www.sympygamma.com/input/?i=5+gallons+%2F+12+fl+oz.
http://www.sympygamma.com/input/?i=how+many+calories+in+a+cu...
I wouldn't call it an "alternative" just yet. (And I'm an ex-W|A employee).
http://www.wolframalpha.com/input/?i=%28Calories+in+2300000+... https://en.wikipedia.org/wiki/Boston_Molasses_Disaster
So you can do unit conversions with all sorts of weird facts, like the nutritional content of molasses.
The important thing to note about SymPy Gamma is that it does only the mathematics part of WolframAlpha. It's also relatively new. There is no natural language input. There are no non-mathematical capabilities. The syntax should match Python syntax for the most part, though there are extensions to allow things like "sin x" or "x^2" or "2 x". All this will hopefully improve in the future (and pull requests are welcome!).
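To show what those syntax extensions amount to, here is a stdlib-only sketch that rewrites the three forms mentioned above into plain Python syntax. It is not how SymPy Gamma implements them; `normalize` and the `FUNCTIONS` list are made up for this example, and it only handles simple cases.

```python
import re

# Function names that may be applied without parentheses in this sketch.
FUNCTIONS = ("sin", "cos", "tan", "exp", "log", "sqrt")

# "sin x" -> "sin(x)": a known function followed by a bare name or number.
BARE_CALL = re.compile(
    r"\b(" + "|".join(FUNCTIONS) + r")\s+([A-Za-z_]\w*|\d+(?:\.\d+)?)"
)

# "2 x" -> "2*x": a digit, whitespace, then a name or an opening parenthesis.
IMPLICIT_MUL = re.compile(r"(\d)\s+(?=[A-Za-z_(])")


def normalize(expr):
    """Rewrite calculator-style input into ordinary Python expression syntax."""
    expr = expr.replace("^", "**")        # x^2   -> x**2
    expr = BARE_CALL.sub(r"\1(\2)", expr)  # sin x -> sin(x)
    expr = IMPLICIT_MUL.sub(r"\1*", expr)  # 2 x   -> 2*x
    return expr


for text in ("sin x", "x^2", "2 x", "2 sin x + x^2"):
    print(text, "->", normalize(text))
```

Regex rewriting like this breaks down quickly (nested bare calls such as "sin cos x" are not handled), which is why a proper parser is the better tool for anything beyond a demo.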
Most of the code was written by David Li (who is actually a high school student). You can watch a presentation about it here: http://conference.scipy.org/scipy2013/presentation_detail.ph.... It started out as a "because we can" toy, and it's gotten much better.
The real benefit of SymPy Gamma over WolframAlpha is that there are no barriers around it, since it's entirely (BSD) open source. For example, if you start computing something interesting and want to try more, you can move to SymPy Live (http://live.sympy.org/) and compute in a more session-like environment. Or you can use SymPy locally on your own computer.
Regarding the comments that Wolfram Alpha is mostly used for play, I'm not so sure about it. Wolfram Alpha is invaluable to students as a calculator. Sure, Google can compute 100 * pi, but it falls apart when you try to compute integrate(sin(x) * x, x). When I was in college (which was last year), I saw people use it all the time. It's been very successful in making computer algebra accessible to virtually everyone.
By the way, probably the best feature of SymPy Gamma right now is the integration steps. See for instance the "integral steps" section of http://www.sympygamma.com/input/?i=integrate%28sin%28x%29*x%.... This is a feature that used to be free at WolframAlpha, and it's extremely useful if you are learning integration in calculus. It doesn't work for all integrals, because not all integrals are computed the way you would by hand.
Please state this clearly so you don't get a bad reputation for features that aren't implemented.
I think this would do much better without the comparison to Wolfram Alpha.
I wrote a Python bridge, which was actually pretty cool. It's probably the neatest, cleanest, most CS-y code I've written (it converted Python objects to Mathematica objects over MathLink). It integrated with Numeric Python.
https://code.google.com/p/pymathematic/source/browse/trunk/e...
I'm a Linux user. I have bc and units installed. I even have some shell script wrappers to make those utilities actually helpful for casual use. I can open a terminal and calculate expressions and convert units...so long as I ask nicely. The big win for W|A is that it doesn't require me to ask nicely. This is helpful for quick 'n' dirty queries as well as for queries where the work required isn't in doing the calculation so much as reducing the query into a simple expression in the first place.
In other words, SymPy Gamma solves a problem that by and large doesn't exist.
But it'd convert to SymPy for evaluation.
In the end I called it Calculize and evaluated it with JS instead.