We've been working on something similar, but from the opposite direction (for world trade data). Instead of trying to NLP our way out of the problem, we pre-generate and index a bunch of possible questions, and let full-text search handle the rest.
It's interesting, because theoretically NLP should be able to "understand" what you mean but in reality I find that even if you parse sentence structure and extract some meaning, you're still at some level hardcoding the possible things that can be queried into the code.
So it's a neat tradeoff: is it more worthwhile to create a mini query language, go full natural language, or land somewhere in between?
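To make the "pre-generate and index" idea concrete, here is a minimal sketch in Python. It is not the Atlas code: the vocabulary lists, the question template, and the function names are all assumptions for illustration, and a toy inverted index stands in for a real full-text search engine.

```python
from collections import defaultdict
from itertools import product

# Hypothetical vocabulary; a real indexer would read these from the dataset.
COUNTRIES = ["germany", "italy", "france"]
PRODUCTS = ["wine", "cars"]
FLOWS = ["export", "import"]


def generate_questions():
    """Enumerate every (country, flow, product) combination as a canned question."""
    for country, flow, prod in product(COUNTRIES, FLOWS, PRODUCTS):
        text = f"how much {prod} does {country} {flow}"
        yield text, {"country": country, "flow": flow, "product": prod}


def build_index(questions):
    """Build a toy inverted index: token -> set of question ids."""
    index = defaultdict(set)
    for qid, (text, _params) in enumerate(questions):
        for token in text.split():
            index[token].add(qid)
    return index


def search(query, questions, index):
    """Rank canned questions by how many query tokens they contain."""
    scores = defaultdict(int)
    for token in query.lower().split():
        for qid in index.get(token, ()):
            scores[qid] += 1
    # Highest score first; ties fall back to generation order.
    ranked = sorted(scores, key=lambda qid: (-scores[qid], qid))
    return [questions[qid] for qid in ranked]


QUESTIONS = list(generate_questions())
INDEX = build_index(QUESTIONS)
```

Calling `search("germany export wine", QUESTIONS, INDEX)` ranks the matching canned question first, and its attached parameters can drive the actual data lookup. The "hardcoding" mentioned above is still there; it has just moved into the generator.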
...
Anyway, try it out by clicking on the title (keeping it a bit hidden for now for testing purposes): http://atlas.cid.harvard.edu/explore/tree_map/export/usa/all...
Things you can try (mix and match too!):
- "wine italy"
- "france"
- "germany spain"
- "germany export wine 2002 to 2012"
- "turkey feasible"
...
If you want to see the code, check out our github:
https://github.com/cid-harvard/atlas-economic-complexity/blo... (search view)
https://github.com/cid-harvard/atlas-economic-complexity/blo... (indexer)
Apologies for any mess; I recently joined and we're undergoing a huge overhaul right now.
a. have structured queries that can be precisely parsed
b. provide query completion to guide the user while entering the query
It turned out quite well, if I say so myself. But I never got the time to market it.
It can be seen in action here: http://nlq.lavadip.com/servlet/demo
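The two goals above (queries that parse precisely, plus completion while typing) can be sketched together. This is not the code behind the demo: the three-slot grammar, the word lists, and the function names are hypothetical, chosen only to show how one grammar can drive both the parser and the completer.

```python
# Hypothetical grammar: every query has the shape "<metric> of <entity>".
GRAMMAR = [
    ("metric", ["revenue", "profit", "population"]),
    ("keyword", ["of"]),
    ("entity", ["france", "germany", "ghana"]),
]


def complete(partial):
    """Return the tokens that could validly come next after `partial`."""
    tokens = partial.lower().split()
    # A trailing space (or empty input) means the user is starting a new token.
    starting_new = partial.endswith(" ") or not partial
    done, prefix = (tokens, "") if starting_new else (tokens[:-1], tokens[-1])
    if len(done) >= len(GRAMMAR):
        return []  # the query is already complete
    # Every finished token must match its slot, otherwise nothing completes.
    for token, (_slot, choices) in zip(done, GRAMMAR):
        if token not in choices:
            return []
    _slot, choices = GRAMMAR[len(done)]
    return [c for c in choices if c.startswith(prefix)]


def parse(query):
    """Parse a complete query into a dict, or return None if it is invalid."""
    tokens = query.lower().split()
    if len(tokens) != len(GRAMMAR):
        return None
    result = {}
    for token, (slot, choices) in zip(tokens, GRAMMAR):
        if token not in choices:
            return None
        if slot != "keyword":
            result[slot] = token
    return result
```

Because the completer only ever offers tokens the parser accepts, a user who follows the suggestions cannot produce an unparseable query.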
Not that it's a bad application; it's nice. It's just that if one extended this model, one would wind up with a query language, not something new.
This is a good point.
One thing that I'd claim is that NLP is less useful than you'd think for single queries. I'd just suggest the thought experiment of being able to ask a highly knowledgeable person one question with no follow-ups. Even someone with a mastery of English and the data probably won't start out with the same idioms and approaches, so you'd have to phrase the question carefully and exactly ... just the way you have to spend a lot of effort writing a SQL or similar query.
If you can interact with such a knowledgeable person, you'd learn their expression style and they'd learn yours - after some interaction, the exchange becomes very powerful. Until then, things are rather limited.
Small issue: you're calling Google fonts over HTTP, but your site uses HTTPS, so Chrome is blocking the request. This leads to your fonts rendering in the default (Times New Roman here).
[blocked] The page at 'https://www.pennywhale.com/' was loaded over HTTPS, but ran insecure content from 'http://fonts.googleapis.com/css?family=Montserrat:400,700': this content should also be loaded over HTTPS.
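For anyone wanting to catch this kind of mixed-content problem before deploying, here is a rough sketch using only Python's standard library. The class and function names are my own, and the list of tag/attribute pairs is an assumption that is far from exhaustive (for example, it will also flag `<link>` tags that are not stylesheets).

```python
from html.parser import HTMLParser


class InsecureResourceFinder(HTMLParser):
    """Collect sub-resource URLs that are loaded over plain HTTP."""

    # Tag/attribute pairs that load sub-resources (illustrative, not exhaustive).
    RESOURCE_ATTRS = {("link", "href"), ("script", "src"), ("img", "src")}

    def __init__(self):
        super().__init__()
        self.insecure = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if (tag, name) in self.RESOURCE_ATTRS and value and value.startswith("http://"):
                self.insecure.append(value)


def find_insecure_resources(html):
    """Return every http:// sub-resource URL found in the given HTML string."""
    finder = InsecureResourceFinder()
    finder.feed(html)
    return finder.insecure


def upgrade_to_https(url):
    """Rewrite an http:// URL to https://, leaving other URLs untouched."""
    return "https://" + url[len("http://"):] if url.startswith("http://") else url
```

Running `find_insecure_resources` over the page source would list the Montserrat stylesheet URL from the console message, and `upgrade_to_https` shows the one-line fix: request the same URL over HTTPS.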
Have to say, love the logo. Think it would drop into a favicon nicely too.
Congrats
- Found it a bit slow, but guess that's partially the HN effect.
- As per above, I'd make the queries AJAX rather than refreshing the whole page on every search; that should improve the UX a fair bit.
- I expected autopredict when I started to type $... to predict a ticker, would be nice to guess company names as well if no ticker is entered. Probably an essential feature for non-pros.
- You need a clear signup link. And the redirect when you run out of free credits should go to the signup page with a small link to log in, not the other way around.
- Also look at highcharts.com, they have a stock chart product as well. Hands down the best JS charts, mobile friendly, highly customisable etc.
- None of your query results have a date attached! While pretty, an earnings number without a date is like a chart without an axis.
- Just how natural can the queries be? "$AAPL sales estimates" did not return anything. Neither did "$CAT cost of capital". "quarterly $CAT earnings growth" gave me a single EBITDA number, i.e. not what I asked for. Is it easier to start with simple dropdowns?
- Why the stocktwits $ prefix? Why can't a company name be used?
- Another "sad face" feature - every not found/no results query takes away a "free query" :(
On a more positive note, things look really clean - good job on the design! A great start. Is the long-term idea here to build an accessible place for fundamental data research?
What added value are you providing in your premium product (aside from the heatmap) over just going to the EDGAR archive?
Ping me directly - would love to provide more feedback!
Edit: fix formatting :)
Aside from an as-of date on each datapoint, ideally you should provide attribution as well (anyone who has worked in finance knows that different data sources return different data for the same query at any given time :)
This is really cool. One idea is it would be great if you could interface with EDGAR and gather the links to the various financial statements. The EDGAR database search/navigation is horrendous.
As another commenter said, it would be great if it could detect the company name or at least the ticker without the $.
Also, a failed query subtracts from the guest query allotment. That doesn't seem ideal as I'm first learning how to phrase my requests.
Lastly, mutual fund data would be awesome. Morningstar rating, fund performance, fund assets, etc.
We're definitely looking into interfacing with EDGAR as well as getting rid of the 'cashtag'. Mutual fund data is a great idea too - shouldn't be too difficult to tie in.
I get:
"We're sorry, but something went wrong.
If you are the application owner check the logs for more information."
Is this due to the usual HN overload, or did I hit a bad spot?
This looks extremely interesting!
Would that be too hard to implement? I would love being able to pull ratios out of (almost) any arbitrary data.
I can already do this stuff in it (free) and just use the built in GoogleFinance calls (and their limit is 1000 per sheet).
Plus I have all the other advantages of a spreadsheet program.
=GoogleFinance("GOOG", "price")
may not be all that complicated. Plus it helps that it's free. Examples:
Function list - https://docs.google.com/spreadsheet/pub?key=0Ault2FD3uBwydEV...
Watch list - https://docs.google.com/spreadsheet/pub?key=0Ault2FD3uBwydG5...
Watchlist + holdings - https://docs.google.com/spreadsheet/pub?key=0Ault2FD3uBwydDl...
From what I've seen, people in the financial services industry have gotten very good at using the Bloomberg Terminal commands to examine this kind of data.
https://www.pennywhale.com/app/queries/execute?utf8=%E2%9C%9...
Perhaps check out http://www.premise.com/ or https://kensho.com/
Can I spin off a search and get pinged on updates?
Love to learn more.
We'd love to chat more and get some feedback! My email is jay@pennywhale.com
Two bugs: "$brkb book value per share" gives a number that's way off. "$brk.a book value per share" and "$_x EV/EBITDA" yield apparent server side errors.
What's a real question like? Here are examples which your system doesn't even try to handle.
* companies in S&P500 with P/E less than 15
* companies in S&P500 with more than 20% of their revenue coming from China
* $TSLA Beta with technology sector (returns the wrong answer).
* https://www.pennywhale.com/app/queries/execute?utf8=%E2%9C%9... (this produced a serious error on your server)
If you only want to answer trivial questions, your product will fail. Nothing you've done isn't done better by Google RIGHT NOW.
Not trying to discourage you but this just doesn't solve any existing problem. All you have is a pretty interface to a tool that does one thing poorly.
If you can actually solve 15% of the real problem of converting natural language questions into financial queries and return accurate, well-formatted data, you will have a serious product. Right now you are nowhere close.
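To show how narrow the easy end of this problem is, here is a toy sketch that handles only the first screening question listed above ("with P/E less than 15"). Everything in it is an assumption for illustration: the rows are made-up placeholders rather than market data, and the phrase tables and `screen` function are hypothetical.

```python
import operator
import re

# Made-up rows purely for illustration; not real market data.
COMPANIES = [
    {"ticker": "AAA", "pe": 12.0},
    {"ticker": "BBB", "pe": 22.5},
    {"ticker": "CCC", "pe": 14.9},
]

# Hypothetical mapping from phrases to row fields and comparison operators.
METRICS = {"p/e": "pe"}
COMPARATORS = {"less than": operator.lt, "more than": operator.gt}

PATTERN = re.compile(
    r"with (?P<metric>\S+) (?P<cmp>less than|more than) (?P<value>\d+(?:\.\d+)?)",
    re.IGNORECASE,
)


def screen(query, rows=COMPANIES):
    """Return tickers matching a 'with <metric> less/more than <n>' clause."""
    match = PATTERN.search(query)
    if match is None or match["metric"].lower() not in METRICS:
        return None  # the query is outside this tiny grammar
    field = METRICS[match["metric"].lower()]
    compare = COMPARATORS[match["cmp"].lower()]
    threshold = float(match["value"])
    return [row["ticker"] for row in rows if compare(row[field], threshold)]
```

The P/E question parses, but the "more than 20% of their revenue coming from China" question already falls outside the pattern and returns `None`, which is roughly the gap between a demo and the real problem.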
PennyWhale lets you search, manipulate, and compare financial data with natural language queries along the lines of "how much cash does $GOOG have" and "show me the PE ratio for $AAPL".
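As a rough illustration of what handling such queries involves, the sketch below pulls the cashtags and a metric phrase out of a query string. It is not PennyWhale's implementation, which isn't public as far as this thread shows; the phrase table, the cashtag pattern, and the `parse_query` name are all assumptions.

```python
import re

# Hypothetical phrase-to-field mapping; the real product's vocabulary is unknown.
METRIC_PHRASES = {
    "cash": "cash",
    "pe ratio": "pe_ratio",
    "book value per share": "book_value_per_share",
}

# A cashtag is "$" followed by letters, optionally with a ".X" class suffix.
CASHTAG = re.compile(r"\$([A-Za-z]+(?:\.[A-Za-z]+)?)")


def parse_query(query):
    """Extract cashtags and the longest recognised metric phrase from a query."""
    tickers = [t.upper() for t in CASHTAG.findall(query)]
    lowered = query.lower()
    # Try longer phrases first so the most specific metric wins.
    by_length = sorted(METRIC_PHRASES.items(), key=lambda kv: -len(kv[0]))
    metric = next((field for phrase, field in by_length if phrase in lowered), None)
    return {"tickers": tickers, "metric": metric}
```

For example, `parse_query("how much cash does $GOOG have")` yields the ticker `GOOG` and the `cash` field. A query with two cashtags returns both tickers, which is the starting point for a comparison.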
Seems like a really good research tool!
how much cash does $APPL have vs $GOOG