In 2014, I saw a demo of the original Discovery Advisor, which was at the time the closest commercial equivalent to the "Jeopardy system." This demo took in Wikipedia as a corpus, and a question was asked: "what country produced the greatest amount of wheat in 2012?" The system returned a list of countries as answers, so it wasn't quite nonsensical, but it was clear the answers were incorrect. The answers were countries like "England," "Norway," or "Zimbabwe." This system also returned passages from Wikipedia as supporting evidence, but the passages weren't about wheat production. Instead, they were about quotes that contained the word wheat... such as "let's cut the wheat from the chaff."
So of course, some smart-alec in the room Googles the same question, and this was before Google had the ability to return factual answers to factual questions, so instead we got a list of web results. The top result, interestingly, was a Wikipedia article titled "Wheat Production by Country." Opening that article presented a table that clearly showed that China produced the greatest amount of wheat in 2012.
Unfortunately, that Watson system at the time didn't read information from tables. I'm not sure if it does now, but I do know that reading data from tables in a manner that can be easily integrated and scaled within a broader semantic processing system is quite difficult. I'm not as focused on the space as I once was, so I'm not sure if the problem has been well solved yet. If not, I'd say it's a worthy area to invest in a solution.
I saw a presentation on this paper at SIGKDD this year. https://dl.acm.org/doi/10.1145/3394486.3406468 "Multi-modal Information Extraction from Text, Semi-structured, and Tabular Data on the Web"
Google's TAPAS system deals with natural language queries on tabular data:
https://ai.googleblog.com/2020/04/using-neural-networks-to-f...
There are other strands of research too - just finding which tables are relevant to a query is a real problem.
what country produced the greatest amount of wheat in 2012
If you ask the suggested query
country produced most wheat
you do get the table you talk about.
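Once the right page is found, getting the rows out of the table is the easy part. Here is a minimal sketch that uses only Python's standard library and assumes a simple, non-nested table. The `TableParser` class, the `read_tables` helper, and the sample HTML are all hypothetical names and made-up placeholder values for illustration, not real production figures.

```python
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collects the rows of every <table> in an HTML document.

    Hypothetical helper for illustration; nested tables are not handled.
    """

    def __init__(self):
        super().__init__()
        self.tables = []   # each table is a list of rows (lists of cell strings)
        self._row = None   # cells of the row being read, or None outside a row
        self._cell = None  # text pieces of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that appears inside a cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.tables[-1].append(self._row)
            self._row = None


def read_tables(html):
    """Return all tables found in the given HTML string."""
    parser = TableParser()
    parser.feed(html)
    return parser.tables


# Made-up sample page: the names and numbers are placeholders, not real data.
SAMPLE_HTML = """
<table>
  <tr><th>Country</th><th>Production</th></tr>
  <tr><td>Country A</td><td>10</td></tr>
  <tr><td>Country B</td><td>30</td></tr>
  <tr><td>Country C</td><td>20</td></tr>
</table>
"""

# First row is the header; the rest are data rows.
header, *rows = read_tables(SAMPLE_HTML)[0]

# Pick the row with the largest value in the second column.
top = max(rows, key=lambda r: float(r[1]))
print(top[0])
```

The hard part described above, deciding which table on which page answers the question and what its columns mean, is not addressed by this sketch at all.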
In R you can read data from tables like this:
df<-htmltab::htmltab("http://en.wikipedia.org/wiki/Upper_Peninsula_of_Michigan",3)
In Google Sheets: =ImportHtml("http://en.wikipedia.org/wiki/Upper_Peninsula_of_Michigan","table",3)
In Python + pandas: df = pandas.read_html('https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M', header=0)[0]
I also don't understand the incentive for users to contribute to a knowledge base that is then being sold: https://golden.com/pricing
This was also why I could never really see this becoming a serious product. It’s an SEO trick for ICOs looking to hype themselves, a far cry from a knowledge base. Mismatched incentives will screw up the value prop.
But in general, Golden seems to have a very strong tech bias - which is fine, but limits what it can be used for.
One of the quotes from the home page - supposedly written by a person who's tried it out - is as follows:
"18 years later, a startup to take on @Wikipedia --> @golden" [0]
This really confuses me. What exactly would it mean to take on Wikipedia in a meaningful sense? You'd have to build a community like Wikipedia's, with the same kind of values and ethics governing it, then you'd have to get the novel software (the "AI" thing they claim to be using) implemented in a way that neither violates said ethics nor lowers the quality of the information.
Seems like it'd be much better just to work towards putting the AI into use at Wikipedia. Whatever innovation does end up resulting from this company, it will likely get lost if/when it gets acquired, so in the end few if any people will be able to enjoy its benefits.
When looking at Golden's value prop, it becomes clear that Google has actually been somewhat, ehm, lazy when it comes to making search better, relying almost 100% on UGC to provide answers instead of trying to structure them in a concise way.
Very curious to see where this leads!
Not saying what you described doesn’t happen, but both are problems.
Looking through Golden's website, they seem to want to do all of the same, but using their own (also user-contributed) content, aiming to make it accessible and valuable enough that companies will pay $1000 per month per employee (!) for it. I know almost nothing about the product so will hold off on too much judgement, but that sounds like a pipe dream.
Well they took VC. So it won't lead anywhere interesting other than value enclosure and exit to surveillance capitalism.
For example, on the top topic, AI, most of the sections are unfilled or lightweight, aside from a list of companies, with minimal contributions over the last few months: https://golden.com/wiki/Cluster%3A_Artificial_intelligence-J...
There is nothing on AI ethics (indeed this is the only thing on ethics: https://golden.com/wiki/Ethics-BJW8)
Prompted by some comments above suggesting that crypto might be the angle, I looked at a few articles. Under Ethereum it says that the Constantinople release date is still to be determined (the actual release date was 28th February 2019).
It seems there is a decent amount of company data in there (a la PitchBook, Crunchbase), but in terms of practical, useful knowledge that is authoritative, as the front page claims, what are some good examples in there now?
[1] https://medium.com/bloated-mvp/golden-is-a-bloated-mvp-27971...
> our vision to build an extensive database and graph of knowledge for humanity, including practical commercial tools and community features to aid discovery and decisions.
So you, and your, what, half dozen PhDs, want to produce a graph of human knowledge? Okay, fine. That's a lofty goal; let's set you up like Hari Seldon and check in on you in a thousand years. Oh! You're going to do the almost impossible and have practical commercial tools in our lifetime. OK.
Look, I'm not saying that Golden is going to be unsuccessful. It's probably going to be very successful; they've got those guys that backed that misogynistic online frat house behind them, so there's a certain level of assured success. I just question why blatantly lying about your goals is a prerequisite for funding in Silicon Valley.
This is taught in business school. It’s not a Silicon Valley thing.
Now what, without marketing speak, does it do?
Perhaps their CEO will have the moral fiber to sign a commitment guaranteeing that a nontrivial share (e.g. a double-digit percentage) of annual expenses (not earnings; I'd wager this is a long way from generating profit) goes directly to the open source database projects they draw from. I'd wager not.
Searched for Falcon, apparently it's a company in the AI industry, has a website falcon.com.cn and is a genus of birds.
Also searched "Apple", got the company, good knowledge base. The fruit Apple is a page that says it's a fruit tree, with the CEO Tim Cook, former CEO Tim Cook, and Timothy Cook.
It seems to just be completely wrong, minus maybe 100 articles.
I feel like basic encyclopedia information should have at least been pre-populated.
A search for Merkel delivers an arms company before Angela Merkel, GDP delivers some companies.
Relationships between persons don't seem to be present.
A search for Mercedes Benz doesn't deliver anything too great. Snowden requires an Edward to find him, NSA is a company, Chrome has no info.
Maybe I'm looking for the wrong terms, but it seems like they basically just imported companies from public domain and some random stuff on the side, which mostly is just the title of something.
It'll end in a sell-and-bury exactly as Freebase did, for exactly the same reason: venture capital + knowledge service = only one possible eventual outcome. It's always just a matter of time before the money corrupts the service. The demand for an exit / return (outsized at that, typically) by the owners who have put up a large amount of money forces the matter. Now that big venture capital controls them, they have to pursue revenue and profit as their long-term primary goal for existing, rather than knowledge being at the center of the mission (initially they'll pretend knowledge is at the center of their mission, that will pivot as the return pressure builds on them over time).
When's the IPO? But but but we're a knowledge service, we're here to help humanity. Where's my return? When do I get a 1,000% return on my $10m? But but but we're a knowledge service, we just want to spread knowledge for the betterment of all. Breaking news, July 2024: Golden purchased by Verizon Media [insert big corporate swamp monster here] for $586 million in a fire sale. July 2026, Verizon Media quietly buries Golden.
Andreessen in particular seems bent on driving as many interesting knowledge concepts into the ground as he can. His magic knowledge service touch was all over Rap Genius as well (with dreams of annotate-everything going back to the Netscape days [1]).
There hasn't been a single prominent knowledge service in the history of the Web that has escaped destruction once they've taken big venture capital, except for Stack Exchange and they're starting to teeter on the edge where the owners start to push it in a way that begins the rolling corruption phase (with Stack that inevitable process was delayed for a long time by the influence of its founders and the decisions they made, but eventually papa VC wants his fat return).
The only for-profit knowledge services that survive with their soul intact are slim independent operations like wikiHow that are not commanded by venture capital and the never-ending need to force an exit.
[1] https://genius.com/Marc-andreessen-why-andreessen-horowitz-i...
"But that's just the start. It turns out that Rap Genius has a much bigger idea and a much broader mission than that. Which is: Generalize out to many other categories of text... annotate the world... be the knowledge about the knowledge... create the Internet Talmud."
"Back in 1993, when Eric Bina and I were first building Mosaic, it seemed obvious to us that users would want to annotate all text on the web"
Bullshit.
But all that being said, there's always at least a chance that the organization somehow bucks the trend. Or, even if the organization eventually becomes dominated by the profit motive in the long run, that's not to say that it won't build really beneficial things before that happens. Freebase was eventually sold and stopped being maintained, but it built a free database that anyone in the world could use (and still can use). It pioneered a concept. I don't know what Rap Genius is up to these days, but I thought their annotation UX was really innovative and I'm sure it paved the way for a whole lot of other sites. So even if an organization's mission eventually takes a back seat to profit, it can create a ton of value along the way.
Personally I find this startup very interesting and am excited to see where they go.
I agree about the corrupting influence of VC. The following isn't a super popular opinion on HN lately, but this is exactly why I've been a believer in Medium since they launched their subscription service. It's the rare startup where I could see their financial incentives and also think those incentives would be good for me as a reader. They made the knowledge the product and removed the incentive to use the knowledge as a sales pitch for some other product, i.e. content marketing. And they have to constantly push for articles that qualify as subscription worthy. That means focus on quality. I don't think they've tipped over yet, but what I've seen so far is that the more subscribers Medium gets the more they spend to get better and better articles. And as the payouts to authors get better, better authors come on board.
If Golden had a harder-to-type domain name, would they get the same level of momentum and SEO juice?
Is there any data on this?