If you just want a bash script to look things up on Wikipedia, you can always use something like
function wp { curl -sL --compressed "http://en.wikipedia.org/wiki/$(echo "$@" | tr ' ' '_')" | html2text; }
which will work for basic queries (it does no URL encoding, and words must be properly capitalized).
A full api reference is here (http://en.wikipedia.org/w/api.php).
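For anyone who wants the URL encoding handled, here is a minimal Python sketch that only builds the URLs and does not fetch anything. The helper names `article_url` and `extract_api_url` are made up for illustration, and the `prop=extracts` / `explaintext` parameters assume the wiki has the TextExtracts extension enabled (English Wikipedia does, as far as I know).

```python
from urllib.parse import quote, urlencode

# Endpoint taken from the API reference link above.
API_ENDPOINT = "http://en.wikipedia.org/w/api.php"


def article_url(query):
    """Build a /wiki/ page URL: spaces become underscores, the rest is percent-encoded."""
    title = "_".join(query.split())
    return "http://en.wikipedia.org/wiki/" + quote(title)


def extract_api_url(query):
    """Build an API URL asking for a plain-text extract of the page (hypothetical helper)."""
    params = {
        "action": "query",
        "format": "json",
        "prop": "extracts",   # assumes the TextExtracts extension is available
        "explaintext": 1,     # plain text instead of HTML
        "redirects": 1,       # follow redirects on the server side
        "titles": query,
    }
    return API_ENDPOINT + "?" + urlencode(params)
```

Either URL can then be handed to curl or any HTTP client. Capitalization still matters for everything after the first letter of the title.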
Thanks for bringing this to my attention. I've added a disclaimer to the GitHub page regarding Pywikipediabot and plan to make changes to fully comply with MediaWiki API etiquette. The last thing I'd want to do is inadvertently cause problems for the site or foundation.
The Wikipedia data is hosted here: https://bigquery.cloud.google.com/table/publicdata:samples.w...
Here is a sample query, searching for all articles that start with Positive:
SELECT id,title FROM [publicdata:samples.wikipedia] WHERE (REGEXP_MATCH(title,r'^Positive*')) LIMIT 10
Query complete (2.0s elapsed, 9.13 GB processed)
1| 464347| Positive airway pressure
2| 10008223| Positive behavior support
3| 464347| Positive airway pressure
4| 1354851| Positivism in Poland
5| 1023857| Positive set theory
6| 5154273| Positivism dispute
7| 2871407| Positivism
8| 17179765| Positive psychological capital
9| 9033239| Positive Action Group
10| 4163012| Positive K
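A note on the pattern in that query: in most regex dialects `e*` means "zero or more e", so `^Positive*` also matches titles that merely begin with "Positiv", which would explain the "Positivism" rows. Here is a small sketch using Python's `re` module, on the assumption that it behaves like BigQuery's regex engine for a pattern this simple. The title list is illustrative only.

```python
import re

# Illustrative titles, not pulled from the actual table.
titles = ["Positive set theory", "Positivism", "Positive K", "Positron"]

loose = re.compile(r"^Positive*")   # 'e*' is zero or more 'e', so "Positiv" is enough
strict = re.compile(r"^Positive")   # requires the full word "Positive"

loose_hits = [t for t in titles if loose.search(t)]
strict_hits = [t for t in titles if strict.search(t)]

print(loose_hits)
print(strict_hits)
```

Dropping the trailing `*` should restrict the results to titles that really start with "Positive".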
Here is the Python API documentation: https://developers.google.com/api-client-library/python/
Are there any APIs for other languages? I tried to query using Unicode strings and it worked, but I only got English content.
Disclaimer: I'm the main author, and there are other implementations, but this seems to have become the most popular one.
- https://github.com/dcramer/py-wikimarkup (converts wikitext to HTML using Python; you would need to extract the text with BeautifulSoup or something)
- http://wiki.eclipse.org/Mylyn/Incubator/WikiText (also to HTML, but in Java)
- https://github.com/earwig/mwparserfromhell
I'm sure if you did a bit more digging you could find a C# library that does this, or you could roll your own pretty easily using the others as a model.
So, I wrote one that was dead simple, but have only implemented the API features I needed, which is mostly related to deleting things, as I use it for mass spam deletion. If you're interested, I'll upload it somewhere, but like I said, it's very incomplete.
This is Hacker News at its best. Highlighting creation.
Interesting to note that this was submitted a few days ago and received only 3 points - I mention this because this is the sort of thing that should have made the front page the first time it was posted!
I'm thinking of writing a weekly post that highlights the things people created and posted to Hacker News only to fall on deaf ears (thankfully this is not the case with this post!).
In fact I'm going to go and write it now (and create!).
Edit 1: If anyone is on Medium, here's my draft.
https://medium.com/p/e394f6d917d3?kme=collabEmail.clicked&km...
Edit 2: And while I have the chance to keep something from falling on deaf ears, here's something I just wrote that would be interesting to anyone who's annoyed with recruiters and would rather work at SpaceX than Snapchat.
https://medium.com/p/de5c73174a4e?kme=collabEmail.clicked&km...
Also, I wistfully disagree with you about the "highlighting creation" bit. I mean, FSM knows I want to agree with you but my experience so far says otherwise. In all the time I've spent on HN (most of it as a lurker) I've found HN to be quite snobbish about the show-and-tell attempts.
Then again, maybe I am experiencing sour grapes since my own Show HN posts seem to disappear rapidly even before I can say, "Hey HN, loo-"...
:(
I'm thinking of creating an HN spin-off for young, upcoming devs to do a Show-And-Tell about their recent attempts at learning/developing. Heck I've been dying to give discourse ~~(the django-based discussions platform)~~ a try, maybe I'll finally get around to it now. In fact, I'm going to go and write it now... (Sorry, couldn't resist. ;) Not meant as a dig.)
Question is, should I do a Show HN, when it is done? :P
EDIT: Turns out discourse is rails-based, not django-based. Still gonna give it a try, I guess... :(
I'll work in some changes tonight. Let's start with PEP8, shall we? :)
Also, if you have a better way of doing it, please totally fork it and send a pull request!
To engineers who would like to use this library, I would give a caveat that there are way more API actions than the ones enumerated here. So if you're looking to make some contributions to a project, this one is rife with possible pull requests.
In terms of article access and analysis, I'd recommend looking at Pattern (https://github.com/clips/pattern) before starting with this library. Not only do you get access to the rest of Pattern's IR/text analysis capabilities, but the approach in Pattern is written to support any site built off of MediaWiki, and not just English Wikipedia. This means not only foreign-language Wikipedia instances, but all 350k wikis on Wikia, and Project Gutenberg (which, interestingly enough, runs MW 1.13).