>>> r = Recurrence('June-July 2014').intersect(Recurrence('Monday to Friday'))
>>> datetime.date(2014, 6, 13) in r
True
>>> datetime.date(2014, 6, 14) in r
False
If anyone is interested in hacking on it with me, send me an email and I'll try to get it up on GitHub.
It didn't understand when I typed `afternon`. As an improvement, it could try picking the most likely word for a misspelling.
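The misspelling idea can be sketched with Python's standard `difflib`. This is only an illustration: the vocabulary and the helper names `correct_word` and `correct_phrase` are made up here, and nothing suggests Duckling does this.

```python
import difflib

# Hypothetical vocabulary; a real parser would use its own list of known tokens.
KNOWN_WORDS = ["morning", "afternoon", "evening", "night",
               "today", "tomorrow", "yesterday"]

def correct_word(word, cutoff=0.8):
    """Return the closest known word, or the input unchanged if nothing is close."""
    matches = difflib.get_close_matches(word.lower(), KNOWN_WORDS, n=1, cutoff=cutoff)
    return matches[0] if matches else word

def correct_phrase(phrase):
    """Apply correct_word to each whitespace-separated token."""
    return " ".join(correct_word(token) for token in phrase.split())

print(correct_phrase("tomorrow afternon at 6"))
```

A fairly high cutoff keeps short words like "at" from being "corrected" into something unrelated.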
On iOS, "tomorrow at 6" is parsed as 6PM instead of 6AM. This makes sense because usually people really mean 6PM. This is context dependent--in chat logs etc this is desirable.
Semantically speaking, the Duckling library does the right thing by parsing it as 6AM, but if the goal is ultimately to parse human expressions, then the iOS approach is probably better.
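One way to picture the 6AM vs 6PM choice is as a flag the calling application sets. A minimal sketch, assuming hypothetical helpers `resolve_ambiguous_hour` and `tomorrow_at` (neither is part of Duckling or iOS):

```python
import datetime

def resolve_ambiguous_hour(hour, prefer_pm=True):
    """Map a bare 1-12 clock hour to a 24-hour value."""
    if not 1 <= hour <= 12:
        raise ValueError("expected a 12-hour clock value")
    if prefer_pm:
        return 12 if hour == 12 else hour + 12  # 12 -> noon, 6 -> 18
    return 0 if hour == 12 else hour            # 12 -> midnight, 6 -> 6

def tomorrow_at(hour, now, prefer_pm=True):
    """Resolve 'tomorrow at <hour>' relative to the reference time `now`."""
    day = now.date() + datetime.timedelta(days=1)
    return datetime.datetime.combine(
        day, datetime.time(resolve_ambiguous_hour(hour, prefer_pm)))

now = datetime.datetime(2014, 10, 2, 9, 0)
print(tomorrow_at(6, now))                   # chat-style reading
print(tomorrow_at(6, now, prefer_pm=False))  # literal reading
```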
EDIT:
Another issue I ran into is that it correctly parses:
"tomorrow evening at 6"
but fails with:
"see you tomorrow evening at 6"
It would be nice to pass it the entire sentence since that's how most people will intend to use it.
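Passing the whole sentence amounts to searching for a time expression anywhere in the text instead of matching the full string. A toy sketch of that idea follows; the regular expression is an assumption made up for illustration and has nothing to do with Duckling's actual grammar.

```python
import re

# Toy pattern: a day word, an optional part of day, an optional "at <hour>".
PATTERN = re.compile(
    r"\b(today|tomorrow|yesterday)"
    r"(?:\s+(morning|afternoon|evening|night))?"
    r"(?:\s+at\s+(\d{1,2}))?\b",
    re.IGNORECASE,
)

def find_time_expressions(sentence):
    """Return (start, end, text) for every match found inside the sentence."""
    return [(m.start(), m.end(), m.group(0)) for m in PATTERN.finditer(sentence)]

print(find_time_expressions("see you tomorrow evening at 6"))
```

Returning character offsets lets the caller highlight or strip the matched span.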
From the Limitations section:
> ... we only display the closest upcoming time, if any, or the closest past time otherwise. It can result in surprising outcomes, like “one year after Christmas” will be actually analyzed as “one year after last Christmas”
So this could be the interaction:
> User: "one year after Christmas"
> Computer: "OK, one year after last Christmas" // putting emphasis on what could be ambiguous
> User: "no, after next Christmas" // the application expected that next vs last could be ambiguous, so this is understood correctly
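For an application that wants to offer that kind of clarification, the two readings are cheap to compute side by side. A rough sketch, with hypothetical function names and no claim about how Duckling resolves this internally:

```python
import datetime

def christmas_candidates(today):
    """Return (closest past Christmas, closest upcoming Christmas) for `today`."""
    this_year = datetime.date(today.year, 12, 25)
    if this_year >= today:
        return datetime.date(today.year - 1, 12, 25), this_year
    return this_year, datetime.date(today.year + 1, 12, 25)

def one_year_after(day):
    """Shift a date by one calendar year (safe here: 25 December exists every year)."""
    return day.replace(year=day.year + 1)

today = datetime.date(2014, 10, 2)
past, upcoming = christmas_candidates(today)
print("after last Christmas:", one_year_after(past))
print("after next Christmas:", one_year_after(upcoming))
```

The application can show the first reading and keep the second ready in case the user corrects it.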
EDIT regarding 6pm vs 6am, applications will have control over the assumptions made by the system. See https://news.ycombinator.com/item?id=8397113
I also like that this project was attempted by the layman (no offense intended). I feel that a lot of academic projects have this "if you haven't been studying ngrams for 20 years don't bother" feel to them, and people end up handwaving "somebody smart thought of this" instead of seeking to deeply understand. That kind of thinking reduces new thought in a given field.
Will be using the library in my personal projects for sure, extra points for using Clojure (in my book), as I've been recently learning about it and getting into it.
This is probably not the case everywhere, which is why duckling uses next and this interchangeably.
I actually bumped up against legacy time/date issues while working on SSL cert parsing. An old Perl interpreter's 32-bit limits kept resetting my dates! Rather than upgrade perl or my architecture, I wrote my own perl methods to calculate infinite time (sorta?) on 32-bit systems with old perls.
For those that haven't worked with date parsing before: timezones are surprisingly complex, leap years are stupid, daylight savings is really stupid, and leap seconds are impossible without a regularly updated leap second database (similar to timezones, but worse). (The math to calculate dates correctly is rather simple, but you need to be pretty good at math to optimize it) https://github.com/psypete/public-bin/blob/public-bin/src/ne...
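To give a feel for how simple the core date math is, here is a small integer-only sketch of the Gregorian leap-year rule and a day count from 1970-01-01. It is written in Python for illustration and is not taken from the linked Perl code; the function names are made up, and it ignores timezones and leap seconds entirely.

```python
def is_leap(year):
    """Gregorian rule: every 4th year, except centuries not divisible by 400."""
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)

def leap_days_through(year):
    """Number of leap days in years 1..year (proleptic Gregorian calendar)."""
    return year // 4 - year // 100 + year // 400

def days_since_epoch(year, month, day):
    """Days between 1970-01-01 and the given date, using only integer arithmetic."""
    month_lengths = [31, 29 if is_leap(year) else 28, 31, 30, 31, 30,
                     31, 31, 30, 31, 30, 31]
    whole_years = (year - 1970) * 365
    leap_days = leap_days_through(year - 1) - leap_days_through(1969)
    return whole_years + leap_days + sum(month_lengths[:month - 1]) + (day - 1)

# 2038-01-19 is the day a signed 32-bit seconds counter overflows.
print(days_since_epoch(2038, 1, 19))
```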
Feedback: I shared it with a friend and his reaction was "bah, it doesn't even work with the example suggested".
Meaning, he saw the placeholder and pressed enter.
It could at least detect that the input is empty instead of saying it failed to parse the input!
As an aside, I noticed it was renamed from "Picsou" (https://github.com/wit-ai/duckling/commit/0d9f666ae4da114803...)
Were you worried about getting scrooged by Disney? :)
The original name was Picsou (Uncle Scrooge's name in French) because the parsing strategy is super greedy. We liked the name, but when we decided to open source it we thought it may be hard to pronounce, so we switched to Duckling (keeping the duck link...).
tomorrow at three thirty people are coming over
you may be a half hour late.
[1] https://wit.ai
It can be:
tomorrow at three thirty, people are coming over
or:
tomorrow at three, thirty people are coming over
I have looked at SUTime, HeidelTime, natty and some others. I am trying to parse (among others) expressions of the type "the first week of the previous month" or "the last week of September". The only library that can parse this type of query is SUTime.
Can you comment on why you implemented a home-grown solution instead of using SUTime or some other readily available library? Have you measured the performance of Duckling against the state of the art in temporal tagging?
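For expressions like "the first week of the previous month", the date arithmetic is small once a reading is chosen; the hard part is the parsing. A minimal sketch, assuming "first week" means the seven days starting on the 1st (the first Monday-to-Sunday week would be an equally plausible reading). The function name is hypothetical and this is not how SUTime or Duckling compute it.

```python
import datetime

def first_week_of_previous_month(today):
    """Return (start, end) with `end` exclusive, under the 'first 7 days' reading."""
    first_of_this_month = today.replace(day=1)
    # Stepping back one day always lands in the previous month, even in January.
    last_of_previous = first_of_this_month - datetime.timedelta(days=1)
    start = last_of_previous.replace(day=1)
    return start, start + datetime.timedelta(days=7)

print(first_week_of_previous_month(datetime.date(2014, 10, 2)))
```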
Duckling seems very well made with good docs but unfortunately for me will be hard to make work on Android.
- To my knowledge SUTime only supports English
- We wanted something that's easy to extend. SUTime is somewhat hard to extend, especially if you are not into Java
- We needed not just temporal expressions, but also monetary data, temperatures, quantities...
That being said, Duckling is still young and certainly not as proven as SUTime yet.
"ie." means "that is" (as in "restating...")
"eg." means "example".
I wrote it to parse dates and times in news articles and blog posts. Still a work-in-progress, but someone might find it useful!
We've mostly used it on short sentences, but it should work on larger inputs, like articles. I'd recommend splitting very large inputs into sentences though.
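Splitting a long input into sentences before parsing could look roughly like this. It is a naive sketch: the regex breaks on sentence-ending punctuation followed by whitespace, so abbreviations like "Dr." will be split incorrectly, and `split_sentences` is a made-up helper, not part of Duckling.

```python
import re

def split_sentences(text):
    """Split after '.', '!' or '?' when followed by whitespace; drop empty pieces."""
    pieces = re.split(r"(?<=[.!?])\s+", text.strip())
    return [piece for piece in pieces if piece]

article = "The meeting is tomorrow at 6. Bring the report! Can you come on Friday?"
for sentence in split_sentences(article):
    print(sentence)  # each sentence would then be handed to the parser separately
```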
Would it be possible to port this into JavaScript using ClojureScript and use it on the client side?
Unless you are checking my IP address to guess the best convention...
You can also try the same sentence in French, you'll get February first.
What surprised me was "1-2-2014":
From Thursday, 2 October 2014 at 1:02:00 +0000 (UTC)
to Wednesday, 1 January 2014 at 0:00:00 +0000 (UTC)
On top of the "where did it get those timestamps from", time flows backwards in that interval.
Oh yes, loads of them. Lots more than use m/d/y anyway. See https://en.wikipedia.org/wiki/Date_format_by_country
Canada looks the most hellish, eg: "Immigration Canada Stamps use DD/MM/YYYY and Canada Customs Stamps use MM/DD/YYYY." eek!
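The day/month ordering problem is easy to see with the standard library: the same string gives two different dates depending on which convention the caller picks. A small sketch, where the `FORMATS` table and `parse_numeric_date` are names made up for illustration:

```python
import datetime

# Hypothetical convention table; a real application would derive this from locale.
FORMATS = {
    "mdy": "%m-%d-%Y",  # month first, as in the US
    "dmy": "%d-%m-%Y",  # day first, as in France and many other countries
}

def parse_numeric_date(text, convention):
    """Parse a numeric date like '1-2-2014' under an explicit convention."""
    return datetime.datetime.strptime(text, FORMATS[convention]).date()

print(parse_numeric_date("1-2-2014", "mdy"))
print(parse_numeric_date("1-2-2014", "dmy"))
```

Making the convention an explicit argument avoids guessing from the input alone.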