Don't get me wrong: in terms of deployability and flexibility for production use, TensorFlow/TFLite is really good, especially compared to other frameworks. But Google tends to significantly oversell the abilities of open-source TensorFlow in its marketing material, and you only find out when you go and try it yourself.
The reality is more that TensorFlow is the only option you have if you don't want to build everything from scratch. Whether that's a good or bad thing, at least it's because TensorFlow is actually a good product, not because Google is preventing others from building their own or pushing them down.
Neural nets usually thrive on raw, high-dimensional inputs, so dramatically reducing the dimensionality of the input seems like a strange decision. I'm sure it improves speed, but I would expect higher accuracy from processing the raw input.
But how is it that we have RNN solutions for handwriting when we don't even have a standard, canned RNN for OCR?
I know tesseract and related projects exist, but when I've tried them they have been fairly brittle with lower accuracy than I was expecting. Accuracy was especially problematic for letter combinations like "-ing" that would consistently be recognized as "-mg".
Is there a good ML OCR library I'm missing?
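Confusions like "-ing" → "-mg" are fixable after the fact with a dictionary pass, which is roughly what I'd expect a less brittle library to do internally. A minimal sketch of that idea (the word list and function name are mine, not from any particular library):

```python
import re

# Hypothetical post-correction pass for a common OCR confusion:
# "-mg" appearing where "-ing" was written. A real system would use
# a language model; this sketch just checks a tiny known-word list.
KNOWN_WORDS = {"running", "writing", "testing"}

def correct_mg(text):
    # Replace a trailing "mg" with "ing" only when that yields a known word,
    # so legitimate strings like "5 mg" are left alone.
    def fix(match):
        word = match.group(0)
        candidate = word[:-2] + "ing"
        return candidate if candidate in KNOWN_WORDS else word
    return re.sub(r"\b\w+mg\b", fix, text)

print(correct_mg("runnmg a testmg pass on 5 mg"))
# → running a testing pass on 5 mg
```

The `\b\w+mg\b` pattern requires at least one character before "mg", so the unit "mg" on its own never gets rewritten.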
You do not have that for "ing", so the software does not know that the dot is "independent").
Online OCR is when you input the strokes directly on the tablet/phone, so it becomes a sequence of XY coordinates with an associated timestamp. It takes into account where you start and where you end the stroke on the canvas, along with the intermediate points (information galore).
Offline OCR is when you take a photo of your handwriting in your notebook, so you just get the raw pixels of an image. In offline OCR, you'd also have to properly segment and binarize the image before the OCR step.
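The distinction above can be sketched in a few lines: online data is a timed point sequence, and rasterizing it throws away exactly the ordering/timing information an online recognizer exploits (the names and the toy stroke here are my own illustration):

```python
import numpy as np

# Online representation: one stroke as (x, y, t) samples,
# here a short diagonal pen movement.
stroke = [(0, 0, 0.00), (1, 1, 0.02), (2, 2, 0.04), (3, 3, 0.06)]

def rasterize(points, size=8):
    """Render an online stroke onto a binary grid (the offline view),
    discarding the timestamps and the stroke order entirely."""
    img = np.zeros((size, size), dtype=np.uint8)
    for x, y, _t in points:
        img[y, x] = 1  # assumes coordinates already fit the grid
    return img

img = rasterize(stroke)
print(int(img.sum()))
# → 4
```

Going the other way (recovering strokes from pixels) is the hard part of offline OCR, which is why the online problem is so much more tractable.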
With that being said, tesseract (version 4) uses an LSTM.
Edited with the QWERTY keyboard:
I just tried it out for the first time, and although the keyboard space on my phone is barely large enough to cram five characters in, the input scrolls sideways automatically if you lift your finger long enough, so longer words can be entered as well. It doesn't seem to reevaluate previously decoded segments based on what follows, though, so you can end up with weird misspellings at the beginning of words. I don't think I'll keep using it, because the recognition is bad enough to require significant editing, and the friction is a bit too uncomfortable without a stylus.
Handwriting recognition is way more impactful for users in, say, Chinese.
(on iPhone right now)