Google Noto Fonts

(google.com)

589 points · oneplusone · 11y ago · 132 comments

w1ntermute11y ago· 21 in thread

Hopefully this will be a big step forward in solving the problem of Han unification: http://en.wikipedia.org/wiki/Han_unification

It's infuriating how many Japanese sites still don't use Unicode, purportedly because of this issue (though I suspect that it's just another example of Japan lagging when it comes to web/computer tech).

gioele11y ago

Please understand that Han unification is _the_ problem. It is clear that Unicode needs to realize that Han unification is wrong and accept what the native writers of those languages think about their scripts.

To make the problem more understandable to the people that are used to alphabetic scripts, suppose that tomorrow an Asian committee starts creating Uniword, a repertoire that maps complete words to numerical IDs. At a certain point they get to "colour".

Uniword committee: Well, that word shares meaning and origin with the other word "color", for which we have already a codepoint, so we will encode them under the same codepoint.

GB, Australia and Canada: Hey! No! To us those are different words; especially, we do not want Mr. Colours to appear as Mr. Color.

Uniword committee: No problem, just add some out-of-band information like "nationality" or "<span lang='en-GB'>"

"colour"-people: that will not work, there are so many cases in which this can go wrong. Whenever I copy a field from a DB I also have to extract this extra information?

Uniword: yes, what is the problem? C'mon!

"colour"-people: but do you need to do that in your applications?

Uniword: no, we have one code for every single word in our languages, including codes for very old languages that exist only in two palimpsests.

"colour"-people: and why cannot we have the same level of granularity?

Uniword: because you have too many words!!! And when we started we had only 100k available integers.

"colour"-people: and now?

Uniword: now we have 2^32. But, yeah, that is not the point; just do as we suggest. This dialog is getting too long.

"colour"-people: "dialogue", please.

patio1111y ago

The only way I could improve on this dialogue would be accusing the Australians of anti-American prejudice for refusing to accept the English unification.

That was perceived as happening more than a few times in the Han Unification debate.

haberman11y ago

This is a great summary, thanks for that. I'd never had this explained in a way I could personally relate to.

I remember being concerned about Han unification around the time Ruby 1.9 was released, since this seemed to be one of Ruby's major reasons for being encoding-independent instead of standardizing on Unicode. But I hadn't heard about this issue in a while, except to hear occasionally someone say it's not a problem (maybe it was a Chinese person instead of a Japanese person -- the Wikipedia page says that the Chinese aren't as concerned about Han unification since Traditional Chinese didn't get unified with Simplified Chinese).

ksec11y ago

Thanks, I replied a bit too early without reading. I think you capture the problem nicely.

patio1111y ago

Every discussion we ever had about going to Unicode: "Is it bidirectionally compatible with SJIS?" "No." "OK, so when it breaks, what characters does it break on?" "People's names, mostly." "... And why is this being considered?" "It is very, very convenient for white people. Almost all of their stuff works out of the box."

yongjik11y ago

I wouldn't say it's convenient for white people. Being able to write Japanese on all the non-Japanese people's stuff (basically, most websites and open-source software) should be mostly useful to the Japanese.

ejr11y ago

It may not be so infuriating when you consider how important names are to Japanese people. Breaking your name is unacceptable and breaking it because of a technical convenience chosen during development would be deeply offensive on top of being unacceptable.

Until Unicode stops breaking people's names, it will continue to be the one standard for Japanese systems on and offline. Even when (if?) it stops breaking Japanese names, it will take a very, very long time to roll over existing systems, and that's barring unforeseen problems during the conversion.

We should stop before we take the "not following standard" = "broken" ideology. Especially when we consider whom the standard serves best.

Edit: By "it will continue to be the one standard for Japanese" in the 2nd paragraph, I meant ShiftJIS not Unicode. That looked a bit unclear.

fenomas11y ago

What do you mean by saying that Unicode breaks people's names? The problem isn't that there's anything wrong with Unicode, the problem is that it's not possible to unambiguously convert text from SJIS to unicode and back again, because SJIS has some duplicate mappings for historical reasons (compatibility with different pre-existing encodings as I recall). The same would presumably be true of converting SJIS to any other encoding that didn't have the same duplicated code points.
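A rough way to see the round-trip problem described above, sketched in Python. This assumes Python's built-in `cp932` codec (Microsoft's Shift JIS variant) as a stand-in for "SJIS", and the function name is made up for illustration. It scans two-byte sequences and collects those that decode to a single character but do not encode back to the same bytes.

```python
def non_roundtrip_sequences(codec="cp932"):
    """Return (bytes, char) pairs that decode fine but re-encode differently."""
    mismatches = []
    for lead in range(0x81, 0xFD):
        for trail in range(0x40, 0xFD):
            raw = bytes([lead, trail])
            try:
                text = raw.decode(codec)
            except UnicodeDecodeError:
                continue  # not a valid sequence in this codec
            if len(text) != 1:
                continue  # two single-byte characters, not one double-byte character
            try:
                back = text.encode(codec)
            except UnicodeEncodeError:
                back = None
            if back != raw:
                mismatches.append((raw, text))
    return mismatches


if __name__ == "__main__":
    found = non_roundtrip_sequences()
    print(len(found), "two-byte sequences do not survive bytes -> str -> bytes")
    for raw, ch in found[:5]:
        reencoded = ch.encode("cp932", errors="replace").hex()
        print(raw.hex(), "->", f"U+{ord(ch):04X}", "->", reencoded)
```

Each reported sequence is a byte pair that shares its Unicode character with some other byte pair, so the original bytes cannot be recovered after a trip through Unicode.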

GFK_of_xmaspast11y ago

As a monolingual white person, my name is kind of important to me too.

rurounijones11y ago

The number of Japanese systems that are SJIS-only is staggering. Basically all traditional IT (banks etc.) uses it and does not support, nor will ever support, Unicode.

This, naturally, has an impact beyond those systems' borders.

yuubi11y ago

Oddly, the following more modest proposal hasn't gotten much traction: characters that share history but have divergent graphical representations in the various dialects of alphabetic script shall share codepoints, and a mechanism beyond the scope of Unicode (like lang attributes or plain guesswork) shall be used to decide whether a given codepoint means L or ᴫ or Λ or whatever.

fenomas11y ago

Actually I think that's roughly how things work today. It's not my area, but here's how this was explained to me recently by a colleague (the lead font guy at Adobe Japan):

There are two ways of dealing with glyphs that share code points. The first is TTC (TrueType collection) fonts. A TTC is basically one set of glyphs with several sets of mappings (i.e. which code point maps to which glyph). When you install it, assuming your computer groks TTC, your system shows you a separate font for each mapping. Taking for example Source Han Sans, which Adobe just released - if you go to the download page[0] and get the complete version (the "OTC" one), you get a bunch of files like "SourceHanSans-Bold.ttc". If you install one of them you'll see four new fonts: "Source Han Sans J", K, SC, and TC. Then when you use the font, depending on which font name you used the system will change which mapping it applies to the combined set of glyphs. (Hence the choice of font name is the selection mechanism you described.)

The second way is that TrueType fonts have a way to build locale settings into the font. I'm less clear on the details here but apparently it's similar to TTC behind the scenes, except that the mappings are associated with locales - so in an app that supports TT locales, even if you select "Foo J" as your font, when the locale was simplified Chinese you'd get the SC glyph. Of course now the selection mechanism is whether the application knows what locale the content is. (And also whether it supports the mechanism - I don't know how widespread this is.) Either way though, in principle you get different glyphs for the same code point, depending on context.

Or anyway that's the understanding I took away as a font layperson - happy to be corrected.

[0] http://sourceforge.net/projects/source-han-sans.adobe/files/

anon411y ago

Then you break copy-paste. Or maybe we could add locale markers in unicode, then encode the different symbols as <locale><codepoint>. It will only take something like up to 12 bytes per character in UTF-8, no big deal, right?

Osmium11y ago

Surely there are enough Unicode code points for this not to be a problem? Can you use the historical character + combining mark (which shows which 'newer' version of the character to use), where the combining mark is ignored if the computer doesn't understand it, and only then does it fall back onto guesswork/lang attributes?

I don't know if that's a decent solution, but just guesswork doesn't sound like a good idea, because there are bound to be edge cases where it wouldn't work, and then we're back where we started...
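Something close to this exists in Unicode as variation sequences: a base character followed by a variation selector, which software that doesn't support the sequence can ignore. A small Python sketch of what that looks like at the code point and byte level. The base character U+845B and the selector are picked for illustration only, and whether a given font has a distinct glyph for this particular sequence is an assumption.

```python
import unicodedata

BASE = "\u845b"          # a CJK unified ideograph, chosen as an example
SELECTOR = "\U000e0100"  # VARIATION SELECTOR-17

# The "character + mark" pair: still one base code point, plus a selector.
sequence = BASE + SELECTOR


def describe(text):
    """List (code point, Unicode name, UTF-8 byte length) for each code point."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch), len(ch.encode("utf-8")))
        for ch in text
    ]


if __name__ == "__main__":
    for code_point, name, size in describe(sequence):
        print(code_point, name, size, "bytes")
    print("total UTF-8 bytes:", len(sequence.encode("utf-8")))
```

The selector travels with the text on copy-paste, which is the property out-of-band lang attributes lack, at the cost of extra bytes per marked character.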

jessaustin11y ago

Well done. I admit, this passed over my head on first reading.

ksec11y ago

I can partially understand why the Japanese refuse to support Unicode. While most people adopt Unicode simply because it is convenient, it doesn't actually solve the underlying problem of Han characters having different forms and glyphs.

Over the years, I have started to think Han unification is a Western way of hacking around the CJK Han character problem rather than actually solving it.

fenomas11y ago

I wouldn't say Japanese refuse to support Unicode. I'd say that legacy encodings like EUC-JP/JIS/SJIS are still used because it's hard/unfeasible to convert systems and data that were built for earlier encodings to anything newer. But it's not like they offer some particular technical advantage over Unicode - indeed the only reason they don't suffer from CJK issues is that they have no support for C or K. ;)

But speaking as a front-end webdev guy, it's been a looong time since I came in contact with any encoding here besides utf-8.

euske11y ago

Eh, most Japanese people don't know/care about the problem of Han unification. It's mostly because of the legacy data. The thing is that ASCII and UTF-8 are kinda compatible with minor annoyances, while Shift JIS and UTF-8 are entirely different. People don't want to convert a trove of documents to another encoding which might not be supported yet in some apps. Slow software upgrades are another reason. As someone else pointed out, the default encoding of Windows is still Shift JIS, which is totally understandable for compatibility's sake.

Edit: Besides, a TTF font doesn't have to always use Unicode internally. It supports an arbitrary mapping from bytes (could be in UTF-8 or SJIS) to a glyph number. People who really care about the looks (i.e. printing) have been using a charset for each specific language, such as Adobe-Japan1, which is different from both UTF-8 and Shift JIS.

est11y ago

There are some CJK samples on the Adobe Typekit blog:

http://blog.typekit.com/2014/07/15/introducing-source-han-sa...

unsignedint11y ago

Part of the problem here is that Japanese Windows uses SJIS at its front end, perhaps for "legacy" compatibility reasons... (The backend, like the filesystem on modern Windows, is Unicode; this was apparent when adding files containing Japanese characters into Git; it would work if I was retrieving the file from another Windows machine, but it was horribly corrupted when moving to other platforms.)

Fortunately, the Git guys added a fix to convert it into Unicode internally.

glandium11y ago

Another problem is that while most Japanese characters take 2 bytes in SJIS, they take 3 in UTF-8.
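A quick Python check of the size difference, using Python's built-in `shift_jis` and `utf-8` codecs. The sample string and the function name are arbitrary choices for illustration; UTF-16 is included only for comparison.

```python
SAMPLE = "日本語のテキスト"  # 8 characters, all encodable in Shift JIS


def encoded_sizes(text):
    """Return the encoded byte length of `text` in a few encodings."""
    return {
        codec: len(text.encode(codec))
        for codec in ("shift_jis", "utf-8", "utf-16-le")
    }


if __name__ == "__main__":
    for codec, size in encoded_sizes(SAMPLE).items():
        print(f"{codec}: {size} bytes for {len(SAMPLE)} characters")
```

For text that is mostly kanji and kana this works out to roughly a 50% size increase when moving from Shift JIS to UTF-8; ASCII-heavy text such as HTML markup is unaffected.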

jdmitch11y ago· 15 in thread

I wonder how the decisions for inclusion of languages were made, as there are some very odd decisions. For example, Osmanya is a script created for the Somali language that was hardly ever used (Somali literacy was only widespread after the Latin alphabet was adopted - previously Arabic was commonly used). The population of actual users of this script is pretty indisputably 0. 100,000 would be a wildly ambitious estimate of the number of people who had ever actually even seen the script.

On the other hand, Oriya, which has over 33 million native speakers, including 80% of India's Odisha state, does not appear to be supported.

lstamour11y ago

In their defense, when you click India and scroll down, it does say, "not supported yet". Which leads me to believe they picked both languages with few characters (or straightforward to render?) and those most common, and they'll get to the rest shortly. :)

Oriya appears to be quite complicated to render: http://www.microsoft.com/typography/OpenTypeDev/oriya/intro....

Meanwhile, I wonder if this means we'll see OCR and ePubs for all kinds of scripts now; or if this will help enable Google Translate in more languages? ;-)

ultimoo11y ago

"Oriya is similarly structured to Devanagari and is used to write the Oriya language in Indian state of Orissa."

Devanagari is what Hindi, Marathi, and Sanskrit use, so I am certain that it isn't any more complex to render than those languages.

soperj11y ago

Also maybe this was a 20% time thing and the programmer who started it just wanted to do those languages (probably because they couldn't be found elsewhere).

kijin11y ago

It's probably just a matter of whether or not there's somebody in the relevant team(s) who is familiar with, or at least has heard of, any given script.

I wouldn't be surprised if there happens to be an Osmanya geek in Google, but none of his teammates has ever heard of Oriya. For the same reason, I wouldn't be surprised if they added a bunch of geeky fictional languages before actual ones.

sandGorgon11y ago

have an upvote if you file an enhancement request for Tengwar [1] at https://code.google.com/p/noto/issues/list !!

[1] http://www.omniglot.com/writing/tengwar.htm

fzerorubigd11y ago

And the fun thing is, there are two OLD! Persian scripts available (Pahlavi and Old Persian, both dead for almost 1500 years), and current Persian is not supported :)))

Fuxy11y ago

I'm guessing they're going for the ones nobody would ever put the effort into supporting first, and will get to the current ones later.

aquilaFiera11y ago

Perhaps even more curious is the inclusion of Deseret, a toy language developed by the Mormons in the early 1800s. It never caught on and few books other than the Book of Mormon were ever translated into it.

talideon11y ago

Deseret wasn't a language, it was simply a script to go along with a reformed version of English with simplified spelling.

grrowl11y ago

Good pick, but I also see the value in preservation — maybe 500 years from now, the Noto fonts will be the best or only representation of many dead or forgotten scripts and languages.

ubernostrum11y ago

Although they are non-prescriptive, the Unicode Osmanya table:

http://www.unicode.org/charts/PDF/U10480.pdf

already contains reference glyphs. If you want to preserve scripts, preserve Unicode tables instead of making fonts.

harty6511y ago

I noticed the inclusion of Cornish. As of 2011 there were 557 people that claimed Cornish as their primary language.

peteretep11y ago

Sure, but, there are no Cornish glyphs. Cornish is entirely writeable with the same characters you write English with, so you get it for free

sahoo11y ago

Hi fellow Oriyan. Google has very bad support for Oriya, since IT, as in R&D, is not that great in Odisha. I work at IIIT Hyderabad, which is the leading NLP lab in India, and I don't see anything in Oriya.

snambi11y ago

Good observation.

teddyh11y ago· 11 in thread

In these enlightened Unicode days, why are fonts still “for” a language?

bazzargh11y ago

Well, one reason would be that your web font would be 134MB or so? (looking at the size of the comprehensive Noto download)

The other is simple practicality - these things take time to develop, you can either wait until all the glyphs are done, or release subsets that cover languages as you work; a subset that covers part of a language isn't very useful but subsets that cover whole languages are.

ivanca11y ago

That is a technical problem to solve. There should be a way for browsers to only download the characters being rendered in the current page; so even if the file is 134MB it could get only the little pieces of it that it needs.
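One existing building block for this is the CSS `unicode-range` descriptor on `@font-face`, which lets a browser skip font files whose ranges the page doesn't use. A minimal Python sketch, assuming you already have the page text as a string, that computes the range value a subset font would need to declare. The function name is hypothetical and this is not how any particular font service does it.

```python
def unicode_range(text):
    """Collapse the code points used in `text` into a CSS unicode-range value."""
    # Whitespace is skipped; it needs no glyph outlines.
    points = sorted({ord(ch) for ch in text if not ch.isspace()})
    ranges = []
    for cp in points:
        if ranges and cp == ranges[-1][1] + 1:
            ranges[-1][1] = cp       # extend the current run
        else:
            ranges.append([cp, cp])  # start a new run
    return ", ".join(
        f"U+{lo:X}" if lo == hi else f"U+{lo:X}-{hi:X}" for lo, hi in ranges
    )


if __name__ == "__main__":
    print(unicode_range("abc é"))
```

A server could use the same set of code points to cut a per-page subset out of the full font, which is roughly the idea behind dynamic subsetting.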

kijin11y ago

They're not really "for" a language. They are optimized subsets of the same font that only contain glyphs from one or more languages to minimize the file size.

If your website is written in English with an occasional accented character from other Western languages, there's no need to load a 50MB web font containing all the Traditional Chinese characters.

kps11y ago

Part of the answer is that Unicode has a lot of characters, and web pages use only a few, so for web fonts it makes sense to have the user download only ones likely to be used.

Another part is that some CJK characters look somewhat different in C, J, and K. http://www.unicode.org/faq/han_cjk.html#3

kalleboo11y ago

CJK unification in Unicode means that you don't know how to render a Unicode codepoint without also knowing the language of the text - the same Unicode codepoint looks different when rendered in Chinese, Japanese or Korean.
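To make that concrete, a small Python sketch: the text itself carries one code point, and the language has to come from somewhere else, here an HTML `lang` attribute. The character U+76F4 is used as an example of a unified ideograph; the helper name and the list of language tags are illustrative assumptions, and how the spans actually render depends on the fonts installed.

```python
import unicodedata
from html import escape

CHAR = "\u76f4"  # 直, one code point regardless of language


def tagged_spans(ch, langs=("ja", "zh-Hans", "zh-Hant", "ko")):
    """Wrap the same character in spans that differ only in their lang attribute."""
    return [f'<span lang="{lang}">{escape(ch)}</span>' for lang in langs]


if __name__ == "__main__":
    print(f"U+{ord(CHAR):04X}", unicodedata.name(CHAR))
    for span in tagged_spans(CHAR):
        print(span)
```

Strip the markup (copy-paste into a plain text field, a DB column, a filename) and the four spans become byte-for-byte identical.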

ygra11y ago

There are technical reasons for this. For example, OpenType only supports 64k glyphs within a font, which is enough for the BMP but nothing else (and counting ligatures that are necessary for, e.g., Arabic, it might not even suffice for the BMP).

Then there are practical considerations. While Latin, Greek and Cyrillic are similar enough to warrant the same styles (serif, sans-serif, script, italic, and various weights) not all of them make very much sense for, say, CJK or a variety of other scripts. So having different fonts for different scripts that are still designed to go together is actually not that bad a solution.

It does mean that for good typography you need a matrix of fonts based on style and script. Word includes two fonts per style for this, to treat CJK differently, which might not be enough, depending on the numbers of different scripts involved in a single document. But a) several dozen scripts per document are somewhat rare apart from Wikipedia's language list per article and font demonstrations; and b) good typography needs effort, this won't change.

wldcordeiro11y ago

I would guess that it's because of the amount of work required to design glyphs for every character that retain the font's style while still capturing the character's look and meaning from its language.

rabbyte11y ago

According to the site all the fonts together are 134MB compressed. Maybe that's the reason or maybe because it's a work in progress so works out better in segments.

rurounijones11y ago

Wouldn't it be reasonable to expect, at some point in the near future, that these fonts are pre-installed on all operating systems?

archagon11y ago

A typographer typically works within a small set of languages. It would be unfeasible for a single type foundry to cover every possible glyph with the same consistency.

cloudwalking11y ago

A lot of fonts don't support a lot of unicode characters.

mirzmaster11y ago· 7 in thread

Still no Nastaliq [1] for Urdu and Persian script. There's a great piece on Medium [2] about the death of the Urdu script at the hands of the more structured Arabic Naskh font.

[1] https://en.wikipedia.org/wiki/Nasta%CA%BFl%C4%ABq_script

[2] https://medium.com/@eteraz/the-death-of-the-urdu-script-9ce9...

sandGorgon11y ago

Could you file a bug at https://code.google.com/p/noto/issues/list ? This is a great place for @eteraz to get involved.

sandGorgon11y ago

Since nobody did it, I filed a bug at https://code.google.com/p/noto/issues/detail?id=39 through my phone.

the title is messed up, but I hope the message is clear.

scrollaway11y ago

I saw the article before and I too was deeply moved by it, but don't you think you could word this better so as not to make it sound so blasé? "Pah, no nastaliq, useless!" -- this is an amazing project.

mynameisvlad11y ago

+1. It sounds incredibly condescending the way it's written right now. Like the omission of Nastaliq is a great tragedy and Google should be deeply ashamed.

mikehotel11y ago

You can find a nice Nastaliq font at http://urdu.ca/1 (direct link: http://urdu.ca/UrduFonts.zip). More are available at http://www.urdujahan.com/font.html. You will want to find out how they are licensed before using them in a commercial capacity.

cies11y ago

Big thanks to Google for this effort...

But true, no Nastaliq (yet). Sadly not even mentioned as "unsupported".

capex11y ago

The Urdu font they currently have there is not too bad. I won't mind reading a passage in this font.

janlukacs11y ago· 3 in thread

Might be just me, but I don't like the Sans Serif font at all; it renders really badly in Safari.

_delirium11y ago

The Serif looks questionable to me as well. The vertical line on the lowercase 'h' in particular looks like it has artifacts, both too thick and oddly blurred. The Greek serif has a lot of the same artifacts as well. The sans-serif looks fine, but something looks pretty wrong with the serif. I also thought it might be a client-side rendering issue (I'm using a Mac), but the demo page has the text prerendered into an image.

edit: Found a different demo page that renders the webfont client-side instead of showing images, and looks much better to me: http://www.google.com/fonts/specimen/Noto+Serif. Maybe it's just that the pre-rendered specimens are made with a poor rendering engine?

sebnukem211y ago

It looks awfully blurry in Firefox on Linux.

peedy11y ago

On the website, they are pre-rendered into images. For example: http://www.google.com/get/noto/images/samples/noto-sans_en_4...

tokenadult11y ago· 2 in thread

I like the implementation of CJK fonts in Noto, which was just released this week. I particularly like that I can illustrate that the various Sinitic languages ("Chinese dialects") do NOT all use the same written characters, so that Chinese people who travel to different dialect regions sometimes find written signs that they cannot read, even if they are literate in Modern Standard Chinese. (I have seen this regional illiteracy on the part of native speakers of Chinese in several contexts.)

How you might write the conversation

"Does he know how to speak Mandarin?"

"No, he doesn't."

他會說普通話嗎？

他不會。

in Modern Standard Chinese characters contrasts with how you would write

"Does he know how to speak Cantonese?"

"No, he doesn't."

佢識唔識講廣東話？

佢唔識。

in the Chinese characters used to write Cantonese. As will readily appear even to readers who don't know Chinese characters (if you have a good Unicode implementation enabled as you read Hacker News), many more words than "Mandarin" and "Cantonese" differ between those sentences in Chinese characters.
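For readers whose browser shows boxes instead of characters, the difference can also be checked mechanically. A short Python sketch comparing the two question sentences quoted above, character by character; the function name is made up for illustration.

```python
MANDARIN = "他會說普通話嗎？"
CANTONESE = "佢識唔識講廣東話？"


def shared_and_distinct(a, b):
    """Return (characters in both, only in a, only in b) as sets."""
    set_a, set_b = set(a), set(b)
    return set_a & set_b, set_a - set_b, set_b - set_a


if __name__ == "__main__":
    shared, only_mandarin, only_cantonese = shared_and_distinct(MANDARIN, CANTONESE)
    print("shared:", sorted(shared))
    print("only in the Mandarin sentence:", sorted(only_mandarin))
    print("only in the Cantonese sentence:", sorted(only_cantonese))
```

Apart from the question mark, the only character the two sentences share is 話; everything else is a different code point, with no language metadata involved.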

jlebar11y ago

I thought Han Unification meant that (most of the common) CJK characters were represented by the same Unicode code points, and that the way to differentiate like this is by specifying metadata that indicates the "language".

Obviously I'm wrong, because these are just regular Unicode characters, without an HTML "lang" attribute.

What gives?

fenomas11y ago

Two separate issues. Pan-CJK fonts like Noto solve a problem that arises from the fact that many CJK characters are only ever used in certain locales. Since most CJK fonts are made for a given locale, they tend not to include any of the (many) CJK characters never used in that locale. Hence, if you render mixed CJK text in (say) a Japanese font, any characters that don't appear in Japanese won't render at all. That's what (I believe) this comment refers to.

The Han Unification problem arises from the inverse case - characters that are used in several languages but rendered differently depending on locale[0]. For those characters, they'll render even without a pan-CJK font, but the problem is they'll render in a way that's not appropriate for their locale.

[0] Another way to phrase this would be "distinct characters which share a code point because Unicode mistakenly thinks they're a single character whose rendering differs by locale". The difference is basically subjective.

CitizenKane11y ago· 2 in thread

This is incredible and is going to be very useful for people developing applications for use in Eastern Asia. Nailing typefaces for Chinese, Japanese, and Korean is a huge challenge. Noto and the accompanying Source Han Sans are going to be a huge boon for people in Eastern Asia and hopefully they will have widespread adoption.

Sadly, it's probably still not possible to use as a webfont. A single font weight is over 8MB, but there is a distinct possibility this could go into mobile devices and operating systems, which would be awesome.

twerquie11y ago

I wonder if it could be split up, and the necessary segment(s) could be dynamically loaded based on OS language preferences?

footpath11y ago

It is possible. There is a technology called dynamic subsetting that only loads the necessary glyphs on a web page:

http://www.monotype.com/services/screen-imaging-solutions/dy...

http://en.justfont.com/

shared4you11y ago· 2 in thread

I have been using Noto fonts for more than 6 months now (mostly Indic fonts) and am quite pleased with them. I just saw that they have "Noto Sans Brahmi" in the pipeline. Although the Brahmi script (an ancient Indian script used around 300 BC) entered Unicode in 2010, there is not a single font available that covers Brahmi.

I also couldn't find any font that covers mathematical symbols from the SMP.

EDIT: Just downloaded the zip archive. Unix permissions for the Bengali and Gurmukhi fonts are different from the rest of them.

sp33211y ago

For math Symbols: Cambria Math and DejaVu Sans should have them. http://www.alanwood.net/unicode/fontsbyrange.html#u1d400

tokenadult11y ago

That was an interesting comment. May I ask, as a follow-up comment, what you meant by

mathematical symbols from the SMP

as I did the expected Google search, and I am not sure that the search results I see refer to what you were referring to.

rurounijones11y ago· 2 in thread

Is there any reason that these could not be included as standard fonts in Windows, Linux, Mac, Android, iOS etc. at some point in the near future?

marcoms11y ago

No technical reasons, but it would be unlikely for Apple or Microsoft to adopt a font made by Google, when their own alternatives exist, despite technical superiority. Android already uses Roboto, which has been heavily invested in for Android 4+ and now with Material design too. Of course, by Linux I'm assuming you mean the popular distributions of it like Ubuntu, but even Ubuntu has its own font which it is unlikely to change - other non-"branded" distros probably would be the only ones who might.

rurounijones11y ago

Was hoping for a moment that we could all come together in harmony and enjoy universal access to fonts for all languages without relying on webfont kludges... hope springs eternal

suyash11y ago· 2 in thread

Anyone know what license they are released under and if it is ok to use them freely for commercial projects?

pbhjpbhj11y ago

NotoSans/NotoSerif downloads have a LICENSE file which starts "Apache License Version 2.0, January 2004 http://www.apache.org/licenses/".

AlyssaRowan11y ago

Apache 2.0, so yes.

theandrewbailey11y ago· 2 in thread

I really like Noto Sans. From what I can tell, it's a fork of Open Sans. For the Latin alphabet it's mostly the same, but with a single-story lowercase g.

abrowne11y ago

Which itself is a redrawing of Droid Sans. This Typophile thread has some comparisons, and a couple comments by the designer, Steve Matteson: http://typophile.com/node/101655

dutchbrit11y ago

It's just a shame that Noto Sans doesn't have a nice range of weights. They would have been better off improving Open Sans instead of creating yet another font.

mahmoudhossam11y ago· 2 in thread

I have a question. Why would anyone list Greek under "Egypt"?

Tortoise11y ago

Greek was spoken in Egypt for 1000 years. I imagine it's not that common today.

GFK_of_xmaspast11y ago

https://www.youtube.com/watch?v=1oTEQf1d9Iw

vincentchan11y ago· 2 in thread

Will Noto be available in Google Web Fonts later? That will be awesome.

kijeda11y ago

It has been for quite some time (at least a year).

https://www.google.com/fonts/specimen/Noto+Sans

vincentchan11y ago

But this is for English only, not for other languages.

smrtinsert11y ago· 1 in thread

I am in love. Why don't they offer a monospace programming version? Noto Sans outdoes my Consolas easily for clarity. No easy feat! Please release a Noto Sans Code Google!

amass11y ago

If they released a monospace version, I would switch from Inconsolata! Although it is probably more important to continue work on supporting more languages.

zvrba11y ago· 1 in thread

I looked at sample serif font, and it renders both blurry and jagged. At that size, this is quite an "achievement".

f05511y ago

I have the same effect on Safari for Mac. This is lame.

hownottowrite11y ago· 1 in thread

This page seemed really slow on mobile. I thought it was just me but...

http://developers.google.com/speed/pagespeed/insights/?url=h...

tripzilch11y ago

On my netbook, Firefox warned me about an unresponsive script.

marcoms11y ago· 1 in thread

Nice to see Material design in use on their sites for one of the first times!

scotty7911y ago

Not so sure. Commandeering of my scroll-bar and slight enlargement of clicked element along with dark outer glow didn't sit well with me.

dnqthao11y ago· 1 in thread

Nice, we look forward to Chu Nom script support in the future.

sandGorgon11y ago

I suggest you file a bug https://code.google.com/p/noto/issues/list

jpatokal11y ago

This is brilliant, particularly the newly released Noto CJK: http://www.google.com/get/noto/cjk.html

I'm not aware of any other font that does a decent job of handling all of Simplified Chinese, Traditional Chinese, Japanese, and Korean simultaneously, and with light, bold, thin etc variants to boot. Most existing fonts, even expensive commercial ones, are lucky to support two, and even then usually regular text only.

keehun11y ago

I think this is amazing. I have never seen Cherokee glyphs that beautifully rendered before. Apparently there are still missing scripts, but this is a great step forward. This couldn't have come cheap, and I'm happy that Google is investing effort into this.

abrowne11y ago

The Google Code page used to have a comment on the origins of the name. Noto is short for 'no tofu', tofu being the rectangles you get when you don't have a font covering that glyph.

idoco11y ago

"All human beings are born free and equal in dignity and rights" I love the fact that they use The Universal Declaration of Human Rights as the text for showcasing the fonts, using every opportunity to stand for human rights!

jzzocc11y ago

https://www.ruby-lang.org has used Noto for a while and it looks great.

waitingkuo11y ago

Nice, so glad that it supports Chinese!

greenpresident11y ago

Note that this neatly integrates into their plan of digitizing all books ever written. Next: Brahmi Captchas.

deskamess11y ago

I find the Canada->Cree glyphs very interesting (geometrical). The art from the area is also very beautiful. If you are ever in Ottawa a trip to the Canadian Museum of History (was Civilization) is well worth it.

Cherokee (US) is one fine looking set of glyphs.

SimeVidas11y ago

Largest .ttf in collection: 762KB

Smallest .otf in collection: 4093KB

insky11y ago

Took me a while to work out just how to preview fonts on that page.

callesgg11y ago

Safari on my iPad crashes when I visit that page.

cihangirsavas11y ago

it is like NATO :)

wahsd11y ago

Interesting behavior when you do an in-page search for a language.

j / k navigate · click thread line to collapse

132 comments

109 comments · 31 top-level

w1ntermute11y ago· 21 in thread

Hopefully this will be a big step forward in solving the problem of Han unification: http://en.wikipedia.org/wiki/Han_unification

gioele11y ago

Uniword committee: Well, that word shares meaning and origin with the other word "color", for which we have already a codepoint, so we will encode them under the same codepoint.

GB, Australia and Canada: Ehi! No! To us those are different words; especially, we do not want Mr. Colours to appear as Mr. Color.

Uniword committee: No problem, just add some out-of-band information like "nationality" or "<span lang='en-GB'>"

"colour"-people: that will not work, there are so many cases in which this can go wrong. Whenever I copy a field from a DB I also have to extract this extra information?

Uniword: yes, that is the problem? C'mon!

"colour"-people: but do you need to do that in your applications?

Uniword: no, we have one code for every single word in our languages, including codes for very old languages that exist only in two palimpsests.

"colour"-people: and why cannot we have the same level of granularity?

Uniword: because you have too many words!!! And when we started we had only 100k available integers.

"colour"-people: and now?

Uniword: now we have 2^32. But, yeah, that is not the point; just do as we suggest. This dialog is getting too long.

"colour"-people: "dialogue", please.

patio1111y ago

The only way I could improve on this dialogue would be accusing the Australians of anti-American prejudice for refusing to accept the English unification.

That was perceived as happening more than a few times in the Han Unification debate.

haberman11y ago

This is a great summary, thanks for that. I'd never had this explained in a way I could personally relate to.

ksec11y ago

Thanks, I replied a bit too early without reading. I think you capture the problem nicely.

patio1111y ago

yongjik11y ago

ejr11y ago

We should stop before we take the "not following standard" = "broken" ideology. Especially when we consider whom the standard serves best.

Edit: By "it will continue to be the one standard for Japanese" in the 2nd paragraph, I meant ShiftJIS not Unicode. That looked a bit unclear.

fenomas11y ago

2 more replies

GFK_of_xmaspast11y ago

As a monolingual white person, my name is kind of important to me too.

rurounijones11y ago

The number of Japanese systems that are SJIS-only is staggering; basically all traditional IT (banks etc.) uses it and does not support, nor will ever support, Unicode.

This, naturally, has an impact beyond those systems' borders.

yuubi11y ago

fenomas11y ago

Actually I think that's roughly how things work today. It's not my area, but here's how this was explained to me recently by a colleague (the lead font guy at Adobe Japan):

Or anyway that's the understanding I took away as a font layperson - happy to be corrected.

[0] http://sourceforge.net/projects/source-han-sans.adobe/files/

1 more reply

anon411y ago

Osmium11y ago

I don't know if that's a decent solution, but just guesswork doesn't sound like a good idea, because there are bound to be edge cases where it wouldn't work, and then we're back where we started...

jessaustin11y ago

Well done. I admit, this passed over my head on first reading.

ksec11y ago

Over the years, I have started to think Han unification is a Western way of hacking around the CJK Han problem rather than actually solving it.

fenomas11y ago

But speaking as a front-end webdev guy, it's been a looong time since I came in contact with any encoding here besides utf-8.

euske11y ago

est11y ago

There are some CJK samples on the Adobe Typekit blog:

http://blog.typekit.com/2014/07/15/introducing-source-han-sa...

unsignedint11y ago

Fortunately, the Git guys added a fix to convert it into Unicode internally.

glandium11y ago

Another problem is that while most Japanese characters take 2 bytes in SJIS, they take 3 in UTF-8.
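
The byte counts are easy to check. A minimal sketch, assuming Python's built-in `shift_jis` codec as a stand-in for whatever SJIS variant a given system uses; the helper name `encoded_sizes` and the sample string are made up for illustration:

```python
def encoded_sizes(text):
    """Return the character count and the encoded byte length in each encoding."""
    return {
        "chars": len(text),
        "shift_jis": len(text.encode("shift_jis")),
        "utf-8": len(text.encode("utf-8")),
    }


# Kanji, hiragana and full-width katakana: 8 characters in total.
sample = "日本語のテキスト"
print(encoded_sizes(sample))
# {'chars': 8, 'shift_jis': 16, 'utf-8': 24}
```

ASCII characters take one byte in both encodings, so the 2-versus-3 difference only matters for text that is mostly Japanese.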

jdmitch11y ago· 15 in thread

On the other hand, Oriya, which has over 33 million native speakers, including 80% of India's Odisha state, does not appear to be supported.

lstamour11y ago

Oriya appears to be quite complicated to render: http://www.microsoft.com/typography/OpenTypeDev/oriya/intro....

Meanwhile, I wonder if this means we'll see OCR and ePubs for all kinds of scripts now; or if this will help enable Google Translate in more languages? ;-)

ultimoo11y ago

"Oriya is similarly structured to Devanagari and is used to write the Oriya language in Indian state of Orissa."

Devanagari is what Hindi, Marathi, and Sanskrit use, so I am certain that it isn't any more complex to render than those languages.

soperj11y ago

Also maybe this was a 20% time thing and the programmer who started it just wanted to do those languages (probably because they couldn't be found elsewhere).

2 more replies

kijin11y ago

It's probably just a matter of whether or not there's somebody in the relevant team(s) who is familiar with, or at least has heard of, any given script.

sandGorgon11y ago

have an upvote if you file an enhancement request for Tengwar [1] at https://code.google.com/p/noto/issues/list !!

[1] http://www.omniglot.com/writing/tengwar.htm

1 more reply

fzerorubigd11y ago

And the fun thing is, there are two OLD! Persian scripts available (Pahlavi and Old Persian, both dead for almost 1500 years) and the current Persian is not supported :)))

Fuxy11y ago

I'm guessing they're going for the ones nobody would ever put the effort into supporting first, and will get to the current ones later.

aquilaFiera11y ago

talideon11y ago

Deseret wasn't a language, it was simply a script to go along with a reformed version of English with simplified spelling.

grrowl11y ago

Good pick, but I also see the value in preservation — maybe 500 years from now, the Noto fonts will be the best or only representation of many dead or forgotten scripts and languages.

ubernostrum11y ago

Although they are non-prescriptive, the Unicode Osmanya table:

http://www.unicode.org/charts/PDF/U10480.pdf

already contains reference glyphs. If you want to preserve scripts, preserve Unicode tables instead of making fonts.

1 more reply

harty6511y ago

I noticed the inclusion of Cornish. As of 2011 there were 557 people that claimed Cornish as their primary language.

peteretep11y ago

Sure, but there are no Cornish glyphs. Cornish is entirely writeable with the same characters you write English with, so you get it for free.

sahoo11y ago

Hi fellow Oriyan, Google has very bad support for Oriya, since IT and R&D are not that strong in Odisha. I work at IIIT Hyderabad, which is the leading NLP lab in India, and I don't see anything in Oriya.

snambi11y ago

Good observation.

teddyh11y ago· 11 in thread

In these enlightened Unicode days, why are fonts still “for” a language?

bazzargh11y ago

Well, one reason would be that your web font would be 134MB or so? (looking at the size of the comprehensive Noto download)

ivanca11y ago

1 more reply

kijin11y ago

They're not really "for" a language. They are optimized subsets of the same font that only contain glyphs from one or more languages to minimize the file size.

If your website is written in English with an occasional accented character from other Western languages, there's no need to load a 50MB web font containing all the Traditional Chinese characters.

kps11y ago

Part of the answer is that Unicode has a lot of characters, and web pages use only a few, so for web fonts it makes sense to have the user download only ones likely to be used.

Another part is that some CJK characters look somewhat different in C, J, and K. http://www.unicode.org/faq/han_cjk.html#3

kalleboo11y ago

ygra11y ago

wldcordeiro11y ago

rabbyte11y ago

According to the site all the fonts together are 134MB compressed. Maybe that's the reason or maybe because it's a work in progress so works out better in segments.

rurounijones11y ago

wouldn't it be reasonable to expect, at some point in the near future, that these fonts are pre-installed on all operating systems?

1 more reply

archagon11y ago

A typographer typically works within a small set of languages. It would be unfeasible for a single type foundry to cover every possible glyph with the same consistency.

cloudwalking11y ago

A lot of fonts don't support a lot of Unicode characters.

mirzmaster11y ago· 7 in thread

Still no Nastaliq [1] for Urdu and Persian script. There's a great piece on Medium [2] about the death of the Urdu script at the hands of the more structured Arabic Naskh font.

[1] https://en.wikipedia.org/wiki/Nasta%CA%BFl%C4%ABq_script

[2] https://medium.com/@eteraz/the-death-of-the-urdu-script-9ce9...

sandGorgon11y ago

Could you file a bug at https://code.google.com/p/noto/issues/list ? This is a great place for @eteraz to get involved.

sandGorgon11y ago

Since nobody did it, I filed a bug at https://code.google.com/p/noto/issues/detail?id=39 through my phone.

The title is messed up, but I hope the message is clear.

1 more reply

scrollaway11y ago

mynameisvlad11y ago

+1. It sounds incredibly condescending the way it's written right now. Like the omission of Nastaliq is a great tragedy and Google should be deeply ashamed.

1 more reply

mikehotel11y ago

cies11y ago

Big thanks to Google for this effort...

But true, no Nastaliq (yet). Sadly not even mentioned as "unsupported".

capex11y ago

The Urdu font they currently have there is not too bad. I won't mind reading a passage in this font.

janlukacs11y ago· 3 in thread

Might be just me but I don't like the Sans Serif font at all; it renders really badly in Safari.

_delirium11y ago

sebnukem211y ago

It looks awfully blurry in Firefox on Linux.

peedy11y ago

On the website, they are pre-rendered into images. Example: http://www.google.com/get/noto/images/samples/noto-sans_en_4...

tokenadult11y ago· 2 in thread

How you might write the conversation

"Does he know how to speak Mandarin?"

"No, he doesn't."

他會說普通話嗎？

他不會。

in Modern Standard Chinese characters contrasts with how you would write

"Does he know how to speak Cantonese?"

"No, he doesn't."

佢識唔識講廣東話？

佢唔識。

jlebar11y ago

Obviously I'm wrong, because these are just regular Unicode characters, without an HTML "lang" attribute.

What gives?

fenomas11y ago

1 more reply

CitizenKane11y ago· 2 in thread

twerquie11y ago

I wonder if it could be split up, and the necessary segment(s) could be dynamically loaded based on OS language preferences?

footpath11y ago

It is possible. There is a technology called dynamic subsetting that only loads the necessary glyphs on a web page:

http://www.monotype.com/services/screen-imaging-solutions/dy...

http://en.justfont.com/
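
The first step of any such scheme is working out which code points a page actually uses. A rough sketch of that step only, not a description of how the services linked above work; the function names `needed_ranges` and `format_ranges` are invented for this example:

```python
def needed_ranges(text):
    """Collapse the non-whitespace code points used in text into contiguous ranges."""
    code_points = sorted({ord(ch) for ch in text if not ch.isspace()})
    ranges = []
    for cp in code_points:
        if ranges and cp == ranges[-1][1] + 1:
            ranges[-1][1] = cp          # extend the current run
        else:
            ranges.append([cp, cp])     # start a new run
    return [(start, end) for start, end in ranges]


def format_ranges(ranges):
    """Render ranges in the familiar U+XXXX or U+XXXX-YYYY notation."""
    return ", ".join(
        f"U+{a:04X}" if a == b else f"U+{a:04X}-{b:04X}" for a, b in ranges
    )


print(format_ranges(needed_ranges("abc e")))
# U+0061-0063, U+0065
```

A real subsetter would then cut the font down to the glyphs for those ranges (plus whatever shaping needs), which is the hard part and is not attempted here.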

shared4you11y ago· 2 in thread

I also couldn't find any font that covers mathematical symbols from the SMP.

EDIT: Just downloaded the zip archive. Unix permissions for the Bengali and Gurmukhi fonts are different from the rest of them.

sp33211y ago

For math symbols: Cambria Math and DejaVu Sans should have them. http://www.alanwood.net/unicode/fontsbyrange.html#u1d400

tokenadult11y ago

That was an interesting comment. May I ask, as a follow-up comment, what you meant by

mathematical symbols from the SMP

as I did the expected Google search, and I am not sure that the search results I see refer to what you were referring to.
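
For context, SMP here most likely means Unicode's Supplementary Multilingual Plane (plane 1, U+10000 to U+1FFFF), which is where the Mathematical Alphanumeric Symbols block starting at U+1D400 lives. A small sketch using only the standard library; `plane_of` is a name made up for this example:

```python
import unicodedata


def plane_of(ch):
    """Return the Unicode plane number of a single character (0 = BMP, 1 = SMP)."""
    return ord(ch) >> 16


bold_a = "\U0001D400"
print(unicodedata.name(bold_a))       # MATHEMATICAL BOLD CAPITAL A
print(plane_of(bold_a))               # 1
print(plane_of("A"))                  # 0
print(len(bold_a.encode("utf-8")))    # 4
```

Characters outside plane 0 need four bytes in UTF-8 and a surrogate pair in UTF-16, which is one reason older software and fonts tend to handle them poorly.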

rurounijones11y ago· 2 in thread

Is there any reason that these could not be included as standard fonts in Windows, Linux, Mac, Android, iOS etc. at some point in the near future?

marcoms11y ago

rurounijones11y ago

Was hoping for a moment that we could all come together in harmony and enjoy universal access to fonts for all languages without relying on webfont kludges... hope springs eternal

suyash11y ago· 2 in thread

Anyone know what license they are released under and if it is ok to use them freely for commercial projects?

pbhjpbhj11y ago

NotoSans/NotoSerif downloads have a LICENSE file which starts "Apache License Version 2.0, January 2004 http://www.apache.org/licenses/".

AlyssaRowan11y ago

Apache 2.0, so yes.

theandrewbailey11y ago· 2 in thread

I really like Noto Sans. From what I can tell, it's a fork of Open Sans. For the Latin alphabet it's mostly the same, but with a single-story lowercase g.

abrowne11y ago

Which itself is a redrawing of Droid Sans. This Typophile thread has some comparisons, and a couple comments by the designer, Steve Matteson: http://typophile.com/node/101655

dutchbrit11y ago

It's just a shame that Noto Sans doesn't have a nice range of weights. They would have been better off improving Open Sans instead of creating yet another font.

mahmoudhossam11y ago· 2 in thread

I have a question. Why would anyone list Greek under "Egypt"?

Tortoise11y ago

Greek was spoken in Egypt for 1000 years. I imagine it's not that common today.

GFK_of_xmaspast11y ago

https://www.youtube.com/watch?v=1oTEQf1d9Iw

vincentchan11y ago· 2 in thread

Will Noto be available in Google Web Fonts later? That will be awesome.

kijeda11y ago

It has been for quite some time (at least a year).

https://www.google.com/fonts/specimen/Noto+Sans

vincentchan11y ago

But this is for English only, not for other languages.

1 more reply

smrtinsert11y ago· 1 in thread

I am in love. Why don't they offer a monospace programming version? Noto Sans outdoes my Consolas easily for clarity. No easy feat! Please release a Noto Sans Code Google!

amass11y ago

If they released a monospace version, I would switch from Inconsolata! Although it is probably more important to continue work on supporting more languages.

zvrba11y ago· 1 in thread

I looked at sample serif font, and it renders both blurry and jagged. At that size, this is quite an "achievement".

f05511y ago

I have the same effect on Safari for Mac. This is lame.

hownottowrite11y ago· 1 in thread

This page seemed really slow on mobile. I thought it was just me but...

http://developers.google.com/speed/pagespeed/insights/?url=h...

tripzilch11y ago

On my netbook, Firefox warned me about an unresponsive script.

marcoms11y ago· 1 in thread

Nice to see Material design in use on their sites for one of the first times!

scotty7911y ago

Not so sure. Commandeering of my scroll-bar and slight enlargement of clicked element along with dark outer glow didn't sit well with me.

dnqthao11y ago· 1 in thread

Nice, we look forward to Chu Nom script support in the future.

sandGorgon11y ago

I suggest you file a bug https://code.google.com/p/noto/issues/list

jpatokal11y ago

This is brilliant, particularly the newly released Noto CJK: http://www.google.com/get/noto/cjk.html
