When text is rendered by a computer that lacks the characters to display it, the interface shows little outlined squares instead. Called “tofu,” these signify the absence of a digital character, but thanks to Google and Monotype, this phenomenon is a thing of the past.

“There’s been a change happening in infrastructure for the last 20 years or so,” says Kamal Mansour, a linguistic typographer based at Monotype in San Francisco. “The Unicode standard now includes codes for the characters of every script, every writing system that has been perfected and submitted. For example, in the case of Burmese, it has been in the standard for 15 years or so, but the country hadn’t opened up, so there wasn’t much demand for it. Now that the country’s opening up, there are more and more fonts being created, and more details being paid attention to.

“On the other hand there have been minority languages that have never been listed in the Unicode standard. There’s a certain process by which someone can collect all the necessary characters, documents, and submit a proposal to the Unicode technical committee. Once you have codes assigned to the characters you can represent any language that’s written in that particular script.”

Mansour’s team, in collaboration with Google, has just released the first version of an open-source family of typefaces that bring together all the known languages of the world, living and dead. Noto is the largest typographic project of its kind, encompassing over 800 languages, 100 different scripts, musical notation, punctuation, and of course, emoji.

“You cannot really represent your culture, your heritage, digitally unless you have unique codes for every character that you use,” says Mansour. “For English and the European languages in general this has been available for decades, but for some of these minority languages it’s fairly recent.”

Google + Monotype, Noto

Groups like the Cherokee Nation in the United States are working hard to preserve their language and teach it to younger generations to ensure its survival. “Being digital is really important to that,” says Google’s Bob Jung, “because all the devices that their kids are using can actually be operated in their native language. We need these languages to be ready for our devices, but Google has always taken this attitude that the more information is available to people the better it is, and I think we want to encourage the culture of preservation of all of these languages, and the information that’s recorded in all of them.”

For minority languages, Noto represents an opportunity to digitally preserve their cultural and linguistic history, forming records of traditions that would otherwise be lost. The investment in such a benevolent, open-source project has been colossal, with dedicated teams at Google and Monotype working in collaboration for over five years. So far they’ve produced a sans serif family that’s almost complete and a serif family still very much in development, and they now face the additional challenge of rendering all of this in up to eight different weights. For Mansour, it’s career-defining work.

“My background is mixed, design, linguistics, and computer science,” he says. “The amount of information that I’ve learned through this project is really overwhelming. I could not have asked for anything better.”

Google + Monotype, Noto

Producing Noto required an extraordinary depth of research from Mansour, Jung, and both of their teams, filling in gaps in the Unicode system and in many cases creating the first digital typeface for a given language. In the case of Tibetan, this required translating a script with thousands of years of calligraphic heritage into a series of anchor points and terminals on screen. To facilitate its production, Mansour and designer Toshi Omagari worked closely with a team of Buddhist monks.

“The Tibetan script has been coded in the Unicode standard since early on,” says Mansour. “What hasn’t happened for Tibetan is the development of a type of graphic style and typographic point of view. Everything has been calligraphic until the very recent decades. Even typefaces people have created for Tibetan in the last 10-15 years have all looked exactly like one calligraphic style or the other. In this case Toshi’s ambition was to present something in a contemporary typographic style that still maintains various traditional features that spring from the calligraphic tradition, but to meld those two together so that it would be pleasing and accessible to the native mind.

“We had access to this monastery in Japan only because we knew the manager. It came out as a very interesting coincidence and collaboration. He and Toshi had a chance to exchange lots of ideas and collect background before coming up with something final. We presented one early proposal to Google and the various reviewers had refused it—they thought it looked too bulky and wasn’t considerate of their tradition. In the design that Toshi produced afterwards, they found Toshi’s contemporary proposal pleasing, which is really quite an achievement. The proof is the approval of these monks.”

Elsewhere the project has led to the evolution of existing alphabets, not just by digitizing what was previously hand-drawn, but by creating new parameters within which different cases might be used. The Cherokee alphabet had existed in Unicode for over a decade, but had been coded on the assumption that it would function in a single case. While working on Noto, the Cherokee decided they needed provision for a second case, meaning Mansour and his team had to create a whole new case of letters from scratch.

“What differs greatly from script to script is that some are already so well documented that we can find references fairly easily,” says Mansour. “For others it’s an empty hole. Sometimes the available sources might be very old, and it’s hard to know whether they’d be appropriate for modern usage.”

The question of appropriateness is one that’s also been raised in relation to the sans serif and serif versions of Noto. The serif is a form particular to the Latin typographic tradition, and difficult to impose upon eastern and calligraphic scripts. Homogenizing these alphabets to conform to western type standards was something Mansour and his team were keen to avoid, and so the serif version of Noto has simply been rendered as the more formal of the two.

“Very often the distinction is translated as formal and informal. The sans would be informal and the serif would be formal. If you were to examine some of the Indian scripts like Urdu, and look at the serif version in comparison to the sans you would notice the shift from informal to formal. Obviously Urdu does not have a real serif, but the formal has sharp terminals, classic looking angles, and so on.”

For Mansour the creation of Noto has been the work of a lifetime, and an undertaking from which he seems still to be reeling. “It really is shocking. The amount of work that comes out of this is incredible. I saw my designer colleagues overwhelmed; the amount of detail that’s involved in making point-compatible design across the board is extraordinary, and without everybody’s hands the ship would go down.