Notes from TUG2008 in Cork: Day 2

2008-07-22

These are rough notes from a public event, and any errors or stupidity should be attributed to me and my poor note-taking; I hope these notes are useful despite their obvious flaws, and everything should be double-checked :)

Unicode and TeX

Arthur Reutenauer

9:05 [5 mins late]

XeTeX does on-the-fly translation from UTF-8 to TeX’s “legacy” encodings.

RFC 4646 is a language-tagging standard that covers everything; the ISO two- or three-letter codes don’t cover enough language variants. For example, a “UK” language code could mean British English or Ukrainian.
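To make the “UK” example concrete: under RFC 4646, “uk” on its own is the ISO 639-1 code for Ukrainian, while British English is written as a language subtag plus a region subtag, “en-GB” (the ISO 3166 code for the United Kingdom is GB). The sketch below is a simplified parser for the language, script, region and variant subtags only; it deliberately ignores extlang, extension and private-use subtags, so treat it as an illustration of the tag structure rather than a validating implementation.

```python
import re
from typing import NamedTuple, Optional, Tuple


class LangTag(NamedTuple):
    language: str
    script: Optional[str]
    region: Optional[str]
    variants: Tuple[str, ...]


def parse_tag(tag: str) -> LangTag:
    """Split a simplified RFC 4646-style tag into its main subtags.

    Simplification: extlang, extension and private-use subtags are not handled.
    """
    parts = tag.split("-")
    language = parts[0].lower()
    if not re.fullmatch(r"[a-z]{2,3}", language):
        raise ValueError(f"unsupported primary language subtag: {parts[0]!r}")
    script: Optional[str] = None
    region: Optional[str] = None
    variants = []
    for sub in parts[1:]:
        if script is None and region is None and not variants and re.fullmatch(r"[A-Za-z]{4}", sub):
            script = sub.title()          # e.g. "Latn", "Cyrl"
        elif region is None and not variants and re.fullmatch(r"[A-Za-z]{2}|\d{3}", sub):
            region = sub.upper()          # e.g. "GB", or "419" for Latin America
        elif re.fullmatch(r"[A-Za-z0-9]{5,8}|\d[A-Za-z0-9]{3}", sub):
            variants.append(sub.lower())  # e.g. "1901" (old German orthography)
        else:
            raise ValueError(f"unsupported or misplaced subtag: {sub!r}")
    return LangTag(language, script, region, tuple(variants))
```

So `parse_tag("uk")` yields a bare language subtag, while `parse_tag("en-GB")` yields language `en` with region `GB`.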


xindy: UTF8 indexes

Joachim Schrod

9:30

If you create an index, that usually means page numbers. But not always: music pieces have names, and Bibles have named sections that matter. So you need ranges over structured location references, and xindy allows for this. We have a declarative style language both for declaring these locations and for defining the output style, and we have pre-made modules for common tasks.
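A minimal sketch of the idea of structured location references, assuming Bible-style (book, chapter, verse) locations: sort them by a declared book order, then merge consecutive verses into ranges. This is plain Python, not xindy's own style language, and `BOOK_ORDER`, `sort_key` and `build_ranges` are hypothetical names for illustration.

```python
from itertools import groupby
from typing import List, Tuple

# Hypothetical declared order for a Bible-style location class; a xindy
# style would declare such an ordering, this list merely stands in for it.
BOOK_ORDER = ["Genesis", "Exodus", "Leviticus"]

Location = Tuple[str, int, int]  # (book, chapter, verse)


def sort_key(loc: Location) -> Tuple[int, int, int]:
    book, chapter, verse = loc
    return (BOOK_ORDER.index(book), chapter, verse)


def build_ranges(locs: List[Location]) -> List[str]:
    """Sort structured locations and merge consecutive verses into ranges."""
    ordered = sorted(set(locs), key=sort_key)
    out = []
    # Only verses within the same book and chapter can form a range.
    for (book, chapter), group in groupby(ordered, key=lambda loc: loc[:2]):
        verses = [verse for _, _, verse in group]
        start = prev = verses[0]
        for v in verses[1:] + [None]:  # None flushes the last open range
            if v is not None and v == prev + 1:
                prev = v
                continue
            out.append(f"{book} {chapter}:{start}" if start == prev
                       else f"{book} {chapter}:{start}-{prev}")
            if v is not None:
                start = prev = v
    return out
```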

Perhaps the most important contribution of xindy is its theoretical model for index creation. Something that LuaTeX could take on?

We have a set of predefined languages - even Klingon ;) - although that’s not in Unicode! ;p - but this isn’t a very wide selection; it's Euro-centric (because it's a community effort).

We have markup normalisation for the index; we made a TeX introductory book whose index entries just use “\MF” and so on, instead of “\index{METAFONT@\MF}”.
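The notes don't say how xindy implements markup normalisation; as an assumption, the idea can be sketched as deriving a sort key from the printed markup, so that an entry written as `\MF` sorts as if it were “METAFONT” while still printing as `\MF`. The `SORT_AS` table and `normalise` function are hypothetical.

```python
import re

# Hypothetical table: how each markup macro should be treated when sorting.
SORT_AS = {r"\MF": "METAFONT", r"\TeX": "TeX", r"\LaTeX": "LaTeX"}

MACRO = re.compile(r"\\[A-Za-z]+")


def normalise(entry: str) -> tuple:
    """Return (sort_key, print_key) for a raw index entry such as '\\MF'."""
    def expand(m: re.Match) -> str:
        # Unknown macros are dropped from the sort key rather than sorted as '\'.
        return SORT_AS.get(m.group(0), "")
    sort_key = MACRO.sub(expand, entry).replace("{}", "").strip()
    return sort_key, entry
```

Sorting entries by `normalise(e)[0].lower()` then places `\MF` among the M entries instead of grouping all macros under the backslash.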


Do we need a ‘Cork’ math font encoding?

Ulrik Vieth

10:00

Returning to Cork this year, I thought about the last time, when the Cork encoding was developed. It provided a model for more 8-bit font encodings, supported many European languages, and started further developments. Its complete 7-bit ASCII support was good, but it had some shortcomings: it didn’t follow other standards such as ISO Latin 1 or 2; input and output encodings were different (solved in 1993/94 by LaTeX2e with inputenc and fontenc); it led to a lot of local encoding forks (solved by the TeX Gyre fonts); and it left out text symbols and the glyphs commonly available in PostScript fonts. So there was a big mess of font encodings.

This is only resolved by moving to Unicode and OpenType fonts. The TeX Gyre project provides a consistent implementation of many encodings, rooted in Unicode/OpenType.

Today, TeX is transitioning again: from DVI/PS to PDF; scalable fonts have replaced bitmap PK fonts; and Unicode and OpenType are replacing 8-bit encoded fonts, thanks to the new engines that are widely available.

The 7-bit text and math fonts were developed at the same time; DEK needed them to typeset TAOCP. 8-bit text fonts were developed by European users for their own needs, but 8-bit math fonts weren’t. There are reasons for doing them, though: there was the ‘Aston’ project in 1993 and then the ‘newmath’ prototype in 1997/98.

OpenType math in MS Office 2007: while we were waiting for STIX fonts, MS added a MATH table to OpenType, and the Cambria Math font is a reference implementation.

There is acceptance of OpenType math: many concepts and ideas from TeX were adopted by Microsoft; it's officially still experimental but already a de facto standard; FontForge and XeTeX already support it, and LuaTeX is likely to follow. It's likely that OpenType MATH support will be adopted in new TeX engines and new TeX fonts. Unicode sorts out the issue of ‘math font encodings’, so the issue is now developing OpenType Math fonts.

The OpenType font format was developed by Adobe and Microsoft; it's a vendor-controlled specification and isn’t really open. It draws on concepts from Type 1 and TrueType fonts: the table structure of TrueType, Unicode encoding, and advanced typographic features like glyph positioning (GPOS) and glyph ….

The OpenType MATH table has font-specific global parameters, some with direct relations to TeX parameters and others that are simplifications, although a few TeX parameters don’t have a clear correspondence; TeX engines can use some workarounds for that. It also has glyph-specific metric information.
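To illustrate the parameter correspondence, here is a partial table pairing some OpenType MATH constants with TeX's math-symbol-font parameters σ8 to σ22 (named as in Appendix G of The TeXbook). The pairings follow how they are commonly described for OpenType-aware engines, but treat them as an assumption to check against your engine's documentation. The helper lists the TeX parameters left without a direct partner in this table, which is roughly where workarounds come in.

```python
# Partial, illustrative pairing of OpenType MATH constants with TeX's
# math-symbol-font parameters (sigma numbers as in The TeXbook, Appendix G).
# Assumption: verify against the documentation of the engine you use.
MATH_TO_SIGMA = {
    "FractionNumeratorDisplayStyleShiftUp": 8,       # num1
    "FractionNumeratorShiftUp": 9,                   # num2
    "FractionDenominatorDisplayStyleShiftDown": 11,  # denom1
    "FractionDenominatorShiftDown": 12,              # denom2
    "SuperscriptShiftUp": 13,                        # sup1
    "SuperscriptShiftUpCramped": 15,                 # sup3
    "SubscriptShiftDown": 16,                        # sub1
    "SuperscriptBaselineDropMax": 18,                # sup_drop
    "SubscriptBaselineDropMin": 19,                  # sub_drop
    "AxisHeight": 22,                                # axis_height
}

SIGMA_NAMES = {
    8: "num1", 9: "num2", 10: "num3", 11: "denom1", 12: "denom2",
    13: "sup1", 14: "sup2", 15: "sup3", 16: "sub1", 17: "sub2",
    18: "sup_drop", 19: "sub_drop", 20: "delim1", 21: "delim2",
    22: "axis_height",
}


def unmapped_sigmas() -> list:
    """TeX math parameters that have no direct partner in the table above."""
    covered = set(MATH_TO_SIGMA.values())
    return [SIGMA_NAMES[n] for n in sorted(SIGMA_NAMES) if n not in covered]
```

In this partial table that leaves num3, sup2, sub2, delim1 and delim2, which an engine would have to derive or approximate from other constants.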

Optical sizing is important for super- and subscripts, and METAFONT fonts typically have 5/7/10pt designs adjusted for readability.
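A small sketch of how script sizes and optical sizes interact, assuming 10pt text and the 5/7/10pt designs mentioned above. OpenType MATH expresses script scaling as percentages (the ScriptPercentScaleDown and ScriptScriptPercentScaleDown constants); the 70 and 50 used here are assumed values chosen to reproduce 7pt and 5pt, not values taken from any particular font.

```python
from typing import Sequence, Tuple


def script_sizes(text_size: float, script_pct: float = 70,
                 scriptscript_pct: float = 50) -> Tuple[float, float, float]:
    """Sizes used for text, script and scriptscript styles.

    script_pct / scriptscript_pct stand in for the MATH constants
    ScriptPercentScaleDown / ScriptScriptPercentScaleDown (assumed values).
    """
    return (text_size,
            text_size * script_pct / 100,
            text_size * scriptscript_pct / 100)


def nearest_design(size: float, designs: Sequence[float]) -> float:
    """Pick the optical design size closest to the size actually being set."""
    return min(designs, key=lambda d: abs(d - size))
```

With 12pt text the script size comes out at 8.4pt, and the nearest of the 5/7/10pt designs is still the 7pt one.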

Challenges presented by OpenType Math fonts: the scope of the project, with a huge set of geometric symbols and alphabetic font shapes to be designed; organisational issues, since the font extends across multiple Unicode planes (> 16 bits) and there are size variants and optical sizes to be packaged in unencoded slots; and technical issues, such as matching fontdimens and other TeX parameters to the MATH table, mapping TFMs to glyph-specific metrics, and font substitutions too.
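Why “> 16 bits” matters: the math alphanumerics (bold, italic, script, fraktur and so on) sit in the Mathematical Alphanumeric Symbols block, U+1D400 to U+1D7FF, in Unicode plane 1. This sketch maps ASCII letters to MATHEMATICAL ITALIC; note the one hole in that alphabet, italic small h, which Unicode leaves reserved because U+210E PLANCK CONSTANT was already encoded.

```python
def math_italic(ch: str) -> str:
    """Map one ASCII letter to its MATHEMATICAL ITALIC counterpart."""
    if len(ch) != 1:
        raise ValueError("expected a single character")
    if ch == "h":
        return "\u210e"  # reserved hole at U+1D455; use PLANCK CONSTANT
    if "A" <= ch <= "Z":
        return chr(0x1D434 + ord(ch) - ord("A"))  # plane 1, above U+FFFF
    if "a" <= ch <= "z":
        return chr(0x1D44E + ord(ch) - ord("a"))
    raise ValueError(f"not an ASCII letter: {ch!r}")
```

Apart from the h, each of these characters needs a surrogate pair (four bytes) in UTF-16, one reason tools built around 16-bit code units struggle with them.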

10:30

Q: 10-20 person-years were put into the OpenType MATH stuff, including the Cambria implementation. They don’t claim their MATH table is generic; it's specific to Cambria, and it's an ongoing and infinite task…

A: Sure

Q: You left out something in the summary: interface issues. It's useful to have Unicode math and STIX fonts, but what about higher-level interfaces?

A: Sure


Three Typefaces for Mathematics

Dan Rhatigan

10:50

This is not about technology; it's about design issues.

I’ve been typesetting for a long time, using a lot of core configurations for dealing with math; as I got more into type design, I knew I had problems with type as a compositor/designer. So I was casting about for things to look at, and I found 3 case studies that bring up different issues.

  1. Times 4-line Mathematics Series 569, Monotype Corporation 1957 (based on Times New Roman, Stanley Morison and Victor Lardent, et al, 1931)

  2. AMS Euler, Zapf, DEK et al, 1985

  3. Cambria Math, Jelle Bosma, Ross Mills, 2004

Tricky things about maths? Legibility in paragraphs is different from legibility in equations; equations combine multiple styles, scripts and symbols; and the positioning and spacing is a kind of script of its own, moving vertically and horizontally and even back and forth.

Legibility of letters, and readability of paragraphs.

….

Here’s a hand-set equation using Modern Series 7, and here’s a machine-set equation using Times Series 569. You can see the x-height was normalised, among other changes, but the big thing was that the italic’s slant was changed, 4° to be more upright. Times had a 16° slant, which is quite a lot.

Here are photos of the pattern drawings, with shapes highlighted and overlapped, and you can really see the difference.

So Knuth also made a font of Modern Series 7, Computer Modern, and then had an idea for a new kind of approach: a CONTRAST of style, rather than a seamless blend. Zapf did the drawings that the typeface was based on, but there was also a rich correspondence between Zapf and DEK, and the design pushed the boundaries of the technology it was meant for. “An upright italic with a casual twist” reflected the tone of the handwriting a mathematician would use, and eliminated the problem of how to fit all the pieces together with a slanted shape. It still has the characteristics of an italic shape, though, and the calligraphic forms also help. A notion Zapf got behind was capturing not a sense of fine, formal broad-nib calligraphy, but the rough, quick pen work of someone jotting down an equation. They started with book typography but moved away from it in the process.

There were problems making the subtleties of Zapf’s drawings come across in the digitisation with METAFONT by a team at Stanford. Here are photos of the final drawings that Zapf submitted; there were subtle modulations.

The team decided to draw the OUTLINES with METAFONT instead of using a stroke/nib skeleton/flesh model.

Cambria is the default math font in Office 2007+, until more math fonts are developed. There is a focus on ClearType rendering: curves that move quickly from horizontal to vertical, avoiding large diagonal gestures wherever possible, so things render sharp and crisp on screen with ClearType.


Minion Math

I wanted a math font that improves over existing math fonts: something that is very consistent (Computer Modern uses some AMS Math glyphs…), comprehensive and versatile (not just one width, one optical size, one weight).

Why start with Minion? I like it, and it already has Greek letters and optical sizes. It's a 1990 Adobe font; it had Multiple Master versions, and then Greek glyphs.

Weights: Regular-Medium-Semi bold-Bold

Optical Sizes: Display-Subhead-Regular-Caption-Tiny

In the final release, there will be full Unicode math support, full math alphabets, and a real math italic. I plan to fill the Unicode block for mathematical characters totally.

Consistent look, consistent metrics.

Q: legal status?

A: Yes, I have a legal agreement; I’m licensed to use their trademark and to publish my font.


Cuneiform with METAFONT

Starting point for cuneiform is the basic elements, the wedges. I didn’t scan images of clay tablets, I’ve constructed the shapes in 3 variants, Classic, Filled and Academic.

I used MetaType1 to produce Type 1 fonts, then FontForge to generate OpenType, and I also use t1utils and others for the final result.

The MetaType1 package was developed for the TeX Gyre project; it runs MetaPost (any available version) to produce EPS files with outlines for all the glyphs, and collects the data together into one Type 1 file. The MetaPost source files describe the glyph designs, and then additional macros are defined in a MetaType1 macro extension or appended by the user to combine them into a font.

Compound elements with intersections require “remove overlap” during compilation to Type1 and OpenType font formats.

TODO: I wish MetaType1 would be extended to MetaOpenType to produce OpenType directly.


Meta-Designing Parametrized Arabic Fonts For AlQalam

Ameer M Sherif

Hossam A H Fahmy

Here’s a reed pen nib, the traditional Arabic writing tool. Here’s the Naskh style of Arabic script, written right to left; most letters connect, and only 6 do not. You can have the same word written wider or shorter to justify the line as you like: it's not justified by the spaces between the words, as in Latin, but inside the words. There are a lot of ligatures, and the same letter can have a very different shape depending on its position in a word. The 2nd and 3rd lines of this slide are images from an Arabic calligraphy handbook.

There are other styles of Arabic, just as Latin has roman, italic and fraktur. In Naskh you have a unit like an em, a scalable unit, and the base pen nib shape is a square at 45°.
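To see what a square nib at 45° implies, here is a sketch computing the thickness of the trail left by a square pen moving in a given direction: it is the width of the square projected onto the normal of the direction of travel. (In plain METAFONT the corresponding pen would be something like `pensquare rotated 45`.) This is a generic geometric model, not the authors' AlQalam code; the function name and defaults are assumptions.

```python
import math


def nib_thickness(direction_deg: float, side: float = 1.0,
                  nib_angle_deg: float = 45.0) -> float:
    """Stroke thickness left by a square nib moving in a given direction.

    The thickness is the width of the square projected onto the normal of the
    direction of travel: side * (|cos a| + |sin a|), where a is the angle
    between that normal and one edge of the square.
    """
    normal = math.radians(direction_deg + 90.0)
    a = normal - math.radians(nib_angle_deg)
    return side * (abs(math.cos(a)) + abs(math.sin(a)))
```

With the nib at 45°, horizontal and vertical movements give the maximum thickness (side × √2), and movement along either diagonal gives the minimum (the side itself).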

A “vertical” stroke is not really vertical: it's not just two points, but 4 points describe it well (“z1..z2..z3..z4”). A 5th point is often redundant, although sharp bends and asymmetric strokes can require one.

There are primitives for Latin glyphs; vertical (stem, bow) and horizontal (arm, bay, turn, elbow) then secondary (nose, bar, dot) and then specialised parts (Q tail, R tail, a belly, g tail)

DEK used a simple set of primitives, and parametrised them to get a large set of glyphs. We want primitives to make letters more flexible and better connected.

We used 3 kinds of primitives:

  1. Some are used without any modifications in many letters

  2. Some are dynamic but change shape only a little

  3. Some are dynamic and change a lot

There are ‘approximate’ directions in calligraphy books, where ligatures are quite different shapes from their component characters. METAFONT isn’t smart enough yet to learn over time ;), so we have to put that into the design. These are the 2nd kind above.

The 3rd kind are tricky; e.g. the “kashida”, which doesn’t belong to either of the two letters; it's a connection between the two. OpenType is buggy here: you cannot have glyphs that change width on the fly, so you have to predefine sizes. But the line-breaking algorithm ought to tell the font what width it wants for an Arabic character.

The best OpenType fonts in Arabic, from DecoType in Holland, have predefined widths. This will create poor connections between joined-up glyphs, but if you have a smart font and line breaker, it will be smooth. (?)

Urdu is totally oblique, so you need to look at the different Arabic writing styles for each font. Arabic is the most commonly used script after Latin, used for about 15 languages.

Taco and Hans were asking about when Arabic letters stack up. The baseline is the base; for combining letters, we benefit from the declarative nature of METAFONT. The horizontal positioning starts from the right, the vertical positioning starts from the left at the baseline, and the writing starts from the right.

Flexing and contracting with kashidas is a matter of the calligrapher's personal taste, so with type it's something the type designer/typographer can decide. The length of the kashida is the length of the word minus the minimum width of the letters.
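The kashida rule above is simple arithmetic, sketched here: the total elongation is the target word width minus the sum of the letters' minimum widths. How that total is split between the possible joins is the calligrapher's (or type designer's) choice; the proportional weights below are just one hypothetical way to encode it, and both function names are illustrative.

```python
from typing import List


def kashida_total(target_width: float, min_widths: List[float]) -> float:
    """Total kashida length: target word width minus the letters' minimum widths."""
    total = target_width - sum(min_widths)
    if total < 0:
        raise ValueError("word cannot be set that narrow")
    return total


def distribute(total: float, weights: List[float]) -> List[float]:
    """Split the kashida among allowed joins in proportion to (hypothetical)
    designer-chosen weights."""
    s = sum(weights)
    return [total * w / s for w in weights]
```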

We wrote a simple GUI for this: it reads input word(s), parses them into character streams and lists the chars; you manually select the letter-forms and lengths; it then outputs files with the selected letter-forms, lengths and order in the word(s), and finally runs METAFONT and a DVI viewer. So we get complete words out of METAFONT using these primitives.

We tested 16 words with 30 people on a comfort scale of 1 to 5, and took the mean of their opinions. We compared Simplified Arabic and Traditional Arabic, which Microsoft ships, DecoType Naskh (said to be the best available) and ours. We got 3.9/5, DecoType got 3.2/5, Traditional Arabic 2.4 and Simplified Arabic 2.3. The big difference is the kerning, and DecoType isn’t doing a good job with the kerning right now.

Future?

We want automatic selection of the most suitable glyph shapes and sizes.

We want contextual analysis to choose the form, and line-justification analysis to choose the size and ligatures. This will take a whole paragraph and process the whole thing: you won’t know the shape of the first character of the first line until you’ve taken into account the last character of the last line. Very complex!
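The “whole paragraph” point is the same one behind Knuth-Plass line breaking: choosing breaks globally rather than line by line. A minimal sketch, using a swapped-in, simplified total-fit method (minimising the sum of squared slack with dynamic programming, not the full Knuth-Plass algorithm and not the authors' system); it only chooses line breaks, whereas the plan described above would also choose letter-forms and kashida lengths.

```python
from functools import lru_cache
from typing import List, Tuple


def break_paragraph(words: List[str], width: int) -> List[List[str]]:
    """Choose all line breaks at once by minimising total squared slack.

    Each line costs (width - line_length) ** 2, the last line is free, and
    overfull lines are forbidden. Because the optimum is global, a word near
    the end can change where the first line breaks.
    """
    n = len(words)

    @lru_cache(maxsize=None)
    def best(i: int) -> Tuple[int, Tuple[int, ...]]:
        # (cost, break positions) for the best way to set words[i:].
        if i == n:
            return 0, ()
        result = None
        length = -1
        for j in range(i + 1, n + 1):
            length += len(words[j - 1]) + 1  # one space between words
            if length > width:
                break
            slack = 0 if j == n else (width - length) ** 2
            rest_cost, rest_breaks = best(j)
            candidate = (slack + rest_cost, (j,) + rest_breaks)
            if result is None or candidate[0] < result[0]:
                result = candidate
        if result is None:
            raise ValueError(f"word {words[i]!r} is wider than the line")
        return result

    _, breaks = best(0)
    lines, start = [], 0
    for end in breaks:
        lines.append(words[start:end])
        start = end
    return lines
```

For `aaa bb cc ddddd` at width 6, a greedy breaker would fill the first line as `aaa bb`, leaving a very short `cc` line; the global optimum instead moves `bb` down, so a decision later in the paragraph changes the first line.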

We want to meta-design all possible letter forms

We want to automatically place dots and other diacritic marks.

We’re not sure if it's worth modelling the ink spread and movement speed of human calligraphers.

We want to embed METAFONT sources into PDFs: if you want to re-flow things, you need to re-justify them. So the METAFONT sources should be available in the PDF, and PDF viewers would then have a METAFONT engine to re-typeset the paragraphs. PDF viewers have an OpenType engine, so why not a METAFONT engine? METAFONT is much, much better than the tables of OpenType.

Finally, we want to support other Arabic writing styles. We haven’t finished this one yet, but plan to move forward.

Q: Tom Milo (behind DecoType) has the ACE text layout engine as an InDesign plug-in that uses a special font format to set text, and these fonts can be ‘frozen’ into OpenType fonts for general use.

A:


Writing Gregg Shorthand with LaTeX and METAFONT

Gregg shorthand was created in 1888, and the current version is the Centennial version. It is a simplified alphabet for phonetic writing, plus brief forms and phrases. text2gregg.php at http://www3.rz.tu-clausthal.de/~rzsjs/steno/Gregg.php shows how this works; let's input “once upon a time there was a family that lived happily ever after” with a proof of 23 to make it larger.

Gregg has a lot of ligatures, so we need to join curved (C) and vertical (V) strokes together, basically in 3 ways: CV, VC and CVC. We use Hermite interpolation for Bézier splines to do this smoothly. …
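Hermite interpolation specifies a cubic segment by its end points and end tangents, which is convenient for joins: give the curve the same tangent as the stroke it runs into, and the join is smooth. A sketch of the standard conversion to Bézier control points (the inner points sit one third of each tangent in from the ends); the point and tangent values in the usage are hypothetical, and this is not the authors' code.

```python
from typing import Tuple

Point = Tuple[float, float]


def hermite_to_bezier(p0: Point, t0: Point, p1: Point, t1: Point):
    """Control points of the cubic Bezier equal to a cubic Hermite segment.

    p0, p1 are the end points; t0, t1 the tangent (derivative) vectors there.
    """
    c1 = (p0[0] + t0[0] / 3, p0[1] + t0[1] / 3)
    c2 = (p1[0] - t1[0] / 3, p1[1] - t1[1] / 3)
    return (p0, c1, c2, p1)


def bezier_point(ctrl, t: float) -> Point:
    """Evaluate a cubic Bezier curve at parameter t in [0, 1] (Bernstein form)."""
    u = 1 - t
    weights = (u ** 3, 3 * u ** 2 * t, 3 * u * t ** 2, t ** 3)
    x = sum(w * p[0] for w, p in zip(weights, ctrl))
    y = sum(w * p[1] for w, p in zip(weights, ctrl))
    return (x, y)


# Hypothetical join: a curve leaves (0, 0) heading right and runs into the top
# of a vertical stroke at (10, -10), arriving with the stroke's downward tangent.
ctrl = hermite_to_bezier((0, 0), (30, 0), (10, -10), (0, -30))
```

Because the Bézier derivative at the end is 3 × (last point minus previous control point), the curve arrives with exactly the tangent `(0, -30)`, matching the direction of the vertical stroke.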

Phonetic writing is done with “unisyn” from http://www.cstr.ed.ac.uk/projects/unisyn/

The 15 most frequent words in any text make up 25% of it.
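The word-frequency claim is easy to check on your own texts, and it is why brief forms pay off. A minimal sketch (the tokeniser is deliberately naive, lower-casing and keeping only letters and apostrophes; the function name is hypothetical):

```python
import re
from collections import Counter


def top_share(text: str, n: int = 15) -> float:
    """Fraction of all word tokens accounted for by the n most frequent words."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0.0
    counts = Counter(words)
    return sum(c for _, c in counts.most_common(n)) / len(words)
```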

text2Gregg works great!

CAVE CANEM: from Pompeii before 79 AD there is Old Roman cursive, and then DEK (Deutsche Einheitskurzschrift), Herout-Mikulik, Gregg and Pitman. These meta-notations or shorthand notations are machine-drawn; and do not confuse pen stenography with machine stenography products in the US.

There is a book, “Gregg shorthand adapted to Irish”, with a copy inscribed “courtesy of john r Gregg 1930” (?)


My talk


Multidimensional Text

John Plaice

15:20

What is text? In many ways we are stuck in the typewriter age: most formatting systems assume input and output strongly resemble each other (typewriter, telegraph, WYSIWYG, TeX/LaTeX, Unicode/XML).

Between a sequence of typeset glyphs and the characters that generate it, there is such a resemblance.

But documents need to be edited (many times, and a lot), annotated and searched.

This needs versioning (revisions and variants), input methods (SMS, typing, audio), output methods (raw text, formatted text, audio) and various kinds of text processing (spell check, typesetting, searching, morphological analysis…).

What do we know? We need to move from one representation to another with only the inherent complexity of each process… We need separate input, output and internal representations (note the plural)

Chris Rowley (Kyoto 2003) wrote about this.

The solution already exists; it took a while to invent it and then realise it was already invented: AVMs, or Attribute Value Matrices, also known as feature structures. Everything is an attribute-value list, and values can themselves be AVMs. Any value is reachable through an index (an “iterator”).
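A minimal sketch of the AVM idea, assuming nested dicts for attribute-value lists and lists for ordered sequences; a path of attributes and positions plays the role of the “index” through which any value is reachable. The attribute names (`words`, `content`, `en-GB`, `revision`) are hypothetical; here one word carries both its British and American spellings as variants.

```python
from typing import Any, Sequence, Union

Step = Union[str, int]  # attribute name, or position in an ordered sequence


def get_path(avm: Any, path: Sequence[Step]) -> Any:
    """Follow a path of attributes/positions down through nested AVMs."""
    value = avm
    for step in path:
        if isinstance(value, dict) and step in value:
            value = value[step]  # attribute lookup
        elif isinstance(value, list) and isinstance(step, int) and 0 <= step < len(value):
            value = value[step]  # position in an ordered sequence
        else:
            raise KeyError(f"no value at path {list(path)!r}")
    return value


# Hypothetical document: an ordered sequence of words, each word an AVM whose
# "content" attribute holds spelling variants as nested AVMs.
doc = {
    "revision": 3,
    "words": [
        {"content": {"en": {"text": "the"}}},
        {"content": {"en-GB": {"text": "colour"}, "en-US": {"text": "color"}}},
    ],
}
```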

Three common structures: ordered sequences for ‘flat’ structures, ordered trees for ‘hierarchical’ structures, and matrices for multi-ordered data.


Parallel Typesetting

Toby Rahilly

16:00

Uses a physics model of forces to lay out text! Cool!
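The notes give no details of the model, so this is only a generic illustration of force-based spacing, not the speaker's method: treat each interword gap as a Hookean spring; at equilibrium every spring carries the same force, so each gap changes by force / stiffness. TeX's glue stretching is closely related, with stretchability playing the role of 1/stiffness. The function name and parameters are hypothetical.

```python
from typing import List


def spring_gaps(words_width: float, natural_gaps: List[float],
                stiffness: List[float], line_width: float) -> List[float]:
    """Stretch or shrink interword gaps, modelled as springs, to fill a line.

    The total change needed is shared so that every spring carries the same
    force F: gap i changes by F / k_i.
    """
    excess = line_width - words_width - sum(natural_gaps)
    compliance = sum(1.0 / k for k in stiffness)
    force = excess / compliance
    return [g + force / k for g, k in zip(natural_gaps, stiffness)]
```

A stiffer gap (larger k) takes a smaller share of the stretch, just as glue with less stretchability does in TeX.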

Creative Commons License
The Notes from TUG2008 in Cork: Day 2 by David Crossland, except the quotations and unless otherwise expressly stated, is licensed under a Creative Commons Attribution-Share Alike 3.0 Unported License.
