Notes from TUG2008 in Cork: Day 1
2008-07-20
20 years of TeX Development; a personal perspective
09:15
Frank Mittelbach
Late 80s
Early enthusiasm!
Saw the program but couldn’t use it, then later got LaTeX 2.08 with PCTeX, but I had 512k RAM and it couldn’t run. So I wrote FM-TeX as a mini-LaTeX, and I focused on speed and lightness.
doc and docstrip; about 85% of the LaTeX code on CTAN uses this for documentation and installation
Other packages - array, multicol, theorem, varioref, etc - and I won the Don Knuth Scholarship for multicol.
I identified LaTeX’s shortcomings - poor math support, hardwired fonts, no color or graphics support, no extension interface, missing non-English language support, no input-encoding support, no consistent internal programming language, (nearly) no high level internal interfaces - and a fairly simple page model.
So we discussed at a conference with Don and talked him into various changes, and TeX 3.0 was announced. This opened up non-English usage, but basically froze the processor for a long time.
The Early 90s
The community was running at full speed!
Babel development, NFSS development, E-TeX: guidelines for future TeX extensions (1990). 1990, Cork encoding, a single community wide standard that supported many languages, although it over-did some things which still haunt us today. PostScript fonts exploded, and so fontinst came along, AMS-Math development was started, and LaTeX2e beta was released in 1993, then fixing bugs and doing input encoding support, and then the first official release in June 1994.
The shortcomings I identified were mostly fixed: maths, fonts, languages, input encoding, colour and graphics, extension interface, etc.
But we still had some: no consistent programming language, no high level internal interface, and still a fairly simple page model.
And so then there was the first consolidation phase.
The LaTeX Companion book’s 1st Edition was translated into German, French, Russian and Japanese, and sold over 100,000 copies. And the Graphics Companion was also important.
The Late 90s
Going mainstream!
We tried to establish LaTeX as a product to build up a big user base. We developed regression test suites and had regular maintenance releases - 17 by now. Test driven development is typical today but we did it in the early 90s.
We did a lot of hacks to optimise things, which was necessary in those days because computers were slow, but it comes back to haunt us. We developed a new kernel in expl3 in 1992, but never published it until 1995 or so, which was a mistake. And we tried to solve some of the license issues, with the LPPL in 1999, and that became a free software license in 2003 after 1600 messages in debian-legal.
47% of CTAN is LPPL, 25% unknown, 17% GPL, 5% Public Domain…
But at this time maintenance becomes more and more rigid and slow.
The Early 00s
This was the second consolidation phase (03-07) and we released The LaTeX Companion 2nd Edition, a 90% rewrite, doubled the size, and translated into German and French. And the Graphics Companion is also released.
2008
expl3 is available and usable, which solves the lack of a consistent internal programming language, and a template interface is available within LaTeX2e, giving more high level interfaces via packages.
But the big problem is the simple page model; this is a limitation of the underlying TeX engine. But, would a replacement system reach the critical mass of users needed to make people switch?
So, is it dead?
1. Not yet, but the sharks are circling.
2. Not yet, but it is fragile.
3. It’s still strong, but old, and changing it might kill it.
So the real answer is a mix of 2 and 3. We have solved many large issues in the past, and the consolidation efforts helped. A semi-optimal standard that is used is better than nothing.
A big dilemma in TeX is that all major developments have been by individuals or small groups - students, academics, or old dinosaurs hanging around - and all the large projects with committees have failed. This is good for development but not so good for maintenance.
LaTeX is an exchange protocol; if I send something to you, I expect it to look identical on both our computers.
For LaTeX to evolve, we need to identify what would make people switch (many USA users still use 2.09, seeing no need for updates), and we need a clean update and upgrade path for software and documents.
09:50
Q: Can you replace the underlying engine and still have it compatible?
A: Gradual stuff, yes. But a different model? No.
Q: Is LaTeX doomed?
A: This is what I meant by 3; there are a lot of old people who used it a long time, and few new users.
A Pragmatic Toolchain
Steve Peter
10:00
I consult for http://www.pragprog.com. Who are they? Andy Hunt and Dave Thomas were programmers working for years and years, AT&T and elsewhere, and they started comparing notes, and realised their jobs were the same jobs over and over. They wrote “The Pragmatic Programmer” and people didn’t read it and follow the advice and they kept doing the same jobs. They suggest documenting things, and using LaTeX to do that. They fell in love with Ruby, thought directly in it, and at the time all the docs were in Japanese, and they wrote “Programming Ruby” (now in 3rd edition). My mind works more like Perl, I’m a linguist by training, I figure something out and 6 months later have no idea how it works :)
So these guys bought back the rights to the books and started their own publishing house, the Pragmatic Bookshelf. They typeset TPP themselves, the original was in TROFF, and the publisher suggested using TeX - their in house designer used TeX - and so when they started the Pragmatic Bookshelf, they inherited that tool chain.
So I inherited that tool chain. A few years in, I got an email asking if I knew anyone who knew TeX and XSLT. So I went to the bookstore for a book on XSLT and here I am :)
Our tool chain: The source our authors write is XML, our own DTD “Pragmatic Programmers Book” (PPB), and use make, Ruby, XSLT, TeX and PDF to process that. All the books we sell in print and as PDF downloads. Both are PDFs but there are two different routes; the print PDFs go through Acrobat because our printers don’t like PDFs from other sources. But the online PDFs go via dvips and then ps2pdf.
We don’t use PDF’s DRM although we do watermarking.
Since our tool chain is almost all free software, all our authors can have copies and test things out. It’s cross platform, UNIX centered. So when I have my author XML, I just run “make all” and the PDF is built. Normal make switches affect the build pathway for print or screen.
pragprog.sty (with the memoir class file) is how we format the TeX part.
We try to keep the XML as the canonical source, with a pipeline transformation to reading formats, so we can expand to other eBook formats as they become established.
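To make the "XML as canonical source" idea concrete, here is a minimal sketch of one pipeline step that turns XML into TeX. It is only an illustration in Python: the real chain uses XSLT, and the element names (`book`, `chapter`, `title`, `p`) are hypothetical, since the actual PPB DTD is not shown in these notes.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup; the real PPB DTD is not described in these notes.
SAMPLE = "<book><chapter><title>Intro</title><p>Hello &amp; welcome.</p></chapter></book>"

# A few characters that TeX treats specially (backslash is deliberately not handled).
TEX_SPECIALS = {"&": r"\&", "%": r"\%", "$": r"\$", "#": r"\#",
                "_": r"\_", "{": r"\{", "}": r"\}"}


def escape_tex(text):
    """Escape the characters listed in TEX_SPECIALS."""
    return "".join(TEX_SPECIALS.get(ch, ch) for ch in text)


def book_to_latex(xml_source):
    """Convert the hypothetical book XML into LaTeX body text."""
    root = ET.fromstring(xml_source)
    lines = []
    for chapter in root.iter("chapter"):
        title = chapter.findtext("title", default="")
        lines.append(r"\chapter{%s}" % escape_tex(title))
        for para in chapter.findall("p"):
            lines.append(escape_tex(para.text or ""))
            lines.append("")  # blank line ends the TeX paragraph
    return "\n".join(lines)


if __name__ == "__main__":
    print(book_to_latex(SAMPLE))
```

Because the XML stays untouched, a second function with the same input could target another reading format without changing what authors write.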
What are our “issues”?
Principles: XML is always the canonical source; everything must be automated.
URLs and hyperrefs in general are a pain.
The biggest problem is making re-flowable PDFs for eBooks. We get an email a day asking us when we’ll support Kindle and so on. I’m under pressure to produce re-flowable PDFs. I can’t do that with stock TeX, and I hope I can figure this out this week ;) LuaTeX? No. XeTeX? No.
And do we use LaTeX? ConTeXt? Eplain? A new format?
We can provide some funding (in dollars…) to do this.
We might switch away from TeX to do this. XSL-FO does have tool chains to produce re-flowable PDFs. It doesn’t look as good as our TeX based chain, so perhaps we’ll add a whole new pipeline with XSL-FO for eBooks.
Q: How do authors write the XML?
We offer authoring packages for TextMate and Emacs. I used to use Emacs but now use TextMate.
Q: How do you process XSLT?
A: Xerces I think.
Q: We have not packaged TeX so it can be used as a reflow engine for PDF files; perhaps we ought to. Can we make a BOF session about this?
A: Sure
Q: Why not use pdfTeX?
A: The original team used PSTricks for lots of things so we can’t switch to that, but yes that is the right thing to create re-flowable PDFs.
…
Kindle has a black and white screen, and has only 2 fonts - serif and sans, no fixed width font or advanced math support.
Q: Could you get XSL-FO out of TeX?
A: Perhaps, not sure..?
Q: Does the optimal route pass through XSL-FO?
A: Not sure either :)
Developing your own Document Class
Niall Mansfield
10:30
I wrote “The Joy of X” in LaTeX in 1990, and it was written in STOP format: from Hughes Aircraft. Every section is 2 pages. It has a title, and then a theme summary. Can have table, ol and ul. It also had a per-chapter TOC. Subsections are also 2 pages. The section title is repeated here.
I wrote “Practical TCP/IP” for Addison Wesley initially, got the rights back, and wanted to set it myself with LaTeX. I used Emacs, Inkscape and Subversion. My old TeXspert was an Amazon founder so he wasn’t around any more and I was on my own :)
So this was LaTeX2.09, 1400 lines of code including 550 comments.
We tried to convert the class file. This was dreadful. Writing a whole class is impossible for normal people, you don’t know how components work or what to include or omit…
LaTeX2e is wonderful, it looks like normal programming (instead of stack programming) and it has lots of useful packages.
So the new style file: We assume book.cls, it’s 300 lines of code plus comments, with 34 \RequirePackage’s, 7 \newenvironment convenience functions, and 54 \newcommands. We have a tricky 34 line \@sect{} code though.
We rely on the class to do as much as possible; we can rely on standard classes and packages, stuff that has been around for years and is stable and well tested. Whenever we changed anything, we could reuse other people’s things, and changed as little as possible.
We keep changes to the absolute minimum.
No “tidying up”!
Lessons learned?
I had an 800 page book in Word, used a proprietary word2tex converter.
Hard stuff: \@sect{} is not for sissies. Macro processing is not C, Java, etc. The TeX syntax is crazy; one time, I couldn’t work out how to change a “==” to a “>=” expression. Will LuaTeX or PyTeX solve this?
But having done 3 different books with 3 different tool chains - POST, Word and LaTeX - I thought LaTeX is best.
Desirables? Annotatable source, where 2 authors can comment without changing the text directly, and “live” diffs where you can see the changes in the text clearly. Word has this, as it’s WYSIWYG.
And does XML mean we’re wasting our time with LaTeX?
10:50
LaTeX, Lilypond, Perl
Joe McCool
11:30
I am a LaTeX user. An amateur, I try things until they work, and don’t really know what I am doing. I work in the countryside, yesterday is the first time I met someone who knew what LaTeX was.
I am an amateur musician too; I made a book of tunes with some connection to boating/water/maritime stuff. I made a book about a canal we revived with LaTeX and loved it, and so wanted to make a cookbook or songbook next.
Folk music is often learned by ear; it’s copied by ear, person to person. Folk players are not strong sight readers. This affected my requirements:
Integer number of tunes per page, cross referencing for grouping songs, text snippets associated with the tunes, nice big fonts and colour, easy to input the song data, good control of output, good community support, and a high quality output.
GNU Lilypond’s /lilypond-learning/Engraving.html documentation has some images of various musical fonts. It’s called “engraving” rather than “typesetting” for traditional reasons although it’s really typesetting. For the typesetting of classical music, the quality must be 100%, because anything less can affect the quality of the music being played.
I had other requirements too: Midi production, a standard format for storing music data so people can listen to things. I want the process from source to output to be as automated as possible, and I want web publishing.
There are some proprietary software packages: Finale, Sibelius are the main ones, Noteworthy Composer and Cakewalk are others. I rejected these quickly because they are proprietary. There is free software: MusicTeX, PSTricks, GNU Lilypond. I looked at these and GNU Lilypond is the best IMO.
The classical music world regards Lilypond as very high quality. Finale and Sibelius have nice GUI interfaces, and have various interesting features, but their printed output is poor, and in fact they will output in GNU Lilypond format.
GNU Lilypond is under heavy active development. It was started by some Dutch musicians in 1996, it has strong community development, has a strong free software ethic and is heavily based in hacker culture, which can be a strain for musicians without that background. It has partial compilation, a mode for beloved Emacs, and a classical Unix command line interface. It incorporates Scheme; I haven’t got far enough to play with that, but it’s powerful. The documentation has some good examples of what it is capable of. A lot of space to be handled, a lot of ink to be placed, and it meets the challenge well. A solid UNIX like tool.
…demo…
LaTeX3 Project
Jonathan Fine
12:20
I work at the Open University - basically a strange kind of publishing house - in the LTS - I don’t teach, I prepare course materials - and we wanted PageMaker style layout but done with LaTeX.
Here’s a page full of figures, and the editors are keen the figures be on the same page as the reference in the text.
From 1992 to 1995, the members of the LaTeX3 project cleaned up LaTeX2.09 to create 2e, identified remaining problems in 2e, and published statements of their goals. The Open University has been working on their SGML/XML goals: Allowing document elements to have attributes, and solving real-world production problems. We plan to release it as free software.
At the 1997 TUG, Frank Mittelbach and Chris Rowley presented a paper on ‘The LaTeX3 Project’ that was later published in XML Cover Pages. They wanted a syntax to automatically convert popular SGML DTDs into LaTeX.
Attributes are meta data belonging to an element; e.g. a figure element has attributes like caption, src image file, copyright, location in page (margin, body), and raise/lower. E.g.,
\Figure{
\label{fig:top}
\caption{a spinning top}
\src{file.png}
\copyright{Ty Coon}
\raise{2pc}
}
Attributes are keys for building a dictionary, not control sequences to be executed; elements have a list of allowed keys, required keys, default values, they can’t be specified twice; the parser can normalise and validate their values, and if there is an error the \do@Figure is not called.
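The rules above (allowed keys, required keys, defaults, no duplicates, handler skipped on error) can be sketched outside TeX. The following Python is only an illustration of the idea, not the Open University's code: which keys are required and the default for `location` are my assumptions, and `process_figure` is a hypothetical stand-in for the call to \do@Figure.

```python
class AttributeSpecError(ValueError):
    """Raised when the attributes of an element break its specification."""


# Hypothetical specification for the figure element; the talk did not say
# which keys are required or what the defaults are.
FIGURE_SPEC = {
    "allowed": {"label", "caption", "src", "copyright", "location", "raise"},
    "required": {"caption", "src"},
    "defaults": {"location": "body"},
}


def build_attributes(pairs, spec):
    """Turn (key, value) pairs into a validated dictionary."""
    attrs = {}
    for key, value in pairs:
        if key not in spec["allowed"]:
            raise AttributeSpecError("unknown key: " + key)
        if key in attrs:
            raise AttributeSpecError("key given twice: " + key)
        attrs[key] = value.strip()  # a very small "normalise" step
    missing = spec["required"] - set(attrs)
    if missing:
        raise AttributeSpecError("missing keys: " + ", ".join(sorted(missing)))
    for key, value in spec["defaults"].items():
        attrs.setdefault(key, value)
    return attrs


def process_figure(pairs, handler):
    """Call handler with the attribute dictionary only if validation succeeds."""
    try:
        attrs = build_attributes(pairs, FIGURE_SPEC)
    except AttributeSpecError as err:
        return "error: %s" % err  # the handler is never called
    return handler(attrs)
```

The point of the design is that the element's code only ever sees a complete, checked dictionary, so it needs no error handling of its own.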
docx2tex: Word 2007 to TeX
14:15
Krisztian Pocza, Mihaly Biczo, Zoltan Porkolab
c:\bin\docx2tex.exe paper.docx paper.tex
This is our PhD work, in Hungary
Overview
This is a small tool that uses ECMA and ISO standard formats so we can trust them ;) and it’s free software.
Motivation
Word 2007 is not very good at typography. TeX is better. But Word is easier to use, WYSIWYG, and it supports collaboration and team work. So we wanted to connect them.
Features & benefits
Existing solutions: proprietary programs exist, mostly as Word plug-ins that cannot be tool chained, some only support RTF, and OpenOffice.org can read old Word formats and output some TeX but its format support in both directions has many shortcomings.
docx2tex is free software, it supports most parts of documents although not Word Equations or Drawings (yet). Normal text, text formatting, alignment, lists, figures, tables, listings, cross references, image conversion (via ImageMagick)
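To show why a standardised format makes this kind of converter feasible: a .docx file is a ZIP package whose main text lives in word/document.xml, with paragraphs as `w:p` elements and text runs as `w:t`. The sketch below is Python, not docx2tex's C# code, and it only extracts plain paragraph text; it does no TeX escaping and ignores formatting, lists, tables and images. The `make_sample_docx` helper builds a deliberately incomplete package just so the example runs without a real Word file.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML main namespace, in ElementTree's {uri}tag notation.
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_paragraphs(docx_bytes):
    """Return the plain text of each paragraph in a .docx package."""
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as package:
        root = ET.fromstring(package.read("word/document.xml"))
    paragraphs = []
    for p in root.iter(W + "p"):
        # A paragraph is split into runs; join the text of all of them.
        paragraphs.append("".join(t.text or "" for t in p.iter(W + "t")))
    return paragraphs


def paragraphs_to_tex(paragraphs):
    """Separate non-empty paragraphs with blank lines, as TeX expects."""
    return "\n\n".join(p for p in paragraphs if p)


def make_sample_docx():
    """Build a tiny demo package (not a complete, valid Word document)."""
    xml = ('<w:document xmlns:w="http://schemas.openxmlformats.org/'
           'wordprocessingml/2006/main"><w:body>'
           '<w:p><w:r><w:t>First paragraph.</w:t></w:r></w:p>'
           '<w:p><w:r><w:t>Second </w:t></w:r><w:r><w:t>paragraph.</w:t></w:r></w:p>'
           '</w:body></w:document>')
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as package:
        package.writestr("word/document.xml", xml)
    return buf.getvalue()


if __name__ == "__main__":
    print(paragraphs_to_tex(docx_paragraphs(make_sample_docx())))
```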
Applications & Use Cases
We use it for scientific publications … Our work flow might have several authors, and Word 2007 handles this well, tracking changes and merging forks, and so when we arrive at a final docx file, we convert it to TeX, polish that up, and submit it.
Technical Details
…
License and availability
The license is X11. GPL is cancer. http://codeplex.com/docx2tex/ Codeplex is a sourceforge from Microsoft. It’s written in C# with Microsoft Visual Studio 2008.
Demonstration
…
Questions
Q: use Mono to run it?
A: Depends on Miguel. We use .NET 2.0 and I don’t know how much Mono supports that. Maybe. We use the System.IO.Packaging namespace that is part of .NET 3.
Q: The OOXML specification is complex; what areas did you have trouble with?
A: …
Q: Roundtrip back to OOXML?
A: Interesting.
TeXWorks: Lowering the Barrier To Entry
Jonathan Kew
14:45
Been working in XeTeX for the last few years, but recently been doing something different. But there is a link: something that drove XeTeX was that it made things easy that used to be hard.
Using a new font in LaTeX used to be intimidating, and there was fontinst to help, but many users never got over that initial barrier that it’s hard and complex, and TeX meant Computer Modern, Palatino or Times.
A big reason for XeTeX adoption is using any font you want easily.
Another big issue is that TeX is known for being really good, especially at maths and science stuff, but the other side is that people typesetting things other than maths and science are frightened off by $ and so on.
So how can we make TeX more accessible to people who aren’t typesetting equations daily?
What does TeX look like when a newcomer arrives? They have to write a TeX document, and need a text editor to do that. Let’s look at some examples of TeX editors:
TeXniCenter. Lots and lots of buttons, cryptic abbreviations for processes you might run…
WinEdt. Very similar, lots of buttons, 1/4 of the screen is used for writing your document…. For someone used to double clicking an icon on the desktop and writing their document immediately, this is frightening.
Kile. The same again. Lots of Greek mathematical symbols…
LaTeXEditor. Same.
We know what they are like. There are other kinds, like Emacs, but I got out of the UNIX world and don’t use it any more.
And then there is Dick Koch’s TeXShop. TeX on Mac OS X has been very popular, largely because of TeXShop. This is not frightening. It is easy to start and to use initially.
There didn’t seem to be anything else like this.
Maybe there is a place for something with a similar interface that is cross platform?
Wikipedia says “The introduction of TeXShop caused a TeX-boom among Macintosh users” and this is true. It presents only the essentials, and it has a simplified workflow straight to PDF instead of DVI and so on, and it had a few really neat user interface features - the magnifying glass feature [from XDVI!] and the ability to click somewhere in the output and be taken to the area of source that created that.
TeXWorks is an effort to build a similar program, to give a similar experience, in a portable way. It builds on portable free software. I’ve been working on it in a spare time, hobbyist manner, using existing components and putting them together. I’m using Poppler which is the most popular free software PDF viewing library, and Qt, a popular free software GUI toolkit. Those are the key pieces.
What do we hope to do?
A simple text editor. Unicode support, using standard OpenType fonts, multi level undo/redo, search and replace with regex, usual stuff.
Tools to execute TeX to create PDF.
A preview window to view the output. This is unusual; a PDF viewer that is integrated; anti-aliased PDF, opens automatically when TeX finishes, auto-refreshes when you rerun TeX and stays at the same page/view, the magnifying glass feature, and Jérôme Laurens’s “SyncTeX” technology to jump around from source to output and back.
Power user features, but that must not complicate the UI for the newcomer. Code folding, interaction with external editors/viewers, and so on.
This presentation is being done with the TeXWorks viewer, but here’s a demo of the UI with sample2e.tex
If we command-click a selection in the source, it takes our viewer to that spot, and likewise the other way. This is SyncTeX at work.
Q: What about page numbers?
A: Right, there is no source to jump to. Jérôme’s presentation will show off what SyncTeX can do, it’s just integrated with TeXWorks.
Here’s the sample for the polyglossia package - a Unicode replacement for babel - and this shows the on-the-fly spellchecker doing red underlining of the source, and we’re set to English so almost everything is misspelled, but if we change the spelling language to German, the German paragraph checks out fine, same for Greek, or we can turn the integrated spellchecker off. It’s based on Hunspell and of course is free software.
Using Mac OS X here, I could use TeXShop, but here’s a GNU/Linux machine, and load it up.
Here’s the templates feature: It includes templates for common kinds of documents, and here is a new document based on that template, let’s pick a Beamer class, and run TeX on this. So if you install it, and have any standard TeX distribution like TeX Live, it will just run straight away.
And for completeness, let’s see it on Windows XP and on Vista. You can see it works precisely the same way.
An earlier presentation took a long time to start up because it used a lot of .Net libraries, but this starts very quickly even though it uses a lot of free software libraries.
So the main thing I want people to do is join in development!
I can’t do everything. There are a lot of ways I’d like to see the TeX community contribute. It’s C++, hosted at Google Code. It has command completion that needs work, specifically. The Qt Linguist tool allows localisation and translation.
You can also use it for real work and provide feedback.
There is currently zero documentation and tutorials. Pages suitable for integrated help would be great.
If you can package it for your OS, please do, I haven’t done any of that.
http://tug.org/texworks/ is the homepage
http://code.google.com/p/texworks/ is the development home (source code repo, downloads of binary packages, issue tracker, wiki for developer notes…)
Q: Other areas for development?
A: How to integrate images is tricky and scares new users. Old timers who have an investment in PostScript based work-flows aren’t well supported today, but they aren’t the users we’re aiming at, but I’d be happy to accept contributions that support this.
Q: My son is computer illiterate. We told him to throw out Word, took an hour to teach him TeX, and his teachers praise him for the high quality of his work, and he’ll never go back to Word. It only takes an hour.
A: It should!
Q: What about Lyx, ScientificWord? Aren’t they meant for novices?
A: They have a place. But they constrain the kinds of documents you can write. They work great for a narrowly defined domain, but they are not general purpose. This can do anything that can be done with TeX.
MPLib
16:15
Taco Hoekwater
John Hobby’s MetaPost. In Pascal, so unpleasant to work with. We thought it would be good to modernise the whole system, at least a little bit, and we got some funding to do this.
We wanted a reusable METAPOST component, and completely re-entrant, so several programs could use the library without duplication in system memory of the same program. Indirect I/O so you can use the library without touching the hard disk ever, and simplifying the subsystem for labels, and having totally dynamic memory allocation - the original MetaPost had static memory allocation so if you needed to do a big job, you needed to recompile it to be able to access more memory.
Problems we ran into?
Pascal WEB is very old fashioned; many global variables, static arrays, string pool, and complicated compilation…
CWEB is a single language to replace Pascal and C, and compilation only depends on ctangle; we have a single C library, with a “mpost” front-end …
Restructuring, we redid the instance structure, revisited the string pool, and isolated the PostScript back end - so the core creates a set of objects, and then those are converted to PostScript, and could be converted to something else. We have a C name-space (mp_…)
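The separation described here, where the core produces a set of objects and a back end turns them into output, can be sketched roughly as follows. This is Python for illustration only, not MPLib's C API: `PathObject` and both converter functions are hypothetical names, the paths are straight-line polygons rather than MetaPost's Bézier curves, and the SVG converter ignores the fact that SVG's y axis points down.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class PathObject:
    """A hypothetical drawing object as the core might hand it to a back end."""
    points: List[Tuple[float, float]]
    closed: bool = False


def to_postscript(obj):
    """Back end 1: emit PostScript path operators for the object."""
    (x0, y0), rest = obj.points[0], obj.points[1:]
    parts = ["newpath", "%g %g moveto" % (x0, y0)]
    parts += ["%g %g lineto" % (x, y) for x, y in rest]
    if obj.closed:
        parts.append("closepath")
    parts.append("stroke")
    return "\n".join(parts)


def to_svg_path(obj):
    """Back end 2: emit an SVG path element for the same object."""
    (x0, y0), rest = obj.points[0], obj.points[1:]
    d = "M %g %g" % (x0, y0) + "".join(" L %g %g" % (x, y) for x, y in rest)
    if obj.closed:
        d += " Z"
    return '<path d="%s" fill="none" stroke="black"/>' % d


if __name__ == "__main__":
    triangle = PathObject([(0, 0), (10, 0), (5, 8)], closed=True)
    print(to_postscript(triangle))
    print(to_svg_path(triangle))
```

Since both converters read the same object and neither changes it, adding another output format means adding another function, without touching the code that builds the objects.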
Usage: Set up MPLib options, create an MPLib instance, …
Example C program to take input at a command line and run it.
It has Lua bindings, so here’s a Lua example that will work as Lua code within LuaTeX. The core of the program is a string delimited by [[ and ]]
…
TODO: Dynamic memory allocation for everything, MegaPost (range and precision limits), configurable error strategies, and internationalisation of messages (a nice feature to have, via GNU gettext) and finally, expand the API.
There is no way to say, here’s an equation, what does it look like? And then better use of MPLib can be made in other programs.
http://www.tug.org/metapost/ is the homepage
MPLib has generated the graphics for these slides - the background bookshelf image is generated with random variables, if you look closely.
Q: Other output back-ends other than PostScript?
A: Well I wrote the PostScript back-end in 2 days, and have a private prototype using Lua to create PDF directly, and I think SVG would be very feasible.
[I wonder if it could output Spiro splines…]
MPLib
Hans Hagen
Was going to be a demo but now just a talk…
This follows up on what Taco has said about MPLib, about ConTeXt MkIV
This is a large rewrite of ConTeXt, as ConTeXt has become quite large, so v4 is a slimming down release.
We started using MetaPost over 10 years ago, when graphics were embedded as EPS. Sebastian Rahtz challenged me to write a MetaPost to PDF converter in TeX so we could use them directly. This meant you could use TeX fonts inside MetaPost. We added some extensions (shading, transparencies, etc). Embedded text was taken care of very efficiently, avoiding reruns totally in the end. We managed to include MetaPost source in a document source, and this allows reusing graphics but with awareness of the state of the document (dimensions, colors, etc) and such graphics can be really cool - well integrated with background mechanisms and adapted to situations… layout, font and other contextual variables are available to graphics. These features are stable and frequently used by users, with no real in-depth knowledge of how it all works required.
…
The MetaPost run time is now almost zero overhead.

The Notes from TUG2008 in Cork: Day 1 by David Crossland, except the quotations and unless otherwise expressly stated, is licensed under a Creative Commons Attribution-Share Alike 3.0 Unported License.
