GSoC Idea: Parse TrueType/OpenType font data

Parse sfnt-housed font data

sfnt-housed is a term summarising several related font formats, including TrueType (.ttf, .ttc), OpenType (.otf), and OFF (Open Font Format); the name refers to the general container format all of these rely on. While this container format has mostly replaced the plethora of platform-specific font formats that were previously common, it is also very complex, and few programs make use of its full capabilities. This project aims to give Tcl scripts access to the wealth of information present in these fonts.

Areas: Fonts
Good (though not essential) if student knows: binary, PostScript, a non-LGC language, XML, LaTeX
Priority: Low
Difficulty: Easy to Medium
Benefits to the student: Learn about font technologies. Experience of working with binary data formats.
Benefits to Tcl: Access to fonts and font metrics, e.g. for generating PS/PDF output, without having to have a GUI running. Access to more advanced typographic information.
Mentor: Lars H

Project Description

A starting point for this project could be the code existing in the sfntutil program available here:

The primary function present there is to dump data from sfnt-housed font files into text files that are both human- and machine-readable. For example, it can produce the following output (see TDL for an explanation of the basic syntax; XML output is also available):

 % sfntutil.tcl dump LinLibertineFont/Biolinum_Re-0.4.1.otf -only=head,OS/2
 sfnt-font tag OTTO {
    sfnt-table tag head start 236 length 54 {
       FontRevision hex 00006666 bcd 0.6666 num 0.4000 shortshort 0.26214
       /flag {Has strong right-to-left}
       /flag {Force ppem to integer}
       /flag {Left sidebearing point at x=0}
       /flag {Baseline at y=0}
       /setint designunits 1000
       /when created 1237593688
       /when modified 1237593688
       /FontBBox -1082.0 -247.0 6171.0 896.0
       /dontsetint lowestRecPPEM 8
       /dontsetint fontDirectionHint 0
    }
    sfnt-table tag OS/2 start 336 length 96 {
       /setint averagewidth 560.0
       /setint ascender 894.0
       /setint descender_neg -246.0
       /setint linegap 0.0
       /setint maxheight 894.0
       /setint maxdepth 246.0
       /setint xheight 432.0
       /setint capheight 648.0
       /setint sub1 140.0
       /setint sup2 479.0
       /scriptsizepos sub 650.0 699.0 0.0 -140.0
       /scriptsizepos super 650.0 699.0 0.0 479.0
       /Panose 2 0 5 3 0 0 0 0 0 0
    }
    /datum funit 1.0
    /datum indexToLocFormat 0
    sfnt-table tag {CFF } start 4580 length 265758
    sfnt-table tag FFTM start 290480 length 28
    sfnt-table tag GDEF start 270340 length 1032
    sfnt-table tag GPOS start 276300 length 14178
    sfnt-table tag GSUB start 271372 length 4928
    sfnt-table tag cmap start 2508 length 2038
    sfnt-table tag hhea start 292 length 36
    sfnt-table tag hmtx start 290508 length 9664
    sfnt-table tag kern start 300172 length 760110
    sfnt-table tag maxp start 328 length 6
    sfnt-table tag name start 432 length 2073
    sfnt-table tag post start 4548 length 32
 }
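
The tag/start/length triples in the dump above come from the table directory at the start of every sfnt file. As a rough illustration (in Python, for brevity; this is not how sfntutil itself is written, and the function names are hypothetical), the directory can be read as follows. The demo data is synthetic: it contains only a header and two directory records with placeholder checksums, using the head and OS/2 offsets from the dump. The second helper shows how a 16.16 fixed-point field such as FontRevision (hex 00006666) turns into the number 0.4000.

```python
import struct

def read_table_directory(data):
    """Parse the sfnt header and table directory.

    Returns (version_tag, {table_tag: (offset, length)}).
    """
    if len(data) < 12:
        raise ValueError("too short for an sfnt header")
    # Header: 4-byte version tag, uint16 numTables, then three uint16
    # binary-search helper fields that are ignored here. All big-endian.
    version, num_tables = struct.unpack_from(">4sH", data, 0)
    if len(data) < 12 + 16 * num_tables:
        raise ValueError("truncated table directory")
    tables = {}
    for i in range(num_tables):
        # Each record: 4-byte tag, uint32 checksum, uint32 offset, uint32 length.
        tag, _checksum, offset, length = struct.unpack_from(
            ">4sIII", data, 12 + 16 * i)
        tables[tag.decode("latin-1")] = (offset, length)
    return version, tables

def fixed_to_float(raw):
    """Interpret a uint32 as a signed 16.16 fixed-point number."""
    if raw >= 0x80000000:
        raw -= 1 << 32
    return raw / 65536.0

# Synthetic demo: header plus two directory records, no table data.
# The checksums (0) are placeholders, not real values.
demo = struct.pack(">4sHHHH", b"OTTO", 2, 32, 1, 0)
demo += struct.pack(">4sIII", b"OS/2", 0, 336, 96)
demo += struct.pack(">4sIII", b"head", 0, 236, 54)

version, tables = read_table_directory(demo)
print(version, tables)
print(round(fixed_to_float(0x00006666), 4))
```

A real parser would also verify checksums and check that each offset and length stays within the file.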

Another starting point (as YS points out) would be the existing TTF support in pdf4tcl. Features that it does not currently support are:

  • Type 2 outlines (.otf fonts)
  • Kerning (data is found in kern table or GPOS table)
  • Ligatures and other glyph substitutions (data is found in GSUB table or mort table)
  • cmap subtable formats other than 4.
  • Container formats other than .ttf and .ttc.
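
To give a feel for what supporting another cmap subtable format involves, here is a minimal sketch of a lookup in a format 12 subtable (segmented coverage, used for characters beyond the BMP). It is written in Python for illustration only and is not taken from pdf4tcl or sfntutil; the function names and the glyph numbers in the demo are made up. The layout assumed is the one in the OpenType specification: a 16-byte header followed by groups of three uint32 values (startCharCode, endCharCode, startGlyphID), sorted by start code.

```python
import struct
from bisect import bisect_right

def parse_cmap_format12(subtable):
    """Return the list of (start_code, end_code, start_glyph) groups."""
    fmt, _reserved, _length, _language, num_groups = struct.unpack_from(
        ">HHIII", subtable, 0)
    if fmt != 12:
        raise ValueError("not a format 12 cmap subtable")
    return [struct.unpack_from(">III", subtable, 16 + 12 * i)
            for i in range(num_groups)]

def glyph_for(groups, codepoint):
    """Map a code point to a glyph index; 0 means 'missing glyph'."""
    starts = [g[0] for g in groups]
    # Find the last group whose start code is <= codepoint.
    i = bisect_right(starts, codepoint) - 1
    if i >= 0:
        start, end, start_glyph = groups[i]
        if codepoint <= end:
            return start_glyph + (codepoint - start)
    return 0

# Synthetic subtable with two groups; the glyph numbers are invented.
demo_groups = [(0x41, 0x5A, 36), (0x1F600, 0x1F64F, 900)]
subtable = struct.pack(">HHIII", 12, 0, 16 + 12 * len(demo_groups), 0,
                       len(demo_groups))
for group in demo_groups:
    subtable += struct.pack(">III", *group)

groups = parse_cmap_format12(subtable)
print(glyph_for(groups, ord("C")), glyph_for(groups, 0x1F601),
      glyph_for(groups, ord(" ")))
```

Most of the other subtable formats follow the same pattern of a small header plus an array to search, so the bulk of the work lies in reading the specification carefully.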

Other directions for further work are:

  1. Extend sfntutil with parsers for the tables and subtable formats that are currently unsupported. (There are enough of those that one can easily spend a whole summer on it.) This is the most straightforward direction to take the project, and thus also the easiest (mostly being an exercise in turning information from a specification into working code).
  2. Design a package that lets Tcl scripts access sfnt-housed data; a goal might be to have a command that, from a string and a font, computes the corresponding sequence of glyphs and any position adjustments that should be applied to these. This is more difficult, as the student would have to design a useful API for accessing the data, but perhaps also the most useful for the community in general. It would probably not be practical to build this on top of the sfntutil codebase, but the codebase could still be harvested for ideas and solutions.
  3. Turn the conversion around, giving sfntutil the ability to "assemble" fonts in addition to the current capability of "disassembling" them. Difficulty-wise, this is somewhere between the previous two.
  4. Develop code to do font subsetting (generate a partial copy of the font, with just the glyphs needed for a specific text; licences often require that this is done if a font is embedded in a document). This would be helpful for things like the postscript method of the Tk canvas, as the code it generates assumes someone else will make the necessary fonts available. pdf4tcl can do this for TrueType, but not for OpenType.
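
The command described in direction 2 might, in its simplest form, behave like the sketch below. This is only an illustration of the idea (in Python, with a hypothetical function name and invented glyph numbers and kerning values); it assumes the character map and pair kerning data have already been extracted into dictionaries, and it ignores everything a real implementation would need from GSUB and GPOS, such as ligatures and contextual positioning.

```python
def layout(text, cmap, kern_pairs):
    """Return a list of (glyph, adjustment) pairs for the given text.

    cmap maps code points to glyph indices (0 = missing glyph).
    kern_pairs maps (left_glyph, right_glyph) to an adjustment, in font
    units, applied to the advance of the left glyph.
    """
    glyphs = [cmap.get(ord(ch), 0) for ch in text]
    result = []
    for i, glyph in enumerate(glyphs):
        adjustment = 0
        if i + 1 < len(glyphs):
            adjustment = kern_pairs.get((glyph, glyphs[i + 1]), 0)
        result.append((glyph, adjustment))
    return result

# Invented data: glyph 1 is "A", glyph 2 is "V".
demo_cmap = {ord("A"): 1, ord("V"): 2}
demo_kern = {(1, 2): -80, (2, 1): -75}

print(layout("AVA", demo_cmap, demo_kern))
```

The design question for the student is what the Tcl-level equivalent of the two dictionaries and the result list should look like, and how much of the font should be parsed lazily.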

References

TrueType/OpenType is a format with a complicated history (involving at least three major software vendors), so there is no source which tells the whole story. Important sites are:

Comments & Discussion

Some comments here, and discussion of the idea

abu 13 march 2011

See [L2 ] for some astonishing results. It's a library written in Processing for parsing and rendering TrueType fonts in an amazing way. http://www.caligraft.com/works/web-dibuixant/gallery/dibuixant-01.jpg

Lars H, 17 march 2011: And that is relevant how, precisely?

abu 18 march 2011 - This is very close to what you listed in point 2 of the "Other directions for further work" section.

The above picture has been generated with just two "parameters": a string ("Caligraft") and a font file (TimesRoman.ttf, or maybe another font file). The hard work was to extract from the font file the geometry of the letters "C", "a", "l", ... and then draw the curves in a new "artistically corrected" way. I think that at least the first part of this work (extracting the letters' geometry) should be one of the goals of this project.

YS 2011-03-14 pdf4tcl 0.7 also includes a parser for .ttf and PostScript fonts. It also does font subsetting for them.