Updated 2017-02-26 20:33:47 by ABU

ABU 26-feb-2017

tclMuPdf is a port of the MuPDF framework (see mupdf.com) to Tcl, for fast and high-quality rendering of PDF pages.

History

  • 13-dec-2016 - Version 1.0b1 (beta) released. No support for MacOS
  • 15-dec-2016 - Version 1.0 - Support for MacOS. API unchanged, but big internal optimization for reusing opened pages (read
  • 28-jan-2017 - Version 1.1 (Withdrawn) - new commands: fields, field, anchor, mupdf::libinfo. Added package mupdf-notk for Tcl-only usage (read the documentation). Aligned with core library MuPDF v.1.10a
  • 23-feb-2017 - Version 1.2 (Withdrawn) with a lot of new features:
    • commands for extracting and searching text
    • commands for extracting images from PDF pages (experimental)
    • first steps towards PDF manipulations: you can add new signature fields, then save your changes.
    • many minor auxiliary commands
  • 26-feb-2017 - Version 1.2.1 - BugFixing - replaces Version 1.1 and 1.2
    • a bug related to the saveImage command, introduced in 1.1 was fixed here.
    • Versions 1.1 and 1.2 were withdrawn

Download

Version 1.2.1 is still distributed in a pre-built package with multi-platform support, but if you only need support for a single platform, you can download a lighter package.

Note that the platform of a package (e.g. "Linux 32") does not refer to the architecture of the host O.S., but to the architecture of the Tcl/Tk interpreter. E.g. if you have a 32-bit Tcl/Tk interpreter running on a 64-bit Linux, you need the tclMuPdf package for linux-x32.
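If you are unsure which interpreter you are running, a minimal sketch like the following can report it. The proc name interpreterBits is a hypothetical helper introduced here; tcl_platform(pointerSize) is a standard Tcl variable available from Tcl 8.5.

```tcl
# Report the word size of the running Tcl interpreter, which is what
# determines the tclMuPdf package to download (not the host O.S.).
proc interpreterBits {} {
    global tcl_platform
    # pointerSize is the size of a native pointer in bytes (4 or 8)
    return [expr {$tcl_platform(pointerSize) * 8}]
}

puts "OS: $tcl_platform(os), interpreter: [interpreterBits]-bit"
```

Run it with the same tclsh or wish that will load the package.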

  • [1] FULL (Win 32/64, Linux 32/64, MacOS) (Warning: 28 MB)
  • [2] (Win 32/64) (11 MB)
  • [3] (Win 32 only) (6 MB)
  • [4] (Win 64 only) (6 MB)
  • [5] (Linux 32/64) (11 MB)
  • [6] (Linux 32 only) (6 MB)
  • [7] (Linux 64 only) (6 MB)
  • [8] (MacOS 64 only) (6 MB)
  • ---
  • [9] tclMuPdf Development-Kit. For developers/maintainers.

Examples

$page0 savePNG img0.png -zoom 0.5
$page2 savePNG x2.png -zoom 2.75 -from 50 450 200 700

  Reference manual


tclMuPDF 1.2 Tcl meets MuPDF

SYNOPSIS

package require Tcl 8.5

package require mupdf ?1.2?

  • mupdf::open filename
  • pdfHandle fullname
  • pdfHandle version
  • pdfHandle npages
  • pdfHandle getpage n
  • pdfHandle anchor anchorName
  • pdfHandle openedpages
  • pdfHandle closeallpages
  • pdfHandle fields
  • pdfHandle field fieldname
  • pdfHandle signatures
  • pdfHandle haschanges
  • pdfHandle export filename
  • pdfHandle search needle ?-startpage pagenum? ?-max hits?
  • pdfHandle search..more ?-max hits?
  • pageHandle size
  • pageHandle docref
  • pageHandle pagenumber
  • pageHandle savePNG filename ?-zoom zoom? ?-from x0 y0 x1 y1?
  • pageHandle saveImage image ?-zoom zoom? ?-from x0 y0 x1 y1? ?-to x0 y0?
  • pageHandle addsigfield fieldname x0 y0 x1 y1
  • pageHandle search needle ?-fromtop true/false? ?-max hits?
  • mupdf::close handle
  • mupdf::quit pdfHandle
  • mupdf::isobject handle
  • mupdf::type handle
  • mupdf::documents
  • mupdf::documentnames
  • mupdf::isopen filename
  • mupdf::libinfo
  • pageHandle experimental.images list ?-id imageID?
  • pageHandle experimental.images extract ?-id imageID? ?-dir pathname? ?-transparency 0xRRGGBBAA?

DESCRIPTION

Package mupdf integrates the MuPDF framework in Tcl. The focus of MuPDF is on speed, small code size, and high-quality anti-aliased rendering. The main goal of this integration is to generate images of PDF pages, either as .png files or directly into a Tk photo image. Thanks to its speed, mupdf can be used for building interactive PDF viewers with high-quality, real-time zooming. mupdf is a binary package, distributed in a multi-platform bundle, i.e. it can be used on

  • Windows 32/64 bit
  • Linux 32/64 bit
  • MacOS 64 bit

Just an example to get the flavor of how to use mupdf:
    # open a file and save 1st page as a .png file

    package require mupdf
    set pdf [mupdf::open /mydir/sample.pdf]
    set page [$pdf getpage 0]   ;# 0 is the 1st page
    $page savePNG /mydir/page0.png
    mupdf::close $pdf
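The same pattern extends to a whole document. The following is a minimal sketch, assuming the mupdf (or mupdf-notk) package is already loaded and that pdf is a handle returned by mupdf::open; the proc name renderAllPages and the pageNNNN.png naming scheme are hypothetical choices made for this example.

```tcl
# Render every page of an opened PDF to <outDir>/pageNNNN.png and
# return the list of files written.
proc renderAllPages {pdf outDir {zoom 1.0}} {
    set written {}
    set n [$pdf npages]
    for {set i 0} {$i < $n} {incr i} {
        set page [$pdf getpage $i]          ;# pages are numbered from 0
        set png [file join $outDir [format "page%04d.png" $i]]
        $page savePNG $png -zoom $zoom
        mupdf::close $page                  ;# release the page once rendered
        lappend written $png
    }
    return $written
}
```

Closing each page as soon as it has been saved keeps the number of opened pages small on large documents; the caller is still responsible for closing the pdfHandle.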

MuPDF with and without Tk

Starting from version 1.1, you can also run mupdf from a tclsh interpreter, without loading Tk. The following command

    package require mupdf-notk

can be used in a tclsh interpreter to load the package without requiring Tk support. You will still be able to save images as PNG files, but of course the subcommands related to Tk (e.g. saveImage) won't be available. The command

    package require mupdf

loads the full package (and requires Tk).
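A script meant to run under both tclsh and wish could choose between the two package names at startup. This is only a sketch: the proc name mupdfPackageName is hypothetical, and it relies on the standard behaviour of [package provide Tk], which returns an empty string when Tk has not been loaded.

```tcl
# Pick the package flavour that matches the running interpreter:
# "mupdf" when Tk is already loaded, "mupdf-notk" otherwise.
proc mupdfPackageName {} {
    if {[package provide Tk] ne ""} {
        return mupdf
    }
    return mupdf-notk
}

# Typical use (needs tclMuPdf installed on the auto_path):
#   package require [mupdfPackageName]
```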

mupdf Commands

mupdf supports the following commands:
mupdf::open filename
This is the main command: it opens the PDF-file filename and returns a pdfHandle to be used in subsequent operations.
pdfHandle fullname
return the fully normalized pathname of the pdf-file.
pdfHandle version
return the document's internal PDF-version.
pdfHandle npages
return the number of pages.
pdfHandle getpage n
return a pageHandle to be used in subsequent operations. Note that the first page is page 0. If the requested page is already open, getpage reuses the handle of the opened page.
pdfHandle anchor anchorName
return the location of anchorName as a list of 3 numbers:

  • a page number ( -1 if anchorName is not found )
  • x displacement
  • y displacement

x and y are hints for displaying the page: they represent the displacement of the top-left corner of the page relative to the top-left corner of the window.
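As an illustration of handling the three-element result, here is a minimal sketch, assuming the package is loaded and pdf is a valid pdfHandle. The proc name resolveAnchor and its return convention (an empty list for a missing anchor) are hypothetical.

```tcl
# Resolve a named anchor to a page handle plus the two display hints.
# Returns an empty list when the anchor is not found.
proc resolveAnchor {pdf anchorName} {
    lassign [$pdf anchor $anchorName] pageNum dx dy
    if {$pageNum < 0} {
        return {}            ;# -1 means "anchor not found"
    }
    return [list [$pdf getpage $pageNum] $dx $dy]
}
```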
pdfHandle openedpages
return a list of all pageHandles currently opened related to pdfHandle
pdfHandle closeallpages
close all currently opened pages related to pdfHandle
pdfHandle fields
return a list of field-records. Each field-record is a list of three elements:

  • the field-name
  • the field-type (pushbutton, radiobutton, checkbox, text, combobox, listbox, signature or unknown)
  • the field-value

Note that for a signature field, if a signature is present, the returned field-value is simply the fixed string <<signature>>. Warning: field-names with accented characters, or in general with non-ASCII characters, may require special care when used. See the section Notes about field-names with accented letters for details.
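Since each record is a plain three-element list, filtering by type is straightforward. The sketch below assumes a valid pdfHandle; the proc name fieldsOfType is hypothetical.

```tcl
# Collect the fields of a given type as a flat name/value list.
proc fieldsOfType {pdf type} {
    set result {}
    foreach record [$pdf fields] {
        lassign $record name ftype value
        if {$ftype eq $type} {
            lappend result $name $value
        }
    }
    return $result
}
```

For example, [fieldsOfType $pdf text] would return the names and values of all text fields, in the order reported by the fields subcommand.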
pdfHandle field fieldname
return the field's value, or raise an error if fieldname is not a valid field. Warning: field-names with accented characters, or in general with non-ASCII characters, may require special care when used. See the section Notes about field-names with accented letters for details.
pdfHandle signatures
return a list of signature field-records. Each field-record is a list of two elements:

  • the field-name
  • the field-value

Empty signature fields (blank signatures) have a field-value equal to "" (empty string). Currently, filled signature fields are simply denoted with the fixed string <<signature>>.
pdfHandle haschanges
return 1 if pdfHandle has been changed, otherwise 0.
pdfHandle export filename
save the current document and its changes to a different file named filename. Note that the target filename should be different from the original pdf-file related to pdfHandle and, in general, different from the name of any pdf-file currently opened in this process. When a pdfHandle is closed (see mupdf::close), all the changes are saved to its original pdf-file (see also mupdf::quit for closing without saving changes).
pdfHandle search needle ?-startpage pagenum? ?-max hits?
search the string needle starting from page-number pagenum (default is page 0) and return up to hits results (default is 10). The result of the search subcommand is a list of page-positions. Each page-position is a list of two elements:

  • the page-number
  • a list with the 4 coords of the box enclosing the searched needle

Next results can be retrieved with the search..more subcommand.
pdfHandle search..more ?-max hits?
return a list of the next hits elements matching the last given needle. The result of search..more subcommand is a list of page-positions similar to the list returned by search.
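The two subcommands can be combined in a loop to collect every occurrence. This is a sketch under one explicit assumption that the reference does not state: that search..more returns an empty list once no further results exist. The proc name searchAll is hypothetical.

```tcl
# Collect all page-positions of needle, fetching them in batches.
proc searchAll {pdf needle {batch 10}} {
    set all {}
    set hits [$pdf search $needle -startpage 0 -max $batch]
    while {[llength $hits] > 0} {
        lappend all {*}$hits
        # ASSUMPTION: an empty list signals that the results are exhausted
        set hits [$pdf search..more -max $batch]
    }
    return $all
}
```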
pageHandle size
return the physical size of the page as a list of two decimal numbers. Note that page size is expressed in points, i.e. 1/72 inch.
pageHandle docref
return a reference to the related pdf-document as a pdfHandle
pageHandle pagenumber
return the page number of pageHandle
pageHandle savePNG filename ?-zoom zoom? ?-from x0 y0 x1 y1?
render the page in a .png file named filename. With the default -zoom factor of 1.0, a page whose size is W x H points is rendered as a raster image of W x H pixels. If -zoom is specified, the resulting image size is scaled by a factor of zoom. By default the whole page is rendered; the -from option allows you to render only a given rectangular area of the page. x0 y0 are the coords of the top-left corner and x1 y1 are the coords of the bottom-right corner. These coords must be expressed in terms of the physical size of the page, i.e. in points. Note that if these coords lie outside of the page, only the intersection of this area with the page area is rendered.
    ...
    set page [$pdf getpage 0]   ;# 0 is the 1st page
    lassign [$page size] dx dy
    # save just the upper half of the page
    $page savePNG /mydir/page0.png -zoom 2.25 -from 0 0 $dx [expr {$dy/2}]
    mupdf::close $pdf
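Because zoom 1.0 maps one point (1/72 inch) to one pixel, a zoom factor can be derived from a target resolution or a target pixel width. A minimal sketch, with the hypothetical helper names zoomForDpi and zoomForWidth:

```tcl
# At -zoom 1.0 one point maps to one pixel, i.e. 72 pixels per inch.
proc zoomForDpi {dpi} {
    return [expr {$dpi / 72.0}]
}

# Zoom factor that makes the rendered page pixelWidth pixels wide.
proc zoomForWidth {page pixelWidth} {
    lassign [$page size] w h      ;# page size in points
    return [expr {double($pixelWidth) / $w}]
}
```

For instance, [zoomForDpi 144] gives 2.0, and the result can be passed directly as the -zoom value of savePNG or saveImage.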
pageHandle saveImage image ?-zoom zoom? ?-from x0 y0 x1 y1? ?-to x0 y0?
render the page in an existing Tk photo image. The width and/or height of image are unchanged if the user has set an explicit image width or height on it (with the -width and/or -height configuration options, respectively). For the -zoom and -from options, the same rules as for savePNG apply. Option -to allows you to place the resulting raster image at the x0 y0 coords of the destination image; the default is -to 0.0 0.0. NOTE: this command is not available with the package mupdf-notk.
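A minimal sketch of rendering into a fresh photo image follows. It assumes Tk and the full mupdf package are loaded, and that a photo created without explicit -width/-height grows to fit the rendered page, as the description above suggests. The proc name renderToPhoto is hypothetical.

```tcl
# Create a new photo image and render the page into it.
proc renderToPhoto {page {zoom 1.0}} {
    set img [image create photo]       ;# no explicit size: assumed to auto-size
    $page saveImage $img -zoom $zoom
    return $img
}

# Typical use in wish:
#   label .view -image [renderToPhoto $page 1.5]
#   pack .view
```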
pageHandle addsigfield fieldname x0 y0 x1 y1
add a blank signature field in a rectangular box at coords x0 y0 x1 y1. fieldname must be unique among the existing field names.
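Putting addsigfield, haschanges and export together, a sketch could look like this. The proc name addSignatureBox and its arguments are hypothetical, and the box is assumed to be given as {x0 y0 x1 y1} in page points (the reference does not state the unit).

```tcl
# Add a blank signature field on a page and export the result to a new file.
# Returns the signature records of the changed document.
proc addSignatureBox {pdf pageNum fieldName box exportPath} {
    set page [$pdf getpage $pageNum]
    $page addsigfield $fieldName {*}$box   ;# box = {x0 y0 x1 y1}, assumed in points
    if {[$pdf haschanges]} {
        $pdf export $exportPath            ;# must differ from the original file
    }
    return [$pdf signatures]
}
```

After exporting, mupdf::quit closes the document without touching the original file, whereas mupdf::close would also write the changes back to it.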
pageHandle search needle ?-fromtop true/false? ?-max hits?
search the string needle in the current page and return up to hits positions (default is 10). By default the search starts from the top of the page. To find the next hits, use the option -fromtop false. The result of the search subcommand is a list of positions. Each position is a list with the 4 coords of the box enclosing the searched needle.
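A hit box can be fed back to savePNG to produce a magnified snippet around the match. This sketch assumes each position is ordered x0 y0 x1 y1 and expressed in page points, i.e. the same convention that -from expects; the reference does not confirm this. The proc name saveFirstHit is hypothetical.

```tcl
# Save a zoomed image of the first occurrence of needle on the page.
# Returns 1 if a hit was found and saved, 0 otherwise.
proc saveFirstHit {page needle filename {zoom 3.0}} {
    set hits [$page search $needle -max 1]
    if {[llength $hits] == 0} {
        return 0
    }
    # ASSUMPTION: the box is {x0 y0 x1 y1} in points, as required by -from
    $page savePNG $filename -zoom $zoom -from {*}[lindex $hits 0]
    return 1
}
```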
mupdf::close handle
if handle refers to a pageHandle, close the page. If handle refers to a pdfHandle, close the pdf (updating its original pdf-file) and all its opened pages.
mupdf::quit pdfHandle
close the pdfHandle without saving the changes.
mupdf::isobject handle
return 1 if handle is a valid reference to a pdf or a page.
mupdf::type handle
return document or page if handle is a valid reference, else raise an error.
mupdf::documents
return a list of the currently opened pdfHandles
mupdf::documentnames
return a list of pdf-filenames currently opened (fully normalized filenames).
mupdf::isopen filename
check if filename is among the currently opened pdf-files.
mupdf::libinfo
return specific attributes of the underlying MuPDF library as a list of keywords and their values. The provided keywords are:
version
The version of the underlying MuPDF library
..more to come ..
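Since the result is a flat list of keywords and values, it can be read with the dict command. A minimal sketch, assuming the package is loaded; the proc name mupdfCoreVersion is hypothetical.

```tcl
# Return the version of the underlying MuPDF library.
proc mupdfCoreVersion {} {
    # libinfo returns a keyword/value list, so it can be read as a dict
    return [dict get [mupdf::libinfo] version]
}
```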

Experimental features

mupdf 1.2 introduces some new experimental features for extracting images from PDF pages. The following commands are unsupported and will probably change in the next release. The code has been tested with many PDFs containing different kinds of images. Although many severe bugs have been fixed during development, running these commands may cause an application crash when dealing with unexpected image formats. Your feedback is welcome for fixing bugs and for suggesting a more complete API.
pageHandle experimental.images list ?-id imageID?
return a list of all the images contained in the page referred to by pageHandle. The result of the experimental.images list subcommand is a list of image-records. Each image-record is a list of four elements:

  • image-ID (unique for each page)
  • image's width (in pixels)
  • image's height (in pixels)
  • mask-flag : 1 means that the image has a pixel-mask (i.e. some transparent pixels)

If option -id is present, the resulting list is limited to the image-record for imageID.
pageHandle experimental.images extract ?-id imageID? ?-dir pathname? ?-transparency 0xRRGGBBAA?
if option -id is specified, extract and save the image referred to by imageID (see experimental.images list). If option -id is missing, all the images contained in the page are extracted and saved. If option -dir is not specified, images are saved in the current directory. Images are saved with a fixed naming convention: PPPP.iNN.ext (e.g. 0003.i00.jpg ) where PPPP is the page number, iNN is the imageID (unique within a page), and the extension is ".jpg" or ".png". Currently option -transparency has no effect. extract returns a list of extracted-records. Each extracted-record is a list of two elements:

  • image-ID (unique for each page)
  • saved filename (empty string if the image was skipped (unknown format...))
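The extracted-records can be filtered to keep only the images that were actually written. A minimal sketch, assuming a valid pageHandle and keeping in mind that these commands are experimental; the proc name extractPageImages is hypothetical.

```tcl
# Extract all images of a page into dir and return the saved filenames,
# leaving out the images that were skipped.
proc extractPageImages {page dir} {
    set saved {}
    foreach record [$page experimental.images extract -dir $dir] {
        lassign $record imageId filename
        if {$filename ne ""} {        ;# empty name = image was skipped
            lappend saved $filename
        }
    }
    return $saved
}
```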

Notes about field-names with accented letters

Field-names returned by [pdfHandle fields] are not standard (Unicode) strings; they are binary strings. For 'good plain' field names like "City", there is no difference: the returned binary string and the literal (Unicode) string "City" are byte-by-byte identical. Now let's consider a field-name like "Città" (the Italian term for "City"): the returned field name, even if it is displayed as "Città", is not comparable with the literal Unicode string "Città", because the former byte-array is made of 5 bytes (plus a '\0' string-terminator), whereas the Unicode string is made of 6 bytes (plus a '\0' string-terminator). This may produce strange results when comparing these values and, in particular, it may cause the field subcommand to produce unexpected results. Let's try with this interactive example:
 ...
 set fields [$pdf fields]
  # let's suppose the 0-th returned record is about the field "Città" ....
 
  # try to get the field value:
 set value [$pdf field Città] ;#  -->  error !!!

  # Workaround: let's take back the fieldname from the $fields list ... 
 set fieldname [lindex $fields 0 0] ;# field name is the 0-th elem of the 0-th record
 puts "$fieldname"  ;# -->  Città
 set value [$pdf field $fieldname]  ;# -->  ...  ok
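The workaround can be wrapped in a helper that locates the record by a pattern and then passes the name back exactly as it was returned. This is only a sketch: the proc name fieldValueByPattern is hypothetical, and it assumes that the ASCII part of a binary field name can still be matched with string match, which the notes above only state for fully plain names.

```tcl
# Return the value of the first field whose name matches a glob pattern.
# The name handed to "field" is the one returned by "fields", untouched.
proc fieldValueByPattern {pdf pattern} {
    foreach record [$pdf fields] {
        set name [lindex $record 0]
        if {[string match $pattern $name]} {
            return [$pdf field $name]
        }
    }
    error "no field matches \"$pattern\""
}
```

With the example above, [fieldValueByPattern $pdf "Citt*"] avoids typing the accented letter in the script.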

KEYWORDS

pdf, photo

CATEGORY

pdf parsing and rendering