ePubs – under the hood

I’ve been making ePubs for several years, and I have taught courses on ePubs at GraphExpo and at Cal Poly, where I am employed as a professor. For the first few years I was very grumpy about ePubs, because InDesign did not create them cleanly. There were sometimes serious flaws as a result of exporting a document to ePub format from InDesign. Adobe has improved that a lot (there is still a little bit of work to do!) and now I find that InDesign-generated ePubs are nearly correct, and that makes the job easier.

Though advertised as being able to create an ePub (ready to publish), InDesign doesn’t get everything right, so it’s very important to check all ePubs for compliance with the ePub standard. This is most easily done in a free, open-source program called Sigil (available online).

When you open an ePub file in Sigil, the contents of that file are exposed, and fully editable. In one backwater menu of Sigil is a green check mark for verifying your ePub against the international ePub standard. The most common error that I find is that the date in the ePub is not formatted correctly. InDesign doesn’t format the information correctly from its IPTC data, so it usually needs to be repaired. If you enter a date as Month/Day/Year, it’s not compliant. ePub wants the date to be Year-Month-Day (with hyphens), for example 2013-03-07. It’s easy to fix in the content file in Sigil.
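If you have many dates to repair, the conversion is easy to script. This is only a small sketch in Python; it assumes the date was typed as Month/Day/Year with a four-digit year, and the function name `to_epub_date` is my own invention, not something that comes from Sigil or InDesign.

```python
from datetime import datetime


def to_epub_date(text):
    """Convert a Month/Day/Year string to the Year-Month-Day form ePub wants."""
    # Assumption: the input looks like 3/7/2013 (month first, four-digit year).
    parsed = datetime.strptime(text.strip(), "%m/%d/%Y")
    return parsed.strftime("%Y-%m-%d")


print(to_epub_date("3/7/2013"))  # prints 2013-03-07
```

A date that does not match the assumed pattern raises a `ValueError`, which is probably what you want: better to be stopped than to write a wrong date into the book.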

ePubs are very similar to web sites. They contain a master folder (Site), an images folder for any images that are embedded in the book, and a Text folder for all of the contributing text (this is slightly different from a web site). They also contain a Fonts folder that holds any fonts used in the ePub, and unlike a web site, they contain a Contents file and a TOC file for the table of contents.

ePubs are specially compressed ZIP files. They can be unzipped to view the contents, and the contents can be edited by hand in a program like TextWrangler. Sigil makes it much easier, because it presents the components of the ePub package as editable files. If you unzip an ePub file successfully (and it is easy to do so unsuccessfully), you can work on the component files, but you must be meticulous when zipping the files back to ePub format. It’s safer to use Sigil, and let that program put them back together when you are finished.
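One reason zipping by hand goes wrong is that the ePub container format expects the mimetype file to be the first entry in the archive, and to be stored without compression. For those who want to try it anyway, here is a minimal sketch in Python. The function name `zip_epub` is hypothetical, and the sketch assumes the folder holds a valid unzipped ePub and that the output file is written somewhere outside that folder.

```python
import zipfile
from pathlib import Path


def zip_epub(folder, epub_path):
    """Zip an unzipped ePub folder back into a single .epub file."""
    root = Path(folder)
    with zipfile.ZipFile(epub_path, "w") as archive:
        # The mimetype file goes in first and is stored, not compressed.
        archive.write(root / "mimetype", "mimetype",
                      compress_type=zipfile.ZIP_STORED)
        # Everything else (META-INF, OEBPS and their contents) is compressed.
        for path in sorted(root.rglob("*")):
            name = path.relative_to(root).as_posix()
            if path.is_file() and name != "mimetype":
                archive.write(path, name, compress_type=zipfile.ZIP_DEFLATED)
```

Calling `zip_epub("wizard", "wizard.epub")` would rebuild the book from a folder named wizard (both names are made up for the example). I would still run the result through the validator in Sigil before trusting it.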

Examining the components of an ePub gives you a good idea of the structure of these files. Let’s start with the ePub file itself, then open it to see its components. The .epub suffix disguises a document that is actually a .zip file. If you change the suffix, and unzip the file, you (might) get the following folder.
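If you only want to look, you don’t have to rename or unzip anything. A few lines of Python can list what is inside the package. This is a sketch using Python’s standard zipfile module; the function name `list_epub` is mine.

```python
import zipfile


def list_epub(epub):
    """Return the names of every file packed inside an ePub."""
    # zipfile does not care that the suffix is .epub rather than .zip.
    with zipfile.ZipFile(epub) as archive:
        return archive.namelist()
```

Something like `for name in list_epub("wizard.epub"): print(name)` (with a file name of your own) prints one line per file, starting with mimetype in a well-made ePub.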

 

[Image: Structure of an ePub, part 1]

Once an ePub file is unzipped, the resulting folder can be seen to contain three items. The two marked with a red dot here are used by e-book readers to recognize the component files of the book.

Inside that folder is the content in three main parts: META-INF, mimetype, and OEBPS (Open eBook Publication Structure). Only the last of these three is of interest to us. Inside the OEBPS folder are four folders (Images, Fonts, Text, and Styles) and two files, content and toc (table of contents). The content and toc files are text files (XML), and Sigil opens them for editing for you.

[Image: Structure of an ePub, part 2]

Once the OEBPS folder is opened, it reveals these four folders and two files. This is the core of an ePub.

In the example here I have made a book from a public domain text of L. Frank Baum’s Wizard of Oz. That book has only one image, its cover, which was made into a JPEG and a PNG by InDesign at the time of export. I discarded the JPEG in Sigil, because it’s not actually used by the ePub, and it takes up valuable space. When producing books with numerous illustrations, those will show up in the Images folder. Careful editing of an ePub will usually allow you to discard extra image files that are not used in the book.
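Hunting for unused images can be partly automated. The sketch below is my own rough approach, not a feature of Sigil: it lists image files whose names never appear in any of the book’s text or CSS files. The function name and the lists of file suffixes are assumptions, and the result is only a list of candidates. A cover image, for instance, may be referenced only from the content file, so check each one before discarding it.

```python
import zipfile
from pathlib import PurePosixPath

# Assumed suffixes; adjust them to match the files in your own book.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".gif", ".svg"}
MARKUP_SUFFIXES = {".xhtml", ".html", ".htm", ".css"}


def unreferenced_images(epub):
    """List images whose file names appear in no text or CSS file."""
    with zipfile.ZipFile(epub) as archive:
        names = archive.namelist()
        # Join all chapter and style files into one searchable string.
        markup = "".join(
            archive.read(name).decode("utf-8", errors="replace")
            for name in names
            if PurePosixPath(name).suffix.lower() in MARKUP_SUFFIXES
        )
    images = [n for n in names
              if PurePosixPath(n).suffix.lower() in IMAGE_SUFFIXES]
    return [n for n in images if PurePosixPath(n).name not in markup]
```

This is a plain text search, so file names with spaces or unusual characters (which may be written differently inside a link) can show up as false alarms.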

[Image: Structure of ePub fonts]

In my Fonts folder are only two fonts, both in OpenType format, that are associated with this book (other books could have more). The Apple iBooks app was updated last year to acknowledge and display embedded fonts, so viewing this book on an iPad, for example, will display it in the fonts that I used to create the book. Other book readers will not do this, substituting generic fonts instead. Only OpenType and TrueType fonts work with book readers. PostScript fonts will not work.

[Image: Structure of ePub text]

In the Text folder are as many text files as there are chapters in the book. Each one is an XHTML-tagged file with the contents of the chapter. It is easy to edit these text files, and that makes last-minute corrections possible. This past week I made an eleventh-hour change to an ePub just before converting it to a Kindle book file and uploading that file to Amazon, saving several steps and considerable time.

[Image: Structure of ePub, CSS]

The Styles folder usually contains only one file, the CSS (Cascading Style Sheet) file for the book. Here you find the styles that are applied to the XHTML text of your book, and here you will discover immediately if you left anything in the original book unstyled, as InDesign will create as many styles as it needs to make your book work. I have a personal policy that everything on every page of my books must be styled, and this saves InDesign the trouble of creating extra styles on the fly. I like my CSS to be clean and succinct, without superfluous styles for single words or lines of type that I left unstyled in the original.

[Image: Structure of ePub TOC]

The toc.ncx file contains the table of contents in XML format. NCX stands for Navigation Center eXtended. This is a text file, but it’s best to let Sigil open it for you and make it editable, as opening it in a text editor is risky. In the table of contents file, each entry carries a reference to a chapter file (much like an a href in a web link), which will deliver the book reader to the appropriate chapter. In the book I made, there are only the 24 chapters in the table of contents, and the contents document is just one level deep. In another book I created recently, a multilevel table of contents resulted in four pages of content material.
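To show what those entries look like, here is a small sketch in Python that reads a table of contents and reports each entry with its depth, its label, and the file it points to. The sample below is a trimmed, made-up stand-in for a real toc.ncx (the file names and ids are hypothetical), and the function name `read_toc` is mine.

```python
import xml.etree.ElementTree as ET

NCX_NS = {"ncx": "http://www.daisy.org/z3986/2005/ncx/"}

# A trimmed, hypothetical toc.ncx with two one-level entries.
SAMPLE_NCX = """<ncx xmlns="http://www.daisy.org/z3986/2005/ncx/" version="2005-1">
  <navMap>
    <navPoint id="chapter01" playOrder="1">
      <navLabel><text>1. The Cyclone</text></navLabel>
      <content src="Text/chapter01.xhtml"/>
    </navPoint>
    <navPoint id="chapter02" playOrder="2">
      <navLabel><text>2. The Council with the Munchkins</text></navLabel>
      <content src="Text/chapter02.xhtml"/>
    </navPoint>
  </navMap>
</ncx>"""


def read_toc(ncx_data):
    """Return (depth, label, target file) for every entry in an NCX file."""
    entries = []

    def walk(parent, depth):
        for point in parent.findall("ncx:navPoint", NCX_NS):
            label = point.findtext("ncx:navLabel/ncx:text",
                                   default="", namespaces=NCX_NS)
            content = point.find("ncx:content", NCX_NS)
            src = content.get("src") if content is not None else ""
            entries.append((depth, label.strip(), src))
            walk(point, depth + 1)  # nested entries form the deeper levels

    nav_map = ET.fromstring(ncx_data).find("ncx:navMap", NCX_NS)
    if nav_map is not None:
        walk(nav_map, 1)
    return entries


for depth, label, src in read_toc(SAMPLE_NCX):
    print("  " * (depth - 1) + label, "->", src)
```

A multilevel table of contents simply nests one entry inside another, which is why the function calls itself for each entry it finds.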

The content.opf file contains a manifest of all of the content material in the book. It lists the chapter texts, the contents components, the cover art and any other illustrations in the book. OPF stands for Open Packaging Format, part of the ePub standard. It is legible and editable by those who understand its construction, so edit with care. A few months back I discovered an errant page in one of my books. I removed the page, then removed the reference to the page from the manifest, and the book was repaired. Had it been any more complex than that, I probably would have gone back to the original and figured out how to fix it there.
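For the curious, here is a sketch in Python of how the manifest can be read. Besides the manifest, the content file has a spine section that gives the reading order, so a page removed from a book normally has to come out of both places. The sample is a trimmed, made-up content.opf (the ids and file names are hypothetical), and the function names are my own.

```python
import xml.etree.ElementTree as ET

OPF_NS = {"opf": "http://www.idpf.org/2007/opf"}

# A trimmed, hypothetical content.opf with three manifest items.
SAMPLE_OPF = """<package xmlns="http://www.idpf.org/2007/opf" version="2.0">
  <manifest>
    <item id="ncx" href="toc.ncx" media-type="application/x-dtbncx+xml"/>
    <item id="cover-image" href="Images/cover.png" media-type="image/png"/>
    <item id="chapter01" href="Text/chapter01.xhtml"
          media-type="application/xhtml+xml"/>
  </manifest>
  <spine toc="ncx">
    <itemref idref="chapter01"/>
  </spine>
</package>"""


def read_manifest(opf_data):
    """Map each manifest id to the file it refers to."""
    root = ET.fromstring(opf_data)
    return {item.get("id"): item.get("href")
            for item in root.findall("opf:manifest/opf:item", OPF_NS)}


def reading_order(opf_data):
    """List the files in the order the spine says they are read."""
    root = ET.fromstring(opf_data)
    manifest = read_manifest(opf_data)
    return [manifest.get(ref.get("idref"))
            for ref in root.findall("opf:spine/opf:itemref", OPF_NS)]


for item_id, href in read_manifest(SAMPLE_OPF).items():
    print(item_id, "->", href)
```

If `reading_order` returns `None` for an entry, the spine is pointing at an id that is missing from the manifest, which is the kind of dangling reference left behind by a half-removed page.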

In coming days I will write more about ePubs, so you have more geekiness to look forward to.

 

About Brian Lawler

Brian Lawler is an Emeritus Professor of Graphic Communication at California Polytechnic State University, San Luis Obispo. He writes about graphic arts processes and technologies for various industry publications, and on his blog, The Blognosticator.
This entry was posted in Software, Technology.

2 Responses to ePubs – under the hood

  1. ninsuhn says:

    Hi:

    Thanks for this breakdown. I’ve been trying to figure out if it’s possible to edit / add to the contents of an epub while reading on a computer. That is, to simply click in any space and start typing, thereby adding to the text and augmenting the contents. I don’t mean adding notes in a different space, just typing inside the text.

    I’m sorry if this seems random, it’s just that I seem to do my best thinking while reading what I’ve written in epub format, and it would change everything if I would just place the cursor inside the paragraph and start typing.

    Would this entail changing one of the above files to be permanently editable?

    Thanks in advance for your time.

    • Brian Lawler says:

      Hi Ninsun,

      Apologies for my late reply to your query.

      You can certainly edit your ePub in Sigil with the ability to switch back and forth between the text view (easy for editing) and preview mode. I do this often, and I find it very handy.

      Sigil is also very nice for making small corrections to a “finished” ePub, without having to go all the way back to the beginning and start over.

      My vote is for Sigil.

      Best wishes,

      Brian P. Lawler
