I’ve been making ePubs for several years, and I have taught courses on ePubs at GraphExpo and at Cal Poly, where I am employed as a professor. For the first few years I was very grumpy about ePubs, as they were not created nicely by InDesign. Exporting a document from InDesign to ePub format sometimes produced serious flaws. Adobe has improved that a lot (there is still a little bit of work to do!) and now I find that InDesign-generated ePubs are nearly correct, which makes the job easier.
Though advertised as being able to create a ready-to-publish ePub, InDesign doesn’t get everything right, and it’s very important to check all ePubs for compliance with the ePub standard. This is most easily done in a free, open-source program called Sigil (available online).
When you open an ePub file in Sigil, the contents of that file are exposed, and fully editable. In one backwater menu of Sigil is a green check mark for verifying your ePub against the international ePub standard. The most common error that I find is that the date in the ePub is not formatted correctly. InDesign doesn’t format the date it takes from its IPTC data correctly, so it usually needs to be repaired. If you enter a date as Month/Day/Year, it’s not compliant. ePub wants the date to be Year-Month-Day (with hyphens), for example 2013-07-04. It’s easy to fix in the content.opf file in Sigil.
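If you have more than a few dates to repair, the conversion is simple enough to script. Here is a minimal Python sketch, assuming the date was typed as Month/Day/Year with a four-digit year. The function name to_epub_date is made up for this illustration; it is not part of Sigil or InDesign.

```python
from datetime import datetime


def to_epub_date(text):
    """Convert a Month/Day/Year date (e.g. 7/4/2013) to Year-Month-Day."""
    # Assumption: month comes first and the year has four digits.
    parsed = datetime.strptime(text.strip(), "%m/%d/%Y")
    return parsed.strftime("%Y-%m-%d")


print(to_epub_date("7/4/2013"))  # prints 2013-07-04
```

The result is what you would paste into the date entry of the content file.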
ePubs are very similar to web sites. They contain a master folder (Site), an images folder for any images that are embedded in the book, and a Text folder for all of the contributing text (this is slightly different from a web site). They also contain a Fonts folder that holds any fonts used in the ePub, and unlike a web site, they contain a Contents file and a TOC file for the table of contents.
ePubs are specially constructed ZIP files. They can be unzipped to view the contents, and the contents can be edited by hand in a text editor like TextWrangler. Sigil makes it much easier, because it opens every component of the ePub package as an editable file. If you unzip an ePub file successfully (and it is easy to do so unsuccessfully), you can work on the component files, but you must be meticulous when zipping the files back to ePub format: the mimetype file has to be the first item in the archive, and it has to be stored uncompressed. It’s safer to use Sigil, and let that program put them back together when you are finished.
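For those who do want to zip a folder back into an ePub by hand, the meticulous part is that the mimetype file must go into the archive first and must not be compressed. The following is a rough Python sketch of that idea using the standard zipfile module, not a replacement for Sigil. The function name pack_epub is hypothetical, and skipping .DS_Store files is my own assumption about what a Mac might leave behind in the folder.

```python
import zipfile
from pathlib import Path


def pack_epub(folder, epub_path):
    """Zip an unpacked ePub folder with mimetype first and uncompressed."""
    root = Path(folder)
    mimetype = root / "mimetype"
    with zipfile.ZipFile(epub_path, "w") as archive:
        # The mimetype file goes in first and is stored, not deflated.
        archive.write(mimetype, "mimetype", compress_type=zipfile.ZIP_STORED)
        for path in sorted(root.rglob("*")):
            # Skip folders, the mimetype already added, and .DS_Store debris.
            if not path.is_file() or path == mimetype or path.name == ".DS_Store":
                continue
            archive.write(
                path,
                path.relative_to(root).as_posix(),
                compress_type=zipfile.ZIP_DEFLATED,
            )
```

Write the new .epub file somewhere outside the folder being zipped, otherwise the half-written archive could end up inside itself. Then run the result through Sigil's validation anyway.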
Examining the components of an ePub gives you a good idea of the structure of these files. Let’s start with the ePub file itself, then open it to see its components. The .epub suffix disguises a document that is actually a .zip file. If you change the suffix to .zip and unzip the file, you (might) get the following folder.
Once an ePub file is unzipped, the resulting folder contains three items. The two marked with a red dot here are used by e-book readers to recognize the component files of the book.
Inside that folder is the content in three main parts: the META-INF folder, the mimetype file, and the OEBPS (Open eBook Publication Structure) folder. Only the last of these three is of interest to us. Inside the OEBPS folder are four folders (Images, Fonts, Text, and Styles) and two files, content.opf and toc.ncx (the table of contents). The content and toc files are XML text files, but because of their unfamiliar .opf and .ncx suffixes a text editor may not offer to open them (Sigil does this for you).
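You can also peek inside an ePub without renaming or unzipping anything, because Python's zipfile module will read the file as it is. This is only a small sketch; the function name inspect_epub is invented for the example.

```python
import zipfile


def inspect_epub(source):
    """Return the mimetype string and the list of entries inside an ePub."""
    with zipfile.ZipFile(source) as archive:
        # The mimetype entry is a tiny text file that identifies the package.
        mimetype = archive.read("mimetype").decode("ascii").strip()
        return mimetype, archive.namelist()
```

Called on a book (say, a hypothetical wizard-of-oz.epub), it should report a mimetype of application/epub+zip, followed by names such as META-INF/container.xml and the files under OEBPS. The exact folder names depend on the program that made the ePub.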
Once the OEBPS folder is opened, it reveals these four folders and two files. This is the core of an ePub.
In the example here I have made a book from a public domain text of L. Frank Baum’s Wizard of Oz. That book has only one image, its cover, which was made into a JPEG and a PNG by InDesign at the time of export. I discarded the JPEG in Sigil, because it’s not actually used by the ePub, and it takes up valuable space. When producing books with numerous illustrations, those will show up in the Images folder. Careful editing of an ePub will usually allow you to discard extra image files that are not used in the book.
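Hunting for unused images can be partly automated. The sketch below is a rough heuristic, not a definitive tool: it assumes images live in a folder named Images (as in my InDesign exports) and it treats an image as used if its file name appears anywhere in the book's XHTML, HTML, or CSS files. The function name unused_images is hypothetical.

```python
import zipfile
from pathlib import PurePosixPath


def unused_images(source):
    """List image entries whose file names never appear in the text or CSS."""
    with zipfile.ZipFile(source) as archive:
        names = archive.namelist()
        # Assumption: images sit in a folder called Images.
        images = [n for n in names if "/Images/" in n and not n.endswith("/")]
        pages = [n for n in names if n.lower().endswith((".xhtml", ".html", ".css"))]
        text = "".join(
            archive.read(n).decode("utf-8", errors="replace") for n in pages
        )
    return sorted(n for n in images if PurePosixPath(n).name not in text)
```

Treat the result as a list of candidates to inspect, not a list to delete blindly. A cover image, for instance, may be referenced only from the content file, which this check deliberately ignores because the manifest lists every image whether it is used or not.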
In my Fonts folder are only two fonts, both in OpenType format, that are associated with this book (other books could have more). The Apple iBooks app was updated last year to acknowledge and display embedded fonts, so viewing this book on an iPad, for example, will display it in the fonts that I used to create the book. Other book readers will not do this, substituting generic fonts instead. Only OpenType and TrueType fonts work with book readers. PostScript fonts will not work.
In the Text folder are as many text files as there are chapters in the book. Each one is an xhtml-tagged file with the contents of the chapter. It is easy to edit these text files, and that makes last-minute corrections possible. This past week I made an eleventh-hour change to an ePub just before converting it to a Kindle book file and uploading that file to Amazon, saving several steps and considerable time.
The Styles folder usually contains only one file, the CSS (Cascading Style Sheets) file for the book. Here you find the styles for your book, written as CSS rules, and here you will discover immediately if you left anything in the original book unstyled, as InDesign will create as many extra styles as it needs to make your book work. I have a personal policy that everything on every page of my books must be styled, and this saves InDesign the trouble of creating extra styles on the fly. I like my CSS to be clean and succinct, without superfluous styles for single words or lines of type that I left unstyled in the original.
The toc.ncx file contains the table of contents in XML format. NCX stands for Navigation Center eXtended. This is a text file, but it’s best to let Sigil open it for you and make it editable, as opening it in a text editor is risky. Each entry in the table of contents file holds a reference to a file in the book (much like the href in a web link), which will deliver the book reader to the appropriate chapter. In the book I made, there are only the 24 chapters in the table of contents, and the contents document is just one level deep. In another book I created recently a multilevel table of contents resulted in four pages of content material.
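To see what those entries look like without touching the file, you can read the NCX with Python's built-in XML parser. This sketch assumes the usual NCX layout, in which each navPoint element holds a navLabel with the visible title and a content element whose src attribute points at the chapter file. The function name read_toc is my own.

```python
import xml.etree.ElementTree as ET

# The XML namespace used by NCX files.
NCX = {"ncx": "http://www.daisy.org/z3986/2005/ncx/"}


def read_toc(ncx_data):
    """Return (depth, label, src) for every entry in a toc.ncx document."""
    root = ET.fromstring(ncx_data)
    entries = []

    def walk(parent, depth):
        for point in parent.findall("ncx:navPoint", NCX):
            label = point.findtext("ncx:navLabel/ncx:text", default="", namespaces=NCX)
            content = point.find("ncx:content", NCX)
            src = content.get("src") if content is not None else ""
            entries.append((depth, label.strip(), src))
            # Nested navPoints are the deeper levels of a multilevel contents.
            walk(point, depth + 1)

    nav_map = root.find("ncx:navMap", NCX)
    if nav_map is not None:
        walk(nav_map, 1)
    return entries
```

For a one-level book like mine, every entry comes back with a depth of 1; a multilevel table of contents shows up as entries with depths of 2 and beyond.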
The content.opf file contains a manifest of all of the content material in the book. It lists the chapter texts, the contents components, the cover art and any other illustrations in the book. OPF stands for Open Packaging Format, part of the ePub standard. It is legible and editable by those who understand its construction, so edit with care. A few months back I discovered an errant page in one of my books. I removed the page, then removed the reference to the page from the manifest, and the book was repaired. Had it been any more complex than that, I probably would have gone back to the original and figured out how to fix it there.
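A mismatch between the manifest and the files actually in the package is exactly the kind of thing a short script can catch. The sketch below assumes the standard OPF layout, where each item in the manifest carries an id, an href, and a media-type. The function names read_manifest and missing_files are hypothetical helpers of my own.

```python
import xml.etree.ElementTree as ET

# The XML namespace used by content.opf files.
OPF = {"opf": "http://www.idpf.org/2007/opf"}


def read_manifest(opf_data):
    """Return (id, href, media-type) for every item in the manifest."""
    root = ET.fromstring(opf_data)
    return [
        (item.get("id"), item.get("href"), item.get("media-type"))
        for item in root.findall("opf:manifest/opf:item", OPF)
    ]


def missing_files(opf_data, files_present):
    """List manifest hrefs that have no matching file in the package."""
    present = set(files_present)
    return [href for _id, href, _type in read_manifest(opf_data) if href not in present]
```

The href values are relative to the folder that holds content.opf, so pass in file names written the same way (Text/chapter1.xhtml rather than OEBPS/Text/chapter1.xhtml). An empty list from missing_files means every manifest entry points at a real file.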
In coming days I will write more about ePubs, so you have more geekiness to look forward to.