{"id":1920,"date":"2015-01-10T09:28:12","date_gmt":"2015-01-10T17:28:12","guid":{"rendered":"http:\/\/thelawlers.com\/Blognosticator\/?p=1920"},"modified":"2015-01-10T09:28:12","modified_gmt":"2015-01-10T17:28:12","slug":"epubs-under-the-hood","status":"publish","type":"post","link":"https:\/\/thelawlers.com\/Blognosticator\/?p=1920","title":{"rendered":"ePubs \u2013 under the hood"},"content":{"rendered":"<p><a href=\"https:\/\/thelawlers.com\/Blognosticator\/wp-content\/uploads\/2014\/07\/Blognosticator-Head.png\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-1755\" src=\"https:\/\/thelawlers.com\/Blognosticator\/wp-content\/uploads\/2014\/07\/Blognosticator-Head.png\" alt=\"Blognosticator Head\" width=\"252\" height=\"115\" \/><\/a><\/p>\n<p>I\u2019ve been making ePubs for several years, and I have taught courses on ePubs at <em>GraphExpo<\/em> and at Cal Poly, where I am employed as a professor. For the first few years I was very grumpy about ePubs, as they were not created nicely by InDesign. There were sometimes serious\u00a0flaws as a\u00a0result of exporting a document to ePub format from InDesign. Adobe improved that a lot (still a little bit of work to do there!) and now I find that InDesign-generated ePubs are nearly correct, and that makes the\u00a0job easier.<\/p>\n<p>Though advertised as being able to create an ePub (ready to publish), InDesign doesn\u2019t get everything right, and it\u2019s very important to check all ePubs for compliance with the ePub standard. 
This is most easily done in a free, open-source program called <em>Sigil<\/em> (available online).<\/p>\n<p><a href=\"https:\/\/thelawlers.com\/Blognosticator\/wp-content\/uploads\/2015\/01\/ePub-icon.jpg\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-1921\" src=\"https:\/\/thelawlers.com\/Blognosticator\/wp-content\/uploads\/2015\/01\/ePub-icon.jpg\" alt=\"ePub icon\" width=\"191\" height=\"288\" \/><\/a><\/p>\n<p>When you open an ePub file in <em>Sigil,<\/em> the contents of that file are exposed and fully editable. In one backwater menu of Sigil is a green checkbox for verifying your ePub against the international ePub standard. The most common error that I find is that the date in the ePub is not formatted correctly. InDesign doesn\u2019t force the information correctly from its IPTC data, so it usually\u00a0needs to be repaired. If you enter a date as Month\/Day\/Year, it\u2019s not compliant. ePub wants the date to be Year-Month-Day (with hyphens), for example 2015-01-10. It\u2019s easy to fix in the content file in Sigil.<\/p>\n<p>ePubs are very similar to web sites. They contain a master folder (Site), an <em>images<\/em> folder for any images that are embedded in the book, and a <em>Text<\/em> folder for all of the contributing text (this is slightly different from a web site). They also contain a <em>Fonts<\/em> folder that holds any fonts used in the ePub, and unlike a web site, they contain a <em>Contents<\/em> file and a <em>TOC<\/em> file for the table of contents.<\/p>\n<p>ePubs are specially-compressed ZIP files. They can be unzipped to view the contents, and the contents can be edited by hand in a text editor like <em>TextWrangler<\/em>. Sigil makes it much easier, because it converts otherwise unreadable components of the ePub package into editable files. 
If you unzip an ePub file successfully (and it is easy\u00a0to do so <em>unsuccessfully<\/em>), you can work on the component files, but you must be meticulous when zipping the files back to ePub format: the mimetype file has to be the first item in the archive, and it must be stored uncompressed. It\u2019s safer to use Sigil, and let that program put them back together when you are finished.<\/p>\n<p>Examining the components of an ePub gives you a good idea of the structure of these files. Let\u2019s start with the ePub file itself, then open it to see its components. The .epub suffix conceals a document that is actually a .zip file. If you change the suffix and unzip the file, you (might) get the following folder.<\/p>\n<p>&nbsp;<\/p>\n<p><a href=\"https:\/\/thelawlers.com\/Blognosticator\/wp-content\/uploads\/2015\/01\/Structure-of-an-ePub-part-1.jpg\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-1922\" src=\"https:\/\/thelawlers.com\/Blognosticator\/wp-content\/uploads\/2015\/01\/Structure-of-an-ePub-part-1.jpg\" alt=\"Structure of an ePub part 1\" width=\"697\" height=\"894\" srcset=\"https:\/\/thelawlers.com\/Blognosticator\/wp-content\/uploads\/2015\/01\/Structure-of-an-ePub-part-1.jpg 697w, https:\/\/thelawlers.com\/Blognosticator\/wp-content\/uploads\/2015\/01\/Structure-of-an-ePub-part-1-233x300.jpg 233w\" sizes=\"auto, (max-width: 697px) 100vw, 697px\" \/><\/a><\/p>\n<p><em><span style=\"color: #0000ff;\">Once an ePub file is unzipped, the resulting folder shows three items. The two marked with a red dot here are used by e-book readers to recognize the component files of the book.<\/span><\/em><\/p>\n<p>Inside that folder is the content in three main parts: META-INF, mimetype, and OEBPS (Open eBook Publication Structure). Only the last of these three is of interest to us. Inside the OEBPS folder are four folders (Images, Fonts, Text, and Styles) and two files, content and toc (table of contents). 
The content and toc files are XML text files; a plain text editor can read them, though Sigil makes them easier to work with.<\/p>\n<p><a href=\"https:\/\/thelawlers.com\/Blognosticator\/wp-content\/uploads\/2015\/01\/Structure-of-an-ePub-2.jpg\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-1923\" src=\"https:\/\/thelawlers.com\/Blognosticator\/wp-content\/uploads\/2015\/01\/Structure-of-an-ePub-2.jpg\" alt=\"Structure of an ePub 2\" width=\"697\" height=\"567\" srcset=\"https:\/\/thelawlers.com\/Blognosticator\/wp-content\/uploads\/2015\/01\/Structure-of-an-ePub-2.jpg 697w, https:\/\/thelawlers.com\/Blognosticator\/wp-content\/uploads\/2015\/01\/Structure-of-an-ePub-2-300x244.jpg 300w\" sizes=\"auto, (max-width: 697px) 100vw, 697px\" \/><\/a><\/p>\n<p><em><span style=\"color: #0000ff;\">Once the OEBPS folder is opened, it reveals these four folders and two files. This is the core of an ePub.<\/span><\/em><\/p>\n<p>In the example here I have made a book from a public domain text of L. Frank Baum\u2019s <em>Wizard of Oz.<\/em> That book has only one image, its cover, which was made into a JPEG and a PNG by InDesign at the time of export. I discarded the JPEG in Sigil, because it\u2019s not actually used by the ePub, and it takes up valuable space. When you produce books with numerous illustrations, those will show up in the Images folder. 
Careful editing of an ePub will usually allow you to discard extra image files that are not used in the book.<\/p>\n<p><a href=\"https:\/\/thelawlers.com\/Blognosticator\/wp-content\/uploads\/2015\/01\/Structure-of-ePub-fonts.jpg\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-1924\" src=\"https:\/\/thelawlers.com\/Blognosticator\/wp-content\/uploads\/2015\/01\/Structure-of-ePub-fonts.jpg\" alt=\"Structure of ePub fonts\" width=\"962\" height=\"682\" srcset=\"https:\/\/thelawlers.com\/Blognosticator\/wp-content\/uploads\/2015\/01\/Structure-of-ePub-fonts.jpg 962w, https:\/\/thelawlers.com\/Blognosticator\/wp-content\/uploads\/2015\/01\/Structure-of-ePub-fonts-300x212.jpg 300w\" sizes=\"auto, (max-width: 962px) 100vw, 962px\" \/><\/a><\/p>\n<p>In my\u00a0<em>Fonts<\/em> folder are only two fonts, both in <em>OpenType<\/em> format, that are associated with this book (other books could have more). The Apple iBooks app was updated last year to acknowledge and display embedded fonts, so viewing this book on an iPad, for example, will display it in the fonts that I used to create the book. Other book readers will not do this, substituting generic fonts instead. Only <em>OpenType<\/em> and <em>TrueType<\/em> fonts work with book readers. 
<em>PostScript<\/em> fonts will not work.<\/p>\n<p><a href=\"https:\/\/thelawlers.com\/Blognosticator\/wp-content\/uploads\/2015\/01\/Structure-of-ePub-text.jpg\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-1925\" src=\"https:\/\/thelawlers.com\/Blognosticator\/wp-content\/uploads\/2015\/01\/Structure-of-ePub-text.jpg\" alt=\"Structure of ePub text\" width=\"825\" height=\"1256\" srcset=\"https:\/\/thelawlers.com\/Blognosticator\/wp-content\/uploads\/2015\/01\/Structure-of-ePub-text.jpg 825w, https:\/\/thelawlers.com\/Blognosticator\/wp-content\/uploads\/2015\/01\/Structure-of-ePub-text-197x300.jpg 197w, https:\/\/thelawlers.com\/Blognosticator\/wp-content\/uploads\/2015\/01\/Structure-of-ePub-text-672x1024.jpg 672w\" sizes=\"auto, (max-width: 825px) 100vw, 825px\" \/><\/a><\/p>\n<p>In the <em>Text<\/em> folder are as many text files as there are chapters in the book. Each one is an xhtml-tagged file with the contents of the chapter. It is easy to edit these text files, and that makes last-minute corrections possible. 
This past week I made an eleventh-hour change to an ePub just before converting it to a Kindle book file and uploading that file to Amazon, saving several steps and considerable time.<\/p>\n<p><a href=\"https:\/\/thelawlers.com\/Blognosticator\/wp-content\/uploads\/2015\/01\/Structure-of-ePub-CSS.jpg\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-1926\" src=\"https:\/\/thelawlers.com\/Blognosticator\/wp-content\/uploads\/2015\/01\/Structure-of-ePub-CSS.jpg\" alt=\"Structure of ePub, CSS\" width=\"697\" height=\"885\" srcset=\"https:\/\/thelawlers.com\/Blognosticator\/wp-content\/uploads\/2015\/01\/Structure-of-ePub-CSS.jpg 697w, https:\/\/thelawlers.com\/Blognosticator\/wp-content\/uploads\/2015\/01\/Structure-of-ePub-CSS-236x300.jpg 236w\" sizes=\"auto, (max-width: 697px) 100vw, 697px\" \/><\/a><\/p>\n<p>The <em>Styles<\/em> folder usually contains only one file, the CSS (Cascading Style Sheets) file for the book. Here you find the style rules for your book, and here you will discover immediately if you left anything in the original book unstyled, as InDesign will create as many styles as it needs to make your book work. I have a personal policy that everything on every page of my books must be styled, and this saves InDesign the trouble of creating extra styles on the fly. 
I like my CSS to be clean and succinct, without superfluous styles for single words or lines of type that I left unstyled in the original.<\/p>\n<p><a href=\"https:\/\/thelawlers.com\/Blognosticator\/wp-content\/uploads\/2015\/01\/Structure-of-ePub-TOC.jpg\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-1927\" src=\"https:\/\/thelawlers.com\/Blognosticator\/wp-content\/uploads\/2015\/01\/Structure-of-ePub-TOC.jpg\" alt=\"Structure of ePub TOC\" width=\"1184\" height=\"479\" srcset=\"https:\/\/thelawlers.com\/Blognosticator\/wp-content\/uploads\/2015\/01\/Structure-of-ePub-TOC.jpg 1184w, https:\/\/thelawlers.com\/Blognosticator\/wp-content\/uploads\/2015\/01\/Structure-of-ePub-TOC-300x121.jpg 300w, https:\/\/thelawlers.com\/Blognosticator\/wp-content\/uploads\/2015\/01\/Structure-of-ePub-TOC-1024x414.jpg 1024w\" sizes=\"auto, (max-width: 1184px) 100vw, 1184px\" \/><\/a><\/p>\n<p>The <em>toc.ncx<\/em> file contains the table of contents in XML format. NCX stands for <em>Navigation Control file for XML.<\/em> This is a text file, but it\u2019s best to let Sigil open it for you and make it editable, as opening it in a text editor is risky. Each entry in the table of contents file holds a reference (much like the <em>a href<\/em> of a web link), which will deliver the book reader to the appropriate chapter. In the book I made, there are only the 24 chapters in the table of contents, and the contents document is just one level deep. In another book I created recently, a multilevel table of contents resulted in four pages of content material.<\/p>\n<p>The content.opf file contains a <em>manifest<\/em> of all of the content material in the book. It lists the chapter texts, the contents components, the cover art and any other illustrations in the book. OPF stands for <em>Open Packaging Format,<\/em> part of the ePub standard. It is legible and editable by those who understand its construction, so edit with care. 
In one book a few months back I discovered an errant page in my book. I removed the page, then removed the reference to the page from the manifest, and the book was repaired. Had it been any more complex than that, I probably would have gone back to the original and figured out how to fix it there.<\/p>\n<p>In coming days I will write more about ePubs, so you have more geekiness to look forward to.<\/p>\n<p>&nbsp;<\/p>\n","protected":false},"excerpt":{"rendered":"<p>I\u2019ve been making ePubs for several years, and I have taught courses on ePubs at GraphExpo and at Cal Poly, where I am employed as a professor. For the first few years I was very grumpy about ePubs, as they &hellip; <a href=\"https:\/\/thelawlers.com\/Blognosticator\/?p=1920\">Continue reading <span class=\"meta-nav\">&rarr;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[25,30],"tags":[63,64,465,466,464,467,471,470,468,469,472],"class_list":["post-1920","post","type-post","status-publish","format-standard","hentry","category-software-2","category-technology","tag-blognosticator","tag-brian-lawler","tag-e-book","tag-electronic-publishing","tag-epub","tag-indesign-epub","tag-meta-inf","tag-oebps","tag-sigil","tag-structure-of-epub","tag-toc-ncx"],"_links":{"self":[{"href":"https:\/\/thelawlers.com\/Blognosticator\/index.php?rest_route=\/wp\/v2\/posts\/1920","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/thelawlers.com\/Blognosticator\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/thelawlers.com\/Blognosticator\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/thelawlers.com\/Blognosticator\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/thelawlers.com\/Blognosticator\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=1920"}]
,"version-history":[{"count":1,"href":"https:\/\/thelawlers.com\/Blognosticator\/index.php?rest_route=\/wp\/v2\/posts\/1920\/revisions"}],"predecessor-version":[{"id":1928,"href":"https:\/\/thelawlers.com\/Blognosticator\/index.php?rest_route=\/wp\/v2\/posts\/1920\/revisions\/1928"}],"wp:attachment":[{"href":"https:\/\/thelawlers.com\/Blognosticator\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=1920"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/thelawlers.com\/Blognosticator\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=1920"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/thelawlers.com\/Blognosticator\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=1920"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}