Making ePubs from scratch

While at Vectorform, I worked on a project that required me to develop an ePub reader for the web. I ended up heavily modifying epub.js to meet all of our client’s requirements. I’ve always wondered about ePubs, and this was a perfect opportunity to learn about them.

It turns out ePubs are merely ZIP archives with a hierarchy that follows the guidelines provided by the International Digital Publishing Forum, or IDPF.

Sample ePub Hierarchy

/mimetype
/META-INF/container.xml
/package.opf
/Content/chapter1.xhtml
/Content/chapter2.xhtml
/Content/chapter3.xhtml
/nav.xhtml

Unfortunately, our client had a bunch of ePub 2 publications to display as readable previews, so I’ll try to point out the differences as I describe some of the ePub 3 specifics.

The first file that needs to be added to the archive is the mimetype file, which has no extension. It must be the first entry in the ZIP and must be stored uncompressed, and it contains only the string below, with no trailing newline. This merely identifies the archive as an ePub, so rendering and management software knows that the contents should follow the standard.

mimetype

application/epub+zip
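
Those mimetype rules are easy to get wrong when zipping by hand, so here is a minimal sketch of a check using Python's standard zipfile module. The function name check_mimetype and the wording of the messages are my own, not part of any ePub tooling.

```python
import zipfile

EPUB_MIMETYPE = b"application/epub+zip"


def check_mimetype(epub_file):
    """Return a list of problems with the archive's mimetype entry (empty if none).

    epub_file may be a path or a binary file-like object.
    """
    problems = []
    with zipfile.ZipFile(epub_file) as archive:
        entries = archive.infolist()
        # The mimetype entry has to come before everything else in the ZIP.
        if not entries or entries[0].filename != "mimetype":
            problems.append("mimetype is not the first entry in the archive")
            return problems
        first = entries[0]
        # It has to be stored as-is, without compression.
        if first.compress_type != zipfile.ZIP_STORED:
            problems.append("mimetype entry is compressed")
        # Its content is the bare media type, with no trailing newline.
        if archive.read(first) != EPUB_MIMETYPE:
            problems.append("mimetype content is not exactly application/epub+zip")
    return problems
```

An empty list means the entry looks fine. This only inspects the mimetype entry, so it is no substitute for a full validator.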

The next file that is required is the container.xml file, which must be located in a META-INF directory in the root of the archive. The container directs software to a package file that describes the content of the ePub. There are some additional files that can be included in the META-INF directory, such as a file for setting iBooks-specific flags (com.apple.ibooks.display-options.xml) or some metadata about the publication as a whole, but the container is the only file that is required to exist in the META-INF directory.

container.xml

<?xml version="1.0" encoding="UTF-8" ?>
<container xmlns="urn:oasis:names:tc:opendocument:xmlns:container" version="1.0">
	<rootfiles>
		<rootfile full-path="package.opf" media-type="application/oebps-package+xml" />
	</rootfiles>
</container>
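
To show how a reader might follow that pointer, here is a small sketch that pulls the package path out of container.xml with Python's xml.etree.ElementTree. The function name find_package_path is hypothetical, and it simply takes the first rootfile it finds.

```python
import xml.etree.ElementTree as ET

# Namespace used by the container element in the listing above.
CONTAINER_NS = {"c": "urn:oasis:names:tc:opendocument:xmlns:container"}


def find_package_path(container_xml):
    """Return the full-path of the first rootfile declared in container.xml."""
    root = ET.fromstring(container_xml)
    rootfile = root.find("c:rootfiles/c:rootfile", CONTAINER_NS)
    if rootfile is None:
        raise ValueError("container.xml declares no rootfile")
    return rootfile.attrib["full-path"]


SAMPLE_CONTAINER = b"""<?xml version="1.0" encoding="UTF-8" ?>
<container xmlns="urn:oasis:names:tc:opendocument:xmlns:container" version="1.0">
	<rootfiles>
		<rootfile full-path="package.opf" media-type="application/oebps-package+xml" />
	</rootfiles>
</container>"""

print(find_package_path(SAMPLE_CONTAINER))
```

The path is relative to the root of the archive, so the result can be handed straight to whatever is reading the ZIP.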

The package file is the other file needed to tie an ePub together (ePub 3 also expects a navigation document, and an ePub with no content would be pointless anyway). It is broken up into three sections (metadata, manifest, and spine) and usually has the .opf extension, for Open Packaging Format.

package.opf

<?xml version="1.0" encoding="UTF-8" ?>
<package prefix="rendition: http://www.idpf.org/vocab/rendition/#" unique-identifier="unique-identifier" version="3.0" xmlns="http://www.idpf.org/2007/opf">
	<metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
		<!-- ... -->
	</metadata>

	<manifest>
		<!-- ... -->
	</manifest>

	<spine>
		<!-- ... -->
	</spine>
</package>

First, any metadata about the contents of the ePub (the publication itself) can be added to the metadata element. A few elements, based on something called the Dublin Core, are required in ePub 3: dc:identifier, dc:title, and dc:language, along with a meta element carrying the dcterms:modified property. Others, such as dc:creator, are optional. In addition, there are various meta elements that can be added to further describe the contents, but not all software makes use of them.

example metadata nodes – package.opf

<!-- File Information -->
<dc:format id="format">application/epub+zip</dc:format>
<dc:identifier id="unique-identifier"></dc:identifier>
<dc:identifier id="issn-identifier"></dc:identifier>
<meta property="dcterms:modified"></meta>

<!-- Publication Information -->
<dc:title id="main-title"></dc:title>
<meta refines="#main-title" property="file-as"></meta>
<meta refines="#main-title" property="title-type">main</meta>

<dc:title id="collection-title"></dc:title>
<meta refines="#collection-title" property="title-type">collection</meta>

<dc:title id="short-title"></dc:title>
<meta refines="#short-title" property="title-type">short</meta>

<dc:title id="subtitle"></dc:title>
<meta refines="#subtitle" property="title-type">subtitle</meta>

<dc:title id="extended-title"></dc:title>
<meta refines="#extended-title" property="title-type">extended</meta>

<dc:description id="description"></dc:description>
<dc:subject id="subject"></dc:subject>

<dc:date id="publication-date"></dc:date>
<dc:language id="primary-language"></dc:language>
<dc:type id="publication-type"></dc:type>

<!-- Author Information -->
<dc:creator id="creator"></dc:creator>

<!-- Publisher Information -->
<dc:publisher id="publisher"></dc:publisher>
<meta refines="#publisher" property="role" scheme="marc:relators">pbl</meta>

<!-- Copyright Information -->
<dc:rights></dc:rights>

<!-- ePub Properties -->
<meta property="rendition:layout">pre-paginated</meta>
<meta property="rendition:orientation">auto</meta>
<meta property="rendition:spread">auto</meta>
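
For a sense of how software might read these nodes back, here is a rough sketch that collects the Dublin Core elements and the top-level meta properties into a dictionary. The function name read_metadata and the filled-in sample values are made up for illustration, and meta elements that refine another element are simply skipped.

```python
import xml.etree.ElementTree as ET

OPF_NS = "http://www.idpf.org/2007/opf"
DC_NS = "http://purl.org/dc/elements/1.1/"


def read_metadata(package_xml):
    """Map names like 'dc:title' or 'rendition:layout' to lists of values."""
    root = ET.fromstring(package_xml)
    metadata = root.find("{%s}metadata" % OPF_NS)
    if metadata is None:
        raise ValueError("package document has no metadata element")
    result = {}
    for element in metadata:
        text = (element.text or "").strip()
        if element.tag.startswith("{%s}" % DC_NS):
            key = "dc:" + element.tag.split("}", 1)[1]
        elif element.tag == "{%s}meta" % OPF_NS and "refines" not in element.attrib:
            key = element.get("property")
        else:
            continue  # refining meta elements are ignored in this sketch
        if key and text:
            result.setdefault(key, []).append(text)
    return result


# Hypothetical values; the listing above leaves them blank.
SAMPLE_PACKAGE = b"""<?xml version="1.0" encoding="UTF-8" ?>
<package xmlns="http://www.idpf.org/2007/opf" version="3.0" unique-identifier="unique-identifier">
	<metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
		<dc:identifier id="unique-identifier">urn:uuid:00000000-0000-0000-0000-000000000000</dc:identifier>
		<dc:title id="main-title">Sample Book</dc:title>
		<meta refines="#main-title" property="title-type">main</meta>
		<dc:language id="primary-language">en</dc:language>
		<meta property="dcterms:modified">2015-01-01T00:00:00Z</meta>
		<meta property="rendition:layout">pre-paginated</meta>
	</metadata>
	<manifest />
	<spine />
</package>"""

print(read_metadata(SAMPLE_PACKAGE))
```

Values are kept as lists because elements such as dc:title and dc:identifier can legitimately appear more than once.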

Next, a manifest element lists every resource in the ePub, including images, XHTML files, stylesheets, audio files, etc. Software uses this information to preload and prepare content to make the reading experience as pleasant as possible. Each item needs a unique id (for referencing later), an href relative to the package file, and a media-type.

example manifest nodes – package.opf

<!-- Navigation Document (Table of Contents) -->
<item id="nav" href="nav.xhtml" media-type="application/xhtml+xml" properties="nav" />

<!-- Images -->
<item id="000-image" href="Content/Images/000.jpg" media-type="image/jpeg" properties="cover-image" />
<item id="001-image" href="Content/Images/001.jpg" media-type="image/jpeg" />
<item id="002-image" href="Content/Images/002.jpg" media-type="image/jpeg" />
<item id="003-image" href="Content/Images/003.jpg" media-type="image/jpeg" />

<!-- XHTML -->
<item id="000" href="Content/XHTML/000.xhtml" media-type="application/xhtml+xml" />
<item id="001" href="Content/XHTML/001.xhtml" media-type="application/xhtml+xml" />
<item id="002" href="Content/XHTML/002.xhtml" media-type="application/xhtml+xml" />
<item id="003" href="Content/XHTML/003.xhtml" media-type="application/xhtml+xml" />

<!-- CSS, JavaScript, Audio, ... -->

Lastly, the package contains a spine element, which describes a logical reading order for the content files of the publication. Its itemref elements typically refer to XHTML files in the manifest, using the IDs provided above. Usually, this means adding each chapter or page, in order. You can also add attributes to suggest how to render these spine items (linear="no", for example, marks an item as outside the main reading flow), but not all software honors those attributes.

spine element – package.opf

<itemref idref="000" linear="no" properties="rendition:page-spread-center" />
<itemref idref="001" />
<itemref idref="002" />
<itemref idref="003" />
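
The way the spine leans on the manifest can be sketched in a few lines: look up each idref in the manifest and resolve its href against the location of the package file. The function name reading_order is my own, and percent-decoding of hrefs is left out to keep it short.

```python
import posixpath
import xml.etree.ElementTree as ET

OPF = {"opf": "http://www.idpf.org/2007/opf"}


def reading_order(package_xml, package_path="package.opf"):
    """Return (archive path, is_linear) for each spine item, in reading order."""
    root = ET.fromstring(package_xml)
    # Manifest: id -> href, relative to the package file.
    hrefs = {
        item.get("id"): item.get("href")
        for item in root.iterfind("opf:manifest/opf:item", OPF)
    }
    base = posixpath.dirname(package_path)
    order = []
    for itemref in root.iterfind("opf:spine/opf:itemref", OPF):
        href = hrefs[itemref.get("idref")]  # KeyError if the spine names an unknown id
        linear = itemref.get("linear", "yes") == "yes"
        order.append((posixpath.normpath(posixpath.join(base, href)), linear))
    return order


SAMPLE_SPINE_PACKAGE = b"""<?xml version="1.0" encoding="UTF-8" ?>
<package xmlns="http://www.idpf.org/2007/opf" version="3.0" unique-identifier="unique-identifier">
	<manifest>
		<item id="000" href="Content/XHTML/000.xhtml" media-type="application/xhtml+xml" />
		<item id="001" href="Content/XHTML/001.xhtml" media-type="application/xhtml+xml" />
	</manifest>
	<spine>
		<itemref idref="000" linear="no" />
		<itemref idref="001" />
	</spine>
</package>"""

for path, linear in reading_order(SAMPLE_SPINE_PACKAGE):
    print(path, "linear" if linear else "non-linear")
```

The package_path argument matters when the package file lives in a subdirectory, since manifest hrefs are relative to it rather than to the root of the archive.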

There are really two types of content, reflowable and fixed, and an ePub can theoretically contain both in the same archive. Reflowable content is usually just a bunch of paragraph and heading tags, used appropriately, to display the textual content of a typical book, with a few images interspersed. A fixed layout is a specific-size publication, such as a magazine or children’s book, which may have images and text positioned relative to the page or each other.

Your content will typically be XHTML files, though it is possible to just use an image as a spine item. I think the IDPF is trying to encourage accessibility, so wrapping the content in an XHTML file, even if the actual content is just an image, allows for alt attributes, searchable text, viewport dimensions, and additional logic such as JavaScript.

reflowable.xhtml

<html xmlns="http://www.w3.org/1999/xhtml">
	<head>
		<link href="../Styles/global.css" rel="stylesheet" />

		<title>Reflowable Example</title>
	</head>

	<body>
		<h1>Chapter 1</h1>
		<p>Lorem ipsum dolor sit amet&#8230;</p>
		<p>Lorem ipsum dolor sit amet&#8230;</p>
		<p>Lorem ipsum dolor sit amet&#8230;</p>
		<p>Lorem ipsum dolor sit amet&#8230;</p>
		<p>Lorem ipsum dolor sit amet&#8230;</p>
	</body>
</html>

fixed.xhtml

<html xmlns="http://www.w3.org/1999/xhtml">
	<head>
		<link href="../Styles/global.css" rel="stylesheet" />

		<meta name="viewport" content="width=1000, height=1500" />

		<title>Fixed Layout Example</title>
	</head>

	<body style="background-image: url( '../Images/000.jpg' );">
		<h1 style="left: 50%; top: 25%;">Lipsum</h1>
		<p style="left: 50%; top: 75%;">Lorem ipsum dolor sit amet</p>
	</body>
</html>
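
A reader that wants to scale a fixed-layout page needs the dimensions from that viewport meta element. Here is a small sketch of pulling them out. The function name viewport_size is hypothetical, and it only understands plain integer width and height values.

```python
import xml.etree.ElementTree as ET

XHTML = {"h": "http://www.w3.org/1999/xhtml"}


def viewport_size(xhtml):
    """Return (width, height) from the viewport meta element, or None if unavailable."""
    root = ET.fromstring(xhtml)
    meta = root.find("h:head/h:meta[@name='viewport']", XHTML)
    if meta is None:
        return None  # reflowable pages usually have no viewport
    values = {}
    for part in meta.get("content", "").split(","):
        key, _, value = part.partition("=")
        values[key.strip()] = value.strip()
    try:
        return int(values["width"]), int(values["height"])
    except (KeyError, ValueError):
        return None  # missing dimension or a non-numeric value


SAMPLE_FIXED_PAGE = b"""<html xmlns="http://www.w3.org/1999/xhtml">
	<head>
		<meta name="viewport" content="width=1000, height=1500" />
		<title>Fixed Layout Example</title>
	</head>
	<body></body>
</html>"""

print(viewport_size(SAMPLE_FIXED_PAGE))
```

Because this goes through a strict XML parser, it will also reject content files that are not well-formed, which is roughly what an ePub reader has to deal with too.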

The last piece is a navigation file, or table of contents, which ePub 3 expects and which makes it easier for the end user to get around the ePub. While the spine provides an ordered list of chapters or pages, nav.xhtml (or toc.ncx in ePub 2) provides a list of links to specific chapters or pages (ordinary relative links, optionally with a fragment identifier), so the user can move about independently of the spine.

nav.xhtml

<?xml version="1.0" encoding="UTF-8" ?>
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:epub="http://www.idpf.org/2007/ops">
	<head>
		<title>Sample Book</title>
	</head>

	<body epub:type="frontmatter">
		<header>
			<h1>Contents</h1>
		</header>

		<nav epub:type="toc">
			<ol>
				<li>
					<a href="Content/XHTML/0000.xhtml">Cover</a>
				</li>
				<li>
					<a href="#">Section 1</a>
					<ol>
						<li>
							<a href="Content/XHTML/0001.xhtml">Subsection 1</a>
						</li>
						<li>
							<a href="Content/XHTML/0005.xhtml">Subsection 2</a>
						</li>
						<li>
							<a href="Content/XHTML/0010.xhtml">Subsection 3</a>
							<ol>
								<li>
									<a href="Content/XHTML/0013.xhtml">Sub-subsection 1</a>
								</li>
								<li>
									<a href="Content/XHTML/0016.xhtml">Sub-subsection 2</a>
								</li>
							</ol>
						</li>
					</ol>
				</li>
				<li>
					<a href="#">Section 2</a>
					<ol>
						<li>
							<a href="Content/XHTML/0020.xhtml">Subsection 1</a>
						</li>
						<li>
							<a href="Content/XHTML/0025.xhtml">Subsection 2</a>
						</li>
					</ol>
				</li>
				<li>
					<a href="Content/XHTML/0030.xhtml">Section 3</a>
				</li>
			</ol>
		</nav>
	</body>
</html>
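
To illustrate how the nested lists map to a table of contents, here is a sketch that walks the toc nav and returns a tree of titles and links. The names read_toc and _read_list are my own, and the sample is a trimmed-down version of the file above.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
EPUB_NS = "http://www.idpf.org/2007/ops"
NS = {"h": XHTML_NS}


def _read_list(ol):
    """Turn one ol element into a list of {'title', 'href', 'children'} dicts."""
    entries = []
    for li in ol.findall("h:li", NS):
        label = li.find("h:a", NS)
        if label is None:
            label = li.find("h:span", NS)  # headings without a link
        child = li.find("h:ol", NS)
        entries.append({
            "title": "".join(label.itertext()).strip() if label is not None else "",
            "href": label.get("href") if label is not None else None,
            "children": _read_list(child) if child is not None else [],
        })
    return entries


def read_toc(nav_xhtml):
    """Return the nested entries of the nav element whose epub:type includes 'toc'."""
    root = ET.fromstring(nav_xhtml)
    for nav in root.iter("{%s}nav" % XHTML_NS):
        if "toc" in nav.get("{%s}type" % EPUB_NS, "").split():
            ol = nav.find("h:ol", NS)
            return _read_list(ol) if ol is not None else []
    return []


SAMPLE_NAV = b"""<html xmlns="http://www.w3.org/1999/xhtml" xmlns:epub="http://www.idpf.org/2007/ops">
	<body>
		<nav epub:type="toc">
			<ol>
				<li><a href="Content/XHTML/0000.xhtml">Cover</a></li>
				<li>
					<a href="#">Section 1</a>
					<ol>
						<li><a href="Content/XHTML/0001.xhtml">Subsection 1</a></li>
					</ol>
				</li>
			</ol>
		</nav>
	</body>
</html>"""

print(read_toc(SAMPLE_NAV))
```

The hrefs come back exactly as written, so they still need resolving against the location of nav.xhtml before they can be matched to spine items.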

And there you have an ePub. Take all these files and package them up in a ZIP accordingly (I’ve used ePub Zip-Unzip 2.1.1 for Mac) and you’ve got a self-contained publication file that can be opened just about anywhere (even on the web!). I decided to re-create Nintendo Power’s first issue (NP#1 – July/August 1988) as a learning exercise, and it turned out great, although manually re-typing all of the content for accessibility purposes would be quite time-consuming. In case you were wondering, I believe the magazine scans I used came in a CBR/CBZ from RetroMags.com.
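
If you would rather script the packaging step than use a dedicated tool, something like the following should do it with Python's standard zipfile module. This is only a sketch: build_epub is a made-up name, and it assumes the output file is written outside the source directory.

```python
import os
import zipfile


def build_epub(source_dir, output_path):
    """Zip source_dir into output_path with the mimetype entry first and uncompressed."""
    with zipfile.ZipFile(output_path, "w", zipfile.ZIP_DEFLATED) as archive:
        # Write the media type directly so a stray newline in a file on disk cannot sneak in.
        archive.writestr("mimetype", "application/epub+zip",
                         compress_type=zipfile.ZIP_STORED)
        for folder, dirs, files in os.walk(source_dir):
            dirs.sort()  # keep the entry order predictable
            for name in sorted(files):
                full_path = os.path.join(folder, name)
                # ZIP entry names always use forward slashes.
                arcname = os.path.relpath(full_path, source_dir).replace(os.sep, "/")
                if arcname == "mimetype":
                    continue  # already written above
                archive.write(full_path, arcname)
```

Calling build_epub("book", "book.epub") on a folder laid out like the sample hierarchy produces an archive whose first entry is the stored mimetype, with everything else deflated.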

Download Nintendo Power, Issue #1 ePub

View full source for NP#1: package.opf
View full source for NP#1: nav.xhtml

Nintendo Power Cover

Nintendo Power Contents

Nintendo Power ePub