Overview
This specification is for the gempub ebook format (GPUB). GPUB enables the creation and transport of digital books and other documents in a single file, while keeping the specification easier to implement than the many more complex XML-based formats (FB2, EPUB3, etc.).
Purpose and Scope
The purpose of the gempub ebook format is to provide an easy way for end users to create and read books without the need for complex tooling or difficult to learn markup.
This document does the following for GPUB files:
- Describes and references all components of a GPUB file (metadata, markup, images, and navigation structures)
- Provides metadata keys and value descriptions
- Specifies how the reading order of the final publication is determined, and provides a mechanism to allow for global navigation
- Specifies to developers what to do when unexpected/unsupported extensions or divergences from the specification are encountered
Definitions
book- The collection of text represented by a GPUB
developer- A person following this specification to create a GPUB or to create software that, in turn, creates a GPUB
GPUB- The publication format defined by this document; this term is synonymous with "gempub"
gemtext document- A complete and valid gemtext document as defined by the text/gemini specification
index.gmi- The gemtext document used by this specification to determine reading order and provide global navigation
metadata.txt- The metadata document used by this specification to set metadata, navigation document, and other file properties
reader- A person who reads a publication
reading system- Hardware/Software that accepts GPUB publications and makes them available to consumers of the content
zip archive- A container/archive file created in accordance with the .ZIP file format specification
Relationship to Other Specifications
This specification combines applications of other specifications. These combine to produce a GPUB file according to this specification.
- Gemini hypertext format, aka "gemtext", specification
- .ZIP File Format Specification
- BCP 47 (Language Tags)
- JPEG/JFIF Format
- Portable Network Graphics (PNG) Specification (Third Edition)
The GPUB Format
Gempub is a container format in the form of a zip archive containing gemtext documents and, optionally, metadata. The MIME/media type of the archive, which is unregistered at present, is application/gpub+zip. The filename suffix .gpub is used.
The root path of the final zip archive MUST contain either a valid index.gmi file, or a metadata.txt file that points to a valid index.gmi file, or both. When the OPTIONAL metadata.txt file is present in the root of the zip archive, the REQUIRED index.gmi file MAY be in a subdirectory at the developer's discretion.
A GPUB MAY include any number of image files—including a single cover image defined within the metadata.txt file—and as many gemtext documents as are desired. A GPUB MUST NOT include files other than gemtext documents, images, and a single metadata.txt file.
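The container rules above can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the function names find_index_path and unexpected_files are hypothetical, metadata.txt is assumed to be UTF-8 encoded, and file types are guessed from filename extensions (.gmi, .jpg, .jpeg, .png), which this specification does not itself mandate.

```python
import zipfile

# Assumption: file types are recognised by extension; the specification
# only names the allowed formats, not their filename suffixes.
ALLOWED_SUFFIXES = (".gmi", ".jpg", ".jpeg", ".png")


def find_index_path(archive: zipfile.ZipFile) -> str:
    """Return the archive path of the index.gmi file, or raise ValueError."""
    names = set(archive.namelist())
    if "metadata.txt" in names:
        # Assumption: metadata.txt is decoded as UTF-8.
        text = archive.read("metadata.txt").decode("utf-8")
        for line in text.splitlines():
            key, sep, value = line.partition(":")
            if sep and key.strip() == "index":
                path = value.strip()
                if path in names:
                    return path
                raise ValueError(f"index entry points to a missing file: {path}")
    # No usable index key: fall back to the archive root.
    if "index.gmi" in names:
        return "index.gmi"
    raise ValueError("no index.gmi in the archive root or via metadata.txt")


def unexpected_files(archive: zipfile.ZipFile) -> list[str]:
    """List archive members that are neither gemtext, images, nor metadata.txt."""
    return [
        name
        for name in archive.namelist()
        if not name.endswith("/")  # skip directory entries
        and name != "metadata.txt"
        and not name.lower().endswith(ALLOWED_SUFFIXES)
    ]
```

A reading system might call find_index_path first and report the raised error to the reader, then use unexpected_files to decide how to handle content the specification does not allow.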
Publication Metadata
The file metadata.txt contains metadata for the GPUB. This information is intended to be used by reading systems for display to readers and to fulfill the functioning of the reading system.
The metadata.txt file contains key/value pairs separated by a colon (:) and ending with a single line feed (\n). Keys start at the beginning of a line and end with a colon. Values start after the first colon in a given line and are associated with the key that comes before said colon. Whitespace MAY surround keys and values and SHOULD be trimmed by reading systems when displaying the keys/values. A developer MAY include empty lines, but MUST NOT extend the metadata.txt file with keys not referenced by this specification. The order of entries in a metadata.txt file is not mandated and reading systems should not expect any particular ordering of metadata.txt entries.
A valid metadata.txt file MUST contain a title and a gpubVersion, but MAY contain other keys and values. If an index key and value are not provided, an index.gmi file MUST be present in the root directory of the archive.
A Simple Example metadata.txt file
The following example includes the two required entries, and nothing more. In this example, a reading system should assume that an index.gmi file resides in the root of the zip archive.
title: My Book!
gpubVersion : 1.0.1
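The key/value rules described in this section could be implemented along the following lines. This is a sketch under stated assumptions: parse_metadata is a hypothetical name, the input is already-decoded text, and a non-empty line without a colon is treated as an error (the specification does not say how to handle such lines).

```python
REQUIRED_KEYS = ("title", "gpubVersion")


def parse_metadata(text: str) -> dict[str, str]:
    """Parse metadata.txt content into a dict of trimmed keys and values."""
    entries: dict[str, str] = {}
    for line in text.split("\n"):
        if not line.strip():
            continue  # empty lines are allowed
        # The key ends at the first colon; later colons belong to the value.
        key, sep, value = line.partition(":")
        if not sep:
            # Assumption: a line with no colon is rejected.
            raise ValueError(f"not a key/value line: {line!r}")
        entries[key.strip()] = value.strip()
    for required in REQUIRED_KEYS:
        if not entries.get(required):
            raise ValueError(f"missing required key: {required}")
    return entries


example = "title: My Book!\ngpubVersion : 1.0.1\n"
print(parse_metadata(example))
```

Because the order of entries is not mandated, the sketch collects everything into a dictionary before checking for the required keys.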
Metadata Keys and Values
Keys are case sensitive.
author- The author of the book
charset- The character encoding used by the gemtext documents contained in the GPUB. When this key is not present reading systems MUST assume UTF-8
copyright- A copyright notice for the book
cover- A path to the book's cover image. The cover image MUST be in the JPG or PNG format. The path MUST be relative to the root of the zip archive and MUST use UNIX-style path separators (/)
gpubVersion- The version of the GPUB specification the GPUB in question conforms to. This key is REQUIRED and reading systems MUST present an error to the user if the key/value are not present or the value is invalid. If conforming to this document the value would be: 1.0.1
index- A path to the index.gmi file to be used for navigation. The path MUST be relative to the root of the zip archive and MUST use UNIX-style path separators (/). If this key is absent a reading system MUST look for the index.gmi file in the root of the zip archive
language- A BCP 47 language code (e.g. en-US or en)
license- A license notice for the book
publishDate- The date the book was published. The format is YYYY-MM-DD, e.g. 1943-10-24
published- The year the book was published. The format is YYYY, e.g. 1943. This is a fallback to publishDate for when a precise date is not known and SHOULD be used only when publishDate is not used
revisionDate- The date the book was last revised. The format is YYYY-MM-DD, e.g. 1955-11-02
title- The title of the book. This key is REQUIRED and reading systems MUST present an error to the user if the key/value are not present
version- A human readable version identifier for the book. This MAY be used by reading systems to differentiate different editions of the same book
wordcount- The total number of words in the book, written using only base-10 digits (no commas or letters), e.g. 41350
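The value formats listed above lend themselves to simple checks. The following is an illustrative sketch only: the function names are hypothetical, dates are assumed to use a four-digit year as in the examples (1943-10-24), and rejecting impossible calendar dates is an extra strictness that the specification does not explicitly require.

```python
import re
from datetime import date


def is_valid_date(value: str) -> bool:
    """Check the YYYY-MM-DD form used by publishDate and revisionDate."""
    if not re.fullmatch(r"[0-9]{4}-[0-9]{2}-[0-9]{2}", value):
        return False
    try:
        # Assumption: impossible dates such as 1943-13-01 are rejected.
        date.fromisoformat(value)
    except ValueError:
        return False
    return True


def is_valid_year(value: str) -> bool:
    """Check the YYYY form used by the published key."""
    return re.fullmatch(r"[0-9]{4}", value) is not None


def is_valid_wordcount(value: str) -> bool:
    """Check that wordcount contains only base-10 digits."""
    return re.fullmatch(r"[0-9]+", value) is not None


def check_values(entries: dict[str, str]) -> list[str]:
    """Return the keys whose values do not match their expected format."""
    checks = {
        "publishDate": is_valid_date,
        "revisionDate": is_valid_date,
        "published": is_valid_year,
        "wordcount": is_valid_wordcount,
    }
    return [key for key, check in checks.items()
            if key in entries and not check(entries[key])]
```

Since keys are case sensitive, the lookups use the exact spellings given in the list above.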
Publication Table of Contents
The table of contents for a GPUB book determines the reading order of the gemtext documents. The table of contents is the index.gmi file, whether it be in the zip archive's root, pointed to by a metadata.txt file, or both. The index.gmi file is REQUIRED and there MUST be exactly one of them, either in the zip archive root or listed in the metadata.txt file as the value of the index key. If the index.gmi file cannot be found, or is not valid gemtext, a reading system MUST display an error to the reader and stop further processing of the GPUB file.
The index.gmi file MUST be a valid gemtext document and MAY contain any text that a gemtext document can normally contain. Reading systems SHOULD parse the links, in order, from the index.gmi file to determine which files are part of the book and in what order they should be presented to the user. The index.gmi file is a special file intended to define navigation and is not displayed to readers. Reading systems SHOULD list table of contents information based on the index.gmi file as suits their application. To guarantee that a reader sees the entire index.gmi file, and not just the table of contents a reading system chooses to display, a developer MAY include a link to the index.gmi file in the index.gmi file itself (making it a part of the reading order/book navigation). All links in the index.gmi file MUST be relative to the index.gmi file itself. Reading systems SHOULD ignore non-local link lines in the index.gmi file.
All files linked to by the index.gmi file MUST be valid gemtext documents.
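One way a reading system might derive the reading order from the index.gmi file is sketched below. The function name reading_order is hypothetical. The sketch assumes that a "non-local" link is one with a URL scheme, a network location, or an absolute path, and it skips link-like lines inside preformatted blocks, as gemtext parsing requires.

```python
import posixpath
from urllib.parse import urlsplit


def reading_order(index_text: str, index_path: str = "index.gmi") -> list[tuple[str, str]]:
    """Return (archive path, label) pairs for local links, in document order."""
    base = posixpath.dirname(index_path)
    order = []
    preformatted = False
    for line in index_text.splitlines():
        if line.startswith("```"):
            preformatted = not preformatted  # toggle preformatted mode
            continue
        if preformatted or not line.startswith("=>"):
            continue
        parts = line[2:].strip().split(maxsplit=1)
        if not parts:
            continue  # "=>" with no target
        target = parts[0]
        label = parts[1] if len(parts) > 1 else target
        url = urlsplit(target)
        # Assumption: anything with a scheme, a host, or an absolute
        # path counts as non-local and is ignored.
        if url.scheme or url.netloc or target.startswith("/"):
            continue
        # Links are relative to the location of index.gmi in the archive.
        path = posixpath.normpath(posixpath.join(base, url.path))
        order.append((path, label))
    return order
```

Passing the archive path of index.gmi as index_path (for example source/index.gmi) yields paths that can be looked up directly in the zip archive.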
Book Content
The book content consists of gemtext documents and images. Reading systems MUST refuse to navigate to or display files that are not gemtext documents, JPG images, or PNG images.
Frontmatter, book text, and backmatter SHOULD all be represented by individual gemtext documents and referenced in the index.gmi file.
Gemtext Documents
All line types native to gemtext documents are allowed. Links to other files within the GPUB itself MUST use relative links where the path is relative to the index.gmi file's location. A reading system MAY choose to follow or open external links in separate programs. If external links are to be ignored by a reading system the reading system MUST still display the link text to the reader.
A single new line type has been added to allow for section breaks within a work in a universal way, rather than ad hoc solutions utilizing preformatted
text or extra line breaks: A line containing only three hyphens is a section break (---) and reading systems MAY render
this in a manner they deem suitable to their use case. This extension suits the display of books specifically and degrades in a way that makes sense to
most readers when displayed as plain text.
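As an illustration of the section break line type, a renderer might handle it roughly as follows. The function name render_plain and the replacement rule "* * *" are hypothetical choices. The sketch matches a line that is exactly three hyphens (whether surrounding whitespace is tolerated is not stated here, so none is accepted) and assumes that lines inside preformatted blocks are left untouched.

```python
def render_plain(text: str, rule: str = "* * *") -> list[str]:
    """Replace section break lines with a rule, leaving other lines as-is."""
    rendered = []
    preformatted = False
    for line in text.splitlines():
        if line.startswith("```"):
            preformatted = not preformatted  # toggle preformatted mode
            rendered.append(line)
        elif not preformatted and line == "---":
            # A line containing only three hyphens is a section break.
            rendered.append(rule)
        else:
            rendered.append(line)
    return rendered


sample = "The night ended.\n---\nMorning came."
print("\n".join(render_plain(sample)))
```

A reading system that does nothing special still shows the three hyphens, which is the plain-text degradation described above.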
Gemtext does not support inline types such as bold or italic. It is common to notate those in documents similarly to how they are marked up in markdown. They
degrade in a way that is comprehensible to many and MAY be used—as they already are in many gemtext and plaintext documents—in GPUB documents
with the caveat that developers should expect that most reading systems will render them as plain text. The underscore-based notation (_italic_ and __bold__) should be used in these
cases rather than the asterisk-based notation to prevent parser confusion with the asterisk used for list items in gemtext. This paragraph should be taken as
an acknowledgement of an existing practice, and not an extension of gemtext (like the above section break), with guidance for how to avoid issues.
Images
Any images the book contains MUST be JPG or PNG format images and must be linked to from a valid gemtext document with a path relative to the index.gmi file. Reading systems MAY display images inline, link to images and display them after link interaction, attempt to open them in external programs, or ignore them. A reading system MUST always display the text for the image link as an "alternative text" representation of the image for accessibility purposes.
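A reading system that wants to confirm an image really is JPG or PNG before displaying it could inspect the leading bytes of the file rather than trusting its name. The sketch below is one possible approach and is not required by this specification; sniff_image_type is a hypothetical name. The byte signatures are the standard PNG file signature and the JPEG start-of-image marker.

```python
from typing import Optional

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"  # fixed 8-byte PNG file signature
JPEG_SOI = b"\xff\xd8\xff"            # JPEG start-of-image marker


def sniff_image_type(data: bytes) -> Optional[str]:
    """Return "png" or "jpeg" based on the leading bytes, else None."""
    if data.startswith(PNG_SIGNATURE):
        return "png"
    if data.startswith(JPEG_SOI):
        return "jpeg"
    return None
```

Only the first few bytes are needed, so a caller can read a short prefix of the archive member instead of the whole image.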
Acknowledgements
The gempub specification appears to have been initially created by Gogledd-Orllewin.
The original specification lives in the following locations (both are archived git repositories): Codeberg and GitHub.
As no further work seems to have been done on this specification it has been reworked, but not substantially changed. You are reading the result as this 1.0.1 version.
Appendix I: Examples
A simple, gemlog-style GPUB structure
Here is an example zip archive structure for a simple GPUB:
my-book.gpub/
    index.gmi
    gemlog1.gmi
    gemlog2.gmi
    gemlog-other.gmi
    whatnot.gmi
    contact.gmi
A more complex GPUB structure
The following GPUB zip archive structure might be more typical of a novel or other more complex work:
my-novel.gpub/
    metadata.txt
    images/
        my-cover.jpg
        plate-1.png
        plate-2.png
    source/
        index.gmi
        titlepage.gmi
        chapter-1.gmi
        chapter-2.gmi
        chapter-3.gmi
        chapter-4.gmi
        chapter-5.gmi
        chapter-6.gmi
        about-the-author.gmi
        colophon.gmi
        copyright.gmi
In the above, the metadata.txt file will have the value source/index.gmi for the key index, in order to allow reading systems to parse out the contents and reading order.
Table of Contents
An example table of contents (`index.gmi`) might look something like the following:
=> titlepage.gmi Titlepage
=> chapter-1.gmi Chapter 1: In Which We Meet Our Hero
=> chapter-2.gmi Chapter 2: In Which Our Hero Meets Their End
=> about-the-author.gmi About the Author
=> colophon.gmi Colophon
=> copyright.gmi Copyright
That mostly mirrors the above "complex GPUB structure" example. A developer that wants to add more content to an index.gmi file MAY include a self-reference in the table of contents so that it is fully displayed to users:
# My Book by Some Person
=> index.gmi Table of Contents
=> titlepage.gmi Titlepage
=> chapter-1.gmi Chapter 1: In Which We Meet Our Hero
=> chapter-2.gmi Chapter 2: In Which Our Hero Meets Their End
=> about-the-author.gmi About the Author
=> colophon.gmi Colophon
=> copyright.gmi Copyright
In that example a reading system would base the navigation solely on the link lines, but since one of the link lines is the index.gmi file it would be shown to the reader as a part of the document order.