
The Importance of Semantic Export

Inspired by the opening remarks of my friend and mentor Doug Engelbart’s famous 1968 demo, reworked for this post:


“The research program that I’m going to describe to you is quickly characterisable by saying: If, in your office, you as an intellectual worker, were supplied with a” document editing system backed up by full semantic embedding and access, “which was alive for you all day, and was instantly responsive to every action you had, how much value could you derive from that?”


I use this quote both for its emotional resonance and for its analogy: he saw the computer as a kind of intellectual ‘fire’, a richly interactive system accessible through a computer display. We, however, are living in the shadow of his vision. Academia sees the display but not the interactive power of the computer behind it, and micromanages the cosmetics of esoteric citation styles on that display without investing in robust, flexible document standards that maintain rich document semantics and afford rich interactions.

The academic publishing world currently takes documents and squeezes every bit of useful semantics out of them by flattening them into PDFs, all to fetishise its particular wrinkle of paper-based citation styles. It should instead build a semantic document standard in which the basic metadata of a document, such as the authors’ names, publication date and title, is stored for any reader software to extract. Such a digital-native document format would also allow for digital-only experiences and control, such as high-resolution addressability for linking to exact passages (note that digital documents generally do not even allow linking to specific pages), and link types that let the author express that a citation is not supportive, with that distinction carrying meaning in a concept space, document space or map.

Liquid | Author is moving to a PDF export that makes semantically explicit what the document contains. This serves queries via server repositories and other bulk operations, but it also means that when a user copies text from such a document and pastes it into their own, all the relevant citation information is carried over, making citing a simple copy-and-paste operation. With high-resolution support, the pasted text links (as a hyperlink address or identifier) not only to the original document but to the particular section within it, giving the reader a quick way to check a citation’s veracity, relevance and so on.
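To make the copy-and-paste idea concrete, here is a minimal sketch, assuming a hypothetical clipboard payload that bundles the copied text with its citation metadata and a section-level address. The class names, field names and the `document#section` address form are illustrative assumptions, not Liquid | Author’s actual export format.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class CitedSource:
    # Hypothetical metadata fields; not Author's actual export schema.
    title: str
    authors: list[str]
    published: str     # ISO 8601 date, e.g. "2020-05-01"
    document_id: str   # identifier or address of the whole document
    section_id: str    # anchor of the specific section the text came from

    def address(self) -> str:
        """High-resolution address: the document plus the section within it."""
        return f"{self.document_id}#{self.section_id}"


@dataclass
class ClipboardPayload:
    text: str
    source: CitedSource

    def to_json(self) -> str:
        # Serialise text and citation together so they travel as one unit.
        return json.dumps({"text": self.text, "source": asdict(self.source)})

    @staticmethod
    def from_json(raw: str) -> "ClipboardPayload":
        data = json.loads(raw)
        return ClipboardPayload(text=data["text"], source=CitedSource(**data["source"]))


def paste_with_citation(raw: str) -> str:
    """What a receiving editor might insert: the quote plus a linked citation."""
    payload = ClipboardPayload.from_json(raw)
    src = payload.source
    names = ", ".join(src.authors)
    year = src.published[:4]
    return f"“{payload.text}” ({names}, {year}) <{src.address()}>"
```

The receiving editor never has to look anything up: because the citation travels with the text, the pasted quote already knows its authors, its year and the exact section it came from. A placeholder source such as `https://example.org/paper` with section `sec-3` would be rendered as a link straight to that section.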

This level of semantic export via the ubiquitous PDF format will allow for innovations in citation analysis, where it becomes relatively easy to build software that gives users views of their documents for greater insight, such as keyword connection maps, author analysis and much more. Such analysis is frequently discussed in computer science, and there is a rich literature of ideas to draw on, but it is let down by the scarcity and uneven quality of the available citation data. Even in the best instances, this data is made available to the reader as a separate download, such as a BibTeX file, or only through third-party, proprietary ‘Citation Management Systems’, each with its own ‘magic sauce’ for attaching the correct citation information to PDF documents, and only within its own system, not in a generally accessible way.
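Once citation data is embedded in the documents themselves, analyses like these take only a few lines. The sketch below assumes a hypothetical record shape read from each exported document (a `keywords` list and a `cites` list whose entries carry `authors`); it counts how often each author is cited across a collection and builds weighted edges for a simple keyword connection map.

```python
from collections import Counter
from itertools import combinations

# Hypothetical record shape, as it might be read from semantically exported documents:
# {"keywords": ["hypertext", ...], "cites": [{"authors": ["A. Author", ...]}, ...]}


def cited_author_counts(docs: list[dict]) -> Counter:
    """How often each author is cited across the whole collection."""
    counts: Counter = Counter()
    for doc in docs:
        for cited in doc["cites"]:
            counts.update(cited["authors"])
    return counts


def keyword_links(docs: list[dict]) -> Counter:
    """Edges for a keyword map: pairs of keywords appearing in the same document."""
    links: Counter = Counter()
    for doc in docs:
        # Sort so each pair has one canonical order, e.g. ("citation", "pdf").
        for a, b in combinations(sorted(set(doc["keywords"])), 2):
            links[(a, b)] += 1
    return links
```

The hard part is not this code but the input: today the data it needs has to be scraped, guessed or bought, which is exactly the gap a rich export would close.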

JATS (Journal Article Tag Suite) is a promising approach to this, but the ‘tread lightly’ attitude everyone in the industry takes towards everyone else, and the resulting lack of backbone for building robust standards, means that different parts of the industry currently use their own custom JATS dialects.
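Uniform tagging is what makes metadata cheap to extract. As a minimal sketch, assuming a document follows the standard JATS front-matter elements (`article-meta`, `article-title`, `contrib`, `pub-date`), Python’s standard `xml.etree.ElementTree` can pull out the title, authors and year. The sample fragment is invented and heavily simplified; a dialect that moves or renames these elements would silently defeat code like this.

```python
import xml.etree.ElementTree as ET

# A minimal, invented JATS-style fragment; real articles carry far more.
SAMPLE = """<article>
  <front>
    <article-meta>
      <title-group><article-title>An Example Article</article-title></title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <name><surname>Example</surname><given-names>Ada</given-names></name>
        </contrib>
      </contrib-group>
      <pub-date><year>2020</year></pub-date>
    </article-meta>
  </front>
</article>"""


def basic_metadata(jats_xml: str) -> dict:
    """Pull title, authors and year from front matter, assuming standard JATS tags."""
    root = ET.fromstring(jats_xml)
    meta = root.find("front/article-meta")  # None if the dialect puts it elsewhere
    title = meta.findtext("title-group/article-title", default="")
    authors = [
        f"{c.findtext('name/given-names', '')} {c.findtext('name/surname', '')}".strip()
        for c in meta.findall("contrib-group/contrib[@contrib-type='author']")
    ]
    year = meta.findtext("pub-date/year", default="")
    return {"title": title, "authors": authors, "year": year}
```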

Let us come together to support a rich document interchange format, whether a richly exported PDF or clearly and uniformly tagged JATS, and let readers choose how they prefer to see citations, in a paper style or as advanced digital interactions, depending on their needs.

Digital documents are currently at the stage television was at when it showed stage plays with the camera fixed on a tripod and no edits. Let’s liberate our academic dialogue by embracing the richness of the medium and truly augmenting our academic discourse.

My first step is simply to develop Author so that any paper written in it can be exported as richly as possible, letting other Author users gain these benefits and, hopefully, demonstrating the approach’s utility to the industry. Before that, though, we are completing Dynamic Views, which are more visually exciting and will hopefully attract more users, and thus more resources for future development.

