
Category: Future Of Text

Continuing symposium on the future of text.

The Importance of Semantic Export

With inspiration from my friend and mentor Doug Engelbart’s opening remarks of his famous 1968 demo, re-wrought for this post:

 

“The research program that I’m going to describe to you is quickly characterisable by saying: If, in your office, you as an intellectual worker, were supplied with a” document editing system backed up by full semantic embedding and access, “which was alive for you all day, and was instantly responsive to every action you had, how much value could you derive from that?”

 

I use this quote both for its emotional resonance and for the analogy: he saw the computer as a type of intellectual ‘fire’, a richly interactive system accessible through a computer display. We, however, are living in the shadow of his vision. Academia sees the display but not the interactive power of the computer behind it, and micromanages the cosmetics of esoteric citation styles without investing in robust and flexible document standards that maintain rich document semantics and so afford rich interactions.

The academic publishing world currently takes documents and squeezes every bit of useful semantics out of them by flattening them into PDFs, fetishising its particular wrinkle of paper-based citation styles, instead of building a semantic document standard in which the basic metadata of the document, such as the authors’ names, publication date and title, is stored for any reader software to extract. Such a digital-native document format would also allow for digital-only experiences and controls. One is high-resolution addressability for linking to exact passages (note that digital documents generally do not even allow linking to pages). Another is link types, with which the author can express, for example, that a citation is not supportive, and have this carry meaning in a concept or document space or map.

Liquid | Author is moving to a PDF export that makes semantically explicit what the document contains. This serves queries via server repositories and other bulk operations. It also means that when a user copies text from such a document and pastes it into their own document, all the relevant citation information is carried over, making citing a copy-and-paste operation. This includes high-resolution support: the pasted text links (via a hyperlink address or identifier) not only to the origin document but to the particular section within that document, giving the reader a quick way to check a citation’s veracity and relevance.
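As a rough illustration of what "citing as a copy-and-paste operation" could look like, here is a minimal Python sketch. It is not Author's actual format: the `CitationPayload` class, its field names, the JSON encoding and the `document_id#section` address form are all assumptions made for this example.

```python
import json
from dataclasses import dataclass, asdict
from typing import List


@dataclass
class CitationPayload:
    """Hypothetical clipboard payload: the copied text plus its source."""
    text: str
    title: str
    authors: List[str]
    published: str    # ISO date string, e.g. "2019-05-01"
    document_id: str  # URL, DOI or other identifier of the origin document
    section: str      # high-resolution address inside that document


def copy_with_citation(selection: str, meta: dict, section: str) -> str:
    """Serialise the selection together with the document's metadata."""
    payload = CitationPayload(
        text=selection,
        title=meta["title"],
        authors=list(meta["authors"]),
        published=meta["published"],
        document_id=meta["document_id"],
        section=section,
    )
    return json.dumps(asdict(payload))


def paste_as_citation(clipboard: str) -> str:
    """Turn the payload back into a quoted passage with a linked source."""
    p = json.loads(clipboard)
    authors = ", ".join(p["authors"])
    year = p["published"][:4]
    address = p["document_id"] + "#" + p["section"]
    return '"{}" ({}, {}, {}, {})'.format(p["text"], authors, year, p["title"], address)
```

The receiving application only needs the payload to rebuild a full citation, so no separate citation database is involved. The address points at a section, not just at the document.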

This level of semantic export via the ubiquitous PDF format will allow for innovations in citation analysis, where it becomes relatively easy to build software that gives users views of their documents which yield greater insights, such as keyword connect maps, author analysis and much more. This is frequently discussed in computer science and there is a rich literature of ideas to draw on, but it is let down by the paucity and varying quality of the available citation data. Even in the best instances, this data is made available to the reader only as a separate download, such as a BibTeX file, or only through third-party, proprietary ‘Citation Management Systems’. These all have their own ‘magic sauce’ for allocating the correct citation information to PDF documents, and only within their own system, not in a generally accessible way.

JATS is a promising approach for this, but everyone in the industry takes a ‘tread lightly’ approach towards everyone else, and the resulting lack of backbone to build robust standards means that different parts of the industry currently use their own custom JATS dialects.

Let us come together to support a rich document interchange format, whether a richly exported PDF or a clearly and uniformly tagged JATS, and let the reader choose whichever way they prefer to see the citation styles, as paper or as advanced digital, depending on their need.

Currently, digital documents are at the stage of development where TV was when it showed plays with the camera fixed on a tripod and no edits. Let’s liberate our academic dialogue by embracing the richness of the medium and truly augmenting our academic discourse.

My first step is simply to develop Author so that any paper authored in it can be exported as richly as possible, so that another Author user can gain these benefits, and hopefully to demonstrate its utility for the industry. But first, we are completing Dynamic Views, which are more visually exciting and which will hopefully help gain more users and thus more resources for future development.


Liquid | Reader Browser Plugin

Liquid Web Browser plugin for Safari and/or Chrome and/or Firefox, depending on ease of initial implementation.


Main functions: Download PDF with citation information & Augmented Copying.

Preferences include simple on/off options:

‘Auto-Append Citation Information if found?[]’

‘Augmented Copy on regular cmd-c (•) or shift-cmd-c ( )?’

With brief descriptions of each function.

Download PDF with Citation Information

The user visits an academic download site (initially supporting https://dl.acm.org/), searches for a document and chooses one to download. The downloaded PDF will automatically have all the citation information attached as meta-information, ready for use by Liquid | Reader, Liquid | View and other applications:

The plugin checks all pages for citation information, such as BibTeX (the authors of the document, the title and so on).
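To give a feel for the extraction step, the following Python sketch pulls the fields out of a single BibTeX entry. It is only an approximation under stated assumptions: `parse_bibtex` is a hypothetical helper, not part of the plugin, and the regular expressions handle only simple entries (braced values with at most one level of nested braces, quoted values, or bare numbers). A real plugin would more likely rely on a proper BibTeX parser.

```python
import re

# Entry header, e.g. "@article{key2020,"
ENTRY_RE = re.compile(r"@(\w+)\s*\{\s*([^,\s]+)\s*,")
# A field: name = {value} | "value" | 2020 (one level of nested braces allowed)
FIELD_RE = re.compile(
    r'(\w+)\s*=\s*(?:\{((?:[^{}]|\{[^{}]*\})*)\}|"([^"]*)"|(\d+))'
)


def parse_bibtex(entry: str) -> dict:
    """Return the entry type, key and fields of one simple BibTeX entry."""
    head = ENTRY_RE.search(entry)
    if head is None:
        return {}
    fields = {"type": head.group(1).lower(), "key": head.group(2)}
    for m in FIELD_RE.finditer(entry, head.end()):
        # Exactly one of the three value groups is set for each match.
        raw = next(g for g in m.groups()[1:] if g is not None)
        # Drop protective braces and collapse whitespace.
        value = " ".join(raw.replace("{", "").replace("}", "").split())
        fields[m.group(1).lower()] = value
    if "author" in fields:
        # BibTeX separates multiple authors with the word "and".
        fields["authors"] = [
            a.strip() for a in re.split(r"\s+and\s+", fields["author"])
        ]
    return fields
```

Given an entry with `author = {Last, First and Other, Ann}`, the result contains an `authors` list with two names, which is the shape a citation dialog would need.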

If the user chooses to download a PDF, the plugin shows a dialog (the same as Author’s citation dialog) which the user can OK, amend and then OK, or Ignore. Unless the user chooses Ignore, all the metadata will be assigned to the PDF so that it appears in the Get Info window, as though the PDF had originally been exported like this. There will also be a ‘Do not show this dialog in the future’ option, which stays with the last used preference.
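The OK / amend / Ignore flow could be sketched as below, in Python for illustration. The function name `info_dictionary`, the `choice` strings and the input field names are assumptions. The output keys (`/Title`, `/Author`, `/Keywords`) are standard entries of the PDF document information dictionary. Actually writing them into the file would need a PDF library, which is deliberately left out here.

```python
from typing import Optional


def info_dictionary(
    citation: dict,
    choice: str = "ok",
    amendments: Optional[dict] = None,
) -> Optional[dict]:
    """Map found citation data to PDF Info entries, honouring the dialog.

    choice is one of "ok", "amend" or "ignore" (hypothetical values
    standing in for the dialog buttons).
    """
    if choice == "ignore":
        return None  # the PDF is downloaded untouched
    merged = dict(citation)
    if choice == "amend" and amendments:
        merged.update(amendments)  # user corrections win over found data

    info = {}
    if merged.get("title"):
        info["/Title"] = merged["title"]
    if merged.get("authors"):
        info["/Author"] = "; ".join(merged["authors"])
    if merged.get("keywords"):
        info["/Keywords"] = ", ".join(merged["keywords"])
    return info
```

Returning `None` for Ignore keeps the "do nothing" case explicit, so the caller never writes an empty metadata block by accident.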

Liquid | Reader will use this information so that if a user copies text from such a Rich PDF, all the citation information will be automatically appended, and if the user pastes into Author it will paste as a citation. The forthcoming Liquid | View graph application can also use this meta-information. Since this will follow the Adobe PDF metadata standards, other applications can use it too, where supported: https://www.prepressure.com/pdf/basics/metadata

Augmented Copy

The user can copy from any web page and the clipboard payload will also contain in-page addressability, where possible, and the author name, also where possible, in a citation-ready format. Christopher Gutteridge may support this effort.
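One possible way to get in-page addressability is the URL text fragment syntax (`#:~:text=`), which some browsers use to scroll to and highlight a quoted passage. The post does not name a mechanism, so treating text fragments as the addressing scheme is an assumption, as are the helper names `text_fragment_url` and `augmented_copy` in this Python sketch.

```python
from typing import Optional
from urllib.parse import quote, urldefrag


def text_fragment_url(page_url: str, selection: str) -> str:
    """Build a link that addresses the selected passage within the page."""
    base, _ = urldefrag(page_url)          # drop any existing #fragment
    words = " ".join(selection.split())    # collapse whitespace and newlines
    # Percent-encode everything; "-" is left alone by quote() but has a
    # special meaning in text fragments, so it is encoded by hand.
    encoded = quote(words, safe="").replace("-", "%2D")
    return base + "#:~:text=" + encoded


def augmented_copy(page_url: str, selection: str,
                   author: Optional[str] = None) -> dict:
    """Clipboard payload: the text, its address and, if known, the author."""
    payload = {
        "text": selection,
        "url": text_fragment_url(page_url, selection),
    }
    if author:
        payload["author"] = author
    return payload
```

Where the browser does not support text fragments, the link still opens the right page, which fits the "where possible" wording above.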
