Category: Deep Literacy

The Importance of Semantic Export

Inspired by the opening remarks of my friend and mentor Doug Engelbart’s famous 1968 demo, re-wrought for this post:

“The research program that I’m going to describe to you is quickly characterisable by saying: If, in your office, you as an intellectual worker, were supplied with a” document editing system backed up by full semantic embedding and access, “which was alive for you all day, and was instantly responsive to every action you had, how much value could you derive from that?”

I use this quote both for emotional resonance and for the analogy: he saw the computer as a type of intellectual ‘fire’, a richly interactive system accessible through a computer display. We, however, are living in the shadow of his vision. Academia sees the display but not the interactive power of the computer behind it, and micromanages the cosmetics of esoteric citation styles on the visual display without investing in robust and flexible document standards that maintain rich document semantics and afford rich interactions.

The academic publishing world currently takes documents and squeezes every bit of useful semantics out of them by flattening them into PDFs, fetishising its particular wrinkle of paper-based citation styles, instead of building a semantic document standard where the basic metadata of the document, such as the authors’ names, publication date and title, is stored for any reader software to extract. Such a digital-native document format would also allow for digital-only experiences and controls, such as high-resolution addressability for linking to exact passages (please note how digital documents generally do not even allow for linking to pages) and link types that let the author express that a citation is not supportive, and for this to have meaning in a concept or document space or map.

Liquid | Author is moving to a PDF export which makes semantically explicit what the document contains, both for queries via server repositories and other bulk operations, and so that when a user copies text from such a document and pastes it into their own document, all the relevant citation information is carried over, making citing a copy-and-paste operation. This includes high-resolution support, meaning that the pasted text links (via a hyperlink address or identifier) not only to the origin document but to the particular section within that document, giving the reader a quick way to check a citation’s veracity and relevance.
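To make the idea of a copy that carries its own citation more concrete, here is a minimal sketch in Python. It is not Author’s actual data model: the class names, the field names and the use of a URL fragment as the section address are all assumptions made purely for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class DocumentMeta:
    """Basic metadata a semantically exported document could carry (hypothetical)."""
    title: str
    authors: Tuple[str, ...]
    year: int
    identifier: str  # assumed to be a URL or similar resolvable address


@dataclass(frozen=True)
class CopiedPassage:
    """Copied text together with where it came from (hypothetical)."""
    text: str
    source: DocumentMeta
    section: Optional[str] = None  # assumed in-document anchor, if one is known

    def link(self) -> str:
        # Point at the exact section when we have one, else at the whole document.
        if self.section:
            return f"{self.source.identifier}#{self.section}"
        return self.source.identifier

    def as_citation(self) -> str:
        # Plain-text rendering; a reader application could restyle this freely.
        authors = ", ".join(self.source.authors)
        return (f'"{self.text}" ({authors}, {self.source.year}, '
                f"{self.source.title}, {self.link()})")
```

Because the passage keeps structured fields rather than a pre-formatted string, the receiving application can render whichever citation style the reader prefers.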

This level of semantic export via the ubiquitous PDF format will allow for innovations in citation analysis, where it becomes relatively easy to build software that gives users views of their documents which result in greater insights, such as keyword connection maps, author analysis and much more. This is frequently discussed in computer science and there is a rich literature of ideas to support it, but it is let down by the paucity and varying quality of the available citation data. Even in the best instances this data is made available to the reader only as a separate download, such as a BibTeX file, or through third-party, proprietary ‘Citation Management Systems’, which all have their own ‘magic sauce’ for allocating the correct citation information to PDF documents, and only within their own system, not in a generally accessible way.

JATS is a promising approach for this, but the ‘tread lightly’ attitude everyone in the industry takes towards everyone else, and the resulting lack of backbone to build robust standards, means that different parts of the industry currently use their own custom JATS dialects.

Let us come together to support a rich document interchange format, whether a richly exported PDF or a clearly and uniformly tagged JATS, and let the reader choose whichever way they prefer to see the citation styles, as paper or as advanced digital, depending on their need.

In the development of digital documents we are currently at the stage TV was at when it showed plays with the camera fixed on a tripod and no edits. Let’s liberate our academic dialogue by embracing the richness of the medium and truly augmenting our academic discourse.

My first step is simply to develop Author so that any paper authored in it can be exported as richly as possible, so that another Author user can gain these benefits and, hopefully, so that its utility can be demonstrated to the industry. But first, we are completing Dynamic Views, which are more visually exciting and which will hopefully help gain more users and thus more resources for future developments.

Un-Leash

When going to bed the night before last I flipped through the recommended YouTube channels and there was a livestream of the first time Elon Musk’s Falcon Heavy had all three boosters safely return to Earth. I felt I was watching real human progress; it was amazing and uplifting.

This was also the Easter weekend when my son Edgar and I took a (minor) part in the Climate Extinction protest, which was also uplifting and inspiring.

This was the Easter when the roof of Notre Dame caught fire and near-instant billion-dollar pledges were made to fix it. As a species we have an amazing capacity to do amazing things, but often we don’t. The same human ego that drives massive wealth accumulation needs massive ego reward to spend the money; being important to the future of humanity, or even to today, isn’t enough.

Last night I watched Elon Musk’s Tesla Autonomy Day with my beautiful wife resting in my arms. It was hugely impressive, uplifting and inspiring. As far as I can piece together, last night I dreamt of many of these elements in poetic form.

Today I want to do something. Something big and useful.

I have worked on aspects of symbol manipulation for most of my career and I have made some progress, but very little compared with the potential.

Liquid | Flow is a powerful and, judging by actual user reviews, successful text manipulation utility, with just 30,000 downloads but happy users. It allows the user to interact with their textual information far more rapidly than through other means, with the result that searches and reference look-ups, for example to check whether news is fake or not, are carried out much more often than through traditional means. The main problem with Flow is that it’s hard to communicate what it is; it is very hard to sell to someone who is not already interested in more powerful text.

Liquid | Author has had only 17,000 downloads but has not been on the market very long (a bit over a year) and has only had a brief period of being featured by Apple. Reviews are very good, but people are not used to paying for software anymore and that severely lowers the price per unit I can charge. The only effective marketing tool so far is to have Apple love it and feature it on the App Store, and that is not a viable strategy for growth.

I can only see one way out of this and that is to build and ship something which is so self-evidently more powerful than what we have today that traditional and social media will spread the message virally.

To do that we need to explode the traditional grey-column text layout, not in a demo-app or ‘isn’t it cool’ kind of way, but in a way integrated into a useful workflow. Yes, this sounds like what I have been working on forever, and it is, but it’s time to take it to the next level. I’ll park Author very soon. There are a few small issues needing fixing but I am concerned that they have turned into excuses as much as anything, so the final version of Author (for this round) will be submitted to Apple on Friday.

Then comes one month of work on the Dynamic View, which is important for my PhD but is also the clearest way I can explode the grey text column in a visually impressive and work-useful way.

From there on I will develop more interactions that usefully impress. How about infinite semantic zooming in both directions? How about graphed glossaries with auto-layout? How about gestures to expand and collapse text to fit the whim of the reader? We can do this and more, and we can communicate it.

What happens when we unleash text?

Let’s find out.

Liquid | Reader Browser Plugin

Liquid Web Browser plugin for Safari and/or Chrome and/or Firefox, depending on ease of initial implementation.

Main functions: Download PDF with citation information & Augmented Copying.

Preferences include simple on/off options:

‘Auto-Append Citation Information if found?[]’

‘Augmented Copy on regular cmd-c (•) or shift-command-c?()’

With brief descriptions of each function.

Download PDF with Citation Information

The user visits an academic download site (initially supporting https://dl.acm.org/), searches for a document and chooses one to download. The downloaded PDF will automatically have all the citation information attached as meta-information, ready for use by Liquid | Reader, Liquid | View and other applications:

The plugin checks all pages for citation information, such as BibTeX (the authors of the document, the title and so on).
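As a rough illustration of what picking up BibTeX from a page could involve, here is a small Python sketch. It is not the plugin’s code: the function name is hypothetical, and it assumes a single, simply formatted entry with one `field = {value}` or `field = "value"` pair per line. Real-world BibTeX (multi-line values, `@string` macros, concatenation) needs a proper parser.

```python
import re

# Entry header, e.g. "@article{key2019,"
ENTRY_RE = re.compile(r"@(\w+)\s*\{\s*([^,\s]+)\s*,")
# One field per line (an assumption of this sketch): name = {value} or name = "value"
FIELD_RE = re.compile(
    r'^[ \t]*(\w+)[ \t]*=[ \t]*(?:\{(.*)\}|"(.*)")[ \t]*,?[ \t]*$',
    re.MULTILINE,
)


def parse_bibtex_entry(text):
    """Return a dict of fields from the first BibTeX entry in text, or None."""
    head = ENTRY_RE.search(text)
    if head is None:
        return None
    fields = {"entry_type": head.group(1).lower(), "key": head.group(2)}
    for match in FIELD_RE.finditer(text, head.end()):
        name = match.group(1).lower()
        value = match.group(2) if match.group(2) is not None else match.group(3)
        fields[name] = value.strip()
    # BibTeX separates multiple authors with the word "and".
    if "author" in fields:
        fields["authors"] = [a.strip() for a in re.split(r"\s+and\s+", fields["author"])]
    return fields


# Placeholder entry, not a real reference.
SAMPLE = """@inproceedings{example2019,
  author = {Ada Example and Bob Sample},
  title = {An Example Title},
  year = {2019}
}"""
```

Calling `parse_bibtex_entry(SAMPLE)` yields a dictionary with the title, year and a list of two authors, which is the kind of structure a citation dialog could then show for confirmation.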

If the user chooses to download a PDF, the plugin shows a dialog (the same as Author’s citation dialog) which the user can OK, amend and OK, or Ignore. Unless the user chooses Ignore, all the metadata will be assigned to the PDF so that it appears in the Get Info window, as though the PDF had originally been exported like this. There will also be an option to ‘Do not show this dialog in the future’ and stay with the last-used preference.

Liquid | Reader will use this information so that if a user copies text from such a Rich PDF, all the citation information will be automatically appended and if the user pastes into Author it will paste as a citation. The forthcoming Liquid | View graph application can also use this meta-information. Since this will follow the Adobe PDF Meta information standards, other applications are open to using it too, where supported: https://www.prepressure.com/pdf/basics/metadata
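The sketch below shows one way citation fields could be mapped onto a PDF’s document information entries. `/Title`, `/Author` and `/Keywords` are standard keys in a PDF’s Info dictionary, but which citation field goes where is an assumption of this sketch, as is packing the year and DOI into `/Keywords`. The function name and the input dictionary layout are hypothetical, and actually writing the result into a file would need a PDF library, which is left out here.

```python
def citation_to_pdf_info(citation):
    """Map a citation dict to PDF Info-dictionary style keys (illustrative only)."""
    info = {}
    if citation.get("title"):
        info["/Title"] = citation["title"]
    authors = citation.get("authors") or []
    if authors:
        # The Info dictionary holds a single Author string, so join the names.
        info["/Author"] = "; ".join(authors)
    # Assumption: fields without a standard key are packed into Keywords.
    extras = [f"{name}={citation[name]}" for name in ("year", "doi") if citation.get(name)]
    if extras:
        info["/Keywords"] = ", ".join(extras)
    return info
```

Keeping to the standard keys where they exist is what would let other applications read at least the title and authors without knowing anything about Liquid.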

Augmented Copy

The user can copy from any web page and the clipboard payload will also contain in-page addressability, where possible, and author name, also where possible, in a citation ready format. Christopher Gutteridge may support this effort.
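A possible shape for the Augmented Copy payload is sketched below in Python. Everything here is an assumption for illustration: the function names, the plain-text citation layout, and the choice of addressing. The sketch prefers an element id when the page offers one and otherwise falls back to a URL text fragment (`#:~:text=`), which only some browsers currently honour.

```python
from urllib.parse import quote, urldefrag


def addressable_url(page_url, selected_text, element_id=None):
    """Build a link pointing as closely as possible at the copied passage."""
    base, _old_fragment = urldefrag(page_url)
    if element_id:
        return f"{base}#{element_id}"
    # Use the first few words as the text fragment to keep the link short.
    snippet = " ".join(selected_text.split()[:8])
    # Dashes have special meaning in text fragments, so encode them as well.
    encoded = quote(snippet, safe="").replace("-", "%2D")
    return f"{base}#:~:text={encoded}"


def augmented_copy(selected_text, page_url, page_title, author=None, element_id=None):
    """Return the clipboard text: the quote followed by citation details."""
    link = addressable_url(page_url, selected_text, element_id)
    byline = f"{author}, " if author else ""
    return f'"{selected_text}" ({byline}{page_title}, {link})'
```

The author name is optional because, as noted above, it will not always be discoverable on a page; the payload simply omits it in that case.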
