

How Visible-Meta Relates To My PhD

My background is that of a visual artist, having carried a camera bag on my shoulder since I was about 15 when I would buy and roll my own film. I got a degree at the Chelsea School of Art and studied Advertising at Syracuse University. It’s safe to say that my perspective leans heavily towards the visual. However, in my investigation to make the citation process more visual I came up against real limitations:

Augmenting Literature Reviews

My PhD concerns augmenting a student’s ability to do a literature review and to demonstrate to a supervisor that it was done to a satisfactory degree, which was expected to be a primarily visual exercise.

Experimenting with, and reading about, different ways to visualise citations revealed several limitations. These have shifted the focus from a visual interaction to improving the infrastructure, in order to allow for better citation handling interactions, followed by a new citation visualisation made possible by dealing with the infrastructure limitations:


Infrastructure Limitations

Having looked at and worked with citations, it has become clear that citation visualisations can only work when the citation information is known to the user, and downloadable documents are woefully inadequate at telling the system how they are connected to the wider world. Citation analysis therefore tends to be limited to artificially constrained subsets of the available citable documents.

My aim is that the Visible-Meta approach will alleviate this.


Interaction Limitations

As a consequence of the infrastructure limitations, the act of adding a citation to a document is cumbersome unless one uses a third party management system which exists in a silo. Checking citations is also cumbersome, with very little use of hypertextual surfacing techniques for links, previews and so on.

I therefore feel that my effort should go into finding an easy way to remove the infrastructure limitation, one which it is reasonable to expect to see implemented because of the clear benefits to the end users (student, teacher, publisher and author). This will also allow me to work on visualisations which are directly relevant to the user and not separate boxed presentations.


Visualisation Limitations

Furthermore, there are inherent issues with citation space visualisations, as citation spaces do not map onto real-world attributes other than time and geographic space, though the latter is only of specialist interest. Discussions of visualisation techniques using graphical bubbles and lines show how these can remain abstract and not directly useful:


Graphed Overview

Below is an overview built in the Dynamic View in Liquid | Author while working on the nature of a manuscript (defined as the document an author is working on) and the published document, and asking how the Visual-Meta approach can improve the process.

Questions being addressed for Visual-Meta, in order to support the citation process:

  • What should be encoded from the manuscript into the published document?
  • How can the encoding serve a student and teacher?
  • How should the information from a published document transfer into a future manuscript through citations and other meta-information?
  • How should annotations be handled in this document environment?

And then: how can this support citation interaction and analysis in a Dynamic View?


Core Visualisation Insight

Chosen citations are very different from citations ‘in the wild’. By this I mean that the attributes and meta-information of a document do not change when an author decides to possibly cite it, but the document becomes different to the author: instead of a small point in a large ‘cloud’ of data it becomes a curated nugget of information.

As such, the large scale citation views discussed above, under Visualisation Limitations, are very different from views of a user’s citations which have already been chosen. There are fewer of the chosen citations, they have more meaning and they are all better understood by the author. They have also become sources for the justification of assertions and the origins of terms used, and as such can benefit from being presented alongside concepts or key terms.

This is why I am working on how to add citations to the Dynamic View, where the immediate issue is one of clutter and mapping out exactly what should be shown.



Visible-Meta Introduction


An approach to make a document’s meta-information both machine and human readable by adding it visually at the end of the document as a series of appendices, to allow for a rich reader experience.


Adding human readable appendices to a PDF document, which usefully describe the semantics of the document and are also machine readable, offers many benefits and workflow improvements in the academic document space, while adding no document overhead beyond a few plain text pages at the end of the document. This approach keeps compatibility with legacy PDF Reader software while opening up rich opportunities for augmented Readers: legacy Readers will simply show a normal PDF with an appendix of BibTeX style information.



Visible-Meta Augmented Readers can provide the user with as rich interactions as can be provided in a custom authoring environment–the publishing and freezing onto PDF is no longer a limitation. Advanced interactions can include:

  • Copy As Citation using a simple copy command, with all citation information added to the clipboard payload for use by Visible-Meta aware applications on Paste.
  • Instant Outline based on the document specifying heading formatting.
  • Dynamic Views, such as the one implemented in Liquid | Author, could be stored as data, not only as images.
  • Server Access. Repositories can extract information for large scale analysis.
  • Glossary Support. Glossary terms could be added to the appendix.
  • High Resolution, Document Based Addressing. The Name of the document is not the same as the Title, and this can be used to address by document and not location, and to support High-Resolution Addressing.
  • & more, to be discovered.



For an author this approach means that they can embed richer information in their document with a minimum of effort and be sure of the robustness of the information.

It allows the reader a much faster way to cite with a higher degree of accuracy and more access to the original data and interactions.

Augmented textual communication. Using the appendices to describe the document content, such as the formatting of headings and citations as well as the use of glossaries, can allow the reading software to present the document to the reader’s preference without losing the creator’s semantics.

Server friendly. The approach allows for large scale analysis of citations and other document elements.

Institutions can worry less about the cosmetics of citations and benefit from more documents cited being checked and read.

This could put an end to the absurd academic time-waste of nit-picking how citations should be displayed: let the teacher/examiner/reader specify how the citations should be displayed. Because the document describes in the appendix how citations are used, the reading software can re-format them to the reader’s tastes.

Universities still get to dictate the default handing-in formatting but the same document could be displayed in any format the reader chooses.
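As a minimal sketch of that idea, assuming the reading software has already parsed a citation into a simple record, the same record can be displayed in whichever style the reader picks. The function name, the record fields and the two style names below are illustrative assumptions, not part of Visible-Meta.

```python
def format_citation(entry, style, number=None):
    """Render one parsed citation record in the style the reader has chosen."""
    if style == "author-year":
        # In-text form such as "(Author, 2019)"
        return f"({entry['author']}, {entry['year']})"
    if style == "numbered":
        # In-text form such as "[3]"; the number is the entry's position in the reference list
        return f"[{number}]"
    raise ValueError(f"unknown citation style: {style}")


# Placeholder record, not a real reference
entry = {"author": "Author", "year": 2019, "title": "An Example Title"}
print(format_citation(entry, "author-year"))
print(format_citation(entry, "numbered", number=3))
```

The document itself stays unchanged; only the display differs from reader to reader.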



I am putting my money where my mouth is by demonstrating this interaction via my commercially available macOS word processor Liquid | Author, which was built during my PhD work, and the prototype test application Liquid | Reader.

This is but one implementation, as the approach is as open as it can possibly be. This demonstration is less than 2 minutes:



Examples and a description of the format are posted: Visible-Meta Examples


Document Name

Note that the ‘document_name’ is distinct from the title and can be set automatically by the authoring software to help identify the document through search later. The unique name will consist of the first 10 characters of the title, the author’s name, the time in condensed form and a random 4 digit number. For example, the time component is built up as follows:


  • 1962 | 10 | 21 | 23 | 15 | 32
  • year | month | date | hour | min | seconds
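A minimal sketch of how authoring software might assemble such a name from the four parts listed above. The function name, the underscore separator and the zero padding of the random number are assumptions; the post only names the parts.

```python
import random
from datetime import datetime


def make_document_name(title, author, when, rng=random):
    """Build a document name from title, author, time and a random number."""
    title_part = title[:10]                      # first 10 characters of the title
    time_part = when.strftime("%Y%m%d%H%M%S")    # year, month, date, hour, min, seconds
    random_part = f"{rng.randint(0, 9999):04d}"  # random 4 digit number, zero padded (assumption)
    # Joining with underscores is an assumption; the post does not specify a separator.
    return "_".join([title_part, author, time_part, random_part])


name = make_document_name(
    "Visible-Meta Introduction",
    "Jane Author",  # placeholder author name
    datetime(1962, 10, 21, 23, 15, 32),
)
print(name)
```

The time in the example above condenses to `19621021231532`; the last part differs on every run.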

Document Based Addressability

This approach allows the user to click on a citation and have the PDF open if it is available to the user, not simply to load a download page. If the document is not found, an opportunity to search for it will be presented.

High Resolution Addressing

Following a link in this style is an active process initiated by the Reader software, so adding an internal ‘search’ to the process will allow the software not only to load the document but also to open it at the section cited.
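The two addressing steps described above can be sketched together: look the document up by name, and if it is available, search inside it for the cited text. This is an illustration only; `open_citation`, the `library` dictionary of locally available documents and the returned fields are all hypothetical.

```python
def open_citation(document_name, quoted_text, library):
    """Resolve a citation by document name, then locate the cited passage.

    library maps document names to the full text of documents the user has.
    """
    text = library.get(document_name)
    if text is None:
        # Document not available: give the user an opportunity to search for it
        return {"status": "not found", "search_for": document_name}
    offset = text.find(quoted_text)
    if offset == -1:
        # Passage not found: open the document at the start
        return {"status": "opened", "offset": 0}
    # Passage found: open the document at the cited section
    return {"status": "opened", "offset": offset}


library = {"example-name": "Intro. The cited passage is here."}
print(open_citation("example-name", "cited passage", library))
print(open_citation("another-name", "anything", library))
```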


Adoption Support

The first implementations will include links to actual code for how to add this into other developer’s projects, dramatically reducing the implementation overhead.


Legacy Support

When using a supported Reader, the user can download a PDF and copy the BibTeX export format on the download page, then open the PDF in Reader and click to ‘Assign BibTeX’ and it will be applied as an appendix and saved, same as if it was natively exported with Visual-Meta. Only the citation information will be provided in this way–formatting etc. will not be available.



This work grew out of work on Liquid | Author: Visible-Meta Origins.


How This Relates To My PhD

This work has grown out of my PhD work at the University of Southampton under Dame Wendy Hall and Les Carr. It aims to solve infrastructure issues which hamper citation interaction and visualisations: Visual-Meta & my PhD.



Make the Document Readable


Now that we have both Liquid | Author and Liquid | Reader I think it’s time to clarify the differences between an editable manuscript (in Author) and a published (made public/defined as done, at least to a specific version) ‘frozen’ document (PDF opened in Reader). In analog times the distinction between a typewritten document and a typeset document was clear: one was produced in very limited amounts and the other was reproducible in large volumes. With digital documents this distinction has disintegrated.

The nearest we have are probably Word documents for manuscripts and PDF for published documents, where the prime characteristic of the Word document is editability and that of the PDF is that it is frozen. However, software allows for different kinds of manipulations, so this is only a loose rule. The model described here does use PDF as the base published document, but this is subject to change as the world moves on and another document format, such as JATS, may step up. The notion of a private manuscript and a published document remains, however.

TL;DR / Summary / Abstract

This post makes the point that adding appendices to a document can usefully describe the semantics of the document for the reader software to present rich options to the user, rather than fixing information in hard to parse ways or embedding it in fragile meta-boxes.

First : Outside the Document

Universal Text Interactions

Powerful interactions should be possible for both categories of documents and in my world this means Liquid | Flow interactions where the user can select any text and instantly get a myriad of search results and transformations done.

Document Connections

Citation Analysis / Concept Mapping

Citation analysis can be a very visual process based on system-extracted data about documents and how documents connect through citations. I put concept mapping in the same section here since both are based on how concepts or documents connect and are therefore both outside and in-between the documents.


By glossary I mean definitions which are specific to a document, author, publisher or a field. The glossary systems I am concerned with have explicit connections to other glossary terms and documents and therefore can merge with concept mapping. I have blogged on various stages of this:

Author’s Manuscript

Moving forward it will be important to define the interactions, and possible interactions, for each document type. This will really mean defining the Reader document (.pdf), since the Author document (.liquid) should have as rich interactions as possible. There is not much to say on this here since this is covered under all of the work for Author and general interactions. The purpose of the thoughts in this post is to clarify the role of the published document and how to expand and limit its potential interactions:

The Published Document

The defining characteristic of the published document is that it is a frozen substrate where the author’s work is not editable but it is annotatable and citable:


Annotations are notes of varying sorts added by the reader ‘on top of’ the author’s work. The reasons for this include:

  • Augment comprehension of the document
  • Augment comprehension of the content of the document in a multi-document context
  • To share with other readers for discussion
  • To share with the author for comment
  • To find passages of text in the future for citing
  • & more

The reader should be able to highlight passages of text and to make any ‘mark’ they want to. The system should store these highlights and marks and make them as useful as possible for the/a reader in the future. This includes the ability to search an individual document or a set of documents for only text which has been highlighted, either in the Reader application or as part of a citation or concept analysis.

The way Reader should handle annotating is simply to let the user highlight any text with a colour highlight (default yellow) and that’s it for the initial highlighting. In the future it should be possible to choose colours based on some meaning and to draw and doodle.

The annotations should be stored in such a way as to be accessible to the Reader application, and to any other PDF reader, for searching the document based on only annotated/highlighted text. They should also be accessible to an importing application, such as Author, so that the citation view can make connections and do other visualisations and interactions based on any keywords in the document and/or only highlighted text.
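A minimal sketch of searching only highlighted text, assuming highlights are stored as (start, end) character offsets into the document text. The storage format is an assumption; the post does not prescribe one.

```python
def search_highlights(text, highlights, query):
    """Return the (start, end) highlights whose text contains the query.

    Matching ignores case; text outside the highlights is never searched.
    """
    needle = query.lower()
    return [
        (start, end)
        for start, end in highlights
        if needle in text[start:end].lower()
    ]


first_sentence = "Citations connect documents."
text = first_sentence + " Glossaries define terms."
highlights = [(0, len(first_sentence))]  # only the first sentence is highlighted

print(search_highlights(text, highlights, "connect"))  # found inside a highlight
print(search_highlights(text, highlights, "define"))   # present in the text, but not highlighted
```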


Citations are the means through which the reader can connect what they are themselves authoring to the source material in the published work.

This comes straight onto the issue of addressing, which I think is a prime issue to be dealt with and which I have blogged about quite a bit.

The act of citing is the act of showing the source in relation to the author’s work and the act of reading a citation is the act of recognising the source and seeing if it adds credibility to the author’s work or seeing a new source which can then be investigated to check for relevance and veracity.

The act of adding a citation is currently generally absurd: the source documents in PDF do not carry any useful meta-information other than what might be written in plain text in the document, such as a title, the names of the authors and only sometimes the publication date. Companies provide commercial services which search databases to supply full citation information to the user (but crucially, not to the document itself) to help the user cite them. This is a key issue the Reader-Author interaction solves, with the Author-created PDF carrying the meta-information for Reader to allow the user to simply copy text and then paste it as a full citation:

(The important aspect of high-resolution addressing can come under this system, but that is not addressed here in detail)

Meta -> Visible ‘About this Document’

In the analog world, the information about a document had to be on the same substrate level as the content; there was no place to hide it. In digital documents, however, there can be a payload of information not visible to the user. In fact this is a requirement of digital documents, since they need a way to convey to the operating system and reader/editor software what the document is, how it should be displayed and how it can be interacted with. This can clearly be useful, as with the EXIF data of a photograph, which contains a lot of information about the technical circumstances of taking the picture, and it has potential for adding all the citation information, and more, to a document. But there are two issues: publishers (software and companies) usually do not include this meta-information, and it gets stripped out on changing formats or printing.

However, when Jacob implemented the ability to copy the document’s BibTeX textual citation information, I learnt that this is findable information for a system, since it starts with a unique and identifiable string. As such, when a user copies BibTeX from a download site to use in Author, the user does not need to copy only the BibTeX text: if the whole web page including the BibTeX is copied, Author will easily parse the text, find the BibTeX and use it.
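To illustrate why this is easy for software, here is a minimal sketch of finding a BibTeX entry inside a larger block of copied text. It is not Author’s actual parser: it assumes an entry starts with `@`, an entry type and an opening brace (for example `@article{`) and ends where the braces balance.

```python
import re

# Assumption: an entry starts with "@" + entry type + "{", e.g. "@article{".
ENTRY_START = re.compile(r"@[A-Za-z]+\s*\{")


def find_bibtex(text):
    """Return the first BibTeX entry found in text, or None."""
    match = ENTRY_START.search(text)
    if match is None:
        return None
    depth = 0
    # Start at the opening brace and walk forward until the braces balance
    for i in range(match.end() - 1, len(text)):
        if text[i] == "{":
            depth += 1
        elif text[i] == "}":
            depth -= 1
            if depth == 0:
                return text[match.start():i + 1]
    return None  # braces never balanced, so the entry is incomplete


# A whole copied web page, with a placeholder entry in the middle
page = """Download PDF | Share | Cite
@article{example2019, title = {An Example Title}, year = {2019}}
Related articles
"""
print(find_bibtex(page))
```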

This gave me the most obvious revelation: humans can read the visible text in documents and so can computer systems, so why not stop worrying about embedding meta and instead leave it visible? This is why Author now has the option to export the BibTeX for the document at the end of the document as plain text, under the heading ‘BibTeX’. It means that Reader opens the document, ‘reads’ it and finds the BibTeX, and then uses this when the user performs a basic copy by appending it to the clipboard. When the user then pastes back into Author this is made available, and on paste a dialog asks the user: paste as plain text, or use the embedded BibTeX to paste as a citation? The result is that a simple copy and paste becomes a fully formatted citation, where the application accepting the paste (in this case Author) ‘knows’ that this is a citation.
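The copy and paste flow described above can be sketched as follows. This is an illustration under stated assumptions, not the Reader or Author implementation: the function names and the dictionary standing in for the clipboard payload are hypothetical, and the appendix is assumed to be everything after the last line that reads only ‘BibTeX’.

```python
def read_bibtex_appendix(document_text, heading="BibTeX"):
    """Return the text after the last line consisting only of the heading."""
    lines = document_text.splitlines()
    for index in range(len(lines) - 1, -1, -1):
        if lines[index].strip() == heading:
            appendix = "\n".join(lines[index + 1:]).strip()
            return appendix or None
    return None


def copy_selection(selected_text, document_text):
    """Build a clipboard payload: the selection plus the source's BibTeX, if any."""
    return {"text": selected_text, "bibtex": read_bibtex_appendix(document_text)}


def paste(payload, as_citation):
    """Paste as plain text, or as a quotation that carries its source."""
    if as_citation and payload["bibtex"]:
        return {"kind": "citation", "quote": payload["text"], "source": payload["bibtex"]}
    return {"kind": "plain", "text": payload["text"]}


# Placeholder document with a BibTeX appendix at the end
document = "Body text of the paper.\n\nBibTeX\n@article{example2019, title = {An Example Title}}\n"
payload = copy_selection("Body text", document)
print(paste(payload, as_citation=True))
print(paste(payload, as_citation=False))
```

Because the payload still contains the plain text, an application that knows nothing about the appendix can ignore the extra field and paste as usual.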

The next step from this perspective is to encourage software vendors to produce PDF documents where the visual information contains semantic values, not expecting hidden information to do the job. In terms of archiving and data transfer this is useful but it’s also useful now, to make the systems more rich and robust.

Have a section at the end of the document with the BibTeX as citation information, and don’t call it meta; simply call it information. Since it’s clearly marked, any reader can use it in the same way as Reader / Author does.

And let’s go further. Let’s use such an appendix to describe the formatting of the document, including how headings are formatted and so on. This should allow for complete compatibility with basic PDF readers but also allow new readers to extract semantic values to allow for richer interactions, such as automatic headings interactions, citation display and interactions and so on.
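As a sketch of how a reader could turn such a description into an automatic outline, assume the appendix declares which font size and weight each heading level uses, and that the reader software can see the formatting of each run of text. Both data structures below are hypothetical; the post does not define the appendix format.

```python
def build_outline(runs, heading_styles):
    """Collect the headings of a document in reading order.

    runs: (text, font_size, bold) tuples for each run of text.
    heading_styles: maps heading level to the (font_size, bold) pair the appendix declares.
    """
    level_for_style = {style: level for level, style in heading_styles.items()}
    outline = []
    for text, font_size, bold in runs:
        level = level_for_style.get((font_size, bold))
        if level is not None:  # body text matches no declared heading style
            outline.append((level, text))
    return outline


heading_styles = {1: (18, True), 2: (14, True)}  # as declared in the appendix (hypothetical)
runs = [
    ("Introduction", 18, True),
    ("Some body text.", 12, False),
    ("Background", 14, True),
]
print(build_outline(runs, heading_styles))
```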

This could put an end to the absurd academic time-waste of nit-picking how citations should be displayed: let the teacher/examiner/reader specify how the citations should be displayed. Because the document describes in the appendix how citations are used, the reading software can re-format them to the reader’s tastes.

This can further be used to work with glossaries and much more and will be robust enough to even be printed out and scanned and all will be retained.

I am putting my money where my mouth is by demonstrating this interaction via Author and Reader, but this is as open as it can possibly be, and the end user can seriously benefit from such a very open rich-information interchange.


Note: This became the Visible-Meta approach.

