Progress Update for the PhD Late June 2019

Progress late June 2019

My PhD work aims to improve the way PhD students carry out their literature reviews. The goal of a literature review is to generate deep insight into a specific field and, in PhD terms, to be able to demonstrate this to a supervisor and examiner. This means that I am my own subject: I must carefully watch my own work, my citations and the infrastructures available to me, while also studying the literature for prior work on this topic (hypertext, spatial hypertext, visualisation, graphs, addressing & linking & more). I also need to learn the processes and issues other PhD students have, which is why I have applied for a survey and will follow that up with a recorded group meeting, both with PhD students from Southampton.

The first major realisation during my studies was the importance of the flat digital document for recording a perspective in a slice of time. For this the PDF has become the format of choice in academia, as well as in the legal profession (as a way to store contracts and agreements) and in business in general (even a sizeable number of Google Docs start as PDFs {confidential numbers so no attribution possible}). The aim of publishing an academic document is to present a linearised argument supported by references to prior work in a format conforming to the institution’s vernacular.

The aim of reading an academic document, however, is quite different, and this is where the main thrust of my PhD lies: in order to support a more dynamic reading workflow, the publishing workflow must also take into account the needs of the reader.

Currently this is not the case. A PDF is produced only for the requirements of the university or publication, worshipping at the feet of the paper legacy of how pages should be laid out and appear visually according to tradition, with little or no concern for the academic reader who must go through large volumes of documents to stay on top of their field. The PDF document is only a virtual photograph of an imagined paper copy. Most academic PDFs do not even contain the date of publication; this is something the PhD student needs to employ a Citation Manager System to figure out. These systems trawl through huge databases and find the publication data and other ‘meta’-data, that is, information about the document. The systems do not append this to the PDF documents, however; they simply add it as fields in their own database. This means that the basic act of copying a line of text from a PDF to cite it becomes a trip to multiple destinations: one to get the text and another to get the information for citing the text.

Since I first presented at the WAIS Away Day, where my concerns about PDFs were roundly pounced upon, I have accepted the stranglehold PDFs have on the industry. I have investigated notions of ‘Rich-PDFs’ with embedded information and found this to be beyond practical scope, since paid-for licenses would be needed to append information to PDFs. Even suggesting that publishers add author and date information turned out to be too much of an ask when I brought this up at the JATS Conference in Cambridge, which I sponsored on behalf of my word processor Liquid | Author.

It is through Liquid | Author that I have had a chance to experiment with this. I started the Author project not too long after the death of Doug Engelbart in 2013, but it was not until I started the PhD program that I got a competent coder to build what is now the current and live version, starting in 2016.

A few weeks ago (early June 2019) I had the idea of adding all the necessary meta-information visually to the back of the PDF and building a PDF reader which could read the document and use this information. Initially this would allow the user to copy text from a PDF with such added ‘Visual-Meta’ information and simply paste it into a word processor, such as Author, as a citation, all in one operation. I decided this was an important idea, so I invested in upgrading Author to be able to publish with Visual-Meta and we (I designed; Jacob Hazelgrove, Author’s coder, built) created Liquid | Reader, a PDF reader which can indeed read its own documents. This now works and has been presented to several partners, all of whom have given great feedback, so I am trying to co-design the scheme of how the meta is presented and what should be covered.

This will allow for many immediate end-user benefits, including one-operation copy and paste as citation, which will allow the user to use any citation management system since the meta is self-contained. It also allows the extraction and use of data fields: all tables etc. become useable data.
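To make the one-operation copy and paste concrete, here is a minimal sketch in Python. It is not how Liquid | Reader is implemented. It assumes the reader has already read the author, title and year from the document's Visual-Meta. The `DocumentMeta` class, the `copy_as_citation` function, the citation layout and the example values are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class DocumentMeta:
    """Hypothetical container for fields a reader might extract from Visual-Meta."""
    author: str
    title: str
    year: str


def copy_as_citation(selected_text: str, meta: DocumentMeta) -> str:
    """Combine the selected text and the document's own metadata into one clipboard payload."""
    # Collapse the line breaks and extra spaces that PDF layout tends to introduce.
    quote = " ".join(selected_text.split())
    # The citation layout here is an arbitrary illustration; a word processor
    # receiving the separate fields could render them in any style.
    return f'"{quote}" ({meta.author}, {meta.year}, {meta.title})'


# Placeholder values, not real publication data.
meta = DocumentMeta(author="Example Author", title="An Example Paper", year="2019")
payload = copy_as_citation("A line of text\ncopied from the PDF.", meta)
print(payload)
```

Because the metadata travels with the document, nothing in this step needs to consult an external citation database.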

The approach also has the benefit of being very low in technical complexity to implement (all code and aspects to be made public and linked to from every published document) and robust since it can even handle printing and format change.

I aim to test this on users after the initial survey is done and to integrate with services such as Scholarcy to be able to automatically add the Visual-Meta to documents created outside of this ecosystem. I am also in talks with the largest online document provider to see how this fits in their process and I wish to discuss this with relevant people at my home university of Southampton.

My personal hope is that this will become a standard and save countless hours otherwise wasted on minutiae, such as the visual formatting of citations, since every reader can choose how to see them. My hope within my PhD is that I can use this to integrate PDF documents into Author’s Dynamic View, which I created as a core part of my PhD but which turned out to be useless without documents having meta-information attached.

I have struggled with how to integrate citation documents into the Dynamic View. My stream of consciousness on this follows:


In order to carry out my own PhD work I need to sort the research documents I need to cite into three groups: a dump for those I found not useful but would rather not download again, those I found useful and have used, and those I found useful and intend to use in the future. At the most basic level I can simply have them in a word processing view and organise them under headings.

I have developed the dynamic view which can be useful in this since I can spatially organise them with keywords/key concepts and authors as I see fit.

Maybe I should have a dynamic view for citations only, with documents and authors only? Maybe I should integrate them into the dynamic view but I am concerned that will become messy.

Question: Do I need more than one list of citation material? If so, how many? Maybe make this a special view and a special database? It could of course, over time, relatively easily have different sets.

Maybe make it a separate, but integrated Liquid | Citations or Liquid | Library application? Maybe add it to Liquid | Reader? Yes. List all imported PDFs as well as URLs.

The primary user is me. Well, at least initially.

I’d need to design a different-looking list view with an option to search based on annotation or keyword, and a dynamic view for this. How can I make the dynamic view look usefully different from the one in Author?

The magic, to me at least, is that the Visual-Meta and use of BibTeX make this possible: The PDF stores information about itself.
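As a rough illustration of a PDF storing information about itself, the Python sketch below reads BibTeX-style fields out of the text of a document's final page. It assumes the page text has already been extracted from the PDF, and that the metadata is a plain BibTeX entry with one `field = {value}` pair per line. Both are assumptions, since the actual Visual-Meta layout is still being designed. The `read_visual_meta` function and the sample entry are hypothetical.

```python
import re

# Hypothetical text extracted from the final page of a PDF.
# The entry and all of its values are placeholders.
LAST_PAGE_TEXT = """
@article{example2019,
  author = {Example Author},
  title = {An Example Paper},
  year = {2019},
}
"""

# Matches one "name = {value}" pair per line; nested braces and
# multi-line values are deliberately not handled in this sketch.
FIELD_PATTERN = re.compile(r"^\s*(\w+)\s*=\s*\{(.*)\},?\s*$", re.MULTILINE)


def read_visual_meta(page_text: str) -> dict[str, str]:
    """Return the field/value pairs found in a simple one-field-per-line BibTeX entry."""
    return {name.lower(): value for name, value in FIELD_PATTERN.findall(page_text)}


fields = read_visual_meta(LAST_PAGE_TEXT)
print(fields)
```

Since the fields are ordinary visible text, the same approach would still work on text recovered from a printed and re-scanned copy, provided the characters are recognised correctly.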

So, user selects which folders to work with. They appear in the dynamic view as headings. Click to …. No, this is not so useful.

Back to Author. Considering that a thesis will have a max of around 200 cited documents, that should fit well in the Dynamic View. What matters is the freedom of interaction and therefore I think citations in Dynamic View should be the same as any text, but citation information should show on top in Find view.

<< That is how I am currently thinking about the problem. One huge thing that has also fallen out of this (huge to me anyway; I’ll need to test and verify) is to make citations in a document click-to-copy-as-citation. This means any document can be usefully applied as a citation management system! I’ll be using this in Author once it’s been implemented, which I expect to be this week: I can add research to a document, organise it as I see fit, then copy and paste as full citations into my thesis document.

Conclusion

I feel the Visual-Meta is a real candidate for improving the student’s literature and citation process and I’ll test that hypothesis. The Dynamic View will be employed as a way to visualise relationships between the citations I have chosen to use, their authors and the concepts they discuss. This is to address the second part of my PhD: demonstrating the student’s increased depth of understanding of the specific field. A glossary system may be used to store the concepts and authors, so BibTeX/Visual-Meta needs to integrate with the glossary, either as separate or as merged systems. How to do this is the current focus of my work.
