Visual-Meta Introduction

 

Visual-Meta is an approach to making a document's metadata both machine- and human-readable by adding an appendix to the end of the document. The appendix is based on BibTeX and contains all the information needed to cite the document (author, title, date etc.), and it clearly states the values of any data (such as tables, lists and advanced layouts) as well as glossary terms.

This visible metadata (plain text in the document) can then be parsed by a Visual-Meta aware PDF reader to enable functionality such as copying text and pasting it as a citation in one step.

Putting the metadata visually into the document means that even if the document format is changed or the document is printed and scanned, the data will still be a part of the document and compatibility with legacy readers is maintained since they will only see the metadata as plain text.

Adding human-readable appendices to a PDF document, which usefully describe the semantics of the document and are also machine readable, offers many benefits and workflow improvements in the academic document space, while adding no document overhead beyond a few plain text pages at the end of the document. This approach keeps compatibility with legacy PDF Readers while opening up rich opportunities for augmented Readers: legacy Readers will simply show a normal PDF with an appendix of BibTeX-style information.

 

Augmentations

Visible-Meta Augmented Readers can provide the user with interactions as rich as those of a custom authoring environment: publishing and freezing onto PDF is no longer a limitation. Advanced interactions can include:

  • Copy As Citation using a simple copy command, with all citation information added to the clipboard payload for use by Visible-Meta aware applications on Paste.
  • Instant Outline based on the document specifying heading formatting.
  • Dynamic Views, such as the one implemented in Liquid | Author, could be stored as data, not only as images.
  • Server Access. Repositories can extract information for large scale analysis.
  • Glossary Support. Glossary terms could be added to the appendix.
  • High Resolution, Document Based Addressing. The Name of the document is not the same as the Title, and this can be used to address by document rather than by location and to support High-Resolution Addressing.
  • & more, to be discovered.

 

Benefits

For an author, this approach means that they can embed richer information in their document with a minimum of effort and be sure of the robustness of the information.

It allows the reader a much faster way to cite with a higher degree of accuracy and more access to the original data and interactions.

Augmented textual communication. Using the appendices to describe the document content, such as the formatting of headings and citations as well as the use of glossaries, can allow the reading software to present the document to the reader's preference without losing the creator's semantics.

Server Friendly, which allows for large-scale analysis of citations and other document elements. University of Southampton's Christopher Gutteridge, one of the people behind the university repository, elaborates on this.

Institutions can worry less about the cosmetics of citations and benefit from more documents cited being checked and read.

This could put an end to the absurd academic time-waste of nit-picking how citations should be displayed: let the teacher/examiner/reader specify how the citations should be displayed. Because the document describes in the appendix how citations are used, the reader software can re-format them to the reader's tastes.

Universities still get to dictate the default handing-in formatting but the same document could be displayed in any format the reader chooses.

 

Demonstration

Visual-Meta export is built into the Liquid | Author word processor, and parsing it can be done by the Liquid | Reader PDF reader application, both produced by the author of this article, Frode Hegland: www.liquid.info

Video demonstration of the concept (less than two minutes long): youtube.com/watch?v=Q-LnkuI2Qx8&feature=youtu.be

 

Example

Examples and a description of the format are posted: Visible-Meta Example & Structure.

 

Document Name

Note that the ‘document_name’ is distinct from the title and can be set automatically by the authoring software to help identify the document through search later: http://wordpress.liquid.info/addressability-supplemental-augmentation-for-visual-meta

 

 

Adoption Support

The first implementations will include links to actual code showing how to add this to other developers' projects, dramatically reducing the implementation overhead.

 

Legacy Support

When using a supported Reader, the user can download a PDF and copy the BibTeX export format on the download page, then open the PDF in Reader and click ‘Assign BibTeX’. The BibTeX will be applied as an appendix and saved (along with a tag stating which source was used and when), the same as if the document had been natively exported with Visual-Meta. Only the citation information will be provided in this way; formatting etc. will not be available.

 

 

Legacy Augmentation

 ­

Manual

When using a supported Reader, the user can download a PDF and copy the BibTeX export format on the download page, then open the PDF in Reader and click to ‘Assign BibTeX’ and it will be applied as an appendix and saved, same as if it was natively exported with Visual-Meta. Only the citation information will be provided in this way–formatting etc. will not be available.

Server

Reader applications can also send non-Visible-Meta PDFs to a server, such as Scholarcy, to have the Visible-Meta extracted and appended.

 

 

Background

This work grew out of work on Liquid | Author: Visible-Meta Origins.

 

How This Relates To My PhD

This work has grown out of my PhD work at the University of Southampton under Dame Wendy Hall and Les Carr. It aims to solve infrastructure issues which hamper citation interaction and visualisations: Visual-Meta & my PhD.

 

Known Issues

There are many issues to be worked out, including how to refer to different authors of different chapters and what exactly to encode.


Make the Document Readable

 

Now that we have both Liquid | Author and Liquid | Reader, I think it’s time to clarify the differences between an editable manuscript (in Author) and a published, ‘frozen’ document (made public and defined as done, at least as a specific version; a PDF opened in Reader). In analog times this was a clear distinction between the typewritten document and the typeset document: one was produced in very limited amounts and the other was reproducible in large volumes. With digital documents this distinction has disintegrated.

The nearest we have are probably Word documents for manuscripts and PDF for published documents, where the prime characteristic of the Word document is that it is editable and that of the PDF is that it is frozen. However, software allows for different kinds of manipulations, so this is only a loose rule. The model described here does use PDF as the base published document, but this is subject to change as the world moves on and another document format, such as JATS, may step up. The notion of a private manuscript and a published document remains, however.

TL;DR / Summary / Abstract

This post makes the point that adding appendices to a document can usefully describe the semantics of the document so that the reader software can present rich options to the user, rather than fixing information in hard-to-parse ways or embedding it in fragile meta-boxes.

First: Outside the Document

Universal Text Interactions

Powerful interactions should be possible for both categories of documents and in my world this means Liquid | Flow interactions where the user can select any text and instantly get a myriad of search results and transformations done.

Document Connections

Citation Analysis / Concept Mapping

Citation analysis can be a very visual process based on system-extracted data about documents and how documents connect through citations. I put concept mapping in the same section here since both are based on how concepts or documents connect and are therefore both outside and in-between the documents.

Glossary

By glossary I mean definitions which are specific to a document, author, publisher or a field. The glossary systems I am concerned with have explicit connections to other glossary terms and documents and therefore can merge with concept mapping. I have blogged on various stages of this: http://wordpress.liquid.info/?s=glossary

Author’s Manuscript

Moving forward it will be important to define the interactions, and possible interactions, for each document type. This will really mean defining the Reader document (.pdf) since the Author document (.liquid) should have as rich interactions as possible. There is not much to say on this here since this is covered under all of the work for Author and general interactions. The purpose of the thoughts in this post is to clarify the role of the published document and how to expand and limit its potential interactions:

The Published Document

The defining characteristic of the published document is that it is a frozen substrate where the author’s work is not editable but it is annotatable and citable:

Annotations

Annotations are notes of varying sorts added by the reader ‘on top of’ the author’s work. The reasons for this include:

  • Augment comprehension of the document
  • Augment comprehension of the content of the document in a multi-document context
  • To share with other readers for discussion
  • To share with the author for comment
  • To find passages of text in the future for citing
  • & more

The reader user should be able to highlight passages of text and to make any ‘mark’ they feel they want to. The system should store these highlights and marks and make them as useful as possible for the/a reader in the future. This includes the ability to search an individual document or a set of documents for only text which has been highlighted, either in the Reader application or as part of a citation or concept analysis.
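Searching only highlighted text can be sketched in a few lines. This is an illustration under my own assumptions, not how Reader stores annotations: the `Highlight` record, its fields and the `search_highlights` function are hypothetical names made up for the example.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Set


@dataclass
class Highlight:
    """A hypothetical stored highlight; the field names are assumptions."""
    document: str           # some identifier for the document
    page: int
    text: str               # the highlighted passage
    colour: str = "yellow"  # yellow as the default highlight colour


def search_highlights(
    highlights: Iterable[Highlight],
    query: str,
    documents: Optional[Set[str]] = None,
) -> List[Highlight]:
    """Return highlights whose text contains the query (case-insensitive).

    If `documents` is given, only highlights from those documents are kept,
    which covers both single-document and multi-document searches.
    """
    needle = query.casefold()
    return [
        h for h in highlights
        if needle in h.text.casefold()
        and (documents is None or h.document in documents)
    ]
```

Passing `documents={"paper-a"}` restricts the search to one document; leaving it out searches every stored highlight.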

The way Reader should handle annotating is simply to let the user highlight any text with a colour highlight (default yellow) and that’s it for the initial highlighting. In the future it should be possible to choose colours based on some meaning and to draw and doodle.

The annotations should be stored in such a way that they are accessible to the Reader application, and to any other PDF reader, for searching the document based on only annotated/highlighted text. They should also be accessible to an importing application, such as Author, so that its citation view can make connections and do other visualisations and interactions based on any keywords in the document and/or only highlighted text.

Citations

Citations are the means through which the reader can connect what they are themselves authoring to the source material in the published work.

This leads straight to the issue of addressing, which I think is a prime issue to be dealt with and which I have blogged about quite a bit: http://wordpress.liquid.info/?s=addressing

The act of citing is the act of showing the source in relation to the author’s work and the act of reading a citation is the act of recognising the source and seeing if it adds credibility to the author’s work or seeing a new source which can then be investigated to check for relevance and veracity.

The act of adding a citation is currently generally absurd, with the source documents in PDF not carrying any useful meta-information other than what might be written in plain text in the document as a title and names of the authors and only sometimes the publication date. Companies provide commercial services which search databases to give the user full citation information (but crucially, not the document itself) to help the user cite the documents. This is a key issue the Reader-Author interaction solves, with the Author Created PDF carrying the meta for Reader to allow the user to simply copy text and then paste it as a full citation: https://www.youtube.com/watch?v=Q-LnkuI2Qx8

(The important aspect of high-resolution addressing can come under this system, but that is not addressed here in detail)

Meta -> Visible ‘About this Document’

In the analog world the information about a document had to be on the same substrate level as the content; there was no place to hide it. In digital documents, however, there can be a payload of information not visible to the user. In fact it is a requirement of digital documents, since they need a way to convey to the operating system and reader/editor software what the document is, how it should be displayed and how it can be interacted with. This can clearly be useful, as with the EXIF data of a photograph, which contains a lot of technical information about how the picture was taken, and it has potential for adding all the citation information, and more, to a document. But there are two issues: publishers (software and companies) usually do not include this meta information, and it gets stripped out on changing formats or printing.

When Jacob implemented the ability to copy the document’s BibTeX textual citation information, however, I learnt that this is findable information for a system, since it starts with a unique and identifiable string. As such, when a user copies BibTeX from a download site to use in Author, the user does not need to copy only the BibTeX text: if the whole web page including the BibTeX is copied, Author will easily parse the text, find the BibTeX and use it.
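To illustrate why this is easy for a system, here is a minimal sketch of pulling a BibTeX entry out of a larger block of copied text. It is not Author's actual code: the function name `find_bibtex` and the list of entry types are my assumptions for the example. It looks for the identifiable `@type{` start and then follows the braces until they balance.

```python
import re
from typing import Optional

# Entry types to look for. This list is an assumption for the sketch,
# not something specified in the post.
ENTRY_START = re.compile(
    r"@(article|book|inproceedings|misc|phdthesis|techreport)\s*\{",
    re.IGNORECASE,
)


def find_bibtex(text: str) -> Optional[str]:
    """Return the first BibTeX entry found anywhere in `text`, or None."""
    match = ENTRY_START.search(text)
    if match is None:
        return None
    depth = 0
    # Start at the opening brace of the entry and walk until it is closed.
    for index in range(match.end() - 1, len(text)):
        char = text[index]
        if char == "{":
            depth += 1
        elif char == "}":
            depth -= 1
            if depth == 0:
                return text[match.start():index + 1]
    return None  # the braces never balanced, so treat it as not found
```

Everything on the copied page before and after the entry is simply ignored, which is why the user does not have to select the BibTeX text precisely.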

This gave me the most obvious revelation: humans can read the visible text in documents and so can computer systems, so why not stop worrying about embedding meta and instead leave it visible? This is why Author now has the option to export the BibTeX for the document at the end of the document as plain text, under the heading ‘BibTeX’. It means that Reader opens the document, ‘reads’ it and finds the BibTeX. It then uses this when the user performs a basic copy, by appending it to the clipboard. When the user then pastes back into Author this is made available, and on paste a dialog asks the user: Paste as plain text or use the embedded BibTeX to paste as a citation? The result is that a simple copy and paste becomes a fully formatted citation where the application accepting the paste (in this case Author) ‘knows’ that this is a citation.
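The copy and paste flow described above can be sketched roughly as follows. This is an illustration only, not the code in Reader or Author: the function names, the dictionary used as a clipboard payload and the simple citation layout are all assumptions, and the field parser only handles plain `key = {value}` pairs without nested braces.

```python
import re
from typing import Dict, Optional


def bibtex_appendix(document_text: str, heading: str = "BibTeX") -> Optional[str]:
    """Return the text after the last line consisting only of the heading."""
    lines = document_text.splitlines()
    for index in range(len(lines) - 1, -1, -1):
        if lines[index].strip() == heading:
            return "\n".join(lines[index + 1:]).strip()
    return None


# Matches simple `key = {value}` fields; nested braces are not handled.
FIELD = re.compile(r"(\w+)\s*=\s*\{([^{}]*)\}")


def parse_fields(bibtex: str) -> Dict[str, str]:
    """Collect the simple fields of a BibTeX entry into a dictionary."""
    return {key.lower(): value.strip() for key, value in FIELD.findall(bibtex)}


def copy_with_citation(selected_text: str, document_text: str) -> dict:
    """Build a clipboard payload: the copied text plus citation fields, if found."""
    payload = {"text": selected_text}
    appendix = bibtex_appendix(document_text)
    if appendix:
        payload["citation"] = parse_fields(appendix)
    return payload


def paste(payload: dict, as_citation: bool) -> str:
    """Paste as plain text, or as a quoted citation when fields are available."""
    citation = payload.get("citation")
    if not as_citation or not citation:
        return payload["text"]
    author = citation.get("author", "unknown")
    year = citation.get("year", "n.d.")
    return '"{}" ({}, {})'.format(payload["text"], author, year)
```

The `as_citation` flag stands in for the dialog's choice between pasting as plain text and pasting as a citation. An application that does not know about the extra payload would just use the plain text.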

The next step from this perspective is to encourage software vendors to produce PDF documents where the visual information contains semantic values, not expecting hidden information to do the job. In terms of archiving and data transfer this is useful but it’s also useful now, to make the systems more rich and robust.

Have a section at the end of the document with the BibTeX as citation information and don’t call it meta, simply call it information. Since it’s clearly marked, any reader can use it in the same way as Reader / Author does.

And let’s go further. Let’s use such an appendix to describe the formatting of the document, including how headings are formatted and so on. This should allow for complete compatibility with basic PDF readers but also allow new readers to extract semantic values to allow for richer interactions, such as automatic headings interactions, citation display and interactions and so on.
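As a rough sketch of how a reader could turn such a description into an outline: assume the appendix declares which text style belongs to which heading level, and that the reader can see the style of each line in the PDF. Both the `Style` and `Line` records and the shape of the heading declaration are hypothetical; how this should actually be encoded is still to be worked out.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class Style:
    """A text style as a reader might see it; the fields are assumptions."""
    font: str
    size: float
    bold: bool


@dataclass
class Line:
    """One line of document text together with its style."""
    text: str
    style: Style


def build_outline(
    lines: Iterable[Line],
    heading_styles: Dict[Style, int],
) -> List[Tuple[int, str]]:
    """Return (level, text) pairs for lines whose style is declared as a heading."""
    outline = []
    for line in lines:
        level = heading_styles.get(line.style)
        if level is not None:
            outline.append((level, line.text))
    return outline
```

Here `heading_styles` plays the role of the appendix: for example, it might map bold 18 point Helvetica to level 1 and bold 14 point Helvetica to level 2. A basic PDF reader ignores all of this and still shows the same pages.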

This could put an end to the absurd academic time-waste of nit-picking how citations should be displayed: let the teacher/examiner/reader specify how the citations should be displayed. Because the document describes in the appendix how citations are used, the reader software can re-format them to the reader’s tastes.
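A small sketch of what re-formatting to the reader's taste could look like once the citation fields are available as data. The style names and output layouts here are made up for the example and do not follow any official citation style guide.

```python
from typing import Dict, Optional


def format_citation(
    entry: Dict[str, str],
    style: str,
    number: Optional[int] = None,
) -> str:
    """Render one citation entry in the display style the reader has chosen.

    `entry` holds fields such as author, year and title. The three style
    names are hypothetical labels for this sketch.
    """
    author = entry.get("author", "Unknown author")
    year = entry.get("year", "n.d.")
    title = entry.get("title", "Untitled")
    if style == "author-year":
        return "({}, {})".format(author, year)
    if style == "numeric":
        if number is None:
            raise ValueError("numeric style needs a citation number")
        return "[{}]".format(number)
    if style == "full":
        return "{} ({}). {}.".format(author, year, title)
    raise ValueError("unknown style: {}".format(style))
```

The same entry can then be shown as an author and year pair for one reader and as a bracketed number for another, without the author of the document having to change anything.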

This can further be used to work with glossaries and much more and will be robust enough to even be printed out and scanned and all will be retained.

I am putting my money where my mouth is by demonstrating this interaction via Author and Reader, but this is as open as it possibly can be and the end user can seriously benefit from such a very open rich-information interchange.

 

Note: This became the Visible-Meta approach.

 


Visually Distinguishing Dynamic Mode

It’s clear that we need a visual way to distinguish between word processing and dynamic modes.

For this reason I think we should seriously consider using a monochromatic appearance for the dynamic view, since it covers all options:

 

[Images: Dark Appearance (Edit Mode, Read Mode); Light Appearance (Edit Mode, Read Mode)]

 
