How can we provide useful computer magic when publishing a document, retaining as much of the richness of the authoring process as the author chooses, while supporting active reading and analysis of the document by others? These are important questions, and ones we are exploring at the University of Southampton and through the Author project. 


The document interchange system we are designing, and hopefully building, should support: 

• Retention of original document attributes (such as the coordinates of nodes in a space), so that a published document opens in the original application with those attributes intact.  

• Extraction of attributes, so that other applications can represent and use specialised attributes. 

• Annotation by whatever means the user prefers, such as underlines, highlights or drawings. These annotations should be attached to the document’s meta-data so that the user can, for example, choose to search only highlighted text.  

• High Resolution Addressing, so that an author can cite specific passages of text.  

• Distributed Publishing, so that if the link to the original server does not work, the software can present a copy of the document instead. 

• A new form of Glossaries, which could let the reader understand the author’s intention more clearly than the document’s explicit text alone allows.   

• Server Knowledge of the document’s content, so that a single document, or many documents in bulk, can be analysed. This works by clearly tagging the data in the document and surfacing this meta-data to other applications or servers.  
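To make the annotation and addressing points more concrete, here is a minimal sketch in Python. It assumes a deliberately simple model that is not part of the Author project or any published format: annotations are stored in the document’s meta-data rather than in the text, each one records its kind and a character range, and that range doubles as a crude high-resolution address. The names `Annotation`, `Document` and `search_highlighted` are hypothetical.

```python
"""Minimal sketch (hypothetical model, not the Author format): annotations
kept in document meta-data, addressed by character range, with a search
restricted to highlighted passages."""
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Annotation:
    kind: str   # e.g. "highlight", "underline", "drawing"
    start: int  # character offset where the annotated passage begins
    end: int    # character offset just past the passage (exclusive)


@dataclass
class Document:
    text: str
    # Assumption: annotations live in the meta-data, alongside (not inside)
    # the published text, so adding them never alters the text itself.
    meta: dict = field(default_factory=lambda: {"annotations": []})

    def annotate(self, kind: str, start: int, end: int) -> Annotation:
        if not 0 <= start < end <= len(self.text):
            raise ValueError("annotation range outside the document")
        ann = Annotation(kind, start, end)
        self.meta["annotations"].append(ann)
        return ann

    def passage(self, ann: Annotation) -> str:
        # The character range acts as a simple high-resolution address.
        return self.text[ann.start:ann.end]


def search_highlighted(doc: Document, term: str) -> list[Annotation]:
    """Return highlights whose passage contains `term` (case-insensitive)."""
    term = term.lower()
    return [
        ann for ann in doc.meta["annotations"]
        if ann.kind == "highlight" and term in doc.passage(ann).lower()
    ]


if __name__ == "__main__":
    doc = Document("Augmenting human intellect is the goal. Intellect needs tools.")
    doc.annotate("highlight", 0, doc.text.index(".") + 1)
    doc.annotate("underline", doc.text.index("Intellect"), len(doc.text))
    # Only the highlighted sentence matches; the underlined one is skipped.
    for ann in search_highlighted(doc, "intellect"):
        print(ann.kind, repr(doc.passage(ann)))
```

Keeping annotations outside the text is what lets a reader restrict a search to highlights without changing the published document. A real system would need sturdier addresses than raw character offsets, since offsets shift as soon as the text is edited.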


The big aim is to produce a document reading, writing and publishing system that lets the reader interact with the author’s work more richly than interacting with the author in person would allow. That is why we are calling this Socratic Publishing.  

‘Trojan’ PDF 

This needs to work within legacy systems: a document should be published in a way that keeps its data structured, so that it is more useful when someone reads it or interacts with it, whether as a whole or in pieces. Our solution is encapsulation: the original document and an XML version are embedded inside a .pdf file. A reader who only has a PDF reader will see the PDF version; a reader who has the original application will be presented with the original document; and a reader whose software understands the XML will be shown that version. In the DKR world I propose that these become Doug’s Xfiles.
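The presentation rule for an encapsulated document can be sketched as a small selection function. This is a minimal sketch under stated assumptions: it does not read or write PDF bytes (one plausible carrier for the extra versions is the PDF format’s standard embedded-file attachments), the labels `original`, `xml` and `pdf` and the function `choose_version` are hypothetical, and it assumes that when a reader can handle both the original and the XML, the original is preferred, which the text above does not specify.

```python
"""Minimal sketch of the 'Trojan' PDF presentation rule: show the richest
embedded version the reader's software can handle. Labels, capability names
and choose_version are hypothetical illustrations."""
from dataclasses import dataclass


@dataclass(frozen=True)
class EmbeddedVersion:
    label: str   # "original", "xml" or "pdf"
    data: bytes  # payload carried inside the PDF container


# Assumed order of preference, richest first. The PDF itself is always
# present, so it is the guaranteed fallback for legacy readers.
PREFERENCE = ("original", "xml", "pdf")


def choose_version(versions: dict[str, EmbeddedVersion],
                   capabilities: set[str]) -> EmbeddedVersion:
    """Pick the first preferred version that is embedded and supported."""
    for label in PREFERENCE:
        if label in versions and label in capabilities:
            return versions[label]
    # Nothing richer matched: fall back to the visible PDF layer.
    return versions["pdf"]


if __name__ == "__main__":
    versions = {
        "pdf": EmbeddedVersion("pdf", b"%PDF-1.7 ..."),
        "xml": EmbeddedVersion("xml", b"<document/>"),
        "original": EmbeddedVersion("original", b"native application bytes"),
    }
    print(choose_version(versions, {"pdf"}).label)                     # plain PDF reader
    print(choose_version(versions, {"pdf", "xml"}).label)              # XML-aware software
    print(choose_version(versions, {"pdf", "xml", "original"}).label)  # original application
```

Because the PDF is always present and always last in the preference order, a legacy reader never ends up with nothing to show, which is the point of the ‘Trojan’ approach.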


This is the model as we see it so far: