Last updated on September 7, 2016
The Future Text Initiative is focused on making student-teacher interaction via documents richer and more efficient.
Social text and collaborative documents are both attracting development effort and funding. In academia, however, the finished, ‘frozen’ document frames a human perspective as part of the student’s learning process, and as such it is crucial to the development of the student and to the academic process itself.
The Future Text Project is an integration of the following (initial) projects:
• Open Text Libraries
• Advanced Document Format
• Time Browser Initiative
• Liquid | Author, a showcase, real-world end-user product that integrates the other Open Source components listed above.
User Community Development
It will be important to develop a user community, which we are working on through Dino Karabeg at the University of Oslo, Brian Johnsrud at Stanford and Jeff Mackie-Mason at Berkeley. Specific collaborators are listed under the respective projects below.
The Audience for This Document
This document is not confidential, but its primary audience is potential collaborators. A few are named in the document; some have signed off and some have yet to confirm. If you would like to join us, please tell me how and in what way: that is the most important outcome for this document.
There are two problems Author addresses:
• The difficulty for students to produce well thought out, thoroughly researched and correctly cited documents which are clearly and persuasively presented
• The time it takes teachers to go through large volumes of student writing and give thoughtful feedback effectively
The Author project is built around the student-teacher document workflow, focusing on automating clerical work and giving greater scope for deeper academic engagement.
For students, Author makes citing academic publications, generally available books, videos and other media quick and easy, and it adds context to each citation. This gives the student an effective means of showing how their work fits within the academic discourse.
Author furthermore features advanced text interaction, through the Advanced Text Interaction Libraries (see below) initiative of which Author is a founder project. The advanced interaction gives students unparalleled opportunities to get to grips with what they are writing, to help them better understand the material.
When the student is done writing a document, the Publish component (see below) guides the student through the process of making sure the document communicates what the student intends to get across, in the correct academic way.
For teachers, Author pre-processes the student document on opening to check for common issues. It also provides an efficient means for teachers to check the veracity of cited statements and to insert comments into the document for the student to review.
Open Text Libraries – OTL
The Open Text Libraries is an initiative which aims to provide independent software developers with powerful text interaction tools, as developed through decades of academic research. In the same way that functions like spell-check are provided by the operating system, we feel it will be vitally useful to put similarly powerful tools in the hands of all developers. None of this is ‘rocket science’ since it’s in the public domain, but there is quite a high cost when everyone is ‘rolling their own’.
The Open Text Libraries are developed in collaboration with some of the top computational linguists in the world and will be made freely available to other developers to use in their own applications, hosted on a reputable repository and supplied with complete documentation and API descriptions. It will be a requirement that any software developer who incorporates these libraries makes this clear in their documentation and provides links for others to access the repository, and hence helps spread the word.
There is a wealth of opportunities, but we’ll limit ourselves to two for this document: one example of use while writing and another for use as a Publish and Import Module:
In-Document Interaction Example
When doing a keyword search in a document today, the results are based on that keyword alone. With the capabilities of the OTL, this can become more intelligent. For example, if a document contains the sentences ‘Doug Engelbart lived in California. He was born in Oregon however’ and the user does a keyword search on ‘Doug Engelbart’, the OTL system will be able to return both of these sentences, since it understands that the ‘He’ in the second sentence also refers to ‘Doug Engelbart’.
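The behaviour described above can be illustrated with a toy sketch. It is not the OTL implementation: the pronoun list and the rule that a pronoun refers to the entity named in the preceding matching sentence are simplifying assumptions, and the function names are hypothetical. A real library would rely on a proper coreference model.

```python
import re

# Assumption: a tiny hand-written pronoun list stands in for real
# coreference resolution.
PRONOUNS = {"he", "she", "they", "him", "her", "them"}


def split_sentences(text):
    """Split text into sentences on ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def search_with_coreference(text, entity):
    """Return sentences that name `entity` or refer back to it by pronoun."""
    results = []
    in_focus = False  # True while the searched entity is the likely referent
    for sentence in split_sentences(text):
        words = {w.lower() for w in re.findall(r"[A-Za-z']+", sentence)}
        if entity.lower() in sentence.lower():
            in_focus = True
            results.append(sentence)
        elif in_focus and words & PRONOUNS:
            results.append(sentence)
        else:
            in_focus = False
    return results


text = ("Doug Engelbart lived in California. He was born in Oregon however. "
        "The weather was nice.")
hits = search_with_coreference(text, "Doug Engelbart")
```

Here `hits` holds the first two sentences, while the third is left out because it neither names the entity nor contains a pronoun.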
Export & Import Module Example
The automatic summary, used both when publishing and when importing, will allow the user to click on any sentence in the summary to see which parts of the document contributed to that specific summary sentence. This lets the teacher see exactly how the summary was formed and thus understand the document better. It also gives the student a way to understand what the document actually communicates, allowing for better editing to achieve the clearest possible communication.
Author is the initial host for the Open Text Libraries, which is a collaboration with Livia Polanyi, Consulting Professor of Linguistics at Stanford University, and Bruce Horn, coder of the first Mac Finder, now at Intel.
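One way to picture a summary that can be traced back to its sources is a data structure in which each summary sentence carries the indices of the document sentences that contributed to it. The sketch below is an assumption-laden illustration, not the OTL summariser: it uses a crude word-frequency extractive summary, and the names `SummarySentence`, `summarise` and `min_shared` are hypothetical.

```python
import re
from collections import Counter
from dataclasses import dataclass

# Assumption: a short stopword list is enough for this illustration.
STOPWORDS = {"the", "a", "an", "of", "and", "in", "to", "is", "was",
             "it", "that", "for", "on", "with"}


@dataclass
class SummarySentence:
    text: str
    sources: list  # indices of document sentences that contributed


def content_words(sentence):
    """Lower-cased words of a sentence, minus stopwords."""
    words = re.findall(r"[a-z']+", sentence.lower())
    return {w for w in words if w not in STOPWORDS}


def summarise(sentences, size=1, min_shared=2):
    """Pick the `size` highest-scoring sentences and record their sources."""
    freq = Counter(w for s in sentences for w in content_words(s))

    def score(i):
        return sum(freq[w] for w in content_words(sentences[i]))

    chosen = sorted(range(len(sentences)), key=score, reverse=True)[:size]
    summary = []
    for i in sorted(chosen):
        key_words = content_words(sentences[i])
        # A sentence counts as a contributor if it is the chosen sentence
        # itself or shares at least `min_shared` content words with it.
        sources = [j for j, s in enumerate(sentences)
                   if j == i or len(key_words & content_words(s)) >= min_shared]
        summary.append(SummarySentence(sentences[i], sources))
    return summary


doc = ["Citations link a paper to earlier research.",
       "Students often forget citations in a paper.",
       "Lunch was served at noon."]
summary = summarise(doc)
```

Clicking a summary sentence in the interface would then amount to highlighting the document sentences listed in its `sources` field.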
Advanced Open Document Format
There is a need for a document format which can handle rich, complicated data in a robust manner, which is open so that any developers can use it for sharing documents. The .txt format is not suitable since it is plain text and .doc is owned by Microsoft and is not open.
As a base we are looking at an HTML subset. Jesse Grosjean elaborates: “A nested UL list, plus spanning elements on the enclosed paragraph text of each item. It’s constrained enough that the structure is easy to parse out. It’s XML based so has lots of room for metadata. And it’s still HTML so can be opened in any browser and viewed.”
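To show why such a constrained subset is easy to parse, here is a minimal sketch using Python’s standard XML parser. The sample markup, the `id` values and the `data-cited` attribute are invented for illustration and are not part of any agreed format.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document: a nested UL list where each item holds one
# paragraph, with inline spanning elements and open-ended metadata attributes.
SAMPLE = """<ul>
  <li id="a1"><p>Introduction with <b>bold</b> text</p>
    <ul>
      <li id="a2" data-cited="true"><p>First point</p></li>
    </ul>
  </li>
  <li id="a3"><p>Conclusion</p></li>
</ul>"""


def parse_items(ul, depth=0):
    """Flatten a nested UL into (depth, id, plain text) tuples."""
    items = []
    for li in ul.findall("li"):
        p = li.find("p")
        text = "".join(p.itertext()).strip() if p is not None else ""
        items.append((depth, li.get("id"), text))
        for child in li.findall("ul"):
            items.extend(parse_items(child, depth + 1))
    return items


outline = parse_items(ET.fromstring(SAMPLE))
```

Because every item is an `li` containing one `p`, a few lines of recursion recover the outline, while the attributes leave room for per-paragraph identifiers and other metadata.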
Issues With Current Approaches
Jesse has the following to say about issues with a new document format:
A problem is that most of this rich runtime is lost when serializing to plain text. For instance each item in the runtime has a unique id. If these were serialized it would give each paragraph a persistent identity that I and people extending my app could make great use of. But no plain text format wants an ID serialized to each line. For some predefined attributes (say bold) we can make up a plain text encoding scheme as Markdown has done. But there’s no place to store open ended attributes. And each attempt to do so (such as Critic Markup, etc) just makes the plain text harder to view and edit. Another problem is that it’s difficult for other apps to parse the information that has been serialized. So in the case of Markdown (or even much simpler TaskPaper) it’s very hard to write a parser that will correctly understand the underlying structure.
These are all solved problems in XML based formats like OPML and HTML. The structure is extensible while also being easily parseable. But both those formats have their own problems. OPML is easy for programs to read. But it encodes all text as an XML attribute, making it pretty much impossible to read in a plain text editor. HTML is great because you can read it in any web browser. But it has so many elements and is so open ended it’s impossible to parse out the kind of structure that’s useful to a word processing app.
Advanced Open Document Format Collaborators
We may be collaborating with Oliver Reichenstein (iA Writer), Jesse Grosjean (WriteRoom developer) and Craig Tashman (LiquidText developer) to build a richly extensible and robust document format which will give us interoperability between the applications of independent word processor developers.
The Time Browser Initiative
We will use the logic of the Time Browser Initiative where possible, particularly with citations: http://timebrowser.info
Liquid | Author
Author is a macOS word processing application with several unique, student-centred features. It was developed into a working proof-of-concept prototype and now serves as an interaction study. Special features of Author include:
• Cuttings remember everything you cut. Cmd-shift-v to paste from your Cuttings.
• Automatic Outline. Pinch to collapse the document to see only the headings. Click on a heading to jump to it, or click outside or press ESC to return to the main body.
• Powerful Find. Select text and cmd-f to see only the sentences which include the selected text. Click on a sentence to jump to its location in the document or click in the margin to return to regular view.
• Quick Citations. Quickly & easily assign citations via Amazon look-up or Liquid | Flow’s Copy As Citation command.
• Read & Edit Modes. Read mode supports spacebar for screen down, and select text & spacebar for the text to be spoken.
• Academic Export. Export (Author does not feature a ‘Publish’ system) with References automatically appended at the end of the document.
Publishing & Import Analysis Modules
The Publishing & Import Analysis Modules are designed to guide the student through an analysis process when the student chooses to ‘Publish’ their document, to make sure the document handed in is as good as the student is capable of writing. The same analysis can be done by the teacher on import in order to check for basic issues.
The student will choose a ‘Publish’ option when done writing the paper and this will launch a series of Publish Modules, all of which the student can choose to ignore or to use. These modules include:
• See an automatically generated summary of the document (as generated by the code in the Open Text Libraries, see above) which will show the student whether or not what was intended actually comes across in the document. The student can click on any sentence in the summary to see what parts of the document contributed to that specific summary sentence, and easily alter the document where needed
• Check for plagiarism
• Check writing level
• Check citations
• Specify what type of document/assignment this is and answer questions as to which sections correspond to which expected sections, such as a ‘Methods’ section in a science assignment
Publishing will assign unique ID hash-tags to the document so that the document can be identified in much the same way as a printed copy of a journal can be identified, without having to be at a specific server address.
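How the identifier will be generated is not specified here. One possible approach, shown purely as an assumption, is to derive it from the document’s content, so that any copy of the same text yields the same identifier wherever it is stored. The function name, the `#` prefix and the 12-character length are hypothetical choices.

```python
import hashlib
import unicodedata


def document_id(text, length=12):
    """Derive a location-independent identifier from document content.

    Assumption: the identifier is a truncated SHA-256 digest of the
    Unicode-normalised text. The project may choose a different scheme.
    """
    normalised = unicodedata.normalize("NFC", text).strip()
    digest = hashlib.sha256(normalised.encode("utf-8")).hexdigest()
    return "#" + digest[:length]


essay = "The frozen document frames a human perspective."
essay_id = document_id(essay)
```

A content-derived identifier changes whenever the text changes, which suits a ‘frozen’ published document but would need a separate versioning scheme for drafts.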
Teacher Import Analysis
When a teacher opens a student document, the document will automatically be analysed using the very same modules the student had access to when ‘Publishing’ it. A report will then be presented, from which the teacher can choose to read the document or to reply automatically to the student with the issues found. The fact that both student and teacher have the same modules available should mean that a higher level of work is presented. It also means that the teacher can accept other document formats and the analysis will still take place.
We also plan to implement a layer in documents for comments, doodles and sketches, to help the reader understand the document and communicate with the author.
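The idea of one set of modules serving both the student’s ‘Publish’ step and the teacher’s import can be sketched as a list of check functions run by a single entry point. The two checks below are deliberately trivial placeholders, and all names are hypothetical; the real modules (summary, plagiarism, writing level, citations) would be far more involved.

```python
import re


def check_citations(text):
    """Placeholder check: flag documents with no bracketed citation like [1]."""
    return [] if re.search(r"\[\d+\]", text) else ["No citations found"]


def check_length(text, minimum_words=5):
    """Placeholder check: flag documents shorter than `minimum_words`."""
    count = len(text.split())
    if count >= minimum_words:
        return []
    return [f"Only {count} words; expected at least {minimum_words}"]


# The same module list is used on 'Publish' and on teacher import.
MODULES = [check_citations, check_length]


def analyse(text, modules=MODULES):
    """Run every module and collect the issues each one reports."""
    report = {}
    for module in modules:
        issues = module(text)
        if issues:
            report[module.__name__] = issues
    return report


student_report = analyse("Short text")
teacher_report = analyse("Short text")
clean_report = analyse("This essay cites earlier work [1] in detail.")
```

Because both sides call the same `analyse` function, the student sees in advance exactly the report the teacher will see on opening the document.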
Liquid | Author Project Team
The Liquid | Author component is a collaboration between the Project Lead, Frode Hegland, director of the Liquid Information Company and PhD student at the University of Southampton, and advisor Vint Cerf, co-inventor of the internet and Internet Evangelist at Google Inc.
Frode is the developer of the macOS Liquid | Author software application, which is the initial host application for the Open Text Libraries and which will use the Advanced Open Document Format standard.
Although the components Author uses will be open source, the Author application will be sold to end-users (£8 / $10, which is comparable to minimalist word processors and should be affordable to students) and will be continually updated as the project advances.
Sponsorship will be greatly appreciated and all sponsors will be duly credited with gratitude.
This is not something we can finance on our own if it is to have a powerful boosting effect on education. Building it as a purely commercial operation would focus too much on revenue and not enough on continually improving the student-teacher relationship via documents. This is why we rely on grants and donations to make this project truly user-focused.
We do not yet know the full cost of this project, as it is at an early exploratory stage. The main cost of realising the project will be highly competent programmers, including those with computational linguistics experience.