The Future Text Initiative

The Audience for This Document

This document is not confidential, but its primary audience is potential collaborators, a few of whom are named in the document; some have signed off and some have yet to confirm. If you would like to join us, please tell me how and in what way – that is the most important outcome of this document.

 

Introduction

The Future Text Initiative is organised around the premise that the written word is a fundamental unit of knowledge and, as such, of universal importance. The more richly we interact with the written word, the more richly we interact with our knowledge and with each other.

The written word has evolved since its first appearance some five and a half thousand years ago and, moreover, has evolved in particular evolutionary environments – today’s digital text is as far removed from the printed book as the printed book is from the first scratchings on clay. We don’t know how text will evolve, but we do know that the history of text has not yet been fully written, and that it is of pivotal importance that we invest, as a society, in evolving the medium which carries our symbolic communication.

 

The Pivotal Importance of Text

The written word is not a frozen version of speech; it serves entirely different purposes than speech.

“They didn’t invent writing in order to copy spoken language,
but rather to do things that spoken language failed at.”
Yuval Noah Harari in ‘Sapiens’

While speech carries subtleties which text cannot, text carries interactions which speech cannot; you cannot have bullet points in speech. Sure, you can emphasise points, but you cannot let the listener skim and re-organise them. The written word has served as a permanent record for millennia, and this role is important. Moving from a paper substrate to a digital substrate, however, changes the inherent nature of text: from primarily frozen and permanent to fluid and interactive.

This is the change the Future Text Initiative needs to address.

How can we build text environments – tools, technologies, media and infrastructures – to give the text user, both author and reader, the most powerful interactions with their knowledge?

 

Pictures, Sound, Video & 3D

There is no question that still and moving 2D images, 3D images and audio are important media as well, but these media are already receiving substantial attention and advanced development, and they all aim to mimic the real world, whereas text aims to take us beyond our immediate senses and into the realm of ideas and abstract thought.

Just as a picture can convey a thousand words, a word like ‘love’ or ‘freedom’ can convey something a picture cannot.

We leave it to others to focus on these media while we focus on text, the medium of symbol manipulation, which is getting very little development on its own. We do not question the importance of integrating text and other media, but we do not focus primarily on that integration, in the same way that an initiative or community looking at images would not primarily focus on the integration of image and text.

 

Deep Literacy

The reason for continuing the development of the written word is to nurture and allow for the continual growth of deep literacy.

“Deep Literacy emerges when cognitive strategies enhanced by powerful computational
tools enable knowledge workers to interact effectively with the ever growing inter-connection
of digitized information needed to carry out their work successfully.”
Livia Polanyi

 


 

Projects

The Future Text Project is an integration of the following (initial) projects:

•  The Future of Text Symposium
•  Deep Citations
•  Open Text Libraries
•  Advanced Document Format
•  Time Browser Initiative
•  Liquid | Author, which is a showcase, real-world, end-user product, integrating the other Open Source components listed above.

 

The Future of Text Symposium

Annual Meetings

The annual Future of Text Symposium has been running successfully for six years, starting in 2011, and will continue: thefutureoftext.org

Weekly Virtual Meetings

We are starting weekly virtual meetings, with an agenda shaped by the participants, to support the various Future Text initiatives.

Further Community

It will be important to develop a user community, which we are working on through Dino Karabeg at the University of Oslo, Brian Johnsrud at Stanford and Jeff Mackie-Mason at Berkeley. Specific collaborators are listed under the respective projects below.

 

Deep Citations

“When you cite a source, you show how your voice enters into an intellectual conversation,
and you demonstrate your link to the community within which you work.”
Yale University

Citations are what give your work credibility when you write and clarity when you read. Static citations, the way they are on paper, only mildly deliver on this promise. Digital citations have the opportunity to become active neurons in the academic discourse and provide a powerful flow of ideas. Using citations well, and supporting the depth of citations, is one of the key aspects of deep literacy; as such, developing the infrastructures and best practices for deep, digital citations is an important aspect of Future Text.

The Deep Citations Initiative will work to develop increasingly better ways to cite, for authors, readers and researchers.
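To make the idea of ‘depth’ concrete, here is a minimal Python sketch of the kind of record a deep digital citation might carry beyond a static, paper-style reference. The field names and values are illustrative assumptions, not a proposed standard.

    from dataclasses import dataclass

    @dataclass
    class DeepCitation:
        # Illustrative fields only; the real schema is what this initiative will develop.
        work_title: str     # the cited work
        author: str         # its author
        quoted_text: str    # the exact passage being cited
        char_offset: int    # where the passage begins in the cited document
        document_id: str    # a stable identifier for the cited document

    # The Harari quote used earlier in this document, carried as a deep citation
    # (the offset and identifier are placeholder values):
    citation = DeepCitation(
        work_title="Sapiens",
        author="Yuval Noah Harari",
        quoted_text=("They didn't invent writing in order to copy spoken language, "
                     "but rather to do things that spoken language failed at."),
        char_offset=0,
        document_id="sapiens",
    )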

Collaborators include Pete Forsyth of WikiStrategies and, likely, WikiCite: https://meta.wikimedia.org/wiki/WikiCite

 

Open Text Libraries – OTL

The Open Text Libraries is an initiative which aims to put powerful text interaction tools, developed over decades of academic research, in the hands of independent software developers. In the same way that functions like spell-check are provided by the operating system, we feel it will be vitally useful to provide similarly powerful tools to all developers – none of this is ‘rocket science’, since it is in the public domain, but there is quite a high cost in everyone ‘rolling their own’.

The Open Text Libraries are developed in collaboration with some of the top computational linguists in the world and will be made freely available to other developers to use in their own applications, hosted on a reputable repository and supplied with complete documentation and API descriptions. It will be a requirement that any software developer who incorporates these libraries makes this clear in their documentation and provides links for others to access the repository, and hence spread the word.

There is a wealth of opportunities, but we will limit ourselves to two for this document: one example of use while writing, and another to be used as an Export & Import Module:

In-Document Interaction Example

When doing a keyword search in a document today, the results are based on the literal keyword. With the capabilities of the OTL, however, this can become more intelligent. For example, if a document contained the sentences ‘Doug Engelbart lived in California. He was born in Oregon however’ and the user did a keyword search on ‘Doug Engelbart’, the OTL system would be able to return both of these sentences as results, since it understands that the ‘he’ in the second sentence also refers to ‘Doug Engelbart’.
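As a rough illustration of the idea, here is a minimal Python sketch of such a coreference-aware search. The otl object and its sentences and resolve_coreferences calls are hypothetical names standing in for whatever interface the Open Text Libraries eventually expose; they are not an existing API.

    def smart_find(text, query, otl):
        """Return every sentence that mentions the query, whether directly or
        through a resolved pronoun such as 'he' or 'she'."""
        matches = []
        for sentence in otl.sentences(text):  # hypothetical sentence splitter
            # resolve_coreferences is assumed to rewrite pronouns to the entities
            # they refer to, so 'He was born in Oregon however' is searched as
            # 'Doug Engelbart was born in Oregon however'.
            resolved = otl.resolve_coreferences(sentence)
            if query.lower() in resolved.lower():
                matches.append(sentence)
        return matches

    # With the example above, smart_find(document_text, "Doug Engelbart", otl)
    # would return both sentences, not just the first.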

Export & Import Module Example

The automatic summary, used both when publishing and when importing, will allow the user to click on any sentence in the summary to see which parts of the document contributed to that specific summary sentence. This gives the teacher access to see exactly how the summary was formed, and thus understand the document better, and it gives the student a way to understand what the document actually communicates and to edit it towards the clearest possible communication.
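A minimal sketch of the kind of data structure that could support this interaction, assuming each summary sentence simply records which document sentences contributed to it; the names are illustrative, not a defined format.

    from dataclasses import dataclass

    @dataclass
    class SummarySentence:
        text: str              # the generated summary sentence
        source_indices: list   # indices of the document sentences that contributed

    def sources_for(summary_sentence, document_sentences):
        """What a click on a summary sentence would reveal: the exact document
        sentences that contributed to it."""
        return [document_sentences[i] for i in summary_sentence.source_indices]

    # Illustrative use, reusing the example sentences from the previous section:
    document_sentences = ["Doug Engelbart lived in California.",
                          "He was born in Oregon however."]
    summary = [SummarySentence("Doug Engelbart, born in Oregon, lived in California.", [0, 1])]
    print(sources_for(summary[0], document_sentences))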

OTL Collaborators

Author is the initial host for the Open Text Libraries, which are a collaboration with Livia Polanyi, Consulting Professor of Linguistics at Stanford University, and Bruce Horn, coder of the first Mac Finder, now at Intel.

 

Advanced Open Document Format

There is a need for a document format which can handle rich, complicated data in a robust manner and which is open, so that any developer can use it for sharing documents. The .txt format is not suitable since it is plain text, and .doc is owned by Microsoft and is not open.

Base Proposal

As a base we are looking at an HTML subset. Jesse Grosjean elaborates: “A nested UL list, plus spanning elements on the enclosed paragraph text of each item. It’s constrained enough that the structure is easy to parse out. It’s XML based so has lots of room for metadata. And it’s still HTML so can be opened in any browser and viewed.”
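As a rough, made-up illustration of what this subset could look like, and of how easily its structure parses with a standard XML parser, here is a short Python sketch; the ids, attribute names and content are invented for the example and are not part of any agreed format.

    import xml.etree.ElementTree as ET

    # A made-up instance of the proposed subset: a nested UL list in which each
    # item carries a persistent id and open-ended metadata attributes, with span
    # elements marking inline attributes such as emphasis.
    SAMPLE = """
    <ul>
      <li id="p1">Introduction
        <ul>
          <li id="p2" data-style="emphasis">The written word is a
            <span class="emphasis">fundamental</span> unit of knowledge.</li>
          <li id="p3" data-cite="sapiens">Writing does things spoken language failed at.</li>
        </ul>
      </li>
    </ul>
    """

    def item_text(li):
        """The item's own text and inline spans, ignoring any nested list."""
        parts = [li.text or ""]
        for child in li:
            if child.tag == "ul":
                break
            parts.append("".join(child.itertext()))
            parts.append(child.tail or "")
        return " ".join("".join(parts).split())

    def walk(ul, depth=0):
        """Print each item's persistent id and text, preserving the nesting."""
        for li in ul.findall("li"):
            print("  " * depth + f"[{li.get('id')}] {item_text(li)}")
            for nested in li.findall("ul"):
                walk(nested, depth + 1)

    walk(ET.fromstring(SAMPLE))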

Issues With Current Approaches

Jesse has the following to say about the issues with current approaches that motivate a new document format:

A problem is that most of this rich runtime is lost when serializing to plain text. For instance, each item in the runtime has a unique id. If these were serialized, it would give each paragraph a persistent identity that I, and people extending my app, could make great use of. But no plain text format wants an ID serialized to each line. For some predefined attributes (say bold) we can make up a plain text encoding scheme, as Markdown has done. But there is no place to store open-ended attributes, and each attempt to do so (such as Critic Markup, etc.) just makes the plain text harder to view and edit. Another problem is that it is difficult for other apps to parse the information that has been serialized. So in the case of Markdown (or even the much simpler TaskPaper) it is very hard to write a parser that will correctly understand the underlying structure.

These are all solved problems in XML-based formats like OPML and HTML. The structure is extensible while also being easily parseable. But both those formats have their own problems. OPML is easy for programs to read, but it encodes all text as an XML attribute, making it pretty much impossible to read in a plain text editor. HTML is great because you can read it in any web browser, but it has so many elements and is so open-ended that it is impossible to parse out the kind of structure that is useful to a word processing app.

Advanced Open Document Format Collaborators

We may collaborate with Oliver Reichenstein (iA Writer), Jesse Grosjean (WriteRoom developer) and Craig Tashman (LiquidText developer) to build a richly extensible and robust document format which will give us interoperability between independent word processor developers’ applications.

 

The Time Browser Initiative

We will use the logic of the Time Browser Initiative where possible, particularly with citations: http://timebrowser.info

 

Liquid | Author

Author is a macOS word processing application with several unique, student-centred features. It was developed into a working prototype proof-of-concept and now serves as an interaction study. Special features of Author include:

•  Cuttings remember everything you cut. Cmd-shift-v to paste from your Cuttings.
•  Automatic Outline. Pinch to collapse the document to show only the headings. Click on a heading to jump to it, or click outside or press ESC to return to the main body.
•  Powerful Find. Select text and cmd-f to see only the sentences which include the selected text. Click on a sentence to jump to its location in the document, or click in the margin to return to the regular view.
•  Quick Citations. Quickly & easily assign citations via Amazon look-up or Liquid | Flow’s Copy As Citation command.
•  Read & Edit Modes. Read mode supports spacebar to page down, and select text & spacebar to have the text spoken aloud.
•  Academic Export. Export (Author does not feature a ‘Publish’ system) with References automatically appended at the end of the document.
•  Commentary Layer. Implementing a layer in documents for comments and doodles and sketching, to help the reader understand the document and to communicate with the author.
•  Publish Analysis Modules, designed to guide the student through an analysis process when they choose to ‘Publish’ their document, to make sure the document handed in is as good as the student is capable of writing; the same analysis can be done by the teacher on import in order to check for basic issues (a minimal sketch of such a pipeline follows this list):
•  See an automatically generated summary of the document (as generated by the code in the Open Text Libraries, see above), which will show the student whether or not what was intended to get across actually comes across in the document. The student can click on any sentence in the summary to see which parts of the document contributed to that specific summary sentence, and easily alter the document where needed.
•  Check for plagiarism
•  Check writing level
•  Check citations
•  Specify what type of document/assignment this is and answer questions as to which sections correspond to which expected sections, such as a ‘Methods’ section in a science assignment
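As promised above, here is a minimal Python sketch of how such a publish-time analysis pipeline might be wired together. Every check is a placeholder standing in for one of the modules listed above; none of these function names belongs to an existing Author or OTL API.

    def check_summary(document):
        """Would produce the automatic summary with per-sentence provenance."""
        return {"module": "summary", "report": "summary with provenance goes here"}

    def check_plagiarism(document):
        """Would flag passages that appear verbatim in other known sources."""
        return {"module": "plagiarism", "report": "plagiarism report goes here"}

    def check_writing_level(document):
        """Would estimate the writing level of the document."""
        return {"module": "writing level", "report": "level estimate goes here"}

    def check_citations(document):
        """Would verify that the citations are present and well formed."""
        return {"module": "citations", "report": "citation report goes here"}

    def check_sections(document):
        """Would ask which sections correspond to the expected sections for the
        chosen assignment type, such as a 'Methods' section in a science assignment."""
        return {"module": "sections", "report": "section mapping goes here"}

    def run_publish_analysis(document):
        """Run each module in turn and collect a single report, which the student
        reviews before handing in and the teacher can re-run on import."""
        checks = (check_summary, check_plagiarism, check_writing_level,
                  check_citations, check_sections)
        return [check(document) for check in checks]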

Author Project Team

The Author component is a collaboration between Project Lead Frode Hegland, director of the Liquid Information Company and PhD student at the University of Southampton, and advisor Vint Cerf, co-inventor of the internet and Internet Evangelist at Google Inc.

Frode is the developer of the macOS Author software application, which is the initial host application for the Open Text Libraries and which will use the Advanced Open Document Format standard.

For more information on Author, please see this post: http://wordpress.liquid.info/the-liquid-author-project/

 

Financials

Revenue

Although the components Author uses will be open source, the Author application itself will be sold to end-users (at £8/$10, which is comparable to minimalist word processors and should be affordable to students) and will be continually updated as the project advances.

Sponsorship

Sponsorship will be greatly appreciated and all sponsors will be duly credited with gratitude.

This is not something we can finance on our own if it is to have a powerful boosting effect on education, and building it as a commercial operation would focus too much on revenue and not enough on continually improving the student-teacher relationship via documents. This is why we rely on grants and donations to make this project truly user-focused.

Costs

We do not know the full cost of this project yet; this is an early, exploratory project. The main element of the cost of realising the project will be highly competent programmers, including those with computational linguistics experience.