The goal is to gather data in rich, annotated and connected ways, so that mundane but immediately useful interactions such as “what did George say to Sam about NASA last Tuesday?” become possible right away, and the magical “what did George not say to Sam about NASA last Tuesday but maybe should have?” can be developed.
There are many projects underway to record – The Doug Project aims to augment how all those projects record, producing data which can then be annotated and interacted with in powerful ways.
As its very first step, the project aims to capture the human voice and turn it into text as the first data stream. This will allow records of meetings to be accessed and searched in meaningful ways.
Browser & HTML Analogy
The project aims to build a foundation where all the captured data types are clear, clean and rich. Anyone can then build their own interfaces for this – we are most certainly building our own – but may the best Time Browser win. The name’s use of ‘browser’ is meant to make the analogy with the web browser clear: the project works to create a markup standard and a way to browse the information.
With time as the backbone, richer data, more annotated and ever more connected, becomes available and alive. With high resolution audio and video (and other sensor data as they come online) we can present this data to high performance analysis and artificial intelligence agents to extract a data set of temporal points which reaches deep into what it means to be human. Picture a dialog between philosophers recorded over time, captured with high fidelity audio from multiple microphones so the system can build a virtual acoustic ‘room’. The video is also high definition and high frame rate, so the system can extract cues from behaviour, including pulse. Over time this rich record will be able to present ‘knowledge’ from these meetings beyond what was explicitly said or explicitly understood at the time.
The concept of ‘Event’ as container/time capsule
Picture the wildly exciting and useful interactions, far beyond any Hollywood movie or science fiction novel: millions of people recording their dialog in public, or in private, where they keep the data entirely out of the public stream. Imagine building an ‘event’ based on a group of users’ proximity: they were all present at a location, all recorded a talk given by someone and then later their own break-out group dialog. Pictures of diagrams, 3D scans, notes and more come together to give a rich and connected ‘event space’. This event could join up with other events on the other side of the planet and stay connected. At any point later, anyone can refer to any part of the event, either as a single data point or as a wide shot of what else was going on.
Imagine the great philosophers working like this. Imagine small project teams working like this. Imagine what we can imagine doing once we ourselves start to live like this. Let’s imagine – and build – together.
The first component of the Doug System is the Dialog Record Component, for recording events/meetings/interactions with as much context as possible to then allow users to interact with the resulting record in rich ways.
The user installs the Doug Extension in their web browser (Firefox, Chrome or Safari) or on their smartphone (iOS or Android) and can then record any meeting, whether via teleconference or in person.
The meeting is recorded as audio, tagged with the following, throughout:
• Time. The recording gets an initial date stamp from a time server to make sure all Doug recordings are exactly in sync
• Speaker (based on the ID of the person using the extension)
• User-entered tags for projects etc.
• Manually entered keywords, if any
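As a sketch, the tags above could hang off a simple record per recorded segment. The field names here are illustrative assumptions, not a fixed Doug schema:

```python
from dataclasses import dataclass, field

@dataclass
class RecordingSegment:
    """One tagged span of a Doug audio recording (field names are illustrative)."""
    start_utc: float                                  # seconds since epoch, from the time server
    duration_s: float
    speaker_id: str                                   # ID of the person using the extension
    project_tags: list = field(default_factory=list)  # user-entered tags for projects etc.
    keywords: list = field(default_factory=list)      # manually entered keywords, if any

seg = RecordingSegment(start_utc=1700000000.0, duration_s=90.0,
                       speaker_id="george",
                       project_tags=["NASA"], keywords=["budget"])
```

Because every segment carries its own time stamp and speaker ID, later queries never have to guess who said what, or when.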
The audio is then transcribed, either through automatic computer speech-to-text or through human transcription (an optional button for when accuracy is more important, for which the user will be billed by whichever company does the work).
• The user can also allow access to their photo albums and thus any pictures taken during a session will automatically be included in the stream (for the user to edit as appropriate).
• Same with any video, though it is harder to make sure video recordings are exactly in sync with central/real time than it is for pictures. The Doug system will therefore ask on upload what the starting time was, and will attempt to match the audio to the video to make sure they sync up, or stick with the time the user entered if there was no match (for example because the audio was recorded away from the video). The system will support multiple overlapping video streams.
• Other documents, such as text documents, can be dragged into the timeline. They will be linked to the times they were added and created, and filed under the main section for that meeting.
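The fallback logic for video timing described above can be sketched as follows; the function name, the match-score idea and the threshold are assumptions for illustration:

```python
def choose_video_start(user_entered_start, matched_start, match_score, threshold=0.8):
    """Pick the start time for an uploaded video: prefer the automatic
    audio-to-video match when it is confident, otherwise fall back to the
    time the user entered on upload."""
    if matched_start is not None and match_score >= threshold:
        return matched_start
    return user_entered_start
```

For example, a confident match overrides the user’s estimate, while a failed match (no overlapping audio) keeps it.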
Initial Project Spec
Direct Manipulation Queries
Queries are interactions with the text/multimedia on the screen:
• On selecting a speaker’s name: Show only this speaker/hide this speaker
• On selecting arbitrary text: Show me the first time/most recent time/every time this was mentioned
Queries can also be entered directly (via voice, text or other interaction means):
• Show sentences with ‘keyword’ from last Monday’s meeting
• Show me what Sam said in response to Stan when we were talking about DKR last week
& much more.
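As a sketch, if the transcript is stored as tagged segments, these queries become simple filters. The segment fields and function below are illustrative assumptions, not the project’s actual query API:

```python
def query_segments(segments, keyword=None, speaker=None, day=None):
    """Filter transcript segments (dicts with illustrative 'speaker', 'text'
    and 'day' keys) the way the example queries describe: by keyword,
    by speaker, by day, or any combination."""
    out = []
    for s in segments:
        if keyword and keyword.lower() not in s["text"].lower():
            continue
        if speaker and s["speaker"] != speaker:
            continue
        if day and s["day"] != day:
            continue
        out.append(s)
    return out

segs = [{"speaker": "Sam", "text": "The DKR needs more work", "day": "Monday"},
        {"speaker": "Stan", "text": "Budget review first", "day": "Monday"}]
```

“Show sentences with ‘DKR’ from last Monday’s meeting” then maps to `query_segments(segs, keyword="DKR", day="Monday")`; the response-to-Stan query would additionally need the segments ordered by time.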
Initial Prototype Spec for Journal
An extension for web browsers (Chrome first). This will also be done as a mobile app shortly after the web version works.
The extension will allow the user to record their microphone audio and computer audio as separate channels. The computer audio is used to help line up/sync with other users’ recordings and to determine what should come together as an ‘event’. This will necessarily need to become more sophisticated later, with permissions and so on, but for the prototype all data is open, which is fine, since this will be used to build the project itself and the project is open.
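One naive way to line up two computer-audio feeds, assumed here for illustration, is to search for the lag that maximizes their correlation; a real implementation would use an FFT-based cross-correlation on actual audio samples:

```python
def best_offset(a, b, max_lag):
    """Return the lag (in samples) of feed b relative to feed a that
    maximizes their overlap correlation -- a brute-force sketch of
    lining up two recordings of the same computer audio."""
    best, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i, x in enumerate(a):
            j = i + lag
            if 0 <= j < len(b):
                score += x * b[j]
        if score > best_score:
            best, best_score = lag, score
    return best
```

With the best lag known, one user’s recording can be shifted onto the other’s timeline before the streams are combined.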
The extension corrects its time using a publicly available time server.
This recorded audio goes to a server, as yet to be decided, in the highest quality possible, and a link is generated. If this is through Dropbox this is easy; probably the same with Amazon or Google storage.
The various feeds from the same time ‘event’ are combined into a multi-stream where the listener can see who is who, based on who recorded which stream. This means that transcription can ‘know’ who is who as well.
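As a sketch of that combining step, recordings could be grouped into an ‘event’ whenever their time spans overlap or nearly touch. The gap threshold is an assumption; the real grouping would also use the computer-audio match described above:

```python
def group_into_events(recordings, gap_s=60.0):
    """Group recordings, given as (user, start_s, end_s) tuples, into events:
    a recording joins the current event if it starts before (or shortly
    after) the event's latest end time, otherwise it opens a new event."""
    events = []
    for rec in sorted(recordings, key=lambda r: r[1]):
        if events and rec[1] <= events[-1]["end"] + gap_s:
            events[-1]["recordings"].append(rec)
            events[-1]["end"] = max(events[-1]["end"], rec[2])
        else:
            events.append({"recordings": [rec], "end": rec[2]})
    return events
```

Because each stream keeps its recorder’s identity, the grouped event carries “who recorded which stream” through to transcription.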
Once done, the website will list the events and users can choose to send them for transcription to a known company, which will use this interface to do the transcription in an environment that labels each speaker. The invoice will go to whoever clicks the ‘transcribe’ button, at something like $1.25 a minute ($0.25 of which is for the time coding).
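At the quoted rate the billing is a simple calculation; the function below is only a sketch of it, with the rates from the figures above:

```python
def transcription_invoice(minutes, rate_per_min=1.25, time_coding_share=0.25):
    """Invoice total for a transcription job at ~$1.25/min, of which
    $0.25/min covers the time coding. Returns (total, time_coding_portion)."""
    total = minutes * rate_per_min
    time_coding = minutes * time_coding_share
    return total, time_coding

# a 60-minute meeting: $75.00 total, of which $15.00 is for time coding
```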
Users will be notified when the transcription is done and will be able to visit the website to do keyword searches and scroll through the full text and/or the audio, which will be in sync.