I have long argued against voice interfaces for information manipulation, since they interfere with the visual and dexterous operations of reading and writing. There is good reason why no one has asked for a system where you say ‘turn the page’ to turn the page: it would pull you out of your internal mental world and break your flow.
However, interesting developments are opening up. The linguistic support in macOS Mojave, which Howard Oakley discusses on his blog (https://eclecticlight.co/2018/09/27/mojaves-linguistic-support-a-promising-start), and Apple’s machine learning APIs for iOS and macOS, Core ML (https://developer.apple.com/machine-learning/), provide opportunities for very rich text manipulation.
It is now becoming reasonable to design systems for such interactions, but designing the interfaces for them can quickly demand a lot of buttons or commands to memorise, and that is an issue.
However, I recently splurged and bought an Apple iPhone XS Max, which is very, very fast (its Neural Engine is capable of five trillion operations per second!) and which processes speech commands near-perfectly and near-instantly.
It is becoming clear that we must start experimenting with flexible views built not just on linear commands but also on analysis (Core ML). Interaction with such views can benefit from speech, where the system needs no ‘hey Siri’ prompt and is aware of the on-screen/in-document data it is working on, including the results of previous operations, so that views can be built up continually.
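To make the idea of continual view-building concrete, here is a minimal sketch in Python of one way it could work: each spoken command, once parsed, is applied as a transformation of the current view, so that later commands refine the results of earlier ones. The command names (`find`, `exclude`, `first`) and the document model are hypothetical illustrations, not an actual Apple or Core ML API.

```python
def apply_command(view, command):
    """Apply one parsed spoken command to the current view (a list of paragraphs)."""
    action, _, argument = command.partition(" ")
    if action == "find":          # keep only paragraphs containing the term
        return [p for p in view if argument.lower() in p.lower()]
    if action == "exclude":       # drop paragraphs containing the term
        return [p for p in view if argument.lower() not in p.lower()]
    if action == "first":         # keep only the first N paragraphs
        return view[:int(argument)]
    return view                   # unknown commands leave the view unchanged

def build_view(document, spoken_commands):
    """Fold a sequence of commands over the document, one refined view at a time."""
    view = list(document)
    for command in spoken_commands:
        view = apply_command(view, command)
    return view

document = [
    "Voice interfaces can interrupt reading flow.",
    "Core ML enables on-device text analysis.",
    "Speech commands can drive flexible views.",
    "Page turning by voice breaks concentration.",
]
print(build_view(document, ["find voice", "exclude page"]))
# → ['Voice interfaces can interrupt reading flow.']
```

The point of the fold is that the system never starts over: each command sees only what the previous commands left visible, which is what makes the interaction feel like a conversation rather than a series of one-shot queries.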
This could be backed up with a tokenised command bar: commands are added as tokens when spoken and acted upon, both so the user can confirm the commands were correctly interpreted and so the user can then edit them visually as desired, sharing or saving the command set as a ViewSpec if useful.
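A small sketch of how such a command bar might represent its state, again with hypothetical names: spoken phrases are split into editable tokens, and the token list can be serialised to JSON as a shareable ViewSpec. The ‘ then ’ separator and the ViewSpec field names are assumptions for illustration only.

```python
import json

def tokenise(spoken_phrase):
    """Split a spoken phrase like 'find voice then first 3' into command tokens."""
    return [part.strip() for part in spoken_phrase.split(" then ")]

def to_viewspec(tokens, name="untitled"):
    """Serialise the command tokens as a shareable ViewSpec document."""
    return json.dumps({"viewspec": name, "commands": tokens}, indent=2)

def from_viewspec(text):
    """Restore the token list from a saved ViewSpec."""
    return json.loads(text)["commands"]

tokens = tokenise("find voice then exclude page then first 3")
# The user can now inspect the tokens visually and edit them, e.g. remove one:
tokens.remove("first 3")
spec = to_viewspec(tokens, name="voice-paragraphs")
print(from_viewspec(spec))
# → ['find voice', 'exclude page']
```

Keeping the tokens as plain data rather than opaque recognition results is what makes both halves of the proposal work: the user can verify and correct the interpretation, and the same structure serialises naturally into a saved view.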
I feel this warrants serious consideration.