Last week the US Patent & Trademark Office published a patent application from Microsoft that reveals an invention relating to voice-controlled camera operations for a smartphone. The feature would be great for self-portraits, or for group photos that you could join instead of standing behind the camera. While it's not as cool as saying "OK Google, take a photo" in a hands-free mode, it's still interesting to see that voice commands will be coming to smartphones to control camera functionality in the future, and particularly to a Windows smartphone. It was rumored last month that Microsoft has been testing a new smartphone design, and so this patent filing is just another hint that such a development is in fact in play.
Microsoft's Patent Background
With the increasing popularity of computing devices (e.g., smart phones) having image capture functionality, there is a need for improving the user experience by allowing quick access to image-capture functionality. Image capture operations on smart phones are typically initiated by physical contact with the device (e.g., by tapping a touchscreen or pressing a hardware button). Some computing devices provide dedicated hardware buttons (e.g., camera shutter buttons) to provide access to image capture functionality. However, there are situations when such interactions are not convenient for the user. For example, when taking a digital photograph with a smart phone, pressing a hardware button may cause camera shake, thereby distorting the resulting image.
Microsoft's Voice-Controlled Camera Operations
Microsoft's invention generally relates to a computing device (e.g., a smart phone, tablet computer, digital camera, or other device with image capture functionality) that causes a camera to capture one or more digital images based on a voice command received by the computing device.
For example, a user speaks a word or phrase, and the user's voice is converted to audio input data by the computing device. The computing device compares (e.g., using an audio matching algorithm) the audio input data to an expected voice command associated with a camera application, and determines whether the word or phrase spoken by the user matches the expected voice command.
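The comparison step can be sketched in a few lines. This is only an illustration under stated assumptions, not the method in Microsoft's filing: it assumes the audio input has already been reduced to a fixed-length feature vector, it uses cosine similarity as a stand-in for the unspecified audio matching algorithm, and the names `cosine_similarity` and `matches_command`, the sample vectors, and the 0.9 threshold are all hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine similarity of two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # silence or an empty template never matches
    return dot / (norm_a * norm_b)


def matches_command(audio_features, expected_features, threshold=0.9):
    """Decide whether the audio input matches the expected voice command.

    The threshold is an assumed tuning value, not one taken from the filing.
    """
    return cosine_similarity(audio_features, expected_features) >= threshold


# Hypothetical feature vectors standing in for processed audio.
expected = [0.2, 0.8, 0.1, 0.5]
spoken = [0.25, 0.75, 0.1, 0.5]
unrelated = [0.9, 0.0, 0.7, 0.0]

print(matches_command(spoken, expected))
print(matches_command(unrelated, expected))
```

A real matcher would work on time-varying audio and acoustic models, so the single-vector comparison here only shows where the match decision sits in the flow.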
In another aspect, a computing device activates a camera application on the computing device and captures one or more digital images based on a voice command received by the computing device. In another aspect, a computing device transitions from a low-power state to an active state, activates a camera application, and causes a camera device to capture digital images based on a voice command received by the computing device. In the low-power state, the computing device listens for voice commands that could cause a transition to the active state.
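The wake, activate, and capture sequence described here can be modeled as a small state machine. The sketch below is a toy under assumptions: it treats the voice input as already-recognized text, uses exact phrase matching, and the class `VoiceCamera`, the enum `DeviceState`, and the default phrase are hypothetical names chosen for illustration.

```python
from enum import Enum, auto


class DeviceState(Enum):
    LOW_POWER = auto()
    ACTIVE = auto()


class VoiceCamera:
    """Toy model of the low-power -> active -> capture sequence."""

    def __init__(self, expected_command="take photo"):
        self.state = DeviceState.LOW_POWER
        self.camera_app_running = False
        self.captured = 0
        self.expected_command = expected_command

    def on_voice_input(self, phrase):
        """Handle one recognized phrase; return True if an image was captured."""
        if phrase.strip().lower() != self.expected_command:
            return False  # unrecognized input leaves the device as it was
        if self.state is DeviceState.LOW_POWER:
            self.state = DeviceState.ACTIVE  # wake the device
        if not self.camera_app_running:
            self.camera_app_running = True  # activate the camera application
        self.captured += 1  # stand-in for the actual capture call
        return True


camera = VoiceCamera()
camera.on_voice_input("hello")       # ignored, device stays in low power
camera.on_voice_input("Take Photo")  # wakes, starts the app, captures
print(camera.state, camera.captured)
```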
The technologies described in Microsoft's invention are useful, for example, for performing image capture functions without making physical contact with a device so as to avoid camera shake, or to allow a subject of a photo in a self-portrait or group photo to initiate an image capture operation without touching the device or setting a timer and without having to navigate through a complicated menu system.
Is this a Peek at Microsoft's Smartphone?
The first thing that strikes you when you view Microsoft's patent FIG. 2A below is the clearly visible Windows logo at the bottom of the bezel, which acts as the smartphone's home button. Is this a hint of a future Microsoft smartphone?
In a basic overview of this future smartphone, we point out just a few features that are notable. Microsoft states that this smartphone will include a microphone, a speaker and two proximity sensors (#246 and #248). The sensors will be able to emit an infrared beam and receive the beam reflected off the surface of a nearby object that the emitted beam has illuminated.
In another example, Microsoft states that the smartphone could include a photodiode (#280) which could be used as a light sensor to detect objects in proximity to the camera with improved accuracy.
The camera shutter button (#224) of the smartphone could be a dedicated single-action camera shutter button or a dedicated dual-action camera shutter button with the ability to detect "half-press" (partial actuation) and "full-press" (further actuation) as distinct actions. In some examples, the dual action camera shutter button has different attributes associated with half-press and full-press actions.
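The half-press and full-press distinction can be illustrated with a short sketch. It assumes, purely for illustration, that the button reports a normalized actuation depth between 0.0 and 1.0. The function name `classify_press` and both thresholds are hypothetical, and the filing as summarized here does not say how the two actions are detected.

```python
def classify_press(depth, half_threshold=0.3, full_threshold=0.8):
    """Map a normalized actuation depth (0.0 to 1.0) to a press action.

    Both thresholds are assumed values chosen only for this example.
    """
    if not 0.0 <= depth <= 1.0:
        raise ValueError("depth must be between 0.0 and 1.0")
    if depth >= full_threshold:
        return "full-press"  # further actuation
    if depth >= half_threshold:
        return "half-press"  # partial actuation
    return "none"


for depth in (0.1, 0.5, 0.9):
    print(depth, classify_press(depth))
```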
Turning to the rear view of the smartphone in patent FIG. 2B we see that this future smartphone includes a camera lens (#260) and an electronic flash (#265).
Microsoft's patent FIG. 9 is a flow chart showing an exemplary technique for activating an image capture application and capturing digital images based on voice commands.
Exemplary Voice-Controlled Image Capture Application
Microsoft's patent FIG. 4 depicts a front view of an example smartphone displaying a graphical user interface for a voice-controlled image-capture application on a touchscreen display. For example, a user (not shown) could speak a voice command into the microphone (#450) to provide voice input that causes the smartphone to capture the image shown in the viewfinder (#415).
A voice command could be any word, phrase, or utterance, and can be spoken in any language at any volume, distance, or orientation that is suitable for the smartphone to recognize the command. For example, in an image capture scenario, a user could speak a command while in front of the camera (e.g., during a self-portrait), behind the camera (e.g., while looking into the viewfinder), beside the camera, wearing the camera (e.g., on a helmet while riding a bicycle), or in any other orientation.
A voice command can be a default command or a custom command selected by a user. Training can be used to customize commands or to make voice recognition more accurate. Training could be useful to help audio matching algorithms accurately recognize commands given by different people, or by groups of people speaking in unison, which may have different characteristics and may benefit from the use of different acoustical models for matching.
A user could also be given options (e.g., via a settings menu) to perform training to improve accuracy (e.g., by recording the user's voice speaking a default command) and/or to record a different command that can be used to replace a default command. Training can also take place automatically (e.g., by storing previously recognized commands) without user action.
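The default-versus-custom command option and the two training paths can be sketched as a small settings object. This is a minimal illustration, not the filing's design: the class `CommandSettings`, its method names, and the choice of "take photo" as the default phrase are assumptions, and the stored samples are plain placeholders for recorded audio.

```python
class CommandSettings:
    """Holds the active capture phrase and samples collected for training."""

    DEFAULT_COMMAND = "take photo"  # assumed default for this example

    def __init__(self):
        self.custom_command = None
        self.training_samples = []

    @property
    def active_command(self):
        """Return the custom command if one is set, else the default."""
        return self.custom_command or self.DEFAULT_COMMAND

    def set_custom_command(self, phrase):
        """Replace the default command with a user-chosen phrase."""
        phrase = phrase.strip().lower()
        if not phrase:
            raise ValueError("custom command must not be empty")
        self.custom_command = phrase

    def record_sample(self, sample):
        """Store a sample, whether recorded by the user or saved automatically."""
        self.training_samples.append(sample)


settings = CommandSettings()
print(settings.active_command)
settings.set_custom_command("Cheese!")
settings.record_sample("placeholder-recording-1")
print(settings.active_command, len(settings.training_samples))
```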
A voice command could be descriptive of the action to be taken (e.g., "capture image" or "take photo"), or some other command can be used. For example, the voice command can take the form of a word or phrase (e.g., "one . . . two . . . three!") that a photographer may use to announce that the photo is about to be taken, or that photo subjects themselves might say before the photo is taken (e.g., a word or phrase that may cause the subjects to smile, such as "fuzzy pickles!" or "cheese!" or "cheeseburgers!"). Alternatively, other types of sound (e.g., clapping hands, snapping fingers, synthesized speech or tones) besides a human voice can be used to issue a command.
Save or Delete Photo Commands
Voice commands can be used for invoking or controlling functionality described with reference to controls and hardware buttons shown in patent FIG. 4 or other functionality (e.g., other image capture or image processing functionality, video or audio recording functionality). Besides commands to cause capture of images, other possible commands include commands to save a photo that has been taken (e.g., a "Keep" command), to delete a photo (e.g., a "Delete" command), to show a previously captured photo (e.g., a "Show" command), to record video or audio (e.g., a "Record" command), and to stop recording video or audio (e.g., a "Stop" command).
And lastly, it should be noted that Microsoft's invention could relate to other devices such as a tablet, a dedicated camera or a future gaming console (Xbox). Considering that the next Xbox may include Kinect, adding voice controls for taking photos while in the living room could be fun.
Microsoft filed their patent application under serial number 297116 back in Q4 2011. The US Patent and Trademark Office published the application earlier this month. Because this is a patent application, it is unknown when such a product might come to market.
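A command set like this maps naturally onto a dispatch table. The sketch below is illustrative only: it assumes the spoken command has already been recognized as text, the function `make_dispatcher` is hypothetical, and each handler simply logs a message in place of real save, delete, playback, or recording calls.

```python
def make_dispatcher():
    """Build a dispatcher that maps command words to placeholder handlers."""
    log = []

    handlers = {
        "keep": lambda: log.append("saved photo"),
        "delete": lambda: log.append("deleted photo"),
        "show": lambda: log.append("showing previous photo"),
        "record": lambda: log.append("recording started"),
        "stop": lambda: log.append("recording stopped"),
    }

    def dispatch(command):
        """Run the handler for a recognized command; return False if unknown."""
        handler = handlers.get(command.strip().lower())
        if handler is None:
            return False
        handler()
        return True

    return dispatch, log


dispatch, log = make_dispatcher()
dispatch("Keep")
dispatch("Record")
dispatch("Stop")
print(log)
```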
Patent Bolt presents a detailed summary of patent applications with associated graphics for journalistic news purposes as each such patent application is revealed by the U.S. Patent & Trademark Office. Readers are cautioned that the full text of any patent application should be read in its entirety for full and accurate details. Revelations found in patent applications shouldn't be interpreted as rumor or fast-tracked according to rumor timetables.