UMS Transcribe - Usage Manual | UniMRCP Documentation

Name	Unit	Description
license-file	File path	Specifies the license file. The file name may include patterns containing the '*' sign. If multiple files match the pattern, the most recent one is used.
credentials-file	File path	Specifies the AWS credentials file to use. The file name may include patterns containing the '*' sign. If multiple files match the pattern, the most recent one is used.
credentials-provider	String	Specifies a credentials provider. If not initialized or set to custom, the custom credentials provider is used to read credentials from credentials-file. If set to default, the AWS default credentials provider chain is used, and credentials-file is not observed. If set to sts, the AWS STS profile credentials provider is used, and credentials-file is not observed.
init-sdk	Boolean	Specifies whether to initialize the AWS SDK when the plugin is loaded. Must be set to true in the default setup. Set it to false if another plugin using the same AWS SDK is loaded before this plugin.
shutdown-sdk	Boolean	Specifies whether to shut down the AWS SDK when the plugin is unloaded. Must be set to true in the default setup. Set it to false if another plugin using the same AWS SDK is unloaded after this plugin.
sdk-log-level	Integer	Specifies a log level of AWS SDK. If not initialized or set to 0, the SDK logs are disabled. Acceptable values are from 0 (OFF) to 6 (TRACE).
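
The pattern rule described for license-file and credentials-file can be illustrated with a short Python sketch. It assumes that "most recent" means the latest modification time, which the table does not state. The function name resolve_latest is hypothetical and is not part of the plugin.

```python
import glob
import os
from typing import Optional


def resolve_latest(pattern: str) -> Optional[str]:
    """Return the most recently modified file matching pattern, or None.

    Assumption: "most recent" is judged by file modification time.
    """
    # Keep regular files only; a pattern may also match directories.
    matches = [p for p in glob.glob(pattern) if os.path.isfile(p)]
    if not matches:
        return None
    return max(matches, key=os.path.getmtime)
```

Under this reading, a newly issued file that matches the same pattern would take effect the next time the pattern is resolved, without a change to the configured value.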

Name	Unit	Description
streaming-recognition	String	Specifies recognition parameters of streaming recognition.
results	String	Specifies parameters of recognition results set in RECOGNITION-COMPLETE events.
speech-contexts	String	Contains a list of speech contexts.
speech-dtmf-input-detector	String	Specifies parameters of the speech and DTMF input detector.
utterance-manager	String	Specifies parameters of the utterance manager.
rdr-manager	String	Specifies parameters of the Recognition Details Record (RDR) manager.
monitoring-agent	String	Specifies parameters of the monitoring agent.
license-server	String	Specifies parameters used to connect to the license server. The use of the license server is optional.

Name	Unit	Description
language	String	Specifies the default language to use, if not set by the client. For a list of supported languages, visit https://docs.aws.amazon.com/transcribe/latest/dg/websocket.html
single-utterance	Boolean	Specifies whether to detect a single spoken utterance or perform continuous recognition.
interim-results	Boolean	Specifies whether to request interim results or not.
start-of-input	String	Specifies the source of the start-of-input event sent to the client (use "service-originated" for an event triggered by the first received interim result and "internal" for an event determined by the plugin).
max-alternatives	Integer	Specifies the maximum number of speech recognition result alternatives to be returned. Can be overridden by the client by means of the header field N-Best-List-Length.
alternatives-below-threshold	Boolean	Specifies whether to return speech recognition result alternatives with a confidence score below the confidence threshold.
skip-unsupported-grammars	Boolean	Specifies whether to skip a malformed or unsupported grammar or to raise an error when it is referenced.
transcription-grammar	String	Specifies the name of the built-in speech transcription grammar. The grammar can be referenced as builtin:speech/transcribe or builtin:grammar/transcribe, where transcribe is the default value of this parameter.
inter-result-timeout	Time interval [msec]	Specifies a timeout between interim results containing transcribed speech. If the timeout elapses, input is considered complete. The timeout defaults to 0 (disabled).
region	String	Specifies the AWS region.
vocabulary-name	String	Specifies an optional custom vocabulary. https://docs.aws.amazon.com/transcribe/latest/dg/how-vocabulary.html
vocabulary-filter-name	String	Specifies a custom vocabulary filter name. Available since 1.3.0.
vocabulary-filter-method	String	Specifies a vocabulary filter method (one of "remove", "mask", "tag"). Available since 1.3.0.
language-model-name	String	Specifies a custom language model name. Available since 1.3.0.
show-speaker-label	Boolean	Specifies whether to enable speaker identification. Available since 1.3.0.
enable-partial-results-stabilization	Boolean	Specifies whether to enable partial results stabilization. Available since 1.3.0.
partial-results-stability	String	Specifies a partial results stability level (one of "high", "medium", "low"). Available since 1.3.0.
grammar-param-separator	Char	Specifies a separator of optional parameters passed to a built-in grammar. The separator defaults to ';'. Available since 1.3.0.
proxy-host	String	Specifies the host name of HTTP proxy, if used.
proxy-port	Integer	Specifies the port number of HTTP proxy, if used.
proxy-username	String	Specifies the username employed for HTTP proxy authentication, if used.
proxy-password	String	Specifies the password employed for HTTP proxy authentication, if used.
proxy-scheme	String	Specifies the URI scheme of HTTP proxy, if used. One of "http" or "https". Defaults to "http". Available since 1.4.0.
sort-alternatives	Boolean	Specifies whether to sort speech recognition result alternatives to ensure the order based on confidence score (the highest first). Available since 1.5.0.
service-uri	String	Specifies a custom service endpoint. Available since 1.9.0.
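
The interplay of max-alternatives, alternatives-below-threshold and sort-alternatives can be sketched in Python as follows. This is an illustration only: the order of the steps, the >= comparison, the default argument values and the name select_alternatives are assumptions, and the confidence threshold is passed in as a plain argument.

```python
from typing import List, Tuple

Alternative = Tuple[str, float]  # (transcript, confidence in the range 0..1)


def select_alternatives(
    alternatives: List[Alternative],
    confidence_threshold: float,
    max_alternatives: int = 1,
    alternatives_below_threshold: bool = False,
    sort_alternatives: bool = True,
) -> List[Alternative]:
    """Pick the alternatives to return (assumed order: sort, filter, truncate)."""
    result = list(alternatives)
    if sort_alternatives:
        # Highest confidence first; sorted() is stable, so ties keep their order.
        result = sorted(result, key=lambda alt: alt[1], reverse=True)
    if not alternatives_below_threshold:
        # Drop alternatives scoring below the confidence threshold.
        result = [alt for alt in result if alt[1] >= confidence_threshold]
    return result[:max_alternatives]
```

For example, with alternatives scored 0.4, 0.9 and 0.6, a threshold of 0.5 and max_alternatives set to 2, the sketch returns the 0.9 and 0.6 entries in that order.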

Name	Unit	Description
format	String	Specifies the format of results to be returned to the client (use "standard" for NLSML and "json" for JSON).
indent	Integer	Specifies the indent to use while composing the results.
confidence-format	String	Specifies the format of the confidence score to be returned (use "auto" for a format based on protocol version, "mrcpv2" for a float value in the range of 0..1, "mrcpv1" for an integer value in the range of 0..100).
tag-format	String	Specifies the format of the instance element to be returned. The parameter is observed only if the format is set to standard. Use one of: "default" for the original behavior, "semantics/json" for results received from the Transcribe API represented in JSON. Available since 1.6.0.
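
The three confidence-format values can be illustrated with a small Python sketch. The rounding used for the mrcpv1 integer and the function name format_confidence are assumptions; the table only states the target ranges.

```python
from typing import Union


def format_confidence(score: float, confidence_format: str = "auto",
                      mrcp_version: int = 2) -> Union[float, int]:
    """Render a 0..1 confidence score according to confidence-format."""
    if confidence_format == "auto":
        # "auto" follows the protocol version of the session.
        confidence_format = "mrcpv2" if mrcp_version == 2 else "mrcpv1"
    if confidence_format == "mrcpv2":
        return float(score)  # float value in the range 0..1
    if confidence_format == "mrcpv1":
        # Integer value in the range 0..100; rounding is an assumption.
        return int(round(score * 100))
    raise ValueError(f"unknown confidence-format: {confidence_format}")
```

A score of 0.9 is thus returned as 0.9 for "mrcpv2" and as 90 for "mrcpv1".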

Name	Unit	Description
id	String	Specifies a unique string identifier of the speech context to be referenced by the MRCP client.
enable	Boolean	Specifies whether the speech context is enabled or disabled.
speech-complete	Boolean	Specifies whether to complete input as soon as an interim result matches one of the specified phrases.
language	String	The language the phrases are defined for.
scope	String	Specifies a scope of the speech context, which can be set to either hint or strict.

Name	Unit	Description
tag	String	Specifies an optional arbitrary string identifier to be returned as an instance in the NLSML result, if the transcription result matches the phrase.

Name	Unit	Description
save-waveforms	Boolean	Specifies whether to save waveforms or not.
purge-existing	Boolean	Specifies whether to delete existing records on start-up.
max-file-age	Time interval [min]	Specifies a time interval in minutes after expiration of which a waveform is deleted. Set 0 for infinite.
max-file-count	Integer	Specifies the max number of waveforms to store. If reached, the oldest waveform is deleted. Set 0 for infinite.
waveform-base-uri	String	Specifies the base URI used to compose an absolute waveform URI.
waveform-folder	Dir path	Specifies a folder the waveforms should be stored in.
file-prefix	String	Specifies a prefix used to compose the name of the file to be stored. Defaults to 'umstranscribe-', if not specified.
use-logging-tag	Boolean	Specifies whether to use the MRCP header field Logging-Tag, if present, to compose the name of the file to be stored.

Name	Unit	Description
save-records	Boolean	Specifies whether to save recognition details records or not.
purge-existing	Boolean	Specifies whether to delete existing records on start-up.
max-file-age	Time interval [min]	Specifies a time interval in minutes after expiration of which a record is deleted. Set 0 for infinite.
max-file-count	Integer	Specifies the max number of records to store. If reached, the oldest record is deleted. Set 0 for infinite.
record-folder	Dir path	Specifies a folder to store recognition details records in. Defaults to ${UniMRCPInstallDir}/var.
file-prefix	String	Specifies a prefix used to compose the name of the file to be stored. Defaults to 'umstranscribe-', if not specified.
use-logging-tag	Boolean	Specifies whether to use the MRCP header field Logging-Tag, if present, to compose the name of the file to be stored.
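
The max-file-age and max-file-count limits, which apply to both waveforms and recognition details records, can be sketched as a single retention pass. This is a simplified model and not the plugin's implementation: it takes file ages as input instead of reading a folder, it checks the count after the fact instead of at the moment a new file is stored, and the name files_to_delete is hypothetical.

```python
from typing import List, Tuple

FileInfo = Tuple[str, float]  # (file name, age in minutes)


def files_to_delete(files: List[FileInfo], max_file_age: float = 0,
                    max_file_count: int = 0) -> List[str]:
    """Return the names a retention pass would delete; 0 means no limit."""
    # Oldest first, so the count limit removes the oldest entries.
    by_age = sorted(files, key=lambda f: f[1], reverse=True)
    doomed = [name for name, age in by_age
              if max_file_age and age > max_file_age]
    kept = [(name, age) for name, age in by_age if name not in doomed]
    if max_file_count and len(kept) > max_file_count:
        excess = len(kept) - max_file_count
        doomed += [name for name, _ in kept[:excess]]
    return doomed
```

With both limits left at 0 the function returns an empty list, matching the "Set 0 for infinite" wording in the tables above.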

Name	Unit	Description
refresh-period	Time interval [sec]	Specifies a time interval in seconds used to periodically refresh usage details. See `<usage-refresh-handler>`.

Name	Unit	Description
enable	Boolean	Specifies whether the use of license server is enabled or not. If enabled, the license-file attribute is not honored.
server-address	String	Specifies the IP address or host name of the license server.
certificate-file	File path	Specifies the client certificate used to connect to the license server. The file name may include patterns containing the '*' sign. If multiple files match the pattern, the most recent one is used.
ca-file	File path	Specifies the certificate authority used to validate the license server.
channel-count	Integer	Specifies the number of channels to check out from the license server. If not specified or set to 0, either all available channels or a pool of channels will be checked out, based on the configuration of the license server.
http-proxy-address	String	Specifies the IP address or host name of the HTTP proxy server, if used.
http-proxy-port	Integer	Specifies the port number of the HTTP proxy server, if used.
security-level	Integer	Specifies the SSL security level, which defaults to 1. Applicable since OpenSSL 1.1.0. Available since 1.6.0.

Name	Unit	Description
aws-credentials-file	File path	Specifies the AWS credentials file to use. The file name may include patterns containing the '*' sign. If multiple files match the pattern, the most recent one is used. Available since 1.1.0.
aws-credentials-provider	String	Specifies a credentials provider. Use one of: "custom" for credentials read from the specified file, "default" for the AWS default credentials provider chain, "sts" for the AWS STS profile credentials provider. Available since 1.1.0.
aws-credentials-profile	String	Specifies a credentials profile to reference and/or create. Available since 1.1.0.
aws-credentials-duration	Integer	Specifies a lifetime of the credentials profile to create. Available since 1.1.0.
aws-arn-role	String	Specifies an ARN role. Available since 1.1.0.
aws-region	String	Specifies an AWS region. Available since 1.1.0.

Sensitivity-Level	Vad-Mode
[0.00 ... 0.25)	0
[0.25 ... 0.50)	1
[0.50 ... 0.75)	2
[0.75 ... 1.00]	3
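
The mapping in the table above amounts to splitting the Sensitivity-Level range into four equal intervals, with the top interval closed at 1.00. A minimal Python sketch, assuming the level has already been parsed into a float (the function name vad_mode is hypothetical):

```python
def vad_mode(sensitivity_level: float) -> int:
    """Map a Sensitivity-Level in 0.0..1.0 to a Vad-Mode in 0..3."""
    if not 0.0 <= sensitivity_level <= 1.0:
        raise ValueError("Sensitivity-Level must be within 0.0..1.0")
    # Each mode covers a quarter of the range; 1.00 belongs to the top interval.
    return min(int(sensitivity_level * 4), 3)
```

Each lower bound in the table is inclusive, so a level of exactly 0.25 already selects mode 1.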

Name	Unit	Description
start-of-input	String	Specifies the source of the start-of-input event sent to the client (use "service-originated" for an event triggered by the first received interim result and "internal" for an event determined by the plugin).
alternatives-below-threshold	Boolean	Specifies whether to return speech recognition result alternatives with the confidence score below the confidence threshold.
single-utterance	Boolean	Specifies whether to detect a single spoken utterance or perform continuous recognition.
speech-start-timeout	Time interval [msec]	Specifies how long to wait in transition mode before triggering a start of speech input event.
inter-result-timeout	Time interval [msec]	Specifies a timeout between interim results containing transcribed speech. If the timeout elapses, input is considered complete.
vocabulary-name	String	Specifies an optional custom vocabulary (https://docs.aws.amazon.com/transcribe/latest/dg/how-vocabulary.html). Available since 1.3.0.
vocabulary-filter-name	String	Specifies a custom vocabulary filter name. Available since 1.3.0.
vocabulary-filter-method	String	Specifies a vocabulary filter method (one of "remove", "mask", "tag"). Available since 1.3.0.
language-model-name	String	Specifies a custom language model name. Available since 1.3.0.
show-speaker-label	Boolean	Specifies whether to enable speaker identification. Available since 1.3.0.
enable-partial-results-stabilization	Boolean	Specifies whether to enable partial results stabilization. Available since 1.3.0.
partial-results-stability	String	Specifies a partial results stability level (one of "high", "medium", "low"). Available since 1.3.0.
grammar-param-separator	Char	Specifies a separator of optional parameters passed to a built-in grammar. The separator defaults to ';'. Available since 1.3.0.
service-uri	String	Specifies a custom service endpoint. Available since 1.9.0.
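
To show what grammar-param-separator controls, the following Python sketch splits a built-in grammar reference into its name and optional parameters. The layout assumed here, with parameters following a '?' as key=value pairs, is an assumption made for illustration and is not spelled out in the tables above; the function name parse_builtin_grammar is hypothetical.

```python
from typing import Dict, Tuple


def parse_builtin_grammar(uri: str,
                          separator: str = ";") -> Tuple[str, Dict[str, str]]:
    """Split a built-in grammar reference into (grammar name, parameters).

    Assumed layout: builtin:speech/<name>?<key>=<value><separator>...
    """
    path, _, query = uri.partition("?")
    name = path.rsplit("/", 1)[-1]
    params: Dict[str, str] = {}
    for item in query.split(separator):
        if not item:
            continue  # no parameters, or a trailing separator
        key, _, value = item.partition("=")
        params[key.strip()] = value.strip()
    return name, params
```

Under these assumptions, parse_builtin_grammar("builtin:speech/transcribe?language=en-US;single-utterance=true") yields the grammar name transcribe and the two parameters, and passing a different separator covers a non-default grammar-param-separator.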

¶ 1 Overview

¶ 1.1 Installation

¶ 1.2 Applicable Versions

¶ 2 Supported Features

¶ 2.1 MRCP Methods

¶ 2.2 MRCP Events

¶ 2.3 MRCP Header Fields

¶ 2.4 Grammars

¶ 2.5 Results

¶ 3 Configuration Format

¶ 3.1 Document

¶ 3.2 Streaming Recognition

¶ 3.3 Results

¶ 3.4 Speech Contexts

¶ 3.5 Speech Context

¶ 3.6 Phrase

¶ 3.7 Utterance Manager

¶ 3.8 RDR Manager

¶ 3.9 Monitoring Agent

¶ 3.10 Usage Change Handler

¶ 3.11 Usage Refresh Handler

¶ 3.12 License Server

¶ 3.13 Credentials Profiles

¶ 4 Configuration Steps

¶ 4.1 Using Default Configuration

¶ 4.2 Using with Polly

¶ 4.3 Specifying AWS Credentials

¶ 4.4 Specifying Recognition Language

¶ 4.5 Specifying Sampling Rate

¶ 4.6 Specifying Speech Input Parameters

¶ 4.7 Specifying DTMF Input Parameters

¶ 4.8 Specifying No-Input and Recognition Timeouts

¶ 4.9 Specifying Vendor Specific Parameters

¶ 4.10 Maintaining Utterances

¶ 4.11 Maintaining Recognition Details Records

¶ 5 Recognition Grammars and Results

¶ 5.1 Using Built-in Speech Grammar

¶ 5.2 Using Built-in DTMF Grammars

¶ 5.3 Retrieving Results

¶ 6 Monitoring Usage Details

¶ 6.1 Log Usage

¶ 6.2 Update Usage

¶ 6.3 Dump Channels

¶ 7 Usage Examples

¶ 7.1 Speech Recognition

¶ 7.2 DTMF Recognition

¶ 7.3 Speech and DTMF Recognition

¶ 8 Sequence Diagram

¶ 8.1 MRCPv1

¶ 8.2 MRCPv2

¶ 9 References

¶ 9.1 AWS Transcribe

¶ 9.2 Specifications