Speech Analytics

From RecordingWiki

Search, Analyze, and Act on Captured Information

Speech Analytics is a term used to describe automatic methods of analyzing speech to extract useful information about the speech content or the speakers. Although it often includes elements of automatic speech recognition, where the identities of spoken words or phrases are determined, it may also include analysis of one or more of the following:

  • the topic(s) being discussed
  • the identities of the speaker(s)
  • the genders of the speakers
  • the emotional character of the speech
  • the amount and locations of speech versus non-speech (e.g. background noise or silence)

One use of speech analytics applications is to spot spoken keywords or phrases, either as real-time alerts on live audio or as a post-processing step on recorded speech. This technique is also known as audio mining. Other uses include categorization of speech, for example in the contact center environment, to identify calls from unsatisfied customers. Speech analytics technology may combine results from different techniques to achieve its aims. For example, knowledge of where certain keywords were spoken in a customer telephone conversation could be combined with knowledge of which speaker (customer or contact center agent) spoke the words, and perhaps with knowledge of how often the two speakers were talking at the same time.
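The combination described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Turn` record, the function names, and the sample timings are hypothetical, and a real system would obtain speaker turns and keyword times from its own diarization and keyword-spotting components.

```python
from dataclasses import dataclass


@dataclass
class Turn:
    """One stretch of speech by one speaker (hypothetical record layout)."""
    speaker: str   # e.g. "agent" or "customer"
    start: float   # seconds from the start of the call
    end: float


def speakers_at(turns, t):
    """Return the sorted names of everyone talking at time t."""
    return sorted({turn.speaker for turn in turns if turn.start <= t < turn.end})


def overlap_seconds(turns, a, b):
    """Total time during which speakers a and b talk simultaneously.

    Assumes turns belonging to the same speaker do not overlap each other.
    """
    total = 0.0
    for x in turns:
        if x.speaker != a:
            continue
        for y in turns:
            if y.speaker != b:
                continue
            total += max(0.0, min(x.end, y.end) - max(x.start, y.start))
    return total


# Made-up example data: speaker turns plus keyword hits as (keyword, time).
turns = [Turn("agent", 0.0, 5.0), Turn("customer", 4.0, 9.0), Turn("agent", 9.0, 12.0)]
keyword_hits = [("cancel", 6.0), ("refund", 4.5)]

for keyword, time in keyword_hits:
    print(keyword, "spoken while talking:", speakers_at(turns, time))
print("overlap (s):", overlap_seconds(turns, "agent", "customer"))
```

A hit that falls inside an overlap is attributed to both speakers here; deciding between them would need extra information such as separate audio channels.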

Speech Analytics in contact centers can be used to extract critical business intelligence that would otherwise be lost. By analyzing and categorizing recorded phone conversations between companies and their customers, useful information can be discovered relating to strategy, product, process, and operational issues. This information gives decision-makers insight into what customers really think about their company so that they can quickly react.

Business Value

Competitive advantage often depends on anticipating market needs faster and more visibly than your competitors, and nothing can tell you more about your business than the voice of your customers. Speech Analytics provides advanced functionality that can help you glean valuable intelligence from thousands, even millions, of customer calls, so you can take action quickly. Although your contact center records customer conversations, the sheer number of recordings can easily exceed your ability to review and analyze them. A Speech Analytics solution can mine recorded customer interactions to surface the intelligence essential for building effective cost containment and customer service strategies. Used in combination with other Workforce Optimization suite components such as Quality Monitoring, a Recording solution, and Scorecards, Speech Analytics can help you pinpoint cost drivers, trends, and opportunities; identify strengths and weaknesses in processes and products; and understand how your offerings are perceived by the marketplace. With Speech Analytics, you can turn captured interactions into actionable intelligence for your entire enterprise.

Speech Analytics is designed with the business user in mind. It can usually provide automated trend analysis to show what is happening in your contact center. The solution can isolate the words and phrases used most frequently within a given time period and indicate whether their usage is trending up or down. This information makes it easy for supervisors, analysts, and others in your organization to spot changes in consumer behavior and take action to reduce call volumes and increase customer satisfaction.
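As a rough illustration of this kind of trend analysis, the sketch below counts single words in two batches of call transcripts and reports whether each of the most frequent current terms is up, down, or flat. The function name `term_trends` and the sample transcripts are hypothetical; a real product would work on phrases, normalize by call volume, and filter out common words.

```python
from collections import Counter


def term_trends(previous_calls, current_calls, top_n=3):
    """Compare raw word counts between two periods (a simplified sketch).

    Returns (term, count_before, count_now, direction) for the top_n
    most frequent terms of the current period.
    """
    before = Counter(w for call in previous_calls for w in call.lower().split())
    now = Counter(w for call in current_calls for w in call.lower().split())
    trends = []
    for term, count in now.most_common(top_n):
        earlier = before[term]  # a Counter returns 0 for unseen terms
        if count > earlier:
            direction = "up"
        elif count < earlier:
            direction = "down"
        else:
            direction = "flat"
        trends.append((term, earlier, count, direction))
    return trends


# Made-up transcripts for two consecutive periods.
last_week = ["refund please", "billing question"]
this_week = ["refund refund now", "refund billing"]

for row in term_trends(last_week, this_week):
    print(row)
```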

Technology

There are four main approaches "under the hood": the phonetic approach, keyword spotting (KWS), the grammar-based approach, and Large-Vocabulary Continuous Speech Recognition (LVCSR, better known as full transcription). Some Speech Analytics vendors use a third-party engine, while others have developed their own proprietary engine.

Phonetic

This is the fastest approach for processing, mostly because the size of the grammar is very small. The basic recognition unit is a phoneme. Most languages have only a few tens of unique phonemes, and the output of this recognition is a stream (text) of phonemes.
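To make the idea of a phoneme stream concrete, the sketch below searches such a stream for a query term. Everything in it is an assumption for illustration: the ARPAbet-like symbols, the tiny `PRONUNCIATIONS` table, and the exact-match search. Real phonetic engines tolerate recognition errors with approximate matching rather than requiring an exact sequence.

```python
# Hypothetical pronunciation table using ARPAbet-like phoneme symbols.
PRONUNCIATIONS = {
    "refund": ["R", "IY", "F", "AH", "N", "D"],
    "cancel": ["K", "AE", "N", "S", "AH", "L"],
}


def find_phoneme_matches(stream, query):
    """Return every start index where the query phonemes occur in the stream.

    Exact matching only; this is a simplification of what a real engine does.
    """
    n = len(query)
    return [i for i in range(len(stream) - n + 1) if stream[i:i + n] == query]


# A made-up phoneme stream standing in for the recognizer output.
stream = "AY W AA N T AH R IY F AH N D".split()

print(find_phoneme_matches(stream, PRONUNCIATIONS["refund"]))
print(find_phoneme_matches(stream, PRONUNCIATIONS["cancel"]))
```

Because the search runs over the stored phoneme text, a new query term only needs a pronunciation entry in this sketch, not another pass over the audio.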

KWS (Key Word Spotting)

Processing is slower. The basic unit is a word, and the matching between the phonemes in the audio and the phonetic representations of a predefined word list is done during processing. The approach is limited because adding words to the search requires re-processing all of the audio. The iterative discovery process is therefore not on-line and takes a long time to converge. Pre-defining terms also creates a business problem: you cannot surface new business issues or changes in customer or competitor behaviour.

LVCSR (Large-Vocabulary Continuous Speech Recognition)

Processing is much slower: since the basic unit is a set of words (bi-grams, tri-grams, etc.), the engine needs hundreds of thousands of words to match the audio against. The output, however, is a stream of words, which makes it richer to work with. It can surface new business issues, queries are much faster, and accuracy is higher than with any of the other methods. Most importantly, because the complete semantic context is in the index, you can find and focus on business issues very rapidly.
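One plausible reason queries are fast once transcripts exist is that the word stream can be stored in an inverted index. The sketch below is a minimal positional index with phrase lookup; the function names, call identifiers, and transcripts are all hypothetical, and it is not a description of any particular vendor's index.

```python
from collections import defaultdict


def build_index(transcripts):
    """Map each word to the (call_id, position) pairs where it occurs."""
    index = defaultdict(list)
    for call_id, text in transcripts.items():
        for position, word in enumerate(text.lower().split()):
            index[word].append((call_id, position))
    return index


def find_phrase(index, phrase):
    """Return the sorted call ids whose transcript contains the exact phrase."""
    words = phrase.lower().split()
    if not words:
        return []
    # Start from every occurrence of the first word, then keep only those
    # followed by the remaining words at consecutive positions.
    candidates = set(index.get(words[0], []))
    for offset, word in enumerate(words[1:], start=1):
        occurrences = set(index.get(word, []))
        candidates = {(c, p) for (c, p) in candidates if (c, p + offset) in occurrences}
    return sorted({call_id for call_id, _ in candidates})


# Made-up transcripts standing in for LVCSR output.
transcripts = {
    "call-1": "I want to cancel my account",
    "call-2": "please cancel the order",
    "call-3": "my account is locked",
}
index = build_index(transcripts)

print(find_phrase(index, "cancel my account"))
print(find_phrase(index, "my account"))
```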

Quality

The best LVCSR engines may reach a word error rate (WER) of about 50%, meaning that roughly half of the words are recognized correctly. This is considered top-of-the-line performance by today's standards, and it may still provide more than enough accuracy for statistical analytics. The quality of phonetic engines is considerably lower. Although a like-for-like comparison is difficult, it is roughly equivalent to a word accuracy of about 20%.
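For reference, WER is conventionally computed as the number of word substitutions, deletions, and insertions needed to turn the recognizer output into the reference transcript, divided by the number of words in the reference. The sketch below implements that definition with a standard edit-distance calculation; the function name and sample sentences are illustrative, and whitespace splitting stands in for proper text normalization.

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / words in reference."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # Word-level Levenshtein distance, keeping only the previous row.
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            current.append(min(
                previous[j] + 1,                       # deletion
                current[j - 1] + 1,                    # insertion
                previous[j - 1] + substitution_cost,   # match or substitution
            ))
        previous = current
    return previous[-1] / len(ref)


# One deleted word ("to") and one substituted word ("my" -> "the")
# out of six reference words.
print(word_error_rate("i want to cancel my account", "i want cancel the account"))
```

Because insertions count as errors, WER can exceed 100%, so it is not simply one minus the fraction of words recognized correctly.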
