All Implemented Interfaces:
com.sun.jna.NativeMapped, java.lang.AutoCloseable
public class Recognizer extends PointerType implements AutoCloseable
-
-
Nested Class Summary
Nested Classes
Modifier and Type    Class                        Description
public class         Recognizer.EndpointerMode    Endpointer delay mode
-
Constructor Summary
Constructors
Recognizer(Model model, float sampleRate)
    Creates the recognizer object.
Recognizer(Model model, float sampleRate, SpeakerModel spkModel)
    Creates the recognizer object with speaker recognition.
Recognizer(Model model, float sampleRate, String grammar)
    Creates the recognizer object with the phrase list. To improve recognition accuracy when you don't need to recognize a large vocabulary, you can specify a list of phrases to recognize.
-
Method Summary
Modifier and Type, Method
    Description
void setMaxAlternatives(int maxAlternatives)
    Configures recognizer to output n-best results.
void setWords(boolean words)
    Enables words with times in the output.
void setPartialWords(boolean partial_words)
    Like setWords, but returns words and confidences in partial results.
void setSpeakerModel(SpeakerModel spkModel)
    Adds a speaker model to an already initialized recognizer. Helps to initialize speaker recognition for a grammar-based recognizer.
boolean acceptWaveForm(Array<byte> data, int len)
    Accepts and processes a new chunk of voice data.
boolean acceptWaveForm(Array<short> data, int len)
boolean acceptWaveForm(Array<float> data, int len)
String getResult()
    Returns the speech recognition result.
String getPartialResult()
    Returns the partial speech recognition result.
String getFinalResult()
    Returns the speech recognition result.
void setGrammar(String grammar)
    Reconfigures recognizer to use grammar.
void reset()
    Resets the recognizer. Resets current results so the recognition can continue from scratch.
void setEndpointerMode(int mode)
    Configures endpointer mode for the recognizer.
void setEndpointerDelays(float t_start_max, float t_end, float t_max)
    Sets endpointer delays.
void close()
    Releases the recognizer object. The underlying model is also unreferenced and, if needed, released.
Methods inherited from class com.sun.jna.PointerType
equals, fromNative, getPointer, hashCode, nativeType, setPointer, toNative, toString
Methods inherited from class java.lang.AutoCloseable
close
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
-
-
Constructor Detail
-
Recognizer
Recognizer(Model model, float sampleRate)
Creates the recognizer object.
- Parameters:
model - VoskModel containing static data for recognizer.
sampleRate - The sample rate of the audio you are going to feed into the recognizer. Make sure this rate matches the audio content; a mismatch is a common cause of accuracy problems.
-
Recognizer
Recognizer(Model model, float sampleRate, SpeakerModel spkModel)
Creates the recognizer object with speaker recognition.
- Parameters:
model - VoskModel containing static data for recognizer.
sampleRate - The sample rate of the audio you are going to feed into the recognizer. Make sure this rate matches the audio content; a mismatch is a common cause of accuracy problems.
spkModel - Speaker model for speaker identification.
-
Recognizer
Recognizer(Model model, float sampleRate, String grammar)
Creates the recognizer object with the phrase list. To improve recognition accuracy when you don't need to recognize a large vocabulary, you can specify a list of phrases to recognize.
- Parameters:
model - VoskModel containing static data for recognizer.
sampleRate - The sample rate of the audio you are going to feed into the recognizer. Make sure this rate matches the audio content; a mismatch is a common cause of accuracy problems.
grammar - The string with the list of phrases to recognize as a JSON array of strings, for example ["one two three four five", "[unk]"].
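The grammar argument is a JSON array of strings, so assembling it by hand invites quoting mistakes. The following is a minimal sketch of one way to build it from a list of phrases. GrammarBuilder, quote and toGrammar are hypothetical helper names, not part of the documented API, and the escaping covers only backslashes and double quotes.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper (not part of the documented API): builds the JSON array
// of strings that the grammar parameter is documented to expect.
class GrammarBuilder {

    // Escapes backslashes and double quotes. This is enough for plain phrases
    // that contain no control characters.
    static String quote(String phrase) {
        return "\"" + phrase.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    // Joins the quoted phrases into a JSON array of strings.
    static String toGrammar(List<String> phrases) {
        return phrases.stream()
                .map(GrammarBuilder::quote)
                .collect(Collectors.joining(", ", "[", "]"));
    }

    public static void main(String[] args) {
        String grammar = toGrammar(List.of("one two three four five", "[unk]"));
        System.out.println(grammar);
        // The string would then be passed as the third constructor argument:
        // new Recognizer(model, sampleRate, grammar)
    }
}
```

An empty phrase list yields "[]", which setGrammar documents as the value that selects the default model graph.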
-
-
Method Detail
-
setMaxAlternatives
void setMaxAlternatives(int maxAlternatives)
Configures recognizer to output n-best results.
    {
      "alternatives": [
        { "text": "one two three four five", "confidence": 0.97 },
        { "text": "one two three for five", "confidence": 0.03 }
      ]
    }
- Parameters:
maxAlternatives - maximum number of alternatives to return in recognition results
-
setWords
void setWords(boolean words)
Enables words with times in the output.
    "result" : [{
        "conf" : 1.000000,
        "end" : 1.110000,
        "start" : 0.870000,
        "word" : "what"
      }, {
        "conf" : 1.000000,
        "end" : 1.530000,
        "start" : 1.110000,
        "word" : "zero"
      }, {
        "conf" : 1.000000,
        "end" : 1.950000,
        "start" : 1.530000,
        "word" : "zero"
      }, {
        "conf" : 1.000000,
        "end" : 2.340000,
        "start" : 1.950000,
        "word" : "zero"
      }, {
        "conf" : 1.000000,
        "end" : 2.610000,
        "start" : 2.340000,
        "word" : "one"
      }],
- Parameters:
words - boolean value
-
setPartialWords
void setPartialWords(boolean partial_words)
Like setWords, but returns words and confidences in partial results.
- Parameters:
partial_words - boolean value
-
setSpeakerModel
void setSpeakerModel(SpeakerModel spkModel)
Adds a speaker model to an already initialized recognizer. Helps to initialize speaker recognition for a grammar-based recognizer.
- Parameters:
spkModel- Speaker recognition model
-
acceptWaveForm
boolean acceptWaveForm(Array<byte> data, int len)
Accept and process new chunk of voice data.
- Parameters:
data - audio data in PCM 16-bit mono format
len - length of the audio data
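Since the data parameter is PCM 16-bit mono, each sample occupies two bytes. The sketch below shows how a raw byte chunk maps to 16-bit samples and how long a chunk lasts at a given sample rate. PcmChunks and its methods are hypothetical names, and the little-endian byte order is an assumption (it is the usual layout of WAV data) that the text above does not state.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper (not part of the documented API) for 16-bit mono PCM.
class PcmChunks {

    // Decodes the first len bytes as 16-bit samples.
    // ASSUMPTION: bytes are little-endian; a trailing odd byte is ignored.
    static short[] toSamples(byte[] data, int len) {
        short[] samples = new short[len / 2];
        ByteBuffer.wrap(data, 0, len)
                .order(ByteOrder.LITTLE_ENDIAN)
                .asShortBuffer()
                .get(samples);
        return samples;
    }

    // Duration of a chunk in seconds: two bytes per sample, one channel.
    static double durationSeconds(int lenBytes, float sampleRate) {
        return (lenBytes / 2) / (double) sampleRate;
    }

    public static void main(String[] args) {
        byte[] chunk = {0x01, 0x00, (byte) 0xFF, 0x7F, 0x00, (byte) 0x80};
        short[] samples = toSamples(chunk, chunk.length);
        System.out.println(samples.length + " samples");
        System.out.println(durationSeconds(32000, 16000f) + " s");
    }
}
```

Checking the computed duration against the real length of a recording is a quick way to spot a sampleRate that does not match the audio content.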
-
acceptWaveForm
boolean acceptWaveForm(Array<short> data, int len)
-
acceptWaveForm
boolean acceptWaveForm(Array<float> data, int len)
-
getPartialResult
String getPartialResult()
Returns the partial speech recognition result.
-
getFinalResult
String getFinalResult()
Returns the speech recognition result. Same as result, but doesn't wait for silence. You usually call it at the end of the stream to get the final bits of audio. It flushes the feature pipeline so that all remaining audio chunks are processed.
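The methods above are typically combined into a streaming loop: feed chunks, read a result or a partial result after each one, and call getFinalResult() once at the end of the stream. The sketch below illustrates that call order only. SpeechSink and FakeSink are hypothetical stand-ins so the example runs without the native library, the audio parameter is written as byte[], and the reading of a true return value from acceptWaveForm as "a result is ready" is an assumption not stated on this page.

```java
import java.util.ArrayList;
import java.util.List;

class StreamingSketch {

    // Hypothetical stand-in that mirrors the documented method names.
    interface SpeechSink {
        boolean acceptWaveForm(byte[] data, int len);
        String getResult();
        String getPartialResult();
        String getFinalResult();
    }

    // Feeds every chunk in order and collects what the sink reports.
    // ASSUMPTION: true from acceptWaveForm means a result can be read.
    static List<String> transcribe(SpeechSink sink, List<byte[]> chunks) {
        List<String> out = new ArrayList<>();
        for (byte[] chunk : chunks) {
            if (sink.acceptWaveForm(chunk, chunk.length)) {
                out.add(sink.getResult());
            } else {
                out.add(sink.getPartialResult());
            }
        }
        // End of stream: flush whatever audio is still in the pipeline.
        out.add(sink.getFinalResult());
        return out;
    }

    // Fake used only for the demo: reports a result on every second chunk.
    static class FakeSink implements SpeechSink {
        private int chunksSeen = 0;

        public boolean acceptWaveForm(byte[] data, int len) {
            chunksSeen++;
            return chunksSeen % 2 == 0;
        }

        public String getResult() { return "result"; }
        public String getPartialResult() { return "partial"; }
        public String getFinalResult() { return "final"; }
    }

    public static void main(String[] args) {
        List<byte[]> chunks = List.of(new byte[4], new byte[4], new byte[4]);
        System.out.println(transcribe(new FakeSink(), chunks));
    }
}
```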
-
setGrammar
void setGrammar(String grammar)
Reconfigures recognizer to use grammar.
- Parameters:
grammar- Set of phrases in JSON array of strings or "[]" to use default model graph.
-
reset
void reset()
Resets the recognizer. Resets current results so the recognition can continue from scratch.
-
setEndpointerMode
void setEndpointerMode(int mode)
Configures endpointer mode for the recognizer.
-
setEndpointerDelays
void setEndpointerDelays(float t_start_max, float t_end, float t_max)
Sets endpointer delays.
- Parameters:
t_start_max - timeout for stopping recognition in case of initial silence (usually around 5)
t_end - timeout for stopping recognition in milliseconds after we recognized something (usually around 0.5 - 1)
t_max - timeout for forcing utterance end in milliseconds (usually around 20-30)
-
close
void close()
Releases the recognizer object. The underlying model is also unreferenced and, if needed, released.
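Because the class implements AutoCloseable, close() can be left to a try-with-resources block, which runs it even when an exception is thrown. The sketch below shows the pattern with NativeHandle, a hypothetical stand-in for a native-backed object such as Recognizer, so that it runs without the native library.

```java
class CloseSketch {

    // Hypothetical stand-in for an object that owns a native resource.
    static class NativeHandle implements AutoCloseable {
        private boolean open = true;

        boolean isOpen() {
            return open;
        }

        String getFinalResult() {
            if (!open) {
                throw new IllegalStateException("already closed");
            }
            return "{}";
        }

        @Override
        public void close() {
            open = false; // a real implementation would free native memory here
        }
    }

    // Uses the handle inside try-with-resources and returns it afterwards
    // so the caller can observe that close() has run.
    static NativeHandle useAndRelease() {
        NativeHandle seen;
        try (NativeHandle handle = new NativeHandle()) {
            seen = handle;
            handle.getFinalResult();
        }
        return seen;
    }

    public static void main(String[] args) {
        System.out.println("open after block: " + useAndRelease().isOpen());
    }
}
```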