Voice Activity Detection Node

Voice Activity Detection

Detects speech segments in audio. Download the Silero VAD model from: https://github.com/snakers4/silero-vad/raw/master/src/silero_vad/data/silero_vad.onnx
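
Fetching the model file can be scripted. The following is a minimal sketch using only the Python standard library; the helper name `download_model` and the default destination `silero_vad.onnx` are assumptions for illustration, and only the URL comes from this page.

```python
import os
import shutil
import urllib.request

# URL as given in the node description above.
SILERO_VAD_URL = (
    "https://github.com/snakers4/silero-vad/raw/master/"
    "src/silero_vad/data/silero_vad.onnx"
)


def download_model(dest="silero_vad.onnx", url=SILERO_VAD_URL):
    """Download the model to `dest` unless that path already exists.

    Hypothetical helper: the name and default path are not part of the node.
    """
    if os.path.exists(dest):
        return dest
    tmp = dest + ".part"
    # Write to a temporary name first so an interrupted download
    # never leaves a truncated file at the final path.
    with urllib.request.urlopen(url) as resp, open(tmp, "wb") as out:
        shutil.copyfileobj(resp, out)
    os.replace(tmp, dest)
    return dest
```

Calling `download_model()` once before loading the ONNX session should be enough; later calls return immediately when the file is already present.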

onnx_vad
Inputs: 6
Outputs: 3
Security exposure: 10/10
Package: onnx

Ratings

Scores range from 0 to 10. Higher values mean more impact, exposure, or operational weight.

No score metadata has been set for this node yet.

Input Pins (6)

Input

Execution
exec_in

Initiate Execution

Model

Struct
model

ONNX VAD Model

NodeOnnxSession (1 field)
session_ref (string, required)

Cache ID for Session

Schema enforced

Audio

Struct
audio

Input audio data

AudioData (4 fields)
sample_rate (integer:uint32, required)

Sample rate in Hz

format uint32, min 0
channels (integer:uint16, required)

Number of channels (1 = mono, 2 = stereo)

format uint16, min 0, max 65535
samples (Array<number:float>, required)

Audio samples (normalized to -1.0 to 1.0)

items: number:float (array item)
format float
duration_secs (number:float, required)

Duration in seconds

format float
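
To illustrate how the four AudioData fields relate, here is a minimal Python sketch that builds a matching record from raw samples. It assumes multi-channel samples are interleaved and that `duration_secs` equals frames per channel divided by `sample_rate`; neither is stated on this page. The class and the helper `make_audio_data` are hypothetical mirrors of the schema, not the node's own types.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class AudioData:
    sample_rate: int        # Hz, must fit in uint32
    channels: int           # 1 = mono, 2 = stereo, must fit in uint16
    samples: List[float]    # normalized to -1.0 .. 1.0
    duration_secs: float


def make_audio_data(samples: Sequence[float], sample_rate: int,
                    channels: int = 1) -> AudioData:
    """Build an AudioData record (hypothetical helper).

    Assumption: samples are interleaved, so the number of frames per
    channel is len(samples) // channels.
    """
    if not 0 < sample_rate <= 0xFFFFFFFF:
        raise ValueError("sample_rate must be a positive uint32")
    if not 0 < channels <= 0xFFFF:
        raise ValueError("channels must be a positive uint16")
    if len(samples) % channels != 0:
        raise ValueError("sample count must be a multiple of channels")
    # Clamp to the normalized range documented for `samples`.
    clipped = [max(-1.0, min(1.0, float(s))) for s in samples]
    frames = len(clipped) // channels
    return AudioData(sample_rate, channels, clipped, frames / sample_rate)
```

For example, 16000 mono samples at 16000 Hz yield a `duration_secs` of 1.0 under these assumptions.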

Threshold

Float
threshold

Speech probability threshold

Default 0.5

Min Speech

Integer
min_speech_ms

Minimum speech duration (ms)

Default 250

Min Silence

Integer
min_silence_ms

Minimum silence duration (ms)

Default 100
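
This page does not document exactly how the node combines these three parameters. As one plausible reading, the sketch below thresholds frame-level probabilities, bridges silences shorter than `min_silence_ms`, and then discards speech runs shorter than `min_speech_ms`. The function name `probabilities_to_segments`, the order of the merge and filter steps, and averaging confidence over bridged frames are all assumptions for illustration.

```python
from typing import Dict, List, Sequence, Tuple


def probabilities_to_segments(
    probs: Sequence[float],
    frame_duration: float,
    threshold: float = 0.5,
    min_speech_ms: float = 250,
    min_silence_ms: float = 100,
) -> List[Dict[str, float]]:
    """Turn per-frame speech probabilities into segments (assumed logic)."""
    # 1. Collect runs of consecutive frames at or above the threshold,
    #    as half-open frame index ranges [start, end).
    runs: List[Tuple[int, int]] = []
    start = None
    for i, p in enumerate(probs):
        if p >= threshold:
            if start is None:
                start = i
        elif start is not None:
            runs.append((start, i))
            start = None
    if start is not None:
        runs.append((start, len(probs)))

    # 2. Merge runs separated by a silence shorter than min_silence_ms.
    merged: List[Tuple[int, int]] = []
    for s, e in runs:
        gap_ms = (s - merged[-1][1]) * frame_duration * 1000.0 if merged else None
        if gap_ms is not None and gap_ms < min_silence_ms:
            merged[-1] = (merged[-1][0], e)
        else:
            merged.append((s, e))

    # 3. Keep only runs that last at least min_speech_ms.
    segments = []
    for s, e in merged:
        if (e - s) * frame_duration * 1000.0 >= min_speech_ms:
            segments.append({
                "start": s * frame_duration,
                "end": e * frame_duration,
                # Assumption: average over every frame in the span,
                # including bridged low-probability frames.
                "confidence": sum(probs[s:e]) / (e - s),
            })
    return segments
```

Under this reading, raising `threshold` makes detection stricter, raising `min_speech_ms` suppresses short bursts such as clicks, and raising `min_silence_ms` joins words separated by brief pauses into one segment.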

Output Pins (3)

Output

Execution
exec_out

Done

Result

Struct
result

VAD result

VadResult (3 fields)
segments (Array<SpeechSegment>, required)

Speech segments detected

items: SpeechSegment (array item)
  start (number:float, required)

  Start time in seconds

  format float
  end (number:float, required)

  End time in seconds

  format float
  confidence (number:float, required)

  Average confidence

  format float
probabilities (Array<number:float>, required)

Frame-level speech probabilities

items: number:float (array item)
format float
frame_duration (number:float, required)

Frame duration in seconds

format float
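
A consumer of this struct might mirror it with typed records. The sketch below assumes the result arrives as a JSON-like dictionary with the field names listed above, and that `probabilities` covers the whole input at `frame_duration` seconds per entry. The helpers `parse_vad_result` and `speech_ratio` are hypothetical and not part of the node.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass
class SpeechSegment:
    start: float       # seconds
    end: float         # seconds
    confidence: float  # average confidence


@dataclass
class VadResult:
    segments: List[SpeechSegment]
    probabilities: List[float]
    frame_duration: float  # seconds per probability entry


def parse_vad_result(raw: Dict[str, Any]) -> VadResult:
    """Convert a JSON-like dict into typed records (hypothetical helper)."""
    segments = [
        SpeechSegment(float(s["start"]), float(s["end"]), float(s["confidence"]))
        for s in raw["segments"]
    ]
    return VadResult(
        segments=segments,
        probabilities=[float(p) for p in raw["probabilities"]],
        frame_duration=float(raw["frame_duration"]),
    )


def speech_ratio(result: VadResult) -> float:
    """Fraction of the analyzed audio covered by speech segments.

    Assumption: total analyzed time is len(probabilities) * frame_duration.
    """
    total = len(result.probabilities) * result.frame_duration
    if total <= 0:
        return 0.0
    spoken = sum(seg.end - seg.start for seg in result.segments)
    return spoken / total
```

A ratio near 0 suggests mostly silence or noise, which can be a cheap gate before running a heavier step such as transcription.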

Segments

Generic
segments

Speech segments
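
A typical downstream use of the segments is cutting the speech portions out of the original samples. The sketch below assumes this pin carries the same list as `result.segments` (items with `start` and `end` in seconds) and that samples are interleaved across channels; both are assumptions, and `slice_speech` is a hypothetical helper.

```python
from typing import Dict, List, Sequence


def slice_speech(
    samples: Sequence[float],
    sample_rate: int,
    channels: int,
    segments: Sequence[Dict[str, float]],
) -> List[List[float]]:
    """Return one list of samples per speech segment (hypothetical helper)."""
    total_frames = len(samples) // channels
    chunks: List[List[float]] = []
    for seg in segments:
        # Convert seconds to frame indices, clamped to the available audio.
        first = max(0, int(round(seg["start"] * sample_rate)))
        last = min(total_frames, int(round(seg["end"] * sample_rate)))
        if last > first:
            # Multiply by channels because samples are assumed interleaved.
            chunks.append(list(samples[first * channels:last * channels]))
    return chunks
```

Segments that fall entirely outside the audio are skipped, and segments that extend past the end are truncated.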

Node Info

Internal name: onnx_vad
Category: AI/ML/ONNX/Audio
Version: 1