In this example we will use two artificial neural networks. After gathering a chunk of audio we will check if there it contains human speech. This process is called Voice Activity Detection (VAD). If a voice is detected we start accumulating audio in order to feed the Keyword Spotting System (KWS). The KWS may be exchanged with any other AI that feeds on speech.