Commit 925d2f4c authored by pbethge

Merge branch 'develop' into thingsboard

parents 986152ec 77dd8d13
bin/data/model_4lang filter=lfs diff=lfs merge=lfs -text
bin/data/model_7lang/ filter=lfs diff=lfs merge=lfs -text
0.3.0: TBD
* fix crash due to wrong prev buffer value
* added audio support for multi-channel device input and settable samplerate
* added OSC receiver on default port 9898
* added manual listening control via OSC /listen message and `l` key press
* added listening auto stop on detection via OSC /autostop message or `a` key press
* added commandline argument parsing
* added verbose printing
* don't enable recording on start
* don't draw extra graph start & end vertices
0.2.0: 2021 Jun 23
* added OSC sending
......
@@ -3,11 +3,11 @@ Language Identifier
![screenshot](media/screenshot.png)
Identification of chosen languages from 5s long audio snippets
Identification of chosen languages from a live audio stream
This code base has been developed by [ZKM | Hertz-Lab](https://zkm.de/en/about-the-zkm/organization/hertz-lab) as part of the project [»The Intelligent Museum«](#the-intelligent-museum).
Please raise issues, ask questions, throw in ideas or submit code, as this repository is intended to be an open platform to collaboratively improve language identification.
Copyright (c) 2021 ZKM | Karlsruhe.
Copyright (c) 2021 Paul Bethge.
@@ -22,7 +22,8 @@ Dependencies
* openFrameworks addons:
- ofxOSC (included with oF)
- [ofxTensorFlow2](https://github.com/zkmkarlsruhe/ofxTensorFlow2)
* Pre-trained language id model(s) placed in `bin/data` (included in this repo)
* [CLI11 parser](https://github.com/CLIUtils/CLI11): included in `src`
* [Language Identification](https://github.com/zkmkarlsruhe/language-identification): trained neural networks placed in `bin/data`
Tested Platforms
----------------
@@ -90,21 +91,96 @@ Usage
The openFrameworks application runs the language identification model using audio input. The detection status and detected language are sent out using OSC (Open Sound Control) messages.
### Key Commands
* `l`: toggle start/stop listening
* `a`: toggle listening auto stop after detection
### OSC Communication
#### Sending
By default, sends to:
* address: `localhost` i.e. `127.0.0.1`
* port: `9999`
Message specification:
* **/detected _status_**: detection status
- status: float, boolean: 1 - found, 0 - lost
* **/lang _index_ _name_ _confidence_**: detected language
- index: int, language map index
- name: string, language map name
- confidence: float, confidence percentage 0 - 100
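As a rough illustration of how these fields relate to the model output, the sketch below picks the most probable label from a vector of class probabilities and only reports it when it is not noise and reaches the minimum confidence (see the `-c,--confidence` option, default 0.75). The `LangResult` struct and `selectLanguage()` function are hypothetical, the label order assumes the 8-language model, and the app's actual detection logic may differ.

```cpp
#include <algorithm>
#include <iterator>
#include <map>
#include <optional>
#include <string>
#include <vector>

// label map mirroring the 8-language model (index 0 is noise)
static const std::map<int, std::string> labels = {
    {0, "noise"}, {1, "chinese"}, {2, "english"}, {3, "french"},
    {4, "german"}, {5, "italian"}, {6, "russian"}, {7, "spanish"}};

// hypothetical payload for a /lang index name confidence message
struct LangResult {
    int index;
    std::string name;
    float confidence; // percentage 0 - 100
};

// pick the arg max of the class probabilities and report it only if it is
// not noise and its probability reaches minConfidence (0 - 1)
std::optional<LangResult> selectLanguage(const std::vector<float> &probs,
                                         float minConfidence) {
    if(probs.empty()) return std::nullopt;
    auto it = std::max_element(probs.begin(), probs.end());
    int index = static_cast<int>(std::distance(probs.begin(), it));
    if(index == 0 || *it < minConfidence) return std::nullopt;
    auto label = labels.find(index);
    if(label == labels.end()) return std::nullopt;
    return LangResult{index, label->second, *it * 100.0f};
}
```

For example, with probabilities `{0.02, 0.01, 0.05, 0.80, 0.04, 0.03, 0.02, 0.03}` and a minimum confidence of 0.75, the result would be index 3, `french`, with a confidence of 80.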
#### Receiving
By default, listens on:
* port `9898`
Message specification:
* **/listen**: start listening
* **/listen _state_**: start/stop listening
- state: bool, 0 - stop, 1 - start
* **/autostop**: enable listening auto stop after detection
* **/autostop _state_**: enable/disable listening auto stop after detection
- state: bool, 0 - keep listening, 1 - stop on detection
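Both messages accept an optional _state_ argument, so a missing argument can be read as "enable". The following is a minimal sketch of that dispatch, assuming integer arguments; `Message`, `ListenState`, and `handleMessage()` are hypothetical stand-ins, not the app's code or the ofxOsc API.

```cpp
#include <string>
#include <vector>

// hypothetical stand-in for an incoming OSC message; a real receiver would
// use the message type from the ofxOsc addon instead
struct Message {
    std::string address;
    std::vector<int> args; // optional integer arguments
};

// listening state controlled by the /listen and /autostop messages
struct ListenState {
    bool listening = true; // the app listens on start unless --nolisten is given
    bool autostop = false;
};

// apply one received message: a missing argument means "enable",
// otherwise 0 disables and any non-zero value enables;
// returns false for unknown addresses
bool handleMessage(const Message &msg, ListenState &state) {
    bool value = msg.args.empty() ? true : (msg.args[0] != 0);
    if(msg.address == "/listen") {
        state.listening = value;
        return true;
    }
    if(msg.address == "/autostop") {
        state.autostop = value;
        return true;
    }
    return false;
}
```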
### Commandline Options
Additional runtime settings are available via commandline options, as shown by the `--help` flag output:
```shell
% bin/LanguageIdentifier --help
identifies spoken language from audio stream
Usage: LanguageIdentifier [OPTIONS]
Options:
-h,--help Print this help message and exit
-s,--senders TEXT ... OSC sender addr:port host pairs, ex. "192.168.0.100:5555" or multicast "239.200.200.200:6666", default "localhost:9999"
-p,--port INT OSC receiver port, default 9898
-c,--confidence FLOAT:FLOAT bounded to [0 - 1]
min confidence, default 0.75
-t,--threshold FLOAT:INT bounded to [0 - 100]
volume threshold, default 25
-l,--list list audio input devices and exit
--inputdev INT audio input device number
--inputname TEXT audio input device name, can do partial match, ex. "Microphone"
--inputchan INT audio input device channel, default 1
-r,--samplerate INT audio input device samplerate, can be 44100 or a multiple of 16000, default 48000
--nolisten do not listen on start
--autostop stop listening automatically after detection
-v,--verbose verbose printing
--version print version and exit
```
For example, to send OSC to multiple addresses use the `-s` option:
```shell
% bin/LanguageIdentifier -s localhost:9999 localhost:6666 192.168.0.101:7777
```
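Each sender value is split at its last `:` so that bracketed IPv6 addresses such as `[::1]:8081` keep their inner colons. A self-contained sketch of that rule follows; `OscHost` here is a local stand-in for `ofApp::OscHost` and `parseHost()` is a hypothetical name. Unlike the app, which converts the port with `ofToInt()`, this version also rejects non-numeric and out-of-range ports.

```cpp
#include <cstddef>
#include <exception>
#include <optional>
#include <string>

// an OSC destination as parsed from an "addr:port" string
struct OscHost {
    std::string address;
    int port;
};

// split a sender string at its last ':', strip [] around IPv6 addresses,
// and reject empty parts, non-numeric ports, and system ports (<= 1024)
std::optional<OscHost> parseHost(const std::string &host) {
    std::size_t found = host.find_last_of(':');
    if(found == std::string::npos) return std::nullopt;
    std::string addr = host.substr(0, found);
    std::string portStr = host.substr(found + 1);
    if(addr.empty() || portStr.empty()) return std::nullopt;
    if(addr.size() >= 2 && addr.front() == '[' && addr.back() == ']') {
        addr = addr.substr(1, addr.size() - 2);
    }
    int port = 0;
    try {
        std::size_t used = 0;
        port = std::stoi(portStr, &used);
        if(used != portStr.size()) return std::nullopt; // trailing garbage
    }
    catch(const std::exception &) { // std::invalid_argument or std::out_of_range
        return std::nullopt;
    }
    if(port <= 1024 || port > 65535) return std::nullopt;
    return OscHost{addr, port};
}
```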
#### macOS
For macOS, the application binary can be invoked from within the .app bundle to pass commandline arguments:
```shell
bin/LanguageIdentifier.app/Contents/MacOS/LanguageIdentifier -h
```
This approach can also be wrapped up into a shell alias to be added to the account's `~/.bash_profile` or `~/.zshrc` file:
```shell
alias langid="/Applications/LanguageIdentifier.app/Contents/MacOS/LanguageIdentifier"
```
Reload the shell and the application can now be invoked via:
```shell
% langid -v --inputdev 2
```
Demos
-----
@@ -117,10 +193,32 @@ Custom visual front ends are written in Lua for [loaf](http://danomatika.com/cod
### Usage
To set up a run environment on macOS, download loaf and place the .app in the system `/Applications` folder.
To run a loaf project, drag the main Lua script or project folder onto the loaf.app.
Notes
-----
### Sample Rate
The model inputs audio with a sample rate of 16 kHz, so the incoming stream is downsampled and the app's input sample rate needs to be a multiple of 16kHz, i.e. 48kHz, 96kHz, etc.
As 44.1kHz is also common, it is accepted and treated as 48kHz, but the downsampled audio is then higher in pitch and may be noisy. In our tests, however, detection is still acceptable.
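The rule amounts to one integer downsampling factor per input rate. The sketch below restates it with hypothetical function names, under the assumption that unsupported rates are rejected outright. It also shows where the pitch shift comes from: 44100 / 3 = 14700 Hz, which the model then reads as 16 kHz, a ratio of roughly 1.088.

```cpp
#include <optional>

// sample rate expected by the model
constexpr int modelSampleRate = 16000;

// integer downsampling factor for a given input rate: 44.1kHz is treated
// like 48kHz (factor 3), other rates must be an exact multiple of the
// model rate, anything else is rejected
std::optional<int> downsamplingFactor(int inputRate) {
    if(inputRate == 44100) return 3;
    if(inputRate <= 0 || inputRate % modelSampleRate != 0) return std::nullopt;
    return inputRate / modelSampleRate;
}

// effective rate of the downsampled stream; for 44.1kHz this is 14.7kHz,
// so the audio the model sees as 16kHz is slightly higher in pitch
double effectiveRate(int inputRate, int factor) {
    return static_cast<double>(inputRate) / factor;
}
```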
Develop
-------
### Release steps
1. Update changelog
2. Update app version in Xcode project and ofApp.h define
3. Tag version commit, e.g. "0.3.0"
4. Push commit and tags to server:
git push
git push --tags
The Intelligent Museum
----------------------
......
# Dataset
languages: ["__noise","chinese","english","french","german","italian","spanish","russian"]
train_dir: "/data/common_voice_filtered/five_sec_vad/wav/train"
val_dir: "/data/common_voice_filtered/five_sec_vad/wav/test"
augment: True
# Audio
audio_length_s: 5
sample_rate: 16000
feature_type: "stft" # one of: mel, stft, fbank
feature_nu: 1024
# Training
batch_size: 64
learning_rate: 0.001
num_epochs: 26
model: "AttRnn" # name of the file in src/models
model_path: "" # path to a trained SavedModel to start from
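For orientation, `audio_length_s` and `sample_rate` fix the number of samples per model input. Assuming the live app feeds the model windows of the same length and captures at its default 48kHz, the arithmetic is:

```latex
\begin{aligned}
N_{\text{model}} &= t \cdot f_{\text{model}} = 5\,\mathrm{s} \times 16000\,\mathrm{Hz} = 80\,000 \text{ samples} \\
D &= f_{\text{in}} / f_{\text{model}} = 48000\,\mathrm{Hz} / 16000\,\mathrm{Hz} = 3 \\
N_{\text{in}} &= D \cdot N_{\text{model}} = 3 \times 80\,000 = 240\,000 \text{ samples}
\end{aligned}
```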
@@ -5,17 +5,23 @@
local lang = {
{name = "noise", locale = "xx"},
{name = "chinese", locale = "zh_cn"},
{name = "english", locale = "en"},
{name = "french", locale = "fr"},
{name = "german", locale = "de"},
{name = "italian", locale = "it"},
{name = "russian", locale = "ru"},
{name = "spanish", locale = "es"}
}
local greeting = {
xx = "...",
zh_cn = "你好", -- Ni Hao
en = "Hello",
fr = "Bonjour",
de = "Guten Tag",
it = "Ciao",
ru = "Привет", -- Privet
es = "Hola"
}
......
#! /bin/sh
#
# LanguageIdentifier wrapper script
#
# Copyright (c) 2021 ZKM | Hertz-Lab
# Dan Wilcox <dan.wilcox@zkm.de>
#
# BSD Simplified License.
# For information on usage and redistribution, and for a DISCLAIMER OF ALL
# WARRANTIES, see the file, "LICENSE.txt," in this distribution.
#
# This code has been developed at ZKM | Hertz-Lab as part of „The Intelligent
# Museum“ generously funded by the German Federal Cultural Foundation.
LID=LanguageIdentifier
DIR="$(dirname "$0")/bin"
EXEC=${DIR}/${LID}
# platform specifics
case "$(uname -s)" in
Linux*) ;;
Darwin*)
# invoke executable inside .app bundle
EXEC=$DIR/${LID}.app/Contents/MacOS/${LID}
;;
CYGWIN*) ;;
MINGW*) ;;
*) ;;
esac
# TODO: add lib paths for Linux?
# go
"$EXEC" "$@"
@@ -28,7 +28,7 @@
#include "WavFileWriterBeta.h"
#endif
/// a simple Fifo with adjustable max length
template <typename T, typename Container=std::deque<T>>
class FixedFifo : public std::queue<T, Container> {
@@ -43,6 +43,12 @@ class FixedFifo : public std::queue<T, Container> {
std::queue<T, Container>::push(value);
}
void clear() {
while(!this->empty()) {
this->pop();
}
}
void setMaxLen(const std::size_t maxLength) {
maxLen = maxLength;
}
@@ -55,7 +61,7 @@ class FixedFifo : public std::queue<T, Container> {
typedef std::vector<float> SimpleAudioBuffer;
typedef FixedFifo<SimpleAudioBuffer> AudioBufferFifo;
/// custom ofxTF2::Model implementation to handle audio sample conversion, etc
class AudioClassifier : public ofxTF2::Model {
public:
......
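Only fragments of `FixedFifo` are visible in the diff, and the body of its `push()` override is mostly elided. A self-contained sketch follows, assuming `push()` drops the oldest entries once the max length is reached; this matches the class comment but is not confirmed by the visible lines, and treating a max length of 0 as unbounded is likewise an assumption of this sketch.

```cpp
#include <cstddef>
#include <deque>
#include <queue>

/// a simple Fifo with adjustable max length (sketch)
template <typename T, typename Container = std::deque<T>>
class FixedFifo : public std::queue<T, Container> {
public:
    /// push a value, first dropping the oldest values so that
    /// at most maxLen values are stored
    void push(const T &value) {
        while(maxLen > 0 && this->size() >= maxLen) {
            this->pop();
        }
        std::queue<T, Container>::push(value);
    }

    /// remove all values
    void clear() {
        while(!this->empty()) {
            this->pop();
        }
    }

    /// set the max length used by subsequent pushes
    void setMaxLen(const std::size_t maxLength) {
        maxLen = maxLength;
    }

private:
    std::size_t maxLen = 0; // 0 means unbounded in this sketch
};
```

Hiding `std::queue::push()` rather than overriding it works here because callers use `FixedFifo` directly; calls made through a `std::queue` reference would bypass the length check.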
/*
* Language Identifier
*
* Copyright (c) 2021 ZKM | Hertz-Lab
* Paul Bethge <bethge@zkm.de>
* Dan Wilcox <dan.wilcox@zkm.de>
*
* BSD Simplified License.
* For information on usage and redistribution, and for a DISCLAIMER OF ALL
* WARRANTIES, see the file, "LICENSE.txt," in this distribution.
*
* This code has been developed at ZKM | Hertz-Lab as part of „The Intelligent
* Museum“ generously funded by the German Federal Cultural Foundation.
*/
#include "Commandline.h"
Commandline::Commandline(ofApp *app) : app(app) {
parser.description(DESCRIPTION);
}
bool Commandline::parse(int argc, char **argv) {
// local options, the rest are ofApp instance variables
std::vector<std::string> senders;
int port = 0;
bool list = false;
int inputNum = -1;
std::string inputName = "";
int inputChannel = 0;
int sampleRate = 0;
bool nolisten = false;
bool autostop = false;
bool verbose = false;
bool version = false;
parser.add_option("-s,--senders", senders,
"OSC sender addr:port host pairs, ex. \"192.168.0.100:5555\" "
"or multicast \"239.200.200.200:6666\", default \"localhost:9999\"")->expected(-1);
parser.add_option("-p,--port", port, "OSC receiver port, default " + ofToString(app->port));
parser.add_option("-c,--confidence", app->minConfidence,
"min confidence, default " + ofToString(app->minConfidence))->transform(CLI::Bound(0.0, 1.0));
parser.add_option("-t,--threshold", app->volThreshold,
"volume threshold, default " + ofToString(app->volThreshold))->transform(CLI::Bound(0, 100));
parser.add_flag( "-l,--list", list, "list audio input devices and exit");
parser.add_option("--inputdev", inputNum, "audio input device number");
parser.add_option("--inputname", inputName, "audio input device name, can do partial match, ex. \"Microphone\"");
parser.add_option("--inputchan", inputChannel, "audio input device channel, default 1");
parser.add_option("-r,--samplerate", sampleRate, "audio input device samplerate, can be 44100 or a multiple of " +
ofToString(ofApp::modelSampleRate) + ", default " + ofToString(app->sampleRate));
parser.add_flag( "--nolisten", nolisten, "do not listen on start");
parser.add_flag( "--autostop", autostop, "stop listening automatically after detection");
parser.add_flag( "-v,--verbose", verbose, "verbose printing");
parser.add_flag( "--version", version, "print version and exit");
try {
parser.parse(argc, argv);
}
catch(const CLI::ParseError &e) {
error = e;
return false;
}
// verbose printing?
ofSetLogLevel(PACKAGE, (verbose ? OF_LOG_VERBOSE : OF_LOG_NOTICE));
// print version
if(version) {
std::cout << VERSION << std::endl;
return false;
}
// list audio input devices
if(list) {
auto devices = app->soundStream.getDeviceList();
std::size_t count = 0;
std::cout << "input devices (# NAME [CHANNELS]):" << std::endl;
for(std::size_t i = 0; i < devices.size(); i++) {
auto device = devices[i];
if(device.inputChannels > 0) {
std::cout << i << " " << device.name
<< " [" << device.inputChannels << "]" << std::endl;
count++;
}
}
if(count == 0) {
std::cout << "none" << std::endl;
}
return false;
}
// set audio input from device number
if(inputNum >= 0) {
auto devices = app->soundStream.getDeviceList();
if(static_cast<std::size_t>(inputNum) >= devices.size()) {
ofLogError(PACKAGE) << "invalid audio device number: " << inputNum;
error = CLI::RuntimeError("invalid audio device number", EXIT_FAILURE);
return false;
}
ofSoundDevice &device = devices[inputNum];
if(device.inputChannels == 0) {
ofLogError(PACKAGE) << "audio device " << inputNum << " has no input channels";
error = CLI::RuntimeError("audio device has no input channels", EXIT_FAILURE);
return false;
}
app->inputDevice = inputNum;
}
// set audio input from device name
if(inputName != "") {
inputNum = -1;
auto devices = app->soundStream.getDeviceList();
for(std::size_t i = 0; i < devices.size(); ++i) {
auto device = devices[i];
if(device.name.find(inputName) != std::string::npos && device.inputChannels > 0) {
inputNum = i;
break;
}
}
if(inputNum >= 0) {
app->inputDevice = inputNum;
}
else {
ofLogWarning(PACKAGE) << "audio input name not found: " << inputName;
}
}
// set audio input channel
if(inputChannel > 0) {
app->inputChannel = inputChannel-1; // 1-index to 0-index
}
// set audio input rate
if(sampleRate > 0) {
bool set = true;
if(sampleRate == 44100) {
// treat as 48k default, pitch change is minimal enough to not affect detection
// and we don't handle non-integer downsampling factors
app->sampleRate = sampleRate;
app->downsamplingFactor = 3;
set = false;
}
else if(sampleRate % ofApp::modelSampleRate != 0) {
ofLogWarning(PACKAGE) << "ignoring input sample rate which is not a multiple of "
<< ofApp::modelSampleRate << ": " << sampleRate;
set = false; // actually ignore it instead of applying a truncated factor
}
if(set) {
app->sampleRate = sampleRate;
app->downsamplingFactor = sampleRate / ofApp::modelSampleRate;
}
}
// parse sender host strings
// split string by last : to get address & port pair,
// handle bracketed IPv6 hostnames: [::1]:8081
for(const auto &host : senders) {
std::size_t found = host.find_last_of(":");
if(found == std::string::npos) {
ofLogWarning(PACKAGE) << "ignoring sender host without port: " << host;
continue;
}
std::string addr = host.substr(0, found);
std::string portStr = host.substr(found+1); // distinct from the receiver port option
if(addr.size() == 0 || portStr.size() == 0) {
ofLogWarning(PACKAGE) << "ignoring sender host with empty address or port: " << host;
continue;
}
if(addr[0] == '[' && addr[addr.size()-1] == ']') {
addr = addr.substr(1, addr.size()-2);
}
int p = ofToInt(portStr);
if(p <= 1024) {
ofLogWarning(PACKAGE) << "ignoring sender host with invalid port or system port: " << host;
continue;
}
app->hosts.push_back(ofApp::OscHost(addr, p));
}
if(app->hosts.empty()) { // default
app->hosts.push_back(ofApp::OscHost("localhost", 9999));
}
// receiver port
if(port > 0) {
if(port <= 1024) {
ofLogWarning(PACKAGE) << "ignoring receiver port in system range (0-1024): " << port;
}
else {
app->port = port;
}
}
// no listen
if(nolisten) {
app->stopListening();
}
// auto stop
if(autostop) {
app->autostop = true;
}
return true;
}
int Commandline::exit() {
return parser.exit(error);
}
/*
* Language Identifier
*
* Copyright (c) 2021 ZKM | Hertz-Lab
* Paul Bethge <bethge@zkm.de>
* Dan Wilcox <dan.wilcox@zkm.de>
*
* BSD Simplified License.
* For information on usage and redistribution, and for a DISCLAIMER OF ALL
* WARRANTIES, see the file, "LICENSE.txt," in this distribution.
*
* This code has been developed at ZKM | Hertz-Lab as part of „The Intelligent
* Museum“ generously funded by the German Federal Cultural Foundation.
*/
#pragma once
#include "ofApp.h"
#include "CLI11.hpp"
/// commandline option parser
class Commandline {
public:
/// constructor with required app instance
Commandline(ofApp *app);
/// parse commandline options
/// returns true if program should continue or false if it should exit
bool parse(int argc, char **argv);
/// print parser error and return exit code
int exit();
ofApp *app = nullptr; ///< required app instance
CLI::App parser; ///< parser instance
CLI::Error error = CLI::Success(); ///< parse error if program should exit
};
@@ -20,10 +20,25 @@
typedef std::map<int, std::string> Labels;
#define USE_MODEL_V2
#ifdef USE_MODEL_V1
static Labels labelsMap = {
{0, "noise"},
{1, "english"},
{2, "french"},
{3, "german"},
{4, "spanish"}
};
#else
static Labels labelsMap = {
{0, "noise"},
{1, "chinese"},
{2, "english"},
{3, "french"},
{4, "german"},
{5, "italian"},
{6, "russian"},
{7, "spanish"},
};
#endif
@@ -19,7 +19,7 @@ class WavFileWriterBeta {
std::string filename;
/// open file for a fixed-length number of samples
WavFileWriterBeta(std::string filename, unsigned short numChannels,
unsigned long sampleRate,
unsigned short bytesPerSample,
......