So, what you're saying is you're trying to set up speaker identification from an audio feed, without pre-training, that can automatically map metadata to display information in real time about who's talking?