Over the past 3-4 years, I have worked on a challenging use case: speaker recognition (identification) using convolutional neural networks (CNNs). The area is difficult, mainly because little open-source software can handle speaker recognition on large datasets. Given limited time and the lack of comprehensive tools, I chose a simpler approach for this project. The fundamental objective is unchanged, and the project relies on similar technologies at its core.
The primary goals of this project are:
Recognize a designated sound within a WAV file.
Detect the sound in real time as it is played, without waiting for the entire WAV file to finish.
Use a spectrogram to capture a snapshot of the signal in the frequency domain.
Apply standard modeling and transfer learning techniques. Specifically, the algorithm will repurpose an existing image recognition model, adapting it to the spectrogram representation of the WAV file.
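To make the spectrogram and real-time goals concrete, here is a minimal sketch using only NumPy. It is not the project's actual implementation: the function names (`spectrogram`, `stream_spectrograms`), the frame and window sizes, and the 50% window overlap are all assumptions chosen for illustration. It shows how audio arriving in chunks can be turned into a sequence of 2-D magnitude spectrograms without waiting for the whole file.

```python
import numpy as np


def spectrogram(samples, frame_len=512, hop=256):
    """Magnitude spectrogram via a short-time Fourier transform.

    Returns an array of shape (n_frames, frame_len // 2 + 1).
    frame_len and hop are illustrative defaults, not values from the project.
    """
    samples = np.asarray(samples, dtype=np.float64)
    if len(samples) < frame_len:
        return np.empty((0, frame_len // 2 + 1))
    window = np.hanning(frame_len)  # taper each frame to reduce spectral leakage
    n_frames = 1 + (len(samples) - frame_len) // hop
    frames = np.stack(
        [samples[i * hop : i * hop + frame_len] * window for i in range(n_frames)]
    )
    return np.abs(np.fft.rfft(frames, axis=1))


def stream_spectrograms(chunks, window_len=4096, frame_len=512, hop=256):
    """Yield one spectrogram per full sliding window as audio chunks arrive.

    `chunks` is any iterable of 1-D sample arrays (hypothetical input, e.g.
    blocks read from a WAV file or a sound card).
    """
    buffer = np.empty(0)
    for chunk in chunks:
        buffer = np.concatenate([buffer, np.asarray(chunk, dtype=np.float64)])
        while len(buffer) >= window_len:
            yield spectrogram(buffer[:window_len], frame_len, hop)
            buffer = buffer[window_len // 2 :]  # keep 50% overlap between windows


# Usage: a synthetic 1 kHz tone sampled at 8 kHz, delivered in 1000-sample chunks.
rate = 8000
tone = np.sin(2 * np.pi * 1000 * np.arange(rate) / rate)
chunks = [tone[i : i + 1000] for i in range(0, len(tone), 1000)]
for spec in stream_spectrograms(chunks):
    peak_bin = int(np.argmax(spec.mean(axis=0)))
    print(spec.shape, "peak frequency (Hz):", peak_bin * rate / 512)
```

Each yielded array can be treated as a single-channel image. Under the transfer learning goal, it would be resized and passed to a pretrained image recognition model. That step is omitted here because the source does not name the model or framework.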
Details TBD.