Background
VerificAudio is a tool designed to help journalists verify audio authenticity and detect deepfakes created with synthetic voices in Spanish. Developed by PRISA Media, the largest Spanish-language audio producer in the world, VerificAudio aims to support journalists and editors in radio stations while reinforcing listeners’ trust amid rising misinformation.
The Problem
In 2022, PRISA Media began experimenting with synthetic voices, launching Victoria, a virtual assistant for smart speakers. As voice cloning technologies improved and became widely accessible, the potential for misuse grew. Fake audio clips began circulating on social media and messaging apps, posing a threat to public trust in news. Notable instances included fake robocalls from “Joe Biden” and clips falsely attributed to London mayor Sadiq Khan.
Additionally, the risk of individuals falsely claiming their voices had been cloned to evade responsibility in embarrassing situations presented another challenge. With elections on the horizon, particularly in Mexico and Spain, there was an urgent need to address the potential for AI-generated audio to spread misinformation.
Finding a Solution
To tackle these challenges, PRISA Media embarked on developing an audio fact-checking platform with support from the Google News Initiative. VerificAudio was initially conceived as a project for Caracol Radio, PRISA Media’s radio unit in Colombia, and was developed by Minsait, a Spanish tech company. Plaiground, Minsait’s AI business unit, employed natural language processing (NLP) and deep learning to evaluate audio manipulations.
The tool’s development involved preprocessing audio files to ensure data quality through noise reduction, format conversion, volume equalization, and temporal cropping. Two complementary approaches were adopted: a neural network model and a machine learning-based model.
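The preprocessing steps described above can be sketched in plain NumPy. The function below is an illustrative assumption, not PRISA Media’s actual pipeline: it shows only volume equalization (peak normalization) and temporal cropping (trimming near-silent edges), with noise reduction and format conversion omitted; the threshold value is invented.

```python
import numpy as np

def preprocess(signal: np.ndarray, energy_threshold: float = 0.01) -> np.ndarray:
    """Illustrative preprocessing sketch: peak-normalise the volume, then
    crop near-silent samples from both ends of the clip."""
    # Volume equalization: scale so the loudest sample has magnitude 1.0
    peak = np.max(np.abs(signal))
    if peak > 0:
        signal = signal / peak
    # Temporal cropping: keep only the span between the first and last
    # samples whose magnitude exceeds the (assumed) energy threshold
    active = np.where(np.abs(signal) > energy_threshold)[0]
    if active.size == 0:
        return signal[:0]  # entirely silent clip
    return signal[active[0]:active[-1] + 1]

# Toy waveform: silence, a short burst of speech, silence
audio = np.array([0.0, 0.0, 0.001, 0.5, -0.25, 0.5, 0.0, 0.0])
clean = preprocess(audio)
```

After preprocessing, the toy clip is reduced to its three active samples, rescaled so the loudest one sits at magnitude 1.0.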
Implementation
- Machine Learning Model: Plaiground’s team identified key audio features influencing the AI model’s predictions. They defined key performance indicators (KPIs) to validate the model’s accuracy in detecting deepfakes while minimizing false positives. An explainability graph highlights the impact of each audio feature on the model’s decisions.
- Neural Networks Model: This approach involved fine-tuning an open-source model for the Spanish language, comparing audios to determine if they were from the same person, different people, or synthetically generated. The model converts audio into feature vectors representing voice characteristics like pitch and intonation, which are then compared.
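The comparison step in the neural approach boils down to measuring similarity between embedding vectors. The sketch below is a simplified assumption: the embeddings, the cosine-similarity measure, and the decision threshold are all illustrative stand-ins, and the real model also distinguishes a third, synthetically generated class.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two voice-embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def compare_voices(emb_a: np.ndarray, emb_b: np.ndarray,
                   same_threshold: float = 0.75) -> tuple[float, str]:
    """Toy decision rule: high embedding similarity suggests the same
    speaker. A production system would use a learned threshold and an
    additional class for synthetic voices."""
    score = cosine_similarity(emb_a, emb_b)
    verdict = "same speaker" if score >= same_threshold else "different speakers"
    return score, verdict

# Hypothetical embeddings encoding characteristics like pitch and intonation
voice_1 = np.array([0.9, 0.1, 0.4])
voice_2 = np.array([0.85, 0.15, 0.38])
score, verdict = compare_voices(voice_1, voice_2)
```

Two clips from the same person should map to nearby vectors, so their similarity score lands close to 1.0.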
Both models were incorporated into a double-check protocol to enhance accuracy and minimize false results. An online interface was developed to facilitate use by non-technical staff.
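The double-check protocol can be thought of as a simple agreement rule between the two models: a confident verdict is reported only when both agree, and disagreement is flagged for human review. This is a minimal sketch under that assumption; the scores, threshold, and verdict labels are invented for illustration.

```python
def double_check(ml_score: float, nn_score: float,
                 fake_threshold: float = 0.5) -> str:
    """Illustrative ensemble rule. Each score is the estimated probability
    that the audio is synthetic; requiring agreement between the machine
    learning and neural models helps minimise false results."""
    ml_fake = ml_score >= fake_threshold
    nn_fake = nn_score >= fake_threshold
    if ml_fake and nn_fake:
        return "likely synthetic"
    if not ml_fake and not nn_fake:
        return "likely real"
    return "inconclusive: needs human review"
```

For example, scores of 0.9 and 0.8 yield a "likely synthetic" verdict, while 0.9 and 0.1 would be escalated to a human reviewer.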
Challenges Encountered
Creating a comprehensive dataset of fake Spanish-language audio proved difficult. The team generated its own dataset using various cloning technologies, mimicking the kinds of fake audio found online. Caracol Radio’s contributions of archive and newly generated audio were crucial. The ongoing nature of the project requires continuous refinement of the models to keep pace with evolving cloning technologies.
Two common hurdles with AI are traceability and explainability. While neural models operate like a black box, the machine learning model’s explainability graph provided clarity on decision-making.
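An explainability graph of this kind essentially ranks per-feature contributions to the model’s decision. The sketch below is hypothetical: the feature names and weights are invented, and real attribution methods (such as SHAP-style values) are more involved.

```python
def explain(contributions: dict[str, float], top_n: int = 3) -> list[tuple[str, float]]:
    """Rank audio features by the absolute size of their contribution to
    the deepfake score, as an explainability graph would display them."""
    return sorted(contributions.items(), key=lambda kv: abs(kv[1]), reverse=True)[:top_n]

# Invented contributions: positive values push the verdict toward 'synthetic'
features = {"spectral_flatness": 0.42, "pitch_variance": -0.10,
            "jitter": 0.31, "pause_duration": 0.05}
top = explain(features)
```

Plotting the ranked contributions as a bar chart gives journalists a readable account of why the model reached its verdict.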
Testing the Tool
Validation involved comparing the tool’s results with human judgment. A new dataset of real and fake audio clips was compiled for both mechanical and manual testing. This dual approach ensured a more accurate and comprehensive assessment, helping refine the models and improve the user interface. VerificAudio offers a probabilistic approach, guiding journalists with a percentage likelihood rather than an absolute verdict.
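Comparing the tool’s verdicts with human judgment amounts to measuring agreement over a labelled dataset. A minimal sketch, with invented labels standing in for the real validation data:

```python
def agreement_rate(model_verdicts: list[str], human_verdicts: list[str]) -> float:
    """Fraction of clips on which the tool and human reviewers agree."""
    matches = sum(m == h for m, h in zip(model_verdicts, human_verdicts))
    return matches / len(model_verdicts)

# Hypothetical verdicts for four test clips
model = ["fake", "real", "fake", "real"]
human = ["fake", "real", "real", "real"]
rate = agreement_rate(model, human)  # models and humans agree on 3 of 4
```

Disagreements like the third clip above are exactly the cases worth inspecting when refining the models or the interface.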
Practical Application
VerificAudio is now accessible to verification teams across PRISA Media’s radio stations in Spain, Colombia, Mexico, and Chile. Journalists encountering suspicious audio can submit it for analysis, weighing the file’s origin, distribution channels, news context, and VerificAudio’s AI analysis. The interface allows users to upload suspicious and real audio files, offering two verification modes: comparative and identification.
- Comparative Mode: Determines if the files belong to the same speaker, different speakers, or a cloned voice.
- Identification Mode: Indicates whether an audio file is likely real or synthetic, providing a coefficient and a list of key audio attributes influencing the result.
Future Improvements
VerificAudio will continue to evolve, incorporating new technologies and expanding its dataset to cover a broader range of Spanish accents. Future plans include support for other languages such as Catalan, Basque (Euskara), and Galician (Gallego). The tool will be scaled for broader access, with an online platform to publish newsroom analyses and educational resources on deepfakes.
Potential for Other Newsrooms
Currently, VerificAudio is an internal resource for PRISA Media newsrooms, though verification teams will also analyze external submissions. The web platform under development aims to provide an overview of the deepfake audio verification landscape, with potential future access for other media companies once the tool is refined and scaled.
By addressing the challenge of audio deepfakes, VerificAudio exemplifies how innovative AI solutions can support journalists in maintaining public trust and combating misinformation.