Environmental Acoustic Features Robustness Analysis: A Multi-Aspect Study
DOI: https://doi.org/10.29407/intensif.v9i1.23723

Keywords: Acoustic Fingerprinting, Signal Processing, MFCC, Environmental Sound

Abstract
Background: Acoustic signals are complex, exhibiting temporal, spectral, and amplitude variations. Their non-stationary nature complicates analysis, as traditional methods often fail to capture their richness, and environmental factors such as reflections, refractions, and noise further distort them. While advanced techniques such as adaptive filtering and deep learning exist, comprehensive analyses of acoustic feature robustness remain limited. Objective: This study investigates which acoustic features maintain the highest robustness across diverse environments while preserving discriminative power. Methods: Audio samples were recorded under controlled conditions in four acoustic environments (jungle, café, factory, street) with varying noise levels. Standardized equipment captured 22050 Hz, 16-bit audio at multiple positions and distances. After amplitude standardization, a range of acoustic features was extracted and analyzed. Results: MFCCs demonstrated exceptional reliability, with correlation coefficients of 0.98819 and 0.98889 for closely positioned devices and a robustness score of 0.99. Across different acoustic scenes and sample lengths (1, 3, and 5 s), MFCCs maintained high correlation (≈0.978) and robustness (0.98), confirming their versatility. Conclusion: MFCCs proved highly effective for acoustic fingerprinting across settings. Despite the limits of the tested conditions (≤5 m device distance, ≤5 s samples), their consistent performance validates the methodology. Future research should explore combining MFCCs with spectral features and extending the study to broader environments and device types.
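As a minimal sketch of the pipeline the abstract describes (load audio at 22050 Hz, standardize amplitude, extract MFCCs, compare two recordings of the same scene with Pearson's correlation), the Python snippet below may help. It is not the authors' code: the file names, the choice of 13 coefficients, peak normalization as the standardization step, and the flattened-matrix comparison are all assumptions.

```python
# Illustrative sketch only (assumed parameters, not the study's implementation).
import numpy as np
import librosa
from scipy.stats import pearsonr

SR = 22050        # sampling rate reported in the study
DURATION = 5.0    # longest sample length tested (seconds)

def standardized_mfcc(path, n_mfcc=13):
    """Load audio, standardize its amplitude, and return an MFCC matrix."""
    y, _ = librosa.load(path, sr=SR, mono=True, duration=DURATION)
    y = y / (np.max(np.abs(y)) + 1e-9)   # peak-normalize (assumed standardization)
    return librosa.feature.mfcc(y=y, sr=SR, n_mfcc=n_mfcc)

def mfcc_correlation(path_a, path_b):
    """Pearson's r between flattened, length-matched MFCC matrices."""
    a = standardized_mfcc(path_a).flatten()
    b = standardized_mfcc(path_b).flatten()
    n = min(len(a), len(b))               # guard against small length drift
    r, _ = pearsonr(a[:n], b[:n])
    return r

# Hypothetical usage: two devices recording the same café scene
# at different positions should yield r close to 1.
# print(mfcc_correlation("cafe_pos1.wav", "cafe_pos2.wav"))
```

Under this reading, a correlation near the reported 0.98–0.99 range for closely positioned devices would indicate that the MFCC representation is largely invariant to recording position within the same scene.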
License
Copyright (c) 2025 Andi Bahtiar Semma, Kusrini, Arif Setyanto, Bruno da Silva, An Braeken

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Authors who publish with this journal agree to the following terms:
1. Copyright on any article is retained by the author(s).
2. The author grants the journal the right of first publication, with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgment of the work’s authorship and its initial publication in this journal.
3. Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal’s published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgment of its initial publication in this journal.
4. Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work.
5. The article and any associated published material are distributed under the Creative Commons Attribution-ShareAlike 4.0 International License.