An Evaluation of Word Embeddings on Vulnerability Prediction with Software Metrics
Keywords:
vulnerability prediction, word embeddings, empirical study
Abstract
CONTEXT: Software vulnerabilities are a critical risk in a digital world, and developers dedicate enormous effort to removing vulnerable code from their software products. Vulnerability prediction aims to identify which modules are more likely to be vulnerable using software metrics. Recent studies conducted empirical experiments that combined textual information with software metrics, and the results showed that the textual information did not improve predictive performance. However, those evaluations considered only Bag-of-Words (BoW) as textual information; semantic relations among words were not examined. OBJECTIVE: To examine the performance of vulnerability prediction when the textual information takes semantic relations into account. Word2Vec was employed to capture semantic relations. METHOD: A comparative study of BoW and two Word2Vec embeddings was conducted. To make the evaluation directly comparable, we replicated a recent study that employed BoW. The Word2Vec embeddings were obtained from pre-trained models based on Google News and Stack Overflow. The former was trained on large but non-SE-related texts, while the latter was trained on small but SE-related texts. RESULTS: The non-SE Word2Vec improved vulnerability prediction in terms of prediction stability. The SE-specific Word2Vec was less effective. CONCLUSION: Practitioners should consider combining textual information embedded with non-SE Word2Vec for better vulnerability prediction.
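To make the difference between the two kinds of textual features concrete, the following is a minimal sketch, not the authors' pipeline. It assumes that a text (for example, a commit message) is turned into features either by counting vocabulary words (BoW) or by averaging per-word embedding vectors, and that the result is concatenated with software metrics. The toy embedding table, the averaging step, the metric values, and all function names are hypothetical; a real experiment would load pre-trained Word2Vec vectors instead.

```python
from collections import Counter

# Hypothetical 3-dimensional embedding table used only for illustration.
# A real study would load pre-trained Word2Vec vectors instead.
TOY_EMBEDDINGS = {
    "buffer": [0.9, 0.1, 0.0],
    "overflow": [0.8, 0.2, 0.1],
    "fix": [0.1, 0.9, 0.3],
    "typo": [0.0, 0.2, 0.9],
}


def tokenize(text):
    """Lowercase the text and split it on whitespace."""
    return text.lower().split()


def bow_vector(text, vocabulary):
    """Bag-of-Words: one count per vocabulary word, ignoring word meaning."""
    counts = Counter(tokenize(text))
    return [counts[word] for word in vocabulary]


def mean_embedding(text, embeddings, dim):
    """Average the vectors of known words (an assumed aggregation choice)."""
    vectors = [embeddings[t] for t in tokenize(text) if t in embeddings]
    if not vectors:
        # No known word: fall back to a zero vector of the right size.
        return [0.0] * dim
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def with_metrics(metrics, text_features):
    """Concatenate software metrics with textual features."""
    return list(metrics) + list(text_features)


vocabulary = sorted(TOY_EMBEDDINGS)
message = "Fix buffer overflow"
metrics = [120.0, 3.0]  # hypothetical metric values, e.g. lines added, files changed

bow_features = with_metrics(metrics, bow_vector(message, vocabulary))
w2v_features = with_metrics(metrics, mean_embedding(message, TOY_EMBEDDINGS, 3))

print(bow_features)
print(w2v_features)
```

In this sketch the BoW part grows with the vocabulary size, while the embedding part has a fixed dimension and places related words such as "buffer" and "overflow" close together, which is the kind of semantic relation that BoW cannot express.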
License
Copyright (c) 2023 Sousuke Amasaki, Tomoyuki Yokogawa, Aman Hirohisa
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
License Terms:
Except where otherwise noted, content on this website is licensed under a Creative Commons Attribution Non-Commercial License (CC BY NC).
Use, distribution, and reproduction in any medium are permitted, provided the original work is properly cited and is not used for commercial purposes.
Copyright to any article published by WiPiEC is retained by the author(s). Authors grant WiPiEC Journal a license to publish the article and identify itself as the original publisher. Authors also grant any third party the right to use the article freely as long as it is not used for commercial purposes and its original authors, citation details, and publisher are identified, in accordance with the CC BY NC license.