An Evaluation of Word Embeddings on Vulnerability Prediction with Software Metrics

Sousuke Amasaki; Tomoyuki Yokogawa; Aman Hirohisa

Authors

Sousuke Amasaki Department of Systems Engineering, Okayama Prefectural University
Tomoyuki Yokogawa Department of Systems Engineering, Okayama Prefectural University
Aman Hirohisa Center for Information Technology, Ehime University

Keywords:

vulnerability prediction, word embeddings, empirical study

Abstract

CONTEXT: Software vulnerabilities are a critical risk in a digital world. Developers dedicate enormous effort to removing vulnerable code from their software products. Vulnerability prediction aims to identify which modules are more likely to be vulnerable using software metrics. Recent studies conducted empirical experiments that combined textual information with software metrics. The results showed that the textual information did not improve predictive performance. However, those evaluations considered only Bag-of-Words (BoW) as textual information, and semantic relations among words were not examined. OBJECTIVE: To examine the performance of vulnerability prediction with textual information that takes semantic relations into account. Word2Vec was employed to capture semantic relations. METHOD: A comparative study of BoW and two Word2Vec embeddings was conducted. For ease of evaluation, we replicated a recent study that employed BoW. The Word2Vec embeddings were obtained from pre-trained models based on Google News and Stack Overflow. The former was trained on large texts unrelated to software engineering (SE), while the latter was trained on small but SE-related texts. RESULTS: The non-SE Word2Vec improved vulnerability prediction in terms of prediction stability. The SE-specific Word2Vec was less effective. CONCLUSION: Practitioners should consider textual information with non-SE Word2Vec for better vulnerability prediction.
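
To make the contrast between the two kinds of textual features concrete, the following is a minimal sketch, not the authors' pipeline. It assumes that a text is turned into a fixed-length vector by averaging its word vectors (the abstract does not state how the embeddings were aggregated) and that the textual features are simply appended to the software metrics. The toy embedding values, the metric values, and the function names `bow_features`, `mean_embedding`, and `combine` are all hypothetical.

```python
from collections import Counter

# Toy 3-dimensional "embeddings" (hypothetical values, NOT taken from the
# Google News or Stack Overflow pre-trained models named in the abstract).
TOY_EMBEDDINGS = {
    "buffer": [0.9, 0.1, 0.0],
    "overflow": [0.8, 0.2, 0.1],
    "fix": [0.1, 0.9, 0.0],
    "typo": [0.0, 0.1, 0.9],
}


def bow_features(tokens, vocabulary):
    """Bag-of-Words: one count per vocabulary word, no notion of similarity."""
    counts = Counter(tokens)
    return [counts[word] for word in vocabulary]


def mean_embedding(tokens, embeddings, dim=3):
    """Average the vectors of known tokens (assumed aggregation strategy)."""
    vectors = [embeddings[t] for t in tokens if t in embeddings]
    if not vectors:
        return [0.0] * dim  # no known token: fall back to a zero vector
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def combine(metrics, text_features):
    """Append textual features to the software metrics of one module/commit."""
    return list(metrics) + list(text_features)


vocabulary = sorted(TOY_EMBEDDINGS)    # ['buffer', 'fix', 'overflow', 'typo']
tokens = "fix buffer overflow".split()
metrics = [12.0, 3.0]                  # hypothetical metric values

bow_row = combine(metrics, bow_features(tokens, vocabulary))
w2v_row = combine(metrics, mean_embedding(tokens, TOY_EMBEDDINGS))
```

With BoW, the feature dimension grows with the vocabulary and related words such as "buffer" and "overflow" occupy unrelated columns. With averaged embeddings, the dimension is fixed and related words contribute similar vectors, which is the kind of semantic relation the study sets out to examine.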

References

F. Lomio, E. Iannone, A. De Lucia, F. Palomba, and V. Lenarduzzi, “Just-in-time software vulnerability detection: Are we there yet?” The Journal of Systems & Software, vol. 188, p. 111283, 2022.

T. Mikolov, K. Chen, G. Corrado, and J. Dean, “Efficient Estimation of Word Representations in Vector Space,” in Proc. of Workshop at the International Conference on Learning Representations, 2013.

V. Efstathiou, C. Chatzilenas, and D. Spinellis, “Word embeddings for the software engineering domain,” in Proc. of Working Conference on Mining Software Repositories, ser. MSR ’18. ACM, 2018, pp. 38–41.

H. Perl, S. Dechand, M. Smith, D. Arp, F. Yamaguchi, K. Rieck, S. Fahl, and Y. Acar, “Vccfinder: Finding potential vulnerabilities in open-source projects to assist code audits,” in Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security. New York, NY, USA: ACM, 2015, pp. 426–437.

S. Chakraborty, R. Krishna, Y. Ding, and B. Ray, “Deep learning based vulnerability detection: Are we there yet?” IEEE Transactions on Software Engineering, vol. 48, no. 09, pp. 3280–3296, 2022.

Published

2023-09-11

How to Cite

Amasaki, S., Yokogawa, T., & Hirohisa, A. (2023). An Evaluation of Word Embeddings on Vulnerability Prediction with Software Metrics. WiPiEC Journal - Works in Progress in Embedded Computing Journal, 9(2). Retrieved from https://wipiec.digitalheritage.me/index.php/wipiecjournal/article/view/43

Issue

Vol. 9 No. 2 (2023): WiPiEC Journal

Section

Articles

License

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.

License Terms:

Except where otherwise noted, content on this website is licensed under a Creative Commons Attribution Non-Commercial License (CC BY NC).

Use, distribution and reproduction in any medium, provided the original work is properly cited and is not used for commercial purposes, is permitted.

Copyright to any article published by WiPiEC is retained by the author(s). Authors grant WiPiEC Journal a license to publish the article and identify itself as the original publisher. Authors also grant any third party the right to use the article freely as long as it is not used for commercial purposes and its original authors, citation details, and publisher are identified, in accordance with the CC BY NC license.
