ECE PhD Prospectus Defense: Şevval Şimşek
- Starts: 1:00 pm on Tuesday, October 29, 2024
- Ends: 3:00 pm on Tuesday, October 29, 2024
Title: Improving the Quality of Vulnerability Databases using Knowledge Graphs and LLMs
Presenter: Şevval Şimşek
Advisor: Professor David Starobinski
Chair: Ari Trachtenberg
Committee: Professor David Starobinski, Professor Gianluca Stringhini, Professor Ari Trachtenberg and Professor Alex Olshevsky
Google Scholar Profile: https://scholar.google.com/citations?user=y3Fa4gcAAAAJ&hl=en&oi=ao
Abstract: The National Vulnerability Database (NVD), established by NIST, enriches Common Vulnerabilities and Exposures (CVE) records with root-cause Common Weakness Enumeration (CWE) mappings and affected platforms (Common Platform Enumeration, CPE), serving as a vital resource for security analysis tools and other vulnerability databases. The correctness of these vulnerability databases is crucial for open-source developers to ensure software security, since it influences vulnerability scoring, patch management, and threat intelligence.
In our first research thrust, we identify NVD's limitations through a longitudinal study. Our findings indicate that 15% to 30% of CVEs lack a proper CWE mapping, and nearly 40% of updates to CVE-CWE mappings are non-informative. We also discovered that even recent CVE records contain deprecated or incorrect CPE mappings, which compromise data integrity. Thus, there is an urgent need to correct these mappings in the NVD and other databases.
To address these issues, we developed a threat knowledge graph that represents the relationships between CVE, CWE, and CPE, and we trained embedding models for predictive tasks. Our results are promising as a first step in automating the CVE-CWE mappings, as well as pinpointing the erroneous mappings in the database.
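As a rough illustration of the graph structure described above, the following minimal sketch stores CVE, CWE, and CPE relationships as (head, relation, tail) triples and flags CVEs that have no informative CWE mapping. It is not the presenter's implementation: the record IDs, the relation names (`has_weakness`, `affects`), and the helper functions are hypothetical, and treating the NVD placeholders `NVD-CWE-noinfo` and `NVD-CWE-Other` as non-informative is an assumption made for this example. No embedding model is trained here.

```python
from collections import defaultdict

# Hypothetical placeholder records; these IDs are illustrative, not real NVD entries.
TRIPLES = [
    ("CVE-0000-0001", "has_weakness", "CWE-79"),
    ("CVE-0000-0001", "affects", "cpe:2.3:a:example_vendor:example_app:1.0"),
    ("CVE-0000-0002", "has_weakness", "NVD-CWE-noinfo"),
    ("CVE-0000-0002", "affects", "cpe:2.3:a:example_vendor:example_app:1.0"),
    ("CVE-0000-0003", "affects", "cpe:2.3:a:example_vendor:other_app:2.1"),
]

# Assumption: these NVD placeholder labels count as "no informative mapping".
NON_INFORMATIVE = {"NVD-CWE-noinfo", "NVD-CWE-Other"}


def build_graph(triples):
    """Index triples as graph[head][relation] -> set of tails."""
    graph = defaultdict(lambda: defaultdict(set))
    for head, relation, tail in triples:
        graph[head][relation].add(tail)
    return graph


def cves_missing_cwe(graph):
    """Return CVE nodes with no informative CWE edge, sorted by ID."""
    missing = []
    for node, edges in graph.items():
        if not node.startswith("CVE-"):
            continue
        informative = edges.get("has_weakness", set()) - NON_INFORMATIVE
        if not informative:
            missing.append(node)
    return sorted(missing)


graph = build_graph(TRIPLES)
print(cves_missing_cwe(graph))
```

In a setup like this, the flagged CVEs would be the natural candidates for a link-prediction model that proposes a CWE for each of them.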
For future work, we propose enhancing our knowledge graph with Large Language Models (LLMs) using a Retrieval-Augmented Generation (RAG) approach to improve automated mappings while minimizing inaccuracies. Additionally, we plan to create a modular knowledge graph integrating Software Composition Analysis (SCA) tool outputs and Software Bill of Materials (SBOM) information to uncover vulnerabilities beyond what these tools typically report, bridging gaps in transitive dependency detection. This approach promises to enrich our understanding of vulnerabilities and their sources.
- Location: PHO 339