BU Computer Systems Seminar

  • Starts: 12:00 pm on Thursday, April 18, 2024
  • Ends: 1:00 pm on Thursday, April 18, 2024

Speaker: Saad Ullah, Ph.D. Candidate, Boston University

Talk Title: “LLMs Cannot Reliably Identify and Reason About Security Vulnerabilities (Yet?): A Comprehensive Evaluation, Framework, and Benchmarks”

Abstract: Large Language Models (LLMs) have been suggested for use in automated vulnerability repair, but benchmarks showing they can consistently identify security-related bugs are lacking. We thus develop SecLLMHolmes, a fully automated evaluation framework that performs the most detailed investigation to date on whether LLMs can reliably identify and reason about security-related bugs. We construct a set of 228 code scenarios and analyze eight of the most capable LLMs across eight different investigative dimensions using our framework. Our evaluation shows LLMs provide non-deterministic responses, incorrect and unfaithful reasoning, and perform poorly in real-world scenarios. Most importantly, our findings reveal significant non-robustness in even the most advanced models like 'PaLM2' and 'GPT-4': by merely changing function or variable names, or by the addition of library functions in the source code, these models can yield incorrect answers in 26% and 17% of cases, respectively. These findings demonstrate that further LLM advances are needed before LLMs can be used as general purpose security assistants.
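
The renaming perturbation mentioned in the abstract can be illustrated with a small sketch. This is not the SecLLMHolmes implementation; it assumes Python source code and uses a hypothetical helper, `rename_identifiers`, built on the standard `ast` module to show the kind of behavior-preserving rename a robustness test might apply before re-querying a model.

```python
import ast


class _Renamer(ast.NodeTransformer):
    """Rename functions, parameters, and variables using a mapping.

    Hypothetical helper for illustration only. It does not handle
    keyword arguments at call sites, attributes, or imports.
    """

    def __init__(self, mapping):
        self.mapping = mapping

    def visit_FunctionDef(self, node):
        node.name = self.mapping.get(node.name, node.name)
        self.generic_visit(node)  # also visit parameters and body
        return node

    def visit_arg(self, node):
        node.arg = self.mapping.get(node.arg, node.arg)
        return node

    def visit_Name(self, node):
        node.id = self.mapping.get(node.id, node.id)
        return node


def rename_identifiers(source, mapping):
    """Return `source` with identifiers renamed; behavior is unchanged."""
    tree = _Renamer(mapping).visit(ast.parse(source))
    return ast.unparse(tree)  # requires Python 3.9+


ORIGINAL = """
def copy_input(buf, size):
    out = [0] * size
    for i in range(size):
        out[i] = buf[i]
    return out
"""

# Replace descriptive names with uninformative ones.
PERTURBED = rename_identifiers(
    ORIGINAL, {"copy_input": "f1", "buf": "a", "size": "b", "out": "c"}
)
```

A robustness check in this spirit would ask a model the same security question about `ORIGINAL` and `PERTURBED` and compare the answers, since the two programs behave identically.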

Bio: Saad Ullah is a PhD Candidate in the Electrical and Computer Engineering Department at Boston University, under the guidance of Dr. Gianluca Stringhini. His research focuses on Cybersecurity and Generative AI, with an emphasis on large language models (LLMs). His aim is to harness LLMs to mimic the reasoning of human experts for efficient handling of programming languages and security tasks. Notably, he has developed SecLLMHolmes, the first automated evaluation framework for assessing the efficiency and reasoning capabilities of LLMs in code security tasks. This tool has also helped identify several critical issues in LLMs, thus enhancing the reliability and trustworthiness of AI-driven programming tools. His ongoing work seeks to create data- and resource-efficient solutions that address key challenges in LLMs, such as non-robustness, unfaithful reasoning, and difficulty analyzing complex code, to improve their applicability in security-centric tasks.

Speaker: Ross Mikulskis

Talk Title: “OPE Gradescope Bridge”

Abstract: Professors use Gradescope, an academic SaaS, to handle autograding student code assignments; however, the suite of kernels available for running autograder containers is limited. This poses a problem when autograder tests rely on packages that are incompatible with the kernel, and in many cases student submissions may receive lower scores due to test failure. The OPE (Open Education) Bridge uses a lightweight container on Gradescope to invoke a scalable OpenShift service to grade the student submission, and this service runs on the same cluster kernel on which the autograder tests are developed. This mode of grading is especially relevant to the Red Hat OPE framework, which provisions standardized cloud containers in which students code their assignments, since everything is centralized on one cluster. Once the student submission has been graded, the service sends back the results as JSON, and the Gradescope container publishes them for the student to view. The OPE Bridge allows professors to avoid grading inaccuracies caused by kernel incompatibilities and so evaluate student code submissions more accurately. This service is currently being used in Orran Krieger’s EC 440 Operating Systems class.
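
The flow described in the abstract (forward the submission, receive JSON, publish it) can be sketched as follows. This is not the OPE Bridge code; the function `grade_via_bridge` and the callable `send_to_grader` are hypothetical names, and the transport to the remote service is deliberately left abstract. On Gradescope, the submission is normally found under `/autograder/submission` and results are written to `/autograder/results/results.json`.

```python
import json
from pathlib import Path


def grade_via_bridge(submission_dir, results_path, send_to_grader):
    """Forward a submission to a remote grader and publish its JSON reply.

    `send_to_grader` is a hypothetical callable that takes a dict of
    {relative_path: file_text} and returns a results dict, for example
    by calling a grading service over HTTP. Assumes text-only files.
    """
    root = Path(submission_dir)
    files = {
        p.relative_to(root).as_posix(): p.read_text()
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }

    try:
        results = send_to_grader(files)
    except Exception as exc:
        # Report the failure to the student instead of crashing silently.
        results = {"score": 0, "output": f"Remote grading failed: {exc}"}

    out = Path(results_path)
    out.parent.mkdir(parents=True, exist_ok=True)
    out.write_text(json.dumps(results))
    return results
```

Inside a Gradescope autograder, a call might look like `grade_via_bridge("/autograder/submission", "/autograder/results/results.json", send_to_grader)`, with `send_to_grader` implemented against whatever endpoint the grading service exposes.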

Bio: Ross Mikulskis is a third-year BA/MS computer science student at Boston University. He has developed testing infrastructure for the Red Hat OPE project with Isaiah Stapleton and is currently using the OPE framework to develop his own free online computer science educational nonprofit, Bits of CS Inc, which has a corresponding free online textbook and has received some funding from Boston University. Ross is very interested in the democratization of education and has relevant teaching experience, having been a computer science TA for four semesters in CS 330 Algorithms and CS 131 Combinatorics. He hopes to develop solutions for sharing education in an open-source, transparent, and collaborative manner to empower underserved communities and incorporate voices from all perspectives.

665 Commonwealth Ave, Room 1101 (11th floor)
