Streamlining Code Reviews with AI
Within the last couple of years, the number of AI tools has exploded, thanks in large part to the introduction of GPT-3.5. Software development is among the crafts that have been, and continue to be, heavily impacted by advancements in machine learning and artificial intelligence. Tools like GitHub Copilot and Gemini Code Assist help developers write code. With such generative coding tools, software development has become more and more accessible: instead of learning how to write scripts for certain purposes in different programming languages, people can now prompt an AI tool and get the task done effectively. This has truly changed the way developers work. Thomas Dohmke, the CEO of GitHub, previously stated that "sooner than later, 80% of the code will be written by Copilot."
While these coding assistants are great at helping software developers write code faster, they do not effectively alleviate two big pains in the industry: addressing technical debt and debugging. Paying down the principal of technical debt, meaning the cost of refactoring software artifacts to reach a desired level of maintainability, has become a costly activity for companies. A report from CISQ concluded that the average developer spends 33% of their time addressing technical debt, and put the estimated principal cost of technical debt at roughly $1.52 trillion in the US alone in 2022. On top of that, the estimated cost of finding and fixing software defects (in other words, debugging) was $607 billion in the US in 2022.
The largest amounts of technical debt usually live in large legacy codebases that may span millions of lines of code and be written in aging languages like VB.NET. Due to their size, it is practically impossible to analyze them using purely LLM-based tools: there is no practical way to feed an entire repository into an LLM for analysis. Even if there were, LLMs would suffer from recency bias, weighting whatever appears later in the prompt more heavily than what came before.
These limitations create the need for different AI models to make the technology suitable for code analysis. After some searching, I came across a promising tool: Metabob. Metabob is a static code analysis tool that tackles the limitations of LLMs by pairing them with graph neural networks (GNNs) to perform the analysis. The GNN allows Metabob to analyze the whole repository, pinpoint code sections that contain defects or can be refactored for better performance, and retrieve additional contextual information about the analyzed code. Its technology is good at understanding the use case of the analyzed code. Metabob then utilizes an LLM to explain the pinpointed problems and to generate fixes for them.
Introduction to Metabob
Let's unpack what Metabob is about. The tool's vision is to reduce time spent debugging and refactoring code. Now, this sounds interesting, but what's the difference between it and traditional linters? The difference is in the types of errors the AI can detect. Metabob's AI doesn't focus on syntactical errors, but rather on detecting errors that are likely to occur at runtime in the analyzed code. It detects issues like memory leaks, race conditions, and unhandled edge cases, to name a few. Additionally, it can detect refactoring opportunities, such as improving code modularity, reducing duplication, and so on. Last but not least, Metabob can holistically analyze the complete codebase and examine the relationships between different parts of it. In the screenshot below I've included a detection from C++ code where Metabob flagged a lack of error handling for resource management. I also included a Python example where it detected an opportunity to improve code modularity (note: I planted non-modular code in this file on purpose to test the tool's capability).
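To give a concrete sense of what a modularity finding looks like, here is a small, hypothetical Python snippet of my own (not the file from the screenshot, and not output from Metabob): one function that mixes file I/O, parsing, and reporting, followed by a refactored sketch that splits those concerns apart.

```python
# Hypothetical example of non-modular code that a modularity detection
# would typically flag: one function handles file I/O, parsing, and reporting.
def process_orders(path):
    with open(path) as f:
        lines = f.readlines()
    total = 0.0
    for line in lines:
        parts = line.strip().split(",")
        if len(parts) != 2:
            continue  # malformed rows are silently skipped
        _, amount = parts
        total += float(amount)
    print(f"Processed {len(lines)} rows, total = {total}")

# Refactored sketch: each concern lives in its own small, testable function.
def read_rows(path):
    with open(path) as f:
        return [line.strip().split(",") for line in f]

def parse_amount(row):
    if len(row) != 2:
        raise ValueError(f"Malformed row: {row!r}")
    return float(row[1])

def total_orders(path):
    return sum(parse_amount(row) for row in read_rows(path))
```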
So why is Metabob able to detect problems like this, and what advantages do GNNs provide? When Metabob's GNN analyzes code, it can extract a lot of information in addition to the problematic code segment itself. This includes source code deltas and rectification instructions, internal code documentation and project descriptors, information about the libraries and frameworks used in the code, and normalized root cause analysis results.
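To make the graph idea a bit less abstract, here is a minimal sketch of my own (not Metabob's actual pipeline) showing how a piece of Python code can be turned into a node-and-edge structure. A GNN would operate on a much richer graph than this, with features for identifiers, data flow, and so on, but the basic shape is the same.

```python
# Minimal illustration (not Metabob's pipeline): turn a Python module into
# a graph of AST node types connected by parent -> child edges.
import ast

def code_to_graph(source: str):
    tree = ast.parse(source)
    nodes, edges, index = [], [], {}
    for node in ast.walk(tree):          # collect every AST node
        index[id(node)] = len(nodes)
        nodes.append(type(node).__name__)
    for node in ast.walk(tree):          # connect each node to its children
        for child in ast.iter_child_nodes(node):
            edges.append((index[id(node)], index[id(child)]))
    return nodes, edges

nodes, edges = code_to_graph("def double(x):\n    return x * 2\n")
print(len(nodes), "nodes,", len(edges), "edges")
```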
COOL! What then? Well, next this information is stored in a vector database. When it comes to generating responses to the detected problems with the integrated LLM, Metabob performs retrieval-augmented generation (RAG), retrieving the information relevant to the detected problem from the vector database. This enhances the LLM's contextual understanding of the code snippet in question and enables it to generate a useful answer. The GNN-RAG connection is a fairly new and fascinating area of development.
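The RAG pattern itself is easy to sketch. Below is a generic outline of how retrieved context can be folded into an LLM prompt; this is my own simplified illustration, not Metabob's implementation, and embed_code, vector_db, and llm are hypothetical placeholders for an embedding model, a vector database client, and an LLM client.

```python
# Generic RAG sketch, not Metabob's implementation. embed_code, vector_db,
# and llm are hypothetical placeholders passed in by the caller.
def explain_detection(snippet: str, detection: str, embed_code, vector_db, llm) -> str:
    # 1. Embed the problematic code snippet.
    query_vector = embed_code(snippet)

    # 2. Retrieve context related to the detection: similar past fixes,
    #    project documentation, notes on the libraries in use, etc.
    context_chunks = vector_db.search(query_vector, top_k=5)

    # 3. Build a prompt that grounds the LLM in the retrieved context.
    prompt = (
        f"Detected problem: {detection}\n\n"
        f"Code:\n{snippet}\n\n"
        "Relevant context:\n" + "\n".join(context_chunks) + "\n\n"
        "Explain the problem and suggest a fix."
    )

    # 4. Let the LLM generate the explanation and fix recommendation.
    return llm(prompt)
```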
Next, let's take a look at the tool in action. You can view a quick demo where I use the tool here. In the demo, I'm using the company's VS Code extension, which presents results for the file I'm currently viewing while still pulling in useful information from other parts of the repository. You can download the VS Code extension for yourself here.
To summarize, what I haven't seen from other tools is how well Metabob predicts runtime errors. Right in VS Code, it detected areas with potential memory leaks, race conditions, and unhandled edge cases, and it also caught logical problems such as the risk of infinite recursion.
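As a toy illustration of that last category (my own example, not an actual detection from the demo), consider a recursive function whose base case can be skipped entirely for certain inputs:

```python
# Toy example of a recursion bug of the kind such a tool can flag: for
# negative or non-integer n, the base case n == 0 is never reached, so the
# call recurses until Python's recursion limit is exceeded.
def countdown(n):
    if n == 0:
        return
    countdown(n - 1)

# A guarded version handles the edge case explicitly.
def safe_countdown(n):
    if not isinstance(n, int) or n < 0:
        raise ValueError("n must be a non-negative integer")
    if n == 0:
        return
    safe_countdown(n - 1)
```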
Because Metabob works right in my editor, the user experience is seamless. It highlights problematic areas and describes the problems in a way that is easy to understand. Afterward, I can ask it to generate a code recommendation that fixes the detected problems. All in all, I use the tool as a feedback loop in my own development workflow through VS Code. For organizational use, I see a lot of benefits in Metabob, ranging from the AI's ability to analyze a large codebase and help address technical debt to its integrations for continuous analysis. Among other things, the tool helps ensure that developers get their code reviewed before passing it forward in their team's pipeline.
Shoutout to Metabob for collaborating with me on this blog.