The State of AI in Large Scale Automated Refactoring
Tuesday, January 7th, 2025: 6:30 PM to 8:30 PM
LLMs are data-hungry, but when it comes to source code, the text alone is not enough to make large-scale inferences about a codebase.
Moderne, Miami
Code has a unique structure, a strict grammar, dependencies, and type information that a compiler must resolve deterministically. This information could benefit AI, but it is not visible in the text of the source code.
For example, try to answer even a simple question, such as where Guava is used or where a particular logging library is used. Developers can search the code for references, but the code-as-text may contain no reference to the library you are looking for. Imagine a logger instance inherited as a protected field from a base class defined in a binary dependency. The import statement that identifies which logging library that logger comes from is in the binary dependency, not in the text of the call site. A human reading only the text of the call site would do no better.
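The inherited-logger scenario can be sketched in a few lines of Java. This is only an illustration under stated assumptions: `BaseService`, `OrderService`, and `LoggerResolutionDemo` are hypothetical names, `java.util.logging` stands in for whichever logging library is in question, and reflection stands in for the type resolution a compiler performs. It is not a description of any particular tool.

```java
import java.util.logging.Logger;

// Hypothetical stand-in for a class shipped in a binary dependency.
// In a real codebase only its compiled .class file would be on hand,
// so this import would never appear in the project's own source text.
abstract class BaseService {
    protected final Logger logger = Logger.getLogger(getClass().getName());
}

// The call site: its source never names the logging library.
class OrderService extends BaseService {
    void placeOrder(String id) {
        logger.info("placing order " + id);
    }
}

class LoggerResolutionDemo {
    // The text of OrderService as a text-only tool would see it.
    static final String CALL_SITE_SOURCE = String.join("\n",
        "class OrderService extends BaseService {",
        "    void placeOrder(String id) {",
        "        logger.info(\"placing order \" + id);",
        "    }",
        "}");

    // Text search: does the call-site source mention the library package?
    static boolean textMentions(String source, String libraryPackage) {
        return source.contains(libraryPackage);
    }

    // Type resolution: walk up the class hierarchy until the field is
    // found, then report its declared type. Returns null if no class
    // in the hierarchy declares a field with that name.
    static String resolveFieldType(Class<?> type, String fieldName) {
        for (Class<?> c = type; c != null; c = c.getSuperclass()) {
            try {
                return c.getDeclaredField(fieldName).getType().getName();
            } catch (NoSuchFieldException e) {
                // Not declared on this class; try the superclass.
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Text search over the call site finds no trace of the library.
        System.out.println(textMentions(CALL_SITE_SOURCE, "java.util.logging"));
        // Resolving the field's type reveals where the logger comes from.
        System.out.println(resolveFieldType(OrderService.class, "logger"));
    }
}
```

Running `main` prints `false` and then `java.util.logging.Logger`: the text of the call site gives no hint of the library, while the resolved type identifies it directly.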
Hosted by Eugenio Alvarez