Source Code Authorship Attribution using File Embeddings
Source code authorship attribution matters for several reasons, of which security and legal concerns are the most commonly cited. The problem can also shed light on the nature of personal coding style. Such information could be used, for instance, by IDEs to improve the developer's experience of writing code.
The goal of this study is to construct an interpretable model for generating source code embeddings. Such embeddings should capture the correspondence between a piece of source code and its author.