Code2Vec: Learning code representations

Code2Vec is a neural model that represents snippets of code as fixed-length continuous vectors (code embeddings) that encode semantic similarities, enabling the application of neural techniques to a wide range of programming-language tasks. The embeddings can be used for smarter code completion, finding similar functions in analyzed code, and measuring the performance of program execution on the CPU. This project analyzed semantic similarities of code embeddings learned from open source Python libraries such as NumPy, pandas, and scikit-learn. Still in progress is a follow-up analysis that learns code embeddings in a supervised manner on a C++ codebase for CPU performance measurement with hardware performance counters (e.g. LLC misses to L1 requests, cycles per instruction).
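As an illustrative sketch (not the project's actual code), semantic similarity between two fixed-length code embeddings is commonly measured with cosine similarity; the vectors below are toy stand-ins for learned embeddings, and the function names are hypothetical.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors (1.0 = identical direction)."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy 4-dimensional embeddings standing in for learned code2vec vectors;
# real embeddings typically have hundreds of dimensions.
emb_sort_array = np.array([0.9, 0.1, 0.3, 0.0])  # hypothetical "sortArray" function
emb_order_list = np.array([0.8, 0.2, 0.4, 0.1])  # hypothetical "orderList" function
emb_open_file  = np.array([0.0, 0.9, 0.1, 0.8])  # hypothetical "openFile" function

# Semantically similar functions should score higher than unrelated ones.
print(cosine_similarity(emb_sort_array, emb_order_list))  # high similarity
print(cosine_similarity(emb_sort_array, emb_open_file))   # low similarity
```

In a nearest-neighbor search over a whole codebase, ranking functions by this score is one way to surface "similar functions in analyzed code" as described above.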


Additional Information

For more information on this project and the unique partnership that produced it, please visit Red Hat Research.