# codesearch.ai **Repository Path**: mirrors_sourcegraph/codesearch.ai ## Basic Information - **Project Name**: codesearch.ai - **Description**: codesearch.ai semantic code search engine - **Primary Language**: Unknown - **License**: Apache-2.0 - **Default Branch**: main - **Homepage**: None - **GVP Project**: No ## Statistics - **Stars**: 0 - **Forks**: 0 - **Created**: 2022-07-06 - **Last Updated**: 2026-04-26 ## Categories & Tags **Categories**: Uncategorized **Tags**: None ## README # codesearch.ai [codesearch.ai](https://codesearch.ai) is a semantic code search engine. It allows searching GitHub functions and StackOverflow answers using natural language queries. It uses HuggingFace Transformers under the hood, and the training procedure is inspired by a paper called [Text and Code Embeddings by Contrastive Pre-Training](https://arxiv.org/pdf/2201.10005.pdf) from OpenAI. The [CodeSearchNet project](https://github.com/github/CodeSearchNet) served as a basis for data collection and cleaning. The project is split into two sub-projects: data collection and model training. The `codesearch-ai-data` folder corresponds to the data collection part written in Go. And the `codesearch_ai_ml` folder corresponds to the model training part written in Python. ## Requirements - Go >= 1.18 - Python >= 3.7 - CUDA (for GPU model training) - Postgres ## Code walkthrough We prepared a detailed code walkthrough in the form of a [Sourcegraph Notebook](https://sourcegraph.com/notebooks/Tm90ZWJvb2s6MTM0Mw==).