Farukh Kitchlew | Nov 29, 2023
GitHub Copilot Detects Matching Public Code Suggestions
Last year, GitHub introduced GitHub Copilot, an AI-powered pair programmer that suggests lines of code as you type. It offers code snippets, methods, tests, and algorithms. However, some developers wondered whether the AI-generated code matched existing public code. In November, GitHub Copilot added a feature that lets developers block suggestions matching about 150 characters or more of public code.
GitHub has now introduced a private beta of an improved code referencing feature in GitHub Copilot. Instead of simply blocking, this new filter identifies code suggestions that resemble public code on GitHub and shows the context for them. Copilot checks each suggestion, together with its surrounding code (around 150 characters), against public code on GitHub.com. When a match is detected, Copilot provides the following:
- The corresponding code
- Repositories featuring the code
- Licensing information for each repository
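To make the matching step more concrete, here is a minimal sketch of how a fixed-size window lookup could work. It is not GitHub's implementation: the Reference type, the build_index and find_references functions, and the exact-match-after-whitespace-normalization rule are all assumptions made for illustration. Only the roughly 150-character window and the returned repository and license details come from the description above.

```python
from dataclasses import dataclass

WINDOW = 150  # approximate match length mentioned in the article


@dataclass(frozen=True)
class Reference:
    """Hypothetical record describing where a piece of public code lives."""
    repository: str
    license_id: str


def normalize(code: str) -> str:
    """Collapse runs of whitespace so formatting differences do not hide a match."""
    return " ".join(code.split())


def build_index(corpus: dict[Reference, str]) -> dict[str, set[Reference]]:
    """Map every WINDOW-character slice of each public file to its sources.

    Files shorter than WINDOW characters contribute nothing to the index.
    """
    index: dict[str, set[Reference]] = {}
    for ref, text in corpus.items():
        flat = normalize(text)
        for start in range(len(flat) - WINDOW + 1):
            index.setdefault(flat[start:start + WINDOW], set()).add(ref)
    return index


def find_references(suggestion_with_context: str,
                    index: dict[str, set[Reference]]) -> set[Reference]:
    """Return every source sharing at least one WINDOW-character slice."""
    flat = normalize(suggestion_with_context)
    found: set[Reference] = set()
    for start in range(len(flat) - WINDOW + 1):
        found |= index.get(flat[start:start + WINDOW], set())
    return found
```

In this sketch, a suggestion shorter than the window never produces a match, and each returned Reference carries the repository and license that a developer would then review. A production system would need a far more compact index and probably a fuzzier notion of similarity.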
GitHub Copilot's Refinements and Considerations
According to GitHub’s CEO, Thomas Dohmke, Microsoft and GitHub themselves used the initial blocking tool, as did many business users of Copilot. However, he emphasized that the tool had limitations and lacked precision.
“It gives you little control to decide for yourself whether you want to take that code and attribute it back to an open-source license. It doesn’t let you discover that there might be a library that you could use instead of synthesizing code. It prevents you from exploring these libraries and submitting pull requests. You might be reproducing everything that already exists in some open-source repo.”
Dohmke stressed that this applies in particular to basic algorithms such as sorting, which appear in many places across public code. With the new feature, developers can reject a matching suggestion, use it as is, or have Copilot rewrite it so that it differs from the original.
The tool does not yet let developers limit results to specific licenses. The team is actively soliciting feedback to assess user interest in such a feature.
“We’re letting people understand the match and then go on and explore or go and make the right decision,” Dohmke said. “I think it fills the gap that the original solution had.”
The code referencing feature triggers more often when Copilot has little context to work with. Once Copilot has enough of your existing code to go on, it produces fewer suggestions that match public code. Early on, before that context exists, it tends to propose matching code more often.
At the core of this feature lies a fast search engine (GitHub targets a latency of 10-20ms). It quickly identifies similar code and its license. Currently, the tool lists matched code snippets in the order the search engine returns them.
GitHub’s initial announcement hinted at plans to let developers sort this list by repository license and commit date. This functionality will likely be added later.
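As an illustration of what sorting matches by repository license or commit date could look like, here is a small sketch. The Match record, its field names, and the sort_matches function are hypothetical and do not reflect any published Copilot API. The choice of oldest commit first is also an assumption.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Match:
    """Hypothetical record for one matched snippet."""
    repository: str
    license_id: str
    commit_date: date


def sort_matches(matches: list[Match], by: str = "license") -> list[Match]:
    """Return a new list ordered by license name or by commit date."""
    if by == "license":
        # Group by license, then order repositories alphabetically within a license.
        return sorted(matches, key=lambda m: (m.license_id.lower(), m.repository))
    if by == "commit_date":
        # Oldest commit first, which may help locate the likely original source.
        return sorted(matches, key=lambda m: m.commit_date)
    raise ValueError(f"unsupported sort key: {by!r}")
```

Sorting by commit date in ascending order is one way a developer might find the earliest appearance of a snippet, while grouping by license makes it easier to see which matches are compatible with a project's own terms.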