This extension and a back-end service suggest code to developers right in their editors. It supports editors and integrated development environments (IDEs) such as Microsoft’s Visual Studio Code, Neovim, and the JetBrains IDEs. Within these, the AI suggests the next line of code as developers type. The program can suggest complete methods and complex algorithms as well as boilerplate code, and it can help with unit testing. For all intents and purposes, the back-end AI acts as a pair-programming assistant. Developers are free to accept, reject, or edit Copilot’s suggestions. If you’re a new programmer, Copilot can interpret simple natural-language commands and translate them into one of a dozen programming languages. These include Python, JavaScript, TypeScript, Ruby, and Go.

Microsoft, GitHub, and OpenAI collaborated to build the program. It’s based on OpenAI’s Codex. Codex was trained on billions of lines of publicly available source code, including code in public repositories on GitHub, and on natural language, which means it can understand both programming and human languages.

It sounds like a dream come true, doesn’t it? There’s a rather large fly in the soup, though. There are legal questions about whether Codex’s makers had the right to use open source code as the foundation of a proprietary service. And, even if it is legal, can Microsoft, OpenAI, and GitHub, and thus Copilot’s users, ethically use the code it “writes”?

According to Nat Friedman, GitHub’s CEO when Copilot was released in beta, GitHub is legally in the clear because “training ML systems on public data is fair use.” But, he also noted, “IP [intellectual property] and AI will be an interesting policy discussion around the world in the coming years.” You can say that again.

Others vehemently disagree. The Software Freedom Conservancy (SFC), a non-profit organization that provides legal services for open source software projects, holds the position that OpenAI’s Codex was trained exclusively with GitHub-hosted projects.
And many of these are licensed under copyleft licenses. As Bradley M. Kuhn, the SFC’s Policy Fellow and Hacker-in-Residence, stated, “Most of those projects are not in the ‘public domain,’ they are licensed under Free and Open Source Software (FOSS) licenses. These licenses have requirements including proper author attribution and, in the case of copyleft licenses, they sometimes require that works based on and/or that incorporate the software be licensed under the same copyleft license as the prior work. Microsoft and GitHub have been ignoring these license requirements for more than a year.”

So the SFC is biting the bullet and urging developers not only to avoid using Copilot but to stop using GitHub completely. They know that won’t be easy. Thanks to Microsoft and GitHub’s “effective marketing, GitHub has convinced Free and Open Source Software (FOSS) developers that GitHub is the best (and even the only) place for FOSS development. However, as a proprietary, trade-secret tool, GitHub itself is the very opposite of FOSS,” added Kuhn.

Other people land between these extremes. For example, Stefano Maffulli, executive director of the Open Source Initiative (OSI), the organization that oversees open source licenses, understands “why so many open source developers are upset: They have made their source code available for the progress of computer science and humanity. Now that code is being used to train machines to create more code – something the original developers never envisioned nor intended. I can see how it’s infuriating for some.”

That said, Maffulli thinks, “Legally, it appears that GitHub is within its rights.” However, it’s not worth getting “lost in the legal weeds discussing if there is an open source license issue here or a copyright issue. This would miss the wider point.
Clearly, there is a fairness issue that affects the whole of society, not just open source developers.” Maffulli argues that we can’t expect organizations to use AI in the future with “goodwill” and “good faith,” so it’s time for a broader conversation about AI’s impact on society and on open source.

That’s an excellent point. Copilot is the tip of the iceberg of a much larger issue. The OSI won’t be ignoring it. The organization has been working for several months on building a virtual event called Deep Dive: AI. This, the OSI hopes, will launch a conversation about the legal and ethical implications of AI and what’s acceptable for AI systems to be “open source.” It comprises a podcast series, which will launch soon, and a virtual conference, which will be held in October 2022.

Focusing more on the legal elements, well-known open-source lawyer and OSS Capital General Partner Heather Meeker believes Copilot is legally in the clear. Meeker noted that even the Free Software Foundation (FSF) doesn’t claim that what Copilot does is copyright infringement. As John A. Rothchild, Professor of Law at Wayne State University, and Daniel H. Rothchild, Ph.D. candidate at the University of California at Berkeley, said in their FSF paper, “The use of Copilot’s output by its developer-customers is likely not infringing.” That, however, “does not absolve GitHub of wrongdoing, but rather argues that Copilot and its developer-customers likely do not infringe developers’ copyrights.” Instead, the FSF argues that Copilot is immoral because it is offered as Software as a Service (SaaS).

Open source legal expert and Columbia law professor Eben Moglen thinks Copilot doesn’t face serious legal problems, but GitHub and OpenAI do need to answer some concerns. That’s because, Moglen said, “like photocopiers, or scissors and paste, code recommendation programs can result in copyright infringement.
Therefore, parties offering such recommendation services should proceed in a license-aware fashion so that users incorporating recommended code in their projects will be informed in a granular fashion of any license restrictions on recommended code. Ideally, users should have the ability to filter recommendations automatically to avoid the unintentional incorporation of code with conflicting or undesired license terms.” At this time, Copilot doesn’t do this.

Many “free software programmers are uncomfortable with code they have contributed to free software projects being incorporated in a GitHub code database through which it is distributed as snippets by the Copilot recommendation engine at a price,” said Moglen. GitHub should therefore provide “a simple, persistent way to sequester their code from Copilot.” If GitHub doesn’t, it will have given programmers a reason to move their projects elsewhere, as the SFC is suggesting. Moglen therefore expects GitHub to offer a way to protect concerned developers from having their code vacuumed into the OpenAI Codex.

So, what happens now? Eventually, the courts will decide. Besides open source and copyright issues, there are still larger legal issues over the use of “public” data by private AI services. As Maffulli said, “We need to better understand the needs of all actors affected by AI in order to establish a new framework that will embed the value of open source into AI, providing the guardrails for collaboration and fair competition to happen at all levels of society.”

Finally, it should be noted that GitHub isn’t the only company using AI to help programmers. Google’s DeepMind has its own AI developer system, AlphaCode; Salesforce has CodeT5; and there’s also the open-source PolyCoder. In short, Copilot isn’t the only AI coder. The issue of how AI fits into programming, open source, and copyright is much bigger than the simplistic “Microsoft is bad for open source!”