GNU and the AI Reimplementations

Original source: hackernews · Summarized and analyzed by Genesis Park

Summary

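As Richard Stallman's GNU project and Linus Torvalds's development of the Linux kernel in the 1980s and 90s show, "reimplementation", that is, writing new code with reference to the ideas and functionality of existing software, has long been accepted as legal as long as it does not copy the expression protected by copyright law (the source code) verbatim. Criticism of the fairness of using artificial intelligence (AI) to reimplement existing open source projects has recently grown louder, but it may be a double standard that ignores this historical context. Just as the manual reimplementations of the past contributed to technical progress and to the open source ecosystem, the key point is that the AI produces independent code rather than a simple variation or a mechanical translation, and such code still falls within legal bounds. In the end, what has changed in the AI era is not the essential limits of copyright but merely the overwhelming "speed" at which reimplementations are carried out.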

Full text

"Those who cannot remember the past are condemned to repeat it." I never really liked that sentence, and what is happening with AI and the reimplementation of software projects shows all the limits of the idea. Many people are protesting the fairness of rewriting existing projects using AI. But a good portion of those people were already in the field during the 90s: they followed the final part of the deeds of Richard Stallman (which started in the '80s), when he and his followers were reimplementing the UNIX userspace for the GNU project. The same people who are now against AI rewrites cheered, back then, for the GNU project's actions (rightly, from my point of view; I cheered too).

Stallman is not just a programming genius; he is also the kind of person who has a broad vision across disciplines, and among other things he was well versed in the nuances of copyright. He asked the other programmers to reimplement the UNIX userspace in a specific way, one that would make each tool unique and recognizable compared to the original: either faster, or more feature rich, or scriptable. These qualities served two different goals: to make GNU Hurd better and, at the same time, to provide a protective layer against litigation. If somebody claimed that the GNU implementations had copied not just ideas and behaviours (which is legal) but "protected expressions" (that is, the source code verbatim), the added features and the deliberate push towards certain design directions would provide a counter-argument that judges could understand. He also asked them to always reimplement the behavior itself, without looking at the actual implementation, working from specifications and from the real-world behavior of the tool as observed by running it manually. Still, it is fair to guess that many of the people working on the GNU project had been exposed to, or had access to, the UNIX source code.

When Linus reimplemented UNIX by writing the Linux kernel, the situation was somewhat more complicated, with an additional layer of indirection. He was exposed to UNIX only as a user and apparently had no access to its source code. On the other hand, he was massively exposed to the Minix source code (an implementation of UNIX, but using a microkernel), and to the book describing that implementation as well. In turn, when Tanenbaum wrote Minix, he did so after being massively exposed to the UNIX source code. So SCO (during the IBM litigation) had a hard time trying to claim that Linux contained any protected expressions. Yet when Linus used Minix as an inspiration, not only was he very familiar with something (Minix) implemented with knowledge of the UNIX code, but (more interestingly) the license of Minix was restrictive: it became open source only in 2000. Still, even in such a setup, Tanenbaum protested about the architecture (in the famous exchange), not about copyright infringement. So we can reasonably assume that Tanenbaum considered rewrites fair, even though Linus had been exposed to Minix (and Tanenbaum himself had followed a similar process when writing Minix).

# What the copyright law really says

To put all this in the right context, let's zoom in on copyright's actual perimeter: the law says you must not copy "protected expressions". In the case of software, a protected expression is the code as it is, with the same structure, variables, functions, and exact mechanics of how specific things are done, unless they are known algorithms (a standard quicksort or a binary search can be implemented in a very similar way without being a violation). The problem arises when the business logic of a program matches the original implementation perfectly, almost line by line.

Otherwise, the copy is lawful and does not have to obey the original license, as long as it is pretty clear that the code does something similar but is not cut & pasted, mechanically translated to some other language, or aesthetically modified just to look a bit different (look: this is exactly the kind of bad-faith maneuver a court will try to identify). I have the feeling that every competent programmer reading this post knows perfectly well what a *reimplementation* is and what it looks like. There will be inevitable similarities, but the code will clearly not be copied.

If this is the legal setup, why do people care about clean room implementations? Well, the reality is that a clean room is just an optimization in case of litigation: it makes it simpler to win in court. Being exposed to the original source code of a program is fine, as long as the exposure is used only to gain knowledge of its ideas and behavior. Besides, we are all happy to have Linux today, and the GNU user space, together with many other open source projects that followed a similar path. I believe rules must be applied both when we agree with their ends and when we don't.

# AI enters the scene

So, reimplementations were always possible. What changes now is the fact that they are brutally faster an

This analysis was written by the Genesis Park editorial team with the help of AI. The original article is available through the source link.
