Compensation – The Line of Actual Control

The EU just passed a major AI law, China’s taking one stance on AI-generated art, and Japan’s trying to protect creators. Lawsuits abound! Here’s a quick roundup from my readings this morning.

The battle is for compensation.

OpenAI says it’s impossible to train their amazing models without using books, art, music – basically everything created by humans. The market is forcing attribution in latent space. What’s needed is a mechanism for tracking and documenting contributions to latent representations of creative works.

EU parliament approves landmark AI law

Companies using generative AI or foundation AI models like OpenAI’s ChatGPT or Anthropic’s Claude 2 must provide detailed summaries of any copyrighted works, including music, used to train their systems. Training data sets used in generative AI music or audio-visual works must be watermarked for traceability by rights holders. Content generated by AI must be clearly labeled as such, and tech companies must prevent the generation of illegal and infringing content. Interestingly, the use of copyrighted materials in constructing AI models might be construed as a necessary “temporary act of reproduction,” integral to a technological process.

China – Stable Diffusion ruling

The Beijing Internet Court’s first instance judgment on AI-generated artwork from Stable Diffusion marked China’s first ruling on such creations, finding copyrightable elements. In contrast, the US Copyright Office in February 2023 deemed images generated by the AI drawing tool Midjourney ineligible for copyright, emphasizing human authorship.

France

The French government is currently investing 1.5 billion euros into AI, and has championed a so-called “open source” approach.

Japan

The Cultural Affairs Agency in Japan has launched a data collection initiative to track copyright infringement cases related to the development and use of generative artificial intelligence (AI). This effort aims to address concerns among creators about AI generating large volumes of texts and illustrations resembling their original works.

Data will be gathered through a legal consultation service website, where creators can seek advice from agency-appointed lawyers on copyright-related issues at no cost. Since 2018, revisions to the Copyright Law have allowed the use of copyrighted materials for AI training without explicit permission, leading to protests from rights holder groups. Despite this, a lack of data on AI-related copyright infringement cases and court precedents has hindered discussions on potential law revisions.

The big picture

There are 23 active lawsuits underway, including recent cases such as the one against Nvidia (brought by authors over its use of the Books3 dataset).

Patronus AI conducted research testing leading AI models for copyright infringement using copyrighted books. OpenAI’s GPT-4 produced the most copyrighted content, responding to 44% of prompts with copyrighted text. Other models also generated copyrighted content, with varying frequencies. OpenAI has defended its use of copyrighted works, stating that it’s impossible to train top AI models without such materials and arguing that limiting training data to public domain works from over a century ago would not meet the needs of modern AI systems.

The issues revolve around fair use versus infringement, with companies like OpenAI arguing that training models on copyrighted data constitutes fair use, while copyright holders advocate for compensation, consent, and attribution. This is likely to impact the market, with one outcome being AI offerings paying license fees to rights holders or even opting to use purely synthetic data as a workaround.

This week’s forward guidance from Adobe on quarterly revenues is an indicator of the legal headwinds and market complexities all these companies will be facing in the coming months. In Adobe’s case their product is crap (imho). These facts further inform my take that the earliest we will see meaningful impact to earnings will be north of Q1 2025. The exceptions are the chip sector and data centers, which deal in the currency AI needs.

Getting the attribution balance right

Provenance and Originality: Attribution in latent spaces could help trace the evolution and lineage of creative works that have been generated or modified using AI. Suppose an image was generated from an initial latent code that was then iteratively modified. Attribution techniques could identify the contribution of each step, potentially aiding in determining ownership and rights over the final work.
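To make the "contribution of each step" idea concrete, here is a toy sketch in Python. It assumes we somehow have the latent code recorded after every edit (a big assumption, since no such audit trail is standard today) and it simply measures how far each edit moved the code. The function name and the sample data are made up for illustration, and distance moved is only one crude proxy for contribution.

```python
import math

def step_contributions(latents):
    """Return each edit step's share of the total latent movement.

    `latents` is a hypothetical recorded history: the latent code after
    each step, starting with the initial code. Shares sum to 1.0 unless
    nothing moved at all.
    """
    # Euclidean distance between each consecutive pair of latent codes.
    deltas = [math.dist(prev, cur) for prev, cur in zip(latents, latents[1:])]
    total = sum(deltas)
    if total == 0:
        return [0.0 for _ in deltas]
    return [d / total for d in deltas]

# Made-up 2-D history: initial code, then two edits.
history = [(0.0, 0.0), (3.0, 4.0), (3.0, 19.0)]
shares = step_contributions(history)
print(shares)
```

Real latent codes have hundreds or thousands of dimensions and the mapping from distance to creative contribution is anything but settled, so treat this as a picture of the bookkeeping, not a method.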

All that said, a deeply auditable latent space is not a thing right now. Extracting reliable stylistic fingerprints or interpreting latent space watermarks is complex and, frankly, an active area of research rather than something ready to go to market. All of it depends on watermarking, style fingerprinting and metadata tracking being systematically implemented across models. Read that paragraph twice and think about the likelihood of it being realized and enforced in the current market conditions. It makes one’s head explode.

Transformative Use: Understanding which elements within latent space are needed to establish a work’s identity is relevant to copyright’s “fair use” doctrine. AI modification of existing works might be considered transformative if it significantly alters the meaning or message expressed within the latent representation. All those Richard Prince court cases on process and new derivative works come to mind here…

Attribution to Compensation

Attribution could help define the boundaries of what constitutes a significant change.

Derivative Works: If a latent code is considered substantially derived from a copyrighted work, using that latent code to generate new outputs intersects with the copyright holder’s rights.

Attribution can help determine the extent of similarity between latent representations and their potential sources. Attribution adds technical proof to a copyright claim.
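As a rough illustration of what "extent of similarity" could mean in practice, the sketch below ranks candidate source works by cosine similarity to a generated latent code. Everything here is an assumption for the sake of the example: the function names, the work labels and the tiny 2-D vectors are made up, and it presumes the generated output and the candidate sources were all encoded by the same model.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two latent vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # similarity is undefined for a zero vector; treat as none
    return dot / (norm_a * norm_b)

def rank_sources(generated, sources):
    """Sort candidate sources from most to least similar to `generated`."""
    scored = [(name, cosine_similarity(generated, vec)) for name, vec in sources.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Made-up latent codes for two candidate source works.
sources = {"work_a": (1.0, 0.0), "work_b": (0.0, 1.0)}
ranking = rank_sources((2.0, 0.0), sources)
print(ranking)
```

A high score only says two representations point the same way inside one model. Whether that amounts to copying is a question for the courts, not for the math.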

IP experts have differing views on how copyright law should apply to latent representations, as the cases now being argued in court show. Copyright law hasn’t caught up with the complexities of AI-generated content, making it difficult to apply traditional concepts like authorship and originality to works derived from latent spaces. Plus, the high dimensionality of latent spaces presents technical challenges for attribution, and there’s no clear agreement on who holds copyright for latent representations or the role of AI models in generating them.

We are here.