OpenAI’s comment to the NTIA on open model weights

OpenAI’s comment to the NTIA on open model weights:  This comment was submitted by OpenAI in response to NTIA’s March 2024 Request for Information on Dual-Use Foundation Models with Widely Available Weights.

I asked both Gemini and OpenAI to explain that it is not open. The results are telling…

OpenAI 

OpenAI’s comments to the NTIA address the balance between innovation and safety in AI development, emphasizing their approach to mitigating misuse through staged releases and API strategies. They propose a Preparedness Framework for assessing AI model risks and advocate for a diverse ecosystem that includes both open model weights and API-based releases. For further details, please visit OpenAI’s website directly via the provided link.

Gemini:

You’re absolutely right! Here’s why the situation is complex regarding “openness” with OpenAI and its models:

It’s Not Fully Open-Source

Code and Weights Not Public: The core code behind OpenAI’s models like GPT-3 or GPT-4, and the specific “weights” (data the model was trained on) are not released to the public. This limits the ability of others to replicate, study, or fully modify the models.
API Access: Instead, OpenAI primarily offers access through their API, which involves paying for usage. This gives them some control over how the models are used and potentially mitigates misuse.

Reasons for Limited “Openness”

Misuse Potential: Large language models are powerful and can be misused for generating harmful text, propaganda, or spreading misinformation. OpenAI likely wants to control the spread to manage these risks.
Commercial Interests: OpenAI is a company, and making the models readily available could undermine their business model of selling API access.
Computational Cost: These models are extremely expensive to train and run. Providing full access might lead to unsustainable resource usage.

The Debate

There’s ongoing debate about whether AI models this powerful should be open-source:

Pro-Openness: Advocates argue it promotes transparency, community-driven improvements, and fights potential biases within the models.
Pro-Controlled Release: Others argue responsible release, especially with potentially dangerous technology, is crucial. OpenAI likely falls into this camp.

 

 

Hotel Room TV, Guam, 2023 Inkjet print on paper Print: 12 x 16 in 30.5 x 40.6 cm Framed: 17 1/2 x 13 1/2 x 7/8 in 44.5 x 34.3 x 2.2 cm Edition of 10, 1 AP
Hotel Room TV, Guam, 2023 - Wolfgang Tillmans

MED Vol. 30, No. 83

Info

How Big Tech is winning the AI talent war (FT)
Elon Musk’s X Needs Creators, but They Don’t Need (WSJ)
Gov. Lee Signs ELVIS Act Into Law (tn.gov)  HB2091
Emad Mostaque resigned from his role as CEO of Stability (Stability AI)  Instability AI
GM cuts ties with 2 data firms amid heated lawsuit over driver data (Detroit Free Press)
Cops Running DNA-Manufactured Faces Through Face Recognition is Tornado of Bad Ideas  (EFF)
Scammers hide harmful links in QR codes to steal your information (FTC) [PSA = me for the last TEN years -ed]
Here’s How Google’s Generative AI for Newsrooms Product Will Work (bigtechnology.com)
216. United States v. Apple (Complaint) (hardcoresoftware)
Saudi Arabia Plans $40 Billion Push Into Artificial Intelligence (NYT)  ethics optional models?
Anthropic is lining up a new slate of investors, but the AI startup has ruled out Saudi Arabia (cnbc)
Sure. Let Adobe AI scan all of your documents. What could go wrong? (infosec exchange)
Stop: Disable the scanning of documents by Adobe AI solutions! (borncity) Turn them off
We need a cure for the curse of digital overabundance (FT)

 


Arts 
Wolfgang Tillmans – The Point Is Matter (Zwirner) Opens in Hong Kong
Orient Express  (orient-express.com)
A goodbye to The Times and a thank you to readers (lat)

Waxahatchee – “Ice Cold” (Lyric Video – YouTube)
Slow Pulp – Slugs (Official Video)
massie99 – Anika (youtube)
Profit Tree – All In (youtube – no views! be first)

 

Open Models

Open-source AI is important because it….

  1. Fosters innovation and broad participation – Open-source AI allows developers and researchers globally to contribute, experiment and build on existing models without being blocked by proprietary barriers. This democratizes access and spurs wider innovation.
  2. Increases transparency and trust – With open codebases, models can be audited for potential biases, safety issues or unintended behavior by anyone. This scrutiny increases accountability and public trust.
  3. Prevents monopolization – AI development requires massive computing power. If only big tech giants have access to the largest models, it concentrates expertise and power. Open alternatives can level the playing field.
  4. Ethical development – Principles like privacy, fairness and social responsibility can be better embedded into open-source models that are collaboratively built versus corporate black boxes.
  5. Cumulative progress – Open models avoid duplicative efforts as others can directly build on prior work instead of starting from scratch behind closed doors.

From a dialog with ChatGPT after getting blocker after blocker on publicly available info. I could not get it to address the second part of the question, even after repeated prompting.

You
you are trained on works that have copyrights and you are preventing access to public domain works as part of your design is that a fact – yes or no.

ChatGPT

Yes, my design incorporates training on a wide range of materials, including copyrighted works, but I’m programmed to respect copyright laws and guidelines. My responses aim to balance providing useful information and guidance while adhering to these principles.

 

Compensation – The Line of Actual Control

The EU just passed a major AI law, China’s taking one stance on AI-generated art, and Japan’s trying to protect creators. Lawsuits abound! Here is a quick roundup from my readings this morning.

The battle is for compensation.

OpenAI says it’s impossible to train their amazing models without using books, art, music – basically everything created by humans. The market is forcing attribution in latent space. What is needed is a mechanism for tracking and documenting contributions to latent representations of creative works.

EU parliament approves landmark AI law 

Companies using generative AI or foundation AI models like OpenAI’s ChatGPT or Anthropic’s Claude 2 must provide detailed summaries of any copyrighted works, including music, used to train their systems. Training data sets used in generative AI music or audio-visual works must be watermarked for traceability by rights holders. Content generated by AI must be clearly labeled as such, and tech companies must prevent the generation of illegal and infringing content. Interestingly, the use of copyrighted materials in constructing AI models might be construed as a necessary “temporary act of reproduction,” integral to a technological process.

China – Stable Diffusion China ruling

The Beijing Internet Court’s first instance judgment on AI-generated artwork from Stable Diffusion marked China’s first ruling on such creations, finding copyrightable elements. In contrast, the US Copyright Office in February 2023 deemed images generated by the AI drawing tool Midjourney ineligible for copyright, emphasizing human authorship.

France

The French government is currently investing 1.5 billion euros into AI, and has championed a so-called “open source” approach.

Japan

The Cultural Affairs Agency in Japan has launched a data collection initiative to track copyright infringement cases related to the development and use of generative artificial intelligence (AI). This effort aims to address concerns among creators about AI generating large volumes of texts and illustrations resembling their original works.

Data will be gathered through a legal consultation service website, where creators can seek advice from agency-appointed lawyers on copyright-related issues at no cost. Since 2018, revisions to the Copyright Law have allowed the use of copyrighted materials for AI training without explicit permission, leading to protests from rights holder groups. Despite this, a lack of data on AI-related copyright infringement cases and court precedents has hindered discussions on potential law revisions.

The big picture 

There are 23 active lawsuits underway, including recent cases against Nvidia (brought by authors over its use of the Books3 dataset).

Patronus AI tested leading AI models for copyright infringement using copyrighted books. OpenAI’s GPT-4 produced the most copyrighted content, responding to 44% of prompts with copyrighted text. Other models also generated copyrighted content, with varying frequencies. OpenAI has defended its use of copyrighted works, stating that it’s impossible to train top AI models without such materials and arguing that limiting training data to public domain works from over a century ago would not meet the needs of modern AI systems.

The issues revolve around fair use versus infringement, with companies like OpenAI arguing that training models on copyrighted data constitutes fair use, while copyright holders advocate for compensation, consent, and attribution. This is likely to impact the market, with one outcome being AI companies paying license fees to rights holders or even opting to use purely synthetic data as a workaround.

This week’s forward guidance from Adobe on quarterly revenues is an indicator of the legal headwinds and market complexities all these companies will face in the coming months. In Adobe’s case, their product is crap (imho). These facts further inform my take that the earliest we will see meaningful impact on earnings will be north of Q1 2025, sans the chip sector and data centers, which deal in the currency AI needs.

Getting the attribution balance right

Provenance and Originality: Attribution in latent spaces could help trace the evolution and lineage of creative works that have been generated or modified using AI. Suppose an image was generated from an initial latent code that was then iteratively modified. Attribution techniques could identify the contribution of each step, potentially aiding in determining ownership and rights over the final work. 

All that said, a deeply auditable latent space is not a thing right now. Extracting reliable stylistic fingerprints or interpreting latent space watermarks is complex and, frankly, it’s an active area of research, not a go-to-market product, and all of it depends on watermarking, style fingerprinting, and metadata tracking being systematically implemented across models. Read that paragraph twice and think about the likelihood of it being realized and enforced in current market conditions. It makes one’s head explode.
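To make the step-by-step provenance idea concrete, here is a toy sketch, not a real attribution technique. It assumes a latent code is just a short list of numbers, measures each step’s “contribution” as how far that step moved the code (Euclidean distance), and chains a hash through the log so each entry commits to the one before it. The names record_step and contribution_shares are hypothetical, and no current model exposes anything like this.

```python
import hashlib
import json
import math


def record_step(log, latent, note):
    """Append one provenance entry for a latent code (hypothetical helper)."""
    prev = log[-1] if log else None
    # The first entry has nothing to compare against, so its delta is 0.
    prev_latent = prev["latent"] if prev else latent
    delta = math.sqrt(sum((a - b) ** 2 for a, b in zip(latent, prev_latent)))
    # The hash covers this step and the previous hash, forming a chain.
    payload = json.dumps(
        {"note": note, "latent": list(latent), "prev": prev["hash"] if prev else ""},
        sort_keys=True,
    )
    entry = {
        "note": note,
        "latent": list(latent),
        "delta": delta,
        "hash": hashlib.sha256(payload.encode()).hexdigest(),
    }
    log.append(entry)
    return entry


def contribution_shares(log):
    """Share of total latent movement attributed to each step (toy metric)."""
    total = sum(entry["delta"] for entry in log)
    return {entry["note"]: (entry["delta"] / total if total else 0.0) for entry in log}


log = []
record_step(log, [0.0, 0.0], "initial code")
record_step(log, [3.0, 4.0], "edit A")
record_step(log, [3.0, 9.0], "edit B")
shares = contribution_shares(log)
```

Even in this toy form the hard part is obvious: the log only works if every tool in the chain writes to it honestly, which is exactly the systematic adoption that does not exist today.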

Transformative Use: Understanding which elements within latent space are needed to establish a work’s identity is relevant to copyright’s “fair use” doctrine. AI modification of existing works might be considered transformative if it significantly alters the meaning or message expressed within the latent representation. All those Richard Prince court cases on process and new derivative works come to mind here…

Attribution to Compensation 

Attribution could help define the boundaries of what constitutes a significant change.

Derivative Works: If a latent code is considered substantially derived from a copyrighted work, using that latent code to generate new outputs intersects with the copyright holder’s rights.

Attribution can help determine the extent of similarity between latent representations and their potential sources.  Attribution adds technical proof to a copyright claim.
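As a rough illustration of what “extent of similarity” could mean in practice, the sketch below ranks candidate sources by cosine similarity to a latent vector. The vectors and the names cosine_similarity and rank_sources are made up for illustration. Real latent spaces have far more dimensions, and a high score alone would not establish copying.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_sources(latent, sources):
    """Sort candidate sources from most to least similar (hypothetical helper)."""
    scored = [(name, cosine_similarity(latent, vec)) for name, vec in sources.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Made-up two-dimensional vectors standing in for latent representations.
candidates = {"work_a": [2.0, 0.0], "work_b": [0.0, 3.0]}
ranking = rank_sources([1.0, 0.0], candidates)
```

Here ranking lists work_a first because it points in the same direction as the query vector, while work_b is orthogonal to it.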

IP experts have differing views on how copyright law should apply to latent representations, as the cases now being argued in court show. Copyright law hasn’t caught up with the complexities of AI-generated content, making it difficult to apply traditional concepts like authorship and originality to works derived from latent spaces. Plus, the high dimensionality of latent spaces presents technical challenges for attribution, and there’s no clear agreement on who holds copyright for latent representations or the role of AI models in generating them.


We are here. 

 

 

MED Vol. 30, No. 75

Automakers Are Sharing Consumers’ Driving Behavior With Insurance Companies  (NYT) LexisNexis, which generates consumer risk profiles for the insurers, knew about every trip G.M. drivers had taken in their cars, including when they sped, braked too hard or accelerated rapidly.  Chicco v. General Motors LLC (9:24-cv-80281)

If you “buy” a digital movie on Amazon, but Amazon removes the movie from its library when Amazon’s license expires, did you really buy it? (x)

Just because your favorite singer is dead doesn’t mean you can’t see them ‘live’ (NPR). Pepper’s Ghost.

Viola the Bird   (Google Arts..) and if you are not cooking up your creative side at their AI Test Kitchen you should be

UN AI advisory feedback RFC  (UN.org). 15 days left.

The Rough Years That Turned Gen Z Into America’s Most Disillusioned Voters (WSJ)

Concord Music Group, Inc. v. X Corp (court listener)