
Suchir Balaji: The OpenAI Whistleblower’s Story

Suchir Balaji, former OpenAI researcher, challenged the company’s fair-use stance on AI training data. What’s known, disputed, and unproven.

[Illustration: a document stack labeled FAIR USE beside training-data blocks and a legal scale labeled COURT RECORD.]

Suchir Balaji was a former OpenAI researcher who became one of the most visible critics of the company’s use of copyrighted material to train generative AI systems. He worked at OpenAI from 2020 to 2024, published a detailed fair-use critique on October 23, 2024, and was later named by The New York Times’ lawyers as someone who might have relevant documents in its copyright case against OpenAI.[2][1][3] Balaji was found dead in San Francisco on November 26, 2024, at age 26.[3] San Francisco police and medical examiner officials ruled his death a suicide; his family has disputed that conclusion and sought further records and outside review.[7][8][9]

Who was Suchir Balaji?

Suchir Balaji was an AI researcher based in San Francisco. On his personal homepage, he described himself as a former OpenAI researcher who worked at the company from 2020 to 2024 and studied at UC Berkeley before that.[2] That brief biography explains why his later criticism attracted attention: he was not an outside commentator. He had worked inside the company during the years when ChatGPT and its successor models became central to OpenAI’s business and public profile.

Balaji’s role has been described in press accounts as involving work on systems and data behind ChatGPT and GPT-4.[3] That matters because his critique focused less on chatbot behavior at the user interface and more on the training-data process underneath it. For broader context on how the company evolved during those years, see our OpenAI history and our guide to OpenAI’s CTO and leadership team.

He became known publicly in October 2024, after speaking about his concerns over AI training and copyright. He then published his own essay, giving readers a direct look at his reasoning rather than only a secondhand account of his views.[1]

[Illustration: three timeline cards labeled UC BERKELEY, OPENAI 2020-24, and PUBLIC ESSAY.]

What Balaji argued about OpenAI and copyright

Balaji’s central claim was that many generative AI products could struggle to justify their training practices as fair use. His October 23, 2024 essay did not say every AI training use is illegal. It argued that fair use is case-specific and that ChatGPT’s use of training data deserved a closer analysis under the four statutory factors.[1][10]

His analysis emphasized market substitution. In plain English, Balaji worried that an AI system trained on a publisher’s, author’s, programmer’s, or community’s work could later compete with that same source. If a model can answer questions, draft prose, summarize documents, or produce code in ways that replace visits, subscriptions, licensing, or sales, then the fourth fair-use factor becomes central. The U.S. Copyright Office describes that factor as the effect of the use on the potential market for, or value of, the copyrighted work.[10]

Balaji also focused on copying during training. He acknowledged that generative models may rarely output text that is substantially similar to any one training input, but he argued that the training process can still involve making copies of copyrighted data.[1] That distinction is important. Many public debates focus on whether ChatGPT reproduces text verbatim. Balaji’s critique also asked whether the earlier act of copying data for model training needs a fair-use defense.

[Diagram: a five-stage process from source works to dataset copies, model training, generated outputs, and market impact.]

OpenAI has taken the opposite position. The company says AI training is fair use and has argued that its models learn patterns rather than serve as databases of copied articles.[4] OpenAI’s California training-data summary also says its systems are developed from a variety of data sources, including publicly available data, partner data, user or trainer content, researcher-generated information, synthetic data, and datasets that may include copyrighted material.[5]

[Illustration: a four-part fair-use wheel labeled PURPOSE, NATURE, AMOUNT, and MARKET, with MARKET highlighted.]

Why his allegations mattered to OpenAI’s lawsuits

Balaji’s comments landed during a high-stakes legal fight over AI training data. The New York Times sued OpenAI and Microsoft in December 2023, claiming that the companies unlawfully used Times works to train large language models.[4][6] That lawsuit is one part of a wider copyright conflict involving publishers, authors, artists, AI companies, and platform partners. For background on the commercial relationship at the center of many OpenAI disputes, see our guide to OpenAI and Microsoft.

Balaji mattered to that litigation because he had internal technical experience and had publicly rejected OpenAI’s fair-use position. The Associated Press reported that Balaji said he would try to testify in the strongest copyright cases, and that Times lawyers named him in a November 18, 2024 court filing as someone who might have unique and relevant documents supporting allegations of willful copyright infringement.[3]

The legal question is not simply whether AI systems are useful or whether training helps innovation. Courts must evaluate specific claims, evidence, works, uses, markets, and defenses. A November 22, 2024 federal discovery order in The New York Times Company v. Microsoft Corporation et al. described the case as alleging that OpenAI unlawfully used copyrighted works to train large language models and noted that OpenAI sought discovery relevant to its fair-use defense.[6]

That is why Balaji’s story sits at the intersection of corporate accountability, copyright law, and AI infrastructure. It is also why it should be separated from broader personality coverage of OpenAI executives. Readers who want the company governance backdrop can start with who owns OpenAI, Sam Altman’s biography, and our continuing OpenAI news coverage.

Timeline of key events

The timeline below separates Balaji’s work history, his public criticism, the copyright litigation context, and the investigation into his death. It does not assume facts beyond the public record cited here.

| Date | Event | Why it matters |
| --- | --- | --- |
| 2020 to 2024 | Balaji worked at OpenAI, according to his personal homepage. | His later criticism came from someone with direct company experience.[2] |
| December 27, 2023 | The New York Times sued OpenAI and Microsoft, according to OpenAI’s own lawsuit page. | The case became a major test of AI training and fair use.[4] |
| October 23, 2024 | Balaji published “When does generative AI qualify for fair use?” | The essay laid out his case-specific critique of generative AI fair-use claims.[1] |
| November 18, 2024 | Times lawyers named Balaji as someone who might have unique and relevant documents, according to AP. | That linked him directly to discovery in the copyright dispute.[3] |
| November 26, 2024 | Balaji was found dead in his San Francisco apartment. | Officials later said he died by suicide; his family disputed that conclusion.[3][7] |
| February 14, 2025 | SFPD and OCME issued a joint response saying they found insufficient evidence of homicide and considered the case closed unless new evidence supports reopening. | This became the official account of the death investigation.[7] |
[Illustration: a six-node timeline labeled 2020-24, DEC 2023, OCT 2024, NOV 18, NOV 26, and FEB 2025.]

Death investigation and family objections

San Francisco police found Balaji dead on November 26, 2024, after a welfare check, according to AP and the later city response.[3][7] Officials said the death appeared to be a suicide, and the Office of the Chief Medical Examiner confirmed the manner of death.[3]

The February 14, 2025 joint response from the San Francisco Police Department and the Office of the Chief Medical Examiner gave the most detailed official summary cited in this article. It said OCME found no evidence establishing a cause and manner of death other than suicide by self-inflicted gunshot wound to the head. It also said SFPD found insufficient evidence that Balaji’s death was the result of homicide.[7]

The same response cited several investigative points: Balaji was found alone inside his residence; the single door had no signs of forced entry and had the deadbolt engaged; windows were not accessible entry points; video and key-fob records did not indicate another person entered; the pistol was purchased and registered to Balaji in January 2024; gunshot residue particles were detected on both hands; and ballistic testing confirmed the firearm caused his death.[7]

Balaji’s parents have rejected the official conclusion. In late December 2024, The Guardian reported that they demanded an FBI investigation because they believed San Francisco police lacked the ability to investigate a case involving cybersecurity and whistleblower-protection issues.[9] In early February 2025, SFist reported that the family sued SFPD and the City of San Francisco seeking access to investigation records.[8]

Those two positions can coexist in the public record. The city’s official conclusion is suicide. The family’s public position is that the inquiry was inadequate and should be reopened or reviewed by outside authorities. A careful account should state both without treating allegations as proof.

[Illustration: a split evidence board labeled OFFICIAL REPORT, SUICIDE, FAMILY REQUEST, and RECORDS.]

What is settled, disputed, and unproven

The phrase “OpenAI whistleblower Suchir Balaji” now carries several different claims. Some are well supported. Some are contested. Some remain unproven. The table below is the cleanest way to separate them.

| Claim | Status | Best reading |
| --- | --- | --- |
| Balaji worked at OpenAI from 2020 to 2024. | Supported by his own homepage. | Treat as established biographical fact.[2] |
| Balaji argued that many generative AI fair-use defenses were weak. | Supported by his October 23, 2024 essay. | Treat as his stated legal and technical position, not as a court ruling.[1] |
| OpenAI says AI training is fair use. | Supported by OpenAI’s public lawsuit page. | Treat as OpenAI’s position in the dispute.[4] |
| Balaji may have had relevant documents for the Times case. | Reported by AP based on court-related filings. | Treat as a litigation-discovery claim, not as proof of OpenAI liability.[3] |
| San Francisco authorities ruled his death a suicide. | Supported by SFPD and OCME’s February 14, 2025 response. | Treat as the official determination.[7] |
| His family believes the death investigation was inadequate. | Supported by Guardian and SFist reporting on the family’s demands and lawsuit. | Treat as the family’s dispute with the official investigation.[8][9] |
| OpenAI was involved in his death. | Unproven in the cited public record. | Do not state as fact without verified evidence. |

This distinction is essential. Balaji’s copyright critique deserves serious attention on its own terms. His family’s objections to the investigation also deserve accurate reporting. Neither point justifies turning uncertainty into a factual accusation.

How to read the story carefully

Start with primary documents when possible. Balaji’s own essay is the best source for his copyright reasoning.[1] OpenAI’s own lawsuit page is the best source for OpenAI’s public legal position.[4] The SFPD and OCME response is the best cited source for the official death investigation summarized here.[7]

Second, keep legal status separate from moral judgment. Balaji believed OpenAI’s approach was legally and socially harmful. OpenAI says its model training is protected by fair use. The courts decide liability. The public can debate policy, but a court record is different from an essay, a company statement, or a social-media post.

Third, be precise about what “whistleblower” means in this story. Balaji publicly criticized his former employer and was identified as someone who might have relevant documents in a major copyright lawsuit.[3] That does not mean every claim made after his death is verified. It means his insider perspective became part of the public debate over AI training data.

Finally, remember that the underlying copyright issue is larger than one company. OpenAI, Microsoft, publishers, authors, developers, and courts are still shaping the rules for training-data markets. If you follow OpenAI as a company, this story connects to its leadership, partnerships, legal risk, and business model. Related background includes OpenAI valuation, OpenAI careers, and when ChatGPT was released.

Frequently asked questions

Who was Suchir Balaji?

Suchir Balaji was a former OpenAI researcher in San Francisco. His personal homepage says he worked at OpenAI from 2020 to 2024 and studied at UC Berkeley before that.[2] He became publicly known after criticizing OpenAI’s use of copyrighted data in generative AI training.

Why is Balaji called an OpenAI whistleblower?

He is called a whistleblower because he publicly challenged his former employer’s training-data practices and argued that many generative AI products may not have a strong fair-use defense. He also was named by Times lawyers as someone who might have unique and relevant documents in the OpenAI copyright litigation.[3]

What did Balaji say about fair use?

Balaji argued that fair use must be analyzed case by case and that ChatGPT’s use of training data raised serious questions under the statutory factors.[1] He put particular weight on market substitution, meaning whether an AI product can compete with or reduce the market value of the works used to train it.

What does OpenAI say about fair use?

OpenAI says AI training is fair use and has defended that position in its public materials about The New York Times lawsuit.[4] Its training-data summary also says it develops systems using varied sources, including publicly available data, partner data, user and trainer content, researcher-generated information, synthetic data, and material that may be protected by copyright.[5]

How did Suchir Balaji die?

Balaji was found dead in his San Francisco apartment on November 26, 2024.[3] San Francisco police and medical examiner officials concluded that he died by suicide and said they found insufficient evidence of homicide.[7] His family disputes the adequacy of the investigation and has sought further review and records.[8][9]

Do Balaji’s allegations prove OpenAI violated copyright law?

No. Balaji’s claims are important because they came from a former OpenAI researcher and relate to active copyright disputes. They are not, by themselves, a court ruling that OpenAI is liable. The copyright cases depend on evidence, legal arguments, and judicial decisions.

Editorial independence. chatai.guide is reader-supported and not affiliated with OpenAI. We don’t accept paid placements or sponsored reviews — every recommendation reflects our own testing.