Executive Summary
AI companies are reportedly purchasing out-of-print books by the pallet from European antiquarian bookstores and destroying them after scanning. Since May 2026, booksellers have reported systematic, automated bulk purchases by the Canadian company Zoom Books, which has specifically acquired non-fiction titles from 1970 onward: inventory that no one else wanted. The suspicion is that the books serve as raw data for language models and are disposed of after being scanned. Zoom Books denies the allegations and points to a regular recycling and trading model. Experts estimate the volume at approximately 700,000 titles in Germany and three million worldwide.
People
- Sven Ahnert (Author, SRF)
Topics
- Artificial Intelligence and Copyright
- Antiquarian Book Trade and Book Culture
- Fair-Use Principle
- Data Acquisition for AI Models
- Cultural Heritage and Digitalization
Clarus Lead
The phenomenon reveals a tension between the data hunger of AI companies and the protection of analog cultural heritage. In the short term, booksellers get rid of unsaleable inventory; in the long term, the threat is monopolization. Out-of-print stock that previously functioned as a kind of circulating library is now being concentrated as exclusive data assets in the hands of a few tech corporations: systematically, quietly, and so far without public debate. This touches on central questions about the fair-use principle in copyright law and about control over historical knowledge.
Detailed Summary
The purchases follow a highly systematic pattern. A German online antiquarian bookseller observed automated mass orders beginning in early May 2026, placed between three and five o'clock in the morning and targeted at non-fiction titles from 1970 onward that carry an ISBN. The goods purchased were deliberately unattractive: dusty warehouse stock, with exactly one copy acquired per title. In warehouse photos, the books appear carelessly thrown into large boxes, handling that no regular bookseller would practice. A transfer warehouse was established at the Czech-German border.
The legal strategy appears to rely on the so-called fair-use principle of U.S. copyright law. In contrast to copying digital texts online, which risks lawsuits for damages, buying physical books and destroying them afterward might fall into a gray area. The suspected logic: because the buyer physically owns each book and destroys it after scanning, no illegally copied material remains in circulation, and this is intended to qualify as fair use. This approach has already been documented in the case of Anthropic: acquiring millions of books, scanning them, and integrating them into language models.
Printed books are increasingly valuable for AI training. Freely accessible online texts have been largely exhausted for modern language models. AI companies are deliberately seeking older specialist books on regional history, linguistics, law, and economics: texts that preserve historical stages of the language and stylistic nuances missing from the contemporary internet. This creates a structural dependency: analog heritage is being repurposed from circulating collections into exclusive proprietary data sources.
Key Points
- AI companies systematically purchase out-of-print books from antiquarian bookstores, scan them, and subsequently destroy them as a purported fair-use strategy.
- The estimated volume is 700,000 titles in Germany, three million worldwide – a concentration of cultural heritage in private data repositories.
- The classical antiquarian bookstore, which functions as a circulating library, is being displaced by mass destruction; public access to historical texts shrinks in favor of proprietary AI models.
Critical Questions
Evidence/Source Validity: What direct evidence documents that Zoom Books actually destroys the purchased books after scanning – rather than passing them on? Are warehouse photos and statements from booksellers sufficient for this conclusion?
Conflicts of Interest: Do booksellers disposing of inventory have incentives to portray AI companies' practices negatively? Do some antiquarians profit short-term from these bulk sales?
Causality/Alternatives: Could the data gap for AI training not be closed through license acquisition, digital partnerships with archives, or other legal models – rather than book destruction?
Feasibility/Risks: What regulatory changes would be necessary to make fair use transparent and comprehensible in the digital context – without blocking innovation?
Market Power: Does the concentration of cultural heritage in a few AI companies lead to a structural distortion of access to historical knowledge?
Transparency: What public data exists regarding purchase patterns, volumes, and the fate of books acquired by Zoom Books and similar actors?
Bibliography
Primary Source: Hunt for old books – AI companies buy out antiquarian bookstores and destroy books. SRF Culture News, 17.06.2026, 17:20. https://www.srf.ch/kultur/gesellschaft-religion/jagd-auf-alte-buecher-ki-firmen-kaufen-antiquariate-leer-und-vernichten-die-buecher
Supplementary Context Sources:
- Washington Post: Anthropic's book acquisition and scanning practices (mentioned in article)
- SRF Echo der Zeit: Competition for "real" data for AI, 19.07.2023
- SRF Rendez-vous: Data for AI running short, 29.05.2024
Verification Status: ✓ 22.06.2026
This text was created with the support of an AI model. Editorial responsibility: clarus.news | Fact-check: 22.06.2026