
Jamie Dobson, founder of Container Solutions and author of Visionaries, Rebels and Machines, explores the different forms that government regulation of AI's use of data could take
Jamie Dobson explores how AI systems rely on vast amounts of human-created content, often without consent or compensation, and the growing calls for regulation to protect creators and knowledge workers. For Assistants who support business leaders, understanding these regulatory debates is key to anticipating risk, shaping ethical AI policies, and safeguarding long-term business strategy. It’s a must-read for staying informed, proactive, and aligned with the values of transparency and fairness in the age of AI.
As AI systems transform industries at unprecedented speed, an urgent question emerges: who controls the data powering this revolution, and who benefits from it? Behind boardroom discussions about AI adoption lies a fundamental ethical and economic dilemma that will shape the future of creative industries, knowledge work, and perhaps society itself.
Unlike previous technological revolutions that primarily transformed physical labour, AI's impact extends to intellectual and creative domains previously considered uniquely human. This isn't merely another innovation cycle: it represents a fundamental restructuring of our economic and social fabric with implications comparable to the Industrial Revolution.
AI Consumption of Collective Knowledge
Modern AI systems operate on a simple yet profound principle: they learn by digesting vast quantities of human-created content. ChatGPT, Copilot, and DALL-E aren’t independently intelligent – they’re sophisticated pattern-recognition systems trained on billions of examples of human creativity and knowledge.
Initially, tech companies trained these models on publicly available data, content that was accessible but still represented countless hours of human effort, creativity, and expertise. Companies operated under the assumption that if content was accessible, it was fair game. No attribution needed, no compensation required.
As models grew more sophisticated, they required ever more data. Companies expanded their harvesting to include copyrighted content, paywalled articles, and private repositories. Meta’s recent legal trouble over scraping approximately 82TB of data for AI training illustrates this escalation.
What makes oversight particularly challenging is that many AI companies delete their training data after models are built – like removing the falsework that supports an arch during its construction. Once the arch is complete, the falsework is lost to history, but the arch it shaped remains. This practice, ostensibly for privacy and intellectual property protection, makes it nearly impossible to audit models for bias or track the use of copyrighted material.
Currently, most jurisdictions have no specific regulations governing how companies can use publicly available data for AI training. This regulatory vacuum has allowed AI developers to operate under a ‘take first, ask questions later’ approach, creating multi-billion-dollar technology platforms using content they didn’t create or license.
The Economic Stakes
Without regulation, we risk undermining the economic foundations of creative and knowledge-based industries. Journalism, photography, literature, music, and visual arts depend on compensation mechanisms that AI training currently bypasses.
When AI systems can freely reproduce the style and substance of human creators without compensation, what incentive remains to create original work?
Consider this potentially destructive cycle:
- AI systems train on human-created content without compensation
- Economic incentives for human creation diminish
- New content production declines in quantity or quality
- AI systems have less novel material to learn from
- AI outputs become increasingly derivative and homogenised
While proponents argue that AI will create new opportunities for creators, critics point out that without regulatory frameworks ensuring fair compensation, the transition period could devastate creative industries already operating on thin margins.
The creative sector isn't alone in facing disruption. McKinsey estimates that up to 30% of US work hours across all sectors could be automated by 2030, potentially displacing 12 million workers. However, creative fields face a unique double threat: not only might their jobs be automated, but their existing work could be used to train the very systems replacing them.
Potential Regulatory Frameworks
As governments worldwide grapple with these challenges, several regulatory approaches are emerging:
1. Opt-in or Opt-out Models
The simplest solution could be a system for opting content in or out of AI training. In theory, this would be quick to implement and minimally complex. Yet, given that some models have already been trained on copyrighted content (which should already function as a legal 'opt-out'), it might not be particularly effective.
- Opt-Out Model (UK Proposal): Content creators must explicitly mark their work as not available for AI training. This places the burden on creators to protect their content.
- Opt-In Model (Advocated by Creator Organisations): AI companies must obtain permission before using copyrighted material, similar to how music licensing works.
For businesses, an opt-out system offers fewer obstacles to AI development but creates long-term legal uncertainty. An opt-in system provides clearer legal boundaries but potentially slower access to training data.
The UK’s proposed opt-out mechanism is particularly contentious. It’s essentially telling creators that someone can take their property unless they explicitly post a ‘No Trespassing’ sign – in a language that hasn’t been invented yet. Critics argue this approach heavily favours large tech companies, as creators could easily lose rights to their work by simply forgetting to check a box or failing to implement technical measures they may not even understand.
A further issue is enforcement: policing any such system and ensuring that opted-out data is not used, whether inadvertently or deliberately.
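Part of the problem is that there is no settled standard for signalling these opt-outs yet. As a rough sketch of what enforcement-friendly behaviour might look like, the Python below shows a crawler honouring two candidate signals before collecting a page for training: a robots.txt rule and the proposed 'noai' meta tag. The user agent name is hypothetical, and both conventions are assumptions rather than established law.

```python
# A minimal sketch of how a compliant crawler might honour opt-out signals
# before collecting a page for AI training. The 'noai' meta tag is a proposed
# convention, not a settled standard; the user agent name is illustrative.
from html.parser import HTMLParser
from urllib import robotparser
from urllib.parse import urljoin
from urllib.request import urlopen

CRAWLER_UA = "example-ai-trainer"  # hypothetical user agent


class MetaDirectiveParser(HTMLParser):
    """Collects directives from <meta name="robots"> style tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attrs = dict(attrs)
        if attrs.get("name", "").lower() in ("robots", CRAWLER_UA):
            content = (attrs.get("content") or "").lower()
            self.directives.update(d.strip() for d in content.split(","))


def may_collect_for_training(url: str) -> bool:
    """Return False if either signal asks us to stay away."""
    # 1. Site-level signal: robots.txt rules for our user agent.
    rp = robotparser.RobotFileParser(urljoin(url, "/robots.txt"))
    rp.read()
    if not rp.can_fetch(CRAWLER_UA, url):
        return False

    # 2. Page-level signal: the proposed 'noai' directive.
    html = urlopen(url).read().decode("utf-8", errors="replace")
    parser = MetaDirectiveParser()
    parser.feed(html)
    return "noai" not in parser.directives
```

Note that both checks are purely voluntary: nothing in the protocol stops a non-compliant crawler from ignoring them, which is precisely why critics argue opt-out regimes need auditing teeth.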
2. Data Rights and Compensation Models
Similar to how music and literary rights work, content creators could receive compensation when their work is used for AI training. Payments could flow on a per-use basis, as with music streaming, or be distributed by government via a digital tax.
Collective licensing
Creators register with collecting societies that negotiate with AI companies and distribute payments based on usage. This model exists in music with performing rights organisations such as PRS in the UK, ASCAP and BMI in the USA, GEMA in Germany, and SACEM in France.
Data dividend
A tax or fee on AI companies based on their data usage, with proceeds distributed to creators. This resembles public lending rights systems in countries like the UK, Canada, and Australia, where authors receive payments when libraries lend their books.
Direct licensing
Individual negotiations between major content producers and AI companies, with standardised terms for smaller creators.
The music industry’s experience with streaming services offers important lessons. While Spotify created a legal framework for music consumption, many artists receive fractions of a penny per stream. A 2021 UK parliamentary inquiry into music streaming described the system as having ‘fundamental problems’ in how it compensates creators. Any AI compensation system would need stronger protections to avoid replicating these issues.
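To see why the streaming comparison matters, consider how a pro-rata pool splits in practice. The sketch below is illustrative arithmetic only: the pool size, usage counts, and creator names are all invented for the example.

```python
# Illustrative arithmetic only: how a collective licensing pool might be
# split pro rata by training usage. All figures are invented.

def distribute_pool(pool: float, usage_counts: dict[str, int]) -> dict[str, float]:
    """Split a licensing pool among creators in proportion to how often
    their works appeared in a training corpus."""
    total = sum(usage_counts.values())
    return {creator: pool * count / total for creator, count in usage_counts.items()}


# Invented figures: a 1,000,000 pool, dominated by one high-volume rights holder.
payments = distribute_pool(
    1_000_000.00,
    {"photographer_a": 12_000, "novelist_b": 300, "newsroom_c": 2_500_000},
)
for creator, amount in payments.items():
    print(f"{creator}: {amount:,.2f}")
# photographer_a: 4,776.50 | novelist_b: 119.41 | newsroom_c: 995,104.09
```

Even with a substantial pool, the pro-rata maths concentrates payouts on high-volume rights holders and leaves individual creators with streaming-sized fractions, which is why the inquiry's warnings carry over directly to any AI compensation scheme.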
3. AI as a Public Resource
Some experts advocate treating advanced AI systems like public utilities or natural monopolies. The model resembles electricity provision, where the national grid is treated as a natural monopoly and government sets standards and expectations for managing it as a public resource. Under this approach:
- Private companies would continue developing AI, but under enhanced regulatory oversight
- Transparency requirements would include regular audits and public reporting
- Universal access provisions would ensure broad distribution of benefits
- Price controls or licensing requirements would prevent monopolistic practices
This approach draws from how telecommunications, electricity, and other essential services are regulated in many countries. It acknowledges both the innovation potential of private enterprise and the public interest in fair, accessible AI systems.
Transparency and Technical Safeguards
Any regulatory framework will require some level of transparency and technical safeguards to ensure that AI does not operate as a black box. We need to know how the algorithms are fed, and on what data, both to ensure that creators are fairly compensated and to avoid introducing systemic biases into what will become ubiquitous technology. In other words, whatever system is chosen must be tracked and policed to ensure compliance, and that policing needs to be paid for.
This will entail:
Provenance tracking
Requiring AI companies to maintain comprehensive records of their training data sources, making it possible to audit systems and verify compliance.
Attribution systems
Building technical capabilities into AI systems that credit original creators when their work influences outputs.
Content authentication
Developing standards for watermarking or otherwise identifying AI-generated content to distinguish it from human-created work.
These technical measures would address what many legal experts consider a fundamental flaw in current AI development: the lack of transparency about what data is being used and how it influences outputs.
The publishing industry could offer a useful approach. Copyright registration systems and ISBN standards create a framework for tracking and attributing written works. Similar systems could be developed for AI training data, creating both accountability and the technical infrastructure for fair compensation.
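As a concrete starting point, here is a minimal sketch of what provenance tracking could look like in practice: each item is content-hashed and logged, with its source and licence, to an append-only manifest before it enters a training set. The JSONL format and field names are illustrative assumptions, not an existing standard.

```python
# A minimal sketch of provenance tracking: before an item enters a training
# set, record a content hash plus licensing metadata in an append-only
# manifest. The JSONL format and field names are illustrative, not a standard.
import hashlib
import json
from datetime import datetime, timezone

MANIFEST_PATH = "training_manifest.jsonl"  # hypothetical location


def record_provenance(content: bytes, source_url: str, licence: str) -> str:
    """Hash a training item and append its provenance record to the manifest.

    Returns the content hash so the training pipeline can reference it.
    """
    digest = hashlib.sha256(content).hexdigest()
    record = {
        "sha256": digest,
        "source_url": source_url,
        "licence": licence,  # e.g. "CC-BY-4.0", "licensed", "opt-in"
        "ingested_at": datetime.now(timezone.utc).isoformat(),
    }
    with open(MANIFEST_PATH, "a", encoding="utf-8") as manifest:
        manifest.write(json.dumps(record) + "\n")
    return digest
```

An auditor, or a court, could then re-hash a disputed work and search the manifest for it; the same records are the natural hook for the ISBN-style registries described above and for any compensation scheme that pays by usage.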
What This Means for Business
Whether your business is using AI, developing AI, or creating digital content, the evolving regulatory landscape demands strategic preparation. AI is not a fad that will disappear in a few years; it is a revolutionary technology that will fundamentally shift the global economy and labour markets as we know them. Preparing for potential regulatory changes now will allow you to adjust to this new world as quickly and effortlessly as possible.
Risk Assessment and Compliance Planning
Audit your AI systems for data provenance and prepare for increased transparency requirements. Document current practices and develop transition plans for different regulatory scenarios.
For companies developing AI, this means implementing systems to track training data sources. For companies using AI products, it means due diligence on your vendors’ data practices to avoid downstream liability.
Ethical Positioning and Brand Protection
Beyond compliance, consider how your AI data practices align with your brand values and customer expectations. As public awareness grows, companies seen as exploiting creators may face reputational damage.
Early adopters of ethical AI practices may gain competitive advantages as regulatory requirements catch up to ethical standards. Consider developing transparent AI usage policies that acknowledge and compensate content creators.
Exploring New Business Models
For content creators and rights holders, consider how you might proactively engage with AI development:
- Developing licensing frameworks for your content
- Creating AI-specific content packages with clear usage terms
- Forming collectives to increase bargaining power with AI companies
Technology companies should explore subscription models that include licensing fees directed to content creators, potentially avoiding more onerous regulation through proactive industry standards.
Conclusion
The current regulatory vacuum around AI’s use of data cannot persist indefinitely. Whether through government regulation, industry self-regulation, or landmark legal cases, new frameworks for managing AI’s relationship with human creativity will emerge.
The businesses that thrive won’t be those extracting maximum short-term value from unregulated data harvesting, but those building sustainable models that respect and reinforce the creative ecosystem upon which AI ultimately depends.
After all, AI systems – no matter how advanced – remain fundamentally derivative of human creativity and knowledge. Ensuring the continued flourishing of that human creativity isn’t just an ethical imperative; it’s essential for AI’s own future development.