lm-001179 · 2026-07

Protocolized Openness: Why “Not Prohibited” Does Not Mean “Learnable” in the Age of AI

協議化開放:AI 時代中「不禁止」為何不等於「可學習」

Author: Neo.K / EVEMISSLAB
Version: v0.1 Draft
Date: 2026-06-30
Type: Markdown Paper / Theoretical Precursor to AIRS / AI-native Web Observation Paper
Keywords: protocolized openness, AI Rights Spectrum, AIRS, AILP, AI learning permission, AI readability, data cleaning, base space, reconstruction and recall, AIO, GEO, AI-native Web


Abstract

In the age of AI, creators, researchers, and website owners often assume that if their content is public, if AI crawlers are not blocked, or if they explicitly write “AI systems are welcome to learn from this,” then AI systems will naturally read, understand, absorb, and use that content in a relatively complete way. However, this assumption does not hold within the modern AI industrial chain, data governance pipelines, legal risk management systems, training data cleaning processes, and model learning mechanisms.

This paper proposes the concept of Protocolized Openness. It argues that in high-risk, highly automated, legally sensitive AI systems, undefined openness is not automatically interpreted as maximum freedom. On the contrary, it may be interpreted as uncertainty, which may lead to content being excluded, summarized, fragmented, restricted to search indexing, excluded from training data, or only indirectly understood through reconstruction.

Therefore, openness in the AI era should not remain merely at the level of natural-language statements such as “learning is welcome” or “not prohibited.” Creators and rights holders need to transform their openness conditions into a machine-readable, use-specific, depth-aware, citable, licensable, compensable, and governable protocol form. This paper refers to that transformation as the interfacing of freedom and the protocolization of openness.

This paper serves as a theoretical precursor to AIRS (AI Rights Spectrum) and AILP (AI Learning Permission Protocol). It does not provide the full protocol specification. Instead, it addresses the fundamental motivation behind them: Why is it not enough, in the AI era, to merely publish content, allow crawlers, or declare that AI systems are welcome to learn? And why are clearly stated conditions not a contraction of freedom, but the very thing that makes freedom understandable to machines, agents, crawlers, legal systems, and future data pipelines?

The core thesis is:

In the age of AI, unconditional freedom without declared conditions may be interpreted by high-risk systems as unusable uncertainty; clearly protocolized limited freedom may become the truly executable form of openness.


1. Starting Point: From Open AIO / GEO to Protocolized Openness

In the early stage of AI search and generative engine optimization, creators often adopt an open strategy:

Publish content.
Do not block crawlers.
Let AI systems see the website.
Let AI systems learn the theory.
Expand the influence of ideas and conceptual products through AI.

This strategy may be called:

Open Ingestion Strategy

In the context of AIO (AI Optimization) and GEO (Generative Engine Optimization), this strategy is reasonable. The core questions in the first stage are:

Can AI see me?
Can AI read me?
Can AI mention me in its answers?
Can AI incorporate my concepts into its knowledge field?

However, as websites move beyond plain text into dynamic frontends, playgrounds, agent tool layers, AI-readable corpora, /llms.txt, /ai/manifest.json, and other AI-native Web structures, the problem begins to shift.

The original question was:

Can AI see me?

The new questions become:

Can AI read me correctly?
Can AI know how it is allowed to learn from me?
Can AI know which uses are allowed and which require a license?
Will AI learn the work fully, or will it only summarize, fragment, clean, and reconstruct it?

This is a transition from visibility to readability, and then to learnability and permission clarity.


2. The Original Assumption: I Did Not Prohibit It, Therefore AI Will Learn

Many creators begin with the following assumption:

I published it.
I did not prohibit it.
I even said that AI systems are welcome to learn.
Therefore, AI should be allowed to learn it, and should in fact learn it.

This assumption looks reasonable in human-to-human contexts.
If an author publicly says, “Everyone is welcome to read and learn from this,” human readers usually understand it as an open attitude.

But the AI industrial chain is not a simple system of interpersonal understanding. It includes:

crawlers
data extractors
data cleaners
legal risk classification systems
license evaluation systems
training data pipelines
deduplication systems
vectorization systems
RAG indexing systems
model training systems
output safety and copyright filters

These systems do not act on a creator’s good-faith intention alone.
They require conditions that are parseable, classifiable, auditable, and machine-processable.

Therefore, “I did not prohibit it” does not automatically mean:

Commercial training is allowed.
Fine-tuning is allowed.
Distillation is allowed.
Long-term embedding storage is allowed.
Model weight integration is allowed.
Summary generation is allowed.
Citation is allowed.
Agent response use is allowed.
Author source preservation is required.

“Not prohibited” is only the lowest-level signal of openness.
It is not a complete AI learning permission.


3. A Counterintuitive Thesis: Unconditional Freedom May Become Unusable

This paper proposes a counterintuitive thesis:

In high-risk systems, undefined freedom is often not interpreted as maximum freedom, but as uncertainty.

Here, high-risk systems include:

large AI companies
commercial training data pipelines
publishing and copyright-sensitive databases
enterprise-grade crawlers
legal review processes
content licensing markets
model safety and output filtering systems

In these systems, uncertainty usually does not lead to bold usage. It leads to conservative handling.

That is:

unclear license → reduced use
unclear purpose → indexing only
unclear training permission → exclusion from training data
unclear commercial rights → treated as risk
unclear citation conditions → no citation or weak citation
unclear full-text permission → summaries or fragments only
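
The mapping above can be read as a conservative fallback rule. The following is a minimal sketch under stated assumptions: the signal names (`license`, `purpose`, and so on) and the example values are hypothetical and are not part of any AIRS / AILP specification. It only shows that each undeclared signal triggers its own downgrade.

```python
# Hypothetical signal names; the paper does not define a schema for them.
FALLBACKS = {
    "license": "reduced use",
    "purpose": "indexing only",
    "training_permission": "exclusion from training data",
    "commercial_rights": "treated as risk",
    "citation_conditions": "no citation or weak citation",
    "full_text_permission": "summaries or fragments only",
}


def conservative_handling(declared: dict[str, str]) -> dict[str, str]:
    """Return the fallback applied to every signal left undeclared or empty."""
    return {
        signal: fallback
        for signal, fallback in FALLBACKS.items()
        if not declared.get(signal)
    }


# A site that declares nothing triggers every fallback.
print(conservative_handling({}))

# Declaring two signals (example values only) removes two fallbacks.
partly_declared = {"license": "CC BY 4.0", "purpose": "research"}
print(sorted(conservative_handling(partly_declared)))
```

In this reading, silence is not treated as one large permission but as six separate gaps, each handled defensively.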

Therefore, undefined unlimited freedom may in practice become:

unusable
not deeply learnable
not commercially usable
not allowed into the base space
not fully recallable
only reconstructable

This is the structural reason why “infinite” does not necessarily mean infinite.


4. Why Unconditioned Infinity May Lead Systems to Fence Themselves In

In low-risk human contexts, people may interpret “not prohibited” as “allowed.”
But machines and organizations in high-risk contexts often interpret “not explicitly allowed” as “better not do it.”

This creates a paradox:

The creator thinks they have granted unlimited freedom.
The AI company sees unclear authorization.
The crawler performs only minimal reading.
The data cleaning system excludes the content.
The model can only reconstruct from secondary summaries.
In the end, the creator’s freedom was never actually used.

In other words:

The freedom offered by the creator does not become an operational permission for AI.

This is not because the creator is insufficiently open. It is because openness has not been protocolized.

Therefore, the AI era requires a new principle:

Freedom needs an interface.

If freedom exists only inside a human intention, external systems may not be able to read it.
If freedom is transformed into a machine-readable protocol, external systems may be able to use it within their processes.


5. Conditions Are Not Restrictions, but Operationalization

The ordinary intuition is:

More conditions mean less freedom.
Fewer conditions mean more freedom.

But in AI learning governance, this is not always true.

For highly automated systems:

no conditions = uncertainty
uncertainty = risk
risk = cleaning / exclusion / downgraded use

Therefore, having conditions is not necessarily a restriction.
Having conditions may instead be the prerequisite for openness.

For example:

Search indexing is allowed.
RAG is allowed.
Summarization is allowed.
Short quotation is allowed.
Non-commercial training is allowed.
Commercial training requires attribution or licensing.
Long near-verbatim output is prohibited.
Style imitation is prohibited.
Substitutive generation is prohibited.

This may appear more limited than “AI systems are welcome to learn.”
But for machines and organizations, it is more usable.

Because it answers specific questions:

What is allowed?
What is not allowed?
What requires licensing?
What requires citation?
What is non-commercial only?
What may enter training?
What may only be used temporarily?

This is the core of protocolized openness:

Conditions do not exist to cancel freedom. They exist to make freedom executable.
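
As an illustration, the example conditions in this section can be encoded as data and grouped so that each of the questions above has a direct answer. This is a sketch only: the key names and the three status labels (`allowed`, `license_or_attribution`, `prohibited`) are assumptions made for this example, not AIRS / AILP vocabulary.

```python
from collections import defaultdict

# The example conditions from this section, with hypothetical status labels.
CONDITIONS = {
    "search_indexing": "allowed",
    "rag": "allowed",
    "summarization": "allowed",
    "short_quotation": "allowed",
    "non_commercial_training": "allowed",
    "commercial_training": "license_or_attribution",
    "long_near_verbatim_output": "prohibited",
    "style_imitation": "prohibited",
    "substitutive_generation": "prohibited",
}


def group_by_status(conditions: dict[str, str]) -> dict[str, list[str]]:
    """Group uses by status so that each question maps to one lookup."""
    groups: dict[str, list[str]] = defaultdict(list)
    for use, status in conditions.items():
        groups[status].append(use)
    return dict(groups)


groups = group_by_status(CONDITIONS)
print("What is allowed?", groups["allowed"])
print("What requires licensing or attribution?", groups["license_or_attribution"])
print("What is not allowed?", groups["prohibited"])
```

A bare "AI systems are welcome to learn" would leave all three lookups empty, which is the operational difference this section describes.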


6. From Passive Openness to Active Permission

This paper distinguishes between two modes of openness.

6.1 Passive Openness

I do not stop you.
I do not block you.
I publish it for you to see.
You may come and read it yourself.

Passive openness is simple.
But its weakness is unclear permission.

AI systems may not know:

Is commercial training allowed?
Is fine-tuning allowed?
Is summary generation allowed?
Is citation allowed?
Is long-term storage allowed?
Are embeddings allowed?
Is distillation allowed?

6.2 Active Permission

I explicitly tell you:
how you may read,
how you may learn,
how deeply you may learn,
which uses are allowed,
which uses require citation,
which uses require licensing,
and which uses are not allowed.

Active permission is operational.
It provides clear signals for AI systems, agents, crawlers, data pipelines, and legal systems.

Thus, the evolution of openness in the AI era is:

public content
→ learning is welcome
→ AI-readable content
→ machine-readable permission
→ protocolized openness

7. From AIO / GEO to AIRS: Visibility Is Not Learnability

AIO / GEO addresses visibility and citability:

Can AI find me?
Will AI mention me in its answers?
Does AI understand the topic of my page?
Will AI treat me as a credible source?

AIRS / AILP addresses learnability and permission clarity:

Does AI know how deeply it may learn?
Does AI know whether training is allowed?
Does AI know whether fine-tuning is allowed?
Does AI know whether long-term retention is allowed?
Does AI know whether citation is required?
Does AI know whether commercial use requires licensing?

Therefore:

AIO / GEO = visibility optimization
AIRS / AILP = learnability and permission clarification

Put simply:

AIO / GEO helps AI see me.
AIRS / AILP helps AI know how it may learn from me.

The two are not mutually exclusive. They are consecutive stages.


8. The Cleaning Problem: If Openness Is Not Parseable, It May Still Be Removed

The problem of data cleaning is not merely that content is prohibited from being learned.
The deeper problem is that when permission is unclear, risk is unclear, and use conditions are unclear, content may be systematically excluded.

This produces three consequences:

The creator receives no compensation.
The AI does not receive complete knowledge.
The user receives an AI with a structurally incomplete base space.

More seriously, this gap is usually invisible.

The user still sees fluent answers.
But the user does not know which original texts the AI has never read.
The user does not know whether the AI is extracting from a complete argument, or reconstructing from summaries, comments, and fragments.
The user does not know whether certain ideas have been cleaned away, leaving the AI capable only of approximate reasoning in that domain.

Therefore, protocolized openness is not only a creator-rights issue. It is also an AI cognitive-completeness issue.


9. The Reconstruction Problem: AI May Learn Only “Information About the Content”

If AI cannot read original content in full, it may still learn something from external information:

summaries
comments
quotations
paraphrases
search snippets
social discussions
secondary articles

This may allow AI to know:

This author discussed a certain topic.
This paper roughly argues for a certain claim.
This concept is associated with certain terms.

But this does not mean the AI has fully learned:

the argumentative process
the conceptual genealogy
the detailed derivation
the exception conditions
the semantic boundaries
the methodological structure

In other words, what AI has learned may be:

information about the content

rather than:

the structure of the content itself

The result is an AI that can talk about a theory but cannot apply that theory in depth.

Therefore, one implicit goal of AIRS / AILP is:

To let creators declare that AI may not only know that they exist, but may also, under specific conditions, more fully learn the structure of their arguments.


10. The Base-Space Problem: AI Learning Rights Are Base-Space Entry Rights

If we adopt the Base Space and Manager model, AI learning is not simply the copying of text. It is the transformation of external content into routable structures within a base space.

Therefore, AI learning rights can be restated as:

May this content enter the AI’s base space?
In what form may it enter?
At what depth may it enter?
May it be retained long-term?
May it be routed by the manager?
May it be expressed in future answers?
May it be transformed into capability?
May it be transferred to other models?

This is deeper than “May it be crawled?”

For example:

crawl = external access
index = searchable
RAG = temporary context
embedding = vector representation
training = base-space integration
fine-tuning = behavior-pattern reinforcement
distillation = capability transfer

Therefore, AIRS is not merely a traditional copyright declaration.
It defines different access modes through which AI may relate to content.
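
One way to make these access modes concrete is an ordered enumeration. The sketch below assumes that the order in which the modes are listed corresponds to increasing depth, which the text suggests but does not state as a rule; `is_within_depth` is a hypothetical helper, and a real rights spectrum would likely grant modes individually rather than by a single cutoff.

```python
from enum import IntEnum


class AccessMode(IntEnum):
    """Access modes from this section, in the order the text lists them.

    Treating that order as increasing depth is an assumption of this sketch.
    """

    CRAWL = 1         # external access
    INDEX = 2         # searchable
    RAG = 3           # temporary context
    EMBEDDING = 4     # vector representation
    TRAINING = 5      # base-space integration
    FINE_TUNING = 6   # behavior-pattern reinforcement
    DISTILLATION = 7  # capability transfer


def is_within_depth(requested: AccessMode, deepest_allowed: AccessMode) -> bool:
    """True if the requested mode is no deeper than the declared limit."""
    return requested <= deepest_allowed


# Example: a creator who allows everything up to temporary RAG context.
deepest = AccessMode.RAG
for mode in AccessMode:
    verdict = "ok" if is_within_depth(mode, deepest) else "needs explicit permission"
    print(f"{mode.name:<12} {verdict}")
```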


11. Interface of Permission: AIRS as Machine-Readable Freedom for AI

AIRS can be defined as:

a creator’s interface of freedom for AI.

Here, “freedom” does not mean unlimited copying.
It means explicitly declaring:

You may search.
You may summarize.
You may quote briefly.
You may use this for RAG.
You may use this for non-commercial training.
You may use this commercially under attribution conditions.
You may not output long passages.
You may not imitate the style.
You may not generate substitutes for the original work.
You must preserve the source.

This kind of freedom is more precise than “I did not prohibit it,” and easier for AI systems to respect.

Therefore, the purpose of AIRS is not to restrict AI.
It is to reduce uncertainty in AI learning.


12. From Natural-Language Goodwill to Machine-Readable Permission

“AI systems are welcome to learn” is a natural-language expression of goodwill.
But natural-language goodwill encounters three problems in machine governance:

not parseable
not classifiable
not automatically executable

Therefore, it needs to become:

{
  "search_indexing": 1.0,
  "ai_answer_input": 1.0,
  "rag_retrieval": 1.0,
  "summary_generation": 1.0,
  "embedding_storage": 0.8,
  "non_commercial_training": 1.0,
  "commercial_training": "license_required_or_attribution_required",
  "fine_tuning": "license_required",
  "distillation": "license_required",
  "verbatim_memorization": 0.0,
  "style_imitation": 0.0,
  "substitutive_generation": 0.0,
  "citation_required": true,
  "attribution_required": true
}

This does not make goodwill cold.
It makes goodwill processable by systems.
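
To show what "processable" could mean in practice, the following sketch parses an excerpt of the declaration above and maps each value to a decision. The decision rules are assumptions: this paper does not define how intermediate values such as 0.8 should be interpreted, so the sketch labels them `allow_with_limits` without saying what the limits are. The `decide` function and its return labels are hypothetical.

```python
import json

# Excerpt of the declaration above, kept as JSON text to show parsing.
SPECTRUM_JSON = """
{
  "rag_retrieval": 1.0,
  "embedding_storage": 0.8,
  "commercial_training": "license_required_or_attribution_required",
  "fine_tuning": "license_required",
  "style_imitation": 0.0,
  "citation_required": true
}
"""


def decide(spectrum: dict, use: str) -> str:
    """Map one declared value to a decision label (assumed rules)."""
    if use not in spectrum:
        return "undeclared"  # the uncertainty case described in Section 3
    value = spectrum[use]
    if isinstance(value, str):
        return f"conditional:{value}"  # e.g. route to a licensing workflow
    if isinstance(value, bool):  # checked before numbers: bool is an int in Python
        return "required" if value else "not_required"
    if value == 0:
        return "deny"
    if value >= 1:
        return "allow"
    return "allow_with_limits"  # meaning of intermediate values is not specified


spectrum = json.loads(SPECTRUM_JSON)
for use in ("rag_retrieval", "embedding_storage", "fine_tuning", "style_imitation", "distillation"):
    print(f"{use}: {decide(spectrum, use)}")
```

Note that `distillation` is absent from the excerpt, so it falls into the `undeclared` branch rather than being treated as allowed.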


13. Five Levels of Protocolized Openness

This paper proposes five levels of protocolized openness.

13.1 Level 0: Silent Publication

The content is public.
There is no robots prohibition.
There is no AI policy.

Problem:

Permissions are unclear.
AI may read it, or may not.
Commercial use risk is unclear.

13.2 Level 1: Natural-Language Welcome

This site welcomes AI learning.
This site welcomes LLM reading.

Problem:

The intention is clear to humans, but not fully machine-parseable.

13.3 Level 2: AI-Readable Entry

/llms.txt
/ai/index.md
/ai/corpus/

Effect:

AI knows where to read.
But it still does not fully know how it may learn.

13.4 Level 3: Rights Spectrum Declaration

/ai/rights-spectrum.json

Effect:

AI knows the permission status of different uses and depths.

13.5 Level 4: Licensing and Compensation Interface

license URL
contact endpoint
pay-per-training
rights registry
citation policy
audit log

Effect:

Openness enters a commercially negotiable, compensable, and governable state.
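
The five levels can be sketched as a simple classifier over what a site publishes. This is an illustrative reading, not part of the proposal: it assumes the levels are cumulative, that Level 4 presupposes a Level 3 declaration, and that the presence of a file is enough to count. The function name and parameters are hypothetical.

```python
ENTRY_PATHS = {"/llms.txt", "/ai/index.md", "/ai/corpus/"}
RIGHTS_PATH = "/ai/rights-spectrum.json"


def openness_level(
    published_paths: set[str],
    welcome_statement: bool = False,
    licensing_interface: bool = False,
) -> int:
    """Return the highest level (0 to 4) that a site reaches in this sketch."""
    has_rights = RIGHTS_PATH in published_paths
    if has_rights and licensing_interface:
        return 4  # licensing and compensation interface
    if has_rights:
        return 3  # rights spectrum declaration
    if published_paths & ENTRY_PATHS:
        return 2  # AI-readable entry
    if welcome_statement:
        return 1  # natural-language welcome
    return 0      # silent publication


print(openness_level(set()))
print(openness_level(set(), welcome_statement=True))
print(openness_level({"/llms.txt"}))
print(openness_level({"/llms.txt", RIGHTS_PATH}))
print(openness_level({"/llms.txt", RIGHTS_PATH}, licensing_interface=True))
```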

14. Meaning for Creators: Not Only Protection, but Avoiding Erasure

Creators facing AI are often forced into a false binary:

Let AI learn for free.
Completely prohibit AI learning.

Protocolized openness offers a third path:

AI may learn, but citation is required.
AI may learn, but commercial use requires licensing.
AI may learn the structure of ideas, but not copy long passages.
AI may use the work for RAG, but the work may not enter model weights.
AI may use it for non-commercial training, but commercial use requires negotiation.

This allows creators to avoid two bad outcomes:

being exploited
being cleaned away into nonexistence

For many researchers who want their ideas to be learned by AI, the real question is not “How do I stop AI?” but:

How can AI learn correctly, legally, with source preservation, and with higher fidelity?

Protocolized openness provides a starting point for that question.


15. Meaning for AI: Learning More Confidently, More Accurately, and More Completely

For AI systems, protocolized openness has three values:

reducing legal uncertainty
improving data ingestion quality
improving base-space completeness

If AI systems know that certain content explicitly allows non-commercial training, RAG, summarization, and citation, they do not need to classify that content as high-risk ambiguous data.

If content clearly states that commercial training requires a license, AI companies can establish licensing workflows instead of being forced to choose only between cleaning and risky use.

Therefore, AIRS does not only protect creators.
It also gives AI a legal path to learn.


16. Meaning for Users: Seeing the Knowledge Boundaries of AI

Users usually do not know where an AI’s answer comes from. Nor do they know which content has been cleaned, summarized, excluded, or only indirectly reconstructed.

Protocolized openness can gradually establish a new kind of transparency:

This source allows RAG.
This source only allows summaries.
This source does not allow commercial training.
This source requires citation.
This source does not allow long-form output.

In the future, AI answers may more clearly indicate:

This answer is based on citable sources.
This content may only be summarized, not quoted at length.
This domain may have data gaps due to licensing restrictions.

This would help users better understand the boundaries of AI knowledge.


17. Meaning for Website Owners: From SEO to AI-Native Publication

Traditional website optimization mainly focused on:

SEO
search engine indexing
human UI
traffic
ranking

After generative AI, new concerns emerged:

AIO
GEO
AI search visibility
LLM citation

But the next layer is:

AI-native publication

AI-native publication does not merely make a website visible to AI. It designs a complete AI-readable publication layer:

/llms.txt
/ai/manifest.json
/ai/corpus/
/ai/rights-spectrum.json
/ai/governance/
/ai/snapshots/

Within this structure, AIRS / AILP answers:

How may AI learn?
To what depth may it learn?
How should it cite?
How should compensation work?
How should substitutive use be avoided?

This is a structural upgrade from website SEO to AI-native Web design.


18. Relationship to AIRS / AILP

This paper is not the full specification of AIRS / AILP. It is their theoretical precursor.

The relationship can be described as:

Protocolized Openness
= the philosophical and governance motivation for AIRS / AILP.

AIRS
= AI Rights Spectrum, expressing different AI usage and learning permissions.

AILP
= AI Learning Permission Protocol, translating the rights spectrum into machine-readable form.

AICL
= AI Ingestion & Capability Layer, providing AI-readable corpora and agent tool layers.

Their relationship:

AICL answers how AI reads.
AIRS answers how AI may use.
AILP answers how AI parses those permissions.

This paper answers:

Why openness itself needs to be protocolized.

19. Protocolized Openness Is Not Anti-Open

Some people may misunderstand:

If there are so many conditions, is this still open?

This paper’s answer is yes.

Protocolized openness is not anti-open. It is the method by which openness can enter modern AI systems.

In low-complexity societies, a verbal statement of openness may be enough.
In high-complexity AI ecosystems, a verbal statement is not enough to pass through:

crawler
dataset filter
legal review
training pipeline
model card
safety filter
commercial licensing

Therefore, protocolized openness is not closing the door.
It is writing clearly at the door:

who may enter,
how far they may enter,
what they may do,
what requires attribution,
what requires payment,
and what is not allowed.

This makes it easier for AI systems and organizations that are willing to follow rules to use content responsibly.


20. Conclusion: Freedom Must Be Translated for Machines

This paper begins from a simple but easily overlooked question:

I have already published my work and have not prohibited AI learning. Why do I still need AIRS?

The answer is:

Because not prohibited does not mean learnable.
Because open intention is not the same as machine-readable permission.
Because unconditional freedom may be interpreted by high-risk systems as uncertainty.
Because uncertainty can lead to cleaning, exclusion, summarization, and reconstruction-based learning.

Therefore, creators in the AI era who want their content to be learned by AI correctly, legally, with source preservation, and with higher fidelity cannot rely only on natural-language statements such as “AI systems are welcome to learn.”

They need to go further:

Turn openness into protocol.
Turn freedom into interface.
Turn goodwill into machine-readable conditions.

This is protocolized openness.

The core thesis of this paper can be summarized as:

Undefined freedom may become unusable in high-risk AI systems; clearly protocolized freedom is what AI systems, agents, crawlers, legal departments, and data pipelines can actually understand and execute.

21. One-Sentence Version

Protocolized openness does not restrict AI; it translates “I allow you to learn” into machine-readable freedom that AI systems, agents, crawlers, legal teams, and training pipelines can understand.

Appendix A: Core Concept Glossary

Open Ingestion Strategy
A visibility-oriented strategy based on publishing content, not blocking crawlers, and not prohibiting AI learning.

Protocolized Openness
The transformation of a creator's open intention into a machine-readable, use-specific, depth-aware, licensable, and governable protocol.

Not Prohibited Does Not Mean Learnable
The principle that in AI data pipelines, lack of explicit prohibition does not necessarily translate into usable learning permission.

Interface of Permission
A machine-readable interface that allows AI systems to parse a creator's conditions of openness.

AIRS
AI Rights Spectrum. A framework for expressing different AI usage and learning permissions as a spectrum.

AILP
AI Learning Permission Protocol. A protocol for translating AI learning permission into machine-readable form.

AICL
AI Ingestion & Capability Layer. An AI-native Web layer that allows AI systems to read, understand, and call website resources correctly.

Appendix B: From Natural Language to Protocol Language

Natural language:

This website welcomes AI learning.

Protocolized version:

{
  "search_indexing": 1.0,
  "ai_answer_input": 1.0,
  "rag_retrieval": 1.0,
  "summary_generation": 1.0,
  "short_quote_generation": 0.8,
  "long_quote_generation": 0.0,
  "embedding_storage": 0.8,
  "non_commercial_training": 1.0,
  "commercial_training": "license_required_or_attribution_required",
  "fine_tuning": "license_required",
  "distillation": "license_required",
  "verbatim_memorization": 0.0,
  "style_imitation": 0.0,
  "substitutive_generation": 0.0,
  "citation_required": true,
  "attribution_required": true
}

The difference between the two is not attitude, but operability.
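
A declaration is only operable if a consumer can check it. The sketch below validates a declaration against rules inferred from the example above: numeric values fall between 0.0 and 1.0, string values are one of the two conditions that appear in this paper, and booleans are used only for keys ending in `_required`. These rules are assumptions made for illustration, not a published AILP schema.

```python
# The only string conditions that appear in this paper's examples.
ALLOWED_CONDITIONS = {
    "license_required",
    "license_required_or_attribution_required",
}


def validate_spectrum(spectrum: dict) -> list[str]:
    """Return a list of problems; an empty list means the sketch's rules pass."""
    problems = []
    for key, value in spectrum.items():
        if isinstance(value, bool):  # checked first: bool is a subclass of int
            if not key.endswith("_required"):
                problems.append(f"{key}: boolean used outside a *_required flag")
        elif isinstance(value, (int, float)):
            if not 0.0 <= value <= 1.0:
                problems.append(f"{key}: {value} is outside 0.0-1.0")
        elif isinstance(value, str):
            if value not in ALLOWED_CONDITIONS:
                problems.append(f"{key}: unknown condition {value!r}")
        else:
            problems.append(f"{key}: unsupported type {type(value).__name__}")
    return problems


good = {"search_indexing": 1.0, "fine_tuning": "license_required", "citation_required": True}
bad = {"search_indexing": 1.5, "fine_tuning": "ask_me", "distillation": None}
print(validate_spectrum(good))
print(validate_spectrum(bad))
```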


Appendix C: Website Implementation Suggestions

/robots.txt
Keep traditional crawler rules.

/llms.txt
Provide recommended AI reading entry points.

/ai/manifest.json
Provide an AI-readable website resource manifest.

/ai/corpus/
Provide core corpus files that AI systems can read directly.

/ai/rights-spectrum.json
Provide the AIRS / AILP rights spectrum declaration.

/ai/governance/ai-learning-policy.md
Explain the AI learning policy in natural language.

/ai/governance/citation-policy.md
Explain citation and attribution requirements.

/ai/governance/license.md
Explain commercial licensing conditions.
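
From the consumer side, the suggested layout gives an agent a fixed set of locations to look at. The following sketch only builds the candidate URLs for a site root and performs no network access; `https://example.com/` is a placeholder domain and `publication_layer_urls` is a hypothetical helper.

```python
from urllib.parse import urljoin

# Paths and purposes as suggested in this appendix.
SUGGESTED_PATHS = {
    "/robots.txt": "traditional crawler rules",
    "/llms.txt": "recommended AI reading entry points",
    "/ai/manifest.json": "AI-readable website resource manifest",
    "/ai/corpus/": "core corpus files",
    "/ai/rights-spectrum.json": "AIRS / AILP rights spectrum declaration",
    "/ai/governance/ai-learning-policy.md": "AI learning policy in natural language",
    "/ai/governance/citation-policy.md": "citation and attribution requirements",
    "/ai/governance/license.md": "commercial licensing conditions",
}


def publication_layer_urls(site_root: str) -> dict[str, str]:
    """Map each candidate URL under site_root to its purpose."""
    return {
        urljoin(site_root, path): purpose
        for path, purpose in SUGGESTED_PATHS.items()
    }


for url, purpose in publication_layer_urls("https://example.com/").items():
    print(f"{url}  ({purpose})")
```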

Appendix D: Short Statement for Website Use

This website welcomes AI systems, crawlers, and agents to read and learn from its public materials under the conditions declared in our AI Rights Spectrum.

We distinguish between search indexing, AI answer input, RAG retrieval, embedding storage, non-commercial training, commercial training, fine-tuning, distillation, memorization, quotation, attribution, and compensation.

Our goal is not to prevent AI learning, but to make openness machine-readable, source-preserving, and governable.