Transparent Agentic Browsers: An Ethical ‘AI‑Agent’ Browser User Agent String, Client Hints, and Site Opt‑In Policy—with ‘What Is My Browser Agent’ Checks
Auto‑agents that browse the web are no longer a research novelty; they are shipping products. That reality creates a tension: websites want control, observability, and safety; agents want reliability, performance, and permissioned access. Today’s de facto approach—masquerading as ordinary browsers, tunneling through bot checks, and proceeding without explicit consent—raises legitimate operational and ethical issues. It also creates perverse incentives: the more sites ban, the more agents spoof. We can do better.
This article proposes a pragmatic, incrementally deployable blueprint for transparent, ethical agentic browsing:
- A clear, non-spoofed “AI‑Agent” user agent token and supporting Client Hints that advertise intent, ownership, and contact.
- An Agent‑Policy server opt‑in header/meta and optional well‑known policy that defines what AI‑agents may do, at what rates, and under which constraints.
- A “What is my browser agent” validator endpoint pattern and reference implementations to verify compliance and aid debugging.
The goal is to reduce security risk, avoid bans, and enable compliant, auditable AI agents without waiting years for a perfect standard. The approach follows existing web patterns (User‑Agent, UA‑CH, robots, Structured Headers) and is compatible with current infrastructure (WAFs, CDNs, logs, APM).
Executive summary
- Problem: Headless agents often spoof mainstream browsers, triggering brittle WAF rules, breaking sites, and creating governance and legal risk. Websites increasingly respond with blocks and CAPTCHAs, reducing access for legitimate research and assistive use cases.
- Proposal: A minimal, transparent identity and consent layer for AI‑agents:
- User‑Agent includes an explicit AI‑Agent marker and attributes (purpose, owner, contact, policy URL, ephemeral id).
- Client Hints convey structured fields (e.g., intent, user‑present signal, training usage) when a site opts in via Accept‑CH.
- Sites use an Agent‑Policy header/meta or well‑known file to declare allowed modes (assist, crawl, train), rate limits, retention, and required fields.
- A validator endpoint pattern helps operators and agent vendors test and enforce compliance.
- Benefits:
- Risk reduction: consistent identification, rate control, and off‑ramp for non‑consenting sites.
- Reliability: fewer bans, clearer escalation channels, and stable interop.
- Ethics and UX: explicit signaling of intent; respect for content governance (robots, X‑Robots‑Tag, opt‑out of training).
Why we need transparent agentic browsers
Modern HTTP filtering relies on observable signals: User‑Agent, TLS fingerprinting, JavaScript execution, cookie behavior, and reputation. Agents that pretend to be mainstream browsers and defeat those signals become indistinguishable from malicious automation. This creates issues:
- Security teams block first: If an entity refuses to identify or comply, WAFs err on the side of denial, harming legitimate use.
- Legal and policy risk: Sites must express preferences (e.g., no AI training). Without a channel to state and receive policy, disagreement becomes breakage.
- Engineering cost: Spoofing is fragile. Sites change anti‑bot heuristics often; agents scramble to keep up. No one wins.
There are mechanisms to build upon:
- User‑Agent and UA Client Hints: While major browsers are freezing UA strings, UA‑CH standardizes structured, privacy‑aware signaling. See: W3C UA Client Hints and IETF RFC 8941 (Structured Field Values for HTTP).
- Robots ecosystem: The Robots Exclusion Protocol is now RFC 9309. X‑Robots‑Tag is widely used. These communicate intent for crawling and indexing.
- Permissions/Policy patterns: Permissions‑Policy (formerly Feature‑Policy) shows how sites can advertise acceptable capability use in HTTP headers and meta tags.
- Privacy signals: Global Privacy Control (GPC) shows how a small header can establish user intent that servers can choose to respect.
This proposal aligns with and extends these patterns for the specific needs of AI‑driven browsing.
Design goals and constraints
- Transparency by default: If you are an AI‑agent, say so up front—do not rely on hidden signals or only out‑of‑band registry lists.
- Consent and least privilege: Sites choose what’s allowed (assist, crawl, train). Agents adapt; default to deny where policy is unclear.
- Compatibility: Work with today’s infrastructure. Avoid introducing hard browser dependencies that require full standards ratification to get started.
- Minimal friction: Small set of fields; few moving parts; easy to log, monitor, and alert on.
- Privacy‑preserving: Avoid persistent identifiers. Favor ephemeral IDs, rotate often, and avoid PII leakage.
The AI‑Agent User‑Agent string
User‑Agent is still the first, simplest layer of identification. The intent is not to replace UA‑CH, but to provide a strong up‑front signal that is visible in server logs, WAFs, and analytics from day one.
Recommended format (informative, not normative):
Mozilla/5.0 (compatible; AgenticBrowser/1.0; AI-Agent/1.0; purpose=assist; owner=AcmeAI; contact=mailto:agents@acme.ai; policy=https://acme.ai/agent-policy; agent-id=ab-20260121-3f92) Chrome/121.0.6167.85 Safari/537.36
Notes:
- AI-Agent/1.0 is the mandatory token. Place it early for visibility.
- purpose indicates intent: assist, crawl, or train. Use a single primary purpose per request.
- owner is a token identifying the operating org; contact is a reachable abuse/security contact (mailto: or https: form).
- policy is a URL describing your agent behavior and current commitments.
- agent-id is an ephemeral identifier for correlation and abuse management; rotate at least daily, ideally per session.
- If you truly use a specific engine (Chromium/WebKit), it’s acceptable to include the underlying engine/version. Do not spoof an engine you are not actually using.
ABNF sketch (simplified):
ai-agent = "AI-Agent/" version *( ";" sp ai-param )
ai-param = purpose-param / owner-param / contact-param / policy-param / agentid-param
purpose-param = "purpose=" ("assist" / "crawl" / "train")
owner-param = "owner=" token
contact-param = "contact=" (mailto / https-url)
policy-param = "policy=" https-url
agentid-param = "agent-id=" token
version = 1*DIGIT ["." 1*DIGIT]
Examples:
- Assistive browsing with a human in the loop:
- Mozilla/5.0 (compatible; AgenticBrowser/1.0; AI-Agent/1.0; purpose=assist; owner=AcmeAI; contact=mailto:agents@acme.ai; policy=https://acme.ai/agent-policy; agent-id=ab-20260121-845b)
- Scheduled documentation sync (crawl):
- Mozilla/5.0 (compatible; AgenticBrowser/1.0; AI-Agent/1.0; purpose=crawl; owner=AcmeAI; contact=mailto:agents@acme.ai; policy=https://acme.ai/agent-policy; agent-id=ab-20260121-a1b2)
- Model training ingestion (train):
- Mozilla/5.0 (compatible; AgenticBrowser/1.0; AI-Agent/1.0; purpose=train; owner=AcmeAI; contact=mailto:agents@acme.ai; policy=https://acme.ai/agent-policy; agent-id=ab-20260121-deadbeef)
Implementation guidance:
- Keep attribute order stable. Stable formatting eases WAF rule authoring and log parsing.
- Do not include device model, user identifiers, or other high‑entropy, PII‑adjacent data.
- Use ASCII tokens; avoid whitespace or unescaped punctuation in attribute values.
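As a sketch of this guidance, here is a small Python helper (hypothetical; the owner, contact, and policy values are placeholders) that assembles the UA string with a stable attribute order and rejects unsafe characters:

# Minimal sketch: build an AI-Agent User-Agent string with a stable attribute order.
# Field values (owner, contact, policy URL) are illustrative, not normative.
import re
import secrets
from datetime import date

def build_agent_ua(purpose: str, owner: str, contact: str, policy: str,
                   engine: str = "") -> str:
    if purpose not in ("assist", "crawl", "train"):
        raise ValueError("purpose must be assist, crawl, or train")
    # Ephemeral id: date prefix plus random suffix; rotate at least daily.
    agent_id = f"ab-{date.today():%Y%m%d}-{secrets.token_hex(4)}"
    fields = [
        ("purpose", purpose), ("owner", owner), ("contact", contact),
        ("policy", policy), ("agent-id", agent_id),
    ]
    for _, value in fields:
        # Keep values ASCII and free of characters that would break the UA comment syntax.
        if not re.fullmatch(r"[\x21-\x7e]+", value) or any(c in value for c in ";()"):
            raise ValueError(f"invalid attribute value: {value!r}")
    attrs = "; ".join(f"{k}={v}" for k, v in fields)
    base = f"Mozilla/5.0 (compatible; AgenticBrowser/1.0; AI-Agent/1.0; {attrs})"
    # Only append an engine token if that engine is actually in use.
    return f"{base} {engine}".strip()

print(build_agent_ua("assist", "AcmeAI", "mailto:agents@acme.ai",
                     "https://acme.ai/agent-policy"))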
AI‑Agent Client Hints: structured, opt‑in disclosure
Client Hints (CH) provide structured request metadata negotiated by the server via Accept‑CH. While standard UA‑CH fields focus on device/platform, we propose a small set of AI‑agent‑specific hints. These can begin as de facto headers and, if useful, proceed to formal registration.
Proposed hints (modeled as Structured Fields per RFC 8941):
- Sec-CH-AI-Agent: ?1
- Sec-CH-AI-Purpose: "assist" | "crawl" | "train"
- Sec-CH-AI-Owner: "AcmeAI"
- Sec-CH-AI-Contact: "mailto:agents@acme.ai"
- Sec-CH-AI-Policy: "https://acme.ai/agent-policy"
- Sec-CH-AI-User-Present: ?0 | ?1 (true if a human is actively directing the session)
- Sec-CH-AI-Training: ?0 | ?1 (true if content may be stored for model training)
- Sec-CH-AI-Session: "ab-20260121-3f92" (ephemeral)
Server opt‑in:
HTTP/1.1 200 OK
Accept-CH: Sec-CH-AI-Agent, Sec-CH-AI-Purpose, Sec-CH-AI-Owner, Sec-CH-AI-Contact, Sec-CH-AI-Policy, Sec-CH-AI-User-Present, Sec-CH-AI-Training, Sec-CH-AI-Session
Critical-CH: Sec-CH-AI-Agent, Sec-CH-AI-Purpose
Agent request (on subsequent navigation/subresource fetches):
GET /docs HTTP/1.1
Host: example.com
User-Agent: Mozilla/5.0 (compatible; AgenticBrowser/1.0; AI-Agent/1.0; purpose=assist; owner=AcmeAI; contact=mailto:agents@acme.ai; policy=https://acme.ai/agent-policy; agent-id=ab-20260121-3f92) Chrome/121.0 Safari/537.36
Sec-CH-AI-Agent: ?1
Sec-CH-AI-Purpose: "assist"
Sec-CH-AI-Owner: "AcmeAI"
Sec-CH-AI-Contact: "mailto:agents@acme.ai"
Sec-CH-AI-Policy: "https://acme.ai/agent-policy"
Sec-CH-AI-User-Present: ?1
Sec-CH-AI-Training: ?0
Sec-CH-AI-Session: "ab-20260121-3f92"
Notes:
- Use Accept-CH to avoid unconditional disclosure; servers explicitly request the fields they need.
- Critical-CH (where supported) tells the agent which hints are critical; a compliant client retries the request with them included. Fall back gracefully where it is not available.
- Prefer short‑lived session identifiers and rotate them frequently.
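From the agent side, the opt-in flow can be as simple as caching each origin's Accept-CH list and attaching only the requested hints on later requests. A rough Python sketch (using the requests library; the hint values and cache structure are illustrative):

# Sketch: remember per-origin Accept-CH requests and send only the requested AI hints.
# Hint values below are placeholders; a real agent would populate them from its config.
from urllib.parse import urlsplit
import requests

AI_HINTS = {
    "Sec-CH-AI-Agent": "?1",
    "Sec-CH-AI-Purpose": '"assist"',
    "Sec-CH-AI-Owner": '"AcmeAI"',
    "Sec-CH-AI-Contact": '"mailto:agents@acme.ai"',
    "Sec-CH-AI-Policy": '"https://acme.ai/agent-policy"',
    "Sec-CH-AI-User-Present": "?1",
    "Sec-CH-AI-Training": "?0",
    "Sec-CH-AI-Session": '"ab-20260121-3f92"',
}

accept_ch_cache: dict[str, set[str]] = {}  # origin -> hint names the origin asked for

def agent_get(url: str, session: requests.Session) -> requests.Response:
    # session.headers["User-Agent"] should already carry the AI-Agent token (see above).
    origin = "{0.scheme}://{0.netloc}".format(urlsplit(url))
    requested = accept_ch_cache.get(origin, set())
    # Only disclose hints the origin explicitly requested via Accept-CH.
    headers = {name: value for name, value in AI_HINTS.items()
               if name.lower() in requested}
    resp = session.get(url, headers=headers, timeout=10)
    accept_ch = resp.headers.get("Accept-CH", "")
    if accept_ch:
        accept_ch_cache[origin] = {h.strip().lower() for h in accept_ch.split(",") if h.strip()}
    return resp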
Agent‑Policy: a server opt‑in header/meta for AI‑agents
Sites need a compact way to declare what they allow agents to do. We propose Agent‑Policy, modeled after Permissions‑Policy and using Structured Fields for machine readability. It can be delivered via HTTP header or an HTML meta tag, and optionally reference a JSON policy at a well‑known URL.
Header syntax (dictionary, illustrative):
Agent-Policy: allow=?1, modes=("assist" ?1, "crawl" ?0, "train" ?0),
max-rps=2, burst=10, retention=86400,
pii=?0, require=("owner", "contact", "purpose", "policy-url")
Semantics:
- allow: global switch; if ?0, agent should treat the site as non‑consenting unless overridden in a path‑scoped policy.
- modes: per‑purpose booleans; omit to inherit from allow.
- max-rps and burst: rate limits per source IP or per agent‑id, whichever is stricter.
- retention: max seconds the agent may retain raw content. 0 means ephemeral only.
- pii: if ?0, agent must avoid collecting forms, cookies, or account pages; only public content.
- require: list of required request fields; if missing, site may deny.
HTML meta equivalent:
<meta http-equiv="Agent-Policy" content="allow=?1, modes=(\"assist\" ?1, \"crawl\" ?0, \"train\" ?0), max-rps=2, retention=86400, pii=?0">
Well‑known JSON (optional) for complex policies:
- Path: /.well-known/agent-policy.json
- Reference in header: Agent-Policy: policy-url="https://example.com/.well-known/agent-policy.json"
Example JSON:
json{ "version": 1, "allow": true, "modes": { "assist": { "allow": true, "maxRps": 2, "burst": 10 }, "crawl": { "allow": false }, "train": { "allow": false } }, "retentionSeconds": 86400, "pii": false, "requirements": ["owner", "contact", "purpose", "policy"], "pathScopes": [ { "path": "/public/", "modes": { "crawl": { "allow": true, "maxRps": 0.5 } } }, { "path": "/account/", "modes": { "assist": { "allow": false } }, "reason": "PII" } ] }
Interaction with existing signals:
- Robots.txt (RFC 9309) remains authoritative for crawling and indexing. Agents should honor robots and X‑Robots‑Tag. Where Agent‑Policy conflicts with robots for crawl mode, robots should win.
- X‑Robots‑Tag: honor tokens like noindex, noarchive, and community conventions such as noai/noimageai where present.
- Authenticated pages: agents should treat authenticated content as pii=true unless the Agent‑Policy explicitly permits assist mode for logged‑in sessions and the human is user‑present.
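For crawl mode, the precedence rule can be expressed directly: check robots.txt first, then the Agent-Policy. A minimal Python sketch using the standard-library robotparser (purpose_allowed is the hypothetical helper sketched earlier):

# Sketch: for purpose=crawl, robots.txt (RFC 9309) wins; Agent-Policy can only narrow further.
from urllib import robotparser

def crawl_allowed(origin: str, path: str, policy: dict, ua_token: str = "AI-Agent") -> bool:
    rp = robotparser.RobotFileParser()
    rp.set_url(origin + "/robots.txt")
    rp.read()
    if not rp.can_fetch(ua_token, origin + path):
        return False                                  # robots denies: stop here
    return purpose_allowed(policy, path, "crawl")     # then apply the site's Agent-Policy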
“What is my browser agent” validator pattern
To make this practical, we need a standard way to verify and debug. A validator endpoint provides a simple diagnostic surface that:
- Echoes the received User‑Agent and AI Client Hints.
- Assesses compliance against the site’s Agent‑Policy and robots/X‑Robots‑Tag.
- Returns human‑readable JSON and an HTML view for quick testing.
Recommended endpoint: /agent/validate (public, low‑risk, served with caching disabled).
Validation logic
- Identify: Does the UA string contain AI-Agent/<version>? If not, classify as unknown.
- Hints: If the site previously requested AI hints via Accept-CH (e.g., in an earlier response from the same origin), check that the required fields were provided; otherwise mark them as "not requested".
- Purpose compatibility: If purpose=train but Agent‑Policy forbids train, advise 403 for non‑diagnostic requests.
- Rate: Expose observed rate headers (Retry-After or backoff tokens) for non‑conforming agents.
- Contact: Validate contact URL format (mailto or https). Optionally verify policy URL returns JSON or a document with contact info.
Example: Node.js (Express)
import express from 'express';
import fetch from 'node-fetch'; // Node 18+ could use the built-in fetch instead

const app = express();

// Extract the AI-Agent token and its attributes from a User-Agent string.
function parseAiFromUA(ua = '') {
  const ai = { present: false };
  const m = ua.match(/AI-Agent\/([\d.]+)/);
  if (m) {
    ai.present = true;
    ai.version = m[1];
    const kv = {};
    for (const rawPart of ua.split(';')) {
      const part = rawPart.trim();
      const idx = part.indexOf('=');
      if (idx === -1) continue;
      const k = part.slice(0, idx);
      // Drop a trailing ")" and any engine tokens that follow the parenthesized UA comment.
      const v = part.slice(idx + 1).replace(/\).*$/, '').trim();
      if (['purpose', 'owner', 'contact', 'policy', 'agent-id'].includes(k)) kv[k] = v;
    }
    Object.assign(ai, kv);
  }
  return ai;
}

// Probe the declared policy URL and return a short sample for diagnostics.
async function fetchPolicy(url) {
  try {
    const res = await fetch(url, { signal: AbortSignal.timeout(2000) });
    const ct = res.headers.get('content-type') || '';
    const body = await res.text();
    return { ok: res.ok, contentType: ct, sample: body.slice(0, 512) };
  } catch (e) {
    return { ok: false, error: e.message };
  }
}

app.get('/agent/validate', async (req, res) => {
  const ua = req.headers['user-agent'] || '';
  const aiUA = parseAiFromUA(ua);
  const aiCH = {
    agent: req.headers['sec-ch-ai-agent'],
    purpose: req.headers['sec-ch-ai-purpose'],
    owner: req.headers['sec-ch-ai-owner'],
    contact: req.headers['sec-ch-ai-contact'],
    policy: req.headers['sec-ch-ai-policy'],
    userPresent: req.headers['sec-ch-ai-user-present'],
    training: req.headers['sec-ch-ai-training'],
    session: req.headers['sec-ch-ai-session']
  };
  const agentPolicyHeader = req.headers['agent-policy']; // if proxied back or for demonstration

  // The site's own policy would normally be configured statically; sample policy:
  const sitePolicy = {
    allow: true,
    modes: { assist: true, crawl: false, train: false },
    require: ['owner', 'contact', 'purpose', 'policy']
  };

  const missing = [];
  for (const key of sitePolicy.require) {
    if (!(aiUA[key] || aiCH[key])) missing.push(key);
  }

  const purpose = (aiCH.purpose || aiUA.purpose || '').replace(/"/g, '');
  const allowedPurpose = sitePolicy.modes[purpose] === true;

  let policyProbe = null;
  const policyUrl = (aiCH.policy || aiUA.policy || '').replace(/"/g, '');
  if (policyUrl.startsWith('http')) policyProbe = await fetchPolicy(policyUrl);

  const result = {
    received: { userAgent: ua, clientHints: aiCH, agentPolicyHeader },
    parsed: { aiUA },
    compliance: {
      identified: aiUA.present || !!aiCH.agent,
      missingRequiredFields: missing,
      purpose,
      purposeAllowed: purpose ? allowedPurpose : null,
      policyProbe
    },
    recommendations: []
  };
  if (!result.compliance.identified) {
    result.recommendations.push('Include AI-Agent in User-Agent and/or Sec-CH-AI-Agent: ?1');
  }
  if (missing.length) {
    result.recommendations.push('Provide required fields: ' + missing.join(', '));
  }
  if (purpose && !allowedPurpose) {
    result.recommendations.push(`Purpose ${purpose} not allowed by site policy; use assist or respect robots/Agent-Policy.`);
  }

  res.setHeader('Content-Type', 'application/json');
  res.status(200).send(JSON.stringify(result, null, 2));
});

app.listen(3000, () => console.log('Validator listening on :3000'));
Example: Python (Flask)
from flask import Flask, request, jsonify
import re
import requests

app = Flask(__name__)


@app.route('/agent/validate')
def validate():
    ua = request.headers.get('User-Agent', '')
    ch = {k: request.headers.get(k) for k in [
        'Sec-CH-AI-Agent', 'Sec-CH-AI-Purpose', 'Sec-CH-AI-Owner', 'Sec-CH-AI-Contact',
        'Sec-CH-AI-Policy', 'Sec-CH-AI-User-Present', 'Sec-CH-AI-Training', 'Sec-CH-AI-Session']}

    def parse_ai_ua(ua):
        """Extract the AI-Agent token and its attributes from a User-Agent string."""
        out = {'present': False}
        m = re.search(r'AI-Agent/([\d.]+)', ua)
        if m:
            out['present'] = True
            out['version'] = m.group(1)
            for part in ua.split(';'):
                part = part.strip()
                if '=' not in part:
                    continue
                k, v = part.split('=', 1)
                if k in ('purpose', 'owner', 'contact', 'policy', 'agent-id'):
                    # Drop a trailing ")" and any engine tokens after the UA comment.
                    out[k] = v.split(')')[0].strip()
        return out

    aiua = parse_ai_ua(ua)

    # The site's own policy would normally be configured statically; sample policy:
    site_policy = {'allow': True,
                   'modes': {'assist': True, 'crawl': False, 'train': False},
                   'require': ['owner', 'contact', 'purpose', 'policy']}

    # Map each required UA attribute to its Client Hint equivalent.
    hint_name = {'owner': 'Sec-CH-AI-Owner', 'contact': 'Sec-CH-AI-Contact',
                 'purpose': 'Sec-CH-AI-Purpose', 'policy': 'Sec-CH-AI-Policy'}
    missing = [k for k in site_policy['require']
               if not (aiua.get(k) or ch.get(hint_name[k]))]

    purpose = (ch.get('Sec-CH-AI-Purpose') or aiua.get('purpose') or '').replace('"', '')
    allowed = site_policy['modes'].get(purpose)

    policy_url = (ch.get('Sec-CH-AI-Policy') or aiua.get('policy') or '').replace('"', '')
    policy_probe = None
    if policy_url.startswith('http'):
        try:
            r = requests.get(policy_url, timeout=2)
            policy_probe = {'ok': r.ok,
                            'contentType': r.headers.get('content-type', ''),
                            'sample': r.text[:512]}
        except Exception as e:
            policy_probe = {'ok': False, 'error': str(e)}

    resp = {
        'received': {'userAgent': ua, 'clientHints': ch},
        'parsed': {'aiUA': aiua},
        'compliance': {
            'identified': aiua['present'] or bool(ch.get('Sec-CH-AI-Agent')),
            'missingRequiredFields': missing,
            'purpose': purpose,
            'purposeAllowed': allowed,
            'policyProbe': policy_probe
        }
    }
    return jsonify(resp)


if __name__ == '__main__':
    app.run(port=3000)
NGINX sample: allow compliant AI‑agents
map $http_user_agent $ai_agent {
    default 0;
    "~*AI-Agent/" 1;
}
map $http_sec_ch_ai_purpose $ai_purpose {
    default "";
    "~*(assist|crawl|train)" $1;
}
# nginx does not allow nested "if", so fold both signals into one decision:
# deny AI agents under /docs/ unless their declared purpose is "assist".
map "$ai_agent:$ai_purpose" $deny_ai {
    default 0;
    "1:assist" 0;
    "~^1:" 1;
}
server {
    listen 443 ssl;
    # Advertise hints on the first response (could be at / only)
    location = / {
        add_header Accept-CH "Sec-CH-AI-Agent, Sec-CH-AI-Purpose, Sec-CH-AI-Owner, Sec-CH-AI-Contact, Sec-CH-AI-Policy" always;
        try_files $uri /index.html;
    }
    location /docs/ {
        if ($deny_ai) { return 403; }
        # normal handling
        try_files $uri $uri/ =404;
    }
}
Security, privacy, and abuse considerations
- Rate limiting and backoff: When rejecting, include Retry-After and clear error messaging. Provide a contact link for whitelisting discussions.
- Ephemeral identifiers: agent-id and Sec-CH-AI-Session should rotate frequently. Avoid correlating across sites.
- No PII without consent: If pii=?0 in Agent‑Policy or if pages are authenticated, agents should avoid form submissions, account pages, and private resources unless user‑present is true and explicit consent is established.
- Training guarantees: If training is disallowed (modes.train=false or explicit directive via X‑Robots‑Tag: noai), agents should not store content beyond ephemeral processing; retention should be zero or minimal.
- Attestation and tokens: For higher‑assurance deployments, pair the identity layer with attestation (e.g., Private Access Tokens / Privacy Pass) or mTLS to reduce spoofability. This is optional but recommended for sensitive sites.
- Logging: Sites should log AI‑Agent presence, purpose, owner, and session. Build alerts for unusual rates or high error rates.
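A minimal sketch of the ephemeral-identifier guidance (the rotation interval and id format are illustrative): keep ids per origin so they cannot be correlated across sites, and rotate them at least daily.

# Sketch: per-origin ephemeral agent ids, rotated at least daily (per-session is better).
import secrets
import time

ROTATION_SECONDS = 24 * 3600
_sessions: dict[str, tuple[str, float]] = {}  # origin -> (agent_id, created_at)

def agent_id_for(origin: str) -> str:
    now = time.time()
    current = _sessions.get(origin)
    if current and now - current[1] < ROTATION_SECONDS:
        return current[0]
    # New id: date prefix for log readability, random suffix for unlinkability.
    agent_id = f"ab-{time.strftime('%Y%m%d')}-{secrets.token_hex(4)}"
    _sessions[origin] = (agent_id, now)
    return agent_id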
Adoption playbook
For agent vendors:
- Implement the AI‑Agent UA string immediately. Publish a living agent policy page at the policy URL.
- Support responding to Accept-CH with the proposed AI hints. Even if not standardized, early bilateral use adds value.
- Honor Agent‑Policy, robots.txt, X‑Robots‑Tag, and site‑specific rate guidance. Default to the strictest interpretation.
- Provide an operator console to configure owner/contact, purposes, and retention modes per deployment.
For websites and APIs:
- Start small: publish Agent‑Policy as a header/meta on your docs and public pages. If you allow assist but not crawl/train, say so.
- Add Accept-CH and Critical-CH to pages where agent behavior matters (docs, support, pricing).
- Expose /agent/validate. Link it from robots.txt or security.txt so operators can self‑diagnose (see the example after this list).
- Update WAF rules to allow traffic that identifies as AI‑Agent and complies with your policy; continue to block undisclosed automation.
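For example (illustrative contents; neither robots.txt nor security.txt defines a dedicated field for a validator, so a comment and the generic Policy field are used):

# robots.txt
# AI agents: self-check at https://example.com/agent/validate
# Policy: https://example.com/.well-known/agent-policy.json
User-agent: *
Allow: /

# .well-known/security.txt (RFC 9116)
Contact: mailto:security@example.com
Policy: https://example.com/ai-access
Expires: 2027-01-01T00:00:00.000Z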
For CDNs/WAFs:
- Ship a managed ruleset that recognizes AI‑Agent patterns and enforces Agent‑Policy automatically.
- Offer dashboards showing AI‑agent volume, purposes, top owners, and policy adherence.
Interoperability with existing standards
- UA‑CH: Builds on the same opt‑in pattern. Name choice "Sec‑CH‑AI‑*" follows conventions but would need registration for formal standardization.
- Structured Headers (RFC 8941): Agent‑Policy examples use booleans (?1/?0), tokens, lists, and dictionaries. This improves parser reliability and safety.
- Robots.txt (RFC 9309) and X‑Robots‑Tag: Continue to govern crawl and indexing semantics. Agent‑Policy does not override robots; it complements it for non‑indexing agent behavior (assist, train).
- Privacy signals (e.g., GPC): Agents should forward user privacy preferences (e.g., Sec-GPC: 1) and respect site responses.
- Security.txt: Expose contact information and policy links here as well for cross‑validation.
Practical HTTP exchanges
Initial navigation (server advertises policy and hints):
GET / HTTP/1.1
Host: example.com
User-Agent: Mozilla/5.0 (compatible; AgenticBrowser/1.0; AI-Agent/1.0; purpose=assist; owner=AcmeAI; contact=mailto:agents@acme.ai; policy=https://acme.ai/agent-policy; agent-id=ab-20260121-3f92)
HTTP/1.1 200 OK
Agent-Policy: allow=?1, modes=("assist" ?1, "crawl" ?0, "train" ?0), max-rps=2, retention=86400, pii=?0
Accept-CH: Sec-CH-AI-Agent, Sec-CH-AI-Purpose, Sec-CH-AI-Owner, Sec-CH-AI-Contact, Sec-CH-AI-Policy, Sec-CH-AI-User-Present
Critical-CH: Sec-CH-AI-Agent, Sec-CH-AI-Purpose
X-Robots-Tag: noai
Subsequent resource request with hints:
GET /docs/getting-started HTTP/1.1
Host: example.com
User-Agent: Mozilla/5.0 (compatible; AgenticBrowser/1.0; AI-Agent/1.0; purpose=assist; owner=AcmeAI; contact=mailto:agents@acme.ai; policy=https://acme.ai/agent-policy; agent-id=ab-20260121-3f92)
Sec-CH-AI-Agent: ?1
Sec-CH-AI-Purpose: "assist"
Sec-CH-AI-Owner: "AcmeAI"
Sec-CH-AI-Contact: "mailto:agents@acme.ai"
Sec-CH-AI-Policy: "https://acme.ai/agent-policy"
Sec-CH-AI-User-Present: ?1
Server denies a disallowed purpose (for example, a later request sent with purpose=train):
HTTP/1.1 403 Forbidden
Content-Type: application/problem+json
Retry-After: 600
{
"type": "https://example.com/problems/ai-agent-policy",
"title": "AI-Agent purpose not allowed",
"detail": "Purpose 'train' is not permitted on this origin.",
"allowed": ["assist"],
"contact": "https://example.com/ai-access"
}
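On the agent side, a sketch of handling such a denial (the problem-document fields match the example above; the backoff policy is an assumption):

# Sketch: respect a policy denial: record the allowed purposes and back off per Retry-After.
import time
import requests

def fetch_with_policy_handling(url: str, headers: dict) -> requests.Response | None:
    resp = requests.get(url, headers=headers, timeout=10)
    if resp.status_code == 403 and "application/problem+json" in resp.headers.get("Content-Type", ""):
        problem = resp.json()
        allowed = problem.get("allowed", [])
        retry_after = int(resp.headers.get("Retry-After", "0") or 0)
        print(f"Denied: {problem.get('detail')}; allowed purposes: {allowed}; "
              f"contact: {problem.get('contact')}")
        if retry_after:
            time.sleep(min(retry_after, 3600))  # honor Retry-After, capped defensively
        return None  # caller may retry with an allowed purpose, or escalate via the contact URL
    return resp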
WAF rule hints
- Allow if UA contains AI-Agent and AI hints satisfy required fields; enforce rate <= max‑rps.
- Block or challenge if AI‑Agent misrepresents purpose (e.g., assist claimed, but behavior resembles crawling at scale).
- Maintain allowlists per owner and policy URL; alert on unknown owners.
Cloudflare (pseudo‑expression):
if (http.user_agent contains "AI-Agent/") and
   (http.request.headers["sec-ch-ai-purpose"] in {"assist", "crawl", "train"}) and
   (http.request.headers["sec-ch-ai-owner"] exists) and
   (http.request.headers["sec-ch-ai-contact"] starts_with "mailto:" or
    http.request.headers["sec-ch-ai-contact"] starts_with "https:")
then allow
else if (automation heuristics match) then block
Frequently asked questions
- Isn’t the UA string deprecated? UA strings are being frozen for major browsers, but they remain present and widely used operationally. We use UA for discoverability and CH for structured, opt‑in detail.
- Can’t bad actors spoof this? Yes. This isn’t strong authentication. It’s an honesty signal and a coordination channel. Pair it with reputation, rate limiting, attestation (e.g., Private Access Tokens), and allowlists for stronger guarantees.
- Why not just robots.txt? Robots governs crawling and indexing. Agents also do assistive browsing, form filling, and RAG fetching. Those behaviors need different controls (user‑present, retention, purpose) that robots doesn’t express well.
- What about privacy? Keep identifiers ephemeral, do not send user identifiers, and respect site retention and pii directives. Agents should propagate user privacy preferences (e.g., GPC) where applicable.
- Is there a standard? This is a concrete proposal aligned with existing web standards (RFC 8941, RFC 9309, UA‑CH). It can be trialed today and, if useful, adopted and standardized collaboratively.
Opinionated guidance for both sides
- Agents: If you won’t identify transparently, expect to be blocked more often. Lean into explicit UA + hints. Publish policy and contact. Respect robots and Agent‑Policy strictly.
- Sites: Offer a path for ethical agents. Publishing Agent‑Policy and Accept‑CH costs little and saves cycles arguing with well‑intentioned operators. Start restrictive, measure, and relax where safe.
- Ecosystem: CDNs and WAFs should ship reference rule packs; developer tooling should include a one‑click “What is my browser agent” test harness.
Conclusion
Transparent agentic browsing is achievable with a few small, interoperable pieces. A clearly marked AI‑Agent UA token, a handful of opt‑in Client Hints, and a site‑controlled Agent‑Policy—reinforced by a validator endpoint—are enough to change the default posture from “ban first” to “negotiate and allow with guardrails.”
The web thrives when clients and servers communicate capabilities and constraints explicitly. Applying that lesson to AI‑agents improves reliability, reduces security risk, and sets a higher ethical bar for automation. It’s not perfect security, but it is a meaningful step toward a safer, more cooperative web for humans and their agents alike.
Appendix: quick reference
- UA token: AI-Agent/1.0; include purpose, owner, contact, policy, agent-id.
- Client Hints (proposed): Sec-CH-AI-Agent, Sec-CH-AI-Purpose, Sec-CH-AI-Owner, Sec-CH-AI-Contact, Sec-CH-AI-Policy, Sec-CH-AI-User-Present, Sec-CH-AI-Training, Sec-CH-AI-Session.
- Agent‑Policy header/meta: allow, modes, max-rps, burst, retention, pii, require; optional policy-url to well‑known JSON.
- Validator endpoint: /agent/validate returning structured JSON assessment.
- Standards referenced: RFC 8941 (Structured Field Values), RFC 9309 (Robots Exclusion Protocol), UA Client Hints (W3C), X‑Robots‑Tag (widely implemented).