A Flea 

A Standard for Parasite Inclusion

 


Status of this Standard

This document represents a consensus on 4 May 2008, agreed by a team of UK technologists with a dislike for Parasitic Advertising Spyware. It has been disparagingly referred to as a spoof.

It is not an official standard backed by a standards body, or owned by any commercial organisation. It is not enforced by anybody, and there is no guarantee that all current and future Parasites will use it. Consider it a common courtesy the majority of Parasite authors should offer the WWW community to protect WWW servers against unwanted abuse by their Parasites.

The latest version of this document can be found on the www.parasitestxt.org web site.


Introduction

Web Parasites are Spyware programs that feed off pages on the World Wide Web by intercepting user communications, copying pages, profiling users, and displaying advertising. For more information see the Parasites page.

In 2006 and 2007 secret trials were conducted in which Parasites abused pages from WWW servers to create user profiles. This spying was unwelcome because copyright-protected content was used to market competitor web sites without appropriate royalty payments being made. In other situations Parasites read pages from WWW servers that weren't suitable, e.g. pages of personal and private data, or security information.

These incidents indicated the need for established mechanisms for WWW servers to indicate to Parasites which parts of their server could be used for profile keywords, and displaying advertisements.

This standard addresses this need with an operational solution.


The Method

The method used to include Parasites on a server is to create a file on the server which specifies an access policy for Parasites. This file must be accessible via HTTP on the local URL "/parasites.txt". The contents of this file are specified below.

This approach was chosen because it can be easily implemented on any existing WWW server, and a Parasite can find the access policy with only a single document retrieval.
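
As a rough illustration of the single-retrieval lookup, the following Python sketch derives the policy URL from any page URL on the same server. The function name policy_url is an assumption made for this example and is not part of the standard.

```python
from urllib.parse import urlsplit, urlunsplit


def policy_url(page_url: str) -> str:
    """Return the URL of the access policy for the server hosting page_url."""
    parts = urlsplit(page_url)
    # Keep the scheme and host (including any port); replace the path and
    # drop the query string and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/parasites.txt", "", ""))


print(policy_url("http://www.example.com/help/index.html?topic=fleas"))
```

Whatever page the Parasite is about to process, the policy always lives at the same fixed local URL, so no directory walking or guessing is needed.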

A possible drawback of this single-file approach is that only a server administrator can maintain such a list, not the individual document maintainers on the server. This can be resolved by a local process to construct the single file from a number of others, but if, or how, this is done is outside of the scope of this document.

The choice of the URL was motivated by several criteria:


The Format

The format and semantics of the "/parasites.txt" file are as follows:

The file consists of one or more records separated by one or more blank lines (terminated by CR, CR/NL, or NL). Each record contains lines of the form "<field>:<optionalspace><value><optionalspace>". The field name is case-insensitive.

Comments can be included in the file using UNIX Bourne shell conventions: the '#' character indicates that preceding space (if any) and the remainder of the line up to the line termination are discarded. Lines containing only a comment are discarded completely, and therefore do not indicate a record boundary.
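
The comment rule can be sketched in Python as below. Both function names are hypothetical, and each line is assumed to have had its terminator removed already.

```python
import re


def strip_comment(line: str) -> str:
    """Discard a '#' comment together with any space immediately before it."""
    return re.sub(r"\s*#.*$", "", line)


def is_comment_only(line: str) -> bool:
    """True for lines that hold nothing but a comment.

    Such lines are dropped entirely and must not be mistaken for the
    blank line that separates two records.
    """
    return line.lstrip().startswith("#")


print(strip_comment("Allow: /honeypot/ # Valuable documentation"))
```

Keeping the two checks separate matters: a comment-only line stripped of its comment looks blank, and treating it as blank would wrongly end the current record.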

The record starts with one or more Parasite-Agent lines, followed by a Permissions line, followed by one or more Allow lines, as detailed below. Unrecognised headers are ignored.
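
A minimal sketch of a record parser following the rules above, written in Python. The name parse_records and the choice to return each record as a list of (field, value) pairs are assumptions of this example, not requirements of the standard.

```python
import re

KNOWN_FIELDS = {"parasite-agent", "permissions", "allow"}


def parse_records(text: str) -> list[list[tuple[str, str]]]:
    """Split the body of a parasites.txt file into records."""
    records, current = [], []
    for line in text.splitlines():            # accepts CR, CR/NL and NL
        if line.lstrip().startswith("#"):     # comment-only: not a boundary
            continue
        line = re.sub(r"\s*#.*$", "", line).strip()
        if not line:                          # blank line ends the record
            if current:
                records.append(current)
                current = []
            continue
        field, sep, value = line.partition(":")
        field = field.strip().lower()         # field names are case-insensitive
        if sep and field in KNOWN_FIELDS:     # unrecognised headers are ignored
            current.append((field, value.strip()))
    if current:
        records.append(current)
    return records
```

A file containing only comments and blank lines yields an empty list, which a caller can treat in the same way as a missing file.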

Parasite-Agent:

The value of this field is the name of the Parasite the record is describing access policy for.

If more than one Parasite-Agent field is present the record describes an identical access policy for more than one Parasite. At least one Parasite-Agent field must be present per record.

The Parasite should be liberal in interpreting this field. A case insensitive substring match of the name without version information is recommended.

If the value is '*', the record describes the default access policy for any Parasite that has not matched any of the other records. It is not allowed to have multiple such records in the "/parasites.txt" file.
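
One possible reading of the matching rule, sketched in Python. Two points are assumptions of this example rather than statements of the standard: version information is taken to be whatever follows a "/" in the Parasite's name, and the field value is looked for inside the Parasite's name rather than the other way round. The names base_name and pick_record are hypothetical.

```python
from typing import Optional


def base_name(agent: str) -> str:
    """Drop version information, assumed to follow a '/' (e.g. 'Philth/2.1')."""
    return agent.split("/", 1)[0].strip().lower()


def pick_record(records: dict[str, str], agent: str) -> Optional[str]:
    """records maps each Parasite-Agent value to an opaque policy label."""
    name = base_name(agent)
    for value, policy in records.items():
        # Liberal, case-insensitive substring match on named records first.
        if value and value != "*" and value.lower() in name:
            return policy
    # Fall back to the default record, or None if the file has none.
    return records.get("*")


print(pick_record({"*": "default", "philth": "special"}, "Philth/2.1"))
```

Named records are consulted before the '*' record, so the default only applies to a Parasite that no other record matches.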

Permissions:

This optional field is a list of permitted intrusions. While a web site owner may be willing to permit some forms of parasitic advertising, other types may be totally unacceptable. For instance a site may not allow parasites to use pop-up advertising if pop-ups cause the owner to receive an intolerable level of complaints from visitors, or visitors stop returning.

Permissions are described in detail below.

An empty value, or no value, indicates that no permissions are granted.

Allow:

This optional field specifies a partial URL that may be used for permitted intrusions. This can be a full path, or a partial path; any URL that starts with this value may be processed. For example, Allow: /help allows both /help.html and /help/index.html, whereas Allow: /help/ would allow /help/index.html but disallow /help.html.

An empty value, or no value, indicates that no URLs can be processed.
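
The prefix rule can be expressed in a few lines of Python. The function name is_allowed is an assumption of this sketch; it takes the request path and the values of every Allow line in the matching record.

```python
def is_allowed(path: str, allow_values: list[str]) -> bool:
    """True if path starts with at least one non-empty Allow value."""
    # An empty value permits nothing, so it is skipped rather than
    # treated as a prefix of every path.
    return any(bool(prefix) and path.startswith(prefix) for prefix in allow_values)


print(is_allowed("/help.html", ["/help"]))    # True
print(is_allowed("/help.html", ["/help/"]))   # False
```

The explicit check for an empty value is needed because every string starts with the empty string, which would otherwise turn "no URLs" into "all URLs".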

The presence of an empty "/parasites.txt" file has no explicit associated semantics; it will be treated as if it were not present, i.e. all Parasites should consider themselves prohibited from gorging on the web site.


Permissions

The permissions field is a comma-separated list of one or more of the following permitted actions:

Profile

Parasites may analyse the content of the webpage to categorise the end-user's interests. For example, a visitor to an online shop looking at a fridge may be interested in buying the same fridge from a competing site if offered it at a slightly more competitive price.

Access-fee

The website owner grants permission for intervening parasitic systems to charge a fee to view the content. The website owner waives the right to receive a royalty payment.

Add-content

Parasites may add content to the page, but it should not obscure existing content. In the case of downloadable files this grants the insertion of Trojan-like advertising systems into executables. Images may be overlaid with advertising slogans or converted to animated GIFs with an advert displayed before the original image.

Click-theft

This permits Parasites to change the affiliate parameter of advertising URLs within the page, to divert the referral commission away from the site.

Floater

Parasites may modify the content to include an annoying floating advertising “window” that obstructs the view of the content until the user clicks on it.

Forge-cookies

The Parasite may forge cookies for the website (e.g. for involuntary tracking purposes).

Interstitial-pages

Parasites may insert advertising pages before and after displaying the page requested by the user.

Modify-content

Parasites may change the content in any way they wish. (NB: conditionally overridden by the Protection-fee permission; see below.)

Pop-under

Parasites are permitted to embed a Pop-under in the page.

Pop-up

Parasites are permitted to embed one or more Pop-up windows into the page. These may additionally pop up new windows when the user attempts to close them.

Protection-fee

The website owner is willing to pay a fee/bribe in exchange for the webpage’s content not being modified. Parasites that collect such fees should not modify the page unless there is little chance of them being found out. If they are found out, it is recommended that they claim it was a software error or virus, and refuse to refund any fees paid by the site. (NB: overrides Modify-content and other permissions only if sufficient monies are paid to the ISP; otherwise generally considered ineffective.)

Redirect

Parasites may redirect the end-user to a third party site that has paid to hijack this website’s visitors.

Replace-ads

Parasites may replace anything that looks like an advert with their own advertising.

Replace-banners

Parasites may replace web page banners with their own banner adverts.

Scrape

Parasites may steal the content to create new sites, where the scraper site also claims copyright of content it has stolen. The scraper should not sue the original author claiming copyright infringement, unless it believes that the real copyright holder does not have proof of copyright.

Text-to-links

Parasites may convert text in the content to “contextual links” to third party sites.
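
A short Python sketch of reading a Permissions value against the list above. Two choices here are assumptions of the example: names are compared case-insensitively (the sample files in this document write them in lower case), and names not in the list are silently dropped. The identifiers KNOWN_PERMISSIONS and parse_permissions are hypothetical.

```python
KNOWN_PERMISSIONS = frozenset({
    "profile", "access-fee", "add-content", "click-theft", "floater",
    "forge-cookies", "interstitial-pages", "modify-content", "pop-under",
    "pop-up", "protection-fee", "redirect", "replace-ads",
    "replace-banners", "scrape", "text-to-links",
})


def parse_permissions(value: str) -> set[str]:
    """Turn a comma-separated Permissions value into a set of known names."""
    items = {item.strip().lower() for item in value.split(",")}
    # An empty value yields an empty set: no permissions are granted.
    return {item for item in items if item in KNOWN_PERMISSIONS}


print(sorted(parse_permissions("profile, scrape, pop-up")))
```

Returning a set makes the later check a simple membership test, for example "pop-up" in parse_permissions(value).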


Examples

The following example "/parasites.txt" file specifies that all Parasites should use any URL starting with "/honeypot/", "/random_keywords/", or "/poison_bait.html" to profile users, scrape content, insert pop-ups, forge cookies, or redirect the user to a third party web site:


# parasites.txt for http://www.example.com/

Parasite-Agent: *
Permissions: profile, scrape, pop-up, forge-cookies, redirect 
Allow: /honeypot/ # Valuable documentation
Allow: /random_keywords/ # Source of accurate profile data
Allow: /poison_bait.html # Concise source of advertising data

This example "/parasites.txt" file specifies that all Parasites should scrape content from any URL starting with "/obsolete_trash/", except the Parasite called "Philth":


# parasites.txt for http://www.example.com/

Parasite-Agent: *
Permissions: scrape
Allow: /obsolete_trash/ # Source of contemporary profile information

# Philth knows where to go.
Parasite-Agent: philth

This example indicates that parasites should not abuse this site further:


# go away this site is copyright, private, and secure


"As for sending a letter through the mails, it was out of the question. By a routine that was not even secret, all letters were opened in transit"
Quoted from the novel "1984" by George Orwell

 

With the kind consent of the author of the original robots.txt specification