WIRE - Crawling experiment 1: site summary using RDF
Objective
The objective of this experiment is to measure the impact of
server-side cooperation on the crawling process.
Description
For this experiment, the Web server will generate an RDF file
containing the URL and last-modification date of each file in
its public Web directory. The crawler will download that file
daily for one month to check for changes and, if there are
any, it will download the modified files.
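The crawler-side check described above can be sketched as follows. This is only an illustration: the experiment's actual file layout is not given here, so the sketch assumes an RDF/XML file with one rdf:Description per URL and a dcterms:modified date. The function names, the sample URLs and the sample dates are all hypothetical.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DCT = "http://purl.org/dc/terms/"

# Hypothetical summary file; the layout is an assumption, not the experiment's format.
SAMPLE = """<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dcterms="http://purl.org/dc/terms/">
  <rdf:Description rdf:about="http://www.example.com/index.html">
    <dcterms:modified>2003-05-02T10:00:00Z</dcterms:modified>
  </rdf:Description>
  <rdf:Description rdf:about="http://www.example.com/news.html">
    <dcterms:modified>2003-05-01T09:30:00Z</dcterms:modified>
  </rdf:Description>
</rdf:RDF>"""


def parse_summary(xml_text):
    """Return {url: last_modified_string} from the assumed RDF/XML layout."""
    root = ET.fromstring(xml_text)
    summary = {}
    for desc in root.findall(f"{{{RDF}}}Description"):
        url = desc.get(f"{{{RDF}}}about")
        modified = desc.findtext(f"{{{DCT}}}modified")
        if url and modified:
            summary[url] = modified.strip()
    return summary


def urls_to_download(previous, current):
    """URLs that are new, or whose date differs from the one seen on the last visit."""
    return sorted(url for url, stamp in current.items()
                  if previous.get(url) != stamp)


# State remembered from the previous day's visit (hypothetical).
previous = {"http://www.example.com/index.html": "2003-04-30T08:00:00Z"}
current = parse_summary(SAMPLE)
print(urls_to_download(previous, current))
```

Comparing the date strings for equality, rather than parsing them, is enough to detect a change as long as the server always writes dates in the same format.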
Each website will be visited twice a day: once
by a normal Web crawler, and once by an RDF-enabled Web crawler.
The total number of bytes transferred daily will be compared.
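One possible way to express the daily comparison is as the fraction of bytes saved by the RDF-enabled crawler. The function name and the numbers below are placeholders for illustration, not measurements, and the sketch assumes that the RDF-enabled total includes the bytes spent downloading the summary file itself.

```python
def transfer_savings(normal_bytes, rdf_bytes):
    """Fraction of one day's bytes saved by the RDF-enabled crawler.

    rdf_bytes is assumed to include the download of the summary file.
    """
    if normal_bytes == 0:
        return 0.0  # nothing was transferred, so nothing could be saved
    return (normal_bytes - rdf_bytes) / normal_bytes


# Placeholder values, not experimental results.
print(f"{transfer_savings(1000, 250):.1%} fewer bytes")
```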
Requirements
The websites should have more than 100 pages. They should also
have at least 5 changes or 5 new pages each month.
The website administrator must install a Perl script and
configure crontab to run it daily. This program will generate
the RDF file, a list in XML with the file names and the
last-modification dates.
Installation instructions will be provided. No special access
to the server is required, as the XML list of files will be
in a public directory.
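To give an idea of what such a generator does, here is a minimal sketch. It is written in Python rather than Perl and is not the script distributed for the experiment. The output layout (rdf:Description with dcterms:modified), the function name and the example paths are assumptions.

```python
import os
import sys
from datetime import datetime, timezone
from urllib.parse import quote
from xml.sax.saxutils import escape, quoteattr


def build_summary(root_dir, base_url):
    """Return an RDF/XML string listing every file under root_dir
    with its URL and last-modification date (UTC)."""
    entries = []
    for dirpath, _dirnames, filenames in os.walk(root_dir):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Path relative to the public directory, with URL-style separators.
            rel = os.path.relpath(path, root_dir).replace(os.sep, "/")
            mtime = datetime.fromtimestamp(os.path.getmtime(path), tz=timezone.utc)
            url = base_url.rstrip("/") + "/" + quote(rel)
            entries.append((url, mtime.strftime("%Y-%m-%dT%H:%M:%SZ")))
    entries.sort()

    lines = [
        '<?xml version="1.0"?>',
        '<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"',
        '         xmlns:dcterms="http://purl.org/dc/terms/">',
    ]
    for url, stamp in entries:
        lines.append(f"  <rdf:Description rdf:about={quoteattr(url)}>")
        lines.append(f"    <dcterms:modified>{escape(stamp)}</dcterms:modified>")
        lines.append("  </rdf:Description>")
    lines.append("</rdf:RDF>")
    return "\n".join(lines) + "\n"


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python3 site_summary.py /var/www/html http://www.example.com/
    sys.stdout.write(build_summary(sys.argv[1], sys.argv[2]))
```

A daily crontab entry would redirect the output of such a program into a file inside the public directory, which is why no special access to the server is needed to read it.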
A UNIX/Linux-based web server is required.
To participate,
Thank you for your collaboration.