Center for Web Research D.C.S. University of Chile

WIRE
Web Information Retrieval Environment



Objective
The objective of this experiment is to measure the impact of server-side cooperation on the crawling process.

Description
For this experiment, the Web server will generate an RDF file containing the URL and last-modification date of each file in its public Web directory. Every day for one month, the crawler will download that file to check for changes and, if there are any, download the modified files.
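The change check described above can be sketched as a comparison between the listing from the previous visit and the current one. This is only an illustrative Python sketch, not the crawler's actual code: the function name `changed_urls` and the representation of a listing as a dictionary from URL to last-modification date are assumptions made for the example.

```python
from datetime import datetime

def changed_urls(previous, current):
    """Return the URLs that are new or have a newer last-modification date.

    Both arguments map URL -> datetime of last modification
    (a hypothetical in-memory form of the server's listing).
    """
    to_fetch = []
    for url, modified in current.items():
        old = previous.get(url)
        # Fetch pages never seen before, or pages modified since the last visit.
        if old is None or modified > old:
            to_fetch.append(url)
    return sorted(to_fetch)

# Example with made-up URLs and dates.
previous = {
    "http://example.org/a.html": datetime(2003, 1, 1),
    "http://example.org/b.html": datetime(2003, 1, 1),
}
current = {
    "http://example.org/a.html": datetime(2003, 1, 1),  # unchanged
    "http://example.org/b.html": datetime(2003, 1, 5),  # modified
    "http://example.org/c.html": datetime(2003, 1, 4),  # new page
}
print(changed_urls(previous, current))
```

Under this sketch, only the pages returned by the function would be requested again, so unchanged pages cost no transfer beyond the listing itself.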

Each website will be visited twice a day: once by a normal Web crawler, and once by an RDF-enabled Web crawler. The total number of bytes transferred daily will be compared.

Requirements
Participating websites should have more than 100 pages, and at least 5 modified pages or 5 new pages each month.

The website administrator must install a Perl script and configure crontab to run it daily. The script generates an XML list of file names and their last-modification dates. Installation instructions will be provided. No special access to the server is required, because the XML list of files will be placed in a public directory.
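To give an idea of what such a listing program does, here is a minimal sketch in Python. It is not the Perl script that will be provided. The function name `build_file_list`, the element and attribute names (`files`, `file`, `url`, `modified`), and the date format are all assumptions chosen for illustration; the real script may produce a different layout.

```python
import os
import time
import xml.etree.ElementTree as ET

def build_file_list(root_dir, base_url):
    """Return an XML string listing every file under root_dir
    with its URL and last-modification time (UTC)."""
    listing = ET.Element("files")  # hypothetical element name
    for dirpath, _dirnames, filenames in os.walk(root_dir):
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            # Path relative to the public directory, with URL-style separators.
            rel = os.path.relpath(path, root_dir).replace(os.sep, "/")
            stamp = time.strftime("%Y-%m-%dT%H:%M:%SZ",
                                  time.gmtime(os.path.getmtime(path)))
            ET.SubElement(listing, "file",
                          url=base_url.rstrip("/") + "/" + rel,
                          modified=stamp)
    return ET.tostring(listing, encoding="unicode")
```

A daily scheduled job would write the returned string to a file inside the public Web directory, where the crawler can download it like any other page.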

A UNIX/Linux-based web server is required.

To participate,

Thank you for your collaboration.

 

Department of Computer Sciences
University of Chile
Blanco Encalada #2120
Santiago, Chile

Questions/Comments: cwr@dcc.uchile.cl

The Center for Web Research (CWR) is possible thanks to the Millenium Science Initiative Program
Millenium Science Initiative, Ministry of Planning and Cooperation - Government of Chile



