Center for Web Research, D.C.S., University of Chile


Crawling experiments, April-May 2004

Crawling experiment 1: site summary using RDF

Crawling experiment 2: log analysis


WIRE in Spanish

Go to the main page of the WIRE project in Spanish

WIRE - Web Information Retrieval Environment

Overview

The WIRE project is an effort started by the Center for Web Research to create an information retrieval application designed for use on the Web.

Currently, it includes:
  • A simple format for storing a collection of web documents.
  • A web crawler.
  • Tools for extracting statistics from the collection.
  • Tools for generating reports about the collection.
The main characteristics of the WIRE software are:
  • Scalability: designed to work with large volumes of documents and tested with several million.
  • Performance: written in C/C++ for high performance.
  • Configurability: all parameters for crawling and indexing can be set in an XML file.
  • Analysis: includes several tools for analyzing, extracting statistics from, and generating reports on subsets of the web, such as the web of a country or a large intranet.
  • Open source: the code is freely available under the GPL.

Downloading and installing

The home page of WIRE is http://www.cwr.cl/projects/WIRE/

The latest version can be downloaded from http://www.cwr.cl/projects/WIRE/releases/. Download and unpack the distribution, then follow the installation instructions.

Documentation and support

If you use WIRE, it is advisable to join the wire-crawler@groups.yahoo.com mailing list to receive announcements of new releases.

See online documentation.

See also the PhD thesis and publications on the WIRE crawler.


Third-party modules

NOKUBI Takatsugu wrote a SWIG-based library for accessing WIRE. It is useful if you want to read the collection generated by WIRE from Ruby, Perl, Tcl, or other languages.


Acknowledgements

This project is funded by the Center for Web Research (CWR), which is made possible by the Millennium Science Initiative program.

Design:
Programming:

 

Department of Computer Sciences
University of Chile
Blanco Encalada #2120
Santiago, Chile

Questions/Comments: cwr@dcc.uchile.cl

Millennium Science Initiative, Ministry of Planning and Cooperation - Government of Chile



