WIRE - Web Information Retrieval Environment
Overview
The WIRE project is an effort started by the Center for Web Research for creating an application for information retrieval, designed to be used on the Web.
Currently, it includes:
- A simple format for storing a collection of web documents.
- A web crawler.
- Tools for extracting statistics from the collection.
- Tools for generating reports about the collection.
The main characteristics of the WIRE software are:
- Scalability: designed to work with large volumes of documents, tested with several million documents.
- Performance: written in C/C++ for high performance.
- Configurable: all the parameters for crawling and indexing can be configured via an XML file.
- Analysis: includes several tools for analyzing, extracting statistics, and generating reports on sub-sets of the web, e.g., the web of a country or a large intranet.
- Free software: code is freely available under a GPL license.
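To illustrate the idea of keeping all crawling and indexing parameters in a single XML file, here is a minimal Python sketch. It is only an illustration of the general approach: the element names (`config`, `crawler`, `max-depth`, `user-agent`, `index`, `language`) and the helper `load_settings` are hypothetical and are not taken from WIRE's actual configuration schema or code.

```python
import xml.etree.ElementTree as ET

# Hypothetical configuration: these element names are illustrative only
# and are NOT taken from WIRE's actual configuration schema.
SAMPLE_CONFIG = """
<config>
  <crawler>
    <max-depth>5</max-depth>
    <user-agent>example-bot</user-agent>
  </crawler>
  <index>
    <language>en</language>
  </index>
</config>
"""


def load_settings(xml_text):
    """Flatten a two-level XML tree into {'section/name': 'value'} pairs."""
    root = ET.fromstring(xml_text.strip())
    settings = {}
    for section in root:          # e.g. <crawler>, <index>
        for param in section:     # e.g. <max-depth>, <language>
            key = f"{section.tag}/{param.tag}"
            settings[key] = (param.text or "").strip()
    return settings


settings = load_settings(SAMPLE_CONFIG)
print(settings["crawler/max-depth"])
```

Keeping every tunable value in one file like this means a crawl can be reconfigured without recompiling, which is presumably the motivation for the XML-based configuration described above.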
Downloading and installing
Documentation and support
Third-party software
In 2011, NIC Brazil created WIRE-Nic, a fork of WIRE that adds bug fixes and improvements to the original system. They also developed ConNeCTOR, software that analyzes websites downloaded with WIRE in order to measure IPv6 adoption, perform geolocation, and check adherence to HTML standards, among other tasks.
Luis Alberto García Hernández created a front-end in Java to configure and execute WIRE, and to visualize/analyze web graphs.
NOKUBI Takatsugu made a library to access WIRE using SWIG. This is useful if you want to access the collection generated by WIRE from Ruby, Perl, Tcl, or other languages.
Acknowledgements
This project is funded by the Center for Web Research. The Center for Web Research (CWR) is made possible by the Millennium Program.
Design:
Programming: