Information for Webmasters

Add a HuriSearch box to your site

You are welcome to add a HuriSearch box to your website. This will make your website more appealing to your visitors and give them another reason to return. It will also help us by making HuriSearch available from more locations.

Several organisations from all over the world have already added some very nice search boxes to their websites:

Rights and Democracy

Antigone

OSCE – ODIHR – Tolerance and Non-Discrimination website

How to add a search box to your site?

It's very simple. All you need to do is add the code below. This code is configured so that the search results appear in a new browser window, making it easier for your visitors to return to your site once they have finished their search.

When you are done, the box should look like this:

the human rights search engine – searching over 3000 human rights websites

HTML code:


the human rights search engine – searching over 3000 human rights websites

Please ensure the links to the HuriSearch and HURIDOCS homepages are included on your page.

Kindly let us know if you add a HuriSearch box to your website by sending an email to hurisearch[at]huridocs.org.

HuriSearch Newsletter

To get regular news about HuriSearch and other HURIDOCS tools, sign up for the free HURIDOCS mailing list.

HuriSearch Crawl

HuriSearch uses a dedicated web crawler to retrieve and index documents from your site, beginning at a starting URL, which in most instances will be your domain name (e.g. www.yoursite.org). The crawl works top-down from that URL and does not crawl “horizontally” to sites outside the domain specified in the starting URL. For example, sites listed on your “Links” page will not be crawled.
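The scope rule described above can be illustrated with a small sketch. This is not HuriSearch's own code: the function name in_crawl_scope is hypothetical, and the sketch assumes that "same domain" means an exact host match with the starting URL (how subdomains are treated is not stated here).

```python
from urllib.parse import urlparse


def in_crawl_scope(start_url: str, candidate_url: str) -> bool:
    """Return True if candidate_url is on the same host as start_url.

    Assumption: the crawl scope is an exact host match. A real crawler
    may apply different rules, for example for subdomains.
    """
    start_host = urlparse(start_url).hostname
    candidate_host = urlparse(candidate_url).hostname
    return start_host is not None and start_host == candidate_host


if __name__ == "__main__":
    start = "http://www.yoursite.org/"
    for link in ("http://www.yoursite.org/reports/index.html",
                 "http://www.example.org/"):
        print(link, in_crawl_scope(start, link))
```

Under this assumption, a page on your own site is in scope, while an external site listed on your "Links" page is not.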

The crawler visits your site periodically. The frequency depends on the type of organisation:

Every 24 hours for intergovernmental organisations, national human rights institutions and academic institutions.
Every 8 days for the NGO collection.

The crawler identifies itself as HuriSearchBot. Consult your server logfiles for information regarding its activity.
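If you want to check your log files for crawler visits, a minimal sketch could look like the following. It assumes that your server writes the User-Agent string into each log line (as in the common "combined" access log format). The function name and the sample log lines are made up for illustration.

```python
def crawler_hits(log_lines, agent="HuriSearchBot"):
    """Return the log lines whose text mentions the given user agent.

    Assumption: the User-Agent string is recorded in each log line.
    The match is case-insensitive.
    """
    needle = agent.lower()
    return [line for line in log_lines if needle in line.lower()]


if __name__ == "__main__":
    # Hypothetical sample lines, not real HuriSearch log output.
    sample = [
        '192.0.2.1 - - [01/Jan/2008:10:00:00 +0000] "GET / HTTP/1.1" 200 512 "-" "HuriSearchBot"',
        '192.0.2.2 - - [01/Jan/2008:10:00:05 +0000] "GET / HTTP/1.1" 200 512 "-" "SomeBrowser"',
    ]
    for hit in crawler_hits(sample):
        print(hit)
```

To scan a real file, you could pass an open file object instead of the sample list, for example crawler_hits(open("access.log")), where "access.log" stands for the path of your own log file.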

The crawler is set to comply with the Robots Exclusion Protocol and follow robots.txt instructions. More info here.

Should there be areas of your site that you do not wish to have indexed by HuriSearch, you may modify your robots.txt to read, for example:
User-agent: HuriSearchBot
Disallow: /cgi-bin/
Disallow: /privateandconfidentialdata/

This will tell HuriSearch to ignore folders “cgi-bin” and “privateandconfidentialdata” on your webserver. Files in these folders will not be available for searching through HuriSearch. HuriSearch conforms to on-document instructions contained in the HTML Robots Meta Tag. More info here.
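Before publishing such rules, you may want to check how a parser that follows the Robots Exclusion Protocol reads them. The sketch below uses Python's standard urllib.robotparser module with the example rules above. It only shows how the standard library interprets the file, which we assume matches the crawler's behaviour. The helper name is_allowed and the test paths are illustrative.

```python
from urllib.robotparser import RobotFileParser

# The example rules from the text above.
ROBOTS_TXT = """\
User-agent: HuriSearchBot
Disallow: /cgi-bin/
Disallow: /privateandconfidentialdata/
"""


def is_allowed(path, agent="HuriSearchBot", robots_txt=ROBOTS_TXT):
    """Return True if the given agent may fetch the path under these rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)


if __name__ == "__main__":
    for path in ("/cgi-bin/search.pl", "/reports/index.html"):
        print(path, is_allowed(path))
```

Note that these rules apply only to the agent named in the User-agent line. Other crawlers are not affected by them.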

HuriSearch ranking

Retrieved documents are ranked in the result list solely by relevance. No paid listings are accepted, and no popularity ranking is applied.

In essence, the ranking algorithm weights documents according to the placement, frequency and density of search terms within a document:

When search terms are found in the URL, HTML TITLE or HTML META tags of an HTML document, or in the Title and other document property fields (for Office and PDF files), the document is given a higher weight.
When search terms are found in the HTML headings of a document, the document is given a higher weight.
When search terms are found frequently in a document, the document is given a higher weight.

The combination and comparison of these weightings across a set of results determines the rank of a given document in the result list.
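To make the idea of weighting by placement, frequency and density more concrete, here is a toy sketch. It is not the HuriSearch algorithm: the field names, the weights and the scoring formula are invented purely for illustration, and the real weights are not published here.

```python
import re

# Hypothetical weights, chosen only for illustration.
FIELD_WEIGHTS = {
    "url": 3.0,
    "title": 3.0,
    "meta": 3.0,
    "headings": 2.0,
    "body": 1.0,
}


def tokenize(text):
    """Split text into lowercase alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def score(document, query):
    """Score a document (a dict of field name -> text) against a query.

    For each field, the score grows with the number of matching words
    (frequency) and with their share of the field (density), scaled by
    the weight of the field (placement).
    """
    terms = set(tokenize(query))
    total = 0.0
    for field, weight in FIELD_WEIGHTS.items():
        words = tokenize(document.get(field, ""))
        if not words:
            continue
        hits = sum(1 for word in words if word in terms)
        total += weight * (hits + hits / len(words))
    return total


def rank(documents, query):
    """Return the documents sorted from highest to lowest score."""
    return sorted(documents, key=lambda doc: score(doc, query), reverse=True)
```

In this toy model, a document with a search term in its title scores higher than a document that mentions the same term once in a long body text. The number of external links pointing to a document plays no part.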

How to make your pages more retrievable

Webmasters and authors can do many things to make their documents more retrievable. A few tips are provided here:
1. Ensure your documents have a TITLE,
2. Use meaningful and document-specific metadata,
3. Ensure that documents have readable filenames,
4. Ensure that your web pages have search engine friendly URLs.
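As an illustration of tips 3 and 4, the sketch below turns a document title into a readable, URL-friendly name. The function name slugify and the exact rules (lowercase ASCII letters and digits separated by hyphens) are one common convention, not a HuriSearch requirement.

```python
import re
import unicodedata


def slugify(title):
    """Turn a title into a lowercase, hyphen-separated ASCII name."""
    # Decompose accented characters, then drop anything that is not ASCII.
    ascii_text = (
        unicodedata.normalize("NFKD", title)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    # Replace every run of other characters with a single hyphen.
    return re.sub(r"[^a-z0-9]+", "-", ascii_text.lower()).strip("-")


if __name__ == "__main__":
    print(slugify("Annual Report 2007: Freedom of Expression"))
```

A name produced this way, used as a filename or as part of a URL, carries words that can match a search term, unlike a name such as doc123.pdf.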

Many search engine optimisation sites on the web provide information about how to make pages more retrievable on the major search engines such as Google, Yahoo, MSN etc. Many of these tips apply to HuriSearch too, but please be aware that many do not. In particular, the number of links on external sites pointing to your site has NO relevance in the HuriSearch algorithm.

See also the section on metadata below.

Using metadata

We encourage webmasters to make systematic use of metadata, for example the Dublin Core metadata standard, to describe all published content.
This will make it easier for search engines to sort and retrieve your documents.

Content management systems allow people without technical knowledge of HTML to create web content easily. Some of these systems also include fields for metadata, so content creators can add the relevant keywords themselves. For example, Plone is an excellent open source content management system which offers metadata fields.

We also recommend including at least one metadata tag for keywords relevant to the human rights context, in order to describe the subject of a particular page. A controlled vocabulary (or micro-thesaurus) helps to avoid common errors made when adding keywords to a page, such as spelling mistakes or the use of terms with overlapping or similar meanings. HURIDOCS has developed Micro-thesauri which may be used for this purpose. In particular, you may want to use Micro-thesaurus 1: HURIDOCS Index Terms.
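As a rough illustration, the sketch below generates Dublin Core meta tags for the head section of an HTML page, using the usual convention of a "DC." prefix and a schema.DC link. The function name and the sample values are hypothetical. In particular, the subject keywords shown have not been checked against the HURIDOCS Micro-thesauri and stand in for terms you would choose from a controlled vocabulary.

```python
from html import escape

DC_SCHEMA_LINK = '<link rel="schema.DC" href="http://purl.org/dc/elements/1.1/">'


def dublin_core_meta(fields):
    """Build HTML meta tags from a dict of Dublin Core element -> value."""
    lines = [DC_SCHEMA_LINK]
    for name, value in fields.items():
        lines.append(
            '<meta name="DC.%s" content="%s">'
            % (escape(name, quote=True), escape(value, quote=True))
        )
    return "\n".join(lines)


if __name__ == "__main__":
    # Sample values for illustration only.
    print(dublin_core_meta({
        "title": "Annual report 2007",
        "creator": "Your organisation",
        "subject": "freedom of expression; human rights defenders",
        "language": "en",
    }))
```

The values are escaped so that quotation marks or ampersands in a title do not break the HTML.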

In the near future, HURIDOCS intends to add full Dublin Core metadata search to its indexing.

We appreciate your feedback!

Please let us know how you found HuriSearch. Also, if you have any questions which were not adequately answered by the text above, please let us know.

Just send an email to Bert Verstappen at hurisearch[at]huridocs.org. We look forward to hearing from you!