Search engine: do it yourself

Sun 27 October 2013

[Image: magnifying glass with focus on paper. Search engine ancestor (Photo credit: Wikipedia)]

I have heard many people complain about Google's monopoly.
I just want to present a few alternatives, which I have grouped as follows:
  • big ones
  • old ones
  • strange ones
  • do it yourself
  • others

Of course, my preference goes to the "do it yourself" solutions (namely YaCy and Seeks), and, as usual, for each specific use a dedicated search tool should be preferred.

The big ones

These are run by big, big, big companies. They are powerful, well suited to general-purpose use, and have indexed a huge amount of content (not only web pages). They are also the most used (just look at Alexa). I will only list some of them, as everybody knows them.

They are run by private companies that do what they want. If they want to alter the results for any reason (such as promoting their commercial partners or making some web resource invisible), they can.

The old ones

[Image: Labrador retriever. Lycos! Go get it! (Photo credit: Wikipedia)]

We don't remember them, but they were around in the late 90s.

The strange ones

They are specialized and hard to use for general web search.

  • web content directories such as DMOZ (the Open Directory Project). Each link is human-checked (high quality, but limited content)
  • site-specific search engines (e.g. IMDb)
  • semantic search engines such as Wolfram Alpha: you query this kind of search engine in natural language
  • Mystery Seeker: the search engine that answers something you didn't ask for
  • Creative Commons search engine
  • Amfibi, for company search
  • the Internet Archive, to find old versions of a website

The ones you run yourself

[Image: do it yourself fencing (Photo credit: M i x y)]

They consist of a piece of software you run on your own computer. They are slow and usually share their results over a dedicated peer-to-peer (P2P) network. The main advantage is that a resource indexed by one or more computers in the P2P network cannot be de-indexed. The two main ones (maybe the only two that exist) are YaCy and Seeks.

Both projects offer publicly available running nodes you can test (here and here). The main difference between them is that YaCy relies only on the crawlers of computers running YaCy to index the web and present search results, whereas Seeks is first of all a metasearch engine that shares its results over its P2P network.
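To give an idea of what a metasearch engine does, here is a minimal Python sketch, assuming a simple Borda-style merge: each engine's result list awards points by rank position, so URLs returned near the top by several engines rise in the combined list. This is not Seeks' actual ranking algorithm; the function name `merge_results` and the `depth` parameter are hypothetical.

```python
from collections import defaultdict


def merge_results(result_lists, depth=10):
    """Hypothetical Borda-style merge of ranked URL lists from several engines."""
    scores = defaultdict(float)
    for results in result_lists:
        # Only the first `depth` results of each engine count;
        # the top result gets `depth` points, the next one `depth - 1`, etc.
        for rank, url in enumerate(results[:depth]):
            scores[url] += depth - rank
    # Highest combined score first; ties broken alphabetically for stable output
    return sorted(scores, key=lambda url: (-scores[url], url))


engine_a = ["a.org", "b.org", "c.org"]
engine_b = ["b.org", "d.org", "a.org"]
print(merge_results([engine_a, engine_b]))  # ['b.org', 'a.org', 'd.org', 'c.org']
```

Here "b.org" wins because both engines rank it highly, even though only one of them puts it first.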

Others I use

When I don't have my computer, I use one of the following:

The others

These are just the others. I configure them in my preferred metasearch engine (the Seeks project), and I don't use them otherwise.

Anyway, I advise you to compare results and to run one yourself. OK, it's less reliable (nodes shutting down reduce the index's coverage, and the small number of running nodes doesn't allow a big worldwide web index), less easy to use (installation process, less user-friendly interface, ...), and slower (it must query other P2P nodes and wait for their answers). If that's too complicated, or you don't want to install software you don't know on your computer, use a metasearch engine.

While writing this post, I saw that many people have already written about search engine alternatives. So there is really no reason to complain about Google's monopoly.



Category: tools Tagged: bing DIY Google Lycos Metasearch engine Search tools Web search engine Wikipedia yahoo search