9

I am working on a project that requires me to understand the different techniques used by web search engines.

I have a strong scientific and development background, so I am not afraid of highly technical information.

I am looking for all forms of technical information, including web crawlers and other techniques for acquiring data, methods of storing that data, and ways of querying it.

I am completely new to the subject and I'm looking for useful resources and books on it. Any suggestions are appreciated.

RLH
  • 541
sebpiq
  • 375

3 Answers

3

This area of study is known as Information Retrieval. This Wikipedia article contains a good summary and lots of useful links.
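
To give a feel for the kind of thing the field covers: one of its basic data structures is the inverted index, which maps each term to the documents that contain it. Below is a minimal sketch, not how any particular search engine does it. The toy `docs` corpus and the function names `tokenize`, `build_index` and `search` are made up for illustration, and the query is a plain boolean AND with no ranking.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return sorted ids of documents containing every query term (boolean AND)."""
    terms = tokenize(query)
    if not terms:
        return []
    # .get() avoids creating empty entries for unknown terms.
    postings = [index.get(term, set()) for term in terms]
    return sorted(set.intersection(*postings))


# Hypothetical toy corpus, invented for this example.
docs = {
    1: "Web crawlers fetch pages and follow links",
    2: "An inverted index maps terms to pages",
    3: "Crawlers feed pages into the index",
}
index = build_index(docs)
print(search(index, "pages index"))
```

Real systems add ranking, compression and distribution across machines on top of this idea, which is what the linked material goes into.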

3

First, Google actively participates in the "science" of technology and often shares its knowledge by releasing papers from its R&D department. You can find those papers through the reference link below. I haven't searched for specific papers on search and retrieval algorithms, but there should be more than enough information available on the subject from a very technical perspective, as well as papers on storing massive data sets and querying them efficiently.

Publications by Googlers

Other than Google's resources, I highly recommend that you look into Semantic Web research. The Semantic Web isn't a method of searching data, and the concept can seem a little vague at first. Still, the clear implication of a Semantic Web "engine" is that it would parse the information on the WWW and link related pieces of information to one another.

In short, the Semantic Web is what many forward-thinkers hope the internet will truly become, and are working toward: a web where the information provided is well parsed, interpreted and correctly linked together. I haven't looked into it much myself, so some of my information may be a bit incorrect. However, there are plenty of resources available that discuss the Semantic Web, and many people are hoping for, waiting for, or working on a breakthrough in the field that would become the "next big thing" for the internet.

A good starting point for learning about the Semantic Web is, of course, Wikipedia.

These references may not be books, but they contain a lot of information. Reading and sifting through all of the technical material should keep you busy for a while.

RLH
  • 541
0

Following the advice from @Andy Waite, I read the Wikipedia page on information retrieval and followed its references. There is a lot of information online, and I found this introduction to information retrieval. It is an online book from 2008, so probably still up to date, and it seems to be quite a good intro to the topic.

sebpiq
  • 375