Majestic12 – a distributed search engine

Majestic-12 is a distributed search engine that I came across on Christmas Eve. Perhaps I should rephrase that. It came across me (or rather my web server) on Christmas Eve, with such force that the server overloaded and stopped.

Ho hum. One of the joys of running a web server is that at 5am on any given day you can receive a text message saying “Server Down”. Within the laws of science (Murphy’s Law), there is a rule that the probability of this text message arriving increases whenever I am a) asleep or b) on holiday, and if both a) and b) apply then the chances are quadrupled.

So, what’s a “distributed search engine”? Sort of like Google. Many people may think Google is one or two super big computers. In fact, it’s lots (and lots) of simple standard computers connected together. Rather than wait for one computer to answer your search query, lots of computers look for you, each taking charge of a little piece of the database Google has built up. Likewise, to fill the Google database with information about all the pages on the internet, lots of computers go off and visit all the web sites. In summary, the work is “distributed” between many computers.
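For the technically curious, here is a minimal sketch in Python of that idea of sharing a search between several machines. It is only an illustration under my own assumptions: the three “shards” are plain dictionaries standing in for separate computers, and the names `SHARDS`, `search_shard` and `distributed_search` are made up for this example. It is not how Google or Majestic12 actually do it.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical data: each "shard" stands in for one computer that holds
# a small slice of the overall index (search term -> pages containing it).
SHARDS = [
    {"kitchen": ["example.com/kitchens"], "bathroom": ["example.com/bathrooms"]},
    {"kitchen": ["example.org/fitted-kitchens"]},
    {"bedroom": ["example.net/bedrooms"]},
]


def search_shard(shard, term):
    """Look the term up in one shard's slice of the index."""
    return shard.get(term, [])


def distributed_search(term, shards=SHARDS):
    """Ask every shard at the same time, then merge the partial answers."""
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        partial_results = list(pool.map(lambda s: search_shard(s, term), shards))
    merged = []
    for hits in partial_results:
        merged.extend(hits)
    return sorted(merged)


print(distributed_search("kitchen"))
```

Each shard only has to look through its own small piece, and the final step simply joins the pieces together. A real search engine would also have to rank the merged results, which this sketch does not attempt.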

Now, all those computers belong to Google, but for many years teams and projects have been sharing the unused resources of other people’s computers in order to achieve their goals. So, you could sign up for SETI, and while you are not at your computer it will try to find alien signals in data recorded by radio telescopes. Other projects like Rosetta use your computer’s idle time to help find cures for diseases, and the ClimatePrediction.net project uses your idle computer time to crunch numbers to help forecast the climate more accurately.

Now enter Majestic12, a distributed computing solution to searching the internet. It’s in its infancy at the moment, but it uses people’s idle computers and internet bandwidth to capture information on web pages, and uses that information to respond to people’s search requests.

Now that you know what it is, what happened to my web server?
Well, when software from Google or Majestic12 visits a web site, it is called a “robot”, and it should follow the instructions on my web server in a file called “robots.txt”. This file basically tells the robot where it can and cannot go. Why should robots follow it? Well, if they follow links into the shopping basket they won’t find any useful information. Going there wastes their bandwidth and mine, not to mention costing me money. Majestic12’s robot had a problem that no-one knew about. If the robots.txt had a particular value in it, the robot would ignore the whole robots.txt file. That’s a bug. It ignored my robots.txt and proceeded into the shopping basket, where it promptly got stuck in a loop. While in that loop, it kept my web server very busy trying to answer its requests (to add another item to the basket), and after a short while the server stopped answering requests from anyone.
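To show what following robots.txt looks like in practice, here is a small sketch using Python’s standard `urllib.robotparser` module. The robots.txt contents, the `/basket/` path, the example.com addresses and the robot name “ExampleBot” are all assumptions made up for illustration; they are not my real file, and this is not Majestic12’s code.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: tells every robot to stay out of the basket.
ROBOTS_TXT = """\
User-agent: *
Disallow: /basket/
"""


def may_fetch(user_agent, url, robots_txt=ROBOTS_TXT):
    """Return True if the rules in robots_txt allow user_agent to visit url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# "ExampleBot" is a made-up robot name used only for this sketch.
print(may_fetch("ExampleBot", "http://www.example.com/basket/add?item=1"))  # False
print(may_fetch("ExampleBot", "http://www.example.com/kitchens.html"))      # True
```

A well-behaved robot makes a check like this before every request. A robot that throws the whole file away, as happened here, ends up treating every link as fair game, shopping basket included.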

The simple fix is to restart the server, but I also had a look to find out what had caused the server to stop, and saw the log file entry for Majestic-12. I visited their website, saw a user forum and posted a message in ‘bugs’ to say that the robot had stopped my server working. To be honest, I didn’t expect a reply that day, or even until the new year. But the main person on the project replied within 4 hours. He then traced the problem, created a fix and issued a new version within 2 hours. He apologised for the slow response: he’d been Christmas shopping! 10 out of 10 for effort on the part of “AlexC”, and a most impressive bug fix time for a small project. The search results are a work in progress, but I think as the Majestic-12 project grows it could become a serious contender to the big boys of the search engine game.

The evening of Christmas Day

The evening of Christmas Day. Despite having eaten more today than I normally would in two working days, I’ve still found room for another turkey sandwich. Sharon called from Australia a short while ago. It’s Boxing Day there now. We were going to call her this morning but got confused with the time zones, so decided to leave it. Nicola and James were telling her all they got for Christmas. The corporate machine of “Thomas the Tank Engine” succeeded this year, as over half of James’s presents had Thomas on them in one way or another. There was a Thomas DVD, a large Thomas train set, a book which folds open into a large track that a wind-up Thomas engine follows round in circles, a mat where you can ‘paint’ your own train tracks with water for a Thomas engine to follow, a set of Thomas story books and a Thomas Annual, which he was reading in bed when I said goodnight earlier. Corporate machine? Yes. Happy James? Very much so.
All those branded items got me thinking about how we can improve the Roots brand.
Thinking of work, this will be a busy week for me. Aside from two fitters’ packs I need to put together, I have to migrate from one web server to a new one. What does that mean? Well, this web site, the rootskitchens web site and all the things that I’ve made the web server do over the last few years have to happen on the new web server. I’d like the switch to be seamless, but I know that it can’t be. Downtime should be limited to only a few hours though, if I do it properly. I’ve got about 3 working days (24 hours) allocated to set the new server up before I flick the virtual switch. By this time next week I’ll know if it worked or not.

Telesales

Just had a very funny phone call. A young man named Alex from “Space Designs” rang to say that I’d been selected by their computer based on my postcode. They were looking to have photos taken for a magazine, and I’d be able to have luxury home improvements to my kitchen and bathroom…

At that point I burst out laughing. He’d called my business phone number, and the business name on all listings is “Roots Kitchens Bedrooms Bathrooms”. Once I’d finished laughing, I let him in on the joke.