Rails Sunspot/Solr: returning results for single-character searches

I spent 8 hours trying to get this to work. I googled and found lots of people asking the question but no one with a solution, or rather the given solutions weren’t working for me. Here is what finally worked:

1) rake sunspot:solr:stop

2) Edit the solr/conf/schema.xml file and add the NGramFilterFactory line to the “text” field type:

<fieldType name="text" class="solr.TextField" omitNorms="false">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.PorterStemFilterFactory"/>
    <filter class="solr.NGramFilterFactory" minGramSize="1" maxGramSize="15"/> <!-- This is the new line -->
  </analyzer>
</fieldType>

3) rake sunspot:solr:start

4) rake sunspot:reindex

I think my original problem was that I changed the schema but didn’t restart the Solr server. Restarting the Rails server isn’t enough.
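To see why the new line makes single-character searches match, here is a rough sketch of what an n-gram filter does to one word. It is only an illustration in plain shell and awk, not Solr’s code: ngrams is a made-up helper, the order Solr emits the pieces in may differ, and the real filter chain lowercases and stems the word first.

```shell
#!/bin/sh
# Illustration only: print every substring of a word whose length is between
# a minimum and a maximum gram size, the way an n-gram filter splits a token.
# "ngrams" is a hypothetical helper, not part of Solr or Sunspot.
ngrams() {
  awk -v w="$1" -v min="$2" -v max="$3" 'BEGIN {
    for (n = min; n <= max; n++)              # each gram size in turn
      for (i = 1; i + n - 1 <= length(w); i++)  # each start position that fits
        print substr(w, i, n)
  }'
}

# Same sizes as minGramSize="1" maxGramSize="15" in the schema.
ngrams "solr" 1 15
```

For “solr” that prints ten pieces, from the single letters up to the whole word, which is why a one-letter search now finds something. The trade-off is a bigger index, since every word is stored many times over.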

Backup! (Bye bye TimeMachine, hello Carbon Copy Cloner)

Executive summary:
Stop reading and back up your computer. Done that? Let’s continue. [1]

Second Executive summary:
TimeMachine on my Mac failed, so I now use Carbon Copy Cloner.
Yes, that cost me £26. Yes, it’s basically rsync (free open source software I already use), but it’s easier to configure and has other useful features that I decided are worth paying for.

Part 1: What happened to TimeMachine.
Part 2: How I solved it
Part 3: Other backup options that may be better for you
Part 4: Backup of yesteryear

Part 1: What happened to TimeMachine.
One of the things I liked about my Mac was ‘TimeMachine’. I liked it because the old ReadyNas NV+ I use as an offsite [2] backup destination supported TimeMachine. Backups were easy to set up. Backups worked. When I bought a new hard disk for my Mac, I re-installed everything from the backup over the network. Life was good.

About two months ago, my TimeMachine backups stopped. It reported a problem and said that it needed to start the backups from scratch. Fine, it told me, so that’s not a problem. Except it just wouldn’t complete a new full backup. It would get through most of the backup, and then the volume (both quantity and size) of files left to back up would grow as fast as it backed them up. I could see plenty of data being passed across the network link to the backup destination – more data traffic than there was data on my disk, in fact – so it was doing something but never finishing.

  • I tried deleting the backup and starting again, no luck.
  • I found a dashboard widget that would provide information about TimeMachine backups, but it didn’t tell me anything useful as to why my backups wouldn’t complete.
  • I tried looking for error messages in the log files, no luck.
  • I bought an external hard drive (thinking maybe the ReadyNas or network link was the problem), no luck.
  • I resorted to an emergency backup method – I used rsync to mirror my user directory to the office network.

The more I researched TimeMachine as a backup tool, the more disappointed I felt by its lack of capabilities, especially regarding feedback as to whether the backup was working.

Part 2: How I solved it
So, it was time to find a solution.
Option A) rsync
rsync is an old command line tool that can make two directories on different computers look the same. It is very efficient, only transferring files that have changed. It can be set to only send files in one direction (so deleted files remain on the backup, which is useful for recovering a file accidentally deleted). I already use it as one part of my backup routine for my Linux servers and one of our Windows desktop clients (and used to use it to back up my Windows laptop).

I like rsync, but I hadn’t quite worked out the best settings for the Mac.
My quick get-out-of-jail backup command was
rsync -avz --exclude 'steve/Documents/Parallels/' --exclude '*/Library/*' /Users/ rsync://backup5.rkbb.co.uk/mac_rsync
I’ve been running that by hand every couple of days. Breaking it down:
-avz in the command means ‘archive’ the files (keep permissions, times and so on), be ‘verbose’ about what is being transferred, and ‘zip compress’ the transfer.
--exclude marks the things I don’t want to sync: my Parallels virtual machines and the Library. There are things in the Library that may be worth keeping (I think “Notes” may save there, and “Mail” does, but I have all that on my Gmail).
/Users/ is the directory I want to mirror from my machine, and
rsync://backup5.rkbb.co.uk/mac_rsync is the rsync daemon on the backup server, with mac_rsync naming the module (directory) to sync into.
This isn’t ideal. It doesn’t back up the applications, and it backs up unnecessary files like .DS_Store, which the Mac will create when it needs them.

Whilst looking for examples of rsync backup scripts for Mac, I discovered Carbon Copy Cloner.

Option B) Carbon Copy Cloner
Carbon Copy Cloner is an application that can back up your Mac. I gather it uses rsync behind the scenes, but it has some neat wizardry that makes it very simple to use.
1) It can make a bootable clone, which means if you need to recover your whole Mac, it’s ready to go. I think that means you can even plug your clone into another Mac and run from there. I could see that being very useful if I was travelling to friends and wanted to take my Mac software with me, but use their Mac so I could travel light.
2) It can be easily set to run automatically. At 09:40 for me, it starts and backs up. If the backup disk isn’t available (like when I’m not in the office), it will tell me and let me choose to delay or try again (in case I just need to plug the disk in).
3) It will work on any disk. If you have an HFS+ formatted disk (Apple) it can do more, such as making a bootable clone, but if you’ve just got a simple shared directory on a file server it will happily back up your files to that.
4) It means I didn’t have to work out exactly what was worth backing up and what I could ignore.

It does cost £26, but for a one-off cost and peace of mind I’m happy to pay that. Just think, if I’d known about Carbon Copy Cloner sooner I probably wouldn’t have spent £269 on a big external hard disk.

If your TimeMachine backups are working for you, great, keep them going.
If they’re not, then I suggest you try the free trial of Carbon Copy Cloner.

Part 3: Other backup options that may be better for you
I didn’t use it for my Mac, but for the other family PCs I use ‘CrashPlan’. It’s a web-based, off site backup service. I think it’s a great solution for many people, but not for me and my Mac. I have too much on the disk and it would take too long to back up over the internet (lots of files change frequently, more so now we’re playing with video) and, more importantly, I HAVE OFF SITE BACKUP ALREADY! If you don’t have off site backup, you should seriously consider CrashPlan or one of the many other online backup services.

To put that in context, I put all the people I care about on my family CrashPlan account. My daughter’s laptop uses CrashPlan, my parents’ laptop uses CrashPlan, my sister in Australia uses CrashPlan.

It has one other neat cost-saving feature. If you have a friend who always has their computer on, and you always have your computer on, you can back up to each other over the internet for free. You can’t see each other’s files as they are encrypted, but at least you don’t have to pay the monthly fee.

Part 4: Backup of yesteryear

Thinking of backup reminds me of being 16 years old with our first home computer. A Pentium 486 from Gateway 2000, if I remember correctly. I used to do backups on a Travan TR-1 tape drive. I also remember trying to improve the PC performance by tweaking software settings, breaking it, then spending all night and into the early hours restoring from the tape backup. Here’s hoping I have no need for any more late night/early hour data recovery missions.

[1] I hope you have a current backup if you have anything on your computer you’d like to keep.
[2] A backup must be off site to be a backup. What if your house burnt down? What if your computer was stolen along with the backup disk sitting next to it? Off site for me is backing up home to work.

IBM need help with statistics

Look at this infographic

The way I read this, before and after were the same.

So, “Stockholm convinced the skeptics to pay for a faster commute.”?

I’m assuming that means pay for an IBM traffic management system, as I’m not exactly sure what was being paid for. I don’t agree that the skeptics were convinced, though, from reading their infographic. The two key bits of data to prove their point are: “Before: Over 50% say no. After: Over 50% say yes”.

Now, I’m not a statistical genius, but I don’t imagine they’re doing too much rounding on these numbers. If 60% said no, I’m sure they’d have told us. If 55% said no, I’m sure they’d have told us. We don’t know how accurate their data is either: did they ask everyone or try for a representative survey? They may be out by 10% anyway (let’s hope not). To keep it clearly written, let’s replace the word “Over” with “About”. Does that sound reasonable? Well, it does to me, so let’s do it.

Now let’s put the ‘after’ into the same context as the ‘before’. “Over 50% say yes” is the equivalent of “Under 50% say no”, or, replacing Over/Under with About, the equivalent of “About 50% say no”. Let’s rewrite their phrase:

Before: About 50% say no.  After: About 50% say no.

At least, that’s how I read it, but maybe it’s me that needs the help with statistics.