WordPress redirects

Moving from MovableType to WordPress has meant several posts moving to new locations. The Redirection plugin has made adding redirects easy, but only once I’d figured out the correct regular expressions to use. Here’s what I’ve had to use.

1) Old MT pages had .html on the permalink

Source: /(\d*)/(\d*)/(.*)\.html
Target: /$1/$2/$3/

2) A much earlier MT site had pages in an ‘archives’ directory. Their URLs also used underscores in place of the spaces in the page title, where the WordPress pages use hyphens. So some archive entries had no underscores while others had as many as four. I didn’t figure out a single elegant regex and instead used five to do the job. Note that I couldn’t use “.*” to match, as “.” would match the underscores too. The same applied to the shortcut \w, as that also matches underscores.

No underscores

Source: /archives/(\d*)/(\d*)/([a-zA-Z0-9]*)/
Target: /$1/$2/$3/

One underscore

Source: /archives/(\d*)/(\d*)/([a-zA-Z0-9]*)_([a-zA-Z0-9]*)/
Target: /$1/$2/$3-$4/

Two underscores

Source: /archives/(\d*)/(\d*)/([a-zA-Z0-9]*)_([a-zA-Z0-9]*)_([a-zA-Z0-9]*)/
Target: /$1/$2/$3-$4-$5/

Three underscores

Source: /archives/(\d*)/(\d*)/([a-zA-Z0-9]*)_([a-zA-Z0-9]*)_([a-zA-Z0-9]*)_([a-zA-Z0-9]*)/
Target: /$1/$2/$3-$4-$5-$6/

Four underscores

Source: /archives/(\d*)/(\d*)/([a-zA-Z0-9]*)_([a-zA-Z0-9]*)_([a-zA-Z0-9]*)_([a-zA-Z0-9]*)_([a-zA-Z0-9]*)/
Target: /$1/$2/$3-$4-$5-$6-$7/
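To check that the five rules don’t overlap and produce the right slugs, here is a small Python sketch. It assumes the plugin matches the source pattern against the whole request path, and it uses Python’s re module as a stand-in for PHP’s PCRE (they agree on patterns this simple). The names archive_rule, redirect and redirect_single are made up for illustration.

```python
import re
from typing import List, Optional, Tuple

# One "word" of an old archive slug: letters and digits only. Deliberately not
# "." or \w, since both would also swallow the underscores.
WORD = "([a-zA-Z0-9]*)"


def archive_rule(underscores: int) -> Tuple[str, str]:
    """Build the Redirection-style (source, target) pair for N underscores."""
    source = r"/archives/(\d*)/(\d*)/" + "_".join([WORD] * (underscores + 1)) + "/"
    # Groups 1 and 2 are year and month; the slug words start at group 3.
    words = "-".join(f"${n}" for n in range(3, underscores + 4))
    return source, "/$1/$2/" + words + "/"


# The five rules from the post: zero to four underscores.
ARCHIVE_RULES = [archive_rule(n) for n in range(5)]


def to_python_template(target: str) -> str:
    """Turn PCRE-style $1 backreferences into Python's \\g<1> form."""
    return re.sub(r"\$(\d+)", r"\\g<\1>", target)


def redirect(path: str, rules: List[Tuple[str, str]]) -> Optional[str]:
    """Return the target of the first rule whose source matches the whole path."""
    for source, target in rules:
        match = re.fullmatch(source, path)
        if match:
            return match.expand(to_python_template(target))
    return None


def redirect_single(path: str) -> Optional[str]:
    """Alternative outside the plugin: allow underscores, then swap them in code."""
    match = re.fullmatch(r"/archives/(\d*)/(\d*)/([a-zA-Z0-9_]*)/", path)
    if match is None:
        return None
    year, month, slug = match.groups()
    return f"/{year}/{month}/{slug.replace('_', '-')}/"


print(redirect("/archives/2004/05/hello_big_world/", ARCHIVE_RULES))
# → /2004/05/hello-big-world/
```

redirect_single is a different technique: one pattern that allows underscores, with the hyphen swap done in code, so it copes with any number of underscores. As far as I know the Redirection plugin’s target can only paste captured groups back in, not transform them, which is why one rule per underscore count is needed there.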

Results are in: WordPress gets as much spam as MovableType

That didn’t take long. Last night I pointed the domain name at the new WordPress server. This morning I had 50 spam comments held for moderation. I was impressed the spammers had noticed the change so quickly, then I realised the robots probably just adapt their behaviour to whichever comment form they find. Clever, those spammers.

Still, there are plenty of solutions. Before getting into writing any custom code, I found a simple plugin. If you want to comment, you now have to answer a question. I’ve begun by asking you to type the number 6. You can answer with any of: 6, Six, SIX, six.

If the spammers decide it’s worth adding that answer to their bots then I can change the question.
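The plugin’s own code isn’t shown here, but a check that accepts those answers could look something like this hypothetical Python sketch: trim the reply, fold its case, and compare it against a small set of accepted answers.

```python
# Hypothetical sketch of a comment-question check; the real plugin may differ.
ACCEPTED_ANSWERS = {"6", "six"}


def answer_is_correct(answer: str) -> bool:
    """Accept "6" or "six" in any capitalisation, ignoring surrounding spaces."""
    return answer.strip().casefold() in ACCEPTED_ANSWERS


for attempt in ["6", "Six", "SIX", "six", "seven"]:
    print(attempt, answer_is_correct(attempt))
```

Changing the question once the bots catch up is then just a matter of editing the accepted answers.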


Technical notes on migrating from MovableType to WordPress

Having just migrated from MovableType to WordPress, here are my notes in case they help anyone else.

Time it’s taken from an empty WordPress install to viewing complete posts: about 6 hours
Time it would take now, knowing exactly how to solve the problems: about 1 hour
Time you need to allow: decide for yourself based on your past experience of migrations! I was happy with 8 hours or less. Oh, and there are a few broken images I need to find and some hard-coded HTML within some old posts I need to look into.

1) Set up WordPress (duh!)

2) Install Plugins

3) From MovableType, delete spam comments from the database

I had 15,000 spam comments in my database. Before removing them, creating a backup took over 15 minutes (at which point I gave up and went to investigate why it was so slow); afterwards it took less than 2 minutes. I did this using MySQL Workbench. The SQL commands were pretty clear to me, but if you’re finding it hard, post a comment and I’ll work it out again and post it. It was something like:

SELECT * FROM `mt_myblog`.`mt_comments` WHERE `spam_field` = -1;

Then, having checked I was selecting exactly the spam records, I ran the same query with DELETE in place of SELECT *.
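Here is a sketch of that check-then-delete routine in Python, using an in-memory SQLite database as a stand-in for MySQL. The table and column (mt_comments, spam_field = -1) are the approximate names from the query above, not necessarily MovableType’s real schema, so check your own tables first. The point of the design is that the preview and the delete share a single WHERE condition, so the delete removes exactly what you inspected.

```python
import sqlite3

# Hypothetical spam condition, copied from the approximate query above.
SPAM_CONDITION = "spam_field = -1"


def preview_spam(conn: sqlite3.Connection, limit: int = 10) -> list:
    """Show a sample of the rows the delete would remove."""
    sql = (f"SELECT comment_id, comment_text FROM mt_comments "
           f"WHERE {SPAM_CONDITION} ORDER BY comment_id LIMIT ?")
    return conn.execute(sql, (limit,)).fetchall()


def delete_spam(conn: sqlite3.Connection) -> int:
    """Delete every row matching the same condition; return how many went."""
    with conn:  # commits on success, rolls back if the statement fails
        cursor = conn.execute(f"DELETE FROM mt_comments WHERE {SPAM_CONDITION}")
    return cursor.rowcount


# Tiny demo table standing in for the real MT database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mt_comments "
             "(comment_id INTEGER PRIMARY KEY, comment_text TEXT, spam_field INTEGER)")
conn.executemany(
    "INSERT INTO mt_comments (comment_text, spam_field) VALUES (?, ?)",
    [("Great post!", 1), ("cheap pills", -1), ("online casino", -1)],
)
print(preview_spam(conn))  # [(2, 'cheap pills'), (3, 'online casino')]
print(delete_spam(conn))   # 2
```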

4) Generate the MovableType full backup.

This is good: it includes images and comments.

5) Import the backup file using the MovableType Backup Importer Plugin.

I had two gotchas. 1) I had a non-ASCII character within my XML file. There was a solution on the plugin forum: a one-line Perl command that fixed the file by removing the non-ASCII characters. I was too lazy to see what they were. 2) The Gandi Simplehost PHP max execution time is 120 seconds. The script was taking longer than that to import my blog (some 600 entries, a 2 MB XML file and a few hundred images). It turns out I just had to keep running the import routine until it finished. It recognised the entries it had already imported and skipped them, and after 3 or 4 attempts everything was imported and I got the success message.
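The forum’s Perl one-liner isn’t reproduced here, but as a rough Python equivalent of what it is described as doing, this sketch copies the backup file with every non-ASCII byte deleted. The file names are hypothetical. Be aware this is lossy: an accented letter such as é (two bytes in UTF-8) disappears entirely rather than being converted.

```python
import tempfile
from pathlib import Path

# Every byte value outside 7-bit ASCII.
NON_ASCII = bytes(range(0x80, 0x100))


def strip_non_ascii(src: Path, dst: Path) -> int:
    """Copy src to dst with all non-ASCII bytes removed; return how many were dropped."""
    data = src.read_bytes()
    cleaned = data.translate(None, NON_ASCII)  # delete, don't map, those bytes
    dst.write_bytes(cleaned)
    return len(data) - len(cleaned)


with tempfile.TemporaryDirectory() as tmp:
    original = Path(tmp) / "backup.xml"       # hypothetical file names
    fixed = Path(tmp) / "backup-ascii.xml"
    original.write_bytes("<title>Café</title>".encode("utf-8"))
    print(strip_non_ascii(original, fixed))  # 2
    print(fixed.read_text())                 # <title>Caf</title>
```

Because plain ASCII is also valid UTF-8, the cleaned file should still agree with a UTF-8 XML declaration.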

6) Set up redirects from the old MT page names to the new WordPress names

My MT permalinks were /YEAR/MONTH/some-name.html.

In WordPress the default is /YEAR/MONTH/DAY/some-name/

So, I changed the WordPress default to /YEAR/MONTH/some-name/ and then installed the Redirection plugin. I created a redirect with a regex match (remember to tick the Regex box too):

Source: /(\d*)/(\d*)/(.*)\.html
Target: /$1/$2/$3/

This regex matches a slash, any number of digits, a slash, any number of digits, a slash, and then any run of characters up to a literal “.html”.

I then take those three captured groups and rebuild the correct URL. There are probably easier ways of doing this, but it seems to work OK. I’ve started with a temporary redirect, but once I see that I’m catching all the right 404 errors and not creating any new ones, I’ll make it permanent.
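Here is the same rule as a Python sketch, for testing old URLs before trusting the redirect. It uses Python’s re module as a stand-in for the plugin’s PHP regex engine and assumes the pattern must match the whole path; wordpress_path is a made-up name.

```python
import re
from typing import Optional

# The MT permalink rule: /YEAR/MONTH/some-name.html, with the dot escaped so it
# only matches a literal "." before "html".
MT_PERMALINK = re.compile(r"/(\d*)/(\d*)/(.*)\.html")


def wordpress_path(old_path: str) -> Optional[str]:
    """Map an old MovableType path to the new /YEAR/MONTH/some-name/ form."""
    match = MT_PERMALINK.fullmatch(old_path)
    if match is None:
        return None  # not an old MT permalink, so leave it alone
    return match.expand(r"/\1/\2/\3/")  # Python's spelling of /$1/$2/$3/


print(wordpress_path("/2009/03/some-name.html"))  # /2009/03/some-name/
print(wordpress_path("/2009/03/some-name/"))      # None
```

Note that \d* also accepts an empty segment; if the old archive only ever used four-digit years and two-digit months, \d{4} and \d{2} would be a stricter choice.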

Goodbye MovableType, hello WordPress

Finally I’ve got around to updating my blog from MovableType to WordPress.

How the old blog looked

If you’re looking at this today (and maybe for another month... or year) you’ll see the template is still the default WordPress template. Eventually I’ll update it, but for now at least the content from all the old posts is still visible.

The move has come about for a number of reasons:

1) Spam. There was too much comment spam. Most didn’t get past the spam filter, but every attempted post was a load on the server. At times the server was overloaded by spammers trying to post. The anti-spam features stopped almost all of them, but it was annoying me. I say this knowing there are probably even more spammers trying to break WordPress…

2) Server migration. The old server is being retired. My web host gandi.net has introduced a new service where, instead of having a fully managed ‘virtual private server’ (VPS), they run a ‘platform’ they call ‘simplehost’. Simplehost has a number of advantages:

  • It has a built in web cache for performance
  • They keep all the software up to date (PHP, MySQL, etc.)
  • It costs less than their VPS (less than half for roughly equivalent performance)
  • It scales more easily than a VPS (in the unlikely event one of my posts becomes popular, I can up the power for a short time to handle it and lower it once the world moves on)

It has a number of disadvantages:

  • I don’t get full control: limited control of PHP settings, limited Cron options, no Perl, no Ruby, no Nginx. It’s just plain LAMP (Linux, Apache, MySQL, PHP).
  • Each vhost can only have one domain pointed to it. That said, I’ve got the WordPress Network working by using symbolic links, and this might actually make my spread of domains and subdomains somewhat neater (I have lots of vanity domains pointed here: steveroot.co.uk and .com, sroot.eu).
  • I only get one login. I used to host a few friends and gave them SSH access to upload their files; I can’t do that any more because they could access all the sites and accidentally break something. That only applies to those who need to upload things: where they log in through a web interface, as with WordPress, I can still host them, or they can send me the files they want uploaded, like my sister in Australia does.

3) If I put my blog on the Simplehost service, I no longer have Perl, which is what MovableType runs on. WordPress needs only PHP, so that’s another motivator to switch.

4) A lot of friends are using WordPress for different things. I’ve had to use it for a couple of community/charity projects I’ve been involved with, so I thought I might as well learn how to use it fully. I actually had it running on the old server too, but just as an experiment a year or so ago.

Which is best? Honestly, I haven’t found much difference between them so far. Although any code alterations I want to make should be easier in WordPress (I’ve coded PHP in the past but never Perl), I’ve always found an open-source plugin that did the job, and that’s been good enough for me. After all, I’m just typing rubbish for my own benefit anyway 🙂

Next post – some notes on the migration method.

Mail merge onto a PDF background

Some days, I love computers.

Days like today, when we wanted to send a mailshot to 900 past customers. I got the address list from our MS Access database and a PDF of the artwork. What I wanted to do was create a mail merge using the PDF as a background.

LibreOffice/OpenOffice didn’t seem to have a neat way of adding a background, so I googled the problem and instantly found the solution on someone else’s blog.

When I next read this, with my luck the page will be gone, so here’s the important bit just in case.

So, solution: pdftk + background
I had been reading the pdftk manual like the bishop reads the bible. I don’t know what that means, but I think it is bad. Anyway I only saw what I wanted to see, and that was not background.

The merging I’ve been looking for was right there. And so the years I spent on this problem were finished! Here’s the procedure:

Use OpenOffice to mail merge all the names just like blank pages.
Export as PDF.
So you’ll get one big document: the names on otherwise blank pages.

Open a terminal and add the boat as a background to your 30,000-page PDF:
pdftk names.pdf background boat_background.pdf output out.pdf

And there you go. The sweet deal about this is that the background is only saved once and referenced on all the other pages. Nice, just like I wanted.
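If the merge has to be repeated for every mailshot, it can be scripted. Here is a minimal Python sketch that wraps the same pdftk command, assuming pdftk is installed and on the PATH; background_command and add_background are made-up names. Passing the arguments as a list rather than one shell string avoids quoting problems with file names that contain spaces.

```python
import shutil
import subprocess
from typing import List


def background_command(names_pdf: str, background_pdf: str, output_pdf: str) -> List[str]:
    """Build the pdftk call from above as an argument list."""
    return ["pdftk", names_pdf, "background", background_pdf, "output", output_pdf]


def add_background(names_pdf: str, background_pdf: str, output_pdf: str) -> None:
    """Run pdftk, failing loudly if it is missing or exits with an error."""
    if shutil.which("pdftk") is None:
        raise FileNotFoundError("pdftk not found on PATH")
    subprocess.run(background_command(names_pdf, background_pdf, output_pdf), check=True)


# Same call as the shell command above, with the file names from the quoted post:
# add_background("names.pdf", "boat_background.pdf", "out.pdf")
print(" ".join(background_command("names.pdf", "boat_background.pdf", "out.pdf")))
```

pdftk’s background operation uses only the first page of the background PDF on every page; for artwork that differs page by page, pdftk also has a multibackground operation.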

I’m sure there are other ways to solve this, but this was quick and easy (more so because pdftk is open source and ready to run on Mac OS X, Linux and Windows).