WordPress redirects

Moving from MovableType to WordPress has involved several posts moving to new locations. The Redirection plugin has made adding redirects easy, but only once I figured out the correct regular expressions to use. Here’s what I’ve had to use.

1) Old MT pages had .html on the permalink

Source: /(\d*)/(\d*)/(.*)\.html
Target: /$1/$2/$3/

2) A much earlier MT site had pages in an ‘archive’ directory. Its URLs also used underscores, where the WordPress pages use hyphens, in place of the spaces in the page title. So some archive entries had one underscore, others had four. I didn’t figure out a single elegant regex and instead used five to do the job. Note, I couldn’t use “.*” to match as “.” would match the underscores too. The same applied to the shortcut \w, as that also matches underscores.

No underscores

Source: /archives/(\d*)/(\d*)/([a-zA-Z0-9]*)/
Target: /$1/$2/$3/

One underscore

Source: /archives/(\d*)/(\d*)/([a-zA-Z0-9]*)_([a-zA-Z0-9]*)/
Target: /$1/$2/$3-$4/

Two underscores

Source: /archives/(\d*)/(\d*)/([a-zA-Z0-9]*)_([a-zA-Z0-9]*)_([a-zA-Z0-9]*)/
Target: /$1/$2/$3-$4-$5/

Three underscores

Source: /archives/(\d*)/(\d*)/([a-zA-Z0-9]*)_([a-zA-Z0-9]*)_([a-zA-Z0-9]*)_([a-zA-Z0-9]*)/
Target: /$1/$2/$3-$4-$5-$6/

Four underscores

Source: /archives/(\d*)/(\d*)/([a-zA-Z0-9]*)_([a-zA-Z0-9]*)_([a-zA-Z0-9]*)_([a-zA-Z0-9]*)_([a-zA-Z0-9]*)/
Target: /$1/$2/$3-$4-$5-$6-$7/
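
To check rules like these before pasting them into the plugin, something like the following Python sketch might help. It is not how the Redirection plugin works internally. It rebuilds the five Source/Target pairs and applies them to a path. The function names `build_rules` and `redirect`, the sample paths, and the choice to require a whole-path match are all my own assumptions for illustration.

```python
import re

# Building blocks copied from the rules above.
PREFIX = r"/archives/(\d*)/(\d*)/"
SEGMENT = r"([a-zA-Z0-9]*)"


def build_rules(max_underscores=4):
    """Return (source, target) pairs for 0..max_underscores underscores."""
    rules = []
    for n in range(max_underscores + 1):
        source = PREFIX + "_".join([SEGMENT] * (n + 1)) + "/"
        target = "/$1/$2/" + "-".join(f"${i}" for i in range(3, 4 + n)) + "/"
        rules.append((source, target))
    return rules


def redirect(path, rules):
    """Return the new path for the first rule that matches, else None."""
    for source, target in rules:
        # Assumption: the whole path must match the rule.
        match = re.fullmatch(source, path)
        if match:
            # Fill in $1, $2, ... from the captured groups.
            return re.sub(r"\$(\d+)",
                          lambda m: match.group(int(m.group(1))),
                          target)
    return None


rules = build_rules()
print(redirect("/archives/2005/03/my_old_post/", rules))  # /2005/03/my-old-post/
print(redirect("/archives/2005/03/hello/", rules))        # /2005/03/hello/
```

A path with five or more underscores falls through all five rules and returns None, which is the same limit the hand-written rules have.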

Results are in, WordPress gets as much spam as movabletype

That didn’t take long.  Last night I pointed the domain name to the new WordPress server. This morning I had 50 spam comments held for moderation.  I was impressed the spammers had noticed the blog change so quickly, then I thought the robots probably change their behaviour depending on the comment page. Clever those spammers.

Still, there are plenty of solutions. Before I get into writing some unique code I found a simple plugin. If you want to comment you now have to answer a question. I’ve begun by asking you to type the number 6. You can answer with any of: 6, Six, SIX, six.

If the spammers decide it’s worth adding that answer to their bots then I can change the question.


Technical notes on migrating from MovableType to WordPress

Having just migrated from MovableType to WordPress, here are my notes in case they help anyone else.

Time it’s taken from empty WordPress to viewing complete posts: about 6 hours
Time it would take to do now knowing exactly how to solve problems: about 1 hour
Time you need to allow…. decide for yourself based on your past experience of migrations!  I was happy with 8 hours or less.  Oh, and there are a few broken images I need to find and some hard-coded HTML within some old posts I need to look into.

1) Set up WordPress (d’uh!)

2) Install Plugins

3) From MovableType, delete spam comments from the database

I had 15,000 spam comments in my database.  Removing them cut the time to create a backup from over 15 minutes (at which point I gave up to investigate why it was so slow) to less than 2 minutes.  I did this using MySQL Workbench.  For me it was pretty clear which SQL commands to use, but if you’re finding it hard, post a comment and I’ll work it out again and post. It was something like

"Select * from `mt_myblog`.`mt_comments` WHERE `spam_field` = -1;"

Then, having checked that I was correctly selecting the records that were spam, I ran the same statement with DELETE FROM in place of SELECT * FROM.
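
The check-then-delete pattern can be sketched in Python as below. This is only an illustration: it uses an in-memory SQLite database as a stand-in for MySQL so that it runs anywhere, the `spam_field` column and the value -1 come from my half-remembered query above and may not match your schema, and `purge_spam` is a made-up helper name.

```python
import sqlite3


def purge_spam(conn, spam_value=-1):
    """Count the rows the WHERE clause selects, then delete with the same clause."""
    where = "WHERE spam_field = ?"
    (counted,) = conn.execute(
        f"SELECT COUNT(*) FROM mt_comments {where}", (spam_value,)
    ).fetchone()
    cursor = conn.execute(f"DELETE FROM mt_comments {where}", (spam_value,))
    conn.commit()
    # The two numbers should agree; if not, something changed in between.
    return counted, cursor.rowcount


# Tiny stand-in table with one real comment and two spam comments.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mt_comments (id INTEGER PRIMARY KEY, body TEXT, spam_field INTEGER)"
)
conn.executemany(
    "INSERT INTO mt_comments (body, spam_field) VALUES (?, ?)",
    [("Nice post", 1), ("Buy pills", -1), ("Cheap watches", -1)],
)

counted, deleted = purge_spam(conn)
remaining = conn.execute("SELECT COUNT(*) FROM mt_comments").fetchone()[0]
print(counted, deleted, remaining)  # 2 2 1
```

Reusing one WHERE clause for both statements is the point: whatever the SELECT showed you is exactly what the DELETE removes.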

4) Generate the MovableType full backup.

This is good, it includes images and comments.

5) Import the backup file using the MovableType Backup Importer Plugin.

I had two gotchas. 1) I had a non-ASCII character within my XML file.  There was a solution on the plugin forum – a one-line Perl command that fixed the file by removing the non-ASCII characters. I was too lazy to see what they were. 2) The Gandi Simplehost PHP max execution time is 120 seconds.  The script was taking more than that to import my blog (some 600 entries, a 2 MB XML file and a few hundred images).  It turns out I just had to keep running the import routine until it finished. It recognised the entries it had already imported and skipped them, and after 3 or 4 attempts everything was imported and I got the success message.
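
I didn’t keep the Perl one-liner, so this is not it, but the same idea can be sketched in Python: read the file as raw bytes and drop every byte above 0x7F. The function names and the file paths you would pass in are hypothetical. Be aware this removes accented letters and curly quotes entirely rather than replacing them.

```python
def strip_non_ascii(data: bytes) -> bytes:
    """Return data with every byte above 0x7F removed."""
    return bytes(b for b in data if b < 0x80)


def clean_file(src_path, dst_path):
    """Write an ASCII-only copy of src_path to dst_path; return its size in bytes."""
    with open(src_path, "rb") as src:
        cleaned = strip_non_ascii(src.read())
    with open(dst_path, "wb") as dst:
        dst.write(cleaned)
    return len(cleaned)


# The two UTF-8 bytes that encode the accented e are both dropped.
print(strip_non_ascii("caf\u00e9 <title>".encode("utf-8")))  # b'caf <title>'
```

Writing to a separate output file keeps the original backup untouched in case the cleaned copy turns out to have lost something you wanted.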

6) Set up redirects for the old MT page names to the new WordPress names

My MT permalinks were /YEAR/MONTH/some-name.html.

In WordPress the default is /YEAR/MONTH/DAY/some-name/

So, I changed the WordPress default to /YEAR/MONTH/some-name/ then installed the Redirection plugin.  I created a redirect with a regex match (remember to tick the Regex box too):

Source: /(\d*)/(\d*)/(.*)\.html
Target: /$1/$2/$3/

This regex matches slash – any number of digits – slash – any number of digits – slash – any number of any characters – then a literal “.html”.

I then take those 3 captured groups and rebuild the correct URL. There are probably easier ways of doing this, but it seems to work OK.  I’ve started by using a temporary redirect but once I see that I’m picking up all the right 404 errors and not creating any new ones I’ll make it permanent.
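
For anyone who wants to try the rule outside WordPress first, here is a rough Python sketch of the same match and rebuild. It uses `\d` for digits and `\.` for the literal dot before “html”. The function name `html_redirect` and the sample paths are made up, and requiring the whole path to match is my assumption, not necessarily what the plugin does.

```python
import re

# Year, month, and post name, followed by a literal ".html".
SOURCE = re.compile(r"/(\d*)/(\d*)/(.*)\.html")


def html_redirect(path):
    """Return the new-style path for an old .html permalink, else None."""
    match = SOURCE.fullmatch(path)
    if match is None:
        return None
    year, month, name = match.groups()
    # Equivalent of the Target /$1/$2/$3/
    return f"/{year}/{month}/{name}/"


print(html_redirect("/2011/05/some-name.html"))  # /2011/05/some-name/
print(html_redirect("/2011/05/some-name/"))      # None
```

The second call shows that a URL already in the new format is left alone, so the rule should not create a redirect loop.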