Migrating from phpBB to Google Groups

For many years I’ve run a tiny web site for the village we live and work in. Eight years ago (or maybe more) I added a forum to the site using phpBB or, as they describe themselves, ‘THE #1 FREE, OPEN SOURCE BULLETIN BOARD SOFTWARE’.

It’s been very good software, regularly updated and very easy to maintain. However, the most interaction I have with the forum now is blocking spam registrations and migrating it to new servers every couple of years. There are only a couple of posts a year now, so I wanted to find a way of reducing my administration workload.

I decided to migrate it to a Google Groups group, which is just like a forum with fewer customisation options. I couldn’t find any guides to migrating away from phpBB, so I worked out my own method. Here’s how I did it, in case you’re trying to do the same.

Steps in short form:
1) Get data from phpBB database tables as CSV file
2) Write script to process CSV file into multiple emails to the group

1) Get data from phpBB database tables as CSV file
I only needed to migrate each topic and all its replies. None of the other database content was important to me.
To do this, I wrote a SQL query:

SELECT po.post_subject, po.post_text, po.post_id, po.topic_id, po.post_time, us.username_clean, top.topic_title, top.topic_time
FROM phpbb_users as us, phpbb_posts as po, phpbb_topics as top
WHERE us.user_id = po.poster_id and po.topic_id = top.topic_id
ORDER BY po.topic_id ASC, post_time ASC

Essentially, this takes selected columns from the tables ‘phpbb_users’, ‘phpbb_posts’ and ‘phpbb_topics’. I’m not sure that joining the tables with ‘WHERE’ is very efficient, and perhaps ‘INNER JOIN’ or ‘OUTER JOIN’ would be technically better, but mine was a small database and this was more than fast enough for me (58ms for 114 rows).
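For comparison, here is a sketch of the same export written with explicit joins, held in a Ruby string so it can be pasted into whatever runs the query. The constant name EXPORT_QUERY is made up for illustration; the table and column names are the ones from the query above. As far as I know MySQL treats both forms as the same inner join, so the difference is mostly readability.

```ruby
# The same export written with explicit INNER JOINs.
# EXPORT_QUERY is a hypothetical name; tables and columns come from the query above.
EXPORT_QUERY = <<~SQL
  SELECT po.post_subject, po.post_text, po.post_id, po.topic_id, po.post_time,
         us.username_clean, top.topic_title, top.topic_time
  FROM phpbb_posts AS po
  INNER JOIN phpbb_users AS us ON us.user_id = po.poster_id
  INNER JOIN phpbb_topics AS top ON top.topic_id = po.topic_id
  ORDER BY po.topic_id ASC, po.post_time ASC
SQL

puts EXPORT_QUERY
```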

Then I saved the result as a CSV file and opened it in LibreOffice to check. Several of the fields needed some hand editing: removing the first line (headers), replacing some HTML characters, escaping speech marks, etc. I may have been able to fix those when saving the result of the query as CSV, but I didn’t have many to do, so fixing by hand and moving on was fastest.
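If there are too many rows to fix by hand, part of that clean-up could be scripted. This is a minimal sketch, assuming the export still has its header line and that the "html characters" are ordinary entities such as &amp;amp; and &amp;quot;. The method name clean_rows and the sample data are made up for illustration.

```ruby
require 'csv'
require 'cgi'

# Returns the data rows of a phpBB CSV export with the header line dropped
# and HTML entities (&amp;, &quot;, &lt; ...) decoded in every text field.
def clean_rows(csv_text)
  CSV.parse(csv_text, headers: true).map do |row|
    row.fields.map { |field| field.nil? ? nil : CGI.unescapeHTML(field) }
  end
end

# Hypothetical sample in the column order of the query above.
sample = <<~SAMPLE
  post_subject,post_text,post_id,topic_id,post_time,username_clean,topic_title,topic_time
  "Re: Fete","Tea &amp; cake at the &quot;hall&quot;",12,3,1262304000,alice,Fete,1262300000
SAMPLE

clean_rows(sample).each { |fields| puts fields.inspect }
```

The cleaned rows are plain arrays in the same column order, so they could be written back out with CSV.open or used directly in place of CSV.foreach.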

2) Write script to process CSV file into multiple emails to the group

My script language of choice is ruby. Not because it’s any better than anything else, just what I happen to be using lately. I could have done the same in PHP if I spent a little more time on it.

This is the script:

# I saved file as: process.rb
# to run, "ruby process.rb" ... assuming you have ruby installed ;-)
# I had to install Pony from github, which i did using the specific install gem
# gem install specific_install
# gem specific_install -l https://github.com/benprew/pony
# If you're reading this later and forget where it came from,
# https://www.steveroot.co.uk/2015/11/migrating-from-phpbb-to-google-groups/
# Share any tips and fixes in the comments there to help others please!

require 'csv'
require 'date'
# the gem's file is lib/pony.rb, so use lower case (it matters on case-sensitive file systems)
require 'pony'

#initialise the topic counters
#some default text for the first email
#you will need to delete this manually in the google groups!
currenttopic = 0
lasttopic = 0
body = "initialise"
subject = "initialise"

CSV.foreach('phpbb_data.csv') do |row|

  #get current topic
  currenttopic = row[3]

  if currenttopic == lasttopic
    #This is a reply to the topic, add to the existing body
    body = body+""+"\n"
    body = body+"-----------------------------------------------------"+"\n"
    body = body+"reply_by_username: "+row[5]+"\n"
    #row[4] is post_time, the date of this reply (row[7] is the date of the topic)
    body = body+"reply_date: "+DateTime.strptime(row[4],'%s').strftime("%d/%^b/%Y")+"\n"
    body = body+""+"\n"
    body = body+row[1]+"\n"
  else
    #This is a new topic. SEND the last group of messages
    Pony.mail({
      :to => 'YOUR-FORUM-NAME@googlegroups.com',
      :subject => subject,
      :via => :smtp,
      :body => body,
      :via_options => {
        :address => 'smtp.gmail.com',
        :port => '587',
        :enable_starttls_auto => true,
        :user_name => 'YOUR-EMAIL-ADDRESS',
        :password => 'YOUR-PASSWORD',
        :authentication => :plain, # :plain, :login, :cram_md5, no auth by default
        :domain => "YOUR-SENDING-DOMAIN" # the HELO domain provided by the client to the server
      }
    })

    #A message to terminal on every send, nice to know that something is happening!
    puts "Sent "+subject

    #Reset the body (subject is set only once, no need to clear)
    body = ""
    #Set subject, create standard header text and set subject for email.

    #Set the subject as the topic name
    subject = row[6]

    #Put some generic header text in place
    body = body+"-----------------------------------------------------"+"\n"
    body = body+"This post was transfered to the google group when the phpbb based forum was shutdown"+"\n"
    body = body+"You might find relevant information at YOUR-DOMAIN"+"\n"
    body = body+"This entry includes all replies to the original topic"+"\n"
    body = body+"-----------------------------------------------------"+"\n"
    body = body+""+"\n"

    body = body+"Topic: "+row[6]+"\n"

    body = body+"created_by_username: "+row[5]+"\n"
    body = body+"topic_date: "+DateTime.strptime(row[7],'%s').strftime("%d/%^b/%Y")+"\n"
    body = body+""+"\n"
    body = body+row[1]+"\n"
  end

  #set the value of last topic ready for the next loop.
  lasttopic = currenttopic
end

# NOTE: when the loop ends, the last topic in the CSV is still held in 'body' and has
# not been sent. Repeating the Pony.mail call here would send it.


# These are the fields in order in the CSV. Here for easy reference whilst I coded
# numbers start from zero (so post_subject = row[0])
# "post_subject", "post_text", "post_id", "topic_id", "post_time", "username_clean", "topic_title", "topic_time"

Being very lazy, I didn’t write the code to understand the first pass should *NOT* be emailed to the group, so the first email to the group titled ‘initialise’ will need to be deleted manually.
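For anyone less lazy, here is a minimal sketch of how the grouping could be restructured so that no ‘initialise’ email is produced. It also covers the last topic in the file, which, as far as I can tell, the send-on-topic-change approach never sends. The method name build_messages and the sample rows are made up for illustration, and nothing is emailed here: each subject and body pair would still need passing to Pony.mail as in the script.

```ruby
# Groups rows (in the CSV column order listed above) by topic_id (row[3]) and
# returns one [subject, body] pair per topic. Nothing is sent here.
def build_messages(rows)
  rows.group_by { |row| row[3] }.map do |_topic_id, topic_rows|
    first = topic_rows.first
    body = "Topic: #{first[6]}\ncreated_by_username: #{first[5]}\n\n#{first[1]}\n"
    topic_rows.drop(1).each do |reply|
      body += "\n-----\nreply_by_username: #{reply[5]}\n\n#{reply[1]}\n"
    end
    [first[6], body]
  end
end

# Hypothetical sample rows: two posts in topic 1, one post in topic 2.
rows = [
  ["", "First post", "1", "1", "1262304000", "alice", "Village fete", "1262304000"],
  ["Re: Village fete", "A reply", "2", "1", "1262390400", "bob", "Village fete", "1262304000"],
  ["", "Has anyone seen him?", "3", "2", "1262476800", "carol", "Lost cat", "1262476800"]
]

build_messages(rows).each { |subject, _body| puts "Would send: #{subject}" }
```

Building every message before sending anything also makes a dry run easy: print the pairs, read them through, and only then wire in the email call.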

You will need to enter your own values for: Forum name, your email address, your sending domain. You’ll need a password, but be aware that if you use 2 factor authentication you’ll need to get an app specific password from your apps account.

You will want to customise the text that is added to every email, perhaps correct the spelling of ‘transfered’ too 😉

The script isn’t particularly fast as it connects and sends each email individually. We use Google Apps and, as there weren’t many topics to send, it was well within my daily Gmail sending limit. If there had been more, I could have sent them directly via SMTP instead. There are instructions for using different email methods on the ‘Pony’ GitHub pages.

The other problem I had was errors in the CSV causing the script to stop. For example, some replies had no topic name, and the script raised an error when it reached them. I fixed the CSV, deleted the posts already made to the group, and ran the whole script again. You might like to set up a dummy group to send your messages to first to make sure everything works, then delete the dummy group and re-run the script to send messages to the new group.
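Blank fields like the missing topic names can be found before any email goes out. This is a minimal sketch, assuming the CSV has no header line and uses the column order from the query. The names REQUIRED_COLUMNS and problem_rows, and the sample data, are made up for illustration.

```ruby
require 'csv'

# Every column except post_subject (0) and post_id (2), which the script never uses.
REQUIRED_COLUMNS = [1, 3, 4, 5, 6, 7].freeze

# Returns [line_number, blank_column_indexes] for every row with a blank required field.
def problem_rows(rows)
  rows.each_with_index.map do |row, index|
    missing = REQUIRED_COLUMNS.select { |col| row[col].nil? || row[col].strip.empty? }
    missing.empty? ? nil : [index + 1, missing]
  end.compact
end

# Hypothetical sample: the second row has no topic_title (column 6).
sample = <<~SAMPLE
  "Re: Fete","See you there",12,3,1262304000,alice,Fete,1262300000
  "Re: Fete","Me too",13,3,1262390400,bob,,1262300000
SAMPLE

problem_rows(CSV.parse(sample)).each do |line, cols|
  puts "line #{line}: blank columns #{cols.inspect}"
end
```

With a real export, CSV.read('phpbb_data.csv') would take the place of CSV.parse(sample).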

To test the email messages, I suggest you take a few rows of your CSV file and send them to your own email to check formatting and content.

If you’re wondering what my results looked like, here’s one of the topics with a reply once posted to the google group

Birthday Calculator – in case you don't want to wait a whole year to celebrate being alive

We have a tradition where I live. We celebrate being alive with a party and that party generally coincides with being alive for another 31,557,600 seconds. 31,557,600 seconds happens to be just about equal to a solar year, which is a happy coincidence as it’s not so easy to remember otherwise.

I decided I could really do with a good excuse to party before that arbitrary unit of time though.  The solution? Write a web application where I can put in my date of birth and it will tell me other dates that I can celebrate on.

Try it for yourself at http://birthday.sroot.eu and it will tell you amazing things like:

  • How old you would be if you were born on Mercury, Venus, Mars and the other planets in our solar system
  • When your next MegaSecond birthday is (so you can have a party when you survive another 1 million seconds of existence)
  • Or for a really big bash, celebrate the very infrequent in our lifetime GigaSecond birthdays.
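The MegaSecond and GigaSecond dates are simple arithmetic on seconds since birth. This is a minimal sketch of that calculation, not the code behind the site. The method name next_second_birthday and the dates are made up for illustration.

```ruby
# Returns the Time of the next birthday that is a whole multiple of
# unit_seconds after born (1_000_000 for MegaSeconds, 1_000_000_000 for GigaSeconds).
def next_second_birthday(born, now, unit_seconds)
  elapsed = (now - born).to_i          # whole seconds alive so far
  completed = elapsed / unit_seconds   # units already celebrated (integer division)
  born + (completed + 1) * unit_seconds
end

born = Time.utc(1980, 1, 1)   # hypothetical date of birth
now  = Time.utc(2015, 11, 1)  # hypothetical "today"

puts "Next MegaSecond birthday: #{next_second_birthday(born, now, 1_000_000)}"
puts "Next GigaSecond birthday: #{next_second_birthday(born, now, 1_000_000_000)}"
```

The 31,557,600 second year mentioned above is 365.25 days of 86,400 seconds, so the same method with that unit gives an approximate ordinary birthday.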

If you’d like me to add another arbitrary repeating unit of time, post a comment.

Virtual PDF Printer for our small office network – a step by step how to

Alternative title: How I got multiple cups-pdf printers on the same server. (I didn’t, but postprocessing let me work around the problem).


I have a small business. For years we’ve been creating PDFs from any computer on our network through a “virtual appliance” called YAFPC (“Yet Another Free PDF Composer”).

The appliance originally ran on an old PC, then on a server that ran several other virtual machines. It had a neat web interface and would allow PDF printers to be created that would appear on the network for all of our users to use. It had one printer for plain A4 paper, one for A4 paper with a letterhead background, another one for an obscure use of mine, and so on. If you printed to it, it would email you the PDF (for any user, without any extra setup needed per user). It could also put the PDFs on one of our file servers or make them available from its built-in file server.

If I remember correctly it cost £30 and ran since 2006 right through until today, November 2014. One of my best software investments!

However, Windows 8 came along and it no longer worked. Getting Windows 8 to print to it directly turned out to be impossible. The program was not going to be updated or replaced with a new version. I managed a short-term workaround by having Windows 8 print to a Samba printer queue which converted and forwarded to the YAFPC virtual appliance. There were problems, such as page sizes not being exact, but it worked after a fashion.

Roll forward to today when I’ve just got a new network PDF virtual printer working. It wasn’t so easy to do (some 20 hours I guess) so here are my setup notes for others to follow.  The final run through of these notes had it installed and working in about an hour.

These steps assume you know quite a bit about setting up Linux servers. Please feel free to use the comments to point out errors or corrections, or add more complete instructions, and I’ll edit this post with the updates. Also please suggest alternative methods that you needed to use to meet your needs.

Overview – We are going to create:

  • a new Ubuntu based linux server as a virtual machine
  • Install CUPS, the Common Unix Printing System
  • Install CUPS-PDF, an extension that allows files to be created from the print queue
  • Create a postprocessing script that will run every time CUPS-PDF is used that will customise our PDFs and send them where we want them (to our users).

Sounds simple, right 🙂


sunspot solr slow in production (fixed by using IP address instead of domain name)

Short version:
In my sunspot.yml I used an FQDN (solr.rkbb.co.uk). Solr was slow.
When I used the server IP instead, Solr was fast.

Setting the scene (you can skip this bit):
I’ve been slowly working on some improvements to our business system at work. Whilst most of it currently runs on MS Access and MySQL, I’m slowly working on moving bits into Ruby on Rails. One of the most important things our current system does is store prices and descriptions for over 200,000 products. Searching that database is a crucial task.

Searching in Rails turned out to be very easy. Sunspot had it working very quickly on my development machine. I also had it running on my production server using the sunspot_solr gem, which is meant for development only (but mine’s a small business, so that’s fine). However, when the server was restarted sunspot_solr needed to be manually restarted, which was a pain. I thought I should probably get around to setting up a real Solr server and point my application at it. So far, so good. Simply: copy the config from my Rails app to my new Solr service, set the server’s hostname in sunspot.yml, commit, deploy, and it worked!

The problem – Solr was terribly slow!
Re-indexing was slow. I could tell something wasn’t right. Neither my Rails server nor my new Solr server was under load.
I created a new product instead (so that would appear in the solr index).
That was slow, but it worked. Displaying search results was also slow.

Check the logs – wow! Yep, Solr is the slow bit

Started GET "/short_codes?utf8=%E2%9C%93&search=test" for at 2014-10-01 14:28:03 +0100
Processing by ShortCodesController#index as HTML
Parameters: {"utf8"=>"✓", "search"=>"test"}
Rendered short_codes/_navigation.html.erb (1.0ms)
Rendered short_codes/index.html.erb within layouts/application (6.7ms)
Rendered layouts/_navigation.html.erb (1.3ms)
Completed 200 OK in 20337ms (Views: 10.3ms | ActiveRecord: 1.7ms | Solr: 20321.1ms)

No way should Solr take 20321ms to respond.

I tried the search on the solr admin interface and the response was instant, so I knew that solr wasn’t the problem. It must be my code (as always!).

As solr replies over http, I tried querying it from my rails server command line. Also slow. So… maybe it’s not my code… then I tried pinging the solr server from my rails server:

ping solr.rkbb.co.uk

It said replies were coming back in less than 1ms, but then I realised there was a gap of about 3 or 4 seconds between each report.
I tried pinging another server… same effect…
Then I tried pinging my office router… reports every second, just as fast as I’m used to seeing it. But this was the first time I’d used an IP address and not an FQDN.
Then I tried pinging my Solr server by its IP address… reports every second!

So, maybe all I have to do is configure my application to talk to solr via the server IP instead of FQDN…

I tried…

Started GET "/short_codes?utf8=%E2%9C%93&search=test" for at 2014-10-02 11:51:49 +0100
Processing by ShortCodesController#index as HTML
Parameters: {"utf8"=>"✓", "search"=>"test"}
Rendered short_codes/_navigation.html.erb (0.9ms)
Rendered short_codes/index.html.erb within layouts/application (8.4ms)
Rendered layouts/_navigation.html.erb (0.8ms)
Completed 200 OK in 27ms (Views: 12.2ms | ActiveRecord: 1.1ms | Solr: 8.3ms)

… and I fixed it 🙂

Well, solr is working great. Now I need to figure out what’s wrong with using FQDNs in my network.
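One way to check whether name resolution is the slow step is to time a lookup from Ruby on the Rails server. This is a minimal sketch, assuming the delay shows up in the system resolver that IPSocket.getaddress uses. The method name lookup_time is made up for illustration, and the resolver is passed in so the timing can be tried without a network.

```ruby
require 'socket'
require 'benchmark'

# Times a single name lookup and returns [address, seconds].
# resolver is injectable; by default it asks the system resolver.
def lookup_time(hostname, resolver: ->(name) { IPSocket.getaddress(name) })
  address = nil
  seconds = Benchmark.realtime { address = resolver.call(hostname) }
  [address, seconds]
end

# Example against the real host (needs network access, so it is left commented out):
# address, seconds = lookup_time('solr.rkbb.co.uk')
# puts "#{address} resolved in #{(seconds * 1000).round} ms"

# Stand-in resolver that simulates a slow DNS answer with a documentation-range address.
slow = ->(_name) { sleep 0.05; '192.0.2.10' }
address, seconds = lookup_time('solr.example', resolver: slow)
puts "#{address} resolved in #{(seconds * 1000).round} ms"
```

If the real lookup takes seconds while a lookup by IP address returns at once, the delay is in resolving the name rather than in Solr or the Rails code.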

BT Wifi hotspots intercept and redirect google SSL searches

I wouldn’t have noticed BT WiFi intercepting and redirecting SSL searches except that Google told me. That raises the question: if I was buying something on a website and didn’t notice the redirect from HTTPS to HTTP, could other people on the hotspot be snooping on my transactions?

I guess BT have a good reason for doing it but this reminds me I must always connect to the internet via a VPN when on a public hotspot.

How to change a folder icon to a picture in Mac OS X

In an earlier post I showed how I set my screenshots to save in a custom folder rather than onto my desktop (I seem to take a lot of screenshots). I also shared a little camera icon that I made for it. One of the comments asked how I changed the folder icon, so I’ve made a 30 second screen recording to show how.

1. Go to the web page that has the image you want
2. Right click the image (or ctrl + left click)
3. Choose Copy image
4. Select the folder you want to change the icon for (single left click the folder)
5. Press cmd + i keys together (opens the info pane)
6. Left click the folder icon shown in the top left of the info pane (it will get a blue highlight border)
7. Press cmd + v keys together (this is the shortcut for paste). You’ll see the image will have replaced the folder icon.

VMware consolidated backup missing a catalogue file – fixed!

As always seems to be the case, a routine update of server software becomes a problem. This time it was updating VMware ESXi from 4 to 5. I know, I’m a little behind the times, but it was working, and it’s only a small office server… and I should have left it alone, sigh.

So, shutdown the Virtual Machines, overnight copy them all over the network to my laptop and a handy external disk. – Done.
Note: I probably should have used the VMWare standalone converter to copy them, rather than just copy them direct from the datastore.
This morning, in at 8am, install the new ESXi (having lost two hours ’cause the DVD on the server was playing up).

Start restoring the Virtual Machines. First a non important one… all good

Second, the most important one, our file server…. uh oh.

"The VMware Consolidated Backup source ... has a missing catalog file."

After several hours of trying to fix it, editing files and trying different versions of the VMware standalone importer (which may have helped, I’m not sure), I solved it by opening the Virtual Machine in VMware Player. It spotted the problem (I had the VM disks split across two datastores but I’d saved them into one folder) and asked me to tell it where they were. That fixed it for VMware Player, which also meant the importer was happy again.

PS – I also realised why I never upgraded from VMware ESXi 4. Version 5 takes away a lot of the essential functionality from the vSphere software. That makes ‘it not a lot of use’ for me. Still, it was free. So having fixed the import, I’m now waiting to import it back to a fresh install of version 4. At least I finally set up the 3+1 raid 5 (instead of the 2 sets of raid 1 left over from the original disks and an upgrade 2 years ago).

A quick note on my first steps using stripe.com

I’m building a web site for a charity that needs to take credit cards for tickets being sold. I’ve chosen to use stripe.com as:

  • It’s simple to implement
  • They take care of all the security and PCI DSS (I never get any card details to save, that’s a good thing).
  • It’s not expensive (compared to other options like having a merchant account for the charity).
  • I couldn’t get away with using existing services (eg: eventbrite, picatic, etc)

Users don’t have to register on this charity site (essentially it’s selling a one off event ticket) so my process is:
1) Visitor completes form and submits [let’s call it ‘Registration’]
2) Server validates form (email address present, other information entered, etc)
3) Server sends page with Stripe pay now button. That button contains the code to precomplete some of the Stripe form (eg: the email address).
4) Visitor clicks Stripe button, enters card details which are sent direct to the Stripe server (ie: not through my server)
5) Stripe returns a ‘token’ that can be used to charge the credit card and the visitor is directed to my ‘charge’ page with their token (sent as an HTTPS POST request).
6) When my /charge page is requested, my server can request that the card is charged using the single use Stripe token, then thank the customer for paying.

I wanted to record the payment as processed against my Registration_ID, and thought I would be able to use the browser session to link the stripe request with a specific registration. It didn’t work, every test transaction came back with nothing in the session. It was as if the session was being refreshed every time a Stripe transaction occurred.
After several hours of frustration, I tracked it down to Rails’ built-in CSRF protection.
As the POST comes from the Stripe form without Rails’ CSRF token, Rails can’t verify the request and resets the session.

All I have to match the registration record with the Stripe transaction is the visitor’s email address. This obviously causes problems if:

  • A visitor wants to buy more than one registration on the same email address
  • A visitor changes their email address during the stripe process (not easy for them to do, but possible).

However, it’s the best I’ve got so I’ll have to write some backup code to prevent two registrations on one email address (they’ll have to get in touch and pay another way) and raise an error if the email address that stripe got is different from the address in our records (the charity will have to match the records manually which isn’t difficult for such a small event).
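A minimal sketch of that backup check in plain Ruby, outside Rails. The Registration struct stands in for the real database table, and the names RegistrationMatchError and match_registration are made up for illustration. In the real app the lookup would be a database query on the email address.

```ruby
# Hypothetical in-memory stand-in for the registrations table.
Registration = Struct.new(:id, :email, :paid)

class RegistrationMatchError < StandardError; end

# Finds the single unpaid registration for the email address Stripe reports.
# Raises if there is no match or more than one, so a person can reconcile by hand.
def match_registration(registrations, stripe_email)
  wanted = stripe_email.to_s.strip.downcase
  matches = registrations.select { |r| !r.paid && r.email.strip.downcase == wanted }
  raise RegistrationMatchError, "no unpaid registration for #{wanted}" if matches.empty?
  raise RegistrationMatchError, "#{matches.size} unpaid registrations for #{wanted}" if matches.size > 1
  matches.first
end

registrations = [
  Registration.new(1, 'alice@example.com', false),
  Registration.new(2, 'bob@example.com', false),
  Registration.new(3, 'bob@example.com', false)
]

puts match_registration(registrations, 'Alice@Example.com').id

begin
  match_registration(registrations, 'bob@example.com')
rescue RegistrationMatchError => e
  puts "Needs manual matching: #{e.message}"
end
```

Raising in both failure cases keeps the charge page from marking the wrong registration as paid. The error message is what the charity would use to match the record by hand.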

Here’s the part in my dev log that helped me find the problem, along with this blog post on kalzumeus.com.

Started POST "/registers/charge" for at 2014-03-14 16:12:34 +0000
Processing by RegistersController#charge as HTML
Parameters: {"stripeToken"=>"tok_1234sometokendata", "stripeEmail"=>"asdf@asdf", "stripeBillingName"=>"CARDNAME", "stripeBillingAddressLine1"=>"asdf", "stripeBillingAddressZip"=>"ME13 9AB", "stripeBillingAddressCity"=>"Faversham", "stripeBillingAddressState"=>"Kent", "stripeBillingAddressCountry"=>"United Kingdom"}
WARNING: Can't verify CSRF token authenticity

A new form of Comment spam? – url shorteners and redirection?

This is interesting. This blog just had a comment which, at first glance, looked normal.

URL redirection can hide the destination, not always a good thing

The link first runs through URL shortening service tinyurl.com.

That in turn redirects it to adfly (http://adf.ly) which is where it becomes interesting.
Example of an Adfly landing page
Adfly is an advertising system. Instead of linking directly to the destination, you link with a custom link from them. Before the visitor can go to the new page, they see an advert.
They can interact with that advert or click the big “Skip Ad” button at the top of the page.
If people click on the advert, whoever created the link gets a commission.

I don’t have a problem with Adfly. I’ve seen my son skip the adverts lots of times when he’s getting plugins for Minecraft. What I hadn’t seen before was this method of hiding the adfly link and as far as I know, it’s the first one posted on my blog.

Is it a problem?
I don’t think so, just an observation. It means I’m going to be less trusting of any url shortening from now on.
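One way to see where a shortened link goes without opening it in a browser is to follow the HTTP redirects and list each hop. This is a minimal sketch. The names HEAD_LOCATION and redirect_chain, and the example.com style hosts, are made up for illustration. It only follows real HTTP redirects, so an interstitial page such as the Adfly advert would show up as the last hop rather than being skipped, and some services may answer a HEAD request differently from a browser visit.

```ruby
require 'net/http'
require 'uri'

# Default fetcher: issues a HEAD request and returns the Location header,
# or nil when the response is not a redirect. Needs network access.
HEAD_LOCATION = lambda do |url|
  uri = URI.parse(url)
  response = Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == 'https') do |http|
    http.head(uri.request_uri)
  end
  response.is_a?(Net::HTTPRedirection) ? response['location'] : nil
end

# Follows redirects from url and returns every URL visited, in order.
# Stops after limit hops so a redirect loop cannot run forever.
def redirect_chain(url, fetch: HEAD_LOCATION, limit: 10)
  chain = [url]
  while chain.length <= limit
    location = fetch.call(chain.last)
    break if location.nil?
    chain << URI.join(chain.last, location).to_s   # resolves relative Location headers
  end
  chain
end

# Offline demonstration with a hypothetical table of redirects in place of real requests.
hops = {
  'http://short.example/a' => 'http://ads.example/b',
  'http://ads.example/b'   => '/landing'
}
puts redirect_chain('http://short.example/a', fetch: ->(u) { hops[u] })
```

Called with just a URL, it uses real HEAD requests.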

Is it an opportunity?
Not for me, at least not yet.
It would not be difficult for me to write some code that redirected all my off-site links via Adfly, including those posted in comments. It does mean anyone visiting and following a link would have an extra step to go through, and I’d rather not do that to readers.

I used to have google adverts on the blog but when I came to update WordPress I didn’t bother rewriting the templates or installing any plugins. The revenue it was generating was trivial.
I suspect Adfly revenue from this site would also be too small to be worth the effort.

BitDefender v Nod32

It’s anti virus renewal time! Not the most exciting job of the year, which is why I’ve been renewing for at least 2 years at a time.

Bottom line: So, after all my research, reading and testing, we’re sticking with Nod32 for another 2 years.

We’ve been Eset Nod32 customers a long time, but for this renewal a few warning signs meant extending the licence wasn’t the no brainer it has been in the past.
1) Their web site is way out of date. Here’s a screenshot:
It’s November of 2013, so why are all these certifications dated 2006 to 2010?
2) Being pedantic when it comes to the presentation of data, I read the claim “ESET has won an unprecedented number of Virus Bulletin’s VB100 awards, more than any other security product” as possibly meaning “We’ve been around years longer than everyone else, so we can say that whilst the new companies can’t”. It doesn’t tell me Eset are still leading the field, and I’m sure they used to say they were the only provider with a 100% detection rate. They don’t say that now… maybe they’re not as good?

The big plus in their favour: The renewal price is cheaper than the new customer price. I like that. I hate it when companies give discounts to new customers but not existing, making me need to spend time switching supplier each year.

Despite the plus, it was time to do a little more research.


I went through each month of reports on this site, as well as a couple of others. About 4 hours of study (yeah, I should get a life).
Result: ESET isn’t the leader any more. It may only be behind by a couple of percent, but going through several months of av-comparatives.org tests, they are now often a little behind. My fear, of course, is that one of the one or two new viruses they miss in a given month will be the one that gets into our network and causes mayhem [at this point, I am obliged to remind you to make sure your backups work, lest this latest virus called ‘cryptolocker’ destroy your files].

I looked at the new consistent leaders and settled on BitDefender as the best alternative for our needs. I wanted to try Kaspersky, mostly because I admire their stance against a patent troll, but unfortunately it’s a lot more expensive than Eset Nod32. Of course, I’ll regret that one day if we get a virus Kaspersky would have stopped, but in 6 months’ time it could be Kaspersky that misses the virus Eset would have stopped. Hey ho!

I registered for the trial and at first, I liked it. I went for their ‘Cloud Security’ option which, as best as I can tell, is their ‘Small Business Pack’ (ie: regular PC Antivirus) but with a web based console for reporting and installing. I installed it on a new Windows 8 PC (our first in the office, and I like Windows 8 a lot) and I love the console. It gave me a link to download the installer, which was super smooth (no licence IDs to type in). It later told me that we have 7 other PCs that aren’t running BitDefender (it searches the Windows network for machine IDs and matches them against the machines BitDefender is installed on).

Everything was great… until I got a virus. OK, not a real virus, the EICAR test virus file. It’s a small piece of text you can download to see if your virus scanner will detect it. Except. It didn’t. Or, I thought it didn’t. It immediately quarantined the file BUT DIDN’T TELL ME. So I did what any user would do, I tried again. I then decided the download function wasn’t working, so copied the text into a new text file, saved it, closed it – but it had disappeared. I then created the text file and left as .txt. Saw it on my desktop, renamed the file… and it disappeared.

Only then did I go and check the notification panel to see all these files were quarantined. So it’s good, it did its job, but it’s bad, because I didn’t know that. If one of our users has the same situation trying to read a customer’s .doc attachment, how are they to know what’s happened? It’s annoying.

So I put in a support request:

Issue: I’m testing bitdefender for our business. I tried the EICAR test file. Bitdefender spotted the file and moved it to quarantine. However there was no warning for the user (that the file they just downloaded was quarantined). The action was reported in the web console. Is there a setting that prevented a warning for the user or is this always the case (users don’t get told)?

A few days later, having had no answer, I took to their product forums. The forums were pretty quiet. No one had a similar question to mine, but it appears I wasn’t the only one waiting for an answer.


A whole 8 days later, I got an answer to my support email:

When running in Auto Pilot Mode, the product will take automatic actions for all malware and all information will be logged in Events.

The user will be notified via the Security widget that will display the number of Events.

So, if you realise, you can open the widget and see what’s happened. It doesn’t pop up a warning. Until then, you’ll be clicking download wondering why nothing seems to be happening. Today, I also noticed a new warning “7 Days since last system scan” or similar. I don’t understand why BitDefender hasn’t just gone ahead and scanned if that is significant to the antivirus protection, I know Eset Nod32 does. Sure, a full scan can affect PC performance so make it happen when the processor is idle, or as a low priority background task.

I’ve only put one support request into Nod32 over the years we’ve had it, but looking back it appears to have been answered on the same day (with the solution, my further thank you reply 4 days later shows).

So, after all my research, reading and testing, we’re sticking with Nod32 for another 2 years.