One of the blogs I like to keep an eye on is Kris Wallsmith's personal blog. He is a Symfony2 contributor and also the author of Assetic and Buzz. Last year he wrote about a new experimental project called Spork: a wrapper around pcntl_fork that abstracts away the complexities of spawning child processes in PHP. The article was very interesting, although I didn't have any valid use case to try the library out. That was, until today.
It so happened that we were preparing a rather large data migration for an application with approximately 17,000 users. The legacy application stored the passwords in an unsafe way – plaintext – so we had to hash them all during the migration. Our weapon of choice was bcrypt, and the BlowfishPasswordEncoderBundle made implementing it easy. Using bcrypt did introduce a new problem, though: hashing all these records would take a lot of time! That's where Spork comes in!
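Bcrypt is slow by design, which is exactly why hashing 17,000 records takes so long. As a rough illustration of the cost (plain PHP crypt() with a Blowfish salt on a modern PHP, not the bundle's encoder):

```php
<?php
// Blowfish crypt: the cost parameter (here 10) makes each hash take
// a deliberately long time -- that is the whole point of bcrypt.
// The salt must be 22 chars from the [./0-9A-Za-z] alphabet.
$salt = '$2a$10$' . substr(strtr(base64_encode(random_bytes(16)), '+', '.'), 0, 22);
$hash = crypt('s3cret', $salt);

// Verifying re-runs crypt with the stored hash as the salt
var_dump(crypt('s3cret', $hash) === $hash); // bool(true)
```

Multiply that per-hash cost by 17,000 users and a single sequential process becomes painful.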
Setting up the Symfony2 migration Command
If possible, I wanted to fork between 8 and 15 processes to gain maximum speed. We run the command on a VPS with 8 virtual cores, so I want to stress the machine as much as possible. Unfortunately, the example on GitHub, as well as the one on his blog, no longer worked, so I had to dig in a little. I came up with the following to get the forking working:
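A minimal sketch of such a command (the command class, repository name and entity are placeholders of my own; the Spork calls assume the ProcessManager and ChunkStrategy API from the version current at the time):

```php
<?php

use Spork\ProcessManager;
use Spork\Batch\Strategy\ChunkStrategy;
use Symfony\Bundle\FrameworkBundle\Command\ContainerAwareCommand;
use Symfony\Component\Console\Input\InputInterface;
use Symfony\Component\Console\Output\OutputInterface;

class GeneratePasswordCommand extends ContainerAwareCommand
{
    protected function configure()
    {
        $this->setName('wb:generate:password');
    }

    protected function execute(InputInterface $input, OutputInterface $output)
    {
        // Hypothetical repository lookup: fetch all users to migrate
        $users = $this->getContainer()->get('doctrine')
            ->getRepository('AcmeUserBundle:User')
            ->findAll();

        $manager = new ProcessManager();

        // ChunkStrategy splits the list into chunks, one per forked child;
        // the callback runs inside the child process for each user
        $manager->process($users, function ($user) {
            echo sprintf("Greeting from %d with id %d\n", posix_getpid(), $user->getId());
        }, new ChunkStrategy(8));

        // Block until every child has finished
        $manager->wait();
    }
}
```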
The command generates the following output:
ricbra@goku:/var/www/wb$ app/console wb:generate:password
Greeting from 3743 with id 1
Greeting from 3743 with id 2
Greeting from 3743 with id 3
Greeting from 3744 with id 4
Greeting from 3744 with id 5
Greeting from 3744 with id 6
Greeting from 3745 with id 7
Greeting from 3745 with id 8
Greeting from 3745 with id 9
Greeting from 3746 with id 10
Make it a little bit more dynamic
To be really useful, I've added some parameters so we can control the behavior a little more. As mentioned before, I wanted to control the number of forks, so I added an option for it. This value is passed on to the constructor of the ChunkStrategy.
I also added a max option so we can run some tests on a small set of users instead of the whole database. When set, I pass it on to the setMaxResults method of the Doctrine query.
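A sketch of how those two options might be wired up (the option names `forks` and `max`, the default of 8, and the repository name are my own assumptions):

```php
protected function configure()
{
    $this
        ->setName('wb:generate:password')
        ->addOption('forks', null, InputOption::VALUE_REQUIRED, 'Number of child processes', 8)
        ->addOption('max', null, InputOption::VALUE_REQUIRED, 'Limit the number of users');
}

protected function execute(InputInterface $input, OutputInterface $output)
{
    $qb = $this->getContainer()->get('doctrine')
        ->getRepository('AcmeUserBundle:User')
        ->createQueryBuilder('u');

    // Only process a subset when --max is given, handy for test runs
    if (null !== $max = $input->getOption('max')) {
        $qb->setMaxResults($max);
    }

    $manager = new ProcessManager();

    // The --forks value ends up in the ChunkStrategy constructor
    $manager->process($qb->getQuery()->getResult(), function ($user) {
        // ... hash the password here ...
    }, new ChunkStrategy((int) $input->getOption('forks')));

    $manager->wait();
}
```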
Storing the results in MySQL: Beware!
In Symfony2 projects, storing and reading data from the database is pretty straightforward using Doctrine2. However, when you start forking your PHP process, keep in mind the following:
- all the forks share the same database connection;
- when the first fork exits, it will also close the database connection;
- database operations in the still-running forks will then fail with:
General error: 2006 MySQL server has gone away
This is a known problem. To work around it, I create and close a new connection in each fork:
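A sketch of that workaround, assuming the Doctrine connection is fetched from the container inside the child ($forks and the entity handling are placeholders):

```php
$container = $this->getContainer();

$manager->process($users, function ($user) use ($container) {
    // The child inherited the parent's MySQL connection; drop it
    // and open a fresh one that belongs to this fork alone
    $connection = $container->get('doctrine')->getConnection();
    $connection->close();
    $connection->connect();

    // ... hash the password and flush the user here ...

    // Close this fork's own connection before the child exits,
    // so it never tears down a socket another fork is using
    $connection->close();
}, new ChunkStrategy($forks));
```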
That’s basically it. Running this command on a VPS comparable to a c1.xlarge Amazon EC2 instance sped things up a lot. So if you’re also working on an import job like this, one that can be split up into separate tasks, you know what to do… Give Spork a try! It’s really easy, I promise.
Update: as Kris points out in the comments, you should close the connection just before forking. An example of how to do this:
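A sketch of that cleaner variant, relying on Doctrine DBAL reconnecting lazily (again, $users and $forks stand in for the real values):

```php
// Close the shared connection in the parent *before* forking: each
// child then opens its own connection on its first query, so no fork
// ever closes a connection that another fork is still using.
$connection = $this->getContainer()->get('doctrine')->getConnection();
$connection->close();

$manager = new ProcessManager();
$manager->process($users, function ($user) {
    // Doctrine reconnects automatically here on first use
    // ... hash the password and flush the user ...
}, new ChunkStrategy($forks));
$manager->wait();
```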