How I use PHP generators to make my life easier

With the release of PHP 5.5 we got ourselves a cool new language feature: generators. If you’re new to the concept I suggest you read the excellent blog post by Anthony Ferrara on the subject. In this post I’ll show an example of how I use generators in my everyday work.

The other day I was working on an import of a large batch of vacancies from a remote system via a SOAP webservice (yes, C#.NET on the other side). The service returned a container holding the current resultset and a boolean value indicating whether there were more results. Consider the classical approach without generators:
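A minimal sketch of what that looks like (the SOAP client, its GetVacancies method and the result fields are assumptions based on the description above):

```php
<?php

class VacancyImporter
{
    /** @var \SoapClient */
    private $client;

    public function import()
    {
        $page = 0;
        do {
            // Fetch the next container of results from the remote service.
            $result = $this->client->GetVacancies(['page' => $page]);

            foreach ($result->Vacancies as $vacancy) {
                $this->importVacancy($vacancy);
            }

            $page++;
        } while ($result->HasMoreResults); // boolean flag on the container
    }

    private function importVacancy($vacancy)
    {
        // map and persist the vacancy ...
    }
}
```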

There’s fundamentally not much wrong with this approach. It works. But let me show you a more elegant way of solving this issue which, as you’ll see, also brings more benefits to the table.

The GetAllVacanciesGenerator class

Take a look at this GetAllVacanciesGenerator, whose responsibility it is to do the cumbersome fetching of the results, and notice the yield keyword turning it into a generator:
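A sketch of such a class, reusing the hypothetical SOAP client from above:

```php
<?php

class GetAllVacanciesGenerator
{
    /** @var \SoapClient */
    private $client;

    public function __construct(\SoapClient $client)
    {
        $this->client = $client;
    }

    /**
     * @return \Generator
     */
    public function getVacancies()
    {
        $page = 0;
        do {
            $result = $this->client->GetVacancies(['page' => $page]);

            foreach ($result->Vacancies as $vacancy) {
                // yield hands one vacancy at a time to the calling code,
                // turning this method into a generator.
                yield $vacancy;
            }

            $page++;
        } while ($result->HasMoreResults);
    }
}
```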

The command is much cleaner now:
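Roughly, the importing code reduces to a simple foreach (names assumed as before):

```php
<?php
$generator = new GetAllVacanciesGenerator($soapClient);

foreach ($generator->getVacancies() as $vacancy) {
    $this->importVacancy($vacancy);
}
```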

As you can see the command is looking pretty again. I love this kind of improvement. It’s also easy to test this separate generator class:
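Something along these lines, using a stubbed SOAP client (a PHPUnit-style sketch):

```php
<?php

class GetAllVacanciesGeneratorTest extends \PHPUnit_Framework_TestCase
{
    public function testItYieldsVacanciesUntilThereAreNoMoreResults()
    {
        $client = $this->getMockBuilder('SoapClient')
            ->disableOriginalConstructor()
            ->setMethods(['GetVacancies'])
            ->getMock();

        $client->expects($this->exactly(2))
            ->method('GetVacancies')
            ->will($this->onConsecutiveCalls(
                (object) ['Vacancies' => ['a', 'b'], 'HasMoreResults' => true],
                (object) ['Vacancies' => ['c'],      'HasMoreResults' => false]
            ));

        $generator = new GetAllVacanciesGenerator($client);

        $this->assertSame(['a', 'b', 'c'], iterator_to_array($generator->getVacancies()));
    }
}
```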

This could also be done by implementing an Iterator, but I think the generator is much more readable.

Creating a custom Doctrine DBAL type the right way

Today one of my colleagues was debugging a strange issue with Doctrine’s schema validation tool which caused our test setup to fail (we’re running app/console doctrine:schema:validate as part of our CI process). The output was the same every time:

We quickly discovered the issue was caused by the custom UUID Doctrine DBAL type we had introduced recently. This type was based on some gist we found on the web:
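It looked roughly like this: a Type subclass mapping a UUID string to a BINARY(16) column (a simplified sketch, notably without the comment hint that turns out to matter later):

```php
<?php

use Doctrine\DBAL\Platforms\AbstractPlatform;
use Doctrine\DBAL\Types\Type;

class UuidType extends Type
{
    const UUID = 'uuid';

    public function getName()
    {
        return self::UUID;
    }

    public function getSQLDeclaration(array $fieldDeclaration, AbstractPlatform $platform)
    {
        // Store the UUID as 16 raw bytes.
        return 'BINARY(16)';
    }

    public function convertToPHPValue($value, AbstractPlatform $platform)
    {
        if (null === $value) {
            return null;
        }

        // Convert the raw bytes back to the textual UUID representation.
        $hex = bin2hex($value);

        return sprintf(
            '%s-%s-%s-%s-%s',
            substr($hex, 0, 8),
            substr($hex, 8, 4),
            substr($hex, 12, 4),
            substr($hex, 16, 4),
            substr($hex, 20, 12)
        );
    }

    public function convertToDatabaseValue($value, AbstractPlatform $platform)
    {
        return null === $value ? null : hex2bin(str_replace('-', '', $value));
    }
}
```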

It turned out that doing a doctrine:schema:update kept executing the same update query over and over again:

One hour of debugging later I discovered the issue originated in MySqlSchemaManager::_getPortableTableColumnDefinition, where a MySQL column gets reverse-engineered into a Doctrine\DBAL\Types\Type which is used in the schema comparison tool. We’re storing the UUID in a BINARY(16) field, which results in a BinaryType after reverse-engineering: Doctrine can’t tell the difference between a BinaryType and a UuidType because both result in the exact same MySQL column definition. Fortunately, there is a way to fix this.

Using DC2Type in the comment

The solution is to add a comment to the field to store the metadata in. This seems to be missing from the docs, but I’ve found a JIRA issue describing the feature. We have to change our column definition so the metadata of the type doesn’t get lost:
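With the comment in place the generated column roughly ends up as follows (illustrative; the marker is what Doctrine’s comparator reads back):

```sql
uuid BINARY(16) NOT NULL COMMENT '(DC2Type:uuid)'
```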

The corrected UuidType looks as follows:
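The key addition is requiresSQLCommentHint(), which tells Doctrine to write the DC2Type marker into the column comment (a sketch, building on the type above):

```php
<?php

use Doctrine\DBAL\Platforms\AbstractPlatform;
use Doctrine\DBAL\Types\Type;

class UuidType extends Type
{
    const UUID = 'uuid';

    public function getName()
    {
        return self::UUID;
    }

    public function getSQLDeclaration(array $fieldDeclaration, AbstractPlatform $platform)
    {
        return 'BINARY(16)';
    }

    public function requiresSQLCommentHint(AbstractPlatform $platform)
    {
        // Makes Doctrine add "(DC2Type:uuid)" to the column comment, so the
        // schema tool can map the column back to this type.
        return true;
    }

    // convertToPHPValue() / convertToDatabaseValue() as before ...
}
```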

Unfortunately the binary type is always reverse-engineered with $fixed = true, so you have to configure the UuidType accordingly (note the options) on your entities (I haven’t found a better way yet):
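For example (entity and field names are hypothetical; the relevant part is the options array):

```php
<?php

use Doctrine\ORM\Mapping as ORM;

class Vacancy
{
    /**
     * @ORM\Id
     * @ORM\Column(type="uuid", length=16, options={"fixed" = true})
     */
    private $uuid;
}
```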

This comment mechanism in Doctrine is not well documented and probably a lot of people have run into the same behaviour in the schema tool, so I hope I’ve saved you some nasty debugging sessions in Doctrine’s core (although you do learn a lot from them).

CQRS: How to handle file uploads?

If you are like me, someone who tries to keep up with all the cool stuff happening in the PHP world, you’ve probably noticed the buzz around Domain Driven Design and, more recently, Event Sourcing and CQRS. Last year Qandidate released Broadway: a project providing infrastructure and helpers for introducing CQRS and Event Sourcing into your PHP stack. It wasn’t until last month that I got the chance to get my hands on it. We adopted the framework in one of the latest projects at work. And it didn’t take long before we ran into all kinds of problems and questions 🙂 .

So for every question we have I’ll try to write a blog post so others can learn from it. I’m also curious how you handle the problems I describe in these posts, so don’t hesitate to comment if you have a different opinion. Let’s dive into what should be the first in a series of posts about CQRS!

The problem

In the application we’re building, one requirement is that users can configure attachments to be sent to a user when some kind of action is performed. We’re using Symfony2 and Broadway, so our code will be very specific to these frameworks. Consider the following form:
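A rough sketch of the form type (field names are made up):

```php
<?php

use Symfony\Component\Form\AbstractType;
use Symfony\Component\Form\FormBuilderInterface;

class AttachmentType extends AbstractType
{
    public function buildForm(FormBuilderInterface $builder, array $options)
    {
        $builder
            ->add('name', 'text')
            ->add('file', 'file'); // ends up as an UploadedFile instance
    }

    public function getName()
    {
        return 'attachment';
    }
}
```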

In the controller we validate the form and construct our UploadAttachment command – which is just a DTO – from the form values, then pass it to the command bus:
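Something along these lines (the command bus service id and the command’s fields are assumptions):

```php
<?php

use Symfony\Bundle\FrameworkBundle\Controller\Controller;
use Symfony\Component\HttpFoundation\Request;

class AttachmentController extends Controller
{
    public function uploadAction(Request $request)
    {
        $form = $this->createForm(new AttachmentType());
        $form->handleRequest($request);

        if ($form->isValid()) {
            $data = $form->getData();

            // The command is a plain DTO holding the submitted values,
            // including the UploadedFile instance.
            $command = new UploadAttachment($data['name'], $data['file']);

            $this->get('command_bus')->dispatch($command);
        }

        // ... render the form or redirect ...
    }
}
```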

And the command handler calls the appropriate method on our aggregate:
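Roughly like this, using Broadway’s convention of handle&lt;CommandName&gt; methods (class and method names are assumptions):

```php
<?php

use Broadway\CommandHandling\CommandHandler;

class AttachmentCommandHandler extends CommandHandler
{
    private $repository;

    public function __construct($repository)
    {
        // An event-sourcing repository for the aggregate.
        $this->repository = $repository;
    }

    public function handleUploadAttachment(UploadAttachment $command)
    {
        $configuration = $this->repository->load($command->getConfigurationId());
        $configuration->uploadAttachment($command->getName(), $command->getFile());

        $this->repository->save($configuration);
    }
}
```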

Our aggregate creates a new event:
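Inside the aggregate root it would look something like this (a sketch; the aggregate and event names are made up):

```php
<?php

use Broadway\EventSourcing\EventSourcedAggregateRoot;
use Symfony\Component\HttpFoundation\File\UploadedFile;

class Configuration extends EventSourcedAggregateRoot
{
    private $configurationId;

    public function getAggregateRootId()
    {
        return $this->configurationId;
    }

    public function uploadAttachment($name, UploadedFile $file)
    {
        // This is where it goes wrong: the UploadedFile ends up in the event.
        $this->apply(new AttachmentUploaded($this->configurationId, $name, $file));
    }
}
```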

But as you probably noticed, we now run into problems because we’re passing around an UploadedFile instance in an event. Imagine how this would get stored in the event store:
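The event’s serialize() method would have to do something like this, which clearly doesn’t belong in an event payload (illustrative only):

```php
<?php
// Part of the hypothetical AttachmentUploaded event (Broadway serializes events to arrays):
public function serialize()
{
    return [
        'configurationId' => $this->configurationId,
        'name'            => $this->name,
        'file'            => $this->file, // an UploadedFile object: not something you can sanely serialize
    ];
}
```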

Storing the complete file in the event store is theoretically possible, but we prefer to store our files not in MySQL but in an S3 bucket in the cloud. If you do store them there, your event store will grow quickly and you’ll have other challenges to wrap your head around. Keep in mind that events will often be transferred over some queue like RabbitMQ.

After some digging around on the internet I found others with the same problem. I also asked for advice in #qandidate on Freenode. In general, everybody stores the file in the controller or command handler and passes the id on to the event.

The solution

We’ve chosen to store the file in our controller and pass the UUID on to the command. A code example is worth a thousand words:
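A sketch of the controller with this approach (the UUID generator comes from Broadway; the attachment_storage service and its API are assumptions):

```php
<?php

use Broadway\UuidGenerator\Rfc4122\Version4Generator;
use Symfony\Bundle\FrameworkBundle\Controller\Controller;
use Symfony\Component\HttpFoundation\Request;

class AttachmentController extends Controller
{
    public function uploadAction(Request $request)
    {
        $form = $this->createForm(new AttachmentType());
        $form->handleRequest($request);

        if ($form->isValid()) {
            $data = $form->getData();

            // Generate the identifier up front and store the file under it,
            // e.g. on S3 behind a filesystem abstraction.
            $fileId = (new Version4Generator())->generate();
            $this->get('attachment_storage')->store($fileId, $data['file']);

            // The command (and thus the event) now only carries scalar values.
            $command = new UploadAttachment($fileId, $data['name']);
            $this->get('command_bus')->dispatch($command);
        }

        // ...
    }
}
```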

Drawbacks

There are a couple of drawbacks in this method:

  1. every new attachment results in a new file, which can take up a lot of storage for unused files
  2. if something goes wrong in the command handler, the file has already been stored

Personally I see it as a benefit that we have a history of every single attachment uploaded. We can easily go back in time and revert an erroneous upload, or debug what our users did wrong in case of a problem.

By only passing around the UUID our event stays small, which makes it easy to publish on RabbitMQ.

Debugging Selenium with X11 Forwarding on Scrutinizer CI

Last week we ran into some issues on Scrutinizer CI with our Behat Selenium test suite. These things tend to be quite hard to debug: as Selenium is running headless in an X virtual framebuffer (Xvfb), there is nothing for the developer to see. It’s possible to take screenshots, but that (probably) requires code changes.

X11 Forwarding

One of the cool things you can do with Selenium (or more specifically Firefox or Chrome) is X11 forwarding. If you’re running an X.Org window system it is possible to forward the display from one box to another. When you’re on a Linux desktop environment you’re golden, but if you’re on a Mac like me you have to install XQuartz to get it working. Follow the instructions on the site and don’t forget to log out and log in again after installation, otherwise your $DISPLAY environment variable will be empty. By the way: if you’re a Windows user I honestly suggest getting a Mac or installing Ubuntu to get it working :).

If you want more information on this topic there is plenty of information to be found on the interwebz.

SSH Remote debugging session

To get this X11 forwarding working on Scrutinizer CI, first request a new SSH debugging session. This can be done on the inspection page. When the inspection fails you can retry, but in the same dropdown you can also choose “SSH Remote debugging”. The first time you do this it will ask for your public keys from GitHub; you should accept the request. It may take some time, but after a few seconds or minutes you’ll receive an SSH login to connect to. Add an -X switch after ssh, so the command looks like this:
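For example (the user and host are whatever login Scrutinizer hands you):

```bash
$ ssh -X <user>@<host-you-received>
```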

Now log in on the remote machine and verify your $DISPLAY variable is set (it should be something like localhost:10.0). Open ~/.profile in your favourite editor and remove the line:

Do not log out, but create a new SSH session with the command above. If all is fine you should be able to start firefox and see the browser appear. Kill this Firefox instance, start Selenium in this session (java -jar /location/of/selenium.jar), and start your Behat tests in the first session.

Speedup your test suite on Codeship using ParallelCI

As I mentioned in an earlier blog post, we use Codeship to test some of our private repositories. The folks at Codeship have improved their service a lot since we first used it: the UI has improved a lot (both visually and practically) and the list of notification services keeps growing too.

Lately they introduced a cool new feature called ParallelCI; Travis CI has a similar feature called the build matrix. You can split up your test suite into multiple parallel builds, called pipelines. If you have a big or slow test suite (probably your Behat tests) you can speed things up a lot by splitting it into multiple pipelines.

Example configuration

Because our phpspec suite is fast enough, we’ve split our Behat suite into multiple pipelines. Of course this is project dependent and will vary per use case. To enable ParallelCI, open your project settings at Codeship and click on the “Test” link. Scroll down to the “Configure Test Pipelines” section. There will be one pipeline configured, called “Test commands”, in which all your current test commands are configured.
Click on the green “Add new pipeline” link and a new pipeline tab will be added. Give it a clear name and add your test command. To get an idea of how this can be done, take a look at our configuration:

Tab #1: Behat user

Tab #2: Behat profile

Tab #3: phpspec
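The exact commands are project specific, but as an illustration the three tabs could contain something like this (suite and profile names are made up; bin/ assumes Composer’s bin-dir is set to bin):

```bash
# Tab #1: Behat user
bin/behat --suite=user

# Tab #2: Behat profile
bin/behat --profile=default

# Tab #3: phpspec
bin/phpspec run --format=pretty
```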

When you save these settings (the pipeline edit form is a bit cumbersome, as you will notice when adding new tabs, but I guess this will be improved soon enough) and rerun your last build, you’ll see your suite split into multiple pipelines and, as a result, things speed up drastically. I definitely see the use of this new feature and I’m sure you’ll love it for your bigger test suites.

Symfony2 and RabbitMQ: Lessons learned

Last year we introduced RabbitMQ into our stack at Waarneembemiddeling.nl. We were in desperate need of a worker queue and after fiddling around with Gearman, Beanstalkd and RabbitMQ we made our choice: RabbitMQ it will be.

Now, there’s quite some information to be found on RabbitMQ and how to use it, but a lot of things you have to find out for yourself. Questions like:

  • what happens to messages on the queue after a service restart?
  • what happens to messages on the queue after a reboot?
  • how do we notice that a worker crashed?
  • what happens to my message when the consumer dies while processing?
  • etc.

Using RabbitMQ with Symfony2 (or PHP in general) is quite easy. There is a bundle for Symfony2 called OldSoundRabbitMqBundle and a PHP library called php-amqplib, which both work very well. Both are from the same author; you should probably thank him for that 🙂 .

First try: pure php consumers

We’re running a fairly common setup. Because we’d been warned that PHP consumers die every now and then, we’re using Supervisor to start new consumers when needed. There is a lot of information out there on this subject so I won’t go into it here.

Despite the warnings we started with pure PHP consumers powered by the commands in OldSoundRabbitMqBundle. The first workers were started like this:
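Roughly like this, using the consumer command shipped with the bundle:

```bash
$ php app/console rabbitmq:consumer async_event
```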

This means we’re consuming from the async_event queue without any limit on the number of messages. Basically this means it will run forever, or rather: until PHP crashes. Or worse: your consumer ends up in a non-responsive state, which means it doesn’t process any messages any more while Supervisor thinks all is fine because you still have a running process. This happened once to our mail queue. I can assure you it’s better to prevent this kind of thing.

Second try: pure php consumers with limited messages

So after the mail-gate I was searching for a quick way to make our setup more error-proof. The OldSoundRabbitMqBundle supports limiting the number of messages to process, so I limited our workers so that they get restarted a couple of times a day:
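For example, limiting each worker to a few hundred messages (the exact number is a tuning choice), after which Supervisor starts a fresh one:

```bash
$ php app/console rabbitmq:consumer --messages=500 async_event
```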

After that things ran more smoothly and it took a while before we encountered new problems. While sifting through the logs I noticed some consumers produced errors. A brief summary:

  • General error: 2006 MySQL server has gone away
  • Warning: Error while sending QUERY packet.

Because the consumer is one process that keeps running, the service container and everything else keeps existing in memory. Once you’ve done some queries, the database connection stays open in the background. And if it’s quiet on our queue, it may take some time before we reach the message limit. If that time exceeds the connect_timeout of your MySQL server, you’ll run into the warnings and errors about lost connections.

Of course we could close the connection after each message is processed, try/catch Doctrine DBAL connection exceptions, or increase the connect_timeout setting, but that’s just denying the real problem: running consumers with a booted Symfony2 kernel just doesn’t work so well.

A last resort could be to strip down the consumers and not use the Symfony2 kernel and container, but we don’t like that. Most of the time our messages are serialized events which get dispatched again after the consumer picks them up. At the application level we don’t want to know whether we are in a RabbitMQ consumer or in a normal HTTP request.

Real solution: rabbitmq-cli-consumer

So it took a couple of months to learn the hard way that we needed a different solution for our consumers. I found this interesting blog post about the same problem; the author solved it with Java and Ruby consumers. We all learned Java in college, right? But I don’t like running the memory-eating JVM on our servers. The Ruby consumer unfortunately lacks good documentation for a Ruby virgin like me, so I got a bit lost there.

That was the point where Go came in. Go is a kind of improved C: no real OO, but a lot of cool stuff in it. I wrote an application that makes it possible to consume messages from a RabbitMQ queue and pipe them into a command line application. I called it rabbitmq-cli-consumer.

The main advantages for using rabbitmq-cli-consumer are:

  • no more stability issues to deal with
  • lightweight and fast
  • no need to restart your workers after a fresh deployment

We still use Supervisor to start and stop the consumers because it’s the right tool for the job. An example of how we start a consumer:
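Something like this (the paths, queue configuration file and the Symfony command are examples):

```bash
$ rabbitmq-cli-consumer -V \
    -e "/var/www/app/console event:process --env=prod" \
    -c /etc/rabbitmq-cli-consumer/async_event.conf
```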

An example of a Symfony2 command we use:
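A sketch of such a command: rabbitmq-cli-consumer passes the message body base64-encoded as the last argument, and an exit code of 0 acknowledges the message (command name and event handling are placeholders):

```php
<?php

use Symfony\Component\Console\Command\Command;
use Symfony\Component\Console\Input\InputArgument;
use Symfony\Component\Console\Input\InputInterface;
use Symfony\Component\Console\Output\OutputInterface;

class ProcessEventCommand extends Command
{
    protected function configure()
    {
        $this
            ->setName('event:process')
            ->addArgument('event', InputArgument::REQUIRED, 'Base64 encoded message body');
    }

    protected function execute(InputInterface $input, OutputInterface $output)
    {
        $message = base64_decode($input->getArgument('event'));

        // ... deserialize the event and dispatch it on the event bus ...

        return 0; // a non-zero exit code would reject the message
    }
}
```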

Final tip: use the management plugin

Before even starting with RabbitMQ, make sure you have the management plugin installed. It gives you a good overview of what’s happening. You can also purge queues, add users, add vhosts, etc.

Install Selenium headless on Debian Wheezy (optionally with Ansible)

When you start testing with Behat and the Mink Selenium2 driver you also need a browser running. Because we develop on a virtualised server, installing Firefox was a bit trickier than I expected. Of course a search yielded some interesting results, but also a lot of crap. So here is a little writeup of how I managed to get it running, to save you some time. An example playbook can be found at the bottom of this post. But beware: this is Debian only!

On Debian there is a package called iceweasel, which is a rebranded version of Firefox. Because of that there is no firefox package available in the default repositories.

We are using Ansible for configuration management (for both our production and development environments), so I prefer a package over compiling stuff because that’s much easier to automate. There are a couple of options to install Firefox through the package manager:

  1. add Linux Mint repository
  2. add ubuntuzilla repository

Using the Linux Mint repository I experienced some problems; the Ubuntuzilla repository worked like a charm. If you want to install it manually, just follow the instructions in their wiki. After adding the repository you can install the firefox package:
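The Ubuntuzilla build of Firefox is packaged as firefox-mozilla-build:

```bash
$ sudo apt-get update
$ sudo apt-get install -y firefox-mozilla-build
```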

To run Firefox headless you also need a display server, and to emulate one we are going to use Xvfb. Selenium requires Java, so we install:
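On Wheezy that boils down to something like:

```bash
$ sudo apt-get install -y xvfb openjdk-7-jre-headless
```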

Download Selenium somewhere:
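For example (the version and URL will obviously differ over time):

```bash
$ sudo mkdir -p /opt/selenium
$ sudo wget -P /opt/selenium http://selenium-release.storage.googleapis.com/2.44/selenium-server-standalone-2.44.0.jar
```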

You should be able to start Selenium now:
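For a quick test, run it inside the virtual framebuffer:

```bash
$ xvfb-run java -jar /opt/selenium/selenium-server-standalone-2.44.0.jar
```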

Starting by hand is a bit lame, so we use this startup script:
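A simplified sketch of such an init script (paths and options are examples; a production script would want more error handling):

```bash
#!/bin/sh
### BEGIN INIT INFO
# Provides:          selenium
# Required-Start:    $remote_fs $syslog
# Required-Stop:     $remote_fs $syslog
# Default-Start:     2 3 4 5
# Default-Stop:      0 1 6
# Short-Description: Selenium standalone server (headless via Xvfb)
### END INIT INFO

JAR=/opt/selenium/selenium-server-standalone-2.44.0.jar
PIDFILE=/var/run/selenium.pid
LOGFILE=/var/log/selenium.log

case "$1" in
  start)
    echo "Starting Selenium"
    start-stop-daemon --start --background --make-pidfile --pidfile $PIDFILE \
      --exec /usr/bin/xvfb-run -- java -jar $JAR -log $LOGFILE
    ;;
  stop)
    echo "Stopping Selenium"
    start-stop-daemon --stop --pidfile $PIDFILE --retry 5
    rm -f $PIDFILE
    ;;
  restart)
    $0 stop
    $0 start
    ;;
  *)
    echo "Usage: /etc/init.d/selenium {start|stop|restart}"
    exit 1
    ;;
esac

exit 0
```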

Copy this to /etc/init.d/selenium and after that you can:
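Make it executable, register it and start it:

```bash
$ sudo chmod +x /etc/init.d/selenium
$ sudo update-rc.d selenium defaults
$ sudo service selenium start
```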

And when we create an Ansible playbook for all this we get:
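A rough sketch of such a playbook (the repository line, key id, versions and paths should be checked against the Ubuntuzilla wiki and your own setup):

```yaml
---
- hosts: all
  sudo: yes
  tasks:
    - name: Add Ubuntuzilla apt repository
      apt_repository: repo="deb http://downloads.sourceforge.net/project/ubuntuzilla/mozilla/apt all main" state=present

    - name: Add Ubuntuzilla signing key
      apt_key: keyserver=keyserver.ubuntu.com id=C1289A29 state=present

    - name: Install Firefox, Xvfb and Java
      apt: name={{ item }} state=present update_cache=yes
      with_items:
        - firefox-mozilla-build
        - xvfb
        - openjdk-7-jre-headless

    - name: Create Selenium directory
      file: path=/opt/selenium state=directory

    - name: Download Selenium
      get_url: url=http://selenium-release.storage.googleapis.com/2.44/selenium-server-standalone-2.44.0.jar dest=/opt/selenium/selenium-server-standalone-2.44.0.jar

    - name: Install init script
      copy: src=selenium dest=/etc/init.d/selenium mode=0755

    - name: Start Selenium
      service: name=selenium state=started enabled=yes
```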

How to use Codeship with Symfony2, phpspec and Behat

My coworkers and I at waarneembemiddeling.nl are really fond of phpspec and Behat. Yes, we must confess: until a couple of months ago we didn’t test much. We skipped the PHPUnit age and started right away with phpspec and Behat. We also like services, so instead of setting up (and maintaining) our own CI server, we use Codeship. To be honest we fell in love with Travis, but that was a little bit too expensive for us. And so our search ended at Codeship.

There is some documentation on how to use it with PHP, but it’s not that in-depth about phpspec and friends. Let’s start with phpspec, as this is pretty easy. I’m assuming you install phpspec and Behat as dev dependencies using Composer:
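For example:

```bash
$ composer require --dev phpspec/phpspec behat/behat
```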

phpspec

Now head over to codeship.com and edit your project’s configuration. Pick “PHP” as your technology (didn’t see that one coming). In the “setup commands” field we first select the desired PHP version:
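On Codeship that is done with phpenv (5.5 is just an example):

```bash
phpenv local 5.5
```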

Next, install the dependencies (I believe this line is placed there by default by the Codeship guys):
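Which is simply:

```bash
composer install --prefer-source --no-interaction
```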

Then add phpspec to the “test commands” field:
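Assuming Composer’s bin-dir is set to bin (otherwise use vendor/bin):

```bash
bin/phpspec run --format=pretty
```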

Et voila, phpspec should now be functioning. 🙂

Behat

Behat is a little bit more difficult. The first problem you need to solve is getting the MySQL credentials into your Symfony2 application. These are provided through environment variables, but they differ from the naming convention Symfony2 expects.

We start by changing our app/config/config_test.yml:
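Something along these lines: the credentials become container parameters that we will feed via SYMFONY__ environment variables (parameter names match the env var convention mentioned below):

```yaml
# app/config/config_test.yml
doctrine:
    dbal:
        host:     127.0.0.1
        dbname:   "%test_database_name%"
        user:     "%test_database_user%"
        password: "%test_database_password%"
```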

Now, to let Symfony2 pick up the environment variables we have to follow the convention I just mentioned. This means that an environment variable with the name SYMFONY__TEST_DATABASE_USER will be recognised when building the container. But let’s start by adding a bash script to ease the setup of the testing environment (locally and on Codeship). Call it setup_test_env.sh and place it in the root of your project:
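A minimal version could look like this; on Codeship the MySQL credentials are exposed as MYSQL_USER and MYSQL_PASSWORD, and the local fallbacks and database name are examples:

```bash
#!/bin/bash
# setup_test_env.sh – map the MySQL credentials to Symfony's SYMFONY__ convention.
export SYMFONY__TEST_DATABASE_NAME=test
export SYMFONY__TEST_DATABASE_USER=${MYSQL_USER:-root}
export SYMFONY__TEST_DATABASE_PASSWORD=${MYSQL_PASSWORD:-}
```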

Then adjust your Codeship setup commands and add:
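Source the script (note the leading dot) and prepare the test database:

```bash
. ./setup_test_env.sh
php app/console doctrine:database:create --env=test
php app/console doctrine:schema:create --env=test
```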

Last but not least add the behat command to the test commands:
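Again assuming Composer’s bin-dir is bin:

```bash
bin/behat
```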

Things should be working now. Soon enough you will run into the infamous Xdebug “Fatal error: Maximum function nesting level of ‘100’ reached” error. Let’s fix this right away by adding this to your setup commands:
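Something like the following; the exact ini path on Codeship’s build machines is an assumption, so adjust it to wherever your Xdebug ini lives:

```bash
echo "xdebug.max_nesting_level=1000" >> /home/rof/.phpenv/versions/$(phpenv version-name)/etc/conf.d/xdebug.ini
```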

Summary

So the complete setup commands dialog for phpspec and Behat together looks like this:
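Recapping the commands introduced above:

```bash
phpenv local 5.5
composer install --prefer-source --no-interaction
. ./setup_test_env.sh
php app/console doctrine:database:create --env=test
php app/console doctrine:schema:create --env=test
echo "xdebug.max_nesting_level=1000" >> /home/rof/.phpenv/versions/$(phpenv version-name)/etc/conf.d/xdebug.ini
```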

And the test commands like this:
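And the two test commands:

```bash
bin/phpspec run --format=pretty
bin/behat
```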

Everything should be working fine now! To run your tests locally, don’t forget to first execute the bash script (note the extra dot; it is required):
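For example:

```bash
$ . ./setup_test_env.sh
$ bin/behat
```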

Happy testing! 😉

Slow initialization time with Symfony2 on vagrant

A few days ago we moved our complete infrastructure to a new hosting provider. We also made the switch from CentOS to Debian, so we got a fresh new development environment using Debian and Vagrant (and the latest PHP and MySQL, of course :)).

We expected the new dev box to be fast, but the opposite was true: it was slow as hell. And when I say slow as hell, I mean terribly slow (10–20 seconds, even for the debug toolbar). In the past we had some more performance problems with VirtualBox and Vagrant. There are some great posts out there on this subject (here and here) which we had already applied to our setup. In a nutshell:

  • change the logs and cache dir in AppKernel (see the sketch after this list)
  • use an NFS share
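For the first item, the usual AppKernel tweak keeps the cache and logs out of the shared folder (the paths are an example):

```php
<?php
// app/AppKernel.php
use Symfony\Component\HttpKernel\Kernel;

class AppKernel extends Kernel
{
    // ... registerBundles() etc. ...

    public function getCacheDir()
    {
        if (in_array($this->environment, array('dev', 'test'))) {
            return '/dev/shm/myapp/cache/'.$this->environment;
        }

        return parent::getCacheDir();
    }

    public function getLogDir()
    {
        if (in_array($this->environment, array('dev', 'test'))) {
            return '/dev/shm/myapp/logs';
        }

        return parent::getLogDir();
    }
}
```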

The cause: JMSDiExtraBundle

After some profiling I discovered there were so many calls originating from JMSDiExtraBundle that I tried disabling the bundle. And guess what: loading time dropped to a whopping 200ms!

The real problem was the way the bundle was configured:

This causes the bundle to search through all your PHP files in those locations. Apparently in the old situation (PHP 5.3 and CentOS) this wasn’t as problematic as in the new situation (PHP-FPM 5.5, Debian).

Speed up your data migration with Spork

One of the blogs I like to keep an eye on is Kris Wallsmith’s personal blog. He is a Symfony2 contributor and also the author of Assetic and Buzz. Last year he wrote about a new experimental project called Spork: a wrapper around pcntl_fork that abstracts away the complexities of spawning child processes with PHP. The article was very interesting, although I didn’t have any valid use case to try the library out. That was, until today.

It so happened that we were preparing a rather large data migration for an application with approximately 17,000 users. The legacy application stored the passwords in an unsafe way – plaintext – so we had to encrypt them all during the migration. Our weapon of choice was bcrypt, and the BlowfishPasswordEncoderBundle made implementing it easy. Using bcrypt did introduce a new problem, though: encoding all these records would take a lot of time! That’s where Spork comes in!

Setting up the Symfony2 migration Command

If possible I wanted to fork between 8 and 15 processes to gain maximum speed. We’ll run the command on a VPS with 8 virtual cores, so I want to stress the machine as much as possible ;). Unfortunately the example on GitHub, as well as the one on his blog, didn’t function any more, so I had to dig in just a little bit. I came up with this to get the forking working:
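A simplified sketch of the command (entity names and the actual password encoding are placeholders; the Spork API is used as it was at the time):

```php
<?php

use Spork\Batch\Strategy\ChunkStrategy;
use Spork\ProcessManager;
use Symfony\Bundle\FrameworkBundle\Command\ContainerAwareCommand;
use Symfony\Component\Console\Input\InputInterface;
use Symfony\Component\Console\Output\OutputInterface;

class MigratePasswordsCommand extends ContainerAwareCommand
{
    protected function configure()
    {
        $this->setName('migrate:passwords');
    }

    protected function execute(InputInterface $input, OutputInterface $output)
    {
        $query = $this->getContainer()->get('doctrine')->getManager()
            ->createQuery('SELECT u FROM AppBundle:User u');

        $users = $query->getResult();
        $forks = 8;

        $manager = new ProcessManager();

        // The ChunkStrategy splits the user list into $forks chunks, each
        // processed in its own child process.
        $manager->process($users, function ($user) {
            // runs in a child process: bcrypt the plaintext password here
        }, new ChunkStrategy($forks));

        // Block until all children are done.
        $manager->wait();

        $output->writeln('All done.');
    }
}
```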

The command generates the following output:

Make it a little bit more dynamic

To be really useful I’ve added some parameters so we can control the behavior a little more. As I mentioned before, I wanted to control the number of forks, so I added an option for this. This value needs to be passed on to the constructor of the ChunkStrategy:
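Roughly (a fragment of the command sketched above):

```php
<?php

use Symfony\Component\Console\Input\InputOption;

// in configure():
$this->addOption('forks', 'f', InputOption::VALUE_REQUIRED, 'Number of forks', 8);

// in execute():
$forks = (int) $input->getOption('forks');
$manager->process($users, $callback, new ChunkStrategy($forks));
```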

I also added a max parameter so we can run some tests on a small set of users instead of the whole database. When set, I pass it on to the setMaxResults method of the $query object.

Storing the results in MySQL: Beware!

In Symfony2 projects, storing and reading data from the database is pretty straightforward using Doctrine2. However, when you start forking your PHP process, keep in mind the following:

  1. all the forks share the same database connection;
  2. when the first fork exits, it will also close the database connection;
  3. database operations in running forks will yield: General error: 2006 MySQL server has gone away

This is a known problem. In order to fix it, I create and close a new connection in each fork:
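A sketch of that fix inside the fork callback:

```php
<?php
$doctrine = $this->getContainer()->get('doctrine');

$manager->process($users, function ($user) use ($doctrine) {
    $connection = $doctrine->getConnection();

    // Drop the connection inherited from the parent and open a fresh one
    // for this child process.
    $connection->close();
    $connection->connect();

    // ... encode the password and flush ...

    $connection->close();
}, new ChunkStrategy($forks));
```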

That’s basically it. Running this command on a VPS comparable with c1.xlarge Amazone EC2 server did speed up things a lot. So if you’re also working on a import job like this which can be split up in separate tasks you know what to do… Give Spork a try! It’s really easy, I promise.

UPDATE 2013-03-19
As stated in the comments by Kris, you should close the connection just before forking. Example of how to do this:
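A sketch of that variant: close the parent’s connection right before forking, so every child lazily opens its own connection on first use:

```php
<?php
$em = $this->getContainer()->get('doctrine')->getManager();
$em->getConnection()->close();

$manager->process($users, function ($user) use ($em) {
    // the connection is re-established automatically in this fork
    // ... encode the password, $em->flush(); ...
}, new ChunkStrategy($forks));
```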