Speed up your data migration with Spork

One of the blogs I like to keep an eye on is Kris Wallsmith’s personal blog. He is a Symfony2 contributor and also the author of Assetic and Buzz. Last year he wrote about a new experimental project called Spork: a wrapper around pcntl_fork that abstracts away the complexities of spawning child processes in PHP. The article was very interesting, although I didn’t have a valid use case to try the library out. That was, until today.

As it happens, we were preparing a rather large data migration for an application with approximately 17,000 users. The legacy application stored the passwords in an unsafe way (plaintext), so we had to hash them all during the migration. Our weapon of choice was bcrypt, and the BlowfishPasswordEncoderBundle made implementing it easy. Using bcrypt did introduce a new problem: encoding all these records would take a lot of time! That’s where Spork comes in!

Setting up the Symfony2 migration Command

If possible, I wanted to fork between 8 and 15 processes to gain maximum speed. We’ll run the command on a VPS with 8 virtual cores, so I want to stress the machine as much as possible ;) . Unfortunately, neither the example on GitHub nor the one on his blog worked any more, so I had to dig in just a little bit. I came up with this to get the forking working:

The command generates the following output:

ricbra@goku:/var/www/wb$ app/console wb:generate:password
Greeting from 3743 with id 1
Greeting from 3743 with id 2
Greeting from 3743 with id 3
Greeting from 3744 with id 4
Greeting from 3744 with id 5
Greeting from 3744 with id 6
Greeting from 3745 with id 7
Greeting from 3745 with id 8
Greeting from 3745 with id 9
Greeting from 3746 with id 10
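
The output above shows ten ids spread over four PIDs in runs of three, three, three and one. That is consistent with splitting the work into chunks of ceil(10 / 4) = 3 items. The arithmetic can be sketched in plain shell. The function name chunk_ranges is made up for this illustration, and it is an assumption (based only on the output above) that Spork’s chunking behaves this way:

```shell
#!/bin/sh
# Print "chunk N: first-last" for each chunk when `total` items are split
# over `forks` workers, assuming a chunk size of ceil(total / forks).
# Assumes forks >= 1.
chunk_ranges() (
    total=$1
    forks=$2
    size=$(( (total + forks - 1) / forks ))   # integer ceiling division
    chunk=1
    first=1
    while [ "$first" -le "$total" ]; do
        last=$(( first + size - 1 ))
        [ "$last" -gt "$total" ] && last=$total
        echo "chunk $chunk: $first-$last"
        chunk=$(( chunk + 1 ))
        first=$(( last + 1 ))
    done
)

chunk_ranges 10 4
```

With 10 items and 4 forks this prints the ranges 1-3, 4-6, 7-9 and 10-10, the same grouping as the PIDs in the output.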

Make it a little bit more dynamic

To make it really useful I’ve added some parameters so we can control the behavior a little more. As I mentioned before, I wanted to control the number of forks, so I added an option for this. Its value needs to be passed on to the constructor of the ChunkStrategy:

namespace Netvlies\AcmeMigrationBundle\Command;
class GeneratePasswordCommand extends ContainerAwareCommand
{
    protected function configure()
    {
        $this
            // ... name and other options ...
            ->addOption('forks', 'f', InputOption::VALUE_REQUIRED, 'How many children to spawn', 4)
        ;
    }

    protected function execute(InputInterface $input, OutputInterface $output)
    {
        $forks = (int) $input->getOption('forks');

        $manager = new ProcessManager(new EventDispatcher(), true);
        $strategy = new ChunkStrategy($forks);
        $manager->process($iterator, $callback, $strategy);
    }
}

I also added a max parameter so we can run some tests on a small set of users, instead of the whole database. When set I pass it on to the setMaxResults method of the $query object.

Storing the results in MySQL: Beware!

In Symfony2 projects, storing and reading data from the database is pretty straightforward using Doctrine2. However, when you start forking your PHP process, keep the following in mind:

  1. all the forks share the same database connection;
  2. when the first fork exits, it will also close the database connection;
  3. database operations in running forks will yield: General error: 2006 MySQL server has gone away

This is a known problem. In order to fix this problem I create and close a new connection in each fork:

That’s basically it. Running this command on a VPS comparable to a c1.xlarge Amazon EC2 server did speed things up a lot. So if you’re also working on an import job like this which can be split up in separate tasks, you know what to do… Give Spork a try! It’s really easy, I promise.

UPDATE 2013-03-19
As stated in the comments by Kris, you should close the connection just before forking. Example of how to do this:

Symfony2 authentication provider: authenticate against webservice

The past few days I have really been struggling with the Symfony2 security component. It is the most complex component of Symfony2 if you ask me! On the symfony.com website there is a pretty neat cookbook article about creating a custom authentication provider. Despite the fact that it covers the subject pretty well, it lacks support for form-based authentication use cases. In the current Symfony2 project I’m working on, we’re dealing with a web service that we need to authenticate against. So the cookbook article was unfortunately nothing more than a good introduction.

Using DaoAuthenticationProvider as example

Since we don’t want to reinvent the wheel, a good place to start is by investigating the providers that are in the Symfony2 core. The DaoAuthenticationProvider is a very good example, and it is used by the default form login. We are going to add a few pieces of code so we can reuse its listener and configuration settings. The only things we want to change are the authentication itself and the user provider. If you take a look at the DaoAuthenticationProvider source, you will see the only thing we need to change is the checkAuthentication method. But a few more steps are needed in order to make things function correctly. Let’s begin! :)

We also need a UserProvider!

First things first: we need a custom user provider. The task of the user provider is to load the user from a source so the authentication process can continue. Because a user can already be registered at the webservice, a traditional database user provider won’t work. We need to create a local record for every user that registers or logs in and doesn’t have an account yet. So basically the user provider is only responsible for loading and creating a user record. In this example I save the user immediately when there is no record; you probably want to do this after authenticating.

The code for the user provider looks like this:


namespace Acme\DemoBundle\Security\Core\User;

use Symfony\Component\Security\Core\User\UserProviderInterface;
use Symfony\Component\Security\Core\User\UserInterface;
use Symfony\Component\Security\Core\Exception\UsernameNotFoundException;
use Symfony\Component\Security\Core\Exception\UnsupportedUserException;
use Acme\DemoBundle\Service\Service;
use Acme\DemoBundle\Entity\User;
use Doctrine\ORM\EntityManager;

class WebserviceUserProvider implements UserProviderInterface
{
    private $service;
    private $em;

    public function __construct(Service $service, EntityManager $em)
    {
        $this->service  = $service;
        $this->em       = $em;
    }

    public function loadUserByUsername($username)
    {
        // Do we have a local record?
        if ($user = $this->findUserBy(array('email' => $username))) {
            return $user;
        }

        // Try service
        if ($record = $this->service->getUser($username)) {
            // Set some fields
            $user = new User();
            return $user;
        }

        throw new UsernameNotFoundException(sprintf('No record found for user %s', $username));
    }

    public function refreshUser(UserInterface $user)
    {
        return $this->loadUserByUsername($user->getUsername());
    }

    public function supportsClass($class)
    {
        return $class === 'Acme\DemoBundle\Entity\User';
    }

    protected function findUserBy(array $criteria)
    {
        $repository = $this->em->getRepository('Acme\DemoBundle\Entity\User');
        return $repository->findOneBy($criteria);
    }
}

We add it to our services configuration in app/config/services.yml:

<container xmlns="http://symfony.com/schema/dic/services"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://symfony.com/schema/dic/services http://symfony.com/schema/dic/services/services-1.0.xsd">

    <parameters>
        <parameter key="security.user.provider.acme.service.class">Acme\DemoBundle\Security\Core\User\WebserviceUserProvider</parameter>
        <parameter key="acme.service.class">Acme\DemoBundle\Service\Service</parameter>
    </parameters>

    <services>
        <service id="acme_demo_webservice" class="%acme.service.class%" />
        <service id="acme_demo_user_provider" class="%security.user.provider.acme.service.class%">
            <argument type="service" id="acme_demo_webservice" />
            <argument type="service" id="doctrine.orm.entity_manager" />
        </service>
    </services>
</container>

Creating the AuthenticationProvider

As I said earlier we are going to base our provider on the DaoAuthenticationProvider. In my bundle I created a new class called ServiceAuthenticationProvider. Like our example we are extending the abstract UserAuthenticationProvider. Besides the checkAuthentication method we also must implement the retrieveUser method. We inject the service through the constructor, so the class looks like this:

namespace Acme\DemoBundle\Security\Core\Authentication\Provider;

use Symfony\Component\Security\Core\Encoder\EncoderFactoryInterface;
use Symfony\Component\Security\Core\User\UserProviderInterface;
use Symfony\Component\Security\Core\User\UserCheckerInterface;
use Symfony\Component\Security\Core\User\UserInterface;
use Symfony\Component\Security\Core\Exception\UsernameNotFoundException;
use Symfony\Component\Security\Core\Exception\AuthenticationServiceException;
use Symfony\Component\Security\Core\Exception\BadCredentialsException;
use Symfony\Component\Security\Core\Authentication\Token\UsernamePasswordToken;
use Symfony\Component\Security\Core\Authentication\Provider\UserAuthenticationProvider;
use Acme\DemoBundle\Service\Service;

class ServiceAuthenticationProvider extends UserAuthenticationProvider
{
    private $encoderFactory;
    private $userProvider;
    private $service;

    /**
     * @param Service $service
     * @param \Symfony\Component\Security\Core\User\UserProviderInterface $userProvider
     * @param UserCheckerInterface $userChecker
     * @param $providerKey
     * @param EncoderFactoryInterface $encoderFactory
     * @param bool $hideUserNotFoundExceptions
     */
    public function __construct(Service $service, UserProviderInterface $userProvider, UserCheckerInterface $userChecker, $providerKey, EncoderFactoryInterface $encoderFactory, $hideUserNotFoundExceptions = true)
    {
        parent::__construct($userChecker, $providerKey, $hideUserNotFoundExceptions);
        $this->encoderFactory   = $encoderFactory;
        $this->userProvider     = $userProvider;
        $this->service          = $service;
    }

    /**
     * {@inheritdoc}
     */
    protected function checkAuthentication(UserInterface $user, UsernamePasswordToken $token)
    {
        $currentUser = $token->getUser();

        if ($currentUser instanceof UserInterface) {
            if ($currentUser->getPassword() !== $user->getPassword()) {
                throw new BadCredentialsException('The credentials were changed from another session.');
            }
        } else {
            if (!$presentedPassword = $token->getCredentials()) {
                throw new BadCredentialsException('The presented password cannot be empty.');
            }

            // This is where the webservice decides whether the credentials are valid
            if (! $this->service->authenticate($token->getUser(), $presentedPassword)) {
                throw new BadCredentialsException('The presented password is invalid.');
            }
        }
    }

    /**
     * {@inheritdoc}
     */
    protected function retrieveUser($username, UsernamePasswordToken $token)
    {
        $user = $token->getUser();
        if ($user instanceof UserInterface) {
            return $user;
        }

        try {
            $user = $this->userProvider->loadUserByUsername($username);

            if (!$user instanceof UserInterface) {
                throw new AuthenticationServiceException('The user provider must return a UserInterface object.');
            }

            return $user;
        } catch (UsernameNotFoundException $notFound) {
            throw $notFound;
        } catch (\Exception $repositoryProblem) {
            throw new AuthenticationServiceException($repositoryProblem->getMessage(), $token, 0, $repositoryProblem);
        }
    }
}

Note the call to $this->service->authenticate where the magic happens. The retrieveUser method receives a User instance from our user provider. Although this is not really clear in the code above, it will be after configuration in the service container. We use the configuration from the Symfony core and adjust it to our needs:

<service id="security.authentication_provider.acme_demo_webservice" class="%security.authentication.provider.acme_service.class%" abstract="true" public="false">
  <argument type="service" id="acme_demo_webservice" />
  <argument /> <!-- User Provider -->
  <argument type="service" id="security.user_checker" />
  <argument /> <!-- Provider-shared Key -->
  <argument type="service" id="security.encoder_factory" />
</service>

Please note the empty arguments. Looks a bit strange, huh? These will be magically filled in by our Factory when the container is built! This is a bit tricky, and the cookbook explains it pretty well, so I suggest taking a look there. We are extending the FormLoginFactory because we want to change it a bit:


namespace Acme\DemoBundle\DependencyInjection\Factory;

use Symfony\Component\Config\Definition\Builder\NodeDefinition;
use Symfony\Component\DependencyInjection\DefinitionDecorator;
use Symfony\Component\DependencyInjection\ContainerBuilder;
use Symfony\Component\DependencyInjection\Reference;
use Symfony\Bundle\SecurityBundle\DependencyInjection\Security\Factory\FormLoginFactory;

class SecurityFactory extends FormLoginFactory
{
    public function getKey()
    {
        return 'webservice-login';
    }

    protected function getListenerId()
    {
        return 'security.authentication.listener.form';
    }

    protected function createAuthProvider(ContainerBuilder $container, $id, $config, $userProviderId)
    {
        $provider = 'security.authentication_provider.acme_demo_webservice.'.$id;
        $container
            ->setDefinition($provider, new DefinitionDecorator('security.authentication_provider.acme_demo_webservice'))
            ->replaceArgument(1, new Reference($userProviderId))
            ->replaceArgument(3, $id)
        ;

        return $provider;
    }
}

Add the builder in the Acme\DemoBundle\AcmeDemoBundle.php file:


namespace Acme\DemoBundle;

use Symfony\Component\HttpKernel\Bundle\Bundle;
use Symfony\Component\DependencyInjection\ContainerBuilder;
use Acme\DemoBundle\DependencyInjection\Factory\SecurityFactory;

class AcmeDemoBundle extends Bundle
{
    public function build(ContainerBuilder $container)
    {
        $extension = $container->getExtension('security');
        $extension->addSecurityListenerFactory(new SecurityFactory());
    }
}

Finally, change your security config:

jms_security_extra:
    secure_all_services: false
    expressions: true

security:
    encoders:
        Symfony\Component\Security\Core\User\User: plaintext

    role_hierarchy:
        ROLE_ADMIN:       ROLE_USER

    providers:
        acme_provider:
            id: acme_demo_user_provider

    firewalls:
        dev:
            pattern:  ^/(_(profiler|wdt)|css|images|js)/
            security: false

        login:
            pattern:  ^/demo/secured/login$
            security: false

        secured_area:
            pattern:    ^/demo/secured/
            webservice-login:
                check_path: /demo/secured/login_check
                login_path: /demo/secured/login
                provider: acme_provider
            logout:
                path:   /demo/secured/logout
                target: /demo/


The webservice-login key activates our authentication provider. The user provider is defined under providers as acme_provider with the corresponding service id.
I used the AcmeDemo bundle from the symfony-standard repository, so you can just copy and paste most of my code to see everything in action! The only thing you need to provide yourself is a dummy webservice.

Happy coding!

Western Digital Green Caviar WD10EADS and hdparm problems

A few weeks ago I ordered myself a new eco-friendly home server. The machine will be acting as my content server: running samba, sabnzbd, nginx and mysql. Most of the time it will be idling, so my goal was to build a server with energy-saving hardware. The old server already had a 1TB Western Digital Green Caviar drive in it, which I bought in 2010. So I placed an order for the following parts:

  1. MSI H61M-P22 motherboard
  2. Intel Pentium G620
  3. Transcend JetRam JM1333KLN-2G
  4. be quiet! Pure Power L7 300W

Within a few days I received the whole shipment, and a few hours later my server was up and running again :) . After installing and configuring Debian Squeeze I measured the power usage: 22 ~ 25 watt idle, not bad! I didn’t tweak anything at all, so I started my journey. Spinning down the disk after a period of idling was my next goal. For Linux users, hdparm is the tool which gets the job done. On Debian or Ubuntu all you need to do is:

# apt-get install hdparm

To configure your drive you have to know its name. One way to figure that out is:

# fdisk -l

Output on my system:

root@vegeta:~# fdisk -l

Disk /dev/sda: 1000.2 GB, 1000204886016 bytes
255 heads, 63 sectors/track, 121601 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x0c8b0915

   Device Boot      Start         End      Blocks   Id  System

Disk /dev/sdb: 16.0 GB, 16022241280 bytes
255 heads, 63 sectors/track, 1947 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x000b2232

   Device Boot      Start         End      Blocks   Id  System
/dev/sdb1   *           1        1863    14957568   83  Linux
/dev/sdb2            1863        1948      686081    5  Extended
/dev/sdb5            1863        1948      686080   82  Linux swap / Solaris

The /dev/sdb device is the USB key running the OS, while /dev/sda is my WD Green Caviar drive.
To set the spindown time all you have to do is:

# hdparm -S 5 /dev/sda

 setting standby to 5 (25 seconds)
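
The value passed to -S is not a number of seconds but an encoded timeout. According to the hdparm man page, values 1 to 240 are multiples of 5 seconds and values 241 to 251 are units of 30 minutes. A small sketch of that arithmetic (the function name standby_seconds is made up for this example, and the special values 252 to 255 are not decoded):

```shell
#!/bin/sh
# Translate an `hdparm -S` value into a standby timeout in seconds,
# following the encoding described in the hdparm man page.
standby_seconds() (
    v=$1
    if [ "$v" -eq 0 ]; then
        echo "disabled"                   # 0 turns the standby timer off
    elif [ "$v" -le 240 ]; then
        echo $(( v * 5 ))                 # 1..240: multiples of 5 seconds
    elif [ "$v" -le 251 ]; then
        echo $(( (v - 240) * 30 * 60 ))   # 241..251: units of 30 minutes
    else
        echo "special"                    # 252..255: special/vendor values
    fi
)

standby_seconds 5      # 25, matching the hdparm output above
standby_seconds 242    # 3600, i.e. one hour
```

So a 10 minute spindown would be -S 120, and anything longer than 20 minutes needs a value above 240.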

Of course you can change the timeout if you want. However, after waiting 25 seconds I did a status check:

# hdparm -C /dev/sda

 drive state is:  active/idle

What the? :( It obviously didn’t work! After trying some more timeouts I concluded it really didn’t work. Googling for my drive and hdparm, I quickly found a lot of other people who ran into the same problem. Furthermore, I discovered that Linux and the old Western Digital Green Caviar drives don’t play well with each other. To summarize: the drive parks its heads after 8 seconds of idle time! This causes a very high Load_Cycle_Count. To check this, you have to install the smartctl utility:

# apt-get install smartmontools

Then, check the S.M.A.R.T. data for your drive:

# smartctl -a /dev/sda
smartctl 5.40 2010-07-12 r3124 [i686-pc-linux-gnu] (local build)
Copyright (C) 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net
Vendor Specific SMART Attributes with Thresholds:
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0027   105   088   021    Pre-fail  Always       -       7741
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       453
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   088   088   000    Old_age   Always       -       9144
 10 Spin_Retry_Count        0x0032   100   100   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   100   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       399
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       40
193 Load_Cycle_Count        0x0032   187   187   000    Old_age   Always       -       41449
194 Temperature_Celsius     0x0022   117   100   000    Old_age   Always       -       30

Running this command repeatedly over a couple of minutes, I saw the number growing rapidly: every 3 minutes a new hit. Not good! :(
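
Watching the counter by hand gets tedious, so here is a minimal sketch for pulling out just the raw Load_Cycle_Count and turning two readings into a rate. It assumes the column layout shown in the output above (raw value in the 10th column); the function names are made up for this example:

```shell
#!/bin/sh
# Read `smartctl -A` style output on stdin and print the raw value
# (10th column) of the Load_Cycle_Count attribute.
load_cycle_count() {
    awk '$2 == "Load_Cycle_Count" { print $10 }'
}

# Average load cycles per hour (rounded down), given two readings
# and the number of seconds between them.
cycles_per_hour() (
    before=$1
    after=$2
    seconds=$3
    echo $(( (after - before) * 3600 / seconds ))
)

# On a real system (needs root and smartmontools):
#   smartctl -A /dev/sda | load_cycle_count
```

Take one reading, wait a while, take another and feed both to cycles_per_hour to see how quickly the heads are being parked.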

Fix the drive for Linux usage

Western Digital published an advisory about this problem. Basically, we as Linux users are left in the dark. They provide a tool to reset or reconfigure the timeout, but it only runs on MS-DOS…

The good news is that the utility is present on the Ultimate Boot CD. Burn the ISO on a disc (or make a bootable USB key) and remove the timeout with:

wdidle3.exe /D

After that your drive will pay attention to the settings provided by hdparm, and the Load_Cycle_Count won’t grow that rapidly. The count on my server grows by 2 counts per day, instead of ~ 200! :) And when the drive is in standby my server consumes 18 ~ 20 watt!

How to create a VM with PHP 5.4 using Vagrant and Puppet

Every PHP developer who didn’t live under a rock the past few months must have heard of the upcoming release of PHP 5.4. Well, on March 1 it was finally there: the official release of PHP 5.4!

Because it will definitely take some time before we can install it with our favorite package manager, I decided to create a small Puppet manifest in combination with Vagrant that will build a virtual machine. Normally, you have to compile PHP from source in order to try it this quickly after a release. However, the nice dudes from dotdeb.org have already compiled it for us and provide it via their repository. Nice! :)
Furthermore, Vagrant provides us a cool Ubuntu server image, ready to rock with Puppet. So, let’s get things off the ground, shall we? (pro tip: scroll all the way down to simply clone my git repository with all the code ;) )


In order to get things running smoothly you need to have:

  1. Installed VirtualBox 4.1.x
  2. Installed Vagrant
  3. Some IDE for editing Puppet manifests (I prefer Geppetto)

Creating our project structure

Let’s start with creating a basic directory structure for storing our files needed. Fire up Eclipse/Geppetto and start a new project in your workspace. Create the following structure:

  • manifests
  • modules
    • php54
      • files
      • manifests
  • www

Writing the Puppet manifest

There are a few things we need to accomplish with Puppet, in chronological order:

  1. Add the dotdeb.org repository to /etc/apt/sources.list
  2. Add the dotdeb.org GPG key
  3. Run apt-get update
  4. Run apt-get install php5

Because Puppet makes it easy to copy files into the VM, I chose to supply a modified sources.list and let Puppet take care of copying it into the VM. Then I download the GPG key with the famous wget utility and pipe it into apt-key. The exec call to apt-get update speaks for itself, and last but not least I tell Puppet to install the latest php5 package.

With the require directive I make sure that all commands are executed in the right order.
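
For comparison, the same four steps done by hand would look roughly like the sketch below. Note that it is not what the manifest does: the manifest replaces sources.list as a whole, while this sketch only appends the dotdeb line if it is missing. The helper name add_apt_source is made up for this example; the repository line and key URL are the ones used elsewhere in this post:

```shell
#!/bin/sh
# Append an APT source line to a sources.list-style file unless the
# exact line is already present, so repeated runs do not duplicate it.
add_apt_source() (
    line=$1
    file=$2
    touch "$file"
    grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
)

# Manual equivalent of the four steps (run as root inside the VM):
#   add_apt_source "deb http://packages.dotdeb.org/ squeeze-php54 all" /etc/apt/sources.list
#   wget -q http://www.dotdeb.org/dotdeb.gpg -O - | apt-key add -
#   apt-get update
#   apt-get install php5
```

The order matters here just like it does in Puppet: apt-get update only picks up the PHP 5.4 packages once both the repository line and the GPG key are in place.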

The contents of the init.pp file in the php54 module look like this:

class php54 {
	file { "/etc/apt/sources.list":
		ensure => file,
		owner => root,
		group => root,
		source => "puppet:///modules/php54/sources.list",
	}

	exec { "import-gpg":
		command => "/usr/bin/wget -q http://www.dotdeb.org/dotdeb.gpg -O - | /usr/bin/apt-key add -"
	}

	exec { "/usr/bin/apt-get update":
		require => [File["/etc/apt/sources.list"], Exec["import-gpg"]],
	}

	package { [
		"php5",
	] :
		ensure => latest,
		require => Exec["/usr/bin/apt-get update"]
	}
}

Also we create a sources.list file in the “files” directory (you could change the Debian mirrors):

deb http://mirror.bytemark.co.uk/debian/ squeeze main
deb-src http://mirror.bytemark.co.uk/debian/ squeeze main

deb http://security.debian.org/ squeeze/updates main
deb-src http://security.debian.org/ squeeze/updates main

deb http://mirror.bytemark.co.uk/debian/ squeeze-updates main
deb-src http://mirror.bytemark.co.uk/debian/ squeeze-updates main

deb http://packages.dotdeb.org/ squeeze-php54 all

Last thing I do is create the entry point for Puppet, namely the site.pp file in the manifests directory:

include php54

All I do is include the php54 module, which will handle all the magic for us.

Creating the virtual machine

Now Vagrant comes into play. Create a Vagrantfile in your project root with the following content:

Vagrant::Config.run do |config|
  config.vm.box = "squeeze32"
  config.vm.host_name = "php54"
  # taken from vagrantbox.es
  config.vm.box_url = "http://mathie-vagrant-boxes.s3.amazonaws.com/debian_squeeze_32.box"
  config.vm.boot_mode = :gui
  config.vm.network :hostonly, ""
  config.vm.share_folder "www", "/var/www", "./www"

  config.vm.provision :puppet do |puppet|
    puppet.manifests_path = "manifests"
    puppet.module_path = "modules"
    puppet.manifest_file = "site.pp"
  end
end

I’m using a Debian Squeeze box from vagrantbox.es here, credits go to the original author. I’m making use of the VirtualBox shared folders. These are not really fast, but will do for testing purposes. If you want some more advanced sharing I suggest NFS or Samba if you are on Windows.

Now all that’s left to do is start the VM. Open up a terminal and run vagrant up in your project root:

$ vagrant up

Navigate to the VM’s address with your favorite browser and have some happy testing :)

For all the lazy people out there, you can start the box with just 3 commands:

$ git clone git://github.com/ricbra/php54-sandbox.git
$ cd php54-sandbox
$ vagrant up

How to use Symfony2 entities from a bundle in vendor/bundles

Today I was working on a bundle for generating invoices. Since we have built some kind of invoice functionality in many projects in the past, my goal was to create a nice reusable bundle.
So I started with creating an empty bundle and moved it to /vendor/bundles/Netvlies/Bundle/InvoiceBundle.

With app/console I started generating an entity for persisting the data of an invoice:

$ app/console doctrine:generate:entity

This is still all pretty straightforward… so to complete this difficult task I tried to update the schema:

$ app/console doctrine:schema:update --force
Nothing to update - your database is already in sync with the current entity metadata.

Hmmz, wtf? :( For some reason it was ignoring my new bundle outside the src structure. After a little investigation I discovered you have to add it to your ORM mapping like this:

doctrine:
    orm:
        auto_generate_proxy_classes: %kernel.debug%
        default_entity_manager: default
        entity_managers:
            default:
                connection: default
                mappings:
                    NetvliesSomeAppBundle: ~
                    # add it like this
                    NetvliesInvoiceBundle: ~

Creating a CentOS 6.2 base box for Vagrant

One of the cool things I stumbled upon last year at the Dutch PHP Conference was Vagrant. After a little experimenting I was convinced: this is the right tool for our development environment!

Since we’re running CentOS at the web agency I work for, I soon started searching for a nice base box to build upon. Not satisfied by the boxes available, I decided to create a base box myself.
Today we decided to switch to CentOS 6 for all our new boxes, so I had to build a new image for our developers to build on with Puppet and Vagrant. Since I had this free hosting account from Combell sponsored at PHP Benelux Conference I thought it would be nice to give something back to “the community” by writing my first blog post :) .

This tutorial assumes you have installed Virtual Box. First of all, we start with downloading an ISO image so we can install a fresh instance of CentOS. Pick a mirror nearby and download the right image. We’ll be using the netinstall ISO since we want to keep the size of the image as small as possible.
I hear you thinking: why doesn’t he use the minimal ISO if size matters? Believe me, the minimal is *really* minimal. Too minimal if you ask me!

While the ISO is downloading, let’s fire up Virtual Box and create a new virtual machine. Choose the name you want and set OS to “Linux” and version to “Red Hat”. Also create a virtual disk with the desired space and pick “Dynamically allocated”.
Once you’re done with creating the VM, don’t forget to disable audio and USB. Also make sure you set the base memory to something like 700 MB. Otherwise the GUI installer won’t work, and you get the text installer which is limited!


Next thing is to fire up the VM, and Virtual Box’s “First Run Wizard” will pop up. Pick the ISO you just downloaded and click “Start”. After it’s booted, choose the option for installation and hit “return”. If all went fine, the installer will pop up. A few things to keep in mind here:

  • disable ipv6
  • select HTTP installation method and enter a mirror nearby; for using the Dutch Leaseweb mirror like I did you enter “http://mirror.nl.leaseweb.net/centos/6.2/os/i386/” (just replace the hostname with your preferred mirror’s hostname)

[Screenshot: CentOS netinstall mirror]

After the kernel is downloaded, you’ll see the GUI installer.
Follow the wizard and select partition layout (I use the default settings).
A few important things:

  • Set vagrant as the root password
  • Set vagrant-centos62 as hostname (Vagrant conventions)
  • In the software selection window make sure you choose minimal as the set, and also choose “Customize now” at the software selection:

[Screenshot: software selection screen]

In the next window unselect all packages (only one is selected if I remember correctly). After that you’re done, and the wizard will start downloading and installing the box.

Once it’s done you’ll be prompted to reboot. Before rebooting, make sure you remove the netinstall ISO as CD attachment (in the “storage” settings). Also, to make things easier during the configuration of our box, forward the SSH port, host port 2222 to guest port 22 (select “Network,” “Adapter 1,” then “advanced settings” and “port forwarding”):
[Screenshot: port forwarding settings in VirtualBox]
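
The same forwarding can presumably be set up from the command line with VBoxManage instead of the GUI. A minimal sketch, assuming host port 2222 (the port used for SSH further down) maps to guest port 22, the default sshd port. The rule name guestssh and the helper natpf_rule are made up for this example:

```shell
#!/bin/sh
# Build a NAT port-forwarding rule string in the form VBoxManage expects:
#   name,protocol,host-ip,host-port,guest-ip,guest-port
# Host and guest IP are left empty, so the rule applies to all addresses.
natpf_rule() {
    printf '%s,tcp,,%s,,%s\n' "$1" "$2" "$3"
}

natpf_rule guestssh 2222 22    # prints: guestssh,tcp,,2222,,22

# With the VM powered off, the rule could be applied like this
# (centos62-32 is the VM name used later in this post):
#   VBoxManage modifyvm "centos62-32" --natpf1 "$(natpf_rule guestssh 2222 22)"
```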
Now boot the VM (don’t forget to enjoy the new animated boot screen ;) ).

Configuration for Vagrant

Once booted, connect to your VM via SSH:

ssh root@localhost -p 2222

Since there’s barely anything on the machine right now, I start with installing my favorite editor and some other stuff we’ll need:

yum install nano wget gcc bzip2 make kernel-devel-`uname -r`

Next we are going to install the VirtualBox Guest Additions. Click on your VirtualBox window and select “Devices” and “Install Guest Additions”. The option is in the VM window, not in the VirtualBox control panel. Install them like this (ignore the errors you get; they appear because we aren’t running any fancy GUI):

mkdir /media/cdrom
mount /dev/cdrom /media/cdrom
sh /media/cdrom/VBoxLinuxAdditions.run

Because we’ll be provisioning the VM with Puppet, we start with downloading the EPEL RPM package:

wget http://download.fedoraproject.org/pub/epel/6/i386/epel-release-6-5.noarch.rpm

Add it:

rpm -ivh epel-release-6-5.noarch.rpm

Verify with:

yum repolist

Then install Puppet:

yum install puppet

I personally prefer installing Puppet with yum, but you could also install it via gems or any of the other methods on the official installation guide. Installing with yum auto resolves dependencies, and with CentOS 6 we don’t have an ancient Ruby version anymore ;) .

In order to keep things speedy, add the following line to the /etc/ssh/sshd_config file (it will disable DNS lookups):

UseDNS no

Add vagrant user and set permissions

We’re almost there. Only thing left to do is add the vagrant user so Vagrant can log in and build our box.
Start with creating the user and adding it to the “admin” group (set the password to vagrant, as stated in the Vagrant base box documentation):

groupadd admin
useradd -G admin vagrant
passwd vagrant

Now we only have to make some changes to the sudoers file. Do this with visudo (or manually edit /etc/sudoers, discouraged):

visudo

There are a couple of things that need to be changed:

  • Add SSH_AUTH_SOCK to the env_keep option
  • Find the line with Defaults requiretty and disable it by placing a # in front
  • Add the line %admin ALL=NOPASSWD: ALL so that the vagrant user can sudo without password
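
If you would rather script these three changes than make them by hand, something like the sketch below could work. It edits a copy of the file so the result can be checked with visudo before it replaces the real one. The function name patch_sudoers and the /tmp path are made up for this example, the function is not idempotent, and it assumes the requiretty line is written as “Defaults requiretty” at the start of a line:

```shell
#!/bin/sh
# Apply the three sudoers changes to the file given as $1.
# Meant to be run on a copy, which is then checked with `visudo -c -f`.
patch_sudoers() (
    file=$1
    tmp=$(mktemp)
    # 1. Disable "Defaults requiretty" by commenting it out.
    sed 's/^Defaults[[:space:]][[:space:]]*requiretty/# &/' "$file" > "$tmp"
    # 2. Keep SSH_AUTH_SOCK in the environment when using sudo.
    printf '%s\n' 'Defaults env_keep += "SSH_AUTH_SOCK"' >> "$tmp"
    # 3. Let members of the admin group sudo without a password.
    printf '%s\n' '%admin ALL=NOPASSWD: ALL' >> "$tmp"
    cat "$tmp" > "$file"
    rm -f "$tmp"
)

# Possible usage as root:
#   cp /etc/sudoers /tmp/sudoers.new
#   patch_sudoers /tmp/sudoers.new
#   visudo -c -f /tmp/sudoers.new && cp /tmp/sudoers.new /etc/sudoers
```

Checking the copy first matters because a syntax error in /etc/sudoers can lock you out of sudo entirely.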

Last but not least we’re going to add the public key so Vagrant can easily SSH into our box. Log in as the vagrant user:

ssh vagrant@localhost -p 2222
mkdir .ssh
curl -k https://raw.github.com/mitchellh/vagrant/master/keys/vagrant.pub > .ssh/authorized_keys
chmod 0755 .ssh
chmod 0644 .ssh/authorized_keys

Please note that I’m using the public insecure pair as described on the readme. If you’re not planning to share the box you probably want to use the config.ssh.private_key_path option in your Vagrantfile.

Package the box

Now first let’s clean up:

yum clean all

Shut down the box and package it. Replace centos62-32 with the name of your VM:

vagrant package --base centos62-32

Optionally you can also add a Vagrantfile into your base box.