Friday, March 20, 2015

HippyVM goes to Y Combinator and fails

tl;dr: We decided to go to Y Combinator with HippyVM, our high-performance PHP implementation, and we did not get through after two rounds of interviews.

But I suppose there is more to it, so keep reading....

The whole story started with a small disaster, but let's start at the beginning. We applied to Y Combinator a bit haphazardly in 2012 for the 2013 summer batch, without expecting to get as far as an interview. The main reason for me to apply was precisely the "7 ideas" talk that Paul Graham gave as a keynote at PyCon US, which mentioned the "sufficiently smart compiler". For those readers who don't know, PyPy is a fast Python compiler, but we also developed a language and a framework called RPython that's suitable for implementing fast dynamic languages, so we decided to check if it works for PHP, which is how HippyVM was born.

Well, I thought, we have a framework that's as close as it gets these days to a "sufficiently smart compiler", so I decided to submit -- why not. When we got the Y Combinator invitation, I was in Europe at the time, away from my usual place of residence, South Africa. We got tickets, went to the airport and.... it turned out my visa for the US had been left at home. Note: the US tries not to admit that it keeps visa information in any sort of system, so if you get a new passport you are either allowed to use your old passport or you need to apply for a new visa. There is no way to transfer a visa to a new passport. Oh well -- fortunately for the most part we live in the 21st century and a few calls, DHLs and tickets later, I landed in San Francisco for a weekend with the interview scheduled for Saturday. That ended up in 3h of being detained at SFO, since nobody flies to SF for a weekend carrying two sets of clothes, a laptop and a sleeping bag.

The idea

The idea was simple - we have enough expertise in compilers to do hard things and PHP is the most widely deployed dynamic language. Also, people are selling various "PHP optimizers" for money that don't really do much. We can do better. At the time HHVM was really not working very well and there was no other competition.

The actual interview

We actually ended up having two sets of interviews, which I think is pretty unusual. The first team was probably very confused, so they sent us down to the second one. The positive part of the interview was that people (at least those that use Python) generally recognize our work. The negative part was that 3 months is nowhere near enough to bring any tangible results in the compiler world. We required 1-2 years of work to provide anything tangible, and that does not fit into their model. Paul Buchheit asked us half-jokingly why all the cool compiler guys are from Europe (which is as far as I know not true, but Europe is overrepresented). I didn't have an answer at the time, but later that day it became blatantly obvious that it's all about long-term vision. Compilers take more time than Americans typically have in their sights. PyPy is 10 years old and it's "the new kid on the block". We were told we should be home cranking out code until we can get to something showable. I walked out of the interview pretty sure we would not get in.

The aftermath

Unsurprisingly, we didn't get in. We ended up having a very good one-day PyPy sprint in San Francisco. We do not fit in the model. Now this brings me to an interesting question, which is what Lars Bak told me -- there is no money in infrastructure like programming languages. Very few people are willing to invest in such companies, and the contenders these days are all Open Source without a decent funding model, or backed by a large corporation (Oracle, Google, Microsoft), or both. I have no idea how to go about sponsoring research like PyPy or building a business model around it. Although these projects bring a lot of value to the system (and I don't mean just PyPy; also CPython, Ruby etc.), there does not seem to be a good way to build a business model around them.

There are good reasons why you want your infrastructure to be either Open Source or backed by a large stable entity, and I'm very much for that. The world is a better place than it was during the ColdFusion days and we're all better off. However, we're missing a business model where infrastructure people can get attention from VCs and a revenue model that somehow corresponds to the value they're bringing.


HippyVM got a little funding at the beginning to get us to some sort of prototype. Within a bit over a year we got to a point where we were able to run MediaWiki with a significant speedup over Zend PHP. However, the HHVM team these days counts between 30 and 60 people (that's what I can guess from the photo) and HHVM is available for free. Sure, it's tied to Facebook, but it seems to be enough to deter any business in this area. We would not be able to outcompete HHVM by enough (usually enough is 2x faster) on real-life workloads with a fraction of the team and a fraction of their funding, so we went on to improving PyPy. We did achieve most of what HHVM does at a small percentage of the cost, but the difficulties in funding generally caused the HippyVM project to come to a stall.

What now?

I do consulting. Most of it is PyPy-related, so I'm pretty happy; however, I'm still trying to find a model where basic research and infrastructure work can provide revenue that is related to the amount of value it's bringing to companies. Ideas welcome :-)


  1. Excellent post, and too bad that it's pretty hard to build a business model around these open source VMs.
    Def added your blog to my rss :)

  2. I think the standard model with the open source companies is to either provide consulting around a framework that allows people to do X or some special sauce that enhances the workings (like Azul on the JVM.) If you could come out with a version of PyPy that overcame the limitations against using reflection or even the standard logging framework, people would pay for that. I would. PyPy is just awesome.

  3. Great post.
    I feel like the evolution of compilers is really driven by the corporation that mainly uses it, and of course can fund it. I hope that will change in the future.

  4. Hello Maciej, seems like we have a lot in common:

  5. HHVM is open source. Etsy, Wikipedia, Baidu, WP Engine, Box all use it. Being tied to Facebook is not a negative, it's actually a huge positive. They are running rings around the Zend team in what they are doing.

  6. You could consider crowdfunding.

  7. runs crowdfunding - success, moderate

  8. In case you haven't seen it, there's a great talk by Gary Bernhardt on this topic: Thanks so much for sharing the tale, you bring up interesting questions about the development of software moving forward.

  9. You guys have great assets if you want to start a startup. First, Python on iPhone/iPad or Android is a good business model, like RubyMotion.

    Second, package PyPy with Twisted/Tornado/Django + CffiDatabaseDrivers + (Spark), like Typesafe does for Scala. I think that until now lots of people don't know how to speed up their web applications using PyPy. Part of PyPy's computation speed advantage can easily be replaced by NumPy on CPython, so most people still don't use PyPy.

  10. Great post, thanks for sharing your experience. I'm interested in HippyVM succeeding. I'm an Open Source marketer and crowdfunder.

  11. The Python -> PHP bridge seems like it should have commercial applications.

    I'm a huge fan of using WordPress (especially in projects for clients), but for running data pipelines, and for building SOA-style applications in general, I'd far prefer to use Python. Obviously I can use a task queue or simple webhooks, but having a deeper level of integration between languages would be better.

    It would be great to be able to have Python handle data access and authorization and use PHP / WordPress to render and serve the data to the client.

    Mainly I'm interested because it seems like most of the focus on PyPy development has been for scientific research, and I think it has the potential to be a great web serving platform.

    Especially with the STM approach to removing the GIL in Python, having a thread pool of Python workers handle database and cache access (with tiered local and remote caches) seems like it would outperform a typical PHP framework.

  12. What about all the companies using Python2 still? Everyone is on the fence on what to do with their large codebases. I think a migration to PyPy4 makes more sense than CPython3.
    I think all you need to do is announce you're officially a permanent branch of Python2 and are offering support for it under PyPy4. You could even call the implementation and language PyPy4, both to avoid trademark issues going forward and to build the brand.

    The money would come in from corporations with feature requests, bugs, or integrating backports to PyPy4 from Python3. Along with your recent post about supporting the Python C API you have a very strong base.
    Eventually integrating PyPy4 with PyPy-STM will be quite the product. If you were to do this, I'd drop development of PyPy3 because CPython3 would be your main competition for users. But snatching up all their lost users would be a major coup. I honestly think you'd be more important, thus wealthy, than CPython3 could ever be, overnight.
    There's plenty of work there, but you could build your own features if you wanted. I think these corporations would pay handsomely.

    As a Python2 dev, I can say many of us are looking to you guys to lead us to a better Python.