Under the Hood – Chango

I am always curious about the technology that startups are running under the hood. It helps define the technologies that frontend and backend developers looking for jobs might start learning. It helps show what skills are being developed in the ecosystem and where competition, or coopetition, for talent may exist. And it surfaces the emerging trends in technology adoption by local firms.

I have enjoyed the coverage of the technology behind startups like BackType, the work of the team at BostonInno (in particular Kevin McCarthy’s The Tech Behind) and Phil Whelan’s writing in Vancouver. Dan Morel (@dpmorel) has started looking at the development processes used at Canadian startups (see: Supersize your Startup Dev Productivity & One is the Loneliest Number). If you are a Canadian startup building “interesting” technology, we’re actively looking for startups to profile.


Chango

We start the series with Chango and a technical Q&A with Chris Sukornyk (LinkedIn, @sukornyk) and Mazdak Rezvani (LinkedIn, @mazdak), looking at the technology and evolution of the Chango platform. Chango closed a $4.25MM Series B financing round led by Rho Canada that included iNovia Capital, Metamorphic Ventures, Extreme Venture Partners and others. The Chango team has doubled over the last 6 months to roughly 25 people (including Hot Sh!t List member Josh Davey).

The online advertising landscape is changing rapidly, and Chango is taking advantage of Real-Time Bidding (RTB) to let marketers buy ads on the web in real time. It means big data where time is critical, because small fluctuations in price or users can have a big impact on overall effectiveness or cost. The best analogy is a stock market where, instead of stocks, marketers buy billions of ads each day based on how they think a website’s ad slots will perform for them. The cool part about RTB is that you can combine data to put the right ad in front of the right user. This requires serious data-processing capability: Chango processes information on about 60,000 ads a second.

What does Chango do?

We are an ad-buying platform built for online marketers looking to find new customers using display (banner) ads. Our main goal is to eliminate wasted marketing dollars through smarter targeting. Specifically, we show ads to people based on what they’ve searched for in the past – a technique we are pioneering called “Search Retargeting”.

Unlike traditional ad networks, we exclusively buy ads using real-time bidding, a new technology that is rapidly changing the display advertising industry and allowing marketers to layer data on top of their media buys.

Chango is unique in that it has access to billions of search terms and billions of incoming ad impressions, and has devised machine learning techniques for combining data from different sources to deliver more efficient, better-targeted campaigns for our advertisers.

What does your product architecture look like?

Software development that deals with “Big Data” typically faces one of two challenges: high volume or low latency. Chango has to deal with both simultaneously.

We receive over 90,000 requests per second (and growing monthly) from our ad exchange and data partners at peak times of the day. To make matters more interesting, we have approximately 80ms to respond to any incoming bid request. Unfortunately, 30-50ms of that limit is consumed by unavoidable network latency, which leaves us as little as 30ms to process each bid request. As a result, we constantly optimize our bidder subsystem around this constraint.
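
To make that budget concrete, here is a minimal sketch of a deadline-aware bid handler – not Chango’s actual code; lookup_user and price_bid are hypothetical stand-ins for the datastore read and the pricing model:

```python
import time

# Numbers from the interview: ~80ms total to respond, 30-50ms of which is
# eaten by network latency, leaving roughly 30ms of actual compute time.
PROCESSING_BUDGET_S = 0.030

def handle_bid_request(request, lookup_user, price_bid):
    """Deadline-aware bid handler sketch (illustrative only)."""
    deadline = time.monotonic() + PROCESSING_BUDGET_S
    user = lookup_user(request["user_id"])  # must be a single fast read
    if time.monotonic() > deadline:
        return None  # past the budget: a late bid is a wasted bid, so pass
    return {"price": price_bid(user, request)}

# Toy usage with stand-ins that answer instantly.
if __name__ == "__main__":
    bid = handle_bid_request(
        {"user_id": "abc123", "slot": "leaderboard"},
        lookup_user=lambda uid: {"searches": ["running shoes"]},
        price_bid=lambda user, req: 1.25,
    )
    print(bid)
```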

What are some of the tools and technologies you use?

Open Source is at the heart of our technology stack. Python is the common thread that binds all of our subsystems – our entire infrastructure is written in it. We use a modified version of the Tornado web server for our real-time servers, and Django for the front-end.
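
As a rough illustration of that stack – the endpoint name and payload here are hypothetical, not Chango’s actual API – a bare-bones Tornado bid endpoint looks something like this:

```python
import json

import tornado.ioloop
import tornado.web

class BidHandler(tornado.web.RequestHandler):
    """Hypothetical bid endpoint; the URL and payload are illustrative."""

    def post(self):
        bid_request = json.loads(self.request.body)
        # In reality: look up the user, score and price the impression,
        # all inside the ~30ms budget described above.
        self.set_header("Content-Type", "application/json")
        self.write(json.dumps({"id": bid_request.get("id"), "bid": 0.0}))

if __name__ == "__main__":
    app = tornado.web.Application([(r"/bid", BidHandler)])
    app.listen(8888)
    tornado.ioloop.IOLoop.current().start()
```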

When dealing with super-fast response times, it’s critical to have a super-fast datastore at your disposal. That’s why we use a NoSQL database technology based on the proven memcached server.
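
Because the store speaks the standard memcached protocol, it can be read and written from Python with an ordinary client such as python-memcached. A sketch, with a hypothetical key scheme and a server assumed to be running locally on the default port:

```python
import memcache  # python-memcached; works with any memcached-protocol store

mc = memcache.Client(["127.0.0.1:11211"])

def save_user_profile(user_id, profile, ttl=30 * 24 * 3600):
    # Hypothetical key scheme; the 30-day TTL mirrors typical cookie lifetimes.
    mc.set("user:%s" % user_id, profile, time=ttl)

def get_user_profile(user_id):
    # A single round trip - essential when the whole bid budget is ~30ms.
    return mc.get("user:%s" % user_id)

save_user_profile("abc123", {"searches": ["running shoes"]})
print(get_user_profile("abc123"))
```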

The unsung open source hero of our infrastructure is HAProxy, which handles our load balancing across the board. We use keepalived on Linux to keep these servers in high-availability mode.

As far as our architecture is concerned, we try not to have any major dependencies on specific features of the third-party systems we choose. They just happen to work well for our current environment.

How did you get here?

Scalability is as much an art as it is a science but, most importantly, it’s about keeping things simple. It took years of hard work, careful tuning and measurement to arrive where we are.

Originally we used Java for all server-side development and Ruby on Rails for front-end development. The thinking was that we needed a rock-solid language for the server architecture and a rapid development environment for front-end work. This served us well for a little while; however, in early 2010 we realized that Java was drastically slowing down our ability to iterate quickly and effectively. A single feature was taking days to build, test and deploy.

We bit the bullet and rebuilt the platform entirely in Python, and it was probably the best decision we ever made. Not only do we have a consistent language across front-end and server-side development, but it has enabled us to rapidly add features and test new ideas. We are fortunate to have access to the fantastic ecosystem created by the Python community.

Where do you host?

Our first approach, back in 2009, was to leverage Amazon Web Services EC2 as a scalable and cheap way to prototype the platform. That served us well for a while; however, the shared virtual environment meant that we had wildly variable server resources.

We shifted to Hosting.com knowing that we ultimately needed our own equipment, and that if we wanted a VM environment we would need to set one up ourselves. While Hosting.com provided good support, over time we wanted the rapid provisioning we were used to on EC2 combined with the power of dedicated hosting.

Ultimately we chose SoftLayer as our hosting provider. SoftLayer offers a VM environment in addition to an “express” service that allows us to get tens of new servers provisioned in about 3-4 hours! They have been extremely good about letting us occasionally provision a whole new parallel cluster as we do capacity planning.

How do you monitor your systems?

Monitoring is done through a combination of Nagios, PagerDuty and Circonus, among others. We have also built a real-time data visualization system that lets us monitor both infrastructure and campaign performance. We use this dashboard as our own NOC, hooked up to a TV mounted right in the engineering area!
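
Custom checks plug into Nagios through a simple convention: a plugin prints a single status line and signals its state through the exit code (0=OK, 1=WARNING, 2=CRITICAL). A sketch of what a hypothetical bid-latency check might look like – the metric and thresholds are illustrative, not Chango’s:

```python
#!/usr/bin/env python
"""Sketch of a Nagios-style plugin with hypothetical thresholds."""
import sys

def check_bid_latency(current_ms, warn_ms=20, crit_ms=30):
    # Nagios reads the first line of output and the process exit code.
    if current_ms >= crit_ms:
        print("CRITICAL - bid latency %dms" % current_ms)
        return 2
    if current_ms >= warn_ms:
        print("WARNING - bid latency %dms" % current_ms)
        return 1
    print("OK - bid latency %dms" % current_ms)
    return 0

if __name__ == "__main__":
    # A real check would measure this; hard-coded here for illustration.
    sys.exit(check_bid_latency(current_ms=12))
```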

What are your biggest development challenges?

We’ve got two distinct challenges. Our real-time, data processing and systems engineering teams deal with problems of scalability and big data. As our business continues to grow, we need to keep re-examining our infrastructure and design choices. We have a very healthy culture of team collaboration, code review and refactoring.

Our Dashboard team has a different set of challenges. Our self-serve ad platform is much like AdWords in that marketers can put down a credit card and launch a campaign themselves. We need to make this an extremely user-friendly system while keeping it powerful enough for people to perform sophisticated reporting and campaign optimization.

How do you win?

The Chango business is all about putting the right ad in front of the right user at the right time. We made an early decision that data you know about a user (i.e. search data) is only effective if it is combined with a proprietary bidding engine that can make decisions in real time.

Almost every DSP, data provider or ad network out there today does this by storing information about users in a client-browser “cookie”. They call this a user segment. The problem with user segments is that they are pre-computed and stored within the user’s browser. If you decide halfway through your campaign that you need to adjust your audience (due to lack of performance), there is no way to do so, since the information is hard-coded in the user’s browser. The only option is to continue serving ads to this under-performing group of users and wait for the cookie to expire (typically 30 days).

At Chango we’ve decided to make everything real-time, including our decisions about whom to bid on and how much to bid. Nothing about the user is pre-computed. The Chango ‘cookie’ contains nothing more than a unique identifier that anonymously points to all the raw data we know about that person in our database.
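
A toy sketch of the idea – the names and in-memory store are illustrative only. The cookie holds an opaque ID, all raw events live server-side, and the audience decision is computed fresh on every bid, so a mid-campaign change to the targeting terms takes effect immediately:

```python
import uuid

# Server-side store keyed by the anonymous cookie ID (an in-memory stand-in
# for the real datastore; nothing here is Chango's actual schema).
raw_events = {}

def new_cookie_id():
    """The cookie carries nothing but this opaque identifier."""
    return uuid.uuid4().hex

def record_search(cookie_id, term):
    raw_events.setdefault(cookie_id, []).append(term)

def score_at_bid_time(cookie_id, campaign_terms):
    """No pre-computed segment: the audience decision is made per bid,
    so changing campaign_terms mid-campaign takes effect immediately."""
    searches = raw_events.get(cookie_id, [])
    return sum(1 for s in searches if s in campaign_terms)

cid = new_cookie_id()
record_search(cid, "running shoes")
print(score_at_bid_time(cid, campaign_terms={"running shoes", "marathon"}))
```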

Chango is hiring!

Python developers! There are multiple open positions in Toronto, New York & San Francisco. But if I could have anything, it would be talented developers who either know Python or want to learn Python.


Interested in being profiled in our Under the Hood series? We are actively looking for Canadian startups building “interesting” technologies and solving “interesting” problems. Contact me by completing an initial Under the Hood submission.

Hack/Reduce Toronto

Hack/Reduce Toronto - June 18, 2011

We are pleased to be supporting Hack/Reduce Toronto. The rise of real-time computing, distributed sensors and big data has laid the groundwork for distributing the processing of these emerging large data sets across clusters of computers. Lots of Toronto and global companies leverage the processing and analysis of large data sets to discover unique relationships in their data (BackType, PostRank, BuzzData, Attachments.me, Google and others). This is a wonderful opportunity for technical cofounders to get hands-on experience with Map/Reduce, Hadoop and the shared expertise of local experts while learning about big data.


What is Hack/Reduce?

Hack/Reduce is a free one-day big data hackathon. The goal is to extract valuable information from large datasets and learn how to work with big data. The event brings together developers, companies, entrepreneurs and students interested in big data.

Provided:

  • Free access to Amazon EC2 clusters that can be scaled up according to your needs
  • Pre-loaded datasets (participants are encouraged to suggest datasets)
  • An introduction to Hadoop, Map/Reduce and the infrastructure
  • Support from Hadoop and Map/Reduce experts
  • Food and drinks

At the end of the event, participating teams and developers get to present what they built, what they learned and what problems they faced. It’s an opportunity to develop something great, learn Hadoop MapReduce and meet people interested in big data.
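
For anyone new to the model, the canonical first job is word count. With Hadoop Streaming, the mapper and reducer can be plain Python scripts that read stdin and write stdout, so the whole pipeline can be tested locally before touching a cluster. A minimal sketch:

```python
#!/usr/bin/env python
"""Minimal Hadoop Streaming word count: the canonical first MapReduce job.

Test locally without Hadoop:
    cat input.txt | python wordcount.py map | sort | python wordcount.py reduce
"""
import sys

def mapper():
    # Map: emit one tab-separated ("word", 1) pair per word.
    for line in sys.stdin:
        for word in line.split():
            print("%s\t1" % word)

def reducer():
    # Reduce: input arrives sorted by key, so counts for a word are adjacent.
    current, count = None, 0
    for line in sys.stdin:
        word, n = line.rsplit("\t", 1)
        if word != current:
            if current is not None:
                print("%s\t%d" % (current, count))
            current, count = word, 0
        count += int(n)
    if current is not None:
        print("%s\t%d" % (current, count))

if __name__ == "__main__":
    mapper() if sys.argv[1] == "map" else reducer()
```

On a real cluster, the same script is shipped to Hadoop via the hadoop-streaming jar as the -mapper and -reducer commands.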

Who is it for?

Developers, researchers and students working in big data, or anyone interested in working with it. It’s best if you have something you want to get done that requires a lot of computing power; alternatively, you can come just to learn Hadoop. Basically, Hack/Reduce is about developers working with new people, pizza, unlimited computing power and large data sets.

Who is involved?

Get Involved