Kaggle's start-up advice: Ditch the daydreaming and start doing

Kaggle's Anthony Goldbloom

When INTHEBLACK first looked at data science start-up Kaggle, it was two-and-a-half years old and had never really faced a setback. Now, at six, Kaggle has shrugged off disappointments and is looking for the next way to grow.

When US insurance company Allstate sought to improve its vehicle-risk assessment tools, it didn’t call in the consultants. Instead, it handed its data over to problem-solving start-up Kaggle. The ensuing data analysis competition, which offered prize money of just US$10,000, resulted in an algorithm 340 per cent better than Allstate’s best internal system.

Stories such as this made Kaggle one of the poster children for data start-ups in Australia and around the world soon after its 2010 launch. Its big idea – running competitions to solve data problems – got big data enthusiasts excited and attracted customers such as MasterCard, Pfizer, Amazon and Facebook. They had the data, but Kaggle could supply the analytical talent to make that data answer important business questions.

When INTHEBLACK profiled Kaggle founder and CEO Anthony Goldbloom in late 2012, he could see a big, bright future.

“Just about every company needs to predict something, whether it is a bank predicting who’s going to default on a loan or an insurance company predicting who’s going to crash their car,” he said. “We’d like to see most of the world’s companies relying on Kaggle for their most valuable predictive modelling problems.”

More than three years on, Kaggle has problems of its own to solve. Last year, it was forced to cut jobs and close its energy-consulting arm, which it launched in 2014 as a means of building the business beyond data competitions. So what happened to Kaggle and what lessons does it hold for other start-ups?

Big time for big data

Kaggle entered the market at a time when businesses were just switching on to the potential of big data. Michael Lewis’s bestselling book Moneyball, which chronicles how the Oakland Athletics baseball club used advanced statistics to field a competitive team despite budget constraints, had helped to popularise data science. The big data strategies employed during the 2008 Obama election campaign also showed just how powerful numbers could be.

“Kaggle was very smart in that we’d gotten onto something that was really quite valuable, but we were also lucky in terms of timing,” says Goldbloom now.

“If the company [had] started two years earlier, there wouldn’t have been much of an appetite. Three years later, you’re already entering a pretty crowded space.”

Goldbloom moved the business from Melbourne to San Francisco in 2011 and, crucially, raised US$11 million in start-up funding from high-profile investors, including Google’s chief economist Hal Varian and PayPal co-founder Max Levchin. He also recruited a Kaggle competition winner and former McKinsey and AT Kearney management consultant Jeremy Howard to be Kaggle’s president and chief scientist.

Then Kaggle collided with reality.

The sound of one hand clapping

The problem was not a shortage of enthusiastic analysts. Data scientists couldn’t get enough of Kaggle, and still can’t. Goldbloom estimates about 1000 people sign up for its predictive modelling challenge each week and many of them are among the elite of the data analysis world.

Kaggle brings those data analysts together with organisations and their data in what economists call a two-sided market. Through 2012 and 2013, however, it became increasingly clear to Goldbloom and Howard that organisations weren’t as keen on using Kaggle as data scientists were.

“Companies sign up much slower than users,” Goldbloom reflects. “It’s a challenge.” He switches into the language of economics, his first profession. “We have a mismatch in the velocity of the two sides of our market.”

Many of Kaggle’s problems come down to this: organisations don’t want to expose their private data to prying eyes – even the eyes of the world’s best data analysts. Yet much of Kaggle’s proposition rests on getting them to do just that. Goldbloom admits that potential customers can be cagey. He argues they can afford to be more open (see “Why companies don’t like to share”, below), but Kaggle isn’t winning that argument often enough.

“It certainly limits our market,” he says. “To the extent that companies aren’t willing to put their data up online, we are not really a great solution.”

To cope, Kaggle has had to evolve. However, even that has turned out to be complicated.

The oil slick

In 2013, as it tried to find ways to make money from data competitions, Kaggle found itself doing a lot of work in one booming field – oil and gas. If selling predictions to just about every company in the world was plan A, energy consulting started to look to Goldbloom like a viable plan B.

It looked less attractive to Howard. On 16 December 2013, he posted on the Quora website a comment that made clear he had left Kaggle: “I don’t know why the company decided to go all-in on [oil and gas] … I was travelling overseas when that decision was made.”

A few months later, Howard posted another comment on Quora: “I think it [Kaggle] has potential, but I’ve done an analytics vertical start-up before. [I] don’t really want to spend the next few years of my life on this.”

Goldbloom posted an upbeat explanation two days after Howard’s original message. “Our early work in oil and gas is incredibly encouraging,” he wrote, adding that the company was focusing on “using geological data to predict the best acreage to lease and to drill”.

Half a year later, in July 2014, oil and gas prices fell off a cliff. That was a complete surprise, not just to Kaggle – a specialist in predictive modelling – but also to most of its oil and gas industry user base. In six months, oil prices went from US$115 to US$50 per barrel, and most of Kaggle’s exciting new energy customers either stopped spending or disappeared entirely.

“We were doing a lot of work in the oil and gas industry and then the oil price crashed,” Goldbloom says levelly. “We lost a lot of revenue in a short period of time. It was quite difficult.” 

Kaggle’s energy consulting business, like so many others, had to shut up shop.

Time for plan C.

Determined but flexible

Start-up consultant Paul Graham, of Y Combinator, puts it well: in a start-up, you have to be “determined but flexible”. Kaggle might have closed its energy consulting business, but it still had a solid business in data analysis competitions. It still had buzz, too; MIT Technology Review ranked it 19th on the publication’s list of “50 Smartest Companies 2014”, just behind US mobile chip giant Qualcomm. Goldbloom and his team started looking for the next opportunity where they could apply their data expertise.

The most public new idea to date is a data science collaboration platform called Scripts, which allows its community of 500,000 analysts to share and expand upon their work together. Goldbloom says Scripts offers value beyond data competitions.

“That’s how we have gotten most of our customers – through our community,” he says. “Similarly, if we build a sharing and collaboration environment they really like, we expect that that’s how we’ll get their companies to start using it.”

Goldbloom plans to launch Scripts as a commercial product later this year. In the meantime, the company is fine-tuning the platform with the help of its existing community of data analysts.

“We are adding new features bit by bit and we see whether or not people use them,” he explains. “If they use them, then we enhance and keep those features. If people totally ignore them, we know it wasn’t a very interesting thing to do.”

For Scripts to be a commercial success, customers must be willing to see beyond Kaggle’s roots in data competitions. “We are very well known for competitions, but the shape of the company is really changing,” says Goldbloom.

“We want to be not just a place for data scientists to go to work on data mining competitions, but the place people go to do all of their data science, whether it be in their company or their personal projects or competitions.”

In short, Kaggle is trying to leverage its reputation with data scientists into a growing new business. As Goldbloom acknowledges, this is a tricky task.

“One challenge that we have is moving the community with us. It takes time and careful, clear communication to do that and it’s just something that we are learning how to do.”

However Scripts performs, Kaggle is no longer in the boom stage of a fresh start-up. It has endured collisions with reality, management differences and a massive market downturn. When MIT Technology Review issued its latest list of “50 Smartest Companies”, Kaggle no longer featured.

If all that has wounded Goldbloom a little, he’s dealing with the scars. Kaggle has just turned six.

“We are a young, ambitious company,” he says, “and so we don’t just want to be resting on the business that’s working, but [want to] keep looking for what’s going to take us to the next level.”

Why companies don’t like to share

Anthony Goldbloom believes there are three main reasons why companies may not wish to put their data in the public domain – the problem that has limited Kaggle’s original data-competition business model.

  1. Customer privacy: This is an obvious one; however, Goldbloom points out that Kaggle doesn’t work with personally identifiable information.
  2. Competitive intelligence: Goldbloom says this is easy to navigate through data migration tricks. “There are ways of scaling data in such a way so you’re not really giving away very much by releasing it publicly, but it still allows [analysts] to crunch on it.”
  3. Companies don’t want others to know what they’re up to: “I think that, having worked with a lot of companies, there’s very little that hasn’t already sort of bled out from one company to the next,” says Goldbloom. “The benefits of getting outside ideas [to solve] your problems actually outweighs this mythical idea of protecting IP.”

Kaggle’s solutions

Most of Kaggle’s competitions involve mining company data. However, it’s not only commercial enterprises that have sought the help of Kaggle and its community of data scientists. Here are three ways that Kaggle’s community has solved problems beyond the bottom line.

Challenge: Find ways to measure the true shape of a galaxy by mapping dark matter.
Participants were given 100,000 galaxy images, blurred to simulate the effects of dark matter. The winning algorithm outperformed the state-of-the-art techniques most commonly used in astronomy for mapping unknown areas of the universe.
Client: NASA.
Winners: Cosmologists David Kirkby and Daniel Margala, from the University of California.
Prize: An expenses-paid trip to NASA’s Jet Propulsion Laboratory, where the winners presented their findings.

Challenge: Diagnose heart disease from MRI scans.
Data analysts were asked to analyse MRI images from more than 1000 patients and develop an algorithm to detect heart disease in an instant (a cardiologist spends about 20 minutes assessing MRI scans).
Client: US National Institutes of Health and Booz Allen Hamilton.
Winners: US hedge fund traders Qi Liu and Tencia Lee.
Prize: US$200,000 awarded to the top three placegetters.

Challenge: Identify who will go to hospital in the next year.
More than 71 million people are hospitalised every year in the US, creating at least US$30 billion in annual avoidable costs. Contestants were given three years of historical medical data of anonymous people and were asked to devise an algorithm to predict how many days each person would spend in hospital in the following year. With this knowledge, care providers may now reach patients before emergencies occur.
Client: Not-for-profit US health organisation Heritage Health.
Winners: Team POWERDOT, which included Phil Brierley, analytics consultant at Tiberius Data Mining in Australia; David Vogel, chief scientist of Voloridge Investment Management in the US; and Willem Mestrom, business intelligence specialist at Independer in the Netherlands.
Prize: US$500,000.

Goldbloom’s advice for start-ups

Anthony Goldbloom thinks entrepreneurs should spend less time thinking and more time doing. 

“We spent a lot of time daydreaming about what we could do – I think that you should just do it,” he says. “Don’t try to create the perfect business plan. Just get in and do it, otherwise you end up being paralysed by all the things that might go wrong.”

Goldbloom is inspired by a quote from Pixar president Ed Catmull, published in Harvard Business Review: “If you give a good idea to a mediocre team, they will screw it up; if you give a mediocre idea to a great team, they will either fix it or throw it away and come up with something that works.”

Even though it’s a management quote, Goldbloom says it also applies to starting a company. 

“If you’re a smart, confident person, just get going and whatever crappy idea you have to begin with, you’ll refine it and hopefully be able to turn it into something good.”


June 2016