Cloudera | Interview with its founder & CTO – Amr Awadallah
In Palo Alto (CA), we talked with entrepreneur Amr about the business model and history of the highly successful company Cloudera.
In the second part of the interview, Amr shares seven key pieces of advice for entrepreneurs.
The transcription of the interview is included below.
Martin: So this time we are in Palo Alto in the Cloudera office. Amr, who are you and what do you do?
Amr: So I am one of the founders of Cloudera and I serve as the chief technology officer for the company.
Martin: Great. What is your background and what did you do before you started this company?
Amr: So let me go back a little bit, actually. I’m from Egypt originally, and I came to the US in 1995, about twenty years ago, to get my PhD from Stanford University. My goal was to get my PhD and then go back to Egypt to teach. I really liked to teach; my dream when I was young was to be a professor, and that’s what I wanted to do. But, as I frequently say, when I was studying at Stanford the entrepreneurship bug infected me, I got corrupted, and I cared more about building companies than teaching, per se.
So a few years into Stanford I dropped out of my PhD program and built my first start-up, which got acquired by Yahoo. That’s how I ended up at Yahoo. It was a small company, about five people, and we were acquired for nine million dollars within one year, which was not bad. I then spent eight years at Yahoo before I left and joined a VC firm called Accel Partners as what is called an Entrepreneur in Residence, or EIR. This is a kind of transition role where you go to the VC and spend some time researching what you should do next. After three months with them, they gave us funding and Cloudera was started. So that’s briefly my history before Cloudera.
Martin: Two questions. What did you study at Stanford? What kind of topic? And second, with this entrepreneur-in-residence program, how did you get in touch with Accel? Did you know these guys before, or was it just by accident?
Amr: Both are very good questions. On the first question, I was in the computer engineering department and I was studying, essentially, large-scale distributed systems. I was doing my PhD with Professor Mendel Rosenblum, and Mendel Rosenblum is actually one of the founders of VMware. He’s a very nice guy; I can introduce you to him if you want to interview him as well.
Martin: Sure, thank you.
Amr: He’s an amazing guy. So I did my PhD. I actually did go back to Stanford and finish my PhD while I was working at Yahoo. I had dropped out, but I went back and finished. Virtual machines and distributed systems were the main topic.
And then on the other question, about EIR and how you get to be an entrepreneur in residence. Usually, you don’t apply to be an entrepreneur in residence. VCs don’t post an opening like, ‘Hey, we are hiring an EIR’. Usually, the EIR thing happens because of connections: the VC knows you from before and wants you to work with them before you do your next company. In my specific situation, one of my previous managers at Yahoo had left Yahoo and joined that firm as a VC. He knew me very well because he had been my manager at Yahoo, so when I was leaving, he said, ‘You have to come here and be an EIR’. My co-founder at Cloudera, Jeff Hammerbacher, comes from Facebook and has a very similar story. He was one of the early employees at Facebook, and Accel Partners was one of the very early investors in Facebook. So same thing: they knew of him, and when they heard he was leaving, they said, ‘Hey, come to Accel and work as an EIR’. And that’s how I connected with Jeff.
Martin: Ok, great. So you met over there at Accel?
Martin: Ok, great. And how did you come up with this idea of Cloudera?
Amr: So it came from my work experience. From my own work experience and Jeff’s work experience. And we have two other co-founders: Mike Olson, who is our chairman of the board and chief strategy officer, and a fourth co-founder from Google, whose name is Christophe Bisciglia, though he left Cloudera two years ago and he’s now doing another company. He’s also kind of an interesting guy; I could connect you with him if you want to chat with him. So what was the question again?
The idea. Where did the idea come from? Yes. So, the idea essentially… In my work at Yahoo, I was responsible for doing BI, data analytics and data science for Yahoo News, Yahoo Sports, Yahoo Finance, Yahoo Search, all the different products that Yahoo has. I had to do a lot of analysis of what’s working, what’s not working, how effective new features are at retaining users when they launch, etcetera. I had a bunch of challenges with the existing business intelligence and data technologies I was using. At the same time, I was lucky that an open source technology called Hadoop was being built inside Yahoo, for Yahoo Search, to build the web index at scale.
But when we talked to the team, it was very clear that the technology solved a lot of the problems that we had. So I tried the technology in my team, and very quickly, within a year, it changed everything I did. For me that was a very clear signal that this is a very good aspirin for anybody who has the headache of ‘how do I manage big amounts of data’, or big data as it’s known today. The same thing happened with my co-founder Jeff Hammerbacher at Facebook. He used Hadoop on his own infrastructure and saw how effective it was in solving problems for him.
Martin: Ok, great.
Martin: Let’s talk about the business model of Cloudera. How does it work right now?
Amr: So first it’s important to note that business models evolve over time as a function of the company and its maturity. The more you understand your customers, the more you understand your business. At the beginning, when we were first forming Cloudera, our business model was structured around training and consulting, or professional services, for our customers. But it became very clear that while you can make a lot of money doing training and consulting, it’s not high-margin money. It’s a people business: you have to hire more people to do more consulting and more training, so there is a limit to how big your margins can be.
So we changed our business model to be a combination of training and professional services, still, but also a software subscription. Right now we charge our customers as a function of how many servers our software is running on, per year. So it’s a subscription per server per year. That’s how they contract with us today. I should also note that we had a pivot in Cloudera’s history, and that’s why our name is Cloudera, by the way. Initially, we were going to build a cloud platform where we put our software in the cloud, and our customers upload their data, do their number crunching and then download the results. But within six months of doing that, it was clear that the big banks and big retailers we wanted to work with were not comfortable giving out their data.
So we shifted the company from being a cloud company to being a software company. We give customers software that they can deploy within their organization, or in the cloud if they want to, but most of them choose to deploy within, right now. So that was a big shift for us, from being a cloud company to being a software company. But we kept the name Cloudera because it was a cool name.
Martin: Okay. What problem does the software solve for your clients?
Amr: It’s a very simple value proposition. If you look at most legacy data technology, legacy systems like, for example, Teradata or Oracle or standard databases, they are very good at handling what we refer to as structured data. That is very well-defined data where you have columns and the columns have types, like string for a name, date for a date of birth, and decimal for an amount, a salary or something. Very well defined, very well structured. And these systems were very good at doing that.
But the reality of the world today is that we have multiple types of data. We have structured data that comes from databases, but we also have a lot of semi-structured data that comes from web servers and mobile devices, and then we have unstructured data like PDF documents, emails, or even images and videos. Future data systems, which is what our system represents, have the capability to absorb any data, whether structured, semi-structured or unstructured, and then allow you to process that data in many different ways. So in a nutshell, our value proposition is that we allow our customers to extract value for their business from all the data that they have, and then use that data to ask bigger questions than they’re able to ask today.
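Editor's note: the distinction Amr draws between structured, semi-structured and unstructured data can be made concrete with a minimal Python sketch. Everything in it (the `Employee` record, the field names, the sample log lines and email text) is invented for illustration and is not taken from Cloudera's software or from the interview.

```python
import json
from dataclasses import dataclass
from datetime import date
from decimal import Decimal


@dataclass
class Employee:
    """Structured data: a fixed set of columns, each with a declared type."""
    name: str             # string for the name
    date_of_birth: date   # date for the date of birth
    salary: Decimal       # decimal for an amount


def parse_structured(row):
    """Convert a three-column row of strings into a typed record."""
    name, dob, salary = row
    return Employee(name, date.fromisoformat(dob), Decimal(salary))


def parse_semi_structured(line):
    """Semi-structured data: a JSON log line whose fields may vary per record."""
    return json.loads(line)


def word_count(text):
    """Unstructured data: free text with no schema; structure is imposed when reading."""
    return len(text.split())


# Hypothetical sample inputs, one per kind of data.
structured = parse_structured(("Ada", "1990-05-17", "85000.00"))

log_lines = [
    '{"path": "/news", "device": "mobile"}',
    '{"path": "/sports"}',  # this record has no "device" field
]
semi = [parse_semi_structured(line) for line in log_lines]
devices = [record.get("device", "unknown") for record in semi]

email_body = "The customer wrote to support about a late delivery."
unstructured_words = word_count(email_body)
```

In the structured case the schema is fixed before any data arrives; in the other two cases the reader decides at processing time which fields or features to pull out, which is the flexibility Amr is describing.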
Martin: And in terms of this unstructured data, like from PDF files, do you need to teach your algorithm to extract this data and convert it from an unstructured into a structured form, or is it done manually, for example by the client who is teaching the algorithm? How does it work?
Amr: All of the above. In some cases there are standard formats where we already have parsers that know how to parse and read out the content from these documents. In this case there is a library, and you just pick the parser that applies to the type of document that you’re trying to parse. But then you could have a more sophisticated case where you’re trying to extract the sentiment. Say you have an email that somebody sent to the support team of a given company, and you want to extract: was that customer upset, happy, or neutral when that email exchange took place? That is more involved. For that you have to write code that does what’s called sentiment analysis to extract that.
And then, there’s an ecosystem of partners that we work with now, other companies that are building tools around our platform that make it easier to do that. For example, there’s a company called Trifacta, a very young start-up. There’s another one called Tamr, T-A-M-R. There are a number of them now, trying to make it easier to do that.
Martin: Ok. Assuming I have all the data and put it into a data warehouse, what else can the client do with this data? Are there any pre-defined reports I can generate, or does the client have to connect all the data to get some analysis and insights from it?
Amr: So we are the platform. We are not a front-end application. Think of us just like a database, except that, unlike a database like Oracle, our platform is much more flexible, so it can take data of any type; it is much more scalable, so it can really scale to massive amounts of data; and it’s much more agile, in that it’s not just SQL. You can do SQL with it, but you can also do search and machine learning, and there are many other types of workloads that it can run.
But still, it’s a platform, so how do you connect that platform to applications? There are a lot of existing applications that just integrate with our platform. Companies like QlikTech, Tableau, MicroStrategy and Informatica: there are a lot of companies out there that have built applications that do visualization and analysis and connect to our platform using the APIs that we provide.
Martin: And are you also promoting this type of ecosystem, where clients who subscribe to Cloudera can choose from different types of apps to analyze the data that they generated using Cloudera?
Amr: Ultimately we want to have the equivalent of an App Store for big data, where you just go and click on the icon of the app you want. We’re not there yet. Today, it’s still an enterprise software sale, where you have to go and talk to that company, sign a contract with them, and then get the software and deploy it. So it’s a bit heavier. But hopefully in the future, yes, it will be simple: within the Cloudera management interface you’re going to see a bunch of icons for different apps and you just tick the app that you want. But we’re not there yet.
Martin: How did you acquire the first customer and convince them to buy from you or try you?
Amr: We are lucky in the sense that our business model is also open source in nature. Our core product, which is called the Cloudera Distribution for Hadoop, is 100% open source and also free. That helps seed the market: developers look at it, see it’s very powerful, download it and start to build apps on it. Once they build an app that is viable for their business, they come and talk with us and say, ‘Hey, can we have a relationship with your company to maintain that software for us going forward?’ So for us, because of the open source nature of Hadoop, the initial customers were coming to us. And there was no other vendor out there when we started Cloudera that was supporting the Hadoop platform; we were the only one. So a lot of our initial growth was organic, just coming from customers that had deployed our software.
Martin: Okay. When you started out, what were your thoughts on bootstrapping the company versus taking external money?
Amr: That’s a very good question. In our case, if you follow Cloudera’s history, we took a lot of money. At Cloudera we have raised, to date, more than one billion dollars in funding, which is a lot of money for a software company. But that comes from the fact that this is an exploding market. Very quickly, we saw that this market is growing so fast that technology is important, but having a sales force that can really sell this technology worldwide is even more important. And you have to realize that when you are hiring salespeople, you have to pay their salaries for the first six, even twelve months before they start making any deals or bringing any money in. You cannot bootstrap when you’re doing that; you have to have money to be able to pay their salaries. So from day one, we have been raising money at Cloudera.
Almost every year. In 2008, we raised five million from Accel Partners, which I mentioned earlier. In 2009, we raised another six million. In 2010, I think we raised about double that, and roughly every year we are raising double what we raised the year before. We are mainly doing it because we want to continue to grow very, very quickly to capture this opportunity, because we see this as a massive opportunity. And the one who captures the full opportunity will get the most value in the long term.
Martin: Amr, what is a typical customer lead time?
Amr: So it depends. In some cases there are customers who have already downloaded our software; as I said, it’s open source and free. They have already downloaded it, they have already built an app, it’s already running inside the company, and they come to us and say, ‘Okay, it’s great, we love it, where do we sign?’ Usually that would take about a week until they sign and pay us, and it’s great. That was the case in the early days, when this technology was still at the beginning and there were lots of early adopters. Now, in the later stages, we’re moving with this technology into very, very large companies, and part of what we’re doing is convincing these large companies that, ‘Hey, your old way of doing things is not going to work for you going forward. You need to have this platform’. In this case you have to go in and do what’s called a proof of concept, and show them that this platform truly will deliver the scalability, the flexibility of working with any data, and the agility of being able to build new projects very quickly. That process can take anywhere from four weeks to even four months until we can convince them that this is a valuable system for them. And that’s when they make the first purchase. But our technology is not about the first purchase. It is about how we get that first purchase and then grow it. Once we get inside a company and they have ten servers running our software, and they see what these ten servers can do in terms of scalability and the economics of storing data effectively, they start to grow it from there. That’s where our potential is much bigger: from the expansion that we get from these customers once we land them.
Martin: Amr, let’s talk about corporate strategy. You have a technology part in your company and you have a distribution part. What other parts would you consider necessary for competitive advantage in your business model? And which one is the most important?
Amr: We actually have four pillars that underlie our strategy of how we win in this market, both win for ourselves intrinsically and win against the competition as well. These four pillars are:
- the technology,
- the team that we have,
- the track record, and
- the ecosystem.
So let me talk about these briefly. Technology, simply: our technology needs to be superior. In open source it’s tricky. How do you make your technology superior when everything you do, you put back into open source? So what we’re doing at Cloudera is that we’re not putting everything back into open source. We’re putting roughly 85% of what we do into open source, but we’re keeping 15% proprietary to us. That is very important to maintain the uniqueness of our solution compared to other vendors out there. There are other companies out there, small companies and even big companies like IBM, for example, that can come in and just take everything that we do and say, ‘Hey, we can do everything Cloudera can do, the software is all open source’. But by keeping 15% of what we do proprietary, we maintain that uniqueness; we’re different. If you go with IBM, or with some other player, you’re not going to get the full value that you’ll get if you come with Cloudera. So that’s number one, where we differentiate ourselves.
Number two is the team that we have. In open source, it is very important for customers to see that you have in your company some of the open source project leaders who created this technology. In our case, for example, the Hadoop technology was created by Doug Cutting, and Doug Cutting works at Cloudera. There are nineteen other open source projects, and most of these were either founded by Cloudera or we eventually hired their creators to work at Cloudera. That gives us a lot of value with our customers. They believe that we can control that open source artifact: we can add the features they care about, we can fix it when it breaks, and so on. So that’s number two.
Number three is the track record, like I mentioned. We use our own Hadoop technology, our own data technology, to collect data from all of our customers. When our customers are running a cluster, we are collecting data from them into our Hadoop cluster. That data is not their own data. It’s how they’re operating, the telemetry of how the cluster is operating. We can see that from them and from all the other customers that we have had over the last six years. So now, whenever any of them experiences a failure, we can very quickly correlate that across all the other traces that we have and resolve that failure much quicker than any of our competition. Furthermore, we also do what’s called predictive maintenance. That’s where we can even predict that a customer is going to have a failure. We call them up and say, ‘You’re going to have a failure if you don’t change this or change that’. So track record is our third advantage.
And then the fourth advantage is the ecosystem. When you’re building a platform technology like the one that we have, if you look at companies like Oracle or VMware or Windows or any company that is building a platform, their success comes from how big an ecosystem they have around them. So we have been very focused on building a very big ecosystem. We have more than one thousand partners that work with us right now. Some of these partners are building software applications that run on our platform, and some of them are building hardware that underlies our platform. For example, Dell is one of our largest partners, and they’re also an investor. Intel is our largest investor, actually. And some of these are SIs, solution integrator vendors like Capgemini or Accenture, that go inside large companies and implement these solutions. So we have the largest ecosystem right now among the players in the space.
Martin: What is your recommendation for a software-as-a-service start-up that is trying to find distribution channels? You’ve talked about Capgemini, which I guess is one of the Hadoop distribution channels because they consult for other companies. Would this be one of your recommendations for a SaaS company to partner with? Whether it’s Capgemini, or Ernst & Young, or whoever?
Amr: Yes, absolutely. When you want to sign big deals with large corporate organizations, you have to realize that many of these large, typical enterprises are unlike, for example, Google or Facebook. If you look at a big bank, a big retailer, or a big telecommunications company, they have massive, massive engagements with these large SIs and they use them to do the implementation. So it’s very important. One of the very important strategies for any company in the enterprise software space, which is the space that we are in, is to establish these types of channel partnerships, where partners can come in and help you sell your software much more efficiently and effectively. I will note, however, that we right now are not software as a service.
Again, despite our name being Cloudera, we are not software that you go and get as a service; we are software that you deploy inside your organization. One of the deployment options is to deploy in the cloud, which kind of looks like a service, but it’s not really the same as, for example, Box.net or an equivalent.
Martin: Amr, let’s talk about the market development, especially related to the cloud industry. What is your impression on that? What are the major trends happening?
Amr: Yes. So cloud is definitely happening. It’s not a question of if cloud will happen, it’s a question of when cloud will really take over completely. When we were starting Cloudera six years ago, as I mentioned earlier, we initially wanted to be a cloud company. We wanted to do everything in the cloud. But back then, six years ago, it was very clear that big companies viewed their data, their backend data, as their blood. And nobody wants their blood to be outside their body. They want their blood inside their body.
Now, that is very similar to, if you remember, many years ago when ATMs came out. Maybe you can’t remember; your dad can remember. When ATMs came out, people were very hesitant to go and put their money in an ATM, right? Because they were afraid. But eventually people became okay with putting their money in. Now, they don’t even think about it. The same thing will happen with data and the cloud. So we think companies will get more comfortable with having their data move into the cloud, but that will take more time. It will take more time than for other types of applications.
So for example, if you’re building a web app, a website or a mobile app, you’re much more likely to use the cloud today. But when you’re building a backend data platform for an insurance company, a finance company, a health company or a government organization, they’re still very sensitive about having their data go into the cloud. That will change over time. We are about big data, so for us the important part is when companies will be more comfortable having their data go into the cloud. And we see the beginnings of that starting to happen right now. But it’s still not across the board. It’s still a very small percentage of enterprises that are willing to have their core data systems move into the cloud.
Martin: Good. Amr, thank you very much for the time.
Amr: Sure, you’re very welcome.