Count Spotify among the businesses that are quite happy to be tied down to a single cloud provider.
Arguably one of Google Cloud's highest-profile customers, Spotify began a migration to Google five years ago, when Google's long-term commitment to cloud computing was still something of an open question. Ever since, the music and podcast service has doubled down on Google's infrastructure, building around higher-level services that trade easy portability for convenience and ease of use.
And that's just fine with Tyson Singer, vice president of technology and platforms at Spotify, who oversees the technical infrastructure that serves Spotify's 356 million monthly active users. The 2,000-plus developers and tech professionals at Spotify also have a secret weapon called Backstage, an internally developed management console that allows developers to use the dozens of tools in Spotify's arsenal through a consistent user interface. Backstage is available as an open-source project through the Cloud Native Computing Foundation.
In a recent interview with Protocol, Singer discussed the company's decision to marry its fortunes to Google Cloud, the pros and cons of using managed services and why "ML ops" is the next big thing on his radar.
This interview has been edited and condensed for clarity.
A few years ago, Spotify made a pretty substantial migration to Google Cloud. Where do things stand at this point?
We are all in on GCP. That was a really intentional approach that we took a number of years ago to get ourselves out of the commodity [infrastructure management] job, and all of the attention that it was taking from our organization, to focus on higher-level things. And we did it, I think, a little bit differently than a lot of companies that you see.
So if you were to compare us to Netflix, what we did is we went all in on a single vendor, but we also went all in on these high-level managed services. That was an approach that sort of doubled down on this whole philosophy that we wanted to spend more time focused on our business, and less time on infrastructure. That's an interesting thing for somebody who leads the infrastructure organization to say, but it's something that I actually truly believe in.
The other driver was really just speed. It's an organization that is oriented around speed. And in my organization, our mantra is that we're enabling speed, scale, and doing it safely for basically every Spotifier and all of our products.
Why Google?
We did the usual sort of due diligence that everybody does when looking at a cloud vendor. But what really stood out for Google was a few things.
One, they were leading on the data side. And we realized, based on the amount of data that we were ingesting, that we needed a partner who could handle complexity and scale and data, and get us beyond the limits that we were having. We had the largest Hadoop cluster running in Europe at the time, but it was still quite constrained for our organization.
And then second, we needed a partner that fit with us culturally, and that we felt like we could really influence compared to some of the other possibilities at that time. Google definitely hit both of those criteria because they were new entrants, and they had a lot of the same sort of cultural aspects that we had around autonomy and independence in our engineering team and just really focus on engineering excellence.
It's funny, because I've heard, even from people at Google, that one of the things they've struggled with is trying to empathize with their customer, trying to understand that not every customer needs a Google-scale approach to what they do. But it almost sounds like for you that's what was appealing.
It was and it wasn't. We did have conflicts at times, as in any good partnership. They've been on a learning journey of how to have empathy for customers, and what their specific requirements are that might not be the Google Way, and understand that there are these other sorts of amazing engineering organizations that may do things differently, and those make sense.
So it was a good journey. And since we were quite a large account of theirs, we were able to help them go on that journey as well.
I wanted to circle back on the managed services question, which I think is really interesting. I interviewed Mike McNamara from Target a month ago, and he was the complete opposite: I don't want any managed services, I want to run everything ourselves, we understand that we have to invest in people and skills in order to do that, but we want that flexibility.
Can you talk a little bit about your philosophy behind that? The pros and cons of building around managed services that really puts you in bed with Google for a very long time?
There's a part of the story that has changed recently. Going back to a couple things that I said before, which is we, first and foremost, are optimizing for speed. And secondly, we're constrained in our data ecosystem.
So when you look at a product like BigQuery, and the accessibility of that and the scalability of that — and also the complexity of building something like that yourself — there's a huge appeal in that.
We actually tracked the usage of BigQuery across the company, relative to our previous context, and the number of employees — especially technically-savvy employees that didn't have the data skills — we just saw this crazy exponential curve of adoption of that sort of technology. So that's like, "Alright, yes, we're extracting business insight at such a faster rate." And that was what was most important for us at the time.
However, as we adopted more of those managed services, then questions of efficiency start to come in. From the perspective of the customers that I look after — other Spotifiers — we want them to have that level of abstraction, so they don't have to get down into the nitty-gritty details of understanding infrastructure and really be abstracted from that; we are going through and layering in our own managed services so that we can get better scales of efficiency there and better, basically, unit cost.
Just so I'm clear on that, you're sort of building your own Spotify-managed services on top of, let's say, vanilla GCP? Rather than using some of the Google managed services?
Yeah, in very targeted fashion. We're not doing it across the board. We're just doing it where we think it really has an impact on the effectiveness of our overall budget and spend.
Can you talk a little bit about what some of those targeted areas are for you?
Data processing is one of those areas where we've taken a look at and are in process of [building] that. We're very transparent with Google about this. There are some other services that I don't want shared publicly that we're working on as well. But [data processing is] the one that's probably the most visible because we're also doing it in the open-source arena.
One of the things that we've also done [is] going from just being completely optimized for speed to saying now we might have a little bit of extra fat in the organization, we need to trim that back on how we spend on the cloud. It's been a journey to really change an engineering culture and mindset that was focused on a lot of important things around performance and scalability, reliability, observability; all those things that engineers love to work on, but they weren't focused on cost.
So we leveraged a tool that we've spent a lot of time on and our engineers love and adore — our development portal called Backstage — to add a plugin into that ecosystem. It is a cost-insight plugin that allowed us to sort of take a step forward in our cloud evolution, so that more and more engineers could understand the implications of their engineering decisions towards the company bottom line in a context that was meaningful for them.
Backstage looks like it could support a multicloud environment. Did you have that in mind when you built it?
We really want Backstage to succeed because it's so integral to how our company operates. It's the single pane of glass that developers, data scientists, sometimes even designers look at to do their jobs: to build out the software, to manage the software, to create their software [and] to find new software. It doesn't matter if it's data, like a new data pipeline, a new back-end service or a new feature on mobile, a new machine-learning feature; it's all inside of this context.
Because this is so central to how we do development, we want to share it with the world. We want it to win as the developer portal out there. And therefore, it has to work on more than just GCP, it has to work on AWS, it has to work on [Microsoft] Azure.
But in terms of Spotify, you don't seem really keen on setting up multicloud yourself?
No, not super keen. There's simplicity in having a single cloud, and that saves us a lot of hassle and complexity.
Which emerging enterprise technologies are you most excited about, or which ones do you think could have the most impact on Spotify?
One of the areas where we've been investing for a while has been in [machine learning] ops or ML infrastructure, and stitching together all of our different parts of our solution there. I'm seeing more companies enter this area, which I still feel like is not a well-served area in the overall marketplace. It's not well-served from the cloud providers that don't really stitch together something that supports the sort of full lifecycle. And so I think we're quite close to having it stitched together. But I see a lot of activity in that, and that's actually pretty exciting.
How would you define ML ops?
Generally, the challenge is going from the training of models to the runtime ecosystem, and ensuring that you can do all the sort of standard software development practices that we're all used to in all the other disciplines; being able to do CI/CD type activities, and to iterate on your experimentation that you're doing in those ecosystems.
With our infrastructure, there'll be a new model created on a frequent basis. And then we run that through our experimentation platform, and see, "Did that actually move metrics?" Being able to organize all of that and keep that as something that's sustained, and [helping] people who've joined Spotify to really focus on amazing ML research [but] are currently bogged down, is where I got to see all the different aspects of ML ops.