With Codefresh: Is the developer experience around microservices lagging?
Recently, Arjun sat down with Kostis of Codefresh to record a discussion on request isolation for developer environments. The most interesting takeaway from Kostis: questions about PR previewing are among the most common questions about ArgoCD... something the tool is definitely not meant to do.
The confusion may stem from the Pull Request Generator in the ArgoCD documentation, which certainly suggests that you can use ArgoCD to create temporary environments, albeit somewhat indirectly. Really, though, the question comes up with ArgoCD because the tooling for developer environments and acceptance testing has not kept up with the operational abilities of Kubernetes.
Put more simply: while the operational capabilities of Kubernetes have increased, the developer experience has gotten worse, as devs struggle to run their code in an environment that in any way resembles production.
In this conversation Arjun explains how request isolation is part of the solution for letting large teams give their developers experimentation environments.
Transcript of the conversation between Arjun of Signadot and Kostis of Codefresh
Hello, and welcome to this discussion about microservices and Kubernetes: what developers do when building microservices, and how platform engineers can help developers with microservices. My name is Kostis. I'm working for Codefresh, and I'm here with Arjun from Signadot. We are going to discuss the challenges that start with microservices and how developers should cope with them. So the first question I have is: how is the landscape right now? Previously, developers were talking about just monoliths, and things were really simple. You just had a single application. Then microservices appeared, and you had to split the monolith. And then containers and Kubernetes appeared. So two big changes at once. How are things now for developers in this respect?
First of all, thank you so much for having me, Kostis. I'm pretty excited to chat with you about this topic, which is near and dear to my heart. So, yeah, I think we are in this sort of confluence of technology shifts that are happening right now. So, this whole notion of being able to scale development is at the heart of it, right? Like, as organizations grow, you want to be able to have a large number of engineers working on a codebase. And that's kind of led to the question of whether we can organize the codebase in such a way that developers can make progress independently. This has led to the whole notion of splitting up the monolithic application into manageable chunks of code that can be independently developed, tested, and released. This is known as the microservices paradigm. The definition is a little bit vague, but in general, it's all about modularity and decentralized shipping of software. On the infrastructure side, containers have been immensely successful in giving developers and operations folks a platform where they can have an environment that's consistent between development and production. If you're running it in containers, you know it's going to work the same way no matter where you run it. That's been an amazing thing from an operational standpoint. Kubernetes has become the de facto container orchestration system. It's amazing to see the adoption of Kubernetes in the broader market. We see this confluence of these two strands really striking at the heart of software development as we go into the future. It's pretty exciting to see these two trends coming together. What's interesting is that Kubernetes and containers have made the operations side of the equation far simpler and more consistent. However, the developer experience has not really caught up. Developers are still trying to figure out the ideal software development lifecycle.
They need to have environments at their disposal to be able to develop fast, test their code fast, and collaborate with other developers that may be working on other microservices that they could depend on. There are a lot of things being worked on in terms of what is the ideal software development lifecycle for this kind of technology stack.
So, I think an interesting point here is that as Kubernetes appeared, everybody tried to adapt their tools to work with it. Continuous integration, path adaptation, monitoring, and testing all had to adapt. We have seen some practices in the past regarding preview environments that might have worked in the old days, but with microservices, they are a bit strange. Can you talk about the limitations of existing solutions and how they used to work in the past? Now, with microservices, we need to rethink them. Are there limitations on what developers can do before we explain what they need today?
Yeah, absolutely. If you look at it, the main change, like you said, is the splitting up of the monolith, and that really impacts developers. Initially, I would have maybe a front end, maybe middleware, maybe a database, and all of them fit on my local workstation, so I could happily do whatever coding I needed to do. Now think about the situation where you have maybe 30 microservices, or 50, and some people have even hundreds. It's no longer possible to have a development environment that fits on my local workstation. Gone are the days when I could have everything on my local workstation. So what is the developer to do in this case? Traditionally, there are a few approaches. You can say, okay, maybe I don't need all of the services running on my local workstation; maybe I'll make do with the small subset that I need. That could work for some companies. I may have 50 microservices, but maybe I only need three or four to run on my laptop, and the rest can be mocked or stubbed out. That's one possible solution, and tools like Docker Compose are very popular in that model: you run a bunch of services locally and bring them up using Docker Compose. Certain users, though, say: okay, I don't want to do everything in Docker Compose, because my production is not running Docker Compose; my production is running Kubernetes.
So I don't want that delta, where I have to do one thing for dev and then something completely different for production. I want to standardize on Kubernetes even for development. In that case, the common approach we see is that you spin up a namespace in Kubernetes and have whatever services you need there. This could again be a subset of the services that you need for your development. It's kind of replacing the local workstation, in a way: I can't run everything consistently on my local laptop, so let me use a namespace in Kubernetes and deploy, let's say, a subset of services there. Then that is my dedicated namespace. I can do whatever I want there, I test my changes there, and then I'm ready to merge my code or get on to the release, the CI/CD process.
So that is what we see as the traditional way of using Kubernetes as a development environment. Namespaces are a very natural way to isolate different development environments, because that's built into Kubernetes. You get a decent amount of isolation between different developers, and you still use the same Kubernetes cluster, so the control plane is reused between the namespaces. Then you can have different SLAs, RBAC, and other controls to determine how long these namespaces live, who has permissions to which namespaces, and things like that. So that's a fairly common approach that we see for developer environments. You asked about the pros and cons here. The pros of this model are that it's easy, it's understandable, and it's built into Kubernetes. Another advantage is that it separates the operational aspects of bringing the namespace up from the developers just using it. From a developer perspective, they are just working in a namespace; they don't care how it was brought up. The platform team or the DevOps team can own the provisioning of these namespaces and determine how they come up and how they get torn down. Developers are mostly just using these namespaces for their testing. They may have some developer tools, maybe a CLI or some kind of UI, to work with these namespaces, but they don't have to do much work. So that's the advantage.
The downsides we see relate especially to how these perform at scale. As you have more and more microservices and developers in a team, all of these environments become fairly bloated. It may not be just a few services that go into a namespace; it could be 10 or 15 or 20 services, along with databases, message queues, and cloud resources. So you have to think about how to get a subset of the application, or possibly the entire application, to fit into a namespace. That can be challenging, especially when you have so many components involved and you're now managing potentially hundreds of these namespaces at a time. If certain things don't work as desired, who's going to manage it? Who's going to support it? There's going to be a lot of burden on the platform engineering team to make sure that these always come up consistently and work consistently, so manageability becomes an issue. The other issue that we've seen is how up to date these environments are. Remember, these environments are being spun up on demand by the developers, but code is continuously being merged to the trunk, the main branch, of all the microservices. So you need to make sure that these environments don't live for too long. Otherwise they become stale: you will be working in your own namespace, and in the meantime code is being merged into the main branch of all the dependent microservices, and you don't have that in your namespace. You're working on an old baseline, if you will, and your testing is not valid anymore.
So the longer you keep these environments around, the higher the probability of them going stale. The other approach you could take is continuously refreshing them with whatever goes into the main branch. But that's operationally quite burdensome: you may have hundreds of these namespaces, and you have to patch all of them through a CI/CD process or something like that, where you're updating all of these environments at any given time, which is also not trivial to do. So that's the other drawback we've seen: they can go stale. And the third drawback is obvious, which is the cost aspect, the cost and operational overhead of maintaining these environments, because you're going to be duplicating them. If you have hundreds of developers in a company, you're going to have hundreds of these namespaces, and the cost will increase linearly with the number of environments that you have. There are obviously ways to mitigate the cost, like using Spot Instances and so on, but that's additional work that you need to do. In general, cost is a pretty important factor that we're seeing out there in the marketplace. So those are some of the angles I would talk about.
So we will dive into preview environments or developer environments in a bit. But I want to talk about a more general problem, which I think we have right now, and you can tell me if you agree or not. Right now, the cloud native ecosystem, or the Kubernetes ecosystem, is great, because you can see so many tools coming out. But all these tools are usually focused on operators, or SREs, or DevOps people. Nobody is thinking about developers. So when we come to the question of preview environments, everybody knows how to create preview environments and manage preview environments, but nobody thinks about the end result: what is the development experience for end users? For me, this is a missed opportunity, because we have so many tools that are great for DevOps people, and developers are just looking on and saying, what is there for me? So do you think this is true? And if yes, do we need more tools specifically for developers and not just operators and SREs?
No, that's an excellent point, Kostis. I think this is something that we see out there: the operational side of this whole ecosystem is moving at a much faster rate, whereas the developer experience has been neglected. It's more about, like you said, how developers use these systems. What is the interface that they look at? There are a couple of facets to this. If you look at developers in general, they see Kubernetes and these kinds of systems as infrastructure. They're thinking: I don't want to deal too much with this; I just want to work at a layer that makes more sense to me, because I'm writing a microservice that is solving an actual end user problem. That's the layer I want to focus on. I want to focus on how I design my microservice, how I design the architecture, my data modeling, my relational database dependencies, and other things like scalability that developers worry about. I don't necessarily want to think about infrastructure that much. So it's that abstraction layer that's missing right now: either you're exposing too much of Kubernetes to the average developer, or you're not giving them an interface that's easy to understand in terms of the concepts that developers already know. They understand the concept of microservices, but if you say Kubernetes Deployment or Kubernetes Ingress, they're a little bit confused: why are we talking about these concepts that are not so easily relatable to me? So yeah, there's definitely that mismatch.
Coming specifically to preview environments and developer environments, there are certain properties that make them more developer friendly. For example, you want these environments to be very quick. Developers don't want to click a button and wait 30 minutes for something to spin up; that's a very bad developer experience. They probably want a CLI or some kind of interface to the tool that's easy to use, that's intuitive, and that fits within their development workflow. For example, if it integrates with the PR workflow, that's very good for developers, because they are used to always pushing a PR, and if the environment can be spun up as part of the PR workflow, that's great. Even before the PR, they may want something they can work with locally, for example while they are actively working on a microservice and are not ready to open the PR yet. In that phase, what does the developer experience look like? So the importance of the developer experience cannot be overstated. At the end of the day, the developers are the majority in any company; the platform team is usually pretty small and central, and it's catering to a large engineering organization. If the individual developers don't use the tool, then the whole point is lost, because you're not going to have the adoption, and you're not going to accomplish the goals you set out with to begin with: how can I ship code faster, and how can I do it with higher quality?
So coming back to preview environments. I think it's important to talk about them because, first of all, I have to say, I was a developer for 10 years before I was a developer advocate. If you look at the ecosystem right now, maybe production deployments are taken care of: as a developer, I commit, and I know that at some point my commit reaches production. Maybe I have some initial tools for local development: while I'm developing, there are local clusters or local tools. But there is a big gap in the middle. I have finished, let's say, coding, and I want to see whether my feature actually works. I don't want to test it locally anymore; I want to test it in an environment which is as close to production as possible. Continuing the previous discussion, yes, of course, I could launch a temporary environment for myself, with everything that you explained. But then I don't get something that is close to production. I get something that looks like production, but it's not. Right now, when most people talk about preview environments, usually that's the pattern they think of: I'm launching something temporary, it looks like production but it's not, I try my feature there, and it works. But there is another approach, which is new for me as well, which is using, let's say, temporary environments which are not really completely temporary. They are part of a shared environment. And the environments are not isolated on the namespace level; they are isolated on the request level, which I think is a super interesting concept once you understand it. Because, after all, a namespace is great, but as a developer, why should I care about Kubernetes namespaces? This is an abstraction that is not interesting to me. So if I got something that was smarter and easier to consume, that would be great.
Because request-level isolation is something new, maybe you can talk about this pattern and explain the advantages and disadvantages, and why it's better than the traditional solutions.
Yeah, absolutely. This is a different approach to having these on-demand developer environments. It has some interesting properties that I'll share quickly, but before that, I just want to explain the approach at a high level. If you can let me share my screen here, it will be easier for me to go through this explanation. So, this approach is very much dependent on the concept of a baseline environment, which I referred to earlier as well. This can be any existing environment: a staging environment, a QA environment, or some kind of pre-production environment that you already have set up, with a CI/CD process already hooked up to it. So there's already automation that makes sure that this baseline environment is automatically updated with every code commit to the trunk or main branch. Once you have that baseline environment, the concept is what we refer to as sandboxing, or a sandbox environment: you're creating an encapsulation of the services that you're actively working on. This could be something that you're working on locally, or it could be something that's in a dev branch. You introduce that specific under-test version of the service into the same cluster and make it work with the remaining dependencies. So you create a sandbox with only the things that are changing. It's very cost efficient, because you may have hundreds of microservices, but you're only changing a few at a time, and the things that are changing can be encapsulated in a sandbox. Think of a sandbox as a logical container of these workloads.
So this could be a few microservices; it could also have resources, which I'll talk about in a second. Essentially, you're containing this in a sandbox, and then you're using request-based isolation to isolate sandboxes from each other and from the baseline. What I mean by that is: in the other approach that we spoke about earlier, when you're creating namespaces, there is more infrastructure-level isolation. Every environment is completely isolated, as pods, and Kubernetes takes care of the isolation. Here, you're sharing the baseline between the different sandboxes, but you're using request headers at the application layer to isolate the request flows that happen through the environment. As a developer, for example, I may work on source repo one and have a change in my dev branch that I spin up in a sandbox. Then I make requests into the environment, using either my publicly available ingress URL or an API call to my microservice itself. Every request is tagged with some kind of unique identifier, a tenant ID that identifies that unique tenant in the system. Based on the value of the headers contained in the request, you can route the request dynamically in the cluster to go to certain sandboxes. In this case, for example, if I'm working on a version of microservice A in my branch and I make a request to, let's say, the front end of the application, the request flows through the baseline. Then, when it comes to deciding whether the request flows through A or A prime, where A prime is my dev version, there are mechanisms by which you can say: this header is actually meant for the sandbox which contains A prime, so let me route this request to A prime.
There are various ways to do this request isolation, or rather this routing of requests. A service mesh would be the most natural way to do it: if you have a service mesh, you can integrate with it and configure it so that it does that for you. If you don't have a service mesh, then there are other approaches, like sidecars, that we use to make that happen.
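The routing decision described here can be sketched in a few lines of Python. This is an illustrative model only, not how Signadot or any particular service mesh implements it: the header name `x-sandbox-id`, the `Sandbox` class, the sandbox name `pr-123`, and the service addresses are all hypothetical.

```python
from dataclasses import dataclass, field

# Hypothetical header that carries the tenant/sandbox identifier.
SANDBOX_HEADER = "x-sandbox-id"

# Baseline: the stable version of every service, kept current by CI/CD.
BASELINE = {
    "frontend": "frontend.baseline:8080",
    "service-a": "service-a.baseline:8080",
    "service-b": "service-b.baseline:8080",
}


@dataclass
class Sandbox:
    """A logical container holding only the services under test."""
    name: str
    overrides: dict = field(default_factory=dict)  # service name -> address


# One sandbox that replaces service A with a dev version (A prime).
SANDBOXES = {
    "pr-123": Sandbox("pr-123", {"service-a": "service-a.pr-123:8080"}),
}


def resolve(service: str, headers: dict) -> str:
    """Pick the destination for one hop of a request.

    Requests tagged with a known sandbox go to that sandbox's version of
    the service if it has one; everything else stays on the baseline.
    """
    sandbox = SANDBOXES.get(headers.get(SANDBOX_HEADER, ""))
    if sandbox and service in sandbox.overrides:
        return sandbox.overrides[service]
    return BASELINE[service]
```

An untagged request to `service-a` resolves to the baseline address, while the same request carrying `x-sandbox-id: pr-123` resolves to the sandboxed version. In a real cluster this decision would be made by the mesh or a sidecar rather than by application code.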
So essentially, there are three aspects to the solution. The first is the notion of a baseline that's continuously updated. The second is tenancy: the tenancy concept is very strong in this model, in that every request is tagged with a tenant. And the third aspect is routing: how do you route based on tenancy? The combination of these makes it possible to have developer environments that are isolated from each other. It also gives you the ability to tune isolation levels. What I mean by that is, I can have just one service in my sandbox, or I could have a few more services, depending upon the exact needs of my development use case. So I have a very good way to save cost, because I'm not duplicating everything; I'm using the baseline as much as possible and only duplicating the things that I'm actively working on. This scales to hundreds of microservices and hundreds of developers very cost effectively. And I have more control over this environment: just by changing the routing, I can enable very interesting use cases. For example, I can collaborate with other developers working in different sandboxes. What I mean by collaborate is actually making my requests go through their sandbox as well, because maybe I depend on one of the services that they are working on, which is also in a sandbox, not yet merged into main, and not available in the baseline yet. So just by changing routing configuration, I can materialize an environment that is specific to the feature or the microservice change that I'm working on. There are other use cases as well that I'm happy to go through.
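The collaboration case can be modeled as overlaying several sandboxes on one shared baseline. The sketch below is a toy illustration under assumed names (`alice`, `bob`, and the service addresses are hypothetical); it only shows that the "environment" a request sees is a routing view, not a copy of the whole application.

```python
# Shared baseline: one running copy of every service.
BASELINE = {
    "frontend": "frontend.baseline",
    "service-a": "service-a.baseline",
    "service-b": "service-b.baseline",
}

# Each sandbox holds only the services its owner is changing.
SANDBOXES = {
    "alice": {"service-a": "service-a.alice"},
    "bob": {"service-b": "service-b.bob"},
}


def materialize(sandbox_names):
    """Return the service -> address view for a request routed through
    the given sandboxes, falling back to the baseline everywhere else."""
    view = dict(BASELINE)
    for name in sandbox_names:
        view.update(SANDBOXES[name])
    return view


def extra_workloads(sandbox_names):
    """Count the workloads that exist only because of these sandboxes."""
    return sum(len(SANDBOXES[name]) for name in sandbox_names)
```

Routing through both `alice` and `bob` yields a view with two sandboxed services and the baseline front end, at the cost of two extra workloads instead of a second full copy of the application.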
I think it's also interesting to note here that this is essentially a new choice. Until recently, when you thought about preview environments, you said: oh, I know what I'm going to do, I'm going to run everything on my laptop. And of course, this doesn't work. Then you said: no, I'm going to create a preview environment. And you had the problems explained before with managing and monitoring it. But I think one thing that nobody is talking about when creating environments is databases and state. Usually, when you want to work with a production-like environment, you need data. Launching some services in a namespace is super easy, but what do you do about data? You can see people say: oh, we need to clone the database from production. But it has some confidential stuff; okay, let's remove the confidential stuff. Or it's too heavy; let's take some subset. So you see some tools that take a database and make it smaller or lighter. And then you also have to manage the database for each preview environment. So it's not just a matter of thinking about the code and the applications, but also the data. So this solution is perfect for this. Because you can see, and I actually like the boxes you have in the baseline, maybe I have a database that I control, and I make it look close to production. Or maybe it's a production-like database that I created once and that is centralized, so I pay for the resources only once. Then I can say to everybody: hey, this is my baseline. So unless I'm actually changing the database itself, which happens, but let's say it's not the usual case, I have some centrally installed databases that can be used by everybody.
And I can even say: okay, maybe my new feature will save something, and maybe my colleague will read something from the database. I'm mentioning this because I've seen so many discussions and questions about preview environments, and nobody's talking about databases. Until recently, I hadn't found a good solution. I mean, if you have a really big database, installing everything on your laptop is not an option anymore, so you completely forget about that scenario. So I think this is super interesting. But I also think we need to dig a bit more into the isolation aspect. Because some people might say: what happens if I don't want to touch the work that is being handled by another developer? Is this enough? Do I need to do something here? And also, what are the requirements? Let's say I like this pattern: what do I need to do as a company that wants to adopt it, and what do I need to do as a developer that wants to use it?
That's an excellent point, Kostis, and data isolation is a huge topic. Like you said, everything is very data driven, and you definitely want to address how to share data safely and how to isolate data safely. Both use cases are important. This approach gives you the ability to share the baseline database with all the sandbox environments, so it's a great first step. I would also argue that it's the majority use case: most of the time, you want to be able to share the high-fidelity data that you already set up in the staging or pre-production environment. We have seen a lot of companies, especially in certain verticals such as FinTech, where they have a lot of data already set up in the staging environment, which is very rich and already close to production-quality data. Giving developers access to that data to do their testing much earlier in the lifecycle has a pretty huge impact, because they can test in a realistic environment before they even merge code. That's a huge advantage here. The other aspect is, let's say you're making some destructive changes in your code, such as a schema change. You definitely don't want to connect to the shared database, because you want to make sure that database can still be shared by everybody else. In this case, you do need to isolate data, and there are various approaches to doing it.
There are certain approaches that we take at Signadot, where we allow you to create what we call a resource, a temporary resource. We have a plugin framework that you can use to encode a very easy provisioning step. Think of it as a wrapper that allows you to provision an on-demand cloud database, maybe MySQL or Postgres, or maybe an S3 bucket; it could be any resource that you can provision on demand. We synchronize that with the lifecycle of a sandbox. So when the sandbox comes up, and the sandbox says it wants a database, our platform takes care of running the script that you provided to provision the database, and when the sandbox is destroyed, that database is also reclaimed. Another important aspect I want to mention is that you may not want to spin up a heavyweight resource for every sandbox. Typically, you don't want to spin up an RDS server for every sandbox. What you want to do is potentially create a schema in an existing RDS server that you already have set up, because schemas are quite lightweight and you can spin one up fairly quickly. What we see many of our users do is take that approach, with logical isolation of data, and that's what these resource plugins are used for, along with the lifecycle of sandboxes. So you can benefit from both approaches: shared data in the baseline environment, as well as on-demand schemas or on-demand resources that you need in order to test, let's say, schema migrations. So that's how I see data. You also asked about the requirements to use such a system.
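The idea of tying a lightweight resource to the sandbox lifecycle can be sketched with a Python context manager. This is not the Signadot plugin framework, whose interface is not described in the conversation; `SharedDatabaseServer` is a hypothetical in-memory stand-in for an existing database server, and the schema naming is an assumption.

```python
from contextlib import contextmanager


class SharedDatabaseServer:
    """Hypothetical stand-in for a database server that already exists."""

    def __init__(self):
        self.schemas = {"baseline"}

    def create_schema(self, name):
        self.schemas.add(name)

    def drop_schema(self, name):
        self.schemas.discard(name)


@contextmanager
def sandbox(name, server, needs_database=False):
    """Bring a sandbox up and tear it down with its resources.

    If the sandbox asks for a database, a dedicated schema is created on
    the shared server when it comes up and dropped when it is destroyed.
    Otherwise the sandbox shares the baseline schema.
    """
    schema = None
    if needs_database:
        schema = f"sandbox_{name}"
        server.create_schema(schema)  # provisioning step
    try:
        yield {"name": name, "schema": schema or "baseline"}
    finally:
        if schema is not None:
            server.drop_schema(schema)  # reclaim on teardown
```

A sandbox testing a schema migration would request its own schema, while most sandboxes would leave `needs_database` off and read the shared baseline data.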
The main requirement here is that, because these environments are isolated based on requests, the services actually propagate that request context through them. This is very easily done using the OpenTelemetry libraries. OpenTelemetry is traditionally better known for distributed tracing, but there's another important aspect that it covers, which is context propagation. You need that for distributed tracing too, because you need to propagate the context of a request through multiple services, and that's how the traces are stitched together. So what you need for this approach to preview environments is just the context propagation part. We don't need you to have tracing enabled, necessarily, though that is obviously useful for observability purposes.
As long as you have context propagation through your services, that is the main prerequisite to using this system. And that can be done in an incremental manner. If you don't have it, you can start with a few services; then you get request flows that go through those services, and you can have sandboxes for those request flows. As you see value, you can add context propagation to other services as well. This is a generally useful practice to have anyway, because it paves the way for tracing, which is very useful for debugging microservices-based applications. So it's not something that is specific to this environment solution; it's something you should do anyway if you have a distributed application. That is the main requirement. But once you have that, all of these things fall into place. You can have the baseline, you can have the tenancy concept introduced in those headers that are then propagated by OpenTelemetry, and then, using a service mesh or a sidecar, you can start to dynamically route these requests and get that request-based isolation.
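The routing decision described above can be sketched as follows. This is a simplified Python stand-in for what a service mesh or sidecar would do, not any particular mesh's configuration: the `sandbox` baggage key, the function names, and the addresses are all hypothetical.

```python
def parse_baggage(header: str) -> dict:
    """Parse a W3C `baggage` header into key/value pairs, ignoring properties after ';'."""
    items = {}
    for member in header.split(","):
        pair = member.split(";", 1)[0]
        if "=" in pair:
            key, value = pair.split("=", 1)
            items[key.strip()] = value.strip()
    return items


def pick_backend(service: str, headers: dict, overrides: dict, baseline: dict) -> str:
    """Route to a sandboxed version of `service` if the request is tagged for one.

    `overrides` maps (sandbox, service) to the address of the version under
    test; every other request falls through to the shared baseline.
    """
    sandbox = parse_baggage(headers.get("baggage", "")).get("sandbox")
    return overrides.get((sandbox, service), baseline[service])


# Usage: only the `orders` service has a sandboxed version for pr-42.
baseline = {"orders": "orders.staging:8080", "payments": "payments.staging:8080"}
overrides = {("pr-42", "orders"): "orders-pr-42.staging:8080"}
tagged = {"baggage": "sandbox=pr-42"}

orders_target = pick_backend("orders", tagged, overrides, baseline)
payments_target = pick_backend("payments", tagged, overrides, baseline)
untagged_target = pick_backend("orders", {}, overrides, baseline)
```

A tagged request reaches the sandboxed `orders` service but still uses the baseline `payments` service, which is what lets many sandboxes share one environment.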
Okay, so I think it's time to wrap this up. Any last words for our audience and also where to find you?
Yeah, we're at signadot.com. Feel free to sign up for our free tier and try the product without needing a credit card. We also have a Slack group where you can discuss microservices, developer productivity, and platform engineering related to cloud native stacks. We're excited to have you in our Slack community. You can also sign up for our monthly newsletter to stay up-to-date with the latest news in this domain. Don't hesitate to ask any questions you may have.
Yeah, so to recap, we talked about how microservices, containers, and Kubernetes have changed the ecosystem today. We also discussed traditional solutions that people used for preview environments and why they are not the best solution today, especially for microservices. Additionally, we covered the concept of request-level isolation, which was new to me, and I'm glad I learned it. Arjun explained how this new pattern can be used for preview or developer environments and why it's better in terms of cost, simplicity, isolation, and collaboration with others. I hope you enjoyed this. See you all in the next one. Thank you.