Scaling developer testing for microservices in Kubernetes

In this presentation, Anirudh Ramanathan, Signadot's Chief Technology Officer, introduces a testing approach that uses a single shared staging environment to run integration tests for every pull request and from developer workstations. He covers the challenges of managing and scaling these tests on Kubernetes, using open-source components in practical scenarios.

In this session, you will:

  • Explore how OpenTelemetry and a service mesh enable concurrent microservice testing in a shared staging environment.
  • Examine the essential components required for effective microservice integration testing.
  • Learn from real-world strategies used by companies operating hundreds of microservices at scale.
  • See how to organize a comprehensive testing strategy for microservices, beyond integration testing, across an organization.

Transcript of the conversation between Anirudh of Signadot and Sam

Sam

I think we can get started. Welcome to the Platform Engineering YouTube channel. As you might know, every week we have really fantastic webinars, and we've got a great one today that I think we're all quite excited for. So thank you so much for joining. As the session goes on, feel free to add questions in the chat.

You won't be able to see them, but I'll be watching the questions. As you ask them, I'll read them out or keep them in mind, and we'll have a few moments throughout where I read questions out, so you don't have to save everything for the end. We'll of course have 15 minutes at the end for questions as well.

So feel free to add them in the chat and I'll be reading them. I think we should just get started. Over to you.

Anirudh 

Awesome. All right. Let me start my screen share. 

All right. Could you confirm that you can see that? Okay, perfect. Thank you all for coming this morning, or whatever time it is where you are. I'm Anirudh Ramanathan, CTO and co-founder of Signadot, and I'm going to be talking about scaling developer testing for microservices in Kubernetes.

A little bit about me: I have been part of Signadot since 2019, where we built this Kubernetes-native testing platform that enables developers to move really fast, backed by YC and Redpoint. Prior to this, in a previous life, I worked at Google on Kubernetes in the early days, on StatefulSets and Deployments and such, and on Spark quite a bit.

I contributed the Kubernetes scheduler to the Spark project and then became a committer there. So the agenda here: I'm going to talk about why developer testing matters and how we actually enable it using OpenTelemetry plus a service mesh, and how you can bring those two components together to get really comprehensive testing for your microservices.

Then, how such a system can be rolled out across an organization, and a little bit of best practices from what we've seen in the wild from different people who have built such a system in-house. At different points I will pause for Q&A: once we're done with the problem portion we'll talk a little bit, and then I'll pause again before going into best practices and such.

So first, why is this challenging in the first place? With microservices, the top challenge I've seen is that it's a highly interdependent environment where each piece of functionality is distributed across many different microservices. When you're looking to test something, you're really thinking in terms of the whole, but each of those individual pieces also needs to be tested.

And there are third-party dependencies to deal with as well: managed services, cloud databases, and such. The other big factor here is getting a consistent environment. This is actually surprisingly hard: the closer you get to production, the harder it is to maintain.

The more you start to create clones of the environment, the more watered down each one gets, the harder it is to debug and observe things in it, and the further it drifts from production. These are all problems that occur when trying to test these microservices. And then, the largest of them all: we have to look at the testing strategy itself. How much automation, how much manual testing, who's doing what, and how do we make these tests effective?


There are many different versions of this out there, and different people have different strategies here. So we'll talk a little bit about the testing strategy itself and, based on what I've seen in the wild, what it is converging towards. Then I can launch into how this system can be built.

So before that, maybe this is a good point for a few thoughts from the audience. Is this consistent with what you've experienced? Is there anything you would point out as particularly challenging aside from these things?

And Sam, if you can help read out any thoughts that come in.

Sam

Yes, I'll read out anything that comes in. We'll just give people a moment to type if there is anything; if there's nothing for now, that's all right too. A point from someone: "I feel like this is mostly an issue in microservice hell, when microservices have not been split correctly."

Anirudh

That's fair, and maybe I can address that. There is some truth to it: not following some sort of domain-driven design pattern definitely exacerbates the problem. But even with the right boundaries, in my experience there are a lot of challenges around bringing those pieces together to test behaviors, because each piece is being independently released and there's still a lot of coordination between different teams necessary to ensure, even internally, that we're not breaking something and that everything's backwards compatible and so on.

So it needs a high degree of engineering discipline, even with the right architecture.

Sam

Okay, another quick point: "Yes, I can definitely speak to these challenges. I was also going to say that the microservices world makes it more challenging when you have different teams responsible for different services."

Anirudh

That's a great point. Yes. Absolutely. 

All right, so getting into the testing strategy. One of the key things about coming up with the right testing strategy is understanding what we are really optimizing for. There are many versions of the pyramid out there that optimize for maximal test coverage or some other abstract goal, but in my opinion, testing is about getting clear signal early, and effective testing is about

catching issues: actually preventing them from getting into production, being seen by customers or partners, and causing revenue loss. The earlier the better, and it's faster and cheaper for me to fix an issue early: it takes minutes on my workstation. But the further along it goes, if I'm finding it in staging or in some later QA bug bash, that is going to

take way more time. It is harder to figure out where exactly the bug occurred and whose microservice it was, which again speaks to the coordination problem that was mentioned, and then to figure out how to fix and re-test it. Taking this into account, we also need to keep a high bar on developer productivity throughout this

testing, because it's always possible to say, let's just test everything ad infinitum and ship slowly, but that's not really an option. Practically speaking, we have to balance the need for testing with the need to ship things, and that means making our testing very effective in the time available.

In the modern scenario, this is the kind of testing pyramid I've seen with a lot of the people I've talked to. The way to read it is: as you go up, you're testing business-y things, like a feature or some big scenario for the product, and lower down, you're testing very small, specific pieces of functionality, so that if you have a failure, you know exactly where it occurred.

It's obviously easier to debug a unit test than an automated E2E test, which is also why you have way more unit tests covering things comprehensively, while you might not want as many E2E tests because of flakiness, the difficulty of knowing which specific issue a failure is catching, and so on. So this is sort of the idealized picture.

Of course, practically speaking, there are versions of this which are pretty much inverted, where everything's an end-to-end test, and then people realize the feedback really sucks and developers are unhappy, and they start to invest in those middle layers of API tests, integration tests, et cetera. One interesting pattern I've noticed is also a higher emphasis on exploratory testing, which is behind the rise of all of these preview-environment type of things.

That actually helps a lot, because as a developer, when you see the full picture of your change in the context of everything else and exercise it like a user would, you get a lot of signal from that, and it prevents a lot of issues from going deeper and only getting detected on a shared environment much later.

I'll move on from here to describing how OpenTelemetry and a service mesh together can be used in a shared environment to solve several of the pieces here: API tests, integration tests, some degree of component tests, and also manual exploratory testing, which is pretty important.

The idea behind testing in a shared environment is that you mark the stable versions of dependencies as the baseline. That baseline is continuously deployed from a CD process; if I'm merging to master every time, then it's the main version of each service that's running there.

That's our common pool of dependencies. When testing against it, each developer essentially deploys just their change. For example, it could be a pull request here, not necessarily something from a developer machine. We'll look at how you can deploy just that single change and then wire things together using OpenTelemetry and the service mesh to realize end-to-end testing behaviors, whether it's for previews or

E2E testing. That's what I'll describe in more detail. The power of this approach is that we're only deploying what changed, and therefore we're always testing against the latest. There's no problem of stale dependencies or anything like that, because everyone's using the same common dependencies.

On the flip side, the trade-off is that we are sacrificing a little bit of isolation in the process, and I'll explain why that is still effective. Going into a bit more detail: say this is what our baseline environment looks like, running the stable versions of everything. If a developer is testing out a new version of service A, they deploy that into a sandbox, and then there is a way to send requests.

These requests could be hitting the front end, or any service really, but based on the presence of certain headers, they would exercise the sandboxed version of service A. If those headers are absent, they would go through the master version of service A. What this also allows us to do is

interesting things like, if I'm testing service B with a database, I can deploy both of those into a sandbox and test them; the same concept applies. The green arrow represents a request flowing through with certain special headers, which are being propagated through OpenTelemetry.

I'll get into that. The solid arrows are just baseline behavior. This baseline environment, in the common case, maps to a staging environment, so it has high-fidelity dependencies, high-fidelity data, et cetera. I'll break this down into three pieces.

The first piece is that we need to actually deploy the workload. This is fairly easy; there are many tools for this. Any tool like Argo CD would allow us to package up a microservice and deploy it into Kubernetes alongside everything else. The challenging portion usually is

doing this effectively for every pull request, but there are strategies around that. Essentially, the same microservice's secrets, config maps, and so on can be shared with this new workload as well. That makes life a lot easier, and often all you're really testing is just an image change.

A lot of times you're changing a few images or an environment variable, and that's what you're sandboxing, and that gives you most of the fidelity you need. The second portion is context propagation. This is handled entirely through OpenTelemetry.

OpenTelemetry here does not imply tracing. This is not distributed tracing I'm referring to, where you have to have a trace backend set up to receive the traces. Just having the library and instrumentation gives you this specification called baggage, which is a W3C draft, I think, at this point.

Essentially, what it ensures is that if a baggage header comes in on the incoming path, the same baggage value is transmitted to the next microservice when this one calls it. What that enables us to do is set a particular header value at the very start of the flow, and because of OpenTelemetry, as long as it's the baggage header, it just gets propagated on its own.

So we don't have to worry about reading it and writing it back at each of these hops; that's essentially what we make use of. OpenTelemetry is great in that for a lot of dynamic languages you get auto-instrumentation out of the box, which is really convenient. But if your particular language doesn't fall in that set, it may need some code changes.

One pattern I've seen is platform teams creating a shared library of sorts that can be used: if it's a specific server implementation, the shared library can contain a lot of the necessary middleware, so developers only implement a little bit of glue code to propagate that context through their microservice, as in the sketch below.
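For illustration, here is a minimal sketch of what that glue code might look like in Python, using the OpenTelemetry API directly rather than auto-instrumentation. The baggage key name (sd-routing-key) and the downstream URL are assumptions for the example, not something from the talk.

```python
# Minimal sketch: manually propagating W3C baggage through a service that
# lacks auto-instrumentation. Requires the opentelemetry-api/sdk packages.
# The "sd-routing-key" baggage key and the downstream URL are illustrative.
import requests
from opentelemetry import baggage, context
from opentelemetry.propagate import extract, inject


def handle_request(incoming_headers: dict) -> None:
    # Rebuild the context (trace + baggage) from the incoming HTTP headers
    # and make it current for the duration of this request.
    ctx = extract(incoming_headers)
    token = context.attach(ctx)
    try:
        routing_key = baggage.get_baggage("sd-routing-key")  # e.g. "pr-15"; None for baseline traffic
        print("handling request for routing key:", routing_key)
        call_downstream()
    finally:
        context.detach(token)


def call_downstream() -> None:
    outgoing_headers: dict = {}
    # inject() writes the baggage (and traceparent) headers from the current
    # context, so the routing key keeps flowing to the next microservice.
    inject(outgoing_headers)
    requests.get("http://service-b.staging.svc.cluster.local/api", headers=outgoing_headers)
```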

The other portion of this, how the actual routing happens between those microservices, uses a service mesh. In Istio, for example, there are constructs called virtual services, which allow you to set rules on what to do in specific scenarios: if you see a baggage header that says test-tenant-one, then take this particular routing decision.

That's something I want to emphasize, and it's the cornerstone of this model: the request flows through the system the same way unless, at a particular microservice, there is a different target. So even though the request in red carries a baggage value setting a routing key that says, hit

the PR-15 version of service A, the request travels through the system pretty much as it would for a normal request. At the particular juncture where it is about to talk to service A, a localized routing decision can be made by Istio to say, talk to A-prime instead.

The power of this approach is that you can compose it. You can have a routing key of A,B, and then it would exercise two different sandboxed microservices together, and so on. It's very composable because of that.
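To make the routing rule concrete, here is a rough sketch (not from the talk) of what the Istio side might amount to, expressed with the official kubernetes Python client patching a VirtualService. In practice this is usually plain VirtualService YAML applied by CI; the header regex, routing-key value, namespace, and Service names here are all hypothetical.

```python
# Rough sketch: add a header-match rule to an Istio VirtualService so that
# requests whose baggage carries the sandbox routing key go to the sandboxed
# Service, while everything else falls through to the baseline. All names
# (namespace, hosts, header/key values) are illustrative.
from kubernetes import client, config

config.load_kube_config()
custom = client.CustomObjectsApi()

sandbox_route = {
    "name": "sandbox-pr-15",
    "match": [{"headers": {"baggage": {"regex": ".*sd-routing-key=pr-15.*"}}}],
    "route": [{"destination": {"host": "service-a-pr-15"}}],  # sandboxed workload's Service
}
baseline_route = {
    "name": "baseline",
    "route": [{"destination": {"host": "service-a"}}],  # stable baseline
}

custom.patch_namespaced_custom_object(
    group="networking.istio.io",
    version="v1beta1",
    plural="virtualservices",
    namespace="staging",
    name="service-a",
    body={"spec": {"hosts": ["service-a"], "http": [sandbox_route, baseline_route]}},
)
```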

This is probably the question on everyone's mind: okay, we talked about all of these request flows. That's easy; we can route requests easily, it's all L7, you have your headers and we can set routing rules on them. But how do we deal with databases? In practice, for a lot of use cases of a system like this,

people just use entity-domain isolation. Create a test org, if orgs are a thing in your application, and you've already isolated that particular flow. This means we can default to using a lot of the high-quality data that is already there in staging rather than always isolating by default.

One of the ideas behind this model is that you isolate as needed; it's tunable isolation. You don't have to isolate at the infra level unless you need it. For many use cases, it's just entity-level isolation, where every team that's testing creates its own org or its own

test entities and uses those, and that already gives them sufficient isolation. Of course, there are cases where you need additional isolation, and there are several options. Again, we don't have to immediately jump to infrastructure; we can look at logical isolation, where it's just a different database table.

You can have an environment variable that shows up on the sandboxed workload and says, connect to the auth table in schema one instead of the original auth table's schema in the shared dependency. More advanced would be putting tenancy information into the tables themselves.

This is somewhat trickier, but can be done as well. So there are many different options here. Another pattern I've seen is people setting up databases running snapshots of staging data, and the sandboxes dynamically pick one of those test databases when they desire isolation. That is essentially how you deal with data.
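As a tiny illustration of the logical-isolation option described above (not from the talk), the sandboxed workload might simply receive environment variables pointing it at a different schema, or at one of those snapshot databases, while the baseline keeps its defaults. All names and values here are made up.

```python
# Tiny sketch of logical data isolation driven by environment variables set
# only on the sandboxed workload. Baseline pods omit them and keep using the
# shared staging schema/database. Names and defaults are illustrative.
import os

# The sandbox may point at a dedicated schema in the shared database...
AUTH_SCHEMA = os.environ.get("AUTH_SCHEMA", "public")  # e.g. "sandbox_pr_15"

# ...or at an entirely separate snapshot database when more isolation is needed.
DATABASE_URL = os.environ.get(
    "DATABASE_URL",
    "postgresql://staging-db.internal:5432/app",  # shared baseline default
)


def auth_users_table() -> str:
    # Query code composes the schema in, so the same image works in both modes.
    return f"{AUTH_SCHEMA}.auth_users"
```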

Then there are message queues, one of the trickiest things in a more traditional system. If you were setting up isolated, independent environments with infrastructure isolation, with either namespaces or clusters, this becomes fairly tricky because you're essentially duplicating heavyweight components like Kafka.

Here, a different solution is possible, where we again make use of OpenTelemetry to make the consumers aware of tenancy. The approach goes like this: producers produce messages that carry the tenancy header I talked about, the baggage, in the metadata of each message.

On the consumer side, the consumer decides whether to consume each message. If it's a baseline consumer, it will skip messages intended for A-prime, because it looks at that header and says, I'm skipping this message. This is essentially why you need different consumer groups if you're using Kafka. The consumer under test, in turn,

rejects all messages intended for the baseline and only consumes messages intended for itself. So you have a way to tenantize the message queue that is far more efficient: it's still one message queue, still the same topic, and we're just consuming differently on the consumer side depending on the OpenTelemetry header context I talked about earlier.

Yes, it does require some code changes on the consumer side, just to be aware of which sandbox corresponds to a particular consumer group, but that's usually easy to do through environment variables. Below is a rough sketch of what that consumer-side filtering might look like.
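This sketch uses kafka-python and invented names (the SANDBOX_NAME variable, the sd-routing-key message header, the topic); it only illustrates the selective-consumption idea described above, not any specific implementation.

```python
# Sketch of tenancy-aware consumption: the producer stamped each message with
# the routing-key header taken from OpenTelemetry baggage; consumers filter on
# it. Uses kafka-python; topic, header, and env-var names are illustrative.
import os
from kafka import KafkaConsumer

SANDBOX_NAME = os.environ.get("SANDBOX_NAME", "")  # empty string => baseline consumer

consumer = KafkaConsumer(
    "orders",
    bootstrap_servers="kafka:9092",
    # Each sandbox uses its own consumer group so baseline offsets are unaffected.
    group_id=f"orders-worker-{SANDBOX_NAME or 'baseline'}",
)


def process(payload: bytes) -> None:
    ...  # application logic goes here


for msg in consumer:
    headers = dict(msg.headers or [])
    tenant = (headers.get("sd-routing-key") or b"").decode()
    if SANDBOX_NAME:
        if tenant != SANDBOX_NAME:  # sandboxed consumer: only its own messages
            continue
    elif tenant:  # baseline consumer: skip messages addressed to any sandbox
        continue
    process(msg.value)
```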

That's the high-level picture of the solution. I'll pause here and take any questions.

Sam

I'll keep an eye on the chat in case any questions come in. There was one statement I can read out in the meantime, in case anyone has a question, from Scott Cotton: "Is this blue/green? Sort of, I think, but one, there can be many sandboxes, each with its own header; two, sandboxes can have many workloads in them and be grouped together; and three, sandboxes are usually pre-merge, whereas blue/green is not."

We'll just give it a short moment to see if there are any questions at the moment.

Are sandboxes some K8s object?  

Anirudh

In this case, they're an abstract entity that encompasses three things: deploying the test workload, the context propagation, and the request routing. In our implementation it is an object in Kubernetes, but you could implement them in several different ways; it's more of a logical abstraction here.

Sam

That's all we've got for now. Remember, if you have any questions at any time, feel free to add them in the chat and we can read them out when there's a good moment.

Anirudh

All right.  

Sam

One more question: "Does that isolation happen at the header level, or is there any logic on the Kafka side?"

Anirudh

Okay. That's a good question.

This is isolation that happens on the consumer side using the headers themselves; there's nothing special going on on the Kafka side. Aside from each consumer creating its own consumer group, that's the only change to the Kafka side of the interaction. Otherwise it's just messages that have the tenancy header set on them from the producer side, and the consumers choosing which messages to consume based on that header.

Sam

And that's all we've got for now.  All right. 

Anirudh

So that was a whole lot of technical context. I want to talk a little bit about why do this. It is some effort to get from request isolation to data isolation to isolating message queues and so on, so what do we get at the end of all of this?

The benefits we see are many. For developers, it's really fast feedback: your environments turn up very quickly when you want to test your changes end to end, because you're essentially only deploying one thing and setting up a routing rule.

That's way faster than any other way of setting up an environment. From the platform team's perspective, it's a very low maintenance burden because it's one cluster; you're not setting up multiple clusters or duplicating many infrastructure dependencies. From the QA perspective, you're getting a stable environment, a stable staging, that you can test these behaviors on top of as they occur.

In some sense you're shifting left: this kind of testing can happen early on rather than waiting for changes to merge and testing them at that point. For PMs and UX engineers, you get a very high-fidelity look at your application before it even goes out, so you can have preview environments. Last but not least, you can run runtime security tests on the sandboxes as well.

That gives you early signal before you merge in that case as well. These are a few best practices we've seen from people building and adopting systems like this. Typically they roll out for some subset of microservices, usually backend, API-based microservices, where the immediate benefit is the ability to test from the front end.

I can set a header on my front end and immediately start testing my backend changes; that's exploratory testing, and that's immediate value. This is typically rolled out on a microservice-by-microservice basis, integrating into individual CI/CD pipelines. Later on, as OpenTelemetry propagation and the routing setup are standardized across the organization, the ability to run more complex flows opens up, so you can have E2E and integration flows running as well.

That becomes the second step. Then there are, of course, advanced things that people do. There is the ability to make these sandboxes local: you can use a tool that connects the workstation to the cluster and essentially move the sandboxed test workload onto the workstation.

That's a bit more difficult with off-the-shelf components, but it allows developers to not even have to build Docker images and still get this kind of sandbox testing with the rest of the infrastructure staying the same. Finally, as I alluded to earlier, you can combine sandboxes together.

A person working on the front end and a person working on the backend can decide on a particular routing context for both of them, and then you can have requests that exercise each of their sandboxes. So pre-merge, you're testing an entire feature end to end if that is what's needed.

Solutions like this have been operationalized at scale. This is Uber's SLATE, essentially; they wrote a great blog post about it, and under the hood it's pretty much the same. Of course, it's not on Kubernetes, but the ideas are the same. DoorDash has a great blog on this, and Lyft also talked about it.

And I believe Airbnb as well, on how this sort of request tenancy can be used for testing microservices at scale. Some closing thoughts: since I heard someone mention blue/green, this might sound similar to canarying, but there's one key difference. Here you're running untested or very early versions of code, whereas with a canary you're typically promoting a build that you have already tested to a large degree.

You can accomplish that as well using a mechanism like this, to shift traffic safely, but the difference is that here we're talking about very much pre-production code. Then there's testing in production. This is the North Star for all of these approaches: with the right level of controls over data and tenancy, it is possible to deploy the same system in production and have production sandboxes for either debugging or even testing your changes against production. That's a much higher bar, because you want to make sure your test traffic and your production traffic are completely isolated, but it is possible, there are people who have done it, and it's really awesome to see in action. And then, a little bit on the automated testing portion.

We talked a lot about exploratory testing. Automated testing also plays a huge role here, and one of the challenges we have seen is actually in writing those tests: how do I even know what behaviors to test, or how to write an API test, when I don't necessarily have all the documentation and the payloads and so on?

That is something this approach can help with, because you can use the service mesh to capture request payloads and try to construct models that help developers or QA author these automated tests, deriving them from some of the information in these request flows.

That's something we're looking into, but it's generally just an interesting area as well. All right, any questions?

Sam

We've had a question come in just a moment ago: "How does environment governance work? Is the change in the staging environment, or is it coming from the dev environment with the rest of the stuff being from staging? When the testing is done, the change needs to be taken out or promoted."

Anirudh

That's a good question. The way it typically works is that once the testing is complete, the change is taken out. You essentially do the same steps in reverse: you delete the test workload and you get rid of the routing rules you set, and then the change goes away. In some sense, when you merge and the CI/CD process takes care of updating the baseline, that is the promotion. You don't have to explicitly promote the particular build you tested, because maybe it has some debug flags turned on and so on.

So you can use your CI/CD process to merge those changes and just delete the sandbox at the end of testing.

Sam

We've had one more: "Summarizing, are the major disadvantages less defined isolation and increased complexity? Anything else?"

Anirudh

Let's see, I think I have a slide on benefits. The need for a gradual rollout is probably one, because OpenTelemetry may be present to different degrees in different services. If you're using one of the auto-instrumentation approaches, that's taken care of, but there may still be some holes.

So the gradual rollout is probably the main thing I would call a downside. But of course there are many benefits to contrast that with, and once you get to the end state, you're covering all these different forms of testing with a really low maintenance and infrastructure burden per environment.

Sam

I'll just give it a moment to see if there are any other questions that come in. There have been a lot of appreciative thank-yous, which I haven't shared with you, Ani, but know that they're in there as you're answering questions.

We'll give it a moment if there's anything. There was a question above around where this recording will be. If you haven't seen it in the chat, you can of course find this recording on the Platform Engineering YouTube channel; all recordings from past webinars are there, and future ones will be as well.

This one will of course be there too; it might take a couple of days to come up, but it will definitely be there. Another question has come in: "In a continuous deployment context where exploratory testing is done in production with feature flags, is this still worth doing just for automating tests in CI, or is it too much complexity for the benefits?"

Anirudh

Okay, so the context is that feature flagging is being done in production in order to test features. If exploratory testing is done that way, there is still one benefit of using sandboxes, which is that it may be a lot faster. This is anecdotal, based on how I've seen this happen.

Feature flagging is beneficial once the feature is well baked, but in the early stages of testing, when you're still mucking around with code and trying to integrate with someone else's code, the desire for fast feedback might still be helped by something like a sandbox, where you're not thinking about feature flagging and turning on behaviors and so on.

You're simply putting pieces of code together and trying to see the overall behavior long before it reaches production. So I think there's still a speed question there, but if that's not a concern, then automated testing would be the portion where sandboxes can help.

Sam

We have one more coming in as well: "Can you have a mixed-mode sandbox where one pod in the service cluster is the new version while the other pods are still on the existing production version?"

Anirudh

That is essentially this model as a whole: these are existing production versions, and this is the only new pod we launched. If you're asking whether I can also isolate an additional microservice in a sandbox, absolutely, there's nothing stopping you from isolating that. It's just that your routing context changes slightly.

Your routing rules might say that service A and service B share one routing identifier, so Istio knows that when requests come in with that routing context, it has to send them through the versions of service A and service B that are both in the same sandbox. Those could be running any version.

One of them could even be master. But really, if it's the stable version, you would just use the common pool in most cases, unless you're worried about some sort of side effect, in which case, yes, you can isolate that as well.

Sam

One more question: "If you don't mind, can you show some practical examples?" Now, showing might be hard, but maybe there are some practical examples you can think of to talk about.

Anirudh

Practical examples. There are tons of blog posts I can share, and those blog posts themselves are pretty detailed, but let me see. Practically speaking, one flow that might be interesting, without leaving this slideshow, is a mobile app company using sandboxes to test their backend microservices.

The debug version of the mobile app contains a provision to add these special headers, so you add a baggage header there. Essentially, as a developer, they add the baggage header in that test version of the app, and then it hits a sandboxed version of something either in the cluster or on their own workstation.

That gives really fast feedback loops. It's something we've seen multiple companies do, and I would highly encourage reading the DoorDash blog as well, because there's good detail there.

Sam

I think as well, Ani, you could maybe share these links in the Platform Engineering Slack. For anyone here who might not know, we have a Platform Engineering Slack channel. If you have any questions after this or want examples, I'm sure you can message me there, and of course probably reach Ani on his Twitter.

Things like that we can share afterwards to continue the conversation and discussion there. Just one more question has come in: "Can you give a setup overview for creating a sandbox from a DevOps perspective?"

Anirudh

The typical flow looks like this. Let's take the initial deployment, the pull request use case for sandboxes. People usually start with their own CI pipeline; that could be Jenkins, GitLab, GitHub Actions, whatever. The only prerequisite is that

you are building an image for every pull request as new commits are pushed. As soon as the image is built, you can create the workload; that was the deploy-workload step I talked about earlier. That takes the particular image, plus maybe a modification to an environment variable or something, packages it up, and deploys it into the cluster alongside the existing baseline version of the workload.
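As an illustration of that deploy step, here is one way the CI job might script it (an assumption, not the talk's exact tooling): read the baseline Deployment, swap in the PR image, and create it under a new name so the baseline keeps running untouched.

```python
# Sketch of the per-PR "deploy workload" step: clone the baseline Deployment,
# relabel it, and swap only the image, so secrets/config maps stay shared.
# Uses the official kubernetes client; all names are illustrative.
from kubernetes import client, config

config.load_kube_config()
apps = client.AppsV1Api()

baseline = apps.read_namespaced_deployment(name="service-a", namespace="staging")

sandbox = baseline
sandbox.metadata = client.V1ObjectMeta(
    name="service-a-pr-15", namespace="staging", labels={"sandbox": "pr-15"}
)
# Give the sandbox pods a distinct label so a dedicated Service (and the
# Istio rule) can select them instead of the baseline pods.
sandbox.spec.selector.match_labels["sandbox"] = "pr-15"
sandbox.spec.template.metadata.labels["sandbox"] = "pr-15"
# Same env vars, secrets, and config maps as the baseline; only the image changes.
sandbox.spec.template.spec.containers[0].image = "registry.example.com/service-a:pr-15"
sandbox.status = None

apps.create_namespaced_deployment(namespace="staging", body=sandbox)
```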

The second part is setting the virtual service rules. This means modifying the Istio virtual service so that it contains the additional context for this new workload and its associated Kubernetes Service. So you actually deploy the new Deployment, or whatever workload kind you use, plus the Kubernetes Service, and then the Istio virtual service just gets a new rule

that says, for this particular header context, send requests to this test version. That's what happens inside the CI pipeline in the create-sandbox step. After that, you're just running integration tests and exploratory tests. For exploratory tests, for example, developers might set a header using a browser extension on the front end and exercise their change directly; an automated API test against the sandbox looks much the same, as in the sketch below.
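A hedged sketch of such a test, with a made-up URL, baggage key, and assertion: the only sandbox-specific part is the header carrying the routing key.

```python
# Sketch of an API test routed to the sandbox: it is an ordinary test against
# the shared staging URL, plus a baggage header carrying the routing key
# (here injected by CI). URL, key name, and response shape are placeholders.
import os

import requests

ROUTING_KEY = os.environ.get("ROUTING_KEY", "pr-15")


def test_get_order_returns_ok():
    resp = requests.get(
        "https://staging.example.com/api/orders/123",
        headers={"baggage": f"sd-routing-key={ROUTING_KEY}"},
        timeout=10,
    )
    assert resp.status_code == 200
```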

Eventually you have a step in CI that listens for a pull request event, say when the pull request is merged or closed, and deletes that particular sandbox: the virtual service routing rules and the workload, as sketched below. That's usually how it goes.
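For completeness, a matching sketch of that cleanup step (again an assumption about tooling, mirroring the create step above):

```python
# Sketch of the teardown run when the PR is merged or closed: remove the
# sandboxed workload and its Service, and restore baseline-only routing.
# Names mirror the earlier sketches and are equally illustrative.
from kubernetes import client, config

config.load_kube_config()
apps = client.AppsV1Api()
core = client.CoreV1Api()
custom = client.CustomObjectsApi()

apps.delete_namespaced_deployment(name="service-a-pr-15", namespace="staging")
core.delete_namespaced_service(name="service-a-pr-15", namespace="staging")

# Drop the header-match rule by writing back only the baseline route.
custom.patch_namespaced_custom_object(
    group="networking.istio.io",
    version="v1beta1",
    plural="virtualservices",
    namespace="staging",
    name="service-a",
    body={"spec": {"http": [{"route": [{"destination": {"host": "service-a"}}]}]}},
)
```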

People package this up into a big Jenkins-ish step, if they're using Jenkins, and then invoke it from multiple pipelines across different microservice repositories.

Sam

If there are any more questions coming in, I'll just give a moment for them. I think we've covered a lot of ground. So, another question: "Compare and contrast with Telepresence."

Anirudh

Telepresence is definitely something that could help with one particular portion, the local sandboxed workloads living on developer workstations: it could offer a way to have a tunnel that connects to those. But the rest of it, including OpenTelemetry and all of that, is outside of Telepresence; you have to do that in your application.

We also talked about message queue flows, and I think that is pretty much out of scope for it. So it can definitely complement this solution by giving you the local portion, if you were to build the other infrastructure around it. I would say it's complementary.

Sam

If there's anything else... Not a question, but: "Great talk, great questions, thank you very much." Nice to hear that. If there are no other questions, we'll give it a moment, and do pay close attention to the links and details on the screen here. You can reach out after.

Anirudh

I'll take any further questions on the Platform Engineering Slack; I'll be there.

We can go into more detail on anything there. And yeah, we're building something like this at Signadot; if you want to check it out, do give it a whirl.

Sam

I think that seems to be everything; it's just appreciation in the chat. If you think of any more questions later, if something springs to mind as you're lying awake at night, you can of course reach Ani on Slack or on his Twitter.

The recording for this will be on YouTube. It was linked in the chat, but you can also go to platformengineering.org and find it there, or find Platform Engineering on YouTube. And if you don't know, we have PlatformCon coming up in just a few months.

Make sure to register for that, because we've got some really fantastic talks coming up and it's looking like it's going to be great, with lots of good content just like this. As you probably know, every week we have webinars like this, so feel free to join; they're often quite good and quite fun.

So make sure to join, and of course join the Platform Engineering community Slack, ask questions there, and have a good time. Thank you so much for joining, and thank you so much for today; it was a great one. Thanks everyone, and have a good day wherever you are.
