Conversations about Software Engineering

Conversations about Software Engineering (CaSE) is an interview podcast for software developers and architects about Software Engineering and related topics. We release a new episode every three weeks.

Transcript

Sven Johann: Hi, and welcome to a new conversation about software engineering. Today I am talking to Sam Newman about "Insecure Transit" - security in the microservices world. Sam is an independent consultant and the author of O'Reilly's "Building Microservices" book. He has worked with a variety of companies in multiple domains around the world, often with one foot in the developer world and another one in the operations space. He spent over a decade at ThoughtWorks, then left to join a startup before setting up his own company. Sam, welcome to the CaSE Podcast.

Sam Newman: Thank you so much for having me.

Sven Johann: The CaSE Podcast is a developer podcast. Developers, let's say, rarely do security... Why is security now important for developers?

Sam Newman: I think if you go back -- I'm a bit older than I look, so I've been doing this for about 20 years now; not microservices, obviously, but being a software developer... If you go back, say, 15 years, developers didn't do testing either, nor did they worry about usability and interaction design. And often, 15-20 years ago we didn't do databases either; you'd hand that off to specialists. You'd have a specialist set of testers, a specialist DBA, a specialist person to do your UI and UX... And over time we've realized that there's definitely a role for specialists, and a lot of those fields that I've mentioned are quite deep and have a lot of extremely important expertise, but there is some general awareness of those areas that's beneficial for a developer to have.

Sam Newman: So developers have taken on more burden of -- we make our own database changes now a lot of the time; we do our own testing, we think a bit more about how we pull in the creative interaction design aspects into the work we do... And part of that has been driven by shipping software more quickly. If you don't have these silos, these hand-offs, you can ship software more quickly. But also part of it is realizing that by building these ideas into how we think about building and writing our software, we end up with better results... And I think security is no different. I think it's a big, deep field, with lots of complexity inside it, but there's also an awful lot of things under the umbrella of application security, which absolutely I would consider as like low-hanging fruit.

Sam Newman: I wouldn't expect every developer to be able to, for example, diagnose a really slow query in a complex query planner on a database, at all... But I would expect every developer to be able to work out how to do an insert statement. I think with security it's the same sort of thing. There's absolutely some easy stuff that I think most developers can be aware of. And ultimately, we have to recognize that a lot of the way we think about application security doesn't work. We often see it as a transactional activity, a one-off; we do all of our development, and then we think at the end we can just put a bit of security in... And we've since realized that that approach didn't work well for interaction design, and it didn't work well for testing either... So I think it's also just part of that spirit of trying to pull more of those activities into our delivery cycle, so that we can ship software more quickly and more efficiently.

Sven Johann: So what is that low-hanging fruit...? Because for me security is very hard; I'm always talking to the experts. The only thing I'm sort of able to do is Spring Security, or something... The rest I hand over to the experts. What is the low-hanging fruit?

Sam Newman: I guess the first thing I'd say is that, in general, I find that developers don't always have a good understanding about where the risks are with any piece of software... So I think we often just assume it's very complicated and we reach for complicated solutions. The reality is there are just some very small, basic things that you need to think about. Passwords - very simple idea. It's how do you store passwords for your own system that you've built, but it's also how do you store passwords for the accounts that you use.

Sam Newman: It's amazing to me - I still go to these conferences and I'll ask this question of the audiences, and I still find more people using a password manager for the passwords of their own personal accounts than are using a password manager for their work accounts. When you look at a lot of the evidence and the research into this, a large amount of data breaches occur because either credentials are weak and easily guessed, or because they're stolen. And if you use a password manager, you can drastically limit the chance of those happening, because you can use unique passwords for each account. That's a very basic example.

Sam Newman: Now, I can show you multiple public-facing websites that still have very bizarre requirements around how your passwords are generated. You still get some that don't allow you, for example, to paste a password, which means password managers don't work. You still go to websites that impose strict limits on password length, which is crazy; there's no reason for that. That's just one very simple example - what are you doing with your own passwords?

Sam Newman: The second thing is credentials. When you generate a credential to allow someone to access a database - are you giving everyone the same set of credentials? That increases the chance that those credentials could be misused, and it also makes things more problematic if you have to revoke them... But that's another very simple example.

Sam Newman: And probably the other one that I've been really banging on about recently is just patching. I think a lot of developers have assumed that patching will be done by somebody else ("Somebody in the operations team will handle patching for me"), and there are two problems with that. The first is that the environment in which we're deploying our software is often a lot more complicated now. It's not like we're deploying our software straight onto bare metal, where the only patching we worry about is operating system patches. We're often deploying into a Docker container which contains an operating system, deployed onto a virtual machine, which in turn is deployed onto a physical machine - so you've got multiple different layers at which to ask "Are all those layers being patched and updated frequently?" And even if the answer is "Yes, they are", one of the biggest sources of risk is still the vulnerabilities in our dependencies - the third-party libraries that we pull in.

Sam Newman: I talked in my recent "Insecure Transit" talk about Equifax in the U.S. - the massive Equifax breach was caused by an unpatched version of Struts being used. That was completely under the control of the developers. All they needed to have done was go in and bump the revision of that library when the patches were released, and that would have stopped the illegal access of roughly 147 million records containing everything from passwords, to dates of birth, to everything else. There are lots of things that are under our control.

Sven Johann: We will dive into passwords, credentials, patching a bit later, so we can dive a little bit deeper... The podcast is about microservices security; what's the difference between microservices security and monolith security?

Sam Newman: I think, somewhat like the low-hanging fruit stuff I just talked about, a lot of it is about the scaling issue. When you move to a microservice architecture, you've got many more processes and many more virtual machines that need to be controlled, configured and managed. That means if you want to be handling things like rolling out or revoking credentials - which might be a manual process with a monolith - it won't scale in a microservice environment. That might cause you to look toward tools that allow you to automate some of those activities. Can you, for example, get a good automated view of the patch level of all the things you're running? Again, with a monolith you might be able to do that manually every couple of months. Maybe you're going to apply those changes yourself.

Sam Newman: Think about something as simple as deploying a monolith as a virtual machine - even maybe a Dockerized monolith; so it's the monolith, that's where all your code is. You make a change to that, you deploy that, you push it out. Every single time you do a build, every single time you do a deploy, that's an opportunity for you to rev the versions of things you're using; that's an opportunity for you to automatically update to the latest signed off Spring Boot libraries, or whatever else it might be, and to apply the operating system patches as part of that deployment process.

Sam Newman: With a microservice architecture, you'll find situations where, if you create a microservice that does one thing and does it well - whatever that means - there's a much higher chance that the last change to that service was made six months ago, nine months ago, twelve months ago, and it's just sitting there, running; no one's looked at it, no one's needed to change it, because there's been no reason to. But that effectively is something which has not been patched.

Sam Newman: So you have this issue of, on the one hand, visibility - does my architecture have vulnerabilities? - and then when you do need to make changes like incrementing revisions, revoking credentials, you have that scale problem... So typically you start looking for technology in that space, that can help you in those areas. That technology might be helpful in the monolithic environment, but if you don't have it in a microservice environment, you really are storing up some really nasty problems.

Sven Johann: Talking about passwords and credentials in a microservices world - you said you have to automate changing passwords and credentials... How do you do that? How can I automate changing passwords, for example?

Sam Newman: When I talk about these things, I tend to separate the passwords of your end users from credentials that we'd call "secrets inside the system." So I tend to bucket these differently. Things like the username and password for the database that my service wants to talk to, things like your Amazon API keys... I think those are sort of internal, almost administrative credentials, and I separate those from your customer-facing passwords. The reason is that the risk profiles are quite different, and the operating processes and procedures around them are quite different.

Sam Newman: As an example, with something like public-facing passwords, it is actually not good practice to require people to change their password frequently, because evidence has shown that all that does is train your end users to be vulnerable to phishing attacks. If you look at the advice from the likes of NIST in the U.S. or the U.K.'s National Cyber Security Centre, they actively advise against automatically requiring people to change passwords on a regular basis. Instead, they recommend that you make use of systems that detect password and username combinations that may have appeared in other breaches, and then use that as a trigger to tell people to change their password. Troy Hunt has a great article all about this, which I'll share with you and you can put it in the show notes. So that's for the public-facing stuff.

Sven Johann: Sorry, how can I actually implement that? Is there a public API for "Have I Been Pwned?" or something?

Sam Newman: "Have I Been Pwned?" has got a really good public-facing API that you can integrate into your systems. It allows you basically to say "Does this username and hash appear in one of the breaches?", and if it does, you can then e-mail those customers to say "Look, we don't think we've been hacked, but we think your username and password appeared in another breach. It looks like you may have used the same password for that site as for this site, and therefore somebody could potentially use those credentials to gain access to your account here. Because of that, we require you to change your password."

Sam Newman: I've had a couple of those e-mails, because years ago I used to have bad practices around these things. You're already starting to see companies carrying out those kinds of activities. Again, Troy is a person who's all over this, so he has some great pieces on how you use that API, how it works, and how you can make use of all that sort of stuff. I'm pretty sure it's free of charge; I think that's largely because Cloudflare are picking up an awful lot of Troy's costs, and because he's done some very smart things about how he's built that system. So if you're interested in that stuff, his writing is the place to go to understand how to implement this programmatically for your public-facing users.
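
To make that concrete: a minimal Python sketch against the real, public Pwned Passwords "range" endpoint. It uses k-anonymity - only the first five characters of the password's SHA-1 hash ever leave your machine. The function name and example password are illustrative:

```python
import hashlib
import urllib.request

def pwned_count(password: str) -> int:
    """Return how many times a password appears in known breaches.
    Only the first five hex chars of the SHA-1 hash are sent over
    the wire; the full hash never leaves this machine."""
    sha1 = hashlib.sha1(password.encode("utf-8")).hexdigest().upper()
    prefix, suffix = sha1[:5], sha1[5:]
    url = f"https://api.pwnedpasswords.com/range/{prefix}"
    with urllib.request.urlopen(url) as resp:
        for line in resp.read().decode().splitlines():
            candidate, count = line.split(":")
            if candidate == suffix:
                return int(count)
    return 0

if pwned_count("P@ssw0rd") > 0:
    print("Seen in a breach - ask the user to pick a different password.")
```

Running a check like this at registration or password-change time is one way to implement the NIST-style guidance Sam describes.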

Sam Newman: Credentials are kind of a different thing, because the risk profile is a bit different. If you're an end user of my application and someone gets hold of just your username and password, it's only your data that is potentially impacted. Obviously, if an administrator password gets stolen, then you have significant issues. For those credentials you do things a bit differently; you're very careful about limiting scope, and often you make things quite time-limited... So the tooling around that can be somewhat more complicated, but again, it is something that can be automated, if needed.

Sven Johann: One question is how do you automate it...? And the other thing is, if you automate it, then the credentials have to live somewhere; they cannot just be in a file in a repository, for example... So how do I do that?

Sam Newman: The first thing to say is that they CAN be in a file, in a repository. You can absolutely do that, if it's in a file in a repository which has extremely tight control, as in only a small number of people can access it. That might not be the worst thing in the world to do... And certainly, I've worked in environments, especially where you're implementing separation of controls, where only the small subset of people that actually carry out production deployments have access to, for example, a source code repository which contains the production passwords... But only they have access to it, and the deployment process is something they're in control of; so it's not the stupidest idea in the world - it's more about limiting who can see those things.

Sam Newman: Really what it comes down to is, as you point out -- let's take the example of a username and password for a database. I've got a service instance, and I need a username and password for that. I need to get that from somewhere, but where can I get it from? Well, because we want to be able to change it independently of the lifecycle of the application itself, we're not gonna hardcode it, so it's got to be in configuration somewhere... So how can I read that configuration? You've got three options, really. You can have the application itself reach out to some central service and say "Please give me my credentials". Effectively, that's how the Secrets system works in Kubernetes, which is what OpenShift and the like sort of piggyback on. So you could reach out to some central system like a secret store.

Sam Newman: The other two options really are reading from a configuration file, or reading from environment variables. Environment variables are not something I ever tend to use, because I'm never quite sure how safe that is, and I get conflicting advice. But every application is able to read things from text files, and reading that information from a text file is again not a terrible idea, as long as you limit who has access to that text file. So if I'm reading it from a configuration file that lives in the same container instance as my running process, then for someone to read those credentials they've got to have gained access to that running container. And if they've gained access to that running container, I potentially already have other problems to deal with.
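
As a rough sketch of that "text file with tightly limited access" approach in Python - the path and JSON format here are hypothetical, and the permission check enforces the "limit who can read it" point:

```python
import json
import os
import stat

CREDS_PATH = "/etc/myapp/db-credentials.json"  # hypothetical location

def load_db_credentials(path: str = CREDS_PATH) -> dict:
    # Refuse to start if the file is readable by group or others;
    # the whole approach relies on restricting who can read it.
    mode = os.stat(path).st_mode
    if mode & (stat.S_IRGRP | stat.S_IROTH):
        raise PermissionError(f"{path} must only be readable by the service user")
    with open(path) as f:
        return json.load(f)  # e.g. {"username": "...", "password": "..."}
```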

Sam Newman: The issue you're touching on is, if it's in a text file and I need to change it, how on earth do I change it? And that's the problem. If we put credentials off to one side and just talk about normal configuration for a moment - the log level of your application, the state of a feature toggle - this is where a lot of people will use systems like etcd or Consul, or historically may have used systems like ZooKeeper. These are centralized, cluster-based pieces of software; your service reaches out to them (ZooKeeper, Consul, etcd) and says "Can you give me this value? What is the log level I'm supposed to be using?" I don't know all those systems well, but certainly with Consul you can also say "If this value changes, let me know, so I can change my log level." My application can then dynamically say "Oh, I'm supposed to be logging at a more detailed level. That's what I'm gonna do." Now, for me to do that, I've got to change the application; and changing the application is a big pain.

Sam Newman: Secondly, of course, I've got to have this centralized software now running, which is another additional burden, although - again, it's not massively difficult, to be honest with you. That's one side of it.

Sam Newman: The second problem here, of course, is that this information is often sensitive, so how do I restrict access to it? Now, Consul has a really kind of cool program - a sister program, almost - called Consul Template. What Consul Template does is just update text files based on values that are stored inside Consul. The nice thing about that is your application doesn't need to know that Consul exists. It just needs to have that text file, and reading a text file is easy. So when Consul Template runs, it will dynamically update those text files with new values as they change in Consul. That's one example of how you can push out new configuration. That's sort of your general configuration problem.
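
For a feel of the mechanism underneath, here's a hedged Python sketch of watching a value via Consul's real KV HTTP API, using a "blocking query" (the key name is made up; in practice Consul Template does this file-updating dance for you):

```python
import base64
import json
import urllib.request

CONSUL = "http://localhost:8500"
KEY = "myapp/log_level"  # hypothetical key

def watch_log_level():
    index = 0
    while True:
        # Blocking query: Consul holds the request open until the value
        # changes (or the wait time elapses), then returns the new state.
        url = f"{CONSUL}/v1/kv/{KEY}?index={index}&wait=1m"
        with urllib.request.urlopen(url) as resp:
            index = int(resp.headers["X-Consul-Index"])
            entry = json.loads(resp.read())[0]
            value = base64.b64decode(entry["Value"]).decode()
            print(f"log level is now: {value}")  # reconfigure logging here
```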

Sam Newman: Of course, you could equally use something like Puppet, or Chef, or Salt in their client-server modes to push out those configuration changes. I've done that in the past, as well. It's not very microservicey to use those tools anymore, but it is a possibility. Where credentials come into play, things get kind of interesting, because of course you worry about how that information is stored, you worry about who can access it, and you also have to deal with revoking and changing those credentials - and that gets a little bit more tricky.

Sam Newman: The system I know best for this is -- well, obviously you've got the Secrets system inside Kubernetes, which has got better iteration on iteration; its security has improved, but it still has problems - I might be right in saying that once you actually get down to it, the data is still stored in plain text inside etcd. But HashiCorp's Vault is a system that basically sits between you and the storage of those secured credentials. When you start your service up, it connects to the vault - you have to give it a key to start up and authenticate yourself - and the vault is then able to hand over credentials to that service instance. Vault will also quite happily (using that Consul Template program I mentioned) update text files with those secure values. So you've got some really interesting options.

Sam Newman: The reason I like Vault is that it's got some special tricks up its sleeve. The traditional idea is "I'm going to go and change the password" - I change the password or the username, hit Enter, and it gets pushed out to the systems. That requires a human being to do it. It's much nicer if you have that happening automatically. One of the things you can do with Vault is actually generate time-limited username and password credentials - it supports certain database vendors, for example... When you request "Can I have the credentials, please?", it generates you your own credentials, which will only work for a short period of time. That way, if someone gets hold of those credentials, you're much less vulnerable to them getting access to that stuff.
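
A minimal sketch of requesting those dynamic, time-limited credentials from Vault's database secrets engine over its HTTP API - the role name is hypothetical, and a real service would renew or re-request before the lease expires:

```python
import json
import os
import urllib.request

VAULT_ADDR = os.environ.get("VAULT_ADDR", "http://127.0.0.1:8200")
VAULT_TOKEN = os.environ["VAULT_TOKEN"]  # issued to the service at startup

def fetch_db_credentials(role: str = "my-app-role") -> dict:
    """Ask Vault for a fresh username/password pair. Vault creates it
    on the fly and revokes it automatically when the lease expires."""
    req = urllib.request.Request(
        f"{VAULT_ADDR}/v1/database/creds/{role}",
        headers={"X-Vault-Token": VAULT_TOKEN},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.loads(resp.read())
    creds = body["data"]  # {"username": ..., "password": ...}
    creds["ttl_seconds"] = body["lease_duration"]
    return creds
```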

Sam Newman: Actually, many people already do this for Amazon API keys. Amazon rolled out support for this years ago - some of my clients implemented Active Directory federation with AWS (it's a very straightforward process, actually), whereby rather than having a set of Amazon public and private API keys to programmatically control your Amazon account, you would just log in with your Active Directory credentials and be given a very short-lived API key that would live for 45 minutes at most. You would use it to run your automation; if you came back an hour later, those API keys would already have been revoked. That's really keeping the time that those credentials live very short. It means that even if someone gains access to them, they can't do any damage.
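
In Python, with boto3 (the AWS SDK), the same short-lived-credentials idea looks roughly like this - the role ARN is a placeholder, and the 2700 seconds matches the 45-minute lifetime Sam describes:

```python
import boto3

def short_lived_session(role_arn: str) -> boto3.Session:
    """Exchange a long-lived identity for temporary keys via AWS STS."""
    sts = boto3.client("sts")
    resp = sts.assume_role(
        RoleArn=role_arn,
        RoleSessionName="automation-run",
        DurationSeconds=2700,  # 45 minutes - the keys are useless after that
    )
    creds = resp["Credentials"]
    return boto3.Session(
        aws_access_key_id=creds["AccessKeyId"],
        aws_secret_access_key=creds["SecretAccessKey"],
        aws_session_token=creds["SessionToken"],
    )

# session = short_lived_session("arn:aws:iam::123456789012:role/Deployer")  # hypothetical ARN
```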

Sven Johann: And you mentioned that you get an access key for Vault?

Sam Newman: Yes.

Sven Johann: How does that work? I access Vault with credentials as well, and then--

Sam Newman: There are a couple of different ways of doing this, and I think some things have been improved, certainly in the Kubernetes world. One of the challenges, of course, is the thing you're touching on, which is totally valid - I've got a secure box with all my keys in it; well, how do I open the box? With another key. Effectively that's what this is, right? So when you bootstrap a process into Vault -- when I say "Hello, can I have my credentials?", I have to give Vault a piece of information. There's no way around that, really... With Atomist, the way we did it was we actually built a system where, when you start a service up, it would be given a very short-lived credential key to sign in to Vault. We would then do some additional security checks around that service instance, saying "Have you got the right service name? How do you connect within the cluster?" and all these other additional checks.

Sam Newman: There's actually been some really smart stuff done around improving this bootstrapping process - kind of making use of the information that you have from the service discovery side of things to confirm that these services are who they claim to be.

Sam Newman: The other thing you can do is mitigate the impact of those bootstrap keys being stolen - because the key thing is that now I've gotta have a key to bootstrap my service. Again, if those keys themselves are also time-limited, that helps. The thing we would do with Atomist was, as part of the deployment process, when a human being pushed that deployment, it would effectively generate that new key.

Sam Newman: I've found a really interesting project that recreated a lot of that work, so I'll try and dig that out and share some links on how you do that - that bootstrapping process is kind of interesting. Again, the secret here is limiting the scope of a credential. I wouldn't want the key I give my service to let it see every single piece of information inside Vault; it should just be my stuff inside Vault.
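
One example of the "smart stuff" in this area is Vault's Kubernetes auth method, where the platform itself vouches for the service. A hedged sketch - the Vault address and role name are made up; the token path is the standard Kubernetes service account location:

```python
import json
import urllib.request

VAULT_ADDR = "http://vault.internal:8200"  # hypothetical address
SA_TOKEN_PATH = "/var/run/secrets/kubernetes.io/serviceaccount/token"

def vault_login(role: str = "invoicing-service") -> str:
    """Trade the pod's Kubernetes service account token for a Vault token.
    Vault verifies the token with the Kubernetes API before issuing
    anything, so the service proves it is who it claims to be."""
    jwt = open(SA_TOKEN_PATH).read()
    req = urllib.request.Request(
        f"{VAULT_ADDR}/v1/auth/kubernetes/login",
        data=json.dumps({"role": role, "jwt": jwt}).encode(),
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["auth"]["client_token"]
```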

Sven Johann: Okay. Wrapping up the passwords and credentials section with the three R's of password and credential security - can you briefly explain what the three R's are?

Sam Newman: Yeah, so I mentioned the three R's, which was something that -- I think it was Justin from Pivotal who came up with it a while ago, and he wasn't talking just about credentials, but he was talking about how you deal with things called APTs (advanced persistent threats). Advanced persistent threats are malicious parties that gain access to your system and have access to your system for a long period of time, during which they can siphon up more data and more information.

Sam Newman: A good example of an advanced persistent threat would be the attack on the supermarket Target many years ago; Target is a big retailer in the U.S., and malware was installed on the electronic point-of-sale systems in Target. So when you took payment, it was resident and would just siphon up all the credit card information. For weeks they were collecting this stuff and exfiltrating it out of Target's network. That's a great example of an advanced persistent threat.

Sam Newman: Justin, in his piece where he talked about his three R's, was saying that one of the easiest ways to deal with those sorts of attackers is really just to periodically burn stuff down. I wanna make sure I get the three R's right, because I always forget them... So his three R's were Rotate - make sure you're rotating credentials frequently; that means if someone gets hold of a credential that's six months old, they can't use it anymore. Repair - this is about constantly keeping your patch levels up, making sure you're always applying the latest and greatest security patches... And Repave is basically about how, when you do a deployment, you actually scorch the machine. That's what we used to call Phoenix servers.

Sam Newman: Basically, rather than continually applying incremental changes on top of an existing system - because if somebody's put some malicious stuff in there, you may not notice it; if you've got a machine rootkitted, a lot of the time it will patch itself so deeply into the operating system that you won't even know it's there - the Repave idea is that when you're gonna do a new deployment, you just burn the whole machine down and start again from scratch. On Amazon you might spin up a plain AMI and build that machine again. In the Docker world that's easier, because each new deployment of your container instance is, in effect, a new repaved instance.

Sven Johann: Talking about Docker images, or instances, or VMs... Let's talk a little bit about patching VMs and containers and cluster schedulers. When do I actually have to do that?

Sam Newman: The simple answer is "As frequently as possible", but it's a question of how destructive that operation is going to be... For Windows, for example, we talk about "Patch Tuesday" - your operating system patches come out once a month, on a Tuesday... And it would be nice to apply those patches as soon as possible. It is always a little bit of a trade-off around how much disruption that process causes... Because if I'm having to redeploy everything regularly, how do I schedule that? Do I redeploy everything? Does that mean I take my whole system down? Well, no; you go for more of a rolling update process.

Sam Newman: When I used to do this for Windows machines, for example, when we had a regular patch cycle, we would do a rolling restart of all of our machines. It gets a little bit more confusing nowadays, because we're not just thinking about one set of operating system patches - we've often got multiple levels of operating systems to think about... So a lot of the time you've got to look at your operating system vendor and understand how their patch roll-out process works. When do new patches get released? Is it a certain day of the week, or day of the month? What is your risk appetite? On Ubuntu, do I subscribe to the more edgy patch updates, so I get improvements quicker, even though they might be less reliable? Or do I wait for them to be officially signed off? This is actually where a good sysadmin will help you.

Sam Newman: With developer dependencies it's a lot more uncertain, because the issue there is that you often don't know when there's a new version of Spring Boot (or whatever) available. Often developers find out there's a new version when there's a cool new feature they hear about, but they're not checking on a daily or weekly basis to see if some new security patch was released.

Sam Newman: If you think about the transitive dependencies that you might have inside your application - it's not just what's in your Maven file or your Ruby Gemfile, it's the transitive properties of it. You might easily have 50, 100, 300 third-party bits of code that you're linking in, so understanding when those things have changed or when a patch is available - I don't know how any human being can be expected to do that. That's definitely a place where looking for tools to help you is a very smart move.

Sven Johann: What tools can I use to do that? To get back to the necessity - is it the third-party library supply chain? I once heard a talk from -- I forgot the name, but he's one of the leaders of the Rugged Software movement; he basically said that he consults for banks which run every version of Spring ever released, and they have no clue what's out there, or which security holes are still in the system... And that you can easily solve that with a nice supply chain for libraries - but I cannot quite remember how to do this. How do I keep my libraries up to date all the time?

Sam Newman: There have been a bunch of individual efforts in this space to do this for a certain, specific technology stack. A good example - and I'm not a Node developer, so I'm gonna double-check I'm getting this right - of one that tried doing it specifically for Node is npm-check. You can run npm-check on your Node dependencies and it will tell you if there are newer versions available, and you can build that command-line program into your build process.

Sam Newman: These efforts were a little bit sporadic; there were people doing it for particular technology stacks. Nowadays the tool I point people at is a tool called Snyk. Snyk is not free - it's a commercial service. What they do is curate a huge database of third-party libraries - not just open source libraries - and they track where vulnerabilities have been detected and match those to version numbers. What they can then do is scan the dependency files that you have and say "Based on your transitive dependencies, these libraries have vulnerabilities in them, and they can be fixed by updating to these newer versions."

Sam Newman: The nice thing about that process is that not only can you integrate it into your build - so you could actually fail a build if versions need to be incremented - they can even send you a pull request saying "Please update to these new versions." I think the really interesting thing is that they do such a great job of making it usable. The real work, I think, is under the hood, to be honest with you - building that database of vulnerabilities across all the tech stacks they support. They support Java, and .NET, and Node, and Ruby, and lots of different platforms.
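
Failing a build on vulnerable dependencies can be as simple as wrapping the Snyk CLI in your pipeline. A sketch in Python - the severity threshold is a real Snyk flag, but treat the exact policy as something you'd tune:

```python
import subprocess
import sys

# Snyk exits non-zero when it finds vulnerabilities at or above the threshold.
result = subprocess.run(
    ["snyk", "test", "--severity-threshold=high"],
    capture_output=True, text=True,
)
print(result.stdout)
if result.returncode != 0:
    sys.exit("Build failed: vulnerable dependencies found - see report above.")
```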

Sam Newman: That, for me, from a developer point of view, becomes a no-brainer for your project. Yes, it's going to cost you a couple hundred bucks a month, but if you're doing commercial software development and you can't find your way to spending that much money on this kind of service - it seems kind of like "We don't really care about security."

Sam Newman: One thing to say about that, of course, is that's on the developer side. You still need to have visibility into what's actually running... Because I can say I updated my dependency files, but have I actually deployed it? What do I actually have running out there in my infrastructure? Snyk are doing more work in terms of what's actually running in your real environment; they've done some stuff in this space - for example, they can do this for running functions, for example AWS Lambda, and things like that... But there are other tools that you might want to look at to answer "What's actually running out there in my production environment? I think I've updated and applied these patches, on operating systems and elsewhere, but have I actually done that work?" That's the other half of the world: once it's in prod, what's going on in prod? Are there things I should be aware of there?

Sven Johann: Nice. I had heard the name of that tool - I think in my current project I wrote it down without actually knowing what it means... I think it was your talk I attended last year, so I thought, "Okay, I have to look at it."

Sam Newman: I know the guy who's the CEO, and I am friendly with him - but the reason I'm friendly with him is because the tool is really good; they've taken a hard problem and done a great job around the usability of it. It really is... And I'm really interested to see how they take a lot of that work further. As developers themselves are operating more - they're running the systems more - I think they need tools that are familiar and helpful to developers... So hopefully, if they can take that ease of use into the production environment, I think that'll help.

Sam Newman: The other thing about what you're getting into production is that as we run more containerized workloads, that actually helps us a little bit, because of the way the Docker file system works; it opens up some really interesting possibilities for scanning things, looking for common potential vulnerabilities. You've got tools like Aqua Security now, which are a really good way of looking at your container images and the operating systems running in those containers, and doing that in a production setting, which I think is really powerful. It gives you that visibility; it's saying "This is what's actually going on."

Sam Newman: Your developers can't just think "Okay, I've done my bit", and then hope it's going to be handled somewhere else... So I think it's about really having visibility all the way across (as you pointed out) the supply chain, to see where we are with all these things.

Sven Johann: Talking about containers... What I know is you can offer a secure base image, which can be scanned and updated. The other thing, if I understand you correctly, is checking the behavior of containers during runtime.

Sam Newman: Yeah, and you've got two things there as well. Your base image is secure today, and you deploy a container based on that image today, and as far as you're concerned, that's up to date and patched. But if that container has been running for six months, and no one's rebuilt the image and deployed a new version of it, the thing that was considered good is now probably not good. It's had six months during which it could have been patched. So there's that question of "It may have been fine today... Is it going to be fine in the future?"

Sam Newman: You can actually see this if you go to the public Docker Hub -- I don't know if you can still see it, but if you went to the official Ubuntu images for Feisty, or whatever else it might be, you would see loads of known vulnerabilities in those images. And that's not because they're knowingly uploading buggy software; it's just that after that version was released, defects were found in it. So if you're still building from those images, you're not necessarily protected against that stuff.

Sam Newman: Something like Clair, which is an open source tool created by CoreOS - and I think there are also commercial tools; I've mentioned Aqua - can actually look at those containers and say "This layer is inside your container. This layer has these potential vulnerabilities. This may therefore be something that you want to change or address."

Sam Newman: I think the other thing you're talking about is the behavior of those containers... And that's something where, again, stuff like Aqua - a tool already well known in that space - can help you look at the behavior of those containers. It's not something I've used in anger, but I know the team there... And for me it's quite similar to security modules in Linux, where you effectively say "This thing is running. When this thing runs, I would expect it to be doing these things. I would expect it to be opening these ports. I would expect it to be sending this kind of traffic. If it does anything outside of that, assume something has gone crazy and shut it down." That's something security modules allow you to do on a much smaller scope... That's the kind of work I know the Aqua team are looking at - the idea being that you've got something running in your container orchestration platform saying "I expect Sam's invoicing service to do these things. It's operating in a way I'm not sure it should be, so I'm just going to alert someone or close the whole thing off."

Sven Johann: This tool - the Aqua tool, for example - is it something a platform team must install on a platform like Kubernetes? And does each service need to provide a configuration which the tool uses to check whether the expected behavior is met? Is that how it works?

Sam Newman: So, talking about that class of tools - if you want to alert when the behavior of the application is unexpected, you kind of need to know what "expected" is. Now, I do know some of these systems can learn from your behavior, in a way, but ultimately you need to say "This is what I expect you to do, and this is how I expect you to behave."

Sven Johann: Yes, but basically the team developing the service is providing that information... Let's say the tool setup is more something for a platform team.

Sam Newman: Well, for me it's all about how you think about owning your service, and there are lots of different models. If you're the team that is writing the software, configuring your Docker file, writing your Kubernetes pod specs, pushing that deployment onto your Kubernetes platform via OpenShift or whatever else - in that environment, where you as a developer effectively own the whole lifecycle - then I might expect a platform team to say "This is the software that's going to be running that's checking it." The platform team might be responsible for making sure those subsystems are running, in the same way they might be responsible for making sure that, for example, my Consul or Vault cluster is running... But I would still, as a developer, be responsible for coming up with that configuration, if that makes sense, because I build the application and I know how it works.

Sven Johann: It makes total sense, yeah.

Sam Newman: There are obviously other models, as you might have, that are still in use, where you just write the code and hand it off to the operations team, who write all that stuff for you. In that environment, they would own that stuff. Now, I'm gonna be very honest - I've had conversations about how a lot of this stuff is done, and I know the theory, and I have done this with Linux security modules, but I don't currently know the state of the art for containers specifically... This type of software is not brand new, though.

Sam Newman: You can look at tools like Tripwire from years ago, for example, which are the kind of more heavyweight, high-end security tools that would be run by security professionals - they'll come and run this stuff on your network and say "Here's what's running." UpGuard is a really great tool in this space. They can run inside your network; they'll use pen tests, they'll use these sorts of tools.

Sam Newman: There's still value in having specialists with expertise, and maybe toolchains that are much more complicated, that give you that extra level of safety... So I think it's quite appropriate, for example, to say your application delivery team is responsible for making sure the software is patched, rolling those updates out frequently, and understanding if there's anything under their control they can update as part of the authoring process.

Sam Newman: I might still say that every quarter we're going to have an external firm coming to do penetration testing as well, as a safety net. The key thing though is it's a safety net. You're relying on them to catch you if you fall, but you don't want them to be the only way to catch these things.

Sven Johann: We haven't talked about the base image a lot... So what is actually a secure base image?

Sam Newman: Again, it depends on what you mean by secure. By a base image, I mean the operating system image that I'm building my Docker container on top of. Normally, people would try to have their base image be some kind of trusted image. So you start from the point of view of "This is an image from a known party, and I trust that party", and then there's "I trust that this image is the one that they gave me" - and there's a whole bunch of stuff you can look into there around things like Notary, and how a number of different container registries out there offer different levels of trust around making sure you get the right image.

Sam Newman: An image itself can only really be as secure as the software on it - has it got software with known problems on it? And there's still an awful lot to be said for what you do next... I could take a perfectly great, lovely, secure, up-to-date and patched Ubuntu image, and then when I configure it myself, I could open up guest, no-password access to anybody who wants to connect on any port... So a lot of the devil is really in what you then do on top of it. If you think about writing your own Docker file, you're the one opening up the ports, you're the one creating those users. If you go to a trusted operating system vendor for your base image, they're going to do the best they can; after that, it's down to you not to screw it up.

Sam Newman: And to be fair, most software that I build on these days tends to do a good job of being secure by default. By default, for example, when I create an Amazon account, or I spin up an EC2 instance on Amazon, I can't talk to it. I can't even find it. I have to explicitly say "This thing is available." That's a bit of the "secure by default" mindset, whatever you build on top of it. Some software isn't always as good as that. I know there have historically been issues with Mongo, for example - when you launch a Mongo instance, by default it gives you a completely unprotected port that allows you to query the Mongo database directly. It's good practice to close that port off, as you'd expect, but a lot of people don't, because they don't realize it's there... So there are actually people who run Google searches to find unprotected Mongo data.
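
A trivial self-check in that spirit - from a machine outside the host, confirm Mongo's default port isn't reachable. The hostname is a placeholder:

```python
import socket

def port_is_open(host: str, port: int = 27017, timeout: float = 2.0) -> bool:
    """True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

if port_is_open("mongo.example.internal"):  # hypothetical hostname
    print("WARNING: Mongo port reachable - close it off or bind to localhost.")
```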

Sam Newman: So it's always about understanding what you're building on top of, and what it's actually giving you in terms of security. Again, from a Docker container point of view, the ports are only open if you've opened them in your Docker file.

Sven Johann: Okay, thanks. I am now switching to something else - authentication and authorization. Maybe just as a starter, could you briefly explain what that actually means?

Sam Newman: Of course. Authentication - if you think about this from a human being's point of view to start off with: when I, as a human being, log into a website, the authentication process is me saying "I am Sam, and here is a password to prove to you that I really am Sam." The server receives that username, receives that password, hashes that password, compares the hashes and says "Yes, you've given me the correct password. You are Sam, and you have been authenticated as being Sam."

Sam Newman: Authorization is then about what Sam is allowed to do. In a microservices environment we also have to think about authentication when a computer is talking to another computer. Some people embrace this idea of implicit trust, which says that two processes talking to each other within the same network can just talk however they want, because the assumption is that if you can run a process inside my network, you must be trusted. Increasingly, people are moving away from that, and now expect that when a process talks to another process inside your network, there's got to be some kind of authentication.

Sam Newman: So I'm the finance service, I make a call to the warehouse service, and the warehouse service wants to know "Are you really who you say you are?", which is the authentication piece. And then the second piece of that is "What are you actually allowed to do?"

Sam Newman: You've probably heard of mutual TLS, which is where you've got both a client-side and a server-side certificate, over HTTPS for example... In that situation, you effectively get that authentication piece. With mutual TLS, I know the server I'm talking to is who the server claims to be, and the server knows the client is who the client claims to be. That handles your process-to-process authentication piece.

Sven Johann: Yes, maybe we talk about mutual TLS in a second. First things first - or at least from my point of view... So now we know what authentication and authorization is. For me, it's kind of easy when it comes to a monolith - we use something like Spring Security, just to mention one thing... And then we implement it in our monolith, and everything's fine. But in the microservices world, to me that is a bit tricky, because implementing that stuff is hard, and then you have to distribute that code over all kinds of microservices, and if you use Spring Security, it only works with the Spring-based microservice, but not with the Node.js one... So what can I do in a microservices world to implement authentication/authorization?

Sam Newman: If you think about this just from the point of view of -- putting humans off to one side, and just thinking about it in terms of one service talking to another service, you have to have (in the same way as with humans) something like I authenticate myself with the monolith by providing a password. That'll authenticate me with the monolith, and then everything's great. As you said, it's all a single process, and everything's nice and easy.

Sam Newman: Now I've got the situation where the finance service is going to call the warehouse service. I typically need authentication there as well. So you've got a whole load of different protocols out there by which, when I make a call, I can convince you that I am who I claim to be. Think about what happens when you use (for example) AWS or Azure - when you write a piece of code, you provide your API key as an input to that program. That key is actually used to generate a hash, which is sent as a header to the AWS API gateway... And it says "Oh, you provided a valid API key. I know that's a valid API key, and I'm going to let that call happen."

Sam Newman: You can do exactly the same thing for one process talking to another process. In other words, I can use a secret key to generate a hash of the request, I can send that hashed request to a downstream server, and that server can validate that the request really came from me by doing the same hashing on its side. So that's a simple example of how you might go about doing authentication. Now, as I said, there's not just one way of doing this - there are different protocols out there to handle these sorts of things. The process I've just outlined is something called HMAC (hash-based message authentication code).
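
A bare-bones sketch of that HMAC scheme in Python, standard library only. The shared key would come from your secrets infrastructure, and real protocols (AWS's request signing, for instance) also cover headers and timestamps to prevent replay:

```python
import hashlib
import hmac

SHARED_KEY = b"per-service-secret"  # provisioned out of band, e.g. via Vault

def sign_request(method: str, path: str, body: bytes) -> str:
    """Client side: compute a signature over the parts of the request
    we want to protect, to be sent along (e.g. in a header)."""
    message = method.encode() + b"\n" + path.encode() + b"\n" + body
    return hmac.new(SHARED_KEY, message, hashlib.sha256).hexdigest()

def verify_request(method: str, path: str, body: bytes, signature: str) -> bool:
    """Server side: recompute and compare in constant time. A match proves
    the caller holds the shared key and the request wasn't tampered with."""
    expected = sign_request(method, path, body)
    return hmac.compare_digest(expected, signature)
```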

Sam Newman: The other problem you touched on is "Well, I've got lots of processes that have all got to have code to handle that", and that's a legitimate concern. This is one of the reasons why some organizations will standardize on certain technology stacks... Because they know that if you spin up a service in one of their known technology stacks, it will be able to handle that authentication in a very simple way. Famously, for example, Netflix - I don't know if it's still the case, but they used to have this rule that if two processes wanted to communicate over the network, they had to have a JVM on both ends, because then they could use all their shared libraries, which did all this stuff for you.

Sam Newman: Nowadays things like service meshes might actually show us a future where we can effectively share code for these common concerns across a heterogeneous technology stack. That's sort of part of this... A great example being mutual TLS.

Sam Newman: Now, something else you can do to mitigate this: if your authentication scheme is, say, certificate-based, sometimes you can have it handled for you by middleware - think about having a gateway between the client and server in that relationship. So if you're going through an API gateway, they can often handle this - I don't know about Amazon's, but I know Azure's can do this... If you want to use something like mutual TLS, which effectively gives you authentication, that is something you can offload to the gateway. In the same way that we terminate HTTPS at our load balancers, we can do that at the API gateway. The devil is in the detail of how those API gateways are implemented and configured.
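
To make the mutual TLS piece concrete, a minimal Python sketch of the terminating end - whether that's the service itself, a sidecar proxy, or a gateway. The certificate file names are placeholders for certs issued by your own internal CA:

```python
import socket
import ssl

# Require (and verify) a client certificate, so both ends of the
# connection are authenticated - the "mutual" in mutual TLS.
context = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
context.load_cert_chain(certfile="server.crt", keyfile="server.key")
context.load_verify_locations(cafile="internal-ca.crt")  # our own CA
context.verify_mode = ssl.CERT_REQUIRED  # reject clients without a valid cert

with socket.create_server(("0.0.0.0", 8443)) as server:
    with context.wrap_socket(server, server_side=True) as tls_server:
        conn, addr = tls_server.accept()  # handshake fails on a bad client cert
        print("authenticated peer:", conn.getpeercert()["subject"])
```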

Sven Johann: Basically, once I pass the gateway, all services behind the gateway do not actually check if I'm --

Sam Newman: Yes, exactly. If you terminate there, that's exactly what happens. And again, that may well be appropriate. I touched on this at the beginning - as developers, we're often not good at assessing risk; one of the things I talk about is how useful it is to go through a threat-modeling exercise, where you think "What am I actually concerned about?" Because often we just react - "I want this technology" - without understanding what it actually gives us.

Sam Newman: So what are we really worried about? When we're talking about this stuff, what we're protecting ourselves against is somebody gaining access to our network and being able to make direct requests to services on our network, or masquerading as a service on that network. Is that really the risk you have? If it is, then this is the kind of thing you need to worry about. You can also mitigate that risk by having DMZs, effectively. You could segment the network into different pieces, so that within each network segment you might have implicit trust, but you require some additional level of trust between segments. There are lots of different models for how you can approach this. This is getting a little bit more into the space where you want to get some advice from a security professional, because you have so many different options here.

Sven Johann: One option I also got to know was having a generic proxy in front of every service. In that system, services were based on all kinds of technologies, but each service has something like an authentication proxy in front, which is basically checking if you're allowed to access certain resources or not. That turned out to work pretty well, and I think the difference between the proxy and the service mesh is not so huge, if I understood service meshes correctly.

Sam Newman: The one big difference is latency. Typically, if you've got a set of machines that are working as a proxy, the first thing is you're going to have to require that all communication between services goes via that proxy, if you want the proxy to handle this. Now, that proxy is gonna be a set of machines that act as a network hop. So effectively, if I've got service A calling service B, service B calling service C, service C calling service D, and in between each of those calls I have to go via this proxy, then purely in terms of network calls I'm doubling the number of hops. So that's the one concern.

Sam Newman: The reason that service meshes sidestep this is that it's not that dissimilar from a logical point of view - it's more the physical deployment topology that changes. With a service mesh, that proxy instance (if you want to think of it like that) is going to be running on the same physical machine as your microservice instance... So the communication with that proxy is going to happen effectively over a local, single-machine network, and you won't experience the same kind of latency issues that you had before. Conceptually you can see it as the same proxy idea, but it's a much more distributed proxy. There are some other nuances around service meshes as well, but that's sort of one difference.

Sam Newman: Now, if you don't actually have that many situations with long service call chains - if you have a small number of services, or the number of hops you go through to carry out an operation is quite small - you may indeed not see any real impact from having that generic proxy in the mix. But as your architecture becomes more fine-grained, you may have more significant challenges.

Sam Newman: I do also know that at least one of the commercial vendors in this space, when they were trying to sell one of my clients on their gateway - when we got down into their architecture, we realized that their generic proxy was actually hosted on their own infrastructure. That meant every call we made actually went via their API gateway, which was in a different data center, in a different part of the country... So the latency impact there was horrendous. It was so bad that one of my colleagues started referring to their service as "Latency as a Service". It's like, we paid money to inject latency into our system.

Sam Newman: Again, those things may not be an issue for you if you don't have really tight latency requirements, but for some people it does become significant, which is why the service meshes kind of have a different architecture.

Sven Johann: In the project/system I mentioned, we had those requirements, but the proxy was running on the same host... So whenever you deployed a service, you also deployed the proxy with the service. So the hop stayed on the local network.

Sam Newman: Yes. And in fact, before we had these things called service meshes, the people that wrote things like Envoy Proxy - that's exactly the architecture they came up with. All a service mesh really is is a way of controlling some of those local proxies that are sitting on the same machines as your services, right? So it's a very sensible approach.

Sam Newman: Some organizations have actually reclaimed the shared library idea. The proxy idea is nice, because you effectively get code/functionality reuse across lots of instances, and it can be used in a polyglot environment. The downside is that there might be a latency impact. I've certainly spoken to at least one -- I think I can name them, because this is actually a public case study... I was talking to the folks at SoundCloud about this, and they said some of their authentication/authorization code, although it could live in a proxy, actually operates inside the service itself - they're running it inside their Scala services - because that reduces network hops. So they didn't want a generic proxy, and they said "For us, the trade-off of reducing latency was worth having to deal with a shared library baked into our code."

Sam Newman: Again, to your point, local proxies have changed the game significantly on this. I think the challenge is just that although the quality of the Envoy proxy is fantastic, the general maturity of the service meshes is still not. It's been taking a long time for them to mature. We've got a 1.0 of Istio, for example, but they're still iterating on some of the core concepts, and I think you'll see -- if you look at Kubernetes, once Kubernetes got to 1.0, that meant the functionality at 1.0 was good, but they still kept introducing new concepts into Kubernetes that changed how you used it. If you look at how you create a pod spec six months after the 1.0 release, it's quite different... And I think with service meshes we'll see the same iterations, as we find the patterns to get this working correctly.

Sven Johann: I recently attended a training on Istio, and they also said you can start thinking about using it in production from mid-2019, so...

Sam Newman: For the last 2.5 years, whenever someone said to me "What about service meshes?", I've said "They're really interesting. I think logically it's the right answer for so many problems we face... But the implementations are not mature yet. Wait six months." And I keep saying "Wait six months." So yeah... Wait six months.

Sam Newman: Now, what's been good is there are lots of people moving into this space. Consul is moving into it - HashiCorp is moving into this space... You could look at some of the stuff Datawire are doing with Ambassador... What's happening there is that it's looking like a crowded world. Some of the things that look like they could be gateway proxies for Kubernetes can also work in this space... So there are all these different things happening. But I think the deployment models and the fundamental architectural models have stabilized. The sidecar model - deploying the proxies inside your Kubernetes pods - has won out as an architecture, so I think that's been really helpful. And then the question is really gonna be who can get to a good, mature offering... Because the problem is, if you don't like Spring Boot, you can rip Spring Boot out of one service and replace it with a functional equivalent, and the impact is one service. If you make a change in this space, it's a lot more disruptive. I think as a result I've been a bit more cautious with some of my clients.

Sam Newman: To some of my clients that are maybe more risk-averse, I've said "Just wait. Maybe for you, stick to a known number of tech stacks. Pick a good framework that you like - Spring Boot being a good example - and just standardize on that. That might be a way around it." Or if you're more edgy and you're happy to take more risks, by all means take a look at Istio and see what it can do for you.

Sam Newman: One of the fintech firms I know here in the U.K. is in the process of switching from one service mesh to another. I know it's not been a very smooth process. I'm going to catch up with them and see what their feedback is on that process.

Sven Johann: To me it sounds like all these systemic changes across your system are a very risky thing to do, and I'm just wondering -- you cannot introduce a service mesh partially, right?

Sam Newman: I mean, you can, inasmuch as -- firstly, you could have different clusters. Secondly, you could set up your networking around that... So it is possible, but of course, the benefit of a service mesh really does come from having everything on it. You could say "Okay, we'll just put it in here and see what happens", but a lot of the benefits that you get - like managing your JWT tokens, and mutual TLS, and your correlation IDs across your whole infrastructure - that's great. I get some level of support for OpenTracing out of the box with Istio, right? That's great. But if I only implement Istio partially, the distributed tracing isn't going to work.

Sam Newman: So I think to get the maximal benefit, you do need everything on it. I think there is scope for doing it partially, though... That's the kind of thing I would consider doing - thinking about my pod deployment mechanism and my network layout, there are a couple of places where I could say "Okay, we're going to effectively have these services running within an Istio group, and then we'll just see how it works out." But you're right, you're not going to get the full benefit from that.
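As a small illustration of what the mesh would otherwise do for you: without one, every service has to forward tracing context by hand. A minimal sketch in Go, where the header name x-correlation-id and the downstream URL are just illustrative choices:

```go
package main

import (
	"log"
	"net/http"
)

// forwardWithCorrelation makes a downstream call while copying the
// inbound correlation ID forward, so the trace isn't broken. A sidecar
// proxy in a service mesh does this kind of propagation uniformly,
// without every service having to remember to do it.
func forwardWithCorrelation(inbound *http.Request, url string) (*http.Response, error) {
	outbound, err := http.NewRequest(http.MethodGet, url, nil)
	if err != nil {
		return nil, err
	}
	// "x-correlation-id" is an illustrative header name; real tracing
	// systems define their own (e.g. B3 or W3C trace-context headers).
	outbound.Header.Set("x-correlation-id", inbound.Header.Get("x-correlation-id"))
	return http.DefaultClient.Do(outbound)
}

func main() {
	http.HandleFunc("/orders", func(w http.ResponseWriter, r *http.Request) {
		// The downstream service URL is hypothetical.
		resp, err := forwardWithCorrelation(r, "http://user-service/users/sam")
		if err != nil {
			http.Error(w, err.Error(), http.StatusBadGateway)
			return
		}
		defer resp.Body.Close()
		w.WriteHeader(resp.StatusCode)
	})
	log.Fatal(http.ListenAndServe(":8080", nil))
}
```

If only some services do this (or only some run inside the mesh), the trace breaks at the first service that doesn't, which is why partial adoption dilutes the benefit.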

Sam Newman: And lots of people, of course, are not running vanilla Kubernetes; they're running OpenShift, or Rancher, or whatever else... A lot of those people are essentially buying a packaged solution. Eventually, I think those packaged solutions will include a service mesh as well - and which one is it going to be? It's probably Istio, right?

Sam Newman: And of course, where things get interesting is looking at Knative, which I think will take a couple of years to mature as well, but that is in effect being built on top of Istio, so we'd better hope Istio is going to work, right? So there's a whole bunch of things to consider here.

Sam Newman: I think we're used to -- certainly the Java programmers, and a lot of people like myself, would often raise an eyebrow at the pace of change in the JavaScript space, and I think now the JavaScript people could quite rightly take a look at what's happening in the container orchestration and deployment space and raise their own equally arched eyebrow at the amount of churn happening in that area. It's a bit of a wild ride trying to keep up with everything.

Sven Johann: Yeah, that's true. A few years back Mesosphere was still a thing, and...

Sam Newman: Yeah, and you go back, and they weren't -- I mean, Mesosphere weren't wrong, in that what they built was really great tech, but it was really great tech that about ten people needed. Most of us don't run 20,000 nodes, you know?

Sven Johann: Yeah... You mentioned mutual TLS for process-to-process communication... I just want to briefly come back to it. Can you briefly repeat what mTLS actually is and how it works?

Sam Newman: Sure. I think many of the listeners are familiar with TLS, as in it's the S in HTTPS - nowadays it is, anyway; SSL is pretty much dead. So we use TLS as a way of providing a level of safety/certainty around HTTP-based communication. When you go to a website - you go to your bank, and hopefully your bank communicates over HTTPS - they put a certificate out there. What that means is that they make available a certificate that says "This certificate says that this is really www.mybank.com. It's not somebody pretending to be mybank.com", and then your browser is able to validate that certificate and say "Yes, this is a real certificate."

Sam Newman: That's the process behind what we talk about with the HTTPS Everywhere movement that has been happening in Chrome - and now other browsers are starting to say "If your public-facing website doesn't have a certificate, we're going to start saying it's insecure." That's what you're probably aware of from the public internet.

Sam Newman: Now, mutual TLS takes that one step further. With a normal public-facing website, I just go to it and I log in, and that's me; I've logged in. The certificate on the server side gives me, as a consumer, trust that the site I'm talking to is who I think they actually are. When I use mybank.com, it gives me some other benefits in terms of making sure that the communication I send hasn't been intercepted or manipulated, so it stops things like man-in-the-middle attacks.

Sam Newman: However, from the server's point of view, they don't have any real understanding as to who I am. They don't get anything out of that handshake, effectively, that says I am who I say I am, and that's why they do other things like checking my username and password to make sure I am who I say I am, and those sorts of things.

Sam Newman: With mutual TLS, not only does the server have a certificate, but the client also has a certificate. So effectively, at the level of the transport you're able to confirm both client and server authentication. I now know I'm talking to the server, the server knows it's talking to a known client, and that's one level of trust that you can get between client and server. If you're using HTTP-based communication - which you would be if you're using, say, gRPC or HTTP - then you can implement mutual TLS fairly easily. Some of the API gateways and the cloud providers will allow you to do that, as well... But it only handles program-to-program authentication. It doesn't do anything about the human being problem.
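As a rough illustration of how little code that can be these days, here is a sketch of a Go server that demands and verifies a client certificate using only the standard library; the certificate file names are placeholders:

```go
package main

import (
	"crypto/tls"
	"crypto/x509"
	"log"
	"net/http"
	"os"
)

func main() {
	// Load the CA that signed the client certificates (placeholder path).
	caCert, err := os.ReadFile("ca.pem")
	if err != nil {
		log.Fatal(err)
	}
	caPool := x509.NewCertPool()
	caPool.AppendCertsFromPEM(caCert)

	server := &http.Server{
		Addr: ":8443",
		TLSConfig: &tls.Config{
			ClientCAs: caPool,
			// This one setting is what makes the TLS "mutual": the
			// server now requires and verifies a client certificate
			// during the handshake.
			ClientAuth: tls.RequireAndVerifyClientCert,
		},
	}

	http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		// The verified client certificate tells us which program called.
		caller := r.TLS.PeerCertificates[0].Subject.CommonName
		w.Write([]byte("hello, " + caller))
	})

	// server.pem / server-key.pem are placeholder file names.
	log.Fatal(server.ListenAndServeTLS("server.pem", "server-key.pem"))
}
```

Note that the hard part in practice isn't this code - it's issuing, distributing and rotating the certificates, which is exactly the operational burden the platforms are starting to take off your hands.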

Sven Johann: For IoT devices, say, I know that you have to have it. Some mobile phones also need a client certificate. I'm just wondering - when I have two services in the same cluster, and service A is talking to service B, what kinds of things do I have to think about to say "Service A is only allowed to talk to service B if service A has a client certificate"?

Sam Newman: For me, again - it's something I talked about earlier - it's down to the threat model; what is it you're worried about? Because you might be happy enough to say that no malicious party can get access to your network, and that within this cluster all the traffic has to go through a gateway, and security, and everything else. So it really does come down to your threat model... And it's a balancing force, right? Because the more painful and complicated this stuff is for you to implement, the more convinced you've got to be that it's a good idea. If something's really easy, you just turn it on.

Sam Newman: Traditionally, managing certificates was painful for server-side stuff, and for client-side stuff. Trying to roll that infrastructure out with Puppet and Chef, for example, has always been a bit of a pain, so you needed a higher bar before you'd consider it. I think now we're running on platforms that make these things easier, and so I think it's shifting the point at which you think "Yes, we'll do it."

Sam Newman: Honestly, that is a lot of what application security is about - it's a trade-off of "How risky is this thing? How easy is it to stop that risk or reduce that risk?" That's actually most of what threat-modeling is really about: saying "What are my risks? How can I reduce those risks? What's the cost of reducing that risk? Is it something we're going to do?" And I think as that technology gets better, things like mutual TLS become easier to implement.

Sam Newman: If you think about what Let's Encrypt did - when Let's Encrypt first came out and said "Let's make it easier for people to run their websites over HTTPS", everyone got obsessed with the fact that the certificates they were issuing were free. The reality was that that wasn't the important part. The important piece was the fact that they created an automated toolchain for creating and issuing those certificates. That was the killer thing. It reduced the ongoing cost of ownership of that process. I think as you reduce the cost of those things, it becomes easier to say "Yes, we should do it. It makes sense."

Sven Johann: Okay, cool. I have one question left, actually... Earlier you mentioned the confused deputy problem... What is it?

Sam Newman: The confused deputy problem is a sort of generic problem that people talk about in application security circles... Speaking generically, a confused deputy is where you trick an intermediary party into asking for things that it shouldn't be able to ask for. You effectively dupe an intermediary into doing something it shouldn't do. This comes up in the microservices situation when we consider the authorization part of the problem. We talked about authentication, but we didn't talk about authorization, which is "What am I, as a human being, allowed to do?"

Sam Newman: As an example, I go to the website of some online service and say "Can I see my user profile?" That request from my browser goes to a server, and the code running on the server side says "Okay, Sam wants to see his profile. But hang on a minute, I don't own Sam's details. Sam's details are stored somewhere else. I know the service that looks after Sam's details (a microservice), so I'm going to go and fetch those." What may have happened at this point is that I've authenticated myself with that server; the server says "Yes, you are Sam. Sam is asking for his details... Okay, I'm going to go and ask the user service for Sam's details."

Sam Newman: When that call goes to the user service and says "I want Sam's details", you've got some questions to ask. The first thing is "Have we already made sure that I am allowed to ask for my own details? Who makes that decision? Do I make that decision upstream? Do I make that decision in each individual service? Do I put that in the gateway? Or do I have the downstream user service make that decision?"

Sam Newman: Now, if you think about what happens when you break your architecture down into smaller and smaller pieces, it seems odd to me to have an upstream, centralized, proxy-based system that knows every operation I might want to use. It makes more sense to me for the user service to know what Sam is allowed to do, because the user service contains all the functionality about user information - or my order information, or whatever else it might be. The issue is that once the service call ends up at that user service, or that order service, or the inventory service, often you've lost the context of who's making the request.

Sam Newman: So the user service gets a call that says "I want Sam's details." What the user service wants to be able to say is "Okay, I know you, the program that made the call to me - I know you're a trusted program. But who are you asking on behalf of? Because my logic inside the user service says that I will only give details to the same person. If Sam asks for Sam's user details, he can have them, but he can't have Alice's user details." The user service has that logic in it, but to be able to carry out that process, it needs to know who the request is being made on behalf of.

Sam Newman: Effectively, it's not enough for the upstream server to establish trust that "this is a trusted program talking to me"; you also want to be able to pass along the context of the originating call. This is what we use JWT tokens for. You can think of it much like cookie state, effectively: I pass some information downstream that says "This is the person that's asking. These are maybe the roles this person has, or the groups this person is in. I'm now letting you, the downstream server, make a judgment call about whether or not this is allowed."

Sam Newman: That's one model for how you solve the confused deputy problem, and it also pushes the logic around authorization into the microservices themselves. That can avoid the need for centralized authorization models, which require additional round-tripping.
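To make that concrete, here is a sketch of a downstream user service in Go validating a forwarded token with the github.com/golang-jwt/jwt library; the signing key, claim names and routes are all illustrative, and a real system would verify against the issuer's public key rather than a shared secret:

```go
package main

import (
	"fmt"
	"log"
	"net/http"
	"strings"

	"github.com/golang-jwt/jwt/v5"
)

// Illustrative shared secret; real deployments would use the token
// issuer's public key instead.
var signingKey = []byte("not-a-real-secret")

// The user service authorizes the *originating* caller carried in the
// forwarded JWT, not just the upstream program - which is exactly what
// defends against the confused deputy problem.
func userHandler(w http.ResponseWriter, r *http.Request) {
	raw := strings.TrimPrefix(r.Header.Get("Authorization"), "Bearer ")

	token, err := jwt.Parse(raw, func(t *jwt.Token) (interface{}, error) {
		// Reject tokens signed with an unexpected algorithm.
		if _, ok := t.Method.(*jwt.SigningMethodHMAC); !ok {
			return nil, fmt.Errorf("unexpected signing method %v", t.Header["alg"])
		}
		return signingKey, nil
	})
	if err != nil || !token.Valid {
		http.Error(w, "invalid token", http.StatusUnauthorized)
		return
	}

	claims := token.Claims.(jwt.MapClaims)
	requested := strings.TrimPrefix(r.URL.Path, "/users/")

	// Sam may fetch Sam's details, but not Alice's.
	if claims["sub"] != requested {
		http.Error(w, "forbidden", http.StatusForbidden)
		return
	}
	w.Write([]byte("details for " + requested))
}

func main() {
	http.HandleFunc("/users/", userHandler)
	log.Fatal(http.ListenAndServe(":8080", nil))
}
```

The key design point is that the authorization decision lives next to the data it protects, with the originating identity forwarded in the token rather than lost at the first hop.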

Sven Johann: Okay, thanks. Did I forget anything important to ask?

Sam Newman: No, I think that's it... I think we got through quite a lot there. To be honest with you, it's rare that I get into confused deputy with my clients, because often there are much more basic things to deal with. When you're going through those threat-modeling exercises, those sorts of concerns are a long way down the list. They're less likely to happen, you're less worried about them, and there's more work to do to defend against them. It's often all the other things that you want to sort out first. You want to sort out your patching, your passwords, your credentials... Then you worry about your transport security, and then you might worry about your... You need to prioritize here. So do some threat-modeling... There's loads of great stuff out there, like STRIDE and DREAD. Microsoft have got loads of great information out there on this stuff - and I'm saying that as a Linux person. So don't necessarily go for the hard stuff first - it's sometimes the less cool stuff that you want to focus on first.

Sven Johann: Alright. Sam, thank you very much for being on the show. This was an episode from Conversations About Software Engineering.

Sam Newman: Thank you for having me.