Richard Rodger on Message-based, Generic Microservices

Transcript

Stefan Tilkov: Welcome, listeners, to a new episode, a new conversation about software engineering. My guest today is Richard Rodger. Richard, nice to have you on the show.

Richard Rodger: Stefan, lovely to be here. Thanks for having me on.

Stefan Tilkov: I'm happy to. Richard, why don't you start us off by introducing yourself?

Richard Rodger: My name is Richard Rodger. I am the CEO of Voxgig.com, a new startup that is only about ten months old, and we're building a software-as-a-service platform for the events industry, a collaboration platform... And it's all built using microservices.

Stefan Tilkov: Who would have thought that? What a surprise. That, of course, is going to be our topic today...

Richard Rodger: Absolutely.

Stefan Tilkov: You are pretty well known as a microservices person. I believe you run a microservices meetup in Dublin, if I remember correctly...

Richard Rodger: Yes, yes.

Stefan Tilkov: ...so that's going to be our topic. I think we've had a number of episodes about this and touching on this, and one of the interesting things is that the definition of microservices always differs slightly, so I'm going to start by asking you how do you define the term "microservice."

Richard Rodger: Yes, that's a really good question. I remember having to give a talk in San Francisco at one of the Node conferences - I think it would have been 2014 time - and I wanted to shock and awe my audience, so I sort of boldly stood up and said "A microservice is something that is deployable individually and independently, and no more than 100 lines of code, independent of language." It was a great definition; it's nice and exact... Of course, it's complete rubbish. It's not a useful definition at all. I mean, I got the reaction I wanted from the audience; they thought it was crazy, but it generated tons of controversy and discussion. And yes, I only half-believed it at the time, and I think I was searching for a quality in microservices that gives you a lot of their benefits.

Richard Rodger: My slightly more sophisticated definition, which you'll find - I've written a book about microservices - in the book is it's an independently-deployable component in your system, and it's possible for another developer on the same team to rewrite the full functionality of that component in one iteration. So if you have a shopping cart service or a service that manages discussions in your online service, or a service that manages tasks or something like that, within one week (or one iteration, whatever it is), you can rewrite the entire service, it can be rewritten by somebody else on the team, and you can throw away the old version. That really gets you to a lot of the advantages of microservices, in particular the idea that your entire codebase is disposable, and you can throw away parts of it if they no longer fill the need that you have.

Stefan Tilkov: How independent are those microservices of yours?

Richard Rodger: Extremely. I think one of the big advantages comes from taking a messages-first perspective. The name microservices is a reflection of the physical characteristics of what they are as software components, and it's a little misleading. From my perspective, you have all these components of your system, however they're deployed, interacting via messages. And again, it doesn't really matter if those messages are synchronous or asynchronous; what matters is that they are encapsulated pieces of data that are transported from one service to another.

Richard Rodger: If you take that perspective, then you have a very decoupled system, because you can't interfere with the data structures of other microservices, for example. All you can do is send messages to them. So I would be fiercely defensive of the independence of individual microservices.

Stefan Tilkov: Let's say one of those microservices handles a message, if that's the terminology you use... Will it send messages to other microservices to fulfill whatever it's supposed to do as a reaction to that message?

Richard Rodger: Yes, yes. I'm going to jump right into some of my (I guess) philosophical thinking.

Stefan Tilkov: Please do.

Richard Rodger: If you think about software component models -- so this discussion is now going meta, so we're going up a level... Microservices are just another example of a component model. The same way that EJBs were, or DLLs, or Erlang processes, whatever - this idea has been going around for a long time; people have tried many different approaches to building a component model, because we want to be able to build, engineer software systems the same way that we build LEGO toys. We want to be able to plug together reliable pieces that we've already tested, we want to be able to reuse them, to execute your software engineering faster and then deliver business value quicker.

Richard Rodger: So if you take the perspective that you have interactive components, and one component affects another component, whether that's via message passing, or remote procedure calls, or whatever, you have to deal with the problem of identity. If you think about it, there's no fundamental difference between an object calling a method on another object and one REST microservice making an HTTP call to another REST microservice. In both cases, you have this concept of identity. Service A or object A needs to know about and have a reference to object B, or service B, whether it's a pointer in memory, or a reference in a virtual machine, or a network address, or a topic on a message bus. A always has some notion of the identity of B, and I think that is really dangerous. I think that breaks a lot of the loose coupling. And when I build these systems, I like to remove that concept of identity.

Richard Rodger: This is why I take a message-oriented approach, because instead of thinking about the system and characterizing it in terms of the microservices, A and B, I characterize it more in terms of the message that goes between A and B. In practical terms, this means that from A's perspective, A is sending out messages, and it doesn't know who's going to get them; it doesn't know if they're going to go to one other microservice, or many. Conversely, B is receiving messages and it doesn't know who they came from, and if it responds, it doesn't know who the response is going to. That way, you get fairly extreme decoupling between the two services. That approach gives you a considerable degree of flexibility, because you can scale by just adding more instances of B, you can add additional functionality by adding microservices C and D that do other things, but respond to the same message, all without making any changes to A.
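To make the idea above concrete, here is a minimal Python sketch of sending messages without knowing the receiver. It is an illustration only, not the system discussed in the conversation: the `Bus` class, the `area`/`cmd` attribute names and the in-process delivery are all assumptions made for the example.

```python
from typing import Any, Callable, Dict, List, Tuple

Message = Dict[str, Any]
Handler = Callable[[Message], Any]


class Bus:
    """Hypothetical in-process bus: senders never name a receiver."""

    def __init__(self) -> None:
        self._subscriptions: List[Tuple[Message, Handler]] = []

    def listen(self, pattern: Message, handler: Handler) -> None:
        # A receiver declares which messages it cares about, not who sends them.
        self._subscriptions.append((pattern, handler))

    def send(self, message: Message) -> List[Any]:
        # Deliver to every handler whose pattern attributes all appear in the message.
        return [
            handler(message)
            for pattern, handler in self._subscriptions
            if all(message.get(key) == value for key, value in pattern.items())
        ]


bus = Bus()
comments: List[str] = []
audit_log: List[str] = []

# "Service B": stores comments.
bus.listen({"area": "discussion", "cmd": "add"}, lambda m: comments.append(m["text"]))


# "Service A": sends a message and has no reference to B.
def service_a_post(text: str) -> None:
    bus.send({"area": "discussion", "cmd": "add", "text": text})


service_a_post("first")

# Later, "service C" starts reacting to the same message; service A is unchanged.
bus.listen({"area": "discussion", "cmd": "add"}, lambda m: audit_log.append(m["text"]))
service_a_post("second")

print(comments)   # ['first', 'second']
print(audit_log)  # ['second']
```

The point of the sketch is that `service_a_post` is written once and never edited, even though the set of receivers changes underneath it.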

Richard Rodger: It doesn't free you up from the technical challenges of service discovery, that's not what I'm talking about here. I'm talking more about the developer experience of dividing the business logic, and writing it from a perspective of sending messages, and either expecting or not expecting a response in asynchronous or synchronous cases... But nonetheless, your entire world as a developer is just about messages in and messages out. A practical example in the current system that we're building - we have lots and lots of different people that collaborate: speakers, event organizers, attendees, exhibitors, and all the different types of people who might attend an event... And they have lots of discussions. We have a discussion service, and it accepts messages that add comments to a particular discussion, and it will provide you with the contents of a given discussion. Then from any other service, whether it's a service dealing with checklists that a speaker has to go through, to a proposal, to an integration with Slack, let's say, or whatever - whoever is writing those services, they simply have this message API that they use; they don't know that there's a discussion service that implements it. And later on, if we have to scale, or if we have to add new functionality or it turns out that some types of discussions are privileged and need to be encrypted, it's not a case of modifying that one discussion service; we might add more services that handle different types of discussions.

Richard Rodger: We're focused on the messages that provide discussion functionality, rather than the individual service that implements a particular implementation of discussions, which we may change later.

Stefan Tilkov: Let me see if I can get this straight. It seems that the major difference is that as a default or as the standard case, you typically have several implementations of every interface, as opposed to having a service providing a particular interface as the default case. Is that a fair characterization?

Richard Rodger: Yes, and it's very powerful. I know it sounds crazy, but that actually is very powerful.

Stefan Tilkov: Let's talk a bit about this. It's interesting to me, because there's a huge overlap to stuff that I understand or that I like to advocate for in those scenarios, in those architectures, but it's clearly not the same thing. To our listeners, there was a previous episode with Michele Bustamante that provided a similar experience for me - lots of overlapping vocabulary, but different details, which is why I find these kinds of conversations hugely fascinating. Let me try to ask a few questions to elaborate on the differences between the approaches.

Stefan Tilkov: One of the things that you mentioned was that you don't see an object identity as a concept that should be in that philosophy or in that architecture, and I can relate to that... That seems to be similar to discussions that people used to have when they talked about RPC-style services, where initial naive implementations essentially just mapped objects from the OO paradigm to the network, and later found out that that's not a good idea...

Richard Rodger: Yes.

Stefan Tilkov: Because once you have an object identity, you start having a conversation with that object, and you have a lot of state that you have to keep non-explicit or implicit in that conversation; so you're still talking to the same exact object, and if you do that to too many of those objects, you end up having to maintain all of the state of all of the conversations, which is never going to scale.

Richard Rodger: Yes, yes.

Stefan Tilkov: Many of those things ended up switching the architecture style, including some of the things that I was involved in, so I committed that sin as well, and learned from it. So they switched to a model where you provide the object identity as part of the actual invocation, as opposed to making it a part of the connection... So that's a stateless approach. That's something I understood. But you seemed to go a step further by saying you don't even want to know which service it is you're talking to; you only ever deal with the messages. You send the message, and then whoever is supposed to handle it, handles it... Which sort of reminds me of the typical arguments that people use when they advocate for the use of messaging systems in general, right? So how is what you propose different from having a pub/sub-style messaging system? Isn't that exactly what you proposed?

Richard Rodger: It's certainly similar. But even with the pub/sub-style system, you still have to identify the topic channel. Now, I know there are examples of systems where they're more like message boards or notice boards, where services can kind of pick the messages they're interested in... There used to be tuple spaces, I don't know if you remember those...

Stefan Tilkov: I do. I'm an old person, so I do...

Richard Rodger: They were sort of popular 10-15 years ago... This isn't the same thing, but there's certainly similarities. Maybe an example is a good way to go here. I'll give you an example which is something that is happening right now for us. We built an MVP, and now we're doing private trials with some of our customers, so naturally, we have a system where there's user accounts and you can log in... So we wrote a user service. The user service knows how to handle messages to log in a user, to register a user, to handle password resets, to do private invitations, all that sort of stuff, all that business logic. And it's actually now at this point broken the rule of "rewrite it in one week or one iteration", because it's got too complicated.

Richard Rodger: So what we're going to do with that service is split it into at least two services - one that handles logins and one that handles registration. We'll probably take a small part of the functionality where it generates a user session description and put that in a completely different service... And there may be one or two other things that we leave in that service. But think about the migration from what has become a mini-monolith, essentially. There are these messages to do with user login, registration, etc. At the moment, they're all going to one service. Because of this zero identity approach, we can introduce a registration service and a login service, and make a change to the third service, in parallel to having this old user service running in the system... And for a period of time, there's a little bit of transition where certain percentages of the traffic go to different places. It's the responsibility of the new services to handle the backwards compatibility, but usually that's fairly straightforward, with feature flags and things like that.
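The transition period described above, where certain percentages of the traffic go to different places, can be sketched roughly as follows. This is a hedged illustration in Python, not the routing layer used in the project: `WeightedRoute`, the service names and the 80/20 split are all hypothetical.

```python
import random
from typing import Any, Callable, Dict, List, Tuple

Message = Dict[str, Any]
Handler = Callable[[Message], str]


def old_user_service(msg: Message) -> str:
    return "old-user-service"


def new_login_service(msg: Message) -> str:
    return "new-login-service"


class WeightedRoute:
    """Several handlers behind one message pattern, chosen by weight (assumed design)."""

    def __init__(self, rng: random.Random) -> None:
        self._rng = rng
        self._targets: List[Tuple[Handler, float]] = []

    def set_targets(self, targets: List[Tuple[Handler, float]]) -> None:
        self._targets = list(targets)

    def send(self, msg: Message) -> str:
        handlers = [handler for handler, _ in self._targets]
        weights = [weight for _, weight in self._targets]
        # Pick one handler at random, in proportion to its weight.
        return self._rng.choices(handlers, weights=weights, k=1)[0](msg)


route = WeightedRoute(random.Random(42))
login = {"area": "user", "cmd": "login", "email": "a@example.com"}

# Stage 1: all login messages still go to the old service.
route.set_targets([(old_user_service, 1.0)])
before = route.send(login)

# Stage 2: a share of the traffic moves to the new service.
route.set_targets([(old_user_service, 0.8), (new_login_service, 0.2)])
sample = [route.send(login) for _ in range(1000)]

# Stage 3: the old service is switched off; the sending code never changed.
route.set_targets([(new_login_service, 1.0)])
after = route.send(login)

print(before, after)
print(sample.count("new-login-service"), "of 1000 messages reached the new service")
```

Only the routing table changes between stages; the code that builds and sends the `login` message stays the same throughout.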

Richard Rodger: Once we're happy that the new services aren't going to break things, we can turn off the old user service... But the key thing here is that everybody that uses the user service - which is mostly services that deliver front-end web pages, or API gateways, that sort of thing - does not need to be redeployed, or changed, or reconfigured. They're not dependent on updating a service discovery mechanism. Literally the same in-memory processes that are running right now will talk to the new services with no changes - no changes to code, no need to redeploy, nothing.

Richard Rodger: From their perspective, the system hasn't changed at all. There's still a sub-group of messages to do with users, which are namespaced; they're namespaced as user messages. As far as they're concerned, there's a single service that still handles all these things. And down the road, we might find that there are specific latency issues around certain messages, we may merge parts of those services together...

Richard Rodger: Here's an interesting case - we want our system to be very secure, so one of the ways you do that is when you're calculating the password hash. You actually need to work the CPU. You should literally make sure that the CPU spends at least a second performing 10,000 hashes repeatedly, because that prevents rainbow attacks, and that sort of stuff. But we use Node.js, so if you're familiar with Node.js, you know that I've just described a system that will block entirely and not do anything else for one second... And at the moment, because we have such low numbers of users, it's not really an issue, but in production, we are actually going to have to take password hashing and literally put that in its own little service, and maybe even implement that in a threaded language instead, so that we don't have that issue.
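As a rough illustration of the password hashing described above, here is a Python sketch using PBKDF2 from the standard library. It is an assumption-laden example, not the code from the project: the choice of PBKDF2 with SHA-256, the per-user salt, the message shape and the thread pool standing in for a separate hashing service are all made up for the sketch. The iteration count is the figure mentioned in the conversation and would need tuning for real hardware.

```python
import hashlib
import hmac
import os
from concurrent.futures import ThreadPoolExecutor

ITERATIONS = 10_000  # figure mentioned in the conversation; tune for your hardware


def hash_password(password: str, salt: bytes, iterations: int = ITERATIONS) -> bytes:
    # PBKDF2 repeats an HMAC many times, so every guess costs real CPU time.
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)


def verify_password(password: str, salt: bytes, expected: bytes,
                    iterations: int = ITERATIONS) -> bool:
    # Constant-time comparison avoids leaking how many bytes matched.
    return hmac.compare_digest(hash_password(password, salt, iterations), expected)


# Stand-in for a separate hashing service: the CPU-heavy work runs in a worker pool.
hashing_workers = ThreadPoolExecutor(max_workers=2)


def handle_hash_message(msg: dict) -> dict:
    salt = os.urandom(16)  # random per-password salt (an assumption of this sketch)
    future = hashing_workers.submit(hash_password, msg["password"], salt)
    # This sketch simply waits for the result; an event-loop based caller
    # would await the reply instead of blocking here.
    return {"salt": salt, "hash": future.result()}


reply = handle_hash_message({"area": "user", "cmd": "hash", "password": "s3cret"})
print(verify_password("s3cret", reply["salt"], reply["hash"]))  # True
print(verify_password("wrong", reply["salt"], reply["hash"]))   # False
hashing_workers.shutdown()
```

Keeping the hashing behind a message (`handle_hash_message`) is what lets the work move to another process or language later without touching the callers.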

Richard Rodger: And again, what we've done is in the current user service, it actually calls a message internally; so the message doesn't go over the network, it's actually just routed back into the same service, instead of a method call, which seems like a crazy amount of overhead, but it means that down the road if we need to pull out that piece of functionality into a separate service, it's possible to do so.

Richard Rodger: I haven't magically foreseen the future; I've obviously done this wrong before... But you can see how if you take a coarse-grained approach to your system and the coarse grains are the messages, you give yourself many, many points to split and merge services, and many extension points. This is where I go back to thinking of microservices as a component model, because all of these messages are effectively the extension points of the components.

Stefan Tilkov: Interesting. Let me paraphrase again. Essentially, if I send out a message, I don't know who I'm sending it to, so I just send that message and I expect somebody to deliver on that - either perform something only, or send a result to me in some way... And I assume that if I handle a message and do some processing, I can, if needed, send a return message, and I don't know who's going to receive that message either. I just publish the result... Which, in turn, puts a lot of burden on the infrastructure connecting those two things. The infrastructure now has to do all of the routing and has to know who has sent something, because the other one is supposed to see the result, and they have to know who's registered to handle which kind of message... Correct?

Richard Rodger: Yes, absolutely. This is not a problem unique to my sort of extreme approach to microservices. I think a lot of people have issues with message routing.

Stefan Tilkov: How do you solve that?

Richard Rodger: I think this problem -- as people build bigger and bigger systems, people have recognized that this is a particular problem, which has started to be solved by things called service meshes. These are transport frameworks effectively, that run beside microservices. If you think of a classical microservice setup, where it's a REST endpoint that you're calling, with the microservice exposed as an HTTP server somewhere... Instead of sending that HTTP request directly to that service, you send it to the service mesh agent, which is running locally, and then it works out a good instance to forward that message on to and get a result. There's a whole bunch of different solutions out there at the moment.

Richard Rodger: There is an alternative approach, which is to embed the service mesh inside your microservices. That means the service mesh is effectively a local library, and then the library itself works out a way to send the message. In all of these cases you have a service discovery problem. The service mesh has to know how to route the message. At a service level, that comes down to an identity mapping question again. Say you want to send it to the user service - I've given my service a name - and there's a look-up table that says "Oh, the user service has ten instances, and they live at these IP addresses", or "It matches this particular Kafka instance on this channel", or whatever it is. So how do you distribute that knowledge?

Richard Rodger: You can have a central registry, something like Consul, for example... Or some of these service meshes, especially the agent-based ones, are quite fancy these days and use peer-to-peer gossip protocols, and that's actually the approach that we take. I suppose we're lucky enough that our microservice approach is heavily based on building Node.js microservices. If you really want to have a strongly polyglot implementation where you're using lots of different languages to a significant degree, this approach probably isn't for you, and you should use an agent-based service mesh. But in our case, we embed the service discovery as a library within each microservice.

Richard Rodger: We used a particular algorithm... The acronym is SWIM. If you go to Google and look for "SWIM algorithm", you'll get the paper that explains it. Basically, it's a gossip protocol that propagates the identity and network location of every service to the entire network, without drowning the network in broadcast messages. Basically, each service infects nearby services randomly with the information, and it's designed so that within about 500 milliseconds, let's say (it's tunable), every microservice ends up knowing about every other live microservice on the system... And our library then maintains that lookup table.
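The "infection" style of spreading information can be illustrated with a small simulation. The Python sketch below is a deliberately simplified push-gossip model, not the SWIM protocol itself (which also covers failure detection); the node count, fanout and round-based timing are assumptions chosen for the example.

```python
import random
from typing import List, Set


def gossip_rounds(node_count: int, fanout: int, rng: random.Random,
                  max_rounds: int = 1000) -> int:
    """Count rounds until every node has heard about a newly started service."""
    nodes: List[int] = list(range(node_count))
    informed: Set[int] = {0}  # node 0 is the new service announcing itself
    rounds = 0
    while len(informed) < node_count and rounds < max_rounds:
        newly_informed: Set[int] = set()
        for node in informed:
            peers = [n for n in nodes if n != node]
            # Each informed node tells a few randomly chosen peers.
            newly_informed.update(rng.sample(peers, fanout))
        informed |= newly_informed
        rounds += 1
    return rounds


rounds = gossip_rounds(node_count=50, fanout=2, rng=random.Random(7))
print(f"all 50 nodes informed after {rounds} rounds")
```

Because the number of informed nodes can grow multiplicatively each round, the news reaches everyone in far fewer rounds than there are nodes, without any node broadcasting to the whole network.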

Stefan Tilkov: And essentially it's the service itself that registers as a handler for a particular message type?

Richard Rodger: Yes, exactly. The service announces, "Hey, I can handle messages of a given type." Now, there is a nuance there, because if you want to be (as I am) strongly opposed to identity, it's not much use to you to say "I'm registering a message of this type", because even the idea of a message type is a weak concept of identity... So rather than taking that approach and saying "Our system has types of messages, with specific schemas", we use a pattern-matching approach. Effectively, the service announces that it's interested in messages that match a particular pattern. And when I say pattern, that's really quite a wide concept. It could be matching certain attributes within the message, let's say if it's a JSON object; it could be matching certain network characteristics, timing... It could be anything.

Richard Rodger: The pattern-matching algorithm itself can be really simple or really complex. The base case is your message has a type indicator, and it's effectively just typing the messages. But in doing that, you're really weakening to the most extreme degree possible the concept of identity, and that preserves the characteristic that when a developer works on business logic, they're more concerned in their own minds about the message API, rather than thinking about which service am I using. That's what gives you lots and lots of flexibility.

Stefan Tilkov: That reminds me a little bit of words I'd rather forget, which is things like an ESB, and content-based routing, and lots of patterns that were there in a time when we thought that web services based on the WSDL/SOAP model were the solution, and essentially by standardizing everything and using the same message format, XML, which is of course today totally uncool, so we use JSON, of course... But back then, when XML was still nothing to be ashamed of, we used to say that we'll just have the infrastructure be smart enough to do routing based on XPath expressions, selecting where and what kind of message goes... Is it any different, or is it the exact same idea, just modernized by using Node.js and JSON?

Richard Rodger: In one sense, it is the exact same idea, but in another sense, those who ignore history are doomed to repeat it... What you want in a component model is just enough complexity to do interesting things, and allow you to substitute components for each other, allow you to compose components together... Composition is the fundamental operation of a component model. The problem with the enterprise service bus is I think developers sort of got drunk with the power of being able to put in really complex routing logic, and that complex routing logic became business logic.

Richard Rodger: So while you can apply any pattern-matching algorithm, in practice what we use is very, very simple patterns. Literally, this message contains this particular attribute, with this given literal value, and you might use one, two or three values at most, and that's pretty much it. And all of the business logic strictly must remain within the services.
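A pattern of the kind described here, "this attribute with this literal value", might be matched as in the following Python sketch. It is only an illustration under stated assumptions: `PatternRouter`, the attribute names and especially the rule that the pattern with the most attributes wins are choices made for the example, not something confirmed in the conversation.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple

Message = Dict[str, Any]
Handler = Callable[[Message], str]


class PatternRouter:
    """Routes a message to one handler using literal attribute patterns."""

    def __init__(self) -> None:
        self._routes: List[Tuple[Message, Handler]] = []

    def add(self, pattern: Message, handler: Handler) -> None:
        self._routes.append((pattern, handler))

    def find(self, message: Message) -> Optional[Handler]:
        matches = [
            (len(pattern), handler)
            for pattern, handler in self._routes
            if all(key in message and message[key] == value
                   for key, value in pattern.items())
        ]
        if not matches:
            return None
        # Assumed tie-break: the pattern with the most attributes wins.
        return max(matches, key=lambda pair: pair[0])[1]


router = PatternRouter()
router.add({"area": "discussion"}, lambda m: "discussion-service")
router.add({"area": "discussion", "privileged": True},
           lambda m: "encrypted-discussion-service")

plain = {"area": "discussion", "cmd": "add", "text": "hi"}
secret = {"area": "discussion", "cmd": "add", "text": "hi", "privileged": True}

print(router.find(plain)(plain))         # discussion-service
print(router.find(secret)(secret))       # encrypted-discussion-service
print(router.find({"area": "billing"}))  # None
```

The router only compares attribute values; any decision that depends on business rules stays inside the handlers, which is the restriction argued for above.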

Stefan Tilkov: That's very good... That would have been my next question, because the essential problem to me was always that business logic started to move into the ESB, and that was completely unmaintainable and a complete mess. So you see the same problem, and your solution is to restrict it, and to have only this very basic support in the infrastructure, as you said. That sounds very good.

Richard Rodger: Just enough to remove identity.

Stefan Tilkov: Okay. Do you also remove -- well, I'll just assume that you also remove the need for the sender and the receiver to use the exact same message format, right? It only has to be similar enough so that the receiver can extract whatever it needs from the message, correct?

Richard Rodger: Yes. This is another aspect of it... And again, it doesn't sound right, because from a traditional computer science perspective, you would think the next step is "Okay, let's define some schemas, and do schema validation, and all this type of stuff, and make sure that we have strong types in our system. That's how we get robustness." But it isn't how you get resilience, and it doesn't give you another benefit of microservices... So I'm going to step up again - one of the questions you can ask is "Why use microservices at all? What's the point?" The particular characteristics that they give you are more suited to the early stages of the project, perhaps the first 6-12 months, because they give you this ability to make really radical changes to your system without completely destabilizing it. They give you the ability to throw away mistakes... And that is not something that comes without trade-offs, because you have a more complex deployment environment, and you do have a distributed system, all that sort of stuff.

Richard Rodger: The ability to do that means that as you develop things like data models, and you realize that they are incorrect, you can change the data models without generating tons and tons of technical debt. If you think about it, if you have a particular business domain that you have to model, in a classic monolith you try and come up with a data schema that is relatively extensible, because you know that you're going to miss out some fields and relationships. The problem is that in practice what happens is that you end up with a hundred tables, and each of them have fifty fields, and some fields mean different things in different contexts, and you have all sorts of foreign keys, and the whole thing becomes a horrible mess, and you accrete tons and tons of technical debt that you can't remove.

Richard Rodger: By keeping a lot of the things like data models encapsulated within microservices, it means you can make changes to those models without affecting the rest of the system, which means that you can actually accumulate technical debt within a microservice pretty quickly. Fred George, who was one of the people who originally started speaking about microservices a couple years ago - he used to talk about the fact that why do you need unit tests anymore; if the code is so small, you can just eyeball it to see if it's correct. Now, again, he was pulling the same trick I was, I think, with my 100-line microservices... It gets a good reaction from a conference audience.

Richard Rodger: But there is a kernel of truth there, because there's always going to be technical debt, but if you can contain mistakes nicely inside the physical boundaries of a microservice, you sort of prevent them from infecting the rest of the system. But in order to make all of that work, you can't use strict schemas, because you need to be able to change your interpretation of the message down the road. You need to be able to add fields that subsequently get ignored; you need to be able to run two microservices at the same time, version one and version two, where version one is missing functionality, and yet still needs to work, so version two might have to assume certain default values for fields... This type of thing.
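The tolerance described above, ignoring extra fields and assuming defaults for missing ones, could look something like this Python sketch. The message fields (`discussion`, `text`, `visibility`, `pinned`) and the two handler versions are hypothetical and exist only to show the technique.

```python
from typing import Any, Dict

Message = Dict[str, Any]

# Defaults that version two assumes when an older sender omits a field.
V2_DEFAULTS: Message = {"visibility": "public", "pinned": False}
V2_FIELDS = ("discussion", "text", "visibility", "pinned")


def handle_add_comment_v1(msg: Message) -> Message:
    # Version one only knows two fields and silently ignores anything else.
    return {"discussion": msg["discussion"], "text": msg["text"]}


def handle_add_comment_v2(msg: Message) -> Message:
    # Version two keeps the fields it understands and fills gaps with defaults.
    known = {key: msg[key] for key in V2_FIELDS if key in msg}
    return {**V2_DEFAULTS, **known}


old_style = {"discussion": "d1", "text": "hello"}
new_style = {"discussion": "d1", "text": "hello", "visibility": "private",
             "pinned": True, "mood": "happy"}

# Both versions can run side by side and accept either message shape.
print(handle_add_comment_v1(new_style))
print(handle_add_comment_v2(old_style))
print(handle_add_comment_v2(new_style))
```

Neither handler rejects a message for having too many or too few fields, which is what allows version one and version two to be deployed at the same time.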

Richard Rodger: If you use strict schemas, you don't get that flexibility and you don't get the advantages, and you don't get what you want in the early stages of a project, which is this extreme ability to change your architecture as you discover new requirements, or new requirements get thrown at you. Later on, you can introduce strict schemas. I actually have no philosophical opposition to strict schemas as such; like most developers, I'm very tempted by the delights of Haskell, and things like that... But unfortunately, we live in the real world, where if you want to build a software company, it's gotta be Java, JavaScript, or C#, or something like that if you want to actually find people to work for you.

Richard Rodger: The later stages of a project, where you've identified latency bottlenecks, where you've stabilized your data structures, actually allow you to merge microservices. So my development model, especially when it comes to business systems, is it almost starts off with nanoservices and then they start coalescing over time. And then it almost ends up, two or three years down the road, looking like what's called macroservices, where you can't rewrite the things in one week anymore. That's sort of how it ends up looking.

Stefan Tilkov: It seems you strongly disagree with the idea that you should start out with a monolith.

Richard Rodger: Yes, absolutely. Because no matter how strictly you define component boundaries on the monolith, they're always going to break, and your developers will always be tempted to follow what their professors taught them in school, which is if you see the same code three times, generalize. So I would say you've gotta take the opposite approach.

Richard Rodger: I'll go back to my user login, my user service as an example... It's often the case with business systems, which is mostly what we build and what microservices are mostly appropriate for, that the business analysts define requirements in a general sense, and then only later realize that there are edge cases and special cases and things like that.

Richard Rodger: Imagine you're building an enterprise system, and the users -- initially, a user logs in and they see one type of dashboard, but then it turns out that managers should see a different dashboard, and then these admins should see something else, and then "Oh, wait a second... It's not just employees, we also have contractors, and they're not allowed to see XYZ..." Suddenly, you have a whole bunch of extra complication, even though in a 1,000-person company, 900 people are just front-line employees and they all see the same thing.

Richard Rodger: So how do microservices make that easier to handle? Well, if you start with a monolith and then these complexities start coming in, you start trying to generalize your data model, and start adding features to your entity relationships, and your logic, that become very difficult to disentangle. If you start with a simple microservice that can only handle the employees and can't deal with any of the other special cases, you can preserve that code throughout the lifecycle of your system... Because instead of adding complexity to the data model and handling special cases with extra if statements, or cases, or whatever in one codebase, you don't expand the user service when it turns out that the managers and contractors have different business logic; you write a new microservice for managers, and a new one for contractors. You cut and paste the old user microservice and then you make your changes.

Richard Rodger: I know this goes against literally decades of computer science best practice, but the perspective I take on that is it's a Pareto's Law perspective - you need to apply your effort where it's going to solve 80% of your problem. 900 users are always going to have to see the same thing. If you add complexity into the 90% case in my example, you're just making life difficult for yourself. Put the complexity in its own place. Isolate it, put it away from the main case.

Richard Rodger: What this means in practice is that as you build your system over the first six months, you're going to have a failure to meet edge case requirements for quite a long time. The manager will log in and see an employee screen, but you tolerate that, because you don't want to make the user's logic more difficult. A more intense example perhaps is let's say you're building an e-commerce site and you have special pricing for a certain category, or you want to have special pricing for a certain category of users... Well, you might have to tolerate some months of not giving them special pricing.

Richard Rodger: Another principle that arises is to accept that there's always going to be an acceptable error rate; there's always going to be some level of errors in your system, and if you accept that as a basic business principle, that there's a level of perfection that just doesn't make business sense, it frees up your software architecture to be more flexible.

Stefan Tilkov: I think some of what you said isn't as controversial as it used to be. The redundancy seems to be a more and more accepted trade-off these days, with the growing popularity of DDD and bounded contexts and all these ideas of isolated models that have some overlap... But as you advocate for very small services, which can maybe handle just one type of message or one selection filter for messages - maybe you want to describe that - what kinds of restrictions do you impose because of two services sharing the same data? I'm not really talking about the user service and the employee service. I can see how those would be related but cover different aspects. But what about the message used to create an employee versus the message used to query for an employee? Those two would have to be handled by the same service in your model too, right?

Richard Rodger: Yes, yes. This model, although you could do CQRS and things like that with it, you tend not to do that. Things tend to be encapsulated by the business domain aspect.

Stefan Tilkov: Actually, you've sort of given me the perfect counter-argument to my own question, or the perfect answer... You could essentially do it, but then you'd just end up with CQRS; maybe that's what you want, but if you don't want that, you'll have to keep them together. But the fact that you can separate them, that your model allows for a separation of those two, sort of makes CQRS a built-in, standard model.

Richard Rodger: Yes, exactly.

Stefan Tilkov: Nice. I should do your marketing.

Richard Rodger: It's a really nice feature. Speaking of some of the benefits that this model gives you on the developer experience side... For example, we just had a graphic designer join our team. Now, they're quite technical, so they're actually able to open up a terminal and run a Node process, and make changes in CSS, and whatever... But of course, being developers, we were running Minikube locally, and all that sort of stuff, and let me tell you, even with a fancy Mac, trying to run Minikube at the same time as Photoshop, and all that sort of stuff, it doesn't really work... And then if you try to do a Google Hangouts and your gossip protocol uses a ton of UDP and you're running 50 local services - that doesn't work either.

Richard Rodger: Here's the thing... Because we've completely abstracted away the transport layer, how messages get from A to B, it's trivial to package up each of the microservices into a single Node process, because you change your transport mechanism by configuration from being one that goes through the network to one that's simply an asynchronous method call. So we were able to provide the graphic designer with a single monolithic process that they can run.
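
A minimal sketch of what "changing the transport by configuration" could look like, under stated assumptions: `createBus`, the `'local'` and `'http'` transport names, the topic string and the handler are all hypothetical and only illustrate the idea. The HTTP variant relies on the global `fetch` that ships with Node 18 and later.

```javascript
// Hypothetical sketch: createBus, the transport names and the topic below are
// illustrative assumptions, not the API of any real library.

// In-process transport: delivering a message is just an async function call.
function localTransport(handlers) {
  return async (topic, msg) => {
    const handler = handlers[topic];
    if (!handler) throw new Error('no handler for topic: ' + topic);
    return handler(msg);
  };
}

// Network transport: the same message is POSTed as JSON to another process.
function httpTransport(baseUrl) {
  return async (topic, msg) => {
    const res = await fetch(baseUrl + '/' + topic, {
      method: 'POST',
      headers: { 'content-type': 'application/json' },
      body: JSON.stringify(msg),
    });
    return res.json();
  };
}

// Business code only ever calls bus.send; configuration picks the transport.
function createBus(config, handlers) {
  const send = config.transport === 'local'
    ? localTransport(handlers)
    : httpTransport(config.baseUrl);
  return { send };
}

const handlers = {
  'user.home': async (msg) => ({ screen: 'employee', user: msg.user }),
};

// Everything in one process: no network, no cluster needed on the laptop.
const bus = createBus({ transport: 'local' }, handlers);
bus.send('user.home', { user: 'alice' }).then((reply) => console.log(reply));
```

The calling code is identical in both modes, so the same services can be packaged as one local process or spread across the network without touching business logic.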

Stefan Tilkov: What you mean is you deploy all the services in one process... Is that what you explained?

Richard Rodger: Effectively, yes. That's really powerful. It's one of the reasons I would advocate for the library approach to service meshes, rather than having a separate agent. If we were using separate agents, we would never have been able to do that. In my previous company -- my previous company was a pretty large IT consultancy, and because we'd become known for Node and microservices, we worked with a number of clients that had built really large Node microservice-based systems... And one particular client had to pay for every single developer on a 50-developer team to have their own large AWS instance to run the rest of the system that they developed against, because they needed to instantiate all the other microservice processes. And when you end up in that situation, you kind of know you're in trouble.

Richard Rodger: This is where some of the valid criticisms of the microservice approach come in, because you do end up with a lot of infrastructural challenges if you're not careful.

Stefan Tilkov: How much complexity is there in your infrastructure? Let me try to rephrase that in a clearer way... What I mean is the interacting services for the actual application that users will be interested in, users will probably use some UI to access something that will end up invoking a number of services, maybe with the same or with different kinds of messages... But the resulting system is the result of all the message implementations plus all of the infrastructure. Is the infrastructure really so simple that you can still understand what's happening, or is there a level of complexity introduced by all of this routing and all of this configuration of message selectors, and stuff like that... Do people actually understand what's happening within the system they've developed?

Richard Rodger: There is absolutely no such thing as a free lunch, Stefan. The short answer is no. If you use this approach and you are going much beyond 20 services, you're going to end up very confused. This is a hard learning experience that I went through with this approach. Yes, you get all these benefits of this component model, which helps you manage technical debt and manage changing requirements, but at a practical level, when you have a deployed live system that certainly won't match a local developer's machine, how do you know which messages are going where, and how do you know the system is actually behaving in an appropriate way? It doesn't even help you to have a staging system. You literally cannot have a staging system that mirrors production. That doesn't work, either.

Richard Rodger: So you've gotta accept, first of all, that this is a trade-off. You're going to have to introduce additional infrastructural complexity to manage this approach. You've gotta be bought into using things like Kubernetes, you've gotta put time and effort into your own monitoring, so you want to be doing things like tracing message flows between services... And you know, although there's some open source tooling around that - Twitter has a nice thing, and there's a few other things - you do end up building a little bit of custom code to help you manage the system.

Richard Rodger: One thing in particular that we do that has been really helpful is we have a small monitoring system which samples 1% of messages (or 5%, whatever you configure it to sample) through each microservice, and then just sends a summary to essentially a central time series database. Then we take that and we use D3 to build a nice little dynamic chart showing each microservice instance level or type level, and the message flows between -- but you end up having to do things like that to understand your system. So if you're thinking about the production system and you want to know which messages are going where, you pull up this representation of the live system that's built from sampling message flows, as opposed to having a definitive UML diagram of what your system actually is.
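
The sampling idea can be sketched roughly as follows. This is an assumption-laden illustration, not the monitoring system described in the conversation: `createFlowSampler`, its `rate` and `random` options and the service names are hypothetical, and shipping the summary to a time series database is left out.

```javascript
// Hypothetical sketch; createFlowSampler and its options are illustrative.
function createFlowSampler({ rate, random = Math.random }) {
  const counts = new Map(); // JSON "[from, to]" pair => sampled message count

  // Call once per message; only a fraction `rate` is actually recorded.
  function observe(from, to) {
    if (random() >= rate) return;
    const key = JSON.stringify([from, to]);
    counts.set(key, (counts.get(key) || 0) + 1);
  }

  // Returns the summary collected so far and resets the counters. A real
  // system would send this to a central time series store on a timer.
  function flush() {
    const summary = [];
    for (const [key, sampled] of counts) {
      const [from, to] = JSON.parse(key);
      // Scale the sampled count back up to estimate the real volume.
      summary.push({ from, to, sampled, estimated: Math.round(sampled / rate) });
    }
    counts.clear();
    return summary;
  }

  return { observe, flush };
}

// Sample roughly 1% of messages flowing from a "web" service to a "user" service.
const sampler = createFlowSampler({ rate: 0.01 });
for (let i = 0; i < 100000; i++) sampler.observe('web', 'user');
console.log(sampler.flush());
```

Each flushed entry is one edge in the picture of the live system: which service talks to which, and approximately how often.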

Stefan Tilkov: One of the things I find interesting is that you actually ended up building your own microservices toolkit or framework to support that, and you open-sourced it. It's called Seneca.js, right? Is that the name?

Richard Rodger: Yes.

Stefan Tilkov: How did that turn out for you? Are you happy with that? Was that a good experience? Would you recommend doing that, or would you recommend people just adopt your toolkit?

Richard Rodger: It's a mixed experience. I would say if you like the philosophical approach that I have, it's perfect... But there's a lot of rough edges. This is not a large, mainstream open source project by any means. I'll go back to the beginning - the first version of this particular system was built in 2010, and that's long before the term "microservices" arose. Back at that stage, and for a number of years afterwards, it was a monolith-oriented approach... But it did have the core ideas of trying to solve this component model approach, trying to remove this idea of identity, and trying to handle technical debt and provide reusable business logic.

Richard Rodger: It's easy to have components, where the component is a utility component for accessing a database, or generating SQL queries, or resizing images, or talking to AWS, or something like that... But it's much harder to write a reusable component that handles user login flows, or shopping cart logic, or sales tax calculations - the types of stuff that you have to build again and again for clients if you run a consulting company, which is where the motivation came from. So it made sense to open-source what we were doing, so that we can reuse it on our own projects... But I would say I was pretty naive about what it means to run an open source project.

Richard Rodger: It's very easy to put up a nice website, and reasonable documentation, and all that sort of stuff at first. But then it turns out that a thousand different people want to use it in 1,000 different ways. There's a lot of expectations. Users of small open source projects like the Seneca.js project have expectations that are created by really big projects that have IBM backing them, and literally hundreds of developers who are willing to go and fix bugs, and update documentation, and that sort of thing.

Richard Rodger: There's this really common thing in open source, especially amongst projects that have just one maintainer, where the maintainer ends up completely burnt out, because they're trying to keep the community happy. So I'd say that side of it has been difficult, because what we've built collectively, the whole community, over the last eight years, is incredibly useful. I'm founding a startup with it, a startup that I'm putting quite a bit of my own money into. I wouldn't use it if I didn't think it was actually going to be effective and give me value for money. But that said, it's an open source project that you are not going to find in-depth documentation for, you're not going to find a huge community... Although there is a community and there's certainly a lot of people who will help out, but at the same time you'll hit roadblocks where the only answer is "Go and read the code."

Richard Rodger: I mean, it's Node, it's JavaScript, it's only a couple of thousand lines, and it's not rocket science either... But at the same time, you will end up in that situation. Other frustrations that I'd identify are things like the fact that JavaScript has really moved to a promise-based approach for asynchronous operations. The async/await syntax is awesome, I totally love it. It's definitely the right way to go. It has simplified a ton of my own code... And yet, we are still struggling to release a version that properly supports promises, because you have to take care of backwards compatibility, and you have to think about API design, what the developer experience is going to be like, and document it... And you don't want to break -- the system has about 200-300 plugins and you don't want to break the plugins... There are a ton of challenges.
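
One common way to add promise support without breaking callback-based callers is to let a single function serve both styles. The sketch below is only an illustration of that general technique. The `act` function here is a hypothetical stand-in and says nothing about how the toolkit under discussion actually handles it.

```javascript
// Hypothetical sketch; `act` is an illustrative stand-in, not a real API.
function act(msg, callback) {
  const work = new Promise((resolve, reject) => {
    // Stand-in for real asynchronous work.
    setImmediate(() => {
      if (msg && msg.cmd) resolve({ ok: true, cmd: msg.cmd });
      else reject(new Error('message needs a cmd'));
    });
  });

  if (typeof callback === 'function') {
    // Old style: Node-style callback(err, result); nothing is returned.
    work.then((result) => callback(null, result), (err) => callback(err));
    return undefined;
  }

  // New style: the caller gets a promise and can use async/await.
  return work;
}

// Existing plugins keep working with callbacks...
act({ cmd: 'ping' }, (err, result) => console.log('callback:', err, result));

// ...while new code can await the same function.
act({ cmd: 'ping' }).then((result) => console.log('promise:', result));
```

Even in a toy like this, the design questions mentioned above show up quickly: what happens when a callback throws, and which of the two styles the documentation should lead with.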

Richard Rodger: I was hugely inspired by the older writings on open source like the Cathedral and the Bazaar, and things like that, to start doing open source... But I think a lot of those observations apply to the very large projects that were prevalent at the time. Nowadays, when anybody can push anything to GitHub and anything can be open-sourced immediately, if the thing that you're building gets any amount of traction, you end up with a community that you have to manage, and you shouldn't underestimate the challenge there. It's a big responsibility, if nothing else.

Stefan Tilkov: One of the things I wanted to talk about is I've heard you mention before that one of your mantras is "Generalize first."

Richard Rodger: Yes.

Stefan Tilkov: Can you talk a bit about that? Because that sounds kind of counter-intuitive, and it also sounds slightly different from your earlier take. Can you elaborate a bit on that?

Richard Rodger: I think the example I was using, of different types of users logging in is perhaps a good example there. Another example I like to use that's more business-logic focused is sales tax calculation. If you think about it, if you were tasked as a developer to build a sales tax engine, you're building a rules engine, and you're going to have look-up tables... And I'm talking about for a proper sales tax engine that can handle purchases in the European Union, and the U.S., and all over the world. You're going to end up with an extremely complex piece of software, that's very difficult to maintain and will end up with lots of technical debt just because of its complexity. But if you think about solving that problem in a microservice context and you're building an e-commerce site, well, the way I would start that is literally just having a single message that would say something like "Calculate VAT" or "Calculate the sales tax for me", and a single configuration parameter, which is the rate. That will solve 95% of your sales tax problems. It won't always be correct.

Stefan Tilkov: Okay.

Richard Rodger: But you can still deploy that and run for six months, and then solve the sales tax problem in a way that it's normally solved in business, which is if you calculate it wrong, you resubmit your tax return or you give somebody a refund... You solve it the normal, business way - manual human intervention - in the small number of cases that it's necessary.
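
The "single message, single configuration parameter" starting point might look something like this minimal sketch. The message shape (`cmd`, `net` as an amount in cents) and the `rate` option are assumptions made for the example.

```javascript
// Hypothetical sketch: the message shape and the `rate` option are assumptions.
function createSalesTaxService({ rate }) {
  // Handles one message: { cmd: 'calculate-sales-tax', net: <amount in cents> }.
  return function handle(msg) {
    if (msg.cmd !== 'calculate-sales-tax') {
      throw new Error('unsupported message: ' + msg.cmd);
    }
    // One flat rate for everyone: right for the general case, wrong for some.
    const tax = Math.round(msg.net * rate);
    return { net: msg.net, tax, total: msg.net + tax };
  };
}

const salesTax = createSalesTaxService({ rate: 0.2 });
console.log(salesTax({ cmd: 'calculate-sales-tax', net: 1000 }));
```

There are no look-up tables and no rules engine here. Cases where the flat rate is wrong are corrected by hand, as described above, until they justify a service of their own.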

Stefan Tilkov: So essentially what you mean by "Generalize first" is stick to the general case first.

Richard Rodger: Yes.

Stefan Tilkov: Okay. Because what I was thinking is that you're advocating for having overly generic components that have tons of configuration parameters, so the exact opposite...

Richard Rodger: No, no...

Stefan Tilkov: Okay, I see. It makes sense then.

Richard Rodger: It's perhaps a dangerous way to phrase it. No, we're not talking about "Batteries included." We're not talking about what you get when you're doing it in Python, which they've done a fantastic job of, but no, you want to have absolutely bare-bones implementations. And in microservice systems that I've been involved in that have been long-lived and that have lasted a couple of years, what you tend to see is that there's a core set of business logic services that are very simple, that handle the general case - by which I mean the 90% case. They tend to be very long-lived. And you know, bugs appear and are fixed, but the actual functionality of those microservices doesn't tend to change or expand very much.

Richard Rodger: Then you have a proliferation of smaller services that handle edge cases. When you hear people say "Oh, I've got a 300-microservice system", if you analyze that system, a lot of the time you find that there are like 50 core services and then a whole bunch of edge cases... And the edge cases tend to be relatively short-lived, so it's not quite as bad or as complex as it seems.

Stefan Tilkov: Very good. Is there something that we should have talked about that we didn't talk about yet?

Richard Rodger: One thing I would say about being a software developer in general - and it's something that I've kind of noticed... I've been doing it for 20 years, but I've also been (I guess) lucky to have ended up on the business side of things, as well. And the reason I ended up on the business side is because I'm not a particularly good developer. I'm okay, I can build large systems without completely screwing up, but I definitely have worked with people who are way better. So if you're not going to operate at a very high level, go into business... Because then you are one of the smartest people in the room, because you can use your analytical skills quite effectively.

Richard Rodger: What I've noticed about a lot of developers, especially as they go through their careers, especially if they stay focused on development, is a certain degree of cynicism and bitterness creeps in, because business people are messing up their engineering all the time... Changing requirements, or just playing politics, all this sort of stuff.

Richard Rodger: There's this really funny video where there's a developer in a room, with a whiteboard, and three identical--

Stefan Tilkov: The Expert? I love it.

Richard Rodger: Yes... And they're trying to insist that the blue marker is red, or something like that... And the poor guy at the end sort of goes like, "Yeah, it's red." He just accepts his fate.

Richard Rodger: I think that a healthier approach is to accept reality as it is. Whenever humans get together, it's like monkeys in trees. There is always politics, somebody is always trying to be the alpha, there's always a hierarchy... It's in our DNA, it's just what we do. And it doesn't mean that you have to suddenly start reading Machiavelli, although that's a really good idea... It doesn't mean you have to play that game, or try to be president or whatever, but if you cultivate an acceptance of the political side of human life, and the fact that in closed communities like schools and prisons and businesses those things get very amplified, you can at least develop a more scientific understanding of why business people do crazy things that are damaging to the engineering side of the business. It allows you to get beyond an emotional bitterness, and it allows you to be more effective.

Richard Rodger: You certainly won't win all your battles, but it allows you to do things like, instead of complaining that all marketing people are silly, it allows you to say "Well, I can probably find some of the marketing/salespeople who understand that if engineering is more effective, they will make more money", and those people can be advocates for the right approaches at a board level. It's really just about engaging with reality as it is, rather than how you want it to be, as a developer.

Richard Rodger: I think as developers we're so used to having so much control over our code and making the machine do what we want... We get really upset when we can't exercise the same degree of control over humans, and we think it's impossible, but it isn't impossible. It's just more difficult. To use a business phrase, you kind of have to lean into the politics a little bit to get what you want.

Stefan Tilkov: Makes perfect sense. What kinds of resources would you point people to to learn more about microservices in general and your approach in particular?

Richard Rodger: Okay, so I have written a book on microservices...

Stefan Tilkov: Which we will of course link to in the show notes.

Richard Rodger: Yes. That kind of goes into a lot of these ideas and the philosophy, as well. Sam Newman's book, Building Microservices, is -- so my book doesn't have much code, and is slightly less practical in that sense. Sam's book is sort of the Bible and it goes into a whole bunch of practical stuff. Even though it's about three years old now, it's still the first book you should read, for sure.

Richard Rodger: There's a really cool website called microservices.io. A really great thing that it contains is a catalog of microservice patterns. That's a really good way to take your thinking about "How do I architect a microservice to the next level?" Those patterns are applicable to all of the microservice architectures. They work as well for my weird scheme of things as they do for the more traditional "Netflix approach" where it's just REST web services. I think those three are definitely great places to get started.

Stefan Tilkov: Excellent. We'll link to that, as well as to any other of the resources that we've mentioned during the conversation... Which I think was great. Thank you very much for having been on the show. I enjoyed it very much.

Richard Rodger: Yes, me too. Thank you.

Stefan Tilkov: And thanks to our listeners for listening. Bye-bye!

Richard Rodger: Wonderful stuff!