Philippe Kruchten on Managing Technical Debt

Transcript

Sven Johann: Welcome to a new Conversation about Software Engineering, today with Philippe Kruchten. Philippe is a software engineer and professor of software engineering at the University of British Columbia in Vancouver. He is mostly known as the Director of Process Development at Rational Software, and the developer of the 4+1 architectural view model.

Sven Johann: Philippe recently wrote a book about managing technical debt, together with Ipek Ozkaya and Robert Nord, which will be publicly available on May 4th, 2019. Philippe, welcome to the show.

Philippe Kruchten: Thank you, Sven.

Sven Johann: Did I forget to say anything important about you?

Philippe Kruchten: No, that's pretty good, thank you.

Sven Johann: Okay. Let's start briefly with some definitions. The first one is what is actually technical debt?

Philippe Kruchten: Technical debt is not like financial debt. It's not like you owe money to somebody. It's just a metaphor to express something that most software developers, especially when they're working on large, long-lived systems, have experienced. It's the accumulation of sub-optimal decisions, little imperfections in the code. They're not bugs; the code works, it does what it has to do, but gradually over the months and over the years it's become harder and harder to evolve the software, to add new functionality, and that kind of thing.

Philippe Kruchten: It feels a little bit like a mortgage. You cannot use all your money to spend it on whatever you want, because you have this debt to reimburse. If you don't reimburse it, interest piles up.

Sven Johann: You mentioned interest - debt always has some interest and some principal to pay back... Can you elaborate on those terms?

Philippe Kruchten: Let's assume you've done something to the software, just as a shortcut. Well, you got through some deadline and everybody is happy; you go back and you just fix it. You do the right thing. It was not quite right before, but now you take the time to do the right thing. This is just reimbursing the principal.

Philippe Kruchten: But let's assume you don't fix it, because you don't have time. Well, you'd have to pay some interest. Some evolution a few weeks later will be a little bit more complicated because you haven't repaid the principal. Then a few weeks later it's a little bit more complicated because you haven't repaid the principal, and so on.

Philippe Kruchten: If you don't repay the principal, there's still some additional costs, and those costs pile up. As you pile up new software on top of the old one, reimbursing the principal becomes harder. "Oh, no...! I'm really, really annoyed by this decision that we made two years ago. Let's fix it. Oh, but now that we want to fix it, there's a lot of things that depend on that not-quite-right code." So you decide "Well, maybe I shouldn't touch it, because it's too much of a house of cards built up on top of it." And then the interest continues to pile up.
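To make the principal-and-interest idea concrete, here is a minimal sketch in Python. It is not from the conversation: the function name, the rates, and the effort figures are all hypothetical, and real projects do not follow a neat formula. It only illustrates the two effects described above: interest is paid on every release while the shortcut stays in place, and the principal itself grows as more code is built on top of it.

```python
def debt_history(principal_days, interest_rate, growth_rate, releases):
    """Return (interest paid so far, current principal) after each release.

    principal_days: effort to fix the shortcut today (hypothetical figure)
    interest_rate:  extra effort per release, as a fraction of the principal
    growth_rate:    growth of the principal per release, as new code is
                    piled on top of the not-quite-right code
    """
    history = []
    total_interest = 0.0
    principal = float(principal_days)
    for _ in range(releases):
        # Interest: this release was a bit harder because of the shortcut.
        total_interest += principal * interest_rate
        # The fix gets more expensive as more code depends on the shortcut.
        principal *= 1 + growth_rate
        history.append((round(total_interest, 2), round(principal, 2)))
    return history


# Hypothetical numbers: a 10-day fix, 10% interest and 20% growth per release.
example = debt_history(10, 0.1, 0.2, 3)
```

With these made-up numbers, three releases later the team has paid 3.64 days of interest and the fix that once cost 10 days now costs 17.28.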

Sven Johann: You mentioned cost. Is there also value in it?

Philippe Kruchten: Well, is there value in having defects? Well, no. Cost is what you're going to spend to fix it, to reimburse your debt. Pretty much like what you're going to spend to develop new functionality, or to fix a defect. Value is in the eyes of the beholder - the users, the buyers, the people who pay you some money to get your software. Technical debt has no externally-visible value. In that respect, it's a little bit like software architecture - the value of the architecture is not visible externally. You see features, you see the functionality, the capabilities of the software, but you don't see the actual architecture. Technical debt is a little bit in the same category - you don't see its negative value outside.

Philippe Kruchten: Defects - well, they have a negative value and you see them outside. That's why you're going to roll up your sleeves and fix the defects. Fixing the technical debt - some managers say "Yeah, it's good enough. It works. Let's not worry about it. Let's move on."

Sven Johann: I thought that the value of technical debt is that you can go faster...

Philippe Kruchten: Yes, you can go faster. That's usually the reason you're willing to incur technical debt. You can deliver something earlier, but it's the long-term consequences of it. Is it a one-shot software? There's one delivery of it and then you forget it? Well, that's like declaring bankruptcy; you just move away from your debt. But if that software is successful and you want to continue to evolve it and to maintain it, then there's a compromise there. There's the decision "Do we want to reimburse our principal or not?"

Philippe Kruchten: So yes, there is some immediate value that you can get out of taking some technical debt, pretty much like there is some immediate value in borrowing money to buy a new car. You can use the car tomorrow, or this afternoon. You don't have to pile up enough money to buy a car and take the bus in the meantime, or your bike. So there is a little bit of value, but the value that you have is consumed immediately, and it's rapidly forgotten by the people who manage software. They say "Oh, I don't recall that we had any problems with release 1. Let's move on to release 2." Well, there's some work to do between release 1 and release 2 in order to get to release 3, and 4, and 5, and 6, and 7.

Sven Johann: How do you communicate that? We software developers all know what technical debt is; we all took shortcuts, and I'm not sure if it's forgotten - developers know about it. It's just a very hard sell, because it's invisible... And I'm just wondering how you can sell something invisible like technical debt? Probably it's the same as selling something invisible like architecture.

Philippe Kruchten: Yes, it's a pretty hard sell in some shops, especially when the people making decisions about the trajectory of the product are not that technical. They've never developed lots of code themselves. Maybe they say "Oh yeah, I was programming in C++ when I was in school, but that's about it." It's difficult...

Philippe Kruchten: The reason technical debt took off as a metaphor in the last few years is that, thanks to the financial metaphor, the gap between the business side of the companies and the technical side can be bridged a little bit. You can use the financial metaphor to explain what's happening to the software. So... No, there's not much you can do. You have to explain, and explain again, and give concrete examples, and examples from other projects.

Philippe Kruchten: The danger also is that the metaphor is not perfect. You may accumulate a lot of technical debt, but it's not necessarily all the technical debt that you have to reimburse. You have to reimburse the technical debt that is annoying you now. The one that is slowing you down now. If you have a large subsystem, with a lot of technical debt, but it's pretty stable, nobody is touching it, and you don't need to evolve it for future functionality, then leave the debt there. You don't have to reimburse it.

Philippe Kruchten: This is not the same thing as a mortgage on a house. You have to pay the bank the whole mortgage. You cannot say "Oh, you know what - I'm not using the garage, so I'm not going to reimburse you for the garage." That's where the metaphor is a little bit loose.

Sven Johann: Basically, that's the difference between potential and actual technical debt, right? If I have a very bad submodule--

Philippe Kruchten: If you're using static analysis to find what people nowadays call code smells, which is a form of technical debt at the code level, the tool is agnostic. The tool will tell you about technical debt in all your code. And then it's pretty daunting - you have this massive amount of warnings about technical debt and code smells and imperfection in your code... But actually, you can look at it and say "Well, I don't care about two-thirds of it. It's not in my way, so I can just focus on the technical debt that is actual debt based on what I want to do now with the software."

Sven Johann: I think there is an interesting tool - CodeScene is the name - which looks at the code smells, and also at the frequency of the commits.

Philippe Kruchten: Yes.

Sven Johann: That's quite interesting. On the other side, it also has a little downside... There are codebases where you have lots of technical debt and nobody is brave enough to touch that code... So nobody is changing it, although they would like to.
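The idea of combining code smells with commit frequency can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it is not how CodeScene or any other tool computes its results, and the `FileStats` type, the `rank_hotspots` function, the file names, and the counts are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class FileStats:
    path: str
    smells: int   # warnings from a static analyzer (hypothetical count)
    commits: int  # commits touching the file in the last year (hypothetical)


def rank_hotspots(files):
    """Rank files by smells * commits, highest first.

    Files with a score of zero are dropped: they are potential debt,
    not debt that is in anybody's way right now.
    """
    scored = [(f.smells * f.commits, f.path) for f in files]
    return [path for score, path in sorted(scored, reverse=True) if score > 0]


files = [
    FileStats("billing.py", smells=40, commits=25),
    FileStats("legacy_report.py", smells=120, commits=0),
    FileStats("utils.py", smells=5, commits=60),
]
hotspots = rank_hotspots(files)
```

In this made-up example `legacy_report.py` has by far the most smells but drops out of the ranking because nobody changed it. That is exactly the downside mentioned above: a score like this cannot tell stable code apart from code nobody dares to touch.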

Philippe Kruchten: There are two other phenomena that come into play. The technical debt is not very well understood because the original developers are gone, and they haven't left much documentation, so people are reluctant to touch the code. It's not buggy, the code works, but it just looks pretty weird and ugly.

Philippe Kruchten: The other reason is testing: if you refactor the code and you don't have ways to retest it to see if you have changed functionality, you're also taking a big risk. Absence of regression tests and absence of documentation or knowledge about the code are two factors that are not a good incentive for fixing the technical debt, or reimbursing your principal, as you may call it.

Sven Johann: Let's say I start a new project, but the project is unfortunately a very old insurance system, it's already 15 years old... What you mean is before actually fixing any bad code, we should first start fixing missing tests and documentation?

Philippe Kruchten: No, but you need to understand... If you're going to refactor a subsystem, you need to understand what it does. And if you have no way to assess that the changes that you're going to do are not damaging, it's pretty risky. I'm not saying don't do it, but it's taking a lot of risk... Risk of introducing defects, just because you did not understand what was the original intent of that module.

Philippe Kruchten: But let's assume that smart people can read the code and understand the code. The second thing about not having tests - that's something that you can build. You can build not necessarily unit tests at a low level of granularity, but system-level tests to check that the functionality of that subsystem is not affected by the refactoring that you're going to do to it.
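One common way to build such tests is to record what the existing code does for a set of inputs and then compare the refactored code against that recording. The following is a minimal Python sketch of that approach, not something described in the conversation. The premium calculation, both function bodies, and the input values are hypothetical stand-ins for a real subsystem.

```python
def legacy_premium(age, claims):
    # Stand-in for hard-to-read legacy code (hypothetical example).
    p = 100
    if age < 25:
        p = p + 50
    if claims > 0:
        p = p + claims * 20
    return p


def refactored_premium(age, claims):
    # Cleaned-up version that is supposed to behave identically.
    base = 100
    young_surcharge = 50 if age < 25 else 0
    claims_surcharge = 20 * max(claims, 0)
    return base + young_surcharge + claims_surcharge


def record_behavior(func, inputs):
    """Run the existing code and remember its output for each input."""
    return {args: func(*args) for args in inputs}


def find_regressions(func, recorded):
    """Return the inputs for which the new code differs from the recording."""
    return [args for args, expected in recorded.items() if func(*args) != expected]


# Inputs chosen around the boundaries visible in the legacy code.
inputs = [(age, claims) for age in (18, 24, 25, 40) for claims in (0, 1, 3)]
recorded = record_behavior(legacy_premium, inputs)
regressions = find_regressions(refactored_premium, recorded)
```

An empty `regressions` list only says that the two versions agree on the recorded inputs; it says nothing about inputs that were never recorded.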

Sven Johann: I think in some domains, like financial domains, if you want to replace a large financial system, you're not even allowed to do that without showing that there is a substantial amount of system-level tests, because people can lose money, and stuff like that.

Philippe Kruchten: People call that test debt, as a variant of technical debt.

Sven Johann: You said earlier that the metaphor gained some traction in the last years. It's more than 25 years old; I think Cunningham coined the term in 1992, or something... Is there any other reason why it took off in the last years? What comes to my mind - 20 years ago most systems were not so huge, so rewriting was easy, but nowadays you just cannot rewrite a system. Most systems are too big to rewrite.

Philippe Kruchten: Yes, there's a lot of big systems. I don't know exactly why... I think there's also some effect of communication. We have more and more people expressing themselves in the form of blogs and various online publications, and people discover technical debt, and they run into somebody who explains the metaphor to them and points them to some presentation by Steve McConnell or Martin Fowler and they say "Oh yeah, this is technical debt. Now I understand what I'm suffering from", and they express some opinions about it.

Philippe Kruchten: It was sort of growing. 6-7 years ago we'd google "technical debt" and there were tens of thousands of references, but not that much more information. It was people just repeating roughly the same thing, and giving different examples. This thing is not taught at school. There is no course on technical debt that I know of... So it's the kind of thing that people discover on the job.

Philippe Kruchten: And then a few tools, static analyzers, started to latch onto terms like code smell, and imperfection, and things like that, and they started to link that to technical debt. One in particular that was pretty instrumental in doing that is Sonar.

Sven Johann: Yes, I think Sonar is quite interesting. There is basically no project without a Sonar, and all the projects which don't have it - they are crying for it. So that really worked out.

Philippe Kruchten: Yes, that worked out. However, there is an issue there - it points at what I call relatively low-level, code-level technical debt. There is a relatively wide range of technical debt. Some is at the architectural level; some of the big decisions that you've taken about how to structure the system - which programming language, which framework to use, and all that kind of thing. A tool will not tell you that this is technical debt. That's only in the head of the designers. That's the kind of thing that you cannot just obtain by running some tool.

Philippe Kruchten: A few things about the structure - if you have cyclical dependencies, or a strange class hierarchy, a few tools will detect that. But some of the fundamental architectural technical debt - only the local people who lived through it will be able to point it out.

Philippe Kruchten: But the metaphor took off mostly because people realized that they've accumulated a lot of technical debt at the code level, and static analyzers were starting to point at it... And they got some good traction in actually fixing the technical debt and inserting that into the regular development cycle. One organization that I know focuses on reducing technical debt once every 4-5 iterations or sprints. But very focused, not just random technical debt in general; the technical debt that is the actual debt, the thing that is slowing them down now.

Sven Johann: It's important that you don't improve the code randomly. I know organizations which just look at the pure numbers. You have 20 blockers, 30 critical, and 50 major warnings in Sonar, and then developers just start to pick the low-hanging fruit to get the numbers down, instead of looking at the real problems.

Philippe Kruchten: Yes.

Sven Johann: I think prioritizing is tricky.

Philippe Kruchten: Yes, you need to prioritize... Unless you have infinite resources available, like bringing in the summer interns from university and letting them fix the technical debt.

Sven Johann: But still, even if you have lots of resources, there's always opportunity cost, right? You could do something useful with that time.

Philippe Kruchten: And there are risks associated with it. Refactoring is not a totally riskless exercise. You may introduce bugs.

Sven Johann: If you don't have test cases, or good test coverage.

Philippe Kruchten: You know, testing has its limits, too.

Sven Johann: Yes. That's true. You said this one company deliberately prioritizes every 4-5 sprints, the technical debt... How would you do that? How do you prioritize, how do you plan this work?

Philippe Kruchten: You need to bring technical debt to the same level of visibility as new functionality, defects, something like that. It needs to be in your backlog, it needs to be decomposed to a sufficiently small level of granularity, it needs to be estimated, and then you prioritize it with other activities. You could prioritize a little bit of technical debt fixing strategically in various situations.

Philippe Kruchten: That company I've taken as an example - they just try to focus on technical debt at some point in time in their release cycle. Immediately after a release, the first one or two iterations, there's a little bit more room to fix/address technical debt, giving them a little bit more time to work on the requirements for new functionality. But that's just a strategy. It's just putting some discipline.

Philippe Kruchten: Technical debt needs to be brought up, visible, and the costs associated with remediation need to be evaluated pretty much the same way you do it for any other action that you're going to take on your software. The big change that organizations have to do is to make technical debt a visible element. It's not some dirty secret that just a few developers know about and grumble. It needs to be brought up to a level of visibility where the whole organization can take technical debt into account in their decision about "What are we going to do next week/next month?"
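What "debt items in the same backlog, estimated and prioritized with everything else" might look like can be sketched in Python as follows. This is a hypothetical illustration, not a method from the conversation: the `BacklogItem` fields, the benefit-per-day ranking, and all item names and numbers are assumptions, and a real team would weigh many more factors.

```python
from dataclasses import dataclass


@dataclass
class BacklogItem:
    title: str
    kind: str      # "feature", "defect", or "debt"
    estimate: int  # estimated effort in days (hypothetical)
    benefit: int   # relative benefit agreed on by the team, 1-10 (hypothetical)


def plan_iteration(backlog, capacity_days):
    """Greedily pick items by benefit per day of effort until capacity runs out.

    Debt items compete on the same terms as features and defects.
    """
    chosen = []
    remaining = capacity_days
    ranked = sorted(backlog, key=lambda item: item.benefit / item.estimate, reverse=True)
    for item in ranked:
        if item.estimate <= remaining:
            chosen.append(item.title)
            remaining -= item.estimate
    return chosen


backlog = [
    BacklogItem("Export to CSV", "feature", estimate=5, benefit=5),
    BacklogItem("Fix rounding defect", "defect", estimate=2, benefit=8),
    BacklogItem("Remove duplicated tax logic", "debt", estimate=3, benefit=6),
    BacklogItem("Rewrite persistence layer", "debt", estimate=20, benefit=9),
]
plan = plan_iteration(backlog, capacity_days=10)
```

In this made-up backlog the small debt item gets scheduled ahead of the feature, while the 20-day rewrite does not fit into the iteration at all, which is why debt items need to be decomposed to a small enough granularity before they can be planned.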

Sven Johann: If technical debt is invisible, but at some point the effects become visible... To make it visible in a company, do I already need to have some problems to point to? For example, if I just have small code smells, which I can easily fix, that's not a problem; if I have many code smells, usually they result in a higher number of bugs, for example... So with that it becomes visible. Or if I have big problems with the architecture; the architecture is not really meeting the requirements a system has, for example scalability or performance or security, then it's becoming visible.

Philippe Kruchten: Yes. The cost of technical debt is not just the cost that it would take developers to change the code and repay the principal, as we were calling it earlier on... You have to look at the consequences. There are dependencies between the things that you want to do. If we change the architecture, then adding this kind of functionality will be easier. If we remove these code clones, then we will have fewer errors of that nature, like the one we had last week... And articulate the value of fixing technical debt in terms that make sense from a global business perspective - reducing risk, reducing the likelihood of introducing new defects, and making future development easier, faster, more secure etc.

Sven Johann: I think if you already have a problem, then it's usually very easy to communicate. The big problem is communicating the iceberg. It's not really visible what's coming, but you will have lots of problems if you don't fix it.

Philippe Kruchten: Yes.

Sven Johann: It's tricky. If it's just a small change, then it's okay, but I worked on systems where people said "Okay, we have these problems. How long does it take to fix it?" and then it's like "Three people need half a year", and then it's like "Oh, god..." So that's then a totally different story. I think that's really hard to communicate early enough.

Philippe Kruchten: In some ways - I was describing earlier - it's not very different from looking at architecture. Why would a large system have 2-3 people called "software architect"? What are they producing of value? It's pretty hidden. They have a cost, but the value is very difficult to articulate very often.

Sven Johann: Yes, that's true.

Philippe Kruchten: Communicating, explaining the ramification, the dependencies, doing what-if scenarios, "What if we do this? What would be the consequences on the next release and the subsequent release?" and explaining that to the people who are involved in making the big decisions - the product owner, the product managers, the team leaders, the VP of engineering, whatever. Explaining things is unfortunately about the only way forward. Just saying "Oh, we've run Sonar and look at the numbers. They're frightening." Well, that's ridiculous, because you can say "So what?"

Sven Johann: Yes. I remember once one business leader told me "Your colleagues are always coming and saying 'We have to fix this code because it's crap', but I cannot work with that statement." So it has to be a little bit more concrete than "The code is crap" or "The code is ugly."

Sven Johann: I think what's good these days is that most product owners or organizations already felt the consequences, in some way or the other... Especially large financial organizations, they all have their COBOL systems nobody can maintain anymore, and you cannot hire people... I think it's better than a few years ago, or like ten years ago.

Philippe Kruchten: Yes. But the problem had existed for a long time. It's not because we have this handy metaphor that suddenly there's a new problem that appeared in 2000 or in 1992. People have looked at software evolution for a long time, but it was not a very sexy field in software engineering. A few people had discussions, and books, and technical conferences about software evolution, but we've known the issues for a long time, using different words. Technical debt just brings another vocabulary on something that was known before, but it's not something that's that exciting.

Philippe Kruchten: It married relatively well with iterative development and the agile movement, because you could do something, you could have some tactical decision at the level of one iteration. You could discover that you have some technical debt, it's getting in your way. The next iteration - you reimburse that technical debt. The subsequent iteration you move forward with some new, clean code. So iterative development facilitated the identification and the resolution of technical debt, as opposed to a massive waterfall thing, where you would do coding after having done the design, and then you discover you have some technical debt, but you're running out of time because testing needs to get started, so that we can deliver.

Philippe Kruchten: So the big waterfall model, with no iteration, was not really suitable for addressing technical debt in a very tactical fashion... Pretty much like the big waterfall was not very friendly for architecture, and you had the people complaining "Oh, we have big upfront architecture."

Philippe Kruchten: Iterative development allows you to try out the architecture and build an architectural prototype, start building software and validate the architecture as you go along. The same phenomenon happened with technical debt. Iterative development allowed technical debt to be taken into account proactively in the development process.

Sven Johann: You mentioned the topic of software maintenance or evolution... I think as Scrum already said 25 years ago, iteration two is already maintenance. Everything is maintenance.

Philippe Kruchten: Yes.

Sven Johann: So in my understanding, since there are not so many greenfield projects anymore... Or even if you have a greenfield project, after a couple of weeks or months you already deliver - even if it's just an internal delivery - you're already in maintenance mode. In my perception, that's also the reason why it took off in the last years.

Philippe Kruchten: Yes.

Sven Johann: When I speak to some business owners, I think explaining the value is probably easy if they already have some pain. They have lots of bugs, or lots of customer complaints, or the system is down for an unwanted amount of time... Then the value is easy to articulate. But how do you calculate the cost of fixing the debt?

Philippe Kruchten: Yes, that's still the big unknown in software engineering. We are very bad at estimating the cost of doing something. We are very bad, or very often too optimistic. You need to look at what is the current state, what is the next state, how much software development do we need to get there, how much testing, how much regression do we incur, how many defects are we going to introduce... This is just software estimation. Some people have given up on software estimation. It's not something that's that easy to do; it requires a lot of experience. No tool is going to do it for you. Very few people claim that they have some magic wand, or Function Point Analysis, and things like that... That may be good for greenfield development, but it's not so good for just evolving an existing system. You're not adding any functionality, you're just evolving, refactoring some code.

Philippe Kruchten: It's hard, and there is no magic in estimating the cost. People with experience in that technology, experience with that system, will be the most qualified to give some estimates about the cost. If people keep track of the actual cost, they could become a little bit more clever about giving estimation... But for some reason, it seems to be pretty hard for software developers to keep track of the actual cost of doing things, so... No magic there.

Sven Johann: I once talked to Dave Thomas about that problem, and I think you also had a similar discussion with him... He said that you can estimate, but it's really important if you change something on a very large scale that you first promise a relatively cheap prototype. You have to show that the change is possible on a small scale, within a certain amount of time, and also be able to explain that you can scale your solution to the whole problem... But the most important thing is that you are able to deliver a visible small fix to the problem, let's say within three months, if it's a really big problem. Then you already gained some experience, and then you can just give probably better estimates about it.

Philippe Kruchten: Yes, that's the whole idea in doing things iteratively. Don't tackle a big mountain of development or refactoring as one monolithic piece. Try to break it down into some smaller piece, and try to do some prototyping and some experimentation, and then step back and reflect "What have we learned, and how does that change our estimate for doing the whole job?"

Sven Johann: It's funny that you say that it's just incremental software development. I'm always wondering -- it's an old concept to do things in very small steps, or in small steps, but still, a lot of organizations, especially large organizations, don't like it. They really want to solve everything in one go, they want to eat the elephant during lunch, or something like that...

Sven Johann: So we had the architectural debt, we had debt on the source code level... Is there anything else when it comes to technical debt, besides architecture and source code?

Philippe Kruchten: There may be some interesting debt in some systems in the production of the code and the deployment of the code, at the operational level. A lot of organizations use very complex scripts and manual steps to bring the software from the lab to the operational level, and it occurred to us when we were looking at that technical debt that there is a lot of interesting technical debt there because very often those scripts to put the code in place, and change the data, upload the data, and things like that - they are very often not even under configuration management. They are very dependent on a few people who know about them. So there is some interesting technical debt at the operation level.

Philippe Kruchten: Then, as I mentioned before, there's probably some technical debt at the test level, with the test suites that are not properly maintained, or that are incomplete, or that are testing the wrong thing. We hear about documentation debt... The code-level debt - now, there's quite a few tools, and people understand it, and it's identified, and it points at a small level of granularity where you can do some estimates and you can service it.

Philippe Kruchten: Architectural debt - it's still pretty big stuff, where people will be very reluctant to address it if they understand that they have architectural debt. Infrastructure debt - it's only for certain kinds of systems, and it has not been explored much so far, as far as I know... I'm trying to ease into it a little bit more.

Philippe Kruchten: The documentation debt - yes, sure. It seems to be pretty straightforward. It's all-around. The danger is to start calling everything technical debt, because if everything is called technical debt, then the concept loses its value. I run quite often into organizations which equate defects and technical debt. I think it's useful to make a clear separation between the two, and maybe in a few cases there is something that is a little bit ambiguous whether it's a defect or technical debt. Accumulation of technical debt will lead to more defects, but that's not the same thing.

Sven Johann: When you talked about infrastructure, I'm just wondering... Infrastructure as code is now not a super-new concept, but it's also not that old. It started maybe 6, 7, 8 years ago, or something, and let's say the last five years it's kind of normal... But I think it's very hard to really automate everything.

Sven Johann: In my project, I already really feel this with infrastructure as code, because usually it should only take executing a script to set up a new environment, but then it takes weeks and weeks, and you wonder how can that actually be...? Because we actually automated everything. But still, there's something missing, and there's something not working...

Sven Johann: I also noted down as a point that we have the technological gap as some sort of technical debt.

Philippe Kruchten: Yeah, that's an interesting concept. The name comes from Jean-Louis Letouzey. It's just time passing by. At the time you made this design choice, or this implementation, it was the best you could do. But now you are very successful five years later, and the context has changed. Although you haven't done anything to your code, what was good five years ago now doesn't look that good today. That's the technological gap.

Philippe Kruchten: It's unfortunate... It's technical debt that you incur, but not through any fault of your own. It happened to you by the passing of time, and by the fact that all the things in the environment have changed. So what was the perfect API at some point in time to do some functionality, now everybody has moved to something much better, and you're still there with the old API... And people say "Why are you doing this?" "Because this is the decision we made five years or ten years ago." That's technical debt that's happened to you unintentionally.

Sven Johann: Is it the same as the software aging David Parnas once described?

Philippe Kruchten: It's one example of software aging.

Sven Johann: I think the technological gap is really interesting. It's one of those things that's very hard to get rid of. If you use technology XYZ and it's system-wide, it's very hard and costly to change.

Philippe Kruchten: Yes, some companies have gone bankrupt because of it. They realized that they had made some very small technical choice, but ten years later the whole industry didn't go that direction, and now they're isolated. They've painted themselves into a corner... And they try to re-engineer the system completely, but usually they die in that process, because they cannot deliver any new things to their customer base, their installed base, they cannot develop new business, and it takes them two years to reimplement the whole system using more modern technology, and they die in the process.

Sven Johann: Yes, the big rewrite. I've been in those projects... Too often, actually. We didn't go bankrupt, but it usually costs an enormous amount of time, and you have unhappy customers because you don't deliver anything new, and it usually takes longer than you think, so it's bad... But how can you reduce the risk of getting into the trap of a technological gap?

Philippe Kruchten: Well, drive with your eyes open. The people who really get to the point of almost bankruptcy because of a technological gap are people who are not really keeping their eyes open and looking at what's going on in the outside world.

Philippe Kruchten: Another way you get into something similar to this technological gap is when you have companies merging and they try to merge their products, that usually brings a lot of technical debt up to the surface. You have a company acquiring another company, and they have similar products, and the goal of the business people is to deliver one single product that has the best of both worlds, and keep both customer bases.

Philippe Kruchten: We know that this merging of systems is pretty difficult, and a lot of architectural technical debt emerges at that time. That's where also you can feel some of the technological gap - the fact that some systems have made some assumptions that are hard to remove based on what we know today.

Sven Johann: Yes, I remember a German bank - they were in the news because they were a candidate for a merger... Or actually it was an acquisition. And even regular German newspapers wrote "Here are the three problems of this bank which make this acquisition difficult. Here's number one, here's number two...", and number three was actually the very old software systems, which nobody could handle anymore. I found that quite interesting... That it's already almost part of the mainstream media that your old system is a reason why you shouldn't acquire a bank.

Philippe Kruchten: Yes. I've seen a financial institution running into technical debt extremely rapidly. It was a very large system and they had to reimplement it almost from scratch.

Sven Johann: How long did it take?

Philippe Kruchten: It took them two years to recover from it.

Sven Johann: Well, if it's a large system, two years is not that long.

Philippe Kruchten: When it's a large organization and software is not their main output... If a company has only one product - it's a software company and they have one product - that's very dangerous, because if that one product needs two years to be fixed or reimplemented, then they'll go bankrupt.

Philippe Kruchten: Organizations that have multiple products - well, some project can be delayed, and the company doesn't necessarily go bankrupt. And if software is not the main output - we don't expect banks to actually do software; they just use software - then yeah, they get into difficulties with their shareholders, but they're not necessarily going to go bankrupt because of that.

Sven Johann: Yes, but for example I switched to a new bank because the online bank of the old bank was really too bad, too slow, too everything... So for me actually the software was a reason to move to a new bank.

Philippe Kruchten: Oh, okay. Sure.

Sven Johann: Yes. Okay, so maybe a few words about managing the technical debt strategically and tactically. One thing we already discussed - can technical debt be a good thing?

Philippe Kruchten: Yes, technical debt can be a good thing. Technical debt is what allows you to get past some hurdles, meet some hard deadline... But you have to make it very visible that you're taking some technical debt in order to meet that deadline. It shall not be done secretly, and then we forget about it, and we come back to the office on Monday as if nothing had happened.

Philippe Kruchten: Now there is a new item on your backlog that says "Fix this. In priority. If we don't, we'll suffer from it forever." Honestly, technical debt needs to be made visible. And when you take some technical debt for some good reason, you must absolutely make it visible that you've had to take some technical debt in order to do that. You don't just say to the management "Oh, we met the deadline! Yeah! Let's open the champagne and celebrate!" We met this deadline at a cost, and that cost is ahead of us.

Philippe Kruchten: Awareness, education and information are key there. And then make it a regular practice to identify technical debt, make it visible, bring it to the same backlog as new features and defects to fix... Make it visible, and then do estimation. Break it down into small chunks. Have some label that says "This is technical debt, but it's only potential. It doesn't block us from doing anything right now, so it can be postponed." Its priority gets much lower, and we don't need to do anything about it today.

Philippe Kruchten: Buy some tools, some static analyzer to analyze the structure of your software, and code smells, and imperfections and violations of a coding style, and whatever. And then look at it from a critical standpoint - is any of that slowing us down, or is any of that putting us at risk? Then let's address it. If it's not, okay. It's good to know that we have some crap there, but let's not fix it, because it's not impeding anything. That's another attitude.

Philippe Kruchten: Then, at the higher level, you need people who understand the technology and understand the history of the system. What I do when I do some consulting on technical debt is ask, "What are all the things that you regret having done? Are there some design decisions, some choices in terms of tools, frameworks, libraries that you regret? Why?" And people will say "Yeah, if I had to do the system again, I would organize it like this..." Okay... Why? Why is that a regret? "Because now when we do this, it's harder." Okay, now you've put your finger on some potential technical debt.

Philippe Kruchten: So what is it that if you had to do it again, you would do differently, with what you know today? That's usually useful for identifying architectural debt... Not much of which can be fixed, by the way. If you realize that your biggest technical debt is having picked the wrong programming language - well, I'm sorry, but you have 200,000 lines of Clojure and nobody wants to use Clojure, and you have a hard time hiring developers that want to do things in Clojure. "Yeah, we should have done the system in C++ or JavaScript." That's a pretty big technical debt.

Sven Johann: But on the other hand, you could say we split the 200,000-line system into several new systems. If that's possible, then you could also get rid of the "too much Clojure" if you don't like it.

Philippe Kruchten: Yes.

Sven Johann: You said that if you communicate the technical debt, you make it visible - if you take it, that means you need to know that you took on some debt; intentional debt. But there is also the unintentional debt. Is that what you said when you had the system for half a year or longer, and when you then look back, "Now we know better"? Is that what you mean with unintentional debt?

Philippe Kruchten: Well, some people will never realize that the decision they made was not quite right. You need a little bit of humility and self-reflection to say "Yeah, maybe this was not such a great idea. Maybe we should not have done that. Maybe this solution that we see today was a better solution." It's not obvious to discover your unintentional technical debt taken two years ago or ten years ago.

Philippe Kruchten: Either the people who made the choices are not around, and it's difficult to understand why that decision was made, or the consequences are too big for people to even envisage it. But unintentional debt - you can reduce it by more awareness and education, and having a healthy discussion about it.

Sven Johann: Yes, get from time to time fresh blood into the project. That also helps, I think.

Philippe Kruchten: Yes, fresh blood on the product... But not just to do some nasty criticism. "Oh, this is crap. This is stupid." No. There were some good reasons to make some decisions, and the context has changed, time has passed, and the system is much more successful. The system had to be scaled up to millions of users... But what can we do to be able to continue to evolve the system at a reasonable cost?

Sven Johann: If I don't want to pay back the debt, I just have to live with it. Sometimes that's a good choice. When can I safely live with my technical debt?

Philippe Kruchten: At least you need to be aware that you're living with that debt. If the cost of the evolution is still bearable...

Sven Johann: Or you don't suffer yet from it. That would be another way.

Philippe Kruchten: This is why you can be very selective, especially with low-level, code-level technical debt. Don't rush into fixing things everywhere, wherever you can. Just be very selective. Do it in the places that need to evolve. The software, as you described, is the tool that -- I forgot now... The software where you have a lot of code smell, and at the same time where there's a lot of commits, to use the Git jargon.
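
The selection rule described here (fix code-level debt only where smells and frequent change coincide) can be sketched in a few lines of Python. This is a minimal illustration, not the method of any particular tool: the `FileStats` type, the sample numbers and the smells-times-commits score are all assumptions made for the example. In practice the smell counts would come from a static analyzer and the commit counts from the version control history.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileStats:
    path: str
    smells: int   # code smells reported by a static analyzer (made-up numbers)
    commits: int  # commits touching the file in the period of interest


def rank_hotspots(files, min_commits=1):
    """Return files that are both smelly and frequently changed, worst first."""
    candidates = [f for f in files if f.smells > 0 and f.commits >= min_commits]
    # The score smells * commits is an assumption for illustration only.
    return sorted(candidates, key=lambda f: f.smells * f.commits, reverse=True)


stats = [
    FileStats("billing/invoice.py", smells=12, commits=40),
    FileStats("legacy/report.py", smells=30, commits=0),   # smelly but never touched
    FileStats("api/routes.py", smells=3, commits=55),
    FileStats("util/strings.py", smells=0, commits=20),    # busy but clean
]

hotspots = rank_hotspots(stats)
for f in hotspots:
    print(f.path, f.smells * f.commits)
```

Note that `legacy/report.py` has the most smells but drops out of the list entirely, because nobody is changing it: that is debt you can live with for now.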

Sven Johann: Yes, CodeScene is the tool. Okay, a final question from my side... Have you ever looked into error budgets? Something from the site reliability engineering movement, which Google and other companies started...

Philippe Kruchten: No, I have to admit I haven't. I didn't even know what you meant by SRE.

Sven Johann: Okay. I recently started looking into it. It's a way to balance delivering features at a high velocity, but also understanding the risk and consequences of doing that. With that, you balance work on architecture and on reliability and feature development.

Philippe Kruchten: It's a little bit like what I've recommended people to do... Rather than just looking at new development, and doing estimates, and planning of iterations and sprints based on new development, have 20 percent in bug fixing, have 10 percent in tech debt reduction. Have some budget allocated to that, and change it dynamically if it doesn't make sense... But at least acknowledge the fact that you will be introducing defects, so you must reserve some capacity to fix those defects.

Philippe Kruchten: Similarly, you will be introducing some technical debt, or you have some already, so drive with your eyes open. Acknowledge that you have it, and that you will spend some time and effort reducing it. Put that explicitly as some kind of a percentage in all your sprints and iterations in a given release schedule.
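
Reading the figures mentioned above as shares of capacity (20% for bug fixing, 10% for debt reduction), the allocation could look roughly like the sketch below. The function name `split_capacity`, the use of story points and the rounding are assumptions for illustration; the shares are parameters precisely because they are meant to be adjusted when they stop making sense.

```python
def split_capacity(total_points, bug_share=0.20, debt_share=0.10):
    """Split a sprint's capacity into bug fixing, debt reduction and features."""
    if bug_share < 0 or debt_share < 0 or bug_share + debt_share > 1:
        raise ValueError("shares must be non-negative and sum to at most 1")
    bugs = round(total_points * bug_share)
    debt = round(total_points * debt_share)
    # Whatever is not reserved stays available for new development.
    return {
        "bug_fixing": bugs,
        "tech_debt": debt,
        "new_features": total_points - bugs - debt,
    }


plan = split_capacity(50)
print(plan)
```

With a hypothetical capacity of 50 points, this reserves 10 points for defects and 5 for technical debt, leaving 35 for new features.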

Sven Johann: Yes, exactly.

Philippe Kruchten: I don't know if that's the same idea, but...

Sven Johann: With them it's kind of similar. You really define a threshold for what you expect a service to deliver. If you're worse than the threshold, then you know your customers become unhappy. The threshold is like a dividing line between happy and unhappy customers. That's communicated across the organization. And if everyone agrees that a certain number of bugs, or a slow service, or a service with lots of errors doesn't meet the expectation, that has consequences. So you're not allowed to deploy any software anymore. You have to work on reliability, or fix the issues. I think it's quite interesting, so I will work with it further.
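
The threshold-and-consequence mechanism described above can be sketched as follows. This is a simplified illustration under stated assumptions: the threshold is expressed as a success-rate target (`SLO`, a hypothetical 99.9%), the budget is counted in failed requests over some window, and the function names are made up for the example rather than taken from any SRE tooling.

```python
def allowed_failures(slo, total_requests):
    """Number of failed requests the target tolerates in the window."""
    return (1.0 - slo) * total_requests


def budget_remaining(slo, total_requests, failed_requests):
    """Error budget left; a negative value means the budget is overspent."""
    return allowed_failures(slo, total_requests) - failed_requests


def can_deploy(slo, total_requests, failed_requests):
    """New deployments are allowed only while some budget is left."""
    return budget_remaining(slo, total_requests, failed_requests) > 0


SLO = 0.999  # hypothetical target: 99.9% of requests succeed

healthy = can_deploy(SLO, total_requests=1_000_000, failed_requests=400)
overspent = can_deploy(SLO, total_requests=1_000_000, failed_requests=1_500)
print(healthy, overspent)
```

In the first case the team still has room to ship features; in the second the budget is gone, so the agreed consequence applies and work shifts to reliability.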

Sven Johann: Anything I forgot to ask, that you think I should have asked?

Philippe Kruchten: We need to get the information out, make people aware of technical debt, and make people aware of it outside of the pure software development organizations. I had a discussion with a manager of a company a few days ago who said "Oh, this is an interesting concept. I never heard of that. Now, this explains a few things in my own organization." That guy knows nothing about software development, but... Just from telling him what it was, the metaphor, with some concrete examples in software, and some warning about the limits of the metaphor, the place it breaks a little bit (it's not exactly like a mortgage), that guy is now really scratching his head about his organization, and will ask his IT people some questions.

Sven Johann: Alright. Philippe, thank you very much for being on the show.

Philippe Kruchten: Thanks for inviting me, Sven.

Sven Johann: This was a Conversation about Software Engineering.