Conversations about Software Engineering

Conversations about Software Engineering (CaSE) is an interview podcast for software developers and architects about Software Engineering and related topics. We release a new episode every three weeks.

Transcript

Eberhard Wolff: Hello and welcome to a new conversation about software engineering. This is Eberhard Wolff from innoQ. Today on the CaSE Podcast we will talk about Spring. My guest is Oliver Gierke. Oliver is the lead of the Spring Data project at Pivotal (formerly known as SpringSource) and he's also a member of the JPA 2.1 expert group.

He's been developing enterprise applications and open source projects for over eight years now, and his focus is centered around software architecture: Spring, REST and persistence. He is a regular speaker at German and international conferences, and he has also written a lot of technology articles, as well as the first book on Spring Data. Welcome to the show, Oliver!

Oliver Gierke: Thanks for having me.

Eberhard Wolff: Thanks for joining in. What we are going to talk about today are some parts of the Spring ecosystem, so maybe we should start by talking about Spring in general. What is the Spring Framework?

Oliver Gierke: The Spring Framework is an open source application framework for JVM-based languages - mostly Java, of course - and it's probably the most used application framework in the ecosystem. It's used in enterprise applications, but also in smaller applications. It integrates with a lot of the Java standards, but naturally extends them into areas that are not necessarily covered by the standards yet, and thus acts as a fundamental application framework, taking care of things like transactions, security, the way you structure your application, and all these things.

Eberhard Wolff: Who is behind Spring?

Oliver Gierke: It's an open source project after all, but there's a company called Pivotal - the company I work for - that employs most of the engineers behind the framework, both the core framework and the ecosystem projects. However, especially since our move to GitHub as our primary source code repository, there have been a lot of contributions from outside the Pivotal space; it's continuously growing, and the integration with the community is still a great source of innovation and a great driver for us. So there's the company aspect - people being paid to write the software - but we also integrate casual contributors.

Eberhard Wolff: I heard about Spring Boot, and that seems to be one of the latest big improvements in the Spring ecosystem, so what is it?

Oliver Gierke: The Spring Framework is a very configurable framework, so it lays a lot of foundations for things like transactions and security, but that also means that users of the framework, application developers have to make their own decisions about which persistence provider they use, which templating engine they use for their web views in case they're writing a web application, and Spring Boot takes an opinionated approach on exactly that, on top of the Spring Framework.

Oliver Gierke: It defaults a couple of decisions for you that you can undo if you know what you're doing, and it gives you a more packaged, out-of-the-box experience and gets you started easily with the framework itself. Then it also integrates with a lot of the ecosystem projects that we have out there, things like Spring Batch, Spring Integration, or the workload-specific projects that we have.

Oliver Gierke: There's even extensions on top of Spring Boot nowadays - Spring Cloud, which sort of covers the microservices/cloud native application space. Think of it as Pivotal/Spring consultancy baked into code, helping you to start with decent decisions about technology choices, configuration approaches and also the aspect of getting your app to production.

Eberhard Wolff: As you just mentioned, nowadays it's not just about the Spring Framework itself, but it's the whole ecosystem with Spring Boot and all the other projects that you mentioned. What we are going to focus on today is one part of the Spring ecosystem - we are going to talk about Spring Data. Obviously, you are the project lead, so can you say a few words about what that project actually provides?

Oliver Gierke: Of course. The project started almost ten years ago - at least some of its predecessors started back then already. It's an extension of the already existing data access support that you get from the core Spring Framework. The core framework takes care of things like raw JDBC access (improving that); it also helps you with setting up JPA, which is the Java Persistence API, the object-relational mapping standard we have in Java EE - it does that for you, or gives you guidance with that. But it turned out that on the one side we felt a need to raise the abstraction level slightly within the relational space. Also, back in the day (around 2010) we started looking into supporting NoSQL datastores, which especially back then were not really well supported in the Java space. Interestingly, the Neo4j guys (the Neo4j graph database) started to work with Rod Johnson back then on a Neo4j template similar to the JdbcTemplate, which is a helper class to ease data access with JDBC and, in that case, Neo4j.

Eberhard Wolff: Maybe we should say a few words about Rod, because he is one of the original founders of the Spring Framework itself, and wrote the original book that started the whole Spring idea.

Oliver Gierke: Exactly. Yes, and Emil and Rod basically joined forces and wrote something similar to the JDBC support for the graph database, which was the starter... Later on, we pulled other things together, like the repository support in JPA and what have you.

Eberhard Wolff: Can you say a few words about why you would provide such an abstraction layer? Obviously, there is JDBC - JDBC is the usual API that you would use to access databases in Java; why would you add an API on top of that?

Oliver Gierke: On JDBC you mean?

Eberhard Wolff: Yes, or generally speaking. Neo4j I assume also has sort of a native API, so you're adding those abstraction layers on top of them, and I wonder why you do that.

Oliver Gierke: There's two aspects. I'd like to separate the relational side of things from the NoSQL side of things. On the NoSQL side we also cover different problems, but let's start with the JPA side. We already have a standardized API for object-relational mapping. There's already APIs to map the datastore onto objects, and we could use the EntityManager (that's a concept from JPA) to access the objects.

Eberhard Wolff: And JPA is the Java Persistence API that is standardized as part of the official Java EE standard.

Oliver Gierke: Exactly. The idea we started with in that area is that usually your applications then implement some kind of data access objects, or if you're into the DDD space (domain-driven design) a bit more, then you think of a collection of entities or aggregates in terms of a repository. And what we originally built is a mechanism, a programming model for those repositories - an interface-based programming model - so that you can get rid of most of the implementation code, because we use mechanisms that might be familiar from frameworks like Rails, where you can just declare a method in your repository interface and the framework will sort out the query to be executed for that method.

Eberhard Wolff: So you just declare a method like findByName and then, automatically, the correct query--

Oliver Gierke: Exactly, that'd be the most simple case. Of course, there's means to back those methods with more complex queries, because your application usually has more complex querying requirements... But it gives us a decent layer of abstraction to actually -- I hate to use the word 'hide', but to basically unify the way you work with the data access API. That then translates nicely into the NoSQL space that you mentioned.
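To make the derivation idea concrete: a method name like findByLastnameAndFirstname can be mechanically turned into a query. The following is a toy sketch, not Spring Data's actual parser (which additionally handles operators like GreaterThan, ordering, nested property paths, and so on); the class name is invented for illustration.

```java
public class QueryDerivation {
    // Toy derivation: split a method name like "findByLastnameAndFirstname"
    // into a JPQL-style query with positional parameters.
    public static String deriveQuery(String methodName, String entity) {
        if (!methodName.startsWith("findBy")) {
            throw new IllegalArgumentException("Not a derived finder: " + methodName);
        }
        String criteria = methodName.substring("findBy".length());
        String[] properties = criteria.split("And");
        StringBuilder where = new StringBuilder();
        for (int i = 0; i < properties.length; i++) {
            if (i > 0) where.append(" and ");
            String p = properties[i];
            // lower-case the first letter to recover the property name
            where.append(Character.toLowerCase(p.charAt(0)))
                 .append(p.substring(1))
                 .append(" = ?").append(i + 1);
        }
        return "select e from " + entity + " e where " + where;
    }

    public static void main(String[] args) {
        // prints: select e from Person e where name = ?1
        System.out.println(deriveQuery("findByName", "Person"));
        // prints: select e from Person e where lastname = ?1 and firstname = ?2
        System.out.println(deriveQuery("findByLastnameAndFirstname", "Person"));
    }
}
```

In the real framework, the derived query is then bound to the method's arguments and executed against the store, so the interface declaration is all the application code you write.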

Oliver Gierke: The NoSQL space is much more diverse than the relational space, where you have different database providers but they all still speak SQL, more or less. The JPA providers usually already abstract away a lot of the differences between the databases - for better or for worse; you can have different opinions on that. The NoSQL space, just by the fact that it summarizes the entire space by something the technologies are not (NoSQL), basically confronts you with very different and diverse database technologies. So the question comes up of how you actually ease the developer's task of navigating that space, or of implementing an application on top of those different stores while being able to reuse some of the knowledge they might have gained with other stores.

Eberhard Wolff: Are you saying that on the repository level you can hide whether you're talking to a NoSQL database or a SQL database?

Oliver Gierke: I wouldn't go that far. The primary benefit that layer adds is first of all convenience, no matter if you switch stores or not, because it eases your work. The other part is that, let's say you've worked on a project that has used JPA for quite a while and you get into a new project that uses MongoDB. Those are different datastores, and you will have to cater to the different traits of those stores. But still, if you use Spring Data and MongoDB, it fundamentally works the same way; you still have the repository interfaces, you still have those query methods that you can declare...

Oliver Gierke: That doesn't actually free you from knowing how to write a proper and performant query for that particular store, but all the application-facing bits work the same, just as in the Spring Framework there are templates for JDBC or JMS (Java Message Service). These are completely different technologies, of course, but just by virtue of the fact that the templates sort of work the same, you get some kind of knowledge transfer amongst the stores.

Eberhard Wolff: Yes, because the API just has templates, for example, and they all feel the same because they are basically the same concept, even though they're for different things.

Oliver Gierke: That's actually a good point to get into - what's underneath the repository. The repository is the uppermost abstraction layer that we expose, and that can be technology-agnostic; it doesn't have to be, but we still try to strike a balance there.

Oliver Gierke: However, there are also lower-level APIs that we expose that still give you some benefit in terms of resource management, exception translation - all the things that Spring users usually expect to happen. Exception translation means that you'll get the store-specific exceptions translated into the Spring data access exception hierarchy, so that your client code doesn't have to catch those store-specific exceptions, but rather can use the generic ones.
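A minimal sketch of what exception translation means. The class names below are simplified stand-ins - Spring's real hierarchy lives in org.springframework.dao, and the driver exception here is invented for illustration:

```java
// Generic hierarchy the client code programs against.
class DataAccessException extends RuntimeException {
    DataAccessException(String message, Throwable cause) { super(message, cause); }
}

class DuplicateKeyException extends DataAccessException {
    DuplicateKeyException(String message, Throwable cause) { super(message, cause); }
}

// Pretend store-specific driver exception the client should not need to know.
class MongoDuplicateKeyError extends RuntimeException {}

public class ExceptionTranslation {
    // Translate a driver exception into the generic hierarchy, keeping the
    // original exception as the cause so no information is lost.
    static DataAccessException translate(RuntimeException storeException) {
        if (storeException instanceof MongoDuplicateKeyError) {
            return new DuplicateKeyException("duplicate key", storeException);
        }
        return new DataAccessException("data access failure", storeException);
    }

    public static void main(String[] args) {
        DataAccessException e = translate(new MongoDuplicateKeyError());
        // Client code only catches the generic types - prints: true
        System.out.println(e instanceof DuplicateKeyException);
    }
}
```

The point is that switching the store swaps the translation logic, not the catch blocks in your application.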

Eberhard Wolff: What you're saying is that if I really need to use that very specific API function of that specific data store, I'm free to do so and I'm not tied to working on the abstraction level of the repository if it doesn't suit me for a specific use case.

Oliver Gierke: Right. The templates, by design, expose very store-specific APIs. The idea of that abstraction is a more technical one - to take care of the technical aspects, like I mentioned, but still give you access to very store-specific things like, let's say, upserts in MongoDB. In the NoSQL space, those very specific database features are usually what you choose a particular store for. You choose Neo4j because you can do graph traversals, you choose MongoDB because it allows you to nest documents inside documents; you choose them because of their special traits, and if you want to use them, you can still do so behind these templates and can integrate them into the repository APIs very easily. That's the point here.
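As a toy illustration of a template exposing a store-specific operation: the class and method names below are hypothetical (this is not Spring Data's MongoTemplate API), but they show the shape of the idea - "insert if absent, update if present" as one operation.

```java
import java.util.HashMap;
import java.util.Map;

public class ToyCounterTemplate {
    private final Map<String, Integer> store = new HashMap<>();

    // Upsert semantics: create the counter if it doesn't exist,
    // otherwise add the delta to the existing value.
    public void upsertIncrement(String key, int delta) {
        store.merge(key, delta, Integer::sum);
    }

    public int find(String key) {
        return store.getOrDefault(key, 0);
    }
}
```

Application code can use repositories for the common cases and drop down to such a template when it needs the store's special trait.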

Eberhard Wolff: If I look at some of the persistence frameworks that customers in particular are creating, they often aim for a generic abstraction that abstracts away the specific datastore and tries to make it all look the same. So you're taking a different approach... Can you say a few words about why you're doing that?

Oliver Gierke: Yes. Probably the most significant aspect here is that diversity in the NoSQL space. I don't think it's a remotely reasonable idea to try to get something like Redis, which is a key/value store, to share some data access API - any API, really - with Neo4j, for example, which is a graph database.

Oliver Gierke: We started the project trying to do something similar - trying to treat each category of NoSQL store as one, unifying, let's say, MongoDB and Couchbase in the document area, or Redis and Gemfire in the key/value space - but it very quickly turned out that even within one category certain aspects, especially store-specific features, are very hard to unify... Because if you want to use the indexing capabilities of MongoDB, you have to use the mechanism that MongoDB supports for that.

Oliver Gierke: Of course, you can create a generic @Index annotation, for example, to allow defining those indexes, but if it then has to work the same way on MongoDB and Couchbase, that creates tension that turned out to be not very useful in the end. So we thought we'd rather go with the store-specific APIs at the template level, and then use that programming model approach at a more abstract level - in other words, the repository level.

Eberhard Wolff: At least a few years back there was a discussion about standards lacking in the NoSQL space. What do you think about that? Is that actually true? Should we wait for standards - and would they solve all those problems?

Oliver Gierke: That's a tricky question, because there actually is some effort to create a JSR for NoSQL. Right now there are some folks working on it and we are remotely involved with that, but it's going to be a hard thing to do, especially because standards are usually API-based; the challenges that I just described will basically pour into that work. I'm looking forward to seeing what they're going to come up with. I've already shared my concerns, thoughts and opinions with them, and there's going to be something, but I don't think that it's really necessary. There is, of course, Spring Data; there's even a Java EE-based CDI extension - which means an extension to the standard - that does similar things, at least for the JPA side of things, as we do. There are options for developers these days...

Eberhard Wolff: We should probably say that CDI is one of those standards that are part of the Java EE space, and the JSRs that you just talked about are part of the process that Java EE uses to come up with standards. The standard that you were talking about would be a standard in the Java EE space that would take care of NoSQL databases, for Java developers only.

Oliver Gierke: Right.

Eberhard Wolff: You said that the project provides data access APIs for the different data stores; there are all the different NoSQL flavors, relational databases, there's a repository concept where you would get a repository that gives you high-level access to the data store, and we talked about templates that allow you to work more easily with the proprietary APIs of the data stores. Those are the three parts of Spring Data that we've covered so far. Is there anything else that we are missing?

Oliver Gierke: Yes, one thing that's special to the NoSQL space is that we also implement, or take care of, the way you map your store-specific data structures onto Java objects. That's something that's covered in JPA already, because it's taken care of by the API, by the standard, or by the persistence providers implementing the standard. But the drivers that we have for NoSQL databases usually expose map-like structures - documents in MongoDB, or just key/value-based things in Redis... So there's still a need to map this data to Java objects - not object-relational mapping in this case, but object-to-store mapping.

Oliver Gierke: There's some generic API through which we expose our annotation-based model, usually also more tied to the specific store - you'll find Mongo-specific terminology in the mapping annotations for MongoDB, for example. There's been a lot of reflection involved - that's the way the JPA providers usually do it as well - but we've very recently investigated other approaches that make that more performant, because that's the part of our code that probably adds the most overhead to the entire interaction with the store. If you read ten thousand objects from the database, it's the most performance-critical part. There have been a couple of very decent additions recently.

Eberhard Wolff: So that's where you're doing what JPA does for relational databases, and you just wanted more performance. I thought that if I have a problem with the performance of my data storage, it's probably more about the database - and what you're saying is that object mapping might also be something that needs to be optimized.

Oliver Gierke: I don't think it's a critical part, because of course, if you read a lot of data, then the interaction with the datastore over the network is still the primary driver of the latency... It's just that in our code there are different parts - the ones that create the repository proxies, the template implementations - that sit as an indirection between your application code and the driver code that we eventually use for execution, and the object mapping part is the most performance-critical one; that's why we focus most of our performance optimizations on that stuff.

Oliver Gierke: There is no doubt an overhead attached to it, because we have to analyze your objects and use that information, but that comes with the benefit of being able to just conveniently map your data onto objects. There are also escape hatches for that: if you really find that the object mapping is introducing an unbearable amount of overhead, then you can sneak into the code and provide your own non-reflection-based, non-generic code to take care of that, and optimize on a very targeted, point-by-point basis.

Eberhard Wolff: Assuming that I have a performance problem with my persistence, what would you try to optimize first - how you access the database, or the object mapping? If I'm using your Spring Data stuff.

Oliver Gierke: You can't really answer that without having profiled the detailed scenario. There are two aspects to it: I'd definitely look into the query execution times in the datastore first, to maybe add missing indexes - that can make a huge difference. Once you've done that and you're sure that the store interaction works decently, and you still find, "Okay, this scenario is very slow compared to another scenario", then you can look into those hooks. So: probably look at the store interaction first and make sure that you've used all the means to optimize the queries and the store handling. Make sure you have the proper indexes in the relational database before you go ahead and look into other things.

Eberhard Wolff: How is the project organized? What kinds of modules do you provide?

Oliver Gierke: There are roughly two separate parts to it. One is the relational side of things - that's currently only covered by the JPA module. There's of course a core module that most of the stores share, where we have a lot of the common infrastructure implemented. Then there are different modules per NoSQL store. We have Redis and Gemfire on the key/value side of things, there are Solr and Elasticsearch modules for the search-based NoSQL stores, MongoDB and Couchbase for the document stores, and then there's Cassandra and Neo4j support, and we've just recently added a module for LDAP - basically an extraction from the general Spring LDAP project that supports LDAP interactions.

Oliver Gierke: Not all of these store modules are maintained by the core Spring Data team. For example, the Elasticsearch and the Couchbase modules are community-based modules, which means that either someone from the store provider works on them - Couchbase actually works on that module - or it's even the community at large. The Elasticsearch module is completely community-driven, so it's maintained by someone from outside, not affiliated with Pivotal or with Elasticsearch or what have you.

Oliver Gierke: Then there's another module, which we'll get to later on, that takes care of exposing repositories via HTTP resources - Spring Data REST.

Eberhard Wolff: You do your releases in a release train where all of those modules are released at one specific point in time. How does that work? How do you do that?

Oliver Gierke: We have a release schedule that we communicate within the team, of course, but also to the community projects. We execute a release for all the different stores at the same point in time, so it's not that everyone does releases individually. One reason for that is that we'd otherwise probably do releases every two days, just given the number of modules. The other thing is internal compatibility. We're in a bit of a weird situation when it comes to versioning those modules, because semantic versioning doesn't really work for us.

Eberhard Wolff: What is semantic versioning?

Oliver Gierke: Semantic versioning is basically the idea that you have a very specific structure to your version string - something like 1.5.7 - and the individual numbers express certain compatibility guarantees to the one using the API. So if I do just a bugfix release, I increase the last number; if I ship new features, I increase the middle number, and if I introduce breaking changes to the APIs, I update the first number.
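The rules just described can be sketched in a few lines - a toy model (the class and method names are invented) of what a consumer may conclude when comparing two versions:

```java
public class SemVer {
    final int major, minor, patch;

    SemVer(String version) {
        String[] parts = version.split("\\.");
        major = Integer.parseInt(parts[0]);
        minor = Integer.parseInt(parts[1]);
        patch = Integer.parseInt(parts[2]);
    }

    // What does an upgrade from 'from' to 'to' signal to the consumer?
    static String upgradeKind(String from, String to) {
        SemVer a = new SemVer(from), b = new SemVer(to);
        if (b.major != a.major) return "breaking changes";  // first number
        if (b.minor != a.minor) return "new features";      // middle number
        return "bug fixes only";                            // last number
    }

    public static void main(String[] args) {
        System.out.println(upgradeKind("1.5.7", "1.5.8")); // prints: bug fixes only
        System.out.println(upgradeKind("1.5.7", "2.0.0")); // prints: breaking changes
    }
}
```

The difficulty Oliver describes next is precisely that "breaking" depends on whose API you mean - the repository programming model or the underlying store driver.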

Oliver Gierke: The problem with that is: what is a breaking change, and to whom is it a breaking change? Because we usually try not to break user-facing APIs, and as that's usually the repository programming model, you can argue that anything that doesn't break that is not a breaking change. However, all the different NoSQL stores release their new drivers or new versions on a schedule that we have no control over at all. Let's say MongoDB does a new major release in October, and then in January Neo4j comes with a new major release, and we sit in the middle and have to mediate between the two.

Oliver Gierke: So we still use semantic versioning for the individual store modules - whenever there's a breaking change (you need to upgrade your store, for example), we raise the major version number for that store module. We then combine all of those into a release train that on purpose doesn't have any number associated with it; we use names of computer scientists, so that we have some legroom to communicate potentially breaking changes. That means that for a user who's maybe using JPA, an upgrade of the release train can involve not changing application code at all, but at the same time, if a store module was involved that shipped a breaking change, you'd probably have to do some work to upgrade.

Eberhard Wolff: And everything would be compatible if that is part of that release train.

Oliver Gierke: Right, that's the other aspect. That approach also allows us to make changes to the core module, because we control the other modules and can do refactorings there, introduce a new API and what have you. So we can ensure that within a release train all the modules work with each other - that was another very important driver for the release train approach.

Eberhard Wolff: Is there a difference between community modules and the core modules in that regard? Because so far you only said that the software is developed by other people, not people working for Pivotal, and that's really something that I probably don't care much about - but is there a difference concerning the release train or anything else?

Oliver Gierke: We include some of the community modules in the release train, so the release train is not core modules only. However, I mentioned that explicitly because those guys are not paid to work on that stuff, which usually means they develop a bit more slowly. It could be that we're not involved at all in the development there. To some people it of course matters whether it's something they can come to us for support for or not - for example, in the decision of whether to use a module. Other than that, there's no real difference.

Eberhard Wolff: So there isn't really a difference concerning the release train, or anything like that?

Oliver Gierke: No. Some blend in. There are also community modules that are not included in the release train - that's probably worth mentioning. We usually talk to people that are interested in joining the release train for quite some time, and they work externally first, so that we make sure they're not just dumping code on us, leaving us to ship a non-moving project all the time. But it's not a strict separation here.

Eberhard Wolff: What are the experiences with the release train? Is it something that you would recommend to other projects?

Oliver Gierke: Interestingly, within Pivotal and within the Spring Engineering Organization, we've been the first ones to start such an effort, and other projects like the Spring Cloud project, for example, have started doing the same, for the same reasons. I don't necessarily think that if you're writing application code it's something that you should actually strive for. Of course, it creates a lot of coupling between the modules (organizational coupling).

Oliver Gierke: We are in a special situation here, because we need to mediate the ever-changing world of NoSQL drivers that we don't control at all, and then make some guarantees to all the downstream projects like Spring Boot. We have to make compatibility guarantees to those, and that's where the release train helps - especially our downstream developers - a lot. We don't really do that for them, but it helps them quite a lot.

Oliver Gierke: A Spring Boot generation, like the current 1.5, is using the latest Spring Data Ingalls release train, and they can upgrade to a new release train and potentially adopt changes to the APIs that they need to implement. It makes it easier to communicate those changes to the downstream projects - that's basically it.

Eberhard Wolff: It seems to me that this is really a solution to a problem where you have very tight coupling to a lot of projects that you either depend on or that depend on your project. It seems to be a solution to your very specific problem - and in your case it's probably not possible, but maybe the wiser decision would be to avoid that coupling in the first place.

Oliver Gierke: Yes, definitely. We work on Java projects that will be used inside the same JVM process as other Java libraries. That means that the compatibility guarantees have to be much stronger ones; if you work with different applications and one works with Spring Framework 4 and one with Spring Framework 5, they're just completely separated. That works fine, that should be the ideal that you strive for. However, if you have to interact with other jars or other frameworks within the same JVM, then this additional compatibility effort has to be taken care of.

Eberhard Wolff: We should mention that on the JVM, for some reason or another it's not possible to have different versions of the same library in the JVM at the same time, so that's why you can only have one specific version at any point in time, otherwise you get very interesting problems.

Eberhard Wolff: I did an interview for SE Radio not too long ago with Jürgen Höller; he is the project lead of the Spring Core framework. There are some extensions to Spring Core for Reactive. Can you say a few words about Reactive and what that actually is?

Oliver Gierke: You mentioned the interview where you spoke with him about the additions that we're going to ship with Spring Framework 5, coming in Q2 of this year. The Reactive story or the idea of being able to build Reactive applications with the Spring Framework is something that the team has had a specific focus on for the last year or even longer. The idea is you completely switch to a different model of writing applications. In Java, especially web applications are very thread-driven - multiple concurrent requests are bound to a concept of a thread in Java, and then that thread basically takes care of the entire execution.

Oliver Gierke: That means that if at some point you're communicating with the database down the stack, you basically wait for the answer of that database, and that makes the thread have to wait; that consumes resources. That sort of works - there's nothing wrong with it in the first place... It's just that you're not utilizing the resources that you have on that machine very efficiently. That's where the idea of Reactive programming comes in: instead of describing what you're doing step by step and potentially blocking on every call to another Java object, you basically describe the processing of that request as some sort of pipeline.

Oliver Gierke: That means you write code without really executing it - that's probably a bit of a weird thing to wrap your head around at first, but Jürgen explains it very well in the other podcast. You get to a different kind of execution model, where you try to avoid having to wait for someone else.

Eberhard Wolff: So to sum it up: in the traditional model, an HTTP request comes in and a thread is assigned to that request; if it goes to the database, that thread is blocked, and eventually, when the datastore responds, an HTTP response is sent out based on what the datastore returned. In the Reactive model I would just take the request, work on it, and then tell the database to do something. As soon as the database comes back, I would react to that, and I wouldn't block a thread... Every time something happens - an HTTP request comes in, the datastore returns something - some thread is assigned and works for that short period of time. Instead of blocking, it just waits for the next event. So that's the idea.
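That summary can be sketched with plain CompletableFuture from the JDK (Reactor's Mono/Flux add much more, such as backpressure); the "database" call here is simulated:

```java
import java.util.concurrent.CompletableFuture;

public class NonBlockingSketch {
    // Simulated datastore call that completes later on a pool thread.
    static CompletableFuture<String> queryDatabase(String id) {
        return CompletableFuture.supplyAsync(() -> "row-for-" + id);
    }

    public static void main(String[] args) {
        // Instead of blocking until the row arrives, we register what should
        // happen *when* it arrives; the calling thread is free immediately,
        // and some pool thread runs the continuation once the result is there.
        CompletableFuture<String> response =
                queryDatabase("42").thenApply(row -> "HTTP 200: " + row);

        // Only this demo blocks at the very end, to print the result.
        System.out.println(response.join()); // prints: HTTP 200: row-for-42
    }
}
```

In a fully Reactive web stack, even that final join would disappear - the framework subscribes to the pipeline and writes the response when the event arrives.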

Eberhard Wolff: What I found interesting - and it's obvious from that description - is that there is an impact on the datastore, because in the traditional model you can wait and block until the datastore reacts, while you can't do that in the Reactive model. What's the Spring Data answer to that challenge?

Oliver Gierke: The challenge actually comes from the typical flow of an HTTP request through the application. In the Spring Framework we've taken care of making the generic framework stack - and especially the web framework - work in that paradigm. So there's actually a completely rewritten version of the web framework that allows you to use the special API constructs that you use with Reactive programming, and allows the framework to not get in the way in the first place.

Oliver Gierke: If you're using Spring MVC, there's usually a class where you write code that is invoked per HTTP request because the request matches some criteria. Then you do some kind of work on that, you invoke some service downstream - another application component that eventually speaks to a database. If you're able to get the Reactive invocation to exactly that point, to the controller, then the question comes up: what do you do on the data side?

Oliver Gierke: If you started issuing a blocking call to, let's say, a relational database at that point, you basically totally subverted the idea in the first place, because you don't need to do all the fancy Reactive stuff upfront if you then all of a sudden start blocking. So the first thing we actually need to do is get out of the way, in some regards. We were looking into how we - on an API level, on the repository programming model level - can make sure that you can use those Reactive types, and we can get invoked inside that Reactive processing pipeline that I mentioned before.

Oliver Gierke: That, of course, doesn't really help if it's, again, backed by some blocking APIs, so we looked into the NoSQL space for database drivers that already expose that Reactive programming model. It turns out there's one for MongoDB, there's one for Cassandra, and the Couchbase guys actually have a Reactive driver already, so we took a spike on those, simply because Reactive APIs are available downstream there, and then basically reworked our repository and template internals to work with those Reactive APIs, so that we don't get in the way in between. There is a second milestone release already out there with Reactive data access support for MongoDB and Cassandra.

Oliver Gierke: That's what we've done so far, and we think we're probably going to extend for other datastores there, too.

Eberhard Wolff: You said that you're providing repositories that support the Reactive model - is there any change concerning the way that I write my code?

Oliver Gierke: There is a change - it's not Spring Data specific, but a change that you have to go through whenever you interact with Reactive APIs: as you said before, you're basically reacting to the appearance of events. You're never going to hand around a person (if you have a domain abstraction of a person) through your code; instead you hand around the idea, or the notion, that an event for that person could arrive. In Reactor, the Reactive implementation used by the Spring Framework, we distinguish between something that emits a single person - a Mono of a person - and a Flux, which is basically a stream of persons.

Oliver Gierke: That's something you can probably relate to better if you think of it as coming from the database: you're not going to get a list of persons back from the datastore, but an event stream of persons. That's the fundamental mind shift to be made here, and it of course leaks into the repository abstractions, because you have to change your method signatures in that regard.
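The signature change Oliver mentions can be sketched with minimal stand-ins for Reactor's types (the `Mono` and `Flux` interfaces below are illustrative toys, not the real Reactor classes, and the repository names are made up):

```java
import java.util.List;
import java.util.function.Consumer;

// Minimal stand-ins for Reactor's types - NOT the real Mono/Flux, just
// enough shape to show the idea of "events that will arrive".
interface Mono<T> { void subscribe(Consumer<T> onNext); }
interface Flux<T> { void subscribe(Consumer<T> onNext, Runnable onComplete); }

record Person(String name) {}

// Blocking style: the call returns once all the data is there.
interface BlockingPersonRepository {
    List<Person> findAll();
}

// Reactive style: the call returns immediately; persons arrive as events.
interface ReactivePersonRepository {
    Flux<Person> findAll();
    Mono<Person> findById(String id);
}

// A toy in-memory implementation that emits its persons synchronously.
class InMemoryPeople implements ReactivePersonRepository {
    public Flux<Person> findAll() {
        return (onNext, onComplete) -> {
            onNext.accept(new Person("Ada"));
            onNext.accept(new Person("Linus"));
            onComplete.run();
        };
    }
    public Mono<Person> findById(String id) {
        return onNext -> onNext.accept(new Person("Ada"));
    }
}
```

The blocking variant hands you a finished `List<Person>`; the Reactive variant hands you a subscription point, and the persons arrive as events.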

Eberhard Wolff: You said that this is being supported from MongoDB and Cassandra, and both of them are NoSQL databases. What about relational databases?

Oliver Gierke: Yes, there are two aspects that make relational databases not such a good fit for that kind of thing. One is a very technical one: the standardized relational data access APIs that are available - JDBC, the low-level data access API, and the aforementioned JPA - were invented when that kind of programming model wasn't even thought of, so they're all effectively blocking APIs. They return lists; there's no way to make them Reactive by just adding some bits and pieces on top. That's one thing - the existing APIs do not really match that paradigm.

Oliver Gierke: The other thing is that we have to avoid the impression that you could just switch from imperative to Reactive by changing APIs. It usually involves architecting your system in a slightly different way, and also using different technologies that in turn support this Reactive space. The relational database space is usually tightly associated with the notion of transactions, which means creating a unit of work where you block resources, where you want to isolate things from others, then execute a bit of code and then basically free the resources again. That kind of blocking approach fundamentally subverts the idea of Reactive, where you want to avoid blocking resources.

Eberhard Wolff: Are you saying that transactions and Reactive are mutually exclusive?

Oliver Gierke: You can sort of "make them work" together; you can try to get them as much out of the way as possible, but they don't really fit the concept in this case. You're rather into more fine-grained interactions with the datastore; it's more of a pull model than a push model, even. Instead of saying, "Give me the five customers", you're saying, "I'm subscribing to the event stream of customers that match a certain criteria", and then you get notified by the store.

Oliver Gierke: That's a different interaction model in the first place, and the other thing is that for read-only access you can probably do something about it, but it's not the nicest fit. Even internally, we're still not quite sure what to do about transactions in general with Reactive. For the low-level data access space there are some Reactive drivers for some databases. That's the next topic - downstream the database actually has to support that way of working in the first place.

Eberhard Wolff: You're saying even though the JDBC standard doesn't support it, there are relational databases that would support non-blocking I/O?

Oliver Gierke: Exactly. For example, there's a Reactive Postgres driver that we could theoretically start to build something on top of, but then we're basically back to database-specific integrations of relational databases, and the entire JPA space - the object-relational mapping area, for example - is not going to be an option for working with that anymore.

Eberhard Wolff: We've spoken quite a lot about the different datastores and the APIs that you provide, the repositories and so on. There is a different project that you mentioned, Spring Data REST, and that seems to be quite different. Can you describe Spring Data REST in a few sentences, and what it does?

Oliver Gierke: Right. Of course, it's a bit of the odd one out, because it's not a module that connects to a datastore. The story behind it is that we've seen that people started to build RESTful APIs with the Spring Framework - that's a very common pattern these days anyway... But they were using Spring Data and built REST APIs in a very canonical way, with repeating patterns, and we started to explore how much of those patterns could be generically implemented in a dedicated Spring Data module.

Oliver Gierke: If you're building your applications, you have your aggregate roots - basically the entities that the repositories manage - and you have the repositories, so the question was, "Can we do something with all this information and generically implement some HTTP resources on top of it?" One of the things that played into that is that there's a bit of an overlap - or at least a connection - with some patterns that you find in the REST world, which are also described in a couple of very good books on the topic... Things like the collection resource/item resource pattern: you have an HTTP resource that exposes, let's say, a list of customers, and then you have individual resources for the individual customers. That sort of resembles the notion of a repository being effectively a collection of entities, with the individual entities accessible through that repository. So there is some match here, and that was the starting point for that particular module.

Eberhard Wolff: As you said, exporting a repository as a REST resource seems like a great match, because at the end of the day both are about CRUD operations... So what is there even to add? Why would I even write my own REST handler anymore, when I can just have it generated for me by Spring Data REST, right?

Oliver Gierke: Yes, that's actually a good one, because that's a very common misconception about the project. If you think about Spring Data REST as I just described it, you probably get the impression that we take the datastore, we put the repository in between and then we turn that into HTTP resources - so what we basically do is expose the database to the web. There's a couple of things missing, though.

Oliver Gierke: The first and foremost thing is that inside your application you have a lot more knowledge about the structure of your domain, the structure of your entities, expressed in code than is available in the database. Things like aggregate boundaries - that's probably something we could discuss even further, but the natural boundaries that your domain model has incorporated usually map nicely onto the structure and boundaries of the resources, so there are a lot of things that you otherwise have to teach your object-to-JSON mapper manually that we can infer from the domain model you have.

Oliver Gierke: There's some raising of abstractions and translating those abstractions into certain traits of your API that you otherwise have to do yourself.

Eberhard Wolff: You're saying that the REST resource is more like an aggregate, and that's different from what I would have in the database; it's not just one table. Can you explain that a bit more?

Oliver Gierke: Yes, right. Let's say you have an order that has line items, and you have the customer that placed the order. On a relational database level there's no difference between the relationship between the order and the line item, because it's just a foreign key, and between the order and the customer, because it's just a foreign key... If you're staying with a relational database, at least.

Oliver Gierke: So if you went ahead and naively turned that into HTTP resources, you probably have a resource for the order, for the line items, yadda-yadda-yadda, but on the domain level you usually add an additional layer of abstraction, because the order usually includes the line items, and there's a composition relationship between the order and the line items, so you actually form an aggregate around those two, and the link to the customer actually becomes a relation (not in the database sense, but just generically speaking) to another aggregate.

Oliver Gierke: By using Spring Data and the repositories, that's something that's implicitly expressed in the code, and we can use it to actually shape the aggregates around that. The fundamental idea here is that for an aggregate there are certain consistency rules that are most easily expressed if you can use the same shape when you turn that aggregate into a resource. There's kind of a nice overlap between the different concepts there, which is why we make that the default in Spring Data REST.

Oliver Gierke: We will inspect your aggregate boundaries and use those to basically create a default representation of the aggregate for you.
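The aggregate shape described above - line items embedded in the order, the customer referenced rather than embedded - might be modeled like this (a sketch with hypothetical names; the resource-building is hand-rolled here, not what Spring Data REST actually generates):

```java
import java.util.List;
import java.util.Map;

// The customer is its own aggregate; the order references it, but does
// not contain it.
record Customer(String id, String name) {}

record LineItem(String product, int quantity) {}

// Order is the aggregate root: it owns its line items (composition) and
// holds only a reference to the customer aggregate.
record Order(String id, List<LineItem> lineItems, String customerId) {}

class OrderResources {
    // Sketch of the resource shape: line items embedded in the
    // representation, the customer exposed as a link instead of inline.
    static Map<String, Object> toResource(Order order) {
        return Map.of(
            "lineItems", order.lineItems(),
            "_links", Map.of("customer", "/customers/" + order.customerId()));
    }
}
```

Both foreign keys look identical in the database; only the domain model says that one is a composition and the other a reference to a different aggregate.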

Eberhard Wolff: And that's it, or is there anything else in this mapping between the REST resources and what the database provides?

Oliver Gierke: There's an interesting other aspect that's basically a consequence of us implementing those boundaries in the first place. If we take the order aggregate and turn that into a resource, and then basically decide to include the line items but not to include the customer directly - not embed it into the representation - the question comes up, "How do you actually refer to the customer?" There's an interesting concept in REST there, which is the hypermedia aspect, which in its most simple form is just like linking to different resources. We make heavy use of that in the representation creation for that, so you would get a link to the customer inside your representation of the order, so that a hypermedia-aware client could go ahead and just follow the link to access the customer.

Oliver Gierke: There are a few more smart details to that, but on a high level that's all stuff we infer from your domain model, which you wouldn't get if you just naively exposed your database.

Eberhard Wolff: What would your expectation be? Let's assume that I'm creating some kind of REST server - would you assume that I write all the REST handling using Spring Data REST, or is there something left where I would really do low-level HTTP REST handling all by myself, not using that library?

Oliver Gierke: That's an interesting question... I would have phrased it exactly the other way around. The thing here is that you can of course get away with Spring Data REST only - you have your entities, you have your repositories, you turn on Spring Data REST, you get a slight raise in abstraction because we inspect the aggregates... You could be done with that, and it actually is sort of a turnkey solution. The thing is that you're still at a very low level in terms of added benefit; in terms of business interactions with the service, you still need the client to know a lot about the internals, the structures, the semantics of the individual fields and what have you.

Oliver Gierke: You usually add more benefit through a REST API if you implement more high-level things in that API. An example would be indicating when an order is ready to be paid - say, when a client shows that Go To Checkout button in a mobile app. That information - "When can I actually execute that call?" - is not really communicated if you stick to that level of abstraction.

Oliver Gierke: The idea of the project is that we give you those low-level things that will be part of your API basically for free, or without a lot of boilerplate, to give you more time to think about those high-level interactions with the service, and then seamlessly integrate them with the parts that we provide out of the box. That way you as a team can really focus on the business process aspects of your API and add them to the stuff you get out of the box. You're basically freed from writing the low-level, repetitive collection resource/item resource things, and instead you can selectively implement the high-level parts of your API through Spring MVC, Spring Data REST extensions and so on.
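The division of labor Oliver describes - generated CRUD endpoints plus hand-written business endpoints - can be sketched with a toy router (nothing here is Spring API; the names and handlers are invented to illustrate the split):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// A toy router illustrating the split between endpoints you get for free
// and business endpoints you still write yourself.
class ToyRouter {
    final Map<String, Function<String, String>> routes = new HashMap<>();

    // What a framework could derive from a repository: collection and
    // item resources, here faked with fixed handlers.
    void registerCrudRoutes(String collection) {
        routes.put("GET /" + collection, id -> "all " + collection);
        routes.put("GET /" + collection + "/{id}", id -> collection + " " + id);
    }

    // The part you still write yourself: a business-process endpoint.
    void register(String route, Function<String, String> handler) {
        routes.put(route, handler);
    }

    String handle(String route, String arg) {
        return routes.get(route).apply(arg);
    }
}
```

The CRUD routes come from one generic call per repository, while the payment endpoint encodes an actual business interaction and has to be written by hand.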

Eberhard Wolff: What you're saying, in a way, is that a Spring Data REST resource such as an order might not just be a piece of data, but might actually represent a business process and give some idea about the state of that process - which might not even be in the database, or might not be obvious if you look at the data from the database.

Oliver Gierke: Right. It's just like the question, "In which situation am I allowed to cancel an order, and in which situation do I have to pay for the order?" That's sort of embedded in the data, but if you make the client inspect the data to find out about that fact, you've basically replicated the business rule into the client. By using hypermedia, you can keep that logic off the client, so that the client just looks for a particular link in the representation to find out whether it can or cannot do some things.

Eberhard Wolff: So you could have the business process, and the links would actually say which transitions are possible from a specific state.

Oliver Gierke: Exactly. Raising the abstraction level quite a bit more provides more benefit to the clients. It requires the clients to implement a bit more protocol complexity, as I call it - they have to be able to find links and interact with them - but it allows you to keep the clients more free of business complexity, because they don't have to know that if a certain field has a particular value, then they're allowed to do something. If I have to bake that into the client, I won't be able to change it freely on the server.

Eberhard Wolff: The client doesn't need to interpret the data.

Oliver Gierke: Right, it can get away with less knowledge about the business, which is a good thing because you can then change the business more easily.
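The contrast between interpreting data and following links might look like this on the client side (hypothetical field and link-relation names):

```java
import java.util.Map;
import java.util.Set;

class OrderClient {

    // Data-driven client: it has to know which states allow cancellation,
    // so the business rule is replicated into the client.
    static boolean canCancelByData(Map<String, String> orderData) {
        String status = orderData.get("status");
        return status != null && Set.of("OPEN", "PAYMENT_EXPECTED").contains(status);
    }

    // Hypermedia-driven client: it only knows the link relation. Whether
    // cancellation is possible is decided on the server, which includes
    // or omits the link in the representation.
    static boolean canCancelByLink(Map<String, String> links) {
        return links.containsKey("cancel");
    }
}
```

With the link-based check, the server can change the rule for when orders are cancellable without any client being redeployed.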

Eberhard Wolff: Excellent. So you are obviously running an open source project, and I assume that all open source projects are looking for some kind of help. How can people help with the development of these open source projects that you're working on?

Oliver Gierke: As I mentioned, most of our projects live on GitHub, but not all our bug trackers live on GitHub. The Spring Data projects in particular usually use JIRA, for a variety of reasons. That essentially means you can just register there, have an account, and file bugs. Whenever you use some of that stuff and you find something that's inconvenient, that doesn't work for you, please file a bug. That could be a typo fix in the documentation, or something that's really broken, if you have a test case - please report that stuff.

Oliver Gierke: It's one of the most direct ways you can get in touch with us. If you just find something and don't report a ticket for that, we don't have any other way to find out really, unless someone else does so. I speak to a lot of people that find something but never take the five minutes to create a ticket, but if you do that, that's very helpful.

Eberhard Wolff: I guess some people are just afraid that they didn't really understand something and what they consider a bug is really just a problem in their understanding. What do you suggest for those people?

Oliver Gierke: We use Stack Overflow for general questions, so I propose to just start with that... But what could possibly go wrong? In the worst case, you just file the ticket, we find out you're "using it wrong" and that's less work for us, rather than having to fix something. So don't hesitate to even file the bug if you just think you found something. If you're not offensive, then you won't get an offensive response for that. We really want people to interact with us, to file tickets and help moving things forward.

Eberhard Wolff: So what you're saying is it's fine to be stupid, and it's very important to speak up and to make people aware of the problems.

Oliver Gierke: Yes.

Eberhard Wolff: What I found interesting -- I always think that open source projects are really looking for developers, and the first thing that you said when it was about helping you is about bug reports and getting the stories from the users... But what about developers? Are you looking for developers, or is that not such a big concern?

Oliver Gierke: Especially with the move to GitHub - I mentioned that for the Spring Framework before, but it's certainly true for Spring Data as well - we've started to receive a lot more direct contributions, usually associated with a bug that someone found, because it's just so very easy to do on GitHub. We're always happy if someone contributes something, of course. In some cases it might be worth just filing the bug report first, because as you said, you might overlook something, or you might not be aware of the reasons something is implemented in a particular way... Before you start spending time on hacking something together, consider that it might in the end be a waste of time. But generally speaking, whatever feels right for you - report the bug first, or if you think there's a good contribution you can make, go ahead and just do so. We're definitely not restricted to contributors inside Pivotal.

Eberhard Wolff: Cool, thanks. Anything I forgot to ask you or anything you want to mention?

Oliver Gierke: I don't think so, it's been pretty broad coverage. It's good.

Eberhard Wolff: Okay, thanks a lot for taking the time and answering our questions, and have fun with Spring Data, the project.

Oliver Gierke: Thank you for having me, Eberhard.

Eberhard Wolff: Thanks.