Hello, everybody. Welcome back to the Stack Overflow podcast, a place to talk all things software and technology. I am Ben Popper, the head of content here at Stack Overflow, joined as I often am by my colleague and collaborator, Ryan Donovan, editor of our blog, veteran technical writer. Hey, Ben. Ryan, tell me a little bit about who we're going to be chatting with today. Today, we're going to be chatting with Jonathan Schneider, co-founder of Moderne,
who also worked on the OpenRewrite project at Netflix. And Moderne is an extension of that. Okay. So we're going to be talking about automatic refactoring, particularly for Java, I believe. Very cool. Jonathan, welcome to the Stack Overflow podcast. Thank you, Ben and Ryan, pleasure to be here. So we always ask folks first, give us a quick flyover. What was the first computer you ever used? What got you into writing code? And how did your journey within the world of software and technology lead you to the role you're at today?
I grew up in a very rural area in Missouri. My first computer came fairly late to me. It was a Pentium II, I think. It was my junior year of high school. So I think I was a bit late to the game,
just based on where I lived. But I took a computer science class with a friend in high school, an AP Computer Science class. And the teacher was ideal. He dropped a C++ textbook on our desks and said, let me know if you have any questions, and went and sat down at his desk. And that's basically how he taught. So I mean, it was just going through the book, writing one thing after another and
if you had any questions, going up and asking them. I always felt like software development was like free engineering. It's supply free, at least; you know, you can just keep building and building and building. And so that really attracted me. So how did you get involved in the OpenRewrite project? I had joined Netflix back in 2014,
I think, roughly something like that. So I was working for engineering tools, a central team. We were, like it or not, responsible in some ways for moving the company forward from Java 6 to 7 at the time, 7 to 8, trying to retire old libraries, security vulnerability repair. And at the time, Netflix had a special freedom and responsibility culture. It was part of their culture, and it meant that
a central team could impose no constraints on what product engineers did. So we spent a lot of time doing reporting, trying to provide context to product teams about why they weren't where we wanted them to be. And it resulted in pretty much no action on the part of development teams. So we were going around asking, what could I do? What could I do to make you move forward? And they'd say, well, do it for me. Otherwise, I've got something else to do.
It was kind of an almost sarcastic way of putting it, but I think it's representative of how a lot of product teams are now. They're just kind of groaning under the pressure of a lot of feature work required of them while at the same time trying to deal with modernization or security vulnerability repair. Yeah, you talked about moving between the various Java releases. I remember any company that I worked at that had to migrate Java was
three or four full releases behind because it was such a bear. Yeah. Why is that such an issue for teams? It's just a ton of manual work ultimately. And there can be just a lot of context necessary as well to accomplish it. With OpenRewrite, we've written what we call recipes, which perform individual units of transformation. It could be something as small as changing a method to something as large as like a whole library or a language level version update.
And if I take one of those, for example, the Spring Boot 2 to 3 recipe that's in open source right now, it has more than 2,300 steps in that migration. So we're effectively asking developers to know all of those things. And oh, by the way, if you missed one of those things, you know, you broke it. What's wrong with you?
Right. It's hard. Can you step back for a second and just sort of say, what is the genesis of OpenRewrite, you know, as a project with a name and with, you know, a working group behind it? So the problem was clear enough to us. We were trying to get from Java 6 to 7, et cetera. And we started looking around at what refactoring technologies were available at the time. So Uber had worked on something called Piranha.
Google had worked on something called Error Prone, which had a refactoring component called Refaster. And most of the solutions out there had some, to us, unacceptable constraint. In the case of, you know, Uber's Piranha, it saw the syntax tree, but it didn't know everything the compiler knew about a particular method call. And that wound up being
important when we were doing, like, logging library migrations. A lot of logging libraries look very similar in syntax. They're not the same. You want to, like, narrowly identify a particular library when you're making a change. Then there were some where, like, Google builds a technology like this. Google is a very
controlled environment. You know, they have google-java-format, monorepo, everything kind of looks the same. And so it's sufficient to make a tree level change and then print it back out to text and it's always going to look the same. At Netflix, because of that freedom and responsibility culture, it was absolute chaos. There were a lot of different styles of writing code. The tooling stack wasn't consistent.
And so we had some sort of base principles that were important to us. Like the change must look idiomatically consistent in the context of the repository it's being inserted into, and there can be no false positives. Anything short of those two things and developers will just reject the change. So were you the creator of OpenRewrite? Yes. Yeah. I started that project back in, you know,
2015, something like that. At that point, we had identified the problem and started working on that solution. So recognizing that we needed everything the compiler knew about, we built it really by integrating directly with the compiler. The compiler produces an abstract syntax tree and then type attributes it. And so we just kind of used that data and started layering formatting on top of it and other things in order to produce accuracy.
Stepping back for a second, Ryan and I have been on many podcasts and been around for many developer surveys. What is your perspective on the hate that Java receives? Does it deserve that hate? And is making it more pleasant to work with sort of integral to what you're doing here,
finding this pain point, being able to solve it, doing it in a community-driven way? Because Java's obviously been around forever and continues to be in an enormous number of applications at enormous companies, so it's not going anywhere. What are some of the common big gripes that you hear, to summarize?
You know, builder, builder, factory, factory, builder, factory, like, Ah, yes. And then I think some people complain about being forced into object oriented as hard as it is. And there's some folks who see a 20 year old language and are like, it's old.
Yeah. I suppose JavaScript is too at this point. I think it very much depends on the framework you're using. So the long naming convention was typical of older versions of Spring and so forth. And I think there's always new frameworks coming out that challenge those assumptions.
I think the language has evolved with the times somewhat conservatively. It's never the first language to introduce a new feature. It tends to kind of sit and wait and, you know, see a pattern sort of fully mature before adopting it, which I think has led to a long history of backwards-compatible changes to syntax and so forth. But in terms of market, I think, first of all, at the time, you know, most of Netflix's stack was built on Java, at least the back end, obviously.
And I think in enterprise, mid and large markets, C# and Java still dominate the back end. So when we went to kind of build something for maximal use, we started with something that had good market share and then kind of moved from there to other languages. Yeah. So I want to ask about the sort of transition between working on this open source project at a big company and then using that as the basis for a separate company.
Yeah, so I had actually left Netflix somewhere around 2017 and joined the Spring team. So I was part of the problem in that respect, I guess, making the breaking changes. I started a project called Micrometer for application metrics instrumentation and then was sort of working up the value stack in the Cloud Foundry portfolio. We were trying to add a product called Spinnaker, which was
an open source, multi-cloud continuous delivery solution from Netflix and Google and others, trying to add that into the Pivotal suite, to its PaaS, I guess. The Moderne co-founder, Olga Kundzich, and I were working as kind of the technical and product leads of this at large enterprises. JPMorgan we were working with at the time, Home Depot, Fidelity, these kinds of organizations, trying to try out the idea of Spinnaker with them.
And what we kept hearing as we were teaching advanced continuous delivery concepts was, well, talk to me in a year when I'm done moving Spring 1 to 2, or this to that. There was always some migration or some vulnerability repair that they had to deal with, and it basically sucked up all their time. So having seen that enough, we thought, maybe we should go take that technology from Netflix and go do this instead, because it's the same problem we heard over and over again.
Yeah, you know, part of what I was excited to chat about in this is it originated in Netflix. And as you mentioned, a lot of large enterprises do use Java. So last November, Amazon announced Amazon Q Code Transformation, which integrates OpenRewrite. And Andy Jassy, who's now the CEO of Amazon, posted on LinkedIn that this enables them to
perform Java 17 upgrades in record time. I could list all the numbers here in terms of the estimated savings, in the hundreds of millions. How much of this is promotional? How much of this is real? And how much of this can be applied at other large institutions? I think the change that was produced
by Amazon Q Code Transformation was all OpenRewrite. So it's a rule based system underlying that. I think Amazon was able to do that so quickly because that open source Java 8 to 17 recipe has been so battle tested at so many different organizations over the years. That's another one of those recipes. It's got hundreds of steps. And so it's accrued a lot of edge cases over the years, and that has made it super reliable. And so part of a recipe definition is how much time savings would
be realized by a single application of that recipe in the code. The numbers we see are usually something in the several hundred thousand hours. So we've been talking a lot about tech debt, and this is obviously a huge source of tech debt, these upgrades. Absolutely. Do you think, you know, best case scenario, if you're able to get recipes for every upgrade, every security fix,
does tech debt still exist? Well, tech debt itself is an interesting, interesting phrase. We used to think of tech debt as something we imposed upon ourselves. We took some sort of shortcut architecturally or otherwise. And so we're going to have to pay that down later. There's this other activity, which I think is, you know, we are all relying on third party and open source components.
that evolve at their own pace. And if you don't keep up, then something breaks. I think from the perspective of the business, they don't see those as distinct oftentimes because the request is the same from the engineer, the engineer saying, hey, I need to take time doing non-feature development in order to do X. And I don't really care whose fault it is. All I hear is I'm not doing feature development.
So I do think the vast majority of technical debt, if you lump those together, is coming from this third party and open source, like, keep up or your app breaks ultimately. That stuff I feel like is solvable if we ensure that
framework authors, as they make breaking changes, provide recipes that help their downstream consumers adopt them more quickly. Probably the most mature engineering organization I've seen in terms of dealing with this was actually a small insurance company. They called it maintenance as opposed to technical debt. So there were technical debt tags and there were maintenance flags or tags on their issues.
And whenever something was tagged technical debt, that was something I imposed upon myself. Maintenance was something that was being imposed upon me from the outside. It's more intuitive, I think, to the business too. Obviously, it's something you have to do with your car, your home. It's an asset you own and it has to be maintained. Yeah. Technical debt sounds like you overextended yourself, like you got into trouble. You know, why are we in debt here? Whereas maintenance sounds like, yeah, everybody does maintenance. This is an understandable cost. That's right.
It's interesting in the dev survey, you know, there's sort of these two contrasting data points. The thing that developers find most frustrating is technical debt and the thing that makes developers happiest is clean code and environment. And it's sort of like, man, if only we could find a way to square these two, but you know, that idea of maintenance and the broken windows theory and just like getting everybody to kind of bake a little bit of this into every single day does seem to be more effective than like turning technical debt into a once a quarter sprint or whatever. Right.
The other element of that too is, if you're doing it by hand, if I were doing say the Java 8 to 17 migration by hand, I would do the bare minimum of what was necessary in order to make the app run on the next runtime. But I wouldn't necessarily go back to every, you know, if statement with an instanceof that then casts, you know, behind it, and use the new idiom, which is like, I don't need to do the subsequent cast anymore. Or there's a better string format method now, or there's a, like...
But when you have a recipe to do not only the bare necessities, but also adopt the more common or more modern idioms, suddenly, and maybe it goes back to the point of hating Java, it looks better. It looks more modern. It looks less archaic.
ultimately. And you don't have to sort of choose between the optional things and just the required things. How much of hating Java is just the aesthetics, you know? It looks bad. The old stuff looks really bad, right? Yeah. So it looks better now. Yeah. I mean, I took some of my early computer science classes in, like, Java 1. Yeah. And back then it had a lot of interesting features, but obviously they've kept adding interesting features. Yeah.
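To make the idiom change mentioned above concrete, here is a minimal Java sketch of the before and after. The class and method names are hypothetical, and it assumes the "better string format method" refers to `String.formatted` (Java 15+); pattern matching for `instanceof` is available from Java 16.

```java
import java.util.List;

class ModernIdioms {
    // Older style: instanceof check followed by an explicit cast.
    static String describeOld(Object value) {
        if (value instanceof String) {
            String s = (String) value;
            return String.format("text of length %d", s.length());
        }
        return "not text";
    }

    // Java 16+ style: the pattern binds the variable, so no cast is needed.
    // String.formatted (Java 15+) replaces the static String.format call.
    static String describeNew(Object value) {
        if (value instanceof String s) {
            return "text of length %d".formatted(s.length());
        }
        return "not text";
    }

    public static void main(String[] args) {
        // Both versions should agree for every input.
        for (Object o : List.of("hello", 42)) {
            System.out.println(describeOld(o) + " | " + describeNew(o));
        }
    }
}
```

Both methods behave the same; the second is simply what a migration recipe would aim to leave behind instead of the bare minimum needed to run on the new runtime.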
And you say, you know, sometimes it'll convert it to a more idiomatic, more updated, like, StringBuilder, whatever. Yeah. Do people ever get mad about that? Absolutely. So if I just tried to do Java 8 to 17 and said, the way I'm going to do this is I'm going to mass PR the whole organization, and everybody should accept this PR...
I think PRs tend to be viewed by product teams as like unwelcome advice coming from an in-law or something like that. It could be good advice, but you're just going to look for a reason to reject it. You're just going to find a way. And the things people will often pick on will be kind of stupid things like, oh, I don't like that string format thing. I heard one product team really pushing back hard on the fact that they could no longer use the constructor for a BigInteger. You had to use BigInteger.valueOf.
It's like, guys, this is deprecated. It's going away. You don't have a choice. This is going to be. But because the recipe is composite, they can just pull out that one recipe, run the rest of it, get everything merged. We'll come back to this point later on. And so moving product team by product team and asking them to pull this change as opposed to being pushed to them through mass PRs, winds up being the thing that gets it over the line at the end. It's like a social engineering problem more than a technical one ultimately.
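One way to picture the composite structure described here is the following minimal Java sketch. It assumes each sub-recipe is a named text transformation; the recipe names, the `run` helper, and the string replacements are all hypothetical and are not OpenRewrite's API. Real recipes operate on type-attributed trees rather than raw text, so this only illustrates the "pull out one step, run the rest" idea.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.function.UnaryOperator;

class CompositeRecipeSketch {
    // Hypothetical sub-recipes, keyed by name, applied in insertion order.
    static Map<String, UnaryOperator<String>> migrationSteps() {
        Map<String, UnaryOperator<String>> steps = new LinkedHashMap<>();
        steps.put("use-value-of",
                src -> src.replace("new Integer(", "Integer.valueOf("));
        steps.put("use-formatted",
                src -> src.replace("String.format(FMT, x)", "FMT.formatted(x)"));
        return steps;
    }

    // Runs every step except the ones a team has chosen to exclude.
    static String run(Map<String, UnaryOperator<String>> steps,
                      Set<String> excluded, String source) {
        String result = source;
        for (Map.Entry<String, UnaryOperator<String>> step : steps.entrySet()) {
            if (!excluded.contains(step.getKey())) {
                result = step.getValue().apply(result);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String source = "Integer n = new Integer(5); String s = String.format(FMT, x);";
        // Full migration.
        System.out.println(run(migrationSteps(), Set.of(), source));
        // Same migration with the contested step pulled out.
        System.out.println(run(migrationSteps(), Set.of("use-formatted"), source));
    }
}
```

The design point is that excluding one step does not block the others, so a team can merge the uncontested changes now and revisit the contested one later.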
You mentioned moving from Java 8 to 17. Have you done any studies on how much code has changed between a massive jump like that? Yeah, I see it all the time. So recipes, they also emit structured data in addition to making changes, and so we can see
every file that was changed. There was one banking application we moved, it touched 19,000 files. And that's just one, right? So I mean, there was, I think another 60,000 repositories after that. I can't have a flashcard-like experience where I'm like, what do you think about this change? What do you think about this change? What do you think about this, like, times 19,000? It just doesn't make any sense. And so that's where it becomes essential ultimately that I think a provably accurate system is the one making the change at the end of the day, however that original recipe was
developed. So is this just Java based right now? We started in the JVM, obviously Java, and then added Groovy and Kotlin, then added more infrastructure-as-code type things. So YAML, JSON, properties, XML, Terraform. Terraform actually wound up being a very difficult grammar as it turns out. Went back and added COBOL, JCL, copybooks, about 31 related z/OS technologies.
That wound up being a lot harder than I expected as well. But then as we kind of came back to the more modern languages, we wanted to add C#, Python, JavaScript, Ruby, et cetera. What we found is that these, I guess what you could generally call C family languages, shared most of the same LST structure, which is both intuitive and counterintuitive, I guess. Like they all have if statements, they all have for loops, they all have method declarations. The structure is very similar for the most part.
They look very different in text in terms of how those constructs are printed out, but structurally, they're very similar. And so when we wrote the first C family extension, for C#,
the C# LST, the Lossless Semantic Tree model, actually extends from the base J one. So the significance of that is that many of the recipes we originally wrote for Java just automatically work on C#. So Boolean simplification, change method name, a lot of these like core building blocks actually wind up working on both sides.
We wanted to add a recipe authorship experience in each language as we moved along as well. And so we wrote this technology that allows us to transfer parts of the LST over the wire so that a C# developer can write a recipe in C#, a Java developer writes a recipe in Java, et cetera. At this point, we finished Python a month ago, C# a couple months ago. We're just about done with JavaScript and TypeScript, and
each of those extends from J, so that there's this cross-language reuse. I haven't seen that much before in my history in software, so it's kind of an exciting, unexpected benefit. Do you ever run into sort of conflicting recipes when somebody's, you know, upgrading Java and Spring? Conflicting in the sense that they would make incompatible changes somehow? Yeah, they're both trying to change the same things or change them in different ways.
I mean, the great thing about a recipe is, like, if something beat me to a change, then by the time I see it, I just have nothing to do. So I just don't do anything. So we'll actually chain these recipes often. The Spring Boot 3.3 recipe chains the 3.2, which chains the 3.1, 3.0, 2.7, 2.6, et cetera. So it doesn't really matter whether an application portfolio is starting on 2.6 or they're starting on 3.1 because
everything below the version I'm on just doesn't do anything. So you can kind of take a heterogeneous set and move it forward together. What are your thoughts on capabilities from Gen AI, which now writes, reviews, debugs, and documents code, to play a role in making some of what you're doing easier, to help alleviate some of these pain points, to be a complement basically to what you build? I think the first and most important point is, and I'll state my belief on this and time may prove me wrong, but
I think that the code as text, or even as an AST, and by AST I mean lacking the type attribution that the compiler knows about, is an insufficient set of data for a model to do large-scale impact analyses or transformation. And the easy example of this, I think, is logging libraries. We go back to this. There's like five Java logging libraries. They each look very similar, log.info, log.warn, et cetera. How do I know which logging library I'm looking at?
I may have inherited that log field from a base class that comes from a binary dependency. So nowhere in the text of the code is there any reference to what library I'm using right now. So I think for any sort of large scale change, the model needs to know things that the compiler knows. So it's a data problem, I think, more than anything. I think models will always be very data hungry. And to me, I feel good about our long-term prospects because I feel like the LST is the data that it needs to make these sort of decisions.
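The logging ambiguity can be shown in a few lines of Java. In this sketch, `LibALogger` and `LibBLogger` are hypothetical stand-ins for two logging libraries with the same surface, and reflection is used only as a rough stand-in for the type attribution a compiler would provide. The two service classes have textually identical call sites.

```java
class LoggingAmbiguity {
    // Two hypothetical logging APIs that look the same at the call site.
    static class LibALogger { String info(String m) { return "A:" + m; } }
    static class LibBLogger { String info(String m) { return "B:" + m; } }

    // Base classes that would normally live in a binary dependency.
    static class BaseA { protected final LibALogger log = new LibALogger(); }
    static class BaseB { protected final LibBLogger log = new LibBLogger(); }

    // Identical source text: nothing here names the logging library.
    static class ServiceA extends BaseA { String start() { return log.info("started"); } }
    static class ServiceB extends BaseB { String start() { return log.info("started"); } }

    // Stand-in for type attribution: resolve the declared type of the
    // inherited 'log' field, which the call-site text alone cannot reveal.
    static String loggerTypeOf(Class<?> service) {
        try {
            return service.getSuperclass().getDeclaredField("log").getType().getSimpleName();
        } catch (NoSuchFieldException e) {
            throw new IllegalStateException("no inherited log field", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(loggerTypeOf(ServiceA.class) + " -> " + new ServiceA().start());
        System.out.println(loggerTypeOf(ServiceB.class) + " -> " + new ServiceB().start());
    }
}
```

A tool that only sees the text of `ServiceA` and `ServiceB` cannot tell them apart; one that knows the resolved field type can target exactly one library.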
There's been another evolution, though, or another change recently in models that I find very interesting and honestly very easy for us to integrate, which is that I feel like there was this RAG paradigm, retrieval augmented generation: get some data, and it's effectively prompt stuffing, stuff it into the prompt. This tool-calling, function-calling paradigm, which has arisen in the large foundation models in the last several months,
inverts that responsibility. So as I'm asking a question, I say to the model, hey, here's a list of tools and all their descriptions. And when you want to ask a question about binary dependencies, ask me; when you want to ask a question about
deprecated methods, ask me, and so on and so forth. The model is the one that takes a human provided prompt and decides when to call various tools in order to supply data. And interestingly, we found that recipes that produce those data tables are sort of trivial tools to bolt onto a model for it to answer questions about the code pretty deeply.
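The inversion described here can be sketched in Java as a registry of described tools that the model chooses from. Everything in this sketch is an assumption for illustration: the tool names, the sample data rows, and especially `chooseTool`, which is a keyword-matching stand-in for the decision a real model would make from the tool descriptions.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.function.Supplier;

class ToolCallingSketch {
    // A tool is a description the model can read plus a data-table producer.
    record Tool(String description, Supplier<List<String>> dataTable) {}

    // Hypothetical tools backed by hypothetical recipe data tables.
    static Map<String, Tool> tools() {
        Map<String, Tool> tools = new LinkedHashMap<>();
        tools.put("binary-dependencies", new Tool(
                "Lists the binary dependencies used by the code base",
                () -> List.of("com.example:lib-a:1.0")));
        tools.put("deprecated-methods", new Tool(
                "Lists call sites of deprecated methods",
                () -> List.of("Foo.java:12 new Integer(int)")));
        return tools;
    }

    // Stand-in for the model: picks a tool name from the prompt, or null.
    static String chooseTool(String prompt) {
        String p = prompt.toLowerCase(Locale.ROOT);
        if (p.contains("deprecated")) return "deprecated-methods";
        if (p.contains("dependenc")) return "binary-dependencies";
        return null;
    }

    // The caller supplies data only when the model asks for it.
    static List<String> answer(String prompt) {
        String name = chooseTool(prompt);
        return name == null ? List.of() : tools().get(name).dataTable().get();
    }

    public static void main(String[] args) {
        System.out.println(answer("Which deprecated methods do we call?"));
        System.out.println(answer("What dependencies do we ship?"));
    }
}
```

In contrast to prompt stuffing, no data is fetched up front; each table is produced only when the chooser selects its tool.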
I think what will continue to be true is that data will be king, and however you provide tools that help the model infer from that data will be the interesting solutions. I expect that the models themselves are interchangeable going forward and even today. But what do you see? I'm curious. Does that align with what you're thinking or seeing? I would agree that for the kind of
large-scale, organization-wide transformation you're talking about, people are not trusting these models to do that. Right. And that the more context they have on your code base and your documentation and your,
whatever you want to call it, knowledge base of your company, the easier it is for these models to be a benefit as opposed to writing code that seems good at first, but takes just as much time to unwrite or debug later. And that their central application over the next
12 to 24 months will be removing the toil work that developers have to do in addition to writing code that is pretty simple, whether that's writing unit tests or doing the documentation or doing things like that. But yeah, I guess, Ryan and I have talked about this, like what do developers want? Well, they want to maintain agency and they want to get into a flow state and write code. They don't want to be a copy editor for an AI agent. But
will AI agents get to a point where they're good enough that people want to replace developers? I mean, I hope not, but I can't say because they're getting better all the time. Yeah. I think when you provide those examples like unit test writing, documentation, certainly the kinds of things that like a GitHub Copilot does for me in the IDE, all those I feel like are authorship experiences on some level or another. It's kind of synthesizing that new code.
And it's been, I use Copilot every day. It's been my experience that it makes me, I don't know, 10% better, 15%, or whatever number you want to throw at it. The IDE rule-based refactorings made me, you know, dramatically better as well when they kind of arose. And so it's in some ways a way of writing more net new code that then needs to be maintained on the back end. And if anything, it accelerates this existing problem that OpenRewrite and Moderne set out to solve, yeah.
Yeah, I've written about this and I like the way you phrase it, which is like more net new code is not necessarily a good thing. Maybe it is. Maybe it's not.
Cleaner, more robust code is kind of net new, a good thing. And so better documentation is net new, a good thing. And so if you can apply it in those areas, you can feel confident for the time being that you're getting an ROI, whereas net new code, we can't be certain yet. And I'm sure there'll be tons of research and academic research. Maybe we can trust in research from different folks in industry, though they'll be a little biased towards whatever their solution is.
I heard something interesting as well last week, and this was at a large financial tech innovation forum. Privately, there was a lot of talk about developer productivity, code assistants. One of the executives reacted really badly. He said, I don't want to talk about developer productivity. If you give my developer back,
you know, two hours of the week, do I believe they'll spend that on themselves or on the business? You know, and I thought, well, that's a really dark view. There's some truth to that. Like it's very hard to measure value to the business when it comes to that kind of thing. Whereas I think it's a bit easier in some respects, like a large scale transformation is like, you know, I'm replacing Oracle with Postgres or I'm replacing this with that. That's, like, easier for them to understand the value of. Yeah, but I mean, isn't that
the version of productivity where the developer, you know, gets two hours back and they go play Call of Duty for a little bit and refresh their brain? Like, I know it's a very dark view. Yeah, but I think that's just as valuable, that they get a break. They come back refreshed. They're able to think about these hard problems. Absolutely.
I mean, this is where you need a great CTO or CPO or CIO on the senior leadership team to make the opposite case to this person who has a dark view, to give that glass half full version of what that means for their developers.
All right, everybody, it is that time of the show. Let's shout out someone who came on Stack Overflow and shared a little bit of knowledge or curiosity and, in doing so, helped the whole community. Awarded four hours ago to Benjamin Aiken, a populist badge for answering the question, how to refresh an association after a save in Rails. Benjamin's answer was so good that it was given more upvotes than the accepted answer and hence the populist badge.
Shout out to Benjamin, and thanks for contributing some knowledge. I am Benjamin Popper, not Benjamin Aiken, and I'm the director of content here at Stack Overflow. You can always find me on X @BenPopper. If you have questions or suggestions for the show, if you want to come on as a guest or
hear us talk about a particular topic, email us, podcast@stackoverflow.com. And if you enjoyed today's episode, the kindest thing you could do would be to subscribe and leave us a rating and a review. I'm Ryan Donovan. I'm definitely Aiken. I edit the blog here at Stack Overflow. You can find the blog at stackoverflow.blog. And if you want to reach out to me with comments, suggestions, tips, you can find me on LinkedIn. I'm Jonathan Schneider, co-founder at Moderne,
m-o-d-e-r-n-e, and you can find us at moderne.ai. I'm easy to find on LinkedIn and Twitter. If you haven't heard of OpenRewrite, give it a try. Very cool. All right, everybody. We'll put those links in the show notes so you can check them out. Thanks for listening and we will talk to you soon.