Advocating for a Major Refactoring Focused on Technical Debt
Problem
I was the head of engineering at a Series B startup with about 25 engineers; we had a data platform that helped retailers connect with their suppliers. The company had been started about 10 years earlier, and the main product had gone through several iterations. The team was very senior and had a lot of experience in the domain, but before I joined, technology did not have a seat at the table. The business had also pivoted, so the original architecture was not designed for our current use case. This made development slower and more brittle than it could have been: some common tasks involved making changes in at least six different places, which caused a lot of hard-to-find bugs. It was becoming clear to all of us in engineering that there were multiple things we could do to evolve our architecture. However, those were large changes that would take several person-months' worth of effort for the whole team.
Actions taken
First, I had to formulate a more coherent plan and sell it to the CEO and my product counterpart. The CEO was already aware of some of the issues, but it was up to me to create a plan. I wanted to present multiple options. We took several weeks and held multiple brainstorming sessions with our senior engineers and architects, and generated several options. Some of them were very expensive, amounting to a major rewrite (which I knew was a non-starter).
I pushed the team to minimize scope and come up with ways to reduce risk. Eventually, we came up with several proposals, one of which would take only about two months (versus six months for the others). It was not quite as impactful, but its risk was much easier to mitigate, and it did not preclude us from doing more in the future. We also did some prototyping for a few key pieces.
During the process, I kept the CEO apprised at a very high level, without going into too much detail, since we were still iterating on a plan.
Once we had our proposal, including different options and estimates, I presented it to the CEO and Head of Product. We debated the different options, benefits, and costs. I stressed that at the end of the refactoring we would be able to deliver some features much faster and with higher quality. Additionally, as part of the refactoring, we would develop a new way of managing our customer deployments, giving us more flexibility, another win.
A key factor was that, because our customers were online retailers, we had a production freeze from the middle of November until the end of the year to minimize any potential impact of changes on customers. Since our team was under less pressure than usual, and not releasing software at our regular cadence anyway, this gave us the time we needed to work on the complex refactoring that would otherwise block new product development.
Ultimately, we decided to move forward with the proposal that I recommended. It was estimated at two months of development, focusing on only the one part of the system that had the most quality and performance problems. We also worked with the business on the migration plan, classifying customers into tiers based on the complexity of their deployment and their volume of data. Migrating some of our less complex customers early gave us more confidence in the new architecture, and helped us find and resolve issues.
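To make the tiering idea concrete, here is a minimal sketch of how customers could be classified and ordered for migration. Everything in it is an assumption for illustration rather than what we actually ran: the `Customer` fields, the `tier` and `migration_order` names, and the threshold values are all hypothetical stand-ins for "deployment complexity" and "volume of data".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Customer:
    name: str
    integrations: int   # hypothetical proxy for deployment complexity
    monthly_rows: int   # hypothetical proxy for data volume


def tier(c: Customer, max_integrations: int = 3, max_rows: int = 1_000_000) -> int:
    """Return 1 (simplest) to 3 (complex and high volume).

    The thresholds are illustrative defaults, not values from the project.
    """
    exceeded = int(c.integrations > max_integrations) + int(c.monthly_rows > max_rows)
    return 1 + exceeded


def migration_order(customers: list[Customer]) -> list[Customer]:
    """Order customers so the lowest tiers migrate first.

    Within a tier, smaller data volumes go first; the name is a stable tiebreaker.
    """
    return sorted(customers, key=lambda c: (tier(c), c.monthly_rows, c.name))


if __name__ == "__main__":
    sample = [
        Customer("big", integrations=8, monthly_rows=5_000_000),
        Customer("small", integrations=1, monthly_rows=10_000),
        Customer("mid", integrations=5, monthly_rows=200_000),
    ]
    for c in migration_order(sample):
        print(tier(c), c.name)
```

Ordering by tier first means any surprises surface on the customers where a problem is cheapest to fix, which is the confidence-building effect described above.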
While there were a few things that we didn't anticipate, mostly during the final testing and data migration, which caused modest delays, overall the project was a success. Once the major part of the refactoring was done, the majority of the team could resume feature development, while some of the engineers finished migrating our larger and more complex customers.
Lessons learned
- Make sure all the stakeholders (product, business partners) are involved throughout the process, including early phases. Your role as a leader is to make sure everybody understands the moving parts.
- This is very much a multi-dimensional problem, so make sure you have a clear process for generating options and making the decision. Create a schedule for the decision-making process, including deadlines for key milestones.
- Work to communicate the benefits very clearly: both the engineering advantages (improved velocity, etc.) and the visible business benefits (improved quality, etc.).
- Give your team time to think through the potential options. Work to generate as many possibilities as possible.
- Prototyping of some approaches can be very helpful and will ultimately save time.
- We underestimated some of the deployment concerns such as data migration, and even had to create some additional tooling for migrating one tenant at a time. This was an unknown unknown, but ultimately it gave us more flexibility in working with our customers.
Be notified about next articles from Jean Barmash