Defining Technical Debt
In the world of ecommerce, there are so many details that need to be set up for a merchant to start selling that it can be completely overwhelming. This particularly affects early-stage startups that may be running a direct-to-consumer or online business for the first time, where the path to launch can be sidetracked by trial-and-error experimentation. These early choices are part of what makes up technical debt, which we’ll define here as the liabilities incurred by decisions about how a website will function.
Technical debt can come from a variety of factors:
- Messy theme customizations
- Lack of documentation
- Patchwork integrations
- Data fragmentation
- Pixel and tracking conflicts
- App overload
- …and many more!
It can also be exacerbated by team changes like a programmer leaving, switching an agency, a freelancer disappearing, or business transitions like a founder exiting or a merger occurring.
The technical debt paradox is that the faster you grow, the more you need scalable systems, but the less time you have to build them.
There’s really no such thing as a temporary fix, which means short-term solutions can create long-term barriers to growth. It’s very easy to acquire technical debt during good times because you’re moving fast and making money, but it will catch up to you, at significant cost. The irony is that the very actions that drive initial business success, like personalized optimizations, rapid feature deployment, and adding apps to quickly fill functionality gaps, will become the limiting factors for continued growth if not managed strategically.
Warning Signs
There are several typical first indicators of technical debt, though some will be more glaring than others. These could include:
- Rising customer service tickets for technical issues
- The site getting slower and slower
- Syncing failures
- Missing data in reporting, or strange gaps in analytics or attribution
- Realizing that onboarding new team members is getting harder
If you’re starting to spend meetings talking about all the things you want to do or wish you could do with your website but end up instead talking about all the reasons you can’t do those things, you may be mired in technical debt. But the definitive way to determine what’s really going on is auditing. Our approach is to break it down into four main categories – though this is intended for Shopify stores, it is still broadly applicable for other ecommerce platforms as well. These core audit topics include:
- Theme architecture and customizations
- App ecosystem and integrations
- Data flow and management
- Operations automation
We’ll go through each of these below to address the key points of each.
Theme Architecture and Customizations
In Shopify, no matter if you’re on Shopify Basic or Shopify Plus, the online store theme is the heart of the customer’s online experience. It is the repository of all of the frontend styling code, and holds the templates that define what is displayed to whom. Common theme issues include things like code bloat, hacky customizations, and leftover code from products or features that have long since left the site. We recommend making a technical checklist for evaluating themes, which includes defining best practices for your team pre-development, and a flow for QA review at the end. If you’re not defining what the standards are for your freelancer, agency or in-house developer, worst-case they’ll have none and best-case they’ll try to intuit their own in a vacuum.
It should also be noted that focusing on code-level SEO best practices and functional ADA compliance will help raise the bar overall, because your level of precision must be greater and there are protocols to follow which intrinsically enforce standards.
Other best practices include always maintaining code repositories with version control and comments on changes, having a consistent build and deploy process, and never deploying on Fridays. These little rulesets will help maintain discipline against cutting corners and rushing things. However, there are still other things that can help:
- Knowing when to use render versus include in Liquid (as using the wrong one can disable the code rendering entirely)
- Using local (in-theme) copies of libraries rather than pulling from external CDNs
- Scheduling periodic cleanups of old sections, settings, and templates
- Enforcing strict naming conventions of templates, sections and blocks
- Maintaining consistent labeling and version numbering of theme copies
- Backing up an extra copy of the theme before any app install or uninstall
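As a rough illustration of the first point in the list above, a short script can flag include tags in theme files so a developer can review them; Shopify’s Liquid documentation marks include as deprecated in favor of render. This is a minimal Python sketch, not a Shopify tool: the function name find_include_tags and the sample markup are hypothetical, and the pattern only covers the common quoted-snippet form of the tag.

```python
import re

# Matches Liquid tags such as {% include 'price' %} or {%- include "price" -%}.
INCLUDE_TAG = re.compile(r"\{%-?\s*include\s+['\"]([^'\"]+)['\"]")


def find_include_tags(liquid_source: str) -> list[tuple[int, str]]:
    """Return (line_number, snippet_name) for each include tag found."""
    hits = []
    for line_number, line in enumerate(liquid_source.splitlines(), start=1):
        for match in INCLUDE_TAG.finditer(line):
            hits.append((line_number, match.group(1)))
    return hits


# Hypothetical theme markup, for illustration only.
sample = """<div class="product">
  {% include 'price' %}
  {% render 'badge', product: product %}
  {%- include "legacy-swatches" -%}
</div>"""

for line_number, name in find_include_tags(sample):
    print(f"line {line_number}: include '{name}' should be reviewed")
```

Running something like this during a scheduled cleanup gives the reviewer a short list to check by hand, rather than a guarantee that everything was found.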
App Ecosystem
Not every platform has a robust app ecosystem, but Shopify has consistently encouraged independent feature development via well-documented APIs, which has led to a bevy of options for any given desire for bells and whistles. However, there can easily be redundant functionality between themes and apps, or even two or three apps amongst each other.
App development is a global enterprise, and knowing where an app is from is important both for its customer service and server location. There are apps that US-based merchants use that are based in Australia or India…which is fine, unless the only customer service you have is 15+ hours away due to time zone, or if their only servers are based in those places. That means that each US visitor on the site has to wait for round-trip server requests to be fulfilled from across the world, slowing down the site as a whole.
Apps can also create significant dependencies, to the point of blocking checkout if the app goes down. Do you know which apps might create that level of blockage on your site? If not, it’s immediately time to find out. Creating a visual map of each app, with the features used, the integrations it has not just with Shopify but with other apps, and how it ties into critical business processes, is crucial in times of crisis. Knowing what other libraries or code it injects into the site, or whether it’s only a containerized extension, will help shave minutes if not hours off an eleventh-hour debugging process.
For any app that you have installed, you should know the effects of turning it off or having it be unavailable: what’s the worst that could happen if the app went down? Then you should have a contingency plan ready to address it, like cloning a theme, removing the snippets or turning off the extension, and adding a note for customers if applicable (“We’re sorry but redeeming loyalty points is temporarily down…”). Share this plan in a place where the entire team, from customer service to project managers and developers, can access it.
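The app map does not have to be a diagram to be useful. As a minimal sketch in Python, assuming a hand-maintained inventory, each app can be recorded with its features, integrations, and contingency plan, and the gaps listed automatically. The AppRecord fields, the function name, and the sample apps are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class AppRecord:
    name: str
    features_used: list[str]
    integrates_with: list[str] = field(default_factory=list)
    blocks_checkout_if_down: bool = False
    contingency_plan: str = ""  # empty means nobody has written one yet


def apps_needing_attention(apps: list[AppRecord]) -> list[str]:
    """Names of apps that can block checkout but have no written contingency plan."""
    return [
        app.name
        for app in apps
        if app.blocks_checkout_if_down and not app.contingency_plan.strip()
    ]


# Hypothetical inventory, for illustration only.
inventory = [
    AppRecord("Loyalty Points", ["redeem at checkout"], ["ESP"], True,
              "Turn off the extension and post a notice that redemption is temporarily down."),
    AppRecord("Subscriptions", ["recurring orders"], ["Payments"], True),
    AppRecord("Reviews", ["star ratings"], ["ESP"], False),
]

print(apps_needing_attention(inventory))
```

Keeping the inventory in a shared file or spreadsheet export means customer service and developers are reading the same plan during an outage.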
Another hidden liability in app integrations is the possibility of silent failure. Oftentimes, integrations between apps that are set once and then forgotten about will not necessarily send alerts, emails or other obvious indicators that a reauthentication is needed, or that data has stopped flowing. A gift note app might connect to your email service provider (ESP) but then report connection issues only within a subsection of the settings that you never really go into again because you’ve already set up that connection. Scheduling periodic reminders in a calendar or ticketing system to go through and check the status of important integrations between SaaS and app products is a way to prevent significant gaps in data or customer experience.
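The periodic check can be as simple as comparing the last known successful sync against a threshold. The sketch below is a Python illustration under stated assumptions: the timestamps are entered by hand or exported from each service, and the function name, integration labels, and seven-day threshold are hypothetical rather than part of any app’s API.

```python
from datetime import datetime, timedelta, timezone


def stale_integrations(last_success: dict[str, datetime],
                       max_age: timedelta,
                       now: datetime) -> list[str]:
    """Return integration names whose last successful sync is older than max_age."""
    return sorted(
        name for name, synced_at in last_success.items()
        if now - synced_at > max_age
    )


# Hypothetical timestamps, recorded during a scheduled review.
now = datetime(2025, 1, 15, 12, 0, tzinfo=timezone.utc)
last_success = {
    "gift-notes -> ESP": datetime(2024, 12, 1, 9, 0, tzinfo=timezone.utc),
    "orders -> 3PL": datetime(2025, 1, 15, 11, 30, tzinfo=timezone.utc),
}

print(stale_integrations(last_success, timedelta(days=7), now))
```

Passing the current time in as an argument keeps the check easy to test and makes the review date explicit.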
The greater the technical debt your store has, the greater the security risk.
Finally, from a security perspective, apps can often have their own user roles, which can be forgotten about when team members transition. Removing Shopify admin access is not necessarily enough. Checking user permissions periodically ensures that only those who need access have it, and for no longer than needed, so that technical debt is not incurred through missteps by people who should not have that level of access.
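A periodic access review boils down to comparing each app’s user list against the current team roster. This Python sketch assumes the user lists have been copied out of each app’s settings by hand; the function name, app names, and email addresses are hypothetical.

```python
def accounts_to_revoke(app_users: dict[str, set[str]],
                       current_staff: set[str]) -> dict[str, set[str]]:
    """For each app, list accounts that do not belong to a current team member."""
    leftovers = {}
    for app_name, users in app_users.items():
        extra = users - current_staff
        if extra:
            leftovers[app_name] = extra
    return leftovers


# Hypothetical data, for illustration only.
app_users = {
    "Reviews": {"ana@example.com", "former.dev@example.com"},
    "Loyalty Points": {"ana@example.com"},
}
current_staff = {"ana@example.com", "sam@example.com"}

print(accounts_to_revoke(app_users, current_staff))
```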
Data Flow
As your business grows, the need for granular reporting increases as well, particularly with accounting and especially if you’re designing or manufacturing your own products. This usually leads to using an ERP to become the main repository of information and the source of truth. But moving to enterprise-grade software means creating a huge number of connections between Shopify (or your ecommerce platform) and the ERP (say, NetSuite), which need to be mapped and documented methodically.
Not only do you need to map what goes where, but also which direction the information will be flowing, and how granularly. For instance, you can decide that NetSuite will become the primary owner of product data, which will then flow into Shopify via Celigo, but to what end? What happens when your reviews app creates new metafields to store data in each product? Will you pull that info into NetSuite, or ignore it? If you don’t plan for it, it could get overwritten with each sync, causing confusion and data loss, or end up taking longer to sync because unnecessary data is being pulled.
Other common exceptions to standard flows include preorders and backorders, dropship items, handling multiple inventory locations, and the specifics of how returns and exchanges are handled. Each one of these will need a plan, but also documentation on what fields are pushed to and pulled from, what types of data are allowed (only integers, or anything in string?) and so on.
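One way to document these decisions is a field map that records, for each field, which system owns it, which direction it flows, and what type of data is allowed. The following Python sketch is an assumption-laden illustration, not a NetSuite or Celigo configuration: the FieldRule structure, the field names, and the direction labels are all hypothetical. It also shows how a newly created field, such as a metafield added by a reviews app, can be caught before a sync silently overwrites or ignores it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldRule:
    owner: str      # system treated as the source of truth, e.g. "erp" or "shopify"
    direction: str  # "erp_to_shopify", "shopify_to_erp", or "ignore"
    data_type: str  # e.g. "integer" or "string"


# Hypothetical mapping, for illustration only.
FIELD_MAP = {
    "title": FieldRule("erp", "erp_to_shopify", "string"),
    "inventory_quantity": FieldRule("erp", "erp_to_shopify", "integer"),
    "reviews.rating": FieldRule("shopify", "ignore", "string"),
}


def unmapped_fields(shopify_fields: set[str],
                    field_map: dict[str, FieldRule]) -> list[str]:
    """Fields present in Shopify that have no documented rule yet."""
    return sorted(shopify_fields - set(field_map))


seen_in_shopify = {"title", "inventory_quantity", "reviews.rating", "reviews.rating_count"}
print(unmapped_fields(seen_in_shopify, FIELD_MAP))
```

Any field the check reports is a prompt for a decision (pull it into the ERP, or explicitly ignore it), which is then written back into the map.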
Connections for data flows can also create latency, as the more data you collect or need to transfer, the longer it will take. Oftentimes this is not a process that affects a frontend user experience, but in cases tied to marketing data (e.g. pixels collecting information) it can. Rather than installing pixels in the theme code, consider a more streamlined approach through tools like Google Tag Manager, which allow a better consolidated overview with the ability to toggle items off easily and load asynchronously.
Operations Automation
Merchants spend a huge amount of time repeating manual processes to jump through hoops when it seems like no other good option exists. But now more than ever, thanks to AI expediting previously tedious integrations, there are options to connect otherwise disparate parts. Like carpentry, we believe if it’s repeatable then it’s usually worth building a jig for it because time is money and your team can add value elsewhere.
You can use Shopify Flow, Shopify Functions, the Mechanic app, Zapier or Make.com to connect Shopify with SaaS and online services that wouldn’t otherwise talk to each other. However, documentation, error handling, and monitoring are key here.
Institutional knowledge is an incredible shortcut, but it can disappear incredibly quickly as well.
It’s not enough to have automation set up with the assumption that it will trundle along and keep working, because if the team that built the automation changes, any debugging or reasoning why something is set up the way it is will be lost. Documentation should cover:
- What the goal of the automation is, e.g. what problem is being solved
- Which direction data flows
- Who receives the error messages (at what email, in what Slack channel?)
- What are the required fields/datapoints versus extras that are nice to have
- What would happen if the flow stopped or broke, and what would the backup process be as a stopgap
- What do the connected apps/services do, and where do the settings relevant to the automation live in the external services
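The documentation points above can double as a checklist that is verified automatically. As a minimal Python sketch, assuming each automation’s documentation is kept as a simple record, a helper can report which sections are still blank. The section keys, the function name, and the sample flow are hypothetical.

```python
# Hypothetical section keys mirroring the documentation points listed above.
REQUIRED_SECTIONS = (
    "goal",
    "data_direction",
    "error_recipients",
    "required_fields",
    "fallback_process",
    "connected_services",
)


def missing_sections(doc: dict[str, str]) -> list[str]:
    """Return the required sections that are absent or blank, in checklist order."""
    return [key for key in REQUIRED_SECTIONS if not doc.get(key, "").strip()]


# Hypothetical, partially documented automation.
flow_doc = {
    "goal": "Tag orders that contain preorder items",
    "data_direction": "Shopify -> fulfillment spreadsheet",
    "error_recipients": "",
    "required_fields": "order id, line items",
}

print(missing_sections(flow_doc))
```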
Defining Bottlenecks
There are key indicators of technical debt to track at every growth stage in your store, which may act as small red flags to raise the alarm and trigger further investigation. These include having questions like:
- Why am I having this many problems with my store?
- How come inventory control isn’t working and we’re always overselling?
- Why does the 3PL sync error so often?
- How much time is customer service spending on website problem tickets?
- How come our [Pinterest ads, etc] conversion count is off by 50%?
- Why do customers in the post-purchase surveys always have complaints?
If you’re finding yourself with these questions, or every launch seems like a total quagmire, you probably have bottlenecks due to technical debt. In the absence of formal audits, you can still define those issues concisely by writing out the answers to these questions:
- What apps or services are involved?
- What is the flow or feature that is broken?
- When did the problem start, as far as we can tell?
- How many users have been affected?
- Are there any patterns like times of day, days of week, or browser/operating system commonalities in user reports?
- When researched, is there any other indication online (e.g. in a support forum or Reddit) that this has happened to others?
- If a connected service is involved, does the vendor/app company know about this already? If not, sharing a report with them will be crucial to resolution
- How critical is this issue to the business? Rate on a scale of 1-5, from minor to show-stopper
The clearer the picture you can paint for the team, the more efficient it will be to debug and resolve the issue. And if a connected service is involved, the better support you’ll receive (and faster) when you submit a support ticket or chat with a rep.
Site Speed & The Holy Grail
Over the last few years, while AI was brewing and threatening to destroy SEO as we know it, search engine optimization professionals (“SEOs”) became increasingly obsessed with site speed as a potential ranking factor for Google, which released Core Web Vitals (CWVs) as a benchmarking system for frontend speed.
There are a few problems with this, namely:
- Google is continually redefining the benchmark levels for various metrics, meaning the goalposts are literally shifting as you try and optimize for them
- Google has publicly stated that CWVs are one of hundreds of ranking factors, meaning that at best you are perhaps working towards a one-hundredth improvement
- Site speed varies depending on time of day, location, and dozens of other factors that are beyond the code of a site
- AI is rapidly rewriting the rules on what matters for discoverability
That said, it is of utmost importance that your visitors – potential customers and returning customers alike – are able to navigate your site with the feeling of snappiness. We call this the “perceptible speed” of a site – a user shouldn’t “feel” like the site is slow and feel left out as something loads and loads, waiting for an interactive state to appear. By contrast, a dozen assets below the viewable area of the screen taking one second longer might cost 20 points in a benchmark, yet make a minimal real-world difference to the visitor. Chasing site speed for the sake of reported metrics alone is a fool’s errand, and distracts your team from working on more important projects.
On the flip side, significant site speed issues can be indicators of significant technical debt – JavaScript conflicts between multiple apps, shoddy product or theme logic, lazy image filter implementations, or simply bloated and inefficient code across a dozen theme sections can all contribute to a noticeable lag that can detract from the conversion rate. Treating the disease and not the symptom is often the best way to proceed – if working on site speed is a means of paying down technical debt, then it is worth pursuing.
Site speed is something that every developer can chase, and strive for better, but will never totally catch up with. This is because marketing pixels, CDNs and server hardware loads, internet traffic, and a user’s own suite of Chrome extensions can all cause serious delays in perceptible and benchmark render times. For this reason, we call it the Holy Grail – in the strictest Monty Pythonesque sense of the word!
Establishing Technical Governance
Once you’ve had a chance to go through and audit your existing codebase, apps, connected services, integrations and automations, it’s time to lay down some rules to enforce a new level of tidiness. This could include a few different approaches such as:
- Define (and enforce) a process for code review
- Set up house rules (e.g. no deploys on Fridays, don’t deploy when a key team member is on vacation)
- Create a QA hitlist and review process – starting with the developer, then the project manager, and the designer
- Establish a checklist to vet new apps/SaaS, including things like reading reviews, examining reference sites for implementation quality, validating compatibility with all existing major apps and integrations, benchmarking speed and load times, etc.
- Require a documentation/knowledge base review by your team before signing up for new services
- Schedule regular technical debt review milestones, quarterly or biannually
These processes should be simple to understand so they can be simple to delegate and execute – as long as they’re not simple to ignore. Holding your team and partners accountable to these basic standards will convey that you care about quality, performance and consistency and will create good habits that signal to incoming team members that “this is the way we do things” which will help block bad habits from being inherited.
Refactoring
In the best of times, refactoring can be done from a place of disciplined calm. In the worst of times, refactoring is putting out the fire while you’re actively on fire. Hopefully the more time you spend proactively cleaning and refactoring code, the less time you’ll have to spend doing it reactively. One of the keys to success in editing, rewriting and optimizing code is to prioritize what is actually important versus what is fluff – there could always be some snippet of code done better, but it’s the mission-critical pieces of your site that will matter. Our best practices include measures to rank importance to create a game plan as follows:
- Prioritize for maximum revenue impact plus staffing efficiency – what are the low-cost, high-impact ways the code or features could be improved?
- Aim for incremental improvement methodology by making a map of stopgap versus long-term actions
- Build in more testing for high-stakes changes – make sure your QA processes are stronger than ever, including evaluating marketing pixels and other often-overlooked aspects that silently error
- Make sure you’re refactoring in a dev or sandbox environment that has the same app stack replicated from your production site; Shopify doesn’t make this easy to do, but don’t skip the manual labor here
- Ensure the full involvement of teams including customer service, marketing, design, etc. – each department has their own priorities and aspects that they know better than others, and can flag potential new conflicts or issues quicker
- Set goals for the timeline of completion and don’t let it drag on forever – small, iterative changes that get deployed are better than comprehensive, monolithic changes that never get launched
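To make the first point in the list above concrete, one simple approach is to score each candidate task for revenue impact and effort and then sort by the ratio. This Python sketch is only an illustration: the 1 to 5 scales, the RefactorTask structure, and the sample tasks are hypothetical, and the scores are team estimates rather than measurements.

```python
from dataclasses import dataclass


@dataclass
class RefactorTask:
    name: str
    revenue_impact: int  # 1 (low) to 5 (high), estimated by the team
    effort: int          # 1 (low) to 5 (high), estimated by the team


def prioritize(tasks: list[RefactorTask]) -> list[str]:
    """Order tasks by impact-to-effort ratio, highest first; ties keep input order."""
    ranked = sorted(tasks, key=lambda t: t.revenue_impact / t.effort, reverse=True)
    return [t.name for t in ranked]


# Hypothetical backlog, for illustration only.
backlog = [
    RefactorTask("Rewrite cart drawer", revenue_impact=5, effort=4),
    RefactorTask("Rename section files", revenue_impact=1, effort=2),
    RefactorTask("Remove unused app snippets", revenue_impact=3, effort=1),
]

print(prioritize(backlog))
```

A ratio like this is a conversation starter for the planning meeting; departments such as customer service or marketing may still argue for reordering it.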
You may encounter resistance from your team or partners on pushing refactoring forward. It can be considered boring, demotivating or even insulting (“my code is perfect”) to some, but don’t be deterred by pushback. Explaining the importance of optimizing processes to get better may be a more strategic approach than explaining that what you have is “bad,” “broken,” or “terrible” even if that may truly be the case. Conveying that setting aside time during quiet periods will make everyone’s jobs easier during hectic periods (BFCM!) will help get buy-in and keep spirits lifted even if it is a chore.
Future Proofing
If you’ve laid down plans, scheduled regular reviews, trained the team in what to do and what to avoid, and sanitized your data and codebase… then by all means, congratulations! But there are still some steps you can take to design systems that will scale better the first time around.
- Design systems that anticipate growth – Any major feature request should be considered in the context of what’s coming out not just 3 months down the line, but 6, 9 or 12 months too. Don’t paint yourself into a corner only to replicate or rip out work you did recently to accommodate things that could have been forecast
- Build modularity into your stack – Think about code portability, consolidating iterative feature builds, and how you can potentially be more minimalist in your approach to connected services
- Keep consistent QA + standards enforcement – It’s all well and good to set up a meeting and get everyone on the same page, but it has to be someone’s job to keep the team(s) on track. Project management of quality assurance and standards is critical
- Create a technical decision framework – Documenting how you make decisions about what could incur technical debt and what is a lean or efficient system will help you train and delegate those decisions to senior leadership in every department
- Balance innovation with stability – We all love a shiny new object, but realistically a lot of brand new tech can be a flash in the pan, never coming out of beta or going under due to financial insolvency. We once took over a website where the agency that built it went out of business, the #2 internal employee since the company’s founding left, and later the deployment tool itself went out of business. If you’ve built on reliable, tried-and-true technology, that team and tool volatility can hopefully be mitigated instead of being catastrophic
- Monitor performance with tooling – There are a lot of services out there to give varying levels of analytics + notifications
A hallmark of working on the web is constant evolution, and so you and the whole team must always be learning new skills. Especially in such a rapidly changing world of AI, you can’t rest on your laurels and assume that your site will continue running without significant, regular investment.
Hiring To Fill The Gap
If you’re not a super technical founder or manager, that’s OK. If you’re reading this guide, it’s likely that there’s no one on your team that immediately springs to mind who could step in and set things straight – you still have plenty of options. You could hire a fractional CTO or an agency like ours to do an initial audit and start to get these standards in place. You could retain a freelance senior developer with experience in auditing and “rescue” projects to make initial assessments and recommendations on what to prioritize, and get a quote to address the most critical issues first. Long term, you might need to add team members to develop your internal administration, and including addressing technical debt as part of the advertised job responsibilities will start to set the tone and provide another rubric to evaluate candidates.
As you fill in other types of roles, like developers and ecommerce managers, or vet agencies to partner with, you’ll need to evaluate whether they are likely to take shortcuts, brush off difficult problems, or skip checking their work.
You can teach skills, but common sense and caring for attention to detail are often deeply ingrained traits that will not readily change upon an employer’s request.
And speaking of hiring, if you find yourself so busy that you “just need to find someone right away,” you will find yourself at higher risk. Working with random freelancers in random places to do random small jobs on your site is a really fast way to get into technical debt. Any feature or task can be coded five different ways and still technically work – but it’s likely that only perhaps two of those five ways are objectively a “good” way to accomplish the job, and while the others might suffice for now, they can be done using strange, non-compliant approaches that are far more brittle and prone to breaking.
Killing The Vibe: AI and Technical Debt
One of the most impressive use cases of AI that can wow a crowd of skeptics is so-called “vibe coding,” a process whereby a user describes the basic desire for a piece of software in plain, non-technical terms, and AI immediately generates a web app with reams of computer code. The speed and general accuracy with which this occurs would have been inconceivable even a few years ago. But what initially sounds incredibly appealing has a significant hidden downside – technical debt.
Vibe coding is creating more technical debt than ever before in history – and no one is ready for it.
We have never made more technical debt so quickly, because prior to AI, the sheer challenge of programming and the bottleneck of having a finite number of people who know how to code acted as a limiter. Within specialized or regulated systems it’s even worse, so much so that the US is still using original components of air traffic control systems built in the 1970s. This didn’t necessarily mean that proportionally less technical debt was created, but it did mean that only so much code could be generated. Now that random individuals with a passing thought can create entire multi-thousand-line web apps on a whim, without any real production cost, we are in a fundamentally different era of the web.
We first saw some similar vulnerabilities with the explosion of “internet of things” (IoT) devices around fifteen years ago. Suddenly, with more connected devices than humans on the planet, a myriad of security problems surfaced. Users who didn’t know they could or should set passwords didn’t, code that was written once didn’t get updated, and the full capabilities of the technologies regular consumers used were not fully understood. We are heading down that path again, where non-technical users can google enough to find out how to make an API key, not realizing that a secret key should never be hardcoded or stored within a repo. What’s a repo? That’s exactly the point.
If you are interested in vibe coding and want to DIY your way to doing cool things in code, we don’t blame you. But there are some bare-minimum basics to know before you publish AI-generated code:
- Never place any API keys, “secrets,” passwords or anything with your email or personal information directly into code which AI is taking and using. Best case you’re training the AI to reuse your keys, worst case you’re inadvertently sharing your confidential info. Keep in mind APIs can have fees too, so if someone starts using your key, you could get an astronomical bill for their usage
- Instead, prompt the AI to ask you for keys and store them as settings in a private environment
- Learn enough about code to know what language(s) the code is being written in, and where the code is being hosted. Ask for a list of libraries and dependencies used in the project
- Research (using AI if you want) the common vulnerabilities of that language, those libraries and dependencies
- Learn how to download the codebase and store it securely on your local machine, and ideally learn how to use version control software like Git, with a private repository on a host like GitHub
- Do you feel like you’re becoming a programmer yet? Even if this levels up your technical skills, there’s a long way to go in understanding how systems function and why
- Be extremely cautious about how your vibe code is taking in and storing user information, and what security measures it is using to do so. If you’re making a members area and collecting logins, passwords, names, emails, etcetera, then you need to understand your liabilities, have a privacy policy, and know where and how that data is being stored
These are just the tip of the iceberg really, but not doing at least these simple things and publicly publishing your project and inviting outsiders to access it can make you vulnerable to intrusion, exploitation and potentially even lawsuits.
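To illustrate the point about keeping keys out of code, here is a minimal Python sketch that reads a secret from an environment variable instead of hardcoding it. The variable name EXAMPLE_API_KEY and the function name are hypothetical; how environment variables are set depends on where the code is hosted.

```python
import os


def load_api_key(variable_name: str = "EXAMPLE_API_KEY") -> str:
    """Read a secret from the environment and fail loudly if it is missing."""
    key = os.environ.get(variable_name, "").strip()
    if not key:
        raise RuntimeError(
            f"Set the {variable_name} environment variable; "
            "do not paste the key into code."
        )
    return key
```

The key itself then lives in the hosting platform’s private settings or a local file that is excluded from version control, so the code can be shared or reviewed without exposing it.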
Wrapping It Up
By now you may be overwhelmed, not sure where to turn and worried about how bad your site might actually look under the hood. Fear not, these are all areas that can be addressed methodically by a competent team. Consider hiring a senior developer, fractional CTO, or an agency like Hidden Gears to dive in and make professional assessments and strategize how best to proceed. Fresh eyes and a fresh perspective can go a long way to solving or refactoring problems that you may have stared at for a while.
We also recommend taking a considered, minimalist approach to the use of AI in production environments in the near term, as the technology is changing and morphing constantly without much sign of maturity or long-term stability. Using AI on a small scale, testing against alternatives, and deploying carefully is the best approach, even if it feels like everyone is jumping on the bullet train to the next big thing.
Finally, don’t forget what you came here to do – to figure out how to optimize and grow your online business. Framing the “why” will continue to be important to set a course for your team and collaborators and ensure that you can have a prosperous future. We’ve worked with hundreds of Shopify merchants who have been able to weather bad storms and ride great waves with a solid backend and smart practices to keep their technology stack nimble along the way.