Guiding CTO Principles

Many of the principles and values identified in this page come from Lean-Agile, the DevOps movement, and the industry's shift toward microservices and, more recently, Unified Analytics from Databricks.


A good organization makes decisions backed by principles. Operationally, we follow the principles of Lean-Agile, which serve as the definitive guide to our decision-making processes. Additionally, our architecture is driven by microservice principles that make our systems and software scalable and extensible.

Our organization approaches product development with a two-factor approach (culture and architecture) meant to improve organizational agility, allowing it to adapt quickly to changing customer needs and to Y-axis scale each of its services to handle growth. The organization's product development culture embraces the mantra "early, and often" to propel its products into the marketplace at a rapid rate.

Unlike X-axis and Z-axis scaling, which consist of running multiple, identical copies of the application, Y-axis scaling splits the application into multiple, different services. Each service is responsible for one or more closely related functions and scales independently. Suggested Reading: The Art of Scalability.
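A minimal sketch of the difference: with Y-axis scaling, each functional service owns its own replica count, so one service can grow without cloning the whole application. The service names and replica counts below are purely illustrative.

```python
# Hypothetical Y-axis decomposition: the application is split by function,
# and each service scales its replicas independently of the others.
services = {
    "orders":    {"functions": ["create_order", "cancel_order"], "replicas": 2},
    "catalog":   {"functions": ["search", "get_product"],        "replicas": 8},
    "invoicing": {"functions": ["issue_invoice"],                "replicas": 1},
}

def scale(service: str, replicas: int) -> None:
    """Scale one service without touching the others (Y-axis scaling)."""
    services[service]["replicas"] = replicas

# Demand spikes on catalog search: only that service grows.
scale("catalog", 16)
```

Contrast this with X-axis scaling, where the only knob is the replica count of one monolithic deployable.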

What do you mean by “early” and “often”?

Creating software products is extremely complex and risky, due both to the subjective nature of products and to the inherent complexity of software whose components must scale independently and dynamically based on demand, while remaining extensible enough to allow easy refactoring and the addition of new features with the agility the customer demands. As such, the secret sauce to creating software products is delivering them to customers as early as possible, and as often as possible.


"Early" is about getting feedback and learning as quickly as possible, so we can learn from our mistakes, write better code, make our features work better, and deliver the right software features to the customer. Early is also about enabling the revenue, retention, or NPS gains of a product as quickly as possible. An MMF (minimal marketable feature) in a product development lifecycle adds value because it contains the minimum MARKETABLE set of "must-haves" required to enable the feature in production and realize its revenue, retention, or NPS gains.


"Often" is about deploying and releasing as often as possible, without the constraint of any timebox, timing sequence, or process. Delivering working software into production with high agility and confidence in quality reduces product development risk and enables early learning and discovery. To achieve this, deployment must be decoupled from the customer release, supported by the culture and operation of a true DevOps program without operational silos or non-cross-functional teams. Suggested Reading: Lean-Agile Software Development: Achieving Enterprise Agility.

Culture is not just having a happy hour, throwing parties, being happy and friendly with those around you, and the workplace being a relatively decent place to spend the majority of your time Monday through Friday. Culture includes those things, but it is also about how we work day-to-day and the values we share, and work ethic goes directly to our culture. Culture is backed by our principles, our values, and the work habits we share among co-workers in a system based on trust and respect, built on shared leadership and the professional community all of us nurture within the organization. Not only do we operate according to these principles, but they are of the utmost importance when choosing to bring a new team member into the organization.


The following principles are core to our employee-to-employee professionalism and work ethic. It is these shared principles that make us exceptional to one another.

Fundamentally, we value:

  • People who are professional.

  • People who collaborate.

  • People who respect their work.

  • People who don’t make additional work for others.

  • People who try to reduce effort for others.

  • People who ask for help.

  • People who understand that when you help others, you help yourself.

  • People who are willing to take small steps and get feedback.

  • People who are willing to do something that is good enough for now.

  • People who are adaptable.

  • People willing to work outside their expertise.

  • People who can perform critical thinking.

  • People who are looking to grow with the team.

  • People who value culture and give a damn.

Work ethic is highly valued.

Work ethic is about showing up, being on time, being reliable, doing what you say you’re going to do, being trustworthy, putting in a fair day’s work, respecting the work, respecting the customer, respecting the organization, respecting co-workers, not wasting time, not making work hard for other people, not creating unnecessary work for other people, not being a bottleneck, not faking work. Work ethic is about being a fundamentally good person that others can count on and enjoy working with.

If you want to walk fast, walk alone. If you want to walk far, walk together. -- African proverb

Don't make your colleague’s job harder than it is; try to make it easier. You should reduce the work of others as much as you can.

Culture is highly valued, and our culture is made up of our shared values, principles and strong work ethic.


  • Have Fun - You spend 1/3 of your life at work. Make it as fun as you can and turn moments into adventures!

  • Be Flexible - Departments, products, and companies usually evolve very quickly. Hang on tight, because you will need to wear different hats based on the ever-evolving product roadmap and company makeup.

  • Share Responsibility - Work closely with your team members, swarm on work together and encourage closer collaboration focused on innovating and achieving better outcomes.

  • Value Feedback - Value a culture of learning, experimentation, and iteration to achieve personal, team and product success.

  • Continuously Improve - Embrace feedback and use it for yourself, your team and the entire organization to achieve better outcomes. Challenge yourself and those around you to break the status quo!

  • Build Quality In - Value building quality into every interaction you have and everything you do. From tasks to people to peer reviews: be quality first!

  • Embrace Automation - Get feedback as early and as often as possible, and apply that feedback to run experiments aimed at continuously improving, delivering high quality, and achieving the right outcomes.

  • Destroy Every Operational Silo - Work to increase shared ownership across teams and departments. Reduce waste from unnecessary bottlenecks and constraints. Remove dependencies that cause impediments, reduce cycle times, and reduce complexity in communication. Be aware of escalating commitments caused by late discovery and a lack of understanding of the definition of done.

  • Support Autonomous Teams - Solve your own problems, remove your constraints and impediments and optimize the whole rather than the part. Practice team self-service and bring forth solutions to issues. Be cross-functional, own solutions from top to bottom and don’t support silo thinking.

  • Embrace the Mantra, “Early, and Often” - We release as soon as possible. And we release often. Learn early, learn often, experiment and continuously improve. This applies to not just code!

  • Be an “Outside-In” Thinker - Don’t reinvent the wheel. Look for best practices and for how outside influencers are successful, and bring that into the company.


Technology plays a role in every business nowadays. To ensure success and achieve the desired technology outcomes, technical projects must be grounded in principles shown to have success time and time again.

The most modern application architecture is microservices, but our primary principle is that "not every tool is a hammer." This means that in some instances we would prefer MVC over microservices, or microservices over MVC, or one database platform over another. But for the majority of our architecture work, we have established principles, which most often lead to success.

Our microservices architecture is designed around the principles followed by companies that do microservices well, as well as the 12-Factor Application principles. Microservices are fostered by a culture of automation that builds services that can be deployed independently. The application and its backend services are decentralized, made up of many parts that are independently scalable, modeled around the boundaries of the business domain, and abstracted so that implementation details are hidden. Each part gives us the ability to isolate failures and is highly observable.
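As one concrete 12-factor example, factor III ("Config") says deploy-specific values should come from the environment rather than being hard-coded. A minimal sketch, with variable names that are illustrative rather than any established convention:

```python
import os

# 12-factor "Config": read deploy-specific values from the environment,
# so the same build runs unchanged in dev, staging, and production.
def load_config(env=os.environ) -> dict:
    return {
        "database_url": env.get("DATABASE_URL", "postgres://localhost/dev"),
        "log_level":    env.get("LOG_LEVEL", "INFO"),
    }

# Passing a dict lets us inject a fake environment for testing.
config = load_config({"LOG_LEVEL": "DEBUG"})
```

Because the config function takes the environment as a parameter, it is trivially testable without mutating the real process environment.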


Suggested Reading: The Twelve-Factor App, Patterns for Microservices, and Building Microservices.


These are small, autonomous services that work together, modeled around a business domain. No one service is more important than another. They are autonomous because they can be changed, deployed, and operated independently of one another, and they all share these common characteristics.


When determining what decisions to make regarding our architecture, our choices should be evaluated against these sets of architectural principles, which come from microservices and the twelve-factor app, to validate whether a decision is sound before we commit to moving forward.


These are the principles adopted by companies that do microservices well, helping them avoid a “minefield” of complexity and the pitfalls of many SOA implementations.

Modeling Services Around a Business Domain

This achieves stability around an API and makes it easy to reuse or recombine services in different ways for different user interfaces. We must find stable boundaries that align with business domains. Changing APIs is one of the most disruptive things we can do, so finding stable boundaries leads to stability of the service. It also helps to align ownership of services to teams, and those teams to parts of the business domain, which allows teams to become experts in their part of the business domain and the goals we are trying to achieve there. A great approach to defining boundaries in the business domain is Domain-Driven Design, using exercises such as “Event Storming”. Suggested Reading: Implementing Domain-Driven Design.

Embracing a Culture Around Automation

This is how we manage the emerging complexity that comes from having a large number of deployable units. It requires a lot of investment in the tooling and automation needed to manage all of these independent units. Thankfully, much of this tooling and automation now exists off the shelf, and we can simply configure these solutions into existing frameworks that have the concept of a “pipeline” for releasing into production. Even so, there is an initial payment that must be made to bring a microservices architecture into production. Once it is established, the engineering organization and its software become resilient and achieve an incredible amount of agility, resulting in the ability to create and launch new services exponentially faster. This makes the business extremely competitive: it is very difficult for competitors to match the speed at which we can adapt while maintaining an impressive velocity against the roadmap.

Automation includes infrastructure automation for provisioning, configuration, and scaling; automated testing, because testing in a distributed system is inherently more complex and we need a high degree of confidence that we are shipping working software, as well as a short cycle time for the feedback loop we get from testing; and continuous delivery, which allows us to get working software into production very quickly, without manual processes and manual inspection stage gates.
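The "pipeline" idea above can be sketched in a few lines: automated stages run in order, and any failure stops the release, replacing a manual inspection gate. The stage names are illustrative, not a prescription for any particular CI tool.

```python
# Hypothetical delivery pipeline: each automated stage must pass before the
# next one runs; a failing stage stops the release (fail fast, no manual gate).
def run_pipeline(stages, artifact):
    history = []
    for name, stage in stages:
        ok = stage(artifact)
        history.append((name, ok))
        if not ok:
            break  # nothing after a failed stage executes
    return history

stages = [
    ("build",  lambda a: True),                          # compile & package
    ("test",   lambda a: a.get("tests_pass", False)),    # automated test gate
    ("deploy", lambda a: True),                          # push to production
]

good = run_pipeline(stages, {"tests_pass": True})   # all three stages run
bad  = run_pipeline(stages, {"tests_pass": False})  # stops at the test stage
```

Real pipelines add artifact storage, approvals, and rollback, but the control flow — ordered stages with fail-fast semantics — is the same.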

Without investing in automation for a microservices architecture, or if we choose technology that cannot be automated, the complexity cannot be managed: manual processes will not allow the organization to be resilient and agile.

Hiding Implementation Details

We hide implementation details to allow services to evolve independently of one another. This is the absolute decoupling of our implementation from the API itself, which allows a service to evolve without other services depending on its implementation details; when that service changes, the other services do not also have to change. For example, we hide implementation details like database schemas and do not allow them to be exposed or coupled through the API. This allows us to change our implementation details without affecting the API, and prevents us from having to adjust user interfaces, orchestration layers, and so on due to a change in an implementation detail.

This also changes the way we think about APIs themselves. We use concepts like the “bounded context”, the collective responsibilities of a domain or logical grouping of things, and focus on that rather than exposing how specifically it is implemented behind the API. This goes directly to allowing complete autonomy, giving a service as much freedom as possible to do the job at hand.
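A minimal sketch of the idea: the API returns a stable, domain-shaped view and never leaks the storage schema, so the database can change without breaking consumers. The field names and the customer record are hypothetical.

```python
# Internal storage schema -- free to change at any time; never exposed.
_DB_ROW = {
    "cust_id": 42, "fname": "Ada", "lname": "Lovelace", "crtd_ts": "2024-01-01",
}

def get_customer(customer_id: int) -> dict:
    """Public API: exposes the bounded context's model, not the table layout."""
    row = _DB_ROW  # stand-in for a real database lookup by customer_id
    return {
        "id": row["cust_id"],
        "name": f'{row["fname"]} {row["lname"]}',
    }

# Renaming crtd_ts, splitting the table, or swapping the database engine
# never changes this contract, so consumers never have to change with it.
```

The mapping layer is the entire point: it is the seam where implementation detail stops and the stable API contract begins.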

Decentralization of all the Things

This means decentralizing not just organizational decision making, but also architectural decision making and power. It is also the decentralization of power in our technology, allowing services to act on their own behalf. No longer do we have middleware that acts as both the broker and the decision maker for services. No longer do we have a centralized database whose models every service must adhere to. Every service is completely autonomous and holds its own power so it can do its job independently of any other. This allows each service in a business domain to make its own decisions about what persistence looks like, how it will handle actions within its bounded context, and any other resource it needs to perform its job. It also allows us to make unique decisions to solve problems specific to a bounded context, rather than adhering to centralized governance of any particular resource.

Deploying Independently

It should be the norm, not the exception, that you can deploy a service independently of any other service. This goes directly to reducing complexity and dependencies between your services, and it forces you to follow many of the other principles: if your services are coupled by exposed implementation details, or are not decentralized, you will not be able to deploy independently. No other service should need to be deployed alongside another, because each service is autonomous and able to do its job without depending on the rest. While a new feature may require multiple services for a user interface to fully realize it, each service can still do its own job and be deployed on its own. This also allows us to use an expand pattern to deploy new versions of services, rolling out upgrades to consumers over time so they can leverage the new versions of the APIs.
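One common shape of that expand pattern is serving the old and new API versions side by side, so the service deploys on its own while consumers migrate at their own pace. The routes and payloads below are invented for illustration.

```python
# Hypothetical expand-style versioning: v2 is a backward-compatible superset
# of v1, and both are served simultaneously from the same deployable.
def order_v1(order_id: int) -> dict:
    return {"id": order_id, "total": 100}

def order_v2(order_id: int) -> dict:
    # Expand: add fields, never remove or repurpose existing ones.
    return {**order_v1(order_id), "currency": "USD"}

ROUTES = {"/v1/orders": order_v1, "/v2/orders": order_v2}

def handle(path: str, order_id: int) -> dict:
    return ROUTES[path](order_id)
```

Once every consumer has moved to `/v2/orders`, the v1 route can be contracted (removed) in a later, equally independent deployment.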

Consumer First

This is an outside-in way of thinking, rather than inside-out, so we understand how these services are meant to be used by the outside world. Documentation goes a long way toward making an API more usable, including tutorials and how-to documents. Tools like Swagger provide a way to run requests against mock endpoints, which allows for experimentation and sandboxing of outside development efforts. Other things, like service discovery tools that allow a consumer to find services and interrogate them, also help APIs be better consumed. While we value the consumer, we also value keeping these systems as simple as possible to maintain.

Isolate Failure

The system will not be more stable and resilient simply because it runs as multiple services. The system must be built to be resilient and to self-recover or handle failure. With a distributed system, the surface area of failure is much wider; as a result, the system is more susceptible to failure and less reliable due to all of the “moving parts.” Resiliency does not come for free, but we can work to isolate failure and make the system more resilient.

Some failure modes we want to isolate and become resilient to:

  • Latency issues

  • Timeout issues

  • Error rates


As an example, we can build an orchestration layer that degrades functionality or isolates problems, allowing the functionality in other services to proceed uninhibited until the problem is resolved. Patterns like the “circuit breaker” help here and make a system more resilient: while some services may not be working, that failure will not cause the rest of the requests on the orchestration layer to fail by, for example, piling up threads until there is no CPU left to handle other requests. When we fail, we want to fail fast, and we want to isolate that failure so it does not affect performance or functionality in other working services. Libraries like Netflix’s Hystrix (as one simple example) solve many of these problems by implementing failure isolation patterns; partitioning resources per dependency in this way is called “bulkheading.”
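A minimal circuit-breaker sketch, to make the pattern concrete: after a few consecutive failures the breaker "opens" and fails fast instead of letting requests pile up; after a cool-down it lets one trial call through. The thresholds and timings are illustrative, and real libraries add much more.

```python
import time

class CircuitBreaker:
    """Toy circuit breaker: fail fast while a downstream dependency is down."""

    def __init__(self, max_failures=3, reset_after=30.0, clock=time.monotonic):
        self.max_failures = max_failures   # consecutive failures before opening
        self.reset_after = reset_after     # seconds before a trial call is allowed
        self.clock = clock                 # injectable for testing
        self.failures = 0
        self.opened_at = None

    def call(self, func, *args):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                raise RuntimeError("circuit open: failing fast")
            self.opened_at = None          # half-open: allow one trial call
        try:
            result = func(*args)
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()  # open the circuit
            raise
        self.failures = 0                  # success resets the failure count
        return result
```

The crucial property is the open state: callers get an immediate error instead of a slow timeout, so threads never pile up behind a dead dependency.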

Highly Observable

We must have a high degree of visibility that extends far beyond basic monitoring in order to understand what is happening in a distributed system. Visibility needs to happen at scale. The services and systems need to generate a great deal of data through logs, and we must have good aggregation and inspection capabilities over vast amounts of data. Additionally, the toolset used to inspect logs should also be able to alert based on patterns that are important to identify failures, performance issues, software errors, usage statistics, and even security incidents.


For a distributed system, you need to look at things as a whole instead of as isolated systems. You need to be able to see when a transaction starts and follow it all the way downstream, so you can trace that operation through multiple services. Additionally, it is helpful to see parent and child calling relationships so you can draw conclusions about how operations flow through the entire distributed system. This allows you to look at a request holistically, but also to dive down into a specific system to see a stack trace.
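The mechanism behind end-to-end tracing can be sketched simply: a correlation (trace) ID is minted once at the edge and propagated with every downstream call, so logs from all services can be stitched into one view. The header name and service names below are hypothetical.

```python
import uuid

LOGS = []  # stand-in for a centralized log aggregator

def log(service: str, trace_id: str, message: str) -> None:
    LOGS.append({"service": service, "trace_id": trace_id, "msg": message})

def payment_service(headers: dict) -> None:
    log("payments", headers["X-Trace-Id"], "charge captured")

def order_service(headers: dict) -> None:
    log("orders", headers["X-Trace-Id"], "order accepted")
    payment_service(headers)  # the trace ID travels with the request

def handle_edge_request() -> str:
    trace_id = str(uuid.uuid4())          # minted once, at the edge
    order_service({"X-Trace-Id": trace_id})
    return trace_id

tid = handle_edge_request()
# Filtering the aggregated logs by one trace ID reconstructs the whole path.
trail = [e["service"] for e in LOGS if e["trace_id"] == tid]
```

Production tracing systems add span IDs and parent/child links on top of this, which is what makes the calling-relationship view above possible.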


Dashboards are also a great tool for spotting abnormalities and trends, which allow you to be more proactive in spotting slow failures before they become larger problems.


Along with technology, data plays a dramatic role in an enterprise's ability to make business decisions, or to empower its users to make similar decisions with objective data sets. The following are principles that data systems should be designed around, in both Data Engineering and Data Science.

How we organize the business is important if we want to be successful with our data investment. When it comes down to the core, it’s not just about the technology; it’s about the people.

Define Your Data Strategy FIRST

A clearly defined data strategy is the starting point of any data project. It must cover how you handle your enterprise's existing data structures, how datasets are prioritized and combined, and the quality definitions that determine how much time is spent on data preparation. Without one, many organizations find themselves entrenched in impediments.

Your data strategy should embrace the CTO principles entirely, for all parts, including Team, Architecture, Data, etc. It should also be flexible, so it can be amended as data engineering and data science personnel work to continuously improve processes, systems, or the organization of their people.

Your data strategy should:

  • Avoid data structures where data is spread across multiple disparate systems across the organization, such as data warehouses, data lakes, databases, and file systems.

  • Avoid increased complexity due to equal priority and combination of both streaming datasets (IoT, social) and historical datasets for real-time analytics purposes.

  • Avoid too high demands on high-quality datasets, causing too much time being spent on data preparation tasks such as combining data, cleaning and verifying data, enriching data, labeling data, and so on.

  • Don't place too many limits on AI frameworks and other tools that allow data scientists to get the job done in their most proficient and fluent capability. But don't support so many that it impedes getting the job done, because governance becomes difficult or it creates a disjointed ecosystem that lacks the ability to secure sufficient capacity and relevant data feeds for model training.

  • Allow data scientists to choose their favorite languages to visualize data and train models; this is the only way enterprises can truly solve the talent gap, which makes data scientists productive in their existing skills. Support an ecosystem that allows you to hire as wide of an aperture as possible, from a talent perspective, and support the deployment of a wide range of tools that get the job done according to our principles. Data scientists should have the choice to use the right framework to solve the right problem. Not every tool is a hammer and not everything is a nail.

  • Be careful not to increase complexity in development and operations (DevOps) in terms of setting up and maintaining a big data infrastructure, managing upgrades, and fixes, and scaling the infrastructure with growing data volumes as well as in providing high-performance infrastructure for large teams of data scientists.

  • Be careful of the skills required to build and maintain a scalable infrastructure. A proper AI project usually needs scalable infrastructure, but don't build something so unique that it requires a set of skills so specialized you cannot sufficiently hire for it.

  • Build highly performant and reliable data pipelines. Speed up the process to explore, prepare, and ingest massive datasets for best-in-class AI applications. Simplify data management and easily connect data pipelines with machine learning (ML) to quickly fit the models to the data. The idea is to separate compute from storage for the best performance at lower costs.

  • Continuously train, track, and deploy AI models on big data faster, from experimentation to production. Ideally, prepackaged and ready-to-use ML frameworks should be available out of the box. Simplify model deployment and management to various platforms, so you can take advantage of hardware support for techniques such as deep learning.

  • Foster a collaborative environment for data scientists and data engineers to work effectively across the entire development-to-production life cycle. Providing a collaborative and interactive workspace makes it easier to automate and manage production pipelines, and build and train models, as well as visualize and share insights between key stakeholders.

  • Improve monitoring of all steps of software construction, from integration, testing, and releasing to deployment and infrastructure management. Reduce infrastructure complexity, particularly when delivered as a fully managed cloud service, which offers reliability, scale, security, and cost-efficiency.

  • Have sufficient ecosystem support for the most popular languages and tools, such as R, Python, Scala, Java, and SQL, as well as integrations with RStudio, DataRobot, Alteryx, Tableau, Power BI, and more, which allows practitioners to use their preferred toolkit. A good cross-team collaborative workspace should also foster teamwork between data engineers and data scientists via interactive notebooks, APIs, or their favorite Integrated Development Environments (IDEs), all backed with version control and change management support.

  • Allow practitioners to access all needed data in one place and automate the most complex data pipelines with job scheduling, monitoring, and workflows as notebooks or APIs. That access gives the teams full flexibility to run and maintain data pipelines and machine learning (ML) at scale.

  • Establish a data science lifecycle which consists of business understanding, data mining, data cleaning, data exploration, feature engineering, predictive modeling, and data visualization.

  • Establish an ML lifecycle, which consists of ML frameworks, ML lifecycle management, and distributed computing. Reproducibility, good management, and tracked experiments are necessary to make it easy to test others’ work and perform analysis. Create a good workflow that you can reproduce, revisit, and deploy to production. Establish how to record and query experiments, and how to package ML models so they are reproducible and can run on any platform. This includes reusing models locally or at scale in the cloud, as well as good iterative support for moving them between the different stages of their life cycle, from experimentation to production. Key features of a well-managed ML lifecycle environment include:

    • Prepackaged, ready-to-use ML frameworks out of the box

    • Reproducible runs using any ML library or framework, locally or in the cloud

    • Simplified management and tracking of experiments and results

    • Streamlined model deployment and management across platforms

    • Simpler distributed deep learning with built-in optimizations

    • Faster results with accelerated hardware support for deep learning

  • Decouple storage and compute to reduce data movements, which will increase agility exponentially.

  • Provision a Chief Data Officer role within the company, if your company is of sufficient size to fund the position. If not, work towards funding the position so data can be properly focused on. A company that intends to rely heavily on data must fully invest in it. A properly managed data-centric business MUST have dedicated leadership for all data initiatives, separate from regular software development activities. As challenging as software development is, data engineering and data science are even harder, especially across the business where we must unify data. A proper investment must be made.
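The "record and query experiments" idea above can be made concrete with a minimal experiment tracker: every run logs its parameters and metrics, so any result can be reproduced from its recorded inputs. Real tools (MLflow, for example) do far more; the run schema here is invented for illustration.

```python
class ExperimentTracker:
    """Toy experiment tracker: record runs, then query for the best one."""

    def __init__(self):
        self.runs = []

    def log_run(self, params: dict, metrics: dict) -> int:
        run_id = len(self.runs)
        self.runs.append({"id": run_id, "params": params, "metrics": metrics})
        return run_id

    def best_run(self, metric: str) -> dict:
        # Query: which recorded run maximizes the given metric?
        return max(self.runs, key=lambda r: r["metrics"][metric])

tracker = ExperimentTracker()
tracker.log_run({"lr": 0.1,  "depth": 3}, {"auc": 0.81})
tracker.log_run({"lr": 0.01, "depth": 5}, {"auc": 0.86})
best = tracker.best_run("auc")  # reproducible: its params are recorded with it
```

Because the winning run carries its own parameters, anyone can re-run it, which is the reproducibility property the lifecycle bullets above call for.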

Implementing a secure but enabling infrastructure is key to success with our AI investments. However, when approaching AI infrastructure work, many companies tend to forget to define a clear and aligned data strategy first. The impact of lacking a unified data strategy, which is communicated across the company and reflected in how the infrastructure is approached, shouldn’t be underestimated.

Unify the SILOS

Despite the appeal of artificial intelligence (AI), most enterprises struggle to succeed with AI. A common reason is that Data Engineering and Data Science are siloed from one another, from both a system and an organization perspective. Just as we have "Destroy Every Operational Silo" in our team principles, this problem is especially prevalent in the way organizations organize to solve problems in data.

Typically, enterprise data is itself siloed across many systems such as data warehouses, data lakes, databases, and file systems that aren’t AI-enabled.

This means a colossal amount of time is spent on preparing the data for more extensive analysis. It includes activities like cleaning data from duplicates, errors, missing fields and combining different data types into new groups for added comprehension. Other activities are needed, such as confirming that the data is actually accurate and enriching the data with additional attributes, giving it a context. All these activities and many more are aimed at getting the data ready to be part of a model that can execute an analysis.
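The preparation activities above (de-duplicating, filling missing fields, enriching with context) can be sketched in a few lines. The records and rules below are illustrative only.

```python
# Hypothetical raw feed: duplicates and missing fields, as described above.
raw = [
    {"id": 1, "country": "US", "revenue": 100},
    {"id": 1, "country": "US", "revenue": 100},   # duplicate record
    {"id": 2, "country": None, "revenue": 250},   # missing field
]

def prepare(records: list) -> list:
    seen, clean = set(), []
    for rec in records:
        if rec["id"] in seen:
            continue                                   # drop duplicates
        seen.add(rec["id"])
        rec = dict(rec, country=rec["country"] or "UNKNOWN")  # fill gaps
        # Enrich: derive a new attribute that gives the record context.
        rec["revenue_band"] = "high" if rec["revenue"] > 200 else "low"
        clean.append(rec)
    return clean

prepared = prepare(raw)
```

At enterprise scale these steps run in distributed pipelines rather than a loop, but the work — deduplicate, clean, enrich — is the same.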

Companies ordinarily have siloed teams for data engineering and data science: data engineers deal with large-scale data preparation and deploying models into production, while data scientists deal with AI, exploring data and building, training, and validating models. This organizational separation creates impedance and slows projects down, which then becomes an impediment to the highly iterative nature of AI projects.

We must:

  • Provide an integrated environment for data engineers to iteratively create and provide high-quality data sets to data scientists.

  • Provide collaboration capabilities, growing knowledge sharing among data scientists, and the ability to iteratively explore data, train, and fine-tune models as a team.

  • Limit complex procedures when deploying models into production, which lead to multiple handoffs between data scientists, data engineers, and developers, slowing down the process and increasing the risk of introducing errors.

Empowered data engineers are successful data engineers. We must invest time in defining an integrated and flexible data strategy.

Be Iterative, Be Collaborative

We are typically iterative in software development, so we need systems and teams organized in a way that supports our principle "Early and Often", even in data. The limited availability in the market of talent and skills in Data Science and Data Engineering to help solve AI makes it critical to improve internal efficiency, collaboration, knowledge sharing, and employee satisfaction in every company aiming to utilize the AI potential.

This is the only way to take an average team, and turn them into a business outcome performance powerhouse.

Preparing data for AI is a MAJOR bottleneck. After all the work of preparing data for the predictive models, data scientists often find that the results aren't good enough and have to go back and start the cycle again. Given the fundamental need for an iterative, "Early and Often" feedback loop in AI development, companies need to cycle through the life cycle of preparing data, training models, and deploying models into production very quickly.