Chapter 2: Core Concepts and Definitions

Learning Objectives

After completing this chapter, you will be able to:

Define key infrastructure and platform management terminology with precision
Distinguish between different infrastructure types and deployment models
Understand cloud service models (IaaS, PaaS, SaaS, FaaS, CaaS) and their implications
Explain Infrastructure as Code principles, benefits, and implementation approaches
Recognize the components of modern infrastructure stacks across compute, network, storage, and security
Apply consistent terminology throughout infrastructure discussions and documentation
Understand the shared responsibility model across different service models
Differentiate between traditional and cloud-native infrastructure approaches

Introduction

Effective communication about infrastructure requires a shared vocabulary. Misunderstandings about fundamental concepts lead to poor decisions, failed projects, and wasted resources. This chapter establishes the core concepts and definitions used throughout this handbook and in the broader infrastructure management discipline.

The terminology of infrastructure has evolved significantly over the past decade. Terms like “server” now encompass physical machines, virtual machines, containers, and serverless functions. “Storage” spans local disks, SANs, object storage, and distributed file systems. Understanding these concepts in their modern context is essential for architects, engineers, and managers who need to communicate clearly about infrastructure decisions, designs, and operations.

This chapter serves as both a learning resource for those new to modern infrastructure and a reference for experienced practitioners who need precise definitions. The concepts introduced here form the foundation for all subsequent chapters.

Infrastructure Types and Deployment Models

On-Premises Infrastructure

On-premises infrastructure refers to IT infrastructure owned, operated, and housed within an organization’s own facilities. This traditional model gives organizations complete control over their hardware, software, and data.

Characteristic	Description	Implications
Ownership	Organization owns all hardware and facilities	Full capital expenditure, asset management required
Control	Complete control over all aspects of infrastructure	Maximum flexibility, maximum responsibility
Capital Model	Capital expenditure (CapEx) for hardware acquisition	Large upfront investment, depreciation over time
Responsibility	Full stack responsibility from facilities to applications	Requires diverse skill sets, 24x7 operations capability
Flexibility	Limited by physical capacity and procurement cycles	Capacity planning critical, scaling takes time
Location	Physical data center owned or leased	Geographic constraints, physical security requirements

Advantages of On-Premises Infrastructure:

Advantage	Description	Best For
Complete control	Full authority over hardware, software, and configuration	Organizations with strict control requirements
Data sovereignty	Data remains within organizational boundaries	Compliance with data residency regulations
No provider dependency	No reliance on external cloud providers	Organizations concerned about vendor lock-in
Predictable costs	Fixed costs after initial investment (excluding maintenance)	Stable, predictable workloads
Customization	Ability to customize hardware and software to exact specifications	Specialized workloads with unique requirements
Network performance	Low latency between co-located on-premises systems and users	Latency-sensitive applications

Challenges of On-Premises Infrastructure:

Challenge	Description	Mitigation
High upfront investment	Significant capital required before deployment	Leasing options, phased deployment
Capacity planning	Must predict future needs accurately	Conservative planning, modular expansion
Hardware lifecycle	Regular refresh cycles required (typically 3-5 years)	Lifecycle management, technology roadmap
Facility requirements	Power, cooling, physical security, space	Colocation as alternative
Scaling limitations	Adding capacity requires procurement and installation	Hybrid cloud for burst capacity
Operational overhead	24x7 operations, maintenance, patching	Managed services, automation

Cloud Infrastructure

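Cloud infrastructure refers to IT resources delivered as services by third-party providers over the internet. It changes the economics of infrastructure by replacing upfront hardware purchases with usage-based charges, and it changes operations by replacing procurement and installation with on-demand provisioning through provider APIs.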

Characteristic	Description	Implications
Ownership	Provider owns and maintains all infrastructure	No hardware management, provider handles facilities
Control	Shared responsibility model based on service type	Less control, but less responsibility
Capital Model	Operational expenditure (OpEx), pay-as-you-go	Variable costs, potential for optimization
Responsibility	Varies by service model (IaaS, PaaS, SaaS)	Different skill requirements per model
Flexibility	Elastic capacity, on-demand provisioning	Scale up/down based on demand
Location	Global regions and availability zones	Geographic distribution, data residency options
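The difference between the CapEx and OpEx capital models can be made concrete with a small break-even calculation. The sketch below is illustrative only: the function names and all cost figures are hypothetical, and it ignores depreciation, discounts, and staffing costs.

```python
from typing import Optional


def cumulative_on_prem_cost(months: int, upfront: float, monthly_ops: float) -> float:
    """CapEx model: pay for hardware up front, then a fixed monthly running cost."""
    return upfront + monthly_ops * months


def cumulative_cloud_cost(months: int, monthly_usage: float) -> float:
    """OpEx model: no upfront purchase, only the monthly usage bill."""
    return monthly_usage * months


def break_even_month(upfront: float, monthly_ops: float, monthly_usage: float) -> Optional[int]:
    """First whole month in which cumulative cloud spend reaches the on-premises total.

    Returns None when the monthly cloud bill is not higher than the on-premises
    running cost, because the cloud total then never catches up.
    """
    if monthly_usage <= monthly_ops:
        return None
    month = 1
    while cumulative_cloud_cost(month, monthly_usage) < cumulative_on_prem_cost(
        month, upfront, monthly_ops
    ):
        month += 1
    return month


# Hypothetical figures, for illustration only.
UPFRONT = 120_000.0      # hardware purchase
MONTHLY_OPS = 2_000.0    # power, cooling, support contracts
MONTHLY_CLOUD = 6_000.0  # steady-state cloud bill for the same workload

print(break_even_month(UPFRONT, MONTHLY_OPS, MONTHLY_CLOUD))
```

With these assumed numbers the cloud option is cheaper in total until the break-even month and more expensive afterwards, which is why stable, long-lived workloads are often the ones kept on-premises.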

Cloud Deployment Models:

Model	Description	Characteristics	Use Cases
Public Cloud	Shared infrastructure, multi-tenant environment	Low entry cost, high scalability, shared resources	General workloads, variable demand, startups
Private Cloud	Dedicated infrastructure for single organization	More control, dedicated resources, higher cost	Compliance requirements, security-sensitive workloads
Hybrid Cloud	Combination of on-premises and public cloud	Flexibility, workload placement options	Mixed requirements, cloud migration journey
Multi-Cloud	Multiple public cloud providers	Avoid lock-in, best-of-breed services	Risk distribution, specialized services
Community Cloud	Shared infrastructure for specific community	Shared costs, common compliance requirements	Government, healthcare, research

Public Cloud Characteristics:

PUBLIC CLOUD MODEL

┌─────────────────────────────────────────────────────────────────────────────┐
│                         CLOUD PROVIDER INFRASTRUCTURE                        │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│   ┌────────────────────────────────────────────────────────────────────┐   │
│   │                        SHARED INFRASTRUCTURE                         │   │
│   │                                                                      │   │
│   │   ┌──────────┐    ┌──────────┐    ┌──────────┐    ┌──────────┐    │   │
│   │   │ Customer │    │ Customer │    │ Customer │    │ Customer │    │   │
│   │   │    A     │    │    B     │    │    C     │    │    D     │    │   │
│   │   │  (Your   │    │          │    │          │    │          │    │   │
│   │   │   Org)   │    │          │    │          │    │          │    │   │
│   │   └──────────┘    └──────────┘    └──────────┘    └──────────┘    │   │
│   │                                                                      │   │
│   │   ┌──────────────────────────────────────────────────────────────┐ │   │
│   │   │              HYPERVISOR / CONTAINER RUNTIME                   │ │   │
│   │   └──────────────────────────────────────────────────────────────┘ │   │
│   │                                                                      │   │
│   │   ┌──────────────────────────────────────────────────────────────┐ │   │
│   │   │           PHYSICAL SERVERS, STORAGE, NETWORK                  │ │   │
│   │   └──────────────────────────────────────────────────────────────┘ │   │
│   │                                                                      │   │
│   └────────────────────────────────────────────────────────────────────┘   │
│                                                                              │
│   PROVIDER RESPONSIBILITIES:                                                 │
│   • Physical security        • Hardware maintenance    • Network backbone   │
│   • Power and cooling        • Hypervisor security     • Service APIs       │
│                                                                              │
└─────────────────────────────────────────────────────────────────────────────┘

Hybrid Infrastructure

Hybrid infrastructure combines on-premises and cloud resources, enabling organizations to place workloads in the optimal location based on requirements.

HYBRID INFRASTRUCTURE ARCHITECTURE

┌─────────────────────────────────────────────────────────────────────────────┐
│                         HYBRID INFRASTRUCTURE                                │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│  ┌─────────────────────────────┐       ┌─────────────────────────────┐     │
│  │       ON-PREMISES           │       │        PUBLIC CLOUD          │     │
│  │                             │       │                              │     │
│  │  ┌─────────────────────┐   │       │   ┌─────────────────────┐   │     │
│  │  │  Core Systems       │   │       │   │  Cloud-Native Apps  │   │     │
│  │  │  • ERP              │   │       │   │  • Web applications │   │     │
│  │  │  • Core databases   │   │       │   │  • Mobile backends  │   │     │
│  │  │  • Legacy apps      │   │       │   │  • APIs             │   │     │
│  │  └─────────────────────┘   │       │   └─────────────────────┘   │     │
│  │                             │       │                              │     │
│  │  ┌─────────────────────┐   │       │   ┌─────────────────────┐   │     │
│  │  │  Sensitive Data     │   │       │   │  Dev/Test           │   │     │
│  │  │  • PII              │   │       │   │  • Development      │   │     │
│  │  │  • Financial        │   │       │   │  • Testing          │   │     │
│  │  │  • Regulated        │   │       │   │  • Staging          │   │     │
│  │  └─────────────────────┘   │       │   └─────────────────────┘   │     │
│  │                             │       │                              │     │
│  │  ┌─────────────────────┐   │       │   ┌─────────────────────┐   │     │
│  │  │  Compliance         │   │       │   │  Burst Capacity     │   │     │
│  │  │  Workloads          │   │       │   │  • Peak loads       │   │     │
│  │  │                     │   │       │   │  • Seasonal demand  │   │     │
│  │  └─────────────────────┘   │       │   └─────────────────────┘   │     │
│  │                             │       │                              │     │
│  └──────────────┬──────────────┘       └──────────────┬──────────────┘     │
│                 │                                      │                    │
│                 │     HYBRID CONNECTIVITY              │                    │
│                 │                                      │                    │
│                 │  ┌────────────────────────────────┐ │                    │
│                 └──┤  VPN / Direct Connect /        ├─┘                    │
│                    │  ExpressRoute / Interconnect   │                      │
│                    └────────────────────────────────┘                      │
│                                                                              │
│  UNIFIED MANAGEMENT:                                                        │
│  • Single pane of glass    • Consistent policies    • Integrated security  │
│  • Unified monitoring      • Common automation      • Shared identity      │
│                                                                              │
└─────────────────────────────────────────────────────────────────────────────┘

Hybrid Cloud Connectivity Options:

Connection Type	Description	Bandwidth	Latency	Cost	Use Case
Site-to-Site VPN	Encrypted tunnel over internet	Up to 1 Gbps	Variable	Low	Development, non-critical
Direct Connect (AWS)	Dedicated private connection	1-100 Gbps	Low, consistent	High	Production workloads
ExpressRoute (Azure)	Dedicated private connection	50 Mbps-10 Gbps	Low, consistent	High	Production workloads
Cloud Interconnect (GCP)	Dedicated private connection	10-100 Gbps	Low, consistent	High	Production workloads
SD-WAN	Software-defined WAN	Variable	Optimized	Medium	Distributed offices

Multi-Cloud Strategy

Multi-cloud refers to the use of multiple cloud providers, either for different workloads or for redundancy.

Strategy	Description	Benefits	Challenges
Best of Breed	Use best service from each provider	Optimal capabilities	Management complexity
Risk Distribution	Spread workloads across providers	Reduced provider dependency	Cost, complexity
Geographic Coverage	Use providers with specific regional presence	Global reach	Data synchronization
Cost Optimization	Leverage competitive pricing	Cost savings	Comparison complexity
Compliance	Meet specific regulatory requirements	Compliance achievement	Policy consistency

Multi-Cloud Architecture:

MULTI-CLOUD ARCHITECTURE

┌─────────────────────────────────────────────────────────────────────────────┐
│                           MULTI-CLOUD STRATEGY                               │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│  ┌──────────────────┐  ┌──────────────────┐  ┌──────────────────┐          │
│  │       AWS        │  │      AZURE       │  │       GCP        │          │
│  │                  │  │                  │  │                  │          │
│  │  • EC2/EKS       │  │  • AKS           │  │  • BigQuery      │          │
│  │  • S3            │  │  • Azure AD      │  │  • ML/AI         │          │
│  │  • Lambda        │  │  • Office 365    │  │  • Anthos        │          │
│  │  • RDS           │  │  • Cosmos DB     │  │  • Cloud Run     │          │
│  │                  │  │                  │  │                  │          │
│  └────────┬─────────┘  └────────┬─────────┘  └────────┬─────────┘          │
│           │                     │                     │                     │
│           └─────────────────────┼─────────────────────┘                     │
│                                 │                                           │
│                    ┌────────────┴────────────┐                              │
│                    │   MULTI-CLOUD LAYER     │                              │
│                    │                         │                              │
│                    │  • Terraform (IaC)      │                              │
│                    │  • Kubernetes (K8s)     │                              │
│                    │  • Service Mesh         │                              │
│                    │  • Identity Federation  │                              │
│                    │  • Observability        │                              │
│                    │  • Cost Management      │                              │
│                    └─────────────────────────┘                              │
│                                                                              │
└─────────────────────────────────────────────────────────────────────────────┘

Cloud Service Models

Infrastructure as a Service (IaaS)

IaaS provides virtualized computing resources over the internet. Users manage operating systems, applications, and data while the provider manages physical infrastructure.

Aspect	Description	Examples
What You Get	Virtual machines, storage volumes, virtual networks	EC2, Azure VMs, GCE
What You Manage	OS, middleware, runtime, applications, data	Full stack from OS up
Provider Manages	Hardware, hypervisor, physical network, facilities	Physical infrastructure
Pricing Model	Per hour/second for compute, per GB for storage	On-demand, reserved, spot

IaaS Characteristics:

Characteristic	Description	Consideration
Maximum Control	Full control over OS and everything above	Requires more operational capability
Maximum Flexibility	Any OS, any software, any configuration	More decisions to make
Operating System Access	Full root/admin access	Security responsibility
Patching Responsibility	Customer responsible for OS and application patching	Operational overhead
Scaling	Manual or auto-scaling of VM instances	Must configure scaling
Cost Model	Pay for allocated resources, even if underutilized	Right-sizing important

IaaS Use Cases:

Use Case	Description	Why IaaS
Lift and shift migration	Moving existing applications to cloud	Minimal application changes
Development and testing	Creating dev/test environments	Quick provisioning, cost control
High-performance computing	Scientific computing, rendering	Access to powerful hardware
Big data analysis	Processing large datasets	Scalable compute resources
Web hosting	Running web servers	Full control over configuration
Disaster recovery	DR site in cloud	Cost-effective standby

Platform as a Service (PaaS)

PaaS provides a platform for developing, running, and managing applications without the complexity of managing underlying infrastructure.

Aspect	Description	Examples
What You Get	Runtime environment, development tools, middleware	App Service, Elastic Beanstalk, Heroku
What You Manage	Application code, data	Focus on application logic
Provider Manages	Infrastructure, OS, runtime, middleware	Everything below application
Pricing Model	Per application, per instance, per request	Usage-based

PaaS Characteristics:

Characteristic	Description	Consideration
Reduced Complexity	No OS management, automatic patching	Less operational overhead
Faster Development	Focus on code, not infrastructure	Accelerated development
Limited Control	Constrained to platform capabilities	May not fit all applications
Vendor Lock-in	Applications may become platform-specific	Portability concerns
Built-in Scaling	Automatic scaling based on demand	Less configuration required
Cost Efficiency	Pay for actual usage	Better for variable workloads

PaaS Use Cases:

Use Case	Description	Why PaaS
Web applications	Modern web apps	Rapid deployment, auto-scaling
API development	Building and hosting APIs	Managed infrastructure
Microservices	Deploying microservice architectures	Container orchestration included
Mobile backends	Backend services for mobile apps	Pre-built services
DevOps automation	CI/CD pipelines	Integrated tooling

Software as a Service (SaaS)

SaaS delivers complete applications over the internet, fully managed by the provider.

Aspect	Description	Examples
What You Get	Complete application, accessible via browser/API	Office 365, Salesforce, Workday
What You Manage	User configuration, data	Minimal technical management
Provider Manages	Everything: infrastructure, application, updates	Full stack responsibility
Pricing Model	Per user, per month, per feature tier	Subscription-based

Containers as a Service (CaaS)

CaaS provides container orchestration and management as a service.

Aspect	Description	Examples
What You Get	Managed container orchestration platform	EKS, AKS, GKE
What You Manage	Container images, deployments, configurations	Application containers
Provider Manages	Control plane; node infrastructure and networking to a degree that varies by offering	Kubernetes infrastructure
Pricing Model	Per cluster, per node, per request	Varies by provider

Functions as a Service (FaaS) / Serverless

FaaS provides event-driven compute without server management.

Aspect	Description	Examples
What You Get	Event-triggered function execution	Lambda, Azure Functions, Cloud Functions
What You Manage	Function code, triggers	Just the code
Provider Manages	Everything else: scaling, infrastructure	Full infrastructure abstraction
Pricing Model	Per invocation, per duration, per memory	Pay only for execution
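The pricing dimensions in the table (invocations, duration, memory) combine into a simple formula. The sketch below shows the arithmetic only. The function name and both unit prices are hypothetical placeholders, not any provider's published rates, and free tiers are ignored.

```python
def monthly_faas_cost(
    invocations: int,
    avg_duration_seconds: float,
    memory_gb: float,
    price_per_million_requests: float,
    price_per_gb_second: float,
) -> float:
    """Estimate a monthly FaaS bill from request count, duration, and memory size."""
    # Charge for the number of invocations.
    request_cost = invocations / 1_000_000 * price_per_million_requests
    # Charge for compute, measured as memory allocated multiplied by run time.
    gb_seconds = invocations * avg_duration_seconds * memory_gb
    compute_cost = gb_seconds * price_per_gb_second
    return request_cost + compute_cost


# Hypothetical workload and prices, for illustration only.
cost = monthly_faas_cost(
    invocations=1_000_000,
    avg_duration_seconds=0.2,
    memory_gb=0.5,
    price_per_million_requests=0.20,
    price_per_gb_second=0.0000167,
)
print(f"Estimated monthly cost: {cost:.2f}")
```

Because both terms scale with the number of invocations, an idle function costs nothing under this model, while a constantly busy one may cost more than an always-on instance.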

FaaS Characteristics:

Characteristic	Description	Consideration
No Server Management	Zero infrastructure to manage	Maximum abstraction
Automatic Scaling	Scales from zero to thousands of concurrent executions, subject to provider limits	Built-in elasticity
Pay Per Execution	Only pay when code runs	Cost-effective for variable loads
Cold Start	Added latency when a new function instance has to be started	Not suitable for all workloads
Time Limits	Maximum execution duration (for example, 15 minutes on AWS Lambda)	Not for long-running processes
Stateless	No persistent local state	Must use external storage
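The stateless characteristic shapes how function code is written: anything that must survive between invocations goes to an external store. The sketch below is a provider-neutral illustration. The `handle_event` name, the event shape, and the dictionary standing in for an external key-value service are all assumptions, not a specific platform's API.

```python
from typing import Any, Dict, MutableMapping


def handle_event(event: Dict[str, Any], store: MutableMapping[str, int]) -> Dict[str, Any]:
    """Count visits per user without keeping any state inside the function.

    `store` represents an external service (for example a managed key-value
    database). The function reads and writes through it on every invocation,
    so any function instance can serve any request.
    """
    user_id = event["user_id"]
    visits = store.get(user_id, 0) + 1
    store[user_id] = visits
    return {"status": 200, "body": {"user_id": user_id, "visits": visits}}


# A plain dict stands in for the external store in this sketch.
external_store: Dict[str, int] = {}
print(handle_event({"user_id": "alice"}, external_store))
print(handle_event({"user_id": "alice"}, external_store))
```

Passing the store in as a parameter also makes the handler easy to test locally, since a dictionary can replace the real service.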

Database as a Service (DBaaS)

DBaaS provides managed database services.

Aspect	Description	Examples
What You Get	Managed database engine	RDS, Azure SQL, Cloud SQL, MongoDB Atlas
What You Manage	Schema, queries, data, some configuration	Database design and usage
Provider Manages	Hardware, patching, backups, HA, scaling	Database operations
Pricing Model	Per instance, per storage, per IOPS	Resource-based

Service Model Comparison

SERVICE MODEL RESPONSIBILITY COMPARISON

┌─────────────────────────────────────────────────────────────────────────────┐
│                    SHARED RESPONSIBILITY MODEL                               │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│           ON-PREM      IaaS        PaaS        CaaS       SaaS              │
│         ┌────────┐  ┌────────┐  ┌────────┐  ┌────────┐  ┌────────┐         │
│         │  YOU   │  │  YOU   │  │  YOU   │  │  YOU   │  │  YOU   │         │
│         │ MANAGE │  │ MANAGE │  │ MANAGE │  │ MANAGE │  │ MANAGE │         │
│         ├────────┤  ├────────┤  ├────────┤  ├────────┤  ├────────┤         │
│ DATA    │████████│  │████████│  │████████│  │████████│  │████████│         │
│         ├────────┤  ├────────┤  ├────────┤  ├────────┤  ├────────┤         │
│ APP     │████████│  │████████│  │████████│  │████████│  │        │         │
│         ├────────┤  ├────────┤  ├────────┤  ├────────┤  │PROVIDER│         │
│ RUNTIME │████████│  │████████│  │        │  │        │  │ MANAGES│         │
│         ├────────┤  ├────────┤  │PROVIDER│  │PROVIDER│  │        │         │
│ O/S     │████████│  │████████│  │ MANAGES│  │ MANAGES│  │        │         │
│         ├────────┤  ├────────┤  │        │  │        │  │        │         │
│ VIRTUAL │████████│  │        │  │        │  │        │  │        │         │
│         ├────────┤  │PROVIDER│  │        │  │        │  │        │         │
│ SERVER  │████████│  │ MANAGES│  │        │  │        │  │        │         │
│         ├────────┤  │        │  │        │  │        │  │        │         │
│ STORAGE │████████│  │        │  │        │  │        │  │        │         │
│         ├────────┤  │        │  │        │  │        │  │        │         │
│ NETWORK │████████│  │        │  │        │  │        │  │        │         │
│         └────────┘  └────────┘  └────────┘  └────────┘  └────────┘         │
│                                                                              │
│  █ = Customer Responsibility     (blank) = Provider Responsibility          │
│                                                                              │
└─────────────────────────────────────────────────────────────────────────────┘
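The diagram above can also be expressed as data, which is convenient when responsibility assignments need to be checked in tooling or documentation. This is a minimal sketch that mirrors the diagram's columns exactly; the layer keys and function names are chosen here for illustration.

```python
from typing import Dict, List

# Layers from the top of the stack to the bottom, as drawn in the diagram.
LAYERS: List[str] = [
    "data", "app", "runtime", "os", "virtual", "server", "storage", "network",
]

# Number of layers, counted from the top, that the customer manages.
CUSTOMER_DEPTH: Dict[str, int] = {
    "on-prem": 8,
    "iaas": 4,
    "paas": 2,
    "caas": 2,
    "saas": 1,
}


def customer_layers(model: str) -> List[str]:
    """Layers the customer is responsible for under the given service model."""
    return LAYERS[: CUSTOMER_DEPTH[model.lower()]]


def provider_layers(model: str) -> List[str]:
    """Layers the provider is responsible for under the given service model."""
    return LAYERS[CUSTOMER_DEPTH[model.lower()]:]


def who_manages(model: str, layer: str) -> str:
    """Return 'customer' or 'provider' for one layer of one service model."""
    return "customer" if layer in customer_layers(model) else "provider"


for model in CUSTOMER_DEPTH:
    print(f"{model:8s} customer manages: {', '.join(customer_layers(model))}")
```

Note that data remains a customer responsibility in every column, including SaaS.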

Infrastructure as Code (IaC)

Definition and Principles

Infrastructure as Code (IaC) is the practice of managing and provisioning infrastructure through machine-readable definition files rather than manual processes or interactive configuration tools.

Formal Definition: Infrastructure as Code is an approach to infrastructure automation that applies software engineering practices—including version control, code review, testing, and continuous integration—to infrastructure management.

Core Principles of IaC

Principle	Description	Implementation	Benefit
Declarative Definition	Define desired end state, not procedural steps	Terraform HCL, CloudFormation YAML	System figures out how to achieve state
Version Control	All infrastructure code stored in Git	GitHub, GitLab, Bitbucket	History, collaboration, rollback
Idempotency	Applying code multiple times produces same result	Terraform plan/apply	Safe to re-run, predictable
Modularity	Reusable, composable components	Terraform modules, CloudFormation nested stacks	DRY principle, consistency
Immutability	Replace rather than modify in place	Blue-green deployments, AMI-based	Reduced drift, predictable state
Testing	Validate infrastructure before deployment	Terratest, Kitchen-Terraform	Catch errors early
Documentation as Code	Code serves as documentation	Self-documenting infrastructure	Always current documentation
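The declarative and idempotency principles in the table can be illustrated with a toy reconciliation loop. This is a simplified sketch of the general idea, not how Terraform or any specific tool is implemented; the resource names and attributes are hypothetical.

```python
from typing import Dict, List, Tuple

Resources = Dict[str, Dict[str, str]]  # resource name -> attributes
Action = Tuple[str, str]               # (verb, resource name)


def plan(desired: Resources, actual: Resources) -> List[Action]:
    """Compare desired state with actual state and list the changes needed."""
    actions: List[Action] = []
    for name, spec in desired.items():
        if name not in actual:
            actions.append(("create", name))
        elif actual[name] != spec:
            actions.append(("update", name))
    for name in actual:
        if name not in desired:
            actions.append(("delete", name))
    return actions


def apply_plan(desired: Resources, actual: Resources) -> Resources:
    """Return the state that results from executing the plan."""
    new_state = {name: dict(spec) for name, spec in actual.items()}
    for verb, name in plan(desired, actual):
        if verb == "delete":
            del new_state[name]
        else:  # create and update both converge on the desired attributes
            new_state[name] = dict(desired[name])
    return new_state


desired = {"web": {"size": "t3.medium"}, "db": {"size": "m5.large"}}
actual = {"web": {"size": "t3.small"}, "old": {"size": "t2.micro"}}

print(plan(desired, actual))
state = apply_plan(desired, actual)
# Idempotency: once the states match, a second run has nothing to do.
print(plan(desired, state))
```

The user only describes `desired`; the steps are derived by comparing it with what exists. Running the same definition again produces an empty plan, which is what makes re-runs safe.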

Declarative vs. Imperative Approaches

Approach	Description	Example	Best For
Declarative	Define desired state, system determines how	Terraform, CloudFormation	Most infrastructure provisioning
Imperative	Define exact steps to execute	Ansible playbooks, shell scripts	Configuration, orchestration
Hybrid	Declarative structure with imperative elements	Pulumi, CDK	Complex logic requirements

Declarative Example (Terraform):

# Terraform declares WHAT should exist
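resource "aws_instance" "web_server" {
  # Example AMI ID. AMI IDs are region-specific; substitute one valid in your region.
  ami           = "ami-0c55b159cbfafe1f0"
  instance_type = "t3.medium"

  # Attach the security group defined below. Without this line the instance
  # falls back to the VPC's default security group and the HTTPS rule is unused.
  vpc_security_group_ids = [aws_security_group.web.id]

  tags = {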
    Name        = "web-server-prod"
    Environment = "production"
    ManagedBy   = "terraform"
  }
}

resource "aws_security_group" "web" {
  name        = "web-sg"
  description = "Security group for web servers"

  ingress {
    from_port   = 443
    to_port     = 443
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }

  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }
}

Imperative Example (Ansible):

# Ansible runs the tasks below in order (HOW to configure); each module call is itself idempotent
- name: Configure web server
  hosts: web_servers
  become: true

  tasks:
    - name: Install nginx
      ansible.builtin.apt:
        name: nginx
        state: present
        update_cache: true

    - name: Copy configuration file
      ansible.builtin.template:
        src: nginx.conf.j2
        dest: /etc/nginx/nginx.conf
      notify: Restart nginx

    - name: Ensure nginx is running
      ansible.builtin.service:
        name: nginx
        state: started
        enabled: true

  handlers:
    - name: Restart nginx
      ansible.builtin.service:
        name: nginx
        state: restarted

IaC Benefits

Benefit	Description	Typical Impact
Consistency	Identical infrastructure across environments	Near-zero configuration drift when all changes go through code
Speed	Provision infrastructure in minutes	Indicatively 100x faster than manual; varies by environment
Repeatability	Recreate infrastructure reliably	Disaster recovery capability
Auditability	Git history captures all changes	Complete change history
Collaboration	Teams can review and contribute	Reduced errors through review
Disaster Recovery	Rebuild infrastructure from code	Hours instead of days
Cost Reduction	Eliminate manual errors, reduce rework	Indicatively 40-60% operational savings; varies by organization
Compliance	Enforce standards through code	Automated compliance checks

IaC Tool Categories

Category	Purpose	Tools	When to Use
Provisioning	Create cloud resources	Terraform, CloudFormation, Pulumi, CDK	Creating infrastructure
Configuration Management	Configure systems	Ansible, Puppet, Chef, Salt	Configuring servers
Container Orchestration	Manage containers	Kubernetes, Docker Swarm, Nomad	Running containerized apps
Image Building	Create machine images	Packer, EC2 Image Builder	Creating golden images
Secret Management	Manage secrets	Vault, AWS Secrets Manager	Handling credentials

IaC Tool Comparison

Tool	Type	Language	State Management	Cloud Support	Learning Curve
Terraform	Provisioning	HCL	Local or remote state	Multi-cloud	Medium
CloudFormation	Provisioning	YAML/JSON	AWS managed	AWS only	Medium
Pulumi	Provisioning	Python/TS/Go	Remote state	Multi-cloud	Medium-High
CDK	Provisioning	Python/TS	CloudFormation	AWS primary	Medium-High
Ansible	Config Mgmt	YAML	Stateless	Multi-cloud	Low
Puppet	Config Mgmt	Puppet DSL	Puppet server	Multi-cloud	High
Chef	Config Mgmt	Ruby	Chef server	Multi-cloud	High

IaC Workflow

IaC DEVELOPMENT AND DEPLOYMENT WORKFLOW

┌─────────────────────────────────────────────────────────────────────────────┐
│                        INFRASTRUCTURE AS CODE WORKFLOW                       │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│  DEVELOPMENT PHASE                                                           │
│  ┌───────────────────────────────────────────────────────────────────────┐ │
│  │                                                                        │ │
│  │  ┌────────┐    ┌────────┐    ┌────────┐    ┌────────┐               │ │
│  │  │ Write  │───►│ Lint   │───►│ Format │───►│ Local  │               │ │
│  │  │  Code  │    │        │    │        │    │  Test  │               │ │
│  │  └────────┘    └────────┘    └────────┘    └────────┘               │ │
│  │      │                                          │                     │ │
│  │      │  IDE/Editor                              │  terraform plan     │ │
│  │      │  tflint, checkov                         │  terratest          │ │
│  │      │  terraform fmt                           │  kitchen-terraform  │ │
│  │                                                                        │ │
│  └───────────────────────────────────────────────────────────────────────┘ │
│                                    │                                        │
│                                    ▼                                        │
│  VERSION CONTROL PHASE                                                      │
│  ┌───────────────────────────────────────────────────────────────────────┐ │
│  │                                                                        │ │
│  │  ┌────────┐    ┌────────┐    ┌────────┐    ┌────────┐               │ │
│  │  │ Commit │───►│  Push  │───►│ Pull   │───►│ Review │               │ │
│  │  │        │    │        │    │ Request│    │        │               │ │
│  │  └────────┘    └────────┘    └────────┘    └────────┘               │ │
│  │      │                            │              │                    │ │
│  │      │  git commit                │  CI triggers │  Peer review       │ │
│  │      │  Conventional commits      │  PR created  │  Security review   │ │
│  │                                                                        │ │
│  └───────────────────────────────────────────────────────────────────────┘ │
│                                    │                                        │
│                                    ▼                                        │
│  CI/CD PHASE                                                                │
│  ┌───────────────────────────────────────────────────────────────────────┐ │
│  │                                                                        │ │
│  │  ┌────────┐    ┌────────┐    ┌────────┐    ┌────────┐               │ │
│  │  │Validate│───►│Security│───►│ Plan   │───►│ Apply  │               │ │
│  │  │        │    │  Scan  │    │        │    │        │               │ │
│  │  └────────┘    └────────┘    └────────┘    └────────┘               │ │
│  │      │              │              │              │                   │ │
│  │      │  terraform   │  checkov    │  terraform   │  terraform        │ │
│  │      │  validate    │  tfsec      │  plan        │  apply            │ │
│  │      │              │  Snyk       │  (saved)     │  (auto/manual)    │ │
│  │                                                                        │ │
│  └───────────────────────────────────────────────────────────────────────┘ │
│                                    │                                        │
│                                    ▼                                        │
│  POST-DEPLOYMENT                                                            │
│  ┌───────────────────────────────────────────────────────────────────────┐ │
│  │                                                                        │ │
│  │  ┌────────┐    ┌────────┐    ┌────────┐                              │ │
│  │  │ Verify │───►│Document│───►│Monitor │                              │ │
│  │  │        │    │        │    │        │                              │ │
│  │  └────────┘    └────────┘    └────────┘                              │ │
│  │      │              │              │                                  │ │
│  │      │  Smoke tests │  Update CMDB │  Drift detection                │ │
│  │      │  Health check│  Release note│  Cost monitoring                │ │
│  │                                                                        │ │
│  └───────────────────────────────────────────────────────────────────────┘ │
│                                                                              │
└─────────────────────────────────────────────────────────────────────────────┘
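
The CI/CD phase in the diagram can be sketched as a small stage runner. This is a minimal illustration, not a prescribed pipeline: the plan file name tfplan, the scan target ".", and the choice of checkov as the only scanner are assumptions made for the example. The runner is injectable so the stage ordering can be exercised without the tools installed.

```python
import subprocess
from typing import Callable, List, Sequence, Tuple

# Stage names and commands mirror the CI/CD phase in the diagram.
# "tfplan" and "." are illustrative choices, not values from the handbook.
STAGES: List[Tuple[str, List[str]]] = [
    ("validate", ["terraform", "validate"]),
    ("security-scan", ["checkov", "-d", "."]),
    ("plan", ["terraform", "plan", "-out=tfplan"]),
    ("apply", ["terraform", "apply", "tfplan"]),
]


def run_command(cmd: Sequence[str]) -> int:
    """Run one command and return its exit code."""
    return subprocess.run(list(cmd)).returncode


def run_pipeline(runner: Callable[[Sequence[str]], int] = run_command) -> List[str]:
    """Run stages in order and stop at the first non-zero exit code.

    Returns the names of the stages that completed successfully.
    """
    completed: List[str] = []
    for name, cmd in STAGES:
        if runner(cmd) != 0:
            break
        completed.append(name)
    return completed
```

Saving the plan and applying that exact file means the apply stage executes what was reviewed rather than a freshly computed plan. A failed scan stops the run before any plan is produced.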

State Management

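IaC tools that track state, such as Terraform, record which real resources they manage. Where that state is stored, who can read it, and how concurrent changes are prevented all need careful design.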

State Aspect	Description	Best Practice
Remote State	Store state in shared location	S3 + DynamoDB, Azure Blob, GCS
State Locking	Prevent concurrent modifications	Enable locking mechanism
State Encryption	Protect sensitive data in state	Enable encryption at rest
State Backup	Keep recoverable copies of earlier state versions	Versioning on state bucket
State Separation	Separate state per environment	Different state files per env
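
To make the idea of state locking concrete, the following sketch uses an exclusive lock file on a local filesystem. It is an illustration of the principle only: real backends implement locking in their own way, and the names state_lock and StateLockedError are hypothetical.

```python
import os
from contextlib import contextmanager


class StateLockedError(RuntimeError):
    """Raised when another run already holds the lock (hypothetical name)."""


@contextmanager
def state_lock(lock_path: str, owner: str):
    """Hold an exclusive lock for the duration of the with-block."""
    try:
        # O_EXCL makes creation fail if the lock file already exists,
        # so only one process can acquire the lock at a time.
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        raise StateLockedError(f"state is locked: {lock_path}") from None
    with os.fdopen(fd, "w") as handle:
        handle.write(owner)  # record who holds the lock, for troubleshooting
    try:
        yield
    finally:
        os.remove(lock_path)  # release the lock even if the block raised
```

A second run that attempts to acquire the same lock fails immediately instead of writing to state concurrently, which is the behavior the State Locking row asks for.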

Compute Concepts

Physical Servers

Physical servers are dedicated hardware machines. When workloads run directly on the hardware without a hypervisor, the server is referred to as bare metal.

Term	Definition	Use Case
Bare Metal	Physical server without hypervisor	High-performance, licensing, specific hardware
Rack Server	Server designed for 19-inch rack mounting	Data center deployments
Blade Server	Compact server in blade chassis	High-density environments
Tower Server	Standalone server unit	Small offices, edge locations
Hyperconverged Infrastructure (HCI)	Integrated compute, storage, networking	Simplified management

Virtual Machines

Virtual machines (VMs) are software-based computers running on a hypervisor that abstracts physical hardware.

Term	Definition	Examples
Hypervisor	Software layer enabling virtualization	VMware ESXi, Hyper-V, KVM, Xen
Type 1 Hypervisor	Runs directly on hardware (bare metal)	ESXi, Hyper-V, Xen
Type 2 Hypervisor	Runs on host operating system	VirtualBox, VMware Workstation
Guest OS	Operating system running inside VM	Windows, Linux within VM
Host OS	Operating system on the physical machine that hosts a Type 2 hypervisor	Windows, Linux, or macOS running VirtualBox
vCPU	Virtual CPU allocated to VM	2 vCPU, 4 vCPU
VM Template	Pre-configured VM image for cloning	Golden images
Snapshot	Point-in-time capture of VM state	Backup before changes
Live Migration	Moving a running VM between hosts without shutting it down	vMotion (VMware), Live Migration (Hyper-V)

Containers

Containers are lightweight, portable units that package applications with their dependencies, sharing the host OS kernel.

Term	Definition	Examples
Container Image	Packaged application with all dependencies	Docker image, OCI image
Container Runtime	Software that runs containers	Docker, containerd, CRI-O
Container Registry	Repository for storing container images	Docker Hub, ECR, ACR, GCR
Dockerfile	Instructions for building container image	Build recipe
Layer	Read-only filesystem layer in image	Image composition
Container	Running instance of container image	Isolated process

Container vs VM Comparison:

Aspect	Container	Virtual Machine
Isolation	Process-level (shared kernel)	Hardware-level (separate kernel)
Size	Typically tens to hundreds of MB	Typically several GB
Startup Time	Seconds	Minutes
Resource Overhead	Minimal	Significant (OS per VM)
Density	Hundreds per host	Tens per host
Portability	Very high (image-based)	Medium (hypervisor-dependent)
Security Isolation	Weaker (shared kernel)	Stronger (separate kernel)

CONTAINER VS VIRTUAL MACHINE ARCHITECTURE

┌─────────────────────────────────────────────────────────────────────────────┐
│                    CONTAINER VS VM ARCHITECTURE                              │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│  CONTAINERS                              VIRTUAL MACHINES                    │
│  ┌─────────────────────────────┐        ┌─────────────────────────────┐    │
│  │┌─────┐ ┌─────┐ ┌─────┐     │        │┌─────────┐ ┌─────────┐      │    │
│  ││App A│ │App B│ │App C│     │        ││  App A  │ │  App B  │      │    │
│  │├─────┤ ├─────┤ ├─────┤     │        │├─────────┤ ├─────────┤      │    │
│  ││Bins/│ │Bins/│ │Bins/│     │        ││ Bins/   │ │ Bins/   │      │    │
│  ││Libs │ │Libs │ │Libs │     │        ││ Libs    │ │ Libs    │      │    │
│  │└─────┘ └─────┘ └─────┘     │        │├─────────┤ ├─────────┤      │    │
│  │                            │        ││Guest OS │ │Guest OS │      │    │
│  │┌──────────────────────────┐│        │└─────────┘ └─────────┘      │    │
│  ││    Container Runtime     ││        │┌───────────────────────────┐│    │
│  │└──────────────────────────┘│        ││       HYPERVISOR          ││    │
│  │┌──────────────────────────┐│        │└───────────────────────────┘│    │
│  ││     Host Operating       ││        │┌───────────────────────────┐│    │
│  ││        System            ││        ││   Host Operating System   ││    │
│  │└──────────────────────────┘│        │└───────────────────────────┘│    │
│  │┌──────────────────────────┐│        │┌───────────────────────────┐│    │
│  ││       HARDWARE           ││        ││       HARDWARE            ││    │
│  │└──────────────────────────┘│        │└───────────────────────────┘│    │
│  └─────────────────────────────┘        └─────────────────────────────┘    │
│                                                                              │
│  PROS:                                   PROS:                               │
│  • Fast startup (seconds)               • Strong isolation                  │
│  • Lightweight (MBs)                    • Any OS on any host                │
│  • High density                         • Mature technology                 │
│  • Portable across platforms            • Better security boundaries        │
│                                                                              │
│  CONS:                                   CONS:                               │
│  • Shared kernel (security)             • Slow startup (minutes)            │
│  • Linux-centric (mostly)               • Heavy (GBs)                       │
│  • Newer technology                     • Lower density                     │
│                                                                              │
└─────────────────────────────────────────────────────────────────────────────┘

Container Orchestration (Kubernetes)

Kubernetes is the de facto standard for container orchestration, managing containerized applications at scale.

Term	Definition	Purpose
Pod	Smallest deployable unit (one or more containers)	Unit of deployment
Deployment	Manages replica sets and pod lifecycles	Declarative updates
Service	Stable network endpoint for pods	Service discovery
Namespace	Virtual cluster within cluster	Isolation, organization
ConfigMap	Non-sensitive configuration data	External configuration
Secret	Sensitive configuration data	Credentials, keys
Ingress	HTTP/HTTPS routing to services	External access
PersistentVolume	Cluster storage resource with a lifecycle independent of any pod	Persistent data
StatefulSet	Manages stateful applications	Databases, queues
DaemonSet	Ensures pod runs on all/some nodes	Logging, monitoring agents
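
As an illustration of how several of these objects relate, the sketch below builds a minimal Deployment manifest as a Python dictionary and prints it as JSON. The application name web and the image nginx:1.27 are placeholders chosen for the example; the point is that the Deployment's selector must match the labels on its pod template.

```python
import json


def deployment_manifest(name: str, image: str, replicas: int) -> dict:
    """Return a minimal apps/v1 Deployment manifest."""
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name, "labels": labels},
        "spec": {
            "replicas": replicas,
            # The selector tells the Deployment which pods it owns.
            "selector": {"matchLabels": labels},
            # The template describes the pods to create; its labels
            # must satisfy the selector above.
            "template": {
                "metadata": {"labels": labels},
                "spec": {"containers": [{"name": name, "image": image}]},
            },
        },
    }


# Placeholder values for illustration only.
manifest = deployment_manifest("web", "nginx:1.27", replicas=3)
print(json.dumps(manifest, indent=2))
```

The manifest is declarative: it states that three replicas should exist, and the Deployment controller works to keep the cluster in that state.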

Serverless Computing

Serverless computing abstracts server provisioning, scaling, and patching away from the developer. Servers still exist, but the provider operates them, so teams can focus on code and configuration.

Term	Definition	Examples
Function	Single-purpose code unit	Lambda function
Event Trigger	What initiates function execution	HTTP request, queue message, schedule
Cold Start	Initial invocation latency when function not warm	First request delay
Warm Start	Invocation served by an already-initialized instance	Reused container
Concurrency	Number of parallel function executions	Scaling unit
Provisioned Concurrency	Pre-warmed function instances	Avoid cold starts up to the provisioned level
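
The difference between a cold start and a warm start can be sketched with a function handler that keeps expensive setup in module-level state. The handler uses the (event, context) signature common to AWS Lambda's Python runtime; the helper _create_client is a hypothetical stand-in for real initialization work such as opening a database connection.

```python
import time

# Module-level state survives between invocations that reuse the same
# execution environment, which is what makes a warm start cheaper.
_client = None


def _create_client() -> dict:
    """Stand-in for expensive setup (hypothetical helper)."""
    return {"created_at": time.time()}


def handler(event, context):
    """Function entry point in the (event, context) style."""
    global _client
    cold_start = _client is None
    if cold_start:
        # Only the first invocation in this environment pays the setup cost.
        _client = _create_client()
    return {"cold_start": cold_start, "echo": event}
```

Calling the handler twice in the same process reports a cold start on the first call and a warm start on the second, which mirrors what happens when a provider reuses an execution environment.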

Network Concepts

Network Fundamentals

Term	Definition	Example
IP Address	Unique identifier for network device	192.168.1.100, 10.0.0.1
CIDR	Classless Inter-Domain Routing: notation for an IP range as address/prefix length	10.0.0.0/16 (65,536 addresses)
Subnet	Segment of larger network	10.0.1.0/24 (256 addresses)
Gateway	Entry/exit point for network	Router IP address
DNS	Domain Name System	Translates names to IPs
DHCP	Dynamic Host Configuration Protocol	Automatic IP assignment
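
The address counts in the CIDR and Subnet rows can be checked with Python's standard ipaddress module. The host address 10.0.1.25 is an arbitrary example.

```python
import ipaddress

vpc = ipaddress.ip_network("10.0.0.0/16")
subnet = ipaddress.ip_network("10.0.1.0/24")

# A /16 leaves 32 - 16 = 16 host bits, a /24 leaves 8.
print(vpc.num_addresses)      # 65536
print(subnet.num_addresses)   # 256

# The /24 is contained in the /16, and the host falls inside the /24.
print(subnet.subnet_of(vpc))                        # True
print(ipaddress.ip_address("10.0.1.25") in subnet)  # True

# Splitting the /16 into /24 subnets yields 2 ** (24 - 16) = 256 of them.
subnets_24 = list(vpc.subnets(new_prefix=24))
print(len(subnets_24), subnets_24[1])  # 256 10.0.1.0/24
```

Note that cloud providers typically reserve a few addresses in each subnet, so the number of usable addresses is slightly lower than the raw count.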

Cloud Networking

Term	Definition	Examples
VPC/VNet	Virtual private cloud/network	Isolated network in cloud
Subnet	Network segment within VPC	Public subnet, private subnet
Route Table	Network routing rules	Direct traffic to destinations
Internet Gateway	Provides internet access to VPC	Public subnet connectivity
NAT Gateway	Enables private subnet internet access	Outbound only access
VPC Peering	Direct connection between VPCs	Inter-VPC communication
Transit Gateway	Hub for connecting multiple networks	Network hub
PrivateLink/Endpoint	Private connectivity to services	Avoid internet for AWS services

Network Security

Term	Definition	Level
Security Group	Virtual firewall for instances	Instance level
NACL	Network Access Control List	Subnet level
WAF	Web Application Firewall	Application level
Firewall	Network traffic filter	Network perimeter
IDS/IPS	Intrusion Detection/Prevention	Network monitoring

Load Balancing

Term	Definition	Use Case
Layer 4 (L4) Load Balancer	Routes based on IP and port	TCP/UDP traffic
Layer 7 (L7) Load Balancer	Routes based on application data	HTTP/HTTPS traffic
Health Check	Verifies target availability	Remove unhealthy targets
Target Group	Collection of targets	Group of instances
Listener	Checks for connection requests	Port and protocol
Algorithm	How traffic is distributed	Round-robin, least connections
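
The two algorithms named in the table, combined with health checks, can be sketched in a few lines. This is a simplified model under stated assumptions: targets are plain strings, health is a precomputed dictionary, and the function names are illustrative.

```python
import itertools
from typing import Dict, Iterator, List


def healthy_targets(targets: List[str], health: Dict[str, bool]) -> List[str]:
    """Keep only targets whose most recent health check passed."""
    return [t for t in targets if health.get(t, False)]


def round_robin(targets: List[str]) -> Iterator[str]:
    """Yield targets in a fixed rotating order."""
    return itertools.cycle(targets)


def least_connections(active: Dict[str, int]) -> str:
    """Pick the target currently serving the fewest connections."""
    return min(active, key=active.get)


pool = healthy_targets(["a", "b", "c"], {"a": True, "b": False, "c": True})
rotation = round_robin(pool)
print([next(rotation) for _ in range(4)])    # ['a', 'c', 'a', 'c']
print(least_connections({"a": 5, "c": 2}))   # c
```

Round-robin ignores how busy each target is, which suits uniform, short requests. Least connections adapts to uneven request durations at the cost of tracking connection counts.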

Storage Concepts

Storage Types

Type	Description	Characteristics	Use Cases
Block Storage	Fixed-size blocks, formatted as filesystem	Low latency, high IOPS	Databases, boot volumes
File Storage	Hierarchical file system, shared access	POSIX compliant, concurrent access	Shared data, legacy apps
Object Storage	Flat structure, unlimited scale	HTTP access, metadata	Backups, data lakes, archives

Storage Characteristics

Characteristic	Definition	Measurement
IOPS	Input/Output Operations Per Second	Reads + Writes per second
Throughput	Data transfer rate	MB/s or GB/s
Latency	Time to complete I/O operation	Milliseconds or microseconds
Durability	Probability that stored data is not lost over a period (typically a year)	99.999999999% (11 9s)
Availability	Percentage of time the data is accessible	99.99%
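
Two of these characteristics lend themselves to quick arithmetic. Throughput follows from IOPS multiplied by the I/O size, and a durability figure can be turned into an expected loss count. The sketch below assumes that durability is quoted as an annual, per-object figure; the workload numbers are invented for illustration.

```python
def throughput_mib_per_s(iops: int, io_size_kib: int) -> float:
    """Throughput in MiB/s for a given IOPS rate and I/O size in KiB."""
    return iops * io_size_kib / 1024


def expected_annual_loss(object_count: int, nines: int) -> float:
    """Expected number of objects lost per year at the given number of nines.

    Assumes durability is an annual per-object probability, so the
    loss probability per object is 10 ** -nines.
    """
    return object_count * 10 ** -nines


# 3,000 IOPS at 16 KiB per operation.
print(throughput_mib_per_s(3000, 16))  # 46.875

# 10 million objects at 11 nines: roughly 0.0001 objects per year,
# or about one object every 10,000 years on average.
print(expected_annual_loss(10_000_000, 11))
```

The same volume can therefore be limited by IOPS for small random I/O and by throughput for large sequential I/O.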

Storage Tiers

Tier	Access Pattern	Cost	Examples
Hot/Standard	Frequent access	Highest	S3 Standard, Premium SSD
Warm/Infrequent	Infrequent access (roughly monthly)	Medium	S3 IA, Cool Blob
Cold/Archive	Rare access (roughly yearly)	Lowest	S3 Glacier, Archive

Security Concepts

Identity and Access Management

Term	Definition	Example
Principal	Entity requesting access	User, role, service
Authentication	Verifying identity	Password, MFA, certificate
Authorization	Determining permissions	IAM policy, RBAC
IAM Policy	Permission rules document	Allow/deny actions
Role	Set of permissions assumable by principals	EC2 instance role
MFA	Multi-Factor Authentication	Two or more factors, e.g. something you know plus something you have
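
How a policy turns into an authorization decision can be sketched as follows. The evaluation order shown (explicit deny wins, then explicit allow, otherwise deny) is modeled on the AWS-style approach and is an assumption for this example; the statement format and the function name is_allowed are hypothetical and much simpler than any real policy language.

```python
from fnmatch import fnmatchcase
from typing import Dict, List


def is_allowed(statements: List[Dict], action: str, resource: str) -> bool:
    """Evaluate statements: explicit deny wins, default is deny."""
    allowed = False
    for stmt in statements:
        matches = (
            any(fnmatchcase(action, p) for p in stmt["actions"])
            and any(fnmatchcase(resource, p) for p in stmt["resources"])
        )
        if not matches:
            continue
        if stmt["effect"] == "Deny":
            return False  # an explicit deny overrides any allow
        allowed = True
    return allowed  # stays False if nothing explicitly allowed the request


# Hypothetical policy: read access to reports, except a protected prefix.
policy = [
    {"effect": "Allow", "actions": ["s3:Get*"], "resources": ["reports/*"]},
    {"effect": "Deny", "actions": ["s3:*"], "resources": ["reports/secret/*"]},
]

print(is_allowed(policy, "s3:GetObject", "reports/q1.csv"))        # True
print(is_allowed(policy, "s3:GetObject", "reports/secret/x.csv"))  # False
print(is_allowed(policy, "s3:PutObject", "reports/q1.csv"))        # False
```

The default-deny result for the write request is least privilege in practice: anything not explicitly granted is refused.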

Security Principles

Principle	Description	Implementation
Least Privilege	Minimum necessary permissions	Specific IAM policies
Defense in Depth	Multiple security layers	WAF + SG + NACL
Zero Trust	Never trust, always verify	Continuous authentication
Separation of Duties	Divide critical functions	Different roles for different tasks

Encryption

Type	Description	Use
At Rest	Data encrypted in storage	EBS encryption, S3 SSE
In Transit	Data encrypted during transmission	TLS/SSL
Client-side	Encrypted before sending	Application handles
Server-side	Encrypted by service	Provider handles
KMS	Key Management Service	Central key management

Availability and Resilience Concepts

Availability Metrics

Term	Definition	Calculation
Availability	Percentage uptime	(Uptime / Total Time) × 100
Uptime	Time system is operational	Total time minus downtime
Downtime	Time system is unavailable	Planned + unplanned
MTBF	Mean Time Between Failures	Average time between failures
MTTR	Mean Time to Repair	Average repair time
MTTD	Mean Time to Detect	Average detection time
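
The metrics above are related. If MTBF is taken as the average operating time between failures and MTTR as the average downtime per failure, then steady-state availability is MTBF / (MTBF + MTTR). The sketch below applies that common formulation; the hour values are invented for illustration.

```python
def availability_pct(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability as a percentage.

    Assumes MTBF measures operating time between failures and MTTR
    measures downtime per failure, so one failure cycle lasts
    MTBF + MTTR hours in total.
    """
    return mtbf_hours / (mtbf_hours + mttr_hours) * 100


# One failure roughly every 999 hours, one hour to repair each time.
print(round(availability_pct(999, 1), 3))    # 99.9

# Halving repair time raises availability without changing failure rate.
print(round(availability_pct(999, 0.5), 3))  # 99.95
```

This shows why faster detection and repair (lower MTTD and MTTR) improves availability as much as preventing failures does.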

Availability Levels (“Nines”)

Level	Availability	Downtime/Year	Downtime/Month	Downtime/Week
Two 9s	99%	3.65 days	7.31 hours	1.68 hours
Three 9s	99.9%	8.77 hours	43.83 minutes	10.08 minutes
Four 9s	99.99%	52.60 minutes	4.38 minutes	1.01 minutes
Five 9s	99.999%	5.26 minutes	26.30 seconds	6.05 seconds
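
The figures in this table can be reproduced with a short calculation. The sketch assumes a 365.25-day year and a month of one twelfth of that, which matches the values shown.

```python
from datetime import timedelta

# Period lengths assumed for the calculation.
PERIODS = {
    "year": timedelta(days=365.25),
    "month": timedelta(days=365.25 / 12),
    "week": timedelta(days=7),
}


def downtime_budget(availability_pct: float, period: timedelta) -> timedelta:
    """Maximum downtime in a period for a given availability target."""
    return period * (1 - availability_pct / 100)


for target in (99.0, 99.9, 99.99, 99.999):
    budgets = {name: downtime_budget(target, p) for name, p in PERIODS.items()}
    minutes = {name: round(b.total_seconds() / 60, 2) for name, b in budgets.items()}
    print(target, minutes)
```

Each additional nine cuts the budget by a factor of ten, which is why moving from three to four nines usually requires automated failover rather than manual recovery.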

Recovery Objectives

Term	Definition	Business Question
RTO	Recovery Time Objective	How long can we be down?
RPO	Recovery Point Objective	How much data can we lose?

Resilience Patterns

Pattern	Description	Implementation
Redundancy	Duplicate components	Multiple instances
Failover	Automatic switch to standby	Active-passive
Load Balancing	Distribute across instances	Active-active
Circuit Breaker	Prevent cascade failures	Fail fast, recover
Bulkhead	Isolate failures	Separate pools
Retry with Backoff	Retry failed operations	Exponential backoff
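
Retry with exponential backoff is the easiest of these patterns to show in code. The sketch below doubles the delay ceiling after each failure and adds random jitter so that many clients do not retry in lockstep. The parameter defaults are illustrative, and the sleep function is injectable so the logic can be tested without waiting.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    operation: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 0.5,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call operation until it succeeds or max_attempts is reached."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except Exception:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            # Ceiling doubles each attempt: base, 2x base, 4x base, ...
            ceiling = min(max_delay, base_delay * 2 ** (attempt - 1))
            # Full jitter: wait a random time between 0 and the ceiling.
            sleep(random.uniform(0, ceiling))
    raise ValueError("max_attempts must be at least 1")
```

In practice only transient errors (timeouts, throttling) should be retried, and the operation should be idempotent so that a retry after a partial success does no harm.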

Key Takeaways

Infrastructure types (on-premises, cloud, hybrid, multi-cloud) each have distinct characteristics, advantages, and trade-offs that must be understood for effective architecture decisions
Cloud service models (IaaS, PaaS, SaaS, CaaS, FaaS) determine how responsibility is divided between customer and provider; understanding what you manage versus what the provider manages is essential
Infrastructure as Code is foundational for modern infrastructure management, applying software engineering practices to infrastructure provisioning and configuration
Compute options span physical servers, virtual machines, containers, and serverless—each with appropriate use cases and trade-offs
Network concepts including VPCs, subnets, security groups, and load balancing are essential for designing secure, performant infrastructure
Storage types (block, file, object) serve different use cases with different characteristics for performance, durability, and cost
Security concepts including IAM, encryption, and security principles must be understood and applied throughout infrastructure design
Availability metrics and resilience patterns enable design of highly available, fault-tolerant systems

Summary

This chapter established the core concepts and terminology essential for infrastructure and platform management. These definitions form the foundation for communication, design decisions, and operational practices throughout the infrastructure lifecycle.

The infrastructure landscape has evolved significantly, with cloud computing introducing new service models that shift responsibilities between customers and providers. Understanding where on-premises infrastructure ends and cloud begins—and the spectrum of hybrid and multi-cloud options in between—enables architects to select the right deployment model for each workload.

Infrastructure as Code represents a paradigm shift from manual, ticket-based provisioning to software-defined infrastructure. The principles of declarative definition, version control, idempotency, and testing transform infrastructure from a bottleneck into an enabler of rapid delivery.

The diversity of compute options—from bare metal servers to serverless functions—provides flexibility to match workloads to the most appropriate execution environment. Similarly, the range of storage and networking options enables optimized solutions for different requirements.

Security concepts must permeate all infrastructure decisions. The principles of least privilege, defense in depth, and zero trust guide secure infrastructure design, while encryption and identity management provide the technical controls.

Finally, availability and resilience concepts enable the design of systems that meet business requirements for uptime and recovery. Understanding metrics like RTO, RPO, and the availability “nines” translates business requirements into technical specifications.

The concepts introduced in this chapter will be referenced and applied throughout subsequent chapters as we explore architecture, implementation, operations, and governance of infrastructure and platforms.

Review Questions

Service Model Selection: An organization is considering deploying a new application. Compare and contrast IaaS, PaaS, and CaaS for this use case. What factors would influence the choice?
IaC Principles: Explain the principle of idempotency in Infrastructure as Code. Why is it important, and how do tools like Terraform achieve it?
Container vs. VM: Your team is debating whether to containerize an application or deploy it on VMs. What are the key factors to consider, and when would you choose each approach?
Hybrid Cloud Design: Design a hybrid cloud architecture for an organization that needs to keep sensitive customer data on-premises while leveraging cloud for web-facing applications. What connectivity options would you recommend?
Availability Calculation: A business requires 99.95% availability for a critical application. Calculate the maximum allowable downtime per year and per month. What infrastructure design patterns would help achieve this?
Storage Selection: You need to design storage for three workloads: a relational database, a shared file system for multiple servers, and a data lake for analytics. What storage type would you recommend for each and why?
Security Layers: Explain the defense-in-depth approach to infrastructure security. Describe at least four layers of security controls and how they work together.
IaC Tool Selection: Your organization is standardizing on IaC. Compare Terraform and CloudFormation for a multi-cloud environment. What are the key considerations?
