Chapter 2: Core Concepts and Definitions

Learning Objectives

After completing this chapter, you will be able to:

  • Define key infrastructure and platform management terminology with precision
  • Distinguish between different infrastructure types and deployment models
  • Understand cloud service models (IaaS, PaaS, SaaS, FaaS, CaaS) and their implications
  • Explain Infrastructure as Code principles, benefits, and implementation approaches
  • Recognize the components of modern infrastructure stacks across compute, network, storage, and security
  • Apply consistent terminology throughout infrastructure discussions and documentation
  • Understand the shared responsibility model across different service models
  • Differentiate between traditional and cloud-native infrastructure approaches

Introduction

Effective communication about infrastructure requires a shared vocabulary. Misunderstandings about fundamental concepts lead to poor decisions, failed projects, and wasted resources. This chapter establishes the core concepts and definitions used throughout this handbook and in the broader infrastructure management discipline.

The terminology of infrastructure has evolved significantly over the past decade. Terms like “server” now encompass physical machines, virtual machines, containers, and serverless functions. “Storage” spans local disks, SANs, object storage, and distributed file systems. Understanding these concepts in their modern context is essential for architects, engineers, and managers who need to communicate clearly about infrastructure decisions, designs, and operations.

This chapter serves as both a learning resource for those new to modern infrastructure and a reference for experienced practitioners who need precise definitions. The concepts introduced here form the foundation for all subsequent chapters.


Infrastructure Types and Deployment Models

On-Premises Infrastructure

On-premises infrastructure refers to IT infrastructure owned, operated, and housed within an organization’s own facilities. This traditional model gives organizations complete control over their hardware, software, and data.

Characteristic | Description | Implications
Ownership | Organization owns all hardware and facilities | Full capital expenditure, asset management required
Control | Complete control over all aspects of infrastructure | Maximum flexibility, maximum responsibility
Capital Model | Capital expenditure (CapEx) for hardware acquisition | Large upfront investment, depreciation over time
Responsibility | Full stack responsibility from facilities to applications | Requires diverse skill sets, 24x7 operations capability
Flexibility | Limited by physical capacity and procurement cycles | Capacity planning critical, scaling takes time
Location | Physical data center owned or leased | Geographic constraints, physical security requirements

Advantages of On-Premises Infrastructure:

Advantage | Description | Best For
Complete control | Full authority over hardware, software, and configuration | Organizations with strict control requirements
Data sovereignty | Data remains within organizational boundaries | Compliance with data residency regulations
No provider dependency | No reliance on external cloud providers | Organizations concerned about vendor lock-in
Predictable costs | Fixed costs after initial investment (excluding maintenance) | Stable, predictable workloads
Customization | Ability to customize hardware and software to exact specifications | Specialized workloads with unique requirements
Network performance | Low latency for on-premises applications | Latency-sensitive applications

Challenges of On-Premises Infrastructure:

Challenge | Description | Mitigation
High upfront investment | Significant capital required before deployment | Leasing options, phased deployment
Capacity planning | Must predict future needs accurately | Conservative planning, modular expansion
Hardware lifecycle | Regular refresh cycles required (typically 3-5 years) | Lifecycle management, technology roadmap
Facility requirements | Power, cooling, physical security, space | Colocation as alternative
Scaling limitations | Adding capacity requires procurement and installation | Hybrid cloud for burst capacity
Operational overhead | 24x7 operations, maintenance, patching | Managed services, automation

Cloud Infrastructure

Cloud infrastructure refers to IT resources delivered as services by third-party providers over the internet. Compared with on-premises infrastructure, cloud computing shifts spending from upfront capital purchases to usage-based fees and moves hardware and facility operations to the provider.

Characteristic | Description | Implications
Ownership | Provider owns and maintains all infrastructure | No hardware management, provider handles facilities
Control | Shared responsibility model based on service type | Less control, but less responsibility
Capital Model | Operational expenditure (OpEx), pay-as-you-go | Variable costs, potential for optimization
Responsibility | Varies by service model (IaaS, PaaS, SaaS) | Different skill requirements per model
Flexibility | Elastic capacity, on-demand provisioning | Scale up/down based on demand
Location | Global regions and availability zones | Geographic distribution, data residency options
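
The CapEx and OpEx capital models described above can be compared with a simple break-even calculation. The sketch below is illustrative only: the function names and every cost figure are hypothetical, and it assumes steady monthly spend with no depreciation, discounts, or hardware refresh.

```python
def cumulative_on_prem(months: int, capex: float, monthly_opex: float) -> float:
    """Total spend after `months`: one upfront purchase plus running costs."""
    return capex + monthly_opex * months


def cumulative_cloud(months: int, monthly_fee: float) -> float:
    """Total spend after `months` under pay-as-you-go with steady usage."""
    return monthly_fee * months


def break_even_month(capex: float, on_prem_monthly: float, cloud_monthly: float):
    """First whole month at which cumulative on-premises spend no longer
    exceeds cumulative cloud spend, or None if that never happens."""
    if cloud_monthly <= on_prem_monthly:
        return None  # the upfront purchase is never recovered
    month = 0
    while cumulative_on_prem(month, capex, on_prem_monthly) > cumulative_cloud(month, cloud_monthly):
        month += 1
    return month


# Hypothetical figures, for illustration only.
print(break_even_month(capex=120_000, on_prem_monthly=2_000, cloud_monthly=6_000))  # 30
```

With these made-up numbers the upfront investment is recovered after 30 months; a workload expected to live for less time than that would favor the pay-as-you-go model.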

Cloud Deployment Models:

Model | Description | Characteristics | Use Cases
Public Cloud | Shared infrastructure, multi-tenant environment | Lowest cost, highest scalability, shared resources | General workloads, variable demand, startups
Private Cloud | Dedicated infrastructure for single organization | More control, dedicated resources, higher cost | Compliance requirements, security-sensitive workloads
Hybrid Cloud | Combination of on-premises and public cloud | Flexibility, workload placement options | Mixed requirements, cloud migration journey
Multi-Cloud | Multiple public cloud providers | Avoid lock-in, best-of-breed services | Risk distribution, specialized services
Community Cloud | Shared infrastructure for specific community | Shared costs, common compliance requirements | Government, healthcare, research

Public Cloud Characteristics:

PUBLIC CLOUD MODEL

┌─────────────────────────────────────────────────────────────────────────────┐
│                         CLOUD PROVIDER INFRASTRUCTURE                        │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│   ┌────────────────────────────────────────────────────────────────────┐   │
│   │                        SHARED INFRASTRUCTURE                         │   │
│   │                                                                      │   │
│   │   ┌──────────┐    ┌──────────┐    ┌──────────┐    ┌──────────┐    │   │
│   │   │ Customer │    │ Customer │    │ Customer │    │ Customer │    │   │
│   │   │    A     │    │    B     │    │    C     │    │    D     │    │   │
│   │   │  (Your   │    │          │    │          │    │          │    │   │
│   │   │   Org)   │    │          │    │          │    │          │    │   │
│   │   └──────────┘    └──────────┘    └──────────┘    └──────────┘    │   │
│   │                                                                      │   │
│   │   ┌──────────────────────────────────────────────────────────────┐ │   │
│   │   │              HYPERVISOR / CONTAINER RUNTIME                   │ │   │
│   │   └──────────────────────────────────────────────────────────────┘ │   │
│   │                                                                      │   │
│   │   ┌──────────────────────────────────────────────────────────────┐ │   │
│   │   │           PHYSICAL SERVERS, STORAGE, NETWORK                  │ │   │
│   │   └──────────────────────────────────────────────────────────────┘ │   │
│   │                                                                      │   │
│   └────────────────────────────────────────────────────────────────────┘   │
│                                                                              │
│   PROVIDER RESPONSIBILITIES:                                                 │
│   • Physical security        • Hardware maintenance    • Network backbone   │
│   • Power and cooling        • Hypervisor security     • Service APIs       │
│                                                                              │
└─────────────────────────────────────────────────────────────────────────────┘

Hybrid Infrastructure

Hybrid infrastructure combines on-premises and cloud resources, enabling organizations to place workloads in the optimal location based on requirements.

HYBRID INFRASTRUCTURE ARCHITECTURE

┌─────────────────────────────────────────────────────────────────────────────┐
│                         HYBRID INFRASTRUCTURE                                │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│  ┌─────────────────────────────┐       ┌─────────────────────────────┐     │
│  │       ON-PREMISES           │       │        PUBLIC CLOUD          │     │
│  │                             │       │                              │     │
│  │  ┌─────────────────────┐   │       │   ┌─────────────────────┐   │     │
│  │  │  Core Systems       │   │       │   │  Cloud-Native Apps  │   │     │
│  │  │  • ERP              │   │       │   │  • Web applications │   │     │
│  │  │  • Core databases   │   │       │   │  • Mobile backends  │   │     │
│  │  │  • Legacy apps      │   │       │   │  • APIs             │   │     │
│  │  └─────────────────────┘   │       │   └─────────────────────┘   │     │
│  │                             │       │                              │     │
│  │  ┌─────────────────────┐   │       │   ┌─────────────────────┐   │     │
│  │  │  Sensitive Data     │   │       │   │  Dev/Test           │   │     │
│  │  │  • PII              │   │       │   │  • Development      │   │     │
│  │  │  • Financial        │   │       │   │  • Testing          │   │     │
│  │  │  • Regulated        │   │       │   │  • Staging          │   │     │
│  │  └─────────────────────┘   │       │   └─────────────────────┘   │     │
│  │                             │       │                              │     │
│  │  ┌─────────────────────┐   │       │   ┌─────────────────────┐   │     │
│  │  │  Compliance         │   │       │   │  Burst Capacity     │   │     │
│  │  │  Workloads          │   │       │   │  • Peak loads       │   │     │
│  │  │                     │   │       │   │  • Seasonal demand  │   │     │
│  │  └─────────────────────┘   │       │   └─────────────────────┘   │     │
│  │                             │       │                              │     │
│  └──────────────┬──────────────┘       └──────────────┬──────────────┘     │
│                 │                                      │                    │
│                 │     HYBRID CONNECTIVITY              │                    │
│                 │                                      │                    │
│                 │  ┌────────────────────────────────┐ │                    │
│                 └──┤  VPN / Direct Connect /        ├─┘                    │
│                    │  ExpressRoute / Interconnect   │                      │
│                    └────────────────────────────────┘                      │
│                                                                              │
│  UNIFIED MANAGEMENT:                                                        │
│  • Single pane of glass    • Consistent policies    • Integrated security  │
│  • Unified monitoring      • Common automation      • Shared identity      │
│                                                                              │
└─────────────────────────────────────────────────────────────────────────────┘

Hybrid Cloud Connectivity Options:

Connection Type | Description | Bandwidth | Latency | Cost | Use Case
Site-to-Site VPN | Encrypted tunnel over internet | Up to 1 Gbps | Variable | Low | Development, non-critical
Direct Connect (AWS) | Dedicated private connection | 1-100 Gbps | Low, consistent | High | Production workloads
ExpressRoute (Azure) | Dedicated private connection | 50 Mbps-10 Gbps | Low, consistent | High | Production workloads
Cloud Interconnect (GCP) | Dedicated private connection | 10-100 Gbps | Low, consistent | High | Production workloads
SD-WAN | Software-defined WAN | Variable | Optimized | Medium | Distributed offices
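
One practical way to read the bandwidth column is to estimate how long a bulk data transfer would take over each link. The following is a rough sketch; the function name and the 80% efficiency factor are assumptions made for illustration, not provider figures.

```python
def transfer_hours(data_terabytes: float, link_gbps: float, efficiency: float = 0.8) -> float:
    """Estimate hours needed to move `data_terabytes` over a link.

    `efficiency` is an assumed fraction of nominal bandwidth that is
    actually usable after protocol overhead and contention.
    """
    if link_gbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("link_gbps must be positive and efficiency in (0, 1]")
    bits = data_terabytes * 1e12 * 8           # decimal terabytes to bits
    usable_bps = link_gbps * 1e9 * efficiency  # usable bits per second
    return bits / usable_bps / 3600


# Moving 10 TB over a 1 Gbps VPN versus a 10 Gbps dedicated link.
print(round(transfer_hours(10, 1), 1))   # 27.8
print(round(transfer_hours(10, 10), 1))  # 2.8
```

Estimates like this help decide whether a VPN is adequate for an initial migration or whether a dedicated connection is justified.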

Multi-Cloud Strategy

Multi-cloud refers to the use of multiple cloud providers, either for different workloads or for redundancy.

Strategy | Description | Benefits | Challenges
Best of Breed | Use best service from each provider | Optimal capabilities | Management complexity
Risk Distribution | Spread workloads across providers | Reduced provider dependency | Cost, complexity
Geographic Coverage | Use providers with specific regional presence | Global reach | Data synchronization
Cost Optimization | Leverage competitive pricing | Cost savings | Comparison complexity
Compliance | Meet specific regulatory requirements | Compliance achievement | Policy consistency

Multi-Cloud Architecture:

MULTI-CLOUD ARCHITECTURE

┌─────────────────────────────────────────────────────────────────────────────┐
│                           MULTI-CLOUD STRATEGY                               │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│  ┌──────────────────┐  ┌──────────────────┐  ┌──────────────────┐          │
│  │       AWS        │  │      AZURE       │  │       GCP        │          │
│  │                  │  │                  │  │                  │          │
│  │  • EC2/EKS       │  │  • AKS           │  │  • BigQuery      │          │
│  │  • S3            │  │  • Azure AD      │  │  • ML/AI         │          │
│  │  • Lambda        │  │  • Office 365    │  │  • Anthos        │          │
│  │  • RDS           │  │  • Cosmos DB     │  │  • Cloud Run     │          │
│  │                  │  │                  │  │                  │          │
│  └────────┬─────────┘  └────────┬─────────┘  └────────┬─────────┘          │
│           │                     │                     │                     │
│           └─────────────────────┼─────────────────────┘                     │
│                                 │                                           │
│                    ┌────────────┴────────────┐                              │
│                    │   MULTI-CLOUD LAYER     │                              │
│                    │                         │                              │
│                    │  • Terraform (IaC)      │                              │
│                    │  • Kubernetes (K8s)     │                              │
│                    │  • Service Mesh         │                              │
│                    │  • Identity Federation  │                              │
│                    │  • Observability        │                              │
│                    │  • Cost Management      │                              │
│                    └─────────────────────────┘                              │
│                                                                              │
└─────────────────────────────────────────────────────────────────────────────┘

Cloud Service Models

Infrastructure as a Service (IaaS)

IaaS provides virtualized computing resources over the internet. Users manage operating systems, applications, and data while the provider manages physical infrastructure.

Aspect | Description | Examples
What You Get | Virtual machines, storage volumes, virtual networks | EC2, Azure VMs, GCE
What You Manage | OS, middleware, runtime, applications, data | Full stack from OS up
Provider Manages | Hardware, hypervisor, physical network, facilities | Physical infrastructure
Pricing Model | Per hour/second for compute, per GB for storage | On-demand, reserved, spot

IaaS Characteristics:

Characteristic | Description | Consideration
Maximum Control | Full control over OS and everything above | Requires more operational capability
Maximum Flexibility | Any OS, any software, any configuration | More decisions to make
Operating System Access | Full root/admin access | Security responsibility
Patching Responsibility | Customer responsible for OS and application patching | Operational overhead
Scaling | Manual or auto-scaling of VM instances | Must configure scaling
Cost Model | Pay for allocated resources, even if underutilized | Right-sizing important
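
Because IaaS bills for allocated rather than used capacity, right-sizing usually means picking the cheapest instance size that still covers observed peak demand plus some headroom. The sketch below illustrates the idea; the catalog, prices, and the 20% headroom default are all hypothetical.

```python
from typing import NamedTuple, Optional


class InstanceSize(NamedTuple):
    name: str
    vcpus: int
    memory_gib: int
    hourly_cost: float


# Hypothetical catalog; real providers publish their own sizes and prices.
CATALOG = [
    InstanceSize("small", 2, 4, 0.04),
    InstanceSize("medium", 2, 8, 0.08),
    InstanceSize("large", 4, 16, 0.16),
    InstanceSize("xlarge", 8, 32, 0.32),
]


def right_size(peak_vcpus: float, peak_memory_gib: float,
               headroom: float = 0.2) -> Optional[InstanceSize]:
    """Return the cheapest catalog entry that covers peak usage plus headroom."""
    need_cpu = peak_vcpus * (1 + headroom)
    need_mem = peak_memory_gib * (1 + headroom)
    candidates = [s for s in CATALOG
                  if s.vcpus >= need_cpu and s.memory_gib >= need_mem]
    return min(candidates, key=lambda s: s.hourly_cost, default=None)


print(right_size(peak_vcpus=1.5, peak_memory_gib=6).name)  # medium
```

A result of None means no single size in the catalog fits, which would point toward scaling out across several instances instead.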

IaaS Use Cases:

Use Case | Description | Why IaaS
Lift and shift migration | Moving existing applications to cloud | Minimal application changes
Development and testing | Creating dev/test environments | Quick provisioning, cost control
High-performance computing | Scientific computing, rendering | Access to powerful hardware
Big data analysis | Processing large datasets | Scalable compute resources
Web hosting | Running web servers | Full control over configuration
Disaster recovery | DR site in cloud | Cost-effective standby

Platform as a Service (PaaS)

PaaS provides a platform for developing, running, and managing applications without the complexity of managing underlying infrastructure.

Aspect | Description | Examples
What You Get | Runtime environment, development tools, middleware | App Service, Elastic Beanstalk, Heroku
What You Manage | Application code, data | Focus on application logic
Provider Manages | Infrastructure, OS, runtime, middleware | Everything below application
Pricing Model | Per application, per instance, per request | Usage-based

PaaS Characteristics:

Characteristic | Description | Consideration
Reduced Complexity | No OS management, automatic patching | Less operational overhead
Faster Development | Focus on code, not infrastructure | Accelerated development
Limited Control | Constrained to platform capabilities | May not fit all applications
Vendor Lock-in | Applications may become platform-specific | Portability concerns
Built-in Scaling | Automatic scaling based on demand | Less configuration required
Cost Efficiency | Pay for actual usage | Better for variable workloads

PaaS Use Cases:

Use Case | Description | Why PaaS
Web applications | Modern web apps | Rapid deployment, auto-scaling
API development | Building and hosting APIs | Managed infrastructure
Microservices | Deploying microservice architectures | Container orchestration included
Mobile backends | Backend services for mobile apps | Pre-built services
DevOps automation | CI/CD pipelines | Integrated tooling

Software as a Service (SaaS)

SaaS delivers complete applications over the internet, fully managed by the provider.

Aspect | Description | Examples
What You Get | Complete application, accessible via browser/API | Office 365, Salesforce, Workday
What You Manage | User configuration, data | Minimal technical management
Provider Manages | Everything: infrastructure, application, updates | Full stack responsibility
Pricing Model | Per user, per month, per feature tier | Subscription-based

Containers as a Service (CaaS)

CaaS provides container orchestration and management as a service.

Aspect | Description | Examples
What You Get | Managed container orchestration platform | EKS, AKS, GKE
What You Manage | Container images, deployments, configurations | Application containers
Provider Manages | Control plane, node infrastructure, networking | Kubernetes infrastructure
Pricing Model | Per cluster, per node, per request | Varies by provider

Functions as a Service (FaaS) / Serverless

FaaS provides event-driven compute without server management.

Aspect | Description | Examples
What You Get | Event-triggered function execution | Lambda, Azure Functions, Cloud Functions
What You Manage | Function code, triggers | Just the code
Provider Manages | Everything else: scaling, infrastructure | Full infrastructure abstraction
Pricing Model | Per invocation, per duration, per memory | Pay only for execution

FaaS Characteristics:

Characteristic | Description | Consideration
No Server Management | Zero infrastructure to manage | Maximum abstraction
Automatic Scaling | Scales from zero to thousands of concurrent executions | Built-in elasticity
Pay Per Execution | Only pay when code runs | Cost-effective for variable loads
Cold Start | Added latency on the first invocation of a new instance | Not suitable for all workloads
Time Limits | Maximum execution duration | Not for long-running processes
Stateless | No persistent local state | Must use external storage
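
The pay-per-execution pricing dimensions listed above (invocations, duration, memory) combine into a simple cost formula. The sketch below shows the arithmetic; the function name is illustrative and the unit prices in the example are placeholders, not any provider's published rates.

```python
def faas_monthly_cost(invocations: int, avg_duration_ms: float, memory_mb: int,
                      price_per_million_requests: float,
                      price_per_gb_second: float) -> float:
    """Estimate a monthly FaaS bill from request count and compute time."""
    request_cost = invocations / 1_000_000 * price_per_million_requests
    # Compute is billed as memory (GB) multiplied by execution time (seconds).
    gb_seconds = invocations * (avg_duration_ms / 1000) * (memory_mb / 1024)
    return request_cost + gb_seconds * price_per_gb_second


# Placeholder prices, for illustration only.
cost = faas_monthly_cost(
    invocations=2_000_000,
    avg_duration_ms=200,
    memory_mb=512,
    price_per_million_requests=0.20,
    price_per_gb_second=0.0000167,
)
print(round(cost, 2))  # 3.74
```

With zero invocations the function returns zero, which is the "only pay when code runs" property; the same workload on an always-on virtual machine would be billed for every hour regardless of traffic.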

Database as a Service (DBaaS)

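DBaaS provides a database engine that the provider operates on the customer's behalf, handling patching, backups, high availability, and scaling.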

Aspect | Description | Examples
What You Get | Managed database engine | RDS, Azure SQL, Cloud SQL, MongoDB Atlas
What You Manage | Schema, queries, data, some configuration | Database design and usage
Provider Manages | Hardware, patching, backups, HA, scaling | Database operations
Pricing Model | Per instance, per storage, per IOPS | Resource-based

Service Model Comparison

SERVICE MODEL RESPONSIBILITY COMPARISON

┌─────────────────────────────────────────────────────────────────────────────┐
│                    SHARED RESPONSIBILITY MODEL                               │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│           ON-PREM      IaaS        PaaS        CaaS       SaaS              │
│         ┌────────┐  ┌────────┐  ┌────────┐  ┌────────┐  ┌────────┐         │
│         │  YOU   │  │  YOU   │  │  YOU   │  │  YOU   │  │  YOU   │         │
│         │ MANAGE │  │ MANAGE │  │ MANAGE │  │ MANAGE │  │ MANAGE │         │
│         ├────────┤  ├────────┤  ├────────┤  ├────────┤  ├────────┤         │
│ DATA    │████████│  │████████│  │████████│  │████████│  │████████│         │
│         ├────────┤  ├────────┤  ├────────┤  ├────────┤  ├────────┤         │
│ APP     │████████│  │████████│  │████████│  │████████│  │        │         │
│         ├────────┤  ├────────┤  ├────────┤  ├────────┤  │PROVIDER│         │
│ RUNTIME │████████│  │████████│  │        │  │        │  │ MANAGES│         │
│         ├────────┤  ├────────┤  │PROVIDER│  │PROVIDER│  │        │         │
│ O/S     │████████│  │████████│  │ MANAGES│  │ MANAGES│  │        │         │
│         ├────────┤  ├────────┤  │        │  │        │  │        │         │
│ VIRTUAL │████████│  │        │  │        │  │        │  │        │         │
│         ├────────┤  │PROVIDER│  │        │  │        │  │        │         │
│ SERVER  │████████│  │ MANAGES│  │        │  │        │  │        │         │
│         ├────────┤  │        │  │        │  │        │  │        │         │
│ STORAGE │████████│  │        │  │        │  │        │  │        │         │
│         ├────────┤  │        │  │        │  │        │  │        │         │
│ NETWORK │████████│  │        │  │        │  │        │  │        │         │
│         └────────┘  └────────┘  └────────┘  └────────┘  └────────┘         │
│                                                                              │
│  █ = Customer Responsibility     (blank) = Provider Responsibility          │
│                                                                              │
└─────────────────────────────────────────────────────────────────────────────┘
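
The responsibility split in the diagram above can also be captured as a small lookup table, which is convenient for checklists or automated reviews. This is a sketch that mirrors the diagram only; the names are illustrative, and FaaS is omitted because the diagram does not show it.

```python
# Layers from top to bottom, as drawn in the comparison diagram.
LAYERS = ["data", "app", "runtime", "os", "virtual", "server", "storage", "network"]

# How many layers, counted from the top, the customer manages in each model.
CUSTOMER_DEPTH = {"on-prem": 8, "iaas": 4, "paas": 2, "caas": 2, "saas": 1}


def customer_layers(model: str) -> list:
    """Layers the customer is responsible for under the given model."""
    return LAYERS[:CUSTOMER_DEPTH[model.lower()]]


def provider_layers(model: str) -> list:
    """Layers the provider is responsible for under the given model."""
    return LAYERS[CUSTOMER_DEPTH[model.lower()]:]


def owner(model: str, layer: str) -> str:
    """Return 'customer' or 'provider' for one layer of one model."""
    return "customer" if layer in customer_layers(model) else "provider"


print(customer_layers("IaaS"))  # ['data', 'app', 'runtime', 'os']
print(owner("PaaS", "os"))      # provider
```

In every model the customer keeps responsibility for data, which is why data classification and access control remain customer duties even under SaaS.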

Infrastructure as Code (IaC)

Definition and Principles

Infrastructure as Code (IaC) is the practice of managing and provisioning infrastructure through machine-readable definition files rather than manual processes or interactive configuration tools.

Formal Definition: Infrastructure as Code is an approach to infrastructure automation that applies software engineering practices—including version control, code review, testing, and continuous integration—to infrastructure management.

Core Principles of IaC

Principle | Description | Implementation | Benefit
Declarative Definition | Define desired end state, not procedural steps | Terraform HCL, CloudFormation YAML | System figures out how to achieve state
Version Control | All infrastructure code stored in Git | GitHub, GitLab, Bitbucket | History, collaboration, rollback
Idempotency | Applying code multiple times produces same result | Terraform plan/apply | Safe to re-run, predictable
Modularity | Reusable, composable components | Terraform modules, CloudFormation nested stacks | DRY principle, consistency
Immutability | Replace rather than modify in place | Blue-green deployments, AMI-based | Reduced drift, predictable state
Testing | Validate infrastructure before deployment | Terratest, Kitchen-Terraform | Catch errors early
Documentation as Code | Code serves as documentation | Self-documenting infrastructure | Always current documentation

Declarative vs. Imperative Approaches

Approach | Description | Example | Best For
Declarative | Define desired state, system determines how | Terraform, CloudFormation | Most infrastructure provisioning
Imperative | Define exact steps to execute | Ansible playbooks, shell scripts | Configuration, orchestration
Hybrid | Declarative structure with imperative elements | Pulumi, CDK | Complex logic requirements

Declarative Example (Terraform):

# Terraform declares WHAT should exist
resource "aws_instance" "web_server" {
  # AMI IDs are region-specific; replace with an image valid in your region
  ami           = "ami-0c55b159cbfafe1f0"
  instance_type = "t3.medium"

  # Attach the security group declared below
  vpc_security_group_ids = [aws_security_group.web.id]

  tags = {
    Name        = "web-server-prod"
    Environment = "production"
    ManagedBy   = "terraform"
  }
}

resource "aws_security_group" "web" {
  name        = "web-sg"
  description = "Security group for web servers"

  # Allow inbound HTTPS from anywhere
  ingress {
    from_port   = 443
    to_port     = 443
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }

  # Allow all outbound traffic
  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }
}

Imperative Example (Ansible):

# Ansible runs tasks in the order written (HOW to configure);
# each module call is itself idempotent
- name: Configure web server
  hosts: web_servers
  become: true

  tasks:
    - name: Install nginx
      ansible.builtin.apt:
        name: nginx
        state: present
        update_cache: true

    - name: Copy configuration file
      ansible.builtin.template:
        src: nginx.conf.j2
        dest: /etc/nginx/nginx.conf
      notify: Restart nginx

    - name: Ensure nginx is running
      ansible.builtin.service:
        name: nginx
        state: started
        enabled: true

  handlers:
    - name: Restart nginx
      ansible.builtin.service:
        name: nginx
        state: restarted
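
To make the declarative and idempotency principles concrete, the toy reconciler below compares a desired state with the actual state, derives the actions needed, and shows that a second run has nothing left to do. It is a simplified sketch with hypothetical resource names, not how Terraform or any other tool is actually implemented.

```python
def plan(desired: dict, actual: dict) -> list:
    """Compute the actions needed to turn `actual` into `desired`."""
    actions = []
    for name, spec in sorted(desired.items()):
        if name not in actual:
            actions.append(("create", name))
        elif actual[name] != spec:
            actions.append(("update", name))
    for name in sorted(actual):
        if name not in desired:
            actions.append(("delete", name))
    return actions


def apply(desired: dict, actual: dict) -> dict:
    """Execute the plan and return the resulting actual state."""
    new_state = dict(actual)
    for action, name in plan(desired, actual):
        if action == "delete":
            del new_state[name]
        else:  # create and update both converge on the desired spec
            new_state[name] = desired[name]
    return new_state


# Hypothetical resources, for illustration only.
desired = {"web": {"size": "t3.medium"}, "db": {"size": "db.small"}}
actual = {"web": {"size": "t3.small"}, "old-cache": {"size": "tiny"}}

print(plan(desired, actual))  # [('create', 'db'), ('update', 'web'), ('delete', 'old-cache')]
actual = apply(desired, actual)
print(plan(desired, actual))  # [] because a second run changes nothing
```

The author only states what should exist; the ordering and choice of create, update, or delete steps is derived by the tool, and re-running against a converged system is a no-op.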

IaC Benefits

Benefit | Description | Impact
Consistency | Identical infrastructure across environments | Little or no configuration drift when every change goes through code
Speed | Provision infrastructure in minutes | Often cited as up to 100x faster than manual provisioning; actual gains vary
Repeatability | Recreate infrastructure reliably | Disaster recovery capability
Auditability | Git history captures all changes | Complete change history
Collaboration | Teams can review and contribute | Reduced errors through review
Disaster Recovery | Rebuild infrastructure from code | Hours instead of days
Cost Reduction | Fewer manual errors, less rework | Operational savings of 40-60% are sometimes quoted; results depend on the starting point
Compliance | Enforce standards through code | Automated compliance checks

IaC Tool Categories

Category | Purpose | Tools | When to Use
Provisioning | Create cloud resources | Terraform, CloudFormation, Pulumi, CDK | Creating infrastructure
Configuration Management | Configure systems | Ansible, Puppet, Chef, Salt | Configuring servers
Container Orchestration | Manage containers | Kubernetes, Docker Swarm, Nomad | Running containerized apps
Image Building | Create machine images | Packer, EC2 Image Builder | Creating golden images
Secret Management | Manage secrets | Vault, AWS Secrets Manager | Handling credentials

IaC Tool Comparison

Tool | Type | Language | State Management | Cloud Support | Learning Curve
Terraform | Provisioning | HCL | State file (local or remote backend) | Multi-cloud | Medium
CloudFormation | Provisioning | YAML/JSON | AWS managed | AWS only | Medium
Pulumi | Provisioning | Python/TS/Go and others | Remote state | Multi-cloud | Medium-High
CDK | Provisioning | Python/TS and others | CloudFormation | AWS primary | Medium-High
Ansible | Config Mgmt | YAML | Stateless | Multi-cloud | Low
Puppet | Config Mgmt | Puppet DSL | Puppet server | Multi-cloud | High
Chef | Config Mgmt | Ruby | Chef server | Multi-cloud | High

IaC Workflow

IaC DEVELOPMENT AND DEPLOYMENT WORKFLOW

┌─────────────────────────────────────────────────────────────────────────────┐
│                        INFRASTRUCTURE AS CODE WORKFLOW                       │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│  DEVELOPMENT PHASE                                                           │
│  ┌───────────────────────────────────────────────────────────────────────┐ │
│  │                                                                        │ │
│  │  ┌────────┐    ┌────────┐    ┌────────┐    ┌────────┐               │ │
│  │  │ Write  │───►│ Lint   │───►│ Format │───►│ Local  │               │ │
│  │  │  Code  │    │        │    │        │    │  Test  │               │ │
│  │  └────────┘    └────────┘    └────────┘    └────────┘               │ │
│  │      │                                          │                     │ │
│  │      │  IDE/Editor                              │  terraform plan     │ │
│  │      │  tflint, checkov                         │  terratest          │ │
│  │      │  terraform fmt                           │  kitchen-terraform  │ │
│  │                                                                        │ │
│  └───────────────────────────────────────────────────────────────────────┘ │
│                                    │                                        │
│                                    ▼                                        │
│  VERSION CONTROL PHASE                                                      │
│  ┌───────────────────────────────────────────────────────────────────────┐ │
│  │                                                                        │ │
│  │  ┌────────┐    ┌────────┐    ┌────────┐    ┌────────┐               │ │
│  │  │ Commit │───►│  Push  │───►│ Pull   │───►│ Review │               │ │
│  │  │        │    │        │    │ Request│    │        │               │ │
│  │  └────────┘    └────────┘    └────────┘    └────────┘               │ │
│  │      │                            │              │                    │ │
│  │      │  git commit                │  CI triggers │  Peer review       │ │
│  │      │  Conventional commits      │  PR created  │  Security review   │ │
│  │                                                                        │ │
│  └───────────────────────────────────────────────────────────────────────┘ │
│                                    │                                        │
│                                    ▼                                        │
│  CI/CD PHASE                                                                │
│  ┌───────────────────────────────────────────────────────────────────────┐ │
│  │                                                                        │ │
│  │  ┌────────┐    ┌────────┐    ┌────────┐    ┌────────┐               │ │
│  │  │Validate│───►│Security│───►│ Plan   │───►│ Apply  │               │ │
│  │  │        │    │  Scan  │    │        │    │        │               │ │
│  │  └────────┘    └────────┘    └────────┘    └────────┘               │ │
│  │      │              │              │              │                   │ │
│  │      │  terraform   │  checkov    │  terraform   │  terraform        │ │
│  │      │  validate    │  tfsec      │  plan        │  apply            │ │
│  │      │              │  Snyk       │  (saved)     │  (auto/manual)    │ │
│  │                                                                        │ │
│  └───────────────────────────────────────────────────────────────────────┘ │
│                                    │                                        │
│                                    ▼                                        │
│  POST-DEPLOYMENT                                                            │
│  ┌───────────────────────────────────────────────────────────────────────┐ │
│  │                                                                        │ │
│  │  ┌────────┐    ┌────────┐    ┌────────┐                              │ │
│  │  │ Verify │───►│Document│───►│Monitor │                              │ │
│  │  │        │    │        │    │        │                              │ │
│  │  └────────┘    └────────┘    └────────┘                              │ │
│  │      │              │              │                                  │ │
│  │      │  Smoke tests │  Update CMDB │  Drift detection                │ │
│  │      │  Health check│  Release note│  Cost monitoring                │ │
│  │                                                                        │ │
│  └───────────────────────────────────────────────────────────────────────┘ │
│                                                                              │
└─────────────────────────────────────────────────────────────────────────────┘

State Management

IaC tools that manage state need careful consideration of how state is stored and accessed.

| State Aspect | Description | Best Practice |
| --- | --- | --- |
| Remote State | Store state in shared location | S3 + DynamoDB, Azure Blob, GCS |
| State Locking | Prevent concurrent modifications | Enable locking mechanism |
| State Encryption | Protect sensitive data in state | Enable encryption at rest |
| State Backup | Regular state backups | Versioning on state bucket |
| State Separation | Separate state per environment | Different state files per env |
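
To make the idea of state locking concrete, the sketch below guards a state file with a lock file created atomically. It is only an illustration of the principle: the `FileStateLock` class, the `prod.tfstate` file name, and the owner strings are hypothetical, and real remote backends implement locking in their own ways.

```python
import os
import tempfile


class StateLockError(RuntimeError):
    """Raised when another run already holds the state lock."""


class FileStateLock:
    """Hypothetical lock for a local state file, built on atomic file creation."""

    def __init__(self, state_path):
        self.lock_path = state_path + ".lock"

    def acquire(self, owner):
        try:
            # O_CREAT | O_EXCL fails if the file already exists, so only one
            # caller can create the lock file.
            fd = os.open(self.lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
        except FileExistsError:
            raise StateLockError(f"state is locked: {self.lock_path}")
        with os.fdopen(fd, "w") as handle:
            handle.write(owner)  # record who holds the lock

    def release(self):
        os.remove(self.lock_path)


workdir = tempfile.mkdtemp()
lock = FileStateLock(os.path.join(workdir, "prod.tfstate"))

lock.acquire("pipeline-run-1")
try:
    lock.acquire("pipeline-run-2")  # a concurrent run must be rejected
    second_run_blocked = False
except StateLockError:
    second_run_blocked = True
lock.release()
```

The second `acquire` call fails while the first run holds the lock, which is the behavior that prevents two concurrent applies from corrupting shared state.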

Compute Concepts

Physical Servers

Physical servers are dedicated machines that run workloads directly on the hardware, with no virtualization layer in between.

| Term | Definition | Use Case |
| --- | --- | --- |
| Bare Metal | Physical server without hypervisor | High-performance, licensing, specific hardware |
| Rack Server | Server designed for 19-inch rack mounting | Data center deployments |
| Blade Server | Compact server in blade chassis | High-density environments |
| Tower Server | Standalone server unit | Small offices, edge locations |
| Hyperconverged Infrastructure (HCI) | Integrated compute, storage, networking | Simplified management |

Virtual Machines

Virtual machines (VMs) are software-based computers running on a hypervisor that abstracts physical hardware.

| Term | Definition | Examples |
| --- | --- | --- |
| Hypervisor | Software layer enabling virtualization | VMware ESXi, Hyper-V, KVM, Xen |
| Type 1 Hypervisor | Runs directly on hardware (bare metal) | ESXi, Hyper-V, Xen |
| Type 2 Hypervisor | Runs on host operating system | VirtualBox, VMware Workstation |
| Guest OS | Operating system running inside VM | Windows, Linux within VM |
| Host OS | Operating system running the hypervisor | ESXi, Windows Server |
| vCPU | Virtual CPU allocated to VM | 2 vCPU, 4 vCPU |
| VM Template | Pre-configured VM image for cloning | Golden images |
| Snapshot | Point-in-time capture of VM state | Backup before changes |
| Live Migration | Moving running VM between hosts | vMotion, Live Migration |

Containers

Containers are lightweight, portable units that package applications with their dependencies, sharing the host OS kernel.

| Term | Definition | Examples |
| --- | --- | --- |
| Container Image | Packaged application with all dependencies | Docker image, OCI image |
| Container Runtime | Software that runs containers | Docker, containerd, CRI-O |
| Container Registry | Repository for storing container images | Docker Hub, ECR, ACR, GCR |
| Dockerfile | Instructions for building container image | Build recipe |
| Layer | Read-only filesystem layer in image | Image composition |
| Container | Running instance of container image | Isolated process |

Container vs VM Comparison:

| Aspect | Container | Virtual Machine |
| --- | --- | --- |
| Isolation | Process-level (shared kernel) | Hardware-level (separate kernel) |
| Size | Megabytes (10s-100s MB) | Gigabytes (GBs) |
| Startup Time | Seconds | Minutes |
| Resource Overhead | Minimal | Significant (OS per VM) |
| Density | Hundreds per host | Tens per host |
| Portability | Very high (image-based) | Medium (hypervisor-dependent) |
| Security Isolation | Weaker (shared kernel) | Stronger (separate kernel) |

CONTAINER VS VIRTUAL MACHINE ARCHITECTURE

┌─────────────────────────────────────────────────────────────────────────────┐
│                    CONTAINER VS VM ARCHITECTURE                              │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│  CONTAINERS                              VIRTUAL MACHINES                    │
│  ┌─────────────────────────────┐        ┌─────────────────────────────┐    │
│  │┌─────┐ ┌─────┐ ┌─────┐     │        │┌─────────┐ ┌─────────┐      │    │
│  ││App A│ │App B│ │App C│     │        ││  App A  │ │  App B  │      │    │
│  │├─────┤ ├─────┤ ├─────┤     │        │├─────────┤ ├─────────┤      │    │
│  ││Bins/│ │Bins/│ │Bins/│     │        ││ Bins/   │ │ Bins/   │      │    │
│  ││Libs │ │Libs │ │Libs │     │        ││ Libs    │ │ Libs    │      │    │
│  │└─────┘ └─────┘ └─────┘     │        │├─────────┤ ├─────────┤      │    │
│  │                            │        ││Guest OS │ │Guest OS │      │    │
│  │┌──────────────────────────┐│        │└─────────┘ └─────────┘      │    │
│  ││    Container Runtime     ││        │┌───────────────────────────┐│    │
│  │└──────────────────────────┘│        ││       HYPERVISOR          ││    │
│  │┌──────────────────────────┐│        │└───────────────────────────┘│    │
│  ││     Host Operating       ││        │┌───────────────────────────┐│    │
│  ││        System            ││        ││   Host Operating System   ││    │
│  │└──────────────────────────┘│        │└───────────────────────────┘│    │
│  │┌──────────────────────────┐│        │┌───────────────────────────┐│    │
│  ││       HARDWARE           ││        ││       HARDWARE            ││    │
│  │└──────────────────────────┘│        │└───────────────────────────┘│    │
│  └─────────────────────────────┘        └─────────────────────────────┘    │
│                                                                              │
│  PROS:                                   PROS:                               │
│  • Fast startup (seconds)               • Strong isolation                  │
│  • Lightweight (MBs)                    • Any OS on any host                │
│  • High density                         • Mature technology                 │
│  • Portable across platforms            • Better security boundaries        │
│                                                                              │
│  CONS:                                   CONS:                               │
│  • Shared kernel (security)             • Slow startup (minutes)            │
│  • Linux-centric (mostly)               • Heavy (GBs)                       │
│  • Newer technology                     • Lower density                     │
│                                                                              │
└─────────────────────────────────────────────────────────────────────────────┘

Container Orchestration (Kubernetes)

Kubernetes is the de facto standard for container orchestration, managing containerized applications at scale.

| Term | Definition | Purpose |
| --- | --- | --- |
| Pod | Smallest deployable unit (one or more containers) | Unit of deployment |
| Deployment | Manages replica sets and pod lifecycles | Declarative updates |
| Service | Stable network endpoint for pods | Service discovery |
| Namespace | Virtual cluster within cluster | Isolation, organization |
| ConfigMap | Non-sensitive configuration data | External configuration |
| Secret | Sensitive configuration data | Credentials, keys |
| Ingress | HTTP/HTTPS routing to services | External access |
| PersistentVolume | Storage abstraction | Persistent data |
| StatefulSet | Manages stateful applications | Databases, queues |
| DaemonSet | Ensures pod runs on all/some nodes | Logging, monitoring agents |
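
As a rough illustration of how a Deployment ties these objects together, the sketch below builds a minimal `apps/v1` Deployment manifest as a Python dictionary and prints it as JSON, which `kubectl` accepts alongside YAML. The `deployment_manifest` helper, the `web` name, the `example.com/web:1.0` image, and port 8080 are placeholders chosen for the example.

```python
import json


def deployment_manifest(name, image, replicas, namespace="default"):
    """Build a minimal Deployment manifest; all argument values are illustrative."""
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name, "namespace": namespace, "labels": labels},
        "spec": {
            # Desired number of pod replicas.
            "replicas": replicas,
            # The selector must match the labels in the pod template below.
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [
                        {
                            "name": name,
                            "image": image,
                            "ports": [{"containerPort": 8080}],
                        }
                    ]
                },
            },
        },
    }


manifest = deployment_manifest("web", "example.com/web:1.0", replicas=3)
print(json.dumps(manifest, indent=2))
```

The manifest is declarative: it states the desired replica count and pod template, and the Deployment controller is responsible for converging the cluster toward that state.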

Serverless Computing

Serverless computing abstracts away server provisioning, scaling, and patching, allowing developers to focus on application code.

| Term | Definition | Examples |
| --- | --- | --- |
| Function | Single-purpose code unit | Lambda function |
| Event Trigger | What initiates function execution | HTTP request, queue message, schedule |
| Cold Start | Added latency on an invocation when no warm instance is available | First request delay |
| Warm Start | Subsequent fast invocations | Reused container |
| Concurrency | Number of parallel function executions | Scaling unit |
| Provisioned Concurrency | Pre-warmed function instances | Reduce or avoid cold starts |

Network Concepts

Network Fundamentals

| Term | Definition | Example |
| --- | --- | --- |
| IP Address | Unique identifier for network device | 192.168.1.100, 10.0.0.1 |
| CIDR | IP address allocation notation | 10.0.0.0/16 (65,536 addresses) |
| Subnet | Segment of larger network | 10.0.1.0/24 (256 addresses) |
| Gateway | Entry/exit point for network | Router IP address |
| DNS | Domain Name System | Translates names to IPs |
| DHCP | Dynamic Host Configuration Protocol | Automatic IP assignment |
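
The CIDR arithmetic in the table can be checked with Python's standard `ipaddress` module. The sketch below reuses the two example ranges from the table; the host address `10.0.1.25` is an arbitrary value picked for illustration.

```python
import ipaddress

vpc = ipaddress.ip_network("10.0.0.0/16")
subnet = ipaddress.ip_network("10.0.1.0/24")

# A /16 leaves 16 host bits (2**16 addresses); a /24 leaves 8 (2**8).
vpc_size = vpc.num_addresses
subnet_size = subnet.num_addresses

# The /24 is carved out of the /16, and a host address belongs to it.
subnet_in_vpc = subnet.subnet_of(vpc)
host_in_subnet = ipaddress.ip_address("10.0.1.25") in subnet

# Number of /24 subnets that fit inside the /16.
slash24_count = len(list(vpc.subnets(new_prefix=24)))

print(vpc_size, subnet_size, subnet_in_vpc, host_in_subnet, slash24_count)
```

Note that cloud providers typically reserve a few addresses in each subnet, so the usable count is somewhat lower than the raw total.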

Cloud Networking

| Term | Definition | Examples |
| --- | --- | --- |
| VPC/VNet | Virtual private cloud/network | Isolated network in cloud |
| Subnet | Network segment within VPC | Public subnet, private subnet |
| Route Table | Network routing rules | Direct traffic to destinations |
| Internet Gateway | Provides internet access to VPC | Public subnet connectivity |
| NAT Gateway | Enables private subnet internet access | Outbound only access |
| VPC Peering | Direct connection between VPCs | Inter-VPC communication |
| Transit Gateway | Hub for connecting multiple networks | Network hub |
| PrivateLink/Endpoint | Private connectivity to services | Avoid internet for AWS services |

Network Security

| Term | Definition | Level |
| --- | --- | --- |
| Security Group | Virtual firewall for instances | Instance level |
| NACL | Network Access Control List | Subnet level |
| WAF | Web Application Firewall | Application level |
| Firewall | Network traffic filter | Network perimeter |
| IDS/IPS | Intrusion Detection/Prevention | Network monitoring |

Load Balancing

| Term | Definition | Use Case |
| --- | --- | --- |
| Layer 4 (L4) Load Balancer | Routes based on IP and port | TCP/UDP traffic |
| Layer 7 (L7) Load Balancer | Routes based on application data | HTTP/HTTPS traffic |
| Health Check | Verifies target availability | Remove unhealthy targets |
| Target Group | Collection of targets | Group of instances |
| Listener | Checks for connection requests | Port and protocol |
| Algorithm | How traffic is distributed | Round-robin, least connections |
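
The two distribution algorithms named in the table can be sketched in a few lines. This is a simplified model rather than any particular load balancer's implementation; the target addresses and connection counts are made up for the example.

```python
import itertools


def round_robin(targets):
    """Yield targets in a fixed rotation, regardless of their current load."""
    return itertools.cycle(targets)


def least_connections(active_connections):
    """Pick the target that currently has the fewest open connections."""
    return min(active_connections, key=active_connections.get)


targets = ["10.0.1.10", "10.0.1.11", "10.0.1.12"]

picker = round_robin(targets)
rr_order = [next(picker) for _ in range(5)]

active = {"10.0.1.10": 7, "10.0.1.11": 2, "10.0.1.12": 4}
lc_choice = least_connections(active)

print(rr_order)
print(lc_choice)
```

Round-robin suits targets with similar capacity and short requests, while least connections tends to behave better when request durations vary widely.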

Storage Concepts

Storage Types

| Type | Description | Characteristics | Use Cases |
| --- | --- | --- | --- |
| Block Storage | Fixed-size blocks, formatted as filesystem | Low latency, high IOPS | Databases, boot volumes |
| File Storage | Hierarchical file system, shared access | POSIX compliant, concurrent access | Shared data, legacy apps |
| Object Storage | Flat structure, unlimited scale | HTTP access, metadata | Backups, data lakes, archives |

Storage Characteristics

| Characteristic | Definition | Measurement |
| --- | --- | --- |
| IOPS | Input/Output Operations Per Second | Reads + Writes per second |
| Throughput | Data transfer rate | MB/s or GB/s |
| Latency | Time to complete I/O operation | Milliseconds or microseconds |
| Durability | Probability of data preservation | 99.999999999% (11 9s) |
| Availability | Percentage time accessible | 99.99% |
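
IOPS and throughput are linked by the size of each I/O operation: throughput is roughly IOPS multiplied by the I/O size. The sketch below works through that arithmetic; the 3,000 IOPS figure and the 16 KiB and 256 KiB I/O sizes are illustrative values, not properties of any specific volume type.

```python
def throughput_mib_per_s(iops, io_size_kib):
    """Approximate throughput in MiB/s for a given IOPS rate and I/O size."""
    return iops * io_size_kib / 1024


# The same IOPS rate yields very different throughput at different I/O sizes.
small_io = throughput_mib_per_s(3000, 16)    # small, database-style I/O
large_io = throughput_mib_per_s(3000, 256)   # large sequential I/O

print(small_io, large_io)
```

This is why a volume can hit its throughput limit well before its IOPS limit when a workload issues large sequential reads or writes.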

Storage Tiers

| Tier | Access Pattern | Storage Cost | Examples |
| --- | --- | --- | --- |
| Hot/Standard | Frequent access | Highest | S3 Standard, Premium SSD |
| Warm/Infrequent | Monthly access | Medium | S3 IA, Cool Blob |
| Cold/Archive | Yearly access | Lowest | S3 Glacier, Archive |

Security Concepts

Identity and Access Management

| Term | Definition | Example |
| --- | --- | --- |
| Principal | Entity requesting access | User, role, service |
| Authentication | Verifying identity | Password, MFA, certificate |
| Authorization | Determining permissions | IAM policy, RBAC |
| IAM Policy | Permission rules document | Allow/deny actions |
| Role | Set of permissions assumable by principals | EC2 instance role |
| MFA | Multi-Factor Authentication | Something you know + have |
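
Authorization against allow/deny policies can be modeled with a small evaluator. The sketch below assumes a deliberately simplified rule set (default deny, explicit deny overrides allow, shell-style wildcards); the statement format, the `storage:*` action names, and the `bucket/...` resource paths are hypothetical, and real policy engines evaluate considerably more than this.

```python
from fnmatch import fnmatchcase


def is_allowed(statements, action, resource):
    """Return True only if some Allow matches and no Deny matches."""
    allowed = False
    for stmt in statements:
        matches = any(fnmatchcase(action, p) for p in stmt["actions"]) and any(
            fnmatchcase(resource, p) for p in stmt["resources"]
        )
        if not matches:
            continue
        if stmt["effect"] == "Deny":
            return False  # an explicit deny always wins
        allowed = True
    return allowed  # stays False when nothing matched (default deny)


policy = [
    {"effect": "Allow", "actions": ["storage:Get*"], "resources": ["bucket/reports/*"]},
    {"effect": "Deny", "actions": ["storage:*"], "resources": ["bucket/reports/secret/*"]},
]

can_read_report = is_allowed(policy, "storage:GetObject", "bucket/reports/q1.csv")
can_read_secret = is_allowed(policy, "storage:GetObject", "bucket/reports/secret/keys.txt")
can_write_report = is_allowed(policy, "storage:PutObject", "bucket/reports/q1.csv")
```

Default deny means permissions must be granted deliberately, which is what makes narrowly scoped policies practical.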

Security Principles

| Principle | Description | Implementation |
| --- | --- | --- |
| Least Privilege | Minimum necessary permissions | Specific IAM policies |
| Defense in Depth | Multiple security layers | WAF + SG + NACL |
| Zero Trust | Never trust, always verify | Continuous authentication |
| Separation of Duties | Divide critical functions | Different roles for different tasks |

Encryption

| Type | Description | Use |
| --- | --- | --- |
| At Rest | Data encrypted in storage | EBS encryption, S3 SSE |
| In Transit | Data encrypted during transmission | TLS/SSL |
| Client-side | Encrypted before sending | Application handles |
| Server-side | Encrypted by service | Provider handles |
| KMS | Key Management Service | Central key management |

Availability and Resilience Concepts

Availability Metrics

| Term | Definition | Calculation |
| --- | --- | --- |
| Availability | Percentage uptime | (Uptime / Total Time) × 100 |
| Uptime | Time system is operational | Total time minus downtime |
| Downtime | Time system is unavailable | Planned + unplanned |
| MTBF | Mean Time Between Failures | Average time between failures |
| MTTR | Mean Time to Repair | Average repair time |
| MTTD | Mean Time to Detect | Average detection time |
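
MTBF and MTTR can be combined into a steady-state availability estimate using the common approximation MTBF / (MTBF + MTTR). The sketch below assumes MTBF is measured as the average operational time between failures; the 999-hour and 1-hour figures are made-up inputs.

```python
def availability_percent(mtbf_hours, mttr_hours):
    """Estimate availability as uptime divided by total (uptime + repair) time."""
    return mtbf_hours / (mtbf_hours + mttr_hours) * 100


# A system that runs 999 hours between failures and takes 1 hour to repair.
estimate = availability_percent(mtbf_hours=999, mttr_hours=1)

# Halving repair time improves availability without touching failure rate.
faster_repair = availability_percent(mtbf_hours=999, mttr_hours=0.5)

print(round(estimate, 3), round(faster_repair, 3))
```

The formula makes the trade-off explicit: availability improves either by failing less often (higher MTBF) or by recovering faster (lower MTTR).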

Availability Levels (“Nines”)

| Availability | Uptime % | Downtime/Year | Downtime/Month | Downtime/Week |
| --- | --- | --- | --- | --- |
| Two 9s | 99% | 3.65 days | 7.31 hours | 1.68 hours |
| Three 9s | 99.9% | 8.77 hours | 43.83 minutes | 10.08 minutes |
| Four 9s | 99.99% | 52.60 minutes | 4.38 minutes | 1.01 minutes |
| Five 9s | 99.999% | 5.26 minutes | 26.30 seconds | 6.05 seconds |
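
The downtime figures above follow from multiplying the unavailable fraction by the length of the period. The sketch below reproduces that calculation, assuming a 365.25-day year and a month of one twelfth of a year, which is what the table values appear to be based on.

```python
HOURS_PER_YEAR = 365.25 * 24          # assumed year length
HOURS_PER_MONTH = HOURS_PER_YEAR / 12
HOURS_PER_WEEK = 7 * 24


def downtime_hours(availability_percent, period_hours):
    """Maximum downtime allowed in a period at the given availability."""
    return (1 - availability_percent / 100) * period_hours


three_nines_year_hours = downtime_hours(99.9, HOURS_PER_YEAR)
four_nines_month_minutes = downtime_hours(99.99, HOURS_PER_MONTH) * 60
five_nines_week_seconds = downtime_hours(99.999, HOURS_PER_WEEK) * 3600

print(round(three_nines_year_hours, 2))
print(round(four_nines_month_minutes, 2))
print(round(five_nines_week_seconds, 2))
```

The same function can be used to turn any availability target into a downtime budget for a given reporting period.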

Recovery Objectives

| Term | Definition | Business Question |
| --- | --- | --- |
| RTO | Recovery Time Objective | How long can we be down? |
| RPO | Recovery Point Objective | How much data can we lose? |

Resilience Patterns

| Pattern | Description | Implementation |
| --- | --- | --- |
| Redundancy | Duplicate components | Multiple instances |
| Failover | Automatic switch to standby | Active-passive |
| Load Balancing | Distribute across instances | Active-active |
| Circuit Breaker | Prevent cascade failures | Fail fast, recover |
| Bulkhead | Isolate failures | Separate pools |
| Retry with Backoff | Retry failed operations | Exponential backoff |
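
Retry with exponential backoff is straightforward to sketch. The helper below doubles the wait after each failed attempt up to a cap; the function name, the default values, and the `flaky_call` stand-in are hypothetical, and jitter (random variation in the delay), which is commonly added in practice, is left out to keep the example deterministic.

```python
import time


def retry_with_backoff(operation, max_attempts=5, base_delay=0.5,
                       max_delay=30.0, sleep=time.sleep):
    """Call operation until it succeeds, waiting base_delay * 2**attempt between tries."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(min(max_delay, base_delay * 2 ** attempt))


# Demonstration with a stand-in operation that fails twice, then succeeds.
calls = {"count": 0}
delays = []


def flaky_call():
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("temporary failure")
    return "ok"


# Record the delays instead of actually sleeping.
result = retry_with_backoff(flaky_call, sleep=delays.append)
```

Passing the `sleep` function as a parameter keeps the helper testable, and the growing delays give a struggling dependency time to recover instead of being hit with immediate retries.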

Key Takeaways

  • Infrastructure types (on-premises, cloud, hybrid, multi-cloud) each have distinct characteristics, advantages, and trade-offs that must be understood for effective architecture decisions
  • Cloud service models (IaaS, PaaS, SaaS, CaaS, FaaS) define the shared responsibility model—understanding what you manage versus what the provider manages is essential
  • Infrastructure as Code is foundational for modern infrastructure management, applying software engineering practices to infrastructure provisioning and configuration
  • Compute options span physical servers, virtual machines, containers, and serverless—each with appropriate use cases and trade-offs
  • Network concepts including VPCs, subnets, security groups, and load balancing are essential for designing secure, performant infrastructure
  • Storage types (block, file, object) serve different use cases with different characteristics for performance, durability, and cost
  • Security concepts including IAM, encryption, and security principles must be understood and applied throughout infrastructure design
  • Availability metrics and resilience patterns enable design of highly available, fault-tolerant systems

Summary

This chapter established the core concepts and terminology essential for infrastructure and platform management. These definitions form the foundation for communication, design decisions, and operational practices throughout the infrastructure lifecycle.

The infrastructure landscape has evolved significantly, with cloud computing introducing new service models that shift responsibilities between customers and providers. Understanding where on-premises infrastructure ends and cloud begins—and the spectrum of hybrid and multi-cloud options in between—enables architects to select the right deployment model for each workload.

Infrastructure as Code represents a paradigm shift from manual, ticket-based provisioning to software-defined infrastructure. The principles of declarative definition, version control, idempotency, and testing transform infrastructure from a bottleneck to an enabler of rapid delivery.

The diversity of compute options—from bare metal servers to serverless functions—provides flexibility to match workloads to the most appropriate execution environment. Similarly, the range of storage and networking options enables optimized solutions for different requirements.

Security concepts must permeate all infrastructure decisions. The principles of least privilege, defense in depth, and zero trust guide secure infrastructure design, while encryption and identity management provide the technical controls.

Finally, availability and resilience concepts enable the design of systems that meet business requirements for uptime and recovery. Understanding metrics like RTO, RPO, and the availability “nines” translates business requirements into technical specifications.

The concepts introduced in this chapter will be referenced and applied throughout subsequent chapters as we explore architecture, implementation, operations, and governance of infrastructure and platforms.


Review Questions

  1. Service Model Selection: An organization is considering deploying a new application. Compare and contrast IaaS, PaaS, and CaaS for this use case. What factors would influence the choice?

  2. IaC Principles: Explain the principle of idempotency in Infrastructure as Code. Why is it important, and how do tools like Terraform achieve it?

  3. Container vs. VM: Your team is debating whether to containerize an application or deploy it on VMs. What are the key factors to consider, and when would you choose each approach?

  4. Hybrid Cloud Design: Design a hybrid cloud architecture for an organization that needs to keep sensitive customer data on-premises while leveraging cloud for web-facing applications. What connectivity options would you recommend?

  5. Availability Calculation: A business requires 99.95% availability for a critical application. Calculate the maximum allowable downtime per year and per month. What infrastructure design patterns would help achieve this?

  6. Storage Selection: You need to design storage for three workloads: a relational database, a shared file system for multiple servers, and a data lake for analytics. What storage type would you recommend for each and why?

  7. Security Layers: Explain the defense-in-depth approach to infrastructure security. Describe at least four layers of security controls and how they work together.

  8. IaC Tool Selection: Your organization is standardizing on IaC. Compare Terraform and CloudFormation for a multi-cloud environment. What are the key considerations?


Infrastructure and Platform Management Handbook - MIT License