Chapter 2: Core Concepts and Definitions
Learning Objectives
After completing this chapter, you will be able to:
- Define key infrastructure and platform management terminology with precision
- Distinguish between different infrastructure types and deployment models
- Understand cloud service models (IaaS, PaaS, SaaS, FaaS, CaaS) and their implications
- Explain Infrastructure as Code principles, benefits, and implementation approaches
- Recognize the components of modern infrastructure stacks across compute, network, storage, and security
- Apply consistent terminology throughout infrastructure discussions and documentation
- Understand the shared responsibility model across different service models
- Differentiate between traditional and cloud-native infrastructure approaches
Introduction
Effective communication about infrastructure requires a shared vocabulary. Misunderstandings about fundamental concepts lead to poor decisions, failed projects, and wasted resources. This chapter establishes the core concepts and definitions used throughout this handbook and in the broader infrastructure management discipline.
The terminology of infrastructure has evolved significantly over the past decade. Terms like “server” now encompass physical machines, virtual machines, containers, and serverless functions. “Storage” spans local disks, SANs, object storage, and distributed file systems. Understanding these concepts in their modern context is essential for architects, engineers, and managers who need to communicate clearly about infrastructure decisions, designs, and operations.
This chapter serves as both a learning resource for those new to modern infrastructure and a reference for experienced practitioners who need precise definitions. The concepts introduced here form the foundation for all subsequent chapters.
Infrastructure Types and Deployment Models
On-Premises Infrastructure
On-premises infrastructure refers to IT infrastructure owned, operated, and housed within an organization’s own facilities. This traditional model gives organizations complete control over their hardware, software, and data.
| Characteristic | Description | Implications |
|---|---|---|
| Ownership | Organization owns all hardware and facilities | Full capital expenditure, asset management required |
| Control | Complete control over all aspects of infrastructure | Maximum flexibility, maximum responsibility |
| Capital Model | Capital expenditure (CapEx) for hardware acquisition | Large upfront investment, depreciation over time |
| Responsibility | Full stack responsibility from facilities to applications | Requires diverse skill sets, 24x7 operations capability |
| Flexibility | Limited by physical capacity and procurement cycles | Capacity planning critical, scaling takes time |
| Location | Physical data center owned or leased | Geographic constraints, physical security requirements |
Advantages of On-Premises Infrastructure:
| Advantage | Description | Best For |
|---|---|---|
| Complete control | Full authority over hardware, software, and configuration | Organizations with strict control requirements |
| Data sovereignty | Data remains within organizational boundaries | Compliance with data residency regulations |
| No provider dependency | No reliance on external cloud providers | Organizations concerned about vendor lock-in |
| Predictable costs | Fixed costs after initial investment (excluding maintenance) | Stable, predictable workloads |
| Customization | Ability to customize hardware and software to exact specifications | Specialized workloads with unique requirements |
| Network performance | Low latency for on-premises applications | Latency-sensitive applications |
Challenges of On-Premises Infrastructure:
| Challenge | Description | Mitigation |
|---|---|---|
| High upfront investment | Significant capital required before deployment | Leasing options, phased deployment |
| Capacity planning | Must predict future needs accurately | Conservative planning, modular expansion |
| Hardware lifecycle | Regular refresh cycles required (typically 3-5 years) | Lifecycle management, technology roadmap |
| Facility requirements | Power, cooling, physical security, space | Colocation as alternative |
| Scaling limitations | Adding capacity requires procurement and installation | Hybrid cloud for burst capacity |
| Operational overhead | 24x7 operations, maintenance, patching | Managed services, automation |
Cloud Infrastructure
Cloud infrastructure refers to IT resources delivered as services by third-party providers over the internet. Cloud computing fundamentally changes the economics and operations of infrastructure.
| Characteristic | Description | Implications |
|---|---|---|
| Ownership | Provider owns and maintains all infrastructure | No hardware management, provider handles facilities |
| Control | Shared responsibility model based on service type | Less control, but less responsibility |
| Capital Model | Operational expenditure (OpEx), pay-as-you-go | Variable costs, potential for optimization |
| Responsibility | Varies by service model (IaaS, PaaS, SaaS) | Different skill requirements per model |
| Flexibility | Elastic capacity, on-demand provisioning | Scale up/down based on demand |
| Location | Global regions and availability zones | Geographic distribution, data residency options |
Cloud Deployment Models:
| Model | Description | Characteristics | Use Cases |
|---|---|---|---|
| Public Cloud | Shared infrastructure, multi-tenant environment | Lowest cost, highest scalability, shared resources | General workloads, variable demand, startups |
| Private Cloud | Dedicated infrastructure for single organization | More control, dedicated resources, higher cost | Compliance requirements, security-sensitive workloads |
| Hybrid Cloud | Combination of on-premises and public cloud | Flexibility, workload placement options | Mixed requirements, cloud migration journey |
| Multi-Cloud | Multiple public cloud providers | Avoid lock-in, best-of-breed services | Risk distribution, specialized services |
| Community Cloud | Shared infrastructure for specific community | Shared costs, common compliance requirements | Government, healthcare, research |
Public Cloud Characteristics:
PUBLIC CLOUD MODEL
┌─────────────────────────────────────────────────────────────────────────────┐
│ CLOUD PROVIDER INFRASTRUCTURE │
├─────────────────────────────────────────────────────────────────────────────┤
│ │
│ ┌────────────────────────────────────────────────────────────────────┐ │
│ │ SHARED INFRASTRUCTURE │ │
│ │ │ │
│ │ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐ │ │
│ │ │ Customer │ │ Customer │ │ Customer │ │ Customer │ │ │
│ │ │ A │ │ B │ │ C │ │ D │ │ │
│ │ │ (Your │ │ │ │ │ │ │ │ │
│ │ │ Org) │ │ │ │ │ │ │ │ │
│ │ └──────────┘ └──────────┘ └──────────┘ └──────────┘ │ │
│ │ │ │
│ │ ┌──────────────────────────────────────────────────────────────┐ │ │
│ │ │ HYPERVISOR / CONTAINER RUNTIME │ │ │
│ │ └──────────────────────────────────────────────────────────────┘ │ │
│ │ │ │
│ │ ┌──────────────────────────────────────────────────────────────┐ │ │
│ │ │ PHYSICAL SERVERS, STORAGE, NETWORK │ │ │
│ │ └──────────────────────────────────────────────────────────────┘ │ │
│ │ │ │
│ └────────────────────────────────────────────────────────────────────┘ │
│ │
│ PROVIDER RESPONSIBILITIES: │
│ • Physical security • Hardware maintenance • Network backbone │
│ • Power and cooling • Hypervisor security • Service APIs │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
Hybrid Infrastructure
Hybrid infrastructure combines on-premises and cloud resources, enabling organizations to place workloads in the optimal location based on requirements.
HYBRID INFRASTRUCTURE ARCHITECTURE
┌─────────────────────────────────────────────────────────────────────────────┐
│ HYBRID INFRASTRUCTURE │
├─────────────────────────────────────────────────────────────────────────────┤
│ │
│ ┌─────────────────────────────┐ ┌─────────────────────────────┐ │
│ │ ON-PREMISES │ │ PUBLIC CLOUD │ │
│ │ │ │ │ │
│ │ ┌─────────────────────┐ │ │ ┌─────────────────────┐ │ │
│ │ │ Core Systems │ │ │ │ Cloud-Native Apps │ │ │
│ │ │ • ERP │ │ │ │ • Web applications │ │ │
│ │ │ • Core databases │ │ │ │ • Mobile backends │ │ │
│ │ │ • Legacy apps │ │ │ │ • APIs │ │ │
│ │ └─────────────────────┘ │ │ └─────────────────────┘ │ │
│ │ │ │ │ │
│ │ ┌─────────────────────┐ │ │ ┌─────────────────────┐ │ │
│ │ │ Sensitive Data │ │ │ │ Dev/Test │ │ │
│ │ │ • PII │ │ │ │ • Development │ │ │
│ │ │ • Financial │ │ │ │ • Testing │ │ │
│ │ │ • Regulated │ │ │ │ • Staging │ │ │
│ │ └─────────────────────┘ │ │ └─────────────────────┘ │ │
│ │ │ │ │ │
│ │ ┌─────────────────────┐ │ │ ┌─────────────────────┐ │ │
│ │ │ Compliance │ │ │ │ Burst Capacity │ │ │
│ │ │ Workloads │ │ │ │ • Peak loads │ │ │
│ │ │ │ │ │ │ • Seasonal demand │ │ │
│ │ └─────────────────────┘ │ │ └─────────────────────┘ │ │
│ │ │ │ │ │
│ └──────────────┬──────────────┘ └──────────────┬──────────────┘ │
│ │ │ │
│ │ HYBRID CONNECTIVITY │ │
│ │ │ │
│ │ ┌────────────────────────────────┐ │ │
│ └──┤ VPN / Direct Connect / ├─┘ │
│ │ ExpressRoute / Interconnect │ │
│ └────────────────────────────────┘ │
│ │
│ UNIFIED MANAGEMENT: │
│ • Single pane of glass • Consistent policies • Integrated security │
│ • Unified monitoring • Common automation • Shared identity │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
Hybrid Cloud Connectivity Options:
| Connection Type | Description | Bandwidth | Latency | Cost | Use Case |
|---|---|---|---|---|---|
| Site-to-Site VPN | Encrypted tunnel over internet | Up to 1 Gbps | Variable | Low | Development, non-critical |
| Direct Connect (AWS) | Dedicated private connection | 1-100 Gbps | Low, consistent | High | Production workloads |
| ExpressRoute (Azure) | Dedicated private connection | 50 Mbps-10 Gbps | Low, consistent | High | Production workloads |
| Cloud Interconnect (GCP) | Dedicated private connection | 10-100 Gbps | Low, consistent | High | Production workloads |
| SD-WAN | Software-defined WAN | Variable | Optimized | Medium | Distributed offices |
Multi-Cloud Strategy
Multi-cloud refers to the use of multiple cloud providers, either for different workloads or for redundancy.
| Strategy | Description | Benefits | Challenges |
|---|---|---|---|
| Best of Breed | Use best service from each provider | Optimal capabilities | Management complexity |
| Risk Distribution | Spread workloads across providers | Reduced provider dependency | Cost, complexity |
| Geographic Coverage | Use providers with specific regional presence | Global reach | Data synchronization |
| Cost Optimization | Leverage competitive pricing | Cost savings | Comparison complexity |
| Compliance | Meet specific regulatory requirements | Compliance achievement | Policy consistency |
Multi-Cloud Architecture:
MULTI-CLOUD ARCHITECTURE
┌─────────────────────────────────────────────────────────────────────────────┐
│ MULTI-CLOUD STRATEGY │
├─────────────────────────────────────────────────────────────────────────────┤
│ │
│ ┌──────────────────┐ ┌──────────────────┐ ┌──────────────────┐ │
│ │ AWS │ │ AZURE │ │ GCP │ │
│ │ │ │ │ │ │ │
│ │ • EC2/EKS │ │ • AKS │ │ • BigQuery │ │
│ │ • S3 │ │ • Azure AD │ │ • ML/AI │ │
│ │ • Lambda │ │ • Office 365 │ │ • Anthos │ │
│ │ • RDS │ │ • Cosmos DB │ │ • Cloud Run │ │
│ │ │ │ │ │ │ │
│ └────────┬─────────┘ └────────┬─────────┘ └────────┬─────────┘ │
│ │ │ │ │
│ └─────────────────────┼─────────────────────┘ │
│ │ │
│ ┌────────────┴────────────┐ │
│ │ MULTI-CLOUD LAYER │ │
│ │ │ │
│ │ • Terraform (IaC) │ │
│ │ • Kubernetes (K8s) │ │
│ │ • Service Mesh │ │
│ │ • Identity Federation │ │
│ │ • Observability │ │
│ │ • Cost Management │ │
│ └─────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
Cloud Service Models
Infrastructure as a Service (IaaS)
IaaS provides virtualized computing resources over the internet. Users manage operating systems, applications, and data while the provider manages physical infrastructure.
| Aspect | Description | Examples |
|---|---|---|
| What You Get | Virtual machines, storage volumes, virtual networks | EC2, Azure VMs, GCE |
| What You Manage | OS, middleware, runtime, applications, data | Full stack from OS up |
| Provider Manages | Hardware, hypervisor, physical network, facilities | Physical infrastructure |
| Pricing Model | Per hour/second for compute, per GB for storage | On-demand, reserved, spot |
IaaS Characteristics:
| Characteristic | Description | Consideration |
|---|---|---|
| Maximum Control | Full control over OS and everything above | Requires more operational capability |
| Maximum Flexibility | Any OS, any software, any configuration | More decisions to make |
| Operating System Access | Full root/admin access | Security responsibility |
| Patching Responsibility | Customer responsible for OS and application patching | Operational overhead |
| Scaling | Manual or auto-scaling of VM instances | Must configure scaling |
| Cost Model | Pay for allocated resources, even if underutilized | Right-sizing important |
IaaS Use Cases:
| Use Case | Description | Why IaaS |
|---|---|---|
| Lift and shift migration | Moving existing applications to cloud | Minimal application changes |
| Development and testing | Creating dev/test environments | Quick provisioning, cost control |
| High-performance computing | Scientific computing, rendering | Access to powerful hardware |
| Big data analysis | Processing large datasets | Scalable compute resources |
| Web hosting | Running web servers | Full control over configuration |
| Disaster recovery | DR site in cloud | Cost-effective standby |
Platform as a Service (PaaS)
PaaS provides a platform for developing, running, and managing applications without the complexity of managing underlying infrastructure.
| Aspect | Description | Examples |
|---|---|---|
| What You Get | Runtime environment, development tools, middleware | App Service, Elastic Beanstalk, Heroku |
| What You Manage | Application code, data | Focus on application logic |
| Provider Manages | Infrastructure, OS, runtime, middleware | Everything below application |
| Pricing Model | Per application, per instance, per request | Usage-based |
PaaS Characteristics:
| Characteristic | Description | Consideration |
|---|---|---|
| Reduced Complexity | No OS management, automatic patching | Less operational overhead |
| Faster Development | Focus on code, not infrastructure | Accelerated development |
| Limited Control | Constrained to platform capabilities | May not fit all applications |
| Vendor Lock-in | Applications may become platform-specific | Portability concerns |
| Built-in Scaling | Automatic scaling based on demand | Less configuration required |
| Cost Efficiency | Pay for actual usage | Better for variable workloads |
PaaS Use Cases:
| Use Case | Description | Why PaaS |
|---|---|---|
| Web applications | Modern web apps | Rapid deployment, auto-scaling |
| API development | Building and hosting APIs | Managed infrastructure |
| Microservices | Deploying microservice architectures | Container orchestration included |
| Mobile backends | Backend services for mobile apps | Pre-built services |
| DevOps automation | CI/CD pipelines | Integrated tooling |
Software as a Service (SaaS)
SaaS delivers complete applications over the internet, fully managed by the provider.
| Aspect | Description | Examples |
|---|---|---|
| What You Get | Complete application, accessible via browser/API | Office 365, Salesforce, Workday |
| What You Manage | User configuration, data | Minimal technical management |
| Provider Manages | Everything: infrastructure, application, updates | Full stack responsibility |
| Pricing Model | Per user, per month, per feature tier | Subscription-based |
Containers as a Service (CaaS)
CaaS provides container orchestration and management as a service.
| Aspect | Description | Examples |
|---|---|---|
| What You Get | Managed container orchestration platform | EKS, AKS, GKE |
| What You Manage | Container images, deployments, configurations | Application containers |
| Provider Manages | Control plane, node infrastructure, networking | Kubernetes infrastructure |
| Pricing Model | Per cluster, per node, per request | Varies by provider |
Functions as a Service (FaaS) / Serverless
FaaS provides event-driven compute without server management.
| Aspect | Description | Examples |
|---|---|---|
| What You Get | Event-triggered function execution | Lambda, Azure Functions, Cloud Functions |
| What You Manage | Function code, triggers | Just the code |
| Provider Manages | Everything else: scaling, infrastructure | Full infrastructure abstraction |
| Pricing Model | Per invocation, per duration, per memory | Pay only for execution |
FaaS Characteristics:
| Characteristic | Description | Consideration |
|---|---|---|
| No Server Management | Zero infrastructure to manage | Maximum abstraction |
| Automatic Scaling | Scales from zero to thousands | Built-in elasticity |
| Pay Per Execution | Only pay when code runs | Cost-effective for variable loads |
| Cold Start | Initial invocation latency | Not suitable for all workloads |
| Time Limits | Maximum execution duration | Not for long-running processes |
| Stateless | No persistent local state | Must use external storage |
Database as a Service (DBaaS)
DBaaS provides managed database services.
| Aspect | Description | Examples |
|---|---|---|
| What You Get | Managed database engine | RDS, Azure SQL, Cloud SQL, MongoDB Atlas |
| What You Manage | Schema, queries, data, some configuration | Database design and usage |
| Provider Manages | Hardware, patching, backups, HA, scaling | Database operations |
| Pricing Model | Per instance, per storage, per IOPS | Resource-based |
Service Model Comparison
SERVICE MODEL RESPONSIBILITY COMPARISON
┌─────────────────────────────────────────────────────────────────────────────┐
│ SHARED RESPONSIBILITY MODEL │
├─────────────────────────────────────────────────────────────────────────────┤
│ │
│ ON-PREM IaaS PaaS CaaS SaaS │
│ ┌────────┐ ┌────────┐ ┌────────┐ ┌────────┐ ┌────────┐ │
│ │ YOU │ │ YOU │ │ YOU │ │ YOU │ │ YOU │ │
│ │ MANAGE │ │ MANAGE │ │ MANAGE │ │ MANAGE │ │ MANAGE │ │
│ ├────────┤ ├────────┤ ├────────┤ ├────────┤ ├────────┤ │
│ DATA │████████│ │████████│ │████████│ │████████│ │████████│ │
│ ├────────┤ ├────────┤ ├────────┤ ├────────┤ ├────────┤ │
│ APP │████████│ │████████│ │████████│ │████████│ │ │ │
│ ├────────┤ ├────────┤ ├────────┤ ├────────┤ │PROVIDER│ │
│ RUNTIME │████████│ │████████│ │ │ │ │ │ MANAGES│ │
│ ├────────┤ ├────────┤ │PROVIDER│ │PROVIDER│ │ │ │
│ O/S │████████│ │████████│ │ MANAGES│ │ MANAGES│ │ │ │
│ ├────────┤ ├────────┤ │ │ │ │ │ │ │
│ VIRTUAL │████████│ │ │ │ │ │ │ │ │ │
│ ├────────┤ │PROVIDER│ │ │ │ │ │ │ │
│ SERVER │████████│ │ MANAGES│ │ │ │ │ │ │ │
│ ├────────┤ │ │ │ │ │ │ │ │ │
│ STORAGE │████████│ │ │ │ │ │ │ │ │ │
│ ├────────┤ │ │ │ │ │ │ │ │ │
│ NETWORK │████████│ │ │ │ │ │ │ │ │ │
│ └────────┘ └────────┘ └────────┘ └────────┘ └────────┘ │
│ │
│ █ = Customer Responsibility (blank) = Provider Responsibility │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
Infrastructure as Code (IaC)
Definition and Principles
Infrastructure as Code (IaC) is the practice of managing and provisioning infrastructure through machine-readable definition files rather than manual processes or interactive configuration tools.
Formal Definition: Infrastructure as Code is an approach to infrastructure automation that applies software engineering practices—including version control, code review, testing, and continuous integration—to infrastructure management.
Core Principles of IaC
| Principle | Description | Implementation | Benefit |
|---|---|---|---|
| Declarative Definition | Define desired end state, not procedural steps | Terraform HCL, CloudFormation YAML | System figures out how to achieve state |
| Version Control | All infrastructure code stored in Git | GitHub, GitLab, Bitbucket | History, collaboration, rollback |
| Idempotency | Applying code multiple times produces same result | Terraform plan/apply | Safe to re-run, predictable |
| Modularity | Reusable, composable components | Terraform modules, CloudFormation nested stacks | DRY principle, consistency |
| Immutability | Replace rather than modify in place | Blue-green deployments, AMI-based | Reduced drift, predictable state |
| Testing | Validate infrastructure before deployment | Terratest, Kitchen-Terraform | Catch errors early |
| Documentation as Code | Code serves as documentation | Self-documenting infrastructure | Always current documentation |
Declarative vs. Imperative Approaches
| Approach | Description | Example | Best For |
|---|---|---|---|
| Declarative | Define desired state, system determines how | Terraform, CloudFormation | Most infrastructure provisioning |
| Imperative | Define exact steps to execute | Ansible playbooks, shell scripts | Configuration, orchestration |
| Hybrid | Declarative structure with imperative elements | Pulumi, CDK | Complex logic requirements |
Declarative Example (Terraform):
# Terraform declares WHAT should exist
resource "aws_instance" "web_server" {
ami = "ami-0c55b159cbfafe1f0"
instance_type = "t3.medium"
tags = {
Name = "web-server-prod"
Environment = "production"
ManagedBy = "terraform"
}
}
resource "aws_security_group" "web" {
name = "web-sg"
description = "Security group for web servers"
ingress {
from_port = 443
to_port = 443
protocol = "tcp"
cidr_blocks = ["0.0.0.0/0"]
}
egress {
from_port = 0
to_port = 0
protocol = "-1"
cidr_blocks = ["0.0.0.0/0"]
}
}
Imperative Example (Ansible):
# Ansible defines HOW to configure
- name: Configure web server
hosts: web_servers
become: yes
tasks:
- name: Install nginx
apt:
name: nginx
state: present
update_cache: yes
- name: Copy configuration file
template:
src: nginx.conf.j2
dest: /etc/nginx/nginx.conf
notify: Restart nginx
- name: Ensure nginx is running
service:
name: nginx
state: started
enabled: yes
handlers:
- name: Restart nginx
service:
name: nginx
state: restarted
IaC Benefits
| Benefit | Description | Quantified Impact |
|---|---|---|
| Consistency | Identical infrastructure across environments | Zero configuration drift |
| Speed | Provision infrastructure in minutes | 100x faster than manual |
| Repeatability | Recreate infrastructure reliably | Disaster recovery capability |
| Auditability | Git history captures all changes | Complete change history |
| Collaboration | Teams can review and contribute | Reduced errors through review |
| Disaster Recovery | Rebuild infrastructure from code | Hours instead of days |
| Cost Reduction | Eliminate manual errors, reduce rework | 40-60% operational savings |
| Compliance | Enforce standards through code | Automated compliance checks |
IaC Tool Categories
| Category | Purpose | Tools | When to Use |
|---|---|---|---|
| Provisioning | Create cloud resources | Terraform, CloudFormation, Pulumi, CDK | Creating infrastructure |
| Configuration Management | Configure systems | Ansible, Puppet, Chef, Salt | Configuring servers |
| Container Orchestration | Manage containers | Kubernetes, Docker Swarm, Nomad | Running containerized apps |
| Image Building | Create machine images | Packer, EC2 Image Builder | Creating golden images |
| Secret Management | Manage secrets | Vault, AWS Secrets Manager | Handling credentials |
IaC Tool Comparison
| Tool | Type | Language | State Management | Cloud Support | Learning Curve |
|---|---|---|---|---|---|
| Terraform | Provisioning | HCL | Remote state | Multi-cloud | Medium |
| CloudFormation | Provisioning | YAML/JSON | AWS managed | AWS only | Medium |
| Pulumi | Provisioning | Python/TS/Go | Remote state | Multi-cloud | Medium-High |
| CDK | Provisioning | Python/TS | CloudFormation | AWS primary | Medium-High |
| Ansible | Config Mgmt | YAML | Stateless | Multi-cloud | Low |
| Puppet | Config Mgmt | Puppet DSL | Puppet server | Multi-cloud | High |
| Chef | Config Mgmt | Ruby | Chef server | Multi-cloud | High |
IaC Workflow
IaC DEVELOPMENT AND DEPLOYMENT WORKFLOW
┌─────────────────────────────────────────────────────────────────────────────┐
│ INFRASTRUCTURE AS CODE WORKFLOW │
├─────────────────────────────────────────────────────────────────────────────┤
│ │
│ DEVELOPMENT PHASE │
│ ┌───────────────────────────────────────────────────────────────────────┐ │
│ │ │ │
│ │ ┌────────┐ ┌────────┐ ┌────────┐ ┌────────┐ │ │
│ │ │ Write │───►│ Lint │───►│ Format │───►│ Local │ │ │
│ │ │ Code │ │ │ │ │ │ Test │ │ │
│ │ └────────┘ └────────┘ └────────┘ └────────┘ │ │
│ │ │ │ │ │
│ │ │ IDE/Editor │ terraform plan │ │
│ │ │ tflint, checkov │ terratest │ │
│ │ │ terraform fmt │ kitchen-terraform │ │
│ │ │ │
│ └───────────────────────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ VERSION CONTROL PHASE │
│ ┌───────────────────────────────────────────────────────────────────────┐ │
│ │ │ │
│ │ ┌────────┐ ┌────────┐ ┌────────┐ ┌────────┐ │ │
│ │ │ Commit │───►│ Push │───►│ Pull │───►│ Review │ │ │
│ │ │ │ │ │ │ Request│ │ │ │ │
│ │ └────────┘ └────────┘ └────────┘ └────────┘ │ │
│ │ │ │ │ │ │
│ │ │ git commit │ CI triggers │ Peer review │ │
│ │ │ Conventional commits │ PR created │ Security review │ │
│ │ │ │
│ └───────────────────────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ CI/CD PHASE │
│ ┌───────────────────────────────────────────────────────────────────────┐ │
│ │ │ │
│ │ ┌────────┐ ┌────────┐ ┌────────┐ ┌────────┐ │ │
│ │ │Validate│───►│Security│───►│ Plan │───►│ Apply │ │ │
│ │ │ │ │ Scan │ │ │ │ │ │ │
│ │ └────────┘ └────────┘ └────────┘ └────────┘ │ │
│ │ │ │ │ │ │ │
│ │ │ terraform │ checkov │ terraform │ terraform │ │
│ │ │ validate │ tfsec │ plan │ apply │ │
│ │ │ │ Snyk │ (saved) │ (auto/manual) │ │
│ │ │ │
│ └───────────────────────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ POST-DEPLOYMENT │
│ ┌───────────────────────────────────────────────────────────────────────┐ │
│ │ │ │
│ │ ┌────────┐ ┌────────┐ ┌────────┐ │ │
│ │ │ Verify │───►│Document│───►│Monitor │ │ │
│ │ │ │ │ │ │ │ │ │
│ │ └────────┘ └────────┘ └────────┘ │ │
│ │ │ │ │ │ │
│ │ │ Smoke tests │ Update CMDB │ Drift detection │ │
│ │ │ Health check│ Release note│ Cost monitoring │ │
│ │ │ │
│ └───────────────────────────────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
State Management
IaC tools that manage state need careful consideration of how state is stored and accessed.
| State Aspect | Description | Best Practice |
|---|---|---|
| Remote State | Store state in shared location | S3 + DynamoDB, Azure Blob, GCS |
| State Locking | Prevent concurrent modifications | Enable locking mechanism |
| State Encryption | Protect sensitive data in state | Enable encryption at rest |
| State Backup | Regular state backups | Versioning on state bucket |
| State Separation | Separate state per environment | Different state files per env |
Compute Concepts
Physical Servers
Physical servers are dedicated hardware machines running workloads directly on hardware without virtualization.
| Term | Definition | Use Case |
|---|---|---|
| Bare Metal | Physical server without hypervisor | High-performance, licensing, specific hardware |
| Rack Server | Server designed for 19-inch rack mounting | Data center deployments |
| Blade Server | Compact server in blade chassis | High-density environments |
| Tower Server | Standalone server unit | Small offices, edge locations |
| Hyperconverged Infrastructure (HCI) | Integrated compute, storage, networking | Simplified management |
Virtual Machines
Virtual machines (VMs) are software-based computers running on a hypervisor that abstracts physical hardware.
| Term | Definition | Examples |
|---|---|---|
| Hypervisor | Software layer enabling virtualization | VMware ESXi, Hyper-V, KVM, Xen |
| Type 1 Hypervisor | Runs directly on hardware (bare metal) | ESXi, Hyper-V, Xen |
| Type 2 Hypervisor | Runs on host operating system | VirtualBox, VMware Workstation |
| Guest OS | Operating system running inside VM | Windows, Linux within VM |
| Host OS | Operating system running the hypervisor | ESXi, Windows Server |
| vCPU | Virtual CPU allocated to VM | 2 vCPU, 4 vCPU |
| VM Template | Pre-configured VM image for cloning | Golden images |
| Snapshot | Point-in-time capture of VM state | Backup before changes |
| Live Migration | Moving running VM between hosts | vMotion, Live Migration |
Containers
Containers are lightweight, portable units that package applications with their dependencies, sharing the host OS kernel.
| Term | Definition | Examples |
|---|---|---|
| Container Image | Packaged application with all dependencies | Docker image, OCI image |
| Container Runtime | Software that runs containers | Docker, containerd, CRI-O |
| Container Registry | Repository for storing container images | Docker Hub, ECR, ACR, GCR |
| Dockerfile | Instructions for building container image | Build recipe |
| Layer | Read-only filesystem layer in image | Image composition |
| Container | Running instance of container image | Isolated process |
Container vs VM Comparison:
| Aspect | Container | Virtual Machine |
|---|---|---|
| Isolation | Process-level (shared kernel) | Hardware-level (separate kernel) |
| Size | Megabytes (10s-100s MB) | Gigabytes (GBs) |
| Startup Time | Seconds | Minutes |
| Resource Overhead | Minimal | Significant (OS per VM) |
| Density | Hundreds per host | Tens per host |
| Portability | Very high (image-based) | Medium (hypervisor-dependent) |
| Security Isolation | Weaker (shared kernel) | Stronger (separate kernel) |
CONTAINER VS VIRTUAL MACHINE ARCHITECTURE
┌─────────────────────────────────────────────────────────────────────────────┐
│ CONTAINER VS VM ARCHITECTURE │
├─────────────────────────────────────────────────────────────────────────────┤
│ │
│ CONTAINERS VIRTUAL MACHINES │
│ ┌─────────────────────────────┐ ┌─────────────────────────────┐ │
│ │┌─────┐ ┌─────┐ ┌─────┐ │ │┌─────────┐ ┌─────────┐ │ │
│ ││App A│ │App B│ │App C│ │ ││ App A │ │ App B │ │ │
│ │├─────┤ ├─────┤ ├─────┤ │ │├─────────┤ ├─────────┤ │ │
│ ││Bins/│ │Bins/│ │Bins/│ │ ││ Bins/ │ │ Bins/ │ │ │
│ ││Libs │ │Libs │ │Libs │ │ ││ Libs │ │ Libs │ │ │
│ │└─────┘ └─────┘ └─────┘ │ │├─────────┤ ├─────────┤ │ │
│ │ │ ││Guest OS │ │Guest OS │ │ │
│ │┌──────────────────────────┐│ │└─────────┘ └─────────┘ │ │
│ ││ Container Runtime ││ │┌───────────────────────────┐│ │
│ │└──────────────────────────┘│ ││ HYPERVISOR ││ │
│ │┌──────────────────────────┐│ │└───────────────────────────┘│ │
│ ││ Host Operating ││ │┌───────────────────────────┐│ │
│ ││ System ││ ││ Host Operating System ││ │
│ │└──────────────────────────┘│ │└───────────────────────────┘│ │
│ │┌──────────────────────────┐│ │┌───────────────────────────┐│ │
│ ││ HARDWARE ││ ││ HARDWARE ││ │
│ │└──────────────────────────┘│ │└───────────────────────────┘│ │
│ └─────────────────────────────┘ └─────────────────────────────┘ │
│ │
│ PROS: PROS: │
│ • Fast startup (seconds) • Strong isolation │
│ • Lightweight (MBs) • Any OS on any host │
│ • High density • Mature technology │
│ • Portable across platforms • Better security boundaries │
│ │
│ CONS: CONS: │
│ • Shared kernel (security) • Slow startup (minutes) │
│ • Linux-centric (mostly) • Heavy (GBs) │
│ • Newer technology • Lower density │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
Container Orchestration (Kubernetes)
Kubernetes is the de facto standard for container orchestration, managing containerized applications at scale.
| Term | Definition | Purpose |
|---|---|---|
| Pod | Smallest deployable unit (one or more containers) | Unit of deployment |
| Deployment | Manages replica sets and pod lifecycles | Declarative updates |
| Service | Stable network endpoint for pods | Service discovery |
| Namespace | Virtual cluster within cluster | Isolation, organization |
| ConfigMap | Non-sensitive configuration data | External configuration |
| Secret | Sensitive configuration data | Credentials, keys |
| Ingress | HTTP/HTTPS routing to services | External access |
| PersistentVolume | Storage abstraction | Persistent data |
| StatefulSet | Manages stateful applications | Databases, queues |
| DaemonSet | Ensures pod runs on all/some nodes | Logging, monitoring agents |
Serverless Computing
Serverless computing abstracts infrastructure completely, allowing developers to focus solely on code.
| Term | Definition | Examples |
|---|---|---|
| Function | Single-purpose code unit | Lambda function |
| Event Trigger | What initiates function execution | HTTP request, queue message, schedule |
| Cold Start | Initial invocation latency when function not warm | First request delay |
| Warm Start | Subsequent fast invocations | Reused container |
| Concurrency | Number of parallel function executions | Scaling unit |
| Provisioned Concurrency | Pre-warmed function instances | Eliminate cold starts |
Network Concepts
Network Fundamentals
| Term | Definition | Example |
|---|---|---|
| IP Address | Unique identifier for network device | 192.168.1.100, 10.0.0.1 |
| CIDR | IP address allocation notation | 10.0.0.0/16 (65,536 addresses) |
| Subnet | Segment of larger network | 10.0.1.0/24 (256 addresses) |
| Gateway | Entry/exit point for network | Router IP address |
| DNS | Domain Name System | Translates names to IPs |
| DHCP | Dynamic Host Configuration Protocol | Automatic IP assignment |
Cloud Networking
| Term | Definition | Examples |
|---|---|---|
| VPC/VNet | Virtual private cloud/network | Isolated network in cloud |
| Subnet | Network segment within VPC | Public subnet, private subnet |
| Route Table | Network routing rules | Direct traffic to destinations |
| Internet Gateway | Provides internet access to VPC | Public subnet connectivity |
| NAT Gateway | Enables private subnet internet access | Outbound only access |
| VPC Peering | Direct connection between VPCs | Inter-VPC communication |
| Transit Gateway | Hub for connecting multiple networks | Network hub |
| PrivateLink/Endpoint | Private connectivity to services | Avoid internet for AWS services |
Network Security
| Term | Definition | Level |
|---|---|---|
| Security Group | Virtual firewall for instances | Instance level |
| NACL | Network Access Control List | Subnet level |
| WAF | Web Application Firewall | Application level |
| Firewall | Network traffic filter | Network perimeter |
| IDS/IPS | Intrusion Detection/Prevention | Network monitoring |
Load Balancing
| Term | Definition | Use Case |
|---|---|---|
| Layer 4 (L4) Load Balancer | Routes based on IP and port | TCP/UDP traffic |
| Layer 7 (L7) Load Balancer | Routes based on application data | HTTP/HTTPS traffic |
| Health Check | Verifies target availability | Remove unhealthy targets |
| Target Group | Collection of targets | Group of instances |
| Listener | Checks for connection requests | Port and protocol |
| Algorithm | How traffic is distributed | Round-robin, least connections |
Storage Concepts
Storage Types
| Type | Description | Characteristics | Use Cases |
|---|---|---|---|
| Block Storage | Fixed-size blocks, formatted as filesystem | Low latency, high IOPS | Databases, boot volumes |
| File Storage | Hierarchical file system, shared access | POSIX compliant, concurrent access | Shared data, legacy apps |
| Object Storage | Flat structure, unlimited scale | HTTP access, metadata | Backups, data lakes, archives |
Storage Characteristics
| Characteristic | Definition | Measurement |
|---|---|---|
| IOPS | Input/Output Operations Per Second | Reads + Writes per second |
| Throughput | Data transfer rate | MB/s or GB/s |
| Latency | Time to complete I/O operation | Milliseconds or microseconds |
| Durability | Probability of data preservation | 99.999999999% (11 9s) |
| Availability | Percentage time accessible | 99.99% |
Storage Tiers
| Tier | Access Pattern | Cost | Examples |
|---|---|---|---|
| Hot/Standard | Frequent access | Highest | S3 Standard, Premium SSD |
| Warm/Infrequent | Monthly access | Medium | S3 IA, Cool Blob |
| Cold/Archive | Yearly access | Lowest | S3 Glacier, Archive |
Security Concepts
Identity and Access Management
| Term | Definition | Example |
|---|---|---|
| Principal | Entity requesting access | User, role, service |
| Authentication | Verifying identity | Password, MFA, certificate |
| Authorization | Determining permissions | IAM policy, RBAC |
| IAM Policy | Permission rules document | Allow/deny actions |
| Role | Set of permissions assumable by principals | EC2 instance role |
| MFA | Multi-Factor Authentication | Something you know + have |
Security Principles
| Principle | Description | Implementation |
|---|---|---|
| Least Privilege | Minimum necessary permissions | Specific IAM policies |
| Defense in Depth | Multiple security layers | WAF + SG + NACL |
| Zero Trust | Never trust, always verify | Continuous authentication |
| Separation of Duties | Divide critical functions | Different roles for different tasks |
Encryption
| Type | Description | Use |
|---|---|---|
| At Rest | Data encrypted in storage | EBS encryption, S3 SSE |
| In Transit | Data encrypted during transmission | TLS/SSL |
| Client-side | Encrypted before sending | Application handles |
| Server-side | Encrypted by service | Provider handles |
| KMS | Key Management Service | Central key management |
Availability and Resilience Concepts
Availability Metrics
| Term | Definition | Calculation |
|---|---|---|
| Availability | Percentage uptime | (Uptime / Total Time) × 100 |
| Uptime | Time system is operational | Total time minus downtime |
| Downtime | Time system is unavailable | Planned + unplanned |
| MTBF | Mean Time Between Failures | Average time between failures |
| MTTR | Mean Time to Repair | Average repair time |
| MTTD | Mean Time to Detect | Average detection time |
Availability Levels (“Nines”)
| Availability | Uptime % | Downtime/Year | Downtime/Month | Downtime/Week |
|---|---|---|---|---|
| Two 9s | 99% | 3.65 days | 7.31 hours | 1.68 hours |
| Three 9s | 99.9% | 8.77 hours | 43.83 minutes | 10.08 minutes |
| Four 9s | 99.99% | 52.60 minutes | 4.38 minutes | 1.01 minutes |
| Five 9s | 99.999% | 5.26 minutes | 26.30 seconds | 6.05 seconds |
Recovery Objectives
| Term | Definition | Business Question |
|---|---|---|
| RTO | Recovery Time Objective | How long can we be down? |
| RPO | Recovery Point Objective | How much data can we lose? |
Resilience Patterns
| Pattern | Description | Implementation |
|---|---|---|
| Redundancy | Duplicate components | Multiple instances |
| Failover | Automatic switch to standby | Active-passive |
| Load Balancing | Distribute across instances | Active-active |
| Circuit Breaker | Prevent cascade failures | Fail fast, recover |
| Bulkhead | Isolate failures | Separate pools |
| Retry with Backoff | Retry failed operations | Exponential backoff |
Key Takeaways
- Infrastructure types (on-premises, cloud, hybrid, multi-cloud) each have distinct characteristics, advantages, and trade-offs that must be understood for effective architecture decisions
- Cloud service models (IaaS, PaaS, SaaS, CaaS, FaaS) define the shared responsibility model—understanding what you manage versus what the provider manages is essential
- Infrastructure as Code is foundational for modern infrastructure management, applying software engineering practices to infrastructure provisioning and configuration
- Compute options span physical servers, virtual machines, containers, and serverless—each with appropriate use cases and trade-offs
- Network concepts including VPCs, subnets, security groups, and load balancing are essential for designing secure, performant infrastructure
- Storage types (block, file, object) serve different use cases with different characteristics for performance, durability, and cost
- Security concepts including IAM, encryption, and security principles must be understood and applied throughout infrastructure design
- Availability metrics and resilience patterns enable design of highly available, fault-tolerant systems
Summary
This chapter established the core concepts and terminology essential for infrastructure and platform management. These definitions form the foundation for communication, design decisions, and operational practices throughout the infrastructure lifecycle.
The infrastructure landscape has evolved significantly, with cloud computing introducing new service models that shift responsibilities between customers and providers. Understanding where on-premises infrastructure ends and cloud begins—and the spectrum of hybrid and multi-cloud options in between—enables architects to select the right deployment model for each workload.
Infrastructure as Code represents a paradigm shift from manual, ticket-based provisioning to software-defined infrastructure. The principles of declarative definition, version control, idempotency, and testing transform infrastructure from a bottleneck to an enabler of rapid delivery.
The diversity of compute options—from bare metal servers to serverless functions—provides flexibility to match workloads to the most appropriate execution environment. Similarly, the range of storage and networking options enables optimized solutions for different requirements.
Security concepts must permeate all infrastructure decisions. The principles of least privilege, defense in depth, and zero trust guide secure infrastructure design, while encryption and identity management provide the technical controls.
Finally, availability and resilience concepts enable the design of systems that meet business requirements for uptime and recovery. Understanding metrics like RTO, RPO, and the availability “nines” translates business requirements into technical specifications.
The concepts introduced in this chapter will be referenced and applied throughout subsequent chapters as we explore architecture, implementation, operations, and governance of infrastructure and platforms.
Review Questions
Service Model Selection: An organization is considering deploying a new application. Compare and contrast IaaS, PaaS, and CaaS for this use case. What factors would influence the choice?
IaC Principles: Explain the principle of idempotency in Infrastructure as Code. Why is it important, and how do tools like Terraform achieve it?
Container vs. VM: Your team is debating whether to containerize an application or deploy it on VMs. What are the key factors to consider, and when would you choose each approach?
Hybrid Cloud Design: Design a hybrid cloud architecture for an organization that needs to keep sensitive customer data on-premises while leveraging cloud for web-facing applications. What connectivity options would you recommend?
Availability Calculation: A business requires 99.95% availability for a critical application. Calculate the maximum allowable downtime per year and per month. What infrastructure design patterns would help achieve this?
Storage Selection: You need to design storage for three workloads: a relational database, a shared file system for multiple servers, and a data lake for analytics. What storage type would you recommend for each and why?
Security Layers: Explain the defense-in-depth approach to infrastructure security. Describe at least four layers of security controls and how they work together.
IaC Tool Selection: Your organization is standardizing on IaC. Compare Terraform and CloudFormation for a multi-cloud environment. What are the key considerations?