💻 Parallel and Distributed Computing Unit 14 – Cloud Computing & Virtualization

Cloud computing has revolutionized how we access and manage digital resources. It offers on-demand services like servers, storage, and software over the internet, enabling businesses to scale efficiently and reduce infrastructure costs.

Virtualization is the backbone of cloud computing, abstracting physical hardware into virtual environments. This technology allows for efficient resource allocation, improved scalability, and enhanced flexibility in deploying and managing applications across diverse cloud platforms.

Key Concepts and Definitions

  • Cloud computing delivers computing services over the internet ("the cloud"), including servers, storage, databases, networking, software, analytics, and intelligence
  • Virtualization abstracts physical hardware, creating a virtual computing environment where resources can be efficiently utilized and dynamically allocated
  • Scalability enables a system to handle increased loads by dynamically adding more resources (horizontal scaling) or increasing the capacity of existing resources (vertical scaling)
  • Elasticity allows resources to be rapidly provisioned and released based on demand, ensuring optimal resource utilization and cost-efficiency
  • Service Level Agreement (SLA) defines the level of service expected from a cloud provider, including uptime, performance, and security commitments
  • Multitenancy enables multiple customers (tenants) to share the same computing resources while maintaining data isolation and security
  • Workload refers to the amount of processing a computer system performs, which can be distributed across multiple cloud resources for efficient execution

Evolution of Cloud Computing

  • Mainframe computing (1950s-1970s) centralized computing resources, accessed through terminals, laying the foundation for modern cloud computing concepts
  • Client-server model (1980s-1990s) distributed computing tasks between clients and servers, enabling resource sharing and remote access
  • Grid computing (1990s-2000s) connected geographically distributed computers to solve complex problems, introducing the concept of resource pooling
  • Utility computing (early 2000s) provided computing resources as a metered service, similar to electricity or water, paving the way for pay-as-you-go cloud services
  • Cloud computing (2006-present) emerged with Amazon Web Services (AWS), followed by other providers like Google Cloud Platform and Microsoft Azure
    • Infrastructure as a Service (IaaS), Platform as a Service (PaaS), and Software as a Service (SaaS) models were introduced, offering different levels of abstraction and control
  • Serverless computing (2014-present) enables developers to focus on writing code without managing underlying infrastructure, with providers dynamically allocating resources as needed

Cloud Service Models

  • Infrastructure as a Service (IaaS) provides virtualized computing resources over the internet, including servers, storage, and networking
    • Customers have control over operating systems, storage, and deployed applications, while the cloud provider manages the underlying infrastructure
    • Examples include Amazon EC2, Google Compute Engine, and Microsoft Azure Virtual Machines
  • Platform as a Service (PaaS) offers a complete development and deployment environment in the cloud, including tools for building, testing, and deploying applications
    • Developers can focus on writing code without worrying about infrastructure management or software updates
    • Examples include Google App Engine, Microsoft Azure App Service, and Heroku
  • Software as a Service (SaaS) delivers software applications over the internet, accessible through a web browser or API
    • Users can access the software from any device with an internet connection, without the need for installation or maintenance
    • Examples include Salesforce, Google Workspace (formerly G Suite), and Microsoft Office 365
  • Function as a Service (FaaS) allows developers to execute individual functions in response to events or requests, without managing servers or infrastructure
    • Cloud providers dynamically allocate resources and charge based on the number of function executions and their duration
    • Examples include AWS Lambda, Google Cloud Functions, and Microsoft Azure Functions
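
To make the FaaS model concrete, here is a minimal sketch of an AWS Lambda handler written in Python. The `event`/`context` signature is Lambda's standard handler contract; the greeting logic and response shape are purely illustrative.

```python
import json

def lambda_handler(event, context):
    # 'event' carries the trigger payload (for example, an API Gateway request);
    # 'context' exposes runtime metadata such as the remaining execution time.
    name = event.get("name", "world")
    return {
        "statusCode": 200,
        "body": json.dumps({"message": f"Hello, {name}!"}),
    }
```

The provider runs this function only when an event arrives and bills for the execution time consumed, which is what distinguishes FaaS from always-on IaaS or PaaS deployments.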

Virtualization Technologies

  • Hypervisor (virtual machine monitor) creates and manages virtual machines (VMs), allowing multiple operating systems to run concurrently on a single physical machine
    • Type 1 hypervisors (bare-metal) run directly on the host's hardware, providing better performance and security (Xen, Microsoft Hyper-V, VMware ESXi)
    • Type 2 hypervisors (hosted) run as a software layer on top of an existing operating system, offering flexibility and ease of use (Oracle VirtualBox, VMware Workstation)
  • Virtual machines (VMs) emulate physical computers, with their own virtual hardware (CPU, memory, storage) and operating system, isolated from other VMs and the host
  • Containers provide a lightweight alternative to VMs, packaging an application and its dependencies into a single unit that can run consistently across different environments
    • Containers share the host operating system kernel, resulting in faster startup times and lower overhead compared to VMs
    • Examples include Docker and Linux Containers (LXC); Kubernetes is the most widely used orchestrator for deploying and scaling containers (a container-run sketch follows this list)
  • Software-defined networking (SDN) decouples network control from the underlying hardware, enabling programmatic management and dynamic configuration of network resources
  • Storage virtualization abstracts physical storage devices, presenting them as a single logical unit, facilitating data management, backup, and disaster recovery
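
As a sketch of how lightweight containers are in practice, the snippet below uses the Docker SDK for Python (the `docker` package) to start a short-lived container. It assumes a local Docker daemon is running; the image and command are only examples.

```python
import docker  # Docker SDK for Python; requires a running Docker daemon

client = docker.from_env()  # connect using the standard Docker environment settings

# Run a small container to completion; because it shares the host kernel,
# startup takes a fraction of the time a full VM boot would.
output = client.containers.run("alpine", "echo hello from a container", remove=True)
print(output.decode().strip())

# List containers currently running on this host
for container in client.containers.list():
    print(container.short_id, container.image.tags)
```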

Cloud Infrastructure and Architecture

  • Data centers house the physical infrastructure (servers, storage, networking) that powers cloud computing services, designed for reliability, security, and efficiency
  • Compute resources include virtual machines (VMs), containers, and serverless computing options, provisioned on-demand to meet application requirements
  • Storage services offer various options for storing and accessing data in the cloud, such as object storage (Amazon S3), block storage (Amazon EBS), and file storage (Google Cloud Filestore); a short object-storage sketch follows this list
    • Storage tiers (hot, warm, cold) cater to different access frequency and latency requirements, optimizing costs based on data usage patterns
  • Networking infrastructure connects cloud resources and enables communication between services, typically using software-defined networking (SDN) for flexibility and scalability
    • Virtual private cloud (VPC) isolates cloud resources within a logically separated network, ensuring security and control over network configuration
  • Load balancing distributes incoming traffic across multiple resources (servers, VMs, containers) to optimize performance, ensure high availability, and handle failover scenarios
  • Content delivery networks (CDNs) cache static content in geographically distributed edge locations, reducing latency and improving user experience for globally accessed applications
  • API gateways act as a single entry point for managing, securing, and monitoring API requests, facilitating communication between services and enabling serverless architectures
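
The object-storage bullet above can be illustrated with a minimal boto3 sketch. It assumes the `boto3` package, configured AWS credentials, and a hypothetical bucket name.

```python
import boto3  # AWS SDK for Python; assumes credentials are configured

s3 = boto3.client("s3")
bucket = "example-logs-bucket"  # hypothetical bucket name

# Object storage is key/value-style: each object lives under a key, not a file path
s3.put_object(Bucket=bucket, Key="2024/06/app.log", Body=b"request served in 12 ms\n")

# Read the object back
response = s3.get_object(Bucket=bucket, Key="2024/06/app.log")
print(response["Body"].read().decode())
```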

Deployment Models and Strategies

  • Public cloud offers computing resources and services over the public internet, owned and operated by a third-party provider (AWS, Google Cloud, Microsoft Azure)
    • Resources are shared among multiple tenants, with the provider responsible for infrastructure management, security, and maintenance
    • Offers high scalability, flexibility, and cost-efficiency, with a pay-as-you-go pricing model
  • Private cloud delivers cloud computing services exclusively for a single organization, either on-premises or hosted by a third-party provider
    • Provides greater control, security, and customization compared to public cloud, but requires significant upfront investment and ongoing maintenance
    • Suitable for organizations with strict regulatory compliance or data sovereignty requirements
  • Hybrid cloud combines public and private cloud environments, allowing workloads to move between them based on specific requirements (cost, performance, security)
    • Enables organizations to leverage the scalability of public cloud while keeping sensitive data or critical workloads in a private cloud environment
    • Requires careful planning and management to ensure seamless integration and data synchronization between the two environments
  • Multi-cloud strategy involves using services from multiple cloud providers to avoid vendor lock-in, optimize costs, and leverage best-of-breed services
    • Requires expertise in managing and integrating different cloud platforms, as well as ensuring data portability and interoperability
  • Lift and shift (rehosting) migration strategy involves moving existing applications to the cloud without significant modifications, minimizing initial effort but potentially limiting cloud benefits
  • Refactoring (rearchitecting) migration strategy involves redesigning applications to take full advantage of cloud-native features and services, maximizing benefits but requiring more effort and resources

Security and Privacy Considerations

  • Data encryption protects sensitive information both at rest (stored) and in transit (during transmission) using encryption algorithms and key management systems
    • Client-side encryption ensures data is encrypted before being sent to the cloud, with the keys held by the customer for added security (see the encryption sketch after this list)
    • Server-side encryption is performed by the cloud provider, offering convenience but requiring trust in the provider's security practices
  • Access control mechanisms (IAM) regulate who can access cloud resources and what actions they can perform, based on the principle of least privilege
    • Multi-factor authentication (MFA) adds an extra layer of security by requiring users to provide additional verification (e.g., a code from a mobile app) during login
    • Role-based access control (RBAC) assigns permissions to users based on their roles and responsibilities within the organization
  • Network security measures protect cloud resources from unauthorized access and potential threats, using tools like firewalls, intrusion detection/prevention systems (IDS/IPS), and virtual private networks (VPNs)
  • Compliance with industry standards and regulations (HIPAA, GDPR, PCI-DSS) ensures that cloud services meet specific security and privacy requirements for sensitive data and applications
  • Shared responsibility model defines the division of security responsibilities between the cloud provider and the customer, depending on the service model (IaaS, PaaS, SaaS)
    • The provider is responsible for securing the underlying infrastructure, while the customer is responsible for securing their applications, data, and user access
  • Security monitoring and incident response processes help detect, investigate, and mitigate security breaches or anomalies in the cloud environment
    • Cloud providers offer native security tools and services (Amazon GuardDuty, Google Cloud Security Command Center) for continuous monitoring and threat detection
    • Third-party security solutions can be integrated to enhance visibility and protection across multiple cloud platforms and on-premises systems
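
To show what client-side encryption looks like before data ever reaches a provider, here is a minimal sketch using the `cryptography` package's Fernet recipe together with boto3. The bucket name and record contents are hypothetical, and in practice the key would come from a key-management system rather than being generated inline.

```python
import boto3
from cryptography.fernet import Fernet  # symmetric encryption recipe from the 'cryptography' package

# The customer generates and holds the key; the provider only ever stores ciphertext
key = Fernet.generate_key()
fernet = Fernet(key)

ciphertext = fernet.encrypt(b"sensitive customer record")  # encrypt before upload

s3 = boto3.client("s3")
s3.put_object(
    Bucket="example-secure-bucket",  # hypothetical bucket name
    Key="records/0001.enc",
    Body=ciphertext,
)

# Only a holder of the key can recover the plaintext
plaintext = fernet.decrypt(ciphertext)
```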

Performance and Scalability

  • Auto-scaling dynamically adjusts the number of resources (VMs, containers) based on predefined rules or metrics (CPU utilization, request rate) to handle fluctuating workloads (a target-tracking policy sketch follows this list)
    • Horizontal scaling (scaling out) adds more instances to distribute the load, suitable for stateless applications or those that can be easily parallelized
    • Vertical scaling (scaling up) increases the capacity of existing instances (CPU, memory), suitable for stateful applications or databases that require more resources
  • Load balancing distributes incoming traffic across multiple resources to optimize performance, ensure high availability, and handle failover scenarios
    • Application load balancers (ALBs) operate at the application layer (HTTP/HTTPS), routing traffic based on content and supporting advanced features like sticky sessions and SSL termination
    • Network load balancers (NLBs) operate at the transport layer (TCP/UDP), providing high-speed, low-latency traffic distribution for non-HTTP applications or those requiring high throughput
  • Caching stores frequently accessed data or computations in fast, temporary storage (memory, SSDs) to reduce latency and improve application performance (a cache-aside sketch follows this list)
    • In-memory caches (Redis, Memcached) store data in RAM for extremely fast access, ideal for session management, real-time analytics, or content delivery
    • Content delivery networks (CDNs) cache static content in geographically distributed edge locations, reducing latency for users accessing the application from different regions
  • Serverless computing (FaaS) automatically scales resources based on the number of incoming requests or events, allowing developers to focus on writing code without managing infrastructure
    • Serverless platforms (AWS Lambda, Google Cloud Functions) handle the provisioning, scaling, and management of the underlying resources, charging only for the actual execution time and resources consumed
  • Performance monitoring and optimization tools help identify bottlenecks, analyze resource utilization, and fine-tune application performance in the cloud
    • Cloud providers offer native monitoring solutions (Amazon CloudWatch, Google Cloud Monitoring) for collecting and visualizing metrics, logs, and events
    • Third-party application performance management (APM) tools (New Relic, Datadog) provide deeper insights into application behavior, user experience, and performance anomalies
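
As a sketch of the rule-based auto-scaling described above, the boto3 call below attaches a target-tracking policy to an EC2 Auto Scaling group so that instances are added or removed to hold average CPU utilization near 50%. The group and policy names are hypothetical, and the snippet assumes configured AWS credentials.

```python
import boto3  # AWS SDK for Python; assumes credentials are configured

autoscaling = boto3.client("autoscaling")

autoscaling.put_scaling_policy(
    AutoScalingGroupName="web-asg",      # hypothetical Auto Scaling group
    PolicyName="cpu-target-50",          # hypothetical policy name
    PolicyType="TargetTrackingScaling",
    TargetTrackingConfiguration={
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "ASGAverageCPUUtilization"
        },
        "TargetValue": 50.0,             # scale out/in to keep average CPU near 50%
    },
)
```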
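
The caching bullet above follows the common cache-aside pattern; a minimal sketch with the `redis` package is shown below. The `load_profile_from_db` helper is a hypothetical stand-in for a real database query, and the snippet assumes a reachable Redis instance.

```python
import json
import redis  # redis-py client; assumes a reachable Redis instance

cache = redis.Redis(host="localhost", port=6379)

def load_profile_from_db(user_id: str) -> dict:
    # Stand-in for a real database query
    return {"id": user_id, "name": "example"}

def get_user_profile(user_id: str) -> dict:
    """Cache-aside read: check Redis first, fall back to the database on a miss."""
    key = f"user:{user_id}"
    cached = cache.get(key)
    if cached is not None:
        return json.loads(cached)  # cache hit: skip the database round trip
    profile = load_profile_from_db(user_id)
    cache.setex(key, 300, json.dumps(profile))  # cache the result for 5 minutes
    return profile
```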

Real-World Applications and Case Studies

  • Netflix leverages AWS for its global streaming service, using a microservices architecture, auto-scaling, and content delivery networks (CDNs) to deliver high-quality video to millions of users
    • Netflix uses AWS Lambda for serverless computing, Amazon S3 for storage, and Amazon DynamoDB for its distributed database needs
    • The company's cloud-native approach enables rapid innovation, scalability, and resilience, ensuring a seamless user experience across different devices and regions
  • Airbnb migrated its infrastructure to AWS to handle its rapid growth and global expansion, using a mix of IaaS, PaaS, and SaaS services
    • Airbnb uses Amazon EC2 for compute, Amazon RDS for managed databases, and Amazon ECS for container orchestration
    • The company also leverages machine learning services (Amazon SageMaker) for personalized recommendations and fraud detection, improving user experience and trust
  • Spotify transitioned from on-premises data centers to Google Cloud Platform (GCP) to scale its music streaming service and leverage advanced data analytics capabilities
    • Spotify uses Google Compute Engine for compute, Google Cloud Storage for storing audio files, and Google Cloud Dataproc for big data processing (Hadoop, Spark)
    • The company also employs machine learning (TensorFlow) and data analytics (BigQuery) to personalize playlists, recommend songs, and gain insights into user behavior
  • The New York Times used Google Cloud to digitize its entire archive of over 5 million historical articles, making them searchable and accessible to readers worldwide
    • The company used Google Cloud Storage for storing the scanned images and OCR data, and Google Cloud Dataflow for parallel processing and indexing of the articles
    • The project showcased the power of cloud computing in preserving and democratizing access to historical information, enabling new research and discovery opportunities
  • Coca-Cola Enterprises migrated its global IT infrastructure to Microsoft Azure, leveraging a hybrid cloud approach for its mission-critical applications and data
    • The company uses Azure Virtual Machines for compute, Azure SQL Database for managed databases, and Azure Active Directory for identity and access management
    • The migration helped Coca-Cola Enterprises achieve greater agility, cost savings, and global consistency, while ensuring compliance with local data sovereignty regulations

