Security in Charmed Ceph
Operating a Charmed Ceph cluster involves managing distributed storage components orchestrated by Juju. Securing this system is necessary to protect data integrity and confidentiality. This guide gives an overview of the security aspects, potential attack vectors, and best practices for deploying and operating Charmed Ceph securely.
Architectural Overview
Understanding the Charmed Ceph architecture is the first step towards securing it. The diagram below depicts a typical Charmed Ceph deployment with ceph-radosgw and ceph-fs deployed. Note that the Charmed Ceph ecosystem is flexible and can be tailored to a specific use case, so the architecture here is just an example.
Components
- Admin workstation: Operators’ machine used to manage and monitor the deployment.
- Juju controller: Orchestrates the Juju model that contains all Ceph services.
- ceph-mon: Ceph Monitor (MON) daemons; usually deployed as three units in an HA setup.
- ceph-osd: Hosts Ceph Object Storage Daemons (OSDs); each unit manages one or more data disks.
- ceph-radosgw: RGW (object‑storage) service; typically deployed in a highly‑available configuration with at least three units.
- ceph-fs: Metadata Server (MDS) daemons for CephFS; typically deployed in an HA set of at least three units.
- Client workloads: Consume Ceph storage via RBD block devices, RGW object buckets, or CephFS shared filesystems.
Attack Surface
The attack surface encompasses all points where an unauthorized user could attempt to enter or extract data from the system. For Charmed Ceph, these include:
Open Ports and Network Interfaces
By default, Ceph daemons and the supporting services on cluster nodes listen on the TCP ports below.
Port | Component | Purpose | Security Considerations |
---|---|---|---|
3300, 6789 | Ceph MON | Monitor daemon client communication | Should ideally be restricted to internal networks and specific client subnets via firewall. |
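6800-7300 | Ceph OSD / MGR / MDS | Daemon traffic: replication and heartbeats between cluster nodes, plus client data access | Must be strictly firewalled from external access; allow only cluster nodes and authorized client subnets. Essential for cluster operation. |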
80 | RGW (HTTP) | RADOS Gateway (Object storage HTTP access) | Object storage access. Disable if not needed. |
443 | RGW (HTTPS) | RADOS Gateway secure traffic (HTTPS) | Object storage access. Disable if not needed. Requires TLS certificate management. |
8443 | MGR (Dashboard) | Ceph Dashboard HTTPS access (default port) | Access should be restricted. Authentication is required. |
9283 | MGR (Prometheus) | Prometheus metrics endpoint (default port) | Restrict access to monitoring servers. |
22 | SSH | Host OS access | Standard SSH hardening practices (key auth, restricted access). |
17070 | Juju controller API | Juju agent and client communication with the controller | Communication is TLS encrypted. Access to hosts implies potential access to agents. |
Other (various) | Other Services | Potentially other services running on hosts | Audit open ports on cluster nodes. |
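
As a rough way to audit which of the ports above are actually reachable from a given vantage point, the following minimal sketch attempts a TCP connection to each port. The function name `open_ports` is an illustrative choice, not part of Ceph or Juju, and a successful connection only shows that something is listening, not which service it is.

```python
import socket


def open_ports(host, ports, timeout=1.0):
    """Return the subset of `ports` that accept a TCP connection on `host`."""
    reachable = []
    for port in ports:
        try:
            # create_connection raises OSError on refusal, timeout, or an unreachable host.
            with socket.create_connection((host, port), timeout=timeout):
                reachable.append(port)
        except OSError:
            pass
    return reachable
```

Running, for example, `open_ports(mon_address, [3300, 6789])` from a machine outside the storage networks should return an empty list if the firewall rules are doing their job; `mon_address` stands for whatever MON address applies in your deployment.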
Network Protocols and Endpoints
- Ceph Protocol (Messenger v1/v2): Used for all internal Ceph communication (MON, OSD, MGR, MDS). Messenger v2, available since Ceph Nautilus, can encrypt data in transit when its secure mode is enabled.
- Cephx Authentication: Primary mechanism for authenticating Ceph internal and client communication. It provides mutual authentication between clients/daemons and the MONs.
- HTTP/HTTPS (RGW): Used for S3/Swift access via the RADOS Gateway. HTTPS with strong TLS configuration is best practice for protecting data and credentials in transit, especially if RGW is externally accessible.
- Juju Agent Protocol: Communication between Juju agents and the controller is encrypted with TLS.
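
To illustrate what "strong TLS configuration" means from the client side of an RGW endpoint, here is a minimal sketch using Python's standard `ssl` module. It builds a context that verifies the server certificate and refuses protocol versions older than TLS 1.2. The function name and the optional `ca_file` parameter (for an internal CA bundle) are illustrative assumptions.

```python
import ssl


def strict_tls_context(ca_file=None):
    """Build a client-side TLS context that refuses anything older than TLS 1.2."""
    # create_default_context enables certificate and hostname verification.
    context = ssl.create_default_context(cafile=ca_file)
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    return context
```

A client that wraps its connection with `context.wrap_socket(sock, server_hostname=host)` will fail the handshake if the endpoint presents an untrusted certificate or only offers an older protocol version, which makes the sketch usable as a quick check of an HTTPS endpoint.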
Data Interfaces
- Block Devices and Filesystems: OSDs interact directly with underlying storage (disks or logical volumes). The OSD processes require elevated privileges to access these devices. The ceph-osd charm provides an option to limit their capabilities via AppArmor; enabling it is recommended.
- CephFS Mounts: Clients mounting CephFS interact via the Ceph kernel module or FUSE, requiring Cephx authentication.
Management Infrastructure (Juju)
Juju itself presents a management attack surface:
- Juju Controller: Gaining access to the Juju controller provides complete control over the entire deployment. Secure controller access using strong credentials and network restrictions.
- Juju Agents: Agents run on each machine managed by Juju. Compromise of a host machine could potentially lead to compromise of the agent and interaction with the controller.
- Charms and Configuration: Configuration applied via Juju (including charm configurations and relations) can impact security. Review charm options of ceph and related charms.
Refer to the Official Juju Security Documentation for more details on securing Juju itself.
Access Controls
Robust access controls limit users and services to only the permissions they require.
Cephx Authentication and Authorization
Cephx is the native Ceph authentication system. It operates with shared secret keys:
- Key Types: Each Cephx key belongs to a named entity whose type reflects its role (e.g., mon, osd, mds, mgr, client). Some entities exist by default, such as client.admin, which holds full administrative capabilities.
- Capabilities (Caps): Each key has associated capabilities defining what actions are permitted (e.g., mon 'allow r', osd 'allow *', mgr 'allow profile security-cluster').
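
The relationship between an entity and its capabilities can be sketched as follows. This helper only builds the argument list for a `ceph auth get-or-create` invocation and does not run it; the function name, the entity `client.app1`, and the pool name in the usage note are hypothetical. Rejecting wildcards is a policy choice of this sketch, not a Ceph requirement.

```python
def cephx_create_command(entity, caps):
    """Build the argv for `ceph auth get-or-create` from a daemon-type -> capability map."""
    if not entity.startswith("client."):
        raise ValueError("expected an entity name of the form client.<id>")
    argv = ["ceph", "auth", "get-or-create", entity]
    for daemon_type in sorted(caps):
        cap = caps[daemon_type]
        if "*" in cap:
            # A wildcard grants far more than a single application normally needs.
            raise ValueError(f"wildcard capability refused for {daemon_type}: {cap!r}")
        argv += [daemon_type, cap]
    return argv
```

For instance, `cephx_create_command("client.app1", {"mon": "allow r", "osd": "allow rw pool=app1"})` yields a command that grants read access to the monitors and read/write access to a single pool.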
User Management (Ceph Dashboard / RGW)
- Dashboard Users: Manage user accounts and roles within the Ceph Dashboard for accessing monitoring and limited management functions.
- RGW Users: RGW has its own user management system for S3/Swift access, separate from Cephx cluster users. Manage RGW users, keys (access key, secret key), and potentially quotas.
Management Infrastructure Access (Juju)
Control access at multiple levels:
- Host OS Access (POSIX permissions): Standard Linux user/group permissions and access controls (e.g. SSH keys, sudo rules) on the machines hosting Ceph components and Juju agents.
- Juju Permissions: Utilize Juju’s Role-Based Access Control (RBAC) to manage who can access controllers and models, and what actions they can perform.
- Elevated Privileges: Processes like OSDs typically require root-level privileges for device access. Security profiles such as AppArmor reduce this risk, but careful host security remains essential.
Secrets
Secrets are sensitive pieces of information that must be protected:
- Cephx Keys: Typically stored in keyring files (/etc/ceph/ceph.client.admin.keyring, etc.) on the relevant nodes, or within Juju’s configuration.
- TLS Certificates & Keys: For RGW HTTPS, Ceph Dashboard, and potentially Messenger v2 encryption. Securely store private keys with restricted permissions.
- RGW User Keys: S3/Swift access and secret keys. Manage these carefully; treat them like passwords.
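
One simple control for file-based secrets such as keyrings and TLS private keys is to check that only the owner can access them. The sketch below applies a deliberately strict policy (no group or other permission bits at all), which may be tighter than some deployments need; the function name is illustrative.

```python
import os
import stat


def overly_permissive(path):
    """Return True if group or other users have any access to the file at `path`."""
    mode = os.stat(path).st_mode
    # S_IRWXG and S_IRWXO cover the read, write, and execute bits for group and others.
    return bool(mode & (stat.S_IRWXG | stat.S_IRWXO))
```

A periodic job could call this on each keyring path and report any file for which it returns True.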
Encryption
Protecting data confidentiality both in transit and at rest:
- In Transit:
  - Messenger v2: Configure Ceph internal communication (between MON, OSD, MGR, MDS) to use secure mode, encrypting traffic.
  - TLS at RGW: Essential for encrypting S3/Swift traffic between clients and the RGW. Use strong TLS protocols (TLS 1.2+) and ciphers. Obtain certificates from a trusted CA or manage an internal PKI. Configure via Juju relations or charm options.
  - Ceph Dashboard HTTPS: The dashboard uses HTTPS by default. Ensure the certificate is valid and trusted.
  - Juju Communication: Juju controller-agent communication is secured with TLS automatically.
- At Rest:
  - OSD Encryption (via LUKS): Ceph supports encrypting data stored on OSDs using LUKS. This protects data if physical drives are stolen. Charmed Ceph allows enabling OSD encryption during deployment (osd-encrypt option). Key management for LUKS needs to be handled carefully.
  - Full Disk Encryption (FDE): Consider encrypting the entire host OS disk, especially for MON nodes holding cluster maps and keys, and RGW nodes potentially caching data. This adds another layer of protection against physical access, managed at the OS level.
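
For the Messenger v2 item, a check along the following lines can confirm that secure mode is the only mode permitted. It assumes the operator has already collected the values of the `ms_cluster_mode`, `ms_service_mode`, and `ms_client_mode` options into a dictionary (for example with `ceph config get`); how the values are gathered, and how a given charm release exposes these settings, is outside this sketch.

```python
MS_MODE_OPTIONS = (
    "ms_cluster_mode",
    "ms_service_mode",
    "ms_client_mode",
)


def non_secure_options(config):
    """Return the messenger mode options that still permit unencrypted (crc) traffic."""
    findings = []
    for option in MS_MODE_OPTIONS:
        # A missing option is reported rather than assumed to be secure.
        modes = config.get(option, "").split()
        if modes != ["secure"]:
            findings.append(option)
    return findings
```

A value such as `crc secure` lists both modes as acceptable, so the sketch reports it; only a value of exactly `secure` passes.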
Secure Deployment
Incorporate security from the beginning of your Charmed Ceph deployment.
Network Architecture
- Segmentation: Use separate physical or logical (VLAN) networks for different access levels:
  - External (optional): If applicable, expose specific endpoints for external untrusted consumption, e.g. RGW.
  - Storage Access: Client access (including RGW if no external access provided), MON access.
  - Cluster Network: OSD replication and heartbeat traffic. Isolating this improves performance and security.
- Firewalls: Implement strict firewall rules (e.g. using iptables, nftables) on all nodes:
  - Deny all traffic by default.
  - Allow only necessary ports between specific hosts/networks (refer to the port table).
  - Restrict access to management interfaces (SSH, Juju, Dashboard) to trusted administrative networks.
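
The default-deny approach described above can be modelled in a few lines. This is a sketch of the decision logic only, not a firewall configuration: the subnets are hypothetical placeholders for a storage-access network and an administrative network, and a real deployment would express the same policy in iptables or nftables rules.

```python
import ipaddress

# Hypothetical allow rules: (TCP port range, permitted source network).
STORAGE_NET = ipaddress.ip_network("10.10.0.0/24")
ADMIN_NET = ipaddress.ip_network("10.99.0.0/28")
ALLOW_RULES = [
    (range(3300, 3301), STORAGE_NET),  # MON, Messenger v2
    (range(6789, 6790), STORAGE_NET),  # MON, Messenger v1
    (range(6800, 7301), STORAGE_NET),  # OSD / MGR / MDS
    (range(22, 23), ADMIN_NET),        # SSH from the administrative network only
]


def is_allowed(source_ip, port, rules=ALLOW_RULES):
    """Default-deny check: permit only if some rule covers both the port and the source."""
    address = ipaddress.ip_address(source_ip)
    return any(port in ports and address in network for ports, network in rules)
```

Anything not matched by a rule is denied, so adding a new service means adding an explicit rule rather than relaxing a default.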
Minimum Privileges
- Cephx Keys: Create dedicated Cephx keys for each client/application with the minimum required capabilities. Do not use the admin key for routine access.
- Juju Roles: Assign Juju users the least permissive role (e.g., read, write) necessary for their tasks on specific models. Reserve admin rights carefully.
- OS Users: Limit sudo access on host machines. Run services under dedicated, unprivileged users where possible (though OSDs inherently require higher privileges for device access, mitigated by containers/snaps). Apply the least privilege principle rigorously across all layers.
- Explicit Assignment: Ensure that all access, whether via Cephx, Dashboard, or RGW, relies on explicit assignment of permissions/capabilities rather than default permissive settings. Limit permissions strictly to what is needed for the operation.
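
One way to check for permissive Cephx assignments is to scan an export of the cluster's auth entries for client keys that hold wildcard capabilities. The sketch below assumes JSON shaped like the output of `ceph auth ls --format json`, with a top-level `auth_dump` list whose entries carry `entity` and `caps` fields; verify that shape against your Ceph release before relying on it. The function name is illustrative.

```python
import json


def wildcard_clients(auth_dump_json):
    """List client entities, other than client.admin, holding an 'allow *' capability."""
    flagged = []
    for entry in json.loads(auth_dump_json).get("auth_dump", []):
        entity = entry.get("entity", "")
        # Daemon keys and the built-in admin key are outside the scope of this check.
        if not entity.startswith("client.") or entity == "client.admin":
            continue
        if any("allow *" in cap for cap in entry.get("caps", {}).values()):
            flagged.append(entity)
    return flagged
```

Any entity this returns is a candidate for replacement with a key scoped to specific pools or profiles.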
Auditing and Centralized Logging
- Enable Auditing: Configure Ceph logging to capture significant events.
- Centralized Logging: Forward logs from all Ceph nodes, host systems (syslog, auth.log), and Juju components to a central logging system (such as Loki, Splunk). This facilitates correlation and analysis.
- Monitor and Audit: Regularly review logs for anomalies and security events (e.g. repeated authentication failures).
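
As an example of the kind of review that centralized logs make possible, the following sketch counts failed SSH password attempts per source address in lines taken from auth.log. It matches the standard OpenSSH "Failed password" message; the threshold of five is an arbitrary illustrative value, and a production setup would more likely express this as a query or rule in the logging system itself.

```python
import re
from collections import Counter

# Matches the OpenSSH "Failed password" message and captures the source address.
FAILED_LOGIN = re.compile(r"Failed password for (?:invalid user )?\S+ from (\S+) port \d+")


def repeated_failures(lines, threshold=5):
    """Return {source address: count} for sources with at least `threshold` failed logins."""
    counts = Counter()
    for line in lines:
        match = FAILED_LOGIN.search(line)
        if match:
            counts[match.group(1)] += 1
    return {source: n for source, n in counts.items() if n >= threshold}
```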
Alerting
- Configure Monitoring: Use the built-in Ceph monitoring (Prometheus exporter via MGR module) and integrate it with an alerting system such as the Canonical Observability Stack.
- Security Alerts: Configure alerts for security anomalies and critical health issues, such as:
  - Ceph health status changes (HEALTH_WARN, HEALTH_ERR).
  - Daemon crashes or restarts.
  - Near-full OSDs/pools.
  - Significant performance deviations.
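
To show how health status could map onto alert severities, here is a standalone sketch that converts a health report into alert tuples. It assumes JSON shaped like the output of `ceph health --format json`, with a `checks` object whose entries carry `severity` and `summary.message` fields; treat that shape and the severity labels as assumptions to verify. In practice an alerting stack would usually implement this mapping as Prometheus alert rules rather than custom code.

```python
import json

# HEALTH_OK produces no alert; the severity labels are illustrative.
SEVERITY = {"HEALTH_OK": None, "HEALTH_WARN": "warning", "HEALTH_ERR": "critical"}


def health_alerts(health_json):
    """Turn a health report into (severity, check name, message) tuples."""
    report = json.loads(health_json)
    alerts = []
    for name, check in sorted(report.get("checks", {}).items()):
        # An unknown or missing severity is escalated rather than silently dropped.
        severity = SEVERITY.get(check.get("severity"), "critical")
        if severity is None:
            continue
        message = check.get("summary", {}).get("message", "")
        alerts.append((severity, name, message))
    return alerts
```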
Secure Operation
Maintaining security is an ongoing process. This section gives a brief overview of best practices for secure operations.
Vulnerability Management
- Monitor Advisories: Actively track Common Vulnerabilities and Exposures (CVEs) and security advisories for Ceph, Juju, Ubuntu/Host OS, Kernel, and related software. Use resources like Ubuntu Security Notices (USNs) and the Ceph announce list.
- Patch Management: Implement a robust process for testing and applying security patches promptly. Prioritize critical vulnerabilities. Use Juju for orchestrated upgrades of Ceph charms.
Incident Response
- Develop a Plan: Have a clearly documented Incident Response (IR) plan tailored to your Charmed Ceph environment.
- Define Steps: The plan should cover the standard IR phases, e.g.:
  - Detecting incidents (monitoring, logs, reports).
  - Isolating affected systems/components.
  - Removing the threat and fixing the vulnerability.
  - Safely restoring services and data.
  - Analyzing the incident afterwards to improve defenses.
- Practice: Regularly test the plan through drills/simulations.
Perform Audits
- Regular Checks: Conduct periodic security audits of the cluster.
- Validate Controls: Verify configurations (encryption, network rules), permissions (Cephx caps, Juju roles, OS access), and access controls frequently.
Perform Upgrades
- Stay Current: Regularly upgrade Ceph (point releases often contain security fixes), Juju, Snapd, and the underlying OS to benefit from the latest security patches and features.
- Schedule Proactively: Plan and schedule updates, especially for security vulnerabilities. Test upgrades in a staging environment before applying to production. Follow documented upgrade procedures for Ceph and Juju.
Release Notes
- Always read the release notes for Ceph, Juju, Ubuntu, and related components before performing upgrades or making significant configuration changes.
- Release notes contain information about security enhancements, bug fixes (including security fixes), potential breaking changes, and known issues that might impact security or stability.