Capacity planning
This page provides capacity-planning guidance for your Chef 360 Platform installation rather than hard-and-fast rules, because Chef 360 Platform is a common server platform for many Chef products and features. Requirements change depending on which products are enabled and which features you use. Usage patterns also affect scalability, because some requests to the Chef 360 Platform Application Programming Interface (API) are more computationally expensive than others.
In general, start with the recommendations for an evaluation node and then scale the server as needed. Avoid premature optimization: it tends to add unnecessary complexity and is more likely to hinder than help. Requirements also change between Chef 360 Platform versions, because new features add services and containers. For example, at launch, Chef 360 Platform contained only the Node Management and Chef Courier services. Subsequent releases will add Chef Infra Server, compliance automation, reporting, and new add-ons such as monitoring and alerting.
Single-node system requirements for evaluation
For an evaluation node, the base system requirements are:
- 16 GB of RAM
- 4 vCPUs
- 80 GB of disk space
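To check a Linux host against these minimums before installing, you could run a small preflight script like the Python sketch below. It is not a Chef 360 Platform tool, and the names `shortfalls` and `detect_host` are hypothetical. It assumes GB means 10^9 bytes (so a VM sized at 16 GiB still passes after the kernel reserves some memory), counts logical CPUs as vCPUs, and checks the total size of the filesystem at a path you choose, because this page does not say whether the 80 GB refers to free or total space.

```python
import os
import shutil

# Evaluation-node minimums from this page, in decimal GB (10**9 bytes) and vCPUs.
MIN_RAM_GB = 16
MIN_VCPUS = 4
MIN_DISK_GB = 80
GB = 10**9


def shortfalls(ram_gb: float, vcpus: int, disk_gb: float) -> list[str]:
    """Return one message per unmet minimum; an empty list means the host qualifies."""
    problems = []
    if ram_gb < MIN_RAM_GB:
        problems.append(f"RAM: {ram_gb:.1f} GB found, {MIN_RAM_GB} GB required")
    if vcpus < MIN_VCPUS:
        problems.append(f"vCPUs: {vcpus} found, {MIN_VCPUS} required")
    if disk_gb < MIN_DISK_GB:
        problems.append(f"Disk: {disk_gb:.1f} GB found, {MIN_DISK_GB} GB required")
    return problems


def detect_host(path: str = "/") -> tuple[float, int, float]:
    """Read physical RAM, logical CPU count, and total filesystem size at `path` (Linux)."""
    ram_bytes = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")
    vcpus = os.cpu_count() or 0
    disk_bytes = shutil.disk_usage(path).total
    return ram_bytes / GB, vcpus, disk_bytes / GB


if __name__ == "__main__":
    # Assumption: "/" is the filesystem Chef 360 Platform will be installed on.
    issues = shortfalls(*detect_host("/"))
    if issues:
        print("Host is below the evaluation-node minimums:")
        for issue in issues:
            print(f"  - {issue}")
    else:
        print("Host meets the evaluation-node minimums.")
```

If you plan to install onto a separate data volume, pass that volume's mount point to `detect_host` instead of `/`.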
Recommended installation environment
We highly recommend installing Chef 360 Platform on a virtual machine (VM) managed by a hypervisor. A hypervisor, also called a Virtual Machine Monitor (VMM), is a software or hardware layer that lets multiple VMs run on a single physical machine, known as the host. The hypervisor allocates the host's physical resources, such as CPU, memory, and storage, to the VMs as needed. Each VM runs its own operating system and applications, and the hypervisor keeps the VMs independent of each other.
With this setup, you can add CPU and RAM to a Chef 360 Platform installation with less effort than provisioning new bare-metal hardware. Because Chef 360 Platform runs on a Kubernetes cluster, the sizing and scaling rules for the Kubernetes runtime also apply.
Scaling the Chef 360 Platform Server
The Chef 360 Platform Server itself is highly scalable. A single virtual machine can handle requests for many thousands of nodes. As the scale increases, Chef 360 Platform can be expanded into a multi-node architecture with horizontally scaled front-ends to relieve pressure on system bottlenecks.
However, Chef 360 Platform does not support geographically distributed active-active clustering. Instead of running a single central, monolithic cluster, it may be best to identify your failure domains and deploy a separate Chef 360 Platform cluster for each one.
For example, if you have data centers on the West Coast and the East Coast, it may be best to run one Chef 360 Platform cluster in each. CI software upstream of the clusters can keep their deployments synchronized. This is not the only option, though: an active-passive configuration with a single centralized cluster may be right for your organization. Contact your account team to discuss how to identify your failure domains and to request a sizing consultation.
Identifying bottlenecks
The primary bottlenecks for Chef 360 Platform cluster installations fall into three groups:
- Input/Output Operations Per Second (IOPS)
- Concurrent Operational Requests (CORs)
- Active state caching (RAM)
Different usage patterns affect each of these factors differently, and each Chef 360 Platform feature places its own demands on all three.
To aid sizing, Chef 360 Platform uses a generic scaling metric called concurrent operational requests (CORs). Because not all requests and features place the same load on the server, sizing a Chef 360 Platform cluster involves estimating the COR value of each core feature separately and then aggregating those values.
Scaling factors
Features that affect COR calculations include:
Node Management
The number of enrolled nodes under management, the frequency of node check-ins and node metadata syncs, and the number of skills under management all affect the COR value.
Courier
Courier CORs depend on the average number of active jobs, the average number of steps per job, the average number of targets per job, and the average job duration. In addition to the CORs for active jobs, we suggest adding the CORs required to run an emergency job across the entire fleet, so that you have enough capacity to respond to critical infrastructure events.
Chef Infra Client runs
The number of Chef Infra Client runs per minute.
Compliance Phase
The number of compliance profiles per Chef Infra Client run.
Currently, an online sizing calculator is not available. Contact your account team to discuss how to calculate the COR value for your installation.
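The estimate-then-aggregate approach described under Identifying bottlenecks can be illustrated with a short Python sketch. This is not a Chef formula or calculator: it assumes you already have a COR estimate for each feature (for example, from your account team) and that aggregating means a plain sum. The `FeatureEstimate` type, the `total_cors` function, and every number in the example are hypothetical placeholders. The Courier emergency-job headroom suggested above is kept as a separate argument.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class FeatureEstimate:
    """Hypothetical container for one feature's estimated COR value."""
    feature: str
    cors: float


def total_cors(estimates: Iterable[FeatureEstimate], emergency_job_cors: float = 0) -> float:
    """Sum per-feature COR estimates and add headroom for a fleet-wide emergency Courier job.

    Assumes aggregation is a plain sum; the real weighting may differ.
    """
    estimates = list(estimates)
    for estimate in estimates:
        if estimate.cors < 0:
            raise ValueError(f"negative COR estimate for {estimate.feature}")
    if emergency_job_cors < 0:
        raise ValueError("emergency_job_cors must be non-negative")
    return sum(e.cors for e in estimates) + emergency_job_cors


if __name__ == "__main__":
    # Placeholder numbers for illustration only, not real sizing data.
    estimates = [
        FeatureEstimate("Node Management", 120),
        FeatureEstimate("Courier (active jobs)", 80),
        FeatureEstimate("Chef Infra Client runs", 40),
        FeatureEstimate("Compliance Phase", 10),
    ]
    print(total_cors(estimates, emergency_job_cors=200))  # 450
```

Keeping the emergency headroom separate from the per-feature estimates makes it easy to see how much capacity is reserved for fleet-wide response versus day-to-day load.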