Configure Data Collection

Automatic Node Run Data Collection via Chef Server

Note

Chef 12.16.42 or greater and Chef Server 12.11.0 or greater are required.

Nodes can send their run data to Chef Automate through the Chef server automatically. To enable this functionality, you must perform the following steps:

  • Configure a Data Collector token in Chef Automate
  • Configure your Chef server to point to Chef Automate

Configure a Data Collector token in Chef Automate

All messages are sent to Chef Automate over HTTP and are authenticated with a pre-shared key called a “token.” A default token is configured for every Chef Automate installation, but we recommend that you create your own.

To set your own token, add the following to your /etc/delivery/delivery.rb file:

data_collector['token'] = 'sometokenvalue'

Then run automate-ctl reconfigure to apply the change.

If you do not configure a token, the default token value is: 93a49a4f2482c64126f7b6015e6b0f30284287ee4054ff8807fb63d9cbd1c506
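
If you need a value to use as your own token, one option is to generate a random hexadecimal string. The following Ruby sketch uses SecureRandom from the standard library to produce a 64-character hex string, the same length as the default token. It assumes that any sufficiently random string is acceptable as a token; the required format is not stated here, and generate_token is a hypothetical helper name.

```ruby
require 'securerandom'

# Generate 32 random bytes, hex-encoded as 64 characters.
# Assumption: Chef Automate accepts any such string as a token value.
def generate_token
  SecureRandom.hex(32)
end

token = generate_token

# Print a line that can be pasted into /etc/delivery/delivery.rb.
puts "data_collector['token'] = '#{token}'"
```

Because the token is a shared secret, treat the generated value like a password and avoid committing it to version control.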

Configure your Chef server to point to Chef Automate

In addition to forwarding Chef run data to Automate, Chef server will send messages to Chef Automate whenever an action is taken on a Chef server object, such as when a cookbook is uploaded to the Chef server or when a user edits a role.

To enable this feature, add the following settings to /etc/opscode/chef-server.rb on the Chef server:

data_collector['root_url'] = 'https://my-automate-server.mycompany.com/data-collector/v0/'
data_collector['token'] = 'TOKEN'

where my-automate-server.mycompany.com is the fully-qualified domain name of your Chef Automate server, and TOKEN is either the default value or the token value you configured in the prior section.

Save the file and run chef-server-ctl reconfigure to complete the process.

Additional configuration options include:

  • data_collector['timeout']: timeout in milliseconds to abort an attempt to send a message to the Chef Automate server. Default: 30000.
  • data_collector['http_init_count']: number of Chef Automate HTTP workers Chef server should start. Default: 25.
  • data_collector['http_max_count']: maximum number of Chef Automate HTTP workers Chef server should allow to exist at any time. Default: 100.
  • data_collector['http_max_age']: maximum age a Chef Automate HTTP worker should be allowed to live, specified as an Erlang tuple. Default: {70, sec}.
  • data_collector['http_cull_interval']: how often Chef server should cull aged-out Chef Automate HTTP workers that have exceeded their http_max_age, specified as an Erlang tuple. Default: {1, min}.
  • data_collector['http_max_connection_duration']: maximum duration an HTTP connection is allowed to exist before it is terminated, specified as an Erlang tuple. Default: {70, sec}.
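
As an illustration, a chef-server.rb that overrides a few of these options might look like the following fragment. The values are arbitrary examples rather than recommendations, and writing the Erlang tuples as quoted strings is an assumption; check the conventions used by your Chef server version.

```ruby
# /etc/opscode/chef-server.rb (fragment; values are illustrative only)
data_collector['root_url'] = 'https://my-automate-server.mycompany.com/data-collector/v0/'
data_collector['token'] = 'TOKEN'

# Abort a send attempt after 10 seconds instead of the default 30000 ms.
data_collector['timeout'] = 10000

# Start fewer HTTP workers and cap the pool below the defaults of 25 and 100.
data_collector['http_init_count'] = 10
data_collector['http_max_count'] = 50

# Assumption: Erlang tuples are supplied as strings.
data_collector['http_max_age'] = '{70, sec}'
data_collector['http_cull_interval'] = '{1, min}'
```

As with the required settings, save the file and run chef-server-ctl reconfigure for the changes to take effect.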

Configure High Availability servers to send server object data

To configure the front-end servers in your HA cluster to send their object data, perform the previous steps for configuring a Chef server, and also ensure that the fqdn field is the same in the chef-server.rb file on every front-end Chef server.

The following example sets the fqdn field to "my-chef-server.mycompany.com" in two front-end servers.

chef-server.rb.FE1

# This file generated by chef-backend-ctl gen-server-config
# Modify with extreme caution.
fqdn "my-chef-server.mycompany.com"
use_chef_backend true
data_collector['root_url'] = 'https://my-automate-server.mycompany.com/data-collector/v0/'
data_collector['token'] = 'TOKEN'

chef-server.rb.FE2

# This file generated by chef-backend-ctl gen-server-config
# Modify with extreme caution.
fqdn "my-chef-server.mycompany.com"
use_chef_backend true
data_collector['root_url'] = 'https://my-automate-server.mycompany.com/data-collector/v0/'
data_collector['token'] = 'TOKEN'

Warning

Failure to set the fqdn field to the same value will result in Chef Automate treating data from each of these front-end servers as separate Chef servers.
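
One way to catch a mismatch before it reaches Chef Automate is to compare the fqdn lines across your front-end configuration files. The following Ruby sketch is a simple text check, not a Chef tool: extract_fqdn and consistent_fqdn? are hypothetical helper names, and it assumes the fqdn is written as a quoted string on its own line, as in the examples above.

```ruby
# Extract the fqdn value from the text of a chef-server.rb file.
# Returns nil when no (uncommented) fqdn line is present.
def extract_fqdn(config_text)
  match = config_text.match(/^\s*fqdn\s+["']([^"']+)["']/)
  match && match[1]
end

# Return true when every config declares the same, non-nil fqdn.
def consistent_fqdn?(config_texts)
  fqdns = config_texts.map { |text| extract_fqdn(text) }
  !fqdns.empty? && fqdns.none?(&:nil?) && fqdns.uniq.length == 1
end

fe1 = <<~CONFIG
  fqdn "my-chef-server.mycompany.com"
  use_chef_backend true
CONFIG

fe2 = <<~CONFIG
  fqdn "my-chef-server.mycompany.com"
  use_chef_backend true
CONFIG

puts consistent_fqdn?([fe1, fe2])
```

In practice, the strings would come from File.read on a copy of each front-end server's chef-server.rb.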

Sending Node Run Data to Chef Automate Directly

If you do not use a Chef server in your environment (if you only use chef-solo, for example) or you do not wish to use the automatic forwarding of run data to Automate, you can configure your Chef clients to send their run data to Automate directly.

To enable this functionality, you must perform the following steps:

  • Configure a Data Collector token in Chef Automate (see prior section)
  • Add Chef Automate SSL certificate to trusted_certs directory
  • Configure Chef Client to use the Data Collector endpoint in Chef Automate

Add Chef Automate certificate to trusted_certs directory

Note

This step only applies to self-signed SSL certificates. If you are using an SSL certificate signed by a valid certificate authority, you may skip this step.

Chef requires that the self-signed Chef Automate SSL certificate (HOSTNAME.crt) is located in the /etc/chef/trusted_certs directory on any node that wants to send data to Chef Automate. This directory is the location into which SSL certificates are placed when a node has been bootstrapped with chef-client.

To fetch the certificate onto your workstation, use knife ssl fetch and pass in the URL of the Chef Automate server. You can then use utilities such as scp or rsync to copy the downloaded cert files from your .chef/trusted_certs directory to the /etc/chef/trusted_certs directory on the nodes in your infrastructure that will be sending data directly to the Chef Automate server.
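
If you are unsure whether a fetched certificate is self-signed, and therefore whether this step applies, you can inspect it with Ruby's OpenSSL bindings. The following is a minimal sketch: self_signed? is a hypothetical helper name, and the certificate path is taken from the command line because the exact file name written to .chef/trusted_certs is not assumed here.

```ruby
require 'openssl'

# Return true when the PEM-encoded certificate's issuer equals its subject
# and its signature verifies against its own public key.
def self_signed?(pem)
  cert = OpenSSL::X509::Certificate.new(pem)
  cert.issuer.to_s == cert.subject.to_s && cert.verify(cert.public_key)
end

# Usage: ruby check_cert.rb PATH_TO_CERT (the script name is hypothetical).
path = ARGV.first
if path && File.exist?(path)
  puts "self-signed: #{self_signed?(File.read(path))}"
else
  puts 'pass the path to a fetched .crt file as the first argument'
end
```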

Configure Chef Client to use the Data Collector endpoint in Chef Automate

Note

Chef version 12.12.15 or greater is required.

The Chef client uses the data collector functionality to send node and converge data to Chef Automate. This feature works with Chef client and with both the default and legacy modes of Chef solo.

To send node and converge data to Chef Automate, modify your Chef config (that is, client.rb or solo.rb, or an additional config file in an appropriate directory, such as client.d) to contain the following configuration:

data_collector.server_url "https://my-automate-server.mycompany.com/data-collector/v0/"
data_collector.token "TOKEN"

where my-automate-server.mycompany.com is the fully-qualified domain name of your Chef Automate server and TOKEN is the token value you configured in the earlier step.

Additional configuration options include:

  • data_collector.mode: The mode in which the data collector is allowed to operate. This can be used to run data collector only when running as Chef solo but not when using Chef client. Options: :solo, :client, or :both. Default: :both.
  • data_collector.raise_on_failure: When the data collector cannot send the “starting a run” message to the data collector server, the data collector will be disabled for that run. In some situations, such as highly-regulated environments, it may be more reasonable to prevent Chef from performing the actual run. In these situations, setting this value to true will cause the Chef run to raise an exception before starting any converge activities. Default: false.
  • data_collector.organization: A user-supplied organization string that can be sent in payloads generated by the data collector when Chef is run in Solo mode. This allows users to associate their Solo nodes with faux organizations without the nodes being connected to an actual Chef server.
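
For example, a Chef solo configuration that uses all three optional settings might look like the following fragment. The values shown, including the organization name, are hypothetical and chosen only for illustration.

```ruby
# solo.rb or client.rb (fragment; values are illustrative only)
data_collector.server_url "https://my-automate-server.mycompany.com/data-collector/v0/"
data_collector.token "TOKEN"

# Report only when running as Chef solo.
data_collector.mode :solo

# Fail the run if the "starting a run" message cannot be delivered.
data_collector.raise_on_failure true

# Hypothetical organization name to group solo nodes under.
data_collector.organization "my-solo-nodes"
```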

Sending Compliance Data to Chef Automate

To send compliance data gathered by InSpec as part of a Chef client run, you will need to use the audit cookbook. All profiles that are configured to run during the audit cookbook execution will send their results back to the Chef Automate server.

To configure the audit cookbook, you will first need to configure the Chef client to send node converge data, as previously described. The data_collector.server_url and data_collector.token values will be used as the reporting targets. Once you have done that, configure the audit cookbook’s collector by setting the audit.collector attribute to chef-visibility.

A complete audit cookbook attribute configuration would look something like this:

audit: {
  collector: 'chef-visibility',
  profiles: {
    'cis/cis-centos6-level1' => true
  }
}
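
If you supply node attributes as JSON (for example, in a file passed to chef-client with the -j option), the same structure can be generated from a Ruby hash. This is a sketch under the assumption that JSON attributes suit your workflow; the variable name audit_attributes is arbitrary.

```ruby
require 'json'

# The audit cookbook attributes from the example above, as a plain hash.
audit_attributes = {
  'audit' => {
    'collector' => 'chef-visibility',
    'profiles' => {
      'cis/cis-centos6-level1' => true
    }
  }
}

# Emit JSON suitable for saving to a file, for example attributes.json.
puts JSON.pretty_generate(audit_attributes)
```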

Sending Habitat Data to Chef Automate

The visibility capabilities of Chef Automate can also be used to collect and report on Habitat ring data. The Prism Habitat package collects this data and sends it to a Chef Automate server’s REST API endpoint. You can configure settings like the data collector URL, token, the Habitat supervisor used to get the ring information, and so on. For more information on the Prism package, see Habitat Prism. For more information on Habitat, see the Habitat site.

Use an external Elasticsearch cluster (optional)

Chef Automate uses Elasticsearch to store its data, and the default Chef Automate install includes a single Elasticsearch service. This is sufficient to run production workloads; however, for greater data retention, we recommend using a multi-node Elasticsearch cluster with replication and sharding to store and protect your data.

Prerequisites

  • Chef Automate server
  • Elasticsearch (version 2.4.1 or greater; however, v5.x is not yet supported)

Elasticsearch configuration

To utilize an external Elasticsearch installation, set the following configuration option in your /etc/delivery/delivery.rb:

elasticsearch['urls'] = ['https://my-elasticsearch-cluster.mycompany.com']

The elasticsearch['urls'] attribute should be an array of Elasticsearch nodes over which Chef Automate will round-robin requests. You can also supply a single entry which corresponds to a load-balancer or a third-party Elasticsearch-as-a-service offering.

After saving the file, run sudo automate-ctl reconfigure.

An additional Elasticsearch-related configuration property is elasticsearch['host_header']. This is the HTTP Host header to send with the request. When this attribute is unspecified, the default behavior is as follows:

  • If the urls parameter contains a single entry, the host of the supplied URI will be sent as the Host header.
  • If the urls parameter contains more than one entry, no Host header will be sent.

When this attribute is specified, the supplied string will be sent as the Host header on all requests. This may be required for some third-party Elasticsearch offerings.
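
The Host header rules above can be summarized in a few lines of Ruby. This sketch only illustrates the documented behavior and is not Chef Automate's actual implementation; effective_host_header is a hypothetical function, and the es1, es2, and es host names are made up for the example.

```ruby
require 'uri'

# Return the Host header that would be sent, or nil when none is sent.
def effective_host_header(urls, host_header = nil)
  # An explicit host_header is used on all requests.
  return host_header unless host_header.nil?
  # A single URL: the host of that URL is used.
  return URI.parse(urls.first).host if urls.length == 1
  # Multiple URLs and no explicit header: no Host header is sent.
  nil
end

single = ['https://my-elasticsearch-cluster.mycompany.com']
multi = ['https://es1.mycompany.com', 'https://es2.mycompany.com']

p effective_host_header(single)
p effective_host_header(multi)
p effective_host_header(multi, 'es.mycompany.com')
```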

Troubleshooting: My data does not show up in the UI

If an organization does not have any nodes associated with it, it does not show up in the Nodes section of the Chef Automate UI. This is also true for roles, cookbooks, recipes, attributes, resources, node names, and environments. Only those items that have a node associated with them will appear in the UI. Chef Automate has all the data for all of these, but does not highlight them in the UI. This is designed to keep the UI focused on the nodes in your cluster.