In-Place A2HA to Automate HA

Note

Chef Automate 4.10.1 released on 6th September 2023 includes improvements to the deployment and installation experience of Automate HA. Please read the blog to learn more about key improvements. Refer to the pre-requisites page (On-Premises, AWS) and plan your usage with your customer success manager or account manager.

Warning

A2HA user can be migrated to Automate HA with a minimum Chef Automate version 20201230192246.

This page explains the in-place migration of A2HA to Automate HA. This migration involves the following steps:

Prerequisites

A healthy state of the A2HA cluster to take fresh backup.
A2HA is configured to take backup on a mounted network drive (location example: /mnt/automate_backup).
Availability of 60% of space.

Capture information about the current A2HA instance

In order to verify the migration is completed successfully we’ll need to capture some information about the current installation. The following script will capture counts of objects in the Chef-Infra Server that we can compare with the server after the migration has been completed.

Create capture_infra_counts.sh and run it using ./capture_infra_counts.sh > pre_migration_infra_counts.log

#!/usr/bin/bash

for i in `chef-server-ctl org-list`; do
  org=https://localhost/organizations/$i
  echo "Orgination: ${i}"
  echo -n "node count: "
  knife node list -s $org | wc -l
  echo -n "client count: "
  knife client list -s $org | wc -l
  echo -n "cookbook count: "
  knife cookbook list -s $org | wc -l
  echo -n "total objects: "
  knife list / -R -s $org | wc -l
  echo "----------------"
done

Taking Backup and clean up of instances

Take the latest backup of A2HA by running the following commands from any automate instance:
```
sudo chef-automate backup create
```
The above command will store the backup in a configured backup patch in a2ha.rb config file /hab/a2_deploy_workspace/a2ha.rb. Once the backup is completed successfully, save the backup Id. For example: 20210622065515. To use the backup created previously, run the following command on Automate node to get the backup id:
```
chef-automate backup list
```
The output looks like as shown below:
```
Backup             State       Age
20180508201548    completed  8 minutes old
20180508201643    completed  8 minutes old
20180508201952    completed  4 minutes old
```
Create a bootstrap bundle from one of automate node using the following command:
```
sudo chef-automate bootstrap bundle create bootstrap.abb
```
The above command will create the bootstrap bundle and copy the same bundle to bastion or backup-dir.
Stop each of the frontend nodes (automate and chef-server) using the following command:
```
sudo chef-automate stop
```
Rename /hab dir to something else like /hab-old.
Remove the following files:
- /bin/chef-automate
- /bin/hab
- /bin/hab-launch
- /bin/hab-sup
Unload services from each of the PostgreSQL Nodes:
```
sudo hab svc unload chef/automate-backend-postgresql
sudo hab svc unload chef/automate-backend-metricbeat
sudo hab svc unload chef/automate-backend-journalbeat
sudo hab svc unload chef/automate-backend-haproxy
sudo hab svc unload chef/automate-backend-pgleaderchk
```
Check the status using the hab svc status command. None of the services should be running. Once checked, stop the habitat supervisor with the command systemctl stop hab-sup. Rename /hab dir to something else like /hab-old.

Unload services from each of the Elasticsearch Nodes

sudo hab svc unload chef/automate-backend-elasticsidecar
sudo hab svc unload chef/automate-backend-elasticsearch
sudo hab svc unload chef/automate-backend-journalbeat
sudo hab svc unload chef/automate-backend-metricbeat
sudo hab svc unload chef/automate-backend-curator
sudo hab svc unload chef/automate-backend-kibana

Check the status using the hab svc status command. None of the services should be running. Once checked, stop the habitat supervisor with the command systemctl stop hab-sup. Rename /hab dir to something else like /hab-old.

In the bastion host, take a copy of your current workspace and keep it safe for a while.
Remove or Rename /hab dir in the bastion host.

Installing the Latest Automate HA

Follow Automate HA installation documentation to know more about config.toml file and provide the same IPs and backup config in config.toml as in the a2ha.rb file.

File System backup configuration

In case the backup configuration was skipped in the deployment config.toml, the User needs to configure EFS backup manually in Automate HA.

Note

While configuring the backup configuration provide the path of Elasticsearch instead of OpenSearch as A2HA backup was in Elasticsearch directory like instead of /mnt/automate_backups/opensearch/ it will be /mnt/automate_backups/elasticsearch/.

Restore Backup

Once deployment is successful, proceed with restoring the backup in Automate HA. For more information, see On-Premise Deployment using Filesystem.

sudo chef-automate config show > current_config.toml

Find the following config in the current_config.toml file and update it to look like the following:

[global.v1.external.opensearch.auth.basic_auth]
    username = "admin"
    password = "admin"

AND

[global.v1.external.opensearch.backup.fs]
    path = "/mnt/automate_backups/elasticsearch"

Copy the bootstrap.abb bundle to all the Frontend nodes of the Chef Automate HA cluster. Unpack the bundle using the below command on all the Frontend nodes:

sudo chef-automate bootstrap bundle unpack bootstrap.abb

To restore, use the below command from same automate node, Make sure to stop all other frontend nodes using chef-automate stop:

sudo chef-automate backup restore /mnt/automate_backups/backups/20210622065515/ --patch-config current_config.toml --airgap-bundle /var/tmp/frontend-4.x.y.aib --skip-preflight

Note

After the restore command is successfully executed, run the chef-automate config show command. Both the Elasticsearch and OpenSearch configs are part of Automate Config. Keep both configs; it won’t impact the functionality. After restoring Automate HA is configured to communicate with OpenSearch.

We can remove the Elasticsearch config from the automate. To do that, redirect the applied config to the file and set the config again.

chef-automate config show > applied_config.toml

Modify applied_config.toml, remove the elastic search config, and set the config. Set applied_config.toml on all the frontend nodes manually. As the removal of config is not supported from the bastion. Use the below command to set the config manually on each Frontend node.

chef-automate config set applied_config.toml

To know more about the usage of S3 backup, see On-Premise Deployment using Object Storage page.

Note

Once Automate HA is up and running with restored data, We can remove old backed-up directories sudo rm -rf hab-old, freeing up acquired space.
Reset the backup configuration path to OpenSearch so that new backups will be stored in OpenSearch directory. For more information, see configuration for automate node from provision host.

Validate successful migration

Check the Automate UI of Automate HA. Check whether the data is present in Automate UI for HA.

If you are using the embedded chef server, log in to the Chef Server HA node, and run the following script to get a count of objects from the Chef Infra Server, this should match the counts captured at the start of the migration

Create capture_infra_counts.sh and run it using ./capture_infra_counts.sh > post_migration_infra_counts.log

#!/usr/bin/bash

for i in `chef-server-ctl org-list`; do
    org=https://localhost/organizations/$i
    echo "Orgination: ${i}"
    echo -n "node count: "
    knife node list -s $org | wc -l
    echo -n "client count: "
    knife client list -s $org | wc -l
    echo -n "cookbook count: "
    knife cookbook list -s $org | wc -l
    echo -n "total objects: "
    knife list / -R -s $org | wc -l
    echo "----------------"
done

Compare the pre migration to post migration counts diff pre_migration_infra_counts.log post_migration_infra_counts.log

Connect Chef-Workstation to the new cluster and use knife to communicate with Automate HA
1. Open the ~/.chef/config.rb, ~/.chef/knife.rb or ~/.chef/credentials file from a Chef-Workstation and update the chef_server_url with the Automate fqdn.
  Example: chef_server_url "https://<automate-fqdn>/organizations/new_org"
2. Run knife user list, knife node list, or knife cookbook list and verify the commands complete successfully

Troubleshoot

While installing the new Automate HA, if PostgreSQL is having any issues in starting, and in PostgreSQL instance hab svc status shows a secret key mismatch error, then try the cleanup command with new Automate HA CLI chef-automate cleanup --onprem-deployment and then remove /bin/chef-automate from all frontend nodes, now try the installation again.
See the troubleshooting section to know more if you encounter an error while restoring related to the Elasticsearch snapshot.
While restoring the backup if an error related to backup directory occurs like:

Error in Automate node: failed to create snapshot repository: Elasticsearch repository create request failed for repo** OR Error in OpenSearch node: /mnt/automate_backups/backups/automate-elasticsearch-data/chef-automate-*-service] doesn’t match any of the locations specified by path.repo

Please re-check your EFS backup configuration for the Automate and OpenSearch node.