
Migrate to Imply

A common path to using Imply is to start with open source Apache Druid and then to move to Imply to take advantage of the analytics and operational features in Imply. This topic describes how to migrate from Apache Druid to Imply and the Imply distribution of Apache Druid.

Best practices for migrating from Apache Druid to Imply

  • Before migrating to an Imply STS release, update to the latest Apache Druid release.
  • Before migrating to an Imply LTS release, update to the latest Apache Druid version available.
  • Review the Imply Release notes for changes in the newest version.
  • Upgrade using cluster restarts. Rolling upgrades are possible, but not preferred.
  • After reviewing the release notes for parameter changes, use the same service configuration parameters that you used in your Apache Druid deployment.
  • If you changed the location of the Apache Druid segment cache, copy the segment cache from the existing Apache Druid cluster before restarting services using the new Imply version. Update the segment metadata with the new location in the druid_segments table.
  • To start a cluster with a high segment count in Imply, follow these steps:
    1. Start all Historical services and wait for the lifecycle to start.
    2. Start all master services (Coordinator and Overlord) and wait for the lifecycle to start.
    3. Start all query node services (Broker and Router) and wait for the lifecycle to start.
    4. Start the Middle Managers and wait for the lifecycle to start.
    5. Resume Supervisors and tasks.
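
The "wait for the lifecycle to start" steps above can be scripted. This is a minimal sketch, assuming each service reports lifecycle startup in its own log file; the log file name and the exact log message are assumptions to adapt to your deployment:

```shell
# Hypothetical helper: block until a service's log reports that its
# lifecycle has started. Adjust the log path and message to match
# what your services actually emit.
wait_for_lifecycle() {
  until grep -q 'Started lifecycle' "$1" 2>/dev/null; do
    sleep 1
  done
}

# Simulate a Historical service coming up and logging its startup line.
: > historical.log
( sleep 1; echo 'INFO [main] Lifecycle - Started lifecycle' >> historical.log ) &

wait_for_lifecycle historical.log
echo "Historical lifecycle started; safe to start master services next"
```

Chaining one such wait per tier enforces the Historical, then master, then query, then Middle Manager ordering without manual polling.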

Migrate from Apache Druid to Imply Enterprise

If you are satisfied with your current Druid configuration, you can deploy the Imply distribution in the same configuration with minimal changes:

  1. Update your Apache Druid deployment to the most recent release that is no newer than the Druid version included in the Imply distribution.

  2. Replace or augment your Druid Broker processes with Query servers. Query servers run Druid Routers, Druid Brokers, and Pivot.

  3. Update all your other Druid nodes to run the Imply distribution, continuing to use your existing configurations.

  4. (Optional) To combine your existing Druid Coordinators and Druid Overlords into Master servers:

    1. Configure a Master server using your existing configurations.

    2. Deploy new Master servers with the following command:

      bin/supervise -c conf/supervise/master-without-zk.conf

      The new Master servers connect to your existing ZooKeeper and metadata storage. A best practice is to run with two Master servers to support failover.

    3. Stop your old Druid Coordinators and Druid Overlords.

Migrate to a new Imply Hybrid cluster

The following instructions are organized based on how the new cluster accesses data.

The new cluster uses the same deep storage and metadata storage servers

In this scenario, you must reboot your Druid services:

  1. Stop the old cluster. To prevent data corruption, ensure the old cluster is completely down before starting your new cluster.
  2. Start the new cluster.

The new cluster has no data and can access the old cluster's deep storage

This scenario applies if you are using Imply Hybrid and:

  • The new Imply cluster can access the old cluster's deep storage.
  • The old and the new metadata storage are both MySQL.
  • The new deep storage contains no data.
  • The new metadata storage contains no data.

In this scenario:

  1. Copy the druid_config, druid_dataSource, druid_supervisors, and druid_segments tables from the old metadata storage to the new metadata storage:
    1. Run mysqldump on the old metadata storage (and on the Pivot database, if used). The output file has a .sql extension because it contains SQL statements. The -p option prompts for a password:

      mysqldump -h <host_name> -u <user_name> -p --single-transaction --skip-add-drop-table --no-create-info --no-create-db <db_name> druid_config druid_dataSource druid_supervisors druid_segments > output_file.sql

    2. Import the dump file into the new metadata storage:

      mysql -h <host_name> -u <user_name> -p <db_name> < output_file.sql

  2. Start the new cluster. The Coordinator automatically starts reading the old segments' metadata in the new metadata storage, and then historical nodes load them from the old deep storage. The data in old deep storage remains intact. The old cluster continues writing to the old metadata storage and the old deep storage. The new cluster writes to the new metadata storage and the new deep storage.

Once you complete this migration, the old and the new cluster share the same segment files in deep storage for any data ingested before the migration. Data ingested after the migration goes to different files. Avoid running permanent data deletion tasks on datasources that share segments between the two clusters, because the clusters would delete each other's data.

If the new Druid cluster shares the same ZooKeeper quorum as the old one, set druid.zk.paths.base in Druid's common.runtime.properties to a different base znode path, such as /druid-newcluster. The default value is /druid. The new cluster must also use a different druid.discovery.curator.path.
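
For example, the new cluster's common.runtime.properties might contain the following. The base path value is the example name from above; the discovery path shown is an assumption (the default is /druid/discovery), so choose any value that does not collide with the old cluster's:

```properties
# Keep the new cluster's znodes separate from the old cluster's /druid tree
druid.zk.paths.base=/druid-newcluster
druid.discovery.curator.path=/druid-newcluster/discovery
```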

The new cluster has data and can access the old cluster's deep storage

This scenario applies if you are using Imply Hybrid and:

  • The new Imply cluster can access the old cluster's deep storage.
  • The old and the new metadata storage are both MySQL.
  • The new cluster's deep storage has some data in it.
  • The new cluster's metadata storage has some data in it.

In this scenario:

  1. Ensure there are no collisions in the paths between the old deep storage and the new deep storage. If there are collisions, do the following:

    1. Change the path of the old deep storage.
    2. Use those updated paths when you modify the mysqldump file taken from the old metadata storage.
  2. Copy the data from the old deep storage to the new deep storage.

  3. Configure the new cluster with a different deep storage path and database server address.

  4. Run mysqldump on the old cluster's metadata storage, excluding the DDL. Do not overwrite the target metadata storage:

    mysqldump -h <host_name> -u <user_name> -p --single-transaction --skip-add-drop-table --no-create-info --no-create-db <db_name> druid_config druid_dataSource druid_supervisors druid_segments > source_output_file.sql

  5. Change the location of the segments in the druid_segments table in the mysqldump file from the previous step to point to the new deep storage location:

    sed -i.bak 's/\\"bucket\\":\\"<old_bucket_name>\\"/\\"bucket\\":\\"<new_bucket_name>\\"/g' /dir/source_output_file.sql

  6. Copy the druid_config, druid_dataSource, druid_supervisors, and druid_segments tables from the old metadata storage to the new metadata storage by importing the mysqldump file above into the new metadata storage. The old cluster keeps writing to the old metadata storage and the old deep storage; the new cluster writes to the new metadata storage and the new deep storage:

    mysql -h <host_name> -u <user_name> -p <db_name> < /dir/source_output_file.sql
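
Step 4 relies on --no-create-info and --no-create-db to keep DDL out of the dump, so that the import cannot overwrite the target's table definitions. A quick check on the file confirms this before importing; the dump below is a fabricated stand-in for illustration:

```shell
# Stand-in for a dump taken with --no-create-info --no-create-db:
# data rows only, no CREATE or DROP statements.
cat > source_output_file.sql <<'EOF'
INSERT INTO `druid_config` VALUES ('key','value');
INSERT INTO `druid_segments` VALUES ('seg1','payload');
EOF

# The dump should contain only INSERT statements.
grep -c '^INSERT INTO' source_output_file.sql
if ! grep -Eq 'CREATE TABLE|DROP TABLE' source_output_file.sql; then
  echo "no DDL found: safe to import without touching table definitions"
fi
```

If the check finds DDL, regenerate the dump with the flags shown in step 4 rather than importing it.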

The new cluster has no data and cannot access the old cluster's deep storage

This scenario applies if you are using Imply Hybrid and:

  • The old and the new clusters use different deep storage and metadata storage servers.
  • The new cluster cannot access the old cluster's deep storage.
  • The old and the new metadata storage are both MySQL.
  • The new deep storage has no data in it.
  • The new metadata storage has no data in it.

In this scenario:

  1. Copy the data from old deep storage to new deep storage. Consider using a staging area as an intermediate location.

  2. Configure the new cluster with a different deep storage path and database server address.

  3. Run the mysqldump command on the old metadata storage:

    mysqldump -h <host_name> -u <user_name> -p --single-transaction --skip-add-drop-table --no-create-info --no-create-db <db_name> druid_config druid_dataSource druid_supervisors druid_segments > /dir/output_file.sql

  4. Change the location of the segments in the druid_segments table in the above mysqldump file to point to the new deep storage location:

    sed -i.bak 's/\\"bucket\\":\\"<old_bucket_name>\\"/\\"bucket\\":\\"<new_bucket_name>\\"/g' /dir/output_file.sql

  5. Copy the druid_config, druid_dataSource, druid_supervisors, and druid_segments tables from the old metadata storage to the new metadata storage by importing the mysqldump file above into the new metadata storage:

    mysql -h <host_name> -u <user_name> -p <db_name> < /dir/output_file.sql

  6. Delete the druid_rules table from target MySQL. Starting the cluster recreates this table.

  7. Start the new cluster.

  8. The Coordinator automatically starts reading the old segments' metadata in the new metadata storage, and then historical nodes load them from the new deep storage. The data in old deep storage remains. The old cluster keeps writing to the old database and old deep storage. The new cluster writes to the new metadata storage and new deep storage.
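
You can rehearse the sed substitution from step 4 on a fabricated one-line sample before running it against the real dump. The bucket names and payloads below are made up; note the trailing g flag, which matters because mysqldump packs many rows into a single INSERT line:

```shell
# Fabricate one line resembling a mysqldump extended INSERT, in which
# each segment payload embeds the bucket as \"bucket\":\"name\".
cat > sample_dump.sql <<'EOF'
INSERT INTO `druid_segments` VALUES ('seg1','{\"bucket\":\"old-bucket\"}'),('seg2','{\"bucket\":\"old-bucket\"}');
EOF

# Rewrite every occurrence on the line; without the trailing /g only the
# first payload on each INSERT line would change.
sed -i.bak 's/\\"bucket\\":\\"old-bucket\\"/\\"bucket\\":\\"new-bucket\\"/g' sample_dump.sql

# Count rewritten payloads: expect one per row, so 2 here.
grep -o 'new-bucket' sample_dump.sql | wc -l
```

The .bak file that sed leaves behind preserves the original dump in case the substitution needs to be redone.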

The new cluster has data and cannot access the old cluster's deep storage

This scenario applies if you are using Imply Hybrid and:

  • The old and the new clusters use different deep storage and metadata storage servers.
  • The new cluster cannot access the old cluster's deep storage.
  • The old and the new metadata storage are both MySQL.
  • The new deep storage has data in it.
  • The new metadata storage has data in it.

In this scenario:

  1. Ensure there are no collisions in the paths between the old deep storage and the new deep storage. If there are collisions, change the path of the old deep storage and use those updated paths when you modify the mysqldump file taken from the old metadata storage.

  2. Copy the data from the old deep storage to the new deep storage. You can use a staging area as an intermediate location.

  3. Configure the new cluster with a different deep storage path and metadata storage server address.

  4. Run the mysqldump command on the old metadata storage excluding the DDL. Do not overwrite the target metadata storage:

    mysqldump -h <host_name> -u <user_name> -p --single-transaction --skip-add-drop-table --no-create-info --no-create-db <db_name> druid_config druid_dataSource druid_supervisors druid_segments > source_output_file.sql

  5. Change the location of the segments in the druid_segments table in the mysqldump file above to point to the new deep storage location:

    sed -i.bak 's/\\"bucket\\":\\"<old_bucket_name>\\"/\\"bucket\\":\\"<new_bucket_name>\\"/g' /dir/source_output_file.sql

  6. Copy the druid_config, druid_dataSource, druid_supervisors, and druid_segments tables from the old metadata storage to the new metadata storage by importing the modified mysqldump file above into the new metadata storage:

    mysql -h <host_name> -u <user_name> -p <db_name> < /dir/source_output_file.sql

  7. Start the new cluster.

  8. The Coordinator automatically starts reading the old segments' metadata in the new metadata storage, and then historical nodes load them from the new deep storage. The data in old deep storage remains. The old cluster keeps writing to the old metadata storage and the old deep storage. The new cluster writes to the new database and the new deep storage.
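
The collision check in step 1 can be scripted by comparing sorted listings of the two deep storage locations. The paths below are invented stand-ins; in practice the listings would come from your storage tooling, such as an S3 or HDFS listing command:

```shell
# Stand-in listings of segment paths in each deep storage location.
printf '%s\n' \
  'druid/segments/wiki/2024-01-01/0/index.zip' \
  'druid/segments/wiki/2024-01-02/0/index.zip' | sort > old_paths.txt
printf '%s\n' \
  'druid/segments/wiki/2024-01-02/0/index.zip' \
  'druid/segments/sales/2024-01-01/0/index.zip' | sort > new_paths.txt

# comm -12 prints only lines present in both sorted files; any output is
# a collision to resolve before copying segments across.
comm -12 old_paths.txt new_paths.txt
```

An empty result means the copy in step 2 cannot overwrite existing segment files.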