CI/CD and Management of Data Warehouses can be a serious challenge. In this blog you will learn how to setup CI/CD for Snowflake using Schemachange, Github, and Jenkins. For access to the code check out this repo.
Schemachange is a simple python based tool to manage all of your Snowflake objects. Typically, I appreciate a declarative approach for DevOps, but can be pretty difficult when data migration is involved. Schemachange takes an imperative approach to Database Change Management. The combination of a version control system and CI/CD tool, database changes can be approved and deployed through a pipeline using modern software delivery practices.
Schemachange uses a directory structure like the following:
Step-By-Step Guide
Prerequisites
1. Snowflake Account
2. A Snowflake Database
3. A Snowflake User created with appropriate permissions
4. Github Repository
5. Jenkins Server
a. Jenkins Agent with Python 3.7 or greater
Step 1. Create Github Repo
After cloning the repo to your local machine add a folder called migrations
Step 2. Write and Commit init scripts and Jenkinsfile
Insert a file named V0.0.1__init_scripts.sql (make sure to have two under scores after the version)
Add the following to the file
After this create a Jenkinsfile and insert the following code:
#!/usr/bin/env groovy pipeline { agent {label 'your-docker-agent-name-here'} environment { VIRTUALENV = 'snowflake_dev' // name of virtual environment PYTHON_VERSION = 'python3.7' // can be python3.8, python3.9 PIP_VERSION = 'pip3.7' // can be pip3.8, pip3.9 PROJECT_FOLDER = 'migrations' // name of project folder where scripts are located SF_ACCOUNT = 'YOUR SNOWFLAKE ACCOUNT NAME HERE' // typically everything that comes before snowflakecomputing.com SF_USER = 'YOUR SNOWFLAKE USER ID HERE' SF_ROLE = 'NAME OF ROLE HERE' // Typically requires create, update, delete privileges SF_WH = 'NAME OF YOUR WAREHOUSE HERE' SF_DB = 'NAME OF YOUR DATABASE HERE' SF_CH = 'NAME OF YOUR CHANGE HISTORY TABLE HERE' SECRET_LOCATION = '/home/jenkins/snowflake_pk' // If you are using a Key Pair Authentication JENKINS_CRED_ID_SECRET_FILE = 'YOUR SECRET FILE JENKINS NAME HERE' // If you are using a Key Pair Authentication JENKINS_CRED_ID_SECRET = 'YOUR SECRET PASSWORD JENKINS NAME HERE' // If you are using a Key Pair Authentication SCHEMACHANGE = "${WORKSPACE}/${VIRTUALENV}/lib/${PYTHON_VERSION}/site-packages/schemachange/cli.py" } stages { stage('Deploying Changes To Sandbox Snowflake') { steps { withCredentials([file(credentialsId: "${JENKINS_CRED_ID_SECRET_FILE}", variable: 'SNOWFLAKE_PRIVATE_KEY_FILE'), string(credentialsId: "${JENKINS_CRED_ID_SECRET}", variable: 'SNOWFLAKE_PRIVATE_PASSPHRASE')]) { writeFile file: "${SECRET_LOCATION}", text: readFile(SNOWFLAKE_PRIVATE_KEY_FILE) sh """#!/bin/bash -x echo "PROJECT_FOLDER ${PROJECT_FOLDER}" echo 'Step 1: Installing schemachange' virtualenv ${VIRTUALENV} -p ${PYTHON_VERSION} source ${VIRTUALENV}/bin/activate ${PIP_VERSION} install schemachange --upgrade export SNOWFLAKE_PRIVATE_KEY_PATH=${SECRET_LOCATION} export SNOWFLAKE_PRIVATE_KEY_PASSPHRASE=${SNOWFLAKE_PRIVATE_PASSPHRASE} echo 'Step 2: Running schemachange' ${PYTHON_VERSION} ${SCHEMACHANGE} -f ${PROJECT_FOLDER} -a ${SF_ACCOUNT} -u ${SF_USER} -r ${SF_ROLE} -w ${SF_WH} -d ${SF_DB} -c ${SF_CH} -v """ } } } } }
The command does have an argument that allows you to create a change table. But depending on your enterprise’s Snowflake user security you may not be able to use that argument. So I would recommend just creating the change history table and specify it in the schemachange command.
Step 3. Setup Jenkins Server and Create Jenkins Pipeline
Make sure your jenkins agent has the ability to use python3.7 or greater and allows the use of jenkins-pipelines.
I used a multi-branch pipeline to allow for non-prod and prod deployments.
Step 4. Integrate Github and Jenkins using Webhook
Using the jenkins webhook and adding it to github settings
Step 5. Create private/public key
We are going to create an encrypted private/public key.
openssl genrsa 2048 | openssl pkcs8 -topk8 -inform PEM -out rsa_key.p8 openssl rsa -in rsa_key.p8 -pubout -out rsa_key.pub
Step 6. Add public key to Snowflake and add Change History Table
In your snowflake account execute the following query but replace the fields with your user and public key.
alter user jsmith set rsa_public_key='MIIBIjANBgkqh...';
After altering your user create your change history table. You don’t have to use SCHEMACHANGE schema. You can use your own. You just have to specify it in your jenkinsfile.
CREATE TABLE IF NOT EXISTS SCHEMACHANGE.CHANGE_HISTORY ( VERSION VARCHAR ,DESCRIPTION VARCHAR ,SCRIPT VARCHAR ,SCRIPT_TYPE VARCHAR ,CHECKSUM VARCHAR ,EXECUTION_TIME NUMBER ,STATUS VARCHAR ,INSTALLED_BY VARCHAR ,INSTALLED_ON TIMESTAMP_LTZ );
Step 7. Either execute a build or make a change to your repo after jenkins pipeline is setup to kick off a build.
Gotchas
2.9.2 vs 2.9.4
2.9.4 Release allows for the use of unencrypted key pairs, but 2.9.4 was not available at the time of writing this blog.
Thanks for checking it out. If you have any questions feel free to leave a comment or an issue on the github repo!