Why Should We Bother Using Git + CI/CD for Reltio?
Pushing an Update from Development into Production
Proof of Concept w/ GitHub Actions and Python
Are you tired of performing cumbersome small tasks every day that end up eating a lot of your time?
In this article, we will look at one possible way to automate part of our everyday work with the Reltio platform. The focus is on building a robust continuous integration and continuous deployment (CI/CD) solution. We are going to look at why this form of automation can be beneficial for any organization, how to structure your repository and build the CI/CD process from a conceptual point of view, which configurations are a good pick for automation, and how to perform automated tests. We will wrap up with some practical takeaways – a walk-through of a proof of concept that covers all the topics discussed.
Prerequisites
While this article aims to be understandable for all audiences, it is recommended that at least the following prerequisites are met:
- Good understanding of the Reltio platform and its various configurations
- Good understanding of Git and CI/CD processes
- Some experience with creating CI/CD pipelines (workflows)
Why Should We Bother Using Git + CI/CD for Reltio?
Git and CI/CD have transformed software development by making it easier for teams to work together on code and quickly release updates without sacrificing quality. Utilizing the same concepts for our Reltio assets brings multiple advantages. By storing our configurations in Git, we already benefit from version control. The benefits of using CI/CD mirror those in other software development processes that leverage DevOps, including:
- Improved quality and consistency
- Faster release cycles
- Early problem detection
- Reduced manual effort
- Enhanced collaboration and transparency
- Risk mitigation
- Feedback loop
and many more.
The End-to-End Flow
Structuring the Repository
Most Reltio customers have three or more environments dedicated to:
- development
- testing
- production
This is why we suggest modeling the Git repository to follow a similar structure – dev, test and prod branches would be sufficient in this case, where each branch corresponds to the actual state of the Reltio tenant.
We will not dwell on how to resolve merge conflicts, create pull requests and use other Git features, since these are very organization-specific.
Building the CI/CD Process
The CI/CD process can be built using any of the modern tools at our disposal. Automation servers like Jenkins or CI/CD platforms like GitHub Actions are great choices to begin with, depending on where your Git repository is hosted. On a high level, we want to create the following flow:
- Configurations are stored on the repository in the appropriate branches
- On any change of the configuration, the appropriate deployment job is triggered
- The deployment job handles all the specifics of the given configuration
- The job will push the config directly to Reltio via API calls
- Depending on the environment, a test job might be triggered first
- The test job will trigger our unit tests that will validate that all conditions are met before we can deploy the configuration
- Optionally, if the test job fails, the configuration will not be deployed. It could also be removed from the branch – a revert commit might be initiated
Pushing an Update from Development into Production
When a change is introduced to our configuration, it should go through all the branches (corresponding to all the Reltio environments, each of which has its own purpose). It will first be implemented on dev, then tested on the dedicated environment (sometimes referred to as the user acceptance testing or UAT environment), and finally published to production for the users to gain value from it.
Development Phase
Pushing every minor change that a developer makes through the CI/CD process cannot be part of a reliable workflow, since it can easily bloat the Git history and require you to frequently squash commits or clean up afterwards. This is expected: most of the work on the dev environment is experimental and does not contain meaningful changes until that one last commit that is approved by the developer.
This is why we recommend leaving the development Reltio environment open for pushes from any developer, while taking away access to push changes directly on the higher environments. One can experiment by posting the configuration via Postman, and only when the required changes have been verified as working should they commit and push the new configuration to the dev branch. There is a chance that two or more developers doing this at the same time will interfere with each other, but this risk should be negligible, as it should not happen very often. If it does, you can always recover any overwritten configuration from Reltio’s configuration history.
A push to the dev branch will automatically pick up the changed configuration and re-deploy it to dev via the CI/CD process.
Testing Phase
Moving our configuration from the dev branch to the testing one should happen via a pull request. Access to push the configuration directly to the testing Reltio tenant should be taken away from developers to ensure that the defined process is followed. This is where we could leverage having one or more experienced people review the changes and choose to accept or reject them with comments (it is advisable that a rejection always contains at least one comment explaining the reason).
If the change is accepted, two things should happen. Firstly, the configuration should be deployed to the UAT Reltio tenant via the CI/CD process. Secondly, our tests should run on top of that tenant – verifying that the change is not introducing regression to our environment. We could choose to add a third step – rolling back our configuration to its previous working state in case of test failures.
Production
The final step of any configuration’s journey is the production environment. The process here is analogous to moving a change from dev to test. However, we suggest that you add at least two reviewers for this step as a safety measure. The decision of whether to roll back a configuration, or leave it and go for a quick hotfix, is open and depends on your organization’s needs and security measures.
Reltio Configurations
Not all configurations can be subject to our CI/CD process, and some of them require a workaround in order to be pushed to a Reltio tenant. Below, we look at some of the most frequently updated configs and the specifics the process might have for each.
Business Configuration (L3)
The L3 configuration is probably the one that gets updated most often. This is logical, since it holds most of the business logic: survivorship, match and merge rules, attributes and data model, sources, etc. The specificity of L3 is that it can hold property values that are tenant-specific. This is most often the RDM tenant value, but others can be present as well. We suggest a thorough inspection of the file before you move on with the automation.
One solution for that is to turn these values into variables. This could mean storing them on Git as a string such as “{RDM_TENANT_ID}”. The process should be enhanced with an intermediate step that replaces all these variables with their corresponding values before the API call that pushes the configuration to Reltio. On Git we will always see “{RDM_TENANT_ID}”, which is helpful when comparing differences in a file across branches. Had we stored the actual value directly, there would always be issues at pull requests, as each value is in fact the correct one (but for its own tenant), triggering merge conflicts.
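As a minimal illustration of this substitution step (with a made-up RDM tenant value), it boils down to a plain string replace just before the API call:

# On Git, the L3 always stores the placeholder, never a real value
config_text = '{ "rdmTenantId": "{RDM_TENANT_ID}" }'

# The pipeline injects the value for the target environment (made-up ID below)
rdm_tenant_id = "rdm-tenant-001"
deployable_config = config_text.replace("{RDM_TENANT_ID}", rdm_tenant_id)

print(deployable_config)  # { "rdmTenantId": "rdm-tenant-001" }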
DnB Mapping Configurations
While not changed often, the DnB mapping configurations are a nice-to-have addition to your automation process. Their CI/CD is easy to implement, and we gain the benefits of keeping history and not worrying about credentials or misclicking while using Postman (it may happen that you are pointing at an environment you did not intend to).
UI Configurations
UI configurations cannot easily be involved in the process, since there is no official API to push/pull them to and from your Reltio tenant. However, there is still a workaround – you could inspect the API calls that your browser makes and find out the API used to push the files. This can be achieved with a little trick that we discussed in another blog post – feel free to search for “Reltio Tips & Tricks – Part I” on our blog page. Apart from that, there are no other specifics for the UI configurations.
Loqate Configuration
The Loqate configuration is another good one to have, even though it is not changed often. The main benefit of having it as part of our process is the history and the ability to track down the causes of bugs. For example, to find out what caused a tenant-wide issue on date X, it is much easier if you have all the configurations handy – you can then compare versions and debug.
Metadata Security Configuration
For some customers, metadata security could be one of the configurations that change often. That is because the way we use the Reltio platform evolves over time – we often create new roles, grant and revoke permissions, etc. That is why it is advisable to include it in the process.
Automated Testing
As of today, Reltio does not offer testing as an out-of-the-box feature. That is why we should resort to a separate project (codebase) for our testing. Any modern unit testing framework could be used – examples include NUnit, JUnit, PyUnit and more.
Testing could be performed on every configuration update, or set up to be triggered only on L3 updates – if testing is considered an expensive process for your company, this will reduce the overall number of runs.
What to Test?
Depending on your business needs and priorities different tests could be performed. Here are a few suggestions on what we could validate in our tenant:
- Survivorship – it’s advisable to choose the most important attributes for your implementation and validate that the survivorship is still intact after the changes
- Match and merge – validating that a change in the configuration is not breaking key match rules – checking whether two or more entities in the tenant are still considered matches (see the sketch after this list)
- Key attributes being present – validating that inserting a JSON payload with key attributes will result in all key attributes being present in the entity on Reltio. If any attribute were to be deleted, this test would fail
- Cleansers – validates that no breaking logic was introduced to your Reltio cleansers. Some attribute (for example a Loqate one) could possibly become unmapped in an upcoming configuration change. This could break match rules and downstream analytics, and cause further issues.
- Household grouping – validates that household grouping works as expected after the configuration update
- Integration – a message could be read off the Reltio queue to validate what payload is going downstream. It could be compared with an expected payload
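To illustrate the match and merge check from the list above, here is a minimal pytest sketch that asks Reltio for the potential matches of a known, pre-loaded entity and asserts that its known counterpart is still among them. The environment, tenant, token, entity IDs and the exact response shape are assumptions made for the example:

import requests

# Hypothetical values – substitute your own environment, tenant and token
BASE = "https://test-environment.reltio.com/reltio/api/TENANT_ID"
HEADERS = {"Authorization": "Bearer <access_token>"}

def test_known_pair_still_matches():
    # Two pre-loaded entities that our match rules must always pair up
    response = requests.get(f"{BASE}/entities/0000Known1/_matches", headers=HEADERS)
    response.raise_for_status()
    # Response shape assumed: a list of match objects, each wrapping the matched entity
    matched_uris = [match["object"]["uri"] for match in response.json()]
    assert "entities/0000Known2" in matched_uris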
How to Test?
Taking into consideration what we are testing, and the fact that Reltio is an API-based platform, the best way to perform our testing is by firing API calls at the platform and comparing the responses we get with the expected ones.
For some of the test types – for example, validating survivorship rules – it could make sense to pre-load some entities into your Reltio tenant and just pull them from there each time the test is executed (since survivorship is calculated on the fly for each GET of the entity).
However, it is still advisable to follow the AAA (arrange, act, assert) pattern, extended with an additional step to clean up after the test – since we are working in the cloud, we cannot mock a database the way we would with a local one. For most tests this means having prepared JSON payloads, inserting them into Reltio, running the validations – matching, survivorship, etc. – and then deleting the entities to keep the tenant clean.
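Here is a minimal pytest sketch of that arrange–act–assert–cleanup cycle for a survivorship check. The environment, tenant, token and payload are assumptions, and the response shape follows Reltio’s entities API, so treat it as a template rather than a ready-made test:

import requests

# Hypothetical values – substitute your own environment, tenant and token
BASE = "https://test-environment.reltio.com/reltio/api/TENANT_ID"
HEADERS = {"Authorization": "Bearer <access_token>"}

def test_first_name_survivorship():
    # Arrange: insert a prepared JSON payload into the tenant
    payload = [{
        "type": "configuration/entityTypes/Individual",
        "attributes": {"FirstName": [{"value": "Maria"}]},
        "crosswalks": [{"type": "configuration/sources/Salesforce", "value": "SF-001"}]
    }]
    created = requests.post(f"{BASE}/entities", json=payload, headers=HEADERS).json()
    entity_uri = created[0]["object"]["uri"]  # e.g. "entities/0000AbC" (shape assumed)
    try:
        # Act: GET the entity – survivorship is calculated on the fly for each GET
        entity = requests.get(f"{BASE}/{entity_uri}", headers=HEADERS).json()
        # Assert: the surviving value is the one our rules should pick
        assert entity["attributes"]["FirstName"][0]["value"] == "Maria"
    finally:
        # Cleanup: delete the test entity so the tenant stays clean
        requests.delete(f"{BASE}/{entity_uri}", headers=HEADERS)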
For integration testing, the method varies depending on your implementation, but you will probably need a mechanism to filter out entities from the downstream – for example, a flag “Block” set to True could mean that the entity should not be sent downstream. You can then mark your test entities as blocked and work with them for your own purposes. This is beneficial in crucial implementations, like those in the financial sector, where having dummy entities in the production environment may be harmful.
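As a sketch of how that filtering could look on the consumer side – assuming the Reltio event stream is delivered to an AWS SQS queue and a simplified message shape with the entity under an object key – messages could be read like this:

import json
import boto3  # assumes the Reltio event stream is wired to an AWS SQS queue

sqs = boto3.client("sqs", region_name="us-east-1")
QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/reltio-events"  # hypothetical

def is_blocked(event):
    """True if the entity carries our hypothetical Block=True test flag."""
    attributes = event.get("object", {}).get("attributes", {})
    return any(item.get("value") is True for item in attributes.get("Block", []))

def next_downstream_payload():
    """Read one message off the queue, skipping entities marked as blocked."""
    response = sqs.receive_message(QueueUrl=QUEUE_URL, MaxNumberOfMessages=1, WaitTimeSeconds=5)
    for message in response.get("Messages", []):
        event = json.loads(message["Body"])  # message shape assumed for the sketch
        if not is_blocked(event):
            return event
    return None

A real downstream consumer would apply the same is_blocked filter, while the integration test would do the opposite and work only with the blocked test entities, comparing their payloads with the expected ones.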
Proof of Concept w/ GitHub Actions and Python
You could find the proof of concept that we will walk you through on the following GitHub repository: https://github.com/Ulpia-Tech/reltio-assets-ci-cd
In this section we will discuss an exemplary way to achieve CI/CD for Reltio assets. Note that this is just a proof of concept and as such is not one hundred percent complete in its implementation.
As previously discussed you will see three branches on the repository: dev, test, and prod. Below is the workflow that we will be using for the deployments:
permissions:
  contents: write

on:
  push:
    branches:
      - dev
      - test
      - prod
    paths:
      - 'L3.json'
  workflow_dispatch:

jobs:
  deploy:
    name: Deploy L3 Configuration
    runs-on: ubuntu-latest
    environment:
      name: ${{ github.ref_name }}
    steps:
      - uses: actions/checkout@v3
        with:
          fetch-depth: 0
          ref: ${{ github.ref_name }}
          token: ${{ secrets.GITHUB_TOKEN }}
      # The script expects the RDM tenant id as its sixth argument;
      # the RDM_TENANT_ID secret name is an assumption of this write-up
      - name: Execute L3 Deployment Script
        run: python scripts/l3-deployment.py "L3.json" "${{ secrets.RELTIO_USERNAME }}" "${{ secrets.RELTIO_PASSWORD }}" "${{ secrets.RELTIO_ENVIRONMENT }}" "${{ secrets.RELTIO_TENANT_ID }}" "${{ secrets.RDM_TENANT_ID }}"
      - name: Install Pytest
        run: pip install pytest
      - name: Run Unit Tests
        id: unit_tests
        continue-on-error: true
        run: pytest ./tests
      - name: Configure Git
        if: steps.unit_tests.outcome == 'failure'
        run: |
          git config --global user.email "github.actions@ulpia.tech"
          git config --global user.name "Ulpia Tech Actions"
      - name: Revert Commit on Failure
        if: steps.unit_tests.outcome == 'failure'
        run: |
          git checkout ${{ github.sha }}~1 L3.json
          git add .
          git commit -m "Revert ${{ github.sha }} by Rollback Bot"
          git push
      - name: Re-run Deployment on Failure
        if: steps.unit_tests.outcome == 'failure'
        run: python scripts/l3-deployment.py "L3.json" "${{ secrets.RELTIO_USERNAME }}" "${{ secrets.RELTIO_PASSWORD }}" "${{ secrets.RELTIO_ENVIRONMENT }}" "${{ secrets.RELTIO_TENANT_ID }}" "${{ secrets.RDM_TENANT_ID }}"
Let’s inspect each section of this configuration in detail!
- The permissions
permissions:
  contents: write
This section defines the permissions required by the workflow – in this case we want to be able to write to the repository’s contents. This is needed later in our revert commit step where we commit a change and push the code.
- Workflow triggers
on:
  push:
    branches:
      - dev
      - test
      - prod
    paths:
      - 'L3.json'
  workflow_dispatch:
This section specifies the triggers of our workflow. It runs on two types of events:
- push: When a push is made to the dev, test, or prod branches, but only if the push includes changes to the L3.json file.
- workflow_dispatch: Allows the workflow to be manually triggered from the GitHub UI. This is mainly enabled for running the workflow on demand and can be useful if the Reltio environment was unavailable at the moment of the push, but is now back online and we want to retry the deployment.
- The deploy job step
jobs:
  deploy:
    name: Deploy L3 Configuration
    runs-on: ubuntu-latest
    environment:
      name: ${{ github.ref_name }}
This section of the configuration defines the jobs executed by our workflow. There is a single job, named Deploy L3 Configuration, set up to run on the latest Ubuntu runner available on GitHub. The environment field dynamically sets the environment name to the name of the branch that triggered the workflow. This is possible since our environments and branch names follow the same naming convention. As a prerequisite, we have already set up the environments with all of their secrets and environment variables in the repository Settings.
- Checkout the repository step
The steps section defines the actual steps that the workflow goes through, starting with checking out the repository we are working with.
- uses: actions/checkout@v3
  with:
    fetch-depth: 0
    ref: ${{ github.ref_name }}
    token: ${{ secrets.GITHUB_TOKEN }}
- actions/checkout@v3 is a predefined action that we are making use of. This is a standard way to check out a copy of your repository.
- fetch-depth: 0 is a parameter specifying that all history for all branches and tags should be fetched, ensuring that the entire Git history is available. We use this later in case a commit has to be reverted.
- ref: specifies the reference (a branch or a tag). In our case, we dynamically check out the branch that triggered the workflow (so if we are on dev, we check out dev).
- token: uses a GitHub token for authenticating with the repository. This is a special secret that GitHub Actions generates, which allows the workflow to make authenticated calls to GitHub’s API.
- The “Execute the L3 Deployment Script” step
- name: Execute L3 Deployment Script
  run: python scripts/l3-deployment.py "L3.json" "${{ secrets.RELTIO_USERNAME }}" "${{ secrets.RELTIO_PASSWORD }}" "${{ secrets.RELTIO_ENVIRONMENT }}" "${{ secrets.RELTIO_TENANT_ID }}" "${{ secrets.RDM_TENANT_ID }}"
We are executing a custom script in this step. We will go through the script in detail later, but for now we just have to accept that it takes the provided path to the L3.json file and publishes it to Reltio. You can see that several parameters are passed to the script – the credentials, the environment and the tenant IDs, including the RDM tenant id used for the placeholder replacement – most of which are secrets in our environment. These were already set up as a prerequisite, as discussed in the deploy job step above.
- The “Install Pytest” step
- name: Install Pytest
  run: pip install pytest
This step sets up the testing environment. There is a Python testing project that executes (or mocks executing) tests against Reltio; we will run it in the next step and go into its details later.
- The “Run Unit Tests” step
- name: Run Unit Tests
  id: unit_tests
  continue-on-error: true
  run: pytest ./tests
This step executes the Python project that contains the tests. It continues on error, since we want to be able to revert the commit and re-deploy in case of test failures. We provide an id here so that we can reference the step in later steps.
- The “Configure Git” step
- name: Configure Git
  if: steps.unit_tests.outcome == 'failure'
  run: |
    git config --global user.email "github.actions@ulpia.tech"
    git config --global user.name "Ulpia Tech Actions"
This step configures the Git account, which is required for the subsequent commit and push. It is conditionally executed only if the unit tests failed; the unit tests step is referenced by the id we provided for it earlier.
- The “Revert Commit on Failure” step
- name: Revert Commit on Failure
  if: steps.unit_tests.outcome == 'failure'
  run: |
    git checkout ${{ github.sha }}~1 L3.json
    git add .
    git commit -m "Revert ${{ github.sha }} by Rollback Bot"
    git push
This step handles the reverting of the changes in the event of a test failure. We are only reverting the L3.json file, rather than the whole commit.
- The “Re-run Deployment on Failure” step
- name: Re-run Deployment on Failure
  if: steps.unit_tests.outcome == 'failure'
  run: python scripts/l3-deployment.py "L3.json" "${{ secrets.RELTIO_USERNAME }}" "${{ secrets.RELTIO_PASSWORD }}" "${{ secrets.RELTIO_ENVIRONMENT }}" "${{ secrets.RELTIO_TENANT_ID }}" "${{ secrets.RDM_TENANT_ID }}"
Since we reverted the L3 file in the last step, but our Reltio environment still holds the previously deployed L3, we have to re-run the deployment step. Again, this only happens in case of failures in the unit tests step.
Going back to the deployment step, let’s take a look at what the custom deployment Python script looks like.
import sys
import requests

def read_file(file_path):
    with open(file_path, 'r') as file:
        return file.read()

def get_access_token(username, password, url="https://auth.reltio.com/oauth/token"):
    headers = {'Authorization': 'Basic cmVsdGlvX3VpOm1ha2l0YQ=='}
    payload = {'grant_type': 'password', 'username': username, 'password': password}
    response = requests.post(url, data=payload, headers=headers)
    if response.status_code != 200:
        raise Exception(f"Authentication failed: {response.text}")
    return response.json().get('access_token')

def update_config(url, data, access_token):
    headers = {'Authorization': f'Bearer {access_token}'}
    response = requests.put(url, data=data, headers=headers)
    return response

def main():
    # Six arguments are expected after the script name (sys.argv[0])
    if len(sys.argv) != 7:
        sys.exit(1)
    l3_path = sys.argv[1]
    username = sys.argv[2]
    password = sys.argv[3]
    reltio_environment = sys.argv[4]
    reltio_tenant_id = sys.argv[5]
    rdm_tenant_id = sys.argv[6]
    url = 'https://' + reltio_environment + '.reltio.com/reltio/api/' + reltio_tenant_id + '/configuration'
    access_token = get_access_token(username, password)
    l3_config = read_file(l3_path)
    l3_config = l3_config.replace('{RDM_TENANT_ID}', rdm_tenant_id)
    response = update_config(url, l3_config, access_token)
    print("Response Status Code:", response.status_code)

if __name__ == "__main__":
    main()
We will again look at each section of the script in detail, starting from the top with our imports.
import sys
import requests
Nothing special about them, but let it be noted that we are using requests to make HTTP calls from Python, and sys to access the command-line arguments and exit codes maintained by the Python interpreter.
Next, we have the read_file function:
def read_file(file_path):
    with open(file_path, 'r') as file:
        return file.read()
It’s a simple function that reads the contents of a file whose path is passed as an argument. It opens the file in read mode and returns its contents as a string. In our case, we are reading the L3.json file with this function.
The get_access_token function is responsible for obtaining an access token from the Reltio OAuth service:
def get_access_token(username, password, url="https://auth.reltio.com/oauth/token"):
    headers = {'Authorization': 'Basic cmVsdGlvX3VpOm1ha2l0YQ=='}
    payload = {'grant_type': 'password', 'username': username, 'password': password}
    response = requests.post(url, data=payload, headers=headers)
    if response.status_code != 200:
        raise Exception(f"Authentication failed: {response.text}")
    return response.json().get('access_token')
It sends a POST request with the username, the password and a pre-defined authorization header (for simplicity). If the request is successful, it returns the access_token from the JSON response. Otherwise, an exception is raised.
The update_config function sends a PUT request to the specified URL (the URL for updating the Reltio L3 configuration):
def update_config(url, data, access_token):
    headers = {'Authorization': f'Bearer {access_token}'}
    response = requests.put(url, data=data, headers=headers)
    return response
It uses the access_token we previously obtained for authorization and sends the data to Reltio.
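One optional hardening, sketched below under the same assumptions: calling raise_for_status() on the response would make the deployment step itself fail in GitHub Actions on any non-2xx answer from Reltio, instead of merely printing the status code at the end:

def update_config(url, data, access_token):
    headers = {'Authorization': f'Bearer {access_token}'}
    response = requests.put(url, data=data, headers=headers)
    # A non-2xx status raises requests.HTTPError, failing the CI step early
    response.raise_for_status()
    return response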
The main function performs several tasks:
def main():
    # Six arguments are expected after the script name (sys.argv[0])
    if len(sys.argv) != 7:
        sys.exit(1)
    l3_path = sys.argv[1]
    username = sys.argv[2]
    password = sys.argv[3]
    reltio_environment = sys.argv[4]
    reltio_tenant_id = sys.argv[5]
    rdm_tenant_id = sys.argv[6]
    url = 'https://' + reltio_environment + '.reltio.com/reltio/api/' + reltio_tenant_id + '/configuration'
    access_token = get_access_token(username, password)
    l3_config = read_file(l3_path)
    l3_config = l3_config.replace('{RDM_TENANT_ID}', rdm_tenant_id)
    response = update_config(url, l3_config, access_token)
    print("Response Status Code:", response.status_code)
- Checks that the correct number of command-line arguments is passed
- Reads the command-line arguments, which include the file path for the configuration, the credentials, and Reltio-specific information like the environment and tenant IDs
- Constructs the URL for the Reltio configuration API
- Calls the functions to get the access token and read the configuration from the file
- Replaces our {RDM_TENANT_ID} placeholder with the appropriate value passed as a parameter
- Finally, calls the update_config function to update the configuration on Reltio
It is a simple yet powerful script that makes our life much easier than it would be if we had to write all this logic directly in our GitHub Actions workflow.
Finally, we have a very self-explanatory mini project in the tests folder. Its purpose is to simulate the results of a real testing project for our proof of concept; a passing version could be as small as the sketch below. Let’s then go through how all the pieces work together – we will simulate an update to the L3 configuration with successfully passing tests, and one more where the tests fail deliberately.
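A minimal stand-in (the file name and helper are illustrative – the stub in the repository may differ):

# tests/test_l3.py – simulates a real test suite for the proof of concept
def fake_reltio_call():
    # Pretend we called Reltio and the validation succeeded
    return True

def test_l3_configuration():
    assert fake_reltio_call() is True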
The Workflow in Action
Success
Since our testing project is currently hard-coded to always pass, in order to simulate a successful deployment we just have to make an update to our L3 configuration. It could be as simple as changing the label of a property, a source, or anything else.
"sources": [
{
"uri": "configuration/sources/Salesforce",
"label": "Salesforce1",
"abbreviation": "Salesforce"
}
For simplicity, we will just update the label of our Salesforce source from Salesforce1 to Salesforce, removing the 1 at the end. Navigating to Actions, we can see that the workflow was indeed triggered by the L3.json change:
After just a few seconds, we will be able to see the successful status of our deployment job:
Now, to validate that everything is as expected, we could also check the L3 configuration on our Reltio tenant via Postman:
Two important things to validate here are:
- The change we made to the source label was indeed pushed to our Reltio tenant
- The rdmTenantId (blurred in the picture) has the correct value – our Python script successfully replaced the placeholder before pushing the configuration to Reltio
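Instead of Postman, a quick scripted check against the same configuration endpoint that the deployment script PUTs to could look like this (the environment, tenant and token are placeholders):

import requests

# Placeholders – same URL pattern as in the deployment script
url = "https://ENVIRONMENT.reltio.com/reltio/api/TENANT_ID/configuration"
headers = {"Authorization": "Bearer <access_token>"}

config = requests.get(url, headers=headers).json()
# The source label should hold the value we just deployed
assert any(source.get("label") == "Salesforce" for source in config.get("sources", []))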
Failure and a Rollback
Finally, let’s look at the scenario where our deployment fails – this could be because someone pushed a corrupted L3, or because a drastic, unexpected change was made to our survivorship rules. We will simulate this failure by modifying our fake test project:
def fake_reltio_call():
    return False
The fake_reltio_call function now returns False. Since our test expects True, this will cause a failure. However, in order to trigger the GitHub Actions workflow, we also have to introduce a change to our L3 configuration – we will put the Salesforce1 label back in the sources. The expectation now is that our configuration will be rolled back to its previous state after the test fails, so let’s push these changes and see what happens.
The GitHub Action picked up the change immediately, and this time we can take a closer look at each executed step:
We see that all the steps were executed, even the conditional steps that should only run in case of test failures. Expanding the test step, we can indeed see that the process completed with exit code 1, which indicates an error.
Navigating to the repository commits shows that the L3.json change was indeed reverted. Optionally, we could again verify the changes on our Reltio tenant via Postman – and it confirms that everything works as expected. Even though we pushed the Salesforce1 label, we still have the previous version with Salesforce in the environment.