AWS S3 to Reltio PoC Workflow in Jitterbit

Introduction

This project is a PoC that we put together quickly so it can serve as a starting point for your Jitterbit projects.

You can download the project and example CSV and JSON files from the section at the bottom of this article.

Use Case

We want to read CSV and JSON files consisting of Reltio records from an AWS S3 bucket and push those records to Reltio.

Our Project pane contains these four workflows: Main Workflow, Reltio Auth Workflow, CSV Workflow, and JSON Workflow.

In this blog, we will look more thoroughly into each workflow.

Main Workflow Design Pattern

Let’s start with how our Main Workflow is built and configured. Each number corresponds with a description of the operation component.

Get messages for SQS notification event

  1. Prerequisites
    • First, inside your Amazon environment, you need to create an AWS SQS queue and bind it to your targeted AWS S3 bucket. If you are doing this for the first time, this page can help you – AWS SQS configuration
    • For this example, our AWS S3 bucket is configured to publish notifications for “New object created” events.
  2. Amazon SQS Endpoint Connector configuration
    • From the Component palette on the right of your Jitterbit Cloud Studio, select the “Amazon SQS” endpoint and configure your connector by providing Connection Name, Access Key Id, Secret Access Key, and AWS Region.
    • Detailed information on the connector is present on Jitterbit Success Central – Jitterbit Amazon SQS Connector configuration
  3. Get Messages Activity
    • After you have configured the AWS SQS Connector, drag and drop the “Get Messages” activity onto the Design Canvas.
      1. Set a Name for the activity and select the Queue that you have created as part of the Prerequisites step
      2. More information about the other parameters of the activity is described here – Jitterbit Amazon SQS Get Messages Activity
      3. There are request and response schemas that are generated from the endpoint and are displayed on the 2nd screen of your activity.
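
For reference, the body of each SQS message is the S3 event notification itself. Abridged, and with placeholder bucket and file names, it looks roughly like this:

  {
    "Records": [
      {
        "eventSource": "aws:s3",
        "eventName": "ObjectCreated:Put",
        "s3": {
          "bucket": { "name": "my-example-bucket" },
          "object": { "key": "jitterbitWorkflowCSV.csv" }
        }
      }
    ]
  }

The Records[0].s3.object.key property is the one we care about in the next two steps.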

Transform SQS response schema

  • From the response schema of the AWS SQS activity, we need only one parameter – the one that holds the name of the newly created file in AWS S3. What we can do here is mirror the source schema (it is generated directly from the endpoint) and, inside the “body” property of the “messages” property, create a global variable like this:

Get File Name

  • Our next component is a script component where we parse our global variable as JSON and get the name of the file. Please note that the Script type is set to JavaScript.
  • Optional – you can log the file name to the operation log with the WriteToOperationLog function for debugging purposes
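
Since the script itself is shown as a screenshot in the original post, here is a minimal JavaScript sketch of the idea. $messageBodyFileName is the global variable used later in the CSV workflow; $messageBody is an assumed name for the variable created in the previous transformation:

  // Read the raw SQS message body saved by the previous transformation
  // ($messageBody is an assumption of this sketch)
  var body = Jitterbit.GetVar("$messageBody");

  // The body is the S3 event notification shown earlier; parse it as JSON
  var notification = JSON.parse(body);

  // The object key is the name of the newly created file
  var fileName = notification.Records[0].s3.object.key;

  // Store it for the Get Object activities used later in the workflows
  Jitterbit.SetVar("$messageBodyFileName", fileName);

  // Optional: log the file name for debugging
  Jitterbit.WriteToOperationLog("New file in S3: " + fileName);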

We should configure some settings on the first operation:

  • In the Actions tab – if the first operation (Listen to AWS SQS) executes successfully, we run the next one (Check File Extension)
  • In the Schedules tab – if you want to poll the SQS queue periodically, Jitterbit lets you create a schedule that runs the operation every few minutes; in our example, we set it to run every 2 minutes

We want to continue the execution of the operation flow only if the new file is a JSON or a CSV file. In all other cases, we suspend the flow and direct it to the “Failed operation” (see step 7 below). Our script looks like this:
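
The script appears as a screenshot in the original post; a rough JavaScript sketch of the same idea follows. The variable names $fileExtension and $isSupportedFileType are assumptions of this sketch, and the actual routing is done with RunOperation and RaiseError, as described below:

  // Extract the extension from the file name captured earlier
  var fileName = Jitterbit.GetVar("$messageBodyFileName");
  var extension = fileName.substring(fileName.lastIndexOf(".") + 1).toLowerCase();

  // Remember the result so the flow can decide which workflow to run
  Jitterbit.SetVar("$fileExtension", extension);
  Jitterbit.SetVar("$isSupportedFileType", extension === "csv" || extension === "json");

  Jitterbit.WriteToOperationLog("Detected file type: " + extension);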

As we have two separate workflows for processing CSV files and JSON files, we make some further checks and determine which workflow to call. This is achieved by calling the first operation of the respective workflow with the RunOperation function.

We will look into the CSV and JSON workflows more thoroughly in the next two sections of this document.

Our last component is called when we have neither a CSV nor a JSON file. We set a custom error message and display it in the error log using the RaiseError function. Be careful when using this function, as it raises a fatal error (Jitterbit Logging and Error Functions).

Reltio Auth Workflow

As we will push data to Reltio, we first need to authenticate to Reltio and obtain an access token. This token is then used to post the data from the files that we read.

We will not go deeper into how to build the Reltio Auth workflow in this article, because you can find step-by-step instructions in our How to connect to Reltio through Jitterbit article.

The Reltio Auth Workflow in our project looks like this:
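
For orientation, outside of Jitterbit the token request boils down to a single OAuth call. A rough stand-alone sketch in Node.js 18+ (run as an ES module) follows; the auth URL shown is Reltio's usual OAuth endpoint, and all credentials are placeholders:

  // Hypothetical stand-alone illustration; in the project, the Reltio Auth
  // workflow performs this call and stores the token in a global variable.
  const authUrl = "https://auth.reltio.com/oauth/token";
  const basic = Buffer.from("CLIENT_ID:CLIENT_SECRET").toString("base64");

  const response = await fetch(authUrl + "?grant_type=password&username=USER&password=PASS", {
    method: "POST",
    headers: { Authorization: "Basic " + basic },
  });

  const { access_token } = await response.json();
  // access_token is what the POST activities later send as a Bearer token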

CSV Workflow

The purpose of our CSV workflow is to get a CSV file from an AWS S3 bucket, parse it, and push its records to Reltio. Our newly created workflow looks like this:

The numbers correspond to the descriptions of the components.

Workflow Design Pattern

AWS S3 Component

  • From the Component palette on the right of your Jitterbit Cloud Studio, select the “Amazon S3” endpoint and configure your connector by providing Connection Name, Access Key ID, Secret Access Key, and AWS Region.
  • After your connector is ready, drag and drop the “Get Object” activity and configure it by providing a Name (of the activity), S3 bucket, and Key (here, choose the global variable that we created earlier – $messageBodyFileName).
  • This endpoint provides a response schema which we will use in our next step.

The property that holds the entire contents of the CSV file is the “Data” property returned in the response from the AWS S3 endpoint. We will use a transformation block to map the schemas: the source schema is provided by the endpoint activity; for the target schema, we create a new flat schema with only one field. After that, we map the “Data” property from the source schema to the field created in the flat schema and add the following script:

The data comes from the AWS S3 bucket in base64Binary format, so we need to decode it to a string. After the decoding, some additional double quotes appear, so we trim them to receive the contents of the CSV in the proper format.
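
The transformation script itself is shown as a screenshot in the original post. Functionally, an equivalent stand-alone sketch in Node.js looks like this; the encoded sample and the exact quote-trimming are illustrative:

  // "Data" as returned by the Get Object activity is base64-encoded
  const encoded = "bmFtZSxwaG9uZSxlbWFpbA=="; // decodes to "name,phone,email"

  // Decode base64 to a UTF-8 string; inside Jitterbit, Jitterbit Script's
  // Base64Decode() serves the same purpose
  let csvContents = Buffer.from(encoded, "base64").toString("utf8");

  // Strip the stray surrounding double quotes that appear after decoding
  csvContents = csvContents.replace(/^"+|"+$/g, "");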

Our next step is to write our global variable to a temporary storage endpoint. First, we configure the Temporary Storage connector from the Components palette by giving it only a Connection Name. Then we drag and drop the Write activity and configure it as follows.

Our script component holds the functions that call the Reltio Auth Obtain token operation.

The source activity for the second operation of the operation chain is the Read activity of the temporary storage endpoint. The properties that we need to set here are:

  • Name – for the name of the activity
  • Provide response schema – check “Yes, provide a new schema”
    • Set the Schema name and provide your schema where you specify the properties that you need
      • In the example below, these properties are name, phone, email, address, etc.
  • Get Files – the name of the file that we set in Step 3
  • Optional Settings -> Ignore Lines – ignore the first line, which is the header row of the CSV file
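
To make the schema concrete, the example file (jitterbitWorkflowCSV.csv, attached at the bottom of this article) can be pictured along these lines, with illustrative values only:

  name,phone,email,address
  John Smith,+1 555 0100,john.smith@example.com,"12 Main St, Springfield"
  Jane Doe,+1 555 0101,jane.doe@example.com,"34 Oak Ave, Riverside"

The header row is exactly what the Ignore Lines setting skips.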

In this step, we map the attribute properties through our transformation component. The source schema comes from the Temporary Storage endpoint's Read activity, and for the target schema we create a custom JSON schema. After that, we manually map the properties that we want.

An important thing to bear in mind here is that we need to map the “flat” property from the source to the highest hierarchical “item” property in the target schema and choose the “Define loop node” option from the menu that appears. This enables us to push each row from the CSV file as a separate record in Reltio.

For testing purposes, the following three attributes are set like this (a sample of the resulting request body follows the list):

  • type: “configuration/entityTypes/Individual”
  • crosswalk’s type: “configuration/sources/Salesforce”
  • crosswalk’s value: “jitterbitCSV_” + GUID()
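
Putting the mapping together, each CSV row ends up posted to Reltio as an entity roughly like the one below; the attribute names are illustrative, while the type and crosswalk values are the ones from the list above:

  [
    {
      "type": "configuration/entityTypes/Individual",
      "attributes": {
        "Name": [ { "value": "John Smith" } ],
        "Phone": [ { "value": "+1 555 0100" } ]
      },
      "crosswalks": [
        {
          "type": "configuration/sources/Salesforce",
          "value": "jitterbitCSV_8f14e45f-ceea-4e7f-a0f8-9c2d3b6a1c55"
        }
      ]
    }
  ]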

The last element in the operation is an HTTP POST activity, which helps us push the records to Reltio.

First, we need to configure a new HTTP Connector by providing a Connection Name and a Base URL.

After configuring that, choose the POST activity and drop it onto the Design canvas, right next to the “Map CSV to JSON” transformation block. Here, we need to provide a Name for the activity, Path, and two Request Headers properties – Authorization and Content-Type. In the value field for the Authorization, choose the access token variable that you have created in the Reltio Auth Workflow.

There is no need to provide a request and response schema for this activity.
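
In plain HTTP terms, what this activity sends is equivalent to the following Node.js sketch; the environment, tenant, and payload are placeholders:

  // Hypothetical equivalent of the "POST to Reltio" activity (Node.js 18+)
  const accessToken = "ACCESS_TOKEN"; // obtained by the Reltio Auth workflow
  const entities = []; // the JSON array produced by the transformation

  const response = await fetch("https://<environment>.reltio.com/reltio/api/<tenantId>/entities", {
    method: "POST",
    headers: {
      Authorization: "Bearer " + accessToken,
      "Content-Type": "application/json",
    },
    body: JSON.stringify(entities),
  });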

With that, our CSV Workflow is ready.

JSON Workflow

Here we want to do something similar to the CSV workflow, but with JSON files. We get a JSON file from an AWS S3 bucket, parse the file, and push its records to Reltio.

Our newly created JSON workflow looks like this, and the numbers correspond to the descriptions of the components.

Workflow Design Pattern

AWS Component – Please refer to Step 1 of the CSV Workflow as the configuration of this component is the same as in the CSV workflow.

Again, like in the CSV example, we need to decode the “Data” property of the AWS S3 activity response.

The global variable from step 2 is used for the configuration of a Variable endpoint. The associated Write activity of the variable endpoint is used as a target in the “Get JSON from AWS S3” operation.

The second operation of the operation chain starts with the Read activity of the amazonDataBodyDecoded variable endpoint. Here we set a Name for the activity and do not provide any response schema.

We configure a Temporary storage connector by giving it only a Connection name. The target activity for the “Var to Temp” operation is the Write activity of the temporary storage endpoint which we have just created:

Coming from the temporary storage, the JSON object contains redundant quotes, which makes the format of the JSON invalid. To fix that, we apply a couple of functions to remove the unnecessary quote characters:
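
The exact script appears as a screenshot in the original post. In spirit, the cleanup looks something like this sketch, which assumes the temporary storage wrapped the JSON in quotes and escaped the inner ones:

  // Raw contents read back from temporary storage (illustrative)
  var raw = "\"[{\\\"name\\\":\\\"John Smith\\\"}]\"";

  // Drop the outer quotes and un-escape the inner ones so the
  // string becomes valid JSON again
  var json = raw.replace(/^"+|"+$/g, "").replace(/\\"/g, '"');

  // Now it parses cleanly
  var records = JSON.parse(json);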

Without these transformations, the JSON file cannot be parsed properly and the following error can appear in the error log:

Also in this script, we call the Reltio auth obtain token operation that we have in a separate reusable workflow.

For our source activity in the third operation (Temp to JSON) we will again use a Read variable activity, as we need to satisfy the operation validity patterns. We won't provide any response schema here.

Before pushing anything to Reltio, we need to make the proper schema transformations. We define new source and target schemas and map the attributes that we need. Like in the CSV Workflow, we need to define a loop by mapping the “flat” property from the source to the first “item” property in the target schema so that we can push each JSON object from our JSON file as a new entity in Reltio:
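
For reference, the example input file (jitterbitWorkflowJSON.json, attached below) can be pictured as a flat array of objects, one per entity to be created, with illustrative values:

  [
    { "name": "John Smith", "phone": "+1 555 0100", "email": "john.smith@example.com" },
    { "name": "Jane Doe", "phone": "+1 555 0101", "email": "jane.doe@example.com" }
  ]

With the loop node defined, each element of this array becomes a separate entity in the Reltio request.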

For testing purposes, the following three attributes are set like this:

  • type: “configuration/entityTypes/Individual”
  • crosswalk’s type: “configuration/sources/Salesforce”
  • crosswalk’s value: “jitterbitJSON_” + GUID()

POST to Reltio – Please refer to Step 7 of the CSV Workflow, as the configuration of this component is the same as in the CSV workflow.

With that, we have finished creating and configuring all the components and operations needed for our Main Workflow to function properly.

References

Source code & example files

  • CSV and JSON example files

jitterbitWorkflowJSON.json 578 B
jitterbitWorkflowCSV.csv 306 B

  • Project

AWS S3 to Reltio Workflow.json 232 KB
