
Filter Operation Use Cases

This guide covers the rules and scenarios in which the Filter operation can be used and how users can add it to their integration bridge.

Rules for Using Filter Operations

1. When the Target is an API

Single Filter Condition:

If a single filter condition is used and only the filtered data is sent to the API as the request body (dropping the unfiltered data), the Data Aggregation operation is not required.

Multiple Filter Conditions:

If multiple filter conditions are used and all the filtered data needs to be sent to the API as the request body (a sketch follows these steps):

1.  Use the Data Aggregation operation to aggregate the data.
2.  Follow this with the Singleline to Multiline operation.
3.  Send the data to the API.
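
The ordering of these steps can be pictured with plain Python over a list of record dictionaries. This is only a minimal sketch of the data flow, not the bridge operations themselves (those are configured in the platform); the record fields and the two example conditions are assumptions.

```python
import json

# Records as they look after the source's singleline JSON has been split.
records = [
    {"id": 1, "first_name": "George"},
    {"id": 2, "first_name": "Janet"},
    {"id": 3, "first_name": "Emma"},
]

# Two Filter conditions, each branch tagging the records it keeps (Append).
branch_1 = [dict(r, Status="Success") for r in records if r["id"] != 2]
branch_2 = [dict(r, Status="Failure") for r in records if r["id"] == 2]

# 1. Data Aggregation: collapse both branches into a single line.
aggregated_line = json.dumps(branch_1 + branch_2)

# 2. Singleline to Multiline: expand the aggregated line into one JSON
#    object per line.
# 3. That multiline text becomes the API request body.
request_body = "\n".join(json.dumps(r) for r in json.loads(aggregated_line))
print(request_body)
```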

2. When the Target is Datalake

General Rule:

Whether using one or more filter conditions, always:

1.  Aggregate the data using the Data Aggregation operation.
2.  Apply the Singleline to Multiline operation.
3.  Send the data to the Datalake.

Dropping Unfiltered Data:

After the filter operation, create a meta column to serve as a group-by key in the Data Aggregation operation. This ensures only filtered data is sent to the Datalake.

Retaining All Data:

Use a meta column (created before the filter operation in the Append operation) as the group-by key in the Data Aggregation operation. This ensures both filtered and unfiltered data are sent to the Datalake.
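
A minimal Python sketch of why the placement of the meta column matters; the aggregate helper and the batch/id field names are assumptions used only to model the group-by behaviour, not platform APIs.

```python
from collections import defaultdict

def aggregate(records, key):
    # Rough model of Data Aggregation: group records that carry `key`;
    # records without the group-by key fall out of the result.
    groups = defaultdict(list)
    for r in records:
        if key in r:
            groups[r[key]].append(r)
    return dict(groups)

source = [{"id": i} for i in range(1, 5)]

# Dropping unfiltered data: the meta column is appended AFTER the filter,
# so only records that passed the condition carry the group-by key.
after_filter = [dict(r, batch="b1", Status="Success") if r["id"] != 2 else r
                for r in source]
print(aggregate(after_filter, "batch"))   # ids 1, 3 and 4 only

# Retaining all data: the meta column is appended BEFORE the filter, so
# filtered and unfiltered records alike carry the key and survive.
before_filter = [dict(r, batch="b1") for r in source]
tagged = [dict(r, Status="Success") if r["id"] != 2 else r for r in before_filter]
print(aggregate(tagged, "batch"))         # ids 1, 2, 3 and 4
```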

3. When the Target is a Database

General Rule:

For one or more filter conditions, always:
1.  Aggregate the data using the Data Aggregation operation.
2.  Apply the Singleline to Tuple operation.
3.  Send the data to the Database.

Dropping Unfiltered Data:

After the filter operation, create a meta column to serve as a group-by key in the Data Aggregation operation. This ensures only filtered data is sent to the Database.

Retaining All Data:

Use a meta column (created before the filter operation in the Append operation) as the group-by key in the Data Aggregation operation. This ensures both filtered and unfiltered data are sent to the Database.

4. When Ending a Filter Condition

•  Always use the Filter Ends operation once the data transformation is completed.
•  The Filter Ends operation ensures that the filter condition is finalized and no further filtering is applied to the data.

These rules aim to provide clear guidance on how to handle different filter scenarios for various targets.

To illustrate these scenarios, a dataset of 12 records has been used in multiple ways; the relevant records are shown in each example below.

Scenario 1: With Target as Datalake and 1 filter condition

In this scenario, one filter condition will be applied to the data coming from the source, and the filtered records will then be sent to the Datalake.

 

Operations Used:


  • To send the records to the Datalake, we first apply the Singleline to Multiline operation, as the data coming from the source is singleline JSON.
  • Next, the filter condition is applied as per the requirement (here, the filter condition is ID not equal to 2, which means records with ID 2 should not be sent to the target).
  • Once the filter operation is applied, create a new column using Append to represent the status of the records sent, end the filter condition using Filter Ends, and aggregate the data using Data Aggregation before sending it to the target, as sketched below.
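
A minimal Python sketch of these three steps on a trimmed version of the example records; the code only mirrors the data flow, since the operations themselves are configured in the bridge.

```python
import json

# Records after Singleline to Multiline has split the source payload.
records = [
    {"id": 1, "first_name": "George", "last_name": "Bluth"},
    {"id": 2, "first_name": "Janet", "last_name": "Weaver"},
    {"id": 3, "first_name": "Emma", "last_name": "Wong"},
]

# Filter (id != 2) and Append (the Status column) in one pass.
filtered = [dict(r, Status="Success") for r in records if r["id"] != 2]

# Data Aggregation: collect the filtered records into one payload
# that is then written to the Datalake.
print(json.dumps(filtered, indent=2))
```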

Example: Data in Source

{
  "id": 1,
  "email": "george.bluth@reqres.in",
  "first_name": "George",
  "last_name": "Bluth"
},
{
  "id": 2,
  "email": "janet.weaver@reqres.in",
  "first_name": "Janet",
  "last_name": "Weaver"
},
{
  "id": 3,
  "email": "emma.wong@reqres.in",
  "first_name": "Emma",
  "last_name": "Wong"
}

Filter Condition: data['id'] != 2

Output in Datalake:

id | email                  | first_name | last_name | Status
1  | george.bluth@reqres.in | George     | Bluth     | Success
3  | emma.wong@reqres.in    | Emma       | Wong      | Success

Scenario 2: With Target as API and 1 filter condition

In this scenario, one filter condition will be applied to the data coming from the source, and the filtered records will then be sent to the target, which is an API.

Operations Used:


  • To send the records to the target (API), we apply the Singleline to Multiline operation, as the data coming from the source is singleline JSON.
  • Next, the filter condition is applied as per the requirement (here, the filter condition is ID not equal to 3, which means records with ID 3 should not be sent to the target).
  • Once the filter operation is applied, create a new column using Append to represent the status of the records sent, end the filter condition using Filter Ends, and then send the records to the target.

Note: Further operations can be used in the integration bridge depending on the Target and data type.
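
For illustration only, the equivalent of this flow outside the bridge might look as follows in Python; the endpoint URL is a placeholder, and the requests call merely stands in for the API target that the bridge itself invokes.

```python
import requests  # stand-in for the API target; the bridge performs this call itself

records = [
    {"id": 1, "first_name": "George"},
    {"id": 2, "first_name": "Janet"},
    {"id": 3, "first_name": "Emma"},
]

# Filter (id != 3) and Append (the Status column).
body = [dict(r, Status="Success") for r in records if r["id"] != 3]

# The filtered records are sent as the request body.
response = requests.post("https://example.com/target-api", json=body)
print(response.status_code)
```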

Example: Data in Source

{
      "id": 1,
      "email": "george.bluth@reqres.in",
      "first_name": "George",
      "last_name": "Bluth"
    },
    {
      "id": 2,
      "email": "janet.weaver@reqres.in",
      "first_name": "Janet",
      "last_name": "Weaver"
    },
    {
      "id": 3,
      "email": "emma.wong@reqres.in",
      "first_name": "Emma",
      "last_name": "Wong"
}

Filter Condition: data['id'] != 3

Output in Target:

{
      "id": 1,
      "email": "george.bluth@reqres.in",
      "first_name": "George",
      "last_name": "Bluth"
    },
    {
      "id": 2,
      "email": "janet.weaver@reqres.in",
      "first_name": "Janet",
      "last_name": "Weaver"
}

Scenario 3: With Target as Database and 1 filter condition

In this scenario, one filter condition will be applied to the data coming from the source, and the filtered records will then be sent to the Database.

Operations Used:


  • To send the records to the Database, we apply the Singleline to Multiline operation to the data coming from the source, as it is singleline JSON.
  • Next, the filter condition is applied as per the requirement (here, the filter condition is ID not equal to 1, which means records with ID 1 should not be sent to the target).
  • Once the filter operation is applied, create a new column using Append to represent the status of the records sent, and end the filter condition using Filter Ends.
  • To send the data to the Database, we need to create a tuple. Before creating the tuple, aggregate all the columns using Data Aggregation and store the result under a tuple key.
  • Use Singleline to Tuple and send the records to the Database, as sketched below.
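
A rough sketch of the Singleline to Tuple step and the final insert, using an in-memory SQLite table purely for illustration; the actual database, table, and connector depend on the configured target.

```python
import sqlite3

# Aggregated records for this scenario (id 1 was dropped by the filter).
records = [
    {"id": 2, "email": "janet.weaver@reqres.in", "first_name": "Janet",
     "last_name": "Weaver", "Status": "Success"},
    {"id": 3, "email": "emma.wong@reqres.in", "first_name": "Emma",
     "last_name": "Wong", "Status": "Success"},
]

# Singleline to Tuple: each record becomes a value tuple for the insert.
rows = [(r["id"], r["email"], r["first_name"], r["last_name"], r["Status"])
        for r in records]

conn = sqlite3.connect(":memory:")  # illustrative stand-in for the target database
conn.execute("CREATE TABLE users (id, email, first_name, last_name, status)")
conn.executemany("INSERT INTO users VALUES (?, ?, ?, ?, ?)", rows)
print(conn.execute("SELECT * FROM users").fetchall())
```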

Example: Data in Source

{
      "id": 1,
      "email": "george.bluth@reqres.in",
      "first_name": "George",
      "last_name": "Bluth"
    },
    {
      "id": 2,
      "email": "janet.weaver@reqres.in",
      "first_name": "Janet",
      "last_name": "Weaver"
    },
    {
      "id": 3,
      "email": "emma.wong@reqres.in",
      "first_name": "Emma",
      "last_name": "Wong"
}

Filter Condition: data['id'] != 1

Output in Database:

id | email                  | first_name | last_name | Status
2  | janet.weaver@reqres.in | Janet      | Weaver    | Success
3  | emma.wong@reqres.in    | Emma       | Wong      | Success

 

Scenario 4: With Target as Datalake but more than 1 filter condition

In this scenario, more than one filter condition will be applied to the data coming from the source, and the filtered records will then be sent to the Datalake.

Operations Used:


  • To send the records to the Datalake, we use the Singleline to Multiline operation, as the records coming from the source are singleline JSON.
  • Next, using the Append operation, we add a dummy key-value pair; this helps aggregate all the records after the Filter operation has been used more than once.
  • Here, the data will be aggregated based on this dummy key.
  • Once the key is created, apply the filter conditions along with the Append operation to create a new column that shows the status of the conditions applied.
  • Finally, aggregate all the records using the dummy key created earlier and send them to the target, as sketched below.
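
The role of the dummy key can be sketched in Python as follows; the api_status values are assumptions added only so that both conditions have something to test.

```python
from collections import defaultdict

# Records after Singleline to Multiline, with an assumed api_status field.
records = [
    {"id": 1, "first_name": "George", "api_status": 200},
    {"id": 2, "first_name": "Janet", "api_status": 200},
]

# Append: the dummy key-value pair that every record will share.
records = [dict(r, k1="v1") for r in records]

# Condition 1 and Condition 2, each appending its own Status value.
tagged = [dict(r, Status="Success") for r in records if r["api_status"] == 200]
tagged += [dict(r, Status="Failure") for r in records if r["api_status"] != 200]

# Data Aggregation on the dummy key pulls both branches back together
# before the records are sent to the target.
groups = defaultdict(list)
for r in tagged:
    groups[r["k1"]].append(r)
print(groups["v1"])
```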

Example: Data in Source

{
      "id": 1,
      "email": "george.bluth@reqres.in",
      "first_name": "George",
      "last_name": "Bluth"
    },
    {
      "id": 2,
      "email": "janet.weaver@reqres.in",
      "first_name": "Janet",
      "last_name": "Weaver"
}

Condition 1

•  Filter Condition: data['api_status'] == 200
•  Append Operation: "Status": "Success"

Condition 2

•  Filter Condition: data['api_status'] != 200
•  Append Operation: "Status": "Failure"

Output in Datalake:

id | email                  | first_name | last_name | Status
1  | george.bluth@reqres.in | George     | Bluth     | Success
2  | janet.weaver@reqres.in | Janet      | Weaver    | Success


Scenario 5: With Target as Database but more than 1 filter condition

In this scenario, more than one filter condition will be applied to the data coming from the source, and the filtered records will then be sent to the Database.

Operations Used:


  • To prepare the records, we use the Singleline to Multiline operation, as the records coming from the source are singleline JSON.
  • Next, using the Append operation, we add a dummy key-value pair; this helps aggregate all the records after the Filter operation has been used more than once.
  • Here, the data will be aggregated based on this dummy key.
  • Once the key is created, apply the filter conditions along with the Append operation to create a new column that shows the status of the conditions applied.
  • Now, aggregate all the records using the dummy key created earlier.
  • To send the records to the Database, create a tuple using Singleline to Tuple and send the records to the target.

Example: Data in Source

{
      "id": 4,
      "email": "eve.holt@reqres.in",
      "first_name": "Eve",
      "last_name": "Holt"
    },
    {
      "id": 5,
      "email": "charles.morris@reqres.in",
      "first_name": "Charles",
      "last_name": "Morris"
}

Condition 1

•  Filter Condition: data['api_status'] == 200
•  Append Operation: "Status": "Success"

Condition 2

•  Filter Condition: data['api_status'] != 200
•  Append Operation: "Status": "Failure"

Output in Database:

id | email                    | first_name | last_name | Status
4  | eve.holt@reqres.in       | Eve        | Holt      | Success
5  | charles.morris@reqres.in | Charles    | Morris    | Success


Scenario 6: With Target as API but more than 1 filter condition

In this scenario, more than one filter condition will be applied to the data coming from the source, and the filtered records will then be sent to the target (API).

Operations Used:


  • To send the records to the target (API), we use the Singleline to Multiline operation, as the records coming from the source are singleline JSON.
  • Next, using the Append operation, we add a dummy key-value pair; this helps aggregate all the records after the Filter operation has been used more than once.
  • Here, the data will be aggregated based on this dummy key.
  • Once the key is created, apply the filter conditions along with the Append operation to create a new column that shows the status of the conditions applied.
  • Now, aggregate all the records using the dummy key created earlier and send them to the target, which is the API.

Note: Other required operations can also be used in the integration process based on the target and data type.

Example: Data in Source

{
      "id": 7,
      "email": "michael.lawson@reqres.in",
      "first_name": "Michael",
      "last_name": "Lawson"
    },
    {
      "id": 8,
      "email": "lindsay.ferguson@reqres.in",
      "first_name": "Lindsay",
      "last_name": "Ferguson"
}

Condition 1

•  Filter Condition: data['api_status'] == 200
•  Append Operation: "Status": "Success"

Condition 2

•  Filter Condition: data['api_status'] != 200
•  Append Operation: "Status": "Failure"

Output in Target:

id | email                      | first_name | last_name | Status
7  | michael.lawson@reqres.in   | Michael    | Lawson    | Success
8  | lindsay.ferguson@reqres.in | Lindsay    | Ferguson  | Success


Scenario 7: With more than 1 filter condition and multiple Targets

Case A: Database and Datalake as Target

In this scenario, more than one filter condition will be applied to the data coming from the source, and the filtered records will then be sent to multiple targets: Database and Datalake.

Operations Used:


In this scenario, we have two targets, Database and Datalake, so we will send the records to the Database first, followed by the Datalake.

  • First, apply the filter conditions, transform the data as required, and add a meta column to the filtered data.
  • To send data to the Database, filter the data on the dummy key-value pair condition, aggregate it based on the added meta column using the Data Aggregation operation, apply the Singleline to Tuple operation, and insert the data into the Database.
  • After sending the data, use a Filter Ends operation.
  • To send data to the Datalake, filter the data on the dummy key-value pair condition again, aggregate it based on the added meta column using the Data Aggregation operation, apply the Singleline to Multiline operation, and ingest the data into the Datalake.

Note: Of the two keys created using the Append operation, the first key represents the filter condition, whereas the second key helps aggregate the records for the second filter condition.
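
One way to picture how the meta columns route records to the two targets, sketched in Python; the route helper is an assumption that only models the "filter on the dummy key, then aggregate on the meta column" sequence described above.

```python
# After both filter branches, each record carries a Status plus its branch
# marker (filter_1 for the Database branch, filter_2 for the Datalake
# branch); k1/v1 is the shared dummy key used to pick the records back up.
records = [
    {"id": 4, "Status": "Success", "filter_1": "yes", "k1": "v1"},
    {"id": 5, "Status": "Success", "filter_1": "yes", "k1": "v1"},
    {"id": 1, "Status": "Failure", "filter_2": "no", "k1": "v1"},
    {"id": 2, "Status": "Failure", "filter_2": "no", "k1": "v1"},
]

def route(records, marker):
    # Filter on the dummy key, then keep only the records that carry this
    # branch's meta column before aggregation and conversion.
    return [r for r in records if r.get("k1") == "v1" and marker in r]

database_rows = route(records, "filter_1")  # -> Singleline to Tuple -> Database
datalake_rows = route(records, "filter_2")  # -> Singleline to Multiline -> Datalake
print(database_rows)
print(datalake_rows)
```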

Example: Data in Source

{
      "id": 1,
      "email": "george.bluth@reqres.in",
      "first_name": "George",
      "last_name": "Bluth"
    },
    {
      "id": 2,
      "email": "janet.weaver@reqres.in",
      "first_name": "Janet",
      "last_name": "Weaver"
    },
    {
      "id": 3,
      "email": "emma.wong@reqres.in",
      "first_name": "Emma",
      "last_name": "Wong"
    },
    {
      "id": 4,
      "email": "eve.holt@reqres.in",
      "first_name": "Eve",
      "last_name": "Holt"
    },
    {
      "id": 5,
      "email": "charles.morris@reqres.in",
      "first_name": "Charles",
      "last_name": "Morris"
    },
	{
      "id": 6,
      "email": "tracey.ramos@reqres.in",
      "first_name": "Tracey",
      "last_name": "Ramos",
      "avatar": "https://reqres.in/img/faces/6-image.jpg"
    },
    {
      "id": 7,
      "email": "michael.lawson@reqres.in",
      "first_name": "Michael",
      "last_name": "Lawson"
    },
    {
      "id": 8,
      "email": "lindsay.ferguson@reqres.in",
      "first_name": "Lindsay",
      "last_name": "Ferguson"
}

Condition 1 (For Database)

•  Filter Condition: data['id'] == 4 or data['id'] == 5
•  Append Operation: "Status": "Success", "filter_1": "yes"

Condition 2 (For Datalake)

•  Filter Condition: data['id'] != 4 and data['id'] != 5
•  Append Operation: "Status": "Failure", "filter_2": "no"

•  Filter Operation to Aggregate Data: data['k1'] == 'v1'

Note: The filter operation used to aggregate the data appears as the third and fourth Filter operations in the bridge.

Output in Target (Database):

id | email                    | first_name | last_name | Status
4  | eve.holt@reqres.in       | Eve        | Holt      | Success
5  | charles.morris@reqres.in | Charles    | Morris    | Success

Output in Target (Datalake):

id | email                      | first_name | last_name | Status
1  | george.bluth@reqres.in     | George     | Bluth     | Failure
2  | janet.weaver@reqres.in     | Janet      | Weaver    | Failure
3  | emma.wong@reqres.in        | Emma       | Wong      | Failure
6  | tracey.ramos@reqres.in     | Tracey     | Ramos     | Failure
7  | michael.lawson@reqres.in   | Michael    | Lawson    | Failure
8  | lindsay.ferguson@reqres.in | Lindsay    | Ferguson  | Failure

 

Case B: Datalake and Database as Target

In this scenario, more than one filter condition will be applied to the data coming from the source, and the filtered records will then be sent to multiple targets: Datalake and Database.

Operations Used:


In this scenario, we have two targets, Datalake and Database, so we will send the records to the Datalake first, followed by the Database.

  • First, apply the filter conditions, transform the data as required, and add a meta column to the filtered data.
  • To send data to the Datalake, filter the data on the dummy key-value pair condition, aggregate it based on the added meta column using the Data Aggregation operation, apply the Singleline to Multiline operation, and ingest the data into the Datalake.
  • After sending the data, use a Filter Ends operation.
  • To send data to the Database, filter the data on the dummy key-value pair condition again, aggregate it based on the added meta column using the Data Aggregation operation, apply the Singleline to Tuple operation, and insert the data into the Database.

Note: Of the two keys created using the Append operation, the first key represents the filter condition, whereas the second key helps aggregate the records for the second filter condition.

Example: Data in Source

{
      "id": 1,
      "email": "george.bluth@reqres.in",
      "first_name": "George",
      "last_name": "Bluth"
    },
    {
      "id": 2,
      "email": "janet.weaver@reqres.in",
      "first_name": "Janet",
      "last_name": "Weaver"
    },
    {
      "id": 3,
      "email": "emma.wong@reqres.in",
      "first_name": "Emma",
      "last_name": "Wong"
    },
    {
      "id": 4,
      "email": "eve.holt@reqres.in",
      "first_name": "Eve",
      "last_name": "Holt"
    },
    {
      "id": 5,
      "email": "charles.morris@reqres.in",
      "first_name": "Charles",
      "last_name": "Morris"
    },
	{
      "id": 6,
      "email": "tracey.ramos@reqres.in",
      "first_name": "Tracey",
      "last_name": "Ramos",
      "avatar": "https://reqres.in/img/faces/6-image.jpg"
    },
    {
      "id": 7,
      "email": "michael.lawson@reqres.in",
      "first_name": "Michael",
      "last_name": "Lawson"
    },
    {
      "id": 8,
      "email": "lindsay.ferguson@reqres.in",
      "first_name": "Lindsay",
      "last_name": "Ferguson"
}

Condition 1 (For Datalake)

•  Filter Condition: data['id'] == 3
•  Append Operation: "Status": "Success", "filter_1": "yes"

Condition 2 (For Database)

•  Filter Condition: data['id'] != 3
•  Append Operation: "Status": "Failure", "filter_2": "no"

•  Filter Operation to Aggregate Data: data['k1'] == 'v1'

Note: The filter operation used to aggregate the data appears as the third and fourth Filter operations in the bridge.

Output in Target (Datalake):

id | email               | first_name | last_name | Status
3  | emma.wong@reqres.in | Emma       | Wong      | Success

Output in Target (Database):

id | email                      | first_name | last_name | Status
1  | george.bluth@reqres.in     | George     | Bluth     | Failure
2  | janet.weaver@reqres.in     | Janet      | Weaver    | Failure
4  | eve.holt@reqres.in         | Eve        | Holt      | Failure
5  | charles.morris@reqres.in   | Charles    | Morris    | Failure
6  | tracey.ramos@reqres.in     | Tracey     | Ramos     | Failure
7  | michael.lawson@reqres.in   | Michael    | Lawson    | Failure
8  | lindsay.ferguson@reqres.in | Lindsay    | Ferguson  | Failure


Case C: Database and API as Target

In this scenario, more than one filter condition will be applied to the data coming from the source, and the filtered records will then be sent to multiple targets: Database and API.

Operations Used:


In this scenario, we have two targets, Database and API, so we will send the records to the Database first, followed by the API.

  • First, apply the filter conditions, transform the data as required, and add a meta column to the filtered data.
  • To send data to the Database, filter the data on the dummy key-value pair condition, aggregate it based on the added meta column using the Data Aggregation operation, apply the Singleline to Tuple operation, and insert the data into the Database.
  • After sending the data, use a Filter Ends operation.
  • To send data to the API, filter the data on the dummy key-value pair condition again, aggregate it based on the added meta column using the Data Aggregation operation, apply the Singleline to Multiline operation, and send the data as the request body to the API target.

Note: Of the two keys created using the Append operation, the first key represents the filter condition, whereas the second key helps aggregate the records for the second filter condition.

Example: Data in Source

{
      "id": 1,
      "email": "george.bluth@reqres.in",
      "first_name": "George",
      "last_name": "Bluth"
    },
    {
      "id": 2,
      "email": "janet.weaver@reqres.in",
      "first_name": "Janet",
      "last_name": "Weaver"
    },
    {
      "id": 3,
      "email": "emma.wong@reqres.in",
      "first_name": "Emma",
      "last_name": "Wong"
    },
    {
      "id": 4,
      "email": "eve.holt@reqres.in",
      "first_name": "Eve",
      "last_name": "Holt"
    },
    {
      "id": 5,
      "email": "charles.morris@reqres.in",
      "first_name": "Charles",
      "last_name": "Morris"
    },
	{
      "id": 6,
      "email": "tracey.ramos@reqres.in",
      "first_name": "Tracey",
      "last_name": "Ramos",
      "avatar": "https://reqres.in/img/faces/6-image.jpg"
    },
    {
      "id": 7,
      "email": "michael.lawson@reqres.in",
      "first_name": "Michael",
      "last_name": "Lawson"
    },
    {
      "id": 8,
      "email": "lindsay.ferguson@reqres.in",
      "first_name": "Lindsay",
      "last_name": "Ferguson"
}

Condition 1 (For Database)

•  Filter Condition: data['id'] == 3
•  Append Operation: "Status": "Success", "filter_1": "yes"

Condition 2 (For API)

•  Filter Condition: data['id'] != 3
•  Append Operation: "Status": "Failure", "filter_2": "no"

•  Filter Operation to Aggregate Data: data['k1'] == 'v1'

Note: The filter operation used to aggregate the data appears as the third and fourth Filter operations in the bridge.

Output in Target (Database):

id | email               | first_name | last_name | Status
3  | emma.wong@reqres.in | Emma       | Wong      | Success

Output in Target (API):

id | email                      | first_name | last_name | Status
1  | george.bluth@reqres.in     | George     | Bluth     | Failure
2  | janet.weaver@reqres.in     | Janet      | Weaver    | Failure
4  | eve.holt@reqres.in         | Eve        | Holt      | Failure
5  | charles.morris@reqres.in   | Charles    | Morris    | Failure
6  | tracey.ramos@reqres.in     | Tracey     | Ramos     | Failure
7  | michael.lawson@reqres.in   | Michael    | Lawson    | Failure
8  | lindsay.ferguson@reqres.in | Lindsay    | Ferguson  | Failure


Case D: API and Datalake as Target

In this scenario, more than one filter condition will be applied to the data coming from the source, and the filtered records will then be sent to multiple targets: API and Datalake.

Operations Used:


In this scenario, we have two targets, API and Datalake, so we will send the records to the API first, followed by the Datalake.

  • First, apply the filter conditions, transform the data as required, and add a meta column to the filtered data.
  • To send data to the API, filter the data on the dummy key-value pair condition, aggregate it based on the added meta column using the Data Aggregation operation, apply the Singleline to Multiline operation, and send the data as the request body to the API target.
  • After sending the data, use a Filter Ends operation.
  • To send data to the Datalake, filter the data on the dummy key-value pair condition again, aggregate it based on the added meta column using the Data Aggregation operation, apply the Singleline to Multiline operation, and ingest the data into the Datalake.

Note: Of the two keys created using the Append operation, the first key represents the filter condition, whereas the second key helps aggregate the records for the second filter condition.

Example: Data in Source

{
      "id": 1,
      "email": "george.bluth@reqres.in",
      "first_name": "George",
      "last_name": "Bluth"
    },
    {
      "id": 2,
      "email": "janet.weaver@reqres.in",
      "first_name": "Janet",
      "last_name": "Weaver"
    },
    {
      "id": 3,
      "email": "emma.wong@reqres.in",
      "first_name": "Emma",
      "last_name": "Wong"
    },
    {
      "id": 4,
      "email": "eve.holt@reqres.in",
      "first_name": "Eve",
      "last_name": "Holt"
    },
    {
      "id": 5,
      "email": "charles.morris@reqres.in",
      "first_name": "Charles",
      "last_name": "Morris"
}

Condition 1 (For API)

•  Filter Condition: data['id'] != 5
•  Append Operation: "Status": "Success", "filter_1": "yes"

Condition 2 (For Datalake)

•  Filter Condition: data['id'] == 5
•  Append Operation: "Status": "Failure", "filter_2": "no"

•  Filter Operation to Aggregate Data: data['k1'] == 'v1'

Note: The filter operation used to aggregate the data appears as the third and fourth Filter operations in the bridge.

Output in Target (API):

id | email                  | first_name | last_name | Status
1  | george.bluth@reqres.in | George     | Bluth     | Success
2  | janet.weaver@reqres.in | Janet      | Weaver    | Success
3  | emma.wong@reqres.in    | Emma       | Wong      | Success
4  | eve.holt@reqres.in     | Eve        | Holt      | Success

Output in Target (Datalake):

id | email                    | first_name | last_name | Status
5  | charles.morris@reqres.in | Charles    | Morris    | Failure