Skip to main content

Data Source - Bizintel360 Data Lake

Bizintel360 Data Lake is a search engine based NO-SQL database owned by Bizdata. Users can ingest petabyte, zettabyte, yottabyte  of records both structured and unstructured for Analytics, Storage, Machine Learning and deep learning.

Bizintel360 Data Lake Source is a connection pool in eZintegrations platform to retrieve data in JSON format.

Response from Bizintel360 Data Lake source is stored in key `bizdata_dataset_response` . If you are using Single Line to Multiline Operations as a next operation in your pipeline then the `Chop key` will have value as 

['bizdata_dataset_response']

Bizintel360 Data Lake Source have following Parameters :

Data Lake Version : Data Lake Version is the Data Lake Name and its version assigned to the organization by Bizdata

Index / Table Name: Index or Table name that you want to retrieve data from Data Lake. For the List of table name or index, check `Datalake` section in Bizintel360 Visualization product.

Pagination Wait Time : By default it is 2m, where m is minute. Pagination is a standard API capability. Bizintel360 Data Lake source retrieve data in paginated way. This parameter is to set as how long need to wait for next page. If the response of Table/Index is very high ( having 100+ keys in a single record) then try increasing the `Pagination Wait Time'. In general `2m` is sufficient to stream data. Use this when you have high network traffic congestion.

Can also use `h` for hours and `s` for seconds

Timeout : By default it is `2m`, where m is minute. In general `2m` is high enough to get response from Bizintel360 Data Lake. Increase this when response from Bizintel360 Data Lake is slow. This can happen when the Data Lake Cluster size is small. Reach out to Bizdata support team to make a increase in cluster size of Bizintel360 Data Lake.

Can also use `h` for hours and `s` for seconds

Size : By default it is `1000`. Size is number of streaming record count from Bizintel360 Data Lake source. The source will stream the records inside pipeline in size of 1000 chunks and move to operations and finally to Data Target. This can be increased to max of 10,000 records in case of use case like one-time historical data loads.

For better performance and durability `1000` is recommended size. This size helps to realize 1000 records in target faster and makes perfection in real-time data processing.

Query : JSON Body based query to retrieve data from tables/index of Bizintel360 Data Lake.

Get all the Records from a table

This below example responds with all the records from a table. This is similar to sql select * from table

{
    "query": {
        "match_all": {}
    }
}

Get all Records with specific columns/keys from a table

In the below example `store_number` and `customer_number` are two keys in the Data Lake table. It will respond with all the records with those keys only. This is similar to sql select store_number,customer_number from table 

{
      "_source": ["store_number", "customer_number"],
      "query": {
            "match_all": {}
    }
}

Get specific records with specific column/keys from a table

In the below example `employee_id` is a key whose value is `130` and it responds with only two keys that is `employee_id` and `employee_name`. This is similar to sql select employee_id,employee_name from table where employee_id=130

{
      "query": {
            "match": {
                  "employee_id": 130
        }
    },
        "_source": {
                "includes": ["employee_id", "employee_name"]
    }
}

Get Specific Columns/keys with Filters from a table

Below example is similar to sql

select Project,title,Assigned To,Priority,Created By,createdDateTime,dueDateTime from table
where Project='Project ABC' 
and Priority is not null
and percentComplete=100

{
    "size": 50,
    "sort": [{}
    ],
    "_source": ["Project", "title", "Assigned To", "Priority", "Created By", "createdDateTime", "dueDateTime"],
    "query": {
        "bool": {
            "must": [{
                    "query_string": {
                        "query": "*"
                    }
                }, {
                    "query_string": {
                        "query": "Project:\"Project ABC\" AND Priority:[* TO *] AND NOT percentComplete:100"
                    }
                }, {
                    "bool": {
                        "should": []
                    }
                }
            ],
            "must_not": []
        }
    }
}

Get Specific Columns/keys with Filters from a table when key name have spaces

Below example is similar to sql

SELECT ThreadId, Ticket Created At
FROM YourTableName
WHERE Status != 'Closed' AND Thread Type = 'create'

In the below JSON body you can see Thread Type  which is having a space between column is written as  with double back slashes Thread\\ Type

{
    "size": 1000,
    "sort": [{}
    ],
    "_source": ["ThreadId", "Ticket Created At"],
    "query": {
        "bool": {
            "must": [{
                    "query_string": {
                        "query": "*"
                    }
                }, {
                    "query_string": {
                        "query": "NOT Status:\"Closed\" AND Thread\\ Type: \"create\""
                    }
                }, {
                    "bool": {
                        "should": []
                    }
                }
            ],
            "must_not": []
        }
    }
}

SELECT * FROM table WHERE asn IS NULL

{
  "query": {
    "bool": {
      "must_not": {
        "exists": {
          "field": "asn"
        }
      }
    }
  }
}

SELECT id,ipAddress FROM table WHERE asn IS NULL

{
  "_source": ["id", "ipAddress"],
  "query": {
    "bool": {
      "must_not": {
        "exists": {
          "field": "asn"
        }
      }
    }
  }
}

SELECT * FROM table WHERE asn IS NULL AND ipAddress = '{%ipAddress%}'

{
  "query": {
    "bool": {
      "must": [
        {
          "term": {
            "ipAddress": "{%ipAddress%}"
          }
        }
      ],
      "must_not": [
        {
          "exists": {
            "field": "asn"
          }
        }
      ]
    }
  }
}

SELECT asn, as FROM table WHERE as IS NOT NULL AND ipAddress = '{%ipAddress%}' 

{
  "_source": ["asn", "as"],
  "query": {
    "bool": {
      "must": [
        {
          "term": {
            "ipAddress": "{%ipAddress%}"
          }
        },
        {
          "exists": {
            "field": "as"
          }
        }
      ]
    }
  }
}

SELECT asn, as FROM table WHERE as IS NOT NULL AND ipAddress = '{%ipAddress%}' TOP 1

{
  "_source": ["asn", "as"],
  "query": {
    "bool": {
      "must": [
        {
          "term": {
            "ipAddress": "{%ipAddress%}"
          }
        },
        {
          "exists": {
            "field": "as"
          }
        }
      ]
    }
  },
  "size": 1,
    "terminate_after":1
}

SELECT asn, as FROM table WHERE as IS NOT NULL AND ipAddress = '{%ipAddress%}' order record by ascending TOP 1

{
  "_source": ["asn", "as"],
  "query": {
    "bool": {
      "must": [
        {
          "term": {
            "ipAddress": "{%ipAddress%}"
          }
        },
        {
          "exists": {
            "field": "as"
          }
        }
      ]
    }
  },
  "size": 1,
  "terminate_after":1
  "sort": [
    {
      "_doc": {
        "order": "asc"
      }
    }
  ]
}