Data Wrangling

Single Line to Multiline

Description :

The Single Line to Multiline operation is used to convert single-line JSON to multiline JSON.
The resulting JSON string can be used to store or transmit data in a structured format.

Number of Parameters : 1

Parameter : Chopkey

Chopkey adds a new key on the fly, with its value set either statically or dynamically.

Below is an example where we are using the Chopkey parameter to add a new key.

['dataset_response']['data']
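
To make the behavior concrete, here is a minimal Python sketch of the transformation; the sample payload and the use of the key path ['dataset_response']['data'] are assumptions for illustration, not the tool's implementation.

import json

# A compact, single-line JSON string as it might arrive from an API.
single_line = '{"dataset_response": {"data": [{"id": 1}, {"id": 2}]}}'

record = json.loads(single_line)

# Pretty-printing with an indent turns the single-line JSON into multiline JSON.
multiline = json.dumps(record, indent=4)
print(multiline)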


Delimiter to JSON

Description :

The Delimiter to JSON operation is used to convert delimiter-separated data to JSON data.
The resulting JSON string can be used to store or transmit data in a structured format.

Number of Parameters : 6

Parameter : Key_data

Key_data holds the delimiter data that we want to convert into JSON data.

Below is an example where we are using DataTable as the key.

['DataTable']
Parameter : Delimiter

Delimiter is used to separate values in a list, record or file.

Below is an example where we are using a comma (,) as the delimiter.

,
Parameter : Fields

Fields is used to specify header names. It is not mandatory: the user can specify the fields if they want; otherwise it is passed as empty and predefined fields are generated.

Below is an example where we are using Customer ID, Organization Name, Month, and Item as header names.

"Customer ID","Organization Name","Month","Item"
Parameter : Autodetect_column_names

Autodetect_column_names is false if the user defines fields; otherwise it is true.

Below is an example where we are using true, which is for predefined fields.

true

Parameter : Skip_header

Skip_header is true if the user defines the fields; otherwise it is false.

Below is an example where we are using false, which is for predefined fields.

false


Parameter : Response_key

Response_key is the key in which the converted JSON data will be stored.

Below is an example where the JSON data will be stored in the key datatable.

datatable
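
Putting the six parameters together, here is a minimal Python sketch of what the operation does; the sample rows are assumptions, and the real operation is driven by the parameters above rather than hard-coded values.

import csv
import io
import json

# Delimiter data read from the Key_data key ['DataTable'].
delimiter_data = "1,Acme Corp,January,Widget\n2,Globex,February,Gadget"

# Fields parameter: user-supplied header names.
fields = ["Customer ID", "Organization Name", "Month", "Item"]

reader = csv.DictReader(io.StringIO(delimiter_data), fieldnames=fields, delimiter=",")

# The converted rows are stored under the Response_key, here datatable.
result = {"datatable": [dict(row) for row in reader]}
print(json.dumps(result, indent=4))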




JSON to Delimiter

Description :

The JSON to Delimiter operation is used to convert JSON data to delimiter-separated data.
The resulting Delimiter data can be used to store or transmit data in a structured format.

Number of Parameters : 6

Parameter : Key_data

Key_data holds the JSON data that needs to be converted into delimiter data.

Below is an example where we are giving the key name ['items'] which holds our JSON data.

['items']
Parameter : Delimiter

Delimiter is used to separate values in a list, record, or file.

Below is an example where we are using \t (tab) as the delimiter.

\t
Parameter : Fields

Fields is used to specify header names. It is not mandatory: the user can specify the fields if they want; otherwise it is passed as empty and predefined fields are generated.

Below is an example where we are using Item1, Item2, and Item3 as header names.

 "Item1","Item2","Item3" 
Parameter : Autodetect_column_names

Autodetect_column_names is false if the user defines fields; otherwise it is true.

Below is an example where we are using true, which is for predefined fields.

true
Parameter : Skip_header

Skip_header is true if the user defines the fields; otherwise it is false.

Below is an example where we are using false, which is for predefined fields.

false
Parameter : Response_key

Response_key is the key in which the converted delimiter data will be stored.

Below is an example where the delimiter data will be stored in the key delimiter_data.

delimiter_data
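
Here is a minimal Python sketch of the reverse conversion; the sample items are assumptions for illustration.

import csv
import io

# JSON data read from the Key_data key ['items'].
items = [
    {"Item1": "Pen", "Item2": "Book", "Item3": "Lamp"},
    {"Item1": "Desk", "Item2": "Chair", "Item3": "Shelf"},
]

# Fields parameter: user-supplied header names.
fields = ["Item1", "Item2", "Item3"]

buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=fields, delimiter="\t")
writer.writeheader()
writer.writerows(items)

# The tab-delimited text is stored under the Response_key, here delimiter_data.
result = {"delimiter_data": buffer.getvalue()}
print(result["delimiter_data"])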



Data Aggregation

Description: 

The data aggregation operation is used for processing raw data; it helps group, summarize, and process the data to make it easier to understand and analyze.

Number of Parameters : 4


Parameter : Agg_data_key

Agg_data_key is passed as empty for multiline data and holds the data key in the case of single-line data.

Below is an example where we are using ['product_data_response']['items'] as the Agg_data_key for single-line data.


['product_data_response']['items']

Parameter : Groupby_key

Groupby_key gives the key name that the user wants to group by; this is typically a unique identifier in the dataset.

Below is an example where we are using Orders as the Groupby_key, the unique identifier of a dataset containing Orders and Order Line Items; one Order can have multiple items.

"Orders"


Parameter : Array_key

Array_key gives the key where the user wants to hold the common keys; the user can provide any key name.

Below is an example where Order Lines holds the common keys.

"Order Lines"

Parameter : Array_key_nested_columns

Array_key_nested_columns gives comma-separated key names that should be reflected at the level of the Groupby_key. These sit outside the array key.

Below is an example where we are using id, name, and year as column names.

"id","name","year"


Unpivot

Description:

An unpivot operation is used to convert a single object into a list of objects, based on the transpose value and transpose key name parameters.

Number of Parameters : 2

Parameter : Transpose_key_name

Transpose_key_name is used to specify the names of the keys that need to be transposed.

Below is an example where we are using bucket_type and bucket_value as the key names for the transposed values.

"bucket_type","bucket_value"


Parameter : Transpose value

Transpose value is used to specify the values that need to be transposed.

Below is an example where the transpose values are on_hand, purchase_orders, and goods_in_transit.

"on_hand", "purchase_orders","goods_in_transit"



Pivot

Description: 

In the pivot operation, multiple dictionaries (objects) are combined into a single dictionary (object), based on the transpose value and the transpose key name provided by the user.

Number of Parameters : 3

Parameter : Get key

Get key is passed as empty for multiline data and holds the get key in the case of single-line data.

Below is an example where we are using ['product_data_response']['items'] as the get key because the data is single line.

['product_data_response']['items'] 
Parameter : Transpose_key_name

Transpose_key_name specifies name of the key which needs to be transposed.

Below is an example where we are using Item Id and Item Name as the key names.

"Item Id","Item Name"

Parameter : Transpose value

Transpose value specifies which values need to be transposed.

Below is an example where we are transposing some particular IDs: OrderID-1, OrderID-2, OrderID-3.

"OrderID-1", "OrderID-2", "OrderID-3"



Single Line to Tuple

Description: 

The Single Line to Tuple operation is used to convert a single line of data to a tuple.


Number of Parameters : 3

Parameter : Singleline_key

Singleline_key helps in reading the dataset from a single line.

Below is an example where we are using the key DataTable, which holds the single-line data.

"DataTable"
Parameter : Table_headers

Table_headers specifies the sequence of the converted tuple data. 

Below is an example where we are using the key names Item, Customer, and Month; the values of these keys will appear in the same sequence in the tuple data.

"Item","Customer","Month"
Parameter : Tuple_key

Tuple_key is the key which holds the tuple data.

Below is an example where we are using datatable as tuple key.

datatable
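
A minimal Python sketch of the conversion; the sample row is an assumption.

# Single-line data read from the Singleline_key "DataTable".
data = {"DataTable": [{"Item": "Pen", "Customer": "Acme", "Month": "January"}]}

# Table_headers fixes the order of values inside each tuple.
table_headers = ["Item", "Customer", "Month"]

tuples = [tuple(row[h] for h in table_headers) for row in data["DataTable"]]

# The tuples are stored under the Tuple_key, here datatable.
result = {"datatable": tuples}
# -> {"datatable": [("Pen", "Acme", "January")]}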


Tuple to Single line

Description: 

The Tuple to Single line operation is used to convert a tuple into a single-line string.

Number of Parameters : 3

Parameter : Tuple_key

Tuple_key is used to read the user's tuple data.

Below is an example where we are using DataTable, which holds the tuple data.

'DataTable'


Parameter : Headers

Headers are the header or key names of the new JSON.

Below is an example where we are using the key names Item, Customer, and Month; the values of these keys will appear in the same sequence in the single-line data.

"Item","Customer","Month"
Parameter : Singleline_key

Singleline_key is used to store the converted single-line data.

Below is an example where we are using datatable to store the singleline data.

"datatable"




Grok Pattern

Description: 

Grok operation is used for parsing log files and extracting structured data from unstructured log lines. It employs predefined patterns to efficiently identify and capture specific types of information. Here is a list of commonly used Grok patterns:

  • WORD: Matches a single word (sequence of letters).
  • NUMBER: Matches any integer or floating-point number.
  • INT: Matches an integer.
  • BASE10NUM: Matches a base-10 number.
  • POSINT: Matches a positive integer.
  • NONNEGINT: Matches a non-negative integer.
  • NEGINT: Matches a negative integer.
  • UUID: Matches a Universally Unique Identifier (UUID).
  • IP: Matches an IP address (IPv4 or IPv6).
  • EMAILADDRESS: Matches an email address.
  • HOSTNAME: Matches a hostname.
  • URIPROTO: Matches the protocol part of a URI (e.g., http, ftp).
  • URIPATH: Matches the path part of a URI.
  • URI: Matches a complete URI.
  • USERNAME: Matches a username.
  • DATA: Matches any character sequence.
  • GREEDYDATA: Matches any character sequence but consumes as much as possible.
  • TIMESTAMP_ISO8601: Matches a timestamp in ISO 8601 format (e.g., "2023-09-13T12:34:56.789Z").
  • HTTPD_COMMONLOG: Matches the common log format used in web server logs.
  • HTTPD_COMBINEDLOG: Matches the combined log format used in web server logs.
  • SYSLOGTIMESTAMP: Matches a timestamp in syslog format.
  • SYSLOGHOST: Matches the hostname in syslog format.
  • SYSLOGPROG: Matches the program name in syslog format.
  • SYSLOGMESSAGE: Matches the syslog message.
  • QUOTEDSTRING: Matches a string enclosed in double or single quotes.
  • PATH: Matches a file system path.
  • URL: Matches a URL.
  • USERAGENT: Matches a user-agent string from a web log.
  • WORDNUM: Matches a word followed by a number.
  • UUID4: Matches a UUID version 4.
  • MAC: Matches a MAC address.
  • POSREAL: Matches a positive real number.

These patterns enable the Grok operation to efficiently process log data and extract relevant information, facilitating better analysis and understanding of system logs. Users can customize their log parsing by leveraging these patterns to suit the specific needs of their applications.


Number of Parameters : 2

Parameter : input_key

In the input_key parameter, users are required to specify the key from which they intend to extract the data. This key serves as the reference point for the Grok operation to identify and capture the relevant information based on the predefined patterns.

For instance, when utilizing the input_key parameter, consider a scenario where the specified key is 'Details.'

Details : This is endpoint url https://www.example.com/path/to/resource for mac add 00:1A:2B:3C:4D:5E and v4 192.168.1.1 and V6 2001:0db8:85a3:0000:0000:8a2e:0370:7334.


Within the 'Details' key, the data encapsulates an endpoint URL, a MAC address (00:1A:2B:3C:4D:5E), and both IPv4 (192.168.1.1) and IPv6 (2001:0db8:85a3:0000:0000:8a2e:0370:7334) addresses.

Parameter : grok_pattern


In the grok_pattern parameter, users can specify a predefined pattern to guide the extraction of data. This pattern serves as a template, enabling the Grok operation to accurately identify and capture relevant information from the input data according to the defined structure.

For instance, when utilizing the grok_pattern parameter, consider the following pattern. It guides the Grok operation in parsing and extracting data from the input based on the provided template.


grok_pattern : This is endpoint url %{URI:endpoint_url} for mac add %{MAC:mac_address} and v4 %{IPV4:ip_address_v4} and V6 %{IPV6:ip_address_v6}.


Expected Result :

Details: This is endpoint url https://www.example.com/path/to/resource for mac add 00:1A:2B:3C:4D:5E and v4 192.168.1.1 and V6 2001:0db8:85a3:0000:0000:8a2e:0370:7334
endpoint_url: https://www.example.com/path/to/resource
mac_address: 00:1A:2B:3C:4D:5E
ip_address_v4: 192.168.1.1 
ip_address_v6: 2001:0db8:85a3:0000:0000:8a2e:0370:7334
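
For a hands-on check of this pattern, here is a minimal Python sketch using the third-party pygrok package (an assumption for illustration; the operation itself may use a different grok engine).

from pygrok import Grok

details = ("This is endpoint url https://www.example.com/path/to/resource "
           "for mac add 00:1A:2B:3C:4D:5E and v4 192.168.1.1 "
           "and V6 2001:0db8:85a3:0000:0000:8a2e:0370:7334.")

pattern = ("This is endpoint url %{URI:endpoint_url} for mac add %{MAC:mac_address} "
           "and v4 %{IPV4:ip_address_v4} and V6 %{IPV6:ip_address_v6}.")

# match() returns a dict of the named captures, or None if the pattern fails.
print(Grok(pattern).match(details))
# -> {'endpoint_url': 'https://www.example.com/path/to/resource', ...}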