Batch Analytics

Batch Analytics with Data Lake Analytics Service

Process big data jobs in seconds with Azure Data Lake Analytics. There is no infrastructure to worry about because there are no servers, virtual machines, or clusters to wait for, manage, or tune. Instantly scale the processing power, measured in Azure Data Lake Analytics Units (AU), from one to thousands for each job. You only pay for the processing that you use per job.

U-SQL is a simple, expressive, and extensible language that allows you to write code once and have it automatically parallelized for the scale you need. Process petabytes of data for diverse workload categories such as querying, ETL, analytics, machine learning, machine translation, image processing, and sentiment analysis by leveraging existing libraries written in .NET languages, R, or Python.
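
For example, a minimal U-SQL script looks like the sketch below: it extracts a tab-separated file into a rowset, filters it with a SQL-like expression, and writes the result back to the store (the /data paths and column names here are illustrative placeholders, not part of this lab).

// Read a tab-separated input file into a rowset.
@input =
    EXTRACT deviceId string,
            temperature double
    FROM @"/data/input.tsv"
    USING Extractors.Tsv();

// Filter the rowset with a SQL-like expression; the work is parallelized automatically.
@hot =
    SELECT deviceId, temperature
    FROM @input
    WHERE temperature > 30;

// Write the filtered rows back to the store as CSV.
OUTPUT @hot
    TO @"/data/output.csv"
    USING Outputters.Csv();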

In this lab, learn how to:

Create Azure Data Lake Analytics Service

Create a Data Lake Analytics service to mine data stored in the Data Lake Store.

Click on Create a resource

Create Data Lake Analytics Service

Click on Data + Analytics

Create Data Lake Analytics Service

Click on Data Lake Analytics

Create Data Lake Analytics Service

Pick the Data Lake Store where device telemetry data is being stored by the Stream Analytics job

Pick A Store

Use the existing resource group and click the Create button

Create Service

Create Sample Data and Install Extensions

Click on Sample scripts

Create Sample Data

Click the Sample data missing button to create sample data in the Data Lake Store

Create Sample Data

You should see a success message after the data is copied

Create Sample Data

Install Extensions

Install Extensions

Successful Extension installation

Install Extensions

VS Code Integration

Submit a job using VS Code. Try the samples first to learn Data Lake Analytics.

Install VS Code Extension for Data Lake Analytics

Install Extensions

Run Samples

Run Samples to learn Data Lake Analytics

Run Samples
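
The first sample script is similar in spirit to the sketch below, which reads the SearchLog sample data that the Copy Sample Data step placed in the store and writes it back out unchanged (the exact paths and schema may differ slightly in your account).

// Extract the sample search log copied into the store by the sample data step.
@searchlog =
    EXTRACT UserId int,
            Start DateTime,
            Region string,
            Query string,
            Duration int?,
            Urls string,
            ClickedUrls string
    FROM @"/Samples/Data/SearchLog.tsv"
    USING Extractors.Tsv();

// Write the rows back out as a tab-separated file.
OUTPUT @searchlog
    TO @"/Samples/Output/SearchLog_output.tsv"
    USING Outputters.Tsv();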

Compile Script

Compile Script

Click on List more accounts

![Run Samples](images/11_VSCode_Open_Sample_Script_Compile_Select_Account.png)

Select a Data Lake Analytics Account

Select Account

Select master key

Select master key

Compile as USQL

Compile USQL

The U-SQL script should compile successfully

Compiled USQL

Submit Job To Run

Submit Job

The default priority is 1000 and the default number of analytics units (AUs) used to run the script is 5

Submit Job Success

Job Success with Job Analytics

Job Success

View Input File

View Input File

View Output File

View Output File

Create an Analytics Job against MXChip Data to convert JSON to CSV using U-SQL and Data Lake Analytics

Create a new mxchip_analytics.usql file in the project

REFERENCE ASSEMBLY [Newtonsoft.Json];
REFERENCE ASSEMBLY [Microsoft.Analytics.Samples.Formats]; 

//Extract the Json string using a default Text extractor. 

@json = 
    EXTRACT jsonString string FROM @"/workshop/streaming/2018/03/{*}/{*}.json" USING Extractors.Tsv(quoting:false);

//Use the JsonTuple function to get the Json Token of the string so it can be parsed later with Json .NET functions

@jsonify = SELECT Microsoft.Analytics.Samples.Formats.Json.JsonFunctions.JsonTuple(jsonString) AS rec FROM @json;

@columnized = SELECT 
            rec["deviceId"] AS deviceId,
            rec["temperature"] AS temperature,
            rec["humidity"] AS humidity,
            rec["time"] AS time
    FROM @jsonify;



//Output the columnized rows to a CSV file in the store.

OUTPUT @columnized
TO @"/workshop/output/out.csv"
USING Outputters.Csv();

Register the two assemblies, Newtonsoft.Json and Microsoft.Analytics.Samples.Formats. Download the dlls from the /libs folder and register them.

U-SQL Analytics

Select the dlls from the /libs folder

U-SQL Analytics
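
As an alternative to registering the dlls through the portal, the assemblies can be registered with a short U-SQL script (a sketch; it assumes the two dlls have already been uploaded to the /libs folder of the default Data Lake Store).

// Register the JSON assemblies in the master database so REFERENCE ASSEMBLY can resolve them.
USE DATABASE master;

CREATE ASSEMBLY IF NOT EXISTS [Newtonsoft.Json]
    FROM @"/libs/Newtonsoft.Json.dll";

CREATE ASSEMBLY IF NOT EXISTS [Microsoft.Analytics.Samples.Formats]
    FROM @"/libs/Microsoft.Analytics.Samples.Formats.dll";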

Submit Job

Submit the job to convert all JSON files to CSV files

U-SQL Analytics

View Jobs

U-SQL Analytics