DataX for Microsoft Azure

Microsoft

An easy way to set up and run a streaming big data pipeline on Apache Spark

Data Accelerator for Apache Spark simplifies streaming big data using Spark. It has been used within Microsoft for two years to process streamed data across many internal deployments, handling data volumes at Microsoft scale. It offers an easy-to-use platform for learning about and evaluating your streaming needs and requirements, and we are excited to share this project with the wider community as open source.

A few of the ways Data Accelerator makes it easier to build a streaming pipeline on Spark:

  • Plug and Play: Easily set up input sources and output sinks to establish a pipeline in minutes. Data Accelerator supports reading from both Event Hubs and IoT Hub, and supports sinking data to Azure Blob storage, Cosmos DB, Event Hubs, and more.
  • No-Code Experience: Set up alerts and data processing without writing any code. Through a rules designer, you can specify simple and aggregate data processing, tagging, and alerts.
  • SQL queries: Write complex processing in SQL – no need to work in Scala. The built-in extensibility model also supports user-defined functions and calling Azure Functions – e.g., for ML mid-stream.
  • Live query: Validate your queries in seconds by running against a sample of incoming data, saving hours of work setting up and testing the processing of your pipeline.
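
To make the "SQL queries" and "Live query" ideas above concrete, here is a minimal sketch of running an aggregate alert query against a small sample of events before deploying it. It uses Python's built-in SQLite rather than Data Accelerator or Spark SQL, so the dialect and runtime differ from the product. The event fields (device_id, temperature), the threshold, and the function name run_on_sample are all hypothetical and chosen only for illustration.

```python
import sqlite3

# Hypothetical sample of incoming events: (device_id, temperature).
SAMPLE_EVENTS = [
    ("dev-1", 71.0),
    ("dev-1", 75.0),
    ("dev-2", 98.0),
    ("dev-2", 102.0),
    ("dev-3", 64.0),
]

# Hypothetical aggregate alert rule: report devices whose average
# temperature in the sample exceeds a threshold.
ALERT_QUERY = """
SELECT device_id, AVG(temperature) AS avg_temp
FROM events
GROUP BY device_id
HAVING AVG(temperature) > ?
ORDER BY device_id
"""


def run_on_sample(events, threshold):
    """Load the sample into an in-memory table and run the alert query on it."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE events (device_id TEXT, temperature REAL)")
        conn.executemany("INSERT INTO events VALUES (?, ?)", events)
        return conn.execute(ALERT_QUERY, (threshold,)).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    # Check the query against the sample before trusting it on live data.
    for device_id, avg_temp in run_on_sample(SAMPLE_EVENTS, 90.0):
        print(f"ALERT {device_id}: average temperature {avg_temp:.1f}")
```

The point of the sketch is the workflow rather than the engine: the rule is written as SQL, and it is checked in seconds against a handful of sample rows instead of a full pipeline run.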
https://mprpdfartifactstore.azureedge.net/publicartifactsmigration/Microsoft.DataX.1.0.1-preview/Screenshots/Image01.png