The Naiad timely dataflow model was developed by a team at Microsoft Research Silicon Valley in 2013 and is available under the Apache 2.0 open source license. Naiad is an open source .NET-based system designed for high-throughput, low-latency, and incremental computation, goals that contemporary batch, graph, and stream processing systems such as MapReduce, Dryad, Spark, Storm, Pregel, and GraphLab each address only in part. It can be seen as the next step after Dryad: where Dryad used directed acyclic graphs (DAGs), Naiad also supports directed dataflow graphs with structured cycles.

Naiad coordinates all processing by timestamps. Timely dataflow attaches a timestamp to every message that flows between vertices in the graph. This enables three features: stateful vertices that consume and produce messages asynchronously, structured loops that allow feedback in the dataflow, and notifications that tell a vertex when it has received all records for a given input or loop iteration.
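To make the notification idea concrete, here is a minimal Python sketch of a stateful vertex that receives timestamped messages in arbitrary order and emits a result only once it is told that a timestamp is complete. This is a toy model, not Naiad's .NET API: the names `CountPerTimestamp`, `on_recv`, `on_notify`, and `run` are hypothetical, timestamps are plain integers with no loop counters, and the "scheduler" knows all messages in advance.

```python
from collections import Counter, defaultdict


class CountPerTimestamp:
    """Toy stateful vertex (hypothetical): counts records per timestamp."""

    def __init__(self):
        self.counts = defaultdict(int)
        self.output = []

    def on_recv(self, record, timestamp):
        # Messages may arrive in any order, so state is keyed by timestamp.
        self.counts[timestamp] += 1

    def on_notify(self, timestamp):
        # Called only after every record for `timestamp` has been received.
        self.output.append((timestamp, self.counts.pop(timestamp, 0)))


def run(vertex, messages):
    """Deliver (record, timestamp) pairs and fire notifications.

    Assumption for this sketch: a timestamp is complete once no message
    at or before it is still outstanding.
    """
    pending = Counter(ts for _, ts in messages)
    waiting = sorted(pending)  # timestamps still awaiting a notification
    for record, ts in messages:
        vertex.on_recv(record, ts)
        pending[ts] -= 1
        # Notify in timestamp order; stop at the first incomplete timestamp.
        while waiting and pending[waiting[0]] == 0:
            vertex.on_notify(waiting.pop(0))
    return vertex.output


result = run(CountPerTimestamp(), [("a", 1), ("b", 0), ("c", 0), ("d", 1)])
```

Even though a record for timestamp 1 arrives first, the vertex reports timestamp 0 before timestamp 1, because the notification for a later timestamp is held back while earlier messages are outstanding.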

Contribution

Many big data algorithms contain loops, and these loops are often data-dependent: the computation keeps iterating until the answer no longer changes. As the answer starts to stabilize, successive iterations become largely redundant, because much of the data is the same from one iteration to the next.
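The kind of loop described above can be illustrated with a small Python sketch that labels connected components by propagating the smallest node id until nothing changes. It uses a generic worklist so that each iteration revisits only the nodes whose label changed in the previous one, which is one simple way to avoid redoing redundant work. This is an illustration of the general idea only, not Naiad's own incremental mechanism, and the function name `connected_components` is an assumption made for this example.

```python
def connected_components(edges):
    """Label each node with the smallest node id in its component."""
    neighbors = {}
    for a, b in edges:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)

    label = {node: node for node in neighbors}
    changed = set(neighbors)  # first iteration: every node is "new"

    # Data-dependent loop: stop when an iteration changes nothing.
    while changed:
        next_changed = set()
        for node in changed:  # only nodes whose label changed last round
            for nb in neighbors[node]:
                if label[node] < label[nb]:
                    label[nb] = label[node]
                    next_changed.add(nb)
        changed = next_changed
    return label


labels = connected_components([(1, 2), (2, 3), (4, 5)])
```

As the labels stabilize, the `changed` set shrinks, so late iterations touch only a small part of the graph instead of recomputing every label.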

Over the past few years, big data platforms such as Hadoop were introduced, and people became more ambitious about processing their data in the cloud: they wanted to run graph, stream, and batch processing over huge data sources. These demands led the Microsoft team to build Naiad. It is designed to address them through a few key capabilities: executing loops, maintaining state, reacting quickly to incoming data, and delivering high performance in a scalable distributed system.

Some of the features Naiad provides can be achieved with existing systems, but those applications then depend on several different platforms, at a cost in efficiency, maintainability, and simplicity. Naiad combines these features in one general framework that supports low-latency streaming, high-throughput batch processing, and coordination within a dataflow.

Naiad runs simple programs at the speed of existing general-purpose platforms and complex programs at the speed of specialized systems for machine learning, graph analysis, and stream processing. Most importantly, Naiad can be used effectively for anything from a simple program to a large-scale distributed system, bringing graph analysis, machine learning, and stream processing into a single framework.

This is the first solution to bring the power of these technologies together. It may be possible to combine existing systems into a solution for a given scenario, but Naiad, as a single-platform solution, is typically more efficient, maintainable, and extensible than such a combination.
