
Why Data Native? Benefits and Philosophy Behind Going Data Native

Posted on May 13th, 2017 | Steve Wooledge

What is data native?

Data native refers to the use of analytics and data-centric applications that run where the data resides, without data movement. Instead of external applications that need to pull the necessary data from your data sources, data-native applications sit within the data source to accelerate performance and reduce system complexity.

Simply put, data native is the future of BI and analytics technologies.

The data native philosophy

We live in an age of agility where everything from software development to end-user engagement is more iterative and interactive. Modifying software applications is now simple and expected. The philosophy behind data native is to build on existing software, such as a data platform, with the goal of making it complete.

For example, Apache Hadoop as a data platform is just that: a platform. Making it complete requires someone to build interfaces or applications that provide a “face,” or user experience, to access, analyze, and share insights. As technology becomes a greater part of our everyday lives, both personally and professionally, we find ourselves needing additional functionality from the tools we use.

Rather than creating entirely new tools that sit next to the original software and require data to be extracted and filtered, data-native tools add the new functionality on top of the original software, using what already exists to their advantage.

Data-native Apache Hadoop

Hadoop, cloud, and modern scale-out data platforms are re-architecting the enterprise and data center for next-gen data applications. As the data “center of gravity” shifts to these platforms, business users demand direct access to data from all customer and operational touch-points so they can analyze and understand their business from a single version of the truth.

Most modern BI platforms are not data native. They run outside of where the data resides and can only operate on extracts and aggregates of the data. This requires up-front data preparation and modeling, and it moves data outside of the original application or platform, which increases complexity, cost, and latency.

Moreover, analysis lags behind the business because data is updated only in batches; these systems can’t handle streaming updates. Business users, analysts, or citizen data scientists who want to go after all the data for high-definition insights are handcuffed by the platform they are using. Imagine having to give up your 4K HDTV for an analog TV. BI platforms that sit outside of where the data lives hinder both analysis and overall performance.

Hadoop-native software is becoming more appealing to users because it can minimize workload, save time, and lower costs. Hadoop-native and data-native software also make it possible to get real-time insights from data in its native format. The goal is to get from data to insight as quickly as possible.

Benefits of going data native

There are a slew of benefits to going data native that affect the workflows and productivity of teams. These positive changes are good news not just for data architects and IT professionals, but also for the end users who need this data to be more effective in their work.

No extraction

Becoming data native eliminates the need for data extraction, so you’re able to work faster. You can simplify your workflow, lower costs, and reduce administrative overhead while getting direct data visualization and analysis.

Once your data is in Hadoop or the cloud, it stays there. You no longer need the external storage systems where you would usually analyze your data. ETL (Extract, Transform, Load) processes that run outside the cluster or cloud are essentially eliminated, and you can dive right into the data when you need it.

Of course, you still need clean and organized data, and some of that work can be done within native visual analytics platforms. You can also take advantage of next-gen data wrangling and cataloging tools such as Trifacta, Waterline, or StreamSets to prepare and organize data either while it’s flowing into the system or once it’s inside.

Real-time data access

Because there’s no extraction, there is no waiting period between when you need your data and when you can access and analyze it. Data native means you’re working with the data immediately to get deeper business insights and drive more real-time data applications.

This is especially vital for creating a 360-degree view of your customer. After all, customer interactions change from day to day, and you cannot make decisions based on stale data. Analyze your most recent customer reviews, social interactions, and fulfilled orders on a rolling basis, without waiting for the next batch.
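The staleness problem with extracts can be shown with a small Python sketch. It uses the standard-library `sqlite3` module purely as a stand-in for the data platform (it is not Hadoop; it just makes the example runnable), and the table and function names are hypothetical. A copy taken before a new order arrives keeps reporting the old count, while a query run where the data lives sees the new row immediately.

```python
import sqlite3
from typing import List, Tuple


def build_store() -> sqlite3.Connection:
    """Hypothetical stand-in for the data platform: an in-memory orders table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT)")
    conn.executemany(
        "INSERT INTO orders (status) VALUES (?)",
        [("fulfilled",), ("fulfilled",), ("pending",)],
    )
    conn.commit()
    return conn


def fulfilled_in_place(conn: sqlite3.Connection) -> int:
    """Queries the data where it lives, so it always sees the latest rows."""
    (count,) = conn.execute(
        "SELECT COUNT(*) FROM orders WHERE status = 'fulfilled'"
    ).fetchone()
    return count


def fulfilled_from_extract(extract: List[Tuple[int, str]]) -> int:
    """Counts from a previously extracted copy, which never changes after it is taken."""
    return sum(1 for _, status in extract if status == "fulfilled")


if __name__ == "__main__":
    conn = build_store()
    # Take a batch-style extract of the table.
    extract = conn.execute("SELECT id, status FROM orders").fetchall()
    # A new fulfilled order lands in the source after the extract was taken.
    conn.execute("INSERT INTO orders (status) VALUES ('fulfilled')")
    conn.commit()
    print(fulfilled_from_extract(extract))  # 2: the copy is stale
    print(fulfilled_in_place(conn))         # 3: the live data includes the new order
```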

Scalability for large numbers of users

Data-native visual analytics and BI solutions are fully distributed, massively parallel systems. By definition, they run inside the data platform and scale as storage and processing resources are added to it. This provides linear scalability and ultra-fast performance by distributing query workloads and pushing them down to each data node or machine image in the cluster or cloud.
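To make “pushing query workloads down to each data node” concrete, here is a minimal Python sketch, assuming a toy cluster in which each node holds a shard of (region, revenue) rows. The names `NODE_SHARDS`, `local_aggregate`, and `distributed_sum_by_region` are hypothetical, not part of any Arcadia or Hadoop API, and threads stand in for data nodes. Each “node” computes a partial aggregate next to its own rows, and only the small per-region subtotals travel to the coordinator, which merges them.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Tuple

# Hypothetical toy cluster: each inner list is the shard of
# (region, revenue) rows stored on one data node.
NODE_SHARDS: List[List[Tuple[str, float]]] = [
    [("east", 120.0), ("west", 80.0), ("east", 30.0)],
    [("west", 50.0), ("north", 10.0)],
    [("east", 5.0), ("north", 40.0), ("west", 20.0)],
]


def local_aggregate(shard: List[Tuple[str, float]]) -> Dict[str, float]:
    """Runs next to one shard: sums revenue per region so raw rows never leave the node."""
    partial: Dict[str, float] = {}
    for region, revenue in shard:
        partial[region] = partial.get(region, 0.0) + revenue
    return partial


def distributed_sum_by_region(shards: List[List[Tuple[str, float]]]) -> Dict[str, float]:
    """Pushes the aggregation down to every shard in parallel, then merges the partial sums."""
    # Threads stand in for data nodes in this sketch.
    with ThreadPoolExecutor(max_workers=max(1, len(shards))) as pool:
        partials = list(pool.map(local_aggregate, shards))
    merged: Dict[str, float] = {}
    for partial in partials:
        for region, subtotal in partial.items():
            merged[region] = merged.get(region, 0.0) + subtotal
    return merged


if __name__ == "__main__":
    print(distributed_sum_by_region(NODE_SHARDS))
    # {'east': 155.0, 'west': 150.0, 'north': 50.0}
```

Because only subtotals move between machines, adding a node adds both data and the compute that aggregates it, which is the intuition behind scaling with the cluster.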

Sophisticated data-native solutions provide acceleration mechanisms that automatically cache frequently accessed data or visuals and route incoming user queries to these performance-optimized caches, which are stored as aggregates either on disk or in memory. This allows you to deploy analytics to hundreds or thousands of concurrent users, both internally and externally to customers, suppliers, and partners who can benefit from deeper insight into their business.
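The caching idea can be sketched in a few lines of Python. This is a simplified illustration, not how any particular product implements it: the hypothetical `AggregateCache` class counts how often each `(group_by, measure)` query is requested, keeps an in-memory aggregate once a query has been asked for `min_requests` times, and routes later requests to that copy instead of rescanning the hypothetical base rows in `ROWS`. A real system would also need to refresh or invalidate cached aggregates as new data arrives; this sketch omits that.

```python
from collections import Counter
from typing import Callable, Dict, Tuple

# Hypothetical base table; in a real deployment this would be data in the cluster.
ROWS = [
    {"region": "east", "product": "a", "revenue": 100.0},
    {"region": "west", "product": "a", "revenue": 40.0},
    {"region": "east", "product": "b", "revenue": 60.0},
]


def scan_and_aggregate(group_by: str, measure: str) -> Dict[str, float]:
    """Full scan of the base rows: the slow path the cache is meant to avoid."""
    totals: Dict[str, float] = {}
    for row in ROWS:
        key = row[group_by]
        totals[key] = totals.get(key, 0.0) + row[measure]
    return totals


class AggregateCache:
    """Routes repeated aggregate queries to an in-memory copy once they become frequent."""

    def __init__(self, compute: Callable[[str, str], Dict[str, float]], min_requests: int = 2):
        self._compute = compute
        self._min_requests = min_requests
        self._requests: Counter = Counter()
        self._cache: Dict[Tuple[str, str], Dict[str, float]] = {}
        self.base_scans = 0  # how many queries had to fall through to the base data

    def query(self, group_by: str, measure: str) -> Dict[str, float]:
        key = (group_by, measure)
        self._requests[key] += 1
        if key in self._cache:
            # Served from the cache; return a copy so callers cannot alter it.
            return dict(self._cache[key])
        self.base_scans += 1
        result = self._compute(group_by, measure)
        if self._requests[key] >= self._min_requests:
            # Requested often enough: keep an aggregate copy for later queries.
            self._cache[key] = dict(result)
        return result


if __name__ == "__main__":
    cache = AggregateCache(scan_and_aggregate, min_requests=2)
    for _ in range(3):
        print(cache.query("region", "revenue"))
    print("base scans:", cache.base_scans)  # 2: the third request is served from the cache
```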

Unified security

Since a data-native model removes the need for data extraction, it also reduces the security risks to your data.

First, role-based access controls are automatically inherited from the underlying data platform, so you ensure that only authorized users have access to the data. Second, access to data is provided in one location as opposed to multiple places. Finally, the data is not dispersed to multiple systems; it remains in its original location.

For example, healthcare organizations that must maintain tight controls over personally identifiable information (PII) do not want internal users of the system to extract data into desktop BI tools, where data governance and auditing become far more complicated. They want a browser-based way to give authorized users access to information that only visualizes (or displays) it, without a way for the user to download the data to their local machine.

System administration is simplified with a data-native approach. Administrators leverage the security of the underlying data platform and existing enterprise security standards. Administer security in one place, and it applies from the platform up through the end-user visual analytics tools and data applications. Arcadia Enterprise does this by importing group memberships from the underlying directory sources via SAML, PAM, Kerberos, or LDAP, or other role-membership information via Apache Sentry or Apache Ranger.
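As a rough illustration of “administer security in one place,” the following Python sketch keeps a single policy object standing in for the platform’s authorization layer. The class names are hypothetical, and this models only the idea of a shared policy, not the real APIs of Apache Sentry, Apache Ranger, or any directory service. The visual layer holds no permissions of its own and checks the platform policy on every request, so revoking access in the policy takes effect immediately in the front end, and the layer returns only a rendered view rather than raw rows.

```python
from dataclasses import dataclass
from typing import Dict, Set


@dataclass
class PlatformPolicy:
    """Hypothetical stand-in for the platform's authorization layer.
    Not the Apache Sentry or Apache Ranger API; just the idea of one shared policy."""
    user_groups: Dict[str, Set[str]]   # e.g. group memberships synced from a directory
    group_tables: Dict[str, Set[str]]  # which groups may read which tables

    def can_read(self, user: str, table: str) -> bool:
        groups = self.user_groups.get(user, set())
        return any(table in self.group_tables.get(group, set()) for group in groups)


@dataclass
class VisualLayer:
    """Analytics front end that stores no permissions of its own."""
    policy: PlatformPolicy

    def render_chart(self, user: str, table: str) -> str:
        # Every request is checked against the platform policy, not a local copy.
        if not self.policy.can_read(user, table):
            raise PermissionError(f"{user} may not read {table}")
        # Only a rendered view is returned; this sketch offers no raw-data export.
        return f"chart of {table} for {user}"


if __name__ == "__main__":
    policy = PlatformPolicy(
        user_groups={"alice": {"clinicians"}, "bob": {"marketing"}},
        group_tables={"clinicians": {"patients"}, "marketing": {"campaigns"}},
    )
    ui = VisualLayer(policy)
    print(ui.render_chart("alice", "patients"))
    try:
        ui.render_chart("bob", "patients")
    except PermissionError as err:
        print("denied:", err)
```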

Simplifying workflows

We’ve already stated that data-native applications let you remove extraction and secondary storage from your process. This simplifies workflows across departments. Data gets into the hands of those who need it more quickly, with fewer of the usual steps in a big data analysis workflow. Each department is empowered to get the data and insights it needs, so your data architects can focus their efforts elsewhere.

There is no need to build a pathway from the solution you’re using to Apache Hadoop (or whatever platform you use), transfer the data to another solution, load it back into Hadoop, and then run analytics through Hadoop. That round trip uses external storage, creates additional copies of the data, and adds a lot of extra work.

Here at Arcadia Data, we take being data native seriously. Our visual analytics platform is a powerful data-native solution for Apache Hadoop and cloud architectures that gives you brilliant visualization and analytics from your data without extraction from the data platform. It also provides unified security and scalability that can easily take you from 100s to 1,000s of users. Register for a demo of Arcadia Data.