Today’s Big Data and AI workloads, especially in the IoT world, demand solutions that provide real-time streaming (i.e., hardware and software in the loop), handle complex measurement data, scale out to petabytes, and offer access for multiple analytics teams.
Supporting different AI/ML and analytics workloads, while also meeting the needs of data analysts to securely access all the data using their preferred AI tools, is a challenging task for any IT manager or team. IT organizations often find themselves addressing these needs with multiple clusters or a combination of compute and storage resources. Unfortunately, such environments are complicated to operate, create data silos and replication, don’t facilitate hardware and software optimization, and are inherently insecure.
That’s the kind of stuff that keeps an IT manager awake at night.
I do have good news to share, though. Drawing on HPE’s experience in complex AI and analytics environments, we have been able to integrate HPE BlueData EPIC with the Qumulo file system to provide customers like you with a simple, flexible, and comprehensive AI environment.
We built out the following lab scenario based on a real customer’s IoT process:
- Qumulo FS provided a software-defined distributed file system designed specifically to support the enterprise. IoT data is generally delivered in a specialized file format (e.g., MDF4) and then converted into a different file format (Parquet/Avro) that speeds up queries and supports efficient data compression and encoding schemes.
- HPE BlueData EPIC provided an enterprise-grade governance and security platform to manage multi-tenant environments. It allowed multiple analytics teams to access all the data they needed. EPIC also let those teams build advanced analytics models on Big Data scale-out environments such as Hadoop and Spark, and then work with their preferred analytics tools.
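To make the compression and encoding point in the first bullet more concrete, here is a minimal sketch of two encodings that columnar formats such as Parquet are known to use: dictionary encoding and run-length encoding. It is a pure-Python illustration, not Parquet's actual on-disk layout, and the sensor rows, channel names, and function names are hypothetical rather than taken from the lab data.

```python
from itertools import groupby


def dictionary_encode(values):
    """Replace each value with a small integer index into a list of distinct values."""
    dictionary = []   # distinct values, in order of first appearance
    index_of = {}     # value -> position in `dictionary`
    codes = []
    for value in values:
        if value not in index_of:
            index_of[value] = len(dictionary)
            dictionary.append(value)
        codes.append(index_of[value])
    return dictionary, codes


def run_length_encode(codes):
    """Collapse consecutive repeats into (code, run_length) pairs."""
    return [(code, sum(1 for _ in group)) for code, group in groupby(codes)]


def decode(dictionary, runs):
    """Invert both encodings to recover the original column."""
    return [dictionary[code] for code, length in runs for _ in range(length)]


# Hypothetical measurement rows: (channel, status). Illustrative only.
rows = (
    [("engine_temp", "OK")] * 4
    + [("engine_temp", "WARN")] * 2
    + [("oil_pressure", "OK")] * 3
)

# A columnar layout stores each field contiguously, so repeated values
# sit next to each other and compress well.
status_column = [status for _, status in rows]
dictionary, codes = dictionary_encode(status_column)
runs = run_length_encode(codes)

print(dictionary)
print(runs)
```

Nine status strings shrink to a two-entry dictionary plus three short runs, which is why converting measurement files to a columnar format tends to pay off for repetitive IoT signals.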
HPE BlueData and Qumulo complement each other to create an enterprise-level AI environment
The goal of the tested solution was to use the HPE BlueData EPIC Software to create a single namespace containing multiple environments that could access data from a Qumulo environment. We then analyzed the performance of the integration.
- Qumulo’s software-defined distributed file system is a scale-across solution capable of managing billions of files, large or small, seamlessly across an enterprise environment. The file system offered real-time visibility, scale, and control of data without performance degradation, while also providing centralized access to files.
- HPE BlueData Software provided a platform for distributed AI, machine learning, and analytics on containers. HPE BlueData EPIC Software provided multitenancy and data isolation to ensure logical separation between each project, group, or department within the organization. By creating a single environment that held massive datasets, the software enabled multiple teams to use their analytical tool of choice on the same data. The result? Cluster sprawl and redundant copies of test data were avoided or eliminated entirely.
- HPE BlueData EPIC is integrated with Qumulo file systems via EPIC DataTap. EPIC DataTap implements a high-performance connection to remote data storage systems via NFS and HDFS. This allows unmodified Hadoop and AI/ML applications to run against data stored in remote NFS and HDFS systems without loss of performance. All data can be managed as a single pool of storage (single namenode), so no data movement is needed.
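As a rough illustration of what "no data movement" means for an application, the sketch below builds the two kinds of location strings a job could use to reach the same dataset: one through a DataTap and one through a direct NFS mount. It assumes DataTap paths use a `dtap://` URI scheme; the DataTap name `QumuloTap`, the mount point `/mnt/qumulo`, the dataset path, and both helper functions are hypothetical.

```python
from pathlib import PurePosixPath


def datatap_uri(datatap_name, relative_path):
    """Build a DataTap-style URI (assumes the dtap:// scheme)."""
    rel = PurePosixPath(relative_path.lstrip("/"))
    return f"dtap://{datatap_name}/{rel}"


def nfs_uri(mount_point, relative_path):
    """Build a local-file URI for the same data on an NFS mount."""
    full = PurePosixPath(mount_point) / relative_path.lstrip("/")
    return f"file://{full}"


# Hypothetical names; substitute your own DataTap name and mount point.
dataset = "/iot/parquet/run1"
via_datatap = datatap_uri("QumuloTap", dataset)
via_nfs = nfs_uri("/mnt/qumulo", dataset)

print(via_datatap)
print(via_nfs)

# In a Spark job, either string could be handed to the same reader call,
# for example spark.read.parquet(via_datatap), leaving the rest of the
# application unchanged.
```

Only the location string differs between the two access paths, so the application code that reads and analyzes the data stays the same.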
A Spark Analytics engine was used to test the performance from HPE BlueData clusters (Cloudera and Spark) to the Qumulo File System. We tested the Spark clusters connected to Qumulo data via DataTap first, and then we compared the results with Spark clusters directly connected, via NFS mounts, to the same Qumulo environment.
The testing results confirmed that there was no performance degradation when using DataTap compared to using an NFS mount. The HPE BlueData and Qumulo environments were vanilla installations — with no performance or configuration tweaking required.
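A comparison like this can be sketched as a small timing harness that runs the same job over each access path several times and compares median run times. This is not the lab's actual benchmark: the function names, the number of runs, and the stand-in jobs are assumptions for illustration, and a real test would replace the stand-ins with Spark reads against the DataTap and NFS locations.

```python
import statistics
import time


def median_seconds(job, runs=5, clock=time.perf_counter):
    """Run `job` several times and return the median wall-clock duration."""
    samples = []
    for _ in range(runs):
        start = clock()
        job()
        samples.append(clock() - start)
    return statistics.median(samples)


def relative_overhead(candidate_seconds, baseline_seconds):
    """Fractional slowdown of the candidate versus the baseline (0.0 = parity)."""
    return (candidate_seconds - baseline_seconds) / baseline_seconds


# Stand-in workloads; in a real test these would be the same Spark query
# executed against the DataTap path and the NFS-mounted path.
def read_via_datatap():
    sum(range(100_000))


def read_via_nfs():
    sum(range(100_000))


datatap_time = median_seconds(read_via_datatap)
nfs_time = median_seconds(read_via_nfs)
print(f"DataTap overhead vs NFS: {relative_overhead(datatap_time, nfs_time):+.1%}")
```

Using the median rather than the mean keeps a single slow run, such as one caused by a cold cache, from skewing the comparison.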
Here's a demo that Calvin Zito got showing Qumulo in action - but note this is not with HPE BlueData.
Conclusion? A flexible analytics environment
The lab team’s testing demonstrated that the integration of HPE BlueData and Qumulo can provide a single, flexible environment in which multiple clusters can be created and can use the same tools on the same data, with no cluster sprawl and no need to manage multiple storage systems.
Data scientists and analysts can focus on analyzing the test data with their preferred tools, while DataTap allows them to access all data lakes without having to create multiple copies and without a performance penalty.
To find out more about how to handle massive IoT data with HPE BlueData and Qumulo, please check out our reference source list:
Hewlett Packard Enterprise