Parallelize analyses with jBEAM Cluster
One Data Lake with multiple jBEAM instances
Within one data lake, several jBEAM instances running in a jBEAM Cluster analyze different measurement data simultaneously. The partial results are then aggregated into one final result.
- An analysis created in one jBEAM instance is processed in parallel by the jBEAM instances of a jBEAM Cluster.
- A jBEAM Cluster consists of several PCs, each running one or more jBEAM instances (nodes).
- Each cluster node processes one file, sends the result back to the aggregator and informs the cluster manager about its current status (e.g. finished and ready for the next file).
- This method corresponds to MapReduce at file level: each file is one unit of work, so splitting the files is not necessary.
- After all files are processed, the aggregator combines all partial results into the final result and sends it back to the user (client) over the network.
- Conventional file systems (e.g. Windows, NAS, Linux, …) as well as the Hadoop Distributed File System (HDFS) can be used.
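The file-level MapReduce flow described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not jBEAM code: the names `analyze_file`, `aggregate` and `run_cluster_job` are hypothetical, a thread pool stands in for the cluster manager and its nodes, and each measurement file is assumed to be a non-empty text file of whitespace-separated numbers.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def analyze_file(path):
    """Map step: one node reduces one whole file to a small partial result.

    Assumes a non-empty text file of whitespace-separated numbers.
    """
    values = [float(token) for token in Path(path).read_text().split()]
    return {
        "min": min(values),
        "max": max(values),
        "sum": sum(values),
        "count": len(values),
    }


def aggregate(partials):
    """Reduce step: merge the partial results into one final result."""
    partials = list(partials)
    total = sum(p["count"] for p in partials)
    return {
        "min": min(p["min"] for p in partials),
        "max": max(p["max"] for p in partials),
        "mean": sum(p["sum"] for p in partials) / total,
        "count": total,
        "files": len(partials),
    }


def run_cluster_job(paths, nodes=4):
    """Give each file to the next free worker, then aggregate the results."""
    with ThreadPoolExecutor(max_workers=nodes) as pool:
        # One task per file: files are never split, and only the small
        # partial results travel back to the aggregator.
        partials = pool.map(analyze_file, paths)
        return aggregate(partials)
```

Only the dictionary returned by `aggregate` would reach the client; the raw values never leave the worker that read the file.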
Transferring vast amounts of data is unnecessary.
No raw data is transferred to the user PC. Only the end result is sent back to the user.
Calculations are brought to the data.
The calculation runs where the data is actually located.
Integrable into the measurement data management system MaDaM.
For desktop applications as well as for global MDM systems.
Components for Cluster Operation:
- File Importer
A file importer defines how to import the files (jBEAM supports more than 100 data formats)
- Data Reduction Calculations
Statistical (Min, Max, …), Event-Detection, Histogram, Rainflow, …
- Aggregator
Component that defines how the results should be aggregated (sum, append, …)
- Multi-File-Analysis-Controller (MFAC)
Controller on the jBEAM side that creates cluster jobs
- Cluster Service
There are different types of cluster services
- One node
- Local Cluster
jBEAM instances, each in its own Java VM
- External Cluster
Built with several PCs, each with one or N nodes (jBEAMs)
- External High Performance Cluster
Many high-performance Linux PCs
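How a data reduction calculation and an aggregation rule fit together can be illustrated with a small Python sketch. Everything here is an assumption made for illustration and not part of jBEAM: the functions `reduce_histogram` and `detect_events`, the `AGGREGATORS` table, and the choice of a rising-edge threshold crossing as the event. The point is that histogram counts from several files are combined with "sum", while event lists are combined with "append".

```python
def reduce_histogram(values, edges):
    """Data reduction: count the values in each bin [edges[i], edges[i+1])."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        for i in range(len(counts)):
            if edges[i] <= v < edges[i + 1]:
                counts[i] += 1
                break
    return counts


def detect_events(name, values, threshold):
    """Data reduction: positions where the signal rises above the threshold.

    Each event is tagged with the file name so it stays traceable
    after results from several files have been appended.
    """
    return [
        (name, i)
        for i in range(1, len(values))
        if values[i - 1] <= threshold < values[i]
    ]


# Aggregation rules, selected by name per result type (hypothetical table).
AGGREGATORS = {
    # Element-wise sum: suitable for counts such as histogram bins.
    "sum": lambda partials: [sum(column) for column in zip(*partials)],
    # Concatenation: suitable for lists such as detected events.
    "append": lambda partials: [item for p in partials for item in p],
}
```

For example, with two files holding `[0.5, 1.5, 2.5, 0.2]` and `[1.0, 3.5]` and bin edges `[0, 1, 2, 3, 4]`, the per-file histograms are `[2, 1, 1, 0]` and `[0, 1, 0, 1]`, and `AGGREGATORS["sum"]` combines them to `[2, 2, 1, 1]`.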