SAP Cloud


  • HANA Vora Hadoop leverages and extends the Apache Spark execution framework to provide enriched interactive analytics on enterprise and Hadoop data
  • the HANA Vora Developer Edition cluster is running on top of Hortonworks HDP and Apache Spark
  • github: SAP: HANAVora-Extensions Spark extensions for business contexts. SAP has recently contributed a few new features to the Apache Spark ecosystem, available as a GitHub project, including a data hierarchy capability that enables drill-down analysis on Hadoop data, and an extension to Spark’s data source API that improves distributed query efficiency from Spark to SAP HANA.
  • github: saphanaacademy: Vora


SAP Spark

With over 200,000 customers and among the largest portfolios of enterprise applications, SAP’s software serves as the gateway to one of the most valuable treasure troves of enterprise data globally, and SAP HANA is the cornerstone of SAP’s platform strategy underpinning these enterprise applications moving forward. Now, the community of Spark developers and users will have full access as well, enabling a richness of analytical possibilities that has been hard to achieve otherwise.

  • Spark can push down more advanced queries (e.g., complex joins, aggregates, and classification algorithms) – leveraging HANA’s horsepower and reducing expensive shuffles of data
  • TGFs (Table Generating Functions) and Custom UDFs (User Defined functions) provide access to the full breadth of Spark’s capabilities through the Smart Data Access functionality



SAP HANA Cloud Platform

Binary Installer

Hardware Requirements

Download Manager

$ chmod +x ~/Downloads/HXEDownloadManager_linux.bin
$ ./HXEDownloadManager_linux.bin


  • Installing Binary: Download the binary image of SAP HANA, express edition, install the image on your Linux server, and install additional tools for express edition if desired.

VM Image



hxehost login
initial password

Amazon AWS


PDF Reference Guides

Predictive Analysis

Sample Algorithms in the Predictive Analysis Library (PAL)

Association Analysis
  • Apriori
  • Apriori Lite
  • FP-growth
  • KORD - Top K Rule Discovery
  • Multiple linear regression
  • Polynomial regression
  • Exponential regression
  • Bivariate geometric regression
  • Bivariate logarithmic regression
Cluster Analysis
  • ABC Classification
  • K-Means
  • K-Medoid Clustering
  • K-Medians
  • Kohonen self-organizing maps
  • Agglomerate hierarchical clustering
  • Affinity propagation
Classification Analysis
  • CART
  • C4.5 decision tree analysis
  • CHAID decision tree analysis
  • K-nearest neighbor
  • Logistic regression
  • Back-propagation (neural network)
  • Naïve Bayes
  • Support vector machine
Time Series Analysis
  • Single exponential smoothing
  • Double exponential smoothing
  • Triple exponential smoothing
  • Forecast smoothing
  • Autoregressive integrated moving average (ARIMA)
  • Brown exponential smoothing
  • Croston method
  • Forecast accuracy measure
  • Linear regression with damped trend and seasonal adjust
Probability Distribution
  • Distribution fit
  • Cumulative distribution function
  • Quantile function
Outlier Detection
  • Interquartile range test (Tukey’s test)
  • Variance test
  • Anomaly detection
  • Common neighbors
  • Jaccard coefficient
  • Adamic/Adar
  • Katzß
Data Preparation
  • Sampling
  • Random distribution sampling
  • Binning
  • Scaling
  • Partitioning
  • Principal component analysis (PCA)
Statistic Functions (Univariate)
  • Mean, median, variance, standard deviation
  • Kurtosis
  • Skewness
Statistic Functions (Multivariate)
  • Covariance matrix
  • Pearson correlations matrix
  • Chi-squared tests:
    • Test of quality of fit
    • Test of independence
    • F-test (variance equality test)
  • Weighted scores table
  • Substitute missing values


11 November 2016