Intel(R) Data Analytics Acceleration Library

Boost machine learning and data analytics performance with this easy-to-use library

License	Intel Simplified Software License
GroupId	com.intel.daal
ArtifactId	daal-parent
Last Version	2020.3.013
Release Date	Oct 26, 2020
Type	pom
Description	Intel(R) Data Analytics Acceleration Library. Boost machine learning and data analytics performance with this easy-to-use library.
Project URL	https://software.intel.com/en-us/intel-daal
Source Code Management	https://github.com/intel/daal

Download daal-parent

Filename	Size
daal-parent-2020.3.013.pom	2 KB

How to add to project

Apache Maven

<!-- https://jarcasting.com/artifacts/com.intel.daal/daal-parent/ -->
<dependency>
    <groupId>com.intel.daal</groupId>
    <artifactId>daal-parent</artifactId>
    <version>2020.3.013</version>
    <type>pom</type>
</dependency>

Gradle Groovy

// https://jarcasting.com/artifacts/com.intel.daal/daal-parent/
implementation 'com.intel.daal:daal-parent:2020.3.013'

Gradle Kotlin

// https://jarcasting.com/artifacts/com.intel.daal/daal-parent/
implementation ("com.intel.daal:daal-parent:2020.3.013")

Apache Buildr

'com.intel.daal:daal-parent:pom:2020.3.013'

Apache Ivy

<dependency org="com.intel.daal" name="daal-parent" rev="2020.3.013">
  <artifact name="daal-parent" type="pom" />
</dependency>

Groovy Grape

@Grapes(
@Grab(group='com.intel.daal', module='daal-parent', version='2020.3.013')
)

Scala SBT

libraryDependencies += "com.intel.daal" % "daal-parent" % "2020.3.013"

Leiningen

[com.intel.daal/daal-parent "2020.3.013"]

Dependencies

There are no dependencies for this project. It is a standalone project that does not depend on any other jars.

Project Modules

There are no modules declared in this project.

Intel® oneAPI Data Analytics Library

Intel® oneAPI Data Analytics Library (oneDAL) is a machine learning library that helps speed up big data analysis. oneDAL solvers are also used in Intel® Distribution for Python to optimize scikit-learn.

Intel® oneAPI Data Analytics Library is an extension of Intel® Data Analytics Acceleration Library (Intel® DAAL).

Build your high-performance data science application with oneDAL

oneDAL is optimized to use the capabilities of Intel® hardware, which gives a significant performance boost for classic machine learning algorithms.

We provide highly optimized algorithmic building blocks for all stages of data analytics: preprocessing, transformation, analysis, modeling, validation, and decision making.

The current version of oneDAL provides Data Parallel C++ (DPC++) API extensions to the traditional C++ interface.

Data volumes are growing exponentially, and so is the need for high-performance, scalable frameworks to analyze that data and extract value from it. Besides its performance on a single node, the oneDAL distributed computation mode also provides strong and weak scaling, as shown for K-means in the charts described below.

Charts: oneDAL K-means fit, strong scaling results; oneDAL K-means fit, weak scaling results.

Technical details: FPType: float32; HW: Intel Xeon Processor E5-2698 v3 @2.3GHz, 2 sockets, 16 cores per socket; SW: Intel® DAAL (2019.3), MPI4Py (3.0.0), Intel® Distribution Of Python (IDP) 3.6.8. Details are available in the article https://arxiv.org/abs/1909.11822
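For readers unfamiliar with the terms: strong scaling keeps the total problem size fixed while nodes are added, and weak scaling keeps the problem size per node fixed. The sketch below shows how efficiency is commonly computed in each case. The timings are hypothetical values chosen for illustration, not measurements from the charts or the cited article, and the function names are ours.

```python
def strong_scaling_efficiency(t1, tn, n):
    """Fixed total problem size: the ideal time on n nodes is t1 / n."""
    return t1 / (n * tn)


def weak_scaling_efficiency(t1, tn):
    """Fixed problem size per node: the ideal time on n nodes equals t1."""
    return t1 / tn


# Hypothetical wall-clock times in seconds, keyed by node count.
strong_times = {1: 80.0, 2: 42.0, 4: 22.0, 8: 12.5}
weak_times = {1: 10.0, 2: 10.4, 4: 10.8, 8: 11.5}

strong = {n: strong_scaling_efficiency(strong_times[1], t, n)
          for n, t in strong_times.items()}
weak = {n: weak_scaling_efficiency(weak_times[1], t)
        for n, t in weak_times.items()}

for n in sorted(strong):
    print(f"{n} nodes: strong {strong[n]:.2f}, weak {weak[n]:.2f}")
```

An efficiency of 1.0 is ideal in both cases; values below 1.0 reflect communication and synchronization overhead.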

Refer to our examples and documentation for more information about our API.

Python API

oneDAL has a Python API, provided as a standalone Python library called daal4py. Below is an example of how daal4py can be used to compute K-means clusters:

import numpy as np
import pandas as pd
import daal4py as d4p

data = pd.read_csv("local_kmeans_data.csv", dtype = np.float32)

init_alg = d4p.kmeans_init(nClusters = 10,
                           fptype = "float",
                           method = "randomDense")

centroids = init_alg.compute(data).centroids
alg = d4p.kmeans(nClusters = 10, maxIterations = 50, fptype = "float",
                 accuracyThreshold = 0, assignFlag = False)
result = alg.compute(data, centroids)
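To make the roles of the initial centroids and the iteration limit more concrete, here is a minimal pure-Python sketch of the classic K-means (Lloyd) iteration. It is an illustration under our own assumptions, not the oneDAL or daal4py implementation: the function names and the tiny data set are hypothetical, and the loop stops early once the centroids no longer change.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, initial_centroids, max_iterations):
    """Alternate assignment and update steps for up to max_iterations rounds."""
    centroids = [list(c) for c in initial_centroids]
    for _ in range(max_iterations):
        # Assignment step: attach every point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            k = min(range(len(centroids)),
                    key=lambda i: squared_distance(p, centroids[i]))
            clusters[k].append(p)
        # Update step: move each centroid to the mean of its members.
        updated = []
        for old, members in zip(centroids, clusters):
            if members:
                updated.append([sum(m[d] for m in members) / len(members)
                                for d in range(len(old))])
            else:
                updated.append(old)  # keep the centroid of an empty cluster
        if updated == centroids:
            break
        centroids = updated
    return centroids


# Hypothetical data: two well-separated groups of three points each.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
final = kmeans(points, [points[0], points[3]], max_iterations=50)
print(final)
```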

Scikit-learn patching

The Python interface to oneDAL provided by daal4py makes it possible to create scikit-learn compatible estimators, transformers, and clusterers powered by oneDAL that are nearly as efficient as native programs.

Chart: speedups of oneDAL-powered scikit-learn over the original scikit-learn, 28 cores, 1 thread/core.

Technical details: FPType: float32; HW: Intel(R) Xeon(R) Platinum 8276L CPU @ 2.20GHz, 2 sockets, 28 cores per socket; SW: scikit-learn 0.22.2, Intel® DAAL (2019.5), Intel® Distribution Of Python (IDP) 3.7.4. Details are available in the article https://medium.com/intel-analytics-software/accelerate-your-scikit-learn-applications-a06cacf44912

daal4py has an API that matches the scikit-learn API. This lets you speed up existing projects by changing one line of code, the import:

from daal4py.sklearn.svm import SVC
from sklearn.datasets import load_digits

digits = load_digits()
X, y = digits.data, digits.target

svm = SVC(kernel='rbf', gamma='scale', C = 0.5).fit(X, y)
print(svm.score(X, y))

In addition, daal4py can replace some scikit-learn methods with oneDAL solvers, which makes it possible to get a performance gain without any code changes. This approach is the basis of scikit-learn in Intel® Distribution for Python. You can patch stock scikit-learn with a single command-line flag:

python -m daal4py my_application.py

Patches can also be enabled programmatically:

from sklearn.svm import SVC
from sklearn.datasets import load_digits
from time import time

svm_sklearn = SVC(kernel="rbf", gamma="scale", C=0.5)

digits = load_digits()
X, y = digits.data, digits.target

start = time()
svm_sklearn = svm_sklearn.fit(X, y)
end = time()
print(end - start) # output: 0.141261...
print(svm_sklearn.score(X, y)) # output: 0.9905397885364496

from daal4py.sklearn import patch_sklearn
patch_sklearn() # <-- apply patch
from sklearn.svm import SVC

svm_d4p = SVC(kernel="rbf", gamma="scale", C=0.5)

start = time()
svm_d4p = svm_d4p.fit(X, y)
end = time()
print(end - start) # output: 0.032536...
print(svm_d4p.score(X, y)) # output: 0.9905397885364496
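One way to picture patching is as rebinding a name in a module's namespace so that later lookups resolve to a faster implementation. The sketch below illustrates that general idea only; it makes no claim about how daal4py implements patch_sklearn internally. The module stock_library and the classes StockSolver and AcceleratedSolver are hypothetical stand-ins.

```python
import types

# Hypothetical stand-in for a library module whose class gets replaced.
stock_library = types.ModuleType("stock_library")


class StockSolver:
    backend = "stock"

    def fit(self, values):
        # Trivial "training": store the mean of the input values.
        self.mean_ = sum(values) / len(values)
        return self


class AcceleratedSolver(StockSolver):
    # Same interface and results, imagined to run on a faster backend.
    backend = "accelerated"


stock_library.Solver = StockSolver


def patch(module):
    """Rebind the public name so later lookups get the accelerated class."""
    module.Solver = AcceleratedSolver


held_reference = stock_library.Solver  # name bound before patching
before = stock_library.Solver().fit([1.0, 2.0, 3.0])

patch(stock_library)
after = stock_library.Solver().fit([1.0, 2.0, 3.0])

print(before.backend, after.backend, held_reference.backend)
```

In this toy model a name bound before the patch still points at the original class, which is consistent with the example above importing SVC again after calling patch_sklearn().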

Distributed multi-node mode

Data scientists often require different tools for analysis of regular and big data. daal4py offers various processing models, which makes it easy to enable distributed multi-node mode.

import numpy as np
import pandas as pd
import daal4py as d4p

d4p.daalinit() # <-- Initialize SPMD mode
data = pd.read_csv("local_kmeans_data.csv", dtype = np.float32)

init_alg = d4p.kmeans_init(nClusters = 10,
                           fptype = "float",
                           method = "randomDense",
                           distributed = True) # <-- change model to distributed

centroids = init_alg.compute(data).centroids

alg = d4p.kmeans(nClusters = 10, maxIterations = 50, fptype = "float",
                 accuracyThreshold = 0, assignFlag = False,
                 distributed = True)  # <-- change model to distributed

result = alg.compute(data, centroids)
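As a rough picture of what a distributed K-means step has to do, each process can compute per-cluster partial sums and counts over its own share of the data, and the partial results are then combined into new centroids. The sketch below simulates the processes with a plain list of chunks. It is not daal4py's implementation, and the helper names local_step and combine, as well as the data, are hypothetical.

```python
def nearest(point, centroids):
    """Index of the centroid closest to point (squared Euclidean distance)."""
    return min(range(len(centroids)),
               key=lambda k: sum((x - c) ** 2
                                 for x, c in zip(point, centroids[k])))


def local_step(chunk, centroids):
    """Work done by one process: partial sums and counts per cluster."""
    dim = len(centroids[0])
    sums = [[0.0] * dim for _ in centroids]
    counts = [0] * len(centroids)
    for p in chunk:
        k = nearest(p, centroids)
        counts[k] += 1
        for d in range(dim):
            sums[k][d] += p[d]
    return sums, counts


def combine(partials, centroids):
    """Reduction step: merge partial results into updated centroids."""
    dim = len(centroids[0])
    updated = []
    for k, old in enumerate(centroids):
        total = sum(counts[k] for _, counts in partials)
        if total == 0:
            updated.append(list(old))  # empty cluster keeps its centroid
        else:
            updated.append([sum(sums[k][d] for sums, _ in partials) / total
                            for d in range(dim)])
    return updated


# Three simulated processes, each holding its own chunk of the data.
chunks = [
    [(0.0, 0.0), (10.0, 10.0)],
    [(0.0, 2.0), (10.0, 12.0)],
    [(2.0, 0.0), (12.0, 10.0)],
]
centroids = [[0.0, 0.0], [10.0, 10.0]]

partials = [local_step(chunk, centroids) for chunk in chunks]
updated = combine(partials, centroids)

# The same step computed over all data at once, for comparison.
all_points = [p for chunk in chunks for p in chunk]
single_node = combine([local_step(all_points, centroids)], centroids)
print(updated, single_node)
```

Only the small per-cluster sums and counts need to be exchanged between processes, not the data itself.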

For more details, see the daal4py documentation.

oneDAL Apache Spark MLlib samples

oneDAL provides Scala and Java interfaces that match the Apache Spark MLlib API and use oneDAL solvers under the hood. This implementation delivers a 3-18x increase in performance compared to default Apache Spark MLlib.

Technical details: FPType: double; HW: 7 x m5.2xlarge AWS instances; SW: Intel DAAL 2020 Gold, Apache Spark 2.4.4, emr-5.27.0; Spark config: num executors 12, executor cores 8, executor memory 19GB, task cpus 8

See the samples for more details.

Installation

You can install oneDAL:

- from the oneDAL home page as a part of Intel® oneAPI Base Toolkit
- from GitHub*

Installation from Source

See Installation from Sources for details.

Examples

Besides the C++ and Python APIs, oneDAL also provides APIs for C++ SYCL and Java. See the examples for each language for more details.

Documentation

Support

Report issues, ask questions, and provide suggestions using:

You may reach out to project maintainers privately at [email protected].

Security

To report a vulnerability, refer to Intel vulnerability reporting policy.

Samples

Samples are examples of how oneDAL can be used in different applications.

Technical Preview Features

Technical preview features are introduced to gain early feedback from developers. A technical preview feature is subject to change in future releases. Using a technical preview feature in a production code base is therefore strongly discouraged.

In C++ APIs, technical preview features are located in daal::preview and oneapi::dal::preview namespaces. In Java APIs, technical preview features are located in packages that have the com.intel.daal.preview name prefix.

The preview features list:

Graph Analytics:
- Undirected graph without edge and vertex weights (undirected_adjacency_array_graph), where vertex indices can only be of type int32
- Jaccard Similarity Coefficients for all pairs of vertices, a batch algorithm that processes the graph by blocks
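For reference, the Jaccard similarity coefficient of two vertices is the size of the intersection of their neighbor sets divided by the size of the union. The sketch below computes it for all vertex pairs of a small undirected, unweighted graph. It is a plain-Python illustration, not the oneDAL preview API: it does not process the graph by blocks, the function name is hypothetical, and it assumes a vertex is not counted as its own neighbor, which may differ from the library's convention.

```python
from itertools import combinations


def jaccard_all_pairs(adjacency):
    """Return {(u, v): coefficient} for every unordered vertex pair u < v."""
    result = {}
    for u, v in combinations(sorted(adjacency), 2):
        neighbors_u, neighbors_v = adjacency[u], adjacency[v]
        union = neighbors_u | neighbors_v
        # Two isolated vertices have an empty union; report 0.0 for them.
        result[(u, v)] = (len(neighbors_u & neighbors_v) / len(union)
                          if union else 0.0)
    return result


# Hypothetical graph with integer vertex indices: a triangle 0-1-2
# plus a pendant vertex 3 attached to vertex 2.
graph = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}

coefficients = jaccard_all_pairs(graph)
for pair, value in sorted(coefficients.items()):
    print(pair, round(value, 3))
```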

oneDAL and Intel® DAAL

Intel® oneAPI Data Analytics Library is an extension of Intel® Data Analytics Acceleration Library (Intel® DAAL).

This repository contains branches corresponding to both oneAPI and classical versions of the library. We encourage you to use oneDAL located under the master branch.

Product	Latest release	Branch	Resources
oneDAL	2021.1	master, rls/2021-gold-mnt	Home page, Documentation, System Requirements
Intel® DAAL	2020 Update 3	rls/daal-2020-u3-rls	Home page, Developer Guide, System Requirements

Contribute

See CONTRIBUTING for more information.

License

Distributed under the Apache License 2.0. See LICENSE for more information.

Intel Corporation

Versions

Version	Release Date
2020.3.013	Oct 26, 2020
2020.1.009	Mar 30, 2020
2020.0.004	Dec 12, 2019
2019.5.205	Nov 27, 2019
2019.4.202	Nov 27, 2019
2019.3.199	Mar 11, 2019
2019.1.001	Nov 6, 2018
2019.0.001	Sep 17, 2018
