products:ict:ai:machine_learning:ml_pipelines

products:ict:ai:machine_learning:ml_pipelines [2022/04/16 22:58] – created wikiadmin
products:ict:ai:machine_learning:ml_pipelines [2022/04/26 14:19] (current) – external edit 127.0.0.1
 +https://www.datarobot.com/blog/what-a-machine-learning-pipeline-is-and-why-its-important/
 +
 +What a Machine Learning Pipeline is and Why It’s Important
 +
 +https://databricks.com/glossary/what-are-ml-pipelines
 +
 +
 +ML Pipelines
 + 
 +Running machine learning algorithms typically involves a sequence of tasks: pre-processing, feature extraction, model fitting, and validation. For example, classifying text documents might involve text segmentation and cleaning, feature extraction, and training a classification model with cross-validation. Although many libraries are available for each stage, connecting them is harder than it looks, especially with large-scale datasets. Most ML libraries are not designed for distributed computation, or they lack native support for pipeline creation and tuning.
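
The sequence of stages described above can be sketched in plain Python. This is a minimal illustration using only the standard library, not how any particular ML library implements pipelines. The function and class names (`preprocess`, `extract_features`, `WordCountClassifier`, `run_pipeline`, `leave_one_out_accuracy`) and the toy documents are assumptions made up for this example, and the word-count model stands in for a real classifier.

```python
import re
from collections import Counter, defaultdict


def preprocess(text):
    """Pre-processing: lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def extract_features(tokens):
    """Feature extraction: bag-of-words term counts."""
    return Counter(tokens)


class WordCountClassifier:
    """Toy model (hypothetical): scores a label by how often it has seen the document's words."""

    def fit(self, feature_rows, labels):
        self.word_counts = defaultdict(Counter)
        for features, label in zip(feature_rows, labels):
            self.word_counts[label].update(features)
        return self

    def predict(self, features):
        def score(label):
            return sum(self.word_counts[label][w] * n for w, n in features.items())

        # Sorting the labels makes tie-breaking deterministic.
        return max(sorted(self.word_counts), key=score)


def run_pipeline(train_texts, train_labels, test_texts):
    """Chain the stages: pre-process, extract features, fit, predict."""
    def featurize(text):
        return extract_features(preprocess(text))

    model = WordCountClassifier().fit([featurize(t) for t in train_texts], train_labels)
    return [model.predict(featurize(t)) for t in test_texts]


def leave_one_out_accuracy(texts, labels):
    """Validation: hold out each document in turn and refit on the rest."""
    hits = 0
    for i in range(len(texts)):
        train_texts = texts[:i] + texts[i + 1:]
        train_labels = labels[:i] + labels[i + 1:]
        hits += run_pipeline(train_texts, train_labels, [texts[i]])[0] == labels[i]
    return hits / len(texts)


# Made-up toy corpus, for illustration only.
texts = [
    "Spark makes cluster computing fast",
    "Distributed computing on a Spark cluster",
    "Bake the bread until golden",
    "Knead the dough and bake the bread",
]
labels = ["tech", "tech", "food", "food"]

print(run_pipeline(texts, labels, ["spark cluster", "bake bread"]))
print(leave_one_out_accuracy(texts, labels))
```

Every stage here is an ordinary function call wired together by hand, and the validation loop has to re-run the whole chain for each fold. That wiring is what a pipeline abstraction is meant to take over.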
 +
 +
 +https://spark.apache.org/docs/latest/ml-pipeline.html
 +
 +
 +
 +ML Pipelines
 +
 +In this section, we introduce the concept of ML Pipelines. ML Pipelines provide a uniform set of high-level APIs built on top of DataFrames that help users create and tune practical machine learning pipelines.
 +