Machine Learning with R and SQL Server 2017

The traditional Business Intelligence (BI) methodology focuses primarily on sourcing data from disparate source systems and aggregating it in a data lake or data warehouse. This repository of data acts as the primary source for purposes such as reporting, data marts, and data mining. All these forms of data analysis require the end user to apply analytical thinking to interpret the results.


Machine Learning is an advanced form of analysis in which a model learns from the data it is fed and derives intelligence from it for predictive analysis. This analysis depends largely on the machine learning model developed in the process. It is a combination of data transformation/modeling, model training, model improvement, model testing, and data analysis.


Database professionals often think that their database experience covers the exploratory skills of data analysis. They are fluent in data analysis, but of a kind that is more an assessment of query logic and database models. The exploratory data analysis involved in machine learning systems is statistical in nature and is often referred to as data science.


ML has deep roots in statistics, which is required to build a solid foundation in the data science basics of exploratory data analysis. Statistics can be divided into two broad categories, descriptive and inferential, both of which are widely used in machine learning model development.



Data hosted in SQL Server benefits from a predefined schema and T-SQL constructs. SSIS and other ETL tools allow data transformation at a broader scale and a faster pace. Assuming the data is well structured and has been treated for errors during data capture and data quality checks, exploratory data analysis, the fundamental step in machine learning model development, can be applied to it. Model development, model training, and model testing follow this analysis.


What is Machine Learning and why learn it?


When we train a machine to learn from a given dataset so that it can be used for distinct purposes such as prediction, classification, and others, we call this concept Machine Learning. Note that a machine here does not only mean a physical device; for ease of understanding, it can be thought of as a program or a data model.


Some key points and definitions related to Machine Learning are mentioned below:


Machine Learning is concerned with programs that automatically improve their performance through experience.


Machine Learning is a type of AI that gives computers the ability to learn without being explicitly programmed.


ML focuses primarily on developing computer programs that can change when exposed to new data.


The process of ML is comparable to data mining. Both search through data to look for patterns. However, rather than extracting data for human comprehension, as is the case in data mining, ML uses that data to detect patterns and adjust program actions accordingly.


The applications listed below give the best answer to the question of why to learn ML.


Machine Learning Applications


  • Web search, through page ranking based on what users are likely to click

  • Finance, to decide which users to target with new credit card offers

  • E-commerce, to predict which transactions are fraudulent

  • Space exploration, in radio astronomy and space probes

  • Robotics, to handle uncertainty in environments, as in self-driving cars

  • Computing, to suggest fixes for application bugs based on cognitive processing

ML deals with predictive and advanced analytics, which makes it a natural extension for data professionals who are seeking to enhance their skills.


Machine Learning Types


The types of ML are described in various reference materials. Usually, ML is classified into three categories: Supervised, Unsupervised, and Reinforcement Learning.


Supervised ML: This form of ML learns from labeled data and then takes actions. For instance, consider a dataset containing the attributes of all the homes in a given country, state, or town, where each home is also labeled with its price. The intent is to predict the price of a given home based on its attributes, not to identify which house the attributes belong to.
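As a rough illustration of the supervised case, the sketch below fits a straight line to a handful of labeled homes and uses it to predict a price from an attribute. The data values, the single `area` attribute, and the names `fit_line` and `predict_price` are all assumptions made for this example; a real model would use many attributes and a proper ML library.

```python
# Hypothetical labeled data: (area in square metres, price in thousands).
# The values are invented for illustration only.
homes = [(50, 150), (70, 210), (90, 270), (110, 330), (130, 390)]


def fit_line(points):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# "Training": learn the relationship between the attribute and the label.
slope, intercept = fit_line(homes)


def predict_price(area):
    """Predict the price of an unseen home from its area."""
    return slope * area + intercept


print(predict_price(100))
```

The labels (prices) are what make this supervised: the fit is driven entirely by the known answers in the training rows.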


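Unsupervised ML: This form of ML learns from unlabeled data and then takes actions. For example, consider a dataset with the attributes of all houses in a particular country, state, or city, but with no labels attached; the model can only look for structure, such as groups of similar houses, in the attributes themselves.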
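One common unsupervised task is clustering. The sketch below is a deliberately tiny one-dimensional k-means, shown only to illustrate learning from data that carries no labels; the area values, the choice of two clusters, and the function name `k_means_1d` are assumptions for this example.

```python
# Hypothetical unlabeled data: house areas in square metres, no prices attached.
areas = [48, 52, 55, 60, 140, 150, 155, 160]


def k_means_1d(values, centers, iterations=10):
    """Tiny 1-D k-means: assign each value to its nearest center,
    then move each center to the mean of the values assigned to it."""
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Keep the old center if a cluster ends up empty.
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers, clusters


# Start from the smallest and largest values as the two initial centers.
centers, clusters = k_means_1d(areas, centers=[min(areas), max(areas)])
print(centers)
print(clusters)
```

No row says "small house" or "large house"; the two groups emerge from the attribute values alone.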


Reinforcement Learning: In this form of ML, learning is driven by rewards that the system grants depending on the actions performed by the model. This advanced form of machine learning is applied in AI-based systems such as robotics, neural networks, and recommendation engines.


Machine Learning Support in Microsoft Technology Stack


Microsoft acquired Revolution Analytics, a commercial provider of software and services for R, in 2015, enabling its vision of R across Microsoft data platforms on-premises, in hybrid environments, and on Microsoft Azure. After the acquisition, Microsoft integrated R with SQL Server, Azure, Power BI, and Cortana Analytics. Additionally, Revolution R Open was renamed Microsoft R Open, and Revolution R Enterprise became SQL Server R Services and Microsoft R Server.


SQL Server R Services (SQL Server Machine Learning Services in SQL Server 2017) installs an open-source R distribution along with packages provided by Microsoft that support distributed and parallel processing. The architecture is designed so that external scripts written in R run in a separate process from SQL Server. R Services integrates the R language with SQL Server, which helps to perform analytics close to the data and avoids the security risks and costs associated with data movement.


The traditional data analytics methodology relies on transporting and transforming data from OLTP databases > Data Warehouses > Data Marts, using PowerShell for administration, SSAS for in-memory and multidimensional analytics, and SSRS for reporting. For data stored in OLTP databases, T-SQL has long been the natural fit for manipulating data with set-based operations and relational algebra. Using R together with T-SQL extends data science, machine learning, statistical computing, and other advanced predictive analysis capabilities to OLTP systems.


In this tutorial, we will work through hands-on exercises using R and T-SQL for exploratory data analysis and machine learning. It is assumed that you have already installed SQL Server 2017 and Machine Learning Services, as well as R. If you have not, you will need to install them first.


How Statistics are used in Machine Learning


ML has deep roots in statistics and mathematics. Here are the distinct phases of ML model development, in order.


  • Data Exploration: structural data analysis including probability, central tendency, variance, etc.

  • Data Standardization, such as normalization, feature extraction, noise filtering, etc.

  • Model Development and Training

  • Model Testing

  • Model Improvement


In the process of ML model development, the initial step is data exploration. Here, exploration does not mean querying data from distinct sources using complex functions, queries, or joins.


The intent of exploration is to assess the balance of the data from the standpoint of developing an ML model. If the data is not properly balanced, it requires both transformation and standardization.
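Two widely used standardization steps are min-max normalization and z-score standardization. The sketch below shows both on an invented list of values; the data and the function names `min_max` and `z_scores` are assumptions for illustration, not something prescribed by SQL Server or R.

```python
from statistics import mean, pstdev

# Hypothetical attribute values (for example, house areas in square metres).
areas = [50, 70, 90, 110, 130]


def min_max(values):
    """Rescale values linearly so the smallest maps to 0 and the largest to 1."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def z_scores(values):
    """Rescale values to have mean 0 and (population) standard deviation 1."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


print(min_max(areas))
print(z_scores(areas))
```

Either transformation puts attributes measured on very different scales onto a comparable footing before a model is trained.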


Once the input attributes have been identified, an ML model is developed and trained on a significant portion of the data. The remaining data is used to test the accuracy of the model's predictions. Improving the prediction accuracy of a model is an iterative process that continues until it reaches a satisfactory level of confidence.
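The train-then-test idea can be sketched as follows. The synthetic rows, the 80/20 split, the straight-line model, and the use of mean absolute error as the accuracy measure are all assumptions chosen to keep the example short.

```python
import random

# Hypothetical labeled rows: (area, price), where price is about 3 * area plus noise.
random.seed(42)
rows = [(a, 3 * a + random.uniform(-10, 10)) for a in range(40, 140, 5)]

# Train on 80% of the rows and hold out the remaining 20% for testing.
random.shuffle(rows)
split = int(len(rows) * 0.8)
train, test = rows[:split], rows[split:]


def fit_line(points):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


slope, intercept = fit_line(train)

# Mean absolute error on rows the model never saw during training.
mae = sum(abs(slope * x + intercept - y) for x, y in test) / len(test)
print(len(train), len(test), round(mae, 2))
```

If the error on the held-out rows is too high, the model or the data preparation is revised and the cycle is repeated.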


Branches of Statistics


Generally, statistics is categorized at the highest level into two branches: descriptive and inferential.


First, let's look at descriptive statistics, which organizes the data of a representative sample and summarizes it. Its main parts are measures of central tendency, measures of variability, and correlation. This branch is based on quantitative analysis.
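A minimal sketch of these three groups of measures, using Python's standard `statistics` module, might look like this. The sample values and the helper name `pearson` are invented for the example.

```python
from statistics import mean, median, pvariance, pstdev

# Hypothetical sample: house areas and prices (invented values).
areas = [50, 70, 90, 110, 130]
prices = [160, 200, 280, 320, 390]


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


# Central tendency
print("mean:", mean(prices), "median:", median(prices))
# Variability
print("variance:", pvariance(prices), "std dev:", pstdev(prices))
# Correlation between the two variables
print("correlation:", pearson(areas, prices))
```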


Inferential statistics interprets data and determines statistical significance, drawing conclusions about an unknown, broader dataset (the population) from a sample. Its foundation lies in hypothesis testing and the Central Limit Theorem.
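The Central Limit Theorem can be observed with a small simulation: the means of many samples cluster around the population mean, with a spread close to the population standard deviation divided by the square root of the sample size. The uniform population, the sample size of 30, and the 2000 repetitions below are arbitrary choices for this sketch.

```python
import random
from statistics import mean, stdev

random.seed(0)

# Hypothetical "population": uniform on [0, 1), whose true mean is 0.5
# and whose true standard deviation is sqrt(1/12), roughly 0.289.
sample_size = 30
sample_means = [
    mean([random.random() for _ in range(sample_size)])
    for _ in range(2000)
]

# The sample means centre on the population mean ...
print(round(mean(sample_means), 3))
# ... and their spread is close to 0.289 / sqrt(30), roughly 0.053.
print(round(stdev(sample_means), 3))
```

This is what allows a conclusion about a whole population to be drawn, with a quantifiable uncertainty, from a sample.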


Within inferential statistics, a number of algorithms each deal with a particular type of predictive analysis problem. ML models use these algorithms, which means that an algorithm must be understood in detail before it is applied.


Studying Statistics for ML


Any explanation of an ML algorithm starts with statistics. These explanations are usually given at a high level, because describing an algorithm from the lowest level would require a separate book for each one; yet many readers do not have the statistical background needed to follow even these concepts.


Without a proper foundation in statistics, any tutorial on ML looks like a mathematics class. The question, therefore, is how to learn statistics without hitting the breaking point at which you give up on ML or lose interest because you are struggling more and more with statistics.


The learning approach differs from person to person. One way is a top-down approach that identifies the best starting point: pick any statistics topic and trace back through what it depends on. For example:


  • It may be difficult to understand the characteristics of the normal distribution if you are unaware of standard deviation.

  • It may be difficult to understand standard deviation, its calculation, and its significance if you do not know variance.

  • To understand variance, you need to know the mean and the formula to calculate variance.

  • The mean does not depend on any other statistical derivation and is part of elementary mathematics.


In this way, you can find the point at which you have the background to understand the most fundamental topics, and then build up gradually until you reach the statistical terms used in ML algorithms.
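The dependency chain from the mean up to the normal distribution can be written out with the standard population formulas, where the symbols $x_1, \dots, x_N$ for the data values, $\mu$ for the mean, and $\sigma$ for the standard deviation are notation introduced here for illustration:

```latex
% Mean: depends on nothing else.
\mu = \frac{1}{N} \sum_{i=1}^{N} x_i

% Variance: defined in terms of the mean.
\sigma^2 = \frac{1}{N} \sum_{i=1}^{N} (x_i - \mu)^2

% Standard deviation: defined in terms of the variance.
\sigma = \sqrt{\sigma^2}

% Normal distribution: its density is parameterized by the mean and the standard deviation.
f(x) = \frac{1}{\sigma \sqrt{2\pi}} \exp\!\left( -\frac{(x - \mu)^2}{2\sigma^2} \right)
```

Each formula uses only quantities defined in the lines before it, which is why working backwards from the normal distribution leads to the mean.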


Some inferences are faster and easier to make through graphical analysis than by looking at individual numbers. There are many kinds of statistical visualizations, depending on the type of analysis and the categories of variables. Some are quite fundamental and are used as a starting point in almost every kind of analysis. The most commonly used visualizations for graphical exploratory analysis are:


  • Density Plot

  • Histogram

  • Box Plot

  • Scatterplot
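As a rough sketch of what two of these plots compute under the hood, the code below derives the bin counts behind a histogram and the five-number summary that a box plot draws, then prints a crude text histogram. The sample values, the bin width of 25, and the function names are assumptions for this example; in practice a plotting library would do the drawing.

```python
from statistics import quantiles

# Hypothetical sample of house areas in square metres.
areas = [48, 52, 55, 60, 62, 70, 75, 90, 140, 150]


def histogram_counts(values, bin_width, start):
    """Count how many values fall into each bin, keyed by the bin's left edge."""
    counts = {}
    for v in values:
        left_edge = start + ((v - start) // bin_width) * bin_width
        counts[left_edge] = counts.get(left_edge, 0) + 1
    return dict(sorted(counts.items()))


def five_number_summary(values):
    """Minimum, three quartiles, and maximum: the numbers a box plot draws."""
    q1, q2, q3 = quantiles(values, n=4, method="inclusive")
    return min(values), q1, q2, q3, max(values)


# Crude text histogram: one '#' per value in the bin.
for edge, count in histogram_counts(areas, bin_width=25, start=25).items():
    print(f"{edge:>3}-{edge + 25:<3} {'#' * count}")

print(five_number_summary(areas))
```

Even in text form, the histogram makes the skew toward smaller houses visible at a glance, which is harder to see in the raw list.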




Assuming that you are entirely new to the ML discipline, we started by discussing some basic terms, concepts, and ML theory. We glanced at the components of SQL Server 2017 that support machine learning and noted ML's deep roots in statistics and mathematics. We also covered some basic statistical terms and fundamentals, and an approach to learning statistics for ML.


With a strong foundation in statistics, theoretical ML knowledge, and a working knowledge of R, we looked at how the spread and shape of data can be understood from statistics extracted using T-SQL and R. We also learned how to do this graphically using different statistical visualizations.





