SQL Server 2017 Machine Learning Services with R
SQL Server 2017 Machine Learning Services with R: Data exploration, modeling, and advanced analytics

Authors: Koesmarno, Tomaž Kaštrun
Rating: 4.3 (8 Ratings)
Paperback Feb 2018 338 pages 1st Edition

SQL Server 2017 Machine Learning Services with R

Introduction to R and SQL Server

SQL Server 2016 came with great new features, and among them was R integration into SQL Server, partly with advanced analytics and partly with new programmability capabilities. Microsoft R Services for SQL Server is part of the family of new extensibilities for highly scalable and parallel advanced analytics. R Services allows you to perform advanced analytics (statistical, multivariate statistics, predictive analytics, machine learning, and deep learning) on large quantities of data stored in the database. Microsoft published R Services as part of Microsoft R Server (MRS), which was specially designed for reading data directly from the SQL Server database within the same SQL Server computational context.

We will cover the following aspects in this chapter:

  • Using R prior to SQL Server 2016
  • Microsoft's commitment to the open source R language
  • Boosting analytics with SQL Server R integration
  • Outline of the book

Using R prior to SQL Server 2016

The R language has been in the community since the 1990s (even though it was developed a decade before). With its open source GNU license, R gained popularity for its no-fuss installation and its ability to invoke any available package for additional statistical learning functions. This was a clear advantage for R, as there were not many statistical programs on the market in the 1980s and 1990s, and most of them were not free. The extensibility of the core R engine through newly emerging packages gave the broader community and its users more and more ways to use the R language for multiple purposes, in addition to its strong statistical analysis and predictive modeling capabilities.

SQL Server 2005 introduced SQL Server Analysis Services (SSAS) data mining features that could be applied to the customer's existing rich data stored in SQL Server and SSAS OLAP cubes. This feature allows users to use Data Mining Extensions (DMX) for creating predictive queries. In the following years, several questions, requests, and ideas emerged on SQL forums, blogs, and community websites regarding additional statistical and predictive methods and approaches.

Back in 2011, I started working on the idea of extending the statistical analysis capabilities of SQL Server 2008 R2 with the help of the open source R language. One reason for that decision was to have the flexibility to run statistical analysis (from data provisioning to multivariate analysis) without feeding the data into an OLAP cube first. The other reason was more business oriented: the people involved in data preparation, data munging, and data cleaning needed faster statistical insights.

I started working on a framework that was based on a combination of a T-SQL stored procedure and the R package RODBC (https://cran.r-project.org/web/packages/RODBC). The idea was simple: get the transactional or OLAP data, select the columns you want to analyze, and choose the analysis itself (from simple to predictive analytics, which would stretch beyond SSAS, T-SQL, or CLR capabilities):

Figure 1: Process flow of a framework

The framework was far from simple, and calling the procedure meant passing a mixture of R code, T-SQL SELECT statements, and configuration settings for your R engine.

The stored procedure with all its parameters looked like this:

EXECUTE AdventureWorks2012.dbo.sp_getStatistics
             @TargetTable = '[vStoreWithAddresses]'
            ,@Variables = 'Name'
            ,@Statistics = '8'
            ,@ServerName = 'WORKSTATION-31'
            ,@DatabaseName = 'AdventureWorks2012'
            ,@WorkingDirectory = 'C:\DataTK'
            ,@RPath = 'C:\Program Files\R\R-3.0.3\bin'; 

The nuts and bolts explanation is outside the scope of this book and is well-documented at: http://www.sqlservercentral.com/articles/R+Language/106760/.

Looking back on this framework and the feedback from the community and people on forums, it was accepted positively and many commented that they needed something similar for their daily business.

Besides pioneering the idea and bringing the R engine one step closer to SQL Server, the framework had many flaws. The major one was security. Because it needed access to a local working directory for generating R files to be run by the vanilla R engine, it needed xp_cmdshell enabled. The following reconfiguration was mandatory, and many sysadmins would not approve of it:

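-- Both settings below are advanced options, so advanced options must be visible first
EXECUTE SP_CONFIGURE 'show advanced options', 1;
GO
RECONFIGURE;
GO

-- Allow shell commands to be launched from T-SQL
EXECUTE SP_CONFIGURE 'xp_cmdshell', 1;
GO
RECONFIGURE;
GO

-- Allow OLE Automation objects to be created from T-SQL
EXECUTE SP_CONFIGURE 'Ole Automation Procedures', 1;
GO
RECONFIGURE;
GO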

In addition, the framework needed access to the R engine installation, together with the R packages required to execute the desired code. Installing open source programs and providing read/write access was again a drawback in terms of security and corporate software decisions. Nevertheless, one of the bigger issues, once everything was installed and put into production, was performance and memory. R is memory-based, meaning all the computations are done in memory. So, if your dataset is bigger than the available memory, the only result you will get is an error message. Another performance issue was speed. With no parallel or distributed computation, the framework's speed depended on how well the author of each package had implemented it. For example, if a package was written in C or C++, rather than in Fortran, the framework performed better.

The great part of this framework was the ability to deliver results from statistical analysis or predictive modeling much faster, because it could take OLTP or any other data that needed statistical analysis. Furthermore, statisticians and data scientists could prepare the R code that was stored in a table, which was later run by data wranglers, data analysts, or data stewards. Therefore, one version of the truth was maintained, because there was no need for data movement or data copying and all users were reading the same data source. In terms of predictive modeling, the framework also enabled users to take advantage of various additional predictive algorithms (for example, decision forest, glm, CNN, SVM, and word cloud) that were not part of SSAS Data Mining at that time.

Pros and cons aside, the framework was a successful initial attempt to get more data insights that were easily distributable among different business units by pushing visualizations into SQL Server Reporting Services. In the years prior to the release of SQL Server 2016, I met people from the SQL Server community who had developed similar frameworks in order to push predictions to the SQL Server database to support business applications and solutions. With SQL Server 2016, many such solutions were internalized and brought closer to the SQL Server engine to achieve better performance and to address many of these issues.
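For contrast with the xp_cmdshell-based approach described above, the following is a minimal sketch of how an R script can be run in-database from SQL Server 2016 onward. It assumes that R Services (In-Database) or Machine Learning Services with R is installed on the instance and that you have permission to change server configuration. The inline sample values, the column names val and avg_value, and the alias t are illustrative only and are not taken from the framework above.

-- One-time setup: allow external scripts to run
-- (the SQL Server service typically needs a restart before this takes effect)
EXECUTE sp_configure 'external scripts enabled', 1;
RECONFIGURE WITH OVERRIDE;
GO

-- Pass a query result to R as InputDataSet and return a data frame as OutputDataSet
EXECUTE sp_execute_external_script
     @language = N'R'
    ,@script = N'OutputDataSet <- data.frame(avg_value = mean(InputDataSet$val));'
    ,@input_data_1 = N'SELECT CAST(v AS FLOAT) AS val FROM (VALUES (1), (2), (3)) AS t(v)'
WITH RESULT SETS ((avg_value FLOAT));
GO

No working directory, generated R files, or shell access are involved here: the query result is handed to the R runtime and the result set comes back through the same T-SQL call.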

Microsoft's commitment to the open source R language

With its growing popularity and community, R has become, and continues to be, a big player in the field of advanced analytics and data visualization. R and machine learning servers (or services) are not just buzzwords that will be forgotten in the next cycle of SQL Server; they are finding their way into more and more layers of open source and corporate software. In the past five years, many big analytical players have introduced R integration, interpreters, and wrappers for the R language, because of the language's practicality, usability, interdisciplinary nature, and open source orientation. As Microsoft makes a bold and strategic move toward being open source friendly, the use cases for integrating R in SQL Server are growing, making this move even more natural and well timed. This move has been very well received in the SQL community and by businesses as well.

In comparison to other big analytical tools, Microsoft took integration very seriously. It addressed many of the issues and limitations of the language itself, and created a complete integration of R with SQL Server in order to give the best user experience. Many competitors (such as SAS, IBM, SAP, and Oracle) have done similar integrations, but failed to take into account many aspects that contribute to a holistic user experience. Microsoft has announced that joining the R Consortium will allow it to help with, and support, the future development of the R language. In addition, Microsoft has created its own package repository called MRAN (from CRAN, where M stands for Microsoft) and provides support and an SLA for R as well, even though the language and engine are based on Open R (a free, open source version). All these steps tell us how dedicated Microsoft is to bringing an open source statistical programming language into the SQL Server environment.

We can only expect more R integration into other services. For example, Power BI has supported native R visuals (https://powerbi.microsoft.com/en-us/blog/r-powered-custom-visuals) since October 2016, and the R language since December 2015. Therefore, I am a strong believer that R will soon be a native part of the whole SQL Server ecosystem, including SSAS, SSIS, and SSRS. With Azure Analysis Services, R is again one step closer to Analysis Services.

Boosting analytics with SQL Server R integration

Data science is at the forefront of the SQL Server and R integration. Every task performed by a DBA, sysadmin, analyst, wrangler, or any other role working with SQL Server can be supported with statistics, data correlation, data analysis, or data prediction. R integration should not be restricted only to the field of data science. Instead, it should be explored and used in all tasks. A DBA can gain from R integration by switching from monitoring tasks to understanding and predicting what might or will happen next. Likewise, this idea can be applied to sysadmins, data wranglers, and so on. R integration also helps people in different roles to understand statistics, metrics, and measures, and to learn how to improve them by using statistical analysis and predictions.

Besides turning siloed individual work into more coherent and cohesive teams, R integration also means less data movement, because different users can now use R code to execute, drill down, and get a feel for the data, instead of waiting for the data to be prepared, exported, and imported again. With smoother workflows comes faster time to deployment, whether it is a simple report, a predictive model, or an analysis. This allows the boundaries of data ownership to shift toward insights ownership, which supports faster reactions to business needs.

In the past year, we have also seen much more interest in data science in the Microsoft stack. With R integration, Azure Machine Learning, and Power BI, all users who want to learn new skills have great starting points in the available products.

Summary

Starting with SQL Server 2016, R integration became a very important part of the SQL Server platform. From the public release of SQL Server 2016 until February 2018 (the time of writing), the community has embraced R as well as Python very well, making data exploration and data analysis part of general database tasks. Microsoft addressed many of the issues and broadened SQL Server as a product. With SQL Server 2017, Python was added as a second analytical language, reaching an even broader community as well as businesses, and at the same time, taking care of data scalability, performance, and security.

In the next chapter, we will cover different R distributions and IDE tools for using R as a standalone or within the SQL Server, and what the differences among them are when deciding which one to choose.


Key benefits

  • Learn how you can combine the power of R and SQL Server 2017 to build efficient, cost-effective data science solutions
  • Leverage the capabilities of R Services to perform advanced analytics—from data exploration to predictive modeling
  • A quick primer with practical examples to help you get up and running with SQL Server 2017 Machine Learning Services with R, as part of database solutions with continuous integration / continuous delivery

Description

R Services was one of the most anticipated features in SQL Server 2016, and was improved significantly and rebranded as Machine Learning Services in SQL Server 2017. Prior to SQL Server 2016, many developers and data scientists were already using R to connect to SQL Server in siloed environments that left a lot to be desired, in order to do additional data analysis, superseding SSAS Data Mining or additional CLR programming functions. With R integrated within SQL Server 2017, these developers and data scientists can now benefit from its integrated, effective, efficient, and more streamlined analytics environment. This book gives you foundational knowledge and insights to help you understand SQL Server 2017 Machine Learning Services with R. First and foremost, the book provides practical examples on how to implement, use, and understand SQL Server and R integration in corporate environments, and also provides explanations and underlying motivations. It covers installing Machine Learning Services; maintaining, deploying, and managing code; and monitoring your services. Delving more deeply into predictive modeling and the RevoScaleR package, this book also provides insights into operationalizing code and exploring and visualizing data. To complete the journey, this book covers the new features in SQL Server 2017 and how they are compatible with R, amplifying their combined power.

Who is this book for?

This book is for data analysts, data scientists, and database administrators with some or no experience in R but who are eager to easily deliver practical data science solutions in their day-to-day work (or future projects) using SQL Server.

What you will learn

  • Get an overview of SQL Server 2017 Machine Learning Services with R
  • Manage SQL Server Machine Learning Services from installation to configuration and maintenance
  • Handle and operationalize R code
  • Explore RevoScaleR R algorithms and create predictive models
  • Deploy, manage, and monitor database solutions with R
  • Extend R with SQL Server 2017 features
  • Explore the power of R for database administrators

Product Details

Publication date : Feb 27, 2018
Length: 338 pages
Edition : 1st
Language : English
ISBN-13 : 9781787283572
Vendor : Oracle


Table of Contents

11 Chapters
Introduction to R and SQL Server
Overview of Microsoft Machine Learning Server and SQL Server
Managing Machine Learning Services for SQL Server 2017 and R
Data Exploration and Data Visualization
RevoScaleR Package
Predictive Modeling
Operationalizing R Code
Deploying, Managing, and Monitoring Database Solutions containing R Code
Machine Learning Services with R for DBAs
R and SQL Server 2016/2017 Features Extended
Other Books You May Enjoy

Customer reviews

Rating distribution: 4.3 (8 Ratings)
5 star 62.5%
4 star 25%
3 star 0%
2 star 0%
1 star 12.5%

Top Reviews

Weiyun Huang Mar 23, 2018
Rating: 5 stars
I am one of the Microsoft developers that contributed to some features described in this book (I got this book from the author, for free, but that doesn't have an effect on my review). Therefore, before reading this book, I didn’t expect to see much I didn't already know. However, I was pleasantly surprised. This book not only tells a complete story – which starts from basic things such as installation, user creation, permission setting and goes all the way to extended features such as Polybase – but also comes with a lot of practical examples that can be utilized in various scenarios. To me that’s the most valuable part, as I am not a data scientist (I did study R for my work but never solved any real problems with it) and I am happy to see that using our product people can do so many things with their data! The book also introduces functionalities from different Microsoft products which help the data scientists on machine learning/data mining tasks (some of them I only heard of and never used myself, and it is very helpful to see detailed instructions and screen shots). Following this book step by step, one can easily learn how to use all those tools and mine their data. I especially like the organization of the chapters – from basic to advanced, each chapter clearly defines prerequisites, provides instructions and examples and gives precise comments. IMHO this book is a good reference for people trying to create their data science solutions.
Amazon verified review
Andrej Apr 03, 2018
Rating: 5 stars
This is a great reference for those who work with Microsoft Machine learning server and SQL Server on a regular basis. It is all you need to know in one place. The writing style is very clear and concise, there are a lot of examples that are easily reproducible if you want to try them on your own.
Amazon verified review
mjfagan May 26, 2018
Rating: 5 stars
I am currently working on the Data Science professional program by Microsoft and am taking the Analyzing Big Data with Microsoft R for one of the course options (note: this book isn't required for the class--I just like having reference books that supplement what I'm learning). I've bought several R programming books but this is the first and so far, only, book I've been able to find about SQL Server Machine Learning Services with R.

I've read through 5 chapters already and I wish I had it months ago when I was getting started programming in R because I had issues here and there with my machine set-up and the book covered some of the problems I had.

If you're new to R programming and machine learning--I'd recommend reading other books that go more into detail about the specifics of the R language, data preparation, modeling, etc. This book doesn't go into the R programming language enough for someone new to the subject to get a handle of the kinds of objects in R, what you can and cannot do with each, etc. Nor does it go into the differences of the different types of regression, how to interpret data to determine features to include/not include in your model, etc.

The book does a great job of going through everything in the machine learning process in a SQL Server environment from setup to deployment. I'm glad I've got it on my desk.
Amazon verified review
NGson Mar 29, 2018
Rating: 5 stars
I am one of the program managers who worked with the release of this feature in SQL Server. (I received a copy this book from the author in exchange for my honest opinion). This book does an excellent job at providing an end-to-end guide to SQL Server Machine Learning Services, covering the feature in the context of the full data science process. It targets the different roles involved in a successful deployment of a Machine Learning project and contains very useful content for each of those roles. The roles I am referring to are Data Scientist, data engineer and database administrator. Another thing that makes this book practically very useful is that the authors provide the code files covered per chapter. This makes this book a great training material for anyone who wants to thoroughly learn how to leverage R and Python in SQL Server.
Amazon verified review
Matthias Riekert Oct 08, 2019
Rating: 5 stars
I have more than 5 years of experience with R. However, when I first tried all of Microsoft R's advertised features, it was very difficult to find useful articles, documentation, and blog posts. With the mentioned sources, I also learned for the Exam 773: Analyzing Big Data with Microsoft R. I passed the exam without the book. After buying the book, however, I have the impression that I could have saved a lot of research time.
Amazon verified review
