Machine Learning Projects for Mobile Applications

Mobile Landscapes in Machine Learning

Computers are improving by the day, and device form factors are changing tremendously. In the past, we would only see computers at offices, but now we see them on our home desks, on our laps, in our pockets, and on our wrists. The market is becoming increasingly varied as machines are being equipped with more and more intelligence.

Almost every adult currently carries a device around with them, and it is estimated that we look at our smartphones at least 50 times a day, whether there is a need to or not. These machines affect our daily decision-making processes. Devices are now equipped with applications such as Siri, Google Assistant, Alexa, or Cortana, features that are designed to mimic human intelligence. Their ability to answer almost any query thrown at them makes these technologies appear to be masters of human knowledge. On the backend, these systems improve using the collective intelligence acquired from all users. The more you interact with virtual assistants, the better the results they give.

Despite these advancements, how much closer are we to creating a human brain through a machine? We are in 2018 now. If science discovers a way to control the neurons of our brain, this may be possible in the near future. Machines that mimic the capabilities of a human are helping to solve complex textual, visual, and audio problems. These problems resemble the tasks carried out by a human brain on a daily basis; on average, the human brain makes approximately 35,000 decisions in a day.

While we may be able to mimic the human brain in the future, it will come at a cost. We don't have a cheap solution for it at the moment. The power consumption of a human brain simulation program is its main limitation in comparison to a human brain. The human brain consumes about 20 W of power, while a simulation program consumes about 1 MW of power or more. Neurons in the human brain operate at a speed of 200 Hz, while a typical microprocessor operates at a speed of 2 GHz, which is 10 million times faster.

While we are still far from cloning a human brain, we can implement an algorithm that makes conscious decisions based on previous data as well as data from similar devices. This is where a subset of artificial intelligence (AI) comes in handy. With predefined algorithms that identify patterns in the complex data we have, this type of intelligence can then give us useful information.

When the computer starts making decisions without being instructed explicitly every time, we achieve machine learning (ML) capability. ML is used everywhere right now, including through features such as identifying email spam, recommending the best product to buy on an e-commerce website, tagging your face automatically on a social media photograph, and so on. All of these are done using the patterns identified in historical data, and also through algorithms that reduce unnecessary noise from the data and produce quality output. When the data accumulates more and more, the computers can make better decisions.

Since we have wider access to mobile devices and the amount of time we spend on those devices is rapidly increasing, it makes sense to run ML models on the mobile phone itself. In the mobile phone market, the Android and iOS platforms take the lead and cover almost the whole smartphone spectrum. We will explore how TensorFlow Lite and Core ML work on these mobile platforms.

The topics that will be covered in this chapter are as follows:

ML basics (with an example)
TensorFlow and Core ML basics

Machine learning basics

ML is a concept that describes the process of a set of generic algorithms analyzing your data and providing you with interesting insights, without you writing any code specific to your problem.

Alternatively, we can look at ML as a black box: cutting-edge scientists use it to do something crazy like detecting epilepsy or cancer, yet your simple email inbox uses it to filter spam every day.

On a larger level, ML can be classified into the following two categories:

Supervised learning
Unsupervised learning

Supervised learning

With supervised learning, your main task is to develop a function that maps inputs to outputs. For example, if you have input variables (x) and an output variable (y), then you can use an algorithm to learn the mapping function from the input to the output:

y = f(x)

The goal is to approximate the mapping function so well that when you have new input data (x), you can predict the output variables (y) for it.

For example, suppose you have a bunch of fruits and baskets. You start by labeling the fruits and baskets as apple, banana, strawberry, and so on. Once you have sorted all the fruits into their corresponding baskets, your job is to label each new fruit that comes in. You have already learned all the fruits and their details by labeling them. Based on that previous experience, you can now label the new fruit using attributes such as its color, size, and pattern.
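
The fruit example can be sketched in a few lines of Python. This is only an illustration, not a method used later in the book: the feature values (a redness score and a size in centimeters) and the nearest-neighbor rule are assumptions made up for the example.

```python
import math

# Hypothetical labeled fruits: (redness from 0 to 1, size in cm) -> label.
LABELED_FRUITS = [
    ((0.90, 7.0), "apple"),
    ((0.85, 8.0), "apple"),
    ((0.20, 18.0), "banana"),
    ((0.25, 20.0), "banana"),
    ((0.95, 3.0), "strawberry"),
    ((0.90, 2.5), "strawberry"),
]


def label_fruit(features):
    """Return the label of the closest fruit that was already labeled."""
    nearest = min(LABELED_FRUITS, key=lambda item: math.dist(features, item[0]))
    return nearest[1]


print(label_fruit((0.88, 7.5)))   # a red, medium-sized fruit
print(label_fruit((0.22, 19.0)))  # a pale, long fruit
```

The labeled list plays the role of the baskets: the function never sees a rule such as "red and small means strawberry"; it only compares the new fruit with the examples it was given.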

Unsupervised learning

In this case, you only have input data (x) and no corresponding output variables. The goal for unsupervised learning is to model the underlying structure or distribution in the data in order to learn more about it.

In unsupervised learning, you may not have any labeled data in the beginning. Take the same scenario discussed above for supervised learning: you have a basket full of fruits and you are asked to group them into similar groups. But you don't have any previous data, and no training or labeling has been done earlier. In that case, you need to understand the domain first, because you have no idea whether an input is a fruit or not. You first need to understand the characteristics of every input and then try to match them against every new input. At the final step, you might have sorted all the red fruits into one basket and all the green fruits into another. This grouping is not an accurate classification, though. This is called unsupervised learning.
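
As a rough sketch of this idea, the following Python snippet groups fruits by color without any labels, using a very small k-means loop. The color values are invented for illustration, and seeding the centroids with the first k points is a simplifying assumption that keeps the run deterministic.

```python
import math


def k_means(points, k, rounds=10):
    """Group points into k clusters; returns one cluster index per point."""
    # Simplifying assumption: the first k points seed the centroids.
    centroids = list(points[:k])
    assignment = [0] * len(points)
    for _ in range(rounds):
        # Assignment step: each point joins its nearest centroid.
        assignment = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = tuple(
                    sum(axis) / len(members) for axis in zip(*members)
                )
    return assignment


# Hypothetical (red, green) color intensities of six unlabeled fruits.
fruits = [(0.9, 0.1), (0.2, 0.8), (0.85, 0.15), (0.1, 0.9), (0.95, 0.2), (0.15, 0.85)]
print(k_means(fruits, k=2))  # [0, 1, 0, 1, 0, 1]
```

The output only says which fruits belong together. The algorithm has no idea that group 0 is "red" or that the items are fruits at all, which is exactly the limitation described above.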

Linear regression - supervised learning

Let's look at one simple example of linear regression with its implementation on TensorFlow.

Let's predict the price of a house, looking at the prices of other houses in the same area along with its size information:

We have a house that costs $82,000 and another one at $55,000. Now, our task is to find the price of the third house. We know the sizes of all of the houses along with the prices, and we can map this data in a graph. Let's find out the price of the third house based on the other two data points that we have:

You may wonder how to draw the line now. Draw a random line close to all the points marked on the graph. Now, calculate the distance from each point to the line and add them together. The result of this is the error value. Our algorithm should move toward minimizing the error, because the best fit line has the lowest error value. The procedure of repeatedly adjusting the line to reduce the error is known as gradient descent.
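
To make the error value concrete, here is a minimal sketch that scores two candidate lines. The four points and both candidate lines are illustrative assumptions, and the distance is taken as the vertical distance between a point and the line.

```python
def line_error(points, weight, bias):
    """Sum of vertical distances between each point and y = weight * x + bias."""
    return sum(abs(y - (weight * x + bias)) for x, y in points)


# Hypothetical (size, price) points.
points = [(1, 50), (2, 60), (3, 65), (4, 78)]

rough_guess = line_error(points, weight=5.0, bias=30.0)
better_guess = line_error(points, weight=9.0, bias=41.0)
print(rough_guess, better_guess)  # 83.0 5.0
```

The second line passes much closer to the points, so its error is lower. Gradient descent automates the search for the weight and bias that make this number as small as possible.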

The prices of all the houses in the particular locality are mapped on the graph accordingly. Now, let's plot the values of the two houses that we already know:

After that, let's draw one line that passes closer to most of the values. The line fits the data perfectly. From this, we should be able to identify the price of house number three:

Based on the size given for house number three, we can locate that size on the x axis of the graph. This allows us to find the corresponding point on the line drawn through the points. That point maps to $98,300 on the y axis. This is known as linear regression.

Let's try to translate our problem into the form of a pseudocode:

def estimate_house_price(sqft, location):
    price = 0
    # In my area, the average house costs 2000 per sq.ft
    price_per_sqft = 2000
    if location == "vegas":
        # but some areas cost a bit more
        price_per_sqft = 5000
    elif location == "newyork":
        # and some cost less than that, but still above the average
        price_per_sqft = 4000
    # start with a base price estimate based on how big the place is
    price = price_per_sqft * sqft
    return price

This will be your typical method for estimating the house price. We can keep adding more and more condition checks to this, but it can't be controlled beyond the point at which the number of locations or the parameter increases. For a typical house price estimation, there are a lot of other factors also considered, such as the number of rooms, locations in the vicinity, schools, gas stations, hospitals, water level, transport, and so on. We can generalize this function into something very simple and guess the right answer:

def estimate_house_price(sqft, location): 
 price = < DO MAGIC HERE >
 return price

How can we identify a line that fits perfectly without writing condition checks? Typically, the linear regression line is represented in the following form:

Y = XW + b

In our example, let's put this into a more simple form in order to gain a better understanding:

prediction = X * Weight + bias

Weight is the slope of the line and bias is the intercept (the value of Y when X = 0). After constructing the linear model, we need a cost function for gradient descent to minimize. Here, the cost function is the mean squared error between the predictions and the actual values.
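
Written out, the cost and the update rule could look as follows. This is a sketch under stated assumptions: the factor 1/(2n) is chosen to match the TensorFlow listing later in this section, n is the number of data points, and the symbol α for the learning rate is introduced here for illustration.

```latex
\hat{y}_i = x_i W + b, \qquad
J(W, b) = \frac{1}{2n} \sum_{i=1}^{n} \left( \hat{y}_i - y_i \right)^2

W \leftarrow W - \alpha \, \frac{\partial J}{\partial W}
  = W - \frac{\alpha}{n} \sum_{i=1}^{n} \left( \hat{y}_i - y_i \right) x_i,
\qquad
b \leftarrow b - \alpha \, \frac{\partial J}{\partial b}
  = b - \frac{\alpha}{n} \sum_{i=1}^{n} \left( \hat{y}_i - y_i \right)
```

Each update moves the weight and the bias a small step in the direction that lowers the cost, which is why the error shrinks as the iterations proceed.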

Let's represent the trained model through pseudocode to solve our problem of estimating the house price:

def estimate_house_price(sqft, location):
    price = 0
    # contribution of the size
    price += sqft * 235.43
    # contribution of the location (encoded as a number)
    price += location * 643.34
    # adding a little bit of salt for a perfect result
    price += 191.23
    return price

The values 235.43, 643.34, and 191.23 may look random, but with these values we are able to estimate the price of any new house. How do we arrive at these values? We iterate toward the right values, reducing our error in the right direction at each step:

def estimate_house_price(sqft, location):
    price = 0
    # contribution of the size, starting from a weight of 1.0
    price += sqft * 1.0
    # contribution of the location, starting from a weight of 1.0
    price += location * 1.0
    # adding a little bit of salt for a perfect result
    price += 1.0
    return price

So, we will start our iterations from 1.0 and start minimizing errors in the right direction. Let's put this into code using TensorFlow. We will look at the methods that are used in more depth later on:

#import all the necessary libraries (this listing uses the TensorFlow 1.x graph API)
import tensorflow as tf
import matplotlib.pyplot as plt
import numpy

#Random number generator
randnumgen = numpy.random

#The values that we have plotted on the graph
values_X = numpy.asarray(
    [1,2,3,4,5.5,6.75,7.2,8,3.5,4.65,5,1.5,4.32,1.65,6.08])
values_Y = numpy.asarray(
    [50,60,65,78,89,104,111,122,71,85,79,56,81.8,55.5,98.3])

# Parameters
learning_rate = 0.01
training_steps = 1000
iterations = values_X.shape[0]

# tf float points - graph inputs
X = tf.placeholder("float")
Y = tf.placeholder("float")

# Set the weight and bias
W = tf.Variable(randnumgen.randn(), name="weight")
b = tf.Variable(randnumgen.randn(), name="bias")

# Linear model construction
# y = xw + b
prediction = tf.add(tf.multiply(X, W), b)

#The cost method helps to minimize error for gradient descent. 
#This is called mean squared error.
cost = tf.reduce_sum(tf.pow(prediction-Y, 2))/(2*iterations)

# In TensorFlow, the minimize() method knows how to optimize the values
# for weight & bias.
optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(cost)

#assigning default values 
init = tf.global_variables_initializer()

#We can start the training now
with tf.Session() as sess:

    # Run the initializer. We will see more in detail in later chapters
    sess.run(init)

    # Fit all training data
    for step in range(training_steps):
        for (x, y) in zip(values_X, values_Y):
            sess.run(optimizer, feed_dict={X: x, Y: y})

        # Log the cost once per training step
        c = sess.run(cost, feed_dict={X: values_X, Y: values_Y})
        print("Step:", '%04d' % (step+1), "cost=", "{:.4f}".format(c),
              "W=", sess.run(W), "b=", sess.run(b))

    print("Successfully completed!")
    # with this we can identify the values of Weight & bias
    training_cost = sess.run(cost, feed_dict={X: values_X, Y: values_Y})
    print("Training cost=", training_cost, "Weight=", sess.run(W),
          "bias=", sess.run(b))

    # Let's plot all the values on the graph
    plt.plot(values_X, values_Y, 'ro', label='house price points')
    plt.plot(values_X, sess.run(W) * values_X + sess.run(b),
             label='Line Fitting')
    plt.legend()
    plt.show()

You can find the same code in our GitHub repository (https://github.com/PacktPublishing/Machine-Learning-Projects-for-Mobile-Applications) under Chapter01.
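
To see what the optimizer is doing under the hood, the same fit can be sketched in plain Python, without TensorFlow. This is an illustrative variant rather than the listing above: it updates the weight and bias once per pass over all points (batch gradient descent) instead of once per point, and it runs for 20,000 steps, an assumed value chosen so that the result settles. The closed-form least-squares line is included only as a cross-check.

```python
# Same data points as the TensorFlow listing above.
values_x = [1, 2, 3, 4, 5.5, 6.75, 7.2, 8, 3.5, 4.65, 5, 1.5, 4.32, 1.65, 6.08]
values_y = [50, 60, 65, 78, 89, 104, 111, 122, 71, 85, 79, 56, 81.8, 55.5, 98.3]


def fit_line(xs, ys, learning_rate=0.01, steps=20000):
    """Batch gradient descent on the mean squared error, starting from 1.0."""
    weight, bias = 1.0, 1.0
    n = len(xs)
    for _ in range(steps):
        errors = [weight * x + bias - y for x, y in zip(xs, ys)]
        # Gradients of cost = sum(error ** 2) / (2 * n)
        grad_weight = sum(e * x for e, x in zip(errors, xs)) / n
        grad_bias = sum(errors) / n
        weight -= learning_rate * grad_weight
        bias -= learning_rate * grad_bias
    return weight, bias


def least_squares(xs, ys):
    """Closed-form best-fit line, used here only as a cross-check."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    return slope, mean_y - slope * mean_x


weight, bias = fit_line(values_x, values_y)
print("gradient descent:", round(weight, 3), round(bias, 3))
print("closed form:     ", *(round(v, 3) for v in least_squares(values_x, values_y)))
```

Both lines of output should agree, which shows that the iterative search starting from 1.0 ends up at the same line that the formula gives directly.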

TensorFlow Lite and Core ML

A good starting point for this book would be to get our hands dirty playing with the ML model dataset and training the model. It will be useful to jump in quickly in further chapters. We are not going to deal with basic ML algorithms here; instead, this will be more of a practical-based approach. You can download the complete code base from our GitHub repository (https://github.com/intrepidkarthi/MLmobileapps).

Throughout this book, we will deal with two frameworks: TensorFlow Lite and Core ML. These two frameworks are tightly coupled with Android and iOS. We will look into the basics of ML on a mobile device with TensorFlow Lite. It is assumed that the reader knows the basics of TensorFlow and basic ML algorithms, because this book is not going to cover those elements.

As previously stated, every one of us is holding a smartphone in our pocket almost all of the time. We have a rich amount of data that comes from the sensors available on these devices. As well as this, we have data that is coming from edge devices. At the time of writing this book, there are close to 23 billion devices under this category, including smart speakers, smart watches, and smart sensors. High-end technologies that used to only be available on costlier devices are now available on cheaper devices as well. This exponential rate of growth for these devices paves the way to ML on these devices.

While there are many reasons to run ML on the device, the foremost reason is latency. If you are processing video or audio, you don't want to keep sending data to the server and back. Another advantage is that you can do the processing when the device is offline. Importantly, the data stays on the device itself, local to the user. Avoiding these round trips is also more energy-efficient in terms of battery/power consumption.

While these look like clear advantages, there are also a few cons to this approach. Most of our devices run on batteries with limited capacity, and have less processing capability and strict memory constraints. The full TensorFlow framework is not designed for these constraints, which is why a separate framework was created that works efficiently under all these conditions. TensorFlow Lite is a lightweight, energy- and memory-efficient framework that runs on embedded, small-form-factor devices.

TensorFlow Lite

The TensorFlow Lite framework consists of five high-level components. All of these components are optimized to run on a mobile platform as shown below in the architecture diagram:

Here are the core units of the TensorFlow Lite architecture:

The first part is to convert your existing model into a TensorFlow Lite-compatible model (.tflite) using the TensorFlow Lite Converter, and have your trained model on the disk itself. You can also use the pre-trained model in your mobile or embedded applications.
Java/C++ API: the API loads the .tflite model and invokes the interpreter. The C++ API is available on all platforms. The Java API is a wrapper written on top of the C++ API, and it is available only on Android.
Interpreter and kernels: the interpreter module operates with the help of operation kernels. It loads kernels selectively; the size of the core interpreter is 75 KB. This is a significant reduction from the 1.1 MB required by TensorFlow Mobile. With all the supported ops, the interpreter size comes to 400 KB. Developers can selectively choose which ops they want to include. In that way, they can keep the footprint small.

H/W accelerated delegates: on select Android devices, the interpreter will use the Android Neural Networks API (NNAPI) for hardware acceleration, or default to CPU execution if none are available.

You can also implement custom kernels using the C++ API that can be used by the interpreter.

Supported platforms

TensorFlow Lite currently supports Android/iOS platforms as well as Linux (for example Raspberry Pi) platforms. On embedded devices such as Raspberry Pi, the Python API helps. On iOS platforms, Core ML models are supported as well.

On iOS platforms, from the pre-trained TensorFlow model, we can directly convert the format into the Core ML model where the app will directly run on the Core ML runtime:

With a single model, we can run the model on both Android/iOS platforms by converting the formats.

TensorFlow Lite memory usage and performance

TensorFlow Lite uses FlatBuffers for the model. FlatBuffers is a cross-platform, open source serialization library. The main advantage of using FlatBuffers is that the data can be accessed directly, without first parsing or unpacking it into a secondary representation, a step that is often coupled with per-object memory allocation. FlatBuffers is also more memory-efficient than Protocol Buffers, which helps us keep the memory footprint small.
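
The idea of reading data without an unpacking step can be illustrated with Python's struct module. Note that this is a toy layout invented for the example and not the real FlatBuffers wire format: a header of two offsets is followed by the values they point to, and a single value is read straight from the buffer.

```python
import struct

# Toy layout (NOT the real FlatBuffers format): a header of two uint32
# offsets, followed by the float payloads those offsets point to.
weights = [0.5, -1.25, 2.0]
bias = [0.75]
header_size = 8
weights_offset = header_size
bias_offset = weights_offset + 4 * len(weights)
buffer = (
    struct.pack("<II", weights_offset, bias_offset)
    + struct.pack("<3f", *weights)
    + struct.pack("<1f", *bias)
)


def read_float(buf, field_index, element_index):
    """Read one float straight from the buffer, without unpacking the rest."""
    (offset,) = struct.unpack_from("<I", buf, 4 * field_index)
    (value,) = struct.unpack_from("<f", buf, offset + 4 * element_index)
    return value


view = memoryview(buffer)  # a view on the bytes, not a copy
print(read_float(view, 0, 1))  # second weight: -1.25
print(read_float(view, 1, 0))  # the bias: 0.75
```

No intermediate list or object is built to answer the query; only the header entry and the requested four bytes are touched. That is the property that keeps loading fast and the memory footprint small.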

FlatBuffers was originally developed for gaming platforms. It is also used on other platforms because they, too, are performance-sensitive. At the time of conversion, TensorFlow Lite pre-fuses the activations and biases, allowing TensorFlow Lite to execute faster. The interpreter uses static memory and execution plans that allow it to load faster. The optimized operation kernels run faster with NEON on ARM platforms.

TensorFlow Lite takes advantage of all innovations that happen on a silicon level on these devices. TensorFlow Lite supports the Android NNAPI. At the time of writing, a few original equipment manufacturers (OEMs) have started using the NNAPI. TensorFlow Lite also uses direct graphics acceleration, through Open Graphics Library (OpenGL) on Android and Metal on iOS.

To improve performance, TensorFlow Lite also supports quantization. This is a technique for storing numbers and performing calculations on them in a more compact, lower-precision format. This helps in two ways. Firstly, the smaller the model, the better it suits smaller devices. Secondly, many processors have specialized SIMD instruction sets, which process fixed-point operands much faster than they process floating point numbers. A very naive way to do quantization would be to simply shrink the weights and activations after you are done training. However, this leads to suboptimal accuracies.
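
The naive approach mentioned above can be sketched as follows. This is only an illustration of the general idea, not TensorFlow Lite's actual quantization scheme: the weight values are made up, and each float is mapped onto one of 256 integer levels so that it fits into a single byte.

```python
def quantize(values, levels=256):
    """Map floats onto integers 0..levels-1 using a scale and an offset."""
    low, high = min(values), max(values)
    # Fall back to a scale of 1.0 when all values are identical.
    scale = (high - low) / (levels - 1) or 1.0
    quantized = [round((v - low) / scale) for v in values]
    return quantized, scale, low


def dequantize(quantized, scale, low):
    """Recover approximate floats from the stored integers."""
    return [q * scale + low for q in quantized]


# Hypothetical trained weights.
weights = [-1.0, -0.5, 0.0, 0.254, 1.55]
quantized, scale, low = quantize(weights)
restored = dequantize(quantized, scale, low)

print(quantized)  # [0, 50, 100, 125, 255]
print(max(abs(w - r) for w, r in zip(weights, restored)))
```

Every stored value now needs one byte instead of the four bytes of a 32-bit float, at the price of a small rounding error (here at most half of the scale). That rounding error is the reason why quantizing only after training can cost accuracy.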

TensorFlow Lite gives three times the performance of TensorFlow on MobileNet and Inception-v3. While TensorFlow Lite only supports inference, it will soon be adapted to also have a training module in it. TensorFlow Lite supports around 50 commonly used operations.

It supports MobileNet, Inception-v3, ResNet50, SqueezeNet, DenseNet, Inception-v4, SmartReply, and others:

The y axis in the graph is measured in milliseconds.

Hands-on with TensorFlow Lite

With TensorFlow Lite, you can use an existing model to quickly start building your first TensorFlow Lite-based application:

Using TensorFlow Lite in real time consists of four steps:

In the first step, we need to either use an existing model or prepare our own model and train it.
Once the model is ready, it needs to be converted into .tflite format using converters.
Then, we can write ops on top of it for any kind of optimization.
You can start writing your hello world project.

Let's jump straight into the code from here.

Converting SavedModel into TensorFlow Lite format

Converting your ML model into a TensorFlow Lite model takes only a few lines of code. Here is a simple Python snippet that converts an existing SavedModel into the .tflite format. The tf.contrib.lite module used by early releases was removed together with tf.contrib in TensorFlow 2.0, so the snippet uses the tf.lite.TFLiteConverter class instead:

import tensorflow as tf

# Point the converter at the directory that holds the SavedModel
converter = tf.lite.TFLiteConverter.from_saved_model("/tmp/your_model")
tflite_model = converter.convert()

# convert() returns the serialized model as bytes
with open("/tmp/my_model.tflite", "wb") as f:
    f.write(tflite_model)

The code here converts an existing TensorFlow SavedModel into the TensorFlow Lite format, which is based on FlatBuffers. There are a few conversion strategies that need to be followed.

Strategies

We implement the following strategies:

Use a frozen graphdef (or SavedModel)
Avoid unsupported operators

Use visualizers to understand the model (TensorBoard and TensorFlow Lite visualizer)
Write custom operators for any missing functionality
If anything is missed out, file an issue with the community

We will see these strategies in detail when we go further into practical applications in future chapters.
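
As a small illustration of the second strategy, the following sketch compares the operators used by a model with the operators a runtime can execute. Everything here is hypothetical: the supported set is a made-up subset, and MY_CUSTOM_NORM stands in for an operator that would need a custom implementation.

```python
# Hypothetical subset of operators that the target runtime can execute.
SUPPORTED_OPS = {
    "CONV_2D",
    "DEPTHWISE_CONV_2D",
    "AVERAGE_POOL_2D",
    "RESHAPE",
    "SOFTMAX",
}


def find_unsupported(model_ops, supported=SUPPORTED_OPS):
    """Return the operators the runtime lacks, in first-seen order."""
    missing = []
    for op in model_ops:
        if op not in supported and op not in missing:
            missing.append(op)
    return missing


# Hypothetical list of operators found in a model graph.
model_ops = ["CONV_2D", "DEPTHWISE_CONV_2D", "MY_CUSTOM_NORM", "SOFTMAX", "MY_CUSTOM_NORM"]
print(find_unsupported(model_ops))  # ['MY_CUSTOM_NORM']
```

An empty result means the model can be converted as it is; anything else is a candidate for either restructuring the model or writing a custom operator.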

TensorFlow Lite on Android

We can start using the demo app provided in the TensorFlow GitHub repository. This is a camera application that classifies images continuously using either a floating point Inception-v3 model or a quantized MobileNet model. Try this using Android Version 5.0 or later.

The demo app can be found at: https://github.com/tensorflow/tensorflow/tree/master/tensorflow/contrib/lite/java/demo/app.

This application performs real-time classification of frames. It displays the most probable classification categories, along with the time taken to detect the object.

There are three ways to get the demo app on your device:

You can download the APK binary, which is pre-built
You can build on Android Studio and run the application
You can use Bazel to download the source code of TensorFlow Lite, and run the app through the command line

Downloading the APK binary

This is the easiest way to try the application.

Once you install the app, start the application. When you open the app for the first time, it will prompt you to access the device camera using runtime permissions. Once the permissions are enabled, you can use the app to recognize objects in the real-time back camera view. In the results, you can see the top three classifications for the identified object, along with the latency.

TensorFlow Lite on Android Studio

You can download and build TensorFlow Lite directly from Android Studio by following these steps:

Download and install the latest version of Android Studio.
In your studio settings, make sure that the NDK version is greater than 14 and the SDK version is greater than 26. We are using 27 in this book and on further applications. We will look in detail at how to configure this in further projects.
You can download the application from the link in the following information box.
As Android Studio instructs, you need to install all the Gradle dependencies.

The demo app can be found at: https://github.com/tensorflow/tensorflow/tree/master/tensorflow/contrib/lite/java/demo/app/src/main/java/com/example/android/tflitecamerademo.

We need a model in order to use it in the application. We can either use an existing model or train our own model. Let's use an existing model in this application.

You can download the models, including a zipped model file, from the link given in the following information box.

You can download an Inception-v3 floating point model or the latest MobileNet model. Copy the appropriate .tflite file to the Android app's assets directory, tensorflow/contrib/lite/java/demo/app/src/main/assets/. You can then change the classifier in the Camera2BasicFragment.java file.

The models can be downloaded from: https://github.com/tensorflow/tensorflow/blob/master/tensorflow/contrib/lite/g3doc/models.md.

Now, you can build and run the demo app.

Building the TensorFlow Lite demo app from the source

As a first step, clone the TensorFlow repo. You need Bazel to build the APK:

git clone https://github.com/tensorflow/tensorflow

Installing Bazel

If Bazel is not installed on your system, you need to install it. The instructions in this book were written on macOS High Sierra 10.13.2, where Bazel is installed through Homebrew.

Installing using Homebrew

The following are the steps to install Homebrew:

Bazel has a dependency on the JDK, which you need to install first. Download the latest JDK from the Oracle website and install it.
Then, install Homebrew.

You can run the following script directly from Terminal:

/usr/bin/ruby -e "$(curl -fsSL \
   https://raw.githubusercontent.com/Homebrew/install/master/install)"

Once Homebrew is installed, you can install Bazel with the following command:

brew install bazel

All is well. Now, you can verify the Bazel version using the command shown here:

bazel version

If Bazel is already installed, you can upgrade the version using this command:

brew upgrade bazel

Note that Bazel does not currently support Android builds on Windows. Windows users should download the pre-built binary.

Installing Android NDK and SDK

You need Android NDK to build the TensorFlow Lite code. You can download this from NDK Archives, accessed through the following link.

Android NDK Archives can be downloaded from: https://developer.android.com/ndk/downloads/older_releases.

Android Studio comes with SDK tools. You need build tools version 23 or higher (the application runs on devices with API 21 or higher).

Update the WORKSPACE file in the root of the repository with the API level and the paths to both the SDK and the NDK. If you open SDK Manager from Android Studio, you can find the SDK path. For example, note the following for the SDK:

android_sdk_repository (
 name = "androidsdk",
 api_level = 27,
 build_tools_version = "27.0.3",
 path = "/Users/coco/Library/Android/sdk",
)

And for the Android NDK:

android_ndk_repository(
 name = "androidndk",
 path = "/home/coco/android-ndk-r14b/",
 api_level = 19,
)

At the time of writing, android-ndk-r14b-darwin-x86_64.zip is used from the NDK Archives. You can adjust the preceding parameters based on the availability.

Now, we are ready to build the source code. To build the demo app, run Bazel:

bazel build --cxxopt=--std=c++11 \
 //tensorflow/contrib/lite/java/demo/app/src/main:TfLiteCameraDemo

Caution: Due to a bug, Bazel only supports the Python 2 environment right now.

MobileNet is a good place to start with ML. The quantized MobileNet model expects input images of 224 * 224 pixels (Inception-v3 expects 299 * 299 pixels), so the demo app resizes each camera frame to match the input size of the model. A resized frame occupies 224 * 224 * 3 bytes in memory, where the number 3 represents the RGB values of a pixel. These bytes are then converted into a ByteBuffer row by row.
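
The size calculation and the row-by-row conversion can be sketched in Python. This is not the demo app's Java code; the all-grey frame is a hypothetical stand-in for a resized camera image, and a bytearray plays the role of the ByteBuffer.

```python
WIDTH, HEIGHT, CHANNELS = 224, 224, 3


def to_byte_buffer(pixels):
    """Flatten rows of (r, g, b) pixels into one bytearray, row by row."""
    buffer = bytearray()
    for row in pixels:
        for r, g, b in row:
            buffer.extend((r, g, b))
    return buffer


# A hypothetical all-grey frame standing in for a resized camera image.
frame = [[(128, 128, 128)] * WIDTH for _ in range(HEIGHT)]
buffer = to_byte_buffer(frame)

print(len(buffer))                # 150528
print(WIDTH * HEIGHT * CHANNELS)  # 150528
```

One byte per color channel is enough here because the quantized model works on 8-bit values; a floating point model would need four bytes per channel instead.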

The demo app here uses the TensorFlow Lite Java API, which takes a single image as input and produces a single output. The output is a two-dimensional array: the first dimension holds the category index, and the second dimension holds the confidence value of the classification. From these values, the app displays the top three to the user on the frontend.
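
Picking the top three results from the confidence values could be sketched as follows. The labels and confidence values are invented for illustration, and the demo app itself does this in Java rather than Python.

```python
import heapq


def top_k(labels, confidences, k=3):
    """Return the k (label, confidence) pairs with the highest confidence."""
    return heapq.nlargest(k, zip(labels, confidences), key=lambda pair: pair[1])


# Hypothetical labels and the confidence the model assigned to each.
labels = ["cat", "dog", "keyboard", "mug", "banana"]
confidences = [0.05, 0.10, 0.60, 0.20, 0.05]

print(top_k(labels, confidences))
# [('keyboard', 0.6), ('mug', 0.2), ('dog', 0.1)]
```

The position of a confidence value in the output array is the category index, which is why the labels and confidences can simply be paired up by position.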

TensorFlow Lite on iOS

Now, we will build the same application on the iOS environment. The app has the same features, and we will also use the same quantized MobileNet model. We need to run it on a real iOS device to use the camera functionality; it won't work on a simulator.

Prerequisites

To begin using Xcode, you need to have a valid Apple developer ID. This application also requires an iPhone since it uses the camera module. You need to have a provisioning profile assigned to the particular device. Only then will you be able to build and run the application on the device.

You can clone the complete TensorFlow repository, but to run this application you may not need the complete source code. If you have downloaded it already, you don't need to do it again:

git clone https://github.com/tensorflow/tensorflow

Install the Xcode command-line tools, as shown here:

xcode-select --install

Building the iOS demo app

If you are not very familiar with iOS application building, please look at some basic tutorials for this. You need to install CocoaPods to install all the dependencies:

sudo gem install cocoapods

There is a shell script available to download the model files required to run this application:

sh tensorflow/contrib/lite/examples/ios/download_models.sh

You can go to the project directory and install the pods from the command line:

cd tensorflow/contrib/lite/examples/ios/camera
pod install
pod update

Once the update is done, you should be able to see tflite_camera_example.xcworkspace. Then, you can open the application in Xcode. You can use the following command as well:

open tflite_camera_example.xcworkspace

It is now time to build and run the application on your iPhone.

You need to allow the app the user permissions for camera usage. Use the camera to point to objects, and start seeing the classification results!

Core ML

Core ML helps us to build ML applications for iOS platforms.

Core ML uses trained models that make predictions based on new input data. For example, a model that's been trained on a region's historical land prices may be able to predict the price of land when given the details of locality and size.

Core ML acts as a foundation for other frameworks that are domain-specific. The major frameworks that Core ML supports include GameplayKit to evaluate learned decision trees, natural language processing (NLP) for text analysis, and the Vision framework for image-based analysis.

Core ML is built on top of Accelerate, Basic Neural Network Subroutines (BNNS), and Metal Performance Shaders, as shown in the architecture diagram from the Core ML documentation:

With the Accelerate framework, you can do mathematical computations on a large scale as well as calculations based on images. It is optimized for high performance and also contains APIs written in C for vector and matrix calculations, digital signal processing (DSP), and other computations.
BNNS helps to implement neural networks. It is a collection of functions for implementing and running neural networks using previously obtained training data.

With the Metal framework, you can render advanced three-dimensional graphics and run parallel computations using the GPU device. It comes with the Metal shading language, the MetalKit framework, and the Metal Performance Shaders framework. The Metal Performance Shaders framework is tuned to work with the hardware features of each GPU family for optimal performance.

Core ML applications are built on top of the three layers of components mentioned, as shown in the following diagram:

Core ML is optimized for on-device performance, which minimizes memory footprint and power consumption.

Core ML model conversion

To run your first application on iOS, you don't need to start building your own model. You can use any one of the best existing models. If you have a model that is created using another third-party framework, you can use the Core ML Tools Python package, or third-party packages such as MXNet converter or TensorFlow converter. The links to access these tools are given next. If none of these converters supports your model's format, you can also write your own converter.

Core ML Tools Python package can be downloaded from: https://pypi.org/project/coremltools/
TensorFlow converter can be accessed through the link: https://github.com/tf-coreml/tf-coreml
MXNet converter can be downloaded from: https://github.com/apache/incubator-mxnet/tree/master/tools/coreml

The Core ML Tools Python package supports conversion from Caffe v1, Keras 1.2.2+, scikit-learn 0.18, XGBoost 0.6, and LIBSVM 3.22 frameworks. This covers models of SVM, tree ensembles, neural networks, generalized linear models, feature engineering, and pipeline models.

You can install Core ML tools through pip:

pip install -U coremltools

Converting your own model into a Core ML model

Converting your existing model into a Core ML model can be done through the coremltools Python package. If you want to convert a simple Caffe model to a Core ML model, it can be done with the following example:

import coremltools

# Convert the Caffe model, then save the result in the Core ML format
my_coremlmodel = coremltools.converters.caffe.convert('faces.caffemodel')
my_coremlmodel.save('faces.mlmodel')

This conversion step varies between different models. You may need to add labels and input names, as well as the structure of the model.

Core ML on an iOS app

Integrating Core ML into an iOS app is pretty straightforward. Download the pre-trained MobileNet model from the Apple developer page.

After you download MobileNet.mlmodel, add it to the Resources group in your project. The vision framework eases our problems by converting our existing image formats into acceptable input types. You can see the details of your model as shown in the following screenshot. In the upcoming chapters, we will start creating our own models on top of existing models.

Let's look at how to load the model into our application:

Open ViewController.swift in your recently created Xcode project, and import both Vision and Core ML frameworks:

// At the top of ViewController.swift
import CoreML
import UIKit
import Vision

/**
 Passes the UIImage to the Vision framework for the prediction.
 The results could be slightly different based on the UIImage conversion.
 */
func visionPrediction(image: UIImage) {
    guard let visionModel = try? VNCoreMLModel(for: model.model) else {
        fatalError("World is gonna crash!")
    }
    let request = VNCoreMLRequest(model: visionModel) { request, error in
        if let predictions = request.results as? [VNClassificationObservation] {
            // top predictions sorted based on confidence
            // results come as (String, Double) tuples
            let topPredictions = predictions.prefix(5)
                .map { ($0.identifier, Double($0.confidence)) }
            self.show(results: topPredictions)
        }
    }
    // The request does nothing until a handler performs it on the image
    guard let cgImage = image.cgImage else { return }
    let handler = VNImageRequestHandler(cgImage: cgImage)
    try? handler.perform([request])
}

Let's load the same image through the Core ML MobileNet model for the prediction:

/**
 Predicts objects in the image using Core ML directly. The only downside of
 this method is that the mlmodel expects images at a resolution of
 224 * 224 pixels, so we need to manually convert the UIImage into a
 pixel buffer.
 */
func coremlPrediction(image: UIImage) {
    // pixelBuffer(width:height:) and top(_:_:) are assumed to be helper
    // functions defined elsewhere in the project; they are not part of
    // UIKit or Core ML
    if let makeBuffer = image.pixelBuffer(width: 224, height: 224),
       let prediction = try? model.prediction(data: makeBuffer) {
        let topPredictions = top(5, prediction.prob)
        show(results: topPredictions)
    }
}