Machine Learning in JavaScript with TensorFlow.js


Google released and showcased TensorFlow.js at the TensorFlow Dev Summit 2018. Machine learning in the browser has never been better. Here is the presentation:

ML in JS with TensorFlow.js Video Transcript

All right, hi everyone, thanks for coming today. My name is Daniel, and my name is Nikhil. We're from the Google Brain team, and today we're delighted to talk about JavaScript. So, Python has been one of the mainstream languages for scientific computing, and it's been like that for a while, and there are a lot of tools and libraries around Python. But is that where it ends? We're here today to try to convince you that JavaScript and the browser have a lot to offer, and the TensorFlow Playground is a great example of that. Curious, how many people have seen TensorFlow Playground before?

Oh wow, quite a few. I'm very glad. So, for those of you that haven't seen it, you can check it out after our talk at – it is an in-browser visualization of a small neural network, and it shows in real time all the internals of the network as it's training. This was a lot of fun to make, and it was a huge educational success.

We've been getting emails from high schools and universities that have been using this to teach students about machine learning. After we launched the Playground, we were wondering why it was so successful, and we think one big reason was that it was in the browser. The browser is this unique platform where the things you build can be shared with anyone with just a link, and the people that open your app don't have to install any drivers or any software: it just works.

Another thing is that the browser is highly interactive, so the user is going to be engaged with whatever you're building. Another big thing is that browsers have access to sensors like the microphone, the camera, and the accelerometer (we didn't take advantage of this in the Playground), and all of these sensors are behind standardized APIs that work in all browsers. And the last thing, the most important thing, is that the data that comes from these sensors doesn't ever have to leave the client. You don't have to upload anything to the server, which preserves privacy.

Now, the Playground that we built is powered by a small neural network: 300 lines of vanilla JavaScript that we wrote as a one-off library. It doesn't scale, it's just simple for loops, and it wasn't engineered to be reusable. But it was clear to us that if we were to open the door for people to merge machine learning and the browser, we had to build a library, and we did it.
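To give a flavor of what "just simple for loops" means, here is a minimal sketch (illustrative only, not the Playground's actual code) of one fully connected layer's forward pass written that way in plain JavaScript:

```javascript
// One dense layer's forward pass with nothing but nested for loops:
// no GPU, no library, just arrays of numbers.
function denseForward(inputs, weights, biases) {
  const outputs = [];
  for (let j = 0; j < biases.length; j++) {
    let sum = biases[j];
    for (let i = 0; i < inputs.length; i++) {
      sum += inputs[i] * weights[i][j]; // weighted sum over inputs
    }
    outputs.push(Math.tanh(sum)); // tanh activation, as shown in the Playground UI
  }
  return outputs;
}

const out = denseForward([1, -1], [[0.5, 0.1], [0.5, -0.1]], [0, 0]);
// out[0] = tanh(1*0.5 + (-1)*0.5) = tanh(0) = 0
```

A whole network is just these loops applied layer after layer, which is exactly why this approach is fine for a tiny teaching demo but doesn't scale to real models.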

We released deeplearn.js, a JavaScript library that is GPU-accelerated via WebGL, a standard in the browser that allows you to render 3D graphics; we utilize it to do linear algebra for us. deeplearn.js allows you to both run inference in the browser and do training entirely in the browser. When we released it, we had incredible momentum: the community took the plunge, took existing models in Python, ported them to the browser, and built interactive, fun things with them.

So one example is style transfer. Another person ported the character RNN and then built a novel interface that allows you to explore all the different possible endings of a sentence, all generated by the model in real time.

Another example is a font generative model. There was a post about this where the person that built it allowed users to explore the hidden dimensions, the interesting dimensions, in the embedding space, and you can see how they relate to the boldness and the slantedness of the font. There were even educational examples like Teachable Machine, a fun little game that taught people how computer vision models work by letting them interact directly with the webcam.

Now, all the examples I showed you point to the incredible momentum we have with deeplearn.js, and building on that, we're very excited today to announce that deeplearn.js is joining the TensorFlow family. With that, we are releasing a new ecosystem of libraries and tools for machine learning in JavaScript called TensorFlow.js.

Before we get into the details, I want to go over three main use cases of how you can use TensorFlow.js today with the tools and libraries we're releasing. One use case is that you can write models directly in the browser, and this has huge educational implications; think of the Playground I just showed. A second, major use case is that you can take a pre-existing, pre-trained model in Python, use a script, and import it into the browser to do inference. And a related use case is that the same model you import for inference can be retrained, potentially with private data that comes from those sensors of the browser, in the browser itself.

Now, to give you more of a schematic view: we have the browser, which utilizes WebGL to do fast linear algebra. On top of that, TensorFlow.js has two sets of APIs. There is the Ops API, which used to be deeplearn.js, and we worked hard to align that API with TensorFlow Python; it is powered by an automatic differentiation library built analogous to Eager mode. On top of that we have a high-level Layers API that allows you to use best practices and high-level building blocks to write models. What I'm also very excited to announce today is that we're releasing tools that can take an existing Keras model or TensorFlow SavedModel and port it automatically for execution in the browser.

Now, to show you an example of our API, we're going to go over a small program that tries to learn the coefficients of a quadratic function from data, so the coefficients we're trying to learn are a, b, and c. We have our import of tf from TensorFlow.js; for those of you that don't know, this is a standard ES6 import in JavaScript, very common. We have our three tensors, a, b, and c.
We mark them as variables, which means that they are mutable and the optimizer can change them. We have our f(x) function that does the polynomial computation, and you can see here a familiar API, like tf.add and tf.square, just like TensorFlow. In addition to that API, we also have a chaining API, which allows you to call these math operations on the tensors themselves, and this leads to more readable code that is closer to how we write math. Chaining is very popular in the JavaScript world.

So that's the feed-forward part of the model. Now, for the training part, we need a loss function, and here is a loss function: it's just the mean squared error between the prediction and the label. We have our optimizer, an SGD optimizer, and we train the model by calling optimizer.minimize for some number of epochs. Here I want to emphasize, for those of you that have used TF Eager before, or saw the talk before us, Alex's talk: the API in TensorFlow.js is aligned with the Eager API in Python.

All right, so clearly that's not how most people write machine learning, because those low-level linear algebra ops can be quite verbose, and for that we have our Layers API. To show you an example of that, we're going to build a recurrent neural network that learns to sum two numbers. The complicated part is that those numbers, like "90 + 10", are being fed character by character, and then the neural network has to maintain an internal state with an LSTM cell. That state then gets passed into a decoder, and the decoder has to output "100", character by character, so it's a sequence-to-sequence model.

Now, this may sound a little complicated, but with the Layers API it's not that many lines of code. We have our import of tf from TensorFlow.js, and we have our sequential model, which just means it's a stack of layers. For those of you that are familiar with tf.layers in Python, or with Keras, this API looks very familiar. We have the first two layers of the encoder; the last three layers
are the decoder, and that's our model. We then compile it with a loss, an optimizer, and a metric we want to monitor, like accuracy, and we call model.fit with our data. Now, what I want to point out here is the await keyword: model.fit is an asynchronous call, because in practice it can take about 30 or 40 seconds in the browser, and in those 30 or 40 seconds you don't want the main UI thread of the browser to be locked. This is why you get a callback with a history object after it's done, and in between, the GPU is going to do the work.

Now, the code I showed you is for when you want to write models directly in the browser, but as I said before, a major use case, even with deeplearn.js, has been people importing models that were pre-trained, just to do inference in the browser. Before we jump into the details of that, I want to show you a fun little game that our friends at Google Brand Studio built that takes advantage of a pre-trained model, automatically ported into the browser. The game is called Emoji Scavenger Hunt, and I'm going to show you a real demo here with the phone. It's in the browser: you can see I have a Chrome browser open on a Pixel phone, and you can see the URL at the top. The game uses the webcam and shows me an emoji, and then I have some number of seconds to find the real-world version of that emoji before the time runs out. Nikhil here is going to help me identify the objects that this game asks for. Ready? I'm ready. All right, let's go.

A watch, a watch, does anyone have a watch? Come on, yay, we got that. Let's see what our next item is. A shoe! You've got to help me out here, buddy. Oh yeah, we got the shoe. All right, what's next? Nice, a banana. A banana, anyone? This guy's got a banana, come over here. All right, yeah, high score! Here, beer? Yeah, it's 10:30 in the morning, Daniel. Let's get back to the talk.

All right, so I'm going to jump into some of the technical details of how we actually built that game. What we did was train a model in TensorFlow to be an object recognizer for the scavenger hunt game. We chose about 400 different classes that would be reasonable for a
game like this, you know: watches and bananas and beer. What we did was use the TensorFlow for Poets codelab. In that codelab, what you essentially do is take a pre-trained MobileNet model, and if you don't know what MobileNet is, it's a state-of-the-art computer vision model for edge devices. So what we effectively did was take that model and retrain it for these classes. Now we have an object detector in the Python world; how do we actually get it into the browser? Well, we provide a set of tools today that help you do exactly that. Once it's in, you skin the game and, you know, make the computer talk and all that kind of fun stuff.

Let's jump into how we actually convert that model. As Daniel mentioned earlier, today we support two types of models: we have a converter for TensorFlow SavedModels, and we also have a converter for Keras saved models. So you define your model and you save it as a SavedModel; this is the standard way to do that, and similarly, this is the code you would write for Keras. The next piece is that we actually convert it for the web. Today we're releasing a pip package, tensorflowjs. You can install it, and there's a script in there that lets you point to your TensorFlow SavedModel and to an output directory, and that output directory will be where the static build artifacts for the web will go. Keras is the same exact flow: point to your HDF5 input, and you have an output directory where those build artifacts will go.

Now, you statically host those on your website somewhere, you know, just simple static hosting, and on the JavaScript side we provide an API that lets you load that model. This is what it looks like for TensorFlow, and for the TensorFlow saved model you'll notice that it's a frozen model: we don't currently support continued training of such a model, while in the Keras case we actually let you continue training, and we're working hard to keep
these APIs aligned in the future.

Okay, so under the covers, what are we actually doing? We're doing some graph optimization, which essentially means that we prune out nodes that you don't need to make the prediction; you don't need those in the web. We optimize weights for browser autocaching: we pack and shard them in chunks of 4 megabytes, which helps your browser be quick the next time your page reloads. Today we support about 90 of the most commonly used TensorFlow ops, and we're working very hard to support more, like control flow ops. We also support 32 of the most commonly used Keras layers today, and as I mentioned, we let you continue training for Keras models, and we let you do evaluation as well as make predictions from that model.

Okay, so obviously there's a lot you can do just by porting your models to the web for inference, but since the beginning of deeplearn.js, we've made it a high priority to make sure that you can train directly in the browser. You know, this opens up the door for education and interactive tools, like we saw with the Playground, and it also lets you train with data that never leaves your client, so this is huge for privacy. To show off what you can do with something like this, we built another little game. The goal of the game is to play Pac-Man with your webcam. Now, Daniel is going to be my helper here; he is much, much better at this game than I am, for some reason. Say hi.

So there are three phases of the game. In phase one, we're going to collect frames from the webcam and associate them with the classes up, down, left, and right. Daniel is going to move his head up, down, left, and right, and simply play the game like that. You'll notice that as he's collecting frames he's moving around a little bit; this helps the model see different angles for each class and generalize a little better. So after he's done collecting these frames, we're going to go and train our model, and we're not actually training from scratch.
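A toy, dependency-free sketch of this "not from scratch" idea: a fixed, pre-trained feature extractor (a stand-in for MobileNet, which stays frozen) with a small trainable head on top, which is the only part that actually gets trained. All functions, data, and hyperparameters here are made up for illustration.

```javascript
// A frozen "pre-trained" feature extractor: a stand-in for MobileNet.
// Its behavior is fixed; we never update it.
function features(x) {
  return [x[0] + x[1], x[0] - x[1]];
}

// Train only a tiny linear head on top of the frozen features,
// using per-sample gradient descent on a squared-error loss.
function trainHead(xs, labels, epochs = 500, lr = 0.1) {
  const w = [0, 0];
  let b = 0;
  for (let e = 0; e < epochs; e++) {
    for (let i = 0; i < xs.length; i++) {
      const f = features(xs[i]);
      const err = w[0] * f[0] + w[1] * f[1] + b - labels[i];
      w[0] -= lr * err * f[0]; // only the head's weights change
      w[1] -= lr * err * f[1];
      b -= lr * err;
    }
  }
  return { w, b };
}

// Toy two-class data: label +1 when x0 + x1 > 0, else -1.
const samples = [[1, 1], [2, 0], [-1, -1], [0, -2]];
const labels = [1, 1, -1, -1];
const { w, b } = trainHead(samples, labels);
const predict = x => {
  const f = features(x);
  return w[0] * f[0] + w[1] * f[1] + b;
};
```

Because the frozen extractor has already learned useful features, the head needs only a handful of examples, which is why a few webcam frames per class are enough in the demo.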
Here, when we hit that train button, we're taking a pre-trained MobileNet again, porting it to the web, and doing a retraining phase with that data, which stays local, and we're using the Layers API to do that in the browser. Here, do you want to press that train button? All right, our loss is going down; it looks like we're learning something, that's great. So as soon as we press that play button, what's going to happen is we're going to make predictions from the webcam, those predictions get plugged into those controls, and that's going to control the Pac-Man game. Ready? All right, so you can see in the bottom right it's highlighting the class that it thinks it is, and if he moves his head around, you'll see it change the class, and he's off. So all of this code is online, and we invite you to go fork it. Obviously, this is just a game.
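The last step described above, plugging predictions into the controls, boils down to an argmax over the class scores. Here is a toy sketch in plain JavaScript; the class names and scores are illustrative, not taken from the actual demo code.

```javascript
// Map a model's class scores to a Pac-Man control by picking the
// highest-scoring class (the one highlighted in the bottom right).
const CONTROLS = ['up', 'down', 'left', 'right'];

function predictControl(scores) {
  let best = 0;
  for (let i = 1; i < scores.length; i++) {
    if (scores[i] > scores[best]) best = i; // argmax over class scores
  }
  return CONTROLS[best];
}

predictControl([0.1, 0.05, 0.8, 0.05]); // → 'left'
```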

You can imagine other types of applications of this, like a browser extension that lets you control the page for accessibility purposes. So again, all this code is online; please go fork it, play with it, and make something else with it. Okay, Daniel, I know, this is fun, I know. All right.