The field of Machine Learning (ML) currently lacks a common platform for the development of massively distributed and collaborative computing. As a result, there are impediments to leveraging and reproducing the work of other ML researchers, potentially slowing down the progress of the field. The ubiquity of the browser as a computational engine makes it an ideal platform for the development of massively distributed and collaborative ML. Machine Learning in the Browser (MLitB) is an ambitious software development project whose aim is to bring ML, in all its facets, to an audience that includes both the general public and the research community.
By writing ML models and algorithms in browser-based programming languages, many research opportunities become available. The most obvious is software compatibility: nearly all computing devices can collaborate in the training of ML models by contributing some computational resources to the overall training procedure and can, with the same code, harness the power of sophisticated predictive models on the same devices (see Fig. 1). This goal of ubiquitous ML has several important consequences: training ML models can now occur on a massive, even global scale, with minimal cost, and ML research can now be shared and reproduced everywhere, by everyone, making ML models a freely accessible, public good. In this paper, we present both a long-term vision for MLitB and a light-weight prototype implementation of MLitB, that represents a first step in completing the vision, and is based on an important ML use-case, Deep Neural Networks.
In Section ‘MLITB: Vision’ we describe in more detail our vision for MLitB in terms of three main objectives: (1) make ML models and algorithms ubiquitous, for both the public and the scientific community, (2) create a framework for cheap distributed computing by harnessing existing infrastructure and personal devices as novel computing resources, and (3) design research closures, software objects that archive ML models, algorithms, and parameters to be shared, reused, and, in general, support reproducible research.
MLitB is influenced and inspired by current volunteer computing projects. These and other related projects, including those from machine learning, are presented in Section ‘Related Work.’ Our prototype has exposed several challenges requiring further research and engineering; these are presented in Section ‘Opportunities and Challenges,’ along with discussion of interesting application avenues MLitB makes possible. The most urgent software development directions follow in Section ‘Future MLitB Development.’
Our long-term vision for MLitB is guided by three overarching objectives:
Ubiquitous ML: models can be trained and executed in any web browsing environment without any further software installation.
Cheap distributed computing: algorithms can be executed on existing grid, cloud, etc., computing resources with minimal (and possibly no) software installation, and can be easily managed remotely via the web; additionally, small internet enabled devices can contribute computational resources.
Reproducibility: MLitB should foster reproducible science with research closures, universally readable objects containing ML model specifications, algorithms, and parameters, that can be used seamlessly to achieve the first two objectives, as well as support sharing of ML models and collaboration within the research community and the public at large.
Ubiquitous machine learning
The browser is the most ubiquitous computing platform of our time, running, in some shape or form, on all desktops, laptops, and mobile devices. Software for state-of-the-art ML algorithms and models, on the other hand, consists of very sophisticated libraries written in highly specific programming languages within the ML research community (Bastien et al., 2012; Jia et al., 2014; Collobert, Kavukcuoglu & Farabet, 2011). As research tools, these software libraries have been invaluable. We argue, however, that making ML truly ubiquitous requires writing ML models and algorithms in web programming languages and using the browser as the computational engine.
The software we propose can run sophisticated predictive models on cell phones or super-computers; for the former, this extends the distributed nature of ML to a global internet. By further encapsulating the algorithms and model together, the benefit of powerful predictive modeling becomes a public commodity.
Cheap distributed computing
The usage of web browsers as compute nodes provides the capability of running sophisticated ML algorithms without the expense and technical difficulty of using custom grid or super-computing facilities (e.g., Hadoop cloud computing; Shvachko et al., 2010). It has long been a dream to use volunteer computing to achieve real massive-scale computing. Successes include Seti@Home (Anderson et al., 2002) and protein folding (Lane et al., 2013). MLitB is being developed not only to run natively on browsers but also for scaled distributed computing on existing cluster and/or grid resources and, by harnessing the capacity of non-traditional devices, for extremely massive-scale computing with a global volunteer base. In the former set-up, low communication overhead and homogeneous devices (a “typical” grid computing solution) can be exploited. In the latter, volunteer computing via the internet opens the scaling possibilities tremendously, albeit at the cost of unreliable compute nodes, variable power, limited memory, etc. Both have serious implications for the user, but, most importantly, both are implemented by the same software.
Although the current version of MLitB does not provide GPU computing, it does not preclude its implementation in future versions. It is therefore possible to seamlessly provide GPU computing when available on existing grid computing resources. Using GPUs on mobile devices is a more delicate proposition since power consumption management is of paramount importance for mobile devices. However, it is possible for MLitB to manage power intelligently by detecting, for example, if the device is connected to a power source, its temperature, and whether it is actively used for other activities. A user might volunteer periodic “mini-bursts” of GPU power towards a learning problem with minimal disruption to or power consumption from their device. In other words, MLitB will be able to take advantage of the improvements and breakthroughs of GPU computing for web engines and mobile chips, with minimal software development and/or support.
Reproducible and collaborative research
Reproducibility is a difficult yet fundamental requirement for science (McNutt, 2014). Reproducibility is now considered just as essential for high-quality research as peer review; simply providing mathematical representations of models and algorithms is no longer considered acceptable (Stodden, Guo & Ma, 2013). Furthermore, merely replicating other work, despite its importance, can be given low publication priority (Casadevall & Fang, 2010), even though replication is considered a prerequisite for publication. In other words, submissions must demonstrate that their research has been, or could be, independently reproduced.
For ML research there is no reason not to provide working software that allows reproduction of results (in other fields of science, constraints restricting software publication may exist). Currently, the main bottlenecks are the time cost to researchers of making research available, and the incompatibility of the research (i.e., code) for others, which further increases the time investment for researchers. One of our primary goals for MLitB is to provide reproducible research with minimal to no time cost to both the primary researcher and other researchers in the community. Following Stodden, Borwein & Bailey (2013), we support “setting the default to reproducible.”
For ML disciplines, this means other researchers should not only be able to use a model reported in a paper to verify the reported results, but also to retrain the model using the reported algorithm. This higher standard is difficult and time-consuming to achieve, but fortunately this approach is being adopted more and more often, in particular by a sub-discipline of machine learning called deep learning. In the deep learning community, the introduction of new datasets and competitions, along with innovations in algorithms and modeling, has produced rapid progress on many ML prediction tasks. Model collections (also called model zoos), such as those built with Caffe (Jia et al., 2014), make this collaboration explicit and easy to access for researchers. However, there remains a significant time investment to run any particular deep learning model (including compilation, library installations, platform dependencies, GPU dependencies, etc.). We argue that these are real barriers to reproducible research, and that choosing ubiquitous software and compute engines lowers them. For example, during our testing we converted a very performant computer vision model (Lin, Chen & Yan, 2013) into JSON format, and it can now be used on any browser with minimal effort.1
In a nod to the concept of closures common in functional programming, our approach treats a learning problem as a research closure: a single object containing model and algorithm configuration plus code, along with model parameters, that can be executed (and therefore tested and analyzed) by other researchers.
General architecture and design
The minimal requirements for MLitB are based on the scenario of running the network as public resource computing. The downside of public resource computing is the lack of control over the computing environment. Participants are free to leave (or join) the network at any time, and their connectivity may be variable, with high latency. MLitB is designed to be robust to these potentially destabilizing events. The loss of a participant results in the loss of computational power and data allocation. Most importantly, MLitB must robustly handle new and lost clients, re-allocation of data, and client variability in terms of computational power, storage capacity, and network latency.
Although we are agnostic to the specific technologies used to fulfill the vision of MLitB, in practice we are guided by both the requirements of MLitB and our development constraints. Therefore, as a first step towards implementing our vision, we chose technology pragmatically. Our choices also follow closely the design principles for web-based big data applications (Begoli & Horey, 2012), which recommend popular standards and light-weight architectures. As we will see, some of our choices may be limiting at large scale, but they have permitted a successful small-scale MLitB implementation (with up to 100 clients).
Figure 2 shows the high-level architecture and web technologies used in MLitB. Modern web browsers provide functionality for two essential aspects of MLitB: Web Workers (W3C, 2014) for parallelizing program execution with threads, and Web Sockets (IETF, 2011) for fast bi-directional communication channels between server and browser. To maintain compatibility across browser vendors, there are few alternatives to Web Workers and Web Sockets. These same choices are used in another browser-based distributed computing platform (Cushing et al., 2013).
The general design of MLitB is composed of several parts. A master server hosts ML problems/projects and connects clients to them. The master server also manages the main event loop, where client triggered events are handled, along with the reduce steps of a (bespoke) map-reduce procedure used for computation. When a browser (i.e., a heterogeneous device) makes an initial connection to the master server, a user-interface (UI) client (also known as a boss) is instantiated. Through the UI, clients can add workers that can perform different tasks (e.g., train a model, download parameters, take a picture, etc.). An independent data server serves data to clients using zip files and prevents the master server from blocking while serving data. For efficiency, data transfer is performed using XHR.2 Trained models can be saved into JSON objects at any point in the training process; these can later be loaded in lieu of creating new models.
The master node (server) is implemented in Node.js, with communication between the master and slave nodes handled by Web Sockets. The master server hosts multiple ML problems/projects simultaneously, along with all client connections. All processes within the master are event-driven, triggered by actions of the slave nodes; a router dispatches incoming slave-node messages to the appropriate functions on the master. The master must perform its tasks (data reallocation and distribution, reduce steps) efficiently because the clients sit idle awaiting new parameters before their next work cycle. New clients must also wait until the end of an iteration before joining a network. The MLitB network is dynamic and permits slave nodes to join and leave during processing. The master monitors its connections and is able to detect lost participants. When this occurs, data that was allocated to the lost client is re-allocated to the remaining clients if possible; otherwise it is marked as awaiting allocation.
Clients are browser connections from heterogeneous devices that visit the master server’s URL. Each client interacts through a UI worker, called a boss, and can create slave workers to perform various tasks (see Workers). The boss is the main worker running in a client’s browser: it manages the slave workers and the image download worker and functions as a bridge between the downloader and the slaves. A simple wrapper handles UI interactions and provides input/output to the boss. The boss uses a data worker to exchange zip files with the data server over XHR, and handles unzipping and decoding data for slaves that request it. Clients therefore require no software installation other than their native browser. Clients can contribute to any project hosted by the master server and can trigger several events through the UI worker, including adjusting hyper-parameters, adding data, and adding slave workers (Fig. 3). Most tasks are run in a separate Web Worker thread (including the boss), ensuring a non-blocking and responsive client UI.
In Fig. 3 the tasks implemented using Web Worker threads are shown. At the highest level is the client UI, with which the user interacts with ML problems and controls their slave workers. From the client UI, a user can create a new project, load a project from file, upload data to a project, or add slave workers to a project. Slaves can perform several tasks; the most important is the trainer, which connects to the event loop of an ML project and contributes to its computation (i.e., its map step). Each slave worker communicates directly with the master server using Web Sockets; for most tasks, this communication consists mainly of requesting and receiving model parameters. The training slave has more complicated behavior because it must download data and then perform computation as part of the main event loop. To begin training, the user sets the slave task to train and selects start/restart. This triggers a join event at the master server; model parameters and data are downloaded, and the slave begins computation upon completion of the data download. The user can remove a slave at any time. Other slave tasks are tracking, which requires receiving model parameters from the master and allows users to monitor statistics of the model on a dataset (e.g., classification error) or to execute the model (e.g., classify an image on a mobile device).
Events and software behavior
The MLitB network is constructed as a master–slave relationship, with one server and multiple slave nodes (clients). The setup for computation is similar to a MapReduce network (Dean & Ghemawat, 2008); however, the master server performs several tasks during each iteration of the master event loop, including, but not limited to, a reduce step.
The specific tasks are dictated by events triggered by the clients, such as requests for parameters, new client workers, removed/lost clients, etc. Our master event loop can be considered a synchronized map-reduce algorithm with a user-defined iteration duration T, where values of T may range from 1 to 30 s, depending on the size of the network and the problem. MLitB is not limited to a map-reduce paradigm, and in fact we believe that our framework opens the door to peer-to-peer or gossip algorithms (Boyd et al., 2006). We are currently developing asynchronous algorithms to improve the scalability of MLitB.
Master event loop
The master event loop consists of five steps and is executed by the master server node as long as there is at least one slave node connected. Each loop includes one map-reduce step and runs for at least T seconds. The following steps are executed, in order:
New data uploading and allocation.
New client trainer initialization and data allocation.
Training workers reduce step.
Latency monitoring and data allocation adjustment.
Master broadcasts parameters.
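The five steps above can be sketched as one pass of an event loop. The step functions and their names below are illustrative stand-ins, not the prototype's actual API:

```javascript
// Sketch of one master-loop iteration. Each step function is a stand-in
// for the corresponding operation described in (a)-(e); the loop repeats
// while at least one slave is connected, and each pass lasts at least T ms.
async function masterIteration(state, T, steps) {
  const start = Date.now();
  steps.allocateNewData(state);      // (a) new data uploading and allocation
  steps.initNewTrainers(state);      // (b) new trainer init and data allocation
  steps.reduceGradients(state);      // (c) reduce step over worker gradients
  steps.adjustAllocations(state);    // (d) latency monitoring / re-allocation
  steps.broadcastParameters(state);  // (e) broadcast updated parameters
  // Pad the pass so each iteration lasts at least T milliseconds.
  const elapsed = Date.now() - start;
  if (elapsed < T) await new Promise((resolve) => setTimeout(resolve, T - elapsed));
  return state;
}
```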
(a) New data uploading and allocation
When a client boss uploads data, it communicates directly with the data server using XHR. Once the data server has received the zip file, it sends the data indices and classification labels to the boss. The boss then registers the indices with the master server. Each data index is managed: MLitB stores an allocated index (the worker that is allocated the ID) and a cached index (the worker that has cached the ID). The master ensures that the data allocation is balanced amongst its clients. Once a data set is registered on the master server, the master allocates indices and sends the sets of IDs to workers. Workers can then request data from the boss, which in turn uses its data downloader worker to fetch those worker-specific IDs from the data server. The data server sends a zipped file to the data downloader, which is then unzipped and processed by the boss (e.g., JPEG decoding for images). The zip file transfers are fast but the decoding can be slow. We therefore allow workers to begin computing before the entire dataset is downloaded and decoded, allowing projects to start training almost immediately while data is cached in the background.
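The index bookkeeping described above can be sketched as a small allocation table. All names and structures below are illustrative assumptions, not the prototype's actual code:

```javascript
// Minimal sketch of the master's per-dataset index bookkeeping. Each data
// ID records which worker it is allocated to and which worker has cached
// it; null means "awaiting allocation" / "not yet cached".
function createIndexTable(ids) {
  const table = new Map();
  for (const id of ids) table.set(id, { allocatedTo: null, cachedBy: null });
  return table;
}

// Allocate unassigned IDs round-robin so the load stays balanced.
function allocateBalanced(table, workers) {
  let i = 0;
  for (const entry of table.values()) {
    if (entry.allocatedTo === null) {
      entry.allocatedTo = workers[i % workers.length];
      i += 1;
    }
  }
}

// IDs a given worker still needs to fetch from the data server.
function pendingDownloads(table, worker) {
  const ids = [];
  for (const [id, entry] of table) {
    if (entry.allocatedTo === worker && entry.cachedBy !== worker) ids.push(id);
  }
  return ids;
}
```

Because the allocated and cached indices are kept separately, workers can begin computing on cached IDs while the remainder downloads in the background.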
(b) New client trainer initialization and data allocation
When a client boss adds a new slave, a request to join the project is sent to the master. If there is unallocated data, a balanced fraction of the data is allocated to the new worker. If there is no unallocated data, a pie-cutter algorithm is used to remove allocated data from other clients and assign it to the new client; this prevents unnecessary data transfers. The new worker is sent the set of data IDs it will need to download via the client’s data worker. Once the data has been downloaded and placed in the new worker’s cache, the master adds the new worker to the computation performed at each iteration. The master server is immediately informed when a client or one of its workers is removed from the network.3 Because of this, it can manage the newly unallocated data (that was allocated to the lost client).
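A minimal sketch of a pie-cutter re-allocation, under the assumption that it simply trims each existing worker down to the new balanced share (the prototype's exact policy may differ):

```javascript
// Sketch of a "pie-cutter" re-allocation: when a new worker joins and no
// unallocated data remains, each existing worker gives up a slice of its
// IDs so that all workers end up with a roughly equal share. Only the
// sliced-off IDs move, which keeps data transfer to a minimum.
function pieCutter(allocation, newWorker) {
  const workers = Object.keys(allocation);
  const total = workers.reduce((n, w) => n + allocation[w].length, 0);
  const target = Math.floor(total / (workers.length + 1));
  const slice = [];
  for (const w of workers) {
    // Trim each worker down to the new balanced share.
    while (allocation[w].length > target && slice.length < target) {
      slice.push(allocation[w].pop());
    }
  }
  allocation[newWorker] = slice;
  return allocation;
}
```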
(c) Training workers’ reduce step
The reduce step is completely problem specific. In our prototype, workers compute gradients with respect to model parameters over their allocated data vectors, and the reduce step sums over the gradients and updates the model parameters.
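For the neural-network use-case this can be sketched as a weighted average of worker gradients followed by a parameter step. The names are illustrative, and a plain gradient step stands in for the prototype's actual update rule:

```javascript
// Illustrative reduce step: each worker reports the sum of its gradients
// and how many examples it processed; the master forms the weighted
// average and takes a plain gradient step. A real deployment would plug
// in its own update rule here (e.g., AdaGrad).
function reduceStep(params, reports, learningRate) {
  const total = reports.reduce((n, r) => n + r.count, 0);
  return params.map((p, j) => {
    // Weighted average of the j-th gradient component across workers.
    const g = reports.reduce((s, r) => s + r.gradSum[j], 0) / total;
    return p - learningRate * g;
  });
}
```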
(d) Latency monitoring and data allocation adjustment
The interval T represents both the time of computation and the latency between the client and the master node. The synchronization is stochastic and adaptive. At each reduce step, the master node estimates the latency between the client and the master and informs the client worker how long it should run for. A client does not need to have a batch size because it just clocks its own computation and returns results at the end of its scheduled work time. Under this setting, it is possible to have mobile devices that compute only a few gradients per second and a powerful desktop machine that performs hundreds or thousands. This simple approach also allows the master to account for unexpected user activity: if the user’s device slows or has increased latency, the master will decrease the load on the device for the next iteration. Generally, devices with a cellular network connection communicate with longer delays than hardwired machines. In practice, this means the reduction step in the master node receives delayed responses from slave nodes, forcing it to run the reduction function after the slowest slave node (with largest latency) has returned. This is called asynchronous reduction callback delay.
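The scheduling idea can be sketched as follows; this is a simplified model with hypothetical names, not the prototype's actual estimator:

```javascript
// Sketch of latency-adaptive scheduling: the master measures each client's
// round-trip latency and tells it how long to compute, so its results
// arrive near the end of the T-millisecond iteration. Slow or busy devices
// get a smaller work budget rather than a fixed batch size.
function workBudget(iterationMs, latencyEstimateMs) {
  // Leave room for the reply to travel back; never go below zero.
  return Math.max(0, iterationMs - latencyEstimateMs);
}

// An exponential moving average keeps the estimate stable under jitter.
function updateLatency(prevEstimateMs, observedMs, alpha = 0.3) {
  return (1 - alpha) * prevEstimateMs + alpha * observedMs;
}
```

Under this scheme a phone might compute a handful of gradients per budgeted interval while a desktop computes thousands, yet both return on time.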
(e) Master broadcasts parameters
An array of model parameters is broadcast to each client’s boss worker using XHR; when the boss receives new parameters, they are passed to each of its workers, which then start another computation iteration.
ML use-case: deep neural networks
Scaling behavior of MLitB
Results for power and latency are shown in Fig. 4. Power increases linearly up to 64 slave nodes, at which point a large increase in latency limits additional power gains from new nodes. This is due to a single server reaching the limit of its capacity to process incoming gradients synchronously. Solutions include using multiple server processes, asynchronous updates, and partial gradient communication. Test error, as a function of the number of nodes, is shown in Fig. 5 after 50 iterations (200 s) and 100 iterations (400 s); i.e., each point represents the same wall-clock computation time. This demonstrates the correctness of MLitB for a given model architecture and learning hyperparameters.
Due to the data allocation policy that limits the data vector capacity of each node to 3,000 vectors, experiments with more nodes process more of the training set during the training procedure. For example, a single slave node trains on only 3/60 of the full training set; with 20 nodes, the network trains on the full dataset. This policy could easily be modified to include data refreshment when running with unallocated data.
The primary latency issue is due to all clients simultaneously sending gradients to the server at the end of each iteration. Three simple scaling solutions are: (1) increasing the number of master node processes that receive gradients; (2) using asynchronous update rules (each slave computes for a random amount of time, then sends updates), reducing the load on any one master node process; and (3) partial communication of gradients (decreasing bandwidth).
Walk-through of MLitB prototype
We briefly describe how MLitB works from a researcher’s point of view.
Specification of neural network and training parameters
Using a minimalist UI (not shown), the researcher can specify their neural network, for example by adding/removing layers of different types and adjusting regularization parameters (L1/L2/dropout) and learning rates. Alternatively, the researcher can load a previously saved neural network in JSON format (which may or may not have already been trained). Once a NN is specified (or loaded), it appears in the display, along with other neural networks managed by the master node. By selecting a specific neural network, the researcher can then add workers and data (e.g., project cifar10 in Fig. 6).
Specification of training data
Image classification data is simple to upload using named directory structures for image labels. For example, for CIFAR10 all files in the “apple” subdirectory will be given label “apple” once loaded (e.g., the image file /cifar10/apple/apple_apple_s_000022.png). The entire “cifar10” directory can be zipped and uploaded. MLitB processes JPEG and PNG formats. A test set can be uploaded in tracker mode.
In the training mode, a training worker performs as many gradient computations as possible within the iteration duration T (i.e., during the map step of the main event loop). The total gradient and the number of gradients are sent to the master, which in the reduce step computes a weighted average of gradients from all workers and takes a gradient step using AdaGrad (Duchi, Hazan & Singer, 2011). At the end of the main event loop, new neural network weights are sent via Web Sockets to both trainer workers (for the next gradient computations) and tracker workers (for computing statistics and executing the latest model).
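The parameter update described above can be sketched as a single AdaGrad step (Duchi, Hazan & Singer, 2011) applied to the weighted-average gradient; the function and variable names are illustrative:

```javascript
// Minimal AdaGrad step over an averaged gradient. `cache` accumulates
// per-parameter squared gradients, so frequently updated parameters get
// progressively smaller steps.
function adagradStep(weights, grad, cache, lr = 0.01, eps = 1e-8) {
  return weights.map((w, j) => {
    cache[j] += grad[j] * grad[j];
    return w - (lr * grad[j]) / (Math.sqrt(cache[j]) + eps);
  });
}
```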
There are two possible functions in tracking mode: (1) executing the neural network on test data, and (2) monitoring classification error on an independent data set. For (1), users can predict class labels for images taken with a device’s camera or for locally stored images. Users can also learn a new classification problem on the fly by taking a picture and giving it a label; this is treated as a new data vector, and a new output neuron is added dynamically to the neural network if the label is new. Figure 7 shows a test image being classified by the cifar10-trained neural network. For (2), users create a statistics worker and can upload test images and track their error over time; after each complete evaluation of the test images, the latest neural network received from the master is used. Figure 8 shows the error for cifar10 using a small test set for the first 600 parameter updates.
Archiving trained neural network model
The prototype does not include a research closure specification. However, it does provide easy archiving functionality. At any moment, users can download the entire model specification and current parameter values in JSON format. Users can then share the JSON object or initialize a new training session with it by uploading it during the model specification phase, which provides a high level of reproducibility. Although the JSON object fully specifies the model, it does not include training or testing code. Despite this shortcoming, using a standard protocol is a simple way of providing a lightweight archiving system.
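The archiving round trip can be sketched as follows; the field names are illustrative assumptions, not the prototype's actual JSON schema:

```javascript
// Sketch of JSON archiving: the saved object couples the model
// specification with its current parameter values, so a later training
// session can be resumed by uploading the same object.
function archiveModel(spec, params) {
  return JSON.stringify({ spec, params, savedAt: new Date().toISOString() });
}

function restoreModel(json) {
  const { spec, params } = JSON.parse(json);
  return { spec, params };
}
```

Because the archive is plain JSON, it can be read in any environment, which is what makes this lightweight scheme portable even without packaged training code.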
Limitations of MLitB prototype
In this section we briefly discuss the limitations of the current prototype; later in Section ‘Opportunities and Challenges’ we will discuss the challenges we face in scaling MLitB to a massive level.
Our scaling experiment demonstrates that the MLitB prototype can accommodate up to 64 clients before latency significantly degrades its performance. Latency, however, is primarily affected by the length of an iteration and by the size of the neural network. For longer iterations, latency becomes a smaller portion of the main event loop; for very large neural networks, latency increases due to bandwidth pressure.
As discussed previously, the main computational efficiency loss is due to the synchronization requirement of the master event loop. This requirement causes the master server to be idle while the clients are computing, and the clients to wait while the master processes all the gradients. As the full gradients can be large (more than 1 MB even for small neural networks), the network bandwidth is quickly saturated at the end of a computation iteration and during the parameter broadcast. By changing to an asynchronous model, the master can continuously process gradients and the bandwidth can be maximally utilized. By communicating partial gradients, further efficiency can be attained. We leave this for future work.
There is a theoretical limit of 500 MB data storage per client (the viable memory of a web browser). In our experience, the practical limit is closer to 100 MB, at which point performance is lost due to memory management issues. We found that 1 MB/s bandwidth was achievable on a local network, which meant that MLitB could handle images from MNIST and CIFAR-10 easily, but would stall for larger images. With respect to Deep Neural Networks, the data processing ability of a single node was limited (especially when compared to sophisticated GPU-enabled libraries (Bastien et al., 2012)). Although we were most interested in the scaling performance, we note that naive convolution implementations significantly slow performance. We found that reasonably sized images, up to 100 × 100 × 3 pixels, can be processed on mobile devices in less than a second without convolutions, but can take several seconds with convolutions, limiting their usefulness. In the future, near-native or better implementations will be required for the convolutional layers.
MLitB has been influenced by several technologies and ideas presented by previous authors and by work in different specialization areas. We briefly summarize this related work below.
BOINC (Anderson, 2004) is an open-source software library used to set up a grid computing network, allowing anyone with a desktop computer connected to the internet to participate in computation; this is called public resource computing. Public resource or volunteer computing was popularized by SETI@Home (Anderson et al., 2002), a research project that analyzes radio signals from space in the search for signs of extraterrestrial intelligence. More recently, protein folding has emerged as a significant success story (Lane et al., 2013). Hadoop (Shvachko et al., 2010) is an open-source software system for storing very large datasets and executing user application tasks on large networks of computers. MapReduce (Dean & Ghemawat, 2008) is a general solution for performing computation on large datasets using computer clusters.
Distributed machine learning
The most performant deep neural network models are trained with sophisticated scientific libraries written for GPUs (Bergstra et al., 2010; Jia et al., 2014; Collobert, Kavukcuoglu & Farabet, 2011) that provide orders of magnitude computational speed-ups compared to CPUs. Each implements some form of stochastic gradient descent (SGD) (Bottou, 2010) as the training algorithm. Most implementations are limited to running on the cores of a single machine and by extension the memory limitations of the GPU. Exceptionally, there are distributed deep learning algorithms that use a farm of GPUs (e.g., Downpour SGD (Dean et al., 2012)) and farms of commodity servers (e.g., COTS-HPS (Coates et al., 2013)). Other distributed ML algorithm research includes the parameter server model (Li et al., 2014), parallelized SGD (Zinkevich et al., 2010), and distributed SGD (Ahn, Shahbaba & Welling, 2014). MLitB could potentially push commodity computing to the extreme using pre-existing devices, some of which may be GPU capable, with and without an organization’s existing computing infrastructure. As we discuss below, there are still many open research questions and opportunities for distributed ML algorithm research.
Opportunities and Challenges
In tandem with our vision, there are several directions the next version of MLitB can take, both in terms of the library itself and the potential kinds of applications a ubiquitous ML framework like MLitB can offer. We first focus on the engineering and research challenges we have discovered during the development of our prototype, along with some we expect as the project grows. Second, we look at the opportunities MLitB provides, not only based on the research directions the challenges uncovered, but also novel application areas that are perfect fits for MLitB. In Section ‘Future MLitB Development’ we preview the next concrete steps in MLitB development.
We have identified three key engineering and research challenges that must be overcome for MLitB to achieve its vision of learning models at a global scale.
State-of-the-art neural network models have huge numbers of parameters, which prevents them from fitting onto mobile devices. There are two possible solutions to this problem. The first is to learn or use smaller neural networks. Smaller NN models have shown promise on image classification tasks; in particular, the Network in Network model (Lin, Chen & Yan, 2013) from the Caffe model zoo is 16 MB yet outperforms AlexNet, which is 256 MB (Jia et al., 2014). It is also possible to first train a deep neural network and then use it to train a much smaller, shallow neural network (Ba & Caruana, 2014). The second solution is to distribute the NN (during training and prediction) across clients; an example of this approach is Downpour SGD (Dean et al., 2012).
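The second solution can be sketched in a few lines. Assuming parameters are held in a flat array (a simplification of MLitB's layer-wise storage), a Downpour-style split assigns each client a contiguous shard; `shardParameters` is our own illustrative name, not part of the MLitB codebase:

```javascript
// Illustrative sketch of model parallelism: split a large parameter vector
// into contiguous shards, one per client, so no single device must hold the
// full model. shardParameters is a hypothetical helper, not MLitB code.
function shardParameters(params, numClients) {
  const shardSize = Math.ceil(params.length / numClients);
  const shards = [];
  for (let i = 0; i < params.length; i += shardSize) {
    shards.push(params.slice(i, i + shardSize));
  }
  return shards;
}

// Five parameters split across two clients: one shard of three, one of two.
const shards = shardParameters([0.1, 0.2, 0.3, 0.4, 0.5], 2);
// → [[0.1, 0.2, 0.3], [0.4, 0.5]]
```

In a real deployment each shard would live on a different device, with the training algorithm exchanging activations and gradients at shard boundaries.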
With large models, large numbers of parameters must be communicated regularly. This issue is related to the memory limitation and could benefit from the same solutions. However, given a fixed bandwidth and asynchronous parameter updates, we can ask which parameter updates (from master to client) and which gradients (from client to master) should be communicated. An algorithm could transmit a random subset of the weight gradients, or send the most informative ones. In other words, given a fixed bandwidth budget, we want to maximize the information transferred per iteration.
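As a concrete illustration, the "most informative" strategy could rank gradient entries by magnitude and transmit only the top k under the budget. The sketch below uses a hypothetical `selectGradients` helper, not part of the MLitB codebase:

```javascript
// Illustrative sketch: under a fixed budget of k values per iteration,
// send only the k largest-magnitude gradient entries as a sparse update.
function selectGradients(gradients, k) {
  // Pair each gradient entry with its index, then rank by absolute value.
  const ranked = gradients
    .map((value, index) => ({ index, value }))
    .sort((a, b) => Math.abs(b.value) - Math.abs(a.value));
  // Keep the k most informative entries; the master applies them as a
  // sparse update and leaves the remaining parameters unchanged.
  return ranked.slice(0, k);
}

// A client with a six-parameter gradient and budget k = 2 transmits two
// (index, value) pairs instead of the full vector.
const sparse = selectGradients([0.01, -0.9, 0.05, 0.3, -0.02, 0.7], 2);
// → [{ index: 1, value: -0.9 }, { index: 5, value: 0.7 }]
```

A random-subset strategy would simply replace the ranking with a shuffle; both strategies fit the same fixed-budget interface.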
Massively distributed learning algorithms
The challenges just presented are obvious areas of future distributed machine learning research (and are currently being addressed for the next version of MLitB). Perhaps more interesting, at a higher level, is that the MLitB vision raises novel questions about what it means to train models on a global scale. For instance, what does it mean for a model to be trained across a global internet of heterogeneous and unreliable devices? Is there a single model, or a continuum of models that are consistent locally but differ from one region to another? How should a model adapt over long periods of time? These are largely untapped research areas for ML.
Moving data collection and predictive models onto mobile devices makes it easy to bring models into the field. Connecting users' mobile devices to powerful NN models can aid field research, e.g., through fast labeling and data gathering. For example, a pilot program of crop surveillance in Uganda currently uses bespoke computer vision models for detecting pestilence (insect eggs, leaf diseases, etc.) (Quinn, Leyton-Brown & Mwebaze, 2011). Projects like these could leverage publicly available, state-of-the-art computer vision models to bootstrap their field research.
Privacy preserving computing and mobile health
Our MLitB framework provides a natural platform for the development of real privacy-preserving applications (Dwork, 2008): user information remains on the mobile devices that hold it, yet the data can still be used for valuable model development. The current version of MLitB does not provide privacy-preserving algorithms such as that of Han et al. (2010), but these could easily be incorporated. It would therefore be possible for a collection of personal devices to collaboratively train machine learning models using sensitive data stored locally, with modified training algorithms that guarantee privacy. One could imagine, for example, using privately stored images of a skin disease to build a classifier based on a large collection of disease exemplars, with the data always kept on each patient's mobile device, never shared, and trained using privacy-preserving algorithms.
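As a sketch of how such an algorithm might slot in, each device could clip its gradient and add noise before anything leaves the phone. This is an illustrative differential-privacy-style recipe, not the method of Han et al. (2010) and not MLitB's current API; both function names are our own:

```javascript
// Illustrative sketch: clip each client's gradient and add noise before it
// leaves the device, so the master never sees a raw per-user gradient.
function privatizeGradient(gradient, clipNorm, noiseScale) {
  // Clip the L2 norm to bound any single user's influence on the model.
  const norm = Math.sqrt(gradient.reduce((sum, g) => sum + g * g, 0));
  const scale = norm > clipNorm ? clipNorm / norm : 1;
  // Add zero-mean Gaussian noise to each clipped coordinate.
  return gradient.map(g => g * scale + noiseScale * gaussianSample());
}

// Standard normal sample via the Box-Muller transform.
function gaussianSample() {
  const u = 1 - Math.random(); // in (0, 1], avoids log(0)
  const v = Math.random();
  return Math.sqrt(-2 * Math.log(u)) * Math.cos(2 * Math.PI * v);
}
```

The noise scale would be chosen to meet a formal privacy guarantee; here it is simply a free parameter.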
One of our main objectives was to provide a simple, cheap, distributed computing capability with MLitB. Because MLitB runs with minimal software installation (in most cases requiring none), it is possible to use this framework for low-power-consumption distributed computing. By using existing organizational resources running in low-energy states (dormant or near dormant), MLitB can wake the machines, perform some computing cycles, and return them to their low-energy states. This is in stark contrast to a data center approach, which has near-constant, heavy energy usage (Natural Resources Defense Council, 2014).
Future MLitB Development
Many ML models are constructed as chains of processing modules, which lends itself to a visual programming paradigm where chains can be constructed by dragging and dropping modules together. In this way, models can be visualized, compared, dissected, etc. Algorithms are tightly coupled to the model, and a visual representation of the model can allow interaction with the algorithm as it proceeds. For example, learning rates for each layer of a neural network can be adjusted while monitoring error rates (or even turned off for certain layers), and training modules can be added to improve learning of hidden layers in very deep neural networks, as done in Szegedy et al. (2014). With a visual UI it would be easy to pull in existing, pre-trained models, remove parts, and train on new data. For example, a researcher could start with a pre-trained image classifier, remove the last layer, and easily train a new image classifier, taking advantage of an existing, generalized image representation model.
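The remove-and-retrain workflow in the last example is simple to express if a model is an ordered list of layer specifications, as in MLitB's JSON configurations. The layer fields below are illustrative rather than MLitB's exact schema, and `replaceClassifier` is a hypothetical helper:

```javascript
// Sketch of reusing a pre-trained model: drop the final classifier layer,
// keep the learned representation, and attach a fresh classifier.
function replaceClassifier(layers, numNewClasses) {
  // Everything but the last layer is the reusable representation ("trunk").
  const trunk = layers.slice(0, -1);
  // Attach an untrained classifier for the new task.
  return trunk.concat([{ type: 'softmax', num_classes: numNewClasses }]);
}

const pretrained = [
  { type: 'input', out_sx: 32, out_sy: 32, out_depth: 3 },
  { type: 'conv', filters: 16 },
  { type: 'softmax', num_classes: 1000 },
];
// A 1000-class classifier becomes the starting point for a 10-class task.
const newModel = replaceClassifier(pretrained, 10);
```

Only the new softmax layer needs training from scratch; the trunk layers keep their pre-trained parameters.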
Machine learning library
Implementation of GPU kernels can bring MLitB performance up to the level of current state-of-the-art scientific libraries such as Theano (Bergstra et al., 2010; Bastien et al., 2012) and Caffe (Jia et al., 2014), while retaining the advantages of using heterogeneous devices. For example, balancing computational loads during training is very simple in MLitB, and any learning algorithm can be shared by GPU-powered desktops and mobile devices. Smart phones could take part in the distributed computing process by permitting the training algorithms to use short bursts of GPU power for their calculations, thereby limiting battery drain and user disruption.
Design of research closures
MLitB can save and load JSON model configurations and parameters, allowing researchers to share and build upon other researchers' work. However, it does not quite achieve our goal of a research closure, where all aspects (code, configuration, parameters, etc.) are saved into a single object. In addition to research closures, we hope to develop a model zoo, akin to Caffe's, for posting and sharing research. Finally, some kind of system for verifying models, like recomputation.org, would further strengthen the case for MLitB being truly reproducible (and provide backwards compatibility).
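A research closure would extend the existing JSON save/load into a single self-describing object. A minimal sketch of that direction follows; the field names (including the `format` tag) are illustrative, not MLitB's actual schema:

```javascript
// Minimal sketch: bundle configuration and parameters into one JSON object,
// a step toward a self-contained research closure.
function saveModel(config, parameters) {
  return JSON.stringify({ format: 'mlitb-sketch-v0', config, parameters });
}

function loadModel(json) {
  const { config, parameters } = JSON.parse(json);
  return { config, parameters };
}

// Round trip: what is loaded is exactly what was saved.
const bundle = saveModel({ layers: ['input', 'softmax'] }, [0.5, -0.25]);
const restored = loadModel(bundle);
```

A full closure would additionally embed the training code itself, which is the part the current format lacks.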