Machine learning, sometimes called automatic learning, is a sub-branch of artificial intelligence, and the expression is used more and more often. But what are the underlying processes behind machine learning, and what is it really about?

What is machine learning?

Machine learning is the development and use of computer systems capable of learning from data sets by means of mathematical algorithms and statistical models, without being given explicit rules. Machine learning is divided into two phases:

  • The first is the learning (or training) phase.
  • The second is the use of the model created by the machine to perform a task.

The learning phase consists of providing data to the computer system so that it can analyze the data and build a model. The model must allow the system to perform the desired task without a human having to program that task explicitly.

The second phase consists of using the machine-designed model to perform a well-defined task. Popular examples include document translation and recognizing a specific element in a picture.
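To make the two phases concrete, here is a minimal sketch in Python. It uses a nearest-centroid classifier, chosen only because it is short; the article does not prescribe any particular algorithm. The two-number feature vectors and the "cat" / "not cat" labels are hypothetical toy data standing in for real pictures.

```python
# Minimal sketch of the two phases of machine learning, using a
# nearest-centroid classifier. The feature vectors are hypothetical toy data.
from math import dist


def train(examples):
    """Learning phase: build a model (one average point per label) from data."""
    sums, counts = {}, {}
    for features, label in examples:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}


def predict(model, features):
    """Usage phase: apply the learned model to new, unseen data."""
    return min(model, key=lambda label: dist(model[label], features))


training_data = [([0.9, 0.8], "cat"), ([0.8, 0.9], "cat"),
                 ([0.1, 0.2], "not cat"), ([0.2, 0.1], "not cat")]
model = train(training_data)          # phase 1: learning
print(predict(model, [0.85, 0.75]))   # phase 2: using the model -> cat
```

No rule such as "a cat has a high first feature" is written by hand: the model is derived entirely from the examples passed to `train`.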

Case study

To fully understand the workings of machine learning, let’s study a concrete case.

You want your photo-sharing social network to automatically highlight posts that feature a cat. To do this, you first have to teach the machine to identify photos that feature a cat, and then program the highlighting of such photos on your network. The part we are interested in here is the automated detection of cats in photos.

Step 1: Design of the learning corpus

In order for your machine to learn what a cat looks like in a picture, you need to build a training corpus. This corpus must contain pictures of cats but also pictures of other animals, so that the machine can learn from it to determine whether or not a picture contains a cat.

To go further: in this example, you can annotate your corpus to supervise the learning, labeling the pictures that contain a cat with “Yes” or “True” and the others with “No” or “False”. Whether you annotate depends on the type of machine learning you use!
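A training corpus can be represented very simply in code. The sketch below is an assumption for illustration: the file names are hypothetical, and each picture is paired with a True/False "contains a cat" annotation. The same pictures without their labels form an unannotated corpus, and a quick count checks that both cats and other animals are present.

```python
# Hypothetical annotated training corpus: each entry pairs a picture
# (here just a made-up file name) with a "contains a cat" annotation.
from collections import Counter

annotated_corpus = [
    ("cat_001.jpg", True),
    ("cat_002.jpg", True),
    ("dog_001.jpg", False),
    ("horse_001.jpg", False),
    ("bird_001.jpg", False),
]

# For unsupervised learning, the same pictures are provided without labels.
unannotated_corpus = [picture for picture, _ in annotated_corpus]

# Sanity check: the corpus should contain both cats and other animals.
label_counts = Counter(label for _, label in annotated_corpus)
print(label_counts[True], label_counts[False])  # 2 3
```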

Step 2: Learning phase

The training corpus is provided to the machine, which analyzes the data to extract a model. This phase can last from a few minutes to several hours, depending on the parameters and the type of machine learning used.
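The sketch below illustrates a learning phase with a perceptron, one of the simplest learning algorithms; it is a swapped-in example, not the method of any particular product. The toy features are hypothetical, and `epochs` and `learning_rate` are examples of the "parameters" mentioned above: more epochs mean more passes over the corpus, and therefore a longer learning phase.

```python
# Minimal sketch of a learning phase: a perceptron trained on toy data.
# Features, labels (1 = cat, 0 = not cat) and parameter values are hypothetical.


def train_perceptron(examples, epochs=10, learning_rate=0.1):
    """Learn weights and a bias from (features, label) pairs."""
    weights = [0.0] * len(examples[0][0])
    bias = 0.0
    for _ in range(epochs):               # each epoch is one pass over the corpus
        for features, label in examples:
            prediction = classify(weights, bias, features)
            error = label - prediction    # 0 when the prediction is already right
            weights = [w + learning_rate * error * x
                       for w, x in zip(weights, features)]
            bias += learning_rate * error
    return weights, bias


def classify(weights, bias, features):
    """Return 1 if the weighted score is positive, else 0."""
    score = sum(w * x for w, x in zip(weights, features)) + bias
    return 1 if score > 0 else 0


data = [([0.9, 0.8], 1), ([0.8, 0.9], 1), ([0.1, 0.2], 0), ([0.2, 0.1], 0)]
weights, bias = train_perceptron(data, epochs=20)
print([classify(weights, bias, f) for f, _ in data])  # [1, 1, 0, 0]
```

On a real picture corpus, each pass would cover thousands of images instead of four vectors, which is why this phase can take minutes or hours.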

Step 3: Testing phase

To determine whether the model designed by the machine is reliable, it is recommended to test it. To do this, you need to create a test corpus. A test corpus is similar to the training corpus, except that it is used to evaluate the model rather than to build it. In some cases, test corpora may include data from the training corpus, but this tends to make the results look better than they really are, because the model is then evaluated on examples it has already seen. The model is applied to the test corpus to evaluate it. If the results are satisfactory, the model can be deployed in your photo-sharing social network. If not, it is recommended to improve the model, either by refining the training data or by changing the training parameters.
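One common way to obtain a test corpus is to set aside part of the annotated data before training, so that the two sets do not overlap. The sketch below assumes this approach; `split_corpus`, its `test_fraction` parameter and `toy_model` are hypothetical names, and `toy_model` is a fixed rule standing in for a trained model. The evaluation measures accuracy, the share of test pictures whose prediction matches the annotation.

```python
# Minimal sketch of a testing phase: a disjoint train/test split and an
# accuracy measure. All names and data here are hypothetical.
import random


def split_corpus(corpus, test_fraction=0.25, seed=0):
    """Shuffle a copy of the corpus and split it into training and test sets."""
    items = list(corpus)
    random.Random(seed).shuffle(items)
    n_test = max(1, int(len(items) * test_fraction))
    return items[n_test:], items[:n_test]


def accuracy(model, test_corpus):
    """Fraction of test examples whose predicted label matches the annotation."""
    correct = sum(1 for features, label in test_corpus if model(features) == label)
    return correct / len(test_corpus)


def toy_model(features):
    """Stand-in for a trained model: "cat" when a made-up score exceeds 0.5."""
    return features[0] > 0.5


corpus = [([0.9], True), ([0.7], True), ([0.6], True), ([0.3], False),
          ([0.2], False), ([0.55], False), ([0.1], False), ([0.8], True)]
train_set, test_set = split_corpus(corpus)
print(f"test accuracy: {accuracy(toy_model, test_set):.2f}")
```

If the accuracy is too low, the loop goes back to the training data or the parameters, as described above.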

Step 4: Implementation of the model in the network

When the results of the model tests are satisfactory, all you have to do is deploy the model in your social network and code the highlighting of the photos in which it identifies cats.
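Once deployed, the model becomes just one function called by the rest of the network's code. In this sketch, the `Post` structure and `highlight_cat_posts` are hypothetical, and `fake_model` only checks the file name; it is a placeholder for the trained and tested cat detector, not a real detector.

```python
# Minimal sketch of deployment: the network calls the model on each post's
# photo and highlights the matches. All names here are hypothetical.
from dataclasses import dataclass


@dataclass
class Post:
    post_id: int
    photo: str                # hypothetical: path or URL of the picture
    highlighted: bool = False


def highlight_cat_posts(posts, contains_cat):
    """Mark every post whose photo the model classifies as containing a cat."""
    for post in posts:
        if contains_cat(post.photo):
            post.highlighted = True
    return [post for post in posts if post.highlighted]


def fake_model(photo):
    """Placeholder for the trained model; only inspects the file name."""
    return "cat" in photo


feed = [Post(1, "cat_on_sofa.jpg"), Post(2, "dog_in_park.jpg"), Post(3, "two_cats.jpg")]
print([p.post_id for p in highlight_cat_posts(feed, fake_model)])  # [1, 3]
```

Keeping the model behind a single function makes it possible to replace it with an improved version later without touching the highlighting code.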

The different types of machine learning

As mentioned earlier in the case study, how the training corpus is annotated and how long the learning phase lasts depend on the type of machine learning used.

As a reminder, there are different types of machine learning (see Key concepts of artificial intelligence):

  • Supervised learning, which consists of supervising the learning through an annotated training corpus;
  • Unsupervised learning, which consists of providing an unannotated training corpus, in which the machine must find structure (for example, groups of similar pictures) on its own;
  • Deep learning, which relies on neural networks with many layers and usually analyzes the training corpus many times over; it can be supervised or unsupervised;
  • Artificial neural networks (or neural networks), models made of interconnected layers of simple computing units whose parameters are adjusted by mathematical and statistical algorithms as they analyze the training corpus.
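Unsupervised learning can be illustrated with k-means clustering, a classic unsupervised algorithm used here as a swapped-in example. The feature vectors below are hypothetical and carry no "cat" labels: the algorithm discovers two groups on its own, but it cannot name them, so a human still has to interpret what each group represents. The initial centroids are passed explicitly to keep the result reproducible.

```python
# Minimal sketch of unsupervised learning with k-means clustering.
# The points are hypothetical, unannotated feature vectors.
from math import dist


def k_means(points, centroids, iterations=10):
    """Group points around the given starting centroids; return centroids and clusters."""
    centroids = [list(c) for c in centroids]
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:  # assign each point to its nearest centroid
            nearest = min(range(len(centroids)), key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        for i, members in enumerate(clusters):  # move each centroid to its cluster mean
            if members:
                centroids[i] = [sum(coord) / len(members) for coord in zip(*members)]
    return centroids, clusters


points = [[0.9, 0.8], [0.8, 0.9], [0.85, 0.95], [0.1, 0.2], [0.2, 0.1], [0.15, 0.05]]
centroids, clusters = k_means(points, centroids=[points[0], points[3]])
print(len(clusters[0]), len(clusters[1]))  # 3 3
```

Like most learning algorithms, k-means goes over the data several times, refining its groups at each iteration.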

The choice of the type of machine learning for your project is important, as it affects both cost and time.

The most advanced technologies, namely deep learning and neural networks, are the most expensive and time-consuming. They require large amounts of data, which lengthens their learning phase; training can sometimes last several days.

For learning methods that use annotated corpora, it is the preparatory phase of designing and annotating the corpus that can be long and costly, especially if it cannot be automated.

Conclusion

Machine learning is therefore a powerful technology that allows machines to automate the learning and execution of tasks. However, not all types of machine learning are equal, and not every type suits every project. A reflection phase before the project starts will help you identify the type of machine learning that best fits your project and your budget.

Camille is a computational linguist. After working at two Parisian start-ups on named entity recognition and callbots, she recently joined the Hubi.ai team at Hub Collab as a chatbot scriptwriter.