Over 1,000 attendees, 50 speakers and 30 exhibitors: this is a brief summary of the first AI Europe conference, AI Europe 2016, which I was lucky enough to attend in London on 5 and 6 December. The attendee list boasted some of the biggest names in artificial intelligence, such as Microsoft, Bell Labs, Uber, Samsung and Nvidia. It also included innovative start-ups such as Blippar and DreamQuark, whose products are based on machine learning or deep learning models.
Further advances in artificial intelligence are almost certainly still to come. Even so, the leading players agree that most AI techniques and technologies are now mature. Their main concern today is therefore the quality of the data sets used to train and validate their machine learning and deep learning models.
Across the presentations at AI Europe 2016, two key themes emerged:
- The importance of building high-quality, reusable data sets for training and validating machine learning and deep learning models.
- Starting with “small data” solutions. These are more accessible and make it possible to produce valuable models more quickly.
“Garbage in, garbage out”
Bell Labs, Uber, Microsoft and Blippar all agree on one thing: today, the quality of the data used to train machine learning models is of the utmost importance. These companies are going to great lengths to build high-quality training and validation data sets. They also intend to create reference databases that can be reused for different purposes.
Bell Labs and Microsoft, for example, are assembling an army of linguists, psychologists and sociologists. These specialists help them build reference data sets on more abstract notions such as beauty, mood and creativity. The companies then train deep learning models on these data sets to analyse how the feelings behind such notions can be extracted or predicted from text, images or video. The first results are certainly encouraging! IBM is one example: it recently used its AI system to analyse a film and pick out scenes for the trailer. 20th Century Fox entrusted Watson (IBM’s artificial intelligence system) with selecting the scenes for the trailer of the film Morgan.
According to these companies, such results owe more to the quality and relevance of the training data than to the techniques or modelling tools used, most of which are now mature. Even with the best techniques, the best tools and the best data scientists, nothing conclusive can be extracted from input data that is wrong.
Of course, all this comes with a hefty price tag. Big players such as Bell Labs and IBM are training neural networks with several hundred hidden layers. In a few years’ time, we will be talking about thousands. To do this, they use massively parallel machines, including specialist hardware such as that offered by Nvidia. These machines are very expensive, and training often takes a long time. This cost alone justifies keeping the training process clean by avoiding poor-quality data.
Beyond the players already mentioned, a multitude of start-ups are using deep learning. Blippar, for example, describes itself as a “Shazam for pictures”: from a photograph, its system can recognise objects, animals, plants and even faces. Facial recognition naturally raises questions about user rights and privacy. Still, the example shows that artificial intelligence is already part of the here and now.
Think big, start small
Over the two days of the conference, the speakers also offered advice to those looking to get into AI. Don’t rush head first into overly ambitious solutions, as they may require large volumes of data and a training phase that can turn out to be long and costly.
Small data solutions use far less data and therefore need much shorter training phases, which makes them much more accessible. Because they are more modest in terms of data sets, they can deliver valuable models without heavy investment. These “economical” models can also help to qualify the training data before it is used in more ambitious models. Of course, any model designed to meet a particular need must remain aligned with that need and with the business model.
These two days of reflection on artificial intelligence opened up some new perspectives for Sopra Steria and for our clients. Shouldn’t we all, ourselves included, create good-quality reference data sets across a number of verticals, or around our clients’ main concerns within those verticals? If nothing else, such data sets would let us qualify or compare the different AI solutions that can address these issues.
I’d like to finish this article with the views of Luming Wang, who spoke at AI Europe 2016. He is Head of Deep Learning at Uber and was previously Head of Machine Learning at Amazon and Microsoft. He is convinced that within 20 to 30 years, machines will be more intelligent than humans and fully equipped with thought and emotions. He believes that artificial intelligence will be both a threat to humanity and an incredible opportunity for the evolution of mankind. The more knowledge and hindsight we have of artificial intelligence, the better we will be able to make the most of it.