Making Sense of Big Data

Photo by Enoc Valenzuela (source: Unsplash)

TensorFlow Hub makes a variety of pre-trained models available, ready to use for inference. A very powerful one is the (Multilingual) Universal Sentence Encoder, which embeds bodies of text written in any of its supported languages into a common numerical vector representation.
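All supported languages map into the same vector space, so matching a sentence with its translation or paraphrase comes down to comparing vectors. Below is a minimal sketch that assumes the encoder has already produced the embeddings. The tiny 3-dimensional vectors are hypothetical stand-ins for the real, much higher-dimensional outputs, and the model loading itself is not shown.

```python
import numpy as np

def cosine_similarity_matrix(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarity between the rows of `a` and the rows of `b`."""
    a_norm = a / np.linalg.norm(a, axis=1, keepdims=True)
    b_norm = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a_norm @ b_norm.T

# Hypothetical toy embeddings standing in for encoder outputs:
# row i of `english` is meant to be a translation of row i of `italian`.
english = np.array([[0.9, 0.1, 0.0],    # "The cat sleeps."
                    [0.0, 0.2, 0.9]])   # "Stock prices fell."
italian = np.array([[0.8, 0.2, 0.1],    # "Il gatto dorme."
                    [0.1, 0.1, 0.95]])  # "I prezzi delle azioni sono scesi."

sims = cosine_similarity_matrix(english, italian)
# For each English sentence, the index of the most similar Italian sentence.
best_match = sims.argmax(axis=1)
print(best_match)  # [0 1]
```

The same comparison works between any two supported languages, because nothing in the similarity computation depends on which language produced the vectors.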


How to leverage a powerful pre-trained convolutional neural network to extract embedding vectors for pictures.

Photo by Cosmic Timetraveler on Unsplash

In this tutorial, I will show you how to leverage a powerful pre-trained convolutional neural network to extract embedding vectors that can accurately describe any kind of picture in an abstract latent feature space.
I will show some examples of using ResNeXt-WSL on the COCO dataset with PyTorch and other conventional tools from the PyData stack.

Why ResNext-WSL?

ResNeXt is an evolution of the well-known ResNet model that adds a new dimension on top of it, called “cardinality” (the number of parallel transformation paths inside each block). Through this improvement, the authors managed to beat the benchmark on the ILSVRC classification task with a 15% improvement. Although…
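To make “cardinality” more concrete: a ResNeXt bottleneck block splits its 3x3 transformation into several parallel paths with identical topology. A grouped convolution is an equivalent and common way to implement those paths. Below is a simplified sketch of one block. It assumes the “32x4d” template, meaning 32 paths that are each 4 channels wide. The class name is hypothetical, and this is not the reference implementation.

```python
import torch
from torch import nn

class ResNeXtBottleneck(nn.Module):
    """Simplified ResNeXt block: 1x1 reduce, grouped 3x3, 1x1 expand, residual add."""

    def __init__(self, channels: int, bottleneck_width: int, cardinality: int):
        super().__init__()
        inner = bottleneck_width * cardinality
        self.reduce = nn.Conv2d(channels, inner, kernel_size=1, bias=False)
        self.bn1 = nn.BatchNorm2d(inner)
        # groups=cardinality: each of the parallel paths only sees
        # inner // cardinality input channels.
        self.grouped = nn.Conv2d(inner, inner, kernel_size=3, padding=1,
                                 groups=cardinality, bias=False)
        self.bn2 = nn.BatchNorm2d(inner)
        self.expand = nn.Conv2d(inner, channels, kernel_size=1, bias=False)
        self.bn3 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out = self.relu(self.bn1(self.reduce(x)))
        out = self.relu(self.bn2(self.grouped(out)))
        out = self.bn3(self.expand(out))
        return self.relu(out + x)  # residual connection inherited from ResNet

block = ResNeXtBottleneck(channels=256, bottleneck_width=4, cardinality=32)
x = torch.randn(1, 256, 56, 56)
y = block(x)
print(y.shape)  # torch.Size([1, 256, 56, 56])

# Weights in the grouped 3x3 conv vs. an ordinary 3x3 conv with the same width (128).
grouped_params = block.grouped.weight.numel()
dense_params = 128 * 128 * 3 * 3
print(grouped_params, dense_params)  # 4608 147456
```

The grouped layer needs 32 times fewer weights than an ordinary 3x3 convolution of the same width. This is why increasing cardinality is a cheap way to add capacity.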


Hands-on Tutorials

How to take a collection of vector embeddings and average them while preserving the multi-sense topicality of their manifold structures.

Photo by Beatriz Pérez Moya on Unsplash

This is the third article of the “Embed, Cluster, and Average” series. Before diving into this tutorial, I recommend first reading the previous two articles: “Extracting rich embedding features from pictures using PyTorch and ResNeXt-WSL” and “Manifold clustering in the embedding space using UMAP and GMM”.

In this tutorial, we will take the embeddings extracted from COCO pictures with the ResNeXt-WSL model, the sparse topic representation provided by the UMAP transformation, and the GMM clustering model, and use them to produce an embedding representation for collections of pictures (treated as bag-of-words documents). …
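The article's exact averaging procedure is cut off here. Below is one possible sketch in the same spirit, similar to Sparse Composite Document Vectors. Each picture's embedding is split across the GMM clusters in proportion to its soft memberships. The result is averaged over the collection, and the per-cluster slices are concatenated, so a collection that covers several topics keeps each one in its own slice. The helper name, the synthetic data, and the use of the first two coordinates as a stand-in for a UMAP projection are all assumptions.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def topic_aware_average(embeddings: np.ndarray, reduced: np.ndarray,
                        gmm: GaussianMixture) -> np.ndarray:
    """Hypothetical helper: membership-weighted average of a collection.

    embeddings: (n_items, dim) original embeddings of the items in one collection
    reduced:    (n_items, k)   the same items in the space the GMM was fitted on
    Returns a vector of length n_clusters * dim.
    """
    probs = gmm.predict_proba(reduced)  # (n_items, n_clusters) soft memberships
    # Each item contributes to cluster k in proportion to p(k | item);
    # dividing by the number of items averages over the collection.
    per_cluster = probs.T @ embeddings / len(embeddings)  # (n_clusters, dim)
    return per_cluster.ravel()

rng = np.random.default_rng(0)
# Synthetic data standing in for picture embeddings.
all_embeddings = rng.normal(size=(200, 16))
all_reduced = all_embeddings[:, :2]  # stand-in for a low-dimensional UMAP projection
gmm = GaussianMixture(n_components=3, random_state=0).fit(all_reduced)

collection = slice(0, 10)  # one "document" made of 10 pictures
doc_vector = topic_aware_average(all_embeddings[collection],
                                 all_reduced[collection], gmm)
print(doc_vector.shape)  # (48,)
```

Each row of soft memberships sums to one. As a result, adding up the per-cluster slices recovers the plain mean embedding. The plain average is therefore the special case that throws away the per-topic breakdown.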


How to reduce the dimensionality of embedding vectors while preserving manifold structures grouped into clusters.

Photo by DIMA VALENTINA on Unsplash

In the previous article, “Extracting rich embedding features from pictures using PyTorch and ResNeXt-WSL”, we saw how to represent pictures in a multi-dimensional numerical embedding space. We also saw how effectively this embedding space places similar pictures close to each other. In this tutorial, we will look at a few clustering techniques that are suitable for discovering and identifying the manifolds in our dataset.
Moreover, the proposed clustering technique is also motivated by the “document embedding averaging” that will be described in the next article.

Dimensionality reduction via Uniform Manifold Approximation and Projection (UMAP)

Dimensionality reduction is not just used for data visualization, but it is…
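Below is a minimal sketch of the reduce-then-cluster pipeline, using synthetic data in place of the COCO embeddings. To keep it runnable with scikit-learn alone, PCA stands in for UMAP. With the umap-learn package installed, `umap.UMAP(n_components=..., metric="cosine")` exposes the same `fit_transform` interface and could be swapped in. Choosing the number of GMM components by the Bayesian Information Criterion (BIC) is our assumption here, not necessarily the article's procedure.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(42)
# Synthetic "embeddings": three well-separated groups of 100 points in 64 dimensions.
centers = rng.normal(scale=10.0, size=(3, 64))
embeddings = np.vstack([c + rng.normal(size=(100, 64)) for c in centers])

# Stand-in for UMAP: project to a low-dimensional space before clustering.
reducer = PCA(n_components=5, random_state=0)
reduced = reducer.fit_transform(embeddings)

# Fit GMMs with 1..6 components and keep the one with the lowest BIC.
candidates = [GaussianMixture(n_components=k, random_state=0).fit(reduced)
              for k in range(1, 7)]
best = min(candidates, key=lambda g: g.bic(reduced))
labels = best.predict(reduced)             # hard cluster assignment per item
memberships = best.predict_proba(reduced)  # soft assignment, reusable downstream
print(best.n_components, np.bincount(labels))
```

We keep the soft memberships alongside the hard labels because a probabilistic model like GMM can say how strongly each item belongs to each cluster, not only which cluster wins.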


Over the last decade, I have worked with highly talented data science teams from several different industries, including marketing, advertising, automotive, financial services, and cybersecurity.

I have contributed to most of the lifecycle phases, worked with executives and stakeholders across many different functions, and seen recent advancements in the machine learning field reach an unthinkable level of maturity.

Luckily, one problem I have not had to deal with is a lack of sponsorship and resources. Nonetheless, I have to acknowledge that many organizations still struggle to turn these advances into tangible profits. …


This piece is part of a series on 2019 trends in the AI and Machine Learning industry. You can read my full thoughts on the past year in this summary I wrote for the Helixa blog, which also includes links to the other in-depth pieces in this series.

Symbolic AI (or Classical AI) was one of the first branches of artificial intelligence attempting to explicitly represent human knowledge in a declarative form using symbols and rules.

This approach has evident limitations, such as the difficulty of explicitly defining common-sense knowledge and the plethora of complex, multi-dimensional relationships. …
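The toy forward-chaining sketch below shows what representing knowledge “declaratively with symbols and rules” looks like in practice. The facts and rules are made up for illustration. A rule fires whenever all of its premises are known facts, and the loop repeats until nothing new can be derived.

```python
from typing import FrozenSet, List, Set, Tuple

Rule = Tuple[FrozenSet[str], str]  # (premises, conclusion)

def forward_chain(facts: Set[str], rules: List[Rule]) -> Set[str]:
    """Apply rules repeatedly until no new fact can be derived."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            # Fire the rule if every premise is already known.
            if premises <= known and conclusion not in known:
                known.add(conclusion)
                changed = True
    return known

# Hypothetical hand-written knowledge base.
rules: List[Rule] = [
    (frozenset({"is_bird"}), "has_wings"),
    (frozenset({"has_wings", "is_healthy"}), "can_fly"),
    (frozenset({"is_penguin"}), "is_bird"),
]
derived = forward_chain({"is_penguin", "is_healthy"}, rules)
print(sorted(derived))
# ['can_fly', 'has_wings', 'is_bird', 'is_healthy', 'is_penguin']
```

The rules conclude that a healthy penguin can fly. Fixing this requires yet another hand-written exception, which is exactly the common-sense bottleneck described above.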


Enterprise applications of Machine Learning are converging on a common tech stack, as illustrated in the following picture.

Tech stack for AI applications

Python dominates the language arena for most professional Data Scientists due to its extensive set of machine learning libraries, data crunching and visualization tools, general-purpose coding, wrappers around famous frameworks implemented…


Image by Liberal Dictionary

There is no doubt that the biggest advancements in AI technologies have come from solving generic cognitive tasks with a level of accuracy that can beat humans.

Every cloud provider has a comparable offering for those main cognitive services:


Image by Zaitsava Olga

“Federated Learning” is a new term for many of us, and it looks like the dawn of a new AI epoch.

Federated Learning refers to machine learning (ML) techniques that train algorithms across multiple decentralized machines holding different local data samples, all without exchanging the raw data itself. …
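Below is a minimal sketch of Federated Averaging (FedAvg), one well-known federated algorithm. It assumes a linear regression model and three simulated clients with synthetic data. Each client runs a few gradient-descent steps on its private data and sends back only the resulting weights, never the raw samples. The server then averages those weights, weighting each client by its number of samples. The function names and hyperparameters are hypothetical.

```python
import numpy as np

def local_update(w, X, y, lr=0.1, epochs=5):
    """A few steps of full-batch gradient descent on one client's private data."""
    w = w.copy()
    for _ in range(epochs):
        grad = 2.0 * X.T @ (X @ w - y) / len(y)  # gradient of the mean squared error
        w -= lr * grad
    return w

def federated_averaging(clients, dim, rounds=50):
    """Server loop: broadcast global weights, collect client updates, average them."""
    w_global = np.zeros(dim)
    sizes = np.array([len(y) for _, y in clients], dtype=float)
    for _ in range(rounds):
        updates = [local_update(w_global, X, y) for X, y in clients]
        # Weighted average of the client models; only weights leave each client.
        w_global = np.average(updates, axis=0, weights=sizes)
    return w_global

rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0, 0.5])
clients = []
for n in (50, 120, 80):  # clients hold different amounts of local data
    X = rng.normal(size=(n, 3))
    y = X @ true_w + rng.normal(scale=0.1, size=n)
    clients.append((X, y))

w = federated_averaging(clients, dim=3)
print(np.round(w, 2))
```

On this synthetic problem, the averaged model ends up close to the weights that generated the data, even though the server never sees a single sample.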


A scene from the movie 2001: A Space Odyssey

2019 was the year of ethical considerations in the AI industry, and we need to keep prioritizing this conversation as we move forward and scale the technology further.

Many companies have used keynotes and other outlets to promote the importance of responsible AI and of aligning with established ethical practices.

The…

Gianmario Spacagna

Director of Artificial Intelligence at Brainly
