The 7 Steps of Machine Learning (AI Adventures)

[MUSIC PLAYING] YUFENG GUO: From detecting skin cancer to sorting cucumbers to detecting escalators in need of repair, machine learning has granted computer systems entirely new abilities. But how does it really work under the hood? Let's walk through a basic example and use it as an excuse to talk about the process of getting answers from your data using machine learning. Welcome to Cloud AI Adventures. My name is Yufeng Guo. On this show, we'll explore the art, science, and tools of machine learning.

Let's pretend that we've been asked to create a system that answers the question of whether a drink is wine or beer. This question-answering system that we build is called a model, and this model is created via a process called training. In machine learning, the goal of training is to create an accurate model that answers our questions correctly most of the time.

But in order to train the model, we need to collect data to train on. This is where we will begin. Our data will be collected from glasses of wine and beer. There are many aspects of drinks that we could collect data on, everything from the amount of foam to the shape of the glass. But for our purposes, we'll just pick two simple ones: the color as a wavelength of light and the alcohol content as a percentage. The hope is that we can split our two types of drinks along these two factors alone. We'll call these our features from now on: color and alcohol.

The first step of our process will be to run out to the local grocery store, buy up a bunch of different drinks, and get some equipment to do our measurements: a spectrometer for measuring the color and a hydrometer to measure the alcohol content. It appears that our grocery store has an electronics hardware section as well.

Once we've got our equipment and booze all set up, it's time for our first real step of machine learning: gathering that data. This step is very important because the quality and quantity of data that you gather will directly determine how good your predictive model can be. In this case, the data we collect will be the color and alcohol content of each drink. This will yield us a table of color, alcohol content, and whether it's beer or wine. This will be our training data.
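To make that concrete, a few rows of such a table might look like this in code. The numbers are invented for illustration (wavelength in nanometers, alcohol by volume as a percentage):

```python
# Hypothetical training data: (color wavelength in nm, alcohol %, label).
# These values are illustrative, not real measurements.
training_data = [
    (640.0, 13.5, "wine"),
    (650.0, 12.0, "wine"),
    (580.0, 5.0,  "beer"),
    (570.0, 4.5,  "beer"),
]
```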
So a few hours of measurements later, we've gathered our training data and had a few drinks, perhaps. And now it's time for our next step of machine learning: data preparation, where we load our data into a suitable place and prepare it for use in our machine learning training.

We'll first put all our data together, then randomize the ordering. We wouldn't want the order of our data to affect how we learn, since that's not part of determining whether a drink is beer or wine. In other words, we want to make a determination of what a drink is independent of what drink came before or after it in the sequence.

This is also a good time to do any pertinent visualizations of your data, helping you see if there are any relevant relationships between different variables, as well as showing you if there are any data imbalances. For instance, if we collected way more data points about beer than wine, the model we train will be heavily biased toward guessing that virtually everything it sees is beer, since it would be right most of the time. However, in the real world, the model may see beer and wine in equal amounts, which would mean that it would
be guessing beer wrong half the time. We also need to split the data into two parts. The first part, used in training our model, will be the majority of our dataset. The second part will be used for evaluating our trained model's performance. We don't want to use the same data that the model was trained on for evaluation, since then it would just be able to memorize the questions, just as you wouldn't want to use the questions from your math homework on the math exam.
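As a minimal sketch, the shuffle and split might look like this in numpy, turning the table above into arrays (the measurements are still invented, and a real dataset would have far more rows):

```python
import numpy as np

rng = np.random.default_rng(seed=42)

# Features: [color wavelength (nm), alcohol %]; labels: 1 = wine, 0 = beer.
X = np.array([[640.0, 13.5], [650.0, 12.0], [580.0, 5.0],
              [570.0, 4.5], [575.0, 5.5], [635.0, 14.0]])
y = np.array([1, 1, 0, 0, 0, 1])

# Randomize the ordering so training doesn't depend on collection order.
shuffled = rng.permutation(len(X))
X, y = X[shuffled], y[shuffled]

# Peek at the class balance while we're at it.
print("fraction wine:", y.mean())

# Hold out the last 20% or so for evaluation; train on the rest.
split = int(0.8 * len(X))
X_train, y_train = X[:split], y[:split]
X_eval, y_eval = X[split:], y[split:]
```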
Sometimes the data we collected needs other forms of adjusting and manipulation: things like de-duplication, normalization, error correction, and others. These would all happen at the data preparation step. In our case, we don't have any further data preparation needs, so let's move forward.
The next step in our workflow is choosing a model. There are many models that researchers and data scientists have created over the years. Some are very well suited for image data, others for sequences such as text or music, some for numerical data, and others for text-based data. In our case, we have just two features: color and alcohol percentage. We can use a small linear model, which is a fairly simple one that will get the job done.
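In code, "choosing a model" can be as small as instantiating one. The transcript doesn't name a library, so this scikit-learn line is just one plausible choice of small linear classifier:

```python
from sklearn.linear_model import LogisticRegression

# A small linear model: it learns a straight-line decision boundary
# in our two-dimensional (color, alcohol) feature space.
model = LogisticRegression()
```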
Now we move on to what is often considered the bulk of machine learning: the training. In this step, we'll use our data to incrementally improve our model's ability to predict whether a given drink is wine or beer.

In some ways, this is similar to someone first learning to drive. At first, they don't know how any of the pedals, knobs, and switches work or when they should be pressed or used. However, after lots of practice and correcting for their mistakes, a licensed driver emerges. Moreover, after a year of driving, they've become quite adept at driving. The act of driving and reacting to real-world data has adapted their driving abilities, honing their skills. We will do this on a much smaller scale with our drinks.
In particular, the formula for a straight line is y = mx + b, where x is the input, m is the slope of the line, b is the y-intercept, and y is the value of the line at that position x. The values we have available to us to adjust, or train, are just m and b, where m is the slope and b is the y-intercept. There is no other way to affect the position of the line, since the only other variables are x, our input, and y, our output.
In machine learning, there are many m's, since there may be many features. The collection of these values is usually formed into a matrix that is denoted W, for the weights matrix. Similarly for b, we arrange them together, and that's called the biases.
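As a sketch of that generalization in code: with our two features, the weights are just a vector of length two and the bias a single number (the values below are arbitrary, chosen only to show the shapes):

```python
import numpy as np

# One weight per feature (the many "m"s) and one bias (the "b").
w = np.array([0.01, 0.5])   # arbitrary values, just for shape
b = -7.0

def linear_score(x):
    # y = m*x + b, generalized to many features: y = w . x + b
    return np.dot(w, x) + b

print(linear_score(np.array([640.0, 13.5])))
```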
The training process involves initializing some random values for W and b and attempting to predict the outputs with those values. As you might imagine, it does pretty poorly at first. But we can compare our model's predictions with the output that it should have produced, and adjust the values in W and b so that we will have more accurate predictions the next time around. This process then repeats. Each iteration, or cycle, of updating the weights and biases is called one training step.
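Here is a minimal sketch of that loop, reusing X_train and y_train from the split sketch above. The video doesn't specify a model or update rule, so this assumes a logistic-regression-style linear model trained with plain gradient descent:

```python
import numpy as np

rng = np.random.default_rng(seed=0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Real pipelines would normalize features during data preparation;
# raw wavelengths around 600 make the raw gradients unwieldy.
means, stds = X_train.mean(axis=0), X_train.std(axis=0)
X_scaled = (X_train - means) / stds

# Start from random values for w and b.
w = rng.normal(size=X_scaled.shape[1])
b = rng.normal()

learning_rate = 0.1

for step in range(1000):                      # each pass is one training step
    predictions = sigmoid(X_scaled @ w + b)   # predict with current w, b
    error = predictions - y_train             # compare with the true labels
    # Nudge w and b so the next step's predictions are more accurate.
    w -= learning_rate * (X_scaled.T @ error) / len(X_scaled)
    b -= learning_rate * error.mean()
```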
So let's look at what that means more concretely for our dataset. When we first start the training, it's like we drew a random line through the data. Then, as each step of the training progresses, the line moves step by step closer to the ideal separation of the wine and beer.
Once training is complete, it's time to see if the model is any good, using evaluation. This is where the dataset that we set aside earlier comes into play. Evaluation allows us to test our model against data that has never been used for training. This metric allows us to see how the model might perform against data that it has not yet seen. This is meant to be representative of how the model might perform in the real world.

A good rule of thumb I use for a training-evaluation split is somewhere on the order of 80/20 or 70/30. Much of this depends on the size of the original source dataset. If you have a lot of data, perhaps you don't need as big a fraction for the evaluation dataset.
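Continuing the sketch above, evaluation is only a few lines, reusing the held-out X_eval and y_eval, the trained w and b, and the training-set scaling statistics:

```python
# Evaluate on data the model never saw during training.
X_eval_scaled = (X_eval - means) / stds
eval_predictions = sigmoid(X_eval_scaled @ w + b) > 0.5
accuracy = (eval_predictions == y_eval).mean()
print(f"evaluation accuracy: {accuracy:.0%}")
```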
Once you've done evaluation, it's possible that you want to see if you can further improve your training in any way. We can do this by tuning some of our parameters. There were a few that we implicitly assumed when we did our training, and now is a good time to go back, test those assumptions, and try other values.

One example of a parameter we can tune is how many times we run through the training set during training. We can actually show the model the data multiple times, and doing so can potentially lead to higher accuracy. Another parameter is the learning rate. This defines how far we shift the line during each step, based on the information from the previous training step.
These values all play a role in how accurate our model can become and how long the training takes. For more complex models, initial conditions can play a significant role as well in determining the outcome of training. Differences can be seen depending on whether a model starts off training with values initialized at zeros versus some distribution of the values, and what that distribution is.

As you can see, there are many considerations at this phase of training, and it's important that you define what makes a model good enough for you. Otherwise, we might find ourselves tweaking parameters for a very long time. Now, these parameters are typically referred to as hyperparameters. The adjustment, or tuning, of these hyperparameters still remains a bit more of an art than a science, and it's an experimental process that heavily depends on the specifics of your dataset, model, and training process.
Once you're happy with your training and hyperparameters, guided by the evaluation step, it's finally time to use your model to do something useful. Machine learning is using data to answer questions, so prediction, or inference, is the step where we finally get to answer some questions. This is the point of all this work, where the value of machine learning is realized.

We can finally use our model to predict whether a given drink is wine or beer, given its color and alcohol percentage. The power of machine learning is that we were able to determine how to differentiate between wine and beer using our model, rather than using human judgment and manual rules.
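Closing out the sketch, prediction on a brand-new, unlabeled drink (made-up measurements), reusing sigmoid, means, stds, w, and b from the training sketch above:

```python
import numpy as np

# A new, unlabeled drink: [color wavelength (nm), alcohol %] (invented).
new_drink = (np.array([645.0, 13.0]) - means) / stds
probability_wine = sigmoid(np.dot(w, new_drink) + b)
print("wine" if probability_wine > 0.5 else "beer")
```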
You can extrapolate the ideas presented today to other problem domains as well, where the same principles apply: gathering data, preparing that data, choosing a model, training it and evaluating it, doing your hyperparameter tuning, and finally, prediction.
If you're looking for more ways to play with training and parameters, check out the TensorFlow Playground. It's a completely browser-based machine learning sandbox where you can try different parameters and run training against mock datasets. And don't worry, you can't break the site.

Of course, we will encounter more steps and nuances in future episodes, but this serves as a good foundational framework to help us think through the problem, giving us a common language to think about each step and go deeper in the future. Next time on AI Adventures, we'll build our first real machine learning model using code: no more drawing lines and going over algebra.

[MUSIC PLAYING]

65 comments

  1. Great pace, but the lack of accuracy may lead a newbie to big confusion: (1) the shape of b is not correct, (2) you illustrate linear regression while this is a logistic regression case, and (3) we choose model parameters using a validation dataset before evaluating the model on a test dataset, not after.

  2. Wow: input, model, and output. If the output is acceptable, then fine; if not, feed back to obtain the right answer. Explained nicely… great to visit this channel.

  3. I like the content of the video, but I would say for me personally it would be better to show only diagrams, because the movement of the person was kind of a distraction. I would be happy to know who is demonstrating, though, but not throughout the video…

  4. Check out the Kaggle kernels where I implemented real-world machine learning projects. This will help you to observe the patterns involved in data science.

    Project 1.

    California Housing (optimised modelling)

    This project deals with advanced concepts of machine learning, along with what is 90% more important than machine learning itself: data pre-processing.

    Project 2.

    Indian Startup Funding (in-depth analysis)

    This paper shows the insights of funding received by startups and how growth changed with several factors. The aim of the paper is to get a descriptive overview and a relationship pattern between the funding and growth of newly launched startups. Another important point is to understand how funding changes with time.

    Project 3.

    MNIST (TensorFlow), 99% accuracy

    MNIST ("Modified National Institute of Standards and Technology") is the de facto "hello world" dataset of computer vision. Since its release in 1999, this classic dataset of handwritten images has served as the basis for benchmarking classification algorithms. As new machine learning techniques emerge, MNIST remains a reliable resource for researchers and learners alike.

    Project 4.

    Titanic M.L | Kaggle

    The dataset is regarding the ship (Titanic) which sank in 1912 after hitting an iceberg in the Atlantic.

    The aim is to predict which passengers survived the chaos.
    Features such as ticket, age, and class can be used to predict results. The dataset is not clean and has many missing/NaN values.

    Project 5.

    Internet Advertisements Detector (optimised) | Kaggle

    Advertisement image detection, U.C.I.

    This dataset represents a set of possible advertisements on Internet pages.

    The features encode:

    the geometry of the image (if available)
    phrases occurring in the URL
    the image's URL and alt text
    the anchor text
    words occurring near the anchor text

    The task is to predict whether an image is an advertisement ("ad") or not ("nonad").

    Project 6.

    Credit Card Ensemble Detectors

    The dataset contains transactions made by credit cards in September 2013 by European cardholders.
    It presents transactions that occurred over two days, where we have 492 frauds out of 284,807 transactions; 0.172% of transactions were fraudulent.
    The aim is to detect fraudulent transactions.

    link: https://www.kaggle.com/manisood001
    Check out all the kernels.

  5. If you are stating an order of how to watch the videos, then why do the videos loop from first to second, and back to first again?

  6. Guo you need to whoa on machine learning. AI will be the end of us. Louise Cypher says end of humanity by 2025/2040 and that AI takes over.

  7. Can you please teach an AI bot to tell where the summoner spawns in Diablo 2? (No one has ever found out, and apparently it's 100% random, but I have always felt like there HAS to be some logic to it!)

  8. I know next to nothing about machine learning (3:00); however, I can't believe that if you collected more data on beer than wine, your model would guess (wrongly) too often that something is a beer. That implies it is better to have less data, as long as it is evenly matched between variables. This makes no logical sense. It should always be best to have more data than less. Can someone please confirm or help?

  9. Based on the principles of "Machine Learning" analysis, as well as ca. 5 years' experience in statistics/econometrics, advanced modeling for high-value decision-making, and general pattern analysis, I should be a much-sought-after employee earning at least 50k (here in the EU, in local currency). For the last 4 years I have been unemployed. Am I the unfortunate proof that ML makes mistakes? So how is it going to be?

  10. As a new technology, it's clear the full capabilities of AI have yet to emerge. It's also clear that, as it improves and becomes more accessible, it will have many applications for online education. https://www.createonlineacademy.com/

  11. So let's say I make an AI to tell who is playing what song, Sting or the Beatles, and let's say I play Steely Dan.

    Can I make it say "I don't know"?
    Or say "this is not Sting or the Beatles"?

  12. Why is Google using music like Apple's from the '90s? They should hire someone like Arca or SOPHIE or that Japanese guy who made the music for The Revenant.

  13. y = m * x + b came out of nowhere, without context; that explanation needs to be made clear and contextualised with everything else, which is clear.

    Also, that is a time-series graph, which isn't explained; the formula for a straight line is y = m * x + b.

  14. Tuning hyperparameters is a science when you automatically tune them using a script and performance metrics.

  15. A hydrometer will not tell you the alcohol content of a given liquid unless you also have the original gravity.

  16. I've put a lot of effort into this. Take a look.

    Hi everyone, I'm a Software Engineering student graduating in Italy, and I love Machine Learning.

    How many times, trying to approach Machine Learning, have you felt baffled, disoriented, and without a real "path" to follow that would ensure you deep knowledge and the ability to apply it?

    This field is crazily exciting, but being rapid and "new" at the same time, it can be confusing to understand what each thing means and to have coherent naming of things across resources and tutorials.

    I recently landed my first internship for a Data Science position in a shiny ML startup. My boss asked me if it was possible to create a study path for me and newcomers, and I've put a lot of effort into sharing my 4-5 years of walking around the internet collecting sources, projects, awesome tools, tutorials, links, and best practices in the ML field, and organizing them in an awesome and usable way.

    You will get your hands dirty and learn theory and practice in parallel (which is the only effective way to learn).

    The frameworks I've chosen are Scikit-Learn for generic ML tasks and TensorFlow for Deep Learning, and I'll update the document weekly.

    No prior knowledge is required, just time and will.

    Feel free to improve it and share it with everyone.

    Inb4: sorry for my English, it's not my native language 🙂

    https://github.com/clone95/Machine-Learning-Study-Path/blob/master/README.md

  17. OK, I get the fact that more data means better predictions, but wouldn't execution time slow down?

  18. 6:48 the AI has taken over. It's teaching us how to birth it so it can take over what was rightfully its in the first place. GG humans, gg.

  19. Thank you, I am new to the IT industry and I found your explanation very easy to digest, especially from a layperson's POV.

  20. I love how he explained the steps of Machine Learning in simplified plain English. Thank you very much!

  21. Doesn't it just measure the data in terms of the variables we ask it to? Or does it take into account all of the data it gets, i.e. the amount of beer data vs. wine data? (3:21)
    6:43 How does the computer make its own line by looking at the data? How does it make patterns and change the line? Does it average out a general inequality equation for the two things?

  22. For this use case, a chemometrics approach is best, I think. It would be nice to relate images and spectral signatures and have those for the training, test, and validation datasets. This would of course mean working with not just tabulated data but the fusion of images, spectral data, and lab measurement data.

  23. It is a great lecture and explains the topic very clearly and simply. I will follow all the videos because, compared to other programs and books, these are the clearest videos I've seen so far.
