
一天理解深度学习资料.ppt (Understanding Deep Learning in One Day, lecture slides)
Understanding Deep Learning in One Day (一天理解深度学习), Hung-yi Lee

Outline
Lecture I: Introduction of Deep Learning

Machine Learning ≈ Looking for a Function
- Speech recognition: audio -> "How are you" (what the user said)
- Image recognition: image -> "Cat"
- Playing Go: board position -> next move ("5-5")
- Dialogue system: "Hello" (what the user said) -> "Hi" (system response)

Framework (image recognition example)
Step 1: define a set of functions, f1, f2, ... (the Model).
  A good candidate maps a cat image to "cat" and a dog image to "dog";
  a bad candidate maps them to "money" or "snake".
Step 2: goodness of function f.
  Supervised learning: the training data are function inputs (images) paired with
  function outputs (labels such as "monkey", "cat", "dog"). A function that fits
  the training data better is a better function.
Step 3: pick the "best" function f*.
  Training: use the training data to pick f*. Testing: apply f* to new inputs,
  e.g. f*(cat image) = "cat".

Three Steps for Deep Learning
Step 1: define a set of functions (a neural network)
Step 2: goodness of function
Step 3: pick the best function

Neural Network
- A neuron is a simple function: z = a1*w1 + ... + aK*wK + b, output a = sigma(z),
  where the w's are the weights, b is the bias, and sigma is the activation function.
- Example from the slides: with bias 1 the neuron's weighted sum is z = 4, and the
  sigmoid activation gives sigma(4) = 0.98.
- Different connections lead to different network structures. The weights and biases of
  all neurons are the network parameters theta; different neurons have different values.

Fully Connected Feedforward Network
- Worked example: input (1, -1). The first layer has weights (1, -2) and (-1, 1) and
  biases 1 and 0, so z = (4, -2) and the sigmoid outputs are (0.98, 0.12). The later
  layers shown on the slides produce (0.86, 0.11) and finally (0.62, 0.83).
- Given parameters theta, the network defines a function:
  f(1, -1) = (0.62, 0.83) and f(0, 0) = (0.51, 0.85)
  (for input (0, 0) the first layer gives sigma(1) = 0.73 and sigma(0) = 0.5).
  A code sketch of this forward pass follows below, after the "Why Deep?" slides.
- The network maps an input vector to an output vector; it is a function. Given a
  network structure, we define a function set.
- Structure: input layer -> hidden layers -> output layer. "Deep" means many hidden layers.

Why Deep? Universality Theorem
- Any continuous function f can be realized by a network with one hidden layer
  (given enough hidden neurons). So why a "deep" neural network rather than a "fat" one?

Why Deep? Analogy with logic circuits
- Logic circuits consist of gates. Two layers of logic gates can represent any Boolean
  function, but building some functions is much simpler with multiple layers of gates.
- Neural networks consist of neurons. A network with one hidden layer can represent any
  continuous function, but representing some functions is much simpler with multiple
  layers of neurons (and perhaps with less data?).
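The forward-pass numbers in the worked example above can be checked with a short NumPy sketch. This is a minimal illustration, not code from the lecture; only the first layer's weights and biases are fully legible in the slides, so deeper layers would be appended to the `layers` list with their own parameters.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, layers):
    """Forward pass of a fully connected feedforward network.

    `layers` is a list of (W, b) pairs; each layer computes a = sigmoid(W @ a + b).
    """
    a = np.asarray(x, dtype=float)
    for W, b in layers:
        a = sigmoid(W @ a + b)
    return a

# First layer from the slides' worked example:
# weights (1, -2) and (-1, 1), biases 1 and 0.
W1 = np.array([[ 1.0, -2.0],
               [-1.0,  1.0]])
b1 = np.array([1.0, 0.0])

print(forward([1, -1], [(W1, b1)]))  # ~[0.98, 0.12], matching the slides
print(forward([0,  0], [(W1, b1)]))  # ~[0.73, 0.50], matching the slides
```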
Deep = Many Hidden Layers (ImageNet error rates, from
http://cs231n.stanford.edu/slides/winter1516_lecture8.pdf)
- AlexNet (2012): 8 layers, 16.4%
- VGG (2014): 19 layers, 7.3%
- GoogleNet (2014): 22 layers, 6.7%
- Residual Net (2015): 152 layers (special structure), 3.57%

Output Layer: Softmax
- An ordinary output layer gives y_i = sigma(z_i); in general the outputs can be any
  value and may not be easy to interpret.
- A softmax output layer gives y_i = exp(z_i) / sum_j exp(z_j). Example: z = (3, 1, -3)
  gives exp(z) ≈ (20, 2.7, 0.05) and y ≈ (0.88, 0.12, ≈0). Since 1 ≥ y_i ≥ 0 and
  sum_i y_i = 1, the outputs can be read as probabilities (see the sketch at the end of
  this section).

Example Application: Handwriting Digit Recognition
- Input: a 16 x 16 image, i.e. a 256-dim vector (ink -> 1, no ink -> 0).
- Output: a 10-dim vector; each dimension represents the confidence that the image is
  the digit 1, 2, ..., 0. E.g. an output of (0.1, 0.7, 0.2, ...) means the image is "2".
- What is needed is a function with a 256-dim input vector and a 10-dim output vector.
  The neural network (input layer, hidden layers, output layer) is a function set
  containing the candidates for handwriting digit recognition. You need to decide the
  network structure so that a good function is in your function set.

FAQ
- Q: How many layers? How many neurons for each layer?
- Q: Can we design the network structure? Convolutional Neural Networks (CNN) are
  covered in the next lecture.
- Q: Can the structure be automatically determined? Yes, but this is not widely
  studied yet.

Highway Network and Residual Network
- Residual Network: "Deep Residual Learning for Image Recognition",
  http://arxiv.org/abs/1512.03385 (a copy of the input is added to the layer output).
- Highway Network: "Training Very Deep Networks", https://arxiv.org/pdf/1507.06228v2.pdf
  (a gate controller decides how much of the input is copied and how much is transformed).
  The Highway Network automatically determines the layers needed!

Three Steps for Deep Learning, Step 2: Goodness of Function

Training Data
- Prepare training data: images and their labels, e.g. "5", "0", "4", "1", "3", "1",
  "2", "9".
- The learning target is defined on the training data.

Learning Target
- Input: the 256-dim vector of a 16 x 16 image (ink -> 1, no ink -> 0).
- Output: y1, ..., y10. For an image of "1" the target is that y1 has the maximum value;
  for an image of "2", that y2 has the maximum value; and so on.

Loss
- Given a set of parameters, the loss l of one example measures how far the network
  output (after softmax) is from the target, e.g. the one-hot vector for "1"; the two
  should be as close as possible. The loss can be square error or cross entropy.
- A good function should make the loss on all examples as small as possible.

Total Loss
- For R training examples x^1, ..., x^R with network outputs y^1, ..., y^R and
  per-example losses l^1, ..., l^R, the total loss is L = sum over r of l^r.
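Below is a minimal sketch of the softmax output layer and the per-example and total loss described above, assuming cross entropy as the loss (the slides allow square error as well). The `network` argument is a hypothetical stand-in for the 256-input, 10-output digit-recognition network; it is not code from the lecture materials.

```python
import numpy as np

def softmax(z):
    # y_i = exp(z_i) / sum_j exp(z_j); subtracting max(z) improves numerical stability
    e = np.exp(z - np.max(z))
    return e / e.sum()

def cross_entropy(y, target):
    # distance between the softmax output y and a one-hot target vector
    return -np.sum(target * np.log(y + 1e-12))

# Softmax example from the slides: z = (3, 1, -3) -> y ≈ (0.88, 0.12, ≈0)
print(softmax(np.array([3.0, 1.0, -3.0])))

def total_loss(network, xs, targets):
    # Total loss L = sum of the per-example losses l^r over the R training examples.
    # `network` is any function mapping a 256-dim image vector to 10 output values.
    return sum(cross_entropy(softmax(network(x)), t) for x, t in zip(xs, targets))
```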
