• @RennederOPM
    link
    210 months ago
    •  Researchers have discovered that gradient descent has a theory that is not fully understood. 
    
    •  Gradient descent uses the cost function to find the lowest value on the graph. 
    
    •  Gradient descent algorithms pick a point and calculate the slope of the curve around it. 
    
    •  Conventional wisdom is to take small steps, but automated testing techniques have shown that large steps can be optimal. 
    
    •  Research has shown that the optimal stride length depends on the number of steps in the sequence. 
    
    •  The cyclic approach to gradient descent may be more efficient, but will not change its current use. 
    
    •  The results of the study raise a theoretical puzzle about the structure that governs the best decisions.