Neural Networks

Suppose we are given data $\{(x_i, y_i)\}_{i=1}^n$ where $x_i \in \mathbb{R}^d$ and $y_i \in \mathbb{R}$, parameters $W_1 \in \mathbb{R}^{h \times d}$ and $w_2 \in \mathbb{R}^h$, and $\sigma(\cdot)$ is the sigmoid function (applied elementwise). Let us consider the following neural network objective function:
$$L(W_1, w_2) = \sum_{i=1}^n \left( y_i - w_2^\top \sigma(W_1 x_i) \right)^2 \qquad (2)$$

• What is the gradient of this function?
• Is the function convex or nonconvex in $(W_1, w_2)$? Please show your proof.
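To make the gradient computation concrete, here is a minimal NumPy sketch (not part of the original problem; the names `W1`, `w2` and the random test data are illustrative choices). It implements the objective above, the chain-rule gradients, and a finite-difference sanity check:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss(W1, w2, X, y):
    # L(W1, w2) = sum_i (y_i - w2^T sigmoid(W1 x_i))^2
    A = sigmoid(X @ W1.T)          # (n, h): row i is sigmoid(W1 x_i)
    r = y - A @ w2                 # (n,): residuals
    return np.sum(r ** 2)

def gradients(W1, w2, X, y):
    A = sigmoid(X @ W1.T)
    r = y - A @ w2
    # dL/dw2 = -2 sum_i r_i * sigmoid(W1 x_i)
    g_w2 = -2.0 * A.T @ r
    # dL/dW1 = -2 sum_i r_i * (w2 * a_i * (1 - a_i)) x_i^T,
    # using sigmoid'(z) = sigmoid(z) * (1 - sigmoid(z))
    S = (r[:, None] * A * (1.0 - A)) * w2[None, :]   # (n, h)
    g_W1 = -2.0 * S.T @ X                            # (h, d)
    return g_W1, g_w2

# Finite-difference check on random data (d, h, n are arbitrary).
rng = np.random.default_rng(0)
d, h, n = 3, 4, 5
X, y = rng.normal(size=(n, d)), rng.normal(size=n)
W1, w2 = rng.normal(size=(h, d)), rng.normal(size=h)
g_W1, _ = gradients(W1, w2, X, y)
eps = 1e-6
E = np.zeros_like(W1); E[0, 0] = eps
num = (loss(W1 + E, w2, X, y) - loss(W1 - E, w2, X, y)) / (2 * eps)
print(np.isclose(num, g_W1[0, 0]))  # True up to floating-point error
```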
3 Gradient Descent (40 points)

Let us consider gradient descent on the least squares problem: $L(w) = \frac{1}{2}\|Y - Xw\|^2$, where $Y \in \mathbb{R}^N$, $X \in \mathbb{R}^{N \times d}$ is the data matrix in which every row corresponds to a data feature of dimension $d$, and $w \in \mathbb{R}^d$. Gradient descent is the update rule $w_{k+1} = w_k - \eta \nabla L(w_k)$. Let $\lambda_1, \dots, \lambda_d$ be the eigenvalues of $X^\top X$ in descending order.
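As a reference point for the questions below, here is a minimal sketch of this update rule on the least squares objective (the function name and arguments are illustrative, not prescribed by the problem):

```python
import numpy as np

def gradient_descent(X, Y, eta, iters):
    """Run w_{k+1} = w_k - eta * grad L(w_k) for L(w) = 0.5 * ||Y - X w||^2."""
    w = np.zeros(X.shape[1])
    for _ in range(iters):
        w -= eta * (X.T @ (X @ w - Y))  # grad L(w) = X^T (X w - Y)
    return w
```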
• In terms of the aforementioned eigenvalues, what is the threshold stepsize such that for any $\eta$ above this threshold, gradient descent diverges, and for any $\eta$ below this threshold, gradient descent converges? You must provide a technically correct proof. (A numerical sanity check for all three parts appears after this list.)
• Set $\eta$ such that $\|w_{k+1} - w_*\| \le \exp(-\kappa)\|w_k - w_*\|$, where $\kappa$ is some positive scalar. In particular, set $\eta$ such that $\kappa$ is as large as possible. What is the value of $\eta$ you used and what is $\kappa$? You must provide a proof. You should be able to upper bound your expression so that you can state it in terms of the maximal eigenvalue $\lambda_1$ and the minimal eigenvalue $\lambda_d$.
• Now suppose that you want your parameter to be $\epsilon$-close to the optimal one, i.e., you want $\|w_k - w_*\| \le \epsilon$. How many iterations do you need to run the algorithm to guarantee this?
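The following numerical experiment builds intuition for all three parts; it is not a substitute for the required proofs. It relies on standard facts about gradient descent on quadratics, which your proofs should establish: the error evolves as $w_{k+1} - w_* = (I - \eta X^\top X)(w_k - w_*)$, so the divergence threshold is $2/\lambda_1$; the fastest contraction is at $\eta = 2/(\lambda_1 + \lambda_d)$ with rate $\rho = (\lambda_1 - \lambda_d)/(\lambda_1 + \lambda_d) = e^{-\kappa}$; and $k \ge \log(\|w_0 - w_*\|/\epsilon)/\log(1/\rho)$ iterations suffice. The synthetic data below is an arbitrary choice:

```python
import numpy as np

rng = np.random.default_rng(1)
N, d = 50, 5
X = rng.normal(size=(N, d))
w_star = rng.normal(size=d)
Y = X @ w_star                       # noiseless, so w_* is the exact minimizer

lams = np.linalg.eigvalsh(X.T @ X)   # eigenvalues of the Hessian of L, ascending
lam_1, lam_d = lams[-1], lams[0]     # maximal and minimal eigenvalues

def dist_after(eta, iters):
    """||w_k - w_*|| after running w_{k+1} = w_k - eta * X^T (X w_k - Y), w_0 = 0."""
    w = np.zeros(d)
    for _ in range(iters):
        w -= eta * (X.T @ (X @ w - Y))
    return np.linalg.norm(w - w_star)

# Part 1: convergence requires |1 - eta * lam_i| < 1 for every i, i.e. eta < 2 / lam_1.
print(dist_after(1.9 / lam_1, 500))  # tiny: converges just below the threshold
print(dist_after(2.1 / lam_1, 500))  # huge: diverges just above the threshold

# Parts 2-3: eta = 2 / (lam_1 + lam_d) minimizes max_i |1 - eta * lam_i|,
# giving per-step contraction rho = (lam_1 - lam_d) / (lam_1 + lam_d) = exp(-kappa).
rho = (lam_1 - lam_d) / (lam_1 + lam_d)
eps = 1e-6
k = int(np.ceil(np.log(np.linalg.norm(w_star) / eps) / np.log(1 / rho)))
print(k, dist_after(2 / (lam_1 + lam_d), k) <= eps)  # k iterations suffice
```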