*Amir Ajalloeian; Department of Electrical, Computer, and Energy Engineering; University of Colorado Boulder*

**Inexact Online Proximal-gradient Method for Time-varying Convex Optimization**

This paper considers an online proximal-gradient method to track the minimizers of a composite convex function that may continuously evolve over time. The online proximal-gradient method is "inexact," in the sense that: (i) it relies on approximate first-order information about the smooth component of the cost; and (ii) the proximal operator (with respect to the non-smooth term) may be computed only up to a certain precision. Under suitable assumptions, convergence of the error iterates is established for strongly convex cost functions. When the cost is not strongly convex, the dynamic regret is investigated instead, under the additional assumption that the problem includes compact feasibility sets. Bounds are expressed in terms of the cumulative error and the path length of the optimal solutions. These bounds suggest how to allocate resources to strike a balance between performance and precision in the gradient computation and in the proximal operator.
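The iteration described above can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the toy cost `0.5 * ||x - b_t||^2 + lam * ||x||_1`, the rotating target `b_t`, the uniformly distributed bounded errors, and all function and parameter names.

```python
import math
import random


def soft_threshold(v, tau):
    """Exact proximal operator of tau * |.| for a scalar v."""
    return math.copysign(max(abs(v) - tau, 0.0), v)


def inexact_online_prox_grad(steps=200, alpha=0.5, lam=0.1,
                             grad_err=0.05, prox_err=0.01, seed=0):
    """Track the minimizer of 0.5*||x - b_t||^2 + lam*||x||_1 (toy problem).

    One inexact proximal-gradient step is taken per time index t.
    Returns the tracking error max_i |x_i - x*_i(t)| after each step.
    """
    rng = random.Random(seed)
    x = [0.0, 0.0]
    errors = []
    for t in range(steps):
        # Assumed time-varying data: a slowly rotating target b_t.
        b = [2.0 * math.cos(0.01 * t), 2.0 * math.sin(0.01 * t)]
        # For this toy cost the minimizer is known in closed form.
        x_star = [soft_threshold(bi, lam) for bi in b]
        new_x = []
        for xi, bi in zip(x, b):
            # (i) Approximate gradient of the smooth term: exact value plus bounded error.
            g = (xi - bi) + rng.uniform(-grad_err, grad_err)
            y = xi - alpha * g
            # (ii) Proximal step computed only up to precision prox_err.
            new_x.append(soft_threshold(y, alpha * lam)
                         + rng.uniform(-prox_err, prox_err))
        x = new_x
        errors.append(max(abs(xi - si) for xi, si in zip(x, x_star)))
    return errors


errors = inexact_online_prox_grad()
print(f"first error: {errors[0]:.3f}, last error: {errors[-1]:.3f}")
```

In this toy setting the tracking error does not vanish. It settles into a neighborhood whose size depends on the gradient error, the proximal error, and how fast the target moves, which is the kind of trade-off the abstract refers to.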

*Maddela Avinash; Department of Electrical, Computer, and Energy Engineering; University of Colorado Boulder*

**Semidefinite Relaxation Technique to Solve the Optimal Power Flow Problem**

I would like to discuss the use of a convex relaxation technique to find the optimal solution for the cost function of a power distribution system. The conventional optimal power flow problem is nonconvex, and the traditional Newton-Raphson method has convergence issues when the system reaches its limit. Semidefinite programming approximates the power flow constraints by enlarging the feasible set so that the problem becomes convex. This convex problem can then be solved to minimize the total cost of generation, transmission, and distribution of electric power.
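As a hedged illustration of how such a relaxation enlarges the feasible set, consider a generic quadratically constrained problem in a complex vector $x$ (for power flow, $x$ would play the role of the bus voltages). The matrices $C$, $A_i$ and bounds $b_i$ below are placeholders and are not taken from the talk.

```latex
% Nonconvex problem (quadratic in x):
\min_{x \in \mathbb{C}^n} \; x^{H} C x
\quad \text{s.t.} \quad x^{H} A_i x \le b_i, \quad i = 1, \dots, m.

% Equivalent lifted form with W = x x^{H}, using x^{H} A x = \operatorname{tr}(A W):
\min_{W} \; \operatorname{tr}(C W)
\quad \text{s.t.} \quad \operatorname{tr}(A_i W) \le b_i, \quad W \succeq 0, \quad \operatorname{rank}(W) = 1.

% Semidefinite relaxation: drop the rank constraint, leaving a convex problem.
\min_{W} \; \operatorname{tr}(C W)
\quad \text{s.t.} \quad \operatorname{tr}(A_i W) \le b_i, \quad W \succeq 0.
```

Dropping the rank constraint is the only nonconvex-to-convex step. The relaxed optimum is a lower bound on the original cost, and it coincides with the original optimum whenever the solution $W$ happens to have rank one.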

*Ayoub Ghriss, Department of Computer Science, University of Colorado Boulder*

**Hierarchical Deep Reinforcement Learning through Mutual Information Maximization**

As is the case with human learning, biological organisms can master tasks from extremely small samples. This suggests that new skills are acquired hierarchically, starting with simpler tasks that allow the abstraction of newly seen samples. While reinforcement learning is rooted in neuroscience and psychology, Hierarchical Reinforcement Learning (HRL) was developed in the machine learning field by adding abstraction of either the states or the actions. Temporally abstract actions, our main focus, consist of a top-level (manager) policy and a set of temporally extended low-level (worker) policies. At each step, the manager picks a policy from this set, and that policy continues to run until one of a specified set of termination states is reached. Decision making in this hierarchy starts with the top-level policy, which assigns low-level policies to different domains of the state space. These low-level policies operate like any other monolithic policy on their assigned domain. In this talk, we introduce HRL and present our method to learn the hierarchy: we use Information Maximization to learn the top-level policies, with an on-policy method (Trust Region Policy Optimization) to learn the low-level policies.
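The manager/worker control flow described above can be sketched as follows. This is only an execution loop with hand-written policies on a made-up corridor task. It involves no learning and is not the speaker's method, and all names (`run_hierarchy`, `corridor_step`, the key-and-door task) are hypothetical.

```python
def run_hierarchy(manager, workers, step, state, max_steps=50):
    """Roll out a two-level hierarchy.

    manager(state) returns the index of a worker, or None to stop.
    workers[k] is a pair (policy, terminal_states): the policy maps a state
    to an action and keeps control until a terminal state is reached.
    Returns the visited states and the sequence of worker indices picked.
    """
    trajectory, picks = [state], []
    steps = 0
    while steps < max_steps:
        k = manager(state)
        if k is None:
            break
        policy, terminal_states = workers[k]
        picks.append(k)
        # The chosen worker runs until it hits one of its termination states.
        while steps < max_steps:
            state = step(state, policy(state))
            trajectory.append(state)
            steps += 1
            if state in terminal_states:
                break
    return trajectory, picks


def corridor_step(state, action):
    """Toy dynamics: positions 0..10, with a key picked up at position 0."""
    pos, has_key = state
    pos = min(max(pos + action, 0), 10)
    return (pos, has_key or pos == 0)


# Worker 0 walks left to fetch the key; worker 1 walks right to the door.
workers = [
    (lambda s: -1, {(0, True)}),
    (lambda s: +1, {(10, True)}),
]


def manager(state):
    """Assign a worker to each domain of the state space."""
    pos, has_key = state
    if not has_key:
        return 0
    return 1 if pos < 10 else None


trajectory, picks = run_hierarchy(manager, workers, corridor_step, (3, False))
print(picks, trajectory[-1])
```

Here the manager's mapping from states to workers is fixed by hand. In a learned hierarchy, that mapping and the worker policies would both be trained.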