We are from Tohoku University, Japan, and I am going to talk under the title "Non-negative Binary Matrix Factorization by Continuous Relaxation and Reverse Annealing." Here is the outline of our talk; we focus on reverse annealing. Reverse annealing is a method of quantum annealing specialized for local search starting from an initial solution. Matrix factorization is one of the suitable applications for reverse annealing, but it is known that its performance does not reach that of exact methods. So here we aim to improve the performance of reverse annealing, and additionally we experimentally analyze the factors needed to obtain good initial estimates for reverse annealing.

Here is our method. It is based on estimating a solution by continuous relaxation and using that estimate to initialize reverse annealing. We validated our method on two data sets. The first is a facial image data set; for this data set we obtained an improvement in learning performance, and we found that these results are strongly related to the quality of the initial estimate. To investigate further, we tested random matrices; for this data set we found that the estimation accuracy did not depend on the problem size, but it varies depending on the distribution of the relaxed solution.

So let's get started. Non-negative matrix factorization is a technique to approximate a given matrix by a product of two smaller matrices. The mathematical expression is like this: we have V as the input, we compute W and H, and every element of W and H must be non-negative. Matrix factorization is classified as an unsupervised machine learning method and is used to learn features from a data matrix. Here is an example of matrix factorization. Here is the original image data, expressed as the matrix V. From this data we compute features and their weights, and as a result, as you can see in the reconstruction, we can approximate the original data with the two small matrices. Matrix factorization has wide applications such as image processing and audio processing.

In this study, however, we consider matrix factorization with binary constraints. With the binary constraints, the weights of the features only take binary values, zero or one. Under this constraint, the data is reconstructed with a smaller number of non-zero elements in H compared to normal matrix factorization.

As an algorithm for matrix factorization, an alternating least squares method is known to be effective. Matrix factorization can be translated into an optimization problem, as you can see in this minimization objective. Because W is a continuous variable and H is a binary variable, we cannot optimize both at the same time, so we decompose the optimization problem. First, we randomly initialize the elements of H, and with H fixed we optimize with respect to W; this is a continuous optimization. From that result we optimize with respect to H; this is a binary optimization. The optimization problem in terms of H can be decomposed into several sub-problems, as you can see here. Each sub-problem corresponds to a column of H, and we can solve these problems independently. However, each sub-problem contains a well-known NP-hard problem, the subset sum problem, so there is no polynomial-time exact algorithm for solving it. Therefore fast heuristic algorithms are effective for solving this problem approximately, and among such algorithms we are interested in quantum annealing.
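As a rough illustration of the alternating scheme just described (a sketch only; the function name and the brute-force column solver are illustrative stand-ins for the annealer or any heuristic, not the actual implementation from the talk):

```python
import numpy as np
from itertools import product

def binary_mf(V, k, n_iter=10, seed=0):
    """Alternating scheme for V ~ W H with W >= 0 (continuous) and H binary."""
    rng = np.random.default_rng(seed)
    m, n = V.shape
    H = rng.integers(0, 2, size=(k, n)).astype(float)
    for _ in range(n_iter):
        # Continuous step: with H fixed, a least-squares update for W,
        # clipped to keep it non-negative (a crude stand-in for true NNLS).
        W = np.clip(V @ np.linalg.pinv(H), 0.0, None)
        # Binary step: each column of H is an independent binary least-squares
        # problem min_h ||V[:, j] - W h||^2 over h in {0, 1}^k. This is the
        # NP-hard sub-problem; brute force here stands in for quantum annealing
        # or any other heuristic solver.
        for j in range(n):
            best = min(product([0, 1], repeat=k),
                       key=lambda h: np.sum((V[:, j] - W @ np.array(h)) ** 2))
            H[:, j] = best
    return W, H

# Tiny usage example on a random non-negative matrix.
rng = np.random.default_rng(1)
V = rng.random((8, 6)) @ rng.integers(0, 2, size=(6, 5)).astype(float)
W, H = binary_mf(V, k=3)
print("reconstruction error:", np.linalg.norm(V - W @ H))
```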
Quantum annealing is a metaheuristic for binary optimization problems. We take the original problem, convert it to an Ising model, and by annealing on this model we obtain solutions of the original problem. During the annealing process, the system exploits quantum fluctuations to search the solution space efficiently. The formulation of the Ising model can be seen here; each variable takes minus one or plus one. By adding a term for quantum fluctuations and slowly decreasing the strength of the quantum effect, we can finally obtain the ground state of the original problem.

Reverse annealing is one of the methods of quantum annealing. It is specialized for local search in the vicinity of a given initial solution, and it can be used for local refinement of known solutions obtained by heuristics, or for utilizing previous solutions in an iterative algorithm. Here is an illustration of reverse annealing. We control the annealing parameter s; unlike forward quantum annealing, reverse annealing starts from a classical solution, and we gradually add quantum effects up to an intermediate strength. Then we anneal the system as usual, and finally we obtain a better solution in the vicinity of the initial solution.

Here is a preceding study in which reverse annealing was actually utilized for matrix factorization. As initial states for reverse annealing, they used the solution from normal quantum annealing or the solution from the previous iteration of the alternating algorithm, and such a scheme yields better performance than optimizing with quantum annealing alone. The challenge is that even with these reverse annealing methods, the optimization performance is not good compared to exact methods. Another motivation is that recent studies claim that classical methods such as greedy algorithms are effective for reverse annealing in typical cases.

So here are our aims: is there any way to improve reverse annealing for this matrix factorization problem, and, for a better understanding of reverse annealing, how can we prepare effective initial states for it?

Here is an overview of our method. We use continuous relaxation to initialize reverse annealing. First we solve the relaxed problem, in which each variable takes a continuous value from zero to one. This problem is continuous and convex, so we can solve it very quickly. To get a feasible solution of the binary problem, we simply round the result obtained from the continuous problem, and we use this rounded solution as the initial solution for reverse annealing. For the continuous optimization we can employ the projected gradient descent method for bounded continuous optimization.

Here is projected gradient descent. PGD is a gradient method for optimization with bound constraints. In the general problem shown here, each variable has bound constraints. We use a gradient update, and in the update rule we apply a projection to obtain a feasible solution. This algorithm is known to converge quickly for the alternating least squares method in matrix factorization. Additionally, for each variable x, all we have to do is impose the bounds zero to one, and then we obtain a feasible solution of the relaxed problem.
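A minimal sketch of this relaxation step for a single column sub-problem, assuming the relaxed objective is the box-constrained least squares min_x ||v - W x||^2 with 0 <= x <= 1 (the names and the step-size choice are illustrative, not taken from the talk); the rounded vector is what would be handed to reverse annealing as its initial state:

```python
import numpy as np

def pgd_box(W, v, n_steps=300):
    """Projected gradient descent for min_x ||v - W x||^2 subject to 0 <= x <= 1."""
    k = W.shape[1]
    x = np.full(k, 0.5)
    # Step size from the Lipschitz constant of the gradient, L = 2 * ||W^T W||_2.
    step = 1.0 / (2.0 * np.linalg.norm(W.T @ W, 2))
    for _ in range(n_steps):
        grad = 2.0 * W.T @ (W @ x - v)
        x = np.clip(x - step * grad, 0.0, 1.0)  # projection onto the box [0, 1]^k
    return x

rng = np.random.default_rng(0)
W = rng.random((20, 8))
v = W @ rng.integers(0, 2, 8)            # a column of V generated from binary weights
x_relaxed = pgd_box(W, v)
h_init = (x_relaxed >= 0.5).astype(int)  # rounded estimate used to initialize reverse annealing
print(h_init)
```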
Okay, that was our method, and we ran experiments to validate it. We used facial images to learn features; each image is 19 by 19 pixels, and the matrix V has the corresponding size. From this matrix we try to learn features, and the number of features is set to 35. To compare our method, we use different solvers: PGD is the relaxation heuristic just introduced, and to know how correctly our method optimizes, we also use an exact method. For quantum annealing and reverse annealing we employ D-Wave's quantum annealer, and you can see the settings for the D-Wave parameters here.

Here are the results for error convergence. The x-axis is the iteration number and the y-axis corresponds to the error; smaller is better. "Forward annealing" here means standard quantum annealing, and the orange line, the result of forward annealing, is not good. Even if we do reverse annealing starting from the forward annealing solution, it is not good compared to the other methods. The magenta line is the result of reverse annealing using the initial solution obtained in the previous iteration, but it has slow convergence. The green line is our relaxation heuristic; it converges well, but it does not reach the exact method's line. When we run reverse annealing from the relaxation estimate, this method gets good results, with performance comparable to the exact method.

To investigate why this combination of relaxation and reverse annealing performs better, we observed the Hamming distance from the optimal solution, shown here; Hamming distance zero means the solution is optimal. These figures correspond to the individual iterations. You can see that at the first iteration, the quantum annealing and reverse annealing methods are not close to optimal in terms of Hamming distance, but the relaxation estimates are already close to optimal. If we then run reverse annealing from these estimates, the results move toward the optimum and the number of optimal solutions increases. And, as you can see, if we iterate the algorithm, the number of optimal solutions grows as the iterations proceed.

So why does this relaxation strategy give good estimates? We consider several factors that may affect the quality of the relaxation estimate. The first is the dependency on the relaxed solution values. You can see this figure: we solved the relaxed problem, and here is the distribution of its elements. As you can see, many elements are equal to zero or one, and only a small number are not. So one question is: does this distribution of relaxed solution values affect the accuracy of the estimation? The other factor is the problem size dependency. In general, the hardness of matrix factorization is related to the problem size, so does the problem size affect the estimation accuracy? That is the other question.

So we investigated the problem dependencies in terms of these factors and analyzed the quality of the relaxation strategy on random matrices. Here is an explanation of the data generation. We intentionally change the distribution of the relaxed solution using ρ as a parameter. First we generate H and W as an oracle and compute V, and from this V we estimate the values of H. The resulting distribution depends on the parameter ρ, as you can see in this figure: for small ρ, many values of the relaxed solution are zero or one, and for larger ρ only a few elements are zero or one. If there are many zeros and ones, the problem seems easy; if the number of zeros and ones is small, the problem seems hard.
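The exact rule by which ρ shapes the instances is only given on the slides, so the sketch below is purely illustrative: it uses additive Gaussian noise as a hypothetical stand-in for whatever ρ controls, and simply measures how many entries of the relaxed solution are already near zero or one and how far the rounded estimate is from the oracle:

```python
import numpy as np
from scipy.optimize import lsq_linear

def run_instance(m, k, noise, rng):
    """One random column instance: oracle (W, h_true), observation v,
    box-constrained relaxation, rounding, and two quality measures."""
    W = rng.random((m, k))
    h_true = rng.integers(0, 2, size=k)
    v = W @ h_true + noise * rng.normal(size=m)   # noise is a stand-in for rho
    x = lsq_linear(W, v, bounds=(0.0, 1.0)).x     # relaxed solution in [0, 1]^k
    h_est = (x >= 0.5).astype(int)
    near_binary = np.mean((x < 1e-3) | (x > 1.0 - 1e-3))  # entries already ~0 or ~1
    hamming = int(np.sum(h_est != h_true))
    return near_binary, hamming

rng = np.random.default_rng(0)
for noise in (0.0, 0.5, 2.0):
    stats = [run_instance(30, 12, noise, rng) for _ in range(50)]
    print(f"noise={noise}: "
          f"near-binary fraction={np.mean([s[0] for s in stats]):.2f}, "
          f"mean Hamming distance={np.mean([s[1] for s in stats]):.2f}")
```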
So here is the result for the estimation accuracy of our relaxation method. The y-axis corresponds to the approximation ratio, defined as shown here. As you can see, for smaller ρ, that is, with many zero or one values, the relaxation method can estimate the solution accurately. However, for larger ρ, that is, when there are only a few zero or one values in the relaxed solution, the accuracy of the relaxation heuristic tends to be worse.

So let me conclude my presentation. We introduced a relaxation strategy for matrix factorization and we utilize the estimated solution to initialize reverse annealing. In the experiments on learning facial images, we confirmed an improved performance of reverse annealing that is almost equivalent to exact optimization. These results were associated with the closeness of the initial states to optimality, and our relaxation method yields high accuracy for the facial images. But for random matrices, we found a deterioration of the estimation for problems with only a few zero or one values in the relaxed solution.

Here is our future work. One direction concerns the problems for which the relaxation gives bad estimates, that is, where there are only a few zero and one values: in those cases we are interested in whether reverse annealing still performs better than quantum annealing. We are also interested in a statistical analysis of the general performance of reverse annealing in matrix factorization. Thank you for your attention.

Okay, thank you very much. Two questions. I have a question. It looks like, on your slide, for your relaxed function f(x) you are perhaps assuming that it is locally linear. Are you assuming that it is linear in order for the relaxed computation to perform well? And how important is that assumption about the form of f(x)? Back one slide. So your question is how this relaxed solution is important? If you look at the next slide, slide 10. Okay, that is the f(x) at the top there that I was thinking about, and the diagram looks like you are assuming that it is locally linear. I am wondering how important the linearity of f(x), the assumption of linearity, is to the quality of solutions you are going to get from a gradient descent type algorithm. Okay, sorry, please repeat your question. Oh, sure: is the shape of f(x) important to the success of this heuristic? Yes, exactly. We have a quadratic function for f(x), and even in that case PGD has good performance for our problem. Thank you.

Okay, other questions? Thanks for the nice talk. I was a bit confused about what you meant by iteration. I suppose that for normal quantum annealing it is something like the number of annealing steps, but then you also have learning here, so there is a time step where you learn your parameters and then you do annealing; I was a bit confused. So your question is about how this optimization works? Slide 12, I think. Slide 12? Yes. I suppose here that your PGD has two timescales: one with the iterations where you learn, where you do a kind of supervised learning, and another where you do annealing. Yeah. So this PGD curve is the case not using reverse annealing, just using the relaxation estimate as the optimization result, and the PGD-and-RA curve corresponds to using those estimates for reverse annealing. So if you only do PGD, iteration means gradient descent steps? Sorry? So if you only use PGD, are you just doing gradient descent, just minimizing your objective?
Yeah, for continuous matrix factorization you are right, but here we are considering a binary problem, so it may be a bit different from normal gradient descent for this problem. Okay, I see. Okay. Okay, I think we have to move forward, so let's thank our speaker again.