Oh, yes, he's ready. Hey, thank you for the introduction. I'll talk about... Stop. Can I start? Sure, please start. Okay, so I'll talk about lossy compression of matrices by black-box optimization of mixed-integer nonlinear programming. I'm Tadashi Kadowaki from DENSO, and this is joint work with Mitsuru Ambai.

So, lossy matrix compression, or matrix decomposition, is the task of dividing a target matrix into the product of an integer matrix M and a real matrix C. We want to minimize the difference between the two sides with respect to the integer and real matrices. This is mixed-integer nonlinear programming, and it is NP-hard. As for applications of this matrix decomposition, we have demonstrated it on image recognition as well as voice recognition. For image recognition, the memory size is reduced to one third and the execution time to one thirty-seventh, which is good for edge computing.

So how do we solve this matrix decomposition? The most popular approach, demonstrated for example by Yoon and colleagues just last year, is to optimize the real variables first, then optimize the integer variables, and repeat until convergence. Another approach that has been demonstrated is rank-one approximation: the outer product of two vectors makes a matrix, but that alone is a poor approximation, so the rank-one approximation is applied repeatedly to the residual of the matrix to obtain a good approximation.

In this presentation, we propose a different, data-driven approach. We first generate data on the relationship between matrices and their costs, and once we have a dataset, we model the relationship in the data. Once we have a model, we can optimize that model. The optimization finds a new candidate, so we can repeat this: the dataset grows, we get better modeling, and we get better optimization.

In the first step we want to remove the continuous variables; that is, we convert the mixed-integer program into an integer program. If M is fixed, the matrix C can be calculated in closed form, and substituting this relationship back, we obtain the cost as a function of M alone. Now we have an optimization over M only, a nonlinear integer program. Again, this is what we want to do. However, this cost function is very complicated, so we forget it; I mean, we take the data-driven approach. We don't directly optimize this complicated function; instead we generate data using it. We input matrices M, obtain a dataset, and model the relationship between the matrix M and its cost. In this case we model it in QUBO form, so that we can optimize the model using an Ising solver. The Ising solver outputs the next candidate whose cost is to be calculated, and we repeat, obtaining better and better solutions.

Okay, so there are two general proposals of black-box optimization for binary variables. One is BOCS, proposed by Baptista and Poloczek, and the other is FMQA, proposed by Kitai and colleagues and presented at AQC. The difference between the two algorithms is the model generation. BOCS uses Bayesian inference to infer a distribution over QUBOs, but we have no technique to directly optimize a distribution over QUBOs, so we need a specific QUBO matrix, and we sample one from the distribution. This is a kind of Thompson sampling, and it makes BOCS a randomized algorithm. FMQA uses a factorization machine, so it is a point estimate and a deterministic algorithm.
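A plausible written form of the decomposition and the closed-form elimination of C described above, with V the target matrix (the symbols p, k, q, and f are assumptions for this sketch, not taken from the talk):

```latex
\min_{M \in \mathbb{Z}^{p \times k},\; C \in \mathbb{R}^{k \times q}} \; \lVert V - MC \rVert_F^2
```

For fixed M, the inner problem over C is ordinary least squares, so

```latex
C^*(M) = (M^\top M)^{-1} M^\top V, \qquad f(M) = \lVert V - M\,C^*(M) \rVert_F^2 ,
```

leaving a nonlinear integer program in M alone.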
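And a minimal Python sketch of the model-optimize-evaluate loop just described, assuming a ±1 encoding of M, a plain least-squares QUBO surrogate as a stand-in for BOCS's Bayesian model or FMQA's factorization machine, and a toy simulated annealer as the Ising solver; all sizes and names are illustrative, not the authors' implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy instance: V = M_true @ C_true with M in {-1, +1}^(p x k); sizes assumed.
p, k, q = 4, 2, 6
M_true = rng.choice([-1.0, 1.0], size=(p, k))
C_true = rng.normal(size=(k, q))
V = M_true @ C_true
n = p * k  # number of binary variables encoding M

def cost(x):
    """Residual ||V - M C*(M)||_F^2 with C eliminated by least squares."""
    M = np.where(x.reshape(p, k) > 0, 1.0, -1.0)
    C, *_ = np.linalg.lstsq(M, V, rcond=None)
    return float(np.linalg.norm(V - M @ C) ** 2)

def fit_qubo(X, y):
    """Point-estimate QUBO surrogate fitted by least squares on pairwise
    features (BOCS infers a posterior over Q; FMQA fits a factorization
    machine instead)."""
    iu = np.triu_indices(n)
    F = np.array([np.outer(x, x)[iu] for x in X])  # x_i * x_j features
    w, *_ = np.linalg.lstsq(F, np.array(y), rcond=None)
    Q = np.zeros((n, n))
    Q[iu] = w
    return Q

def anneal(Q, steps=3000, T0=2.0):
    """Toy simulated annealing on x^T Q x over x in {0,1}^n."""
    x = rng.integers(0, 2, n).astype(float)
    e = x @ Q @ x
    for t in range(steps):
        T = T0 * (1.0 - t / steps) + 1e-3
        i = rng.integers(n)
        x2 = x.copy()
        x2[i] = 1.0 - x2[i]
        e2 = x2 @ Q @ x2
        if e2 < e or rng.random() < np.exp((e - e2) / T):
            x, e = x2, e2
    return x

# The loop from the talk: model -> optimize -> evaluate -> grow the dataset.
X = [rng.integers(0, 2, n).astype(float) for _ in range(10)]
y = [cost(x) for x in X]
for _ in range(30):
    Q = fit_qubo(X, y)
    x_new = anneal(Q)
    X.append(x_new)
    y.append(cost(x_new))
print("best residual found:", min(y))
```

Each iteration refits the surrogate to every evaluated sample before proposing the next candidate, which is the growing-dataset behavior the talk describes.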
And I'd like to mention that a QA version of BOCS has also been reported by Koshikawa and colleagues at AQC. Okay, so there are several potential variations of these algorithms. BOCS has a prior: the original is a horseshoe prior (vanilla BOCS), and we consider two additional priors, a normal prior and a normal-gamma prior. Those have hyperparameters, so I optimized them beforehand. FMQA has a hyperparameter K, the dimension of the latent vectors; I applied K = 8 and K = 12. We also test random sampling. For the Ising solvers, in addition to conventional simulated annealing (SA) and quantum annealing (QA), we test simulated quenching (SQ), which quenches the temperature to zero immediately.

Before moving to the results, I'd like to share the energy landscape of this problem. The product is invariant under permutations of the columns and flips of their signs, as expressed in this cartoon. In this case there are 48 degenerate solutions (2^3 × 3! for three columns), spread out as shown here, and this is a clustering analysis of those solutions. I cut them into four groups, which we use in the analysis on the right, and since the problem is degenerate, data augmentation can be applied.

This is the result of the algorithm comparison. Among these algorithms, the horizontal axis is the iteration step and the vertical axis is the best error obtained during the iterations. You see that vanilla BOCS and normal-prior BOCS show the best performance. I tested 10 random instances and summarized them in the table. You see that normal-prior BOCS is the best: its success rate is 36% in solution accuracy, and its execution time is also fast compared to the other algorithms.

Okay, now the data augmentation. You see that data augmentation does not improve the results, because the fitted QUBO model cannot approximate the cost function globally, and that causes the poor performance.

Next, the comparison among the Ising solvers. You see no significant difference among the Ising solvers, including quenching; here is the table again. I added this part, so you see normal-prior BOCS in its QA and SQ versions, which show the best performance among all algorithms; I think there is no significant difference between the three. However, looking at the execution time, QA takes a long time: the QPU time itself is very short, but there is a very large overhead, as Catherine explained on the first day.

Okay, the final result: I visualize the balance between exploration and exploitation using the four groups. For random sampling, the four colored lines are almost identical, which means it is completely biased towards exploration. We see a similar figure for the data-augmentation algorithm, which explains its poor performance, and vanilla BOCS also tends towards exploration. On the other side, FMQA is the extreme case: in the early period it is biased towards exploitation, focusing on only a very small area of the solution space early on. The three best-performing algorithms are well balanced between exploration and exploitation.

Okay, to summarize: the mixed-integer nonlinear program is transformed into a nonlinear integer program and solved by binary black-box optimization, and the normal-prior BOCS shows the best performance.
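A small sketch of the symmetry behind that degeneracy and the data augmentation it enables: permuting the columns of M or flipping their signs leaves the residual unchanged once C is re-fitted, so each evaluated sample yields k! · 2^k equivalent samples with the same cost (48 for the k = 3 case above). The helper below is illustrative, not the authors' code:

```python
import itertools
import numpy as np

def augment(M):
    """All matrices equivalent to M under column permutations and
    column-wise sign flips; each has the same decomposition cost, so
    (variant, cost) pairs can be added to the dataset for free."""
    k = M.shape[1]
    variants = []
    for perm in itertools.permutations(range(k)):
        for signs in itertools.product([-1.0, 1.0], repeat=k):
            variants.append(M[:, list(perm)] * np.array(signs))
    return variants

M = np.array([[1.0, -1.0], [1.0, 1.0], [-1.0, 1.0]])
print(len(augment(M)))  # 2! * 2^2 = 8 equivalent copies for k = 2 columns
```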
We have some insights for further development. The first concerns the data: augmentation makes the dataset cover the search space globally; however, since we use a simplified fitting model, it cannot approximate the cost function globally. The second is that there is no significant difference among the Ising solvers. This is non-trivial because SQ is a greedy optimization; the surrogate model might be easy to optimize compared to the explicit form of the original cost function. Visualization of the balance between exploration and exploitation helps our understanding. Finally, there remains the question of the advantage of quantum annealing in binary black-box optimization. That's it; thank you for your attention. Questions and comments?

Thanks for the talk. I'm a bit confused about why the data augmentation makes the result worse. I mean, you are in a sense encoding the symmetries, so if it doesn't help, I would have expected it not to change the result, but not to make it worse. Could you comment on that?

Yes, that is related to this, and this is my thought at this time; there is no evidence yet. The strength of black-box optimization with relatively simple modeling is that we first explore the large solution space without bias. If we instead get biased data sampling around a global-minimum basin, we should focus on that very small area, intensively take samples there, obtain a finely tuned representation of the cost function in that area, and then we would get good results. But it is very difficult to fit with a QUBO; it is too limited to fit a function globally when it has many global minima. For that type of cost-function structure it is very difficult, and that is why the data augmentation does not work well in this case. Does that make sense?

So is it that those states are separated, I don't know, far away from each other, or separated by barriers or something? I still don't get it.

Yeah, I haven't studied that well, but the reason we have so much degeneracy is this: these vectors construct one matrix, and if you change their order, in this case the red and blue are exchanged, and we also change the order of red and blue here, that is equivalent. This causes a very complicated cost-function landscape, and I have no idea what the cost landscape actually looks like; that is a future study I should conduct.

Thanks. I have a question: you transformed the original program into another program; what is the accuracy of the solution obtained by the new formulation?

Okay, so again, the QUBO data fitting is not perfect, but in this case this is the residual error, which means this line is the second best, so below this line is among the best; and if the curve touches this line, that means in some of the runs we found the exact solution. So this approach is not exact, but I think it is a pretty good approach.

Okay, thank you. Are there other questions? If not, let's thank the speaker again.