This talk is about semi-analytic resampling. This is joint work by Tomoyuki Obuchi and Takahashi. Let me start with the background and motivation. Roughly speaking, a machine learning problem is formulated as follows: given a dataset of input-output pairs, we tune the parameters of a model so that they are optimal for that dataset. In this setting there are two issues to discuss. The first is the development of learning algorithms, and the second is the evaluation of the learned results. On the algorithmic side, many techniques such as convex relaxations have been developed, and I think they work very well on this kind of problem. At the same time, for the second issue, the evaluation, there are also many proposals: cross-validation, information criteria, empirical Bayes, and resampling. Here we focus on resampling, in particular the bootstrap, proposed by Efron in 1979. The nice point of the bootstrap is that we can assign error bars to the estimated parameters from a single dataset, and the resampling procedure itself is very simple. Given a dataset D of M data points, we can construct the empirical distribution, which is just the histogram of the data. The bootstrap then just repeats this process: we evaluate the distribution of the estimated parameters by repeating the following steps many times. The first step is the generation of a bootstrap sample.
From the given dataset, we resample a dataset of essentially the same size, with replacement. Then, for the generated dataset, we learn the parameters. That is it. By repeating this many times, we obtain a histogram of the learned parameters, and from that histogram we can read off the statistics we need. The good points of this procedure are the following. The idea itself is very simple. Any dataset and any learning algorithm are acceptable; the idea puts essentially no assumptions on them. And the implementation is easy: we just repeat resampling and learning. Very simple. But of course there are bad points. The main one is the computational cost: to make the resampling statistics numerically accurate, we have to repeat the learning or inference many, many times. There are also some known theoretical caveats about the accuracy of the naive bootstrap, but the computational cost is the issue I want to address here.

So the first question of this talk is: can we reduce this computational cost? The goal is, for certain models, to construct approximate algorithms that evaluate the resampling statistics without actually resampling, using the replica method and message-passing-type algorithms. The replica approach to resampling was pioneered by Malzahn and Opper in 2003, and our work follows this line.

Now the problem setting. We consider a dataset D of input-output pairs (x, y); the dimension of x is N, the number of data points is M, and y is some output of interest. The practical question we address is variable selection: which explanatory variables actually matter for the output. About 25 years ago, Tibshirani gave a very good formulation of this problem using the L1 penalty. The cost function is basically the squared error plus an L1 penalty on the coefficients. This is a convex problem, so it is easy to solve, and thanks to the L1 penalty the solution is sparse: many coefficients are exactly zero. So the lasso can actually select variables.

However, this works well only when the correlations between the explanatory variables are small. When the correlations are not negligible, the lasso becomes unstable. If we plot the solution path, the selected variables versus the strength of the regularization parameter, the path shows peculiar, non-monotonic behavior, and irrelevant variables can enter the solution. This was pointed out by Meinshausen and Bühlmann about ten years ago. A famous example is the riboflavin production data: the number of data points is about 100, while the number of explanatory variables is about 4,000, and the variables are strongly correlated.
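The resample-and-relearn loop just described can be sketched in a few lines. This is a minimal illustration of the naive bootstrap for the lasso, not code from the talk; the synthetic data, the regularization value, and scikit-learn's Lasso are our own assumptions:

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)

# Synthetic data: M samples, N features, sparse true signal (our assumption).
M, N = 200, 50
X = rng.standard_normal((M, N))
beta_true = np.zeros(N)
beta_true[:5] = 1.0
y = X @ beta_true + 0.1 * rng.standard_normal(M)

def bootstrap_lasso(X, y, n_boot=100, lam=0.1):
    """Resample (x, y) pairs with replacement and refit the lasso each time."""
    M = len(y)
    estimates = np.empty((n_boot, X.shape[1]))
    for b in range(n_boot):
        idx = rng.integers(0, M, size=M)   # bootstrap sample of the same size M
        model = Lasso(alpha=lam, fit_intercept=False)
        model.fit(X[idx], y[idx])
        estimates[b] = model.coef_
    return estimates  # histogram these rows to get per-component error bars

est = bootstrap_lasso(X, y)
print(est.mean(axis=0)[:5])  # bootstrap first moments of the relevant entries
```

Each row of `estimates` is one learned parameter vector; its column-wise moments are exactly the bootstrap statistics discussed in the talk, and the cost is one full lasso fit per resample.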
So the lasso by itself is not satisfactory in such correlated situations. A good remedy is to combine it with resampling: by resampling the data and rerunning the lasso many times, we can judge how stably each variable is selected. I will come back to this later in the form of stability selection.

Before that, let me set up the mathematical formulation of resampling. A bootstrap sample is generated from the given dataset by sampling pairs with replacement; there is no new data, we only reuse the given data. For example, suppose the original dataset consists of four pairs. In one bootstrap sample, the pair (x2, y2) may be sampled twice, (x3, y3) once, (x4, y4) once, and (x1, y1) not at all. In this case we can characterize, or specify, the bootstrap sample by the number of times each pair is sampled: the counting vector (0, 2, 1, 1) is equivalent to the sample itself. So we use this notation to represent the resampled data. Then the distribution of the resampled data, that is, of the counting vector, is a multinomial distribution of this form. This representation was first used in this context by Malzahn and Opper in 2003, and when M, the size of the dataset, is large, the multinomial is well approximated by a product of independent Poisson distributions. So the distribution of the resampled data is actually easy to handle: it is just a product of independent Poisson distributions.

Then we can formally link the bootstrap average and the replica method by the following formula. For a given counting vector c, specifying one resampled dataset, the error function for the lasso takes a reweighted form. We introduce the corresponding partition function, and in the beta-to-infinity limit we recover the lasso solution. What we want is the average with respect to the resampling, that is, over c. For this we use the replica formula: for fixed data D, we evaluate the bootstrap average of the free energy using the replica method in conjunction with a mean-field approximation. So this is just the replica formula.
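The counting-vector representation and its Poisson limit are easy to verify numerically. The following is our own toy check, not code from the talk:

```python
import numpy as np

rng = np.random.default_rng(1)
M = 10_000  # number of data points in the original dataset

# One bootstrap sample = M draws with replacement; c[mu] counts how many
# times pair mu was drawn (the counting vector, e.g. (0, 2, 1, 1) above).
draws = rng.integers(0, M, size=M)
c = np.bincount(draws, minlength=M)

print(c.sum())    # always M: the multinomial constraint
print(c.mean())   # exactly 1, since the counts sum to M

# For large M the entries of c are approximately i.i.d. Poisson(1),
# P(c = k) ~ e^{-1} / k!, so about 36.8% of the pairs are never drawn.
frac_zero = np.mean(c == 0)
print(frac_zero)
```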
If we can carry out this computation for fixed D, we can evaluate the bootstrap average. But of course, for given D, computing the partition function is difficult. To resolve this difficulty, we use a certain kind of mean-field approximation. The first idea is to use belief propagation, that is, the cavity method: we resolve the difficulty of computing the partition function by the cavity method. Consider the distribution conditioned on a single resampled dataset. For this distribution we can draw a bipartite factor graph; this is the starting point of belief propagation. Given this graph, we could run belief propagation directly. But instead of doing that, we replicate the system and take the bootstrap average already at this stage. The replication does not change the structure of the graph, so the graph collapses to a replicated version of the same form, and the factor appearing here can be computed analytically using the independent Poisson distribution. Then we can construct the belief propagation algorithm for the replicated graph: compute the cavity biases, compute the cavity distributions, and after reaching a convergent solution we obtain the beliefs, the marginals, for integer n. But we have to take the n-to-zero limit. How can we do that? For this we introduce the replica symmetric ansatz. In this case it can be stated as follows: we parameterize the messages using the replica symmetric ansatz, or the exchangeability of the system, together with the central limit theorem. This restricts the functional form of each message to a form with three parameters, A, B, and C. This form is analytic in the replica number, so we can analytically continue the expression to real n and take the n-to-zero limit. This is how the derivation of the algorithm proceeds.
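For reference, the two formulas underlying this step can be written out. This is our reconstruction in generic notation, not a transcription of the slides:

```latex
% Multinomial distribution of the counting vector c = (c_1, ..., c_M),
% and its large-M limit as a product of independent Poisson(1) factors
% (the multinomial is supported on \sum_\mu c_\mu = M; the Poisson limit
% relaxes this constraint):
P(c) = \frac{M!}{\prod_{\mu=1}^{M} c_\mu!}\,\prod_{\mu=1}^{M}\Big(\frac{1}{M}\Big)^{c_\mu}
\;\xrightarrow{\;M\to\infty\;}\; \prod_{\mu=1}^{M}\frac{e^{-1}}{c_\mu!}

% Replica identity used for the bootstrap average of the free energy,
% for fixed data D:
\mathbb{E}_{c}\big[\ln Z(\beta; c, D)\big]
  = \lim_{n\to 0}\frac{\ln \mathbb{E}_{c}\big[Z(\beta; c, D)^{n}\big]}{n}
```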
Taking the n-to-zero limit and the beta-to-infinity limit, we obtain an algorithm that computes the three parameters A_i, B_i, C_i for each component of the estimator. This is the first result of our talk.

Next, validation on synthetic data. To justify the obtained algorithm, we compared the result of our derived algorithm with direct numerical simulation of the resampling. One panel is for the first moment, the other for the second moment. This is done with N = 1,000, a data size of 5,500, and 200 non-zero components in the true signal, with the noise strength and lambda as indicated. You can see that in both cases, for both the first and the second moments, we get a very accurate approximation of the bootstrap statistics.

Then, for what purpose do we use this technique? The second part is the application of the developed algorithm, namely to stability selection. We examine the practical usefulness of the developed algorithm by applying it to a relatively recently proposed variable selection method for the lasso known as stability selection, proposed about ten years ago by Meinshausen and Bühlmann. What is stability selection? It is a robust variable selection method based on the lasso and resampling, and the basic idea is again simple. We combine the bootstrap with a randomization of the L1 strength lambda. This provides a distribution of estimators. Then, for each variable, we can evaluate the probability that it is active, that is, that the component is non-zero. By doing many resamplings, we obtain, for each component, the histogram of being zero or non-zero. If this active probability Pi_i is sufficiently large, we judge the component to be non-zero and select it as a relevant variable; otherwise we discard it. There are a few parameters in this method. The first is how to randomize lambda: we introduce a component-wise dependence of lambda. In the original proposal it takes the following form, starting from the original strength lambda.
With probability p, the penalty on a component is changed to lambda over a, and often a equal to one half is used. The other parameter is the size of the bootstrap sample; here, one half of the original dataset size is commonly used. This figure is from their paper, a synthetic experiment. There are 200 explanatory variables. The black ones are irrelevant, with no influence on the output. The blue ones are also irrelevant, with no influence on the output, but they have some correlation with the relevant variables, the red ones. If we do not introduce the randomization of lambda, the stability paths, the active probability plotted along the path, show a peculiar behavior: although the blue variable is irrelevant, it has a very high active probability in this region, so it is difficult to distinguish whether it is relevant or irrelevant. But once the randomization of lambda is introduced, the blue curve clearly separates from the red one, so we can clearly distinguish the relevant variables from the irrelevant ones. This makes the lasso estimation much more robust than the naive one. So far so good, but this method is computationally demanding. Evaluating the active probability accurately requires a lot of resampling, usually more than 1,000 repetitions, so doing this naively costs about 1,000 times as much as a single lasso fit. We want to reduce this computational cost by using our approximate algorithm. So, based on our algorithm, we developed a semi-analytic resampling method for stability selection, and we checked its usefulness by numerical experiments.
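The stability-selection recipe above can be sketched as follows. This is our own schematic, not the authors' code: we use half-size subsamples, and we implement the lambda randomization by rescaling columns (scaling column j by a is equivalent, for the zero pattern, to penalizing that coefficient with lambda/a), since scikit-learn's Lasso only accepts a single global penalty:

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(2)

# Synthetic data with 5 relevant features (our assumption, for illustration).
M, N = 200, 50
X = rng.standard_normal((M, N))
beta_true = np.zeros(N)
beta_true[:5] = 1.0
y = X @ beta_true + 0.1 * rng.standard_normal(M)

def stability_selection(X, y, lam=0.05, n_boot=100, p=0.5, a=0.5):
    """Selection frequencies Pi_i under subsampling + lambda randomization.

    With probability p the penalty on feature j is hardened to lam/a;
    this is implemented by scaling column j with a, which leaves the
    zero/non-zero pattern of the original coefficient unchanged.
    """
    M, N = X.shape
    active = np.zeros(N)
    for b in range(n_boot):
        idx = rng.choice(M, size=M // 2, replace=False)  # half-size subsample
        scale = np.where(rng.random(N) < p, a, 1.0)      # randomized penalties
        model = Lasso(alpha=lam, fit_intercept=False)
        model.fit(X[idx] * scale, y[idx])
        active += model.coef_ != 0
    return active / n_boot  # active probability Pi_i for each component

pi = stability_selection(X, y)
print(pi[:5], pi[5:].max())  # relevant entries should sit near 1
```

Thresholding `pi` at some level (e.g. 0.6) then gives the selected variable set; the cost is `n_boot` lasso fits, which is exactly what the semi-analytic algorithm avoids.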
The first experiment uses synthetic data. We assume a simple linear model, set the stability selection parameters as in the original proposal, and compare our semi-analytic resampling with direct resampling. Our method is coded in MATLAB and is not optimized. The direct numerical simulation uses glmnet, which is an optimized implementation, and for the stability paths we use 1,000 resamplings in the direct simulation.

The first result is the comparison of the active probability between the approximate algorithm and the direct resampling. There are actually two sets of curves here, but they almost completely overlap; we cannot distinguish the two. This means that the approximate algorithm gives a very accurate estimate, at least for this synthetic data. You can also see two groups of curves, one for the relevant entries and one for the irrelevant ones, and these two groups are clearly separated. So we can accurately select the relevant variables from this type of plot. This panel shows the actual computational time: one curve is for the direct simulation, the other for the semi-analytic one, and the straight line has slope two. Naively the cost grows quadratically with the problem size, while glmnet uses some tricks that keep its cost lower; presumably for very large N the direct simulation would also follow this slope of two. But even so, we still gain a lot in computational cost with the approximate algorithm.

The second validation uses real data. We took the wine quality dataset from the UCI repository. It consists of an objective value, a ten-grade quality evaluation of white wines, given by blind tasting by professional sommeliers. The size is quite large, about 5,000 white wines, and the explanatory variables are chemical constituents such as density, acidity, sugar, and so on. This is the result: the active probabilities of the 11 entries.
The symbols denote our approximate algorithm and the result of direct resampling; the same color indicates the same component. You can see that in all cases we can classify the entries into two classes by a certain criterion, and in both cases there is very good agreement. Some components show discrepancies, but in total there is fairly good agreement between the approximate algorithm and the direct resampling. So we can say that the developed approximate algorithm exhibits fairly good approximation accuracy when the correlations between the explanatory variables are not too strong. Unfortunately, it can still fail to provide accurate estimates of bootstrap averages when the correlations between the explanatory variables are not negligible. Indeed, we applied our algorithm to the riboflavin data from the original paper, and we found that the accuracy is not good. These panels show the first moment, the second moment, and the active probability. If our algorithm accurately reproduced the direct resampling, we should see a straight line here, but there are very large outliers and a lot of scatter, so the result is not good.

Therefore we extended our method to a more accurate algorithm. To handle the correlated cases, we extend the replica-based algorithm from belief propagation to expectation propagation, or vector approximate message passing (VAMP). This is because these are empirically known to offer more accurate estimates than BP or AMP when the correlations are rather strong. In the rest of the time, let me introduce its outline. So what is expectation propagation, or VAMP? It was originally proposed by Minka in 2001. Roughly speaking, it is a combination of belief propagation and an approximation by an exponential family, in most cases a Gaussian. Empirically, it gives accurate inference even when the couplings are statistically correlated. Let me explain with a simple example of the Ising model.
Normally, for this model, you would draw this type of bipartite factor graph. But we have another choice, this one: in this expression, we bundle a collection of factors and denote the whole bunch by a single node. The collection of these factors in the figure is represented by one node, the bunch of factorized factors is denoted by another single node, and the collection of variables is likewise represented by a single node. So we can use this alternative expression of the graph, and in expectation propagation we use this graph instead of the first one. The good point of this graph is that if we ran BP on it, the result would be exact; this is trivial, because the graph simply represents the exact computation. But unfortunately, this is computationally hard, since we have the interaction factor and non-trivial, non-Gaussian distributions, even where the graph is factorized. This computational difficulty can be resolved by a factorized Gaussian approximation; that is the idea. To do this, we introduce three types of approximation. This is the original form. In the first, we replace this factor by a factorized approximation. In the second, we replace the coupling factor by a factorized Gaussian. The third combines these two. Then the question is how to determine the parameters, lambda and gamma, of each approximation. For this, we impose a moment-matching requirement: from the three approximations we get expressions for the first and second moments, and these should be equal. This requirement provides the conditions that determine the parameters of the approximate messages. Then we obtain expectation propagation for the simplified system: we just repeat the update steps many times until convergence. After reaching a convergent solution, we obtain the first moments in this way.
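The moment-matching step at the heart of EP can be seen on the smallest possible example. This is our own toy, not from the talk: the exact moments of a truncated Gaussian define the matched Gaussian approximation:

```python
import numpy as np

# Target: standard Gaussian "prior" times a hard non-Gaussian factor 1[x > 0].
# EP replaces the hard factor by a Gaussian such that the overall approximation
# matches the first and second moments of the exact tilted distribution.
x = np.linspace(-10.0, 10.0, 200_001)
dx = x[1] - x[0]
prior = np.exp(-0.5 * x**2) / np.sqrt(2.0 * np.pi)
tilted = prior * (x > 0)              # unnormalized tilted distribution

Z = tilted.sum() * dx                 # normalization; analytically 1/2
mean = (x * tilted).sum() * dx / Z    # first moment of the tilted distribution
var = (x**2 * tilted).sum() * dx / Z - mean**2

# q(x) = N(x; mean, var) is the moment-matched Gaussian approximation.
print(Z, mean, var)   # analytic values: 1/2, sqrt(2/pi), 1 - 2/pi
```

In full EP this matching is done factor by factor, with each tilted distribution formed from the cavity (the product of all other approximate factors), but the computation per factor is exactly this one.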
Basically, we apply this algorithm to the generalized linear model, but there is one remaining problem. To employ expectation propagation, the target distribution should be of the form of an exponential of some quadratic form times a factorized distribution. Unfortunately, the original GLM is not of this form. But we can convert the expression into this form by introducing delta functions and expressing them by their Fourier transforms. Then we finally get an exponential of a quadratic form, actually a bilinear form, which is a special case of the quadratic form, times factorized terms, so there is no problem. This technique was first introduced by Opper and Winther in 2001. Then we can introduce EP, or vector approximate message passing, for this GLM. We first draw the bipartite graph, apply the replication plus the average with respect to the resampling and the randomization of lambda, take the n-to-zero and beta-to-infinity limits, and finally obtain this rather complicated algorithm. But in principle it works like this.

Now the check. To validate the utility of our algorithm, we applied the method to synthetic data and real data. The synthetic data is a random row selection of the discrete cosine transform, which is a compressed sensing problem. The other is the riboflavin data from Meinshausen and Bühlmann. In this convergence plot, the curves for the synthetic data are shown for several data sizes, and the red one is the riboflavin real data. In both cases you can see an exponential decay, so the iteration converges very fast. The second point is the comparison between the result of the approximate algorithm and the direct resampling; the top row is for the synthetic data and the bottom row for the riboflavin real data. You can see the agreement for the first moment, the second moment, and the active probability, in all cases.
For both the synthetic data and the real data, we have a considerably good approximation accuracy. So we think that this is a promising approach for real data, not only for synthetic, artificial data.

Now the summary. In this talk, we developed semi-analytic resampling algorithms for the generalized linear model based on two types of message-passing algorithms: the first is belief propagation and the second is expectation propagation, and both are combined with the replica method. Each has good points and bad points. The good point of the first one is the low computational cost: it works with order MN operations per update, the data size times the number of explanatory variables. So it is fast. But it is not necessarily accurate when the explanatory variables are strongly correlated. To overcome this drawback, we extended the method to EP. Its good point is an empirically, considerably good approximation accuracy, even for real data. But it also has a drawback, a high computational cost: in general it requires order N-cubed operations per update. So, depending on the situation, we should select one or the other. We also showed the usefulness of our algorithms through the application to the stability selection method, which provides a stable variable selection method based on resampling. As future work, we plan applications to various real data analyses. Thank you for listening. These are the references for our work.