Geometric Pooling: A User’s Guide

Much of our information comes to us indirectly, in the form of conclusions others have drawn from evidence they gathered. When we hear these conclusions, how can we modify our own opinions so as to gain the benefit of their evidence? In this paper we study the method known as geometric pooling. We consider two arguments in its favour, raising several objections to one, and proposing an amendment to the other.

Some of your evidence about the world you gather yourself, but much is gathered by others. Sometimes you obtain the second sort of evidence directly, when one of your fellows describes it explicitly. But often you learn only the effect it has had on their opinions. For instance, you might learn your doctor's view about what is causing your symptoms without learning all the background knowledge and detailed test data that underpins it. Or you might learn your fellow researcher's new probabilities for the hypotheses you're both investigating, rather than the data they've just collected, and on which they've just updated their probabilities.
When you encounter someone who has gathered their own evidence, and you learn not the evidence itself but only the opinions they now have, how should you update your own opinions? The Bayesian says you should treat such second-order evidence just like you treat any other evidence, and update using Bayes' rule; and they have many arguments in favour of this prescription. But sometimes you can't do that. After all, in this situation, Bayes' rule requires you to have prior probabilities in different hypotheses concerning the opinions of the other person, and likelihoods given those hypotheses; and you might simply not have set these. So we seek an alternative method.
In this paper, we consider a particular proposal: you should combine your probabilities with your fellow's using a method known as geometric pooling. We begin in Section 2 by raising several objections to a recent argument in favour of geometric pooling due to Baccelli and Stewart (2023).
Then we turn, in Sections 3 and 4, to what we take to be better arguments in its favour. Finally, in Section 5, we consider a further argument in favour of geometric pooling that appeals to its utility in social settings, and we use our results from earlier in the paper to suggest an amendment.

The framework
Throughout, we'll assume that the opinions of you and your fellow concern different ways the world might be, and different hypotheses about the objective chance of its turning out each of these ways. This is an extremely common situation, both in the context of scientific research and in our everyday lives. For instance, the hypotheses might concern the value of a particular parameter in a scientific theory, such as the basic reproduction number (R value) for an infectious disease, while the states of the world might be distinguished by some observable feature, such as the pattern of illness in a given population. Each hypothesis about the R value tells you the chance of observing a given pattern of illness.
For ease of exposition, we will assume these hypotheses concern the objective chance of a particular coin landing heads, which we'll call its bias. And the states of the world are distinguished by different sequences of heads and tails that might result when the coin is tossed repeatedly. We'll assume the hypotheses form a partition, call it H; and the states of the world form another, call it S. You and your fellow each assign probabilities over H and over S, and the Principal Principle (Lewis, 1980) says how they should relate: your probability for a sequence, conditional on a hypothesis about the chances, should be whatever probability that hypothesis assigns to that sequence.

Now suppose you observe some tosses of the coin, your fellow observes some different tosses, and you each update your opinions in the light of this new evidence. You meet and your fellow tells you their new probabilities. How should you incorporate that information? It turns out it depends on which of their probabilities they share. Here are three sets of probabilities you might learn from your fellow: (1) their probabilities in each of the possible sequences of tosses; (2) their probabilities in each of the chance hypotheses; (3) their probability that the next toss will land heads. We'll take these in turn.
But first, let's survey two ways you might incorporate the information about your fellow's probabilities. The linear or arithmetic pool of probability functions P and Q splits the difference between the credences that each assigns. So the pooled probability in A is

(P(A) + Q(A))/2.

If P and Q are both probability functions, so is their arithmetic pool.
Where the arithmetic pool of P and Q appeals to the arithmetic mean of the credences they assign, their geometric pool appeals to the geometric mean. The arithmetic mean of p and q is (p + q)/2; their geometric mean is √(pq). However, we cannot simply take the pooled probability in A to be √(P(A)Q(A)), because doing so does not always deliver a probability function. Instead, we must use the normalized geometric mean rather than the geometric mean itself. But that means we can define the geometric pool of P and Q only over a specified partition. If A = {A_1, ..., A_n} is a finite partition, then the pooled probability in A_i, which we write G(P, Q)(A_i), is

√(P(A_i)Q(A_i)) / ∑_j √(P(A_j)Q(A_j)).
Note that this is defined only if there is at least one A_i in A to which both P and Q assign positive probability, so that the denominator is nonzero. In this case, we say that P and Q have overlapping support in A. (We define geometric pooling over infinite partitions in the Appendix.)
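To fix ideas, here is a minimal sketch of the two pooling rules in Python. The function names and the toy three-cell partition are our own illustrative choices, not anything from the paper itself.

```python
import math

def linear_pool(p, q):
    """Arithmetic pool: average the two credences cell by cell."""
    return [(pi + qi) / 2 for pi, qi in zip(p, q)]

def geometric_pool(p, q):
    """Normalized geometric pool over a finite partition.
    Defined only when P and Q have overlapping support on the partition."""
    raw = [math.sqrt(pi * qi) for pi, qi in zip(p, q)]
    total = sum(raw)
    if total == 0:
        raise ValueError("P and Q have no overlapping support on this partition")
    return [r / total for r in raw]

# A toy three-cell partition:
P = [0.5, 0.3, 0.2]
Q = [0.2, 0.3, 0.5]
print(linear_pool(P, Q))     # [0.35, 0.3, 0.35]
print(geometric_pool(P, Q))  # symmetric around the middle cell, and sums to 1
```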

Pooling probabilities of sequences
Now suppose you meet your fellow and they tell you their probabilities for each of the possible sequences of coin tosses. How should you update your probabilities in those sequences? Baccelli and Stewart (2023) argue that geometric pooling is a good strategy in this case. Let's suppose you and your fellow shared the same prior probability function P before you started collecting evidence. The crucial point is that the evidence you have collected since then takes the form of a disjunction of states of the world from S; and similarly for your fellow. If you witnessed heads, tails, then heads, then your evidence is the disjunction of every state of the world in which the sequence of tosses begins in that way.
We write E for your evidence and F for your fellow's. Then Baccelli and Stewart note the following striking fact:

Proposition 1. If P(EF) > 0, then, for any state S in S,

G(P(− | E), P(− | F))(S) = P(S | EF).

That is, pooling your posterior probabilities with your fellow's gives the same probabilities over the sequences as updating your shared prior with the aggregate evidence.
What's more, if we use Jeffrey conditionalization to extend the geometric pool over S so that it assigns probabilities also to the chance hypotheses in H, Proposition 1 extends as well. (To apply Jeffrey conditionalization, we need conditional probabilities for each chance hypothesis H_i given each sequence S. These will be different for P, P(− | E), and P(− | F), but it doesn't matter which of these we use. If we use P, then Jeffrey conditionalization says that your new probability in H_i is ∑_S P(H_i | S) G(P(− | E), P(− | F))(S) = ∑_S P(H_i | S) P(S | EF), where the equality follows from Proposition 1; and the right-hand side here reduces to P(H_i | EF). If instead we use P(− | E) or P(− | F), similar reasoning gives the same result.)

Note that arithmetic pooling very much does not boast the feature described in Proposition 1. Suppose the coin will be tossed just two times, so that there are four possible sequences, HH, HT, TH, and TT. And suppose you and your fellow both assign prior probabilities as follows:

P(HH) = 1/3, P(HT) = 1/6, P(TH) = 1/6, P(TT) = 1/3.

Suppose you observe that the first toss lands heads and your fellow observes that the second toss lands tails. Then your shared prior updated on the aggregate evidence places all probability on HT:

P(HT | EF) = 1.

But the arithmetic mean of your posteriors does not:

(P(HH | E) + P(HH | F))/2 = 1/3, (P(HT | E) + P(HT | F))/2 = 1/3, (P(TT | E) + P(TT | F))/2 = 1/3.

However, while the feature described in Proposition 1 distinguishes geometric pooling from arithmetic pooling, the former is not unique in having it, and so it's not clear that the result provides quite the support for geometric pooling that Baccelli and Stewart suggest. Consider harmonic pooling, for instance: if A = {A_1, ..., A_n} is a partition, and P and Q have overlapping support among A, the harmonic pool of P and Q is defined, at each A_i, as

[2 P(A_i)Q(A_i)/(P(A_i) + Q(A_i))] / ∑_j [2 P(A_j)Q(A_j)/(P(A_j) + Q(A_j))],

where any term with P(A_j) + Q(A_j) = 0 is set to 0.
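Here is a quick numerical check of this two-toss example in Python, with the harmonic variant just defined included for comparison; the dictionary representation and function names are our own.

```python
import math

prior = {"HH": 1/3, "HT": 1/6, "TH": 1/6, "TT": 1/3}
E = {"HH", "HT"}   # you learn the first toss landed heads
F = {"HT", "TT"}   # your fellow learns the second toss landed tails

def condition(p, event):
    """Conditionalize p on the disjunction of the states in `event`."""
    z = sum(p[s] for s in event)
    return {s: (p[s] / z if s in event else 0.0) for s in p}

def pool(p, q, mean):
    """Pool two credence functions by applying `mean` cell by cell, then renormalizing."""
    raw = {s: mean(p[s], q[s]) for s in p}
    z = sum(raw.values())
    return {s: raw[s] / z for s in p}

arith = lambda x, y: (x + y) / 2
geom = lambda x, y: math.sqrt(x * y)
harm = lambda x, y: 2 * x * y / (x + y) if x + y > 0 else 0.0

yours, theirs = condition(prior, E), condition(prior, F)
print(pool(yours, theirs, geom))    # all mass on HT, just like condition(prior, E & F)
print(pool(yours, theirs, harm))    # likewise all mass on HT
print(pool(yours, theirs, arith))   # 1/3 each on HH, HT, and TT
```

The geometric and harmonic pools of the posteriors both recover the shared prior conditionalized on the combined evidence, while the arithmetic pool does not.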
Then we have the following result, analogous to Proposition 1:

Proposition 2. If P(EF) > 0, then, for any state S in S, the harmonic pool of P(− | E) and P(− | F) assigns S the probability P(S | EF).

What's more, Proposition 1 generalizes to the case where you and your fellow begin with different priors, P and Q: provided some state in EF receives positive probability from both priors, the geometric pool of your posteriors is the geometric pool of your priors conditionalized on the aggregate evidence (this is Proposition 3, proved in the Appendix):

G(P(− | E), Q(− | F))(S) = G(P, Q)(S | EF).

That is, if you take your prior updated on your evidence and your fellow's prior updated on theirs, and then combine them using geometric pooling, you get the same result as if you had combined your prior and your fellow's using geometric pooling and then updated on the aggregate evidence. But you might only be interested in your fellow's opinion to extract the evidence that informs it, preferring to retain your own prior rather than pool with theirs.
In fact, there are pooling operations that deliver this. Here is one: given probability functions P and Q over the partition S, let the pool assign to each state S the probability

P(S) / ∑_{S′ : Q(S′) > 0} P(S′) if Q(S) > 0, and 0 otherwise.

This rule ignores everything about your fellow's opinions, except whether they are zero or nonzero. Really, it just uses the zeros to identify their evidence, then conditionalizes your posterior on that evidence. Small wonder, then, that we have the following analogue of Proposition 1: if P(EF) > 0, then applying this rule to P(− | E) and P(− | F) yields P(− | EF). And indeed this rule will also usually favour your way of interpreting the evidence and ignore your fellow's, which geometric pooling does not. That is, if Q is regular over S, so that Q(S) > 0 for all sequences S in S, and if P(EF) > 0, then applying this rule to P(− | E) and Q(− | F) yields P(− | EF), which retains your prior's way of interpreting the aggregate evidence.

This observation points up a third problem with an argument for geometric pooling based on Proposition 1. In order to use geometric pooling this way, you have to know all your fellow's opinions about the sequences. They need to tell you Q(S | F) for every sequence S in S. But if they've communicated all that, and if their prior Q is regular, they've already told you what their evidence F is. So you might as well just conditionalize on F. It's strictly easier to ask what their evidence is and conditionalize on that than it is to get their full set of opinions over the possible sequences and take the geometric pool of those with yours.
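As a sketch, the zero-sensitive rule just described might look as follows in Python (the function name is our own); applied to the posteriors from the two-toss example, it simply returns your posterior conditionalized on your fellow's evidence.

```python
def support_pool(p, q):
    """Keep p's probabilities only where q is nonzero, then renormalize.
    In effect: read your fellow's evidence off their zeros and conditionalize on it."""
    raw = {s: (p[s] if q[s] > 0 else 0.0) for s in p}
    z = sum(raw.values())
    return {s: raw[s] / z for s in p}

# Your posterior after seeing heads on the first of two tosses, and your fellow's
# after seeing tails on the second (as in the example above):
yours  = {"HH": 2/3, "HT": 1/3, "TH": 0.0, "TT": 0.0}
theirs = {"HH": 0.0, "HT": 1/3, "TH": 0.0, "TT": 2/3}
print(support_pool(yours, theirs))   # {'HH': 0.0, 'HT': 1.0, 'TH': 0.0, 'TT': 0.0}
```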

Pooling probabilities of chance hypotheses
Next, suppose you meet your fellow and they tell you, not their probability for each possible sequence, but their probability for each chance hypothesis. How should you update your probabilities in those hypotheses then?
We will begin by describing how geometric pooling behaves in this case, extending some observations from the literature. Then, in Section 5, we will use what we've discovered to scrutinise an argument that appeals to geometric pooling's performance in computer simulations.
As before, suppose you and your fellow begin with a common prior. Then you each observe your own series of coin flips, and you each conditionalize on what you've observed. Then your fellow shares their posterior opinions, but this time about the coin's bias.
In Section 2, we saw that if you geometrically pool posterior credences in the different possible sequences, then you can incorporate your fellow's evidence perfectly. But we noted that it's in fact easier for them just to tell you their evidence, than to tell you their credence in each sequence. It's not so difficult, however, for them to share their credences in the different chance hypotheses, particularly if these are given by a standard probability distribution determined by a few parameters, such as a Beta distribution. So suppose they do share that. What happens when you pool your probabilities with theirs?
We saw that, if you pool your credences in the possible sequences of coin tosses, arithmetic pooling performs poorly; it also performs poorly if you pool your credences in the chance hypotheses. To illustrate, suppose you and your fellow both start with a uniform prior over the possible biases. You then observe 70 heads out of 100 flips, while they observe 30 heads out of a different 100 flips. If you then take the arithmetic pool of your two distributions over the possible biases, the result will be the bimodal distribution (pink solid line) in Figure 1. But the desired result is the unimodal distribution (blue dashed line), since that is what conditionalizing your uniform prior on the aggregate data would give.
However, when you use geometric pooling in this case, it performs considerably better. It won't be exactly the same as conditionalizing your shared prior on the aggregate evidence, but it will be similar. The proportions will be right, but some information will be lost: it will be as if the sample sizes were cut in half. To illustrate, suppose again that you observe 70 heads out of 100 flips, while your fellow observes 30 heads out of a separate 100 flips. If you pool your posteriors over the biases geometrically, the result will be the same as if you had conditionalized your shared prior on 50 heads out of 100 flips. Figure 2 illustrates this example in the case where the shared prior was uniform over the possible biases. The geometric mean (pink solid line) only approximates conditionalizing on the aggregate evidence (blue dashed line). But it does much better than arithmetic averaging did: compare Figure 1.
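The contrast in Figures 1 and 2 is easy to reproduce numerically on a discretized grid of biases. The sketch below is our own illustration; the grid, the use of scipy's Beta densities, and the variable names are assumptions of the example, not the paper's own code.

```python
import numpy as np
from scipy.stats import beta

grid = np.linspace(0.001, 0.999, 999)   # discretized bias hypotheses

def normalize(v):
    return v / v.sum()

# Uniform prior conditionalized on 70/100 heads, and on 30/100 heads:
yours  = normalize(beta.pdf(grid, 1 + 70, 1 + 30))
theirs = normalize(beta.pdf(grid, 1 + 30, 1 + 70))

arithmetic = normalize((yours + theirs) / 2)        # bimodal, as in Figure 1
geometric  = normalize(np.sqrt(yours * theirs))     # unimodal, as in Figure 2

aggregate = normalize(beta.pdf(grid, 1 + 100, 1 + 100))  # 100 heads out of 200 flips
half_data = normalize(beta.pdf(grid, 1 + 50, 1 + 50))    # 50 heads out of 100 flips

# The geometric pool coincides with conditionalizing on half the aggregate data:
print(np.max(np.abs(geometric - half_data)))   # ~0, up to floating-point error
```

The final line confirms, up to floating-point error, that the geometric pool coincides with conditionalizing the uniform prior on 50 heads out of 100 flips.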
The general result being illustrated in Figure 2 is that pooling opinions about the coin's bias geometrically is equivalent to conditionalizing on half the aggregate data. If you observe k heads out of m flips, and your fellow observes l heads out of n flips, then the geometric pool of your posteriors is the same as conditionalizing on (k + l)/2 heads out of (m + n)/2 flips.

Proposition 4. Let X be the number of heads in the first m flips, Y the number of heads in the next n flips, and Z the number of heads in some sequence of (m + n)/2 flips. Where, recall, H is the partition of possible biases: for each hypothesis H_i in H,

G(P(− | X = k), P(− | Y = l))(H_i) = P(H_i | Z = (k + l)/2).

One way to think about this result is that taking the geometric mean over the possible biases correctly gleans the direction the aggregate evidence points in, but with its force or magnitude understated.
What's more, Proposition 4 extends to propositions beyond the chance hypotheses. Once geometric pooling fixes your new probabilities in the possible biases of the coin, the Principal Principle steps in and does the rest, determining your new probabilities in each of the possible sequences in such a way that your new probability in a sequence is the probability you'd assign if you were to conditionalize your prior on the halved aggregate sample.

Another way to think about Proposition 4 connects to arithmetic pooling, but applied to the next coin toss. Suppose you meet your fellow and they tell you their probability in H, the prediction that the next toss will land heads. How should you update your probability in H?
Suppose P is your prior probability function and E is your evidence, while Q is your fellow's prior and F is their evidence. In many cases, the arithmetic pool of P(H | E) and Q(H | F) will approximate updating your prior on the aggregate evidence using Bayes' rule. After all, if E and F give the frequencies of heads in two large, disjoint samples of equal size, then P(H | E) will closely match the frequency in your sample, and Q(H | F) will closely match the frequency in your fellow's. And in that case, the arithmetic mean (P(H | E) + Q(H | F))/2 will closely match the overall frequency in the aggregated sample you've amassed between you, and P(H | EF) will closely match that too.
Why only "closely" match, why not exactly? Because we have to account for the influence of priors. Your opinion P(H | E) doesn't just reflect the frequency of heads in your sample, it also reflects your beliefs about the coin from before you observed that sample, which are encoded in P. Likewise for your fellow's opinion, Q(H | F). But if the sample is large, and your priors treated the flips as independent and identically distributed, then the observed frequency must be very close to the resulting opinion. In which case the arithmetic mean of opinions will closely match the opinion you would have if you were to conditionalize your prior on the full, aggregated sample.

Now, speaking very loosely, Proposition 4 tells us that taking the geometric pool over the biases is the same as taking the arithmetic mean of the underlying, aggregate data. Rather than combining your positive observations with your fellow's, you split the difference: (k + l)/2. This suggests that pooling the probabilities over the chance hypotheses geometrically will have the effect of pooling the probabilities over the predictions arithmetically, at least approximately. In fact, when the two agents' samples are of equal size (m = n), it has exactly this effect for a wide range of priors known as the Beta distributions, which include the uniform distribution.
Proposition 5. Let X, Y, Z, and S be as in Proposition 4, and let H be the event of heads on an unobserved flip. If m = n and P has a Beta distribution over the possible biases in the partition H, then the probability that the geometric pool, together with the Principal Principle, assigns to H is

(P(H | X = k) + P(H | Y = l))/2.

We'll state a general theorem that captures this phenomenon momentarily (see Theorem 6).
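To see why the Beta assumption does the work here, recall the standard fact that a Beta(a, b) prior updated on k heads in m flips gives predictive probability (a + k)/(a + b + m) for heads on the next toss. The following worked calculation, which is our own gloss on the result and assumes m = n, then delivers Proposition 5:

```latex
\begin{gather*}
P(H \mid X = k) = \frac{a + k}{a + b + m},
\qquad
P(H \mid Y = l) = \frac{a + l}{a + b + m},
\\[4pt]
\frac{P(H \mid X = k) + P(H \mid Y = l)}{2}
  \;=\; \frac{a + \tfrac{k + l}{2}}{a + b + m}
  \;=\; P\left(H \;\middle|\; Z = \tfrac{k + l}{2}\right).
\end{gather*}
```

The final identity holds because, by Proposition 4, geometric pooling over the biases amounts to conditionalizing on (k + l)/2 heads in (m + n)/2 = m flips, and a Beta(a, b) prior updated on that pseudo-sample assigns heads the same predictive probability.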
First, does Proposition 4 depend on our running example of a coin, with just two possible outcomes per trial? It does not: the same result holds for rolls of a die, or for any repeated experiment with finitely many possible outcomes, as Theorem 6 below makes precise.

Second, what if you and your fellow begin with different priors? Once again, Proposition 4 generalizes in a natural way: the result of geometric pooling is still the same as conditionalizing on the average data, except that the prior being conditionalized is the geometric pool of your prior and your fellow's. Informally, conditionalizing and then pooling has the same effect as pooling and then conditionalizing on the averaged data. This generalization is formally elegant, but it won't always be philosophically satisfying. The issue is the same one we raised for Proposition 3: you might only be interested in your fellow's opinion for the data that informs it. In which case, you'll want to conditionalize your own prior, not the geometric pool of your respective priors.
Still, we are often interested in others' opinions for more than just the evidence that informs them. And if you think there's something to your fellow's way of interpreting data, then you might want to incorporate some of that into your own prior. Suppose specifically that, if you were to learn what their prior was, then you would adopt the geometric pool of your respective priors. Then this generalization is just right. It amounts to conditionalizing on what your fellow's prior was, and then conditionalizing on the average data.
Third and finally, what would it take to get the full body of aggregate data, rather than the average data where the sample size is cut in half? It helps here to revisit the definition of geometric pooling, with the radical sign rewritten as an exponent. That is, we write √x as x^(1/2):

G(P, Q)(A_i) = (P(A_i)Q(A_i))^(1/2) / ∑_j (P(A_j)Q(A_j))^(1/2).

A natural thought is that the sample size gets cut in half because of the 1/2 exponent. And this thought suggests a more general definition, where we let the exponent be any proportion 0 ≤ α ≤ 1:

M_α(P, Q)(A_i) = (P(A_i)Q(A_i))^α / ∑_j (P(A_j)Q(A_j))^α.

For a given choice of α, call the resulting pooling rule multiplicative α pooling, written M_α(P, Q).
Our third generalization says that α is the proportion of the aggregate data that gets conditionalized on. In terms of coin flips, if you observe k heads out of m tosses and your fellow observes l heads out of n tosses, then multiplicative α pooling over the biases is the same as conditionalizing on α(k + l) heads out of α(m + n) tosses. That is, where Z is the number of heads in some sequence of α(m + n) tosses,

M_α(P(− | X = k), P(− | Y = l)) = M_α(P, P)(− | Z = α(k + l))

over H; when the shared prior P is uniform, the right-hand side is just P(− | Z = α(k + l)).
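Here is a small numerical check of the α-generalization, again on a discretized grid with a uniform shared prior; the helper names and the use of scipy are our own choices.

```python
import numpy as np
from scipy.stats import beta

grid = np.linspace(0.001, 0.999, 999)

def multiplicative_pool(p, q, alpha):
    """Multiplicative alpha pool over a discretized partition; alpha = 1/2 is the geometric pool."""
    raw = (p * q) ** alpha
    return raw / raw.sum()

def posterior(heads, flips):
    """Uniform prior over the bias, conditionalized on `heads` out of `flips` tosses."""
    dens = beta.pdf(grid, 1 + heads, 1 + flips - heads)
    return dens / dens.sum()

k, m = 70, 100    # your observations
l, n = 30, 100    # your fellow's observations

for alpha in (0.5, 1.0):
    pooled = multiplicative_pool(posterior(k, m), posterior(l, n), alpha)
    target = posterior(alpha * (k + l), alpha * (m + n))
    print(alpha, np.max(np.abs(pooled - target)))   # ~0 in both cases
```

Because the shared prior here is uniform, pooling the priors changes nothing; with a non-uniform prior and α ≠ 1/2, the prior being conditionalized would be the multiplicative α pool of the priors, as Theorem 6 below makes precise.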
The answer to our third question then is: set α = 1, rather than 1/2. Then, pooling with your fellow will give the same result as conditionalizing on the full body of aggregated data, theirs and yours together. In Figure 2 for example, the dotted orange and dash-dotted green curves will then combine to give the dashed lavender curve, rather than the solid pink approximation to it.

Why bother with geometric pooling then? Why would we ever want to set α = 1/2 instead of 1? Because we often want to pool with the same person on more than one occasion, and in that case choosing α = 1 can be disastrous. Every time you pool with your fellow using α = 1, all their data gets counted anew. And this means their data gets double counted the second time you pool with them, triple counted the third time, and so on.
For example, suppose you and your fellow each do one flip, pool, then flip again and pool again. And let's imagine the sequence you observe is HT, while they observe TT. If α = 1, then the second round of pooling will effectively double count the flips from the first round. The final result after two rounds of pooling will be as if you had observed the sequence HTHTTT. So the sample size will actually be inflated, with two non-existent flips; and the frequency will be off too, with 1/3 heads instead of the true 1/4. Further iterations will compound this effect, yielding even more distorted results.
Choosing α = 1/2, however, avoids this problem entirely. The frequency of heads and tails in the total, aggregate data is then always accurately reflected in the results of pooling. After repeated iterations of observing-and-pooling, if 1/3 of the observed flips have been heads, then the agents' posteriors will be as if they had conditionalized on observing a sequence that is 1/3 heads.
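The two-round example can be checked in the same style; the sketch below is our own reconstruction of it, on a discretized grid with a uniform shared prior.

```python
import numpy as np

grid = np.linspace(0.001, 0.999, 999)
uniform = np.full_like(grid, 1 / len(grid))

def pool(p, q, alpha):
    """Multiplicative alpha pool of two credences over the bias grid."""
    raw = (p * q) ** alpha
    return raw / raw.sum()

def update(p, heads, tails):
    """Conditionalize a credence over the bias grid on a count of heads and tails."""
    post = p * grid**heads * (1 - grid)**tails
    return post / post.sum()

def run(alpha):
    you, fellow = uniform, uniform
    # Round 1: you see H, your fellow sees T; then you pool.
    you, fellow = update(you, 1, 0), update(fellow, 0, 1)
    you = fellow = pool(you, fellow, alpha)
    # Round 2: you see T, your fellow sees T; then you pool again.
    you, fellow = update(you, 0, 1), update(fellow, 0, 1)
    return pool(you, fellow, alpha)

# alpha = 1: as if the prior had seen HTHTTT (2 heads, 4 tails) -- the first round is double counted.
print(np.allclose(run(1.0), update(uniform, 2, 4)))      # True
# alpha = 1/2: as if it had seen half the actual data (0.5 heads, 1.5 tails) -- the right frequency.
print(np.allclose(run(0.5), update(uniform, 0.5, 1.5)))  # True
```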
So there is a tradeoff here. When pooling just once, α = 1 avoids leaving evidence on the table. But for repeated pooling, only α = 1/2 avoids double-counting. The catch is that the sample size is cut in half. In the next section we'll look at an application where this tradeoff plays a crucial role.
Let's now bring together the three points of this section. First, the result of Proposition 4, which we initially presented in terms of coin tosses, generalizes to die rolls and other events with more than two possible outcomes. Second, it also generalizes elegantly to the case where you and your fellow have different priors, although the philosophical significance of this generalization depends on your appraisal of your fellow's way of interpreting data. Third and finally, the result also generalizes to multiplicative α pooling: pooling over the chance hypotheses amounts to conditionalizing on a similar sample whose size is α of the true sample's size.
The following result incorporates all three generalizations, phrased in terms of rolls of an s-sided die.

Theorem 6. Let X = (X_1, ..., X_s) be the vector of counts from the first m rolls, Y the vector of counts from the next n rolls, and Z the vector of counts from some sequence of α(m + n) rolls. If P and Q have overlapping supports on the partition of possible biases H, then, for any observed count vectors k and l,

M_α(P(− | X = k), Q(− | Y = l)) = M_α(P, Q)(− | Z = α(k + l)).

Informally put, taking the multiplicative α pool of your posterior with your fellow's has the same result as conditionalizing the multiplicative α pool of your respective priors on a sample of α(m + n) rolls, where the observed counts are given by α(k + l).
Geometric pooling in computer simulations

In a series of computer simulations, Douven (2019, 2022) compares packages of rules: a rule for updating on your private evidence, paired with a rule for responding to social evidence. And it turns out that the best packages, in terms of accuracy, include geometric pooling but not conditionalization. What we're about to see is that Proposition 4 not only explains this result. It also identifies the best possible package: a way of updating on private evidence which, when paired with geometric pooling, is provably optimal with respect to accuracy. But we need to understand the details of these simulations first.
Douven studies several alternatives to conditionalization, inspired in various ways by the method of inference to the best explanation. But for ease of exposition, we will discuss only the one he dubs EXPL. The EXPL rule is very similar to Bayes' rule, but with greater emphasis on fit-to-the-data. When updating on a body of evidence E, EXPL adds a bonus quantity c to the hypothesis with the highest likelihood, P(E | H_i). The new probability of hypothesis H_i, call it P′(H_i), is

P′(H_i) = (P(H_i) P(E | H_i) + c_i) / ∑_j (P(H_j) P(E | H_j) + c_j),

where c_i = c if P(E | H_i) > P(E | H_j) for all j ≠ i, and c_i = 0 otherwise. The value we choose for c determines how much our simulated agents emphasize fit-to-the-data. At the start of each simulation, we'll fix a value of c and keep it constant throughout. But we'll experiment with different values in different simulations, to see what works best. Following Douven, we'll try values from 0 to 1, in 0.1 increments. Notice that, when c = 0, the EXPL formula just is Bayes' theorem, so EXPL has conditionalization as a special case.
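A sketch of the EXPL update as just stated might look as follows in Python; the function names are ours, ties for the best-fitting hypothesis are broken arbitrarily, and this follows the formula in the text rather than any particular implementation of Douven's.

```python
import numpy as np

def expl_update(prior, likelihoods, c):
    """EXPL-style update: Bayes' rule plus a bonus c for the hypothesis that best fits the evidence.
    With c = 0 this is just conditionalization."""
    scores = prior * likelihoods
    bonus = np.zeros_like(scores)
    bonus[np.argmax(likelihoods)] = c      # bonus to the highest-likelihood hypothesis
    new = scores + bonus
    return new / new.sum()

# Example: the 11 bias hypotheses, a uniform prior, and a single observed heads.
biases = np.arange(11) / 10
prior = np.full(11, 1 / 11)
likelihood_heads = biases                  # P(heads | bias = i/10) = i/10
print(expl_update(prior, likelihood_heads, c=0.1))
```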
In addition to c, two other parameters also need to be chosen at the start of each simulation. One is the actual bias of the coin, p, which is what the agents are trying to discover. Here again we'll experiment with values in the range from 0 to 1 in increments of 0.1. Thus there are 11 hypotheses for our simulated agents to consider: H_i is the hypothesis that the true bias is p = i/10, where i ∈ {0, 1, ..., 10}.
Finally, a third parameter, ε, controls how "open-minded" agents are. In Douven's simulations, agents only pool with those whose opinions are within a certain distance of their own. The distance between opinions is measured by the sum of absolute differences:

d(P, Q) = ∑_i |P(H_i) − Q(H_i)|.

Since the maximum possible distance is 2, we will consider values of ε in the range from 0 to 2, in increments of 0.1. (Douven only considers values up to 1, but we'll see that this omits important results.)

So here's how each simulation works. At the start, we pick values for each of the three parameters c, p, and ε. Then we create 50 agents, all with a uniform prior over the possible biases. Each agent performs one flip of the coin privately, and updates on the result using the EXPL rule with the chosen value of c. Then they pool geometrically with everyone within distance ε of their own opinion. This flip-update-pool cycle then repeats, for a total of 500 cycles.
At the end of each cycle, after pooling, we gauge the community's accuracy. More precisely, each individual agent's inaccuracy is evaluated using the Brier score,

∑_i (P(H_i) − V(H_i))²,

where V(H_i) = 1 if H_i is true, and 0 otherwise. The average Brier score of all 50 agents is then calculated, and at the end of all 500 cycles these averages are summed to generate an overall score for that simulation.
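In outline, the scoring and the ε-threshold pooling step might be implemented as follows; this is our own sketch of the procedure described above, with our own helper names, not Douven's code.

```python
import numpy as np

def brier(credence, true_index):
    """Brier inaccuracy of one agent's credence over the 11 bias hypotheses."""
    truth = np.zeros_like(credence)
    truth[true_index] = 1.0
    return float(((credence - truth) ** 2).sum())

def pool_within_epsilon(credences, epsilon):
    """Each agent pools geometrically with every agent whose opinion is within
    L1 distance epsilon of their own (each agent counts as their own peer)."""
    pooled_credences = []
    for p in credences:
        peers = [q for q in credences if np.abs(p - q).sum() <= epsilon]
        pooled = np.ones_like(p)
        for q in peers:
            pooled *= q
        pooled = pooled ** (1 / len(peers))   # take the root, then normalize below
        pooled_credences.append(pooled / pooled.sum())
    return pooled_credences

# Community inaccuracy at the end of a cycle is then the average agent score, e.g.:
#   np.mean([brier(p, true_index) for p in credences])
```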
Figure 4 shows the results. (Code for Figures 4 and 5 is available on GitHub; compare Figure 3 in Douven (2019) and Figure 7.3 in Douven (2022). Note that our results look slightly different, for two reasons. First, we explore the full range of possible values for ε. Second, our simulations use the definition of EXPL Douven states in the text, which differs from the implementation in the accompanying code. Nothing we will say hangs on this second difference; either way, the results are the same in the respects that matter here.) Each square represents a choice of values for c, p, and ε. The square's colour shows the expected inaccuracy for a community using that combination of values, based on 50 simulations averaged together. Lower scores are better, and the best ones in each panel are marked with an asterisk.
The conditionalizers are the bottom row of each panel, where c = 0. But the communities with the best scores are consistently those with high values of c, and also high values of ε. That is, the most accurate communities have agents who strongly favour hypotheses that fit their private data, but are also very "open-minded" in that they pool even with those whose opinions differ greatly from their own.
A natural thing to wonder about this result is: how is it possible? According to an influential theorem of Greaves & Wallace (2006), conditionalization minimizes expected Brier score. So how can conditionalization be dominated by an alternative rule, as it is here?
The answer is that the c = 0 agents only conditionalize on their private evidence. They do not conditionalize on the opinions of their fellows; instead they use geometric pooling. And Proposition 4 tells us they suffer a kind of "data loss" as a result. With two agents, geometric pooling effectively cuts each agent's sample in half. When 50 agents pool geometrically, their samples are effectively scaled down by 1/50. So the "conditionalizers" in these simulations are actually leaving a lot of evidence on the table.
This analysis suggests a better way for these agents to update on their private evidence. Instead of using EXPL, they should conditionalize on their data but scaled up by a factor of 50. That is, if an agent observes 1 head, they should conditionalize on the proposition that they observed 50 heads instead. Ditto for tails. If we then set ε = 2, so that all 50 agents always pool with each other, the result will be the same as if they had all conditionalized on the aggregate evidence, by Proposition 4.
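To see the proposal in action, here is a sketch of a single flip-and-pool round with the scaled-up update. The setup, the random seed, and the log-space pooling helper are our own choices; the final check is just Proposition 4 at work.

```python
import numpy as np

biases = np.arange(11) / 10                       # the 11 bias hypotheses
uniform = np.full(11, 1 / 11)

def update_on_counts(prior, heads, tails):
    """Conditionalize a credence over the biases on a count of heads and tails."""
    post = prior * biases**heads * (1 - biases)**tails
    return post / post.sum()

def geometric_pool_all(credences):
    """Geometric pool of many credences, computed in log space to avoid underflow."""
    with np.errstate(divide="ignore"):
        avg = np.log(np.array(credences)).mean(axis=0)
    avg -= avg[np.isfinite(avg)].max()            # stabilize before exponentiating
    pooled = np.exp(avg)
    return pooled / pooled.sum()

rng = np.random.default_rng(0)
true_bias, n_agents = 0.7, 50
flips = rng.random(n_agents) < true_bias          # one private flip per agent

# Each agent conditionalizes on their one flip scaled up by the group size of 50...
scaled = [update_on_counts(uniform, 50 * f, 50 * (1 - f)) for f in flips]
# ...then everyone pools geometrically with everyone (epsilon = 2):
community = geometric_pool_all(scaled)

# The result matches conditionalizing the shared prior on the pooled, unscaled evidence:
aggregate = update_on_counts(uniform, flips.sum(), n_agents - flips.sum())
print(np.allclose(community, aggregate))          # True
```

With ε = 2, the community ends up exactly where conditionalizing the shared prior on everyone's pooled, unscaled evidence would put it, which is why no new simulations are needed to evaluate the proposal.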
We could test this method by running new simulations, but we don't have to. By Greaves & Wallace's theorem, conditionalizing on the aggregate evidence is the optimal strategy for minimizing expected Brier score. Since our proposal is equivalent, it too is optimal. (Note that we do not measure the agents' accuracy after updating on the private evidence and then measure it again after they've updated on the social evidence. If we were to do that, neither our proposal nor Douven's would be optimal; if we were to do that, the only optimal approach would be to update by Bayes' Rule on both private and social evidence. By measuring accuracy only after both private and social updates have taken place, we can get the accuracy benefits of Bayes' Rule by using a strategy for private update and a strategy for social update that, when combined, match Bayes' Rule perfectly.) It won't be optimal within every panel, i.e. for every value of p. But no feasible rule can be. It does best on average though, since it has the lowest expected Brier score relative to a uniform prior over p. If we do repeated simulations, picking a random value for p each time, no procedure can do better on average.
So we agree with Douven that, when conditionalizing on your social evidence is unavailable, there is a way to compensate for the loss of expected accuracy that results: change how you update on your private evidence. But we disagree on how best to do that. Douven says you should use a rule like EXPL with a high value of c. (Actually, Douven finds that another rule he calls "Popper's Rule" outperforms EXPL. But the same conclusion applies: our proposal is provably better in expectation, because it is equivalent to conditionalizing on the aggregate evidence.) We say you should conditionalize on a scaled up version of the sample you in fact witnessed. This proposal is provably better in expectation.
Proposition 4 also explains why a high-c strategy does best in Figure 4. Updating on private evidence using EXPL with a high c approximates our proposal of conditionalizing on a scaled up sample. Both are ways of "overfitting" your private evidence, of giving special favour to the hypotheses that best fit your actual private evidence. Coupled with a high value of ε and geometric pooling, then, a high c approximates conditionalization on the aggregate evidence.
One consequence of this insight is that, as the number of other agents in the group decreases, so does the optimal value of c. This is because the optimal way to update, if you're then going to pool the results geometrically, is to update on your private evidence scaled up by the size of the group. And as the group size diminishes, using EXPL with lower values of c will approximate that optimal solution more closely. Indeed, as we see in Figure 5, if there are just five agents then the optimal value of c is zero, which is just conditionalization.

Conclusion
How well, then, does geometric pooling serve as a means to combine your probabilities with your fellow's, when you wish to gain the benefit of their evidence?
If your fellow shares their probabilities in the possible sequences of coin tosses, geometric pooling does serve you well, as long as you shared a prior with your fellow or you wish to pool their prior with your own. But many pooling operators do this, including harmonic pooling. And if you do not wish to pool your priors, there are alternatives that are better. What's more, it would be easier simply to share the evidence itself than to share the credences in the sequences to which it has given rise.
If your fellow instead shares their probabilities in the chance hypotheses, geometric pooling leads to a posterior that points in the same direction as the aggregate evidence, but with less conviction than the pooled evidence warrants. It's as if you've updated on half the evidence.
This observation allows us to see why the abductive updating rules studied by Douven seem to do well as private updating rules, when paired with geometric pooling as the social updating rule: abductive updating approximates the posterior you'd obtain by conditionalizing on an appropriately scaled up version of your private evidence. But it also allows us to see a private updating rule that will do better on average than Douven's.
So, in the end, it's a mixed bag. Geometric pooling has some features that make it attractive when your purpose is to extract evidence from your fellow's probabilities. And it outperforms arithmetic pooling in some cases, though not all. But it must be handled with care.
Appendix

We first establish a general result about multiplicative α pooling, of which Propositions 1 and 3 are special cases. (For probability density functions f and g, M_α(f, g) is the density proportional to (f g)^α; the geometric pool G is the case α = 1/2, and this is how geometric pooling is defined over infinite partitions.) Suppose E and F are disjunctions of states in S, and some state in EF receives positive probability from both P and Q. Then

M_α(P(− | E), Q(− | F)) = M_α(P, Q)(− | EF).

To see this, note first that, for S in EF, M_α(P, Q)(S | EF) is proportional to (P(S)Q(S))^α, while for S not in EF it is 0. (1)

Next, for S in EF, P(S | E) = P(S)/P(E) and Q(S | F) = Q(S)/Q(F), so, where c is the requisite normalizing constant,

M_α(P(− | E), Q(− | F))(S) = c (P(S)Q(S))^α. (2)

While for S not in EF we have M_α(P(− | E), Q(− | F))(S) = 0.

Now observe that the probability masses in equations (1) and (2) are proportional: for both distributions, the nonzero masses are proportional to (P(S)Q(S))^α when S is in EF, while for S not in EF both assign mass 0. Hence these must actually be the same distribution. This establishes Proposition 3, which is the special case where α = 1/2. Proposition 1 then follows as the further special case where P = Q.
Next, we prove Proposition 2, which we restate here for convenience.
Proposition 2 (restated). If E and F are subsets of S such that P(EF) > 0, then, for any state S in S, the harmonic pool of P(− | E) and P(− | F) assigns S the probability P(S | EF).

To see this, note that for S in EF, P(S | E) = P(S)/P(E) and P(S | F) = P(S)/P(F), so the harmonic mean of the two is

2 [P(S)/P(E)][P(S)/P(F)] / [P(S)/P(E) + P(S)/P(F)] = 2 P(S)/(P(E) + P(F)),

which is proportional to P(S). For S not in EF, at least one of P(S | E) and P(S | F) is 0, so the harmonic mean is 0. Normalizing, the harmonic pool of P(− | E) and P(− | F) assigns probabilities proportional to P(S) on EF and 0 elsewhere; that is, it is P(− | EF).

That establishes the three results of Section 2, where the topic was pooling over sequences. Next we turn to the results for pooling over chance hypotheses, which were the topic of Sections 3 and 4.
We first give a more formal statement of Theorem 6. Here we consider only the continuous case, as the discrete case runs closely parallel.
Theorem 6 (formal). Let A_1, ..., A_{m+n} be categorical random vectors of length s. Fix a value 0 ≤ α ≤ 1 such that α(m + n) is an integer, and let

X = ∑_{i=1}^{m} A_i,  Y = ∑_{i=m+1}^{m+n} A_i,  Z = ∑_{i=1}^{α(m+n)} A_i.

Let f, g, and h be probability density functions such that the A_i are i.i.d. with parameter vector T = (T_1, ..., T_s). Write f_T for the marginal distribution of f over T, and similarly for g_T and h_T. If f_T and g_T have overlapping support and h_T = M_α(f_T, g_T), then

M_α(f_{T|X}(− | k), g_{T|Y}(− | l)) = h_{T|Z}(− | α(k + l)). (3)

Proof. We first analyze the left-hand side of equation (3). By Bayes' theorem and our i.i.d. assumption,

f_{T|X}(t | k) = c_1 f_T(t) ∏_i t_i^{k_i},  g_{T|Y}(t | l) = c_2 g_T(t) ∏_i t_i^{l_i},

where c_1 and c_2 are appropriate normalizing constants. By the definition of M_α, where c_3 is another normalizing constant:

M_α(f_{T|X}(− | k), g_{T|Y}(− | l))(t) = c_3 (f_T(t) g_T(t))^α ∏_i t_i^{α(k_i + l_i)}. (4)

Now we analyze the right-hand side of equation (3). By hypothesis,

h_T(t) = c_4 (f_T(t) g_T(t))^α,

where c_4 is the appropriate normalizing constant. So, by Bayes' theorem and the i.i.d. assumption,

h_{T|Z}(t | α(k + l)) = c_5 (f_T(t) g_T(t))^α ∏_i t_i^{α(k_i + l_i)}, (5)

where c_5 is again an appropriate normalizing constant. Since equations (4) and (5) are proportional, these must actually be the same distribution.
Observe that, in the special case of Theorem 6 where α = 1/2, s = 2, and g = f (so that h_T = f_T), equation (3) becomes

G(f_{T|X}(− | k), f_{T|Y}(− | l)) = f_{T|Z}(− | (k + l)/2).

This is the statement of Proposition 4 in the continuous case.
Finally, we prove Proposition 5, whose formal statement is as follows.
Proposition 5 (formal). Let A_1, ..., A_{m+n} and H be Bernoulli random variables, with X, Y, and Z as in Theorem 6. Let P be a probability function such that the A_i and H are i.i.d. with shared parameter T, with f the associated p.d.f. And let R be a probability function with h its associated p.d.f., such that H ∼ Bern(T) and

h_T = G(f_{T|X}(− | k), f_{T|Y}(− | l)).

If m = n and f_T is Beta(a, b), then

R(H) = (P(H | X = k) + P(H | Y = l))/2. (6)

We begin by analyzing the right-hand side of equation (6). Because of the conjugate relationship between the Beta and binomial distributions, f_{T|X}(− | k) and f_{T|Y}(− | l) have the following Beta distributions:

T | X = k ∼ Beta(a + k, b + m − k),  T | Y = l ∼ Beta(a + l, b + m − l).

Now, by the law of total probability, the probability of a Bernoulli random variable like H is the expected value of T. Since the expected value of a Beta(x, y) distribution is x/(x + y), this gives us

P(H | X = k) = (a + k)/(a + b + m),  P(H | Y = l) = (a + l)/(a + b + m).

Taking the arithmetic average yields

(P(H | X = k) + P(H | Y = l))/2 = (a + (k + l)/2)/(a + b + m).

Now we consider equation (6)'s left-hand side. By assumption, H ∼ Bern(T) under R, so R(H) is again the expected value of T, now computed under h_T = G(f_{T|X}(− | k), f_{T|Y}(− | l)), which we can rewrite using Theorem 6 with α = 1/2 (and g = f) as f_{T|Z}(− | (k + l)/2). Since T | Z = (k + l)/2 ∼ Beta(a + (k + l)/2, b + m − (k + l)/2) when m = n, this expected value is

(a + (k + l)/2)/(a + b + m),

which matches the right-hand side.

Figure 1: The blue dashed line gives the distribution obtained by conditionalizing your shared prior on the aggregate evidence, while the pink solid line gives the arithmetic pool of your posterior and your fellow's. The orange dotted line gives the result of updating the shared uniform prior on an observation of 30 heads and 70 tails, while the turquoise dash-dotted line gives the result of updating instead on an observation of 70 heads and 30 tails.

Figure 2: The blue dashed line gives the distribution obtained by conditionalizing your shared prior on the aggregate evidence, while the pink solid line gives the geometric pool of your posterior and your fellow's. As before, the orange dotted line gives the result of updating the shared uniform prior on an observation of 30 heads and 70 tails, while the turquoise dash-dotted line gives the result of updating instead on an observation of 70 heads and 30 tails.

Figure 3: Some examples of Beta distributions

Figure 4: Simulation results for communities using EXPL and geometric pooling. Each square shows the expected Brier score for a choice of c, p, and ε, based on 50 runs averaged together. The white asterisks mark the best scores in each panel.

Figure 5: The same setup as Figure 4, but with 5 agents rather than 50.