In this section, we’ll focus on the following main topic:

Question: How does the sticker price of a booster box compare to the value of the cards within it?

We can write up a simulation for the act of opening packs, but how do we get a good estimate for the value of the cards? This value assessment isn’t as straightforward as one might think.

The Main Issue

A card’s value can vary tremendously depending on its condition; thus, in order to have any chance at getting a good estimate of the value, we need to be able to estimate the condition well. To quantify the condition, we’ll use the PSA scale, which assigns a rating from 1 to 10 (with some exceptions) to a card [1]. Thus, we have the following first objective:

Goal: For any card pulled from a pack, estimate the PSA grade $G$ that it will earn if sent to PSA today.

The PSA grade is another example of a random variable, so we again want to understand the distribution of $G$. It is important to distinguish between two different sources of randomness here. Before you open a pack, you don’t know which cards you are going to get, and you also don’t know the condition of the cards inside. To understand $G$, we focus on this second source of randomness.
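As a sketch of these two layers of randomness, the following snippet draws a rarity first and then a grade; every pull rate and grade probability here is a made-up placeholder, not a real PSA figure:

```python
import random

# All numbers below are hypothetical placeholders, not real PSA figures.
PULL_RATES = {"common": 0.85, "rare": 0.13, "ultra_rare": 0.02}
GRADE_DIST = {  # P(G = g) for grades 7-10, per rarity
    "common":     {7: 0.10, 8: 0.30, 9: 0.40, 10: 0.20},
    "rare":       {7: 0.15, 8: 0.35, 9: 0.35, 10: 0.15},
    "ultra_rare": {7: 0.20, 8: 0.40, 9: 0.30, 10: 0.10},
}

def pull_card(rng: random.Random) -> tuple[str, int]:
    """Sample the two sources of randomness in sequence:
    (1) which rarity we pull, then (2) the grade G that card would earn."""
    rarity = rng.choices(list(PULL_RATES), weights=list(PULL_RATES.values()))[0]
    dist = GRADE_DIST[rarity]
    grade = rng.choices(list(dist), weights=list(dist.values()))[0]
    return rarity, grade

rng = random.Random(0)
print([pull_card(rng) for _ in range(3)])
```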

Modeling this randomness requires us to consider the quality of the set printing, the care the cards have received up to now, and the harshness of the PSA grader assigned to the card, just to name a few examples. These considerations determine the frequency and perception of cards with bent corners, scratches, whitening on edges, centering issues, etc.

A Simple But Unsatisfactory Solution

We could start with a simple approximation: since we are assuming we are opening fresh packs, we could err on the side of caution and use a fairly low, constant estimate. In this case, we could say that every card will grade a PSA 7, which, according to the scale in [1], corresponds to a Near Mint designation. Certainly, there will be cases where we get a card that would grade lower due to production errors, and this would not be captured by this proposed distribution; however, we may choose to assume that the probability of these events is close to 0 and give them exactly 0 probability in our model.

The big issue is that because this approach also places 0 probability on the PSA 8-10 range, it will likely force our analysis to almost always conclude that the card value from a box is much, much lower than the cost of the box. At the time of writing, the value difference between a PSA 10 and a PSA 8 1st Edition Blue Eyes White Dragon is around $38,000 [2]. A PSA 7 is an even worse grade, so it might not fetch even the PSA 8 price; if we are off by that much, our analysis will be essentially meaningless.

Thus, we want to make use of data to come up with a good approximate distribution. What data is out there? There are a lot of ways to collect data, some more tedious or expensive than others and each with their own amount of bias that we need to keep in mind.

For this analysis, we will focus on the official PSA data for number of cards awarded particular grades found in [3] and for card values found in [4]. We’ll focus on the first data source now.

When you open this data for a particular set, such as in [2], this data source has count information for submitted cards that earned each PSA grade. There are multiple ways to use this PSA data, but before that, there are a few sources of bias that we should think about.

Bias Issues With the Data

First, when you send in cards to get graded, you may withhold the cards you believe could earn grades lower than 9 or 10, since you might not see the submission as worth the investment; thus, the grades shown by the data might be inflated by selection bias.

Furthermore, cards generally cannot improve in condition over time, but their grades can certainly drop if they get damaged (for instance, in the binder in which they are stored). Thus, if a lot of potential 9 and 10 grade cards ended up with lower grades, the data would be deflated relative to pack-fresh condition.

Lastly, there is the temporal aspect of grading. Have grading standards changed over the years during which this data accumulated, and does the historical data properly represent today’s distribution? With these questions open, it is unclear whether the grades in the data are inflated, deflated, or neither.

These points can each affect the distribution, and it would be a big claim to say that they cancel each other out perfectly. Thus, if we believe any of them greatly impacts our models, it would be wise to intervene and incorporate our own knowledge in place of a solely data-driven approach, as in Bayesian statistics; see Section 3.2 in [5] for a quick refresher of this idea.

Estimating the Distributions

With this in mind, we now want to move towards using this data to estimate the PSA grade distribution of a card. One question arises, however: even if we choose to use the data without modifications, what if we have low or missing counts for a particular card?

To get around this, the first approach is to use cumulative totals over a whole set to form one single distribution, which we then use for every card. This approach might not be a good choice if certain rarities or certain cards differ greatly from each other and, importantly, from the aggregate distribution. One example where this might be an issue is the 1st Edition The Winged Dragon of Ra card, where the Ghost Rare variant has 4 cards earning the maximum grade out of 538 submissions ($\approx 0.7\%$) while the Ultra Rare variant has 35 out of 224 ($\approx 15.6\%$) (thank you to Ruxin34 for informing me of this card) [6]. There are benefits to this approach, though. We generally wouldn’t have to deal with a single card having all or almost all of its entries missing, as all of our counts would be reasonably big once aggregated. Furthermore, if card quality trends do operate at the level of a set, we’d intuitively be amplifying the consistent signal across observations and attenuating the noise from individual observations.
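The set-level pooling can be sketched as follows; the population counts here are toy numbers standing in for a real report from [3]:

```python
from collections import Counter

def set_level_distribution(pop_counts: dict[str, dict[int, int]]) -> dict[int, float]:
    """Pool PSA population counts across all cards in a set into one
    empirical grade distribution, then use it for every card."""
    totals = Counter()
    for card_counts in pop_counts.values():
        totals.update(card_counts)
    n = sum(totals.values())
    return {grade: count / n for grade, count in sorted(totals.items())}

# Toy population report (counts are illustrative, not real PSA data).
pop = {
    "Card A": {8: 50, 9: 120, 10: 30},
    "Card B": {8: 10, 9: 40, 10: 100},
}
dist = set_level_distribution(pop)
print(dist)
```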

A second approach would be to aggregate the counts by rarity. This would lean on the idea that cards are printed in sheets organized by rarity. It would hopefully ameliorate the low/missing count issues while incorporating some knowledge about the production of the cards. It’s not perfect in that it doesn’t model the case in which a particular card is especially difficult to grade and unlike the others in its rarity category.
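A minimal sketch of rarity-level pooling, assuming a hypothetical card-to-rarity mapping and toy counts:

```python
from collections import Counter

def rarity_level_distributions(pop_counts: dict[str, dict[int, int]],
                               rarity_of: dict[str, str]) -> dict[str, dict[int, float]]:
    """Pool PSA population counts by rarity: every card inherits the
    empirical grade distribution of its rarity category."""
    totals: dict[str, Counter] = {}
    for card, counts in pop_counts.items():
        totals.setdefault(rarity_of[card], Counter()).update(counts)
    return {rarity: {g: c / sum(t.values()) for g, c in sorted(t.items())}
            for rarity, t in totals.items()}

# Toy counts and a hypothetical card-to-rarity mapping.
pop = {"Card A": {9: 40, 10: 10}, "Card B": {9: 20, 10: 30}, "Card C": {9: 5, 10: 45}}
rarity = {"Card A": "rare", "Card B": "rare", "Card C": "ultra_rare"}
print(rarity_level_distributions(pop, rarity))
```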

A third approach would be to look at individual card counts. This would model trends and quirks at the card level, but it could potentially be noisier as counts may not be very high. Given the popularity of PSA, the low count issue might not be a big problem for a lot of the cards that many people care about. For less prevalent cards, counts could be missing for every single grade, so with this approach, we’d want to have a default distribution. For instance, we could default to a uniform distribution or another distribution that we deem appropriate.

A fourth approach would be to declare a prior distribution and use the counts (in any of the three ways above) to update it. This would let someone buy unopened items, check their condition, and adjust for how the cards inside might look (for instance, if they think some packs could be bent and thus greatly affect the cards). It could also lead someone to make a better-tuned decision about how pessimistic or optimistic to be when making an investment. This sounds great, but if done without much thought, it may amount to inserting one’s own biases, which are not present in any concrete data.
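One common way to make this concrete is a Dirichlet-multinomial update, where the prior is expressed as pseudo-counts over grades; the particular pessimistic prior below is a hypothetical choice for illustration:

```python
def posterior_mean(prior_pseudocounts: dict[int, float],
                   observed: dict[int, int]) -> dict[int, float]:
    """Dirichlet-multinomial update: add prior pseudo-counts (our beliefs)
    to observed PSA counts; the posterior mean over grades is
    (alpha_g + n_g) / sum over all grades of (alpha + n)."""
    grades = sorted(set(prior_pseudocounts) | set(observed))
    post = {g: prior_pseudocounts.get(g, 0.0) + observed.get(g, 0) for g in grades}
    total = sum(post.values())
    return {g: v / total for g, v in post.items()}

# A hypothetical, pessimistic prior (e.g. for packs we suspect are bent),
# gradually overridden by observed counts as data accumulates.
prior = {7: 5.0, 8: 3.0, 9: 1.0, 10: 1.0}
data = {8: 12, 9: 30, 10: 8}
print(posterior_mean(prior, data))
```

A nice property of this setup is that the total pseudo-count directly controls how stubborn the prior is relative to the data.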

From this discussion, it seems pretty clear that there are benefits and drawbacks to each approach, and we have to think before we use our data.

Some of the issues stem from the particular data source we are considering right now, so we may want to entertain the possibility of what perfect data looks like. In this case, the argument that I want to make is that it is incredibly difficult to get high quality data without a ton of foresight, money, or power.

At one (currently unattainable) extreme, we could go back in time and grab a much larger, random sample of sealed items we care about studying. We would then store them until now, open the items, grade the cards inside, and use those grades to form empirical distributions. Even this idealized scenario would not capture what we want perfectly: the path to today would be streamlined and not representative of the many exchanges a card product normally goes through, which can damage the cards and greatly affect their condition.

As an alternative, we could buy and open a lot of sealed items today to get our distributions. Obviously, there are a few issues with this. Firstly, there aren’t as many sealed items left today, so we might not be able to get a large enough sample. Secondly, this would cost a huge amount of money and would really defeat the point of trying to assess value before choosing to invest.

Merging with Value Data

Once we do settle on how to estimate the condition, we then want to find the values associated with each of these conditions. In a parallel fashion, PSA publishes price data for the 8, 9, and 10 grades of certain cards, as in [2].

This data can be seen as incomplete since it doesn’t have prices for cards with grades lower than 8. This creates an issue when we want to use the distribution data from the previous section with these prices. We have a few choices to make depending on how we want to model the problem. One approach is to include a model assumption that pack-fresh cards will not receive grades lower than 8; this would overall inflate the grades since these lower grades have 0 probability mass. To counteract this inflation, we could, alternatively, assume small probabilities for the lower grades akin to Laplace Smoothing [7, Pp. 44-45].
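A sketch of this smoothing idea, applied directly to a probability vector rather than raw counts (the floor value `eps` is an assumed tuning knob):

```python
def smooth_grades(dist: dict[int, float], grades: range = range(1, 11),
                  eps: float = 0.01) -> dict[int, float]:
    """Give every grade -- including the sub-8 range the price data lacks --
    a small floor probability eps, then renormalize (akin to Laplace
    smoothing, but applied to probabilities instead of raw counts)."""
    raw = {g: dist.get(g, 0.0) + eps for g in grades}
    total = sum(raw.values())
    return {g: v / total for g, v in raw.items()}

smoothed = smooth_grades({8: 0.2, 9: 0.5, 10: 0.3})
print(smoothed[7])  # small but nonzero mass on grade 7
```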

Because this price data is published by PSA, one might argue that they directly benefit from inflating the prices listed; after all, it would benefit their business to provide as much value as possible (and they also charge based on the price of the card). These points should encourage one to check other sources to verify that the prices are accurate.

Keeping everything above in mind and combining the two PSA data sources, we can do the following. For each card, we can now fully specify the distribution of the condition and the values associated with each condition; thus, we have enough information to simulate the experiment we mentioned at the start of this page.
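Putting the pieces together, a minimal Monte Carlo sketch of a box opening might look like the following; the card list, grade distributions, and prices are all fabricated for illustration:

```python
import random

def simulate_box_value(cards: list[dict], rng: random.Random) -> float:
    """One simulated box opening: for each pulled card, draw a grade from
    its distribution and look up the price attached to that grade."""
    total = 0.0
    for card in cards:
        dist, prices = card["grade_dist"], card["prices"]
        grade = rng.choices(list(dist), weights=list(dist.values()))[0]
        total += prices.get(grade, 0.0)
    return total

# A fabricated two-card "box"; distributions and prices are made up.
cards = [
    {"grade_dist": {8: 0.3, 9: 0.5, 10: 0.2}, "prices": {8: 5, 9: 15, 10: 80}},
    {"grade_dist": {8: 0.5, 9: 0.4, 10: 0.1}, "prices": {8: 1, 9: 3, 10: 20}},
]
rng = random.Random(42)
values = [simulate_box_value(cards, rng) for _ in range(10_000)]
print(sum(values) / len(values))  # Monte Carlo estimate of expected box value
```

Repeating this many times yields the empirical distribution of box value, which is exactly what the plot below summarizes.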

Results

I wrote the previous sections to encourage someone who wants to do a similar analysis to understand their data enough to make good use of it. Plenty of choices need to be made which will impact how you interpret your results. I’ll present some results now under one set of choices. The following plot shows a distribution of the value of the cards within a box for one set:

In this case, we see an interesting distribution seemingly with three different peaks that we might not have expected. It would be wise to look more into specific data points placed in each of these sections of the distribution to see if there are any important patterns in the cards, which end up heavily shaping the distribution.

Now, it’s up to us to frame some questions that we might care about. For instance, we could look up the market price for the set and compute how much of the probability mass is above this value; this would tell us how likely we are to make a profit at this point in time if we were to purchase the item at that price. We could also set thresholds for when it’s worth it to buy an item, given the level of risk that we are willing to assume. We could tweak assumptions to see how robust our results are to slightly different settings, which could be other real-world scenarios we could encounter.
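The first of these questions reduces to a one-liner over the simulated values (the draws and box price below are hypothetical):

```python
def prob_profit(simulated_values: list[float], box_price: float) -> float:
    """Fraction of simulated box values exceeding the purchase price --
    an estimate of P(card value > box price) at today's prices."""
    return sum(v > box_price for v in simulated_values) / len(simulated_values)

# Hypothetical simulated draws and a hypothetical market price.
values = [120.0, 85.0, 40.0, 260.0, 95.0]
print(prob_profit(values, box_price=100.0))
```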

Through this page, I mainly wanted to highlight the thinking process behind each of the steps up to this point; I’d be glad to chat about more specific details with anyone interested, so please do reach out to me if you have any thoughts about this.

References

  • [1]“Grading Standards.” PSA, Accessed: Aug. 20, 2023. [Online]. Available at: https://www.psacard.com/resources/gradingstandards#cards.
  • [2]“2002 Yu-Gi-Oh! LOB-Legend of Blue Eyes White Dragon.” PSA, Accessed: Aug. 20, 2023. [Online]. Available at: https://www.psacard.com/priceguide/non-sports-tcg-card-values/2002-yu-gi-oh-lob-legend-blue-eyes-white-dragon/2875.
  • [3]“PSA Population Report.” PSA, Accessed: Aug. 20, 2023. [Online]. Available at: https://www.psacard.com/pop.
  • [4]“PSA Price Guide.” PSA, Accessed: Aug. 20, 2023. [Online]. Available at: https://www.psacard.com/priceguide.
  • [5]“Basics of Bayesian Statistics.” Accessed: Aug. 20, 2023. [Online]. Available at: https://www.stat.cmu.edu/~brian/463-663/week09/Chapter%2003.pdf.
  • [6]“2020 Yu-Gi-Oh! Legendary Duelists: Rage of Ra.” PSA, Accessed: Aug. 20, 2023. [Online]. Available at: https://www.psacard.com/pop/tcg-cards/2020/yu-gi-oh-legendary-duelists-rage-ra/181443.
  • [7]A. Ng and T. Ma, “CS229 Lecture Notes.” Stanford University, Accessed: Aug. 20, 2023. [Online]. Available at: http://cs229.stanford.edu/main_notes.pdf.