As an enterprise SEO, you likely have countless keywords to target, but minimal time and resources. To maximize the ROI of your efforts, it’s crucial to prioritize keywords based on how difficult it will be to rank for them. 

But considering that most keyword difficulty scores from SEO tools exhibit little more correlation to rankings than a coin flip, how can you trust the insight they provide?

As Google’s criteria for determining keyword rankings grow increasingly complex, SEOs require a more accurate keyword difficulty score that factors in more than just domain and page authority.

After two years of dedicated work, we built a robust model based on billions of data points from our Research Grid dataset. Our Keyword Difficulty Score is anchored in our proprietary Page Strength metric, derived from extensive data analysis of over 1 billion URLs, encompassing rankings, search volume, backlinks, trends, and more. 

In the following data analysis, we unveil the relationship between "Page Strength" and "Rank Position" to provide real evidence of the accuracy of our Keyword Difficulty Scores.

Background on Page Strength

Page Strength covers all factors influencing your page's position in Google's search results—a vital tool for assessing keyword ranking difficulty.

The purpose of Page Strength is to give SEOs the ability to accurately gauge how difficult it is to rank for their target keywords. After all, understanding keyword difficulty starts with comprehending the Page Strength of the leading content. If those pages wield significantly more strength than yours, the path to ranking just got tougher.

We calculate Page Strength using seoClarity's vast platform, including a 500+ million keyword-strong Research Grid dataset, billions of data points and metrics, and six years of historical data.

Unlike one-time correlations, Page Strength undergoes daily validation across 10 million keywords. This enables us to assess keyword difficulty for new pages based on a page’s existing topic authority and refine accuracy over time with additional metrics from our Research Grid dataset.

 

Summary of Results

  • We have established a robust correlation between page strength and page rank, with a correlation coefficient of approximately -0.6 for raw data and remarkably close to 1 (0.99) for aggregated data (averaging page ranks across units of page strength from 0 to 100).
  • Our findings are supported by a mixed-effect model with random intercepts, which is considered the most appropriate approach. It shows a statistically significant, beneficial effect of page strength on page rank: the rank number drops by 0.78 on average for every 10-unit increase in page strength.
  • The effect counts as beneficial because rank 1 is the most desirable position, so a lower rank number is better.

In short, our study confirms a significant and robust connection between page strength and page rank.

 

Does "Page Strength" Strongly Indicate "Rank Position"?

 

Data Quality Check

To conduct our study, we began with a dataset of 10,969,861 observations and ensured that there were no missing values.

Then, we looked at how the data was distributed using skewness and kurtosis. These measures help us see if the data has a normal distribution. This is important because it means our data behaves in a way that we can understand and make predictions about.

For page strength, the absolute values of skewness and kurtosis were 1.35 and 1.40, respectively; for rank position, they were 0.01 and 1.23. All of these fall below the thresholds of 2 for skewness and 7 for kurtosis, which means the data closely resembles a normal distribution.
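
A check like this can be sketched in a few lines of Python. This is a minimal illustration, not the code used in the study: the function names are hypothetical, the moments are computed in their simple population form, and kurtosis is assumed to mean excess kurtosis (0 for a normal distribution).

```python
def central_moment(values, order):
    """Population central moment of the given order."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** order for v in values) / len(values)

def skewness(values):
    """Third standardized moment; 0 for a symmetric distribution."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5

def excess_kurtosis(values):
    """Fourth standardized moment minus 3; 0 for a normal distribution."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2 - 3.0

def within_thresholds(values, max_skew=2.0, max_kurt=7.0):
    """Apply the |skewness| < 2 and |kurtosis| < 7 rule from the text."""
    return (abs(skewness(values)) < max_skew
            and abs(excess_kurtosis(values)) < max_kurt)

# Hypothetical sample: one observation at each rank position 1..10.
ranks = list(range(1, 11))
print(round(skewness(ranks), 2), round(excess_kurtosis(ranks), 2))
print(within_thresholds(ranks))
```

For an even spread over ranks 1 to 10, this gives a skewness of 0 and an excess kurtosis of about -1.22, which is in line with the 0.01 and 1.23 reported for rank position.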

 

Choosing the Right Method to Determine the Correlation

When beginning our study, we analyzed the correlation between page strength and rank position in several different ways, including:

  • The Spearman correlation
  • Monotonic trend
  • Linear trend and direction
  • Logarithmic trend

The results all revealed a strong correlation.

But these analyses have some methodological concerns. Most correlation and regression models assume that each observation is independent. However, it’s common to use the same keyword on multiple pages, making the observations not entirely independent. This introduces a multilevel aspect.

To put it simply, let's consider two keywords, "aaa" and "bbbb," each occurring on three pages with varying page strengths. For "aaa," page strength 20 corresponds to the top rank (1), while for "bbbb," page strength 20 corresponds to the lowest rank (3). 

But the overall trend remains the same within each keyword: higher page strength leads to a better rank (closer to 1). This is why a multilevel approach is the most appropriate in this scenario.
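
To make the two-keyword example concrete, here is a small sketch with made-up numbers. Only the two pairs from the text (page strength 20 at rank 1 for "aaa", page strength 20 at rank 3 for "bbbb") come from the example; every other value is a hypothetical filler chosen for illustration.

```python
import math

def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# (page strength, rank) per keyword; hypothetical apart from the two pairs above.
pages = {
    "aaa": [(20, 1), (10, 2), (5, 3)],
    "bbbb": [(60, 1), (40, 2), (20, 3)],
}

# Correlation inside each keyword.
within = {kw: pearson(*zip(*obs)) for kw, obs in pages.items()}

# Correlation after pooling all pages and ignoring the keyword.
pooled_obs = [pair for obs in pages.values() for pair in obs]
pooled = pearson(*zip(*pooled_obs))

print(within, pooled)
```

With these numbers the relationship inside each keyword is almost perfectly negative, while the pooled coefficient is noticeably weaker, because the same page strength of 20 sits at opposite ends of the two rankings.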

 

Determining the Correlation Between Page Strength and Rank Position

The multilevel approach allows us to directly measure correlation even when observations are not independent (i.e. when the same keywords occur on multiple pages). In this scenario, we treat each keyword as if it provides repeated measures, where the page strength varies, resulting in different rank positions.

We can assume that measures originating from the same keyword share a common factor (the keyword itself) and differ from measures associated with other keywords, similar to fitting a mixed-effect model and assessing the coefficient of determination (as discussed by Alexander, Tropsha, Winkler, 2015).

For the purpose of this study, we selected the first 10,000 keywords and computed the repeated measures correlation coefficient:

  • Repeated Measures Correlation (r): -0.5670224
  • Degrees of Freedom: 8755
  • p-value: <0.001
  • 95% Confidence Interval: -0.5810666 to -0.5526407

Once again, we observe a strong association between page strength and rank position (r = -0.57, with an absolute value above 0.5). This correlation differs significantly from zero (p < 0.001).

To confirm accuracy, we can compare this with the Spearman correlation coefficient for the entire dataset (10,969,861 observations), which is R = -0.58, or the Pearson's correlation coefficient, which is R = -0.55. 

Additionally, we can estimate correlations for the same dataset using only the first 10,000 keywords: Pearson's R = -0.57 and Spearman's R = -0.61.

From these findings, we draw two conclusions:

  1. There's no imperative need to work with an excessively large dataset. A dataset comprising 10,000 keywords captures the overall trend sufficiently well.
  2. While strictly speaking, observations within the same keyword may not be entirely independent, the bias in correlation estimates is minimal. Spearman correlation coefficient can be employed as a quick and robust check in such cases.
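
As a quick check of this kind, the Spearman coefficient is simply the Pearson correlation of the ranked values, with tied values sharing their average rank. The sketch below uses only the Python standard library; the helper names and the sample values are hypothetical.

```python
import math

def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # average of positions i..j, 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def pearson(xs, ys):
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def spearman(xs, ys):
    """Spearman correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))

# Hypothetical sample: page strengths and the rank positions they achieved.
page_strength = [85, 70, 70, 40, 15]
rank_position = [1, 2, 4, 3, 5]
print(round(spearman(page_strength, rank_position), 3))
```

Because it only depends on ordering, this check is insensitive to whether the underlying relationship is linear or logarithmic.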

 

Utilizing Mixed-Effect Models to Assess Page Strength's Significance on Page Rank

We can directly determine whether page strength influences page rank by considering the potential effects.

We've employed a mixed-effect model with random intercepts, with the following results:

Random Effects:

  • keyword (Intercept): Variance = 0.00, Std. Dev. = 0.000
  • Residual: Variance = 5.79, Std. Dev. = 2.406

Number of observations: 10,969,861, grouped by keyword (1,134,426 groups).

Fixed Effects:

  • Intercept: Estimate = 7.796e+00, Std. Error = 1.295e-03, t value = 6018, p-value < 2e-16 ***
  • url_svb: Estimate = -7.814e-02, Std. Error = 3.601e-05, t value = -2170, p-value < 2e-16 ***

The fixed effect (a common effect for all keywords) is negative and statistically significant. The equation is as follows:

Page rank = 7.80 - 0.078 * (Page strength)

The intercept (7.8) is an aggregated value for all keywords. Each keyword has its own intercept (random effect). 
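
Written out, a random-intercept model of this kind has the following general form. This is the standard textbook formulation, given here as an assumption about the structure rather than as the exact specification used in the study; the subscripts i (page) and j (keyword) and the symbols are introduced only for illustration.

```latex
\text{rank}_{ij} = (\beta_0 + u_j) + \beta_1 \,\text{strength}_{ij} + \varepsilon_{ij},
\qquad u_j \sim \mathcal{N}(0, \sigma_u^2),
\quad \varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2)
```

In this notation, the reported estimates correspond to \beta_0 \approx 7.80, \beta_1 \approx -0.078, \sigma_u^2 \approx 0.00, and \sigma^2 \approx 5.79. Because the reported keyword-level variance is 0.00, the per-keyword intercepts in this fit appear to differ very little from the shared intercept.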

If page strength is 100, the predicted page rank theoretically approaches 0. The model does not enforce a minimum rank of 1, but the overall trend is correct: higher page strength corresponds to a lower rank number.

We can also explore a logarithmic model (log10).

Random Effects:

  • keyword (Intercept): Variance = 0.000, Std. Dev. = 0.00
  • Residual: Variance = 5.711, Std. Dev. = 2.39

Number of observations: 10,969,861, grouped by keyword (1,134,426 groups).

Fixed Effects:

  • Intercept: Estimate = 13.17, Std. Error = 3.546e-03, t value = 3715, p-value < 2e-16 ***
  • log10_page_strength: Estimate = -5.575e+00, Std. Error = 2.512e-03, t value = -2219, p-value < 2e-16 ***

The effect of page strength on rank position is again significant and negative, indicating that as page strength increases, the rank gets closer to 1. The equation for this model is as follows:

Page rank = 13.17 - 5.58 * log10(Page strength)

For instance, if page strength is 100 (maximum), the estimated page rank is approximately 2. If page strength is 10 (a weaker page), the estimated page rank is approximately 7.6. 

In summary, when page strength is close to 100, the page rank falls between 1 and 3 (averaging around 2). However, if page strength is 10 or less, the page rank on average is greater than 7.
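
The two fitted equations reported above are easy to evaluate directly. The sketch below plugs in the rounded coefficients from the text; the function names are hypothetical, and `clamp_rank` is an illustrative helper that is not part of either model, added only because neither equation enforces a best possible rank of 1.

```python
import math

def linear_rank(page_strength):
    """Linear model: Page rank = 7.80 - 0.078 * (Page strength)."""
    return 7.80 - 0.078 * page_strength

def log_rank(page_strength):
    """Log model: Page rank = 13.17 - 5.58 * log10(Page strength)."""
    return 13.17 - 5.58 * math.log10(page_strength)

def clamp_rank(rank, best=1.0):
    """Hypothetical helper: never report a rank better than position 1."""
    return max(best, rank)

for strength in (10, 50, 100):
    print(
        strength,
        round(clamp_rank(linear_rank(strength)), 2),
        round(clamp_rank(log_rank(strength)), 2),
    )
```

The logarithmic form separates weak pages more sharply: moving from a page strength of 10 to 100 changes its estimate by 5.58 positions, since log10 increases by exactly 1 over that range.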

 

Methodology:

We started with a dataset of 10,969,861 observations, ensuring there were no missing values. Then we assessed the data's distribution using skewness and kurtosis to confirm its resemblance to a normal distribution.

To analyze the correlation between page strength and rank position, we selected the first 10,000 keywords and calculated the repeated measures correlation coefficient. To ensure accuracy, we compared the results with the Spearman correlation for the entire dataset.

Recognizing that our observations weren't entirely independent due to the use of the same keywords across multiple pages, we adopted a mixed-effect model with random intercepts. This approach accounts for the multilevel nature of the data, where keywords are a common factor but have different outcomes.

These methodological steps allowed us to thoroughly analyze the relationship between page strength and rank position, ensuring the reliability of our findings.

 

Ready to say goodbye to outdated keyword difficulty checkers? Schedule a free demo of the most reliable way to determine keyword difficulty today!
