====== Perplexity statistics ======
Mean PPL(Q)                   :  24.967196 ±   0.239198
Mean PPL(base)                :  22.656280 ±   0.216110
Cor(ln(PPL(Q)), ln(PPL(base))):  97.50%
Mean ln(PPL(Q)/PPL(base))     :   0.097126 ±   0.002137
Mean PPL(Q)/PPL(base)         :   1.101999 ±   0.002355
Mean PPL(Q)-PPL(base)         :   2.310915 ±   0.055814

====== KL divergence statistics ======
Mean    KLD:   0.194299 ±   0.000681
Maximum KLD:   7.302684
99.9%   KLD:   2.411403
99.0%   KLD:   1.223937
Median  KLD:   0.115222
10.0%   KLD:   0.002397
 5.0%   KLD:   0.000468
 1.0%   KLD:   0.000029
Minimum KLD:  -0.000071

====== Token probability statistics ======
Mean    Δp: -1.814 ± 0.028 %
Maximum Δp:  87.005%
99.9%   Δp:  52.936%
99.0%   Δp:  28.197%
95.0%   Δp:  11.558%
90.0%   Δp:   5.344%
75.0%   Δp:   0.316%
Median  Δp:  -0.027%
25.0%   Δp:  -2.650%
10.0%   Δp: -12.214%
 5.0%   Δp: -20.885%
 1.0%   Δp: -41.395%
 0.1%   Δp: -70.258%
Minimum Δp: -96.511%
RMS Δp    : 10.877 ± 0.050 %
Same top p: 79.025 ± 0.105 %
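
As a rough aid to reading these figures, the sketch below shows how the per-token quantities could be computed from two sets of logits, and checks that the reported perplexity summaries agree with one another. It assumes the usual definitions: KLD is the KL divergence from the base distribution to the quantized one, Δp is the change in probability assigned to the correct token, and "same top" means both models rank the same token first. The helper names `softmax` and `token_stats` are hypothetical and are not taken from the tool that produced this output; only the three constants are copied from the statistics above.

```python
import math

def softmax(logits):
    """Convert raw logits to probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def token_stats(base_logits, q_logits, correct_id):
    """Per-token comparison of a quantized model (Q) against the base model.

    Assumed definitions (not confirmed by the source output):
      kld      = sum_i p_base[i] * ln(p_base[i] / p_q[i])
      delta_p  = p_q[correct_id] - p_base[correct_id]
      same_top = both models pick the same most likely token
    """
    p_base = softmax(base_logits)
    p_q = softmax(q_logits)
    kld = sum(pb * math.log(pb / pq)
              for pb, pq in zip(p_base, p_q) if pb > 0.0)
    delta_p = p_q[correct_id] - p_base[correct_id]
    same_top = p_base.index(max(p_base)) == p_q.index(max(p_q))
    return kld, delta_p, same_top

# Consistency check on the reported perplexity summaries.
ppl_q = 24.967196          # Mean PPL(Q)
ppl_base = 22.656280       # Mean PPL(base)
mean_ln_ratio = 0.097126   # Mean ln(PPL(Q)/PPL(base))

ratio_from_ppl = ppl_q / ppl_base         # direct ratio of the two means
ratio_from_log = math.exp(mean_ln_ratio)  # ratio recovered from the mean log ratio
diff = ppl_q - ppl_base                   # absolute perplexity increase

print(f"ratio from PPL values : {ratio_from_ppl:.6f}")
print(f"ratio from log ratio  : {ratio_from_log:.6f}")
print(f"difference            : {diff:.6f}")

# Toy example with made-up logits, purely for illustration.
kld, delta_p, same_top = token_stats([2.0, 1.0, 0.1], [1.8, 1.2, 0.1], 0)
print(f"toy KLD={kld:.6f}  Δp={delta_p * 100:.3f}%  same top={same_top}")
```

Both ratio computations land on roughly 1.102, matching the reported Mean PPL(Q)/PPL(base), so the quantized model's perplexity is about 10% higher than the base model's. The slightly negative Minimum KLD is plausibly floating-point rounding, since an exact KL divergence cannot be negative.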