torrent_health.py 7.5 KB

123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134135136137138139140141142143144145146147148149150151152153154155156157158159160161162163164165166167168169170171172173174175176177178179180181182183184185
  1. # author: David Harrison
  2. # In an attempt for us to have consistent notions of torrent health
  3. # across search and our various tools, I thought it made sense
  4. # to define health stats in BTL.
  5. def reciprocity( downloaders ):
  6. # measure of the effectiveness of tit-for-tat.
  7. # This is heuristic for lack of a better model.
  8. low_thresh = 5 # traditional tit-for-tat algo makes no sense when there are <= 5.
  9. high_thresh = 50
  10. # '+1' includes the local host if it were to join the swarm.
  11. # This retains consistency with the use of 'downloaders+1' used in the
  12. # download_rate_health metric.
  13. if downloaders + 1 < low_thresh:
  14. gamma = 0.
  15. elif downloaders + 1 < high_thresh:
  16. gamma = (downloaders + 1. - low_thresh)/(high_thresh - low_thresh)
  17. else:
  18. gamma = 1.
  19. return gamma
  20. def download_rate_health( seeders, downloaders, nats = 0 ):
  21. """
  22. This health metric preserves order based on expected download rates.
  23. It is not necessarily proportional to expected download rate.
  24. Let downrate_i = download rate from torrent i.
  25. Let uprate_i = upload rate to torrent i.
  26. Let downloaders_i = number of downloadeers in torrent i.
  27. Let seeders_i = number of seeders in torrent i.
  28. E[downrate_i] = E[downrate from tit-for-tat_i] + E[downrate from seeders_i] (1)
  29. Assumption 1: downrate from tit-for-tat is related to uprate by a function gamma_i.
  30. Let reciprocity gamma_i = effectiveness of tit-for-tat.
  31. E[downrate from tit-for-tat_i] = gamma_i * uprate_i (2)
  32. where uprate_i is the local uplink capacity that can be committed to torrent i.
  33. Assuming uprate is invariant across the torrents that I might join, (2) becomes
  34. E[downrate from tit-for-tat_i] = gamma_i * uprate (3)
  35. Assumption 2: my expected downrate for any torrent i is equal to the
  36. average download rate due to seeders across the torrent swarm for i. Due to
  37. conservation of bits. The sum download rate due to seeders must equal
  38. the sum upload rate from seeders. Thus
  39. Sum_k uprate from seeder k
  40. E[downrate from seeders_i] = -----------------------------
  41. downloaders_i
  42. seeders_i * E[uprate from seeder]
  43. -------------------------------- (4)
  44. downloaders_i
  45. Assumption 3: E[update from seeder] = uprate
  46. This expectation might reasonably hold when the local user is a typical
  47. residential customer and all seeders are also residential customers.
  48. This is likely to be far off when their are many infrastructure seeds.
  49. E[downrate from seeders_i] = seeders_i * uprate / downloaders_i
  50. and thus combining with (1) and (3) yields
  51. E[downrate_i] = gamma_i * uprate + seeders_i * uprate / downloaders_i (5)
  52. Our objective is a health metric H such that
  53. H_i > H_j ---> E[downrate_i] > E[downrate_j] for all i,j
  54. Because uprate >= 0,
  55. E[downrate_i] E[downrate_j]
  56. ------------ > ------------- ---> E[downrate_i] > E[downrate_j] for all i,j
  57. uprate uprate
  58. We thus define rate health metric Hr_i
  59. seeders_i
  60. Hr_i = gamma_i + ------------- (6)
  61. downloaders_i
  62. To take into account the local user in the health metric, assuming the user is a
  63. potential downloader, this becomes
  64. seeders_i
  65. Hr_i = gamma_i + ----------------- (7)
  66. downloaders_i + 1
  67. The +1 also eliminates the potential divide by zero error that would arise in
  68. (6) were downloaders_i = 0.
  69. If gamma_i = gamma_j for all i,j then we can simplify (7) while preserving order
  70. by subtracting out gamma_i. This leads to
  71. seeders_i
  72. Hr_i = ----------------- (8)
  73. downloaders_i + 1
  74. From experience, we know that the tit-for-tat algorithm works poorly when there
  75. are few downloaders in a swarm. At the extreme when there are less than 5,
  76. the algorithm does nothing: the traditional tit-for-tat algorithm unchokes
  77. the best 4 fastest plus 1 randomly chosen 'optimistic' unchoke. Reciprocity
  78. should be some function of the number of downloaders in the swarm. For lack
  79. of a better study, I suggest the following
  80. downloaders_i +1
  81. / ---------------- for downloaders_i < thresh
  82. | thresh
  83. gamma_i <
  84. | 1 otherwise
  85. \
  86. Thus the effectivness of reciprocity increases linearly until it hits a threshold
  87. and reciprocation becomes perfect above that threshold.
  88. @param seeds: number of non-natted peers that have the entire file
  89. and are still in the torrent swarm.
  90. @param downloaders: number of non-natted downloaders in the swarm
  91. @param nats: number of natted downloaders in the swarm.
  92. """
  93. assert downloaders >= 0
  94. assert seeders >= 0
  95. gamma = reciprocity(downloaders)
  96. Hr = health = gamma + seeders / (downloaders + nats + 1.)
  97. return health
  98. def download_time_health( seeders, downloaders, nats, filesize ):
  99. """
  100. Health metric that perserves order based on download times.
  101. Smaller is better. (Confusing. I couldn't decide whether
  102. "smaller is better" or "bigger is better" is more appropriate.)
  103. Let downtime_i = download time for the entire file from torrent i.
  104. Thus
  105. Ht_i > Ht_j --> E[downtime_i] > E[downtime_j] (9)
  106. filesize_i
  107. E[downtime_i] = ------------
  108. E[downrate_i]
  109. Using the same definitions ofr downrate_i as used in derive the
  110. health metric for download_rate_health, (9) becomes
  111. filesize_i
  112. E[downtime_i] = ------------------------------------------------------
  113. gamma_i * uprate_i + seeders_i * uprate_i / downloaders_i
  114. Because uprate_i is positive,
  115. E[downtime_i] > E[downtime_j] <----
  116. filesize_i filesize_j
  117. ------------------------------ > ------------------------------
  118. gamma_i + seeders_i / downloaders_i gamma_i + seeders_j / downloaders_j
  119. Including +1 and using the same gamma as used from download_time_health,
  120. we define
  121. filesize_i
  122. Ht_i = --------------------------------------
  123. gamma_i + seeders_i / (downloaders_i+1)
  124. @param seeds: number of non-natted peers that have the entire file
  125. and are still in the torrent swarm.
  126. @param downloaders: number of non-natted downloaders in the swarm.
  127. @param nats: number of natted downloaders in the swarm.
  128. """
  129. assert downloaders >= 0
  130. assert seeders >= 0
  131. assert filesize >= 0
  132. gamma = reciprocity(downloaders)
  133. Hr = gamma + seeders / (downloaders+nats+1)
  134. Ht = filesize / Hr
  135. return Ht