# author: David Harrison

# In an attempt for us to have consistent notions of torrent health
# across search and our various tools, I thought it made sense
# to define health stats in BTL.


def reciprocity( downloaders ):
    # measure of the effectiveness of tit-for-tat.
    # This is heuristic for lack of a better model.
    low_thresh = 5    # traditional tit-for-tat algo makes no sense when there are <= 5.
    high_thresh = 50
    # '+1' includes the local host if it were to join the swarm.
    # This retains consistency with the use of 'downloaders+1' used in the
    # download_rate_health metric.
    if downloaders + 1 < low_thresh:
        gamma = 0.
    elif downloaders + 1 < high_thresh:
        gamma  = (downloaders + 1. - low_thresh)/(high_thresh - low_thresh)
    else:
        gamma = 1.
    return gamma

    
def download_rate_health( seeders, downloaders, nats = 0 ):
    """
     This health metric preserves order based on expected download rates.
     It is not necessarily proportional to expected download rate.

     Let downrate_i = download rate from torrent i.

     Let uprate_i = upload rate to torrent i.

     Let downloaders_i = number of downloadeers in torrent i.

     Let seeders_i = number of seeders in torrent i.

       E[downrate_i] = E[downrate from tit-for-tat_i] + E[downrate from seeders_i]  (1)

     Assumption 1: downrate from tit-for-tat is related to uprate by a function gamma_i.
     Let reciprocity gamma_i = effectiveness of tit-for-tat.

       E[downrate from tit-for-tat_i] = gamma_i * uprate_i                          (2)

     where uprate_i is the local uplink capacity that can be committed to torrent i.
     Assuming uprate is invariant across the torrents that I might join, (2) becomes

       E[downrate from tit-for-tat_i] = gamma_i * uprate                            (3)

     Assumption 2: my expected downrate for any torrent i is equal to the
     average download rate due to seeders across the torrent swarm for i.  Due to
     conservation of bits.  The sum download rate due to seeders must equal
     the sum upload rate from seeders.  Thus
     
                                       Sum_k uprate from seeder k
       E[downrate from seeders_i] =  -----------------------------                 
                                             downloaders_i

                                     seeders_i * E[uprate from seeder]
                                     --------------------------------                (4)
                                             downloaders_i

     Assumption 3: E[update from seeder] = uprate
     This expectation might reasonably hold when the local user is a typical
     residential customer and all seeders are also residential customers.
     This is likely to be far off when their are many infrastructure seeds.

       E[downrate from seeders_i] = seeders_i * uprate / downloaders_i

     and thus combining with (1) and (3) yields

       E[downrate_i] = gamma_i * uprate + seeders_i * uprate / downloaders_i         (5)

     Our objective is a health metric H such that

       H_i > H_j  --->  E[downrate_i] > E[downrate_j]  for all i,j

     Because uprate >= 0,
     
       E[downrate_i]     E[downrate_j]          
       ------------   >  -------------   ---> E[downrate_i] > E[downrate_j]  for all i,j
         uprate             uprate

     We thus define rate health metric Hr_i

                          seeders_i
       Hr_i = gamma_i + -------------                                                 (6)
                        downloaders_i

     To take into account the local user in the health metric, assuming the user is a
     potential downloader, this becomes

                          seeders_i
       Hr_i = gamma_i + -----------------                                             (7)
                        downloaders_i + 1

     The +1 also eliminates the potential divide by zero error that would arise in
     (6) were downloaders_i = 0.

     If gamma_i = gamma_j for all i,j then we can simplify (7) while preserving order
     by subtracting out gamma_i.  This leads to
     
                 seeders_i
       Hr_i = -----------------                                                       (8)
              downloaders_i + 1

     From experience, we know that the tit-for-tat algorithm works poorly when there
     are few downloaders in a swarm.  At the extreme when there are less than 5,
     the algorithm does nothing: the traditional tit-for-tat algorithm unchokes
     the best 4 fastest plus 1 randomly chosen 'optimistic' unchoke.  Reciprocity
     should be some function of the number of downloaders in the swarm.  For lack
     of a better study, I suggest the following

                       downloaders_i +1          
                 /     ----------------     for downloaders_i < thresh
                 |          thresh
       gamma_i  <
                 |            1             otherwise
                 \

      Thus the effectivness of reciprocity increases linearly until it hits a threshold
      and reciprocation becomes perfect above that threshold.

      @param seeds: number of non-natted peers that have the entire file
          and are still in the torrent swarm.
      @param downloaders: number of non-natted downloaders in the swarm
      @param nats: number of natted downloaders in the swarm. 
    """
    assert downloaders >= 0
    assert seeders >= 0
    gamma = reciprocity(downloaders)
    Hr = health = gamma + seeders / (downloaders + nats + 1.)
    return health

def download_time_health( seeders, downloaders, nats, filesize ):
    """
      Health metric that perserves order based on download times.
      Smaller is better.  (Confusing.  I couldn't decide whether
      "smaller is better" or "bigger is better" is more appropriate.)

      Let downtime_i = download time for the entire file from torrent i.
      Thus
          Ht_i > Ht_j --> E[downtime_i] > E[downtime_j]                      (9)

                          filesize_i
        E[downtime_i] =  ------------
                         E[downrate_i]

        
      Using the same definitions ofr downrate_i as used in derive the
      health metric for download_rate_health, (9) becomes

                                        filesize_i
        E[downtime_i] = ------------------------------------------------------
                        gamma_i * uprate_i + seeders_i * uprate_i / downloaders_i


      Because uprate_i is positive,
                                               
        E[downtime_i] > E[downtime_j]  <----

                             filesize_i                             filesize_j                
                    ------------------------------      > ------------------------------     
                    gamma_i + seeders_i / downloaders_i   gamma_i + seeders_j / downloaders_j

      Including +1 and using the same gamma as used from download_time_health,
      we define

                          filesize_i                
         Ht_i = --------------------------------------     
                gamma_i + seeders_i / (downloaders_i+1)

      @param seeds: number of non-natted peers that have the entire file
          and are still in the torrent swarm.
      @param downloaders: number of non-natted downloaders in the swarm.
      @param nats: number of natted downloaders in the swarm. 
    """
    assert downloaders >= 0
    assert seeders >= 0
    assert filesize >= 0
    gamma = reciprocity(downloaders)
    Hr = gamma + seeders / (downloaders+nats+1)
    Ht = filesize / Hr
    return Ht