Gradient Accumulation

Gradient accumulation and …

In the previous chapter, we observed that the KoLeo loss successfully spreads embeddings in the representation space. However, we noted that the KoLeo loss is intrinsically dependent on batch size: it computes the minimum distance between each embedding and all other embeddings within the same …
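The batch-size dependence can be illustrated with a minimal sketch. It assumes the commonly used form of the KoLeo regularizer, the negative mean log-distance from each embedding to its nearest neighbor in the batch. The function name `koleo_loss`, the `eps` term, and the toy 2-D points are illustrative choices, not taken from the original post.

```python
import math


def koleo_loss(embeddings, eps=1e-8):
    """KoLeo-style loss (assumed form): negative mean of the log distance
    from each embedding to its nearest neighbor within the same batch."""
    n = len(embeddings)
    if n < 2:
        raise ValueError("a batch needs at least two embeddings")
    total = 0.0
    for i, x in enumerate(embeddings):
        # Nearest neighbor is searched only among the other batch members.
        nearest = min(
            math.dist(x, y) for j, y in enumerate(embeddings) if j != i
        )
        total += math.log(nearest + eps)
    return -total / n


# Toy data: four corners of the unit square plus its center.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (0.5, 0.5)]

small_batch_loss = koleo_loss(points[:2])  # only two of the points are visible
full_batch_loss = koleo_loss(points)       # all five points are visible

print(f"batch of 2: {small_batch_loss:.4f}")
print(f"batch of 5: {full_batch_loss:.4f}")
```

The first two points receive a different nearest-neighbor distance depending on which other points share their batch, so the loss value (and its gradient) changes with batch composition and size even though the embeddings themselves do not.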
