Gradient Accumulation

Gradient accumulation and KoLeo loss - Part 3/3

In the previous chapter, we observed that the KoLeo loss successfully spreads embeddings across the representation space. However, we also noted that it is intrinsically dependent on batch size: for each embedding, it computes the distance to its nearest neighbour among all other embeddings within the same …
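To make the batch-size dependence concrete, here is a minimal NumPy sketch of a KoLeo-style loss. It is an illustration under stated assumptions, not the implementation used in this series: the function name `koleo_loss`, the `eps` constant, and the L2 normalisation step are all choices made for this example. The loss is taken as the negative mean log of each embedding's nearest-neighbour distance, and the same eight embeddings are evaluated once as a full batch and once as two independent halves.

```python
import numpy as np


def koleo_loss(embeddings, eps=1e-8):
    """KoLeo-style loss for one batch: -mean(log(nearest-neighbour distance)).

    Hypothetical helper for illustration; expects at least two embeddings.
    """
    x = np.asarray(embeddings, dtype=np.float64)
    # Assumption: embeddings are L2-normalised before distances are taken.
    x = x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)
    # Pairwise Euclidean distances, shape (n, n).
    diff = x[:, None, :] - x[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    # An embedding must not count as its own nearest neighbour.
    np.fill_diagonal(dist, np.inf)
    nearest = dist.min(axis=1)
    return float(-np.mean(np.log(nearest + eps)))


rng = np.random.default_rng(0)
batch = rng.normal(size=(8, 4))  # 8 random embeddings of dimension 4

# Loss when every embedding can see the 7 others.
full = koleo_loss(batch)
# Loss when the batch is split into two halves that cannot see each other.
halves = 0.5 * (koleo_loss(batch[:4]) + koleo_loss(batch[4:]))

print(f"full batch: {full:.4f}")
print(f"two halves: {halves:.4f}")
```

Restricting the neighbour search to a smaller group can only keep or increase each nearest-neighbour distance, so the averaged half-batch value is never larger than the full-batch value. The two computations are therefore not interchangeable, even though they use the same embeddings.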