A differentiable function from binary integer to one-hot representations

Matthew Finlayson

I would like to define a differentiable function $f:\{0,1\}^{\log v}\to\{0,1\}^{v}$ that converts binary number representations of $\log v$ bits into one-hot vectors. This can be accomplished by using fuzzy logic operators to convert $f(x)_i=\mathbf{1}[i=x]$ into
$$f(x)_i = \prod_{j=1}^{\log v}\left[(i_jx_j)+\bigl((1-i_j)(1-x_j)\bigr)-(i_jx_j)\bigl((1-i_j)(1-x_j)\bigr)\right]$$
using the definitions of the product $\top$-norm and $\top$-conorm and the fact that $\mathbf{1}[a=b]=(a\land b)\lor(\lnot a\land\lnot b)$. Here $i_j$ and $x_j$ denote the $j$-th bits of $i$ and $x$, and the product $\top$-norm and $\top$-conorm replace $a\land b$ with $ab$, $a\lor b$ with $a+b-ab$, and $\lnot a$ with $1-a$.
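
As a sanity check, here is a minimal sketch of $f$ in JAX, assuming a least-significant-bit-first binary encoding (the helper names `int_to_bits` and `binary_to_one_hot` are illustrative, not fixed by anything above). It builds the fuzzy-equality term for each bit, takes the product over bits, and uses `jax.jacobian` to confirm the map has a well-defined derivative:

```python
import jax
import jax.numpy as jnp

def int_to_bits(i: int, n_bits: int) -> jnp.ndarray:
    """Binary representation of integer i as an n_bits-long 0/1 vector (LSB first)."""
    return jnp.array([(i >> j) & 1 for j in range(n_bits)], dtype=jnp.float32)

def binary_to_one_hot(x: jnp.ndarray, v: int) -> jnp.ndarray:
    """Differentiable map from a log2(v)-bit vector x to a v-dimensional one-hot vector."""
    n_bits = x.shape[0]
    # bits[i, j] = j-th bit of index i, so each row is one candidate index.
    bits = jnp.stack([int_to_bits(i, n_bits) for i in range(v)])  # shape (v, n_bits)
    both = bits * x                       # fuzzy (i_j AND x_j) via product t-norm
    neither = (1 - bits) * (1 - x)        # fuzzy (NOT i_j AND NOT x_j)
    eq = both + neither - both * neither  # fuzzy OR via product t-conorm: 1[i_j = x_j]
    return jnp.prod(eq, axis=1)           # fuzzy AND over all bits

v = 8
x = int_to_bits(5, 3)                          # binary encoding of index 5
print(binary_to_one_hot(x, v))                 # one-hot at position 5
print(jax.jacobian(binary_to_one_hot)(x, v))   # (v, log v) Jacobian, well-defined
```

On exact 0/1 inputs each `eq` entry is 1 when the bits match and 0 otherwise, so the product recovers the indicator $\mathbf{1}[i=x]$ exactly, while the polynomial form keeps everything differentiable in between.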

Why did I want this? My previous post explored a generalized version of the cross-entropy minimization assumption from my recent paper. This function could be used to make that assumption about the input IDs $h$ to a language model while keeping the dimension of $\nabla\texttt{logits}(h)$ small.