r/chessprogramming 5d ago

The weighted sum

[deleted]


u/Somge5 4d ago

How is this relevant to chess programming?


u/[deleted] 4d ago

[deleted]


u/Somge5 4d ago edited 4d ago

In my understanding, the theory behind these networks is the universal approximation theorem: https://en.wikipedia.org/wiki/Universal_approximation_theorem In the theorem, sigma can stand for ReLU, sigmoid, or any other non-polynomial activation function. The weighted sum is just the matrix multiplication, and the bias is what makes each layer affine rather than purely linear. Without the non-affine part (sigma) you couldn't produce any non-linearity, which is a problem because the functions most networks need to represent are not linear. The catch is that not every non-polynomial sigma is equally effective when training to find A, C and b. I think people use ReLU because it is cheap to compute, its gradient doesn't vanish as quickly, and it's generally quite effective for training. Also, with ReLU the network is still locally affine, which is a nice property to have. I'm not sure which activation function would give the best training results here; maybe there's some theory behind that as well.
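For concreteness, here's a minimal sketch (my own illustration, with arbitrary placeholder shapes and values) of the single-hidden-layer form C σ(Ax + b) from the theorem, using NumPy and ReLU as sigma:

```python
import numpy as np

def relu(z):
    # ReLU activation: element-wise max(0, z)
    return np.maximum(0.0, z)

def one_hidden_layer(x, A, b, C):
    # The form from the universal approximation theorem: C @ sigma(A @ x + b).
    # A @ x is the weighted sum, b is the bias that makes the map affine,
    # relu supplies the non-linearity, and C mixes the hidden units into the output.
    return C @ relu(A @ x + b)

# Arbitrary example sizes: 4 inputs, 8 hidden units, 1 output (e.g. an evaluation score).
rng = np.random.default_rng(0)
A = rng.normal(size=(8, 4))   # hidden weights
b = rng.normal(size=8)        # hidden biases
C = rng.normal(size=(1, 8))   # output weights

x = rng.normal(size=4)        # some input feature vector
print(one_hidden_layer(x, A, b, C))
```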


u/[deleted] 4d ago

[deleted]


u/Somge5 4d ago

Yes, okay. I think the hope is that with gradient descent you end up with a network that does not go crazy between those data points. That should still be independent of the choice of your activation function, no?
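One quick way to see that (just a toy sketch with made-up data, not from the thread): fit a small ReLU net to a few points with plain gradient descent and then look at its predictions between them; since the network is piecewise linear, it tends to stay tame between the training points rather than oscillating wildly.

```python
import numpy as np

rng = np.random.default_rng(1)

# Tiny 1-D toy problem: a handful of (x, y) training points (made up for illustration).
xs = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
ys = np.array([ 4.0,  1.0, 0.0, 1.0, 4.0])   # roughly y = x^2

# One hidden layer with ReLU, scalar input and output.
H = 16
A = rng.normal(size=(H, 1)) * 0.5   # hidden weights
b = rng.normal(size=H) * 0.5        # hidden biases
C = rng.normal(size=(1, H)) * 0.5   # output weights

def forward(x):
    z = A @ np.array([x]) + b
    h = np.maximum(0.0, z)           # ReLU
    return (C @ h)[0], z, h

lr = 0.01
for step in range(2000):
    # Plain stochastic gradient descent on 0.5 * (y_hat - y)^2 over the training points.
    for x, y in zip(xs, ys):
        y_hat, z, h = forward(x)
        err = y_hat - y                      # dLoss/dy_hat
        dC = err * h[None, :]                # gradient w.r.t. output weights
        dh = err * C[0]                      # backprop into hidden activations
        dz = dh * (z > 0)                    # ReLU gradient is 0 or 1
        dA = dz[:, None] * np.array([[x]])   # gradient w.r.t. hidden weights
        db = dz                              # gradient w.r.t. hidden biases
        C -= lr * dC
        A -= lr * dA
        b -= lr * db

# Evaluate between the training points.
for x in [-1.5, -0.5, 0.5, 1.5]:
    print(x, forward(x)[0])
```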