> You could use a regression model with a logistic sigmoid in the output layer. Each sigmoid output is computed independently of the others, so, unlike with softmax, the outputs of a regression network trained with logistic activation do not add up to 1. I just checked this on a minimal example on the iris dataset.
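
For anyone who wants to see the effect without any dependencies, here is a minimal sketch in plain Python. It is not the iris experiment mentioned above: the data are a hypothetical toy set of three 2-D clusters, the "network" is a single linear layer trained with squared error, and all names (`predict`, `logits`, `sums`, and so on) are my own. It only illustrates that independent sigmoid outputs are not forced to sum to 1, while softmax over the same logits always is.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def softmax(zs):
    # Subtract the max for numerical stability
    m = max(zs)
    exps = [math.exp(z - m) for z in zs]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical toy data (NOT iris): three 2-D clusters, one per class
random.seed(0)
centers = [(0.0, 0.0), (3.0, 0.0), (0.0, 3.0)]
X, Y = [], []
for k, (cx, cy) in enumerate(centers):
    for _ in range(20):
        X.append((cx + random.gauss(0, 0.3), cy + random.gauss(0, 0.3)))
        Y.append([1.0 if j == k else 0.0 for j in range(3)])  # one-hot target

# One linear layer with an independent sigmoid on each of the 3 outputs
W = [[0.0, 0.0] for _ in range(3)]
b = [0.0] * 3

def logits(x):
    return [W[j][0] * x[0] + W[j][1] * x[1] + b[j] for j in range(3)]

def predict(x):
    return [sigmoid(z) for z in logits(x)]

# Before training every output is sigmoid(0) = 0.5, so the sum is 1.5
initial_sum = sum(predict(X[0]))

# Plain stochastic gradient descent on the squared error 0.5 * (p - y)^2
lr = 0.1
for epoch in range(200):
    for x, y in zip(X, Y):
        p = predict(x)
        for j in range(3):
            g = (p[j] - y[j]) * p[j] * (1.0 - p[j])  # chain rule through sigmoid
            W[j][0] -= lr * g * x[0]
            W[j][1] -= lr * g * x[1]
            b[j] -= lr * g

# Sum of the three sigmoid outputs for every training point
sums = [sum(predict(x)) for x in X]
print("sigmoid output sums: min", min(sums), "max", max(sums))

# Softmax applied to the same logits sums to 1 by construction
print("softmax sum:", sum(softmax(logits(X[0]))))
```

After training, the sigmoid sums tend to land near 1 on well-separated data because the targets are one-hot, but nothing in the model enforces that. If you need outputs that can be read as a probability distribution over classes, softmax is the layer that guarantees the normalization.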