Not OP. This question is being reposted to preserve technical content removed from elsewhere. Feel free to add your own answers/discussion.
Original question: When training a model for image classification, it is common to use pooling layers to reduce the dimensionality, since we only care about the final node values corresponding to the categorical probabilities. In the realm of VAEs, on the other hand, where we reduce the dimensionality and subsequently increase it again, I have rarely seen pooling layers being used. Is it normal to use pooling layers in VAEs? If not, what's the intuition here? Is it because of their non-injective (lossy) nature?
Original answer:
I’ve seen pooling layers used in convolutional VAEs. This paper contains a few examples of network architectures for image data: https://www.sciencedirect.com/science/article/pii/S1319157821000227 (Reference: Zilvan, V., Ramdan, A., Heryana, A., Krisnandi, D., Suryawati, E., Yuwana, R. S., Kusumo, R. B., & Pardede, H. F. (2022). Convolutional variational autoencoder-based feature learning for automatic tea clone recognition. Journal of King Saud University - Computer and Information Sciences, 34(6), 3332–3342.)
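
To make the idea concrete, here is a minimal sketch (not the architecture from the referenced paper) of a convolutional VAE that uses max-pooling in the encoder and plain upsampling in the decoder. It assumes PyTorch and 1×28×28 inputs (e.g. MNIST); the layer sizes and latent dimension are illustrative choices, not tuned values.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ConvVAE(nn.Module):
    def __init__(self, latent_dim=16):
        super().__init__()
        # Encoder: each pooling layer halves the spatial resolution (28 -> 14 -> 7).
        self.enc = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),                 # 28x28 -> 14x14
            nn.Conv2d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),                 # 14x14 -> 7x7
        )
        self.fc_mu = nn.Linear(32 * 7 * 7, latent_dim)
        self.fc_logvar = nn.Linear(32 * 7 * 7, latent_dim)

        # Decoder: upsampling layers undo the pooling (7 -> 14 -> 28).
        self.fc_dec = nn.Linear(latent_dim, 32 * 7 * 7)
        self.dec = nn.Sequential(
            nn.Upsample(scale_factor=2),     # 7x7 -> 14x14
            nn.Conv2d(32, 16, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Upsample(scale_factor=2),     # 14x14 -> 28x28
            nn.Conv2d(16, 1, kernel_size=3, padding=1),
            nn.Sigmoid(),
        )

    def encode(self, x):
        h = self.enc(x).flatten(start_dim=1)
        return self.fc_mu(h), self.fc_logvar(h)

    def reparameterize(self, mu, logvar):
        # Sample z ~ N(mu, sigma^2) via the reparameterization trick.
        std = torch.exp(0.5 * logvar)
        return mu + std * torch.randn_like(std)

    def forward(self, x):
        mu, logvar = self.encode(x)
        z = self.reparameterize(mu, logvar)
        h = self.fc_dec(z).view(-1, 32, 7, 7)
        return self.dec(h), mu, logvar

def vae_loss(recon, x, mu, logvar):
    # Reconstruction term plus KL divergence to the standard normal prior.
    bce = F.binary_cross_entropy(recon, x, reduction="sum")
    kld = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return bce + kld

# Quick shape check on random data.
model = ConvVAE()
x = torch.rand(8, 1, 28, 28)
recon, mu, logvar = model(x)
print(recon.shape, vae_loss(recon, x, mu, logvar).item())
```

Note that the pooling itself is not inverted; the decoder simply learns to fill in plausible detail after each upsampling step, which is why such encoders can still work inside a VAE even though pooling discards information.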