GPTQ & GGML allow PostgresML to fit larger models in less RAM, and they perform inference significantly faster on NVIDIA, Apple and Intel hardware. Half-precision floating point and quantization optimizations are now available for your favorite LLMs downloaded from Hugging Face.
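
As a minimal sketch of what this looks like in practice: half precision can be requested when a Hugging Face model is loaded through `pgml.transform()`. The loader options shown in the task JSON (such as `torch_dtype`) and the example model name are illustrative assumptions, not a definitive reference for the API.

```sql
-- Sketch: run text generation against a Hugging Face model with
-- half-precision weights. Parameter names in the task JSON are assumed
-- to be forwarded to the underlying model loader.
SELECT pgml.transform(
    task   => '{
        "task": "text-generation",
        "model": "tiiuae/falcon-7b-instruct",
        "device_map": "auto",
        "torch_dtype": "bfloat16"
    }'::JSONB,
    inputs => ARRAY['Once upon a time,']
);
```

Running a model in `bfloat16` roughly halves its memory footprint compared to full 32-bit precision, which is what lets larger models fit on the same hardware.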