FeaturesΒΆ
Compatibility MatrixΒΆ
The tables below show mutually exclusive features and the support on some hardware.
The symbols used have the following meanings:
- β = Full compatibility
- π = Partial compatibility
- β = No compatibility
- β = Unknown or TBD
Note
Check the β or π with links to see tracking issue for unsupported feature/hardware combination.
Feature x FeatureΒΆ
| Feature | CP | APC | LoRA | SD | CUDA graph | pooling | enc-dec | logP | prmpt logP | async output | multi-step | mm | best-of | beam-search |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| CP | β | |||||||||||||
| APC | β | β | ||||||||||||
| LoRA | β | β | β | |||||||||||
| SD | β | β | β | β | ||||||||||
| CUDA graph | β | β | β | β | β | |||||||||
| pooling | π * | π * | β | β | β | β | ||||||||
| enc-dec | β | β | β | β | β | β | β | |||||||
| logP | β | β | β | β | β | β | β | β | ||||||
| prmpt logP | β | β | β | β | β | β | β | β | β | |||||
| async output | β | β | β | β | β | β | β | β | β | β | ||||
| multi-step | β | β | β | β | β | β | β | β | β | β | β | |||
| mm | β | β | π ^ | β | β | β | β | β | β | β | β | β | ||
| best-of | β | β | β | β | β | β | β | β | β | β | β | β | β | |
| beam-search | β | β | β | β | β | β | β | β | β | β | β | β | β | β |
* Chunked prefill and prefix caching are only applicable to last-token pooling.
^ LoRA is only applicable to the language backbone of multimodal models.
Feature x HardwareΒΆ
| Feature | Volta | Turing | Ampere | Ada | Hopper | CPU | AMD | TPU |
|---|---|---|---|---|---|---|---|---|
| CP | β | β | β | β | β | β | β | β |
| APC | β | β | β | β | β | β | β | β |
| LoRA | β | β | β | β | β | β | β | β |
| SD | β | β | β | β | β | β | β | β |
| CUDA graph | β | β | β | β | β | β | β | β |
| pooling | β | β | β | β | β | β | β | β |
| enc-dec | β | β | β | β | β | β | β | β |
| mm | β | β | β | β | β | β | β | β |
| logP | β | β | β | β | β | β | β | β |
| prmpt logP | β | β | β | β | β | β | β | β |
| async output | β | β | β | β | β | β | β | β |
| multi-step | β | β | β | β | β | β | β | β |
| best-of | β | β | β | β | β | β | β | β |
| beam-search | β | β | β | β | β | β | β | β |
Note
Please refer to Feature support through NxD Inference backend for features supported on AWS Neuron hardware