{"id":1800,"date":"2023-11-16T04:38:07","date_gmt":"2023-11-16T04:38:07","guid":{"rendered":"https:\/\/patris.ai\/blog\/ki-inferenz-erhaelt-leistungsschub-durch-nvidia-tensorrt\/"},"modified":"2023-11-16T04:38:07","modified_gmt":"2023-11-16T04:38:07","slug":"ki-inferenz-erhaelt-leistungsschub-durch-nvidia-tensorrt","status":"publish","type":"post","link":"https:\/\/patris.ai\/en\/blog\/ki-inferenz-erhaelt-leistungsschub-durch-nvidia-tensorrt\/","title":{"rendered":"AI Inference Gets a Performance Boost from NVIDIA TensorRT"},"content":{"rendered":"<h1>AI Inference Gets a Performance Boost from NVIDIA TensorRT<\/h1>\n<p>AI models such as large language models (LLMs) keep getting more capable, but deploying them has so far been complex and expensive. NVIDIA is now addressing this with an update to TensorRT-LLM.<\/p>\n<h2>TensorRT-LLM optimizes LLMs for deployment<\/h2>\n<p>TensorRT-LLM is an open-source library from NVIDIA that accelerates and optimizes LLM inference on NVIDIA GPUs. Version 0.6.0 adds support for further models, including Mistral 7B and Nemotron-3 8B, and according to NVIDIA makes inference up to 5x faster. In parallel, NVIDIA is working with Microsoft on DirectML improvements that accelerate AI models on Windows.<\/p>\n<h2>ChatGPT-style APIs now usable locally<\/h2>\n<p>A wrapper for the OpenAI Chat API lets applications built for that API send their requests to an LLM running locally on the PC instead of to a cloud service. This keeps data on the device and avoids the network latency of round trips to the cloud. With retrieval-augmented generation (RAG), local data sources can also be used to improve the accuracy of answers.<\/p>\n<h2>Within reach of over 100 million PCs<\/h2>\n<p>TensorRT-LLM runs on GeForce RTX 30 and 40 Series GPUs with 8 GB of VRAM or more, which puts it within reach of a large share of the more than 100 million RTX-powered Windows PCs that NVIDIA cites. This makes AI development and inference more affordable and easier to scale. The collaboration between NVIDIA and Microsoft makes this broad availability possible.<\/p>\n<p>With TensorRT-LLM, AI developers and users get a powerful inference engine. Running LLMs locally and privately becomes simpler and faster, and the update opens the door to new AI use cases in mid-sized businesses.<\/p>","protected":false},"excerpt":{"rendered":"<p>AI Inference Gets a Performance Boost from NVIDIA TensorRT AI models such as large language models (LLMs) keep getting more capable, but deploying them has so far been complex and expensive. NVIDIA is now addressing this with an update to TensorRT-LLM. TensorRT-LLM optimizes LLMs for deployment TensorRT-LLM is an open-source library from NVIDIA that accelerates and optimizes LLM inference on NVIDIA GPUs. Version 0.6.0 [&hellip;]<\/p>","protected":false},"author":1,"featured_media":1799,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_et_pb_use_builder":"","_et_pb_old_content":"","_et_gb_content_width":"","inline_featured_image":false,"footnotes":""},"categories":[22],"tags":[],"dipi_cpt_category":[],"class_list":["post-1800","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-ki-mittelstandsjournal"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v20.12 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>AI Inference Gets a Performance Boost from NVIDIA TensorRT - patris<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/patris.ai\/en\/blog\/ki-inferenz-erhaelt-leistungsschub-durch-nvidia-tensorrt\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"AI Inference Gets a Performance Boost from NVIDIA TensorRT - patris\" \/>\n<meta property=\"og:description\" content=\"AI Inference Gets a Performance Boost from NVIDIA TensorRT AI models such as large language models (LLMs) keep getting more capable, but deploying them has so far been complex and expensive. NVIDIA is now addressing this with an update to TensorRT-LLM. TensorRT-LLM optimizes LLMs for deployment TensorRT-LLM is an open-source library from NVIDIA that accelerates and optimizes LLM inference on NVIDIA GPUs. Version 0.6.0 [&hellip;]\" \/>\n<meta property=\"og:url\" content=\"https:\/\/patris.ai\/en\/blog\/ki-inferenz-erhaelt-leistungsschub-durch-nvidia-tensorrt\/\" \/>\n<meta property=\"og:site_name\" content=\"patris\" \/>\n<meta property=\"article:published_time\" content=\"2023-11-16T04:38:07+00:00\" \/>\n<meta name=\"author\" content=\"patrisadmin\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"patrisadmin\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. 
reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"1 minute\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"WebPage\",\"@id\":\"https:\/\/patris.ai\/blog\/ki-inferenz-erhaelt-leistungsschub-durch-nvidia-tensorrt\/\",\"url\":\"https:\/\/patris.ai\/blog\/ki-inferenz-erhaelt-leistungsschub-durch-nvidia-tensorrt\/\",\"name\":\"AI Inference Gets a Performance Boost from NVIDIA TensorRT - patris\",\"isPartOf\":{\"@id\":\"https:\/\/patris.ai\/#website\"},\"datePublished\":\"2023-11-16T04:38:07+00:00\",\"dateModified\":\"2023-11-16T04:38:07+00:00\",\"author\":{\"@id\":\"https:\/\/patris.ai\/#\/schema\/person\/935c3e0824e7d20aaeda8d2067367bdb\"},\"breadcrumb\":{\"@id\":\"https:\/\/patris.ai\/blog\/ki-inferenz-erhaelt-leistungsschub-durch-nvidia-tensorrt\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/patris.ai\/blog\/ki-inferenz-erhaelt-leistungsschub-durch-nvidia-tensorrt\/\"]}]},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/patris.ai\/blog\/ki-inferenz-erhaelt-leistungsschub-durch-nvidia-tensorrt\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/patris.ai\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"AI Inference Gets a Performance Boost from NVIDIA TensorRT\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/patris.ai\/#website\",\"url\":\"https:\/\/patris.ai\/\",\"name\":\"patris\",\"description\":\"Simple AI solution for mid-sized businesses\",\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/patris.ai\/?s={search_term_string}\"},\"query-input\":\"required 
name=search_term_string\"}],\"inLanguage\":\"en-US\"},{\"@type\":\"Person\",\"@id\":\"https:\/\/patris.ai\/#\/schema\/person\/935c3e0824e7d20aaeda8d2067367bdb\",\"name\":\"patrisadmin\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/patris.ai\/#\/schema\/person\/image\/\",\"url\":\"https:\/\/secure.gravatar.com\/avatar\/1b9bfe077f0b85aa7e9312b80d645ef69e4a72040c389d0211eb846749c86341?s=96&d=mm&r=g\",\"contentUrl\":\"https:\/\/secure.gravatar.com\/avatar\/1b9bfe077f0b85aa7e9312b80d645ef69e4a72040c389d0211eb846749c86341?s=96&d=mm&r=g\",\"caption\":\"patrisadmin\"},\"sameAs\":[\"https:\/\/patris.ai\"],\"url\":\"https:\/\/patris.ai\/en\/blog\/author\/patrisadmin\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"AI Inference Gets a Performance Boost from NVIDIA TensorRT - patris","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/patris.ai\/en\/blog\/ki-inferenz-erhaelt-leistungsschub-durch-nvidia-tensorrt\/","og_locale":"en_US","og_type":"article","og_title":"AI Inference Gets a Performance Boost from NVIDIA TensorRT - patris","og_description":"AI Inference Gets a Performance Boost from NVIDIA TensorRT AI models such as large language models (LLMs) keep getting more capable, but deploying them has so far been complex and expensive. NVIDIA is now addressing this with an update to TensorRT-LLM. TensorRT-LLM optimizes LLMs for deployment TensorRT-LLM is an open-source library from NVIDIA that accelerates and optimizes LLM inference on NVIDIA GPUs. Version 0.6.0 [&hellip;]","og_url":"https:\/\/patris.ai\/en\/blog\/ki-inferenz-erhaelt-leistungsschub-durch-nvidia-tensorrt\/","og_site_name":"patris","article_published_time":"2023-11-16T04:38:07+00:00","author":"patrisadmin","twitter_card":"summary_large_image","twitter_misc":{"Written by":"patrisadmin","Est. reading time":"1 minute"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"WebPage","@id":"https:\/\/patris.ai\/blog\/ki-inferenz-erhaelt-leistungsschub-durch-nvidia-tensorrt\/","url":"https:\/\/patris.ai\/blog\/ki-inferenz-erhaelt-leistungsschub-durch-nvidia-tensorrt\/","name":"AI Inference Gets a Performance Boost from NVIDIA TensorRT - patris","isPartOf":{"@id":"https:\/\/patris.ai\/#website"},"datePublished":"2023-11-16T04:38:07+00:00","dateModified":"2023-11-16T04:38:07+00:00","author":{"@id":"https:\/\/patris.ai\/#\/schema\/person\/935c3e0824e7d20aaeda8d2067367bdb"},"breadcrumb":{"@id":"https:\/\/patris.ai\/blog\/ki-inferenz-erhaelt-leistungsschub-durch-nvidia-tensorrt\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/patris.ai\/blog\/ki-inferenz-erhaelt-leistungsschub-durch-nvidia-tensorrt\/"]}]},{"@type":"BreadcrumbList","@id":"https:\/\/patris.ai\/blog\/ki-inferenz-erhaelt-leistungsschub-durch-nvidia-tensorrt\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/patris.ai\/"},{"@type":"ListItem","position":2,"name":"AI Inference Gets a Performance Boost from NVIDIA TensorRT"}]},{"@type":"WebSite","@id":"https:\/\/patris.ai\/#website","url":"https:\/\/patris.ai\/","name":"patris","description":"Simple AI solution for mid-sized businesses","potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/patris.ai\/?s={search_term_string}"},"query-input":"required 
name=search_term_string"}],"inLanguage":"en-US"},{"@type":"Person","@id":"https:\/\/patris.ai\/#\/schema\/person\/935c3e0824e7d20aaeda8d2067367bdb","name":"patrisadmin","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/patris.ai\/#\/schema\/person\/image\/","url":"https:\/\/secure.gravatar.com\/avatar\/1b9bfe077f0b85aa7e9312b80d645ef69e4a72040c389d0211eb846749c86341?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/1b9bfe077f0b85aa7e9312b80d645ef69e4a72040c389d0211eb846749c86341?s=96&d=mm&r=g","caption":"patrisadmin"},"sameAs":["https:\/\/patris.ai"],"url":"https:\/\/patris.ai\/en\/blog\/author\/patrisadmin\/"}]}},"_links":{"self":[{"href":"https:\/\/patris.ai\/en\/wp-json\/wp\/v2\/posts\/1800","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/patris.ai\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/patris.ai\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/patris.ai\/en\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/patris.ai\/en\/wp-json\/wp\/v2\/comments?post=1800"}],"version-history":[{"count":0,"href":"https:\/\/patris.ai\/en\/wp-json\/wp\/v2\/posts\/1800\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/patris.ai\/en\/wp-json\/wp\/v2\/media\/1799"}],"wp:attachment":[{"href":"https:\/\/patris.ai\/en\/wp-json\/wp\/v2\/media?parent=1800"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/patris.ai\/en\/wp-json\/wp\/v2\/categories?post=1800"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/patris.ai\/en\/wp-json\/wp\/v2\/tags?post=1800"},{"taxonomy":"dipi_cpt_category","embeddable":true,"href":"https:\/\/patris.ai\/en\/wp-json\/wp\/v2\/dipi_cpt_category?post=1800"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}