Relationships Between AI Frameworks - iaspnetcore.com

How sherpa-onnx, icefall, the k2 ecosystem, FunASR, and PyTorch relate to each other

The big picture
AI model categories
- Models developed by OpenAI
- Google
Differences between AI agent, Skill, MCP, and AI CLI

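The big picture

Low-level deep learning frameworks
│
├── PyTorch
│
├── k2  (graph computation / FSA toolkit)
│     │
│     └── icefall  (ASR training framework)
│            │
│            └── sherpa / sherpa-onnx  (inference and deployment framework)
│
└── FunASR  (a separate ASR stack)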

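PyTorch: a general-purpose deep learning framework. The models in every stack discussed here are trained with it, so it is the shared foundation underneath all of them.

For example, in the speech domain:

icefall uses PyTorch
FunASR uses PyTorch
Whisper uses PyTorch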

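k2 ecosystem: k2 is a math library specialized for speech recognition. It provides finite-state automaton (FSA) and graph operations that work together with PyTorch.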

https://k2-fsa.github.io/sherpa/onnx/python/install.html#method-1-from-pre-compiled-wheels-cpu-only

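icefall: an ASR training framework built on k2. It is the training platform of the k2 line.

For example, the Xiaomi team's Zipformer ASR model is built on it.

sherpa: an ASR deployment and inference framework. It is responsible for actually running the trained models.

sherpa-onnx: the lightweight, cross-platform inference version of sherpa.

FunASR: Alibaba's complete speech recognition system, a self-contained speech stack that is independent of the k2 line.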

FunASR includes:
- a training framework
- an inference framework
- a model collection

FunASR = training + inference + models, all in one
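The relationships above can be written down as a small lookup table. This is only an illustrative sketch: the `upstream` field is this example's own label (meaning "where its primitives or trained models come from"), and placing k2 on top of PyTorch is an assumption drawn from the statement that PyTorch is the shared foundation.

```python
from typing import Optional

# One entry per node of the tree above. "upstream" is a label invented for
# this sketch, not an official term used by any of these projects.
STACK: dict[str, dict[str, Optional[str]]] = {
    "PyTorch": {"role": "general deep learning framework", "upstream": None},
    "k2": {"role": "graph computation / FSA toolkit", "upstream": "PyTorch"},
    "icefall": {"role": "ASR training framework", "upstream": "k2"},
    "sherpa-onnx": {"role": "inference and deployment", "upstream": "icefall"},
    "FunASR": {"role": "training + inference + models", "upstream": "PyTorch"},
}


def lineage(name: str) -> list[str]:
    """Walk from a component down to the framework at the bottom of its stack."""
    chain: list[str] = []
    current: Optional[str] = name
    while current is not None:
        chain.append(current)
        current = STACK[current]["upstream"]
    return chain


print(" -> ".join(lineage("sherpa-onnx")))
print(" -> ".join(lineage("FunASR")))
```

Walking the table makes the two lines visible: the k2 line passes through several separate projects, while FunASR sits directly on the shared base.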

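AI model categories

There are so many models now, with every company releasing its own, that it is easy to lose track.

AI models divide into commercial models and open-source models.

All models = 3 categories + 1 platform layer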

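① General-purpose large models (multimodal): GPT‑5.4, Claude 4.5
② Specialized models (single-purpose): GPT‑5.3‑Codex (coding)
③ Open-source models (self-hosted, under your own control): LLaMA 3
④ Platforms (put the three above to good use, dispatching among multiple models internally): Cursor, GitHub Copilot, ChatGPT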
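The "platform" idea in item ④, dispatching a request to one of several models, can be sketched as a lookup with a fallback. The model names are taken from the list above; the task labels and the routing rule are assumptions made up for illustration and do not describe how Cursor, GitHub Copilot, or ChatGPT actually route requests.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str
    category: str  # "general", "specialized", or "open"


# Hypothetical routing table: task label -> model.
ROUTES = {
    "chat": Model("GPT-5.4", "general"),
    "code": Model("GPT-5.3-Codex", "specialized"),
    "private": Model("LLaMA 3", "open"),
}


def route(task: str) -> Model:
    """Pick a model for a task, falling back to the general-purpose model."""
    return ROUTES.get(task, ROUTES["chat"])


print(route("code").name)
print(route("translate").name)  # unknown task, so the general model is used
```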

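Models can also be grouped by area: vision, ASR, coding, document writing, multimodal, and so on, with offerings from every company.

Every company is working on the same six things; they differ only in where they are strongest.

AI capability = 6 modules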

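① Text / reasoning (the brain)
② Coding (engineering ability)
③ Vision (images / video)
④ Speech (ASR + TTS)
⑤ Multimodal (unified understanding): models are merging all of the capabilities above
⑥ Agent (gets work done automatically)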

Models developed by OpenAI

GPT series
├─ GPT-5.5
├─ GPT-5.2
├─ GPT-5.1
└─ GPT-5

Reasoning series
├─ o3
├─ o3-pro
├─ o4-mini
└─ o4-mini-high

Coding series
├─ GPT-5 Codex
├─ GPT-5.1 Codex
└─ GPT-5.2 Codex

Image series
└─ DALL·E 3

Speech series
├─ Whisper
├─ TTS
└─ Realtime

Embedding series
├─ text-embedding-3-small
└─ text-embedding-3-large

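Google

Google's models fall into two lines: closed-source and commercial (the Gemini series) and open and free (the Gemma series).

Gemini series (Google's closed-source commercial line)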
Gemini Ultra

Gemini Pro

Gemini Flash

Gemini Nano: runs on-device on the phone's chip (NPU), with no network connection required

Gemma series (Google's open, free line)

Differences between AI agent, Skill, MCP, and AI CLI

1. Tool: The Atomic Action

A Tool is a single, executable function exposed to an LLM.
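A minimal sketch of that definition: one plain function, wrapped with the name and description an LLM would read when deciding whether to call it. The `Tool` class and `word_count` function are hypothetical names for this example and do not correspond to any specific agent framework's API.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Tool:
    name: str
    description: str          # what the LLM reads when choosing a tool
    func: Callable[..., str]  # the code that actually runs

    def __call__(self, **kwargs) -> str:
        return self.func(**kwargs)


def word_count(text: str) -> str:
    """Return the number of whitespace-separated words as a string."""
    return str(len(text.split()))


word_count_tool = Tool("word_count", "Count the words in a text.", word_count)
print(word_count_tool(text="a tool is one function"))
```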

2. Skill: The Specific Capability of an Agent

A Skill is a modular package of domain-specific logic, prompts, and Tools bundled together.

Example: A "Git DevOps Skill" (which bundles git_clone, git_commit, and git_push tools along with prompt templates on how to write good commit messages).
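The "Git DevOps Skill" example can be sketched as a prompt template bundled with its tools. To keep the sketch side-effect free, each tool only builds the command line instead of running git; the `Skill` class and the prompt text are assumptions for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable


# Each tool returns the argv it would run rather than executing git.
def git_clone(url: str) -> list[str]:
    return ["git", "clone", url]


def git_commit(message: str) -> list[str]:
    return ["git", "commit", "-m", message]


def git_push() -> list[str]:
    return ["git", "push"]


@dataclass
class Skill:
    name: str
    prompt: str  # domain guidance handed to the LLM alongside the tools
    tools: dict[str, Callable[..., list[str]]] = field(default_factory=dict)

    def command(self, tool: str, **kwargs) -> list[str]:
        """Look up a bundled tool by name and build its command."""
        return self.tools[tool](**kwargs)


git_devops = Skill(
    name="Git DevOps Skill",
    prompt="Write commit messages in the imperative mood, under 72 characters.",
    tools={"git_clone": git_clone, "git_commit": git_commit, "git_push": git_push},
)

print(git_devops.command("git_commit", message="Fix typo in README"))
```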

3. AI Agent: The Autonomous System

An AI Agent is an autonomous system driven by an LLM that decides which Skills or Tools to use.
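The decide-then-act loop behind an agent can be sketched in a few lines. Note the big assumption: `fake_llm_decide` is a keyword-matching stand-in for a real LLM call, and the two string tools are placeholders, so this shows only the control flow.

```python
from typing import Callable, Optional

# Placeholder tools: each takes text and returns transformed text.
TOOLS: dict[str, Callable[[str], str]] = {
    "upper": lambda text: text.upper(),
    "reverse": lambda text: text[::-1],
}


def fake_llm_decide(goal: str, done: list[str]) -> Optional[str]:
    """Stand-in for the LLM: name the next tool to use, or None when finished."""
    for name in TOOLS:
        if name in goal and name not in done:
            return name
    return None


def run_agent(goal: str, text: str, max_steps: int = 5) -> str:
    """Repeatedly ask the decider for a tool and apply it, up to max_steps."""
    done: list[str] = []
    for _ in range(max_steps):
        choice = fake_llm_decide(goal, done)
        if choice is None:
            break
        text = TOOLS[choice](text)
        done.append(choice)
    return text


print(run_agent("upper then reverse", "abc"))
```

The `max_steps` cap is a design choice of this sketch: it bounds the loop even if the decider never reports that it is finished.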

4. AI CLI: The Terminal Interface

Example: Running ai "check docker container logs for errors" in your terminal.
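A command like the one above could be parsed with Python's standard `argparse` module. This is a hedged sketch: `ai` is the command name from the example, while the `--dry-run` flag and the returned string format are assumptions, and the agent call itself is left out.

```python
import argparse
from typing import Optional


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(
        prog="ai", description="Send a natural-language request to an agent."
    )
    parser.add_argument("request", help="instruction in plain language")
    # Hypothetical flag for this sketch: show the plan without executing it.
    parser.add_argument("--dry-run", action="store_true", help="print the plan only")
    return parser


def main(argv: Optional[list[str]] = None) -> str:
    """Parse the command line and return what would be handed to the agent."""
    args = build_parser().parse_args(argv)
    mode = "plan" if args.dry_run else "run"
    return f"[{mode}] {args.request}"


# Equivalent to typing: ai "check docker container logs for errors"
print(main(["check docker container logs for errors"]))
```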

Skill, Tool, MCP, CLI, and Agent sit at different layers

┌─────────────────────────────┐
│         AI Agent            │  ← the brain
├─────────────────────────────┤
│          Skills             │  ← workflow / SOP
├─────────────────────────────┤
│     Tools / MCP Tools       │  ← capability interface
├─────────────────────────────┤
│     CLI / API / Browser     │  ← actual execution layer
└─────────────────────────────┘
