[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"similar-GokuMohandas--Made-With-ML":3,"tool-GokuMohandas--Made-With-ML":64},[4,17,27,35,43,56],{"id":5,"name":6,"github_repo":7,"description_zh":8,"stars":9,"difficulty_score":10,"last_commit_at":11,"category_tags":12,"status":16},3808,"stable-diffusion-webui","AUTOMATIC1111\u002Fstable-diffusion-webui","stable-diffusion-webui 是一个基于 Gradio 构建的网页版操作界面，旨在让用户能够轻松地在本地运行和使用强大的 Stable Diffusion 图像生成模型。它解决了原始模型依赖命令行、操作门槛高且功能分散的痛点，将复杂的 AI 绘图流程整合进一个直观易用的图形化平台。\n\n无论是希望快速上手的普通创作者、需要精细控制画面细节的设计师，还是想要深入探索模型潜力的开发者与研究人员，都能从中获益。其核心亮点在于极高的功能丰富度：不仅支持文生图、图生图、局部重绘（Inpainting）和外绘（Outpainting）等基础模式，还独创了注意力机制调整、提示词矩阵、负向提示词以及“高清修复”等高级功能。此外，它内置了 GFPGAN 和 CodeFormer 等人脸修复工具，支持多种神经网络放大算法，并允许用户通过插件系统无限扩展能力。即使是显存有限的设备，stable-diffusion-webui 也提供了相应的优化选项，让高质量的 AI 艺术创作变得触手可及。",162132,3,"2026-04-05T11:01:52",[13,14,15],"开发框架","图像","Agent","ready",{"id":18,"name":19,"github_repo":20,"description_zh":21,"stars":22,"difficulty_score":23,"last_commit_at":24,"category_tags":25,"status":16},1381,"everything-claude-code","affaan-m\u002Feverything-claude-code","everything-claude-code 是一套专为 AI 编程助手（如 Claude Code、Codex、Cursor 等）打造的高性能优化系统。它不仅仅是一组配置文件，而是一个经过长期实战打磨的完整框架，旨在解决 AI 代理在实际开发中面临的效率低下、记忆丢失、安全隐患及缺乏持续学习能力等核心痛点。\n\n通过引入技能模块化、直觉增强、记忆持久化机制以及内置的安全扫描功能，everything-claude-code 能显著提升 AI 在复杂任务中的表现，帮助开发者构建更稳定、更智能的生产级 AI 代理。其独特的“研究优先”开发理念和针对 Token 消耗的优化策略，使得模型响应更快、成本更低，同时有效防御潜在的攻击向量。\n\n这套工具特别适合软件开发者、AI 研究人员以及希望深度定制 AI 工作流的技术团队使用。无论您是在构建大型代码库，还是需要 AI 协助进行安全审计与自动化测试，everything-claude-code 都能提供强大的底层支持。作为一个曾荣获 Anthropic 黑客大奖的开源项目，它融合了多语言支持与丰富的实战钩子（hooks），让 AI 真正成长为懂上",138956,2,"2026-04-05T11:33:21",[13,15,26],"语言模型",{"id":28,"name":29,"github_repo":30,"description_zh":31,"stars":32,"difficulty_score":23,"last_commit_at":33,"category_tags":34,"status":16},2271,"ComfyUI","Comfy-Org\u002FComfyUI","ComfyUI 是一款功能强大且高度模块化的视觉 AI 引擎，专为设计和执行复杂的 Stable Diffusion 图像生成流程而打造。它摒弃了传统的代码编写模式，采用直观的节点式流程图界面，让用户通过连接不同的功能模块即可构建个性化的生成管线。\n\n这一设计巧妙解决了高级 AI 
绘图工作流配置复杂、灵活性不足的痛点。用户无需具备编程背景，也能自由组合模型、调整参数并实时预览效果，轻松实现从基础文生图到多步骤高清修复等各类复杂任务。ComfyUI 拥有极佳的兼容性，不仅支持 Windows、macOS 和 Linux 全平台，还广泛适配 NVIDIA、AMD、Intel 及苹果 Silicon 等多种硬件架构，并率先支持 SDXL、Flux、SD3 等前沿模型。\n\n无论是希望深入探索算法潜力的研究人员和开发者，还是追求极致创作自由度的设计师与资深 AI 绘画爱好者，ComfyUI 都能提供强大的支持。其独特的模块化架构允许社区不断扩展新功能，使其成为当前最灵活、生态最丰富的开源扩散模型工具之一，帮助用户将创意高效转化为现实。",107662,"2026-04-03T11:11:01",[13,14,15],{"id":36,"name":37,"github_repo":38,"description_zh":39,"stars":40,"difficulty_score":23,"last_commit_at":41,"category_tags":42,"status":16},3704,"NextChat","ChatGPTNextWeb\u002FNextChat","NextChat 是一款轻量且极速的 AI 助手，旨在为用户提供流畅、跨平台的大模型交互体验。它完美解决了用户在多设备间切换时难以保持对话连续性，以及面对众多 AI 模型不知如何统一管理的痛点。无论是日常办公、学习辅助还是创意激发，NextChat 都能让用户随时随地通过网页、iOS、Android、Windows、MacOS 或 Linux 端无缝接入智能服务。\n\n这款工具非常适合普通用户、学生、职场人士以及需要私有化部署的企业团队使用。对于开发者而言，它也提供了便捷的自托管方案，支持一键部署到 Vercel 或 Zeabur 等平台。\n\nNextChat 的核心亮点在于其广泛的模型兼容性，原生支持 Claude、DeepSeek、GPT-4 及 Gemini Pro 等主流大模型，让用户在一个界面即可自由切换不同 AI 能力。此外，它还率先支持 MCP（Model Context Protocol）协议，增强了上下文处理能力。针对企业用户，NextChat 提供专业版解决方案，具备品牌定制、细粒度权限控制、内部知识库整合及安全审计等功能，满足公司对数据隐私和个性化管理的高标准要求。",87618,"2026-04-05T07:20:52",[13,26],{"id":44,"name":45,"github_repo":46,"description_zh":47,"stars":48,"difficulty_score":23,"last_commit_at":49,"category_tags":50,"status":16},2268,"ML-For-Beginners","microsoft\u002FML-For-Beginners","ML-For-Beginners 是由微软推出的一套系统化机器学习入门课程，旨在帮助零基础用户轻松掌握经典机器学习知识。这套课程将学习路径规划为 12 周，包含 26 节精炼课程和 52 道配套测验，内容涵盖从基础概念到实际应用的完整流程，有效解决了初学者面对庞大知识体系时无从下手、缺乏结构化指导的痛点。\n\n无论是希望转型的开发者、需要补充算法背景的研究人员，还是对人工智能充满好奇的普通爱好者，都能从中受益。课程不仅提供了清晰的理论讲解，还强调动手实践，让用户在循序渐进中建立扎实的技能基础。其独特的亮点在于强大的多语言支持，通过自动化机制提供了包括简体中文在内的 50 多种语言版本，极大地降低了全球不同背景用户的学习门槛。此外，项目采用开源协作模式，社区活跃且内容持续更新，确保学习者能获取前沿且准确的技术资讯。如果你正寻找一条清晰、友好且专业的机器学习入门之路，ML-For-Beginners 将是理想的起点。",84991,"2026-04-05T10:45:23",[14,51,52,53,15,54,26,13,55],"数据工具","视频","插件","其他","音频",{"id":57,"name":58,"github_repo":59,"description_zh":60,"stars":61,"difficulty_score":10,"last_commit_at":62,"category_tags":63,"status":16},3128,"ragflow","infiniflow\u002Fragflow","RAGFlow 
是一款领先的开源检索增强生成（RAG）引擎，旨在为大语言模型构建更精准、可靠的上下文层。它巧妙地将前沿的 RAG 技术与智能体（Agent）能力相结合，不仅支持从各类文档中高效提取知识，还能让模型基于这些知识进行逻辑推理和任务执行。\n\n在大模型应用中，幻觉问题和知识滞后是常见痛点。RAGFlow 通过深度解析复杂文档结构（如表格、图表及混合排版），显著提升了信息检索的准确度，从而有效减少模型“胡编乱造”的现象，确保回答既有据可依又具备时效性。其内置的智能体机制更进一步，使系统不仅能回答问题，还能自主规划步骤解决复杂问题。\n\n这款工具特别适合开发者、企业技术团队以及 AI 研究人员使用。无论是希望快速搭建私有知识库问答系统，还是致力于探索大模型在垂直领域落地的创新者，都能从中受益。RAGFlow 提供了可视化的工作流编排界面和灵活的 API 接口，既降低了非算法背景用户的上手门槛，也满足了专业开发者对系统深度定制的需求。作为基于 Apache 2.0 协议开源的项目，它正成为连接通用大模型与行业专有知识之间的重要桥梁。",77062,"2026-04-04T04:44:48",[15,14,13,26,54],{"id":65,"github_repo":66,"name":67,"description_en":68,"description_zh":69,"ai_summary_zh":69,"readme_en":70,"readme_zh":71,"quickstart_zh":72,"use_case_zh":73,"hero_image_url":74,"owner_login":75,"owner_name":76,"owner_avatar_url":77,"owner_bio":78,"owner_company":79,"owner_location":79,"owner_email":80,"owner_twitter":75,"owner_website":79,"owner_url":81,"languages":82,"stars":99,"forks":100,"last_commit_at":101,"license":102,"difficulty_score":23,"env_os":103,"env_gpu":104,"env_ram":105,"env_deps":106,"category_tags":118,"github_topics":119,"view_count":132,"oss_zip_url":79,"oss_zip_packed_at":79,"status":16,"created_at":133,"updated_at":134,"faqs":135,"releases":166},223,"GokuMohandas\u002FMade-With-ML","Made-With-ML","Learn how to develop, deploy and iterate on production-grade ML applications.","Made-With-ML 是一个面向实战的开源项目，旨在帮助开发者系统掌握从设计、开发到部署和迭代生产级机器学习应用的完整流程。它解决了许多人在学习机器学习时“会训练模型但不会上线”的痛点，强调将软件工程最佳实践与 ML 技术结合，构建可靠、可维护的端到端系统。\n\n该项目特别适合三类人群：一是希望将模型真正落地的开发者（包括软件工程师、数据科学家）；二是刚毕业、想补齐工业界所需技能的学生；三是需要理解技术边界以更好推动产品的技术管理者或产品经理。\n\nMade-With-ML 的亮点在于注重第一性原理讲解，避免盲目调包；同时覆盖 MLOps 关键环节（如实验跟踪、模型测试、服务部署、CI\u002FCD 等），并支持在 Python 生态内平滑扩展训练与推理任务，无需切换语言或复杂基础设施。课程内容结构清晰，配有详细代码示例和视频导览，兼顾理论深度与工程实用性。","\u003Cdiv align=\"center\">\n\u003Ch1>\u003Cimg width=\"30\" src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FGokuMohandas_Made-With-ML_readme_aecae356305b.png\">&nbsp;\u003Ca href=\"https:\u002F\u002Fmadewithml.com\u002F\">Made With 
ML\u003C\u002Fa>\u003C\u002Fh1>\nDesign · Develop · Deploy · Iterate\n\u003Cbr>\nJoin 40K+ developers in learning how to responsibly deliver value with ML.\n    \u003Cbr>\n\u003C\u002Fdiv>\n\n\u003Cbr>\n\n\u003Cdiv align=\"center\">\n    \u003Ca target=\"_blank\" href=\"https:\u002F\u002Fmadewithml.com\u002F\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FSubscribe-40K-brightgreen\">\u003C\u002Fa>&nbsp;\n    \u003Ca target=\"_blank\" href=\"https:\u002F\u002Fgithub.com\u002FGokuMohandas\u002FMade-With-ML\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002FGokuMohandas\u002FMade-With-ML.svg?style=social&label=Star\">\u003C\u002Fa>&nbsp;\n    \u003Ca target=\"_blank\" href=\"https:\u002F\u002Fwww.linkedin.com\u002Fin\u002Fgoku\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fstyle--5eba00.svg?label=LinkedIn&logo=linkedin&style=social\">\u003C\u002Fa>&nbsp;\n    \u003Ca target=\"_blank\" href=\"https:\u002F\u002Ftwitter.com\u002FGokuMohandas\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Ftwitter\u002Ffollow\u002FGokuMohandas.svg?label=Follow&style=social\">\u003C\u002Fa>\n    \u003Cbr>\n    🔥&nbsp; Among the \u003Ca href=\"https:\u002F\u002Fgithub.com\u002FGokuMohandas\u002FMade-With-ML\" target=\"_blank\">top ML repositories\u003C\u002Fa> on GitHub\n\u003C\u002Fdiv>\n\n\u003Cbr>\n\u003Chr>\n\n## Lessons\n\nLearn how to combine machine learning with software engineering to design, develop, deploy and iterate on production-grade ML applications.\n\n- Lessons: https:\u002F\u002Fmadewithml.com\u002F\n- Code: [GokuMohandas\u002FMade-With-ML](https:\u002F\u002Fgithub.com\u002FGokuMohandas\u002FMade-With-ML)\n\n\u003Ca href=\"https:\u002F\u002Fmadewithml.com\u002F#course\">\n  \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FGokuMohandas_Made-With-ML_readme_f16d31b7ff13.png\" alt=\"lessons\">\n\u003C\u002Fa>\n\n## Overview\n\nIn this course, we'll go from experimentation (design + 
development) to production (deployment + iteration). We'll do this iteratively by motivating the components that will enable us to build a *reliable* production system.\n\n\u003Cblockquote>\n  \u003Cimg width=20 src=\"https:\u002F\u002Fupload.wikimedia.org\u002Fwikipedia\u002Fcommons\u002Fthumb\u002F0\u002F09\u002FYouTube_full-color_icon_%282017%29.svg\u002F640px-YouTube_full-color_icon_%282017%29.svg.png\">&nbsp; Be sure to watch the video below for a quick overview of what we'll be building.\n\u003C\u002Fblockquote>\n\n\u003Cdiv align=\"center\">\n  \u003Ca href=\"https:\u002F\u002Fyoutu.be\u002FAWgkt8H8yVo\">\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FGokuMohandas_Made-With-ML_readme_04d2f4d3e599.jpg\" alt=\"Course overview video\">\u003C\u002Fa>\n\u003C\u002Fdiv>\n\n\u003Cbr>\n\n- **💡 First principles**: before we jump straight into the code, we develop a first principles understanding for every machine learning concept.\n- **💻 Best practices**: implement software engineering best practices as we develop and deploy our machine learning models.\n- **📈 Scale**: easily scale ML workloads (data, train, tune, serve) in Python without having to learn completely new languages.\n- **⚙️ MLOps**: connect MLOps components (tracking, testing, serving, orchestration, etc.) 
as we build an end-to-end machine learning system.\n- **🚀 Dev to Prod**: learn how to quickly and reliably go from development to production without any changes to our code or infra management.\n- **🐙 CI\u002FCD**: learn how to create mature CI\u002FCD workflows to continuously train and deploy better models in a modular way that integrates with any stack.\n\n## Audience\n\nMachine learning is not a separate industry; instead, it's a powerful way of thinking about data that's not reserved for any one type of person.\n\n- **👩‍💻 All developers**: whether software\u002Finfra engineer or data scientist, ML is increasingly becoming a key part of the products that you'll be developing.\n- **👩‍🎓 College graduates**: learn the practical skills required for industry and bridge the gap between the university curriculum and what industry expects.\n- **👩‍💼 Product\u002FLeadership**: who want to develop a technical foundation so that they can build amazing (and reliable) products powered by machine learning.\n\n## Set up\n\nBe sure to go through the [course](https:\u002F\u002Fmadewithml.com\u002F#course) for a much more detailed walkthrough of the content in this repository. We will have instructions for both local laptop and Anyscale clusters for the sections below, so be sure to toggle the ► dropdown based on what you're using (Anyscale instructions will be toggled on by default). If you do want to run this course with Anyscale, where we'll provide the **structure**, **compute (GPUs)** and **community** to learn everything in one day, join our next live cohort → [sign up here](https:\u002F\u002F4190urw86oh.typeform.com\u002Fmadewithml)!\n\n### Cluster\n\nWe'll start by setting up our cluster with the environment and compute configurations.\n\n\u003Cdetails>\n  \u003Csummary>Local\u003C\u002Fsummary>\u003Cbr>\n  Your personal laptop (single machine) will act as the cluster, where one CPU will be the head node and some of the remaining CPUs will be the worker nodes. 
All of the code in this course will work in any personal laptop though it will be slower than executing the same workloads on a larger cluster.\n\u003C\u002Fdetails>\n\n\u003Cdetails open>\n  \u003Csummary>Anyscale\u003C\u002Fsummary>\u003Cbr>\n\n  We can create an [Anyscale Workspace](https:\u002F\u002Fdocs.anyscale.com\u002Fdevelop\u002Fworkspaces\u002Fget-started) using the [webpage UI](https:\u002F\u002Fconsole.anyscale.com\u002Fo\u002Fmadewithml\u002Fworkspaces\u002Fadd\u002Fblank).\n\n  ```md\n  - Workspace name: `madewithml`\n  - Project: `madewithml`\n  - Cluster environment name: `madewithml-cluster-env`\n  # Toggle `Select from saved configurations`\n  - Compute config: `madewithml-cluster-compute-g5.4xlarge`\n  ```\n\n  > Alternatively, we can use the [CLI](https:\u002F\u002Fdocs.anyscale.com\u002Freference\u002Fanyscale-cli) to create the workspace via `anyscale workspace create ...`\n\n\u003C\u002Fdetails>\n\n\u003Cdetails>\n  \u003Csummary>Other (cloud platforms, K8s, on-prem)\u003C\u002Fsummary>\u003Cbr>\n\n  If you don't want to do this course locally or via Anyscale, you have the following options:\n\n  - On [AWS and GCP](https:\u002F\u002Fdocs.ray.io\u002Fen\u002Flatest\u002Fcluster\u002Fvms\u002Findex.html#cloud-vm-index). 
Community-supported Azure and Aliyun integrations also exist.\n  - On [Kubernetes](https:\u002F\u002Fdocs.ray.io\u002Fen\u002Flatest\u002Fcluster\u002Fkubernetes\u002Findex.html#kuberay-index), via the officially supported KubeRay project.\n  - Deploy Ray manually [on-prem](https:\u002F\u002Fdocs.ray.io\u002Fen\u002Flatest\u002Fcluster\u002Fvms\u002Fuser-guides\u002Flaunching-clusters\u002Fon-premises.html#on-prem) or onto platforms [not listed here](https:\u002F\u002Fdocs.ray.io\u002Fen\u002Flatest\u002Fcluster\u002Fvms\u002Fuser-guides\u002Fcommunity\u002Findex.html#ref-cluster-setup).\n\n\u003C\u002Fdetails>\n\n### Git setup\n\nCreate a repository by following these instructions: [Create a new repository](https:\u002F\u002Fgithub.com\u002Fnew) → name it `Made-With-ML` → Toggle `Add a README file` (**very important** as this creates a `main` branch) → Click `Create repository` (scroll down)\n\nNow we're ready to clone the repository that has all of our code:\n\n```bash\ngit clone https:\u002F\u002Fgithub.com\u002FGokuMohandas\u002FMade-With-ML.git .\n```\n\n### Credentials\n\n```bash\ntouch .env\n```\n```bash\n# Inside .env\nGITHUB_USERNAME=\"CHANGE_THIS_TO_YOUR_USERNAME\"  # ← CHANGE THIS\n```\n```bash\nsource .env\n```\n\n### Virtual environment\n\n\u003Cdetails>\n  \u003Csummary>Local\u003C\u002Fsummary>\u003Cbr>\n\n  ```bash\n  export PYTHONPATH=$PYTHONPATH:$PWD\n  python3 -m venv venv  # recommend using Python 3.10\n  source venv\u002Fbin\u002Factivate  # on Windows: venv\\Scripts\\activate\n  python3 -m pip install --upgrade pip setuptools wheel\n  python3 -m pip install -r requirements.txt\n  pre-commit install\n  pre-commit autoupdate\n  ```\n\n  > Highly recommend using Python `3.10` and using [pyenv](https:\u002F\u002Fgithub.com\u002Fpyenv\u002Fpyenv) (mac) or [pyenv-win](https:\u002F\u002Fgithub.com\u002Fpyenv-win\u002Fpyenv-win) (windows).\n\n\u003C\u002Fdetails>\n\n\u003Cdetails open>\n  \u003Csummary>Anyscale\u003C\u002Fsummary>\u003Cbr>\n\n  Our 
environment with the appropriate Python version and libraries is already all set for us through the cluster environment we used when setting up our Anyscale Workspace. So we just need to run these commands:\n  ```bash\n  export PYTHONPATH=$PYTHONPATH:$PWD\n  pre-commit install\n  pre-commit autoupdate\n  ```\n\n\u003C\u002Fdetails>\n\n## Notebook\n\nStart by exploring the [jupyter notebook](notebooks\u002Fmadewithml.ipynb) to interactively walk through the core machine learning workloads.\n\n\u003Cdiv align=\"center\">\n  \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FGokuMohandas_Made-With-ML_readme_f635c53d8169.png\">\n\u003C\u002Fdiv>\n\n\u003Cdetails>\n  \u003Csummary>Local\u003C\u002Fsummary>\u003Cbr>\n\n  ```bash\n  # Start notebook\n  jupyter lab notebooks\u002Fmadewithml.ipynb\n  ```\n\n\u003C\u002Fdetails>\n\n\u003Cdetails open>\n  \u003Csummary>Anyscale\u003C\u002Fsummary>\u003Cbr>\n\n  Click on the Jupyter icon &nbsp;\u003Cimg width=15 src=\"https:\u002F\u002Fupload.wikimedia.org\u002Fwikipedia\u002Fcommons\u002Fthumb\u002F3\u002F38\u002FJupyter_logo.svg\u002F1200px-Jupyter_logo.svg.png\">&nbsp; at the top right corner of our Anyscale Workspace page and this will open up our JupyterLab instance in a new tab. Then navigate to the `notebooks` directory and open up the `madewithml.ipynb` notebook.\n\n\u003C\u002Fdetails>\n\n\n## Scripts\n\nNow we'll execute the same workloads using the clean Python scripts following software engineering best practices (testing, documentation, logging, serving, versioning, etc.) The code we've implemented in our notebook will be refactored into the following scripts:\n\n```bash\nmadewithml\n├── config.py\n├── data.py\n├── evaluate.py\n├── models.py\n├── predict.py\n├── serve.py\n├── train.py\n├── tune.py\n└── utils.py\n```\n\n**Note**: Change the `--num-workers`, `--cpu-per-worker`, and `--gpu-per-worker` input argument values below based on your system's resources. 
For example, if you're on a local laptop, a reasonable configuration would be `--num-workers 6 --cpu-per-worker 1 --gpu-per-worker 0`.\n\n### Training\n```bash\nexport EXPERIMENT_NAME=\"llm\"\nexport DATASET_LOC=\"https:\u002F\u002Fraw.githubusercontent.com\u002FGokuMohandas\u002FMade-With-ML\u002Fmain\u002Fdatasets\u002Fdataset.csv\"\nexport TRAIN_LOOP_CONFIG='{\"dropout_p\": 0.5, \"lr\": 1e-4, \"lr_factor\": 0.8, \"lr_patience\": 3}'\npython madewithml\u002Ftrain.py \\\n    --experiment-name \"$EXPERIMENT_NAME\" \\\n    --dataset-loc \"$DATASET_LOC\" \\\n    --train-loop-config \"$TRAIN_LOOP_CONFIG\" \\\n    --num-workers 1 \\\n    --cpu-per-worker 3 \\\n    --gpu-per-worker 1 \\\n    --num-epochs 10 \\\n    --batch-size 256 \\\n    --results-fp results\u002Ftraining_results.json\n```\n\n### Tuning\n```bash\nexport EXPERIMENT_NAME=\"llm\"\nexport DATASET_LOC=\"https:\u002F\u002Fraw.githubusercontent.com\u002FGokuMohandas\u002FMade-With-ML\u002Fmain\u002Fdatasets\u002Fdataset.csv\"\nexport TRAIN_LOOP_CONFIG='{\"dropout_p\": 0.5, \"lr\": 1e-4, \"lr_factor\": 0.8, \"lr_patience\": 3}'\nexport INITIAL_PARAMS=\"[{\\\"train_loop_config\\\": $TRAIN_LOOP_CONFIG}]\"\npython madewithml\u002Ftune.py \\\n    --experiment-name \"$EXPERIMENT_NAME\" \\\n    --dataset-loc \"$DATASET_LOC\" \\\n    --initial-params \"$INITIAL_PARAMS\" \\\n    --num-runs 2 \\\n    --num-workers 1 \\\n    --cpu-per-worker 3 \\\n    --gpu-per-worker 1 \\\n    --num-epochs 10 \\\n    --batch-size 256 \\\n    --results-fp results\u002Ftuning_results.json\n```\n\n### Experiment tracking\n\nWe'll use [MLflow](https:\u002F\u002Fmlflow.org\u002F) to track our experiments and store our models and the [MLflow Tracking UI](https:\u002F\u002Fwww.mlflow.org\u002Fdocs\u002Flatest\u002Ftracking.html#tracking-ui) to view our experiments. We have been saving our experiments to a local directory but note that in an actual production setting, we would have a central location to store all of our experiments. 
It's easy\u002Finexpensive to spin up your own MLflow server for all of your team members to track their experiments on or use a managed solution like [Weights & Biases](https:\u002F\u002Fwandb.ai\u002Fsite), [Comet](https:\u002F\u002Fwww.comet.ml\u002F), etc.\n\n```bash\nexport MODEL_REGISTRY=$(python -c \"from madewithml import config; print(config.MODEL_REGISTRY)\")\nmlflow server -h 0.0.0.0 -p 8080 --backend-store-uri $MODEL_REGISTRY\n```\n\n\u003Cdetails>\n  \u003Csummary>Local\u003C\u002Fsummary>\u003Cbr>\n\n  If you're running this notebook on your local laptop then head on over to \u003Ca href=\"http:\u002F\u002Flocalhost:8080\u002F\" target=\"_blank\">http:\u002F\u002Flocalhost:8080\u002F\u003C\u002Fa> to view your MLflow dashboard.\n\n\u003C\u002Fdetails>\n\n\u003Cdetails open>\n  \u003Csummary>Anyscale\u003C\u002Fsummary>\u003Cbr>\n\n  If you're on \u003Ca href=\"https:\u002F\u002Fdocs.anyscale.com\u002Fdevelop\u002Fworkspaces\u002Fget-started\" target=\"_blank\">Anyscale Workspaces\u003C\u002Fa>, then we need to first expose the port of the MLflow server. 
Run the following command on your Anyscale Workspace terminal to generate the public URL to your MLflow server.\n\n  ```bash\n  APP_PORT=8080\n  echo https:\u002F\u002F$APP_PORT-port-$ANYSCALE_SESSION_DOMAIN\n  ```\n\n\u003C\u002Fdetails>\n\n### Evaluation\n```bash\nexport EXPERIMENT_NAME=\"llm\"\nexport RUN_ID=$(python madewithml\u002Fpredict.py get-best-run-id --experiment-name $EXPERIMENT_NAME --metric val_loss --mode ASC)\nexport HOLDOUT_LOC=\"https:\u002F\u002Fraw.githubusercontent.com\u002FGokuMohandas\u002FMade-With-ML\u002Fmain\u002Fdatasets\u002Fholdout.csv\"\npython madewithml\u002Fevaluate.py \\\n    --run-id $RUN_ID \\\n    --dataset-loc $HOLDOUT_LOC \\\n    --results-fp results\u002Fevaluation_results.json\n```\n```json\n{\n  \"timestamp\": \"June 09, 2023 09:26:18 AM\",\n  \"run_id\": \"6149e3fec8d24f1492d4a4cabd5c06f6\",\n  \"overall\": {\n    \"precision\": 0.9076136428670714,\n    \"recall\": 0.9057591623036649,\n    \"f1\": 0.9046792827719773,\n    \"num_samples\": 191.0\n  },\n...\n```\n\n### Inference\n```bash\nexport EXPERIMENT_NAME=\"llm\"\nexport RUN_ID=$(python madewithml\u002Fpredict.py get-best-run-id --experiment-name $EXPERIMENT_NAME --metric val_loss --mode ASC)\npython madewithml\u002Fpredict.py predict \\\n    --run-id $RUN_ID \\\n    --title \"Transfer learning with transformers\" \\\n    --description \"Using transformers for transfer learning on text classification tasks.\"\n```\n```json\n[{\n  \"prediction\": [\n    \"natural-language-processing\"\n  ],\n  \"probabilities\": {\n    \"computer-vision\": 0.0009767753,\n    \"mlops\": 0.0008223939,\n    \"natural-language-processing\": 0.99762577,\n    \"other\": 0.000575123\n  }\n}]\n```\n\n### Serving\n\n\u003Cdetails>\n  \u003Csummary>Local\u003C\u002Fsummary>\u003Cbr>\n\n  ```bash\n  # Start\n  ray start --head\n  ```\n\n  ```bash\n  # Set up\n  export EXPERIMENT_NAME=\"llm\"\n  export RUN_ID=$(python madewithml\u002Fpredict.py get-best-run-id --experiment-name $EXPERIMENT_NAME 
--metric val_loss --mode ASC)\n  python madewithml\u002Fserve.py --run_id $RUN_ID\n  ```\n\n  Once the application is running, we can use it via cURL, Python, etc.:\n\n  ```python\n  # via Python\n  import json\n  import requests\n  title = \"Transfer learning with transformers\"\n  description = \"Using transformers for transfer learning on text classification tasks.\"\n  json_data = json.dumps({\"title\": title, \"description\": description})\n  requests.post(\"http:\u002F\u002F127.0.0.1:8000\u002Fpredict\", data=json_data).json()\n  ```\n\n  ```bash\n  ray stop  # shutdown\n  ```\n\n\u003C\u002Fdetails>\n\n\u003Cdetails open>\n  \u003Csummary>Anyscale\u003C\u002Fsummary>\u003Cbr>\n\n  In Anyscale Workspaces, Ray is already running so we don't have to manually start\u002Fshutdown like we have to do locally.\n\n  ```bash\n  # Set up\n  export EXPERIMENT_NAME=\"llm\"\n  export RUN_ID=$(python madewithml\u002Fpredict.py get-best-run-id --experiment-name $EXPERIMENT_NAME --metric val_loss --mode ASC)\n  python madewithml\u002Fserve.py --run_id $RUN_ID\n  ```\n\n  Once the application is running, we can use it via cURL, Python, etc.:\n\n  ```python\n  # via Python\n  import json\n  import requests\n  title = \"Transfer learning with transformers\"\n  description = \"Using transformers for transfer learning on text classification tasks.\"\n  json_data = json.dumps({\"title\": title, \"description\": description})\n  requests.post(\"http:\u002F\u002F127.0.0.1:8000\u002Fpredict\", data=json_data).json()\n  ```\n\n\u003C\u002Fdetails>\n\n### Testing\n```bash\n# Code\npython3 -m pytest tests\u002Fcode --verbose --disable-warnings\n\n# Data\nexport DATASET_LOC=\"https:\u002F\u002Fraw.githubusercontent.com\u002FGokuMohandas\u002FMade-With-ML\u002Fmain\u002Fdatasets\u002Fdataset.csv\"\npytest --dataset-loc=$DATASET_LOC tests\u002Fdata --verbose --disable-warnings\n\n# Model\nexport EXPERIMENT_NAME=\"llm\"\nexport RUN_ID=$(python madewithml\u002Fpredict.py get-best-run-id 
--experiment-name $EXPERIMENT_NAME --metric val_loss --mode ASC)\npytest --run-id=$RUN_ID tests\u002Fmodel --verbose --disable-warnings\n\n# Coverage\npython3 -m pytest tests\u002Fcode --cov madewithml --cov-report html --disable-warnings  # html report\npython3 -m pytest tests\u002Fcode --cov madewithml --cov-report term --disable-warnings  # terminal report\n```\n\n## Production\n\nFrom this point onwards, in order to deploy our application into production, we'll need to either be on Anyscale or on a [cloud VM](https:\u002F\u002Fdocs.ray.io\u002Fen\u002Flatest\u002Fcluster\u002Fvms\u002Findex.html#cloud-vm-index) \u002F [on-prem](https:\u002F\u002Fdocs.ray.io\u002Fen\u002Flatest\u002Fcluster\u002Fvms\u002Fuser-guides\u002Flaunching-clusters\u002Fon-premises.html#on-prem) cluster you manage yourself (w\u002F Ray). If not on Anyscale, the commands will be [slightly different](https:\u002F\u002Fdocs.ray.io\u002Fen\u002Flatest\u002Fcluster\u002Frunning-applications\u002Fjob-submission\u002Findex.html) but the concepts will be the same.\n\n> If you don't want to set up all of this yourself, we highly recommend joining our [upcoming live cohort](https:\u002F\u002F4190urw86oh.typeform.com\u002Fmadewithml) where we'll provide an environment with all of this infrastructure already set up for you so that you can focus on the machine learning.\n\n\u003Cdiv align=\"center\">\n  \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FGokuMohandas_Made-With-ML_readme_4fd7296b3b49.png\">\n\u003C\u002Fdiv>\n\n### Authentication\n\nThe credentials below are **automatically** set for us if we're using Anyscale Workspaces. 
We **do not** need to set these credentials explicitly on Workspaces but we do if we're running this locally or on a cluster outside of where our Anyscale Jobs and Services are configured to run.\n\n``` bash\nexport ANYSCALE_HOST=https:\u002F\u002Fconsole.anyscale.com\nexport ANYSCALE_CLI_TOKEN=$YOUR_CLI_TOKEN  # retrieved from Anyscale credentials page\n```\n\n### Cluster environment\n\nThe cluster environment determines **where** our workloads will be executed (OS, dependencies, etc.) We've already created this [cluster environment](.\u002Fdeploy\u002Fcluster_env.yaml) for us but this is how we can create\u002Fupdate one ourselves.\n\n```bash\nexport CLUSTER_ENV_NAME=\"madewithml-cluster-env\"\nanyscale cluster-env build deploy\u002Fcluster_env.yaml --name $CLUSTER_ENV_NAME\n```\n\n### Compute configuration\n\nThe compute configuration determines **what** resources our workloads will be executed on. We've already created this [compute configuration](.\u002Fdeploy\u002Fcluster_compute.yaml) for us but this is how we can create it ourselves.\n\n```bash\nexport CLUSTER_COMPUTE_NAME=\"madewithml-cluster-compute-g5.4xlarge\"\nanyscale cluster-compute create deploy\u002Fcluster_compute.yaml --name $CLUSTER_COMPUTE_NAME\n```\n\n### Anyscale jobs\n\nNow we're ready to execute our ML workloads. We've decided to combine them all into one [job](.\u002Fdeploy\u002Fjobs\u002Fworkloads.yaml) but we could have also created separate jobs for each workload (train, evaluate, etc.) 
We'll start by editing the `$GITHUB_USERNAME` slots inside our [`workloads.yaml`](.\u002Fdeploy\u002Fjobs\u002Fworkloads.yaml) file:\n```yaml\nruntime_env:\n  working_dir: .\n  upload_path: s3:\u002F\u002Fmadewithml\u002F$GITHUB_USERNAME\u002Fjobs  # \u003C--- CHANGE USERNAME (case-sensitive)\n  env_vars:\n    GITHUB_USERNAME: $GITHUB_USERNAME  # \u003C--- CHANGE USERNAME (case-sensitive)\n```\n\nThe `runtime_env` here specifies that we should upload our current `working_dir` to an S3 bucket so that all of our workers will have access to the code when we execute an Anyscale Job. The `GITHUB_USERNAME` is used later to save results from our workloads to S3 so that we can retrieve them later (ex. for serving).\n\nNow we're ready to submit our job to execute our ML workloads:\n```bash\nanyscale job submit deploy\u002Fjobs\u002Fworkloads.yaml\n```\n\n### Anyscale Services\n\nAfter our ML workloads have been executed, we're ready to serve our model in production. Similar to our Anyscale Jobs configs, be sure to change the `$GITHUB_USERNAME` in [`serve_model.yaml`](.\u002Fdeploy\u002Fservices\u002Fserve_model.yaml).\n\n```yaml\nray_serve_config:\n  import_path: deploy.services.serve_model:entrypoint\n  runtime_env:\n    working_dir: .\n    upload_path: s3:\u002F\u002Fmadewithml\u002F$GITHUB_USERNAME\u002Fservices  # \u003C--- CHANGE USERNAME (case-sensitive)\n    env_vars:\n      GITHUB_USERNAME: $GITHUB_USERNAME  # \u003C--- CHANGE USERNAME (case-sensitive)\n```\n\nNow we're ready to launch our service:\n```bash\n# Rollout service\nanyscale service rollout -f deploy\u002Fservices\u002Fserve_model.yaml\n\n# Query\ncurl -X POST -H \"Content-Type: application\u002Fjson\" -H \"Authorization: Bearer $SECRET_TOKEN\" -d '{\n  \"title\": \"Transfer learning with transformers\",\n  \"description\": \"Using transformers for transfer learning on text classification tasks.\"\n}' $SERVICE_ENDPOINT\u002Fpredict\u002F\n\n# Rollback (to previous version of the 
Service)\nanyscale service rollback -f $SERVICE_CONFIG --name $SERVICE_NAME\n\n# Terminate\nanyscale service terminate --name $SERVICE_NAME\n```\n\n### CI\u002FCD\n\nWe're not going to manually deploy our application every time we make a change. Instead, we'll automate this process using GitHub Actions!\n\n\u003Cdiv align=\"center\">\n  \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FGokuMohandas_Made-With-ML_readme_298a6c8e97a7.png\">\n\u003C\u002Fdiv>\n\n1. Create a new github branch to save our changes to and execute CI\u002FCD workloads:\n```bash\ngit remote set-url origin https:\u002F\u002Fgithub.com\u002F$GITHUB_USERNAME\u002FMade-With-ML.git  # \u003C-- CHANGE THIS to your username\ngit checkout -b dev\n```\n\n2. We'll start by adding the necessary credentials to the [`\u002Fsettings\u002Fsecrets\u002Factions`](https:\u002F\u002Fgithub.com\u002FGokuMohandas\u002FMade-With-ML\u002Fsettings\u002Fsecrets\u002Factions) page of our GitHub repository.\n\n``` bash\nexport ANYSCALE_HOST=https:\u002F\u002Fconsole.anyscale.com\nexport ANYSCALE_CLI_TOKEN=$YOUR_CLI_TOKEN  # retrieved from https:\u002F\u002Fconsole.anyscale.com\u002Fo\u002Fmadewithml\u002Fcredentials\n```\n\n3. Now we can make changes to our code (not on `main` branch) and push them to GitHub. But in order to push our code to GitHub, we'll need to first authenticate with our credentials before pushing to our repository:\n\n```bash\ngit config --global user.name $GITHUB_USERNAME  # \u003C-- CHANGE THIS to your username\ngit config --global user.email you@example.com  # \u003C-- CHANGE THIS to your email\ngit add .\ngit commit -m \"\"  # \u003C-- CHANGE THIS to your message\ngit push origin dev\n```\n\nNow you will be prompted to enter your username and password (personal access token). 
Follow these steps to get a personal access token: [New GitHub personal access token](https:\u002F\u002Fgithub.com\u002Fsettings\u002Ftokens\u002Fnew) → Add a name → Toggle `repo` and `workflow` → Click `Generate token` (scroll down) → Copy the token and paste it when prompted for your password.\n\n4. Now we can start a PR from this branch to our `main` branch and this will trigger the [workloads workflow](\u002F.github\u002Fworkflows\u002Fworkloads.yaml). If the workflow (Anyscale Jobs) succeeds, this will produce comments with the training and evaluation results directly on the PR.\n\n\u003Cdiv align=\"center\">\n  \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FGokuMohandas_Made-With-ML_readme_57eb341e3ec9.png\">\n\u003C\u002Fdiv>\n\n5. If we like the results, we can merge the PR into the `main` branch. This will trigger the [serve workflow](\u002F.github\u002Fworkflows\u002Fserve.yaml) which will roll out our new service to production!\n\n### Continual learning\n\nWith our CI\u002FCD workflow in place to deploy our application, we can now focus on continually improving our model. It becomes really easy to extend this foundation to connect to scheduled runs (cron), [data pipelines](https:\u002F\u002Fmadewithml.com\u002Fcourses\u002Fmlops\u002Fdata-engineering\u002F), drift detected through [monitoring](https:\u002F\u002Fmadewithml.com\u002Fcourses\u002Fmlops\u002Fmonitoring\u002F), [online evaluation](https:\u002F\u002Fmadewithml.com\u002Fcourses\u002Fmlops\u002Fevaluation\u002F#online-evaluation), etc. And we can easily add additional context such as comparing any experiment with what's currently in production (directly in the PR even), etc.\n\n\u003Cdiv align=\"center\">\n  \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FGokuMohandas_Made-With-ML_readme_35e790de5108.png\">\n\u003C\u002Fdiv>\n\n## FAQ\n\n### Jupyter notebook kernels\n\nIssues with configuring the notebooks with jupyter? 
By default, jupyter will use the kernel with our virtual environment but we can also manually add it to jupyter:\n```bash\npython3 -m ipykernel install --user --name=venv\n```\nNow we can open up a notebook → Kernel (top menu bar) → Change Kernel → `venv`. To ever delete this kernel, we can do the following:\n```bash\njupyter kernelspec list\njupyter kernelspec uninstall venv\n```\n","\u003Cdiv align=\"center\">\n\u003Ch1>\u003Cimg width=\"30\" src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FGokuMohandas_Made-With-ML_readme_aecae356305b.png\">&nbsp;\u003Ca href=\"https:\u002F\u002Fmadewithml.com\u002F\">Made With ML\u003C\u002Fa>\u003C\u002Fh1>\n设计 · 开发 · 部署 · 迭代\n\u003Cbr>\n加入 40,000 多名开发者，学习如何负责任地通过机器学习（ML）交付价值。\n    \u003Cbr>\n\u003C\u002Fdiv>\n\n\u003Cbr>\n\n\u003Cdiv align=\"center\">\n    \u003Ca target=\"_blank\" href=\"https:\u002F\u002Fmadewithml.com\u002F\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FSubscribe-40K-brightgreen\">\u003C\u002Fa>&nbsp;\n    \u003Ca target=\"_blank\" href=\"https:\u002F\u002Fgithub.com\u002FGokuMohandas\u002FMade-With-ML\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002FGokuMohandas\u002FMade-With-ML.svg?style=social&label=Star\">\u003C\u002Fa>&nbsp;\n    \u003Ca target=\"_blank\" href=\"https:\u002F\u002Fwww.linkedin.com\u002Fin\u002Fgoku\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fstyle--5eba00.svg?label=LinkedIn&logo=linkedin&style=social\">\u003C\u002Fa>&nbsp;\n    \u003Ca target=\"_blank\" href=\"https:\u002F\u002Ftwitter.com\u002FGokuMohandas\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Ftwitter\u002Ffollow\u002FGokuMohandas.svg?label=Follow&style=social\">\u003C\u002Fa>\n    \u003Cbr>\n    🔥&nbsp; GitHub 上\u003Ca href=\"https:\u002F\u002Fgithub.com\u002FGokuMohandas\u002FMade-With-ML\" target=\"_blank\">顶级 ML 仓库\u003C\u002Fa>之一\n\u003C\u002Fdiv>\n\n\u003Cbr>\n\u003Chr>\n\n## 
课程内容\n\n学习如何将机器学习与软件工程相结合，以设计、开发、部署并迭代生产级（production-grade）的机器学习应用。\n\n- 课程网站：https:\u002F\u002Fmadewithml.com\u002F\n- 代码仓库：[GokuMohandas\u002FMade-With-ML](https:\u002F\u002Fgithub.com\u002FGokuMohandas\u002FMade-With-ML)\n\n\u003Ca href=\"https:\u002F\u002Fmadewithml.com\u002F#course\">\n  \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FGokuMohandas_Made-With-ML_readme_f16d31b7ff13.png\" alt=\"lessons\">\n\u003C\u002Fa>\n\n## 概览\n\n在本课程中，我们将从实验阶段（设计 + 开发）过渡到生产阶段（部署 + 迭代）。我们会通过迭代的方式，逐步引入构建一个**可靠**生产系统所需的各个组件。\n\n\u003Cblockquote>\n  \u003Cimg width=20 src=\"https:\u002F\u002Fupload.wikimedia.org\u002Fwikipedia\u002Fcommons\u002Fthumb\u002F0\u002F09\u002FYouTube_full-color_icon_%282017%29.svg\u002F640px-YouTube_full-color_icon_%282017%29.svg.png\">&nbsp; 务必观看下方视频，快速了解我们将要构建的内容。\n\u003C\u002Fblockquote>\n\n\u003Cdiv align=\"center\">\n  \u003Ca href=\"https:\u002F\u002Fyoutu.be\u002FAWgkt8H8yVo\">\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FGokuMohandas_Made-With-ML_readme_04d2f4d3e599.jpg\" alt=\"Course overview video\">\u003C\u002Fa>\n\u003C\u002Fdiv>\n\n\u003Cbr>\n\n- **💡 第一性原理（First principles）**：在直接写代码之前，我们先从第一性原理出发，深入理解每一个机器学习概念。\n- **💻 最佳实践（Best practices）**：在开发和部署机器学习模型的过程中，贯彻软件工程的最佳实践。\n- **📈 可扩展性（Scale）**：无需学习全新语言，即可轻松在 Python 中扩展机器学习工作负载（数据处理、训练、调参、服务等）。\n- **⚙️ MLOps**：在构建端到端机器学习系统的过程中，集成 MLOps 组件（如实验跟踪、测试、模型服务、任务编排等）。\n- **🚀 从开发到生产（Dev to Prod)**：学习如何在不更改代码或基础设施管理的前提下，快速且可靠地将模型从开发环境迁移到生产环境。\n- **🐙 CI\u002FCD**：学习如何构建成熟的持续集成\u002F持续部署（CI\u002FCD）工作流，以模块化方式持续训练和部署更优模型，并可与任意技术栈集成。\n\n## 目标受众\n\n机器学习并非一个独立的行业，而是一种强大的数据思维方式，并不专属于某一类人群。\n\n- **👩‍💻 所有开发者**：无论是软件\u002F基础设施工程师还是数据科学家，机器学习正日益成为你所开发产品中的关键组成部分。\n- **👩‍🎓 应届毕业生**：学习业界所需的实用技能，弥合大学课程与行业实际需求之间的差距。\n- **👩‍💼 产品经理\u002F管理者**：希望打下技术基础，从而能够构建由机器学习驱动的卓越（且可靠）产品。\n\n## 环境设置\n\n请务必访问[课程网站](https:\u002F\u002Fmadewithml.com\u002F#course)，以获取对本仓库内容更详细的分步讲解。以下各节均提供本地笔记本电脑和 Anyscale 集群两种操作说明，请根据你的使用环境点击 ► 下拉菜单进行切换（默认显示 Anyscale 的说明）。如果你想通过 Anyscale 
学习本课程，我们将为你提供**结构化内容**、**计算资源（GPU）** 和**社区支持**，助你在一天内掌握全部内容。欢迎报名参加我们的下一期实时课程 → [立即注册](https:\u002F\u002F4190urw86oh.typeform.com\u002Fmadewithml)！\n\n### 集群设置\n\n首先，我们需要配置集群的环境和计算资源。\n\n\u003Cdetails>\n  \u003Csummary>本地（Local）\u003C\u002Fsummary>\u003Cbr>\n  你的个人笔记本电脑（单机）将作为集群使用，其中一颗 CPU 作为主节点（head node），其余部分 CPU 作为工作节点（worker nodes）。本课程中的所有代码均可在普通笔记本电脑上运行，但执行速度会比在大型集群上慢。\n\u003C\u002Fdetails>\n\n\u003Cdetails open>\n  \u003Csummary>Anyscale\u003C\u002Fsummary>\u003Cbr>\n\n  我们可以通过 [Anyscale Workspace](https:\u002F\u002Fdocs.anyscale.com\u002Fdevelop\u002Fworkspaces\u002Fget-started) 的 [网页界面](https:\u002F\u002Fconsole.anyscale.com\u002Fo\u002Fmadewithml\u002Fworkspaces\u002Fadd\u002Fblank) 创建工作区。\n\n  ```md\n  - Workspace name: `madewithml`\n  - Project: `madewithml`\n  - Cluster environment name: `madewithml-cluster-env`\n  # 勾选 `Select from saved configurations`\n  - Compute config: `madewithml-cluster-compute-g5.4xlarge`\n  ```\n\n  > 或者，也可以使用 [CLI](https:\u002F\u002Fdocs.anyscale.com\u002Freference\u002Fanyscale-cli) 通过命令 `anyscale workspace create ...` 创建工作区。\n\n\u003C\u002Fdetails>\n\n\u003Cdetails>\n  \u003Csummary>其他（云平台、K8s、本地部署）\u003C\u002Fsummary>\u003Cbr>\n\n  如果你不想在本地或通过 Anyscale 学习本课程，还可以选择以下方式：\n\n  - 在 [AWS 和 GCP](https:\u002F\u002Fdocs.ray.io\u002Fen\u002Flatest\u002Fcluster\u002Fvms\u002Findex.html#cloud-vm-index) 上部署。社区也提供了对 Azure 和阿里云的支持。\n  - 在 [Kubernetes](https:\u002F\u002Fdocs.ray.io\u002Fen\u002Flatest\u002Fcluster\u002Fkubernetes\u002Findex.html#kuberay-index) 上部署，通过官方支持的 KubeRay 项目。\n  - 手动在[本地环境（on-prem）](https:\u002F\u002Fdocs.ray.io\u002Fen\u002Flatest\u002Fcluster\u002Fvms\u002Fuser-guides\u002Flaunching-clusters\u002Fon-premises.html#on-prem)部署 Ray，或部署到[此处未列出的其他平台](https:\u002F\u002Fdocs.ray.io\u002Fen\u002Flatest\u002Fcluster\u002Fvms\u002Fuser-guides\u002Fcommunity\u002Findex.html#ref-cluster-setup)。\n\n\u003C\u002Fdetails>\n\n### Git 设置\n\n请按以下步骤创建一个新仓库：[Create a new repository](https:\u002F\u002Fgithub.com\u002Fnew) → 仓库名称设为 
`Made-With-ML` → 勾选 `Add a README file`（**非常重要**，这会自动创建 `main` 分支）→ 点击 `Create repository`（滚动到底部）\n\n现在我们可以克隆包含所有代码的仓库了：\n\n```bash\ngit clone https:\u002F\u002Fgithub.com\u002FGokuMohandas\u002FMade-With-ML.git .\n```\n\n### 凭据配置\n\n```bash\ntouch .env\n```\n```bash\n# Inside .env\nGITHUB_USERNAME=\"CHANGE_THIS_TO_YOUR_USERNAME\"  # ← 请修改为你自己的用户名\n```\n```bash\nsource .env\n```\n\n### 虚拟环境（Virtual environment）\n\n\u003Cdetails>\n  \u003Csummary>本地（Local）\u003C\u002Fsummary>\u003Cbr>\n\n  ```bash\n  export PYTHONPATH=$PYTHONPATH:$PWD\n  python3 -m venv venv  # 推荐使用 Python 3.10\n  source venv\u002Fbin\u002Factivate  # Windows 上使用：venv\\Scripts\\activate\n  python3 -m pip install --upgrade pip setuptools wheel\n  python3 -m pip install -r requirements.txt\n  pre-commit install\n  pre-commit autoupdate\n  ```\n\n  > 强烈建议使用 Python `3.10`，并使用 [pyenv](https:\u002F\u002Fgithub.com\u002Fpyenv\u002Fpyenv)（mac）或 [pyenv-win](https:\u002F\u002Fgithub.com\u002Fpyenv-win\u002Fpyenv-win)（Windows）。\n\n\u003C\u002Fdetails>\n\n\u003Cdetails open>\n  \u003Csummary>Anyscale\u003C\u002Fsummary>\u003Cbr>\n\n  我们在设置 Anyscale Workspace 时所使用的集群环境已经为我们配置好了合适的 Python 版本和相关库。因此我们只需运行以下命令：\n  ```bash\n  export PYTHONPATH=$PYTHONPATH:$PWD\n  pre-commit install\n  pre-commit autoupdate\n  ```\n\n\u003C\u002Fdetails>\n\n## Notebook\n\n首先通过 [Jupyter Notebook](notebooks\u002Fmadewithml.ipynb) 进行交互式探索，逐步了解核心的机器学习工作负载（workloads）。\n\n\u003Cdiv align=\"center\">\n  \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FGokuMohandas_Made-With-ML_readme_f635c53d8169.png\">\n\u003C\u002Fdiv>\n\n\u003Cdetails>\n  \u003Csummary>本地（Local）\u003C\u002Fsummary>\u003Cbr>\n\n  ```bash\n  # 启动 notebook\n  jupyter lab notebooks\u002Fmadewithml.ipynb\n```\n\n\u003C\u002Fdetails>\n\n\u003Cdetails open>\n  \u003Csummary>Anyscale\u003C\u002Fsummary>\u003Cbr>\n\n  点击 Anyscale Workspace 页面右上角的 Jupyter 图标 &nbsp;\u003Cimg width=15 
src=\"https:\u002F\u002Fupload.wikimedia.org\u002Fwikipedia\u002Fcommons\u002Fthumb\u002F3\u002F38\u002FJupyter_logo.svg\u002F1200px-Jupyter_logo.svg.png\">&nbsp;，这将在新标签页中打开 JupyterLab 实例。然后导航至 `notebooks` 目录并打开 `madewithml.ipynb` notebook。\n\n\u003C\u002Fdetails>\n\n\n## 脚本（Scripts）\n\n接下来，我们将使用符合软件工程最佳实践（如测试、文档、日志记录、服务部署、版本控制等）的干净 Python 脚本来执行相同的工作负载。我们在 notebook 中实现的代码将被重构为以下脚本：\n\n```bash\nmadewithml\n├── config.py\n├── data.py\n├── evaluate.py\n├── models.py\n├── predict.py\n├── serve.py\n├── train.py\n├── tune.py\n└── utils.py\n```\n\n**注意**：请根据你的系统资源调整以下命令中的 `--num-workers`、`--cpu-per-worker` 和 `--gpu-per-worker` 参数值。例如，如果你在本地笔记本电脑上运行，一个合理的配置是 `--num-workers 6 --cpu-per-worker 1 --gpu-per-worker 0`。\n\n### 训练（Training）\n```bash\nexport EXPERIMENT_NAME=\"llm\"\nexport DATASET_LOC=\"https:\u002F\u002Fraw.githubusercontent.com\u002FGokuMohandas\u002FMade-With-ML\u002Fmain\u002Fdatasets\u002Fdataset.csv\"\nexport TRAIN_LOOP_CONFIG='{\"dropout_p\": 0.5, \"lr\": 1e-4, \"lr_factor\": 0.8, \"lr_patience\": 3}'\npython madewithml\u002Ftrain.py \\\n    --experiment-name \"$EXPERIMENT_NAME\" \\\n    --dataset-loc \"$DATASET_LOC\" \\\n    --train-loop-config \"$TRAIN_LOOP_CONFIG\" \\\n    --num-workers 1 \\\n    --cpu-per-worker 3 \\\n    --gpu-per-worker 1 \\\n    --num-epochs 10 \\\n    --batch-size 256 \\\n    --results-fp results\u002Ftraining_results.json\n```\n\n### 调优（Tuning）\n```bash\nexport EXPERIMENT_NAME=\"llm\"\nexport DATASET_LOC=\"https:\u002F\u002Fraw.githubusercontent.com\u002FGokuMohandas\u002FMade-With-ML\u002Fmain\u002Fdatasets\u002Fdataset.csv\"\nexport TRAIN_LOOP_CONFIG='{\"dropout_p\": 0.5, \"lr\": 1e-4, \"lr_factor\": 0.8, \"lr_patience\": 3}'\nexport INITIAL_PARAMS=\"[{\\\"train_loop_config\\\": $TRAIN_LOOP_CONFIG}]\"\npython madewithml\u002Ftune.py \\\n    --experiment-name \"$EXPERIMENT_NAME\" \\\n    --dataset-loc \"$DATASET_LOC\" \\\n    --initial-params \"$INITIAL_PARAMS\" \\\n    --num-runs 2 \\\n    --num-workers 1 \\\n    --cpu-per-worker 
3 \\\n    --gpu-per-worker 1 \\\n    --num-epochs 10 \\\n    --batch-size 256 \\\n    --results-fp results\u002Ftuning_results.json\n```\n\n### 实验跟踪（Experiment tracking）\n\n我们将使用 [MLflow](https:\u002F\u002Fmlflow.org\u002F) 来跟踪实验、存储模型，并通过 [MLflow Tracking UI](https:\u002F\u002Fwww.mlflow.org\u002Fdocs\u002Flatest\u002Ftracking.html#tracking-ui) 查看实验结果。目前我们将实验保存在本地目录中，但在实际生产环境中，我们会使用一个中心化的位置来存储所有实验数据。你可以轻松且低成本地搭建自己的 MLflow 服务器供团队成员使用，也可以选择托管解决方案，例如 [Weights & Biases](https:\u002F\u002Fwandb.ai\u002Fsite)、[Comet](https:\u002F\u002Fwww.comet.ml\u002F) 等。\n\n```bash\nexport MODEL_REGISTRY=$(python -c \"from madewithml import config; print(config.MODEL_REGISTRY)\")\nmlflow server -h 0.0.0.0 -p 8080 --backend-store-uri $MODEL_REGISTRY\n```\n\n\u003Cdetails>\n  \u003Csummary>本地（Local）\u003C\u002Fsummary>\u003Cbr>\n\n  如果你在本地笔记本电脑上运行此 notebook，请访问 \u003Ca href=\"http:\u002F\u002Flocalhost:8080\u002F\" target=\"_blank\">http:\u002F\u002Flocalhost:8080\u002F\u003C\u002Fa> 查看 MLflow 仪表板。\n\n\u003C\u002Fdetails>\n\n\u003Cdetails open>\n  \u003Csummary>Anyscale\u003C\u002Fsummary>\u003Cbr>\n\n  如果你使用的是 \u003Ca href=\"https:\u002F\u002Fdocs.anyscale.com\u002Fdevelop\u002Fworkspaces\u002Fget-started\" target=\"_blank\">Anyscale Workspaces\u003C\u002Fa>，则需要先暴露 MLflow 服务器的端口。在 Anyscale Workspace 终端中运行以下命令，生成 MLflow 服务器的公开 URL。\n\n  ```bash\n  APP_PORT=8080\n  echo https:\u002F\u002F$APP_PORT-port-$ANYSCALE_SESSION_DOMAIN\n  ```\n\n\u003C\u002Fdetails>\n\n### 评估（Evaluation）\n```bash\nexport EXPERIMENT_NAME=\"llm\"\nexport RUN_ID=$(python madewithml\u002Fpredict.py get-best-run-id --experiment-name $EXPERIMENT_NAME --metric val_loss --mode ASC)\nexport HOLDOUT_LOC=\"https:\u002F\u002Fraw.githubusercontent.com\u002FGokuMohandas\u002FMade-With-ML\u002Fmain\u002Fdatasets\u002Fholdout.csv\"\npython madewithml\u002Fevaluate.py \\\n    --run-id $RUN_ID \\\n    --dataset-loc $HOLDOUT_LOC \\\n    --results-fp results\u002Fevaluation_results.json\n```\n```json\n{\n  \"timestamp\": \"June 09, 
2023 09:26:18 AM\",\n  \"run_id\": \"6149e3fec8d24f1492d4a4cabd5c06f6\",\n  \"overall\": {\n    \"precision\": 0.9076136428670714,\n    \"recall\": 0.9057591623036649,\n    \"f1\": 0.9046792827719773,\n    \"num_samples\": 191.0\n  },\n...\n```\n\n### 推理（Inference）\n```bash\nexport EXPERIMENT_NAME=\"llm\"\nexport RUN_ID=$(python madewithml\u002Fpredict.py get-best-run-id --experiment-name $EXPERIMENT_NAME --metric val_loss --mode ASC)\npython madewithml\u002Fpredict.py predict \\\n    --run-id $RUN_ID \\\n    --title \"Transfer learning with transformers\" \\\n    --description \"Using transformers for transfer learning on text classification tasks.\"\n```\n```json\n[{\n  \"prediction\": [\n    \"natural-language-processing\"\n  ],\n  \"probabilities\": {\n    \"computer-vision\": 0.0009767753,\n    \"mlops\": 0.0008223939,\n    \"natural-language-processing\": 0.99762577,\n    \"other\": 0.000575123\n  }\n}]\n```\n\n### 部署（Serving）\n\n\u003Cdetails>\n  \u003Csummary>本地（Local）\u003C\u002Fsummary>\u003Cbr>\n\n  ```bash\n  # 启动\n  ray start --head\n  ```\n\n  ```bash\n  # 设置\n  export EXPERIMENT_NAME=\"llm\"\n  export RUN_ID=$(python madewithml\u002Fpredict.py get-best-run-id --experiment-name $EXPERIMENT_NAME --metric val_loss --mode ASC)\n  python madewithml\u002Fserve.py --run_id $RUN_ID\n  ```\n\n  应用程序启动后，我们可以通过 cURL、Python 等方式调用它：\n\n  ```python\n  # 通过 Python 调用\n  import json\n  import requests\n  title = \"Transfer learning with transformers\"\n  description = \"Using transformers for transfer learning on text classification tasks.\"\n  json_data = json.dumps({\"title\": title, \"description\": description})\n  requests.post(\"http:\u002F\u002F127.0.0.1:8000\u002Fpredict\", data=json_data).json()\n  ```\n\n  ```bash\n  ray stop  # 关闭\n  ```\n\n\u003C\u002Fdetails>\n\n\u003Cdetails open>\n  \u003Csummary>Anyscale\u003C\u002Fsummary>\u003Cbr>\n\n  在 Anyscale Workspaces 中，Ray 已经在运行，因此我们不需要像本地那样手动启动或关闭。\n\n  ```bash\n  # 设置\n  export EXPERIMENT_NAME=\"llm\"\n  
export RUN_ID=$(python madewithml\u002Fpredict.py get-best-run-id --experiment-name $EXPERIMENT_NAME --metric val_loss --mode ASC)\n  python madewithml\u002Fserve.py --run_id $RUN_ID\n  ```\n\n  应用程序启动后，我们可以通过 cURL、Python 等方式调用它：\n\n  ```python\n  # 通过 Python 调用\n  import json\n  import requests\n  title = \"Transfer learning with transformers\"\n  description = \"Using transformers for transfer learning on text classification tasks.\"\n  json_data = json.dumps({\"title\": title, \"description\": description})\n  requests.post(\"http:\u002F\u002F127.0.0.1:8000\u002Fpredict\", data=json_data).json()\n  ```\n\n\u003C\u002Fdetails>\n\n### 测试（Testing）\n```bash\n# 代码测试\npython3 -m pytest tests\u002Fcode --verbose --disable-warnings\n\n# 数据测试\nexport DATASET_LOC=\"https:\u002F\u002Fraw.githubusercontent.com\u002FGokuMohandas\u002FMade-With-ML\u002Fmain\u002Fdatasets\u002Fdataset.csv\"\npytest --dataset-loc=$DATASET_LOC tests\u002Fdata --verbose --disable-warnings\n\n# 模型测试\nexport EXPERIMENT_NAME=\"llm\"\nexport RUN_ID=$(python madewithml\u002Fpredict.py get-best-run-id --experiment-name $EXPERIMENT_NAME --metric val_loss --mode ASC)\npytest --run-id=$RUN_ID tests\u002Fmodel --verbose --disable-warnings\n\n# 覆盖率测试\npython3 -m pytest tests\u002Fcode --cov madewithml --cov-report html --disable-warnings  # HTML 报告\npython3 -m pytest tests\u002Fcode --cov madewithml --cov-report term --disable-warnings  # 终端报告\n```\n\n## 生产环境（Production）\n\n从这一步开始，若要将我们的应用程序部署到生产环境，我们需要使用 Anyscale，或者使用你自行管理的 [云虚拟机（cloud VM）](https:\u002F\u002Fdocs.ray.io\u002Fen\u002Flatest\u002Fcluster\u002Fvms\u002Findex.html#cloud-vm-index) \u002F [本地集群（on-prem cluster）](https:\u002F\u002Fdocs.ray.io\u002Fen\u002Flatest\u002Fcluster\u002Fvms\u002Fuser-guides\u002Flaunching-clusters\u002Fon-premises.html#on-prem)（需配合 Ray）。如果不在 Anyscale 上，命令会略有不同（参见 [文档](https:\u002F\u002Fdocs.ray.io\u002Fen\u002Flatest\u002Fcluster\u002Frunning-applications\u002Fjob-submission\u002Findex.html)），但核心概念是一致的。\n\n> 
如果你不想自己搭建所有这些基础设施，我们强烈推荐你加入我们即将推出的 [实时课程（live cohort）](https:\u002F\u002F4190urw86oh.typeform.com\u002Fmadewithml){:target=\"_blank\"}。我们将为你提供一个已配置好全部基础设施的环境，让你可以专注于机器学习本身。\n\n\u003Cdiv align=\"center\">\n  \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FGokuMohandas_Made-With-ML_readme_4fd7296b3b49.png\">\n\u003C\u002Fdiv>\n\n### 认证（Authentication）\n\n如果你使用的是 Anyscale Workspaces，以下凭据会**自动**为你设置。在 Workspaces 中，你**无需**显式设置这些凭据；但如果你在本地或 Anyscale Jobs 和 Services 配置范围之外的集群上运行，则需要手动设置。\n\n``` bash\nexport ANYSCALE_HOST=https:\u002F\u002Fconsole.anyscale.com\nexport ANYSCALE_CLI_TOKEN=$YOUR_CLI_TOKEN  # 从 Anyscale 凭据页面获取\n```\n\n### 集群环境（Cluster environment）\n\n集群环境决定了我们的工作负载将在何处执行（操作系统、依赖项等）。我们已经为你创建了这个 [集群环境](.\u002Fdeploy\u002Fcluster_env.yaml)，但你也可以按如下方式自行创建或更新：\n\n```bash\nexport CLUSTER_ENV_NAME=\"madewithml-cluster-env\"\nanyscale cluster-env build deploy\u002Fcluster_env.yaml --name $CLUSTER_ENV_NAME\n```\n\n### 计算配置（Compute configuration）\n\n计算配置决定了我们的工作负载将在何种资源上执行。我们已经为你创建了这个 [计算配置](.\u002Fdeploy\u002Fcluster_compute.yaml)，但你也可以按如下方式自行创建：\n\n```bash\nexport CLUSTER_COMPUTE_NAME=\"madewithml-cluster-compute-g5.4xlarge\"\nanyscale cluster-compute create deploy\u002Fcluster_compute.yaml --name $CLUSTER_COMPUTE_NAME\n```\n\n### Anyscale Jobs\n\n现在我们已准备好执行机器学习工作负载。我们选择将所有任务合并到一个 [Job](.\u002Fdeploy\u002Fjobs\u002Fworkloads.yaml) 中，但也可以为每个任务（训练、评估等）分别创建 Job。首先，编辑 [`workloads.yaml`](.\u002Fdeploy\u002Fjobs\u002Fworkloads.yaml) 文件中的 `$GITHUB_USERNAME` 占位符：\n\n```yaml\nruntime_env:\n  working_dir: .\n  upload_path: s3:\u002F\u002Fmadewithml\u002F$GITHUB_USERNAME\u002Fjobs  # \u003C--- 修改为你的 GitHub 用户名（区分大小写）\n  env_vars:\n    GITHUB_USERNAME: $GITHUB_USERNAME  # \u003C--- 修改为你的 GitHub 用户名（区分大小写）\n```\n\n此处的 `runtime_env` 表示应将当前 `working_dir` 上传至 S3 存储桶，以便执行 Anyscale Job 时所有工作节点都能访问代码。`GITHUB_USERNAME` 用于后续将工作负载的结果保存到 S3，以便之后检索（例如用于部署服务）。\n\n现在我们可以提交 Job 来执行机器学习工作负载了：\n```bash\nanyscale job submit deploy\u002Fjobs\u002Fworkloads.yaml\n```\n\n### 
Anyscale Services\n\n在机器学习工作负载执行完成后，我们就可以将模型部署到生产环境了。与 Anyscale Jobs 的配置类似，请确保修改 [`serve_model.yaml`](.\u002Fdeploy\u002Fservices\u002Fserve_model.yaml) 中的 `$GITHUB_USERNAME`：\n\n```yaml\nray_serve_config:\n  import_path: deploy.services.serve_model:entrypoint\n  runtime_env:\n    working_dir: .\n    upload_path: s3:\u002F\u002Fmadewithml\u002F$GITHUB_USERNAME\u002Fservices  # \u003C--- 修改为你的 GitHub 用户名（区分大小写）\n    env_vars:\n      GITHUB_USERNAME: $GITHUB_USERNAME  # \u003C--- 修改为你的 GitHub 用户名（区分大小写）\n```\n\n现在我们可以启动服务了：\n```bash\n# 部署服务\nanyscale service rollout -f deploy\u002Fservices\u002Fserve_model.yaml\n\n# 查询服务\ncurl -X POST -H \"Content-Type: application\u002Fjson\" -H \"Authorization: Bearer $SECRET_TOKEN\" -d '{\n  \"title\": \"Transfer learning with transformers\",\n  \"description\": \"Using transformers for transfer learning on text classification tasks.\"\n}' $SERVICE_ENDPOINT\u002Fpredict\u002F\n\n# 回滚（回退到服务的先前版本）\nanyscale service rollback -f $SERVICE_CONFIG --name $SERVICE_NAME\n\n# 终止服务\nanyscale service terminate --name $SERVICE_NAME\n```\n\n### CI\u002FCD（持续集成\u002F持续部署）\n\n我们不会在每次修改代码后都手动部署应用程序。相反，我们将使用 GitHub Actions 来自动化这一流程！\n\n\u003Cdiv align=\"center\">\n  \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FGokuMohandas_Made-With-ML_readme_298a6c8e97a7.png\">\n\u003C\u002Fdiv>\n\n1. 创建一个新的 GitHub 分支来保存我们的更改并执行 CI\u002FCD 工作负载：\n```bash\ngit remote set-url origin https:\u002F\u002Fgithub.com\u002F$GITHUB_USERNAME\u002FMade-With-ML.git  # \u003C-- 将此处替换为你的用户名\ngit checkout -b dev\n```\n\n2. 我们首先需要将必要的凭据添加到 GitHub 仓库的 [`\u002Fsettings\u002Fsecrets\u002Factions`](https:\u002F\u002Fgithub.com\u002FGokuMohandas\u002FMade-With-ML\u002Fsettings\u002Fsecrets\u002Factions) 页面中。\n\n``` bash\nexport ANYSCALE_HOST=https:\u002F\u002Fconsole.anyscale.com\nexport ANYSCALE_CLI_TOKEN=$YOUR_CLI_TOKEN  # 从 https:\u002F\u002Fconsole.anyscale.com\u002Fo\u002Fmadewithml\u002Fcredentials 获取\n```\n\n3. 
现在我们可以修改代码（不要在 `main` 分支上），并将更改推送到 GitHub。但为了将代码推送到 GitHub，我们需要先使用凭据进行身份验证：\n\n```bash\ngit config --global user.name $GITHUB_USERNAME  # \u003C-- 将此处替换为你的用户名\ngit config --global user.email you@example.com  # \u003C-- 将此处替换为你的邮箱\ngit add .\ngit commit -m \"\"  # \u003C-- 将此处替换为你的提交信息\ngit push origin dev\n```\n\n此时系统会提示你输入用户名和密码（个人访问令牌）。请按以下步骤获取个人访问令牌：[新建 GitHub 个人访问令牌](https:\u002F\u002Fgithub.com\u002Fsettings\u002Ftokens\u002Fnew) → 输入名称 → 勾选 `repo` 和 `workflow` 权限 → 点击 `Generate token`（向下滚动）→ 复制生成的令牌，并在提示输入密码时粘贴该令牌。\n\n4. 接下来，我们可以从该分支向 `main` 分支发起 Pull Request（PR），这将触发 [workloads workflow](\u002F.github\u002Fworkflows\u002Fworkloads.yaml)。如果工作流（Anyscale Jobs）成功运行，它会直接在 PR 中添加包含训练和评估结果的评论。\n\n\u003Cdiv align=\"center\">\n  \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FGokuMohandas_Made-With-ML_readme_57eb341e3ec9.png\">\n\u003C\u002Fdiv>\n\n5. 如果我们对结果满意，就可以将 PR 合并到 `main` 分支。这将触发 [serve workflow](\u002F.github\u002Fworkflows\u002Fserve.yaml)，从而将我们的新服务部署到生产环境！\n\n### 持续学习（Continual learning）\n\n有了用于部署应用程序的 CI\u002FCD 工作流后，我们现在可以专注于持续改进模型。在此基础上，我们可以轻松扩展，例如连接定时任务（cron）、[数据管道](https:\u002F\u002Fmadewithml.com\u002Fcourses\u002Fmlops\u002Fdata-engineering\u002F)、通过[监控](https:\u002F\u002Fmadewithml.com\u002Fcourses\u002Fmlops\u002Fmonitoring\u002F)检测到的数据漂移、[在线评估](https:\u002F\u002Fmadewithml.com\u002Fcourses\u002Fmlops\u002Fevaluation\u002F#online-evaluation)等。我们还可以轻松添加更多上下文信息，例如在 PR 中直接比较任意实验与当前生产环境中的模型效果等。\n\n\u003Cdiv align=\"center\">\n  \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FGokuMohandas_Made-With-ML_readme_35e790de5108.png\">\n\u003C\u002Fdiv>\n\n## 常见问题（FAQ）\n\n### Jupyter Notebook 内核（kernels）\n\n在配置 Jupyter Notebook 内核时遇到问题？默认情况下，Jupyter 会使用我们虚拟环境中的内核，但我们也可以手动将其添加到 Jupyter：\n```bash\npython3 -m ipykernel install --user --name=venv\n```\n现在我们可以打开一个 Notebook → 点击顶部菜单栏的 Kernel（内核）→ Change Kernel（更换内核）→ 选择 `venv`。如果以后想删除这个内核，可以执行以下命令：\n```bash\njupyter kernelspec list\njupyter kernelspec uninstall 
venv\n```","# Made-With-ML 快速上手指南\n\n## 环境准备\n\n- **操作系统**：Linux \u002F macOS \u002F Windows（推荐 Linux 或 macOS）\n- **Python 版本**：强烈推荐使用 **Python 3.10**\n- **依赖工具**：\n  - `git`\n  - `pip`\n  - 推荐使用 [`pyenv`](https:\u002F\u002Fgithub.com\u002Fpyenv\u002Fpyenv)（macOS\u002FLinux）或 [`pyenv-win`](https:\u002F\u002Fgithub.com\u002Fpyenv-win\u002Fpyenv-win)（Windows）管理 Python 版本\n- **可选加速**：如在国内，建议配置 PyPI 镜像源（如清华源）以加速依赖安装：\n  ```bash\n  pip config set global.index-url https:\u002F\u002Fpypi.tuna.tsinghua.edu.cn\u002Fsimple\n  ```\n\n## 安装步骤\n\n### 1. 克隆代码库\n```bash\ngit clone https:\u002F\u002Fgithub.com\u002FGokuMohandas\u002FMade-With-ML.git .\n```\n\n### 2. 配置环境变量\n```bash\ntouch .env\n```\n编辑 `.env` 文件，填入你的 GitHub 用户名：\n```env\nGITHUB_USERNAME=\"你的GitHub用户名\"\n```\n然后加载环境变量：\n```bash\nsource .env\n```\n\n### 3. 创建并激活虚拟环境（本地开发）\n> 如果你使用 Anyscale 平台，此步可跳过。\n\n```bash\nexport PYTHONPATH=$PYTHONPATH:$PWD\npython3 -m venv venv\nsource venv\u002Fbin\u002Factivate  # Windows: venv\\Scripts\\activate\npython3 -m pip install --upgrade pip setuptools wheel\npython3 -m pip install -r requirements.txt\npre-commit install\npre-commit autoupdate\n```\n\n## 基本使用\n\n### 方式一：交互式 Notebook（快速体验）\n```bash\njupyter lab notebooks\u002Fmadewithml.ipynb\n```\n打开后按顺序运行单元格，即可交互式体验数据处理、训练、调优等核心 ML 工作流。\n\n### 方式二：命令行脚本（生产就绪实践）\n\n#### 训练模型（示例）\n```bash\nexport EXPERIMENT_NAME=\"llm\"\nexport DATASET_LOC=\"https:\u002F\u002Fraw.githubusercontent.com\u002FGokuMohandas\u002FMade-With-ML\u002Fmain\u002Fdatasets\u002Fdataset.csv\"\nexport TRAIN_LOOP_CONFIG='{\"dropout_p\": 0.5, \"lr\": 1e-4, \"lr_factor\": 0.8, \"lr_patience\": 3}'\n# 若无 GPU，请将 --gpu-per-worker 设为 0（注意：bash 续行符 \\ 之后不能再写注释）\npython madewithml\u002Ftrain.py \\\n    --experiment-name \"$EXPERIMENT_NAME\" \\\n    --dataset-loc \"$DATASET_LOC\" \\\n    --train-loop-config \"$TRAIN_LOOP_CONFIG\" \\\n    --num-workers 1 \\\n    --cpu-per-worker 3 \\\n    --gpu-per-worker 0 \\\n    --num-epochs 10 \\\n    --batch-size 256 \\\n    --results-fp=
results\u002Ftraining_results.json\n```\n\n> **注意**：根据你的机器资源调整 `--num-workers`、`--cpu-per-worker` 和 `--gpu-per-worker` 参数。普通笔记本建议设置 `--num-workers 4 --cpu-per-worker 1 --gpu-per-worker 0`。\n\n#### 启动实验跟踪（MLflow）\n```bash\nexport MODEL_REGISTRY=$(python -c \"from madewithml import config; print(config.MODEL_REGISTRY)\")\nmlflow server -h 0.0.0.0 -p 8080 --backend-store-uri $MODEL_REGISTRY\n```\n启动后访问 [http:\u002F\u002Flocalhost:8080](http:\u002F\u002Flocalhost:8080) 查看实验记录与模型版本。\n\n---\n\n现在你已准备好开始构建端到端的机器学习系统！更多内容请访问课程官网：[https:\u002F\u002Fmadewithml.com\u002F](https:\u002F\u002Fmadewithml.com\u002F)","一家电商公司的数据科学团队正在开发一个商品推荐模型，希望从实验阶段快速推进到线上服务，但缺乏将模型可靠部署和持续迭代的工程经验。\n\n### 没有 Made-With-ML 时\n- 团队仅关注模型准确率，忽略了数据版本、特征一致性等生产环境关键问题，导致上线后效果大幅下降。\n- 缺乏统一的代码结构和工程规范，模型训练、评估和服务代码耦合严重，难以复用或调试。\n- 部署依赖手动操作，每次更新模型都需要运维介入，发布周期长达数周。\n- 没有集成监控和回滚机制，模型性能退化无法及时发现，影响用户体验。\n- 团队成员对 MLOps 概念模糊，协作效率低，软件工程师与数据科学家沟通成本高。\n\n### 使用 Made-With-ML 后\n- 借鉴其“设计→开发→部署→迭代”全流程框架，团队在早期就引入数据验证、特征存储等生产级实践，保障线上线下一致性。\n- 采用课程提供的模块化项目结构（如 config、data、models、serve 等目录），代码清晰可维护，新人也能快速上手。\n- 利用其 CI\u002FCD 和模型服务示例，实现自动化训练与一键部署，模型上线周期缩短至1天内。\n- 集成 MLflow 跟踪和 Prometheus 监控方案，实时观测模型表现，异常时自动触发告警或回滚。\n- 团队通过共同学习 Made-With-ML 的最佳实践，建立起统一的技术语言，跨角色协作更加顺畅。\n\nMade-With-ML 帮助团队将学术级模型真正转化为稳定、可迭代的生产系统，大幅降低从实验到落地的工程门槛。","https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FGokuMohandas_Made-With-ML_80cc66a6.png","GokuMohandas","Goku Mohandas","https:\u002F\u002Foss.gittoolsai.com\u002Favatars\u002FGokuMohandas_ebad4734.png","ml, bio, art, tennis, travel",null,"gokumd@gmail.com","https:\u002F\u002Fgithub.com\u002FGokuMohandas",[83,87,91,95],{"name":84,"color":85,"percentage":86},"Jupyter Notebook","#DA5B0B",97.8,{"name":88,"color":89,"percentage":90},"Python","#3572A5",2.1,{"name":92,"color":93,"percentage":94},"Shell","#89e051",0.1,{"name":96,"color":97,"percentage":98},"Makefile","#427819",0,47108,7398,"2026-04-05T10:42:55","MIT","Linux, macOS, Windows","非必需，但推荐使用 NVIDIA GPU（如在 Anyscale 上使用 g5.4xlarge 实例），显存未明确说明，CUDA 
版本未说明","未说明",{"notes":107,"python":108,"dependencies":109},"支持本地运行或通过 Anyscale 平台运行；本地运行时建议使用 Python 3.10 和虚拟环境；项目依赖 Ray 进行分布式计算，MLflow 用于实验跟踪；首次运行需配置 .env 文件并安装依赖。","3.10（推荐）",[110,111,112,113,114,115,116,117],"torch","transformers","ray","mlflow","jupyter","pre-commit","setuptools","wheel",[26,54,51,53,13],[120,121,122,123,124,125,126,127,128,129,130,112,131],"machine-learning","deep-learning","pytorch","natural-language-processing","data-science","python","mlops","data-engineering","data-quality","distributed-ml","llms","distributed-training",167,"2026-03-27T02:49:30.150509","2026-04-06T02:42:43.389183",[136,141,146,151,156,161],{"id":137,"question_zh":138,"answer_zh":139,"source_url":140},636,"在本地机器上运行 trainer.fit() 时出现内存不足（OOM）错误，如何解决？","内存占用主要与模型大小和批处理大小（batch_size）线性相关。建议减小 batch_size 和 num_workers 参数。例如，将 batch_size 设为 32、num_workers 设为 1 可显著降低内存使用。每个数据加载工作进程（worker）都会复制一份模型，因此减少 num_workers 也能节省内存。如果仅使用 CPU 训练且 RAM 有限（如 8GB），这些调整通常是必要的。","https:\u002F\u002Fgithub.com\u002FGokuMohandas\u002FMade-With-ML\u002Fissues\u002F253",{"id":142,"question_zh":143,"answer_zh":144,"source_url":145},637,"在 EDA 中使用 sns.barplot 报错或无法显示图表，怎么办？","较新版本的 Seaborn 要求显式指定 x 和 y 参数。应将代码从 `sns.barplot(list(tags), list(tag_counts))` 改为 `sns.barplot(x=list(tags), y=list(tag_counts))`，否则会抛出 TypeError 并无法生成图表。","https:\u002F\u002Fgithub.com\u002FGokuMohandas\u002FMade-With-ML\u002Fissues\u002F220",{"id":147,"question_zh":148,"answer_zh":149,"source_url":150},638,"某些课程页面（如“多层感知机”）返回 404 错误，如何访问旧内容？","部分旧课程链接已失效。可通过 Wayback Machine（互联网档案馆）访问历史版本，例如：https:\u002F\u002Fweb.archive.org\u002Fweb\u002F20221201110239\u002Fhttps:\u002F\u002Fmadewithml.com\u002Fcourses\u002Fmlops\u002Fpackaging\u002F。此外，部分内容已迁移至新路径，如 FastAPI 部署教程现位于 https:\u002F\u002Fmadewithml.com\u002Fcourses\u002Fmlops\u002Fapi\u002F。","https:\u002F\u002Fgithub.com\u002FGokuMohandas\u002FMade-With-ML\u002Fissues\u002F234",{"id":152,"question_zh":153,"answer_zh":154,"source_url":155},639,"逻辑回归课程中的散点图缺少图例（只显示 
'malignant'），如何修复？","原代码使用单次 scatter 绘制两类数据导致图例丢失。正确做法是分别绘制良性（benign）和恶性（malignant）数据点，并为每次 scatter 调用显式设置 label。示例代码：\n```python\nax.scatter(X_benign[:, 0], X_benign[:, 1], c=\"blue\", s=25, edgecolors=\"k\", label=\"benign\")\nax.scatter(X_malignant[:, 0], X_malignant[:, 1], c=\"red\", s=25, edgecolors=\"k\", label=\"malignant\")\nax.legend(loc=\"upper right\")\n```","https:\u002F\u002Fgithub.com\u002FGokuMohandas\u002FMade-With-ML\u002Fissues\u002F229",{"id":157,"question_zh":158,"answer_zh":159,"source_url":160},640,"是否有 Colab 和 Binder 之外的云端运行环境推荐？","可以使用 RMOTR Notebooks（https:\u002F\u002Fnotebooks.rmotr.com\u002F），它支持一键克隆 GitHub 仓库并自动安装 requirements.txt 中的依赖，适合教学和快速启动。项目维护者已确认该平台可用，并已在 README 中添加了相关入口。","https:\u002F\u002Fgithub.com\u002FGokuMohandas\u002FMade-With-ML\u002Fissues\u002F90",{"id":162,"question_zh":163,"answer_zh":164,"source_url":165},641,"更新后部分 MLOps 主题（如 Optuna、Airflow、Feature Store）消失，还能找到吗？","这些内容并未删除，而是被整合到其他课程中。例如，优化相关内容已融入训练和超参调优章节，版本控制内容可在版本管理课程中找到。建议浏览现有课程结构或通过 Wayback Machine 查看旧版目录以定位具体知识点。","https:\u002F\u002Fgithub.com\u002FGokuMohandas\u002FMade-With-ML\u002Fissues\u002F233",[167],{"id":168,"version":169,"summary_zh":170,"released_at":171},100235,"v1.1.0","Release prior to agentic workflow updates","2026-03-04T23:24:54"]