- Publishing House of Electronics Industry (电子工业出版社)
- 9787121516702
- 1-1
- 568564
- Paperback (平塑)
- 16mo (16开)
- 2025-09
- 429
- 244
- Engineering
- Computer Science
- Data Science and Big Data Technology
- Undergraduate; graduate and above
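Summary
Putting big data technology into practice depends on two capabilities: command of the toolchain and the ability to apply it to concrete scenarios. From Python web crawling and Hive data analysis to Flink real-time computing and data warehouse architecture design, the ability to combine these skills has become a core criterion in corporate hiring. This book takes “training driven by real projects” as its core approach. It selects four representative training projects to build a stepwise training system that covers core scenarios such as offline processing, real-time computing, and data warehouse design, and that strengthens engineering thinking. It integrates mainstream tools including Python crawlers, Hive, Flink, and Kafka, covering the full pipeline of data collection, cleaning, storage, analysis, and visualization. It also incorporates topics tested in big data competitions and aligns with the skills required on the job. The book is suitable as a practical training textbook for big data programs at universities and can also serve as a hands-on reference for data engineering practitioners.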
Table of Contents
Chapter 1 Historical Weather Data Analysis Project
Task 1 Requirements Analysis
Task 2 Technical Architecture Analysis and Design
Task 3 Historical Weather Data Collection
Task 4 Importing Weather Data into Hive
Task 5 Historical Weather Data Analysis
Task 6 Exporting the Result Metric Tables
Task 7 Data Visualization
Chapter 2 Music Recommendation System
Task 1 Requirements Analysis
Task 2 Technical Architecture Analysis and Design
Task 3 Dataset and Project Overview
Task 4 Data Loading Module
Task 5 Data Statistics Module
Task 6 Offline Recommendation Module
Task 7 Real-Time Recommendation Module
Chapter 3 E-Commerce Offline Data Warehouse
Task 1 Requirements Analysis
Task 2 Data Warehouse Overview and Architecture Analysis
Task 3 Data Sources
Task 4 Data Warehouse Construction
Task 5 Workflow Scheduling
Task 6 Data Visualization
Chapter 4 Smart Community Real-Time Data Warehouse
Task 1 Requirements Analysis
Task 2 Technical Architecture Analysis and Design
Task 3 Data Sources and Preprocessing
Task 4 Real-Time Computing Framework Configuration
Task 5 Building the DIM Layer
Task 6 Building the ODS Layer
Task 7 Building the DWD Layer
Task 8 Building the DWS Layer