Work plan

刘超 ⋅ 5 days ago ⋅ 4616 reads

sparksql: https://blog.csdn.net/olizxq/article/details/81266476
  https://www.cnblogs.com/johnny666888/category/1491918.html

https://dongkelun.com/2020/01/16/sparkHivePartitionOverwrite/

kafka: https://github.com/hiddenzzh/kafka_book_demo

sparksql internals: https://code-monkey.top/2020/06/16/SparkSQL%E5%86%85%E6%A0%B8%E8%A7%A3%E6%9E%90-%E6%89%A7%E8%A1%8C%E5%85%A8%E8%BF%87%E7%A8%8B%E6%A6%82%E8%BF%B0/

spark: https://zhmin.github.io/page/2/

https://cloud.tencent.com/developer/column/4944

https://github.com/ljc520313/DocHub

https://mp.weixin.qq.com/mp/homepage?__biz=MzIzMDMwNTg3MA==&hid=2&sn=e20850e74a71e2b1ab69f43bef5c0056&scene=1&devicetype=iOS13.5&version=17000a2d&lang=zh_CN&nettype=WIFI&ascene=7&session_us=gh_49a1542a49bd&fontScale=100&wx_header=1

https://zhuanlan.zhihu.com/p/154379047

Data quality: https://www.jianshu.com/p/4d5a86118004
Analytic windows: https://www.jianshu.com/p/2ed2c8ef0143
Data warehouse: https://www.jianshu.com/nb/35418713
    http://blog.leanote.com/cate/yongjian/01.-%e6%95%b0%e6%8d%ae%e5%bb%ba%e6%a8%a1?page=2
http://shzhangji.com/cnblogs/2017/08/13/extract-data-from-mysql-with-binlog-and-canal/

A colleague reported that airflow 1.10.1 does not recognize the none_failed and none_skipped trigger rules but does recognize one_success. Cause unknown; test this again.
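For reference while testing, here is a minimal pure-Python sketch of what the three trigger rules named above mean, following the descriptions in the Airflow documentation. The helper `rule_satisfied` and the state constants are hypothetical names made up for illustration; this is not Airflow's API or scheduler code, and it does not explain why 1.10.1 rejects two of the rules.

```python
from typing import Iterable

# Upstream task states, spelled like the state strings Airflow uses.
SUCCESS = "success"
FAILED = "failed"
UPSTREAM_FAILED = "upstream_failed"
SKIPPED = "skipped"


def rule_satisfied(rule: str, upstream_states: Iterable[str]) -> bool:
    """Hypothetical helper: check a trigger rule against the given upstream states."""
    states = list(upstream_states)
    if rule == "one_success":
        # At least one parent has succeeded.
        return SUCCESS in states
    if rule == "none_failed":
        # No parent is failed or upstream_failed (succeeded or skipped is fine).
        return not any(s in (FAILED, UPSTREAM_FAILED) for s in states)
    if rule == "none_skipped":
        # No parent is in the skipped state.
        return SKIPPED not in states
    raise ValueError(f"unknown trigger rule: {rule}")


if __name__ == "__main__":
    parents = [SUCCESS, SKIPPED]
    for rule in ("one_success", "none_failed", "none_skipped"):
        print(rule, rule_satisfied(rule, parents))
```

With one succeeded parent and one skipped parent, a task using one_success or none_failed would be allowed to run, while none_skipped would not.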

https://blog.csdn.net/zpf336/article/details/86482036

https://www.jianshu.com/p/f6042288a6e3

https://www.jianshu.com/p/bef2ec1c361f

https://www.jianshu.com/p/7449e34a4bf6

https://www.jianshu.com/p/9ae1d2974304

https://www.jianshu.com/p/e701e7ecdc08

https://www.jianshu.com/p/cfb8c1386f69

https://www.jianshu.com/p/2c02b7c5970a

https://www.jianshu.com/p/5ffd8730aad8

https://www.jianshu.com/p/e9cac0e3673d

https://www.jianshu.com/p/8523fce76ee7

https://www.jianshu.com/p/3f385e4e7f95

https://www.jianshu.com/p/c19ea51c46f2

https://www.jianshu.com/p/7ee464c40b04

https://www.jianshu.com/p/26f412ecfee2

https://www.jianshu.com/p/a62fa483ff54

https://www.jianshu.com/p/358c3ab3e4a3

https://www.jianshu.com/p/d5a452466375

https://enjoyment.cool/page/7/
It launches fine in yarn-cluster mode but fails with an error in yarn-client mode; look into this when there is time.

https://mungingdata.com/apache-spark/maptype-columns/

https://community.cloudera.com/t5/Support-Questions/how-to-use-map-flatmap-function-to-manupulate-dataframe/td-p/166398

https://medium.com/@mrpowers/working-with-spark-arraytype-and-maptype-columns-4d85f3c8b2b3

https://stackoverflow.com/questions/47450253/how-to-embed-spark-dataframe-columns-to-a-map-column

https://sparkbyexamples.com/spark/spark-sql-map-functions/

https://sparkbyexamples.com/spark/spark-dataframe-map-maptype-column/
https://medium.com/@sfranks/i-had-trouble-finding-a-nice-example-of-how-to-have-an-udf-with-an-arbitrary-number-of-function-9d9bd30d0cfc

SQL parsing (Java)
https://github.com/melin/bigdata-sql-parser/tree/master/src

https://github.com/lccbiluox2/Antlr4-sqlParser

SQL parsing (JavaScript)
https://github.com/forward/sql-parser
https://github.com/godmodelabs/flora-sql-parser
https://github.com/JavaScriptor/js-sql-parser
https://github.com/steveyen/sqld3
https://github.com/dsferruzza/simpleSqlParser
https://github.com/camilojd/sequeljs
https://github.com/justinkenel/js-sql-parse

MVCC (multi-version concurrency control)

https://pan.baidu.com/disk/home?#/all?vmode=list&path=%2F资源%2F刘吉超%2F用户画像解决方案

China Computer Technology Professional Qualification site (中国计算技术职业资格网): http://www.ruankao.org.cn/book

MySQL实战45讲 (MySQL in Practice, 45 Lectures)

     https://time.geekbang.org/column/intro/139

webpack

https://pan.baidu.com/disk/home#/search?key=webpack&flag=0&vmode=list

typescript

  https://pan.baidu.com/disk/home?#/all?vmode=list&path=%2FIT%E6%8A%80%E6%9C%AF%2F%E5%BC%80%E5%8F%91%2F%E5%89%8D%E7%AB%AF%E5%BC%80%E5%8F%91%2F%E5%9F%BA%E7%A1%80%2Fhtml%2FHTML5%E8%AF%AD%E8%A8%80%E5%B7%A5%E7%A8%8B%E5%B8%88%2F%E7%AC%AC11%E9%98%B6%E6%AE%B5%20TypeScript%E5%9F%BA%E7%A1%80

sparksql
    https://github.com/curtishoward/sparkudfexamples

      https://github.com/search?o=desc&q=sparksql&s=stars&type=Repositories

      https://github.com/teeyog/IQL

      https://github.com/Stratio/sparta

      https://github.com/ZhuXS/Spring-Shiro-Spark

flink videos: https://pan.baidu.com/disk/home?#/all?vmode=list&path=%2F%E8%B5%84%E6%BA%90%2F%E5%88%98%E5%90%89%E8%B6%85%2FFlink%E5%AE%9E%E6%97%B6%E8%AE%A1%E7%AE%97%E6%A1%86%E6%9E%B6%E5%8E%9F%E7%90%86%E4%B8%8E%E6%A1%88%E4%BE%8B%E5%AE%9E%E6%88%98%2F01-Flink%E5%85%A5%E9%97%A8%E5%8F%8A%E5%AE%9E%E6%88%98(%E4%B8%8A)

TIDB

https://pingcap.com/blog-cn/percolator-and-txn/

https://pan.baidu.com/disk/home?#/search?key=tidb&flag=0&vmode=list
https://pingcap.com/blog-cn/tidb-source-code-reading-1/

airflow

https://airflow.apache.org/docs/stable/howto/operator/bash.html

http://chace.in/tags/Airflow/

https://blog.csdn.net/python_tty/article/details/79585525

https://blog.csdn.net/python_tty/article/details/78820546

https://hieast.github.io/2018/03/26/Airflow%20工作原理/

https://cloud.tencent.com/developer/information/airflow源码详解

https://juejin.im/post/5b7ba247e51d4538d42ab6a0

https://www.codeleading.com/article/646317722/

https://wemp.app/posts/124f4b01-d776-492e-bcc0-b132894c6ab7

http://www.dockone.io/article/9364

http://javaquan.com/post/30394_1_1.html

http://blog.githuber.cn/posts/3803

Project code

Open questions

  Four ways to optimize Flink applications: https://blog.csdn.net/dev_csdn/article/details/78330867
  How Flink uses panes to optimize windows: https://blog.csdn.net/yanghua_kobe/article/details/52705632
  Real-time computing, Flink performance tuning: https://blog.csdn.net/yunqiinsight/article/details/83752210
  spark: https://blog.csdn.net/u012102306/article/category/6266992
  Meituan tech blog: https://tech.meituan.com/

Data lake: https://www.slideshare.net/databricks/designing-etl-pipelines-with-structured-streaming-and-delta-lakehow-to-architect-things-right

Spark-related projects: https://github.com/search?p=100&q=spark&type=Repositories

  Flink-related projects

Company

   scala: scala向导.pdf (Scala guide, local copy)

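     A 10,000-word deep dive into e-commerce search: how to help you buy faster and better

     [Interview must-ask] The Java "lock" topics you cannot skip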

https://www.jianshu.com/p/2404a1eae315

https://zhuanlan.zhihu.com/p/20731808

Project code: http://myclusterbox.com/view/968

  Course schedule

   Business walkthrough: http://myclusterbox.com/view/541

    Hive analytic functions

    https://www.cnblogs.com/yejibigdata/p/6376409.html

      Translating Spark logical execution plans into natural language

      https://gitee.com/CaiXiaoBai5566/Spark-2.3.1?_from=gitee_search

  Spark job scheduling

  http://spark.apache.org/docs/latest/job-scheduling.html#configuration-and-setup

  sparkstreaming

    Watermark

    https://github.com/lw-lin/CoolplaySpark/blob/master/Structured%20Streaming%20源码解析系列/4.2%20Structured%20Streaming%20之%20Watermark%20解析.md

airflow

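  Study these examples: https://github.com/apache/airflow/tree/master/airflow/example_dags

  Study Airflow's visual scheduling, how to release to production, and how to work with QA on testing when requirements change (version management: production version, development versions v1, v2, and so on)

hive, sparksql: memory management, physical plans, execution plans

Raft protocol: https://myclusterbox.com/view/1309

flink python udf: https://mp.weixin.qq.com/s/-vKmJqBgyGtKHCd_mDp_Jw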

I. Review

  MySQL indexes: http://myclusterbox.com/view/408

Books: https://www.bookstack.cn/explore?cid=66&tab=popular

Kafka stress testing: https://blog.csdn.net/weixin_40596016/article/details/80576774

Hive optimization: https://www.cnblogs.com/mobiwangyue/p/9081672.html

II. Study plan

Spark tagging

  G:\Data\spark\course\精准广告推送之Spark实战用户画像及离线报表分析\精准广告推送之Spark实战用户画像及离线报表分析

    2.大数据实战之精准广告推送实战---几个思考三通it学院-www.santongit.com-

Recommendation

    G:\Data\推荐系统\慕课学院 推荐系统算法工程师-从入门到就业

Advertising algorithms

  https://github.com/ouwenjie03/tencent-ad-game

  https://github.com/YouChouNoBB/2018-tencent-ad-competition-baseline

  https://github.com/wangle1218/Advertising-algorithm-competition

  https://github.com/bettenW/Tencent2019_Finals_Rank1st

  https://github.com/freelzy/Tencent_Social_Ads

  https://github.com/ColaDrill/tx_competition

  https://github.com/Walter000/tencent_competition

  https://github.com/Dojocat-GO/Tencent2017_Final_Rank28_code

  https://github.com/jiarenyf/TencentSPA02-PreA

  https://github.com/zsyandjyhouse/TencentAD_contest

  https://github.com/loyalzc/tencent_ad

  https://github.com/keyunluo/Tencent2018_Lookalike_Rank10th

  https://github.com/linzhouzhi/spark_recommend

https://github.com/labuladong/fucking-algorithm

Recommendation algorithms

  

Notes

  mapreduce

  Which MapReduce steps are especially memory-intensive?

  hive

JVM and MySQL optimization

   https://pan.baidu.com/disk/home?#/all?vmode=list&path=%2F资源%2F其他%2F咕泡学院-性能优化

   https://pan.baidu.com/disk/home?#/all?vmode=list&path=%2F资源%2F其他%2F高级JAVA架构师之路%2F03.深入JVM内核—原理、诊断与优化

 hbase

    1. read, write, compact

    2. Source-code articles

https://open.weixin.qq.com/connect/oauth2/authorize?appid=wxb41ab7a2a4702cdd&redirect_uri=https%3a%2f%2fh5.hzmedia.com.cn%2ffreshReading%2farticle%3fid%3dp_5d29561cddffd_LhFNBSaE%26uuid%3dxiandu%26chapterid%3d10255%26price%3d99.00&response_type=code&scope=snsapi_userinfo&state=123&connect_redirect=1#wechat_redirect

Videos

  架构\年薪40万(深入大数据架构师之路)\*

         063hbase事务之mvcc详解以及和sql数据库的对比第64hbase物理存储原理解析.mp4

   Go

    Go programming basics

G:\Data\hive\course\04.大数据技术-SQL优化

flink 

   https://github.com/zhisheng17/flink-learning

    1. Official site: https://flink.apache.org/
    2. GitHub: https://github.com/apache/flink
    3. https://blog.csdn.net/column/details/apacheflink.html
    4. https://blog.csdn.net/lmalds/article/category/6263085
    5. http://wuchong.me/
    6. https://blog.csdn.net/liguohuabigdata/article/category/7279020

Data warehouse:

  E:\Workspace\Eclipse\notesummary\数据仓库\数据仓库原理和实践.pdf

  https://space.bilibili.com/215850638?spm_id_from=333.788.b_765f7570696e666f.2

2. Knowledge Planet (知识星球): jvm

    https://wx.zsxq.com/dweb/#/index/825821582152    (https://wx.zsxq.com/dweb/#/index/822528481482)

    jvm: https://apppukyptrl1086.pc.xiaoe-tech.com/detail/i_5d0efcdae26e1_FYwCKAxM/1

    Computational advertising: https://wx.zsxq.com/dweb2/index/group/5581582454

    Understanding the JVM in depth

   flink 

    https://wx.zsxq.com/dweb/#/index/822245485852

WeChat official account articles

TensorFlow

https://github.com/fendouai/Awesome-TensorFlow-Chinese

https://github.com/fendouai/Chinese-Text-Classification

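Geek Time (极客时间)

    Java technology in 36 lectures

    https://time.geekbang.org/column/82

    Dissecting the Java Virtual Machine

    https://time.geekbang.org/column/108

    Learning architecture from scratch

    https://time.geekbang.org/column/81

    Machine learning in 40 lectures

    https://time.geekbang.org/column/97

    Technology and business case studies

    https://time.geekbang.org/column/42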

    https://time.geekbang.org/column/42

go

BitMap(go)

III. Work plan

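  Platform packaging

Guidelines

1. rocksdb/redis/mongodb/hbase/tidb/cockroachdb/scylladb/neo4j/etcd
2. Familiarity with SQL protocols and consensus protocols such as Paxos and Raft is preferred
3. Kubernetes / Mesos / Yarn
4. Hands-on experience with resource scheduling on large clusters

Go development tool: liteide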


Note: this article belongs to the author and may not be reproduced without the author's permission.
