RenjieZhu

Hard work will pay off.

Context Free Grammars

ist664-week5

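Context-Free Grammars (CFG) belong to syntactic analysis, i.e. analyzing how words combine to form a sentence. Application: building a structured model of text based on the actual grammar of the language. CFG definition: a set of recursive rewriting rules (or productions) used to gener...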
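As a rough illustration of the rewriting-rule idea in the excerpt above, here is a minimal sketch that expands a start symbol into a sentence. The toy grammar, the symbol names, and the `generate` helper are all hypothetical and are not taken from the post.

```python
import random

# Hypothetical toy grammar: each non-terminal maps to a list of productions,
# and each production is a list of symbols (non-terminals or words).
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["Det", "N"]],
    "VP": [["V", "NP"], ["V"]],
    "Det": [["the"], ["a"]],
    "N": [["dog"], ["cat"]],
    "V": [["sees"], ["sleeps"]],
}


def generate(symbol, rng):
    """Recursively rewrite `symbol` until only terminal words remain."""
    if symbol not in GRAMMAR:
        # Anything without a production is treated as a terminal word.
        return [symbol]
    production = rng.choice(GRAMMAR[symbol])
    words = []
    for part in production:
        words.extend(generate(part, rng))
    return words


sentence = generate("S", random.Random(0))
print(" ".join(sentence))
```

Every sentence this grammar produces has either three words (determiner, noun, verb) or five (when the verb takes an object noun phrase).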

Introduction to Apache Spark (Part 1)

ist718-week4

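An introduction to Apache Spark and RDD operations (version 1.6). This post is heavy on images, mainly because I was too lazy to type. First, a quick review of the data science workflow. Which stages can Spark take part in? Example: analyzing website traffic from access logs. Suppose we want to predict the form of a visit from its access date; Spark's workflow: This doesn't seem very different from Hadoop? Let's first review Hadoop. Hadoop ...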

Introduction to Decision Trees

ist707-week5

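Decision Tree 707-5. Algorithm. How a tree splits: two-way or multi-way splits. 1. Categorical Attributes 2. Continuous Attributes. Which attribute does the tree choose as a node? Entropy: like entropy in the physical sciences, it describes the degree of disorder; measure the impurity of a data set (Nois...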
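To make the "impurity of a data set" idea concrete, here is a minimal sketch of an entropy calculation over class labels. It assumes the standard base-2 Shannon entropy; the excerpt is truncated before any formula, so the exact definition the post uses is an assumption, and the `entropy` helper and sample labels are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels.

    Assumed formula: H = sum over classes of -p * log2(p),
    where p is the fraction of labels belonging to that class.
    """
    total = len(labels)
    probabilities = [count / total for count in Counter(labels).values()]
    return sum(-p * math.log2(p) for p in probabilities)


mixed = ["yes"] * 5 + ["no"] * 5   # evenly split: most impure for two classes
pure = ["yes"] * 10                # a single class: no impurity

print(entropy(mixed))
print(entropy(pure))
```

An evenly split two-class set scores 1 bit and a single-class set scores 0, which is why a split that lowers entropy produces purer child nodes.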

Model Evaluation

ist707-week5

Model Evaluation. Model overfitting: the model fits the training data very well but generalizes poorly to unseen data. Test error: high. Training error: low. Higher model complexity -> lower tr...

Introduction to Part-of-Speech (POS) Tagging

nlp664-week5

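Introduction to Part-of-Speech (POS) tagging, nlp664-week5. POS tagging labels each word in a text with its part of speech, which tells us about the word and its neighboring words. English has 8 basic parts of speech; here I simply include screenshots of the lecture slides. It feels like a high-school English grammar class... On to the main topic: POS. Three commonly used POS corpora. The course mainly uses the Penn Treebank, ...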

Introduction to Hadoop

ist718-week3

Introduction to Hadoop. Apache Hadoop includes a storage system called the Hadoop Distributed File System (HDFS) and a computing system called MapReduce. Hadoop = HDFS + MapReduce. HDFS (Hadoop Distributed File System): uses commodity hardware, low...

Distributed Systems and CAP

718-lecture3

Distributed systems and CAP, 718-lecture3. Distributed systems: a distributed system is a collection of independent computers that appears to its users as a single coherent system. Two programs running separately on two...

Introduction to Association Rules

ist707_2

Introduction to Association Rules. Frequent Pattern Analysis. Association rule mining: given a set of transactions, find rules that will predict the occurrence of an item based on the occurrences of...

Introduction to k-means Clustering

ist707_3

Clustering Techniques. Source material for this post: ist707. Finding groups of objects (unsupervised learning). Minimize the intra-cluster distance. Maximize the inter-cluster distance. Apply: finding customers are...
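The goal of keeping intra-cluster distances small can be sketched with a bare-bones k-means loop. This is an illustrative sketch only: the `kmeans` helper, the sample points, and the starting centroids are hypothetical, and the post itself may present the algorithm differently.

```python
import math


def kmeans(points, centroids, iterations=10):
    """Alternate between assigning points and recomputing centroids."""
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for point in points:
            nearest = min(
                range(len(centroids)),
                key=lambda i: math.dist(point, centroids[i]),
            )
            clusters[nearest].append(point)
        # Update step: move each centroid to the mean of its cluster,
        # keeping the old centroid if the cluster is empty.
        centroids = [
            tuple(sum(axis) / len(cluster) for axis in zip(*cluster))
            if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters


points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centroids, clusters = kmeans(points, centroids=[(0.0, 0.0), (10.0, 10.0)])
print(centroids)
print(clusters)
```

With these two well-separated groups, the three points near the origin end up in the first cluster and the other three in the second.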

NLP Regular Expressions

Regular Expression

IST664-NLP Week 4: Regular Expression! Resources: Python Regex Cheatsheet, a regex testing website, regex cheatsheet. Lecture Content. Regular Expression: a tiny, highly specialized progr...
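For a quick taste of regular expressions in Python, here is a small sketch using the standard-library `re` module. The sample text and the two patterns are made up for illustration and do not come from the lecture.

```python
import re

# Hypothetical sample text, invented for this example.
text = "IST664 meets in week 4; IST707 meets in week 5."

# "IST" followed by exactly three digits.
course_codes = re.findall(r"IST\d{3}", text)

# Capture only the digits that follow the word "week".
week_numbers = [int(n) for n in re.findall(r"week (\d+)", text)]

print(course_codes)
print(week_numbers)
```

When a pattern contains a single capturing group, `re.findall` returns only the captured text, which is why the second call yields just the week numbers.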