R-Organize Data(step 2)
Published: 2019-06-22


[I]Missing Value

1.identify missing value

| x | is.na(x) | is.nan(x) | is.infinite(x) |
|---|---|---|---|
| x<-NA | TRUE | FALSE | FALSE |
| x<-0/0 | TRUE | TRUE | FALSE |
| x<-1/0 | FALSE | FALSE | TRUE |
  • is.na(x):missing value
  • is.nan(x):impossible value
  • is.infinite(x):infinite value

complete.cases() treats NA and NaN as missing values; Inf and -Inf count as valid values.

> data(mydata,package="")        #load dataset mydata; fill in the package name
> mydata[complete.cases(mydata),]        #rows with no missing values
> mydata[!complete.cases(mydata),]        #rows with one or more missing values
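As a quick check of the table and of complete.cases(), here is a minimal sketch on toy data. The vector `x` and the data frame `df` are made up for illustration and are not from any package.

```r
# Hypothetical toy vector: an ordinary value, NA, NaN (0/0) and Inf (1/0)
x <- c(1, NA, 0/0, 1/0)

is.na(x)        # TRUE for both NA and NaN
is.nan(x)       # TRUE only for NaN
is.infinite(x)  # TRUE only for Inf / -Inf

# Hypothetical data frame: row 2 has NA, row 3 has NaN, row 4 has Inf
df <- data.frame(a = c(1, NA, 3, 4), b = c(10, 20, NaN, Inf))

complete_rows   <- df[complete.cases(df), ]   # rows 1 and 4 (Inf is a valid value)
incomplete_rows <- df[!complete.cases(df), ]  # rows 2 and 3
```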

2.missing pattern

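| Pattern | Package | Function | Description |
|---|---|---|---|
| table | mice | md.pattern(x) | 0: missing value; 1: no missing value |
| graphic | VIM | aggr(x,prop=FALSE,numbers=TRUE) | numbers=FALSE (default): omit the numeric labels |
| graphic | VIM | matrixplot(x,pch=,col=) | light color: small value; dark color: large value; red (default): missing value |
| correlation | none | none | correlate missingness indicators with the data (base R commands after this table) |

> x<-as.data.frame(abs(is.na(data)))        #shadow matrix: 1=missing,0=observed
> head(x,5)
> y<-x[which(apply(x,2,sum)>0)]        #keep variables with at least one missing value
> cor(data,y,use="pairwise.complete.obs")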
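The shadow-matrix commands can be tried on a small example. This is only a sketch: the data frame `df` and its columns are invented, and suppressWarnings() is used because some pairs have a constant indicator, for which cor() returns NA with a warning.

```r
# Hypothetical data: 'age' and 'income' have missing values, 'score' is complete
df <- data.frame(
  age    = c(25, NA, 31, 47, NA, 52),
  income = c(30, 42, NA, 58, 61, NA),
  score  = c(3.1, 4.0, 2.7, 3.9, 4.4, 2.5)
)

# Shadow matrix: 1 where a value is missing, 0 where it is observed
x <- as.data.frame(abs(is.na(df)))
head(x, 5)

# Keep only the indicator columns of variables that have missing values
y <- x[which(apply(x, 2, sum) > 0)]

# Rows: observed variables; columns: missingness indicators
r <- suppressWarnings(cor(df, y, use = "pairwise.complete.obs"))
r
```

A large absolute correlation in a cell suggests that missingness in the column variable is related to the values of the row variable.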

3.processing missing value

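| Method | Description |
|---|---|
| row deletion (listwise) | newdata<-na.omit(mydata) |
| MI (multiple imputation) | mice package, commands below |

> library(mice)
> imp<-mice(data,m)        #create m imputed datasets
> fit<-with(imp,analysis)        #run the analysis on each imputed dataset
> pooled<-pool(fit)        #combine the m results
> summary(pooled)

Other packages for missing data

| Package | Description |
|---|---|
| mvnmle | maximum likelihood estimation for multivariate normal data with missing values |
| cat | multiple imputation of multivariate categorical data under log-linear models |
| arrayImpute, arrayMissPattern, SeqKnn | microarray missing data |
| longitudinalData | related function list |
| kmi | Kaplan-Meier multiple imputation |
| mix | multiple imputation for mixed categorical and continuous data |
| pan | multiple imputation for multi-panel or clustered data |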
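Row deletion is the only method in this list that needs no extra package, so it is easy to try. A minimal sketch, assuming a made-up data frame `mydata`:

```r
# Hypothetical data: rows 2, 3 and 5 each contain at least one NA
mydata <- data.frame(
  id     = 1:5,
  height = c(170, NA, 165, 180, NA),
  weight = c(65, 72, NA, 80, 77)
)

# Listwise deletion: drop every row that has a missing value
newdata <- na.omit(mydata)

nrow(mydata)   # rows before deletion
nrow(newdata)  # rows after deletion
```

Here three of five rows are lost, which shows why row deletion is usually reasonable only when few rows are affected.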

[II]Date Value

| Function | Description |
|---|---|
| date() | output the current date and time |
| Sys.Date() | output the current date |
| as.Date(x,"input_format") | convert character to date |
| as.character(dates) | convert date to character |
| difftime(date1,date2,units=) | time interval; units="weeks"/"days"/"hours"/"mins"/"secs" |
| format(x,format="output_format") | output date in the specified format |

input/output format

| Symbol | Description | Example |
|---|---|---|
| %d | day of the month (01~31) | 01 |
| %a | abbreviated weekday name | Mon |
| %A | full weekday name | Monday |
| %m | month (01~12) | 01 |
| %b | abbreviated month name | Jan |
| %B | full month name | January |
| %y | two-digit year | 19 |
| %Y | four-digit year | 2019 |
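A short sketch of how the date functions and format symbols fit together. The dates are arbitrary, and only numeric symbols are used so the result does not depend on the locale.

```r
# Parse a character string with an explicit input format
d1 <- as.Date("06/22/2019", format = "%m/%d/%Y")

# The default input format is yyyy-mm-dd
d2 <- as.Date("2019-01-01")

# Output in a chosen format
out <- format(d1, format = "%d.%m.%y")    # "22.06.19"

# Date to character
s <- as.character(d1)                     # "2019-06-22"

# Time interval in different units
gap_days  <- difftime(d1, d2, units = "days")   # 172 days
gap_weeks <- difftime(d1, d2, units = "weeks")
```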

[III]Type Conversion

| Test | Conversion |
|---|---|
| is.numeric() | as.numeric() |
| is.character() | as.character() |
| is.vector() | as.vector() |
| is.matrix() | as.matrix() |
| is.data.frame() | as.data.frame() |
| is.factor() | as.factor() |
| is.logical() | as.logical() |
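The is.*() functions return TRUE or FALSE, and the as.*() functions return the converted object. A minimal sketch with made-up values:

```r
a <- c("1", "2", "3")
is.numeric(a)       # FALSE: a is a character vector
is.character(a)     # TRUE

b <- as.numeric(a)  # convert character to numeric
is.numeric(b)       # TRUE

f <- as.factor(c("low", "high", "low"))
is.factor(f)        # TRUE
codes <- as.integer(f)   # level codes; levels sort as "high", "low", so 2 1 2

m <- as.matrix(data.frame(x = 1:2, y = 3:4))
is.matrix(m)        # TRUE
```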

[IV]Data Sorting

> newdata<-dataframe[order(dataframe$x1,dataframe$x2),]        #ascending by default;put a minus sign before a numeric variable (-dataframe$x1) for descending
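For example, sorting by one variable ascending and a second one descending might look like this (the data frame `df` is hypothetical):

```r
# Hypothetical data
df <- data.frame(
  name = c("a", "b", "c", "d"),
  sex  = c("F", "M", "F", "M"),
  age  = c(30, 25, 41, 25)
)

# Ascending by sex, then descending by age within each sex
newdata <- df[order(df$sex, -df$age), ]
newdata
```

The minus sign only works on numeric variables; ties keep their original relative order.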

[V]Data merging

> total<-merge(dataframeA,dataframeB,by="x1")        #join on the key column x1
> total<-cbind(dataframeA,dataframeB)        #direct column merge (same number of rows required)
> total<-rbind(dataframeA,dataframeB)        #direct row merge (same variables required)
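The difference between the three is easiest to see on small inputs. A sketch with two invented data frames, `dfA` and `dfB`:

```r
# Hypothetical data frames sharing the key column 'id'
dfA <- data.frame(id = c(1, 2, 3), x = c("a", "b", "c"))
dfB <- data.frame(id = c(2, 3, 4), y = c(TRUE, FALSE, TRUE))

# merge() matches rows by key; by default only keys present in both are kept
total <- merge(dfA, dfB, by = "id")

# all = TRUE keeps unmatched keys too and fills the gaps with NA
total_all <- merge(dfA, dfB, by = "id", all = TRUE)

# cbind() pastes columns side by side without matching anything
wide <- cbind(dfA, z = c(0.1, 0.2, 0.3))

# rbind() stacks rows; both inputs need the same variables
long <- rbind(dfA, data.frame(id = 9, x = "z"))
```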

[VI]Subset of Dataset

> newdata<-dataframe[row indices,column indices]        #keep variables
> dataframe$x1<-dataframe$x2<-NULL        #delete variables x1,x2
> mysample<-dataframe[sample(1:nrow(dataframe),size,replace=),]        #size:number of rows to draw;replace=FALSE:without replacement,replace=TRUE:with replacement
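A minimal sketch of the three operations, assuming a made-up data frame `df` and a fixed seed so the sample is reproducible:

```r
# Hypothetical data
df <- data.frame(id = 1:10, x1 = (1:10)^2, x2 = letters[1:10])

# Keep rows with id > 5 and only the columns id and x1
newdata <- df[df$id > 5, c("id", "x1")]

# Delete a variable from a copy of the data
df2 <- df
df2$x2 <- NULL

# Random sample of 3 rows without replacement
set.seed(1234)
mysample <- df[sample(1:nrow(df), 3, replace = FALSE), ]
```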

[VII]Processing Functions

1.math functions

| Function | Description |
|---|---|
| abs(x) | absolute value |
| sqrt(x) | square root |
| ceiling(x) | smallest integer not less than x |
| floor(x) | largest integer not greater than x |
| trunc(x) | integer part of x, truncated toward 0 |
| round(x,digits=n) | round x to n decimal places |
| signif(x,digits=n) | round x to n significant digits |
| cos(x),sin(x),tan(x) | cosine, sine, tangent |
| acos(x),asin(x),atan(x) | arccosine, arcsine, arctangent |
| cosh(x),sinh(x),tanh(x) | hyperbolic cosine, hyperbolic sine, hyperbolic tangent |
| acosh(x),asinh(x),atanh(x) | inverse hyperbolic cosine, inverse hyperbolic sine, inverse hyperbolic tangent |
| log(x,base=n) | logarithm of x to base n; log(x): natural logarithm (base e); log10(x): base 10 |
| exp(x) | exponential function |
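The rounding functions differ mainly on negative numbers and in how digits are counted. A small sketch with an arbitrary value:

```r
x <- -3.567

ceiling(x)             # -3: smallest integer not less than x
floor(x)               # -4: largest integer not greater than x
trunc(x)               # -3: drops the fractional part, toward 0
round(x, digits = 2)   # -3.57: two decimal places
signif(x, digits = 2)  # -3.6: two significant digits

log(100, base = 10)    # 2
log(exp(1))            # 1: log() defaults to base e
sqrt(abs(-16))         # 4
```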

2.statistical function

| Function | Description |
|---|---|
| mean(x) | mean |
| median(x) | median |
| sd(x) | standard deviation |
| var(x) | variance |
| mad(x) | median absolute deviation |
| quantile(x,probs) | quantiles of x at the probabilities probs |
| range(x) | range (minimum and maximum) |
| sum(x) | sum |
| diff(x,lag=n) | lagged difference |
| min(x) | minimum |
| max(x) | maximum |
| scale(x,center=TRUE,scale=TRUE) | centering: center=TRUE; standardization: center=TRUE,scale=TRUE |
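A few of these on a small made-up vector, including a check that scale() really yields mean 0 and standard deviation 1:

```r
x <- c(2, 4, 4, 5, 7, 9, 11)

mean(x)                     # 6
median(x)                   # 5
sd(x)
var(x)                      # equals sd(x)^2
quantile(x, c(0.25, 0.75))  # lower and upper quartiles
range(x)                    # 2 11
d <- diff(x, lag = 1)       # 2 0 1 2 2 2

# Standardize: subtract the mean, then divide by the standard deviation
z <- scale(x, center = TRUE, scale = TRUE)
mean(z)                     # approximately 0
sd(as.vector(z))            # 1
```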

3.probability function

> [d/p/q/r]distribution_abbreviation()
  • d=density function
  • p=distribution function
  • q=quantile function
  • r=random number generation

| Distribution | Abbreviation |
|---|---|
| Beta | beta |
| Binomial | binom |
| Cauchy | cauchy |
| Chi-square | chisq |
| Exponential | exp |
| F | f |
| Gamma | gamma |
| Geometric | geom |
| Hypergeometric | hyper |
| Lognormal | lnorm |
| Logistic | logis |
| Multinomial | multinom |
| Negative Binomial | nbinom |
| Normal | norm |
| Poisson | pois |
| Wilcoxon signed rank | signrank |
| T | t |
| Uniform | unif |
| Weibull | weibull |
| Wilcoxon rank sum | wilcox |
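Combining a prefix with an abbreviation gives the function name, for example d + norm = dnorm(). A short sketch using the normal and binomial distributions (the parameter values are arbitrary):

```r
dnorm(0)             # density of the standard normal at 0, i.e. 1/sqrt(2*pi)
p <- pnorm(1.96)     # P(X <= 1.96), about 0.975
q <- qnorm(p)        # quantile function inverts pnorm, giving back 1.96

# Random numbers; set.seed() makes the draw reproducible
set.seed(42)
r <- rnorm(5, mean = 50, sd = 10)

# Probability of exactly 3 successes in 10 trials with success probability 0.5
b <- dbinom(3, size = 10, prob = 0.5)   # choose(10, 3) / 2^10
```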

4.character processing function

| Function | Description |
|---|---|
| nchar(x) | number of characters in x |
| substr(x,start,stop) | extract or replace a substring in a character vector |
| grep(pattern,x,ignore.case=FALSE,fixed=FALSE) | search for pattern in x and return the matching indices; fixed=FALSE: pattern is a regular expression; fixed=TRUE: pattern is a text string |
| sub(pattern,replacement,x,ignore.case=FALSE,fixed=FALSE) | search for pattern in x and replace the first match with replacement |
| strsplit(x,split,fixed=FALSE) | split the elements of x at split |
| paste(...,sep="") | concatenate strings with separator sep |
| toupper(x) | convert to uppercase |
| tolower(x) | convert to lowercase |
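A minimal sketch of the character functions on made-up strings. The last two grep() calls show what fixed= changes.

```r
x <- "Organize Data"
n_chars    <- nchar(x)           # 13
first_word <- substr(x, 1, 8)    # "Organize"

fruits <- c("apple", "Banana", "cherry")
grep("an", fruits)                      # 2: only "Banana" contains "an"
grep("A", fruits, ignore.case = TRUE)   # 1 2: case is ignored

# fixed = FALSE treats "." as a regular expression (any character)
hits_regex <- grep("a.c", c("abc", "a.c"))                 # 1 2
# fixed = TRUE treats "." as a literal dot
hits_fixed <- grep("a.c", c("abc", "a.c"), fixed = TRUE)   # 2

replaced <- sub("\\s", ".", "Hello There")   # "Hello.There"
parts    <- strsplit("a,b,c", ",")[[1]]      # "a" "b" "c"
labels   <- paste("x", 1:3, sep = "")        # "x1" "x2" "x3"
upper    <- toupper("abc")                   # "ABC"
lower    <- tolower("ABC")                   # "abc"
```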

5.others

| Function | Description |
|---|---|
| length(x) | the length of x |
| seq(from,to,by) | generate a sequence |
| rep(x,n) | repeat x n times |
| cut(x,n) | divide continuous x into a factor with n levels |
| pretty(x,n) | create pretty breakpoints that divide x into about n intervals |
| cat(...,file="myfile",append=FALSE) | concatenate the objects in ... and output them to the screen or to a file |
| apply(x,MARGIN,FUN,...) | x: data; MARGIN: dimension index (1=rows, 2=columns); FUN: function to apply |
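A quick sketch of these helpers with arbitrary inputs. apply() is shown on a small matrix with MARGIN=1 (rows) and MARGIN=2 (columns).

```r
length(c(3, 1, 4))             # 3
s <- seq(1, 10, by = 3)        # 1 4 7 10
r <- rep(1:3, 2)               # 1 2 3 1 2 3

groups <- cut(c(1, 5, 9), 3)   # factor with 3 levels (intervals)
breaks <- pretty(c(0, 97), 5)  # rounded breakpoints covering 0..97

cat("Hello", "world\n")        # prints to the screen when no file is given

m <- matrix(1:6, nrow = 2)     # columns are (1,2), (3,4), (5,6)
row_sums  <- apply(m, 1, sum)  # 9 12
col_means <- apply(m, 2, mean) # 1.5 3.5 5.5
```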

User-defined function

> myfunction<-function(arg1,arg2,...){
      statements
      return(object)
  }
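As an illustration of the template, here is a hypothetical function `mystats` (the name and its `robust` argument are made up) that returns a center and a spread and passes extra arguments such as na.rm through `...`:

```r
# Hypothetical example: summarize a numeric vector
mystats <- function(x, robust = FALSE, ...) {
  if (robust) {
    center <- median(x, ...)   # robust center
    spread <- mad(x, ...)      # robust spread
  } else {
    center <- mean(x, ...)
    spread <- sd(x, ...)
  }
  return(list(center = center, spread = spread))
}

v <- c(1, 2, 3, 4, 100)
res        <- mystats(v)                  # mean-based, pulled up by the outlier
res_robust <- mystats(v, robust = TRUE)   # median-based
res_na     <- mystats(c(v, NA), na.rm = TRUE)  # na.rm is passed through ...
```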

[VIII]Control Flow

| Description | Function |
|---|---|
| Repeat and Loop | for (var in seq) statement |
| | while (cond) statement |
| Conditional Execution | if (cond) statement |
| | if (cond) statement1 else statement2 |
| | ifelse(cond,statement1,statement2) |
| | switch(expr,...) |
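One short example of each construct, with made-up values. Note that ifelse() is vectorized while if/else tests a single condition.

```r
# for: sum the integers 1..5
total <- 0
for (i in 1:5) total <- total + i        # total is 15

# while: count up until the condition fails
n <- 0
while (n < 3) n <- n + 1                 # n is 3

# if / else: a single condition
x <- 7
if (x %% 2 == 0) parity <- "even" else parity <- "odd"

# ifelse: element-wise over a vector
scores  <- c(55, 80, 67)
outcome <- ifelse(scores >= 60, "pass", "fail")

# switch: pick a branch by name; the unnamed last value is the fallback
feeling <- "sad"
msg <- switch(feeling,
              happy = "Glad to hear it",
              sad   = "Cheer up",
              "Unknown feeling")
```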

[IX]Aggregate and Reshape

| Function | Description |
|---|---|
| t(x) | transpose |
| aggregate(x,by,FUN) | x: data; by: a list of grouping variables; FUN: function |
| melt(x,id.vars) | reshape2 package; melt x into long format; id.vars: identifier variables |
| dcast(md,formula,fun.aggregate) | md: melted data; formula: row variables ~ column variables; fun.aggregate: aggregating function |
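A sketch of the four functions. t() and aggregate() use base R and the built-in mtcars data; the melt()/dcast() part runs only if the reshape2 package happens to be installed, and its data frame `df` is invented.

```r
# Transpose a 2 x 3 matrix into a 3 x 2 matrix
m  <- matrix(1:6, nrow = 2)
tm <- t(m)

# Mean mpg and hp for each number of cylinders
agg <- aggregate(mtcars[c("mpg", "hp")], by = list(cyl = mtcars$cyl), FUN = mean)

# melt/dcast need reshape2, so guard the calls
if (requireNamespace("reshape2", quietly = TRUE)) {
  df <- data.frame(id = c(1, 1, 2, 2), time = c(1, 2, 1, 2),
                   x1 = c(5, 3, 6, 2), x2 = c(6, 5, 1, 4))
  # Long format: one row per id/time/variable combination
  md <- reshape2::melt(df, id.vars = c("id", "time"))
  # Back to wide: mean of each measured variable per id
  wide <- reshape2::dcast(md, id ~ variable, fun.aggregate = mean)
}
```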

[X]component analysis

[Figure: steps of principal component analysis / exploratory factor analysis]

principal component analysis

1.determine the number of principal components

> library(psych)
> fa.parallel(Harman23.cor$cov,n.obs=302,fa="pc",n.iter=100,
              show.legend=FALSE,main="Scree plot with parallel analysis")

2.extracting the main component

> pc<-principal(r=USJudgeRatings[,-1],nfactors=1,rotate=,scores=)        #r:raw data or correlation matrix;rotate:rotation method,default "varimax";scores:whether to compute component scores

3.principal component rotation

> rc<-principal(Harman23.cor$cov,nfactors=2,rotate="varimax")

4.get the principal component score coefficients

> round(unclass(rc$weights),2)
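The calls above need the psych package. As a rough cross-check with base R only, prcomp() can be run on the same USJudgeRatings data. This is a swapped-in function, not principal(): it applies no rotation and scales its loadings differently, so the numbers are not directly comparable.

```r
# Base R PCA on the 11 rating variables (first column CONT dropped, as above)
pca <- prcomp(USJudgeRatings[, -1], scale. = TRUE)

# Proportion of variance explained by each component
var_explained <- pca$sdev^2 / sum(pca$sdev^2)
round(var_explained, 2)

# Scree plot to judge how many components to keep
plot(pca$sdev^2, type = "b",
     xlab = "Component", ylab = "Eigenvalue", main = "Scree plot")

# Scores of each judge on the first component
scores <- pca$x[, 1]
head(scores)
```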

exploratory factor analysis

1.determine the number of common factors

> library(psych)
> covariances<-ability.cov$cov
> correlations<-cov2cor(covariances)
> fa.parallel(correlations,n.obs=112,fa="both",n.iter=100,
              main="Scree plots with parallel analysis")

2.extracting common factor

> fa<-fa(correlations,nfactors=2,rotate="none",fm="pa")

3.factor rotation

> fa.varimax<-fa(correlations,nfactors=2,rotate="varimax",fm="pa")        #orthogonal
> fa.promax<-fa(correlations,nfactors=2,rotate="promax",fm="pa")        #oblique
> factor.plot(fa.promax,labels=rownames(fa.promax$loadings))
> fa.diagram(fa.promax,simple=FALSE)

4.factor score coefficients

> fa.promax$weights
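If psych is not available, base R's factanal() gives a comparable two-factor solution for the same ability.cov data. This is an assumption-laden stand-in: factanal() fits by maximum likelihood, not by the principal axis method (fm="pa") used above, so the loadings will differ somewhat.

```r
# Two-factor model with an orthogonal (varimax) rotation
fit_varimax <- factanal(factors = 2, covmat = ability.cov, rotation = "varimax")

# Same model with an oblique (promax) rotation
fit_promax <- factanal(factors = 2, covmat = ability.cov, rotation = "promax")

# Show loadings, hiding small values to make the pattern easier to read
print(fit_varimax$loadings, cutoff = 0.3)
print(fit_promax$loadings, cutoff = 0.3)

# Uniqueness: share of each variable's variance not explained by the factors
round(fit_varimax$uniquenesses, 2)
```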

Reposted from: http://yksoa.baihongyu.com/
