deleteBogusRows.Rd
If cases are mostly missing, delete them. When data is imported from other sources, noise rows often appear at the bottom of the input. Any row with missing values in more than, say, 90% of its columns is probably useless information. We invented this to deal with the problem that MS Excel users often include a marginal note at the bottom of a spreadsheet.
deleteBogusRows(dframe, pm = 0.9, drop = FALSE, verbose = TRUE, n = 25)
dframe
A data frame or matrix.
pm
The "proportion missing data" to be tolerated. Default 0.9.
drop
Default FALSE. If the data frame result is reduced to one row, should R's default drop behavior "demote" it to a column vector?
verbose
Default TRUE. Should a report be printed summarizing the information to be deleted?
n
Default 25. Limit on the number of values to print in the verbose diagnostic output. If set to NULL or NA, all of the column values will be printed for the bogus rows.
A data frame, returned invisibly.
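The rule described by the pm and drop arguments can be sketched in base R. This is a minimal illustration, not the package's implementation: the helper name dropMostlyMissing is hypothetical, and treating a row as bogus only when its missing share is strictly greater than pm is an assumption.

```r
# Hypothetical helper for illustration only; not the code behind deleteBogusRows.
dropMostlyMissing <- function(x, pm = 0.9, drop = FALSE) {
    # Share of missing values in each row (works for a matrix or a data frame)
    propMissing <- rowMeans(is.na(x))
    # Assumption: a row is kept when its missing share does not exceed pm
    keep <- propMissing <= pm
    # drop = FALSE keeps a one-row result as a matrix or data frame
    invisible(x[keep, , drop = drop])
}

m <- matrix(1:20, nrow = 4, ncol = 5)
m <- rbind(m, c(32, rep(NA, 4)))      # fifth row is 80% missing
res <- dropMostlyMissing(m, pm = 0.5)
dim(res)                               # 4 5

# With only one surviving row, drop decides whether the result stays a matrix
oneRow <- dropMostlyMissing(m[c(1, 5), ], pm = 0.5, drop = FALSE)
asVec  <- dropMostlyMissing(m[c(1, 5), ], pm = 0.5, drop = TRUE)
is.matrix(oneRow)                      # TRUE
is.matrix(asVec)                       # FALSE
```

Because the result is wrapped in invisible(), nothing prints unless the return value is assigned or inspected explicitly, which matches the "invisibly" note above.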
mymat <- matrix(rnorm(10 * 100), nrow = 10, ncol = 100,
                dimnames = list(1:10, paste0("x", 1:100)))
mymat <- rbind(mymat, c(32, rep(NA, 99)))
mymat2 <- deleteBogusRows(mymat)
#> deleteBogusRows Diagnostic
#> These rows from the data frame: mymat
#> are being purged: 11
#> The bad content was
#> x1 x2 x3 x4 x5 x6 x7 x8 x9 x10 x11 x12 x13 x14 x15 x16 x17 x18 x19 x20 x21 x22
#> 32 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
#> x23 x24 x25
#> NA NA NA
mydf <- as.data.frame(mymat)
mydf$someFactor <- factor(sample(c("A", "B"), size = NROW(mydf), replace = TRUE))
mydf2 <- deleteBogusRows(mydf, n = "all")
#> deleteBogusRows Diagnostic
#> These rows from the data frame: mydf
#> are being purged: 11
#> The bad content was
#> x1 x2 x3 x4 x5 x6 x7 x8 x9 x10 x11 x12 x13 x14 x15 x16 x17 x18 x19 x20 x21 x22
#> 32 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
#> x23 x24 x25 x26 x27 x28 x29 x30 x31 x32 x33 x34 x35 x36 x37 x38 x39 x40 x41
#> NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
#> x42 x43 x44 x45 x46 x47 x48 x49 x50 x51 x52 x53 x54 x55 x56 x57 x58 x59 x60
#> NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
#> x61 x62 x63 x64 x65 x66 x67 x68 x69 x70 x71 x72 x73 x74 x75 x76 x77 x78 x79
#> NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
#> x80 x81 x82 x83 x84 x85 x86 x87 x88 x89 x90 x91 x92 x93 x94 x95 x96 x97 x98
#> NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
#> x99 x100 someFactor
#> NA NA A