
Information on schools players attended, by player

Usage

data(CollegePlaying)

Format

A data frame with 17687 observations on the following 3 variables.

playerID

Player ID code

schoolID

School ID code

yearID

Year player attended school

Details

This data set reflects a change in the Lahman schema for the 2015 version. The old SchoolsPlayers table was replaced with this new table called CollegePlaying.

According to the documentation, this change reflects advances in the compilation of this data, largely led by Ted Turocy. The old table reported college attendance for major league players by listing a start date and end date. The new version has a separate record for each year that a player attended. This allows us to better account for players who attended multiple colleges or skipped a season, as well as to identify teammates.
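
Because each row is one player-season, questions about transfers, skipped seasons, and teammates reduce to grouping and joining on the three columns. The following is a minimal sketch on a small made-up data frame with the same column names; the player and school assignments in it are invented for illustration and are not taken from the database.

```r
# Toy data with the same columns as CollegePlaying (values are invented)
cp <- data.frame(
  playerID = c("aaa01", "aaa01", "aaa01", "bbb01", "bbb01", "ccc01"),
  schoolID = c("akron", "akron", "texas", "texas", "texas", "akron"),
  yearID   = c(1980, 1981, 1983, 1983, 1984, 1981),
  stringsAsFactors = FALSE
)

# Players who attended more than one school
nschools <- tapply(cp$schoolID, cp$playerID, function(x) length(unique(x)))
multi <- names(nschools)[nschools > 1]

# Players with a gap between their first and last recorded seasons
has_gap <- tapply(cp$yearID, cp$playerID,
                  function(y) length(unique(y)) < diff(range(y)) + 1)
gap_players <- names(has_gap)[has_gap]

# College teammates: pairs of players at the same school in the same year.
# The self-join produces every ordered pair; keeping playerID.a < playerID.b
# drops self-matches and mirrored duplicates.
pairs <- merge(cp, cp, by = c("schoolID", "yearID"), suffixes = c(".a", ".b"))
teammates <- pairs[pairs$playerID.a < pairs$playerID.b, ]
```

The same steps should carry over to the full table by replacing `cp` with `CollegePlaying`, although the self-join can become large for schools with many players per year.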

Source

Lahman, S. (2026) Lahman's Baseball Database, 1871-2025, 2026 version, https://sabr.org/lahman-database/

Examples

data(CollegePlaying)
head(CollegePlaying)
#>    playerID schoolID yearID
#> 1 birkbmi01    akron   1980
#> 2 birkbmi01    akron   1981
#> 3 birkbmi01    akron   1982
#> 4 birkbmi01    akron   1983
#> 5 dilauja01    akron   1962
#> 6 malasma01    akron   1998

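## Q: What are the top universities for producing MLB players?
## Note: table() counts rows, i.e. player-seasons, so a player who
## attended for four years adds four to that school's total.
SPcount <- table(CollegePlaying$schoolID)
SPcount[SPcount > 50]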
#> 
#>    alabama    arizona  arizonast   arkansas     auburn     baylor bostoncoll 
#>        155        162        236        108        122         99         71 
#>      brown        byu    cacerri california  calstfull    clemson  creighton 
#>        112         57         52        163        131        138         51 
#>  dartmouth       duke    florida  floridast    fordham   fresnost     gatech 
#>         64         94        138        152         99        103        137 
#> georgetown    georgia  holycross    houston   illinois    indiana  indianast 
#>         78         89        167         57        143         58         55 
#>       iowa     kentst   kentucky  longbeach loyolamary        lsu  manhattan 
#>         59         52         79         96         70        149         71 
#>   maryland    miamifl    miamioh   michigan michiganst  minnesota   missouri 
#>         60        113         63        192         71         87         72 
#>     missst    ncstate   nebraska  notredame   nwestern       ohio     ohiost 
#>        118         98         88        134         58         54        112 
#>   oklahoma    okstate    olemiss   oregonst     pennst pepperdine  princeton 
#>        135        132        108         70         58         87         66 
#>       rice    rutgers sandiegost santaclara  scarolina  setonhall  sillinois 
#>         83         52        103        112        119         70         57 
#>   stanford  stmarysca  tennessee      texas    texasam     tulane   txchrist 
#>        248         89         92        265        129         74         80 
#>     txtech       ucla      umass        unc      upenn        usc      vandy 
#>         53        180         52        154         88        250         65 
#>  villanova   virginia       wake     washst  wichitast  wisconsin  wmichigan 
#>         86         95         81         74        107         56         68 
#>       yale 
#>         59 

library("lattice")
dotplot(SPcount[SPcount>50])

dotplot(sort(SPcount[SPcount>50]))


## Q: How many schools are represented in this dataset?
length(table(CollegePlaying$schoolID))
#> [1] 1122

# Histogram of the number of player-season records per school:
with(CollegePlaying,
     hist(table(schoolID), xlab = "Number of player-seasons",
          main = ""))
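
The tabulations in these examples count rows, so a player appears once for every season attended. To count distinct players per school instead, one option is to drop repeated player-school pairs before tabulating. This is a sketch on a small made-up data frame (invented values, same column names), not part of the package's own examples.

```r
# Toy data with the same columns as CollegePlaying (values are invented)
cp <- data.frame(
  playerID = c("aaa01", "aaa01", "bbb01", "bbb01", "ccc01"),
  schoolID = c("akron", "akron", "texas", "texas", "akron"),
  yearID   = c(1980, 1981, 1983, 1984, 1981),
  stringsAsFactors = FALSE
)

# Rows per school: each player is counted once per season attended
seasons_per_school <- table(cp$schoolID)

# Distinct players per school: keep one row per player-school pair first
player_school <- unique(cp[, c("playerID", "schoolID")])
players_per_school <- table(player_school$schoolID)
```

On the toy data, akron has three player-season rows but only two distinct players, which is the difference the two tables are meant to expose.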