Column types

Currently, readr automatically recognises the following types of columns:

To recognise these columns, readr inspects the first 1000 rows of your dataset. This is not guaranteed to be perfect, but it’s fast and a reasonable heuristic. If you get a lot of parsing failures, you’ll need to re-read the file, overriding the default choices as described below.

You can also manually specify other column types:

Use the col_types argument to override the default choices. There are two ways to use it:
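As a minimal sketch of both approaches, assuming the two ways are a compact string with one letter per column and an explicit cols() specification, the example below reads a small made-up file twice. The file contents and the column names id, score, and label are hypothetical and exist only for illustration.

```r
library(readr)

# Write a small hypothetical CSV to a temporary file so the example is self-contained.
path <- tempfile(fileext = ".csv")
writeLines(c("id,score,label", "1,2.5,a", "2,3.5,b"), path)

# Way 1: a compact string, one letter per column
# (i = integer, d = double, c = character).
df1 <- read_csv(path, col_types = "idc")

# Way 2: a cols() specification that names each column explicitly.
df2 <- read_csv(path, col_types = cols(
  id = col_integer(),
  score = col_double(),
  label = col_character()
))

# Both calls should produce the same column types.
str(df1)
str(df2)
```

The compact string is quicker to type; the named form is easier to read back later and does not depend on column order.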

Column parsers

As well as specifying how to parse a column from a file on disk, each of the col_xyz() functions has an equivalent parse_xyz() that parses a character vector. These are useful for testing and examples, and for rapidly experimenting to figure out how to parse a vector given a few examples.
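A short sketch of that experimentation loop, using a made-up vector in which the last value is deliberately invalid. It relies on problems(), which readr provides for inspecting parse failures.

```r
library(readr)

# A few example values from a hypothetical column; "x" is deliberately invalid.
x <- parse_integer(c("1", "2", "x"))

# The value that failed to parse becomes NA (readr also emits a warning).
x

# problems() describes each failure: where it occurred and what was expected.
problems(x)

# Once the parser looks right, the matching column specification for
# reading a file is col_integer().
```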

Base types

parse_logical(), parse_integer(), parse_double(), and parse_character() are straightforward parsers that produce the corresponding atomic vector.
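For illustration, here is each of the four parsers applied to a small made-up input, with typeof() used to check the resulting atomic type.

```r
library(readr)

lgl <- parse_logical(c("TRUE", "FALSE", "T", "F"))
int <- parse_integer(c("1", "20", "-3"))
dbl <- parse_double(c("1.5", "2e3", "-0.25"))
chr <- parse_character(c("a", "b"))

# typeof() reports the atomic type each parser produced.
sapply(list(lgl, int, dbl, chr), typeof)
```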

Make sure to read vignette("locales") to learn how to deal with doubles.
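As a brief sketch of why the locale matters for doubles, the example below parses the same quantity written with two different decimal marks. The input strings are made up; the locale vignette remains the full reference.

```r
library(readr)

# Default locale: "." is the decimal mark.
with_point <- parse_double("1.23")

# A locale that uses "," as the decimal mark, as in much of Europe.
with_comma <- parse_double("1,23", locale = locale(decimal_mark = ","))

with_point
with_comma
```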

Numbers

parse_integer() and parse_double() are strict: the input string must be a single number with no leading or trailing characters. parse_number() is more flexible: it ignores non-numeric prefixes and suffixes, and knows how to deal with grouping marks. This makes it suitable for reading currencies and percentages:

parse_number(c("0%", "10%", "150%"))
#> [1]   0  10 150
parse_number(c("$1,234.5", "$12.45"))
#> [1] 1234.50   12.45

Note that parse_guess() will only guess that a string is a number if it has no leading or trailing characters; otherwise it’s too prone to false positives. That means you’ll typically need to explicitly supply the column type for number columns.

str(parse_guess("$1,234"))
#>  chr "$1,234"
str(parse_guess("1,234"))
#>  num 1234
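One way to supply the column type explicitly when reading a file is col_number(), the column counterpart of parse_number(). The sketch below assumes a small hypothetical file with a currency column; the file contents and the column names item and price are invented for illustration.

```r
library(readr)

# A hypothetical CSV with a currency column. The first price is quoted
# because it contains a comma.
path <- tempfile(fileext = ".csv")
writeLines(c("item,price", "widget,\"$1,234.50\"", "gadget,$12.45"), path)

# Ask for a flexible number parser on the price column instead of
# relying on guessing.
df <- read_csv(path, col_types = cols(
  item = col_character(),
  price = col_number()
))

df$price
```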

Date times

readr supports three types of date/time data: dates, times, and date times.

readr will guess date and date time fields if they’re in ISO8601 format:

parse_datetime("2010-10-01 21:45")
#> [1] "2010-10-01 21:45:00 UTC"
parse_date("2010-10-01")
#> [1] "2010-10-01"

Otherwise, you’ll need to specify the format yourself:

parse_datetime("1 January, 2010", "%d %B, %Y")
#> [1] "2010-01-01 UTC"
parse_datetime("02/02/15", "%m/%d/%y")
#> [1] "2015-02-02 UTC"
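The same idea applies to the date and time parsers. As a small sketch with made-up inputs, parse_date() takes a format string in the same way, and parse_time() handles a plain time of day:

```r
library(readr)

# A date in day/month/two-digit-year order needs an explicit format.
d <- parse_date("01/02/15", "%d/%m/%y")

# A time of day in hours:minutes:seconds layout parses without a format.
tm <- parse_time("20:10:01")

d
tm
```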

Factors

When reading a column that has a known set of values, you can read directly into a factor.

parse_factor(c("a", "b", "a"), levels = c("a", "b", "c"))
#> [1] a b a
#> Levels: a b c

readr will never turn a character vector into a factor unless you explicitly ask for it.
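When reading a file, one way to ask explicitly is col_factor(), the column counterpart of parse_factor(). The sketch below uses a hypothetical file; its contents and the column names id and grade are made up for illustration.

```r
library(readr)

# A hypothetical CSV whose grade column has a known set of values.
path <- tempfile(fileext = ".csv")
writeLines(c("id,grade", "1,a", "2,b", "3,a"), path)

# Request a factor for grade, listing every permitted level,
# including "c", which does not occur in this file.
df <- read_csv(path, col_types = cols(
  id = col_integer(),
  grade = col_factor(levels = c("a", "b", "c"))
))

levels(df$grade)
```

Listing the levels up front keeps them stable across files, even when a particular file does not contain every value.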