bool_field() function

Create a boolean column specification for use in a schema.

USAGE

bool_field(
    p_true=0.5,
    nullable=False,
    null_probability=0.0,
    unique=False,
    generator=None,
)

The bool_field() function defines the constraints and behavior for a boolean column when generating synthetic data with generate_dataset(). The p_true= parameter controls the probability of generating True values, which is useful for simulating real-world distributions where events may be rare or common (e.g., 5% fraud rate, 80% active users).

By default, True and False are equally likely (p_true=0.5). Setting p_true=0.0 produces all False values, and p_true=1.0 produces all True values.
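Conceptually, p_true= acts as the success probability of an independent draw for each row. The sketch below illustrates that idea with the standard library only. It is not pointblank's implementation, and the helper name sample_bools is hypothetical.

```python
import random
from typing import List, Optional


def sample_bools(n: int, p_true: float = 0.5, seed: Optional[int] = None) -> List[bool]:
    """Draw n booleans, each True with probability p_true.

    Hypothetical helper for illustration; not part of pointblank.
    """
    rng = random.Random(seed)
    # rng.random() is uniform on [0.0, 1.0), so the comparison is True with
    # probability p_true: it never fires for 0.0 and always fires for 1.0.
    return [rng.random() < p_true for _ in range(n)]


# Simulate a rare event, such as a 5% fraud rate.
rare = sample_bools(10_000, p_true=0.05, seed=1)
print(sum(rare) / len(rare))  # close to 0.05
```

With a small n the observed share of True values can differ noticeably from p_true; it only settles near p_true as n grows.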

Parameters

p_true : float = 0.5

Probability of generating True. Must be between 0.0 and 1.0, inclusive. Default is 0.5 (True and False equally likely).

nullable : bool = False

Whether the column can contain null values. Default is False.

null_probability : float = 0.0

Probability of generating a null value for each row when nullable=True. Must be between 0.0 and 1.0. Default is 0.0.

unique : bool = False

Whether all values must be unique. Default is False. Note that a boolean column has only 2 distinct non-null values, so the n= passed to generate_dataset() must be <= 2 when unique=True (or <= 3 when nullable=True is also set).

generator : Callable[[], Any] | None = None

Custom callable that generates values. When provided, this overrides all other constraints. The callable should take no arguments and return a single boolean value.
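A custom generator is simply a zero-argument callable that returns a boolean. The sketch below builds one with the standard library; make_alternating_flag is a hypothetical helper, and the commented-out line shows how such a callable would be passed to bool_field(), based only on the generator= parameter described above.

```python
import itertools
from typing import Callable


def make_alternating_flag(start: bool = True) -> Callable[[], bool]:
    """Return a zero-argument callable yielding start, not start, start, ...

    Hypothetical helper for illustration; not part of pointblank.
    """
    values = itertools.cycle([start, not start])
    return lambda: next(values)


flag = make_alternating_flag()
print([flag() for _ in range(4)])  # [True, False, True, False]

# Assumed usage, following the generator= parameter documented above:
# schema = pb.Schema(toggle=pb.bool_field(generator=flag))
```

Because a generator overrides the other constraints, settings such as p_true= would not influence the values such a callable produces.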

Returns

BoolField

A boolean field specification that can be passed to Schema().

Raises

ValueError

If p_true is not between 0.0 and 1.0, or if null_probability is not between 0.0 and 1.0.
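The check described here amounts to a closed-interval range test. The following is a minimal sketch, assuming both endpoints are accepted (p_true=0.0 and p_true=1.0 are described as valid above). The helper check_probability is hypothetical, and the error message pointblank actually raises may be worded differently.

```python
def check_probability(name: str, value: float) -> float:
    """Return value if it lies in [0.0, 1.0], otherwise raise ValueError.

    Hypothetical helper for illustration; not part of pointblank.
    """
    # The chained comparison is also False for NaN, so NaN is rejected too.
    if not 0.0 <= value <= 1.0:
        raise ValueError(f"{name} must be between 0.0 and 1.0, got {value!r}")
    return value


check_probability("p_true", 0.8)  # passes silently

try:
    check_probability("null_probability", 1.5)
except ValueError as err:
    print(err)  # null_probability must be between 0.0 and 1.0, got 1.5
```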

Examples


The p_true= parameter controls the distribution of True/False values, allowing you to simulate different probabilities:

import pointblank as pb

schema = pb.Schema(
    is_active=pb.bool_field(p_true=0.8),
    is_premium=pb.bool_field(p_true=0.2),
    is_verified=pb.bool_field(),
)

pb.preview(pb.generate_dataset(schema, n=100, seed=23))
Polars, Rows: 100, Columns: 3

     is_active  is_premium  is_verified
     Boolean    Boolean     Boolean
  1  False      False       False
  2  False      False       False
  3  False      False       False
  4  True       True        True
  5  True       False       False
 96  True       False       True
 97  True       False       True
 98  False      False       False
 99  False      False       False
100  False      False       False

Optional boolean flags can be simulated by combining nullable=True with null_probability=:

schema = pb.Schema(
    opted_in=pb.bool_field(p_true=0.6),
    has_referral=pb.bool_field(
        p_true=0.3,
        nullable=True, null_probability=0.25,
    ),
)

pb.preview(pb.generate_dataset(schema, n=50, seed=42))
Polars, Rows: 50, Columns: 2

    opted_in  has_referral
    Boolean   Boolean
 1  False     None
 2  True      True
 3  True      None
 4  True      True
 5  False     False
46  True      True
47  True      None
48  True      True
49  False     False
50  True      None
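For the has_referral column above, the expected share of each outcome can be worked out by hand. The sketch below assumes the null decision is made first and independently, and that p_true= then applies only to the non-null rows. That ordering is an assumption of this sketch rather than something stated in this reference, and expected_shares is a hypothetical helper.

```python
from typing import Dict


def expected_shares(p_true: float, null_probability: float) -> Dict[str, float]:
    """Expected fraction of None, True and False values in a nullable column.

    Hypothetical helper for illustration; assumes nulls are drawn first and
    p_true applies only to the remaining non-null rows.
    """
    non_null = 1.0 - null_probability
    return {
        "None": null_probability,
        "True": non_null * p_true,
        "False": non_null * (1.0 - p_true),
    }


shares = expected_shares(p_true=0.3, null_probability=0.25)
print({k: round(v, 3) for k, v in shares.items()})
# roughly 25% None, 22.5% True, 52.5% False
```

With only 50 rows, the observed counts can deviate from these expectations by a fair margin.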

Boolean fields can be combined with other field types in a realistic schema:

schema = pb.Schema(
    user_id=pb.int_field(min_val=1, unique=True),
    name=pb.string_field(preset="name"),
    email_verified=pb.bool_field(p_true=0.9),
    is_admin=pb.bool_field(p_true=0.05),
)

pb.preview(pb.generate_dataset(schema, n=30, seed=10))
Polars, Rows: 30, Columns: 4

     user_id              name              email_verified  is_admin
     Int64                String            Boolean         Boolean
  1  300544187282452692   Jackson Oliver    True            False
  2  4450845842995915857  Adam Schultz      True            False
  3  136805169105849017   Adalyn Webb       True            False
  4  4266552081017657985  Thomas Duffy      True            False
  5  4531409703019074138  Denise Hill       True            False
 26  4055035012371305120  Brandon Hoffman   False           False
 27  3461005733469027473  Cecilia Klein     True            False
 28  5376594645785815587  Ryan Whitaker     True            True
 29  2173523867229340168  Theodore Bennett  True            True
 30  1798126872665261570  Walker Ryan       True            False