float_field()

Create a floating-point column specification for use in a schema.

USAGE

float_field(
    min_val=None,
    max_val=None,
    allowed=None,
    nullable=False,
    null_probability=0.0,
    unique=False,
    generator=None,
    dtype='Float64',
)

The float_field() function defines the constraints and behavior for a floating-point column when generating synthetic data with generate_dataset(). You can control the range of values with min_val= and max_val=, restrict values to a specific set with allowed=, enforce uniqueness with unique=True, and introduce null values with nullable=True and null_probability=. The dtype= parameter lets you choose between "Float32" and "Float64" precision.

When both min_val= and max_val= are provided, values are drawn from a uniform distribution across that range. If neither is specified, values are drawn uniformly from a large default range. If allowed= is provided, values are sampled from that specific list.
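
For instance, the three modes look like this (a minimal sketch; the column names are illustrative and not part of the API):

import pointblank as pb

schema = pb.Schema(
    bounded=pb.float_field(min_val=0.0, max_val=1.0),   # uniform over [0.0, 1.0]
    unbounded=pb.float_field(),                          # uniform over the default range
    tiered=pb.float_field(allowed=[1.5, 2.5, 3.5]),      # sampled from the allowed list
)

pb.preview(pb.generate_dataset(schema, n=10, seed=1))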

Parameters

min_val : float | None = None

Minimum value (inclusive). Default is None (no minimum).

max_val : float | None = None

Maximum value (inclusive). Default is None (no maximum).

allowed : list[float] | None = None

List of allowed values (categorical constraint). When provided, values are sampled from this list. Cannot be combined with min_val=/max_val=.

nullable : bool = False

Whether the column can contain null values. Default is False.

null_probability : float = 0.0

Probability of generating a null value for each row when nullable=True. Must be between 0.0 and 1.0. Default is 0.0.

unique : bool = False

Whether all values must be unique. Default is False. When True, the generator will retry until it produces n distinct values.
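
A rough sketch of unique=True in use (the column name and the final check are illustrative; the check assumes the generated frame is a Polars DataFrame, as the previews below suggest):

import pointblank as pb

# A wide range makes 1,000 distinct draws easy; a very narrow range
# combined with unique=True can make the retry loop slow or unable to finish.
schema = pb.Schema(
    sample_id=pb.float_field(min_val=0.0, max_val=1e6, unique=True),
)

df = pb.generate_dataset(schema, n=1_000, seed=5)
assert df["sample_id"].n_unique() == 1_000  # all values are distinct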

generator : Callable[[], Any] | None = None

Custom callable that generates values. When provided, this overrides all other constraints. The callable should take no arguments and return a single float value.
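
For example, supplying a generator alongside range bounds should simply bypass those bounds (a sketch assuming the bounds are ignored rather than rejected, per the description above):

import pointblank as pb

field = pb.float_field(
    min_val=0.0,
    max_val=1.0,
    generator=lambda: 42.0,  # the callable wins: every value is 42.0 and the bounds are ignored
)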

dtype : str = 'Float64'

Float dtype. Default is "Float64". Options: "Float32", "Float64".

Returns

FloatField

A float field specification that can be passed to Schema().

Raises

ValueError

If min_val is greater than max_val, if allowed is an empty list, if null_probability is not between 0.0 and 1.0, or if dtype is not a valid float type.
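
A quick sketch of these failure modes, assuming validation happens when float_field() itself is called:

import pointblank as pb

try:
    pb.float_field(min_val=10.0, max_val=1.0)  # min_val greater than max_val
except ValueError as e:
    print(e)

try:
    pb.float_field(allowed=[])  # empty allowed list
except ValueError as e:
    print(e)

try:
    pb.float_field(nullable=True, null_probability=1.5)  # probability outside [0.0, 1.0]
except ValueError as e:
    print(e)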

Examples


The min_val= and max_val= parameters define the generated value ranges:

import pointblank as pb

schema = pb.Schema(
    price=pb.float_field(min_val=0.01, max_val=9999.99),
    probability=pb.float_field(min_val=0.0, max_val=1.0),
    temperature=pb.float_field(min_val=-40.0, max_val=50.0),
)

pb.preview(pb.generate_dataset(schema, n=100, seed=23))

Polars | Rows: 100 | Columns: 3

       price                probability            temperature
       Float64              Float64                Float64
  1    9248.64401895442     0.9248652516259452     43.23787264633508
  2    9486.04880781621     0.9486057779931771     45.37452001938594
  3    8924.325591818912    0.8924333440485793     40.31900096437214
  4    835.5150972932996    0.08355067683068362    -32.48043908523847
  5    5920.270428312815    0.5920272268857353     13.282450419716177
 96    4446.926385790886    0.4446925279641446     0.022327516773010814
 97    3427.7653590611476   0.3427762214585577     -9.150140068729808
 98    8923.280842563525    0.8923288689140904     40.309598202268134
 99    8137.5531808932155   0.8137559456012128     33.238035104109144
100    8951.80870117522     0.8951816604808429     40.56634944327587

It’s also possible to restrict values to a discrete set with allowed=, which is useful for fixed pricing tiers or measurement levels:

schema = pb.Schema(
    discount=pb.float_field(allowed=[0.05, 0.10, 0.15, 0.20, 0.25]),
    weight_kg=pb.float_field(min_val=0.5, max_val=100.0),
)

pb.preview(pb.generate_dataset(schema, n=50, seed=42))

Polars | Rows: 50 | Columns: 2

      discount   weight_kg
      Float64    Float64
 1    0.05       64.12296644655943
 2    0.05       2.9885701446553603
 3    0.15       27.865417177727366
 4    0.1        22.709468445807865
 5    0.1        73.77888580931923
46    0.25       23.662693192922504
47    0.05       10.549642226268046
48    0.2        28.158373509454165
49    0.05       63.75060220430782
50    0.25       36.800801807523385

We can simulate missing measurements by introducing null values:

schema = pb.Schema(
    reading=pb.float_field(
        min_val=0.0, max_val=500.0,
        nullable=True, null_probability=0.2,
    ),
    calibration=pb.float_field(min_val=0.9, max_val=1.1),
)

pb.preview(pb.generate_dataset(schema, n=30, seed=7))

Polars | Rows: 30 | Columns: 2

      reading              calibration
      Float64              Float64
 1    161.91638241658117   0.9647665529666325
 2    75.42458696225096    0.9301698347849005
 3    None                 1.0301868946079709
 4    36.21814333377138    0.9144872573335086
 5    None                 1.007176400861338
26    58.89611903918418    0.9235584476156737
27    154.24091205096718   0.9616963648203869
28    408.0631795600157    1.0632252718240063
29    90.36318996196874    0.9361452759847875
30    290.8000818312331    1.0163200327324933

Setting dtype="Float32" gives reduced precision, and a custom generator= provides full control over value generation:

import math
import random

rng = random.Random(0)

schema = pb.Schema(
    sensor_value=pb.float_field(min_val=-10.0, max_val=10.0, dtype="Float32"),
    log_value=pb.float_field(generator=lambda: math.log(rng.uniform(1, 1000))),
)

pb.preview(pb.generate_dataset(schema, n=20, seed=99))

Polars | Rows: 20 | Columns: 2

      sensor_value          log_value
      Float64               Float64
 1    -1.9204385011266734   6.738836419047254
 2    -5.998491108501092    6.630942519000257
 3    -6.4239535882677545   6.042991461173114
 4    -5.031373629980624    5.559364739458459
 5    5.197548730161559     6.237862500009073
16    -2.520404335509925    5.526471683068103
17    -2.237762978450628    6.813264923713322
18    3.647376330582926     6.890408378292458
19    -6.95446931399654     6.697536613756536
20    3.2113579182328227    6.804906921310479