[2024-03-07 13:44:20,370] INFO: Will use single-gpu: NVIDIA RTX A6000
[2024-03-07 13:44:20,370] INFO: using dtype=torch.bfloat16
[2024-03-07 13:44:20,374] INFO: using attention_type=efficient
[2024-03-07 13:44:20,376] INFO: using attention_type=efficient
[2024-03-07 13:44:20,378] INFO: using attention_type=efficient
[2024-03-07 13:44:20,380] INFO: using attention_type=efficient
[2024-03-07 13:44:20,383] INFO: using attention_type=efficient
[2024-03-07 13:44:20,385] INFO: using attention_type=efficient
[2024-03-07 13:44:21,151] INFO: MLPF(
  (nn0): Sequential(
    (0): Linear(in_features=42, out_features=256, bias=True)
    (1): ELU(alpha=1.0)
    (2): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
    (3): Dropout(p=0.3, inplace=False)
    (4): Linear(in_features=256, out_features=256, bias=True)
  )
  (conv_id): ModuleList(
    (0-2): 3 x SelfAttentionLayer(
      (mha): MultiheadAttention(
        (out_proj): NonDynamicallyQuantizableLinear(in_features=256, out_features=256, bias=True)
      )
      (norm0): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
      (norm1): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
      (seq): Sequential(
        (0): Linear(in_features=256, out_features=256, bias=True)
        (1): ELU(alpha=1.0)
        (2): Linear(in_features=256, out_features=256, bias=True)
        (3): ELU(alpha=1.0)
      )
      (dropout): Dropout(p=0.3, inplace=False)
    )
  )
  (conv_reg): ModuleList(
    (0-2): 3 x SelfAttentionLayer(
      (mha): MultiheadAttention(
        (out_proj): NonDynamicallyQuantizableLinear(in_features=256, out_features=256, bias=True)
      )
      (norm0): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
      (norm1): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
      (seq): Sequential(
        (0): Linear(in_features=256, out_features=256, bias=True)
        (1): ELU(alpha=1.0)
        (2): Linear(in_features=256, out_features=256, bias=True)
        (3): ELU(alpha=1.0)
      )
      (dropout): Dropout(p=0.3, inplace=False)
    )
  )
  (nn_id): Sequential(
    (0): Linear(in_features=810, out_features=256, bias=True)
    (1): ELU(alpha=1.0)
    (2): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
    (3): Dropout(p=0.3, inplace=False)
    (4): Linear(in_features=256, out_features=9, bias=True)
  )
  (nn_pt): RegressionOutput(
    (nn): Sequential(
      (0): Linear(in_features=819, out_features=256, bias=True)
      (1): ELU(alpha=1.0)
      (2): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
      (3): Dropout(p=0.3, inplace=False)
      (4): Linear(in_features=256, out_features=2, bias=True)
    )
  )
  (nn_eta): RegressionOutput(
    (nn): Sequential(
      (0): Linear(in_features=819, out_features=256, bias=True)
      (1): ELU(alpha=1.0)
      (2): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
      (3): Dropout(p=0.3, inplace=False)
      (4): Linear(in_features=256, out_features=2, bias=True)
    )
  )
  (nn_sin_phi): RegressionOutput(
    (nn): Sequential(
      (0): Linear(in_features=819, out_features=256, bias=True)
      (1): ELU(alpha=1.0)
      (2): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
      (3): Dropout(p=0.3, inplace=False)
      (4): Linear(in_features=256, out_features=2, bias=True)
    )
  )
  (nn_cos_phi): RegressionOutput(
    (nn): Sequential(
      (0): Linear(in_features=819, out_features=256, bias=True)
      (1): ELU(alpha=1.0)
      (2): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
      (3): Dropout(p=0.3, inplace=False)
      (4): Linear(in_features=256, out_features=2, bias=True)
    )
  )
  (nn_energy): RegressionOutput(
    (nn): Sequential(
      (0): Linear(in_features=819, out_features=256, bias=True)
      (1): ELU(alpha=1.0)
      (2): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
      (3): Dropout(p=0.3, inplace=False)
      (4): Linear(in_features=256, out_features=2, bias=True)
    )
  )
  (nn_charge): Sequential(
    (0): Linear(in_features=819, out_features=256, bias=True)
    (1): ELU(alpha=1.0)
    (2): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
    (3): Dropout(p=0.3, inplace=False)
    (4): Linear(in_features=256, out_features=3, bias=True)
  )
  (nn_probX): Sequential(
    (0): Linear(in_features=819, out_features=256, bias=True)
    (1): ELU(alpha=1.0)
    (2): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
    (3): Dropout(p=0.3, inplace=False)
    (4): Linear(in_features=256, out_features=1, bias=True)
  )
)
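For orientation, the repr above implies the following shape for each SelfAttentionLayer. This is a minimal sketch reconstructed from the printed modules only, not the project's source code; the head count (num_heads=8) and the residual/normalization ordering are assumptions that cannot be read off the repr.

import torch
import torch.nn as nn

class SelfAttentionLayer(nn.Module):
    # Reconstructed from the module printout above; num_heads and the exact
    # residual/pre-vs-post-norm order are assumptions, not taken from the log.
    def __init__(self, embed_dim=256, num_heads=8, dropout=0.3):
        super().__init__()
        self.mha = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)
        self.norm0 = nn.LayerNorm(embed_dim)
        self.norm1 = nn.LayerNorm(embed_dim)
        self.seq = nn.Sequential(
            nn.Linear(embed_dim, embed_dim), nn.ELU(),
            nn.Linear(embed_dim, embed_dim), nn.ELU(),
        )
        self.dropout = nn.Dropout(dropout)

    def forward(self, x, key_padding_mask=None):
        # self-attention block with a residual connection
        attn_out, _ = self.mha(x, x, x, key_padding_mask=key_padding_mask)
        x = self.norm0(x + attn_out)
        # position-wise feed-forward block with a residual connection
        x = self.norm1(x + self.seq(x))
        return self.dropout(x)

The model stacks three such layers for the identification head (conv_id) and three for the regression heads (conv_reg), as shown in the ModuleList entries above.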
[2024-03-07 13:44:21,152] INFO: Trainable parameters: 4139031
[2024-03-07 13:44:21,152] INFO: Non-trainable parameters: 0
[2024-03-07 13:44:21,152] INFO: Total parameters: 4139031
[2024-03-07 13:44:21,157] INFO: Modules  Trainable params  Non-trainable params  Trainable Parameters  Non-trainable Parameters
nn0.0.weight  NaN  NaN  10752.0  -
nn0.0.bias  NaN  NaN  256.0  -
nn0.2.weight  NaN  NaN  256.0  -
nn0.2.bias  NaN  NaN  256.0  -
nn0.4.weight  NaN  NaN  65536.0  -
nn0.4.bias  NaN  NaN  256.0  -
conv_id.0.mha.in_proj_weight  NaN  NaN  196608.0  -
conv_id.0.mha.in_proj_bias  NaN  NaN  768.0  -
conv_id.0.mha.out_proj.weight  NaN  NaN  65536.0  -
conv_id.0.mha.out_proj.bias  NaN  NaN  256.0  -
conv_id.0.norm0.weight  NaN  NaN  256.0  -
conv_id.0.norm0.bias  NaN  NaN  256.0  -
conv_id.0.norm1.weight  NaN  NaN  256.0  -
conv_id.0.norm1.bias  NaN  NaN  256.0  -
conv_id.0.seq.0.weight  NaN  NaN  65536.0  -
conv_id.0.seq.0.bias  NaN  NaN  256.0  -
conv_id.0.seq.2.weight  NaN  NaN  65536.0  -
conv_id.0.seq.2.bias  NaN  NaN  256.0  -
conv_id.1.mha.in_proj_weight  NaN  NaN  196608.0  -
conv_id.1.mha.in_proj_bias  NaN  NaN  768.0  -
conv_id.1.mha.out_proj.weight  NaN  NaN  65536.0  -
conv_id.1.mha.out_proj.bias  NaN  NaN  256.0  -
conv_id.1.norm0.weight  NaN  NaN  256.0  -
conv_id.1.norm0.bias  NaN  NaN  256.0  -
conv_id.1.norm1.weight  NaN  NaN  256.0  -
conv_id.1.norm1.bias  NaN  NaN  256.0  -
conv_id.1.seq.0.weight  NaN  NaN  65536.0  -
conv_id.1.seq.0.bias  NaN  NaN  256.0  -
conv_id.1.seq.2.weight  NaN  NaN  65536.0  -
conv_id.1.seq.2.bias  NaN  NaN  256.0  -
conv_id.2.mha.in_proj_weight  NaN  NaN  196608.0  -
conv_id.2.mha.in_proj_bias  NaN  NaN  768.0  -
conv_id.2.mha.out_proj.weight  NaN  NaN  65536.0  -
conv_id.2.mha.out_proj.bias  NaN  NaN  256.0  -
conv_id.2.norm0.weight  NaN  NaN  256.0  -
conv_id.2.norm0.bias  NaN  NaN  256.0  -
conv_id.2.norm1.weight  NaN  NaN  256.0  -
conv_id.2.norm1.bias  NaN  NaN  256.0  -
conv_id.2.seq.0.weight  NaN  NaN  65536.0  -
conv_id.2.seq.0.bias  NaN  NaN  256.0  -
conv_id.2.seq.2.weight  NaN  NaN  65536.0  -
conv_id.2.seq.2.bias  NaN  NaN  256.0  -
conv_reg.0.mha.in_proj_weight  NaN  NaN  196608.0  -
conv_reg.0.mha.in_proj_bias  NaN  NaN  768.0  -
conv_reg.0.mha.out_proj.weight  NaN  NaN  65536.0  -
conv_reg.0.mha.out_proj.bias  NaN  NaN  256.0  -
conv_reg.0.norm0.weight  NaN  NaN  256.0  -
conv_reg.0.norm0.bias  NaN  NaN  256.0  -
conv_reg.0.norm1.weight  NaN  NaN  256.0  -
conv_reg.0.norm1.bias  NaN  NaN  256.0  -
conv_reg.0.seq.0.weight  NaN  NaN  65536.0  -
conv_reg.0.seq.0.bias  NaN  NaN  256.0  -
conv_reg.0.seq.2.weight  NaN  NaN  65536.0  -
conv_reg.0.seq.2.bias  NaN  NaN  256.0  -
conv_reg.1.mha.in_proj_weight  NaN  NaN  196608.0  -
conv_reg.1.mha.in_proj_bias  NaN  NaN  768.0  -
conv_reg.1.mha.out_proj.weight  NaN  NaN  65536.0  -
conv_reg.1.mha.out_proj.bias  NaN  NaN  256.0  -
conv_reg.1.norm0.weight  NaN  NaN  256.0  -
conv_reg.1.norm0.bias  NaN  NaN  256.0  -
conv_reg.1.norm1.weight  NaN  NaN  256.0  -
conv_reg.1.norm1.bias  NaN  NaN  256.0  -
conv_reg.1.seq.0.weight  NaN  NaN  65536.0  -
conv_reg.1.seq.0.bias  NaN  NaN  256.0  -
conv_reg.1.seq.2.weight  NaN  NaN  65536.0  -
conv_reg.1.seq.2.bias  NaN  NaN  256.0  -
conv_reg.2.mha.in_proj_weight  NaN  NaN  196608.0  -
conv_reg.2.mha.in_proj_bias  NaN  NaN  768.0  -
conv_reg.2.mha.out_proj.weight  NaN  NaN  65536.0  -
conv_reg.2.mha.out_proj.bias  NaN  NaN  256.0  -
conv_reg.2.norm0.weight  NaN  NaN  256.0  -
conv_reg.2.norm0.bias  NaN  NaN  256.0  -
conv_reg.2.norm1.weight  NaN  NaN  256.0  -
conv_reg.2.norm1.bias  NaN  NaN  256.0  -
conv_reg.2.seq.0.weight  NaN  NaN  65536.0  -
conv_reg.2.seq.0.bias  NaN  NaN  256.0  -
conv_reg.2.seq.2.weight  NaN  NaN  65536.0  -
conv_reg.2.seq.2.bias  NaN  NaN  256.0  -
nn_id.0.weight  NaN  NaN  207360.0  -
nn_id.0.bias  NaN  NaN  256.0  -
nn_id.2.weight  NaN  NaN  256.0  -
nn_id.2.bias  NaN  NaN  256.0  -
nn_id.4.weight  NaN  NaN  2304.0  -
nn_id.4.bias  NaN  NaN  9.0  -
nn_pt.nn.0.weight  NaN  NaN  209664.0  -
nn_pt.nn.0.bias  NaN  NaN  256.0  -
nn_pt.nn.2.weight  NaN  NaN  256.0  -
nn_pt.nn.2.bias  NaN  NaN  256.0  -
nn_pt.nn.4.weight  NaN  NaN  512.0  -
nn_pt.nn.4.bias  NaN  NaN  2.0  -
nn_eta.nn.0.weight  NaN  NaN  209664.0  -
nn_eta.nn.0.bias  NaN  NaN  256.0  -
nn_eta.nn.2.weight  NaN  NaN  256.0  -
nn_eta.nn.2.bias  NaN  NaN  256.0  -
nn_eta.nn.4.weight  NaN  NaN  512.0  -
nn_eta.nn.4.bias  NaN  NaN  2.0  -
nn_sin_phi.nn.0.weight  NaN  NaN  209664.0  -
nn_sin_phi.nn.0.bias  NaN  NaN  256.0  -
nn_sin_phi.nn.2.weight  NaN  NaN  256.0  -
nn_sin_phi.nn.2.bias  NaN  NaN  256.0  -
nn_sin_phi.nn.4.weight  NaN  NaN  512.0  -
nn_sin_phi.nn.4.bias  NaN  NaN  2.0  -
nn_cos_phi.nn.0.weight  NaN  NaN  209664.0  -
nn_cos_phi.nn.0.bias  NaN  NaN  256.0  -
nn_cos_phi.nn.2.weight  NaN  NaN  256.0  -
nn_cos_phi.nn.2.bias  NaN  NaN  256.0  -
nn_cos_phi.nn.4.weight  NaN  NaN  512.0  -
nn_cos_phi.nn.4.bias  NaN  NaN  2.0  -
nn_energy.nn.0.weight  NaN  NaN  209664.0  -
nn_energy.nn.0.bias  NaN  NaN  256.0  -
nn_energy.nn.2.weight  NaN  NaN  256.0  -
nn_energy.nn.2.bias  NaN  NaN  256.0  -
nn_energy.nn.4.weight  NaN  NaN  512.0  -
nn_energy.nn.4.bias  NaN  NaN  2.0  -
nn_charge.0.weight  NaN  NaN  209664.0  -
nn_charge.0.bias  NaN  NaN  256.0  -
nn_charge.2.weight  NaN  NaN  256.0  -
nn_charge.2.bias  NaN  NaN  256.0  -
nn_charge.4.weight  NaN  NaN  768.0  -
nn_charge.4.bias  NaN  NaN  3.0  -
nn_probX.0.weight  NaN  NaN  209664.0  -
nn_probX.0.bias  NaN  NaN  256.0  -
nn_probX.2.weight  NaN  NaN  256.0  -
nn_probX.2.bias  NaN  NaN  256.0  -
nn_probX.4.weight  NaN  NaN  256.0  -
nn_probX.4.bias  NaN  NaN  1.0  -
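The per-parameter rows above sum to the reported totals (4,139,031 trainable, 0 non-trainable). A minimal sketch of how such a tally can be produced with plain PyTorch follows; count_parameters is a generic helper for illustration, not the script's own reporting code.

import torch.nn as nn

def count_parameters(model: nn.Module):
    # Split parameter counts by requires_grad, mirroring the
    # "Trainable parameters" / "Non-trainable parameters" lines above.
    trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
    frozen = sum(p.numel() for p in model.parameters() if not p.requires_grad)
    return trainable, frozen

# Per-parameter breakdown in the spirit of the table above:
# for name, p in model.named_parameters():
#     print(f"{name:35s} {p.numel():>10d} trainable={p.requires_grad}")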
[2024-03-07 13:44:21,178] INFO: Creating experiment dir /pfvol/experiments/MLPF_cms_Transformer_MET_Truepyg-cms-small_20240307_134356_433935
[2024-03-07 13:44:21,178] INFO: Model directory /pfvol/experiments/MLPF_cms_Transformer_MET_Truepyg-cms-small_20240307_134356_433935
[2024-03-07 13:44:21,578] INFO: train_dataset: cms_pf_ttbar, 80000
[2024-03-07 13:44:22,147] INFO: train_dataset: cms_pf_qcd, 80000
[2024-03-07 13:44:22,173] INFO: valid_dataset: cms_pf_ttbar, 20000
[2024-03-07 13:44:22,192] INFO: valid_dataset: cms_pf_qcd, 20000
[2024-03-07 13:44:22,246] INFO: Initiating epoch #1 train run on device rank=0
[2024-03-07 20:30:20,678] INFO: Initiating epoch #1 valid run on device rank=0
[2024-03-07 21:49:45,623] INFO: Rank 0: epoch=1 / 30 train_loss=86.1825 valid_loss=76.6010 stale=0 time=485.39m eta=14076.3m
[2024-03-07 21:49:45,828] INFO: Initiating epoch #2 train run on device rank=0
[2024-03-08 04:30:31,077] INFO: Initiating epoch #2 valid run on device rank=0
[2024-03-08 05:06:57,265] INFO: Rank 0: epoch=2 / 30 train_loss=76.2726 valid_loss=75.7938 stale=0 time=437.19m eta=12916.2m
[2024-03-08 05:06:58,285] INFO: Initiating epoch #3 train run on device rank=0
[2024-03-08 11:44:25,832] INFO: Initiating epoch #3 valid run on device rank=0
[2024-03-08 12:21:06,176] INFO: Rank 0: epoch=3 / 30 train_loss=75.5241 valid_loss=75.3624 stale=0 time=434.13m eta=12210.6m
[2024-03-08 12:21:07,208] INFO: Initiating epoch #4 train run on device rank=0
[2024-03-08 18:57:31,616] INFO: Initiating epoch #4 valid run on device rank=0
[2024-03-08 19:33:54,798] INFO: Rank 0: epoch=4 / 30 train_loss=75.1428 valid_loss=76.1876 stale=1 time=432.79m eta=11632.0m
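The eta field behaves like the running mean epoch wall time multiplied by the number of epochs still to run; this is an inference from the logged numbers, not a statement about the training code, but it reproduces the values above.

# epoch 1: a single measurement, 29 of 30 epochs remaining
print(round(485.39 * 29, 1))                    # 14076.3 -> matches the logged eta=14076.3m
# epoch 2: mean of the first two epoch times, 28 epochs remaining
print(round((485.39 + 437.19) / 2 * 28, 1))     # 12916.1 -> close to the logged eta=12916.2m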
[2024-03-08 19:33:56,680] INFO: Initiating epoch #5 train run on device rank=0
[2024-03-09 02:19:52,967] INFO: Initiating epoch #5 valid run on device rank=0
[2024-03-09 02:56:29,168] INFO: Rank 0: epoch=5 / 30 train_loss=74.9303 valid_loss=76.0564 stale=2 time=442.54m eta=11160.6m
[2024-03-09 02:56:30,712] INFO: Initiating epoch #6 train run on device rank=0
[2024-03-09 09:32:01,979] INFO: Initiating epoch #6 valid run on device rank=0
[2024-03-09 10:12:35,821] INFO: Rank 0: epoch=6 / 30 train_loss=74.8482 valid_loss=77.0017 stale=3 time=436.09m eta=10672.9m
[2024-03-09 10:12:36,531] INFO: Initiating epoch #7 train run on device rank=0
[2024-03-09 16:48:01,311] INFO: Initiating epoch #7 valid run on device rank=0
[2024-03-09 17:24:25,811] INFO: Rank 0: epoch=7 / 30 train_loss=74.7315 valid_loss=77.9831 stale=4 time=431.82m eta=10185.9m
[2024-03-09 17:24:27,368] INFO: Initiating epoch #8 train run on device rank=0
[2024-03-10 00:25:21,425] INFO: Initiating epoch #8 valid run on device rank=0
[2024-03-10 01:33:14,495] INFO: Rank 0: epoch=8 / 30 train_loss=74.6442 valid_loss=80.2325 stale=5 time=488.79m eta=9869.4m
[2024-03-10 01:33:16,233] INFO: Initiating epoch #9 train run on device rank=0
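For reference, the epoch lines above follow the usual train/validate cycle with a staleness counter that increments whenever the validation loss does not improve on the best value so far (it grows from stale=0 at epoch 3 to stale=5 at epoch 8). A minimal sketch under that assumption; the helper callables and variable names are hypothetical and not taken from the MLPF training script.

import logging

def fit(model, train_one_epoch, validate, num_epochs=30):
    # train_one_epoch / validate are caller-supplied functions returning scalar losses
    best_valid, stale = float("inf"), 0
    for epoch in range(1, num_epochs + 1):
        train_loss = train_one_epoch(model)
        valid_loss = validate(model)
        if valid_loss < best_valid:
            best_valid, stale = valid_loss, 0   # new best validation loss; reset staleness
        else:
            stale += 1                          # mirrors the stale=N field in the log lines above
        logging.info("Rank 0: epoch=%d / %d train_loss=%.4f valid_loss=%.4f stale=%d",
                     epoch, num_epochs, train_loss, valid_loss, stale)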