[2024-03-07 13:31:40,111] INFO: Will use torch.nn.parallel.DistributedDataParallel() and 4 gpus
[2024-03-07 13:31:40,114] INFO: NVIDIA GeForce GTX 1080 Ti
[2024-03-07 13:31:40,114] INFO: NVIDIA GeForce GTX 1080 Ti
[2024-03-07 13:31:40,114] INFO: NVIDIA GeForce GTX 1080 Ti
[2024-03-07 13:31:40,114] INFO: NVIDIA GeForce GTX 1080 Ti
[2024-03-07 13:31:49,534] INFO: using attention_type=efficient
[2024-03-07 13:31:49,540] INFO: using attention_type=efficient
[2024-03-07 13:31:49,546] INFO: using attention_type=efficient
[2024-03-07 13:31:49,553] INFO: using attention_type=efficient
[2024-03-07 13:31:49,559] INFO: using attention_type=efficient
[2024-03-07 13:31:49,565] INFO: using attention_type=efficient
[2024-03-07 13:31:53,236] INFO: DistributedDataParallel(
  (module): MLPF(
    (nn0): Sequential(
      (0): Linear(in_features=42, out_features=256, bias=True)
      (1): ELU(alpha=1.0)
      (2): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
      (3): Dropout(p=0.3, inplace=False)
      (4): Linear(in_features=256, out_features=256, bias=True)
    )
    (conv_id): ModuleList(
      (0-2): 3 x SelfAttentionLayer(
        (mha): MultiheadAttention(
          (out_proj): NonDynamicallyQuantizableLinear(in_features=256, out_features=256, bias=True)
        )
        (norm0): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
        (norm1): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
        (seq): Sequential(
          (0): Linear(in_features=256, out_features=256, bias=True)
          (1): ELU(alpha=1.0)
          (2): Linear(in_features=256, out_features=256, bias=True)
          (3): ELU(alpha=1.0)
        )
        (dropout): Dropout(p=0.3, inplace=False)
      )
    )
    (conv_reg): ModuleList(
      (0-2): 3 x SelfAttentionLayer(
        (mha): MultiheadAttention(
          (out_proj): NonDynamicallyQuantizableLinear(in_features=256, out_features=256, bias=True)
        )
        (norm0): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
        (norm1): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
        (seq): Sequential(
          (0): Linear(in_features=256, out_features=256, bias=True)
          (1): ELU(alpha=1.0)
          (2): Linear(in_features=256, out_features=256, bias=True)
          (3): ELU(alpha=1.0)
        )
        (dropout): Dropout(p=0.3, inplace=False)
      )
    )
    (nn_id): Sequential(
      (0): Linear(in_features=810, out_features=256, bias=True)
      (1): ELU(alpha=1.0)
      (2): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
      (3): Dropout(p=0.3, inplace=False)
      (4): Linear(in_features=256, out_features=9, bias=True)
    )
    (nn_pt): RegressionOutput(
      (nn): Sequential(
        (0): Linear(in_features=819, out_features=256, bias=True)
        (1): ELU(alpha=1.0)
        (2): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
        (3): Dropout(p=0.3, inplace=False)
        (4): Linear(in_features=256, out_features=2, bias=True)
      )
    )
    (nn_eta): RegressionOutput(
      (nn): Sequential(
        (0): Linear(in_features=819, out_features=256, bias=True)
        (1): ELU(alpha=1.0)
        (2): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
        (3): Dropout(p=0.3, inplace=False)
        (4): Linear(in_features=256, out_features=2, bias=True)
      )
    )
    (nn_sin_phi): RegressionOutput(
      (nn): Sequential(
        (0): Linear(in_features=819, out_features=256, bias=True)
        (1): ELU(alpha=1.0)
        (2): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
        (3): Dropout(p=0.3, inplace=False)
        (4): Linear(in_features=256, out_features=2, bias=True)
      )
    )
    (nn_cos_phi): RegressionOutput(
      (nn): Sequential(
        (0): Linear(in_features=819, out_features=256, bias=True)
        (1): ELU(alpha=1.0)
        (2): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
        (3): Dropout(p=0.3, inplace=False)
        (4): Linear(in_features=256, out_features=2, bias=True)
      )
    )
    (nn_energy): RegressionOutput(
      (nn): Sequential(
        (0): Linear(in_features=819, out_features=256, bias=True)
        (1): ELU(alpha=1.0)
        (2): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
        (3): Dropout(p=0.3, inplace=False)
        (4): Linear(in_features=256, out_features=2, bias=True)
      )
    )
    (nn_charge): Sequential(
      (0): Linear(in_features=819, out_features=256, bias=True)
      (1): ELU(alpha=1.0)
      (2): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
      (3): Dropout(p=0.3, inplace=False)
      (4): Linear(in_features=256, out_features=3, bias=True)
    )
    (nn_probX): Sequential(
      (0): Linear(in_features=819, out_features=256, bias=True)
      (1): ELU(alpha=1.0)
      (2): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
      (3): Dropout(p=0.3, inplace=False)
      (4): Linear(in_features=256, out_features=1, bias=True)
    )
  )
)
[2024-03-07 13:31:53,238] INFO: Trainable parameters: 4139031
[2024-03-07 13:31:53,238] INFO: Non-trainable parameters: 0
[2024-03-07 13:31:53,238] INFO: Total parameters: 4139031
[2024-03-07 13:31:53,247] INFO:
Module  Trainable params  Non-trainable params
module.nn0.0.weight  10752  -
module.nn0.0.bias  256  -
module.nn0.2.weight  256  -
module.nn0.2.bias  256  -
module.nn0.4.weight  65536  -
module.nn0.4.bias  256  -
module.conv_id.0.mha.in_proj_weight  196608  -
module.conv_id.0.mha.in_proj_bias  768  -
module.conv_id.0.mha.out_proj.weight  65536  -
module.conv_id.0.mha.out_proj.bias  256  -
module.conv_id.0.norm0.weight  256  -
module.conv_id.0.norm0.bias  256  -
module.conv_id.0.norm1.weight  256  -
module.conv_id.0.norm1.bias  256  -
module.conv_id.0.seq.0.weight  65536  -
module.conv_id.0.seq.0.bias  256  -
module.conv_id.0.seq.2.weight  65536  -
module.conv_id.0.seq.2.bias  256  -
module.conv_id.1.mha.in_proj_weight  196608  -
module.conv_id.1.mha.in_proj_bias  768  -
module.conv_id.1.mha.out_proj.weight  65536  -
module.conv_id.1.mha.out_proj.bias  256  -
module.conv_id.1.norm0.weight  256  -
module.conv_id.1.norm0.bias  256  -
module.conv_id.1.norm1.weight  256  -
module.conv_id.1.norm1.bias  256  -
module.conv_id.1.seq.0.weight  65536  -
module.conv_id.1.seq.0.bias  256  -
module.conv_id.1.seq.2.weight  65536  -
module.conv_id.1.seq.2.bias  256  -
module.conv_id.2.mha.in_proj_weight  196608  -
module.conv_id.2.mha.in_proj_bias  768  -
module.conv_id.2.mha.out_proj.weight  65536  -
module.conv_id.2.mha.out_proj.bias  256  -
module.conv_id.2.norm0.weight  256  -
module.conv_id.2.norm0.bias  256  -
module.conv_id.2.norm1.weight  256  -
module.conv_id.2.norm1.bias  256  -
module.conv_id.2.seq.0.weight  65536  -
module.conv_id.2.seq.0.bias  256  -
module.conv_id.2.seq.2.weight  65536  -
module.conv_id.2.seq.2.bias  256  -
module.conv_reg.0.mha.in_proj_weight  196608  -
module.conv_reg.0.mha.in_proj_bias  768  -
module.conv_reg.0.mha.out_proj.weight  65536  -
module.conv_reg.0.mha.out_proj.bias  256  -
module.conv_reg.0.norm0.weight  256  -
module.conv_reg.0.norm0.bias  256  -
module.conv_reg.0.norm1.weight  256  -
module.conv_reg.0.norm1.bias  256  -
module.conv_reg.0.seq.0.weight  65536  -
module.conv_reg.0.seq.0.bias  256  -
module.conv_reg.0.seq.2.weight  65536  -
module.conv_reg.0.seq.2.bias  256  -
module.conv_reg.1.mha.in_proj_weight  196608  -
module.conv_reg.1.mha.in_proj_bias  768  -
module.conv_reg.1.mha.out_proj.weight  65536  -
module.conv_reg.1.mha.out_proj.bias  256  -
module.conv_reg.1.norm0.weight  256  -
module.conv_reg.1.norm0.bias  256  -
module.conv_reg.1.norm1.weight  256  -
module.conv_reg.1.norm1.bias  256  -
module.conv_reg.1.seq.0.weight  65536  -
module.conv_reg.1.seq.0.bias  256  -
module.conv_reg.1.seq.2.weight  65536  -
module.conv_reg.1.seq.2.bias  256  -
module.conv_reg.2.mha.in_proj_weight  196608  -
module.conv_reg.2.mha.in_proj_bias  768  -
module.conv_reg.2.mha.out_proj.weight  65536  -
module.conv_reg.2.mha.out_proj.bias  256  -
module.conv_reg.2.norm0.weight  256  -
module.conv_reg.2.norm0.bias  256  -
module.conv_reg.2.norm1.weight  256  -
module.conv_reg.2.norm1.bias  256  -
module.conv_reg.2.seq.0.weight  65536  -
module.conv_reg.2.seq.0.bias  256  -
module.conv_reg.2.seq.2.weight  65536  -
module.conv_reg.2.seq.2.bias  256  -
module.nn_id.0.weight  207360  -
module.nn_id.0.bias  256  -
module.nn_id.2.weight  256  -
module.nn_id.2.bias  256  -
module.nn_id.4.weight  2304  -
module.nn_id.4.bias  9  -
module.nn_pt.nn.0.weight  209664  -
module.nn_pt.nn.0.bias  256  -
module.nn_pt.nn.2.weight  256  -
module.nn_pt.nn.2.bias  256  -
module.nn_pt.nn.4.weight  512  -
module.nn_pt.nn.4.bias  2  -
module.nn_eta.nn.0.weight  209664  -
module.nn_eta.nn.0.bias  256  -
module.nn_eta.nn.2.weight  256  -
module.nn_eta.nn.2.bias  256  -
module.nn_eta.nn.4.weight  512  -
module.nn_eta.nn.4.bias  2  -
module.nn_sin_phi.nn.0.weight  209664  -
module.nn_sin_phi.nn.0.bias  256  -
module.nn_sin_phi.nn.2.weight  256  -
module.nn_sin_phi.nn.2.bias  256  -
module.nn_sin_phi.nn.4.weight  512  -
module.nn_sin_phi.nn.4.bias  2  -
module.nn_cos_phi.nn.0.weight  209664  -
module.nn_cos_phi.nn.0.bias  256  -
module.nn_cos_phi.nn.2.weight  256  -
module.nn_cos_phi.nn.2.bias  256  -
module.nn_cos_phi.nn.4.weight  512  -
module.nn_cos_phi.nn.4.bias  2  -
module.nn_energy.nn.0.weight  209664  -
module.nn_energy.nn.0.bias  256  -
module.nn_energy.nn.2.weight  256  -
module.nn_energy.nn.2.bias  256  -
module.nn_energy.nn.4.weight  512  -
module.nn_energy.nn.4.bias  2  -
module.nn_charge.0.weight  209664  -
module.nn_charge.0.bias  256  -
module.nn_charge.2.weight  256  -
module.nn_charge.2.bias  256  -
module.nn_charge.4.weight  768  -
module.nn_charge.4.bias  3  -
module.nn_probX.0.weight  209664  -
module.nn_probX.0.bias  256  -
module.nn_probX.2.weight  256  -
module.nn_probX.2.bias  256  -
module.nn_probX.4.weight  256  -
module.nn_probX.4.bias  1  -
[2024-03-07 13:31:53,261] INFO: Creating experiment dir /pfvol/experiments/MLPF_cms_Transformer_MET_Truepyg-cms-small_20240307_133005_508651
[2024-03-07 13:31:53,261] INFO: Model directory /pfvol/experiments/MLPF_cms_Transformer_MET_Truepyg-cms-small_20240307_133005_508651
[2024-03-07 13:31:53,602] INFO: train_dataset: cms_pf_ttbar, 80000
[2024-03-07 13:31:53,870] INFO: train_dataset: cms_pf_qcd, 80000
[2024-03-07 13:31:53,949] INFO: valid_dataset: cms_pf_ttbar, 20000
[2024-03-07 13:31:53,993] INFO: valid_dataset: cms_pf_qcd, 20000
[2024-03-07 13:31:54,044] INFO: Initiating epoch #1 train run on device rank=0
[2024-03-07 20:17:39,230] INFO: Initiating epoch #1 valid run on device rank=0
[2024-03-07 20:56:42,921] INFO: Rank 0: epoch=1 / 30 train_loss=88.2472 valid_loss=85.1619 stale=0 time=444.81m eta=12899.6m
[2024-03-07 20:56:43,054] INFO: Initiating epoch #2 train run on device rank=0
[2024-03-08 03:40:51,770] INFO: Initiating epoch #2 valid run on device rank=0
[2024-03-08 04:15:00,008] INFO: Rank 0: epoch=2 / 30 train_loss=84.4063 valid_loss=84.7685 stale=0 time=438.28m eta=12363.4m
[2024-03-08 04:15:02,161] INFO: Initiating epoch #3 train run on device rank=0
[2024-03-08 10:57:42,456] INFO: Initiating epoch #3 valid run on device rank=0
[2024-03-08 11:27:52,837] INFO: Rank 0: epoch=3 / 30 train_loss=84.1642 valid_loss=84.7402 stale=0 time=432.84m eta=11843.8m
[2024-03-08 11:27:54,340] INFO: Initiating epoch #4 train run on device rank=0
[2024-03-08 18:09:22,770] INFO: Initiating epoch #4 valid run on device rank=0
[2024-03-08 18:44:09,099] INFO: Rank 0: epoch=4 / 30 train_loss=84.1313 valid_loss=84.5474 stale=0 time=436.25m eta=11389.6m
[2024-03-08 18:44:10,497] INFO: Initiating epoch #5 train run on device rank=0
[2024-03-09 01:38:12,887] INFO: Initiating epoch #5 valid run on device rank=0
[2024-03-09 02:14:51,326] INFO: Rank 0: epoch=5 / 30 train_loss=84.0361 valid_loss=84.3003 stale=0 time=450.68m eta=11014.8m
[2024-03-09 02:14:53,185] INFO: Initiating epoch #6 train run on device rank=0
[2024-03-09 08:57:53,944] INFO: Initiating epoch #6 valid run on device rank=0
[2024-03-09 09:33:57,518] INFO: Rank 0: epoch=6 / 30 train_loss=83.9300 valid_loss=84.2394 stale=0 time=439.07m eta=10568.2m
[2024-03-09 09:33:58,100] INFO: Initiating epoch #7 train run on device rank=0
[2024-03-09 16:19:53,997] INFO: Initiating epoch #7 valid run on device rank=0
[2024-03-09 16:58:09,280] INFO: Rank 0: epoch=7 / 30 train_loss=83.9154 valid_loss=84.3405 stale=1 time=444.19m eta=10140.5m
[2024-03-09 16:58:10,720] INFO: Initiating epoch #8 train run on device rank=0
[2024-03-09 23:46:44,098] INFO: Initiating epoch #8 valid run on device rank=0
[2024-03-10 00:25:31,357] INFO: Rank 0: epoch=8 / 30 train_loss=83.8310 valid_loss=83.8502 stale=0 time=447.34m eta=9717.5m
[2024-03-10 00:25:33,149] INFO: Initiating epoch #9 train run on device rank=0
[2024-03-10 07:07:20,642] INFO: Initiating epoch #9 valid run on device rank=0
[2024-03-10 07:40:49,765] INFO: Rank 0: epoch=9 / 30 train_loss=83.7263 valid_loss=83.7617 stale=0 time=435.28m eta=9260.8m
[2024-03-10 07:40:52,267] INFO: Initiating epoch #10 train run on device rank=0
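Sanity check (not part of the original log): the reported "Trainable parameters: 4139031" can be reproduced from the layer shapes in the model printout above. The helper function names below are illustrative, not taken from the training code; the packed in_proj accounting for MultiheadAttention follows standard PyTorch and agrees with the per-parameter table (196608 = 3*256*256, 768 = 3*256).

```python
# Reproduce the logged trainable-parameter count of the MLPF model
# from the layer shapes printed in the module repr.

def linear(n_in, n_out):
    """Linear layer with bias: weight (n_out x n_in) + bias (n_out)."""
    return n_out * n_in + n_out

def layernorm(dim):
    """LayerNorm with elementwise affine: weight + bias."""
    return 2 * dim

def mha(dim):
    """MultiheadAttention: packed in_proj (3*dim x dim weight, 3*dim bias)
    plus the out_proj Linear."""
    return (3 * dim) * dim + 3 * dim + linear(dim, dim)

def self_attention_layer(dim=256):
    """One SelfAttentionLayer: mha + norm0 + norm1 + two Linears in seq."""
    return mha(dim) + 2 * layernorm(dim) + 2 * linear(dim, dim)

def head(n_in, n_out, dim=256):
    """Output head: Linear -> ELU -> LayerNorm -> Dropout -> Linear."""
    return linear(n_in, dim) + layernorm(dim) + linear(dim, n_out)

total = (
    head(42, 256)                 # nn0 (second Linear is 256 -> 256)
    + 6 * self_attention_layer()  # conv_id (3 layers) + conv_reg (3 layers)
    + head(810, 9)                # nn_id
    + 5 * head(819, 2)            # nn_pt, nn_eta, nn_sin_phi, nn_cos_phi, nn_energy
    + head(819, 3)                # nn_charge
    + head(819, 1)                # nn_probX
)
print(total)  # -> 4139031, matching the logged total
```

Per-block subtotals also match the parameter table: each SelfAttentionLayer contributes 395,776 parameters, and each 819-input RegressionOutput head 210,946.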