
Bundle QE+Allegro+LAMMPS E2E PASS (A100x4)

This page records the evidence from the E2E verification run on an A100x4 environment. The Proof Pack and its SHA256 checksum are published below.

FINAL_REPORT_20260226_154014.md (full text)

# FINAL REPORT (20260226_154014)

## 1) Summary

- E2E status: **PASS** (QE + Allegro + LAMMPS+Allegro) at TS=20260226_154014.
- Environment: A100x4 host; bundle runbook under `tools/sg-bundle-qe-allegro-lammps`, using the isolated `SG_PREFIX` flow.
- Proven: the install -> verify (E2E) -> proofpack generation sequence succeeds, including a short LAMMPS+Allegro MD run driven by the deployed model.

## 2) Reproduction Commands (Install/Verify/Remove)

Typical isolated execution:

```bash
cd /home/dl/work/servergear-gpu-runbook/tools/sg-bundle-qe-allegro-lammps
export SG_PREFIX=/opt/sg/bundles/qe-allegro-lammps/20260226
./sg-install-bundle-qe-allegro-lammps
./sg-verify-bundle-qe-allegro-lammps
./sg-remove-bundle-qe-allegro-lammps
```
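The install/verify/remove sequence can also be driven from Python, for example from a CI job. This is a minimal sketch, assuming the three scripts report failure through a non-zero exit status; `run_bundle_cycle` is a hypothetical helper, not part of the runbook.

```python
import os
import subprocess
from typing import List, Optional, Sequence


def run_bundle_cycle(
    commands: Sequence[Sequence[str]],
    sg_prefix: str,
    cwd: Optional[str] = None,
) -> List[int]:
    """Run each command in order with SG_PREFIX set; stop at the first non-zero exit.

    Returns the exit codes of the commands that actually ran.
    """
    # Copy the current environment and override only SG_PREFIX (the isolated install prefix).
    env = dict(os.environ, SG_PREFIX=sg_prefix)
    codes: List[int] = []
    for cmd in commands:
        result = subprocess.run(list(cmd), cwd=cwd, env=env)
        codes.append(result.returncode)
        if result.returncode != 0:
            break  # later steps are skipped once one step fails
    return codes


# Hypothetical usage mirroring the commands above:
# run_bundle_cycle(
#     [["./sg-install-bundle-qe-allegro-lammps"],
#      ["./sg-verify-bundle-qe-allegro-lammps"],
#      ["./sg-remove-bundle-qe-allegro-lammps"]],
#     sg_prefix="/opt/sg/bundles/qe-allegro-lammps/20260226",
#     cwd="/home/dl/work/servergear-gpu-runbook/tools/sg-bundle-qe-allegro-lammps",
# )
```

Stopping at the first failure means the remove step does not run after a failed verify, which leaves the prefix in place for inspection; a cleanup-always variant would run the remove command unconditionally.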
## 3) KPI (from `out/kpi/e2e_metrics.csv`)

| ts | step | status | wall_sec | notes |
|---|---|---|---|---|
| 20260226_154014 | qe_single | PASS | 2 | |
| 20260226_154014 | allegro_infer | PASS | 8 | |
| 20260226_154014 | lammps_short | PASS | 12 | |
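A quick way to gate on these KPIs is to parse the CSV and require every step to be `PASS`. This sketch embeds the three rows above as the raw CSV body; that the file always carries exactly these columns is an assumption, and `summarize_kpi` is a hypothetical helper.

```python
import csv
import io
from typing import List, Tuple

# The three KPI rows above, as the raw body of out/kpi/e2e_metrics.csv.
KPI_CSV = """ts,step,status,wall_sec,notes
20260226_154014,qe_single,PASS,2,
20260226_154014,allegro_infer,PASS,8,
20260226_154014,lammps_short,PASS,12,
"""


def summarize_kpi(text: str) -> Tuple[bool, int, List[str]]:
    """Return (all_passed, total_wall_sec, failed_steps) for an e2e_metrics.csv body."""
    rows = list(csv.DictReader(io.StringIO(text)))
    failed = [r["step"] for r in rows if r["status"] != "PASS"]
    total = sum(int(r["wall_sec"]) for r in rows)
    # An empty file is not treated as a pass.
    return (not failed and bool(rows), total, failed)
```

For this run the steps sum to 2 + 8 + 12 = 22 s of recorded wall time.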
## 4) Evidence Logs

- `tools/sg-bundle-qe-allegro-lammps/out/logs/20260226_154014/qe_single.log`
- `tools/sg-bundle-qe-allegro-lammps/out/logs/20260226_154014/allegro_infer.log`
- `tools/sg-bundle-qe-allegro-lammps/out/logs/20260226_154014/lammps_short.log`

### qe_single.log (last 10)

```text
   This run was terminated on:  15:40:17  26Feb2026

=------------------------------------------------------------------------------=
   JOB DONE.
=------------------------------------------------------------------------------=
Warning: ieee_invalid is signaling
Warning: ieee_divide_by_zero is signaling
Warning: ieee_inexact is signaling
FORTRAN STOP
WALL 1.93
```
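The QE tail can be checked mechanically: require the `JOB DONE.` marker and read the timing from the `WALL <seconds>` line. This is a sketch; treating `WALL 1.93` as a line appended by the runbook's own timing (rather than printed by QE) is an assumption, and `check_qe_log` is a hypothetical helper. The `Warning: ieee_*` lines are ignored here.

```python
import re


def check_qe_log(tail: str) -> float:
    """Return the seconds on the 'WALL <seconds>' line; raise if the run did not finish."""
    if "JOB DONE." not in tail:
        raise ValueError("QE output does not contain 'JOB DONE.'")
    # Match a line consisting only of 'WALL' followed by an integer or decimal number.
    m = re.search(r"^WALL\s+([0-9]+(?:\.[0-9]+)?)\s*$", tail, re.MULTILINE)
    if m is None:
        raise ValueError("no 'WALL <seconds>' line found")
    return float(m.group(1))
```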
### allegro_infer.log (last 10)

### lammps_short.log (last 10)

```text
Neighs:              0 ave           0 max           0 min
Histogram: 1 0 0 0 0 0 0 0 0 0
FullNghs:           48 ave          48 max          48 min
Histogram: 1 0 0 0 0 0 0 0 0 0

Total # of neighbors = 48
Ave neighs/atom = 12
Neighbor list builds = 0
Dangerous builds = 0
Total wall time: 0:00:04
```
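The LAMMPS tail carries enough to sanity-check the run: since `Ave neighs/atom` is total neighbors divided by the atom count, this run had 48 / 12 = 4 atoms, and LAMMPS itself reports 4 s of wall time. A sketch of extracting these values (`parse_lammps_tail` is a hypothetical helper):

```python
import re
from typing import Dict, Union


def _grab(pattern: str, text: str):
    """Return the first regex match or raise, so a truncated log fails loudly."""
    m = re.search(pattern, text)
    if m is None:
        raise ValueError(f"pattern not found in log: {pattern!r}")
    return m


def parse_lammps_tail(tail: str) -> Dict[str, Union[int, float]]:
    """Pull neighbor-list statistics and the total wall time out of a LAMMPS log tail."""
    total = int(_grab(r"Total # of neighbors = (\d+)", tail).group(1))
    ave = float(_grab(r"Ave neighs/atom = ([0-9.]+)", tail).group(1))
    h, m, s = (int(x) for x in _grab(r"Total wall time: (\d+):(\d+):(\d+)", tail).groups())
    return {
        "total_neighbors": total,
        "ave_neighs_per_atom": ave,
        # Ave neighs/atom = total neighbors / atoms, so invert it to recover the atom count.
        "atoms": round(total / ave) if ave else 0,
        "wall_sec": 3600 * h + 60 * m + s,
    }
```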
## 5) Change Diff

### git diff --stat (relevant files)

```text
 tools/sg-lammps-allegro/sg-fetch-model | 91 ++++++++++++++++++++++++++++++++--
 1 file changed, 86 insertions(+), 5 deletions(-)
```
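In `git diff --stat`, the number after a file name is its total count of changed lines, so 91 = 86 insertions + 5 deletions. A small parser for the summary line, useful for recording diff size alongside the proofpack, might look like this (`parse_stat_summary` is hypothetical):

```python
import re
from typing import Tuple


def parse_stat_summary(line: str) -> Tuple[int, int, int]:
    """Parse a git '--stat' summary line into (files_changed, insertions, deletions)."""
    files = re.search(r"(\d+) files? changed", line)
    if files is None:
        raise ValueError("not a git --stat summary line")
    ins = re.search(r"(\d+) insertions?\(\+\)", line)
    dels = re.search(r"(\d+) deletions?\(-\)", line)
    return (
        int(files.group(1)),
        int(ins.group(1)) if ins else 0,  # git omits the clause when there are none
        int(dels.group(1)) if dels else 0,
    )
```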
### Key points in `tools/sg-lammps-allegro/sg-fetch-model`

- Added a deployed-model path to the LAMMPS Allegro workflow, instead of relying only on the raw training checkpoint.
- Ensured the compile/deploy step targets LAMMPS `pair_allegro` compatibility.
- Injected the TorchScript metadata (`_extra_files`) that the LAMMPS loader expects.
Excerpt (prefixes are line numbers in the file; lines in between are omitted):

```text
33:REQUIRE_PAIR_ALLEGRO="${REQUIRE_PAIR_ALLEGRO:-1}"
34:if [[ "$REQUIRE_PAIR_ALLEGRO" != "1" ]] && [[ -f "${HOME}/.cache/sg-allegro/compiled_allegro.nequip.pth" ]]; then
42:# 3) Download the OAM zip and compile (requires nequip-compile on the host)
44:if [[ -x "${PREFIX_ROOT}/allegro/venv/bin/nequip-compile" ]]; then
45:  NEQUIP_COMPILE="${PREFIX_ROOT}/allegro/venv/bin/nequip-compile"
46:elif [[ -x "/opt/sg/allegro/venv/bin/nequip-compile" ]]; then
47:  NEQUIP_COMPILE="/opt/sg/allegro/venv/bin/nequip-compile"
48:elif command -v nequip-compile >/dev/null 2>&1; then
49:  NEQUIP_COMPILE="$(command -v nequip-compile)"
53:  echo "[ERROR] No MODEL_URL, no bundle model, no STK-015 cache, and no nequip-compile." >&2
84:echo "[compile] $NEQUIP_COMPILE $OAM_ZIP -> $OUT_MODEL (device=$DEV, target=pair_allegro)"
85:"$NEQUIP_COMPILE" "$OAM_ZIP" "$OUT_MODEL" --device "$DEV" --mode torchscript --target pair_allegro
87:# LAMMPS pair_allegro checks metadata keys embedded in TorchScript extra files.
116:type_names = " ".join(sym for _, sym in pairs)
117:r_max = "7"
119:    m = re.search(r"\br_max:\s*([0-9.]+)", line)
121:        r_max = m.group(1)
135:    "nequip_version": nequip_ver,
136:    "r_max": str(r_max),
137:    "n_species": str(len(pairs) if pairs else 0),
138:    "type_names": type_names,
145:torch.jit.save(model, model_path, _extra_files=extra)
```
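The metadata step in lines 116 to 145 can be read as a pure function that builds the `extra` dict later passed to `torch.jit.save(model, model_path, _extra_files=extra)`. This is a sketch that mirrors only the lines shown: the function name, the `pairs` shape (type index, symbol) and the `config_lines` input are hypothetical, the real script may write more keys than these four, and whether its loop stops at the first `r_max` match is not visible (this version takes the first match).

```python
import re
from typing import Dict, Iterable, Sequence, Tuple


def build_extra_files(
    pairs: Sequence[Tuple[int, str]],
    config_lines: Iterable[str],
    nequip_ver: str,
) -> Dict[str, str]:
    """Assemble TorchScript extra-file metadata in the spirit of the excerpt above."""
    type_names = " ".join(sym for _, sym in pairs)  # species in type order, space-separated
    r_max = "7"  # fallback used in the excerpt when no r_max line is found
    for line in config_lines:
        m = re.search(r"\br_max:\s*([0-9.]+)", line)
        if m:
            r_max = m.group(1)
            break  # assumption: the first r_max entry wins
    # All values are strings, as they are stored as text extra files.
    return {
        "nequip_version": nequip_ver,
        "r_max": str(r_max),
        "n_species": str(len(pairs) if pairs else 0),
        "type_names": type_names,
    }
```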
## 6) Proofpack

- zip: `tools/sg-bundle-qe-allegro-lammps/out/proofpack/proof_bundle_qe_allegro_lammps_20260226_154014.zip`
- sha256: `c059b380e72f4cec04e892163b4d90aadd2b962fdb1c6120bc894b2d12951db3`

Verification command:

```bash
sha256sum tools/sg-bundle-qe-allegro-lammps/out/proofpack/proof_bundle_qe_allegro_lammps_20260226_154014.zip
```
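Where `sha256sum` is unavailable, the same check can be done with Python's standard `hashlib`, comparing against the published checksum. A minimal sketch (`sha256_of` and `verify_proofpack` are hypothetical helper names); the file is hashed in chunks so a large zip is never loaded into memory at once.

```python
import hashlib
from pathlib import Path
from typing import Union

# Published checksum of proof_bundle_qe_allegro_lammps_20260226_154014.zip.
EXPECTED = "c059b380e72f4cec04e892163b4d90aadd2b962fdb1c6120bc894b2d12951db3"


def sha256_of(path: Union[str, Path], chunk_size: int = 1 << 20) -> str:
    """Hex SHA-256 digest of a file, read chunk by chunk."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_proofpack(path: Union[str, Path], expected: str = EXPECTED) -> bool:
    """True if the file's digest matches the expected hex string (case-insensitive)."""
    return sha256_of(path) == expected.strip().lower()
```

This is equivalent to comparing the first field printed by `sha256sum` with the value listed above.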
## 7) Known Caveats

- LAMMPS+Allegro requires a **deployed** NequIP/Allegro model; loading a raw `.pth` checkpoint can fail with "did you forget to run nequip-deploy?".
- Deployed model artifacts are written to the bundle's output model directories (e.g. `out/models//deployed/`).
- If the model metadata is missing or incompatible, LAMMPS `pair_allegro` can fail even when `pair_style allegro*` itself is recognized.
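Before handing a model to LAMMPS, its metadata can be inspected without importing torch. This sketch assumes that a file written by `torch.jit.save` is a zip archive in which each `_extra_files` entry is stored as `<archive>/extra/<name>`; `REQUIRED_KEYS` is simply the set of keys visible in the `sg-fetch-model` excerpt, not a confirmed list of what `pair_allegro` checks, and both helper names are hypothetical. With torch installed, `torch.jit.load(path, _extra_files=extra)` fills a pre-keyed dict in a supported way instead.

```python
import zipfile
from typing import Dict, Iterable, List

# Keys written in the sg-fetch-model excerpt; pair_allegro may check a different set.
REQUIRED_KEYS = ("nequip_version", "r_max", "n_species", "type_names")


def read_extra_files(model_path: str) -> Dict[str, str]:
    """Read TorchScript extra files, assuming '<archive>/extra/<name>' zip entries."""
    extra: Dict[str, str] = {}
    # A raw checkpoint that is not a zip archive raises zipfile.BadZipFile here.
    with zipfile.ZipFile(model_path) as zf:
        for name in zf.namelist():
            parts = name.split("/")
            if len(parts) == 3 and parts[1] == "extra" and parts[2]:
                extra[parts[2]] = zf.read(name).decode("utf-8", errors="replace")
    return extra


def missing_metadata(model_path: str, required: Iterable[str] = REQUIRED_KEYS) -> List[str]:
    """Return required keys that are absent or blank in the model's extra files."""
    extra = read_extra_files(model_path)
    return [k for k in required if not extra.get(k, "").strip()]
```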