Fine-tune on a Custom Dataset

Model Preparation

Model name	Type	Parameters	Download	Size
InternVL2-1B	MLLM	0.9B	🤗 HF link	1.8 GB
InternVL2-2B	MLLM	2.2B	🤗 HF link	4.2 GB
InternVL2-4B	MLLM	4.2B	🤗 HF link	7.8 GB
InternVL2-8B	MLLM	8.1B	🤗 HF link	16 GB
InternVL2-26B	MLLM	25.5B	🤗 HF link	48 GB
InternVL2-40B	MLLM	40.1B	🤗 HF link	75 GB
InternVL2-Llama3-76B	MLLM	76.3B	🤗 HF link	143 GB

Before starting the second fine-tuning, download the provided pre-trained models. You only need the model sizes you plan to fine-tune.

cd pretrained/
# pip install -U huggingface_hub
# Download OpenGVLab/InternVL2-1B
huggingface-cli download --resume-download --local-dir-use-symlinks False OpenGVLab/InternVL2-1B --local-dir InternVL2-1B
# Download OpenGVLab/InternVL2-2B
huggingface-cli download --resume-download --local-dir-use-symlinks False OpenGVLab/InternVL2-2B --local-dir InternVL2-2B
# Download OpenGVLab/InternVL2-4B
huggingface-cli download --resume-download --local-dir-use-symlinks False OpenGVLab/InternVL2-4B --local-dir InternVL2-4B
# Download OpenGVLab/InternVL2-8B
huggingface-cli download --resume-download --local-dir-use-symlinks False OpenGVLab/InternVL2-8B --local-dir InternVL2-8B
# Download OpenGVLab/InternVL2-26B
huggingface-cli download --resume-download --local-dir-use-symlinks False OpenGVLab/InternVL2-26B --local-dir InternVL2-26B
# Download OpenGVLab/InternVL2-40B
huggingface-cli download --resume-download --local-dir-use-symlinks False OpenGVLab/InternVL2-40B --local-dir InternVL2-40B
# Download OpenGVLab/InternVL2-Llama3-76B
huggingface-cli download --resume-download --local-dir-use-symlinks False OpenGVLab/InternVL2-Llama3-76B --local-dir InternVL2-Llama3-76B
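
The same downloads can also be scripted from Python. The following is a minimal sketch, assuming the `huggingface_hub` package is installed and using its `snapshot_download` function. The names `MODEL_NAMES`, `repo_id`, and `download_models` are illustrative helpers, not part of InternVL, and the script is assumed to run from the directory that contains `pretrained/`.

```python
from pathlib import Path

# Model names taken from the table above; all are hosted under the OpenGVLab organization.
MODEL_NAMES = [
    "InternVL2-1B",
    "InternVL2-2B",
    "InternVL2-4B",
    "InternVL2-8B",
    "InternVL2-26B",
    "InternVL2-40B",
    "InternVL2-Llama3-76B",
]


def repo_id(name: str) -> str:
    """Map a model name to its Hugging Face repository ID."""
    return f"OpenGVLab/{name}"


def download_models(names, root="pretrained"):
    """Download each named model into <root>/<name> and return the target paths."""
    # Imported lazily so the helpers above can be used without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    targets = []
    for name in names:
        target = Path(root) / name
        snapshot_download(repo_id=repo_id(name), local_dir=str(target))
        targets.append(target)
    return targets
```

Calling `download_models(["InternVL2-1B"])` fetches only the smallest checkpoint, which is usually enough for a first trial run.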

After downloading, the directory structure is:

pretrained
├── InternVL2-1B
├── InternVL2-2B
├── InternVL2-4B
├── InternVL2-8B
├── InternVL2-26B
├── InternVL2-40B
└── InternVL2-Llama3-76B
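
Before launching training, it can help to confirm that the directories you expect are actually in place. The sketch below is one way to do that; `missing_models` is a hypothetical helper, and treating an empty directory as "missing" is an assumption made here to catch interrupted downloads.

```python
from pathlib import Path


def missing_models(root, expected):
    """Return the names in `expected` whose directory under `root` is absent or empty."""
    root = Path(root)
    missing = []
    for name in expected:
        model_dir = root / name
        # An empty directory usually means the download never completed.
        if not model_dir.is_dir() or not any(model_dir.iterdir()):
            missing.append(name)
    return missing
```

For example, `missing_models("pretrained", ["InternVL2-1B", "InternVL2-8B"])` returns an empty list when both checkpoints are present.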

Prepare Your Custom Training Data

After downloading the pre-trained model, prepare your custom SFT (supervised fine-tuning) data. Create a JSON file in internvl_chat/shell/data/ similar to this example.

The JSON file must have the following format:

{
  "your-custom-dataset-1": {
    "root": "path/to/the/image/",
    "annotation": "path/to/the/jsonl/annotation",
    "data_augment": false,
    "repeat_time": 1,
    "length": "number of samples in the dataset"
  },
  ...
}

Example:

{
  "sharegpt4v_instruct_gpt4-vision_cap100k": {
    "root": "playground/data/",
    "annotation": "playground/opensource/sharegpt4v_instruct_gpt4-vision_cap100k.jsonl",
    "data_augment": false,
    "repeat_time": 1,
    "length": 102025
  }
}
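
The `length` field is easy to get wrong when it is typed by hand. As a rough sketch, the entry can be generated from the annotation file itself, assuming the JSONL file holds one sample per non-empty line. The helper names `count_samples`, `make_meta_entry`, and `write_meta` are hypothetical and not part of InternVL.

```python
import json


def count_samples(annotation_path):
    """Count non-empty lines in a JSONL file (assumed: one sample per line)."""
    with open(annotation_path, encoding="utf-8") as f:
        return sum(1 for line in f if line.strip())


def make_meta_entry(root, annotation, data_augment=False, repeat_time=1):
    """Build one dataset entry with the five fields shown in the format above."""
    return {
        "root": root,
        "annotation": annotation,
        "data_augment": data_augment,
        "repeat_time": repeat_time,
        "length": count_samples(annotation),
    }


def write_meta(entries, out_path):
    """Write a {dataset_name: entry} mapping as the meta JSON file."""
    with open(out_path, "w", encoding="utf-8") as f:
        json.dump(entries, f, indent=2, ensure_ascii=False)
```

A call such as `write_meta({"your-custom-dataset-1": make_meta_entry("path/to/the/image/", "path/to/the/jsonl/annotation")}, "internvl_chat/shell/data/your_meta.json")` would produce a file in the format above; the output file name here is only a placeholder.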

The format of each specific JSONL file (plain-text data, single-image data, multi-image data, or video data) can be organized according to the descriptions provided in this document.

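My suggestion is to add your new domain-specific data on top of the general data from the open-source InternVL 1.2. This improves downstream capabilities while retaining the model's foundational skills. Depending on your requirements, you can also fine-tune on the new data alone.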
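
One way to combine the general and domain-specific data is to merge their meta files into a single mapping. The sketch below uses hypothetical helpers `merge_meta` and `sample_share`. It assumes that each dataset's contribution to training is roughly `length * repeat_time`, which this section does not state explicitly, so treat the computed shares as an estimate only.

```python
REQUIRED_KEYS = {"root", "annotation", "data_augment", "repeat_time", "length"}


def merge_meta(general, domain):
    """Merge two {dataset_name: entry} mappings, rejecting duplicates and incomplete entries."""
    overlap = set(general) & set(domain)
    if overlap:
        raise ValueError(f"duplicate dataset names: {sorted(overlap)}")
    merged = {}
    for source in (general, domain):
        for name, entry in source.items():
            missing = REQUIRED_KEYS - set(entry)
            if missing:
                raise ValueError(f"{name} is missing fields: {sorted(missing)}")
            merged[name] = dict(entry)
    return merged


def sample_share(meta):
    """Estimate each dataset's share of the mix (assumption: length * repeat_time)."""
    weights = {name: e["length"] * e["repeat_time"] for name, e in meta.items()}
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}
```

Checking the shares before training makes it easier to see whether a small domain dataset would be swamped by the general data, in which case raising its `repeat_time` is one option to consider.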

Start the Second Fine-tuning

If you run into any problems, please let me know, and I will update the training guide to improve its usability.

Citation

If you find this project useful in your research, please consider citing:

@article{chen2023internvl,
  title={InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks},
  author={Chen, Zhe and Wu, Jiannan and Wang, Wenhai and Su, Weijie and Chen, Guo and Xing, Sen and Zhong, Muyan and Zhang, Qinglong and Zhu, Xizhou and Lu, Lewei and Li, Bin and Luo, Ping and Lu, Tong and Qiao, Yu and Dai, Jifeng},
  journal={arXiv preprint arXiv:2312.14238},
  year={2023}
}
@article{chen2024far,
  title={How Far Are We to GPT-4V? Closing the Gap to Commercial Multimodal Models with Open-Source Suites},
  author={Chen, Zhe and Wang, Weiyun and Tian, Hao and Ye, Shenglong and Gao, Zhangwei and Cui, Erfei and Tong, Wenwen and Hu, Kongzhi and Luo, Jiapeng and Ma, Zheng and others},
  journal={arXiv preprint arXiv:2404.16821},
  year={2024}
}