Abstract: GenAI technologies rely heavily on large-scale, diverse training datasets, yet the traditional copyright framework shows significant limitations when applied to GenAI training data. On the one hand, the fair use regime is difficult to apply because massive datasets are used for complex purposes and copyright ownership is often ambiguous; on the other hand, statutory licensing is hard to implement because of high licensing costs, inefficient regulation, and the dispersed nature of right holders. Accordingly, three measures are proposed. First, a "no permission needed, with opt-out" copyright use mechanism could be adopted to optimize both training data sourcing and the protection of authors' rights. Second, remuneration schemes for data use could be refined according to GenAI application scenarios, and new benefit-sharing models, such as data-sharing pools, could be explored. Third, output-side oversight should be strengthened to improve the quality of AI-generated content (AIGC). Together, these measures would build a copyright regulation paradigm for training data suited to the digital era.