Migrate from a CSV to content entities with Paragraphs

This article explains how to use migration templates with a CSV file whose Paragraphs data is spread over several lines.

With Paragraphs, a first possible structure is inline, with all the paragraphs of a host entity on the same row. This case is covered by the excellent article Migration of CSV Data into Paragraphs.

ID	Host entity title	Paragraph1 field1	Paragraph1 field2	Paragraph2 field1	Paragraph2 field2
1	Jimi Hendrix	Axis: Bold as Love	https://www.deezer.com/fr/album/454044	Live At The Fillmore East	https://www.deezer.com/fr/album/454045
2	The Doors	Strange Days	https://www.deezer.com/fr/album/340880	L.A. Woman	https://www.deezer.com/fr/album/6415260

In our case, we assume that the Paragraphs data is split over several lines, one paragraph per row, so the structure looks more like this:

ID	Host entity title	Paragraph field 1	Paragraph field 2
1	Jimi Hendrix	Axis: Bold as Love	https://www.deezer.com/fr/album/454044
2	Jimi Hendrix	Live At The Fillmore East	https://www.deezer.com/fr/album/454045
3	The Doors	Strange Days	https://www.deezer.com/fr/album/340880
4	The Doors	L.A. Woman	https://www.deezer.com/fr/album/6415260

The first structure may seem fine for most use cases, but it fits less well if we extend the discography example with more Albums or with a Tracks migration: every additional paragraph adds two more columns to each row. The second structure is more readable, especially if the list needs a round of manual editing and review before import.

Here, we want to import a list of Albums with their Tracks.

So our CSV file looks like this:

id,album_title,track_title,track_url
1,Axis: Bold As Love,Exp,https://www.deezer.com/fr/track/4952828
2,Axis: Bold As Love,Up From The Skies,https://www.deezer.com/fr/track/4952829
3,Axis: Bold As Love,Spanish Castle Magic,https://www.deezer.com/fr/track/4952830
4,Axis: Bold As Love,Wait Until Tomorrow,https://www.deezer.com/fr/track/4952832
5,Axis: Bold As Love,Aint No Telling,https://www.deezer.com/fr/track/4952831
...

And we have this Drupal model:

Album media (bundle: album)

Tracks (field_tracks, Paragraphs)
Name (name)
(...)

Track paragraph (bundle: track)

Link (field_link)
Title (field_title)
(...)

A first idea would be to use a custom process plugin. This is not the best approach here, because the migration happens in two steps: first the Track paragraphs, then the Album media.
The Albums step would then probably need a second file, with one row per Album, and we want to avoid this.

Second approach: reuse the same CSV file for the Albums, but transform it with a data parser.

We still use the Migrate Source CSV module to create the Track paragraphs in a first template, as the original structure, with one track per row, perfectly matches this use case.

migrate_plus.migration.track_paragraphs.yml

id: track_paragraphs
label: Track Paragraphs
migration_group: discography

source:
  plugin: csv
  path: modules/custom/migrate_discography/data/album_tracks.csv
  header_row_count: 1
  keys:
    - id

process:
  field_title: track_title
  field_link:
    plugin: urlencode
    source: track_url

destination:
  plugin: entity_reference_revisions:paragraph
  default_bundle: track

migration_dependencies:
  required: {}
  optional: {}

dependencies:
  enforced:
    module:
      - migrate_discography

Then, with a data parser, we will:

dedupe the rows, to create one Media entity per album (the album title is used as its id);
change the structure, to provide the nested associative arrays that the Migrate Plus template expects.
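The two steps above can be sketched in plain PHP, outside Drupal. This is only an illustration, assuming the CSV lines have already been split into arrays in the column order id, album_title, track_title, track_url; the function name group_tracks_by_album() is hypothetical and not part of Migrate Plus.

```php
<?php

/**
 * Groups flat track rows into one entry per album (hypothetical helper).
 *
 * Each row is a numerically indexed array in the CSV column order:
 * 0 = id, 1 = album_title, 2 = track_title, 3 = track_url.
 */
function group_tracks_by_album(array $rows): array {
  $albums = [];
  foreach ($rows as $row) {
    // A blank line parsed by str_getcsv() is [NULL]; ?? turns it into ''.
    $title = $row[1] ?? '';
    if ($title === '') {
      continue;
    }
    // Dedupe: the album title is used as a temporary array key.
    if (!isset($albums[$title])) {
      $albums[$title] = ['album_title' => $title, 'tracks' => []];
    }
    // Only the CSV id is kept: it is the source id of track_paragraphs.
    $albums[$title]['tracks'][] = ['id' => $row[0]];
  }
  // Drop the title keys so the albums form a plain, numerically indexed list.
  return ['albums' => array_values($albums)];
}

// First three rows of album_tracks.csv, plus a blank line as str_getcsv() returns it.
$rows = [
  ['1', 'Axis: Bold As Love', 'Exp', 'https://www.deezer.com/fr/track/4952828'],
  ['2', 'Axis: Bold As Love', 'Up From The Skies', 'https://www.deezer.com/fr/track/4952829'],
  ['3', 'Axis: Bold As Love', 'Spanish Castle Magic', 'https://www.deezer.com/fr/track/4952830'],
  [NULL],
];
$source_data = group_tracks_by_album($rows);
```

With the full file, each album appears once in $source_data['albums'], which is the list the item_selector /albums points to in the template below.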

For that, we extend the JSON data parser from Migrate Plus, so that its selectors still apply once the CSV has been converted.

migrate_plus.migration.album_media.yml

id: album_media
label: Album Media
migration_group: discography

source:
  plugin: url
  data_fetcher_plugin: file
  # Make use of a custom parser here, to convert the CSV
  # into associative arrays.
  data_parser_plugin: album_parser
  track_changes: true
  urls: modules/custom/migrate_discography/data/album_tracks.csv
  item_selector: /albums
  fields:
    - name: album_title
      label: Album title
      selector: album_title
    - # This field does not exist as is in the CSV
      # and is provided by the data parser.
      name: tracks
      label: Tracks
      selector: tracks
  ids:
    album_title:
      type: string

process:
  # Media name.
  name: album_title
  # Paragraphs field.
  field_tracks:
    plugin: sub_process
    source: tracks
    process:
      temporary_ids:
        plugin: migration_lookup
        migration: track_paragraphs
        # The id is the one from the CSV,
        # used to get the right paragraph.
        source: id
      target_id:
        plugin: extract
        source: '@temporary_ids'
        index:
          - 0
      target_revision_id:
        plugin: extract
        source: '@temporary_ids'
        index:
          - 1

destination:
  plugin: entity:media
  default_bundle: album

migration_dependencies:
  required:
    - track_paragraphs
  optional: {}

dependencies:
  enforced:
    module:
      - migrate_discography
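To make the field_tracks pipeline above more concrete, here is a plain-PHP simulation of what it computes for one album. It is a sketch, not Drupal code: the $lookup array stands in for the track_paragraphs migration map, and its paragraph and revision ids are hypothetical placeholders. Because the destination is entity_reference_revisions:paragraph, migration_lookup returns both the paragraph id and its revision id, which is why the two extract steps read indexes 0 and 1.

```php
<?php

/**
 * Simulates the field_tracks sub_process pipeline (hypothetical helper).
 *
 * $lookup mimics the track_paragraphs migration map:
 * CSV id => [paragraph id, paragraph revision id].
 */
function build_field_tracks(array $tracks, array $lookup): array {
  $items = [];
  foreach ($tracks as $track) {
    // migration_lookup step: CSV id -> destination ids.
    $temporary_ids = $lookup[$track['id']] ?? NULL;
    if ($temporary_ids === NULL) {
      // Simplification: unknown ids are skipped here; the real
      // migration_lookup plugin handles missing ids differently.
      continue;
    }
    // The two extract steps (index 0 and index 1).
    $items[] = [
      'target_id' => $temporary_ids[0],
      'target_revision_id' => $temporary_ids[1],
    ];
  }
  return $items;
}

// Tracks as produced by the data parser for one album.
$tracks = [['id' => '1'], ['id' => '2']];
// Hypothetical map values: placeholder paragraph and revision ids.
$lookup = ['1' => [10, 100], '2' => [11, 101]];
$field_tracks = build_field_tracks($tracks, $lookup);
```

Each item pairs target_id with target_revision_id, the two properties an entity reference revisions field stores for every referenced paragraph.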

AlbumParser.php

<?php

namespace Drupal\migrate_discography\Plugin\migrate_plus\data_parser;

use Drupal\migrate_plus\Plugin\migrate_plus\data_parser\Json;

/**
 * Builds relations between Albums and Tracks
 * and dedupes Album entities from a flat CSV.
 * Then delegates to the Json data parser for the selectors.
 *
 * @DataParser(
 *   id = "album_parser",
 *   title = @Translation("Album parser")
 * )
 */
class AlbumParser extends Json {

  /**
   * {@inheritdoc}
   */
  protected function getSourceData($url) {
    // Get the CSV.
    $response = $this->getDataFetcherPlugin()->getResponseContent($url);
    // Convert the flat CSV into associative arrays.
    // 0 = Id
    // 1 = Album title
    // 2 = Track title
    // 3 = Track url
    $source_data = [
      'albums' => [],
    ];
    $lines = explode("\n", $response);
    // Exclude the first (header) row. Could be moved to the configuration.
    array_shift($lines);
    $albumDetails = [];
    foreach ($lines as $line) {
      $csvLine = str_getcsv($line);
      if (!empty($csvLine[1])) {
        if (!array_key_exists($csvLine[1], $albumDetails)) {
          $albumDetails[$csvLine[1]] = [
            'album_title' => $csvLine[1],
            'tracks' => [],
          ];
        }
        $albumDetails[$csvLine[1]]['tracks'][] = [
          'id' => $csvLine[0],
        ];
      }
    }
    // Re-index numerically, so that the albums are not keyed by album title.
    $source_data['albums'] = array_values($albumDetails);

    // The rest is taken from the parent Json::getSourceData() method.

    // Backwards-compatibility for depth selection.
    if (is_int($this->itemSelector)) {
      return $this->selectByDepth($source_data);
    }

    // Otherwise, we're using xpath-like selectors.
    $selectors = explode('/', trim($this->itemSelector, '/'));
    foreach ($selectors as $selector) {
      if (!empty($selector)) {
        $source_data = $source_data[$selector];
      }
    }
    return $source_data;
  }

}
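The parser above splits the file with explode("\n"), which is enough for this simple file but breaks a record in two when a quoted field contains a line break. A possible alternative, shown here as a standalone sketch with a hypothetical function name, is to let fgetcsv() read the string through an in-memory php://temp stream, which keeps quoted multi-line fields in a single record and maps each row to the header names.

```php
<?php

/**
 * Parses CSV text into rows keyed by header name (hypothetical helper).
 *
 * Unlike explode("\n", ...), fgetcsv() keeps a quoted field that
 * contains a line break inside a single record.
 */
function csv_string_to_rows(string $csv): array {
  // Expose the string as a stream so that fgetcsv() can read it.
  $handle = fopen('php://temp', 'r+');
  fwrite($handle, $csv);
  rewind($handle);

  // An empty escape string disables PHP's proprietary escape character.
  $header = fgetcsv($handle, 0, ',', '"', '');
  if ($header === FALSE) {
    fclose($handle);
    return [];
  }
  $rows = [];
  while (($fields = fgetcsv($handle, 0, ',', '"', '')) !== FALSE) {
    // fgetcsv() returns [NULL] for a blank line: skip it.
    if ($fields === [NULL]) {
      continue;
    }
    // Pad or trim so array_combine() gets arrays of the same length.
    $fields = array_slice(array_pad($fields, count($header), ''), 0, count($header));
    $rows[] = array_combine($header, $fields);
  }
  fclose($handle);
  return $rows;
}

// The second title is split over two lines on purpose, inside quotes.
$csv = "id,album_title,track_title,track_url\n"
  . "1,Axis: Bold As Love,Exp,https://www.deezer.com/fr/track/4952828\n"
  . "2,\"Axis: Bold As Love\",\"Up From\nThe Skies\",https://www.deezer.com/fr/track/4952829\n"
  . "\n";
$rows = csv_string_to_rows($csv);
```

Inside getSourceData(), the loop could then read $row['album_title'] and $row['id'] instead of relying on column positions.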

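Then we can check the status of the discography migrations and import them, for instance with the Drush commands of the Migrate Tools module: drush migrate:status --group=discography, then drush migrate:import --group=discography.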

Here is the repository containing this example.