Azure Maps in Azure Data Factory Pipelines

In today’s data-driven world, geospatial data has become increasingly vital across various industries. From optimizing logistics to making informed decisions in urban planning, geospatial data offers valuable insights. Microsoft Azure offers robust solutions for geospatial data integration, and in this blog, we’ll explore how to use Azure Data Factory Dataflows to seamlessly connect with Azure Maps and extract geospatial information like city, state, latitude, and longitude from a CSV dataset stored in an Azure Storage Account.

Understanding Azure Data Factory Dataflows

Azure Data Factory is a powerful cloud-based data integration service that empowers organizations to create, schedule, and manage data-driven workflows. Within it, Mapping Data Flows are visually designed transformations that can connect to diverse data sources, transform and enrich data, and automate data processing, making the platform a versatile choice for data integration.

Leveraging Azure Maps for Geospatial Insights

Azure Maps, a cloud-based mapping platform within the Microsoft Azure ecosystem, offers an array of geospatial services, including geocoding, routing, and spatial analysis. By combining Azure Data Factory Dataflows with Azure Maps, organizations can efficiently extract, transform, and utilize geospatial data, providing location-aware insights that can fuel informed decision-making.

Seamless Geospatial Data Integration Using Azure Data Factory Dataflows

To seamlessly integrate Azure Maps in your Azure Data Factory Dataflows for extracting city, state, latitude, and longitude points from your CSV data, follow these steps:

Set Up Azure Maps Resource: Begin by configuring an Azure Maps resource within your Azure subscription. This resource provides access to the geospatial services required for extracting location-based information.

To establish an Azure Maps service and acquire the necessary subscription key for authenticating your Azure Data Factory (ADF) pipeline’s access, please refer to the instructions provided in the following links:

  1. Create your Azure Maps Service.
  2. Obtain the subscription key required for your Azure Maps account using the steps outlined here.

By following these links and storing the subscription key, you'll ensure that your ADF pipeline is properly authenticated to use the Azure Maps service, enabling seamless integration of geospatial data into your workflows.

Save the main identifier (the primary key) for later use in an Azure Data Factory linked service; a quick way to sanity-check the key is sketched below.
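Before wiring anything into Data Factory, you can call the same Search Address endpoint directly to confirm the key works. Below is a minimal sketch in Python; the requests library, the placeholder key, and the sample address are assumptions for illustration, not part of the pipeline itself.

```python
import requests

# Hypothetical values: substitute your own primary key and a test address.
SUBSCRIPTION_KEY = "<your-azure-maps-primary-key>"
address = "1 Microsoft Way, Redmond, WA"

# Same endpoint the dataflow's External Call transformation will use later.
response = requests.get(
    "https://atlas.microsoft.com/search/address/json",
    params={
        "api-version": "1.0",
        "subscription-key": SUBSCRIPTION_KEY,
        "query": address,
    },
    timeout=30,
)
response.raise_for_status()

# The first result carries the coordinates and address breakdown extracted later.
top = response.json()["results"][0]
print(top["position"]["lat"], top["position"]["lon"])
print(top["address"].get("municipality"), top["address"].get("countrySubdivision"))
```

If this prints a latitude, longitude, and locality for your test address, the key is valid and the rest of the walkthrough should work unchanged.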

Upload Sample Data to Azure Data Lake Storage: Configure an Azure Storage Account as a data lake to hold the source and destination of the data transformation.

  1. Create a data lake storage account with the name of your choice and create a container named ‘geospatial-repo’.
  2. Create two folders, 'Source' and 'Destination', for the source and sink datasets respectively.
  3. Upload the sample.csv file to the 'Source' folder.

Data to be used as the source dataset in Azure Data Factory.
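The sample file only needs a column holding the full address; the column name 'Address' is what the dataflow references later. A hypothetical sample.csv might look like the following (the specific rows are illustrative, not the original dataset):

```
Id,Address
1,"1 Microsoft Way, Redmond, WA 98052"
2,"11 Times Square, New York, NY 10036"
```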

Establish an Azure Data Factory Dataflow: If you haven't already created an Azure Data Factory, create one and configure a Dataflow activity within it.

  1. Create an Azure Data Factory pipeline named ‘geospatial-integration’.
  2. Create a mapping dataflow activity named ‘dataflow_geospatial_integration’.

ADF pipeline with dataflow activity configured.

 

Data Source Configuration: Within your Dataflows, establish a source dataset linked to the CSV data stored in your Azure Storage Account.

  1. Click on the Dataflow activity in the pipeline canvas to open its settings.
  2. Inside the Dataflow activity settings, under the “Source” section, you can configure the source dataset.
  3. Click ‘+’ next to the dataset option within the source settings to create a new dataset.
  4. Create a new linked service to access data from the data lake storage account, then create a new dataset named ‘source’ (a rough sketch of the generated dataset JSON appears after this list).
  5. Select the container ‘geospatial-repo’ and pick the file by browsing into the ‘Source’ folder.
  6. Click ‘Create’ to finish setting up the source dataset.
  7. Turn on the data flow debug option in the dataflow to see the stage-by-stage data transformation within it.
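For orientation, the dataset ADF generates behind the UI is roughly a DelimitedText dataset pointing at the Source folder. The sketch below is approximate; the linked service name is an assumption, and your generated JSON may include extra properties.

```json
{
  "name": "source",
  "properties": {
    "type": "DelimitedText",
    "linkedServiceName": {
      "referenceName": "AzureDataLakeStorage",
      "type": "LinkedServiceReference"
    },
    "typeProperties": {
      "location": {
        "type": "AzureBlobFSLocation",
        "fileSystem": "geospatial-repo",
        "folderPath": "Source",
        "fileName": "sample.csv"
      },
      "columnDelimiter": ",",
      "firstRowAsHeader": true
    }
  }
}
```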

 

Creating a new dataset in Azure Data Factory.

Creating a linked service and configuring the dataset to read from the Source folder.

Integration via External Call Activity: Incorporate the “External Call” transformation into your dataflow. This transformation enables you to communicate with Azure Maps services through HTTP requests.

Follow the steps below to establish a connection between your Data Factory dataflow and Azure Maps using a linked service.

  1. Click the ‘+’ icon just below the source, search for ‘External Call’, and select it to add the external API call as part of the dataflow.
  2. Select the new transformation, open its settings panel, and click ‘+ New’ next to the linked service field to create a new linked service for Azure Maps.
  3. Name the linked service ‘MapsAPI’ and set the base URL to https://atlas.microsoft.com/search/address/json.
  4. Set the Authentication Type to “Basic”, enter ‘API Key’ in the “User Name” field, and paste the subscription key you copied while creating the Azure Maps service into the “Password” field (a rough sketch of the resulting linked-service JSON follows this list).
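Behind the UI, the REST linked service JSON ends up looking roughly like the sketch below. This is an approximation for orientation only; for anything beyond experimentation, store the key in Azure Key Vault rather than inline.

```json
{
  "name": "MapsAPI",
  "properties": {
    "type": "RestService",
    "typeProperties": {
      "url": "https://atlas.microsoft.com/search/address/json",
      "enableServerCertificateValidation": true,
      "authenticationType": "Basic",
      "userName": "API Key",
      "password": {
        "type": "SecureString",
        "value": "<azure-maps-subscription-key>"
      }
    }
  }
}
```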

 

Configuring the External Call Activity: Configure the External Call transformation to use the address from the source dataset to query the Azure Maps API for geospatial details by following the steps below.

  1. Select the ‘Call transformation’s settings’ tab in the panel that appears after selecting the ‘External Call’ transformation.
  2. Set ‘Request Method’ to “GET”.
  3. Add a query parameter ‘subscription-key’ and set its value to the subscription key copied while creating Azure Maps, wrapped in single quotes.
  4. Add a query parameter ‘api-version’ and set its value to ‘1.0’ (in single quotes).
  5. Add a query parameter ‘query’ and set its value to Address, without quotes, so the value of the ‘Address’ column in the source dataset is passed dynamically.
  6. Navigate to the Output tab in the panel and click ‘Import projection’ to add the API response schema to the dataflow (a trimmed example of that response follows this list).
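With those parameters, each row effectively issues a GET request of the form https://atlas.microsoft.com/search/address/json?api-version=1.0&subscription-key=<key>&query=<Address>. A trimmed, illustrative response showing only the fields the derived columns will reference looks roughly like this (values are made up for the example):

```json
{
  "results": [
    {
      "type": "Point Address",
      "position": { "lat": 47.6397, "lon": -122.1281 },
      "address": {
        "localName": "Redmond",
        "countrySubdivision": "WA",
        "country": "United States",
        "freeformAddress": "1 Microsoft Way, Redmond, WA 98052"
      }
    }
  ]
}
```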

 

Extract Attributes Using Derived Column Activity: To extract attributes such as latitude, longitude, city, state, and country from the API response, follow the steps below.

  1. Click the ‘+’ icon below the External Call transformation and select ‘Derived Column’ from the list that appears.
  2. Add a new column ‘latitude’ and open the dataflow expression builder.
  3. In the expression builder, enter toDecimal(body.results[1].position.lat) to extract the latitude (data flow expression arrays are 1-based, so results[1] is the first result).
  4. Click ‘Create new’ under Derived Columns and add a new column ‘longitude’.
  5. For ‘longitude’, enter the expression toDecimal(body.results[1].position.lon) to extract the longitude from the response.
  6. Add a new field ‘city’ with the expression body.results[1].address.localName to extract the city.
  7. Add a new field ‘state’ with the expression body.results[1].address.countrySubdivision to extract the state.
  8. Add a new field ‘country’ with the expression body.results[1].address.country to extract the country.
  9. To preview the result of each attribute, click the ‘Data Preview’ option and then ‘Refresh’.
  10. Once configured, click ‘Save and finish’. Go back to the dataflow, select the ‘Derived Column’ transformation, and click ‘Data Preview’ to see the extracted fields ‘city’, ‘state’, ‘country’, ‘latitude’, and ‘longitude’ in the output.

Preview images while extracting attributes from the API response.

 

Publish the Output to the Destination Sink: To publish the output of the pipeline, follow the steps below.

  1. Click the ‘+’ icon below the ‘Derived Column’ transformation, search for ‘Sink’, and select it.
  2. In the panel that appears for the sink, click ‘+’ to create a new destination dataset.
  3. Follow the same steps used to create the source dataset, except point the target dataset to the ‘Destination’ folder instead of ‘Source’.
  4. Navigate to the ‘Mapping’ tab, uncheck ‘Auto mapping’, and remove the ‘body’ and ‘headers’ attributes.
  5. Click ‘Save’ at the top of the dataflow.
  6. Navigate to the pipeline and select ‘Debug’ to run it without publishing, then check whether the transformed data was written to the Azure Storage Account under the ‘Destination’ folder (a quick way to verify from code is sketched after this list).
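If you prefer to verify the output from code rather than the portal, the sketch below uses the azure-storage-blob package. The connection string and the commented-out file name are assumptions for illustration; the sink typically writes one or more part files under the Destination folder.

```python
from azure.storage.blob import ContainerClient

# Hypothetical connection string for the data lake storage account.
conn_str = "<storage-account-connection-string>"

container = ContainerClient.from_connection_string(
    conn_str, container_name="geospatial-repo"
)

# List whatever the sink wrote under the Destination folder.
for blob in container.list_blobs(name_starts_with="Destination/"):
    print(blob.name)

# Example: download one of the part files the sink produced (name is illustrative).
# data = container.download_blob("Destination/part-00000.csv").readall()
```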

 

Configuring sink mapping settings to remove ‘body’ and ‘headers’.
Running pipeline in debug mode to publish results to storage accounts.

Snapshot from storage account showing output published from data factory.
Important Note: While previewing results within the dataflow you might encounter the following error. You can ignore it; it does not affect the completion of the pipeline run.

Ignore this error while building the dataflow.


Conclusion

Leveraging Azure Data Factory Dataflows for geospatial data integration with Azure Maps opens new avenues for organizations to make data-driven decisions enriched with location-aware insights. This integration streamlines the extraction of geospatial data from a CSV dataset and ensures that organizations can harness this valuable information to optimize operations, enhance customer experiences, and gain a competitive edge. By combining the capabilities of Azure Data Factory Dataflows and Azure Maps, you can unlock the full potential of your geospatial data and empower your organization to thrive in today’s data-centric landscape.
