New access arrangements to EMI datasets (retirement of anonymous FTP)

  • Last post 09 November 2018
Matthew Keir posted this 16 October 2018

The old anonymous FTP access to the EMI datasets has been replaced by REST-based access to Microsoft Azure Blob storage. The existing FTP access and testing sites will be disestablished on 8 November 2018. Please ensure you have adjusted any scripts you rely on before then.

These options are being provided following the testing and feedback received here and via email. Thank you to those users who participated in testing and provided feedback.

Three methods of accessing the EMI datasets are supported:

1. Web-based access via www.emi.ea.govt.nz

You can still use the EMI website to browse content as you always have. The main menu categories have a 'datasets' option in the drop-down that lets you browse folder content and download files one at a time. Each folder usually has a short paragraph describing its content.

This method is best for ad-hoc downloads of a few files every now and again and provides the most descriptive information to support the data.

2. Access via a storage client (Azure Storage Explorer) 

If you’d like to browse through the datasets as you’ve previously done with an FTP client, you can now use Azure Storage Explorer and connect to EMI datasets using this Shared Access Signature URI: 

https://emidatasets.blob.core.windows.net/publicdata?sv=2018-03-28&si=exp2019-12-31&sr=c&sig=RgEr3fnUCRgCg%2FGc%2BYus0OJHXpWQBZvUPpIDxsOtJQE%3D

This method is best if you are used to using a client or want to manually download whole folders or a large set of files.

Figure 1: Add the URI above in the connect dialogue box

The access token included in the URI above expires and will be updated in the future. The expiry date is part of the URI ("si=exp2019-12-31"), and we will notify users by posting on this forum before that date.
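If you script against the SAS URI, it can help to check how long the token has left. The Python sketch below reads the expiry from the `si` parameter, **assuming** the `si=expYYYY-MM-DD` naming shown above stays in use. Strictly, `si` names a stored access policy and the service enforces that policy's own expiry, so the date in the name is only a label. `sas_expiry` and `days_until_expiry` are hypothetical helper names.

```python
from datetime import date, datetime
from typing import Optional
from urllib.parse import parse_qs, urlsplit

SAS_URI = ("https://emidatasets.blob.core.windows.net/publicdata"
           "?sv=2018-03-28&si=exp2019-12-31&sr=c"
           "&sig=RgEr3fnUCRgCg%2FGc%2BYus0OJHXpWQBZvUPpIDxsOtJQE%3D")


def sas_expiry(uri: str) -> date:
    """Return the date encoded in 'si', assuming the 'expYYYY-MM-DD' naming."""
    query = parse_qs(urlsplit(uri).query)
    policy = query["si"][0]  # e.g. "exp2019-12-31"
    if not policy.startswith("exp"):
        raise ValueError(f"unexpected si value: {policy!r}")
    return datetime.strptime(policy[len("exp"):], "%Y-%m-%d").date()


def days_until_expiry(uri: str, today: Optional[date] = None) -> int:
    """Days remaining before the date in the 'si' label (negative if past)."""
    if today is None:
        today = date.today()
    return (sas_expiry(uri) - today).days
```

A scheduled job could call `days_until_expiry(SAS_URI)` and warn when the result drops below, say, 30, as a prompt to look for the replacement URI on the forum.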

3. Programmatic access

If you want to automate access to the datasets, or you already have scripts that do this, you will need to write or adjust them to use the new storage. Use the same Blob storage endpoint as above, or access the storage container directly via https://emidatasets.blob.core.windows.net/publicdata?restype=container&comp=list.

This method is best if you want to set up scripts to automatically download files into your own system.
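A minimal Python sketch of listing the container, assuming it accepts anonymous list requests as the URL above suggests. `fetch_list` and `parse_blob_names` are hypothetical helper names. The code reads only the `<Name>` of each `<Blob>` and does not follow paging, so a single call returns at most the first 5000 names.

```python
import urllib.request
import xml.etree.ElementTree as ET
from typing import List

LIST_URL = ("https://emidatasets.blob.core.windows.net/publicdata"
            "?restype=container&comp=list")


def fetch_list(url: str = LIST_URL) -> str:
    """Download one List Blobs XML response as text."""
    with urllib.request.urlopen(url, timeout=60) as resp:
        # 'utf-8-sig' drops a leading byte-order mark if the service sends one.
        return resp.read().decode("utf-8-sig")


def parse_blob_names(xml_text: str) -> List[str]:
    """Return the <Name> of every <Blob> element in the response."""
    root = ET.fromstring(xml_text)
    return [blob.findtext("Name") for blob in root.iter("Blob")]
```

Usage: `for name in parse_blob_names(fetch_list()): print(name)`.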

The modified date of each file can be used to find the latest files. This normally works well, although some intermediate files may be refreshed multiple times. In addition, we may occasionally regenerate entire sets of files as we improve data quality and align formats. We'll aim to notify you on this forum of any such changes ahead of time.
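Because files can be refreshed or regenerated, one approach is to remember each file's last-modified time and re-download whenever it moves forward. The Python sketch below assumes you have already extracted `(Name, Last-Modified)` pairs from each `<Blob>` in the list XML, where `Last-Modified` is an RFC 1123 date such as `Tue, 16 Oct 2018 03:12:45 GMT`. `changed_blobs` is a hypothetical helper, and persisting the `seen` dictionary between runs (for example as JSON) is left to you.

```python
from datetime import datetime
from email.utils import parsedate_to_datetime
from typing import Dict, Iterable, List, Tuple


def changed_blobs(listing: Iterable[Tuple[str, str]],
                  seen: Dict[str, datetime]) -> List[str]:
    """Return names that are new or whose Last-Modified is later than the
    value recorded in `seen`; `seen` is updated in place."""
    changed = []
    for name, last_modified in listing:
        stamp = parsedate_to_datetime(last_modified)  # timezone-aware (GMT)
        previous = seen.get(name)
        if previous is None or stamp > previous:
            changed.append(name)
            seen[name] = stamp
    return changed
```

A refreshed intermediate file simply shows up as changed again, so it is fetched once more; this trades some repeat downloads for never keeping a stale copy.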

Example 1: Retrieve further list results

Each list request returns a maximum of 5000 blobs. If there are more than 5000 blobs, a marker value is included in the 'NextMarker' element at the end of the XML response. To return the next set of results, pass the value returned in the NextMarker element as the marker parameter in the request URI.
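The paging loop described above can be sketched in Python as follows. It keeps requesting pages, passing each `NextMarker` value as `marker`, until the service returns an empty `<NextMarker />`. `http_get` and `list_all_blobs` are hypothetical names; the `fetch` argument exists only so the loop can be exercised without network access.

```python
import urllib.request
import xml.etree.ElementTree as ET
from typing import Callable, List, Optional
from urllib.parse import quote

BASE = ("https://emidatasets.blob.core.windows.net/publicdata"
        "?restype=container&comp=list")


def http_get(url: str) -> str:
    with urllib.request.urlopen(url, timeout=60) as resp:
        return resp.read().decode("utf-8-sig")


def list_all_blobs(fetch: Callable[[str], str] = http_get) -> List[str]:
    """Collect blob names across all pages by following NextMarker."""
    names: List[str] = []
    marker: Optional[str] = None
    while True:
        url = BASE if marker is None else f"{BASE}&marker={quote(marker, safe='')}"
        root = ET.fromstring(fetch(url))
        names.extend(blob.findtext("Name") for blob in root.iter("Blob"))
        # findtext gives "" for an empty <NextMarker /> and None if absent.
        marker = root.findtext("NextMarker")
        if not marker:
            return names
```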

https://emidatasets.blob.core.windows.net/publicdata?restype=container&comp=list&marker=2!144!MDAwMDY0IURhdGFzZXRzL1dob2xlc2FsZS9CaWRzQW5kT2ZmZXJzL09mZmVycy8yMDE2LzIwMTYxMjEwX09mZmVycy5jc3YhMDAwMDI4ITk5OTktMTItMzFUMjM6NTk6NTkuOTk5OTk5OVoh

Example 2: Access to a folder

Alternatively, you can narrow your search to a specific folder by using the 'prefix' parameter in the request URI.

https://emidatasets.blob.core.windows.net/publicdata?restype=container&comp=list&prefix=Datasets/Wholesale/BidsAndOffers/Bids/2018
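A small Python sketch for building such folder-specific list URLs, with an optional marker for paging within the folder. `folder_list_url` is a hypothetical helper name; `quote(..., safe='/')` leaves the folder separators readable while escaping anything else.

```python
from typing import Optional
from urllib.parse import quote

CONTAINER = "https://emidatasets.blob.core.windows.net/publicdata"


def folder_list_url(prefix: str, marker: Optional[str] = None) -> str:
    """List Blobs URL limited to blob names that start with `prefix`."""
    url = f"{CONTAINER}?restype=container&comp=list&prefix={quote(prefix, safe='/')}"
    if marker:
        url += f"&marker={quote(marker, safe='')}"
    return url
```

The prefix is matched as a plain string, so `.../Bids/2018` would also match any name that merely begins with those characters; ending the prefix with `/` is a simple way to keep the listing to that one folder.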

Example 3: Access to a file

https://emidatasets.blob.core.windows.net/publicdata/Datasets/Wholesale/BidsAndOffers/Bids/2018/20181015_Bids.csv
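Downloading a single file is a plain HTTP GET on the container URL plus the blob name. The Python sketch below mirrors the folder structure under a local root directory; `blob_url`, `local_path` and `download` are hypothetical helper names. It writes to a `.part` file first and renames it on success, so an interrupted download does not leave a truncated CSV under the final name.

```python
import shutil
import urllib.request
from pathlib import Path
from urllib.parse import quote

CONTAINER = "https://emidatasets.blob.core.windows.net/publicdata"


def blob_url(name: str) -> str:
    return f"{CONTAINER}/{quote(name, safe='/')}"


def local_path(name: str, root: Path) -> Path:
    # Recreate the blob's folder structure under `root`.
    return root.joinpath(*name.split("/"))


def download(name: str, root: Path) -> Path:
    target = local_path(name, root)
    target.parent.mkdir(parents=True, exist_ok=True)
    tmp = target.with_name(target.name + ".part")
    with urllib.request.urlopen(blob_url(name), timeout=300) as resp, open(tmp, "wb") as out:
        shutil.copyfileobj(resp, out)  # stream rather than load into memory
    tmp.replace(target)
    return target
```

Usage: `download("Datasets/Wholesale/BidsAndOffers/Bids/2018/20181015_Bids.csv", Path("emi"))`.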

Further assistance is available via the links below:

REST reference for accessing Azure Blob storage: https://docs.microsoft.com/en-us/rest/api/storageservices/Blob-Service-REST-API

The following articles offer specific guidance in your language of choice:

Please note that the testing sites discussed in the previous post will be removed on 8 November 2018 along with the FTP access.

Josh Smith posted this 22 October 2018

Hi Matthew,

Just wondering if or when the SPD raw case files will be moved over to the Azure Storage platform?

I can't see any case files when I go to this XML page "https://emidatasets.blob.core.windows.net/publicdata?restype=container&comp=list"

Thanks in advance!

Josh

Matthew Keir posted this 24 October 2018

Hi Josh,

Did you check the right location? I see you had a discussion with Phil earlier about the change for case files (https://www.emi.ea.govt.nz/Forum/thread/final-pricing-spd-case-files-and-vspd-gdx-files-location-about-to-change/)

In the Azure storage:

Yesterday's file is: https://emidatasets.blob.core.windows.net/publicdata/Datasets/Wholesale/FinalPricing/CaseFiles/2018/MSS_211112018101100014_0X.ZIP

The whole folder is: https://emidatasets.blob.core.windows.net/publicdata?restype=container&comp=list&prefix=Datasets/Wholesale/FinalPricing/CaseFiles/2018

We're still experiencing occasional issues with the latest modified date being overwritten. We should have this sorted soon.

Cheers,

Matthew

 

davidw posted this 05 November 2018

Hi Matthew,

I'm downloading the full XML index (https://emidatasets.blob.core.windows.net/publicdata?restype=container&comp=list), and then parsing that to get the addresses of the files I want, but it doesn't seem to include all files. It's mostly files of the form "HydrologicalModellingDataset", "BidsAndOffers" or "AncillaryServices".

Any idea why that might be? Is there, perhaps, a limit on how many results get returned when looking at the whole structure? Or am I doing something stupid?

Cheers,

Dave

Edit: I've worked around this by getting the XML file for each specific folder, and each of those seems to come under the record limit. But I'll leave this comment here in case anyone else has the same question.

Matthew Keir posted this 05 November 2018

Hi Dave,

That was news to me too. Yes, you are correct: there is a limit of 5000 blobs per list request. You'll need to take the value of the 'NextMarker' element at the end of the first list's XML and pass it as the marker parameter in your next list request URI. I've edited the original post to include a brief description so all the info for users is in one place.

More info is in the REST reference for accessing Azure Blob storage, linked in the original post.

Hope that helps.

Matthew

davidw posted this 09 November 2018

Thanks. I'll also keep an eye out for the NextMarker element.
