[May 20, 2022] ExamcollectionPass DP-203 Exam Practice Test Questions (Updated 239 Questions) [Q13-Q30]

Pass Microsoft DP-203 Exam Info and Free Practice Test

Exam DP-203: Data Engineering on Microsoft Azure

Candidates for this exam should have subject matter expertise in integrating, transforming, and consolidating data from various structured and unstructured data systems into a structure that is suitable for building analytics solutions. Azure Data Engineers help stakeholders understand the data through exploration, and they build and maintain secure and compliant data processing pipelines by using different tools and techniques. These professionals use various Azure data services and languages to store and produce cleansed and enhanced datasets for analysis. Azure Data Engineers also help ensure that data pipelines and data stores are high-performing, efficient, organized, and reliable, given a set of business requirements and constraints. They deal with unanticipated issues swiftly, and they minimize data loss. They also design, implement, monitor, and optimize data platforms to meet the needs of the data pipelines. A candidate for this exam must have strong knowledge of data processing languages such as SQL, Python, or Scala, and they need to understand parallel processing and data architecture patterns.

Part of the requirements for: Microsoft Certified: Azure Data Engineer Associate

Download exam skills outline

Skills measured
Design and implement data storage (40-45%)
Design and implement data security (10-15%)
Monitor and optimize data storage and data processing (10-15%)
Design and develop data processing (25-30%)

Schedule exam
Languages: English, Chinese (Simplified), Japanese, Korean
Retirement date: none

This exam measures your ability to accomplish the following technical tasks: design and implement data storage; design and develop data processing; design and implement data security; and monitor and optimize data storage and data processing.

NO.13 You have an Azure Data Lake Storage Gen2 account named adls2 that is protected by a virtual network.
You are designing a SQL pool in Azure Synapse that will use adls2 as a source.
What should you use to authenticate to adls2?
  a shared access signature (SAS)
  a managed identity
  a shared key
  an Azure Active Directory (Azure AD) user
Explanation
Managed identity for Azure resources is a feature of Azure Active Directory. The feature provides Azure services with an automatically managed identity in Azure AD. You can use the managed identity capability to authenticate to any service that supports Azure AD authentication.
Managed identity authentication is required when your storage account is attached to a VNet.
Reference:
https://docs.microsoft.com/en-us/azure/synapse-analytics/sql-data-warehouse/quickstart-bulk-load-copy-tsql-exa
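The referenced quickstart uses the COPY statement. A minimal sketch of loading from the VNet-protected account with the pool's managed identity might look like the following; the target table, container, folder path, and CSV options are placeholders, not part of the question:

-- Load CSV files from the VNet-protected ADLS Gen2 account by using the managed identity.
-- dbo.StagingTrips and the 'data/trips' path are hypothetical.
COPY INTO dbo.StagingTrips
FROM 'https://adls2.dfs.core.windows.net/data/trips/'
WITH (
    FILE_TYPE = 'CSV',
    CREDENTIAL = (IDENTITY = 'Managed Identity'),
    FIELDTERMINATOR = ',',
    FIRSTROW = 2
);

Because the identity is issued and rotated by Azure AD, no storage key or SAS token needs to be stored in the pool.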
NO.14 Note: This question is part of a series of questions that present the same scenario. Each question in the series contains a unique solution that might meet the stated goals. Some question sets might have more than one correct solution, while others might not have a correct solution. After you answer a question in this scenario, you will NOT be able to return to it. As a result, these questions will not appear in the review screen.
You have an Azure Storage account that contains 100 GB of files. The files contain text and numerical values. 75% of the rows contain description data that has an average length of 1.1 MB.
You plan to copy the data from the storage account to an Azure SQL data warehouse.
You need to prepare the files to ensure that the data copies quickly.
Solution: You modify the files to ensure that each row is more than 1 MB.
Does this meet the goal?
  Yes
  No
Explanation
Instead, modify the files to ensure that each row is less than 1 MB.
References:
https://docs.microsoft.com/en-us/azure/sql-data-warehouse/guidance-for-loading-data

NO.15 You are designing an application that will store petabytes of medical imaging data. When the data is first created, the data will be accessed frequently during the first week. After one month, the data must be accessible within 30 seconds, but files will be accessed infrequently. After one year, the data will be accessed infrequently but must be accessible within five minutes.
You need to select a storage strategy for the data. The solution must minimize costs.
Which storage tier should you use for each time frame? To answer, select the appropriate options in the answer area.
NOTE: Each correct selection is worth one point.
Explanation
First week: Hot. The Hot tier is optimized for storing data that is accessed frequently.
After one month: Cool. The Cool tier is optimized for storing data that is infrequently accessed and stored for at least 30 days.
After one year: Cool

NO.16 You have an Azure Synapse Analytics dedicated SQL pool named Pool1 and an Azure Data Lake Storage Gen2 account named Account1.
You plan to access the files in Account1 by using an external table.
You need to create a data source in Pool1 that you can reference when you create the external table.
How should you complete the Transact-SQL statement? To answer, select the appropriate options in the answer area.
NOTE: Each correct selection is worth one point.
Explanation
The completed statement is shown in the answer-area image.

NO.17 You need to implement an Azure Databricks cluster that automatically connects to Azure Data Lake Storage Gen2 by using Azure Active Directory (Azure AD) integration.
How should you configure the new cluster? To answer, select the appropriate options in the answer area.
NOTE: Each correct selection is worth one point.
References:
https://docs.azuredatabricks.net/spark/latest/data-sources/azure/adls-passthrough.html

NO.18 You have two fact tables named Flight and Weather. Queries targeting the tables will be based on the join between the following columns.
You need to recommend a solution that maximizes query performance.
What should you include in the recommendation?
  In each table, create a column as a composite of the other two columns in the table.
  In each table, create an IDENTITY column.
  In the tables, use a hash distribution of ArriveDateTime and ReportDateTime.
  In the tables, use a hash distribution of ArriveAirportID and AirportID.
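If the recommendation is to hash-distribute both tables on the airport ID columns used in the join (the last option above), the dedicated SQL pool table definitions might look like the following sketch. The column types and the omitted measure columns are assumptions for illustration only:

-- Distribute both fact tables on the join key so matching rows land on the same distribution.
CREATE TABLE dbo.Flight
(
    ArriveAirportID int NOT NULL,
    ArriveDateTime  datetime2 NOT NULL
    -- remaining flight columns omitted
)
WITH
(
    DISTRIBUTION = HASH(ArriveAirportID),
    CLUSTERED COLUMNSTORE INDEX
);

CREATE TABLE dbo.Weather
(
    AirportID      int NOT NULL,
    ReportDateTime datetime2 NOT NULL
    -- remaining weather columns omitted
)
WITH
(
    DISTRIBUTION = HASH(AirportID),
    CLUSTERED COLUMNSTORE INDEX
);

Co-locating both tables on the join key lets the join run without data movement, whereas hashing on the datetime columns would tend to skew rows for the same period onto a few distributions.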
NO.19 You are designing a partition strategy for a fact table in an Azure Synapse Analytics dedicated SQL pool. The table has the following specifications:
* Contain sales data for 20,000 products.
* Use hash distribution on a column named ProductID.
* Contain 2.4 billion records for the years 2019 and 2020.
Which number of partition ranges provides optimal compression and performance of the clustered columnstore index?
  40
  240
  400
  2,400
Explanation
Each partition should have around 1 million rows per distribution, and a dedicated SQL pool already splits every table across 60 distributions.
We have the formula: Records / (Partitions * 60) = 1 million
Partitions = Records / (1 million * 60)
Partitions = 2.4 x 1,000,000,000 / (1,000,000 * 60) = 40
Note: Having too many partitions can reduce the effectiveness of clustered columnstore indexes if each partition has fewer than 1 million rows. Because dedicated SQL pools automatically spread your data across 60 distributions, a table created with 100 partitions effectively results in 6,000 partitions.
Reference:
https://docs.microsoft.com/en-us/azure/synapse-analytics/sql/best-practices-dedicated-sql-pool

NO.20 You have an on-premises data warehouse that includes the following fact tables. Both tables have the following columns: DateKey, ProductKey, RegionKey. There are 120 unique product keys and 65 unique region keys.
Queries that use the data warehouse take a long time to complete.
You plan to migrate the solution to use Azure Synapse Analytics. You need to ensure that the Azure-based solution optimizes query performance and minimizes processing skew.
What should you recommend? To answer, select the appropriate options in the answer area.
NOTE: Each correct selection is worth one point.
Reference:
https://docs.microsoft.com/en-us/azure/sql-data-warehouse/sql-data-warehouse-tables-distribute

NO.21 You are designing an Azure Stream Analytics solution that receives instant messaging data from an Azure Event Hub.
You need to ensure that the output from the Stream Analytics job counts the number of messages per time zone every 15 seconds.
How should you complete the Stream Analytics query? To answer, select the appropriate options in the answer area.
NOTE: Each correct selection is worth one point.
Reference:
https://docs.microsoft.com/en-us/azure/stream-analytics/stream-analytics-window-functions
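Counting events per key every 15 seconds is the textbook case for a tumbling window. A minimal Stream Analytics Query Language sketch, assuming hypothetical input/output aliases and TimeZone/CreatedAt fields in the message payload, is:

-- Count messages per time zone in non-overlapping 15-second windows.
SELECT
    TimeZone,
    COUNT(*) AS MessageCount,
    System.Timestamp() AS WindowEnd
INTO
    [sql-output]
FROM
    [eventhub-input] TIMESTAMP BY CreatedAt
GROUP BY
    TimeZone,
    TumblingWindow(second, 15)

TumblingWindow produces fixed, non-overlapping windows, so each message is counted exactly once per 15-second interval.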
NO.22 You need to ensure that the Twitter feed data can be analyzed in the dedicated SQL pool. The solution must meet the customer sentiment analytics requirements.
Which three Transact-SQL DDL commands should you run in sequence? To answer, move the appropriate commands from the list of commands to the answer area and arrange them in the correct order.
NOTE: More than one order of answer choices is correct. You will receive credit for any of the correct orders you select.
1 - CREATE EXTERNAL DATA SOURCE
2 - CREATE EXTERNAL FILE FORMAT
3 - CREATE EXTERNAL TABLE AS SELECT
Reference:
https://docs.microsoft.com/en-us/azure/synapse-analytics/sql/develop-tables-external-tables
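A minimal sketch of that three-step sequence in a dedicated SQL pool follows; the object names, storage location, file format options, and source query are hypothetical stand-ins for the case-study specifics:

-- 1. Point the pool at the Data Lake Storage account that holds the Twitter feed.
CREATE EXTERNAL DATA SOURCE TwitterFeedSource
WITH (
    TYPE = HADOOP,
    LOCATION = 'abfss://twitter@account.dfs.core.windows.net'
);

-- 2. Describe how the files are encoded.
CREATE EXTERNAL FILE FORMAT TwitterCsvFormat
WITH (
    FORMAT_TYPE = DELIMITEDTEXT,
    FORMAT_OPTIONS (FIELD_TERMINATOR = ',', STRING_DELIMITER = '"')
);

-- 3. Materialize the query result as an external table (CETAS).
CREATE EXTERNAL TABLE dbo.TwitterSentiment
WITH (
    LOCATION = '/sentiment/',
    DATA_SOURCE = TwitterFeedSource,
    FILE_FORMAT = TwitterCsvFormat
)
AS
SELECT TweetId, TweetText, SentimentScore
FROM dbo.StagedTweets;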
NO.23 You have an Azure Stream Analytics job that is a Stream Analytics project solution in Microsoft Visual Studio. The job accepts data generated by IoT devices in the JSON format.
You need to modify the job to accept data generated by the IoT devices in the Protobuf format.
Which three actions should you perform from Visual Studio in sequence? To answer, move the appropriate actions from the list of actions to the answer area and arrange them in the correct order.
Reference:
https://docs.microsoft.com/en-us/azure/stream-analytics/custom-deserializer

NO.24 You need to integrate the on-premises data sources and Azure Synapse Analytics. The solution must meet the data integration requirements.
Which type of integration runtime should you use?
  Azure-SSIS integration runtime
  self-hosted integration runtime
  Azure integration runtime

NO.25 You are monitoring an Azure Stream Analytics job.
The Backlogged Input Events count has been 20 for the last hour.
You need to reduce the Backlogged Input Events count.
What should you do?
  Drop late arriving events from the job.
  Add an Azure Storage account to the job.
  Increase the streaming units for the job.
  Stop the job.
General symptoms of the job hitting system resource limits include the following: if the backlogged input events metric keeps increasing, it's an indicator that the system resource is constrained (either because of output sink throttling or high CPU).
Note: Backlogged Input Events is the number of input events that are backlogged. A non-zero value for this metric implies that your job isn't able to keep up with the number of incoming events. If this value is slowly increasing or consistently non-zero, you should scale out your job by adjusting the Streaming Units.
Reference:
https://docs.microsoft.com/en-us/azure/stream-analytics/stream-analytics-scale-jobs
https://docs.microsoft.com/en-us/azure/stream-analytics/stream-analytics-monitoring

NO.26 Note: This question is part of a series of questions that present the same scenario. Each question in the series contains a unique solution that might meet the stated goals. Some question sets might have more than one correct solution, while others might not have a correct solution. After you answer a question in this scenario, you will NOT be able to return to it. As a result, these questions will not appear in the review screen.
You have an Azure Storage account that contains 100 GB of files. The files contain text and numerical values. 75% of the rows contain description data that has an average length of 1.1 MB.
You plan to copy the data from the storage account to an Azure SQL data warehouse.
You need to prepare the files to ensure that the data copies quickly.
Solution: You modify the files to ensure that each row is less than 1 MB.
Does this meet the goal?
  Yes
  No
Explanation
When exporting data into an ORC file format, you might get Java out-of-memory errors when there are large text columns. To work around this limitation, export only a subset of the columns.
References:
https://docs.microsoft.com/en-us/azure/sql-data-warehouse/guidance-for-loading-data

NO.27 You have a C# application that processes data from an Azure IoT hub and performs complex transformations.
You need to replace the application with a real-time solution. The solution must reuse as much code as possible from the existing application.
  Azure Databricks
  Azure Event Grid
  Azure Stream Analytics
  Azure Data Factory
Azure Stream Analytics on IoT Edge empowers developers to deploy near-real-time analytical intelligence closer to IoT devices so that they can unlock the full value of device-generated data. UDFs are available in C# for IoT Edge jobs. Azure Stream Analytics on IoT Edge runs within the Azure IoT Edge framework. Once the job is created in Stream Analytics, you can deploy and manage it by using IoT Hub.
References:
https://docs.microsoft.com/en-us/azure/stream-analytics/stream-analytics-edge

NO.28 You are designing an Azure Synapse Analytics workspace.
You need to recommend a solution to provide double encryption of all the data at rest.
Which two components should you include in the recommendation? Each correct answer presents part of the solution.
NOTE: Each correct selection is worth one point.
  an Azure key vault that has purge protection enabled
  an RSA key
  an X509 certificate
  an Azure Policy initiative
  an Azure virtual network that has a network security group (NSG)

NO.29 What should you recommend to prevent users outside the Litware on-premises network from accessing the analytical data store?
  a server-level virtual network rule
  a database-level virtual network rule
  a database-level firewall IP rule
  a server-level firewall IP rule
Virtual network rules are one firewall security feature that controls whether the database server for your single databases and elastic pools in Azure SQL Database, or for your databases in SQL Data Warehouse, accepts communications that are sent from particular subnets in virtual networks.
Server-level, not database-level: each virtual network rule applies to your whole Azure SQL Database server, not just to one particular database on the server. In other words, a virtual network rule applies at the server level, not at the database level.
References:
https://docs.microsoft.com/en-us/azure/sql-database/sql-database-vnet-service-endpoint-rule-overview

NO.30 You have an Azure Databricks resource.
You need to log actions that relate to changes in compute for the Databricks resource.
Which Databricks services should you log?
  clusters
  workspace
  DBFS
  SSH
  jobs
Explanation
Cloud provider infrastructure logs. Databricks logging allows security and admin teams to demonstrate conformance to data governance standards within or from a Databricks workspace. Customers, especially in regulated industries, also need records of activities such as:
- User access control to cloud data storage
- Cloud Identity and Access Management roles
- User access to cloud network and compute
Azure Databricks offers three distinct workloads on several VM instances tailored for your data analytics workflow: the Jobs Compute and Jobs Light Compute workloads make it easy for data engineers to build and execute jobs, and the All-Purpose Compute workload makes it easy for data scientists to explore, visualize, manipulate, and share data and insights interactively.

Pass Your Microsoft Exam with DP-203 Exam Dumps: https://www.examcollectionpass.com/Microsoft/DP-203-practice-exam-dumps.html