Dependable Databricks-Certified-Professional-Data-Engineer Exam Dumps to Become Databricks Certified [Q17-Q33]

Get Ready with Databricks-Certified-Professional-Data-Engineer Exam Dumps (2023)

The Databricks Certified Professional Data Engineer exam is a comprehensive certification exam that assesses an individual's knowledge and skills in working with big data and cloud computing technologies. The exam is designed for data professionals who are proficient in using the Databricks Unified Analytics Platform to manage and analyze large volumes of data, and it covers a broad range of topics such as data engineering, data transformation, data modeling, and machine learning.

Q17. Which of the below SQL commands creates a global temporary view?

CREATE OR REPLACE TEMPORARY VIEW view_name AS SELECT * FROM table_name

CREATE OR REPLACE LOCAL TEMPORARY VIEW view_name AS SELECT * FROM table_name

CREATE OR REPLACE GLOBAL TEMPORARY VIEW view_name AS SELECT * FROM table_name (Correct)

CREATE OR REPLACE VIEW view_name AS SELECT * FROM table_name

CREATE OR REPLACE LOCAL VIEW view_name AS SELECT * FROM table_name

Explanation
The answer is:
CREATE OR REPLACE GLOBAL TEMPORARY VIEW view_name AS SELECT * FROM table_name

There are two types of temporary views that can be created: local and global.
* A session-scoped (local) temporary view is only available within its Spark session, so another notebook in the same cluster cannot access it. If the notebook is detached and reattached, the local temporary view is lost.
* A global temporary view is available to all notebooks in the cluster, but if the cluster restarts the global temporary view is lost.
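For reference, a minimal SQL sketch of the correct option, assuming a source table named table_name already exists. Global temporary views are registered in the reserved global_temp schema, which is how other notebooks attached to the same cluster reach them:

CREATE OR REPLACE GLOBAL TEMPORARY VIEW view_name
AS SELECT * FROM table_name;

-- From any other notebook attached to the same cluster:
SELECT * FROM global_temp.view_name;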
Q18. The operations team is interested in monitoring a recently launched product. The team wants to set up an email alert when the number of units sold increases by more than 10,000 units, and they want to monitor this every 5 minutes.
Fill in the blanks below to finish the steps we need to take:
* Create ___ query that calculates total units sold
* Setup ____ with query on trigger condition Units Sold > 10,000
* Setup ____ to run every 5 mins
* Add destination ______

Python, Job, SQL Cluster, email address

SQL, Alert, Refresh, email address

SQL, Job, SQL Cluster, email address

SQL, Job, Refresh, email address

Python, Job, Refresh, email address

Explanation
The answer is SQL, Alert, Refresh, email address.

Here are the steps from the Databricks documentation.

Create an alert
Follow these steps to create an alert on a single column of a query.
1. Do one of the following:
* Click Create in the sidebar and select Alert.
* Click Alerts in the sidebar and click the + New Alert button.
2. Search for a target query. To alert on multiple columns, you need to modify your query. See Alert on multiple columns.
3. In the Trigger when field, configure the alert.
* The Value column drop-down controls which field of your query result is evaluated.
* The Condition drop-down controls the logical operation to be applied.
* The Threshold text input is compared against the Value column using the Condition you specify.
Note: If a target query returns multiple records, Databricks SQL alerts act on the first one. As you change the Value column setting, the current value of that field in the top row is shown beneath it.
4. In the When triggered, send notification field, select how many notifications are sent when your alert is triggered:
* Just once: Send a notification when the alert status changes from OK to TRIGGERED.
* Each time alert is evaluated: Send a notification whenever the alert status is TRIGGERED regardless of its status at the previous evaluation.
* At most every: Send a notification whenever the alert status is TRIGGERED at a specific interval. This choice lets you avoid notification spam for alerts that trigger often.
Regardless of which notification setting you choose, you receive a notification whenever the status goes from OK to TRIGGERED or from TRIGGERED to OK. The schedule settings affect how many notifications you will receive if the status remains TRIGGERED from one execution to the next. For details, see Notification frequency.
5. In the Template drop-down, choose a template:
* Use default template: Alert notification is a message with links to the Alert configuration screen and the Query screen.
* Use custom template: Alert notification includes more specific information about the alert.
a. A box displays, consisting of input fields for subject and body. Any static content is valid, and you can incorporate built-in template variables:
* ALERT_STATUS: The evaluated alert status (string).
* ALERT_CONDITION: The alert condition operator (string).
* ALERT_THRESHOLD: The alert threshold (string or number).
* ALERT_NAME: The alert name (string).
* ALERT_URL: The alert page URL (string).
* QUERY_NAME: The associated query name (string).
* QUERY_URL: The associated query page URL (string).
* QUERY_RESULT_VALUE: The query result value (string or number).
* QUERY_RESULT_ROWS: The query result rows (value array).
* QUERY_RESULT_COLS: The query result columns (string array).
An example subject, for instance, could be: Alert "{{ALERT_NAME}}" changed status to {{ALERT_STATUS}}.
b. Click the Preview toggle button to preview the rendered result.
Important: The preview is useful for verifying that template variables are rendered correctly. It is not an accurate representation of the eventual notification content, as each alert destination can display notifications differently.
c. Click the Save Changes button.
6. In Refresh, set a refresh schedule. An alert's refresh schedule is independent of the query's refresh schedule.
* If the query is a Run as owner query, the query runs using the query owner's credential on the alert's refresh schedule.
* If the query is a Run as viewer query, the query runs using the alert creator's credential on the alert's refresh schedule.
7. Click Create Alert.
8. Choose an alert destination.
Important: If you skip this step you will not be notified when the alert is triggered.
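As a hedged illustration of the first blank (the SQL query the alert is attached to), a minimal sketch is shown below. The sales table and the units_sold and product_name columns are hypothetical names, so adjust them to your own schema before saving the query in Databricks SQL and attaching the alert with its 5-minute refresh:

-- Hypothetical query to save in Databricks SQL and attach the alert to
SELECT sum(units_sold) AS total_units_sold
FROM sales
WHERE product_name = 'newly_launched_product';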
Q19. Which of the following Python statements can be used to replace the schema name and table name in the query?

table_name = "sales"
schema_name = "bronze"
query = f"select * from schema_name.table_name"

table_name = "sales"
query = "select * from {schema_name}.{table_name}"

table_name = "sales"
query = f"select * from {schema_name}.{table_name}"

table_name = "sales"
query = f"select * from + schema_name +"."+table_name"

Explanation
The answer is:
table_name = "sales"
query = f"select * from {schema_name}.{table_name}"

It is always best to use f-strings to substitute Python variables into a query string, rather than string concatenation.

Q20. Your colleague was walking you through how a job was set up, but you noticed a warning message that said, "Jobs running on all-purpose cluster are considered all purpose compute". The colleague was not sure why he was getting the warning message. How do you best explain this warning message?

All-purpose clusters cannot be used for Job clusters, due to performance issues.

All-purpose clusters take longer to start the cluster vs a job cluster

All-purpose clusters are less expensive than the job clusters

All-purpose clusters are more expensive than the job clusters

All-purpose clusters provide interactive messages that cannot be viewed in a job

Explanation
The answer is: All-purpose clusters are more expensive than the job clusters.
Pricing for all-purpose clusters is higher than for job clusters (AWS list pricing as of Aug 15th, 2022), which is why the warning is shown when a job is scheduled on an all-purpose cluster.

Q21. An upstream system is emitting change data capture (CDC) logs that are being written to a cloud object storage directory. Each record in the log indicates the change type (insert, update, or delete) and the values for each field after the change. The source table has a primary key identified by the field pk_id.
For auditing purposes, the data governance team wishes to maintain a full record of all values that have ever been valid in the source system. For analytical purposes, only the most recent value for each record needs to be recorded. The Databricks job to ingest these records runs once per hour, but each individual record may have changed multiple times over the course of an hour.
Which solution meets these requirements?

Create a separate history table for each pk_id; resolve the current state of the table by running a union all, filtering the history tables for the most recent state.

Use merge into to insert, update, or delete the most recent entry for each pk_id into a bronze table, then propagate all changes throughout the system.

Iterate through an ordered set of changes to the table, applying each in turn; rely on Delta Lake's versioning ability to create an audit log.

Use Delta Lake's change data feed to automatically process CDC data from an external system, propagating all changes to all dependent tables in the Lakehouse.

Ingest all log information into a bronze table; use merge into to insert, update, or delete the most recent entry for each pk_id into a silver table to recreate the current table state.

Explanation
The correct answer is the last option: ingest all log information into a bronze table, then use MERGE INTO to upsert the most recent entry for each pk_id into a silver table.

This meets both requirements: it maintains a full record of all values that have ever been valid in the source system and it recreates the current table state with only the most recent value for each record. Ingesting all log information into a bronze table preserves the raw CDC data as it is. Then, MERGE INTO performs an upsert operation on a silver table, inserting new records or updating or deleting existing records based on the change type and the pk_id column. This way, the silver table always reflects the current state of the source table, while the bronze table keeps the history of all changes. Verified References: [Databricks Certified Data Engineer Professional], under "Delta Lake" section; Databricks Documentation, under "Upsert into a table using merge" section.
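A hedged sketch of that bronze-to-silver pattern is shown below. The table names (bronze_cdc_logs, silver_customers), the columns (change_type, change_time, name, address), and the deduplication step are illustrative assumptions, not the exam's reference solution:

MERGE INTO silver_customers AS t
USING (
  -- Keep only the most recent change per pk_id within the hourly batch
  SELECT pk_id, name, address, change_type
  FROM (
    SELECT *,
           row_number() OVER (PARTITION BY pk_id ORDER BY change_time DESC) AS rn
    FROM bronze_cdc_logs
  ) ranked
  WHERE rn = 1
) AS s
ON t.pk_id = s.pk_id
WHEN MATCHED AND s.change_type = 'delete' THEN DELETE
WHEN MATCHED THEN
  UPDATE SET name = s.name, address = s.address
WHEN NOT MATCHED AND s.change_type != 'delete' THEN
  INSERT (pk_id, name, address) VALUES (s.pk_id, s.name, s.address);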
Q22. A new data engineer new.engineer@company.com has been assigned to an ELT project. The new data engineer will need full privileges on the table sales to fully manage the project.
Which of the following commands can be used to grant full permissions on the table to the new data engineer?

GRANT ALL PRIVILEGES ON TABLE new.engineer@company.com TO sales;

GRANT SELECT ON TABLE sales TO new.engineer@company.com;

GRANT ALL PRIVILEGES ON TABLE sales TO new.engineer@company.com;

GRANT USAGE ON TABLE sales TO new.engineer@company.com;

GRANT SELECT CREATE MODIFY ON TABLE sales TO new.engineer@company.com;

Q23. You are currently reloading the customer_sales table using the query below:

INSERT OVERWRITE customer_sales
SELECT * FROM customers c
INNER JOIN sales_monthly s ON s.customer_id = c.customer_id

After you ran the above command, the Marketing team quickly wanted to review the old data that was in the table. How does INSERT OVERWRITE impact the data in the customer_sales table if you want to see the previous version of the data prior to running the above statement?

Overwrites the data in the table, all historical versions of the data, you can not time travel to previous versions

Overwrites the data in the table but preserves all historical versions of the data, you can time travel to previous versions

Overwrites the current version of the data but clears all historical versions of the data, so you can not time travel to previous versions.

Appends the data to the current version, you can time travel to previous versions

By default, overwrites the data and schema, you cannot perform time travel

Explanation
The answer is: INSERT OVERWRITE overwrites the current version of the data but preserves all historical versions of the data, so you can time travel to previous versions.

INSERT OVERWRITE customer_sales
SELECT * FROM customers c
INNER JOIN sales_monthly s ON s.customer_id = c.customer_id

Assume this is the second time you are running the above statement. You can still query the prior version of the data using time travel, because any DML/DDL operation except DROP TABLE creates new Parquet files, so the previous versions of the data remain accessible.

SQL syntax for time travel:
SELECT * FROM table_name VERSION AS OF [version number]

With the customer_sales example:
SELECT * FROM customer_sales VERSION AS OF 1 -- previous version
SELECT * FROM customer_sales VERSION AS OF 2 -- current version

You can see all historical changes on the table using DESCRIBE HISTORY table_name.

Note: the main difference between INSERT OVERWRITE and CREATE OR REPLACE TABLE ... AS SELECT (CRAS) is that CRAS can modify the schema of the table, i.e. it can add new columns or change the data types of existing columns. By default, INSERT OVERWRITE only overwrites the data. INSERT OVERWRITE can also be used to update the schema when spark.databricks.delta.schema.autoMerge.enabled is set to true; if this option is not enabled and there is a schema mismatch, the INSERT OVERWRITE command will fail.
Any DML/DDL operation (except DROP TABLE) on a Delta table preserves the historical versions of the data.
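A short, hedged sketch that ties these points together for the customer_sales table; the version number and the schema-evolution setting are illustrative, not part of the question:

-- List every write to the table, with its version number and operation
DESCRIBE HISTORY customer_sales;

-- Query the snapshot that existed before the INSERT OVERWRITE (version number is illustrative)
SELECT * FROM customer_sales VERSION AS OF 1;

-- Optionally allow INSERT OVERWRITE to evolve the schema instead of failing on a mismatch
SET spark.databricks.delta.schema.autoMerge.enabled = true;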
Q24. Create a schema called bronze using the location '/mnt/delta/bronze', and check if the schema exists before creating it.

CREATE SCHEMA IF NOT EXISTS bronze LOCATION '/mnt/delta/bronze'

CREATE SCHEMA bronze IF NOT EXISTS LOCATION '/mnt/delta/bronze'

if IS_SCHEMA('bronze'): CREATE SCHEMA bronze LOCATION '/mnt/delta/bronze'

Schema creation is not available in the metastore, it can only be done in the Unity Catalog UI

Cannot create schema without a database

Explanation
The answer is: CREATE SCHEMA IF NOT EXISTS bronze LOCATION '/mnt/delta/bronze'
See https://docs.databricks.com/sql/language-manual/sql-ref-syntax-ddl-create-schema.html
CREATE SCHEMA [ IF NOT EXISTS ] schema_name [ LOCATION schema_directory ]

Q25. What is the purpose of the silver layer in a multi-hop architecture?

Replaces a traditional data lake

Efficient storage and querying of full and unprocessed history of data

A schema is enforced, with data quality checks.

Refined views with aggregated data

Optimized query performance for business-critical data

Explanation
The answer is: A schema is enforced, with data quality checks.
Medallion Architecture - Databricks
Silver layer:
1. Reduces data storage complexity, latency, and redundancy
2. Optimizes ETL throughput and analytic query performance
3. Preserves the grain of the original data (without aggregation)
4. Eliminates duplicate records
5. Production schema enforced
6. Data quality checks; quarantine corrupt data
Exam focus: understand the role of each layer (bronze, silver, gold) in the medallion architecture; you will see varying questions targeting each layer and its purpose.

Q26. Which of the following results in the creation of an external table?

CREATE TABLE transactions (id int, desc string) USING DELTA LOCATION EXTERNAL

CREATE TABLE transactions (id int, desc string)

CREATE EXTERNAL TABLE transactions (id int, desc string)

CREATE TABLE transactions (id int, desc string) TYPE EXTERNAL

CREATE TABLE transactions (id int, desc string) LOCATION '/mnt/delta/transactions'

Explanation
The answer is: CREATE TABLE transactions (id int, desc string) LOCATION '/mnt/delta/transactions'
Any time a table is created using LOCATION it is considered an external table. The general syntax is:
CREATE TABLE table_name ( column column_data_type ... ) USING format LOCATION "dbfs:/..."

Q27. Which of the following SQL commands is used to append rows to an existing Delta table?

APPEND INTO DELTA table_name

APPEND INTO table_name

COPY DELTA INTO table_name

INSERT INTO table_name

UPDATE table_name

Explanation
The answer is INSERT INTO table_name.
INSERT INTO adds rows to an existing table; this is very similar to adding rows in a traditional database or data warehouse.
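Putting the commands from Q24, Q26, and Q27 together, here is a minimal end-to-end sketch; the paths, table schema, and sample rows are illustrative assumptions rather than exam content:

-- Q24: create the schema only if it does not already exist, at an explicit location
CREATE SCHEMA IF NOT EXISTS bronze LOCATION '/mnt/delta/bronze';

-- Q26: supplying a LOCATION makes this an external (unmanaged) table
CREATE TABLE IF NOT EXISTS bronze.transactions (id INT, desc STRING)
USING DELTA
LOCATION '/mnt/delta/transactions';

-- Q27: append rows to the existing table
INSERT INTO bronze.transactions VALUES (1, 'book'), (2, 'pen');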
Q28. Which of the following SQL statements can replace a Python variable when the notebook is set in SQL mode?
table_name = "sales"
schema_name = "bronze"

spark.sql(f"SELECT * FROM f{schema_name.table_name}")

spark.sql(f"SELECT * FROM {schem_name.table_name}")

spark.sql(f"SELECT * FROM ${schema_name}.${table_name}")

spark.sql(f"SELECT * FROM {schema_name}.{table_name}")

spark.sql("SELECT * FROM schema_name.table_name")

Explanation
The answer is spark.sql(f"SELECT * FROM {schema_name}.{table_name}").

Q29. The view updates represents an incremental batch of all newly ingested data to be inserted or updated in the customers table.
The following logic is used to process these records.
Which statement describes this implementation?

The customers table is implemented as a Type 3 table; old values are maintained as a new column alongside the current value.

The customers table is implemented as a Type 2 table; old values are maintained but marked as no longer current and new values are inserted.

The customers table is implemented as a Type 0 table; all writes are append only with no changes to existing values.

The customers table is implemented as a Type 1 table; old values are overwritten by new values and no history is maintained.

The customers table is implemented as a Type 2 table; old values are overwritten and new customers are appended.

Explanation
The logic uses the MERGE INTO command to merge new records from the view updates into the table customers. The MERGE INTO command takes two arguments: a target table and a source table or view. The command also specifies a condition to match records between the target and the source, and a set of actions to perform when there is a match or not. In this case, the condition is to match records by customer_id, which is the primary key of the customers table. The actions are to update the existing record in the target with the new values from the source and set the current_flag to false to indicate that the record is no longer current, and to insert a new record in the target with the new values from the source and set the current_flag to true to indicate that the record is current. This means that old values are maintained but marked as no longer current and new values are inserted, which is the definition of a Type 2 table. Verified References: [Databricks Certified Data Engineer Professional], under "Delta Lake" section; Databricks Documentation, under "Merge Into (Delta Lake on Databricks)" section.
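The exhibit with the actual MERGE logic is not reproduced here, but a hedged sketch of a typical Type 2 implementation of this kind is shown below. The column names (address, current_flag, effective_date, end_date) and the staging trick are assumptions adapted from the common Delta Lake SCD Type 2 pattern, not the exam's exhibit:

MERGE INTO customers AS t
USING (
  -- Stage changed customers twice: once (matched) to close the current row,
  -- once with a NULL merge key (not matched) to insert the new current row.
  SELECT u.customer_id AS merge_key, u.*
  FROM updates u
  UNION ALL
  SELECT NULL AS merge_key, u.*
  FROM updates u
  JOIN customers c
    ON u.customer_id = c.customer_id
  WHERE c.current_flag = true AND u.address <> c.address
) AS s
ON t.customer_id = s.merge_key
WHEN MATCHED AND t.current_flag = true AND t.address <> s.address THEN
  UPDATE SET current_flag = false, end_date = s.effective_date
WHEN NOT MATCHED THEN
  INSERT (customer_id, address, current_flag, effective_date, end_date)
  VALUES (s.customer_id, s.address, true, s.effective_date, NULL);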
Q30. Which of the following benefits does Delta Live Tables provide for ELT pipelines over standard data pipelines that utilize Spark and Delta Lake on Databricks?

The ability to write pipelines in Python and/or SQL

The ability to declare and maintain data table dependencies

The ability to automatically scale compute resources

The ability to access previous versions of data tables

The ability to perform batch and streaming queries

Q31. A data engineer has set up a notebook to be processed automatically using a Job. The data engineer's manager wants to version control the schedule due to its complexity.
Which of the following approaches can the data engineer use to obtain a version-controllable configuration of the Job's schedule?

They can download the JSON description of the Job from the Job's page

They can submit the Job once on an all-purpose cluster

They can link the Job to notebooks that are a part of a Databricks Repo

They can submit the Job once on a Job cluster

They can download the XML description of the Job from the Job's page

Q32. A data engineer wants to create a relational object by pulling data from two tables. The relational object must be used by other data engineers in other sessions. In order to save on storage costs, the data engineer wants to avoid copying and storing physical data.
Which of the following relational objects should the data engineer create?

Delta Table

View

Temporary view

Spark SQL Table

Database

Q33. You are asked to create a model to predict the total number of monthly subscribers for a specific magazine. You are provided with 1 year's worth of subscription and payment data, user demographic data, and 10 years' worth of magazine content (articles and pictures). Which algorithm is the most appropriate for building a predictive model for subscribers?

Linear regression

Logistic regression

Decision trees

TF-IDF

The Databricks Certified Professional Data Engineer exam covers a wide range of topics, including data engineering concepts, Databricks architecture, data ingestion and processing, data storage and management, and data security. The exam consists of 60 multiple-choice questions, and participants have 90 minutes to complete it. Passing the exam requires a score of 70% or higher, and successful candidates receive a certificate that validates their expertise in building and managing data pipelines on the Databricks platform.

Download Exam Databricks-Certified-Professional-Data-Engineer Practice Test Questions with 100% Verified Answers: https://www.examcollectionpass.com/Databricks/Databricks-Certified-Professional-Data-Engineer-practice-exam-dumps.html